Efficient Product Importer in Sitecore – Part 1

One of the common challenges in a Sitecore implementation is importing data into Sitecore and storing it as items. You usually do this when you have an external PIM system where you manage your products, and you want to display them on your site with limited calls to that external system, which may be a performance bottleneck. Usually you also want those products to carry some extra information managed directly in Sitecore, like rich text descriptions or images, and to take advantage of Sitecore personalization features. Of course, you also want to keep those products up to date in Sitecore.

The core part of our implementation is a Sitecore scheduled task, which is a combination of items and custom code. In the first part of this tutorial we’ll focus on the task; in the second we will add some tweaks, including an item bucket.

Scheduled task Sitecore items

Product repository

Let’s start by creating a “repository” item, where all our products will be stored. In our case this is just a simple item without custom fields:

Task Command

OK, let’s create the definition item for our task’s command. We create it under /sitecore/system/Tasks/Commands. In the “Type” field we define where our code is located, in the format “namespace.class, assembly name”. In “Method” we tell Sitecore which method should be called by the task scheduler.

Task schedule

Now we need to create the schedule definition item under /sitecore/system/Tasks/Schedules. This one is a little bit tricky. In “Command” we select the item created in the previous step. In “Items” you can insert the IDs of the items you want to pass to your custom task code, separated with a pipe (“|”); we put the ID of our “Product repository” item from the first step here. In the “Schedule” field we define when the task should run for the first and last time and how often it should be executed. The format, with parts separated by a pipe (“|”), is:

  • First date for the schedule in format yyyyMMdd
  • End date for the schedule in format yyyyMMdd
  • The days of the week on which the task should run, as a bit mask, where:
    • 1=Sunday
    • 2=Monday
    • 4=Tuesday
    • 8=Wednesday
    • 16=Thursday
    • 32=Friday
    • 64=Saturday
    • So, for example, Monday to Friday is 62 (2+4+8+16+32) and every day is 127
  • Interval between calls of the task in format HH:mm:ss

So 19990101|21000101|127|01:00:00 basically means: start the task immediately and run it every hour, every day, until the year 2100.
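To make the format easier to check, here is a minimal sketch in plain C# (no Sitecore references) that assembles such a value. The ScheduleDays enum and the ScheduleField helper are hypothetical names introduced only for this illustration; they are not part of the Sitecore API.

```csharp
using System;
using System.Globalization;

// Hypothetical helper enum mirroring the bit mask values listed above.
[Flags]
public enum ScheduleDays
{
    None = 0,
    Sunday = 1,
    Monday = 2,
    Tuesday = 4,
    Wednesday = 8,
    Thursday = 16,
    Friday = 32,
    Saturday = 64,
    Weekdays = Monday | Tuesday | Wednesday | Thursday | Friday, // 62
    EveryDay = Weekdays | Sunday | Saturday                       // 127
}

public static class ScheduleField
{
    // Builds "yyyyMMdd|yyyyMMdd|days|HH:mm:ss" for the "Schedule" field.
    public static string Build(DateTime first, DateTime last, ScheduleDays days, TimeSpan interval)
    {
        return string.Join("|",
            first.ToString("yyyyMMdd", CultureInfo.InvariantCulture),
            last.ToString("yyyyMMdd", CultureInfo.InvariantCulture),
            ((int)days).ToString(CultureInfo.InvariantCulture),
            interval.ToString(@"hh\:mm\:ss", CultureInfo.InvariantCulture));
    }
}
```

Calling ScheduleField.Build(new DateTime(1999, 1, 1), new DateTime(2100, 1, 1), ScheduleDays.EveryDay, TimeSpan.FromHours(1)) yields 19990101|21000101|127|01:00:00. Note that the hh\:mm\:ss pattern only covers intervals shorter than one day.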

The “Last run” field tells Sitecore (and us) when the task ran for the last time. This field is updated automatically by Sitecore, so to force the task to execute we can set a date in the past and save the item.

“Async” means that a second instance of the task can be started while the previous one is still in progress. “Auto remove” removes the schedule definition item after execution, so the task runs only once. In our case we left both unchecked, because this is not the behaviour we want in our product importer.

Sitecore Scheduler configuration

You may now ask why custom code gets called periodically just because we created those Sitecore items. The answer is that there is a scheduling agent defined in the Sitecore configuration, which periodically checks all schedule definition items and fires them up. By the way, scheduling agents are another way to call your code periodically; if you are interested, have a look at the John West article linked below.

The agent is called “Master_Database_Agent” and is of type “Sitecore.Tasks.DatabaseAgent”. In Sitecore 8.x it’s defined in Sitecore.Processing.config. Important notes:

  • This config file should be disabled on environments where you don’t want scheduled tasks to be executed; for example, you typically don’t want them on CD or Reporting roles.
  • The scheduling interval on an item can’t effectively be lower than the interval in the configuration file, because the agent only checks the schedule items that often.

The default configuration looks like this:
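For reference, a stock Sitecore 8.x install typically declares the agent roughly as follows. Treat the exact attribute values, in particular the interval, as an assumption and compare them with your own configuration files.

```xml
<!-- Sketch of the stock agent definition; verify against your installation. -->
<agent type="Sitecore.Tasks.DatabaseAgent" method="Run" interval="00:10:00" name="Master_Database_Agent">
  <!-- Database whose schedule items are processed -->
  <param desc="database">master</param>
  <!-- Root item under which schedule definition items are looked up -->
  <param desc="schedule root">/sitecore/system/tasks/schedules</param>
  <LogActivity>true</LogActivity>
</agent>
```

With an interval of 00:10:00, a schedule item asking for a one-minute interval would still run at most every ten minutes.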

Sitecore Scheduled Task Code

Now we are ready to write some code, starting with a public class “ProductImporter” in the namespace and assembly defined in the task’s command item (in the “Type” field) and a method with the same name as the “Method” field value. The method has to be public and take three parameters:

  • items: the items listed in the “Items” field of the task’s schedule item
  • command: the definition item for the task’s command
  • schedule: the definition item for the task’s schedule

The code is executed in the context of the “scheduler” site defined in <sites> in Sitecore.config, without a context item. Therefore we have to switch to the database we want (“master” in our case) and, to simplify further implementation, set the context item to our repository item:
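A minimal sketch of such an entry point is shown below. It assumes the Sitecore.Kernel assembly is referenced; the namespace MyProject.Tasks and the method name Execute are hypothetical and must match whatever you entered in the “Type” and “Method” fields. Using the switcher classes is one possible way to set the context, not necessarily the only one.

```csharp
using Sitecore.Configuration;
using Sitecore.Data;
using Sitecore.Data.Items;
using Sitecore.Tasks;

namespace MyProject.Tasks // hypothetical namespace, must match the "Type" field
{
    public class ProductImporter
    {
        // The method name must match the "Method" field on the command item.
        public void Execute(Item[] items, CommandItem command, ScheduleItem schedule)
        {
            // Nothing to do if the "Items" field of the schedule is empty.
            if (items == null || items.Length == 0)
            {
                return;
            }

            Database master = Factory.GetDatabase("master");

            // Switch the context database for the duration of the import.
            using (new DatabaseSwitcher(master))
            {
                // Re-read the repository item from master to be explicit.
                Item repository = master.GetItem(items[0].ID);
                if (repository == null)
                {
                    return;
                }

                // Make the repository the context item while importing.
                using (new ContextItemSwitcher(repository))
                {
                    RunImporter(repository);
                }
            }
        }

        private void RunImporter(Item repository)
        {
            // Import logic goes here.
        }
    }
}
```

Both switchers restore the previous context when disposed, so the scheduler thread is left in its original state even if the import throws.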

In the RunImporter method we call a web service (productService.GetAll()) which returns the products from the external system, loop over the products and check whether a product with the same unique code already exists in our product repository in Sitecore. If not, we insert a new one; otherwise we update it, but only if the timestamp (creation or last modification date) returned from the service is later than the timestamp saved in Sitecore.

All the code related to importing is in a try-catch block, because we can’t fully trust the external service, which may, for example, be unavailable at some point. We also added some extra code measuring performance using the System.Diagnostics.Stopwatch class.
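The insert/update/skip decision and the guarded, timed loop can be sketched in plain C# without any Sitecore types. ImportAction, ImportLoop and their members are hypothetical names used only for this illustration; fetchAll stands in for productService.GetAll() and importOne for the code that writes to Sitecore.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

public enum ImportAction { Insert, Update, Skip }

public static class ImportLoop
{
    // storedTimestamp is null when no item with that product code exists yet.
    public static ImportAction Decide(DateTime? storedTimestamp, DateTime serviceTimestamp)
    {
        if (storedTimestamp == null)
        {
            return ImportAction.Insert;
        }

        // Update only when the service has a strictly newer version.
        return serviceTimestamp > storedTimestamp.Value
            ? ImportAction.Update
            : ImportAction.Skip;
    }

    // Runs the import, reports any exception to onError and returns the elapsed time.
    public static TimeSpan Run<T>(Func<IEnumerable<T>> fetchAll, Action<T> importOne, Action<Exception> onError)
    {
        Stopwatch stopwatch = Stopwatch.StartNew();
        try
        {
            foreach (T product in fetchAll())
            {
                importOne(product);
            }
        }
        catch (Exception ex)
        {
            // The external service may be down or return bad data.
            onError(ex);
        }
        finally
        {
            stopwatch.Stop();
        }

        return stopwatch.Elapsed;
    }
}
```

In this sketch a single failure aborts the rest of the run; if you prefer to skip only the broken product, move the try-catch inside the loop.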

Sitecore IndexCustodian

We have already introduced some performance tweaks using the Sitecore.ContentSearch.Maintenance.IndexCustodian class, which is an out-of-the-box Sitecore helper for search index management.

What we basically do is pause search index updates before the import process starts (IndexCustodian.PauseIndexing) and resume indexing after all changes are done (IndexCustodian.ResumeIndexing).

Additionally, we keep track of all new or modified items and update them all at once in the selected search indexes, in our case only the one related to the Master database (IndexCustodian.IncrementalUpdate). This approach may improve performance for a large number of changes if the default indexing strategy is used, which syncs the Master indexes after every item change.
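The pause and resume pattern can be sketched as a small wrapper, assuming the Sitecore.ContentSearch assembly is referenced. IndexingPause is a hypothetical helper name. The batched update is passed in as a callback rather than spelled out, because the exact IndexCustodian.IncrementalUpdate overload to use should be checked against your Sitecore version; running it after indexing is resumed is also an assumption of this sketch.

```csharp
using System;
using Sitecore.ContentSearch.Maintenance;

public static class IndexingPause
{
    // import: creates or updates the product items.
    // updateIndexes: pushes the collected changes to the search index,
    // for example through IndexCustodian.IncrementalUpdate.
    public static void Run(Action import, Action updateIndexes)
    {
        IndexCustodian.PauseIndexing();
        try
        {
            import();
        }
        finally
        {
            // Always resume, even if the import throws,
            // otherwise indexing would stay paused.
            IndexCustodian.ResumeIndexing();
        }

        // Reached only when the import finished without an exception.
        updateIndexes();
    }
}
```

The try-finally is the important part: an exception from the external service must not leave indexing paused.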

Saving Sitecore Item

The data model used in the importer is very simple: we assume that we load a product code and name from the external system. Additionally, each object we get from the service has a timestamp:
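A minimal sketch of such a model as a plain C# class follows. The class and property names are hypothetical, since the real contract of the external service is not shown here.

```csharp
using System;

// Hypothetical DTO for one product returned by the external service.
public class Product
{
    // Unique product code, also used as the Sitecore item name.
    public string Code { get; set; }

    // Human-readable product name.
    public string Name { get; set; }

    // Creation or last modification date reported by the service.
    public DateTime Timestamp { get; set; }
}
```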

The equivalent Sitecore template:

The last piece of our code is the “Map” method, responsible for mapping the object returned by the web service onto a Sitecore item. Again, we guard the code in case some data is in an unexpected format. We assume that product codes are unique and use them as item names, but we also define a DisplayName for product items containing both the product code and name to improve the editor experience.
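A sketch of such a mapping method is shown below, assuming the Sitecore.Kernel assembly is referenced. ProductMapper, the field names “Code”, “Name” and “Timestamp”, and the display name format are all hypothetical. The values are passed individually so that the sketch does not depend on any particular product class.

```csharp
using System;
using Sitecore;
using Sitecore.Data.Items;
using Sitecore.Diagnostics;

public static class ProductMapper
{
    // Hypothetical field names; use the ones from your own product template.
    private const string CodeField = "Code";
    private const string NameField = "Name";
    private const string TimestampField = "Timestamp";

    // Returns true when the item was saved, false when the data was rejected.
    public static bool Map(Item item, string code, string name, DateTime timestamp)
    {
        // Reject obviously broken input before touching the item.
        if (item == null || string.IsNullOrWhiteSpace(code))
        {
            return false;
        }

        item.Editing.BeginEdit();
        try
        {
            item[CodeField] = code;
            item[NameField] = name ?? string.Empty;

            // Store the timestamp in Sitecore's ISO date format.
            item[TimestampField] = DateUtil.ToIsoDate(timestamp);

            // Show both code and name in the content tree.
            item.Appearance.DisplayName = code + " - " + name;

            item.Editing.EndEdit();
            return true;
        }
        catch (Exception ex)
        {
            // Discard the partial edit and report the problem.
            item.Editing.CancelEdit();
            Log.Error("Could not map product " + code, ex, typeof(ProductMapper));
            return false;
        }
    }
}
```

Returning a flag instead of throwing lets the import loop count failed products and carry on with the rest.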

Product items security improvements

We want to prevent content editors from manually modifying data imported from the external system. This can be achieved by applying Sitecore security to the template’s field items. Repeat this step for each field:

Test it by logging in as a user with editor rights only (security settings don’t apply to the admin account). After item check-in, the fields containing imported data are still read-only, whereas the other ones are ready to edit: