May 06, 2019 Development Reading Time 10 minutes

Building Custom Cortex Content Tagging

Before going deeper into the custom implementation, let’s start with some basic concepts behind Sitecore Cortex content tagging.

Content Tagging Providers

The content tagging process is divided among several providers, each of which executes one part of the business logic. The providers are called by the tagContent pipeline processors, and each can be replaced with a custom implementation by patching the configuration.

  • IContentProvider: responsible for reading the content that should be tagged. The default implementation takes all non-standard fields (all fields whose names do not start with __) and joins them into a single string, separated by single spaces.
  • IDiscoveryProvider: takes the input content and calls the logic or external service that returns tag data for it. Out of the box, Sitecore contains OpenCalaisDiscoveryProvider, which calls the Open Calais API to generate tags using natural language processing and machine learning algorithms.
  • ITaxonomyProvider: takes the tag data and creates tag items. The default implementation saves tag data into an item bucket under /sitecore/system/Settings/Buckets/TagRepository.
  • ITagger: assigns tag items to the context item (and optionally its subitems). The default implementation saves the assigned tags in the standard Semantics field under the Tagging section.
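
The order in which the four providers cooperate can be pictured with a small, self-contained model. This is only a sketch: the class TaggingFlowSketch, its delegates and the toy implementations are invented for illustration and are not Sitecore's interfaces. An item is modelled as a dictionary of field values, tags as plain strings, and the field name __Semantics is assumed.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Illustrative stand-ins only: none of these types come from Sitecore.
public sealed class TaggingFlowSketch
{
    public Func<IDictionary<string, string>, string> ReadContent { get; set; }                // IContentProvider role
    public Func<string, IEnumerable<string>> DiscoverTags { get; set; }                       // IDiscoveryProvider role
    public Func<IEnumerable<string>, IEnumerable<string>> StoreTags { get; set; }             // ITaxonomyProvider role
    public Action<IDictionary<string, string>, IEnumerable<string>> AssignTags { get; set; }  // ITagger role

    // Runs the four stages in order, the way the tagContent processors chain them.
    public void Run(IDictionary<string, string> item)
    {
        string content = ReadContent(item);
        IEnumerable<string> tagData = DiscoverTags(content);
        List<string> tagIds = StoreTags(tagData).ToList();
        AssignTags(item, tagIds);
    }

    // Wires up toy implementations that loosely mimic the defaults described above.
    public static TaggingFlowSketch CreateToyFlow()
    {
        return new TaggingFlowSketch
        {
            // Join every field whose name does not start with "__", separated by one space.
            ReadContent = item => string.Join(" ", item
                .Where(f => !f.Key.StartsWith("__", StringComparison.Ordinal))
                .Select(f => f.Value)),
            // Toy replacement for Open Calais: every capitalised word becomes a tag.
            DiscoverTags = content => content
                .Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)
                .Where(w => char.IsUpper(w[0]))
                .Distinct(),
            // Toy replacement for tag item creation: derive a fake ID from the tag name.
            StoreTags = names => names.Select(n => "tag:" + n.ToLowerInvariant()),
            // Store the IDs on the item, pipe-separated ("__Semantics" is an assumed field name).
            AssignTags = (item, ids) => item["__Semantics"] = string.Join("|", ids)
        };
    }
}
```

Swapping any single delegate changes one stage without touching the others, which is the same idea as replacing one provider through a configuration patch.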

By customizing those parts, we can select which content from the current item should be tagged, how tags are created from that content, how the taxonomy is structured inside Sitecore, and how the created tags are assigned to the item. For example, we can achieve the following custom tag structure:

 

Sitecore Cortex Content Tagging

Open Calais Output Data

Open Calais returns tags in the following categories:

  • Social Tags: describe what the text is about as a whole, based on Wikipedia Folksonomy.
  • Topic Tags: describe what the text is about as a whole, based on Thomson Reuters Coding Schema (TRCS) and the International Press Telecommunications Council (IPTC) news taxonomy.
  • Entity Tags: entities extracted from the text, like:
    • Companies
    • People
    • Products
    • Technologies
    • Industry Terms
    • Organizations
    • Places (Country, City)

The default implementation of content tagging doesn’t use those categories and stores tags in a flat structure (they are actually placed in an item bucket, but not categorized in a meaningful way).

Custom Tagger Module

The module can be installed via a Sitecore package or built from the GitHub sources. It solves some issues with standard content tagging:

  • By default, tags are stored in the categories returned by the tag discovery provider (e.g. Open Calais), but the structure can be adjusted manually afterwards and category titles can be changed.
  • The same tag item is not created multiple times. The module assigns previously created tags to content items.
  • Tag items have a title field, which can be adjusted or translated after generation, so tags can be displayed on the website, e.g. as search facets.
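
The "create once, reuse afterwards" behaviour from the list above can be sketched as a get-or-create lookup. This is a simplified in-memory model, not the module's code, which works with Sitecore items; the class name TagRegistrySketch, the category/name key and the case-insensitive comparison are assumptions made for this example.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical in-memory stand-in for the tag repository.
public sealed class TagRegistrySketch
{
    // Key is "category/name"; the comparer makes lookups case-insensitive (an assumption).
    private readonly Dictionary<string, Guid> _tagIdsByKey =
        new Dictionary<string, Guid>(StringComparer.OrdinalIgnoreCase);

    // Number of distinct tags registered so far.
    public int Count => _tagIdsByKey.Count;

    // Returns the ID of an existing tag in the given category, or registers a new one.
    public Guid GetOrCreate(string category, string tagName)
    {
        string key = category.Trim() + "/" + tagName.Trim();
        if (!_tagIdsByKey.TryGetValue(key, out Guid id))
        {
            id = Guid.NewGuid();
            _tagIdsByKey.Add(key, id);
        }
        return id;
    }
}
```

Tagging two pages that both mention the same topic would then return the same ID for both, so the content items end up pointing at one shared tag.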

Additionally, the module contains its own settings item, which defines the taxonomy used for the content. After installation, the module must be configured by selecting your custom taxonomy structure:

Custom Tagger Configuration

Sitecore Custom Tagger Configuration

Custom Tagger Usage

Using the module is simple: content authors run the built-in command from the Content Editor ribbon, then:

  • new tags appear under the configured content tags root item,
  • tags are automatically assigned to the selected content item.

Sitecore Custom Tagger Usage

Key implementation points

Select fields which should be passed to IDiscoveryProvider

Because we use a custom “Tags” field in page items, we want to ignore its content and not send it to Open Calais for tagging. To achieve this, we need to replace the default implementation of ContentProvider. This way we can skip the fields we don’t want to tag:

Custom ContentProvider
C#
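using System;
using System.Text;
using Sitecore.ContentTagging.Core.Models;
using Sitecore.ContentTagging.Core.Providers;
using Sitecore.Data.Fields;
using Sitecore.Data.Items;

public class CustomizableContentProvider : IContentProvider<Item>
{
    // ... (constructor and the _tagsSettingService field are omitted here)

    public TaggableContent GetContent(Item source)
    {
        var stringBuilder = new StringBuilder();
        var settings = _tagsSettingService.GetCustomTaggerSettingModel(source);

        foreach (Field field in source.Fields)
        {
            // Skip standard fields (names starting with "__") and the custom "Tags" field.
            if (field.Name.StartsWith("__", StringComparison.InvariantCulture) || field.ID == settings.TagsFieldTargetId)
                continue;

            // Skip empty values so the joined text has no doubled separators.
            if (string.IsNullOrEmpty(field.Value))
                continue;

            // Separate field values with a single space, without a trailing one.
            if (stringBuilder.Length > 0)
                stringBuilder.Append(" ");
            stringBuilder.Append(field.Value);
        }

        return new StringContent { Content = stringBuilder.ToString() };
    }
}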

To replace the default provider, we need to patch the configuration (the taxonomy, discovery and tagger providers are replaced in a similar way):

Custom provider configuration patch
XML
<configuration xmlns:patch="http://www.sitecore.net/xmlconfig/" xmlns:role="http://www.sitecore.net/xmlconfig/role/">
    <sitecore role:require="Standalone or ContentManagement">
        <contentTagging>
            <providers>
                <content>
                    <add name="CustomizableContentProvider" type="Sc.CustomTagger.Providers.CustomizableContentProvider,Sc.CustomTagger" />
                </content>
            </providers>
            <configurations>
                <config name="Default">
                    <content>
                        <provider name="DefaultContentProvider">
                            <patch:attribute name="name">CustomizableContentProvider</patch:attribute>
                        </provider>
                    </content>
                </config>
            </configurations>
        </contentTagging>
    </sitecore>
</configuration>

Read categories from Open Calais

To divide tags into categories, we need to parse them from the additional properties of TagData objects in a custom implementation of TaxonomyProvider. Those properties are structured in JSON format.

  • Sample JSON structure for Social and Topic Tags:

JSON structure for Topics
JSON
"http://d.opencalais.com/dochash-1/c178fd67-5c21-3259-9aad-8b42821635a8/cat/1":{  
      "_typeGroup":"topics",
      "forenduserdisplay":"false",
      "score":1,
      "name":"Environment"
   }

  • Sample JSON structure for Entity Tags:

JSON structure for Entity Tag
JSON
"http://d.opencalais.com/genericHasher-1/d79374b4-3408-3a42-b74a-1e460d36bba0":{  
      "_typeGroup":"entities",
      "_type":"IndustryTerm",
      "forenduserdisplay":"false",
      "name":"battery energy",
      "relevance":0.2,
      ...
   }

The full source code for reading the categories is in CustomizableTaxonomyProvider and CustomizableTagCategoryService.
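
As a rough illustration of the category lookup, the sketch below reads _typeGroup and _type from the JSON object of a single tag. It is not the module's code. The class name TagCategoryReader, the rule "use _type for entities, otherwise _typeGroup" and the choice of System.Text.Json (a NuGet package on .NET Framework) are assumptions made for this example, and the JSON object for one tag is assumed to be available as a string.

```csharp
using System.Text.Json;

// Hypothetical helper, shown only to illustrate the parsing step.
public static class TagCategoryReader
{
    // Returns the entity type (e.g. "IndustryTerm") for entity tags,
    // otherwise the type group (e.g. "topics"); null when neither is present.
    public static string GetCategory(string tagJson)
    {
        using (JsonDocument document = JsonDocument.Parse(tagJson))
        {
            JsonElement root = document.RootElement;
            string typeGroup = ReadString(root, "_typeGroup");
            string type = ReadString(root, "_type");

            return typeGroup == "entities" && !string.IsNullOrEmpty(type)
                ? type
                : typeGroup;
        }
    }

    // Reads a string property, returning null when it is missing or not a string.
    private static string ReadString(JsonElement element, string propertyName)
    {
        return element.TryGetProperty(propertyName, out JsonElement value)
               && value.ValueKind == JsonValueKind.String
            ? value.GetString()
            : null;
    }
}
```

Applied to the two samples above (with the elided properties left out), this gives topics for the Environment tag and IndustryTerm for the battery energy tag.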

Pass Context Item to ITaxonomyProvider implementation

By default, the context item (the item being tagged) is not available in the TaxonomyProvider class. To solve this, we can replace the default StoreTags pipeline processor with one that passes the item to the provider:

Custom StoreTags pipeline processor
C#
public class StoreTagsWithContentItem
{
    public void Process(TagContentArgs args)
    {
        List<Tag> list = new List<Tag>();
        foreach (ITaxonomyProvider taxonomyProvider in args.Configuration.TaxonomyProviders)
        {
            IEnumerable<Tag> tags;
            if (taxonomyProvider is CustomizableTaxonomyProvider)
            {
                tags = ((CustomizableTaxonomyProvider)taxonomyProvider).CreateTags(args.ContentItem, args.TagDataCollection);
            }
            else
            {
                tags = taxonomyProvider.CreateTags(args.TagDataCollection);
            }
            list.AddRange(tags);
        }
        args.Tags = list;
    }
}

Next, we need to patch the pipeline configuration:

Patch StoreTags processor
XML
<configuration xmlns:patch="http://www.sitecore.net/xmlconfig/" xmlns:role="http://www.sitecore.net/xmlconfig/role/">
    <sitecore role:require="Standalone or ContentManagement">
        <pipelines>
            <group name="ContentTagging" groupName="ContentTagging">
                <pipelines>
                    <tagContent>
                        <processor patch:instead="processor[contains(@type, 'StoreTags')]" type="Sc.CustomTagger.Pipelines.StoreTagsWithContentItem, Sc.CustomTagger" resolve="true" />
                    </tagContent>
                </pipelines>
            </group>
        </pipelines>
    </sitecore>
</configuration>

Finally, in the custom TaxonomyProvider implementation, we add a new CreateTags overload that takes two arguments:

CreateTags with additional contentItem parameter
C#
public class CustomizableTaxonomyProvider : ITaxonomyProvider
{
    ...
 
    public IEnumerable<Tag> CreateTags(Item contentItem, IEnumerable<TagData> tagData)
    {
        //access context item from Taxonomy provider
    }
    
    ...
}

Original authors

The first version of this module was created during Sitecore Hackathon 2019 by:

  • Tomasz Juranek
  • Robert Debowski
  • Rafal Dolzynski
References
  • Sitecore Cortex Content Tagging: https://doc.sitecore.com/developers/91/sitecore-experience-management/en/sitecore-cortex-content-tagging.html
  • Understanding Open Calais Output: https://developers.refinitiv.com/article/practical-approach-understanding-and-ingesting-trit-output-your-use-case
  • Open Calais FAQ: https://developers.refinitiv.com/open-permid/intelligent-tagging-restful-api/docs?content=3575&type=documentation_item
  • Custom Content Tagger implementation: https://github.com/whuu/Sc.CustomTagger
2021 © Tomasz Juranek. Crafted with love by SiteOrigin.