Before going deeper into custom implementation, let’s start with some basic concepts behind Sitecore Cortex content tagging.
Content Tagging Providers
The content tagging process is divided among several providers, each of which executes part of the business logic. All of these providers are called by the tagContent pipeline processors and can be replaced with a custom implementation by patching the configuration.
- IContentProvider: reads the content that should be tagged. The default implementation takes all non-standard fields (all fields whose names do not start with __) and joins them into a single string, separated by single spaces.
- IDiscoveryProvider: takes the input content and calls the logic or external service that returns tag data for it. Out of the box, Sitecore contains OpenCalaisDiscoveryProvider, which calls the Open Calais API to generate tags using natural language processing and machine learning algorithms.
- ITaxonomyProvider: takes the tag data and creates tag items. The default implementation saves tag data in an item bucket under /sitecore/system/Settings/Buckets/TagRepository.
- ITagger: assigns tag items to the context item (and optionally its subitems). The default implementation saves the assigned tags in the standard Semantics field under the Tagging section.
By customizing these parts, we can choose which content of the current item is tagged, how tags are created from that content, how the taxonomy is structured inside Sitecore, and how the created tags are assigned to the item. For example, we can achieve the following custom tag structure:
Open Calais Output Data
Open Calais returns tags in the following categories:
- Social Tags: describe what the text is about as a whole, based on Wikipedia Folksonomy.
- Topic Tags: describe what the text is about as a whole, based on Thomson Reuters Coding Schema (TRCS) and the International Press Telecommunications Council (IPTC) news taxonomy.
- Entity Tags: entities extracted from the text, such as:
  - Companies
  - People
  - Products
  - Technologies
  - Industry Terms
  - Organizations
  - Places (Country, City)
The default implementation of content tagging doesn’t use these categories and stores tags in a flat structure (the tags are technically organized in an item bucket, but they are not categorized in any meaningful way).
Custom Tagger Module
The module can be installed as a Sitecore package or built from the GitHub sources. It solves several issues with standard content tagging:
- By default, tags are stored in the categories returned by the tag discovery provider (e.g. Open Calais), but the structure can be adjusted manually afterwards and category titles can be changed.
- The same tag item is not created more than once. When a tag already exists, the module assigns the existing tag item to the content item.
- Tag items have a title field that can be adjusted or translated after generation, so they can be displayed on the website, e.g. as search facets.
Additionally, the module contains its own settings item, which defines the taxonomy used for the content. After installation, the module must be configured by selecting your custom taxonomy structure:
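The "create once, reuse afterwards" behaviour from the list above can be illustrated with a minimal in-memory sketch. It is not the module's code: the TagRegistry class, its key format (category plus tag name, compared case-insensitively) and the use of a Guid as a stand-in for a Sitecore item ID are all assumptions made for illustration. The module itself works against Sitecore items.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for a tag repository: it hands out one ID per
// (category, tag name) pair and returns the same ID on later requests.
public sealed class TagRegistry
{
    // Case-insensitive keys, so "Battery Energy" and "battery energy" match.
    private readonly Dictionary<string, Guid> _idsByKey =
        new Dictionary<string, Guid>(StringComparer.OrdinalIgnoreCase);

    public int Count => _idsByKey.Count;

    public Guid GetOrCreate(string category, string tagName)
    {
        // Assumed normalization: trim whitespace and combine category and name.
        var key = category.Trim() + "/" + tagName.Trim();

        if (!_idsByKey.TryGetValue(key, out var id))
        {
            // First time this tag is seen: "create" it.
            id = Guid.NewGuid();
            _idsByKey[key] = id;
        }

        // Otherwise the previously created tag is reused.
        return id;
    }
}
```

Calling GetOrCreate twice with the same category and tag name returns the same ID, which is the property that lets a content item be linked to a tag created during an earlier tagging run.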
Custom Tagger Configuration
Custom Tagger Usage
Using the module is simple: content authors run the built-in command from the Content Editor ribbon, then:
- new tags appear under the configured content tags root item,
- the tags are automatically assigned to the selected content item.
Key implementation points
Select fields which should be passed to IDiscoveryProvider
Because page items use a custom “Tags” field, we want to ignore its content and not send it to Open Calais for tagging. To achieve this, we need to replace the default implementation of the content provider. This way we can skip the fields we don’t want to tag:
    public class CustomizableContentProvider : IContentProvider<Item>
    {
        // ...

        public TaggableContent GetContent(Item source)
        {
            var stringContent = new StringContent();
            var stringBuilder = new StringBuilder();
            var settings = _tagsSettingService.GetCustomTaggerSettingModel(source);

            foreach (Field field in source.Fields)
            {
                // Ignore the custom "Tags" field and standard fields (names starting with "__")
                if (!field.Name.StartsWith("__", StringComparison.InvariantCulture)
                    && field.ID != settings.TagsFieldTargetId)
                {
                    // Separate field values with a single space, without a trailing space
                    if (stringBuilder.Length > 0) stringBuilder.Append(" ");
                    stringBuilder.Append(field.Value);
                }
            }

            stringContent.Content = stringBuilder.ToString();
            return stringContent;
        }
    }
To replace the default provider, we need to patch the configuration (the taxonomy, discovery and tagger providers are replaced in a similar way):
    <configuration xmlns:patch="http://www.sitecore.net/xmlconfig/" xmlns:role="http://www.sitecore.net/xmlconfig/role/">
      <sitecore role:require="Standalone or ContentManagement">
        <contentTagging>
          <providers>
            <content>
              <add name="CustomizableContentProvider" type="Sc.CustomTagger.Providers.CustomizableContentProvider,Sc.CustomTagger" />
            </content>
          </providers>
          <configurations>
            <config name="Default">
              <content>
                <provider name="DefaultContentProvider">
                  <patch:attribute name="name">CustomizableContentProvider</patch:attribute>
                </provider>
              </content>
            </config>
          </configurations>
        </contentTagging>
      </sitecore>
    </configuration>
Read categories from Open Calais
To divide tags into categories, we need to parse the categories from the additional properties of TagData objects in a custom implementation of the taxonomy provider. These properties are structured in JSON format.
- Sample JSON structure for Social and Topic Tags:
    "http://d.opencalais.com/dochash-1/c178fd67-5c21-3259-9aad-8b42821635a8/cat/1":{
      "_typeGroup":"topics",
      "forenduserdisplay":"false",
      "score":1,
      "name":"Environment"
    }
- Sample JSON structure for Entity Tags:
    "http://d.opencalais.com/genericHasher-1/d79374b4-3408-3a42-b74a-1e460d36bba0":{
      "_typeGroup":"entities",
      "_type":"IndustryTerm",
      "forenduserdisplay":"false",
      "name":"battery energy",
      "relevance":0.2,
      ...
    }
The full source code for reading the categories is in CustomizableTaxonomyProvider and CustomizableTagCategoryService.
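As a rough illustration of the parsing step, the sketch below groups tag names by category from a JSON document shaped like the samples above. It is not taken from the module: the CalaisCategoryReader class and the rule it applies (use "_type" for entities, otherwise "_typeGroup") are assumptions, and it uses System.Text.Json on a complete JSON object rather than the TagData properties the module actually reads.

```csharp
using System.Collections.Generic;
using System.Text.Json;

// Hypothetical helper, not part of Sitecore or the Custom Tagger module.
public static class CalaisCategoryReader
{
    // Assumed rule: entities are categorized by "_type" (e.g. "IndustryTerm"),
    // everything else by "_typeGroup" (e.g. "topics"). Returns "" if neither exists.
    public static string GetCategory(JsonElement entry)
    {
        if (entry.ValueKind != JsonValueKind.Object) return "";

        var typeGroup = ReadString(entry, "_typeGroup");
        if (typeGroup == "entities")
        {
            var type = ReadString(entry, "_type");
            if (type.Length > 0) return type;
        }
        return typeGroup;
    }

    // Expects a JSON object whose properties are entries like the samples above.
    public static Dictionary<string, List<string>> GroupTagNames(string json)
    {
        var result = new Dictionary<string, List<string>>();
        using (var document = JsonDocument.Parse(json))
        {
            foreach (var property in document.RootElement.EnumerateObject())
            {
                // Entries without a category (or that are not objects) are skipped.
                var category = GetCategory(property.Value);
                if (category.Length == 0) continue;

                var name = ReadString(property.Value, "name");
                if (name.Length == 0) continue;

                if (!result.TryGetValue(category, out var names))
                {
                    names = new List<string>();
                    result[category] = names;
                }
                names.Add(name);
            }
        }
        return result;
    }

    // Reads a string property, returning "" when it is missing or not a string.
    private static string ReadString(JsonElement entry, string propertyName)
    {
        return entry.TryGetProperty(propertyName, out var value)
               && value.ValueKind == JsonValueKind.String
            ? value.GetString() ?? ""
            : "";
    }
}
```

Given a document containing the two sample entries, this would place "Environment" under "topics" and "battery energy" under "IndustryTerm". How the resulting categories map to taxonomy folders is a separate design decision handled by the module's own services.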
Pass Context Item to ITaxonomyProvider implementation
By default, the context item (the item being tagged) is not available in the taxonomy provider class. To solve this, we can replace the default StoreTags pipeline processor with one that passes the item to the provider:
    public class StoreTagsWithContentItem
    {
        public void Process(TagContentArgs args)
        {
            List<Tag> list = new List<Tag>();
            foreach (ITaxonomyProvider taxonomyProvider in args.Configuration.TaxonomyProviders)
            {
                IEnumerable<Tag> tags;
                if (taxonomyProvider is CustomizableTaxonomyProvider customProvider)
                {
                    // Our provider gets the content item in addition to the tag data
                    tags = customProvider.CreateTags(args.ContentItem, args.TagDataCollection);
                }
                else
                {
                    // Any other provider keeps the standard single-argument call
                    tags = taxonomyProvider.CreateTags(args.TagDataCollection);
                }
                list.AddRange(tags);
            }
            args.Tags = list;
        }
    }
Next, we need to patch the pipeline configuration:
    <configuration xmlns:patch="http://www.sitecore.net/xmlconfig/" xmlns:role="http://www.sitecore.net/xmlconfig/role/">
      <sitecore role:require="Standalone or ContentManagement">
        <pipelines>
          <group name="ContentTagging" groupName="ContentTagging">
            <pipelines>
              <tagContent>
                <processor patch:instead="processor[contains(@type, 'StoreTags')]" type="Sc.CustomTagger.Pipelines.StoreTagsWithContentItem, Sc.CustomTagger" resolve="true" />
              </tagContent>
            </pipelines>
          </group>
        </pipelines>
      </sitecore>
    </configuration>
Finally, in the custom taxonomy provider implementation, we add a new CreateTags overload that takes two arguments:
    public class CustomizableTaxonomyProvider : ITaxonomyProvider
    {
        // ...

        public IEnumerable<Tag> CreateTags(Item contentItem, IEnumerable<TagData> tagData)
        {
            // access the context item from the taxonomy provider
        }

        // ...
    }
Original authors
The first version of this module was created during Sitecore Hackathon 2019 by: