Extending the default ContentSearch functionality in Sitecore

In this blog post I'll explain how you can extend the default ContentSearch functionality in Sitecore, building on top of what I've explained in my previous posts in the Sitecore Search Series.

Rolling your own custom SearchResultItem implementation

If you have followed the series so far, you may have noticed that I've been using the default SearchResultItem implementation when retrieving data from Solr through the ContentSearch API. As explained in A re-introduction to the ContentSearch API in Sitecore - Part 1, the SearchResultItem type contains a number of properties decorated with the IndexField attribute, which means that each of these properties is mapped to a given field in the Solr index. For example, the Content property maps to the __content field in the Solr entries, so when we retrieve search results from the search index using the ContentSearch API, that property will contain everything that was stored in the __content field in the index.

For standard search operations this may be fine, but often you find yourself in a situation where you have created your own custom templates, containing one or more custom fields, and where you need to be able to extract those fields, once they have been indexed into Solr, as part of a search using the ContentSearch API. The good news is that Sitecore has designed the ContentSearch API in such a way that the SearchResultItem is open for extension (yet closed for modification), which allows us to inherit from it and create our own custom SearchResultItem implementation.

Let's see how we can create our own custom SearchResultItem implementation (disclaimer: blackjack and hookers not included). In order to do so, all we need to do is:

  1. Create a new class that inherits from the SearchResultItem class
  2. Add properties to the newly created class and decorate these with the IndexField attribute, in order to map them to a field in the Solr index
  3. Swap out the usage of SearchResultItem as the generic type, when using the ContentSearch API, with your custom type

The minimum viable example

Let's say that we created a custom template in Sitecore, that contained a single text field, named 'MyTextField', and that we wanted to retrieve search results based on the content of that field value. In order to see how this can be implemented, I've included a small example below, showing how this is done:
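A minimal sketch of such a class could look like the following. The class name MySearchResultItem and the index field name mytextfield are the ones used in this post, while the MyProject.Search namespace is only a placeholder for illustration:

```csharp
using Sitecore.ContentSearch;
using Sitecore.ContentSearch.SearchTypes;

namespace MyProject.Search // hypothetical namespace
{
    // Inherits all the default mapped properties (Name, Content, etc.)
    // from SearchResultItem and adds one custom mapped property.
    public class MySearchResultItem : SearchResultItem
    {
        // Maps the property to the index field for the 'MyTextField'
        // template field. The index field name is written in lowercase.
        [IndexField("mytextfield")]
        public string MyTextField { get; set; }
    }
}
```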

Taking it from the top, we create a custom SearchResultItem implementation, in this case MySearchResultItem, containing a single property named MyTextField of type string, since that's the mapped return type for text fields in Sitecore.

Tip: If you are unsure of how to know whether to use a string, bool or even DateTime as the type of the property, you should review my previous blog post A re-introduction to the ContentSearch API in Sitecore - Part 1.

Additionally, the property is mapped to the Solr field named mytextfield. The field name specified as the parameter for the IndexField attribute must be lowercase, because all fields stored in Solr when indexing Sitecore content will be lowercased. This is something you want to keep in mind when creating your own custom implementations, since spelling it without lower casing will result in the property being null, when retrieving values from Solr using the ContentSearch API.

Now that we have our custom SearchResultItem implementation ready, let's see how we can use it when making a query to the search index using the ContentSearch API:
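The usage could be sketched roughly as below. The index name sitecore_web_index, the MySearchService class and the FindByTextField method are assumptions made for the sake of the example, so swap in whatever fits your own solution:

```csharp
using System.Collections.Generic;
using System.Linq;
using Sitecore.ContentSearch;

namespace MyProject.Search // hypothetical namespace
{
    public class MySearchService // hypothetical class
    {
        public IList<MySearchResultItem> FindByTextField(string value)
        {
            // Assumption: the content is indexed in 'sitecore_web_index'.
            var index = ContentSearchManager.GetIndex("sitecore_web_index");

            using (var context = index.CreateSearchContext())
            {
                // The custom type is passed as the generic type argument,
                // so the custom property can be used in the filter and is
                // populated on the returned results.
                return context.GetQueryable<MySearchResultItem>()
                    .Where(result => result.MyTextField == value)
                    .ToList();
            }
        }
    }
}
```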

When using the custom implementation, notice that SearchResultItem has been swapped out with MySearchResultItem in the method call GetQueryable<T> invoked on the context. When doing so, the ContentSearch API is instructed to return search results using the MySearchResultItem implementation, and we'll be able to retrieve values for the custom field on our items. Moreover, as shown in the usage example, we are now also able to use the property MyTextField as part of the filtering clause itself - neat, right? If you want to extract more values mapped to the Solr index entries, simply add the corresponding properties with the IndexField attribute, and that's all there is to it.

How do I decide how many custom implementations I need?

You might be wondering how many custom implementations you should make, and when to do so. So the question is whether one custom implementation is enough for all your needs, or if you should create your custom implementations more specifically tailored for the different parts of your solution? This is a really good question, and something I'll be addressing in the next blog post Tackling the challenges when architecting a search indexing infrastructure in Sitecore, so stay tuned.

Putting computed index fields into use

Using a custom SearchResultItem implementation allows you to extract custom template fields using the ContentSearch API. Eventually, you will be in a situation where the field values being stored in the Solr index need a bit of tweaking, or need to be something completely different. Perhaps you want to change the stored value a bit, merge values from different fields together, or do some custom computation based on the item being indexed.

Once more, the ContentSearch API has got you covered, as you can do exactly that, using what is called Computed Index Fields.

How do you implement and configure your own?

In order to create your own computed index fields, all you need to do is to create a class that implements the Sitecore.ContentSearch.ComputedFields.IComputedIndexField interface. When implementing this interface, your class will get two string properties, FieldName and ReturnType, along with a method named ComputeFieldValue(). The method takes in a single argument of type Sitecore.ContentSearch.IIndexable, which specifies the current "thing" being indexed (which can be a Sitecore item). Additionally, the method must return an object, which represents the final value we want to store in the Solr index - everything in between is where we can do our computations, etc.:
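As a bare-bones sketch of the interface in use, the class below simply stores a trimmed copy of a text field. The class name MyComputedField, the namespace and the template field name MyTextField are placeholders chosen for illustration:

```csharp
using Sitecore.ContentSearch;
using Sitecore.ContentSearch.ComputedFields;
using Sitecore.Data.Items;

namespace MyProject.Search.ComputedFields // hypothetical namespace
{
    public class MyComputedField : IComputedIndexField
    {
        // Both properties are populated from the configuration.
        public string FieldName { get; set; }
        public string ReturnType { get; set; }

        public object ComputeFieldValue(IIndexable indexable)
        {
            // Only Sitecore items are of interest here; skip anything else.
            var indexableItem = indexable as SitecoreIndexableItem;
            if (indexableItem == null || indexableItem.Item == null)
            {
                return null;
            }

            Item item = indexableItem.Item;

            // Hypothetical template field; the indexer returns an empty
            // string when the field is missing or has no value.
            string value = item["MyTextField"];

            // Returning null means nothing is stored for this item.
            return string.IsNullOrWhiteSpace(value) ? null : value.Trim();
        }
    }
}
```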

Once you have your code in place, you then need to tell Sitecore (using the ContentSearch configuration file) that it should include your computed index field. This is done using the following configuration patch file, that should be placed in the App_Config/Include/ folder, e.g. App_Config/Include/ComputedIndexFields.config:
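A patch file along these lines is one way to register the field. It assumes a Sitecore version where the default Solr index configuration exposes computed fields under documentOptions (older versions use a different node layout), and the field name, type name and assembly name are placeholders:

```xml
<configuration xmlns:patch="http://www.sitecore.net/xmlconfig/">
  <sitecore>
    <contentSearch>
      <indexConfigurations>
        <defaultSolrIndexConfiguration>
          <documentOptions>
            <fields hint="raw:AddComputedIndexField">
              <!-- Hypothetical field name, type and assembly -->
              <field fieldName="mycomputedfield" returnType="string">MyProject.Search.ComputedFields.MyComputedField, MyProject</field>
            </fields>
          </documentOptions>
        </defaultSolrIndexConfiguration>
      </indexConfigurations>
    </contentSearch>
  </sitecore>
</configuration>
```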

Notice that I explicitly set the fieldName and the returnType of the computed index field. When the computed index field is loaded, Sitecore injects these attribute values into the class, so that the FieldName and ReturnType properties are set accordingly. The returnType attribute is used when storing the value returned from the ComputeFieldValue() method into the index, which means that the returned object needs to match the type given in the returnType attribute. The fieldName attribute identifies which index field the value is stored in.

Once the configuration is created, all that is left is to rebuild the index. Your computed index field will then be created in the search index, and you can use it in queries.

Tip: As explained by Martina in her blog post Sitecore 7 Search Tips: Computed Fields, you should return null instead of string.Empty, unless it is important that you store an empty string value. Do note that if you do, the empty value will be indexed, and you might end up with 12,000 entries for 'nothing' that will take up space in your index. If you return null, nothing will be added to your index.

Now that we've gone over the very basics of what a computed index field is, what it tries to solve and the implementation specific parts, let's look at some examples of how you can use these for real-life problems. I've chosen to highlight two examples, each addressing different issues and solutions, to show you the diversity and capabilities of computed index fields. The first example shows how computed index fields may help you when you need to work around a bad design decision in a legacy solution, and the second example shows how you can work with the item being indexed and its surrounding content-tree elements, in order to achieve great things.

Example 1: The one about the template that contained a date time separated in two fields

Recently I was working on a legacy Sitecore solution, where we needed to sort some Solr entries using the ContentSearch API, based on a date field found on the items indexed from Sitecore. After having implemented the sorting, the client came back to us saying that the sorting didn't work as expected. It turned out that the client wanted to sort the search entries by the date and a specified time related to the date. There was only one problem: the time value was stored in another field on the item, and in fact it was a text field. Since the date and time were strongly related to each other in the solution's information architecture, my best guess is that someone, at some point, just made a poor design decision.

To make things even worse, as the solution had grown in size, everything was designed around having the overall date time value split into two values, meaning that we could not just combine the two fields into one in the template for the item, not without refactoring several other parts of the solution - something we neither wanted to do, nor had the time for.

So what's the issue here?

Looking at it from the sorting perspective, the problem here is that you can't just sort by the date and then the time, since the date is sorted using date sorting logic, whereas the text value for the time would be sorted as a string. Such a sort wouldn't make much sense: the entries would be sorted correctly by the date (without the time), but entries sharing the same date would then be ordered by a string comparison of the time, which is certainly not what the client expected.

Using a computed index field to solve our problems

So, what we needed to do was to take the selected date from the date field, and set the time on that date according to the time set in the time field (given that it could be parsed as a valid timespan). In essence, we wanted to combine the two field values, do a bit of computation on them, and sort on the combined field. This is where a computed index field comes in handy.

Using a computed index field, we could check if the item being indexed had the specific template containing the two fields in question. If so, we extracted the date field value and the time field value, then parsed the time, and finally combined the two into a new field dateWithTime which was stuffed back into the Solr index. Once the combined value was stored into the Solr index, we then added a property on our custom SearchResultItem type, which was mapped to the new computed index field. I've added the implementation down below:
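The sketch below is a reconstruction of the idea rather than the production code. The template ID is a dummy value, and the field names Date and Time as well as the class name are assumptions:

```csharp
using System;
using System.Globalization;
using Sitecore.ContentSearch;
using Sitecore.ContentSearch.ComputedFields;
using Sitecore.Data;
using Sitecore.Data.Fields;
using Sitecore.Data.Items;

namespace MyProject.Search.ComputedFields // hypothetical namespace
{
    public class DateWithTimeComputedField : IComputedIndexField
    {
        // Dummy ID; replace with the ID of the template holding the fields.
        private static readonly ID EventTemplateId =
            new ID("{00000000-0000-0000-0000-000000000000}");

        public string FieldName { get; set; }
        public string ReturnType { get; set; }

        public object ComputeFieldValue(IIndexable indexable)
        {
            var indexableItem = indexable as SitecoreIndexableItem;
            if (indexableItem == null || indexableItem.Item == null)
            {
                return null;
            }

            Item item = indexableItem.Item;
            if (item.TemplateID != EventTemplateId)
            {
                return null;
            }

            // 'Date' and 'Time' are hypothetical field names.
            DateField dateField = item.Fields["Date"];
            if (dateField == null || string.IsNullOrEmpty(dateField.Value))
            {
                return null;
            }

            // Start from midnight on the selected date.
            DateTime date = dateField.DateTime.Date;

            // Add the time only when the text value parses as a timespan,
            // e.g. "14:30"; otherwise the date alone is indexed.
            TimeSpan time;
            if (TimeSpan.TryParse(item["Time"], CultureInfo.InvariantCulture, out time))
            {
                date = date.Add(time);
            }

            return date;
        }
    }
}
```

With a field like this, the returnType in the configuration would be a date time type, so that Solr sorts the values chronologically rather than as text.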

Afterwards we were able to perform the sorting on the computed index field, instead of on the date and time fields separately, and now the entries were all sorted correctly.
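Sorting on the combined value could then look something like this. The EventSearchResultItem and EventSearchService names and the sitecore_web_index index name are made up for the example, and the sketch assumes the computed field was registered under the lowercase name datewithtime:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using Sitecore.ContentSearch;
using Sitecore.ContentSearch.SearchTypes;

namespace MyProject.Search // hypothetical namespace
{
    public class EventSearchResultItem : SearchResultItem
    {
        // Assumption: the computed field is registered as 'datewithtime'.
        [IndexField("datewithtime")]
        public DateTime DateWithTime { get; set; }
    }

    public class EventSearchService // hypothetical class
    {
        public IList<EventSearchResultItem> GetEventsInOrder()
        {
            var index = ContentSearchManager.GetIndex("sitecore_web_index");

            using (var context = index.CreateSearchContext())
            {
                // A single sort on the combined value replaces the
                // separate sorts on the date and time fields.
                return context.GetQueryable<EventSearchResultItem>()
                    .OrderBy(result => result.DateWithTime)
                    .ToList();
            }
        }
    }
}
```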

Example 2: Tag, you're it! Doing a bit of keyword tagging

Going from an example of how to work around bad decisions in legacy Sitecore solutions, let's see how we can use computed index fields to work with the item being indexed and its surrounding content-tree elements.

Imagine for a second that you have implemented a keyword list in Sitecore, which contains a set of keywords, each containing a name.

Now, the idea is that we want to let content editors tag items with those keywords, thus adding semantic metadata to those elements. So, if an item supports keyword tagging, we will let it inherit from a special template, let's call it KeywordSelector, that allows the content editor to select one or more keywords from the set of keywords, using a Multilist field, and place those on a given item.

Now comes the challenge: for each item, we want to know which keywords it has been tagged with, including the ones each ancestor item has. Using a computed index field, this task becomes quite trivial. We check if the current Sitecore item being indexed has the field from the KeywordSelector template containing the selected keywords, and if so, we grab those selected keywords. Then we go up the tree to the parent item and check if that item contains any keywords, and so on and so forth - until we reach the top-level item in the Sitecore content tree.

I've added a sample implementation of such a computed index field down below:
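One possible shape for it is sketched here. The Keywords field name on the KeywordSelector template, the Name field on each keyword item and the class name are assumptions, so adjust them to match your own templates:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using Sitecore.ContentSearch;
using Sitecore.ContentSearch.ComputedFields;
using Sitecore.Data.Fields;
using Sitecore.Data.Items;

namespace MyProject.Search.ComputedFields // hypothetical namespace
{
    public class InheritedKeywordsComputedField : IComputedIndexField
    {
        public string FieldName { get; set; }
        public string ReturnType { get; set; }

        public object ComputeFieldValue(IIndexable indexable)
        {
            var indexableItem = indexable as SitecoreIndexableItem;
            if (indexableItem == null || indexableItem.Item == null)
            {
                return null;
            }

            // A set avoids storing the same keyword more than once.
            var keywords = new HashSet<string>(StringComparer.OrdinalIgnoreCase);

            // Start at the item being indexed and walk up through its
            // ancestors until the top of the content tree is reached.
            for (Item current = indexableItem.Item; current != null; current = current.Parent)
            {
                // Hypothetical field from the KeywordSelector template;
                // items without the field are simply skipped.
                MultilistField selector = current.Fields["Keywords"];
                if (selector == null)
                {
                    continue;
                }

                foreach (Item keyword in selector.GetItems())
                {
                    // Hypothetical 'Name' field on the keyword item.
                    string name = keyword["Name"];
                    if (!string.IsNullOrWhiteSpace(name))
                    {
                        keywords.Add(name);
                    }
                }
            }

            // Return null rather than an empty list, so that untagged
            // items do not get an empty entry in the index.
            return keywords.Count > 0 ? keywords.ToList() : null;
        }
    }
}
```

Since the method returns a list of strings, the field would be registered with a collection return type in the configuration, and mapped to a collection property on the custom SearchResultItem.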

On that note, you've now been given the knowledge to go forth and build your own custom SearchResultItem and computed index field implementations - use it wisely.

As always, if you have additional details to add to the content explained in this blog post, or feedback in general, please drop me a note in the comment section below.