A re-introduction to the ContentSearch API in Sitecore - Part 2

In this blog post, I'll explain how you can deal with more complex (and dynamic) queries, apply sorting and paging to the search results, and use facets. I'll also go into the more exotic parts of the ContentSearch API. At the end I'll provide you with a fully fledged piece of working sample code that shows how all of the different bits and pieces fit together.

This is the second half of the re-introduction to the ContentSearch API in Sitecore. If you haven't read my first blog post, A re-introduction to the ContentSearch API - Part 1, I recommend doing so before digging into the content of this one.

Doing a bit of sorting

In part 1, I showed you how to create simple queries, but you may have noticed that the results weren't sorted the way you needed them to be. So, once you have retrieved the results from your query to the search index, you probably want to do a bit of sorting on those results.

Sorting and ordering queried results can be a bit of a hairy subject, since most search index technologies by default use complex algorithms to determine the order of the results, which is normally referred to as relevance order. I won't go into those details in this blog post, but there are a few things to keep in mind. I highly recommend that you read the sorting and ordering results blog post from Sitecore on the subject, to get a full grasp of the tricky parts you need to be aware of.

Assuming that you know all there is to know about the murky parts of sorting and ordering, let's see how we can leverage the ContentSearch API to do the heavy lifting of applying sorting to our queried results.

As when we perform a query using LINQ to filter and retrieve results, you can also use standard LINQ sorting methods, like OrderBy and OrderByDescending on the IQueryable<T> instance to apply simple sorting. As an example, let's say that we wanted to sort our retrieved results by the date they were created in Sitecore, then we would do something like this:
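A minimal sketch of such a query, assuming an index named sitecore_web_index (swap in whichever index your solution uses) and the stock SearchResultItem type with its CreatedDate property:

```csharp
using System.Collections.Generic;
using System.Linq;
using Sitecore.ContentSearch;
using Sitecore.ContentSearch.SearchTypes;

public static class SortingExample
{
    public static List<SearchResultItem> NewestFirst()
    {
        // Assumption: "sitecore_web_index" is the index you want to query.
        var index = ContentSearchManager.GetIndex("sitecore_web_index");

        using (var context = index.CreateSearchContext())
        {
            return context.GetQueryable<SearchResultItem>()
                // Newest items first; use OrderBy() to get the oldest first.
                .OrderByDescending(item => item.CreatedDate)
                // Materialize the results before the context is disposed.
                .ToList();
        }
    }
}
```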

More importantly, you can also do sorting on multiple fields, using the ThenBy() method:
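As a sketch, sorting by template name first and creation date second could look like this. The choice of fields is mine, purely for illustration, and the method expects a search context the caller has already opened:

```csharp
using System.Collections.Generic;
using System.Linq;
using Sitecore.ContentSearch;
using Sitecore.ContentSearch.SearchTypes;

public static class MultiFieldSortingExample
{
    public static List<SearchResultItem> ByTemplateThenCreated(IProviderSearchContext context)
    {
        return context.GetQueryable<SearchResultItem>()
            // Primary sort: template name, ascending.
            .OrderBy(item => item.TemplateName)
            // Secondary sort: creation date, applied within each template name.
            .ThenBy(item => item.CreatedDate)
            .ToList();
    }
}
```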

And that's the basics of doing sorting, easy right?

Taking some, skipping some

When you perform a query, the number of search results you get back is capped. But how many results can be returned from a single query? Let's take another look at the Sitecore.ContentSearch.Solr.DefaultIndexConfiguration.config file at the setting named ContentSearch.SearchMaxResults:
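For reference, the setting is a single line along these lines. The value shown is the default described in this post, so check the file in your own installation, as it may differ between versions:

```xml
<!-- Upper limit on the number of results a single query can return. -->
<setting name="ContentSearch.SearchMaxResults" value="500" />
```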

As you can see, the max number of results that a query can return is by default set to 500. You can choose to either increase or decrease that number, but you should be careful if you decide to increase the limit as it can result in performance penalties.

Depending on how your user interface (UI) is designed, you probably want to restrict the number of results presented to the user. Likewise, if you need to present the user with more than 500 search results at a time, chances are that there is a problem somewhere else in the design of your solution. Assuming that we want to limit the number of results, thus making it more manageable for the user to digest the information shown on a search result page, we'll need to apply the technique called pagination.

Pagination is simply a way to restrict the number of search results we want to view, by saying that we want to get a specific segment of search results. An example could be to retrieve only the first 20 search results, then the next 20, and so on. The way this is achieved in the ContentSearch API is by using the LINQ extension methods Skip() and Take():
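Here is a rough sketch of fetching one page of results, assuming a zero-based page index and an already opened search context. I've added an explicit sort so that the pages come back in a predictable order:

```csharp
using System.Collections.Generic;
using System.Linq;
using Sitecore.ContentSearch;
using Sitecore.ContentSearch.SearchTypes;

public static class SkipTakeExample
{
    public static List<SearchResultItem> GetPage(
        IProviderSearchContext context, int pageIndex, int pageSize)
    {
        return context.GetQueryable<SearchResultItem>()
            .OrderBy(item => item.CreatedDate)
            // Skip the results that belong to the previous pages...
            .Skip(pageIndex * pageSize)
            // ...and take only the results for the requested page.
            .Take(pageSize)
            .ToList();
    }
}
```

Calling GetPage(context, 0, 20) would return the first 20 results, and GetPage(context, 1, 20) the 20 after that.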

Tip: Instead of using the Skip() and Take() methods directly, you can leverage the Page() extension method, which does the exact same thing, even more elegantly.
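The same page request written with Page() might look like the sketch below. To my knowledge the page argument is zero-based, since the method simply wraps Skip(page * pageSize).Take(pageSize), but do verify that against the Sitecore version you are running:

```csharp
using System.Collections.Generic;
using System.Linq;
using Sitecore.ContentSearch;
using Sitecore.ContentSearch.Linq;
using Sitecore.ContentSearch.SearchTypes;

public static class PageExample
{
    public static List<SearchResultItem> GetPage(
        IProviderSearchContext context, int pageIndex, int pageSize)
    {
        return context.GetQueryable<SearchResultItem>()
            .OrderBy(item => item.CreatedDate)
            // Page() lives in the Sitecore.ContentSearch.Linq namespace.
            .Page(pageIndex, pageSize)
            .ToList();
    }
}
```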

While Skip() and Take() work great for controlling which segment of the search results we return to the user, they may not always be the best solution for applying pagination to your solution. Let's say that we wanted to show the user the total number of search results, and have a paging mechanism that allows the user to select a given page, much like the pager at the bottom of a Google results page.

In order to get all the information we need, two queries to the search index are required:

  • One query to retrieve the total number of search results
  • Another query to retrieve the segment from the pagination

Well, that doesn't seem right, because we don't really want to perform one query that takes all results just to get the total count, and then another one that only takes the segment of the results we need to present to the user.

Again, the ContentSearch API comes to our rescue, since there is another little hidden gem within the LINQ layer, that allows us to get all the information we need, in just one query to the search index:
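The gem in question is the GetResults() method. A minimal sketch, assuming an opened search context; the PagedResult class is a hypothetical container I made up for the example:

```csharp
using System.Collections.Generic;
using System.Linq;
using Sitecore.ContentSearch;
using Sitecore.ContentSearch.Linq;
using Sitecore.ContentSearch.SearchTypes;

// Hypothetical container for one page of results plus the total count.
public class PagedResult
{
    public int TotalCount { get; set; }
    public List<SearchResultItem> Items { get; set; }
}

public static class GetResultsExample
{
    public static PagedResult GetPage(
        IProviderSearchContext context, int pageIndex, int pageSize)
    {
        var results = context.GetQueryable<SearchResultItem>()
            .OrderBy(item => item.CreatedDate)
            .Page(pageIndex, pageSize)
            // Executes the query and returns the hits together with metadata.
            .GetResults();

        return new PagedResult
        {
            // Total number of matches in the index, not just on this page.
            TotalCount = results.TotalSearchResults,
            // Each hit wraps the matched document and its score.
            Items = results.Hits.Select(hit => hit.Document).ToList()
        };
    }
}
```

With the total count and the page size at hand, the number of pages for the pager is simply the total divided by the page size, rounded up.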

As said, depending on what the UI requirements are, the first approach using Skip() and Take() might be sufficient if the UI uses infinite scrolling or "Load More" buttons to control the number of search results presented to the user. However, it will be more effective to use the GetResults() method if you need more details for your pagination mechanism, like the total count needed for Google-like pagination.

The many face(t)s of a search query

When I first started digging into search technologies, one of the things that struck me was facets, what are they and how can they be used?

Looking at the documentation for Solr, facets are described as "... the arrangement of search results into categories based on indexed terms...". Translated, this basically means the side panel found on many search pages that shows the numerical count of how many matching documents were found for each term, such as relationship, location and the like.

In essence, faceting makes it easy for users to explore search results, by narrowing down exactly the results they are looking for.

Using the ContentSearch API makes it easy to do faceted search, as you simply use the FacetOn() method on the IQueryable<T> instance to determine what to do faceting on, and finally call the GetFacets() method to get a list of facet results:
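A small sketch of faceting on the template name, assuming an opened search context. Faceting on TemplateName is just my pick for the example, and I flatten the result into a dictionary of facet value to document count:

```csharp
using System.Collections.Generic;
using System.Linq;
using Sitecore.ContentSearch;
using Sitecore.ContentSearch.Linq;
using Sitecore.ContentSearch.SearchTypes;

public static class FacetExample
{
    public static Dictionary<string, int> CountByTemplate(IProviderSearchContext context)
    {
        var facets = context.GetQueryable<SearchResultItem>()
            // Declare which property to facet on...
            .FacetOn(item => item.TemplateName)
            // ...and execute the query, returning only the facet results.
            .GetFacets();

        var counts = new Dictionary<string, int>();

        // One category per FacetOn() call; each value is a term and its count.
        foreach (var category in facets.Categories)
        {
            foreach (var value in category.Values)
            {
                counts[value.Name] = value.AggregateCount;
            }
        }

        return counts;
    }
}
```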

If you want to do faceted search on multiple properties, simply chain each FacetOn() before calling the GetFacets() method and you should be good to go.

Dealing with more complex queries

So far we have only worked with simple queries, which work perfectly fine for simple, small and static queries. However, at some point you'll need to build up more complex queries, or even dynamic queries, that require more work. Consider the following example: retrieve all search results that have a parent with a given ID, live under a partial path and whose names start with either A, B or C.

Normally, constructing a query that is able to fulfil the requirements put in the example, would require you to learn and use expression trees, which can be very daunting. The good news is, that Sitecore have already included a tailored version of the predicate builder in the ContentSearch API, that allows you to create complex queries using AND and OR statements. Once you have created the query, the predicate builder will generate the complex expression trees, without you having to know about their inner workings.

Let's see how we can use the predicate builder to construct the query mentioned in the example:
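Here is one way it could be put together. Treat it as a sketch: the parent ID and the partial path are passed in by the caller, and I've chosen Contains() for the partial path match, where StartsWith() may suit your case better:

```csharp
using System.Collections.Generic;
using System.Linq;
using Sitecore.ContentSearch;
using Sitecore.ContentSearch.Linq;
using Sitecore.ContentSearch.Linq.Utilities;
using Sitecore.ContentSearch.SearchTypes;
using Sitecore.Data;

public static class PredicateExample
{
    public static List<SearchResultItem> Find(
        IProviderSearchContext context, ID parentId, string partialPath)
    {
        // OR group: start from False and add one alternative per prefix.
        var namePredicate = PredicateBuilder.False<SearchResultItem>();
        foreach (var prefix in new[] { "A", "B", "C" })
        {
            var current = prefix; // local copy captured by the expression
            namePredicate = namePredicate.Or(item => item.Name.StartsWith(current));
        }

        // AND group: start from True and add each mandatory condition.
        var predicate = PredicateBuilder.True<SearchResultItem>()
            .And(item => item.Parent == parentId)
            .And(item => item.Path.Contains(partialPath))
            .And(namePredicate);

        return context.GetQueryable<SearchResultItem>()
            .Filter(predicate)
            .ToList();
    }
}
```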

Tip: Initialize queries with False for OR queries and True for AND queries

An important difference from the simple query shown earlier in part 1 is that I'm using the Filter() method instead of Where() on the IQueryable<T> instance. You could use Where(), but the benefit of using Filter() is that it doesn't calculate relevancy, so if you are doing any other queries, the scoring will not be affected. Furthermore, filters can be cached to optimize search performance.

More exotic extension methods

Apart from the default LINQ functionalities you can perform on the IQueryable<T> instance, the ContentSearch API provides a few more, let's say exotic extension methods for you to use.

For a lot of situations you might never have the need to use these types of extension methods. However, when you do have the need, it's good to know that the ContentSearch API provides that additional support for exactly those situations.

Like<T>()

If you are tasked with creating a search query that needs fuzzy or similarity matching, Sitecore has added support for this through the Like<T>() extension method.

Here are a few examples of how the extension method can be used to achieve that:
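Two sketches that should cover the common cases, assuming an opened search context: one using the default similarity, and one passing a minimum similarity explicitly (a float between 0 and 1, where a higher value means a stricter match):

```csharp
using System.Collections.Generic;
using System.Linq;
using Sitecore.ContentSearch;
using Sitecore.ContentSearch.Linq;
using Sitecore.ContentSearch.SearchTypes;

public static class LikeExample
{
    // Fuzzy match on the item name using the default similarity.
    public static List<SearchResultItem> FuzzyByName(
        IProviderSearchContext context, string term)
    {
        return context.GetQueryable<SearchResultItem>()
            .Where(item => item.Name.Like(term))
            .ToList();
    }

    // Fuzzy match with an explicit minimum similarity, e.g. 0.8f.
    public static List<SearchResultItem> FuzzyByName(
        IProviderSearchContext context, string term, float minimumSimilarity)
    {
        return context.GetQueryable<SearchResultItem>()
            .Where(item => item.Name.Like(term, minimumSimilarity))
            .ToList();
    }
}
```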

Matches()

Other times, you might be tasked to create a search query that requires you to use regular expression queries. In order for us to do that, Sitecore have provided us with the Matches() extension method. The method itself takes in a regular expression pattern that is needed when doing the matching, and can be any .NET compliant regular expression.

Let's say we wanted to find template types that matched a specific naming convention, like all templates that describe a file type (.fileExtension), then we would do something like this:
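A sketch of the idea, with the pattern passed in by the caller, assuming an opened search context. The exact pattern depends on your naming convention and on how the field is analyzed in the index, so treat any concrete pattern as something to test against your own data:

```csharp
using System.Collections.Generic;
using System.Linq;
using Sitecore.ContentSearch;
using Sitecore.ContentSearch.Linq;
using Sitecore.ContentSearch.SearchTypes;

public static class MatchesExample
{
    public static List<SearchResultItem> ByTemplateNamePattern(
        IProviderSearchContext context, string pattern)
    {
        return context.GetQueryable<SearchResultItem>()
            // Matches() is translated into a regular expression query.
            .Where(item => item.TemplateName.Matches(pattern))
            .ToList();
    }
}
```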

The Matches() extension method is also very useful when you want to find all results where a field is non-empty in the index. By default, Sitecore will not save null and empty strings into your indexes, as it wastes space. This means that if you try to make the following query context.GetQueryable<SearchResultItem>().Where(x => x.Name != null);, you will get the following exception thrown: System.NotSupportedException: Comparison of null values is not supported. To solve this, you can use the Matches() method to specify that you want to retrieve all results where the given property of the SearchResultItem matches any value. Put in a different way, instead of asking for the results where the field is not null, we ask for all the results where the field matches any value.
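As a sketch, a "field has a value" check on the name could be written with a match-anything pattern. I'm using ".*" here on the assumption that your search provider accepts it as a regular expression that matches any indexed value:

```csharp
using System.Collections.Generic;
using System.Linq;
using Sitecore.ContentSearch;
using Sitecore.ContentSearch.Linq;
using Sitecore.ContentSearch.SearchTypes;

public static class NonEmptyFieldExample
{
    public static List<SearchResultItem> WithName(IProviderSearchContext context)
    {
        return context.GetQueryable<SearchResultItem>()
            // Matches any value, so only documents that have the field qualify.
            .Where(item => item.Name.Matches(".*"))
            .ToList();
    }
}
```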

Between<T>()

Out of the three, the Between<T>() extension method is probably the most useful one. In fact, chances are that you will need to use it every now and then. The Between<T>() method comes in handy when you want to do range searches on numbers and dates.

Let's say that you wanted to do a search where you needed to include all search results that were created over the span of the past two weeks. The first thing that comes to mind is doing something along the lines of this:
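The obvious attempt would be two plain comparisons combined with &&, roughly as sketched here. This is my reconstruction of that kind of query rather than a recommended approach, and it assumes an opened search context:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using Sitecore.ContentSearch;
using Sitecore.ContentSearch.SearchTypes;

public static class NaiveRangeExample
{
    public static List<SearchResultItem> CreatedInLastTwoWeeks(IProviderSearchContext context)
    {
        var end = DateTime.UtcNow;
        var start = end.AddDays(-14);

        return context.GetQueryable<SearchResultItem>()
            // Naive range check using two separate comparisons.
            .Where(item => item.CreatedDate >= start && item.CreatedDate <= end)
            .ToList();
    }
}
```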

If you run the code, you'll quickly find out that it won't work, as it's being serialized into the following Solr query:

__smallcreateddate:[* TO "2015-10-01T00:00:00Z"] AND __smallcreateddate:["2015-10-14T00:00:00Z" TO *]

Without going too much into the specifics of the Solr range query syntax, the * means everything. So what this query basically does is take all results that were created up to and including the lower date, and all the results created on the upper date or after it - not really what we expected, right?

In order to get the desired results, we can use the Between<T> method instead:
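A sketch of the same two-week range expressed with Between<T>(), assuming an opened search context. Inclusion.Both makes both ends of the range inclusive:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using Sitecore.ContentSearch;
using Sitecore.ContentSearch.Linq;
using Sitecore.ContentSearch.SearchTypes;

public static class BetweenExample
{
    public static List<SearchResultItem> CreatedInLastTwoWeeks(IProviderSearchContext context)
    {
        var end = DateTime.UtcNow;
        var start = end.AddDays(-14);

        return context.GetQueryable<SearchResultItem>()
            // A single range clause with both boundaries included.
            .Where(item => item.CreatedDate.Between(start, end, Inclusion.Both))
            .ToList();
    }
}
```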

If you run the code, the query will now correctly be serialized into the following:

__smallcreateddate:["2015-10-01T00:00:00Z" TO "2015-10-14T00:00:00Z"]

So, if you need to do range searches on either numbers or dates, remember to use the Between<T> method, otherwise your query will not return what you expect.

Putting it all together...

Now that we have seen how all the different components work individually, it's time to see how they can be combined to form a larger, more practical example.

For the example, I've created a small product searcher, allowing the user to search for products. The overall search is performed in the ProductSearcher type, based on a search term, a page number and a page size entered by the user and wrapped in the SearchCriteria type. The search term is used by the searcher to look for products that match it. The page number and page size are used to determine which segment of the results should be returned when performing the search. Additionally, the searcher only looks for products that were created within the year leading up to the time of the search, and lastly some sorting is applied. Once the search results are fetched, they are mapped to the product domain model and returned together with the total search result count, as a ProductSearchResult type.

Let's see how this can be achieved when putting the different components together:
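Below is a condensed sketch of how those pieces could be wired up. The index name, the "Product" template name and the properties on the Product model are assumptions of mine for illustration, so adjust them to your own solution:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using Sitecore.ContentSearch;
using Sitecore.ContentSearch.Linq;
using Sitecore.ContentSearch.Linq.Utilities;
using Sitecore.ContentSearch.SearchTypes;

public class SearchCriteria
{
    public string SearchTerm { get; set; }
    public int PageNumber { get; set; } // 1-based, as shown to the user
    public int PageSize { get; set; }
}

public class Product // hypothetical domain model
{
    public string Name { get; set; }
    public string Path { get; set; }
    public DateTime Created { get; set; }
}

public class ProductSearchResult
{
    public int TotalCount { get; set; }
    public List<Product> Products { get; set; }
}

public class ProductSearcher
{
    private const string IndexName = "sitecore_web_index"; // assumed index name
    private const string ProductTemplateName = "Product";  // assumed template name

    public ProductSearchResult Search(SearchCriteria criteria)
    {
        // Copy values into locals so the expressions capture simple variables.
        var term = criteria.SearchTerm;
        var pageIndex = Math.Max(criteria.PageNumber - 1, 0);
        var end = DateTime.UtcNow;
        var start = end.AddYears(-1);

        // Non-scoring restrictions: only products created within the last year.
        var filter = PredicateBuilder.True<SearchResultItem>()
            .And(item => item.TemplateName == ProductTemplateName)
            .And(item => item.CreatedDate.Between(start, end, Inclusion.Both));

        using (var context = ContentSearchManager.GetIndex(IndexName).CreateSearchContext())
        {
            var query = context.GetQueryable<SearchResultItem>().Filter(filter);

            // The search term goes through Where(), as it is the actual query.
            if (!string.IsNullOrWhiteSpace(term))
            {
                query = query.Where(item => item.Content.Contains(term));
            }

            var results = query
                .OrderByDescending(item => item.CreatedDate)
                .Page(pageIndex, criteria.PageSize)
                .GetResults();

            return new ProductSearchResult
            {
                TotalCount = results.TotalSearchResults,
                // Map the indexed documents directly to the domain model.
                Products = results.Hits.Select(hit => new Product
                {
                    Name = hit.Document.Name,
                    Path = hit.Document.Path,
                    Created = hit.Document.CreatedDate
                }).ToList()
            };
        }
    }
}
```

A call such as new ProductSearcher().Search(new SearchCriteria { SearchTerm = "chair", PageNumber = 1, PageSize = 20 }) would then return the first 20 matching products along with the total count.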

Depending on what you want to do, you can either choose to map the returned search results directly to the domain model, or fetch the related Sitecore item, and map that to the domain model.

In my experience, Sitecore database query performance is pretty good when fetching an item by its ID, as long as you keep the number of fetched search results down by using pagination. Of course, it will be faster to just map the search results, but there are times when you can't avoid fetching the item (like when you want to grab the link to the item, where you need to use the link manager). When that happens, you shouldn't be afraid of a performance loss from retrieving the Sitecore items and mapping those to the domain models.

And there you have it, my take on a re-introduction to the ContentSearch API in a nutshell. If you have any additional knowledge on this subject, comments or questions, please let me know in the comments section below.