Diagnosing search queries and indexes in Sitecore using Solr

In this blog post, I'll explain how you can diagnose failing search queries and indexes when using Solr together with Sitecore.

I've outlined a handful of symptoms, both common and more exotic. For each symptom you'll find an explanation of the cause(s) and a treatment that resolves the issue.

The tools that can help you narrow down the issues

Before going into the actual symptoms, causes and treatments, let's review the tools at your disposal, all of which can help you diagnose failing search queries as well as indexes.

  • The Sitecore log files for crawling and search are used for tracking search and indexing operations. The crawling.log file contains information about the indexing process and its operations, whereas the search.log file contains information about the search queries that Sitecore executes.

  • The Solr Administration UI is used to view Solr configuration details, as well as to run and analyze queries against the search index. This tool can be very useful when you want to understand why the ContentSearch LINQ queries executed from Sitecore against the Solr server aren't working as intended, or simply perform badly.

  • The Solr log files contain detailed information about what goes on under the hood inside the Solr server. The log files can be found inside the Solr installation itself, and are written to the server/logs/solr.log file path.

So now that you know about the tools that can help you diagnose problems, let's look at some of the symptoms, causes and treatments you might encounter.

Unable to connect to [http://localhost:8983/solr], Core: [coreName]

This is the most common problem you see when you start to use Solr together with Sitecore. The most likely causes of this symptom include:

  • The Solr server isn't running
  • The connection string to your Solr server is incorrect

You can easily verify whether the Solr server is running by opening your favorite browser, navigating to the localhost:8983/solr URL, and checking if you get a response. If you aren't getting any response, your Solr instance isn't running, and the fix could be as easy as starting the Solr server. If the Solr server cannot be started, the best place to start looking for clues is the Solr log files.
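If you'd rather script the browser check, here is a minimal Python sketch of the same idea. The function name solr_is_reachable is my own invention, and the URL is the one from the error message; adjust both to your setup. It only tells you whether something answers over HTTP at that address, not whether the core itself is healthy.

```python
import urllib.error
import urllib.request

# Bypass any system proxy settings, since the check targets a local server.
_OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def solr_is_reachable(base_url: str, timeout: float = 3.0) -> bool:
    """Return True if anything answers over HTTP at base_url."""
    try:
        with _OPENER.open(base_url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered, just not with a success status, so it is running.
        return True
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, timeout, DNS failure, or a malformed URL.
        return False


if __name__ == "__main__":
    url = "http://localhost:8983/solr"  # assumption: default local Solr address
    print(url, "is reachable" if solr_is_reachable(url) else "is NOT reachable")
```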

If your Solr instance is running, you need to make sure that your connection to Solr from Sitecore is configured correctly. This is done by opening the configuration file named Sitecore.ContentSearch.Solr.DefaultIndexConfiguration.config and verifying that the setting ContentSearch.Solr.ServiceBaseAddress points to your Solr server.
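To double-check the value without eyeballing the file, something like the sketch below could read the setting out of the config XML. I'm assuming the setting is written as a setting element with name and value attributes, and the sample XML is a trimmed-down stand-in for the real file, so treat it as an illustration rather than the definitive layout.

```python
import xml.etree.ElementTree as ET
from typing import Optional

SETTING_NAME = "ContentSearch.Solr.ServiceBaseAddress"

# Hypothetical, trimmed-down sample of the config file's contents.
SAMPLE_CONFIG = """<configuration>
  <sitecore>
    <settings>
      <setting name="ContentSearch.Solr.ServiceBaseAddress" value="http://localhost:8983/solr" />
    </settings>
  </sitecore>
</configuration>"""


def read_solr_base_address(config_xml: str) -> Optional[str]:
    """Return the configured Solr base address, or None if the setting is absent."""
    root = ET.fromstring(config_xml)
    for setting in root.iter("setting"):
        if setting.get("name") == SETTING_NAME:
            # Assumption: the value lives in the 'value' attribute.
            return setting.get("value")
    return None


if __name__ == "__main__":
    print(read_solr_base_address(SAMPLE_CONFIG))
```

For a real check, read the contents of the config file from disk and pass them in instead of SAMPLE_CONFIG.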

Wait up, everything is set up correctly, why is it still not working?

If your Solr instance is running and the connection is set up correctly, there are a few other things you could verify:

  • Is Solr getting low on memory?
  • Is the Solr core placed in the wrong folder?

To answer the first question, log into the Solr Administration UI, and click the 'Logging' link in the left-side panel. Next, check if the log contains an out-of-memory error.

If you see the java.lang.OutOfMemoryError error, chances are that your indexes are growing large and that the underlying Java installation is a 32-bit version rather than a 64-bit one. Since a 32-bit Java process can't address more than 4GB of RAM, large indexes mean that Java will eventually run out of memory. The best way to fix this issue is to re-install Java in a 64-bit version, and rebuild all of your indexes afterwards.

If you are not running low on memory, your Solr cores could be placed in the wrong folder. Look through the Solr log files and check if you can find a log entry with the following line: Can not find: schema.xml [C:\solr\solr-5.x.x\server\solr\coreName\conf\schema.xml]. If so, the schema.xml file has been copied to the wrong location. In order to fix this, simply move the missing schema.xml file to coreName\conf, then restart your Sitecore instance and everything should return to normal - this kind of issue has also been described on Sitecore Stack Exchange.
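Both checks above come down to searching the Solr log for a known piece of text, so they are easy to script. Here is a rough sketch that scans log lines for the two messages mentioned in this section. The labels and the scan_log function are made up for illustration, and the matching is a plain case-insensitive substring search, nothing smarter.

```python
import sys
from typing import Dict, Iterable, List

# Text fragments taken from the two symptoms described above.
SIGNATURES = {
    "out of memory": "java.lang.OutOfMemoryError",
    "schema.xml not found": "Can not find: schema.xml",
}


def scan_log(lines: Iterable[str]) -> Dict[str, List[int]]:
    """Map each label to the 1-based line numbers where its signature occurs."""
    hits: Dict[str, List[int]] = {label: [] for label in SIGNATURES}
    for number, line in enumerate(lines, start=1):
        lowered = line.lower()
        for label, needle in SIGNATURES.items():
            if needle.lower() in lowered:
                hits[label].append(number)
    return hits


if __name__ == "__main__":
    # Usage: python scan_solr_log.py path/to/server/logs/solr.log
    if len(sys.argv) > 1:
        with open(sys.argv[1], encoding="utf-8", errors="replace") as handle:
            for label, line_numbers in scan_log(handle).items():
                print(f"{label}: {line_numbers or 'no matches'}")
```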

The search query doesn't return any results

Every now and then, you construct a search query using the ContentSearch API and execute it, only to find out that you aren't getting the results you expected.

There can be a number of different reasons why this happens, like:

  • You might be querying the wrong fields in the index
  • Your ContentSearch API search query might be translated into a Solr query in a way that wasn't intended
  • The content being sent from Solr back through the ContentSearch API may not look like what you would expect, or the other way around

In my experience, the best way to attack this sort of symptom is to crack open Sitecore's search.log file, locate the query being issued from Sitecore to Solr through the ContentSearch API, and execute that query from the Solr Administration UI. From there you can tweak the Solr query directly until it works as intended, at which point you can go back to your LINQ query code and check what needs to be changed.
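If you prefer to replay the query outside the Administration UI, you can also build the request URL yourself and paste it into a browser. The sketch below assembles a URL for Solr's select handler using the standard q, fq, rows and wt parameters. The helper name, the core name and the field names in the example (title_t, _language) are placeholders of mine, so swap in whatever you find in your own search.log.

```python
from typing import Iterable
from urllib.parse import urlencode


def build_select_url(base_url: str, core: str, query: str,
                     filters: Iterable[str] = (), rows: int = 10) -> str:
    """Build a URL for the core's select handler with properly escaped parameters."""
    params = [("q", query)]
    params += [("fq", filter_query) for filter_query in filters]
    params += [("rows", str(rows)), ("wt", "json")]
    return f"{base_url.rstrip('/')}/{core}/select?{urlencode(params)}"


if __name__ == "__main__":
    # Placeholder values; replace them with the query copied from search.log.
    print(build_select_url("http://localhost:8983/solr", "coreName",
                           "title_t:home", filters=["_language:en"], rows=5))
```

Letting urlencode handle the escaping avoids the classic mistake of pasting a query with unescaped colons or spaces straight into the address bar.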

The 400 - Bad Request exceptions

The 400 - Bad Request exceptions come in two different shapes: either they are subtle and you need to look through the crawling.log file for traces of them, or they leave you in no doubt because they crash Sitecore flat out.

In the following sections, I'll explain what causes this kind of symptom, and how you can resolve it.

You are missing the field in the Solr core as a dynamic field

If you are seeing an error message containing the text "unknown field" in the crawling.log file, chances are that you have a misconfiguration in one of your schema.xml files.

The symptom is often seen in multilingual Sitecore solutions, where you need to be aware that Sitecore stores language-specific fields in Solr by adding a suffix that corresponds to the locale name. For example, Danish translated fields will be named fieldname_t_da and so on.

The way you treat this problem is by adding the unknown field as a <dynamicField /> node under the <fields> section, as explained here.
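To see why a field such as fieldname_t_da ends up as "unknown", it helps to remember that a Solr dynamic field pattern has a single wildcard, either at the start or at the end of the name. The sketch below mimics that matching rule so you can test a list of field names against the patterns in your schema. The function names and the sample patterns are hypothetical; take the real patterns from your own schema.xml.

```python
from typing import Iterable, List


def matches_dynamic_field(field_name: str, pattern: str) -> bool:
    """Mimic dynamic field matching: one '*' at the start or the end of the pattern."""
    if pattern.startswith("*"):
        return field_name.endswith(pattern[1:])
    if pattern.endswith("*"):
        return field_name.startswith(pattern[:-1])
    return field_name == pattern


def unmatched_fields(field_names: Iterable[str], patterns: Iterable[str]) -> List[str]:
    """Return the field names that no pattern covers."""
    pattern_list = list(patterns)
    return [name for name in field_names
            if not any(matches_dynamic_field(name, p) for p in pattern_list)]


if __name__ == "__main__":
    # Hypothetical patterns: English text fields are covered, Danish ones are not.
    patterns = ["*_t", "*_t_en"]
    fields = ["title_t", "title_t_en", "title_t_da"]
    print(unmatched_fields(fields, patterns))
```

Any name this reports is a candidate for a new <dynamicField /> entry, in this example one with the pattern *_t_da.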

You have multiple fields with the same name, but with different field types

I encountered this symptom while migrating content from a legacy solution that had never used search before over to a new solution. Basically, the index wouldn't build properly, so I checked the crawling.log file for clues about what caused the symptoms. Here I saw this kind of error being outputted:

Based on this error, I located the item that was causing the issues by its ID, and checked whether there was anything suspicious about the template used by the item. Once the template was located, I noticed that it had a field named 'Version' of type 'Single-line Text'.

This field introduced a great deal of trouble, since Sitecore defines a computed index field named 'Version' (hint: see the AddComputedIndexField section within the Sitecore.ContentSearch.Solr.DefaultIndexConfiguration.config file) of type integer collection. The result was that the field 'Version' was defined multiple times with different field types. When Sitecore then performs the indexing process using Solr, there is no match between the current and the expected field type, which ends up breaking the Solr indexing process. Do not be mistaken: even though it only shows up as a 'WARN', it will corrupt your indexed data.

Protip: Always ensure that your field names are uniquely defined for your custom templates, so you don't have conflicting fields stored in your index.
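A tiny sketch of how such a check could look is shown below. Both lists are supplied by you: the reserved names would come from the AddComputedIndexField section of your own configuration, and the template field names from your templates. I'm assuming names should be compared case-insensitively, and the function name and sample values are purely illustrative.

```python
from typing import Iterable, List


def find_conflicting_fields(template_fields: Iterable[str],
                            reserved_fields: Iterable[str]) -> List[str]:
    """Return template field names that collide with reserved index field names."""
    # Assumption: the comparison should ignore case.
    reserved = {name.lower() for name in reserved_fields}
    return sorted(name for name in template_fields if name.lower() in reserved)


if __name__ == "__main__":
    # Illustrative values based on the 'Version' example above.
    template_fields = ["Title", "Version", "Summary"]
    reserved_fields = ["version"]
    print(find_conflicting_fields(template_fields, reserved_fields))
```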

Document is missing mandatory uniqueKey field: id

Another concrete example of this kind of exception is the "Document is missing mandatory uniqueKey field: id" error, which Solr returns with code 400. The cause of this symptom is that you haven't copied over the schema.xml file that Sitecore modifies when it generates the Solr schema, or that you have placed it in the wrong folder.

In both cases, Solr will be using the default schema.xml file that comes with Solr, which expects the documents being indexed to have a uniqueKey field named id. If you review the schema.xml file modified by Sitecore, you'll notice that the id field is no longer required as the uniqueKey. As a result, indexing fails because the data sent from Sitecore to Solr doesn't match the schema that Solr is actually using.
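A quick way to tell which of the two schema files a core is really using is to look at its uniqueKey element. The sketch below reads that element from the contents of a schema.xml file. The sample schemas are minimal stand-ins, and the _uniqueid value is what I'd expect a Sitecore-generated schema to declare, so treat that as an assumption and compare against your own generated file.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Minimal stand-ins for the two schema variants discussed above.
DEFAULT_SCHEMA = "<schema name='example'><uniqueKey>id</uniqueKey></schema>"
SITECORE_SCHEMA = "<schema name='example'><uniqueKey>_uniqueid</uniqueKey></schema>"


def read_unique_key(schema_xml: str) -> Optional[str]:
    """Return the name of the uniqueKey field declared in a schema, if any."""
    root = ET.fromstring(schema_xml)
    node = root.find(".//uniqueKey")
    if node is None or node.text is None:
        return None
    return node.text.strip()


if __name__ == "__main__":
    print(read_unique_key(DEFAULT_SCHEMA))
    print(read_unique_key(SITECORE_SCHEMA))
```

If the file in your core's conf folder still reports id, the core is most likely running on the stock schema rather than the one modified by Sitecore.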

Final notes

In this blog post, I've only highlighted some of the symptoms you might encounter while using Sitecore and Solr together, and there are of course other cases I haven't covered.

On that note, if you have additional details to add to this blog post, or have encountered a symptom, a cause or even a treatment that needs to be addressed, please drop me a note in the comment section below.