If you've been following my Sitecore Search Series, you'll know that so far I've covered how to set up a Solr server and configure it to work with Sitecore, as well as how to work with, and extend, the ContentSearch API.
In this two-part blog post, I'll go over the challenges you need to tackle when architecting the search indexing infrastructure for your Sitecore solution - starting with how to configure a robust Solr infrastructure in a Sitecore environment.
The standard configuration of a Solr server in a Sitecore setup
When setting up Solr for a production environment, the most common setup is to place Solr on its own server, with both the Content Management (CM) and Content Delivery (CD) server(s) communicating with it:
Depending on the setup, Solr could either run on its own server as shown in the diagram above or, if the client only has one CD and one CM server and the amount of data that needs to be processed through Solr isn't large, live on the CM server.
Now, while this works, there are some significant issues with this type of setup. Most notably, it's not very fault-tolerant: the Solr server is a single point of failure, so if your Sitecore solution relies heavily on the search index, an unexpected breakdown of the Solr server could take down every search-driven part of the site, which could be very costly for the client. Moreover, if we want to scale the solution, we will need to make some adjustments as well.
Towards a more performant and reliable configuration
I recently had a brainstorming session with my colleague Thomas Stern, where we were trying to figure out how we could create a simple setup that ensured both performance and availability. From that session, we came up with a few different solutions.
Solution 1 - Pairing the Sitecore servers with a dedicated Solr server
The first solution we came up with involves pairing our Sitecore servers with a dedicated Solr server in a replication setup:
When the user makes a request to the site, a load balancer determines which CD/Solr server pair to use. Looking at the setup in its entirety, this forms what is called a master/slave replication setup. The CM server will have its own dedicated Solr server installed, which will be configured as the master Solr server. The CM server will be responsible for updating the Solr index. In addition, each CD server will also have a dedicated Solr server installed, and these will be configured as slaves. The CD servers will not perform any index updating; instead, they will rely on the index updates coming from the CM server.
A key point here is that the network latency between Solr and the CM/CD servers will be practically zero, since they are running on the same machines.
The idea is that when the master Solr index is updated by the CM server, the slave Solr indexes will be updated to reflect those changes - I should add that this happens very close to real-time (how close depends on the interval at which the slaves poll the master). When the CD servers query the Solr index for data, each CD server asks its own Solr slave.
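To make the "slaves follow the master" idea a bit more concrete, here is a minimal Python sketch of how you could check whether each slave has caught up. It assumes Solr's replication handler is enabled on every core and queries it with command=details. The core URLs and the fetch_version callable are hypothetical: fetch_version stands in for whatever HTTP call and response parsing you use to pull the index version out of the reply, and the response layout differs between Solr versions, so treat this as an illustration rather than a finished monitoring tool.

```python
from typing import Callable, Dict, Iterable
from urllib.parse import urlencode


def replication_details_url(core_url: str) -> str:
    """Build the replication handler URL that reports a core's replication details."""
    query = urlencode({"command": "details", "wt": "json"})
    return f"{core_url.rstrip('/')}/replication?{query}"


def slaves_in_sync(
    master_url: str,
    slave_urls: Iterable[str],
    fetch_version: Callable[[str], int],
) -> Dict[str, bool]:
    """Report, per slave, whether its index version matches the master's.

    fetch_version is a hypothetical helper supplied by the caller: it takes a
    replication details URL and returns the index version found in the response.
    """
    master_version = fetch_version(replication_details_url(master_url))
    return {
        slave: fetch_version(replication_details_url(slave)) == master_version
        for slave in slave_urls
    }
```

Passing the fetcher in as a parameter keeps the comparison logic testable without a running Solr instance; in a real setup you would swap in a function that performs the HTTP request.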
Lastly, in order to make the setup more reliable, a load balancer is placed in front of the CD servers. Then, if either the CD server or the Solr server running alongside it fails, the entire server "node" must be taken out of the load balancer rotation until it is fixed and running again.
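The rule that a node is only usable when both of its halves are healthy can be sketched in a few lines of Python. The Node type and the two probe callables are hypothetical names introduced for this example; in practice the Solr probe might call the core's ping handler and the CD probe might request a lightweight page, but the probes are left to the caller here.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Node:
    """One CD server together with the Solr slave installed on the same machine."""
    name: str
    cd_ok: Callable[[], bool]    # hypothetical probe for the Sitecore CD instance
    solr_ok: Callable[[], bool]  # hypothetical probe for the local Solr slave


def node_in_rotation(node: Node) -> bool:
    """A node stays in rotation only if both the CD server and its Solr slave respond."""
    return node.cd_ok() and node.solr_ok()


def rotation(nodes: List[Node]) -> List[str]:
    """Return the names of the nodes the load balancer should keep sending traffic to."""
    return [node.name for node in nodes if node_in_rotation(node)]
```

Note that a failing Solr slave removes a perfectly healthy CD server from rotation as well, which is the main trade-off of this solution.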
Solution 2 - An alternative replication setup without dedicated Sitecore/Solr pairing
As an alternative to solution 1, you could keep the replication setup but, instead of pairing each Solr server with a CD server, have all CD servers communicate with a load balancer that sits in front of the slave Solr servers:
When the user makes a request to the site, a load balancer determines which CD server to use. Once the CD server has been selected, the next load balancer determines which Solr server to use. This way, we won't have to take down an entire CD/Solr server pair if the Solr server breaks down while the CD server still runs without problems. At the same time, we still get the added performance and availability, should the search index be overloaded or suffer from unexpected server issues.
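What the second load balancer does can be illustrated with a small round-robin pool that skips unhealthy slaves. This is only a Python sketch of the idea, not how any particular load balancer product works; SolrSlavePool, the slave names and the is_healthy callable are all hypothetical.

```python
from typing import Callable, Sequence


class SolrSlavePool:
    """Round-robin selection over Solr slaves that skips the ones reported unhealthy."""

    def __init__(self, slaves: Sequence[str], is_healthy: Callable[[str], bool]):
        if not slaves:
            raise ValueError("at least one Solr slave is required")
        self._slaves = list(slaves)
        self._is_healthy = is_healthy  # hypothetical health probe supplied by the caller
        self._next = 0

    def pick(self) -> str:
        """Return the next healthy slave, trying each slave at most once per call."""
        for _ in range(len(self._slaves)):
            candidate = self._slaves[self._next]
            self._next = (self._next + 1) % len(self._slaves)
            if self._is_healthy(candidate):
                return candidate
        raise RuntimeError("no healthy Solr slave available")
```

With this arrangement a broken slave is simply skipped, and every CD server keeps serving requests through the remaining slaves.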
Solution 3 - SolrCloud
A third option would be to use SolrCloud, which has the ability to set up a cluster of Solr servers that combines fault tolerance and high availability. I haven't tried out this approach yet, but according to Apache, it seems like this should be the preferred way to go about building the distributed Solr setup of tomorrow.
A word of warning, though: there has been a lot of fuss about Sitecore and SolrCloud lately, both in several blog posts and on the Sitecore Slack Solr channel, since there are some issues related to running Sitecore on SolrCloud.
Currently there is an unofficial patch that should address those issues, which has been reported to work in Sitecore 8.1, but keep in mind that this is unofficial, and not something Sitecore is supporting officially yet. Because of this, I haven't tried out SolrCloud myself, but that doesn't mean it can't work out for you. So, if you plan on going down this route, be sure to go over blog posts on the support issue and check out the Sitecore on Solr Cloud series written by Chris Sulham.
If you have additional details on how to get SolrCloud working with Sitecore 8.x, I'd love to hear your feedback in the comments section below, on Twitter or on the Sitecore Slack channel.
In the next part, I'll be going over how you should organize your search indexes, as well as going into the details of how you can avoid downtime while rebuilding your indexes.