The subtleties of merging Contacts in Sitecore

I've recently been involved in Sitecore solutions that rely heavily on Sitecore's analytics features, and where working with Contacts is considered a cornerstone of the solution. Along the way, I was made aware of a rather subtle, yet interesting gotcha related to the merging of two contacts and how this relates to the chosen identifier of the contact(s).

In this blog post, I'll give you an overview of what actually happens behind the scenes when you merge two contacts, and what the consequences are for your choice of contact identifier.

The mechanics behind contact merging

In order to merge two contacts, you will need to use the MergeContacts() method, which is found on the ContactRepository. As parameters, the method takes one contact (*the dying contact*) and merges it into another contact (*the surviving contact*).

Merging

When the method to merge the contacts is called, there are two overall things happening within Sitecore:

  1. The mergeContact pipeline is called, which is responsible for copying data from the dying contact to the surviving contact, including tags, counters, facets and other attributes

  2. The dying contact is registered as a merging source on the surviving contact (which will be relevant a bit later)

Once done, the surviving contact holds the dying contact's data in the memory of the current session, but the merge itself has not yet been persisted back to the xDB. To persist it, we either have to trigger the save and release of the contacts manually, or simply wait for the session to time out, at which point the contacts will be flushed back to the xDB.
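To make the two steps a bit more tangible, here is a minimal sketch in Python. It is a toy model and not Sitecore's API: the Contact class, the merge_contacts() function and the rules used for combining tags and counters are all assumptions made up for illustration. The point is only that the data is copied in memory and the dying contact is remembered as a merging source, while nothing is persisted yet.

```python
from dataclasses import dataclass, field
from uuid import UUID, uuid4


@dataclass
class Contact:
    """Hypothetical stand-in for a contact held in session memory."""
    identifier: str
    contact_id: UUID = field(default_factory=uuid4)
    tags: dict = field(default_factory=dict)
    counters: dict = field(default_factory=dict)
    merging_sources: list = field(default_factory=list)


def merge_contacts(surviving: Contact, dying: Contact) -> None:
    # Step 1: copy data across (the role of the merge pipeline).
    # The combining rules below are assumptions, not Sitecore's rules.
    for key, value in dying.tags.items():
        surviving.tags.setdefault(key, value)
    for key, value in dying.counters.items():
        surviving.counters[key] = surviving.counters.get(key, 0) + value
    # Step 2: register the dying contact as a merging source.
    # Nothing has been written to storage at this point.
    surviving.merging_sources.append(dying.contact_id)


old = Contact("old@example.com", tags={"plan": "gold"}, counters={"visits": 3})
new = Contact("new@example.com", counters={"visits": 1})
merge_contacts(new, old)

print(new.tags, new.counters)
print(new.merging_sources == [old.contact_id])
```

Note that the dying contact itself is left untouched by the merge in this sketch, including its identifier.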

Digging deeper into the inner workings of the merging

In order to understand what's actually going on from the moment the session times out, we have to take a closer look at the ContactManager class, where we can see how this works.

When the timeout occurs, the contact manager saves and releases the contact to the xDB using the SaveAndReleaseContactToXdb() method, which is the same underlying method we would call if we were to save and release the contact ourselves.

From here, a lot of things happen before the contact data is stored back in the xDB. To better understand this process, I've created a high-level diagram that can serve as a mental picture of the individual steps as I explain them:

[Diagram: The internal process of merging contacts]

The contact manager starts by submitting the contact, calling its own SubmitContact() method. From there, the steps are:

  1. During the submit, the contact is saved through the ContactRepository by calling its SaveContact() method.

  2. While the contact is being saved, it is checked for merging sources; any that are found are marked as obsolete contacts. In our case, the surviving contact has the dying contact as a merging source, so the dying contact is marked obsolete by calling ObsoleteContact() on the MongoDbDataAdapterProvider class.

  3. The data adapter provider delegates the call to the MongoDbContactStorage implementation and its ObsoleteContact() method.

  4. In this method, the dying contact's entry in the xDB (or MongoDB) has a Successor attribute written to it, containing the ID of the surviving contact.

Note: In practice, this means that the xDB knows that the old (dying) contact is tied to the surviving contact, and that whenever we identify the contact, the lookup will always end up at the surviving contact.

At this point, we have stored the data of both the dying and the surviving contact. An important point is that when the contacts are persisted in the xDB data storage, their identifiers have not changed; they remain exactly the same.
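The save flow can be sketched roughly like this, again as a toy Python model rather than the real Sitecore classes. The storage dictionary, the record layout and the function names are hypothetical, and the sketch assumes that the successor pointer is stored on the obsolete (dying) record and points at the surviving contact.

```python
from uuid import uuid4

# In-memory dict standing in for the contacts collection (hypothetical layout).
storage = {}


def obsolete_contact(dying_id, successor_id) -> None:
    # The obsolete record keeps its identifier and gains a successor pointer.
    storage[dying_id]["successor"] = successor_id


def save_contact(contact: dict) -> None:
    """Persist a contact, then mark each of its merging sources obsolete."""
    storage[contact["id"]] = {"identifier": contact["identifier"]}
    for source_id in contact.get("merging_sources", []):
        obsolete_contact(source_id, successor_id=contact["id"])


dying = {"id": uuid4(), "identifier": "old@example.com"}
surviving = {
    "id": uuid4(),
    "identifier": "new@example.com",
    "merging_sources": [dying["id"]],
}

save_contact(dying)
save_contact(surviving)

for record in storage.values():
    print(record)
```

After both saves, the record for the dying contact still carries the old e-mail as its identifier, which is exactly the detail that matters in the next section.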

When would I need to merge contacts?

Let's say that we choose to have the e-mail as the identifier of the contact, and that the user is identified using a given e-mail the first time they log into the solution. At some point later in time, the user chooses to change the originally selected e-mail to something different.


As part of letting the user change their e-mail, we now have to create a new contact and identify it with the new e-mail. However, since we still want to track the user based on all their previous interactions, we merge the user's old contact (identified by the old e-mail) into the new contact.

Going back to how the code works for merging contacts, this immediately gives us a challenge, as the old contact still has the old e-mail as its identifier. If the same user later wants to create a new user with a previously used e-mail address, Sitecore will think that it needs to be linked with the obsolete contact in the xDB, which results in some unexpected behaviour. Basically, Sitecore will try to reuse the previous 'dying' contact and set its successor to be the active 'surviving' contact. This ends in an infinite loop, since each contact is now the successor of the other.
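One way to picture the loop is as a chain of successor pointers that never ends. The sketch below is a simplified Python model under my own assumptions (the contacts dictionary, the "successor" key and the resolve() function are hypothetical, not Sitecore code). It follows successor pointers until it reaches a contact without one, and raises an error when it comes back to a contact it has already visited.

```python
def resolve(contacts: dict, contact_id: str) -> str:
    """Follow successor pointers until a contact without one is found."""
    seen = set()
    current = contact_id
    while contacts[current].get("successor") is not None:
        if current in seen:
            raise RuntimeError(f"successor cycle detected at {current}")
        seen.add(current)
        current = contacts[current]["successor"]
    return current


contacts = {
    "A": {"identifier": "old@example.com", "successor": "B"},   # dying
    "B": {"identifier": "new@example.com", "successor": None},  # surviving
}

# After the first merge, looking up the old contact ends at the survivor.
first_result = resolve(contacts, "A")

# The user goes back to the old e-mail: the obsolete contact A is reused
# and B is merged into it, so B now points back at A.
contacts["B"]["successor"] = "A"

try:
    resolve(contacts, "A")
    looped = False
except RuntimeError as error:
    looped = True
    print(error)
```

Without the visited-set guard, the while loop in this model would simply never terminate, which is the behaviour described above.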

So, how do we solve this?

In order to overcome this issue, we need to cut the ties between the two contacts. One thing you can do is to set the identifier of the dying contact to something other than the e-mail, for example a GUID value. That way, Sitecore no longer sees the two contacts as being related. Just remember to do this right after the merge is done, and before the contact is flushed back to the xDB.

By doing so, we ensure that the identifier of the dying contact no longer references the previously set e-mail, which allows us to create a new user with the previously used e-mail address without any issues.
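As a rough illustration of the idea, here is a small Python sketch. It is not Sitecore code: the dictionaries and the detach_identifier() and find_by_identifier() helpers are hypothetical names, and the real change would have to be made on the actual contact before it is saved and released.

```python
from uuid import uuid4


def find_by_identifier(contacts: list, identifier: str):
    """Return the first contact with the given identifier, or None."""
    return next((c for c in contacts if c["identifier"] == identifier), None)


def detach_identifier(dying: dict) -> None:
    # Replace the e-mail with an opaque value so the e-mail can be claimed again.
    dying["identifier"] = uuid4().hex


dying = {"identifier": "old@example.com"}
surviving = {"identifier": "new@example.com"}
contacts = [dying, surviving]

# The merge itself is omitted here. Right after it, and before anything is
# flushed back to storage, the dying contact gets a new identifier:
detach_identifier(dying)

# A later lookup by the old e-mail no longer finds the obsolete contact.
fresh = find_by_identifier(contacts, "old@example.com")
print(fresh)
```

A GUID is handy here because it is unique and carries no meaning, so it will never collide with an e-mail address a user might pick later.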

Final words

After seeing that this is indeed a use case that can happen, I strongly advise that you choose the identifier carefully. If you decide to identify a contact by something that might change over time, make sure that you truly invalidate the dying contact as part of merging the two.

If you have additional details to add to this blog post, please drop me a note in the comment section below.