Monday, June 25, 2012

Migration strategies

Migrations are often complicated and not very sexy, all the work you put into it, configuration, scripts and code will be more or less useless after the migration has been done. However, choosing the right strategy is essential for a good CRM implementation. I will go through some of the strategies that are common and discuss some of the advantages and disadvantages.

Greenfield migrations are not really migrations, but rather the lack of migration. This means that the system will be set up without any migrated data, no accounts, no contacts etc. The idea is then that all the users will create data in the new system as the feel the need for it.

This is often used for start-ups when there is no or very little data. It can also be used when the data quality is very bad and trying to filter out the "good" data is just to complicated. Normal problems can be large numbers of duplicates, errors in the data itself, like "." and "," being inserted instead of real data just to make the forms savable.

Advantages of Greenfield migrations is that the system will be up and running in no time. Migrations might otherwise take considerable amounts of time. This also mean that the cost of Greenfield migrations are very low. Do note however, that there is need for some general system configuration, like creating users, setting up queues.

The disadvantage is that the users do not feel that the new system is helping them out and they have to reinsert a lot of data which can create considerable amounts of annoyance with the new system. This can in the long term increase the bad-will towards the system, lowering usage which can jeopardize the entire CRM investment.

Standard import
The standard import functionality of Dynamics CRM is reasonably powerful and can usually be used for importing simpler data. It is based on data being stored in csv-files or excel as xml-files. The first of these being a bit problematic as it is rather old and dependant on the regional setting. For instance, the value separator in Sweden is semicolon ";" but in USA comma ",". It is also common that the csv-files contain data that risk messing up the file syntax, breaking the imports. In general the excel-as-xml format is to be preferred as it also contains field names and proper matching.

This functionality has greatly been enhanced in CRM 2011 compared to CRM 4. For instance, it now support uploading of zip-files containing xml-data files with relations between the files. It can even handle multiple relations that otherwise would take three import runs, all in one go. For instance if you have data with accounts containing primary contact and all contacts also contain parent customer.

The limitations of standard import is that it cannot really handle large amounts of data as the maximum upload size is 8 MB (can be changed in an onpremise installation). Complicated data structures are also hard to handle and it is rather time consuming to run it as there is no scripting capabilities. The lack of scripting also makes the ordering of the different imports in a zip more or less impossible. There is also no support for logic like "If the contacts email adress exists, update the data of that contact, otherwise create a new contact" which is often very useful when importing data from multiple sources. It is also hard to do testruns of the migration without a lot of manual steps involved. There is some error handling but it is very rudimentary and cannot handle more complex logic.

The advantages of standard import is mainly that it requires no extra software and it is relativly easy as the options are limited.

Third party products
One of the more powerful options when doing migrations, is using third party products, like Scribe, Import Manager. These include lots of options that the standard import doesn't like:
  • ODBC/OLE DB connections to a data source
  • Logic to handle update or create
  • Scripting - custom logic to handle some of the data conversions
  • Run as service - ability to be run in the background as a service
  • Reuse for integrations - these tools can also be used for integrations why the effort invested can at least partly be reused.
  • Support for custom addons
  • More powerful - has support for multi threading and large data sources
  • Easier to do test migrations as all steps can be set up in a run-script, less manual labour.
  • Can handle more complex data
  • Better error handling
Not all products support all of these features.

The main drawbacks are the licensing fees required. Do note that some of these companies have special migration licenses which are not the same as the full license used for integrations. These products are also a bit more complicated due to the larger sets of options available. Hence it also takes some time to learn the product why my recommendation is to choose one and stick with it.

Custom migration program
The most advanced for of migrations need to be done by using custom code. There are really very few limitations to what can be done when writing custom programs that migrated data, it is more up to the skills of the developer and the time available.

Some of the advantages:
  • Limitations only in skills and available time and money
  • Easy to do test runs
  • Complicated error handling logic can be created
The main disadvantages are:
  • Demands developers
  • Time consuming - from around 100 hours to several thousand hours
General recommendations
Migrations are often complex and it is not until you have worked with the data a bit that you start to get a feel for how problematic it is going to be. Hence I usually never give fixed prices on migrations as it would either be inflated or put me at risk. It is also often the case that customers do not realize the complexity of migrations, especially smaller customers it is therefor essential to involve them a lot in the work.

De-scoping the migration is also very important. Usually not all data is required, especially if the old system can be maintained with read-only access. Try to de-scope in width (which entities are really required) and in depth (how old data is required).

Data quality can also be a problem so try to evaluate this early on. Examples are names being stored both as "First name Last name" and "Last name, First name" or fill-in data like ".", "," or "-" that has been entered into required fields to make the forms savable.

Large migrations might required delta migrations, when migrations are run in two steps, first the major migration, then after it has been completed, a smaller to migrate the data that was changed during the main migration. This will put additional demands on migration scripts and/or code so try to avoid it if possible.

Sometimes the new system required data that doesn't exist and hence has to be created. I usually refer to this as migration of non-existent data. It is easy to forget if your perspective is to map the data from the old system without looking at which data is required in the new.

Migrations between two systems that with very different data models is theoretically complicated. There might, for instance exist data in several places that need to fit into one. Data might need to be restructured in complex ways. In depth knowledge of source and target data model is essential and the proper skills required to understand complex data modelling is also a strong recommendation.

I hope I have shed some light on this subject which can be discussed at length.

Gustaf Westerlund
CEO, Chief Architect and co-Founder at CRM-konsulterna AB


  1. Michael Pettersson25 June, 2012 14:48

    Great post
    Working with another CRM-system than Dynamics I can tell you that the challenges seems to be universal.

    One other challenge to add to the topic, and the expected time needed, is how duplicates should be handled. Duplicates can come both from migrating from multple sources that might contains the same information or when migrating to an enviroment that already contain part of the information you want to migrate.

  2. Thank you Michael, nice to know it is appreciated. Yes, I mentioned duplicates briefly but I agree there is a lot to discuss on that subject. Especially when the matter of which data is master is not clear or subject to some rules i.e. if this field is X then source Y is master otherwise source Z.

    1. Michael Pettersson25 June, 2012 17:51

      Yes, and how should the duplicate check be done?

      The best approach would in theory be to perform the duplicate checks before the migration so you don't pollute your system. But since you the information has different structures and layout, it is often hard to perform good de-duplication outside of the CRM-system.

      From my experience sometimes it is best to import into test or production environments and perform de-duplication within the CRM-system, that can provide better ways for both spotting and removing duplicates. But that's maybe a topic for another post.

    2. The rules for duplicate checking are very important and sometimes it is not so simple. The problem I have seen with the strategy of de-duplicating manually is that the run-time of the migration is often very limited (usually a weekend) due to the fact that the organization cannot be without a CRM system for more than that. It greatly limits what can manually be done with the data. Manual handling also limits the ability to do test-runs as they will take considerable amounts of manual work each time.

      For complex data migrations with multiple sources, my recommendation is to use a staging database where all data is first inserted into. Scripts can then be run on this to identify quality problems, duplicates etc. When the staging database meets a minimum criteria set together with the customer, it is imported into the CRM-system.

      There is a lot that can be discussed on this and I might write a separate post on the subject.

  3. Great blog post. Migrations sure can be tricky sometimes when dealing with multiple datasources. If you want to add even more complexity to the table, you might run into customers who later on wants to add data from from additional legacy systems that have data that might the existing. Then you have to create rules that defines when the data in your data source "wins" and when the data in CRM "wins". I Agree with you Gustaf about staging database, having a staging database eases the burden while trying to verify the data. Create one that is identical to the CRM-entity model that you are migrating data into. Then if you want to cheat, in CRM you can create space for primary and secondary keys fields representing the data relationships in your legacy. Bringing those keys from the legacy systems into CRM has saved me quite some time when I've needed to correct the data in CRM because of something I've forseen. With a third party tool such as Import Manager or Scribe you can have quite good control over the data in CRM throughout the migration and hopefulle it should bring you as much headache if you do something wrong.

    1. Thanks Peter. Storing the legacy keys is usually a good idea. With multiple sources it does sometimes become quite tricky as there are multiple keys to store, and creating a new entity for legacy keys seems a bit of an overkill.

    2. Just to clarify, the legacy keys represent the data model in the staging database, which is structured the same way as the CRM entity model - meaning 1 primary key for each CRM-entity (and as many foreign keys as you need). I've never ran into a situation where I've feelt the need for creating a whole entity for storing keys.. only a field. I'm not sure I follow you there.

    3. No, I have not felt the need either. But if not using a staging db and having multiple sources, instead of having one field per source, you would create an entity for managing data sources with 1:N relationsships to all migrated entities.