Thursday, February 08, 2018

Deleting a lot of records fast

A quick one today...

Needed to delete a couple of million records for a customer and the natural thing was to use the Bulk Deletion service, well, I turned it on and it was extremely slow. Only got about 10 records/s which would cause the entire delete to take over a week. I have checked with Microsoft and this is not a bug, but it is working as designed and is not designed to be super fast. According to Microsoft bulk deletion jobs are put on the async queue on low priority to allow other more important jobs higher prio.

And a favorite quote of mine from Purvin Patel of Microsoft "Does a dump truck need to outrace a Ferrari?" - and I think that the answer to that question is: it depends. Sometimes it does.  

Personally I would sometimes like it to be as fast as possible when removing a lot of records.

I also checked to see how fast the deletion would be with SSIS and Kingswaysoft. Used the following settings:

  • VM about 5 ms from the Dynamics 365 instance (important that it not be too far, use an Azure VM for this)
  • Used 64 threads
  • Used Execute Multiple batching with 10 (cannot use more that 10 if you are using a lot of threads, ie more than 2)
  • VM has 8 virtual cores and 32 GB memory
  • Loading in batches of 2000. Only loading the id-column, as that is all that is needed.

With this setup, I got somewhere around 345 records deleted per second. Which is a tad more than 34x faster than the bulk delete.

So, want to delete a lot of stuff, maybe Bulk Delete is not the way to go. Not yet anyway, let's hope Microsoft makes it faster!

(this post was updated on Feb the 9:th 2018)

Gustaf Westerlund
MVP, Founder and Principal Consultant at CRM-konsulterna AB
www.crmkonsulterna.se

4 comments:

  1. While I agree that it takes nigh-on forever to delete stuff (I semi-regularly have to bulk delete in excess of 1.5 million records, which most recently took almost 5 days in a non-prod environment), I look at these external options cautiously.

    I'm always concerned that they do these sort of things in a supported fashion (web service / SDK vs SQL delete). Also, even though it will be faster to delete stuff without workflows and plugins, I still would feel better if those were triggered on delete (or at least having the option). And it's not just plugins and workflows, stuff like cascading deletes should still happen.

    If this tool uses the web service / SDK to do these deletes, then I'll be very interested in it :-)

    Do you know whether or not it works like that? I can't find any specific mention of this sort of behaviour on their site (other than "for best performance, disable these things).

    Perhaps next time I do this, I'll write a little console application to do the "bulk delete" using the normal web service methods and compare the execution time accordingly :-)

    ReplyDelete
  2. As Gustaf has demonstrated you can in crease the throttle of the delete operation by creating multiple threads to perform the operation. This is done in a supported way via the API. Plugins and workflows do obviously slow it down but keeping them on is a decision made on a case by case basis. This is actually very easy to do in a simple console app. My current speed record to CRM online over a latent internet connection for data throughput is about 500 records a second which has been sufficient for my integration requirements.

    ReplyDelete
    Replies
    1. Yes, Indeed. Playing with the parameters of number of threads and number of messages per executemultiple you can probably increase this. Do read the Kingswaysoft blog as they have some excellent info on the throttle settings of Dynamics 365 Online.

      Delete
  3. Kingswaysofts adapter for SSIS works through the API. Deletes using direct SQL would be a lot faster, and I agree, don't do it, as you risk the integrity of the system. The example above was with Dynamics 365 online and you cannot even get access to the db there so the APK is the only way. Threading is the trick when working with the API though...

    ReplyDelete