Tip #187: Slow batch processing

Sometimes you need to perform checks or calculations against each record in a particular entity. If you have a sizable on-premises deployment, it could be that the marketing department just came out with a new algorithm to rank the customers and some voodoo needs to be done for every account record. If you are an ISV, it could be some bulk data preparation that you need to perform (e.g. in our solution we had to analyze every phone number for every contact and account record in the system).

Whatever your scenario, these batch processes often involve hundreds of thousands or even millions of records; they are time-consuming and can span hours or even days. Frequently, there is an additional requirement not to run them during business hours. So how do you keep track of which records have been processed already and which ones are still to be done?

Sure, we could create and fire up a small workflow for each record, but having millions of workflows is not going to win you any new friends among system administrators. We could add a simple boolean attribute to the target entity and update it after we process each record to indicate that we are done with it. That’s better, but not a very good choice for ISVs, as it immediately drags the target entity into your managed solution, thus introducing new dependencies without a good reason. In addition, changing this attribute would run an unnecessary update on the target record, modifying its timestamp and potentially introducing inconsistencies in the data.

That’s where a native N:N relationship can help. Note: some of the steps below do require coding; we deliberately skip the technical details to concentrate on the essence of the approach.

  1. Add a new entity called, say, Batch.
  2. Add a native N:N relationship between the target entity (e.g. contact) and batch.
  3. Create a new batch record.
  4. Associate all target records with the batch record.
  5. Use either FetchXml or QueryExpression to find records associated with the batch record. Use pagination as required.
  6. Process the retrieved records (one page at a time). Once a record is processed, disassociate it from the batch.
  7. Rinse, repeat until all records are processed, i.e. the batch record has no associated records.
  8. Delete the batch entity (this deletes the relationship as well).

This approach has the following advantages:

  • It’s non-intrusive. Adding an entity and an N:N relationship to another entity has no impact, introduces no dependencies and adds no customizations for the latter. Nada, zip, bupkis. Visually, there will be an additional link in the navigation bar, but it’ll be gone once the batch entity is removed.
  • It’s durable and survives suspension of processing, system restarts, etc.
  • It can be expressed and used by a programmer. Note that we have a step of associating all records to be processed first. In theory, we could have gone the opposite way, i.e. associate processed records with the batch. However, that would have required a NOT IN expression, which is slowly making its way through but is not yet available in CRM.
  • It’s simple to master and execute.
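
To make the NOT IN point concrete, here is a small sketch of the two query shapes. It uses SQLite and plain SQL purely as a stand-in for FetchXml or QueryExpression, and the table names and batch ids are hypothetical. Linking the records that are still to do lets the work queue be read with an ordinary join; linking the records that are done forces the query to exclude them, which is the part CRM queries could not express.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE contact (id INTEGER PRIMARY KEY);
    CREATE TABLE batch_contact (batch_id INTEGER, contact_id INTEGER);
    INSERT INTO contact (id) VALUES (1), (2), (3), (4), (5);
""")

# Approach from the tip: batch 1 links the records that are still TO DO (3, 4, 5).
conn.executemany("INSERT INTO batch_contact VALUES (1, ?)", [(3,), (4,), (5,)])
todo_by_join = [row[0] for row in conn.execute(
    "SELECT c.id FROM contact c "
    "JOIN batch_contact bc ON bc.contact_id = c.id "
    "WHERE bc.batch_id = 1 ORDER BY c.id"
)]

# Opposite approach: batch 2 links the records that are DONE (1, 2),
# so the remaining work can only be found by excluding them.
conn.executemany("INSERT INTO batch_contact VALUES (2, ?)", [(1,), (2,)])
todo_by_not_in = [row[0] for row in conn.execute(
    "SELECT id FROM contact WHERE id NOT IN "
    "(SELECT contact_id FROM batch_contact WHERE batch_id = 2) ORDER BY id"
)]

print(todo_by_join, todo_by_not_in)
```

Both queries describe the same remaining work; only the first shape maps onto a simple link between the target entity and the batch.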

A final observation: if your target entity is contact, account or lead, marketing lists can be used instead of a custom batch entity, but personally I prefer to keep unrelated functionality separate and would still recommend the custom entity approach.
