Monday, May 18, 2015

Be aware of Synchronous Workflows on incoming emails and Queue items

How synchronous workflows that fail, make emails disappear.

During the last few Days I have been having the not so pleasant experience of trying to find some really nasty errors in incoming emails to a queue in CRM. We are using CRM 2013 SP1 (on prem) and hence are using the Server Side Sync. Everything had been working great and we had developed some neat workflows that did some magic to convert emails to cases (as those features were not available, and not good enough either) when we develop our stuff.

Well, thing stopped working, I tried sending emails to the CRM server, but none were received in CRM. All I got in CRM was a not so descriptive general alert saying:

"An error occurred while creating incoming email in Microsoft Dynamics CRM for the mailbox XYZ. To view the email that failed, see the alerts in the mailbox record."

and in the mailbox:
"An error occurred while creating the incoming email "Test 1512" in Microsoft Dynamics CRM for the mailbox XYZ."

As I am working on my telepathic abilities, I have not yet reached a level where I can understand what is wrong so I had a look in the trace, and I was very happy I was running an on-prem server. It is a bit long, don't worry if you don't understand it all, nobody (I think) does.

[2015-05-15 15:21:13.459] Process:CrmAsyncService |Organization:673c5996-a1fa-e311-93ed-00155d0a831d |Thread:    4 |Category: Platform.Metadata |User: 00000000-0000-0000-0000-000000000000 |Level: Error |ReqId:  | <>c__DisplayClass1.<LoadMetadataContainerFromDatabase>b__0  ilOffset = 0x14
>Multi-org sharable cache loading system and non-system metadata with build number and language 1033
[2015-05-15 15:33:45.132] Process:CrmAsyncService |Organization:00000000-0000-0000-0000-000000000000 |Thread:   26 |Category: Exception |User: 00000000-0000-0000-0000-000000000000 |Level: Error |ReqId: cf367175-12a2-4719-a5af-f53f7f254399 | CrmException..ctor  ilOffset = 0x7
 at CrmException..ctor(String message, Exception innerException, Int32 errorCode, Boolean isFlowControlException)  ilOffset = 0x7
 at CrmException..ctor(String message, Exception innerException, Int32 errorCode)  ilOffset = 0x5
 at Exceptions.ThrowIfGuidEmpty(Guid parameter, String name)  ilOffset = 0x18
 at OrganizationSdkServiceInternal.Retrieve(String entityName, Guid id, ColumnSet columnSet, CorrelationToken correlationToken, CallerOriginToken callerOriginToken, WebServiceType serviceType, Boolean checkAdminMode)  ilOffset = 0x16
 at InprocessServiceProxy.RetrieveCore(String entityName, Guid id, ColumnSet columnSet)  ilOffset = 0x28
 at OrganizationServiceProxy.Retrieve(String entityName, Guid id, ColumnSet columnSet)  ilOffset = 0x4
 at EntityDictionary.TryRetrieveEntity(String key)  ilOffset = 0xF8
 at EntityDictionary.System.Collections.Generic.IDictionary<System.String,Microsoft.Xrm.Sdk.Entity>.get_Item(String key)  ilOffset = 0x1B
 at VisualBasicValue`1.Execute(CodeActivityContext context)  ilOffset = 0x3A
 at ActivityExecutor.ExecuteInResolutionContext(ActivityInstance parentInstance, Activity`1 expressionActivity)  ilOffset = 0x35
 at InArgument`1.TryPopulateValue(LocationEnvironment targetEnvironment, ActivityInstance activityInstance, ActivityExecutor executor)  ilOffset = 0x2F
 at ActivityInstance.InternalTryPopulateArgumentValueOrScheduleExpression(RuntimeArgument argument, Int32 nextArgumentIndex, ActivityExecutor executor, IDictionary`2 argumentValueOverrides, Location resultLocation, Boolean isDynamicUpdate)  ilOffset = 0x16
 at ActivityInstance.ResolveArguments(ActivityExecutor executor, IDictionary`2 argumentValueOverrides, Location resultLocation, Int32 startIndex)  ilOffset = 0x9C
 at ExecuteActivityWorkItem.ExecuteBody(ActivityExecutor executor, BookmarkManager bookmarkManager, Location resultLocation)  ilOffset = 0x1C
 at ActivityExecutor.OnExecuteWorkItem(WorkItem workItem)  ilOffset = 0x53
 at Callbacks.ExecuteWorkItem(WorkItem workItem)  ilOffset = 0xD
 at Scheduler.OnScheduledWork(Object state)  ilOffset = 0xC5
 at SendOrPostThunk.UnhandledExceptionFrame(Object state)  ilOffset = 0x0
 at PumpBasedSynchronizationContext.DoPump()  ilOffset = 0x2E
 at WorkflowApplication.Invoke(Activity activity, IDictionary`2 inputs, WorkflowInstanceExtensionManager extensions, TimeSpan timeout)  ilOffset = 0x23
 at WorkflowInvoker.Invoke(Activity workflow, IDictionary`2 inputs, TimeSpan timeout, WorkflowInstanceExtensionManager extensions)  ilOffset = 0x36
 at SynchronousWorkflowActivityHost.ExecuteWorkflowUsingInvoker(Activity workflow, ICommonWorkflowContext context)  ilOffset = 0xD0
 at SynchronousWorkflowActivityHost.StartWorkflow(WorkflowActivationData activationData, ICommonWorkflowContext context)  ilOffset = 0x73
 at SynchronousWorkflowActivityHost.StartWorkflow(ICommonWorkflowContext context)  ilOffset = 0xBA
 at WorkflowProcessServiceInternalHandler`1.ExecuteSyncWorkflow(Guid workflowId, PipelineExecutionContext pipelineContext, IGenericEventData workflowInstanceData, Boolean isTriggered)  ilOffset = 0x59
 at WorkflowProcessServiceInternalHandler`1.ExecuteTriggeredSyncWorkflow(Guid workflowId, PipelineExecutionContext pipelineContext)  ilOffset = 0x21
 at SyncWorkflowExecutionPlugin.Execute(IServiceProvider provider)  ilOffset = 0x2C
 at V5PluginProxyStep.ExecuteInternal(PipelineExecutionContext context)  ilOffset = 0x2A3
 at VersionedPluginProxyStepBase.Execute(PipelineExecutionContext context)  ilOffset = 0x65
 at Pipeline.Execute(PipelineExecutionContext context)  ilOffset = 0x65
 at MessageProcessor.Execute(PipelineExecutionContext context)  ilOffset = 0x1FB
 at InternalMessageDispatcher.Execute(PipelineExecutionContext context)  ilOffset = 0xE4
 at ExtensiblePlatformMessageDispatcher.Execute(PipelineExecutionContext pluginContext)  ilOffset = 0x0
 at ExtensiblePlatformMessageDispatcher.CreateWithInvocationSource(BusinessEntity entity, Int32 invocationSource, ExecutionContext context)  ilOffset = 0xB4
 at BusinessProcessObject.Create(IBusinessEntity entity, ExecutionContext context)  ilOffset = 0x54
 at GenericActivityServiceBase.Create(IBusinessEntity entity, ExecutionContext context)  ilOffset = 0xA1
 at CommunicationActivityServiceBase.Create(IBusinessEntity entity, ExecutionContext context)  ilOffset = 0x1A
 at EmailService.SetRecipientsAddAdditionalAttributeAndCreate(AddressEntry[][] allResolvedAddressEntries, Email email, Entity emailDeltaEntity, String traceSubject, ExecutionContext context)  ilOffset = 0x1DA
 at EmailService.Deliver(Boolean userPromote, Guid emailId, String messageId, String subject, String from, String to, String cc, String bcc, DateTime receivedOn, String submittedBy, String importance, String body, BusinessEntityCollection attachments, Guid campaignResponseId, Entity emailDeltaEntity, ExecutionContext context, Boolean validateBeforeDeliver)  ilOffset = 0x524
 at EmailService.DeliverIncoming(String messageId, String subject, String from, String to, String cc, String bcc, DateTime receivedOn, String submittedBy, String importance, String body, BusinessEntityCollection attachments, Entity extraProperties, Boolean validateBeforeCreate, ExecutionContext context)  ilOffset = 0xB8
 at RuntimeMethodHandle.InvokeMethod(Object target, Object[] arguments, Signature sig, Boolean constructor)  ilOffset = 0xFFFFFFFF
 at RuntimeMethodInfo.UnsafeInvokeInternal(Object obj, Object[] parameters, Object[] arguments)  ilOffset = 0x25
 at RuntimeMethodInfo.Invoke(Object obj, BindingFlags invokeAttr, Binder binder, Object[] parameters, CultureInfo culture)  ilOffset = 0xCF
 at LogicalMethodInfo.Invoke(Object target, Object[] values)  ilOffset = 0x4F
 at InternalOperationPlugin.Execute(IServiceProvider serviceProvider)  ilOffset = 0x57
 at V5PluginProxyStep.ExecuteInternal(PipelineExecutionContext context)  ilOffset = 0x200
 at VersionedPluginProxyStepBase.Execute(PipelineExecutionContext context)  ilOffset = 0x65
 at Pipeline.Execute(PipelineExecutionContext context)  ilOffset = 0x65
 at MessageProcessor.Execute(PipelineExecutionContext context)  ilOffset = 0x1C5
 at InternalMessageDispatcher.Execute(PipelineExecutionContext context)  ilOffset = 0xE4
 at ExternalMessageDispatcher.ExecuteInternal(IInProcessOrganizationServiceFactory serviceFactory, IPlatformMessageDispatcherFactory dispatcherFactory, String messageName, String requestName, Int32 primaryObjectTypeCode, Int32 secondaryObjectTypeCode, ParameterCollection fields, CorrelationToken correlationToken, CallerOriginToken originToken, UserAuth userAuth, Guid callerId, Guid transactionContextId, Int32 invocationSource, Nullable`1 requestId, Version endpointVersion)  ilOffset = 0x16E
 at OrganizationSdkServiceInternal.ExecuteRequest(OrganizationRequest request, CorrelationToken correlationToken, CallerOriginToken callerOriginToken, WebServiceType serviceType, UserAuth userAuth, Guid targetUserId, OrganizationContext context, Boolean returnResponse, Boolean checkAdminMode)  ilOffset = 0x1F1
 at OrganizationSdkServiceInternal.ExecuteRequest(OrganizationRequest request, CorrelationToken correlationToken, CallerOriginToken callerOriginToken, WebServiceType serviceType, Boolean checkAdminMode)  ilOffset = 0x2D
 at OrganizationSdkServiceInternal.Execute(OrganizationRequest request, CorrelationToken correlationToken, CallerOriginToken callerOriginToken, WebServiceType serviceType, Boolean checkAdminMode)  ilOffset = 0x26
 at InprocessServiceProxy.ExecuteCore(OrganizationRequest request)  ilOffset = 0x34
 at IncomingEmailProviderBase.DeliverMessageInternal(EmailMessage emailMessage, Boolean validateBeforeCreate)  ilOffset = 0x1B7
 at IncomingEmailProviderBase.DeliverMessage(EmailMessage emailMessage, Boolean validateBeforeCreate)  ilOffset = 0x19
 at GetItemsStep.ProcessResponse()  ilOffset = 0x5AB
 at ExchangeIncomingEmailProviderStep.EndOperation()  ilOffset = 0xFC
 at ExchangeIncomingEmailProvider.ReceiveEmails()  ilOffset = 0x92
 at IncomingActivityProviderBase.Run()  ilOffset = 0x42
 at MailboxEmailOperation.PerformOperation()  ilOffset = 0x2C
 at MailboxOperationBase`1.Execute()  ilOffset = 0xAA
 at MailboxProcessingOperation.PerformOperation()  ilOffset = 0x1A
 at MailboxOperationBase`1.Execute()  ilOffset = 0xAA
 at MailboxOperationCommand.InternalExecute(MailboxAsyncEvent asyncEvent)  ilOffset = 0x64
 at AsyncCallbackHandler`2.ResumeExecution(IAsyncEvent asyncEvent)  ilOffset = 0x56
 at AsyncEventExecutionManager`2.ExecuteHandler(IAsyncEventHandlerFactory handlerFactory)  ilOffset = 0x8A
 at PoolHandler.ProcessAsyncEvent(IAsyncEventExecutionManager asyncEventExecutionManager)  ilOffset = 0x144
 at AsyncEventOperation.<.ctor>b__2(IServiceOperation operation)  ilOffset = 0x0
 at FaultToleranceBehavior.Execute(ServiceOperationAction operation, IServiceOperation operationParameter)  ilOffset = 0x18
 at MonitoredOperation.Execute()  ilOffset = 0xD
 at ThreadPoolQueueExecutionEngine.InvokeNextOperationInThreadPool(Object state)  ilOffset = 0xC
 at ExecutionContext.RunInternal(ExecutionContext executionContext, ContextCallback callback, Object state, Boolean preserveSyncCtx)  ilOffset = 0x70
 at ExecutionContext.Run(ExecutionContext executionContext, ContextCallback callback, Object state, Boolean preserveSyncCtx)  ilOffset = 0x4
 at QueueUserWorkItemCallback.System.Threading.IThreadPoolWorkItem.ExecuteWorkItem()  ilOffset = 0x30
 at ThreadPoolWorkQueue.Dispatch()  ilOffset = 0xA3
>Crm Exception: Message: Expected non-empty Guid., ErrorCode: -2147220989, InnerException: System.ArgumentException: Expected non-empty Guid.
Parameter name: id
[2015-05-15 15:33:45.304] Process:CrmAsyncService |Organization:00000000-0000-0000-0000-000000000000 |Thread:   26 |Category: Platform |User: 00000000-0000-0000-0000-000000000000 |Level: Error |ReqId: cf367175-12a2-4719-a5af-f53f7f254399 | ExceptionConverter.ConvertToFault  ilOffset = 0x69
>UNEXPECTED: no fault?

My first thought was that the stack trace was really long... and then there was some mentioning of User with Guid 00000000-0000-0000-0000-000000000000.

Another kind of queue ticket
As the Server Side Sync has something of a rep for not being the most reliable guy in Town, I wasn't really sure if this error really was an error that we had caused or if there was something wrong with the server side sync. As my perception was that we hadn't really changed any thing, I started with the recreating the mailbox, recreated the server profile, ran test and enable and that worked, but when I tried sending my own emails, from different email accounts, as you never know if there is something spooking it with a specific email address, I still never got the queue to read any emails, not even the test emails created by CRM. I was starting to get a bit short on hair by this time, so I resorted to my classical troubleshooting method of backing off as far as possible and seeing if things still were broken, and I hence deactivated all workflows and plugins that were associated to email and Queue item, and gave the user full access and I was once again awarded with this method, emails started arriving in the queue in CRM.

I then revoked admin access and started re-enabling each single workflow/plugin until thing started
breaking again. This turned out to be a painstaking process as the only way I found to test this was to send an email to CRM and then wait for about 5 minutes until it was/or was not pulled into the queue.
It took me more or less a days work just to find the two workflows that were causing the problems.

The actual problems in them selves, arn't really that interesting, it turned out that the error message above was actually quite correct, If you try to set a null-user to an owner field, CRM will respond like this.

What I did notice however was that the workflows that we were using on "queue item" in this case were synchronous (yes, yes, we do have good reasons for it, I hope you do to, Otherwise, please choose asynchronous/background workflows) and as they broke, CRM did two interesting facts that are not very desirable and noteworthy:
  • The mailbox was shut down for further email processing. It had to be "Test and Enabled" to start receiving emails again. I have yet to see a good way to do this in an automated fashion and to be able to monitor the mailboxes to see if they are working as expected.
  • The email that was the cause of the breaking of the workflow, got lost. Even after the mailbox got reactivated, it had marked this email as read and did not read it into CRM despite the fact that it was neither in the queue nor as a lone activity which might have been the case if the Creation of the queue item broke. But obviously the entire process is handled atomically.
Based on this I have the following recommendations:
  1. If possible try to avoid an synchronous logic on receiving email or Creation of queue items. Try to move logic to asynchronous as this will cause errors to be non-fatal, like the two types of effects that are hard to handle above.
  2. If you do need to have synchronous logic on the receiving of emails or Creation of queue items try to add conditions in the beginning that will ensure that the logic only runs in those very specific cases where they are supposed to. Be aware of the fact that queue items can be associated to many different entities, so don't rely on the fact that they should be only connected to something specific. At least take into consideration the fact that they will be connected to emails at some Point in time.
  3. I read an interesting fact today, that the activation of business rules Controls in which order they will be executed. If you are using CRM 2015 with server side (Entity) scope, this could probably cause similar issues if the activation is done haphazardly and causes the execution to be done in a way that can cause a bug and break processing of the queue item or email. I havn't checked if these are run pre- or poststate but I think that depends on the action.
  4. If you get errors, especially when using CRM Online, try to replicate the error in the sandbox Environment and then either move backwards to a Point where it starts working again, or deactive Everything related to emails and queue items, make sure it works, and move forward. When you know where the error is, finding what the error is, is so much easier.

We also found a related bug in CRM today, if you try to create an asynchronous workflow that triggers on a field which has field based rights activated on it, it will break, with some weird error message about not being able to change domain for the user. The simple fix for that is to make the workflow synchronous and to make it run as the user that activates it, that worked. We didn't try to run it as the user who created the workflow, as that is the same logic that is used in asynchronous workflows, I wasn't too keen on trying something that was more likely to fail. If you have tried it, please leave a comment below.

I hope to see a lot of people tomorrow on the 19:th at the Swedish CRM UG Meeting!

Gustaf Westerlund
MVP, Founder and CTO at CRM-konsulterna AB