I posted previously about my foray into Windows Workflow. In that post, I described a bit about the application I was writing and how I was using Workflow to solve the problem of moving orders through various business processes.
Well, the application is maturing, and I recently deployed an update to the original flow definitions and everything ground to a halt. I was aware of the versioning issues and thought I had them covered, but I didn't. Before I delve into the details, I want to say that while I find the current story around versions of flows to be lacking in many ways, I honestly can't think of a better solution to the problem. That said, the approach reminds me a bit of the versioning of COM interfaces in the good old days.
The problem goes something like this. I deploy a version of my flow (say version 1) and start sending items down the flow. These are long-running processes, so I'm using the persistence service and database to store the state of my flow instances as needed. At any point in time, I have several instances at various stages of completion with state stored in the database. I decide I need to change one of my flows and replace version 1 of my code with version 2. Now all the instances in my persistence database that were expecting to run on version 1 of my flow are stuck. There is a strong affinity between a persisted instance and the version of code it was started on, because the version of the code is part of the persisted state. The state of each activity in the flow is persisted as well. If I've added, removed or rearranged any of my activities, the state will no longer fit into the flow when it's rehydrated. If an attempt is made to continue a flow from version 1 on the version 2 code base, the result is an odd index out of bounds exception.
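To make the failure concrete, here is a toy model in Python. It is not the Workflow API: FlowDefinition, PersistedInstance and resume are hypothetical names, and the real persistence format is far more involved. It only illustrates why state saved against one layout of activities stops fitting once the layout changes.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class FlowDefinition:
    """Hypothetical stand-in for one compiled version of a flow."""
    version: int
    activities: Tuple[str, ...]


@dataclass(frozen=True)
class PersistedInstance:
    """Hypothetical stand-in for a row in the persistence database."""
    flow_version: int    # version of the flow the instance was started on
    activity_index: int  # position of the activity waiting to resume


def resume(instance: PersistedInstance, flow: FlowDefinition) -> str:
    # Naive positional lookup: the saved position is trusted blindly.
    return flow.activities[instance.activity_index]


v1 = FlowDefinition(1, ("receive", "validate", "approve", "ship"))
v2 = FlowDefinition(2, ("receive", "approve", "ship"))  # "validate" removed

waiting_to_ship = PersistedInstance(flow_version=1, activity_index=3)

print(resume(waiting_to_ship, v1))  # resumes at "ship" as intended
try:
    resume(waiting_to_ship, v2)     # saved position no longer exists
except IndexError as exc:
    print("cannot resume on v2:", exc)
```

The saved position is only meaningful against the definition it was saved from, which is why the instance and the code version have to stay paired.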
I did a lot of searching for a solution to this problem and nowhere did I find the complete answer. After piecing it all together and doing some experimentation, I finally hit on the secret sauce. The only mention of this problem that I could find on MSDN says only that .NET versioning practices should be employed. So, being the bright guy that I am, I generated a key pair (using SN.EXE), marked the project containing my flows as strong named and associated it with my key pair for signing. The objective here is to be able to keep all versions of your flow around. So I installed version 1 in the GAC, and version 2 alongside it. The workflow engine is smart enough to discern which version an instance should be loaded back into. I understood this approach and it seemed to me it would work, but it didn't.
If you're using persistence and not using a timer, you must be firing events into your flow to get it to continue after it's been persisted. To do this, you need to use the External Data Exchange service. You define one interface containing the events that can be fired into the flow, and another for the events the flow can fire out to your host. You then create classes that implement the two interfaces and register them for use. In my implementation, I had events going both ways. After deploying both versions of my flows, I tried to fire an event to an existing instance. This produced an exception stating that the event could not be delivered because the queue had not yet been created. WHAT???
The problem here was how I had my classes grouped into projects. I had my flows and my External Data Exchange interfaces defined in the same project, which meant that I could not version my flows apart from the interfaces. While I have not found documentation to back it up, my suspicion is that regardless of what versioning tactic you use, a new version of the interfaces will not be tolerated. The only solution is to move the interfaces and the code that implements them into a separate project and never touch its version number. After rearranging my classes and projects, I was able to start instances, allow them to persist, deploy an additional version of my flows and continue the previous instances.
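The arrangement I ended up with can be sketched as another toy Python model. Everything in it is hypothetical (CONTRACT_ID, FlowAssembly, Host, the event name), and the failure branch encodes my suspicion about the interfaces, not documented runtime behavior. Flow versions are installed side by side, while the project holding the event interfaces keeps one fixed version.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

# Hypothetical identity of the project holding the event interfaces.
# Its version is never bumped, so every flow version sees the same contract.
CONTRACT_ID = ("OrderEvents", "1.0.0.0")


@dataclass(frozen=True)
class FlowAssembly:
    """Hypothetical stand-in for one strong-named version of the flows."""
    version: str
    contract_id: Tuple[str, str]  # contract the flows were compiled against


class Host:
    def __init__(self, contract_id: Tuple[str, str]) -> None:
        self.contract_id = contract_id
        self.installed: Dict[str, FlowAssembly] = {}

    def install(self, flows: FlowAssembly) -> None:
        # Side by side: a new version is added, older ones are kept.
        self.installed[flows.version] = flows

    def deliver(self, instance_flow_version: str, event: str) -> str:
        flows = self.installed[instance_flow_version]  # exact version match
        if flows.contract_id != self.contract_id:
            # Assumed failure mode, modeled on the suspicion described above.
            raise LookupError("no queue for event " + event)
        return f"{event} delivered to flow {flows.version}"


# Contract kept stable: old and new instances both receive events.
host = Host(CONTRACT_ID)
host.install(FlowAssembly("1.0.0.0", CONTRACT_ID))
host.install(FlowAssembly("2.0.0.0", CONTRACT_ID))
print(host.deliver("1.0.0.0", "OrderApproved"))

# Contract versioned together with the flows: old instances are stranded.
bumped = Host(("OrderEvents", "2.0.0.0"))
bumped.install(FlowAssembly("1.0.0.0", ("OrderEvents", "1.0.0.0")))
try:
    bumped.deliver("1.0.0.0", "OrderApproved")
except LookupError as exc:
    print("delivery failed:", exc)
```

The design choice is simply that the thing both sides depend on changes on a different schedule, ideally never, from the flows that are built on top of it.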
The bottom line for me was that I had to make such deep changes to my workflow code that some of the existing instances had to be thrown away as unusable. The moral here is to make sure you have a correct versioning strategy in place from the get-go, or risk losing data later on.
Before closing, I want to mention that I did see reports of folks hacking the persistence database records directly to recover from this situation. This is not for the faint of heart and was definitely not for me. You can find some instructions out there on how to do this, but do it at your own risk.