Discussion:
[werkflow-user] Clustered workflow
Paul Russell
2004-02-25 23:45:25 UTC
Guys,

Firstly, a caveat: There's going to be a lot of teaching grandma to
suck eggs in this e-mail. I know you guys know most of this, but I'm
deliberately going from 'the top' because it helps me collate my
thoughts, and it also helps when I look back at this e-mail later and
can't remember why I said something!

Also, I'm still new to Werkflow, so I might have misunderstood how bits
of it work -- feel free to pick me up on anything I've got wrong!

Apologies for the length of this e-mail, it's a bit of a brain-dump.

========

In a separate mail, I've been talking about running business processes
across multiple nodes in a cluster with a view to providing high
availability and to a lesser extent load balancing.

I'm going to describe this by example, rather than in abstract terms.
I've put a diagram of a trivial business process on my website at...

The blue boxes represent logical units of work, i.e. steps we want to
execute as a unit. The implication here is that each blue box executes
atomically. In general, we don't care where each step executes -- it'd
be nice if the load was evenly distributed across the cluster of
servers. More importantly, we don't want to lose an 'in flight'
business process if one of the nodes fails. We need to follow some
rules to make sure we don't mess things up:

* Only execute each task once.
* Don't let the process instance/case state get out of sync with the
executing tasks.
* Processes must be forward recoverable if something fails.
* Processes must be able to undo their previous actions if something
terminal happens.

What do we need to make this work? Well, we need:

* Some way to persist the state of a running process at any time.
* Some way to resurrect the state of a running process at any time on
any node.
* Clear separation of activities. The activities mustn't 'run into'
each other.
* Some way to encapsulate 'what to do next' and place the encapsulated
command somewhere safe where it can be executed from anywhere, even if
the node that created it ceases to exist (there's a sketch of this
below).
* Synchronize the command and the state so that we only commit the
state changes if we commit the command and vice versa.
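
For what it's worth, the encapsulated command could be as simple as a
serializable object naming the case and the transition to execute. A
minimal sketch in Java (the class and everything in it is hypothetical;
nothing like this exists in Werkflow yet):

    import java.io.Serializable;

    // Hypothetical command message: 'fire this transition for this case'.
    // Being Serializable, it can travel as a JMS ObjectMessage and be
    // executed on whichever node happens to pick it up.
    public class ExecuteTransitionCommand implements Serializable {

        private final String caseId;
        private final String transitionId;

        public ExecuteTransitionCommand(String caseId, String transitionId) {
            this.caseId = caseId;
            this.transitionId = transitionId;
        }

        public String getCaseId() { return caseId; }
        public String getTransitionId() { return transitionId; }
    }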

Helpfully, Werkflow and J2EE provide most of the gubbins we need to do
this:

* Werkflow already has support for pluggable persistence providers, so
connecting to some kind of cluster wide persistent store shouldn't be a
problem.
* J2EE provides JMS, which can be used with the Command Message pattern
to dispatch commands. Helpfully, JMS queues guarantee that each message
is consumed by only one host, and JMS providers can be clustered.
* J2EE provides Message Driven Beans, which provide an easy way to
consume messages from queues and, more importantly, automatic thread
pooling across a cluster, so they should automatically load balance
messages across all nodes as well as providing transparent failover.
* J2EE provides transparent (container managed) transaction support,
which means that we can rely on the container to make sure that we
either commit both the process state and the next command message, or
we persist neither (i.e. we make it look like the whole activity never
happened). There's a sketch of this plumbing below.
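
To make the last two bullets concrete, here's roughly what the
message-driven end might look like. This is only a sketch: it assumes an
EJB 2.1-style MDB with a container-managed 'Required' transaction
declared in the deployment descriptor, and WerkflowEngine /
executeActivity() are invented names standing in for whatever the real
entry point turns out to be:

    import javax.ejb.MessageDrivenBean;
    import javax.ejb.MessageDrivenContext;
    import javax.jms.Message;
    import javax.jms.MessageListener;
    import javax.jms.ObjectMessage;

    // Executes one activity per command message. Under a container-managed
    // 'Required' transaction, the JMS receive, the state changes and the
    // send of the next command all commit or roll back as a single unit.
    public class ActivityBean implements MessageDrivenBean, MessageListener {

        private MessageDrivenContext context;

        public void setMessageDrivenContext(MessageDrivenContext ctx) {
            this.context = ctx;
        }

        public void ejbCreate() {}
        public void ejbRemove() {}

        public void onMessage(Message message) {
            try {
                ExecuteTransitionCommand command = (ExecuteTransitionCommand)
                    ((ObjectMessage) message).getObject();

                // Resurrect the case from the shared store, run the activity
                // and enqueue the next command, all inside the container's
                // transaction. (WerkflowEngine is a hypothetical facade.)
                WerkflowEngine.locate().executeActivity(
                    command.getCaseId(), command.getTransitionId());
            } catch (Exception e) {
                // Mark the transaction rollback-only: the message stays on
                // the queue and gets redelivered later, as if nothing happened.
                context.setRollbackOnly();
            }
        }
    }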

What's missing, then?
* Werkflow doesn't currently provide JMS support, and neither does it
provide the hooks to plug this support in.
* Werkflow doesn't provide a mechanism for 'undoing' changes that have
been committed already.

Neither of these things strikes me as particularly complicated to
address. We should be able to perform a dependency inversion to
abstract the current Concurrent-based scheduler out into a service,
just like the existing persistence and messaging services. The
compensation services ('undo') could be provided as an external module
that maintains a 'compensation list' of tasks which need to be executed
to undo the results of a process. It may be possible to inject these
tasks into the same scheduler that's running the rest of the workflow.
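
Concretely, the inversion might amount to extracting an interface along
these lines, with the current Concurrent-based thread pool as one
implementation and a JMS-backed dispatcher as another (the names are
mine, not Werkflow's):

    // Hypothetical scheduler service for the core to depend on, instead
    // of depending on the Concurrent thread pool directly. Compensation
    // ('undo') tasks could be injected through the same interface.
    public interface SchedulerService {

        // Schedule the next activity of a case for execution somewhere
        // (local thread pool, or anywhere in the cluster).
        void schedule(String caseId, String transitionId);

        // Schedule a compensating task from the compensation list.
        void scheduleCompensation(String caseId, String taskId);
    }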

The structure I'm describing above looks a little like this, I think:

http://paulrussell.dyndns.org/images/work/workflow/werkflow-with-clustering.png

Everything that supports clustering should be 'bolted on' around
Werkflow as services rather than being part of the core, otherwise
we'll penalize people who don't want silly things like clustering.

At runtime, everything would be controlled by the scheduler (which I
/think/ is pretty much what happens now); a process would start by the
core scheduling a task for the first activity. The scheduler would then
resurrect this 'task' whenever (and wherever) it is ready to do so. The
task would invoke the activity via the core. The last thing the core
would do is schedule the next activity. Control then returns to the
scheduler which commits the transaction and the whole process starts
over. This is illustrated in the following diagram.

http://paulrussell.dyndns.org/images/work/workflow/werkflow-simple-sequence.png

There are a couple of neat things about this:

* If the activity fails for any reason (the database is down,
connectivity fails, some other kind of failure), then the fact that the
message came from a queue and was read within the scope of a
transaction comes to the rescue -- it's as if the message never left
the queue, it'll simply get picked up again later and retried.
* Equally, if for some reason the 'outgoing' transition can't be
scheduled, the transaction will roll back all the state changes /and/
the original message 'get' so it'll be like the whole unit of work was
never executed. It'll get retried later.
* If one of the nodes in a cluster gets shut down, any process running
on that box will transparently start running on another node. (subject
to in-doubt transactions, I guess. It's too late to remember exactly
how 2PC works right now ;)

What do you guys think about this? Am I talking cobblers? If you guys
agree with what I'm talking about here, my next steps would likely be
to start looking in detail at how easy it would be to perform the
dependency inversion on the scheduler. I don't think this should be
/too/ painful, and would be a good way for me to finish learning the
innards of Werkflow!

Let me know what you think!


Paul
--
Paul Russell
***@paulrussell.org
peter royal
2004-02-26 00:54:18 UTC
Post by Paul Russell
What do you guys think about this? Am I talking cobblers? If you guys
agree with what I'm talking about here, my next steps would likely be
to start looking in detail at how easy it would be to perform the
dependency inversion on the scheduler. I don't think this should be
/too/ painful, and would be a good way for me to finish learning the
innards of Werkflow!
Your ideas are intriguing to me. I'm not looking to run in a clustered
environment, but the transactional support and full persistence are
both appealing to me.
-pete
Mark H. Wilkinson
2004-02-26 11:07:20 UTC
Post by Paul Russell
Firstly, a caveat: There's going to be a lot of teaching grandma to
suck eggs in this e-mail.
No problem! I've had similar thoughts floating around in my head for a
bit, but I've never got around to putting them down in writing, so
having something to slot my thoughts into is very useful.
Post by Paul Russell
Helpfully, Werkflow and J2EE provide most of the gubbins we need to do
* Werkflow already has support for pluggable persistence providers, so
connecting to some kind of cluster wide persistent store shouldn't be a
problem.
I'd place a bit of a caveat over that. The persistence interface is
there, but I don't know whether it's currently sufficient to persist the
full state of a case. As far as I know, no-one's actually done that yet
(although many of us want to). Architecturally I think the approach is
more-or-less there, but the interface is probably simpler than it
should be. I think bob has described the persistence interface and the
change-set stuff as work in progress.

One major question that probably needs answering straight off: at the
moment the werkflow core works with state held in memory, and the
persistence interface seems more like an in-flight backup mechanism.
Process execution causes changes to the in-memory state, and then it's
flushed to disc through the persistence interface.

In an environment like the one you're describing, one node can't know about
all the processes in existence. Similarly, even if you've only got one
node you probably won't have enough resources to keep all your process
state in memory, so the core can't rely on having all the process cases
in memory.

To me this suggests one of two things:
* The in-memory state held by the core should be considered a
cache of what is in the persistent store, so cache misses need
to go through the persistence API to check for cases that aren't
currently loaded. There's a little support for this in the core
at the moment (PersistenceManager has a getCorrelations() method
that should probably be correlating incoming messages against
cases that have been lost from the cache, but the method is
never called). This solution would have implications for
clustering - it would probably be more efficient if a process
case was handled by a node that had that case cached in memory.
* Alternatively, move the whole of the case state behind the
persistence API. Fleeting persistence then becomes a simple
'everything in memory' workflow manager for when you don't need
resilience, and the EJB/DB persistence module uses a database to
hold process state. Nothing is cached, which probably simplifies
things.
It's really a question of which module looks after process state. With
reference to your werkflow-with-clustering diagram, at the moment the
state is in the Werkflow Core box and the question is how much of that
should remain there and how much should move out to the implementations
of the Persistence Service.

I'm not sure which of the two approaches above would be simpler to
implement, and there may be other options I've not mentioned here. I'd
probably go with the first option to start with, on the basis that it's
more of a migration path than the second option.
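
To make that first option concrete, the core's in-memory state would
become a look-through cache over the store, something like the sketch
below. I'm using plain Object for the case state, and Store.loadCase()
is invented; the real work would go through the persistence API:

    import java.util.HashMap;
    import java.util.Map;

    // Option one: in-memory state as a cache of the persistent store.
    // On a miss we go to the persistence API rather than assuming the
    // case doesn't exist.
    public class CaseCache {

        // Stand-in for the persistence API; loadCase() is hypothetical.
        public interface Store {
            Object loadCase(String caseId) throws Exception;
        }

        private final Map cache = new HashMap(); // caseId -> case state
        private final Store store;

        public CaseCache(Store store) {
            this.store = store;
        }

        public synchronized Object getCase(String caseId) throws Exception {
            Object processCase = cache.get(caseId);
            if (processCase == null) {
                // Cache miss: the case may have been evicted, or may have
                // last run on another node. Reload it through the store.
                processCase = store.loadCase(caseId);
                cache.put(caseId, processCase);
            }
            return processCase;
        }
    }
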
Post by Paul Russell
* J2EE provides JMS, which can be used with the Command Message pattern
to dispatch commands. Helpfully, JMS queues guarantee that each message
is consumed by only one host, and JMS providers can be clustered.
* J2EE provides Message Driven Beans, which provide an easy way to
consume messages from queues and, more importantly, automatic thread
pooling across a cluster, so they should automatically load balance
messages across all nodes as well as providing transparent failover.
Is it worth thinking about the scheduler as two sub-modules? There's a
part that holds information about which processes are sleeping and what
they're sleeping for, and a second part which deals with how werkflow
gets woken up to do things. From your explanation of the IBM product it
sounds like they have a similar distinction - the scheduler state held
in the database, and the JMS queue being effectively a way to wake the
engine and get it to do things.

The reason I suggest this is I can think of a few different combinations
that might be useful:
* For the scheduler state we either have in memory (as we do now)
or persisted to disc (probably similar to the persistence
manager).
* For scheduling werkflow itself, we've got the current thread
pool mechanism, JMS MDBs and possibly the EJB Timer service.
Hmm; the scheduler state is kind-of a second persistence API. Perhaps we
should push that into the persistence service...
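
Separating the two concerns might look something like this (names
illustrative only):

    // Part one: a durable record of which cases are sleeping and what
    // they're sleeping for. This could sit behind the persistence
    // service, as suggested above.
    public interface SchedulerStore {
        void recordSleeping(String caseId, String waitingFor);
        void recordWoken(String caseId);
    }

    // Part two: the mechanism that actually wakes werkflow up. The
    // current thread pool, a JMS MDB or the EJB Timer service would
    // each be an implementation of this.
    public interface WakeupService {
        void wake(String caseId);
    }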

One question crosses my mind about using JMS to schedule things. At
the moment werkflow doesn't do timed events, but there's prior art in
the petri-net world for transitions that fire a fixed amount of time
after they become enabled. Used in parallel with other transitions they
allow time-outs to be represented quite easily. Adding something like
this to werkflow would probably need the ability to wake a process up
either regularly, or at some fixed time in the future. Other than having
an external process feed a sequence of messages into a queue at a steady
rate, are there other ways to implement this kind of time-based
scheduling with JMS? Is this something the IBM product does?
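
One possibility is the EJB 2.1 Timer service I mentioned above: arm a
container timer when the timed transition becomes enabled, and wake the
case when the timer fires. A sketch, where wake() stands in for
whichever wake-up mechanism we settle on:

    import javax.ejb.SessionBean;
    import javax.ejb.SessionContext;
    import javax.ejb.TimedObject;
    import javax.ejb.Timer;

    // Timed transitions via the EJB 2.1 Timer service: when a timed
    // transition becomes enabled we arm a container timer; when it fires
    // we wake the case so the transition can fire (or be discarded if
    // another transition beat it to it).
    public class TransitionTimerBean implements SessionBean, TimedObject {

        private SessionContext context;

        public void setSessionContext(SessionContext ctx) { this.context = ctx; }
        public void ejbCreate() {}
        public void ejbRemove() {}
        public void ejbActivate() {}
        public void ejbPassivate() {}

        // Arm a timer when a timed transition becomes enabled.
        public void armTimer(String caseId, long delayMillis) {
            context.getTimerService().createTimer(delayMillis, caseId);
        }

        // The container calls this when the timer fires, in a fresh
        // transaction, on whichever node is available.
        public void ejbTimeout(Timer timer) {
            wake((String) timer.getInfo());
        }

        // Stand-in for the real wake-up mechanism, e.g. dropping a
        // command message on the queue.
        private void wake(String caseId) {
        }
    }
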
Post by Paul Russell
* Werkflow doesn't provide a mechanism for 'undoing' changes that have
been committed already.
This isn't something I'd looked at in detail. I was under the vague
impression that BPEL and the like allow you to specify compensating
actions as part of the process. Can these compensating actions be treated
as normal parts of the process, or do they need to be collated based on
which actions have been performed (hence needing the mechanism you talk
about)?
Post by Paul Russell
What do you guys think about this? Am I talking cobblers?
You're definitely heading in the right direction, I think. If we can
sort out what we think needs doing I'll definitely be interested in
working on it.

-Mark.
