<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE appendix PUBLIC "-//OASIS//DTD DocBook XML V4.4//EN"
"http://www.oasis-open.org/docbook/xml/4.4/docbookx.dtd">
<appendix id="transactions">
<title>Batch Processing and Transactions</title>

<section id="transactionsNoRetry">
<title>Simple Batching with No Retry</title>

<para>Consider the following simple example of a nested batch with no
retries. This is a very common scenario for batch processing: an
input source is processed until exhausted, and we commit
periodically at the end of each "chunk" of processing.</para>

<programlisting>
1   |  REPEAT(until=exhausted) {
    |
2   |    TX {
3   |      REPEAT(size=5) {
3.1 |        input;
3.2 |        output;
    |      }
    |    }
    |
    |  }
</programlisting>

<para>The input operation (3.1) could be a message-based receive
(e.g. JMS) or a file-based read, but to recover and continue
processing with a chance of completing the whole job, it must be
transactional. The same applies to the operation at (3.2): it must
be either transactional or idempotent.</para>

<para>If the chunk at REPEAT(3) fails because of a database exception at
(3.2), then TX(2) will roll back the whole chunk.</para>
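
<para>As a rough illustration, the plan above can be sketched in plain Java
without any framework support. This is not the Spring Batch API: the
<classname>ChunkSketch</classname> class and its <literal>run</literal> method are
hypothetical names, and the "transaction" is only a pending list that is
copied to the committed list when a chunk succeeds.</para>

<programlisting><![CDATA[
```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;

/** Plain-Java sketch of the plan above; every name here is hypothetical. */
class ChunkSketch {

    /**
     * Drains the input in chunks. The pending list stands in for TX(2): items
     * reach the committed list only if the whole chunk succeeds.
     * Returns the number of committed chunks; stops at the first failed chunk.
     */
    static int run(Iterator<String> input, Consumer<String> output,
                   List<String> committed, int chunkSize) {
        int commits = 0;
        while (input.hasNext()) {                          // 1: REPEAT(until=exhausted)
            List<String> pending = new ArrayList<>();      // 2: TX begins
            try {
                for (int i = 0; i < chunkSize && input.hasNext(); i++) { // 3: REPEAT(size=5)
                    String item = input.next();            // 3.1: input
                    output.accept(item);                   // 3.2: output, may throw
                    pending.add(item);
                }
            } catch (RuntimeException e) {
                // Roll back: the pending writes are discarded. An in-memory iterator is
                // not transactional, so the items read in this chunk are NOT re-presented.
                return commits;
            }
            committed.addAll(pending);                     // TX(2) commits
            commits++;
        }
        return commits;
    }

    public static void main(String[] args) {
        List<String> committed = new ArrayList<>();
        int commits = run(List.of("a", "b", "c", "d", "e", "f").iterator(),
                item -> { }, committed, 5);
        System.out.println(commits + " chunks committed: " + committed);
    }
}
```
]]></programlisting>

<para>The sketch also shows why the input has to be transactional in
practice: when the chunk is discarded here, the items that were already
read are lost rather than returned to the source.</para>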

</section>

<section id="transactionStatelessRetry">
<title>Simple Stateless Retry</title>

<para>A retry is also useful for an operation that is not
transactional, such as a call to a web service or other remote
resource. For example:</para>

<programlisting>
0   |  TX {
1   |    input;
1.1 |    output;
2   |    RETRY {
2.1 |      remote access;
    |    }
    |  }
</programlisting>

<para>This is actually one of the most useful applications of a retry,
since a remote call is much more likely to fail and be retryable
than a database update. As long as the remote access (2.1)
eventually succeeds, the transaction TX(0) will commit. If the
remote access (2.1) still fails after the last attempt, then the
transaction TX(0) is guaranteed to roll back.</para>
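
<para>A stateless retry of this kind amounts to a loop around the remote
call. The following is a minimal plain-Java sketch, not the Spring Batch
retry API: <classname>StatelessRetrySketch</classname>, <literal>retry</literal> and
<literal>maxAttempts</literal> are hypothetical names, and the surrounding TX(0)
is left to the caller.</para>

<programlisting><![CDATA[
```java
import java.util.function.Supplier;

/** Plain-Java sketch of RETRY { remote access; }; every name here is hypothetical. */
class StatelessRetrySketch {

    /** Calls the remote access until it succeeds or maxAttempts is reached. */
    static <T> T retry(Supplier<T> remoteAccess, int maxAttempts) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {   // 2: RETRY
            try {
                return remoteAccess.get();                           // 2.1: remote access
            } catch (RuntimeException e) {
                last = e;   // stateless: swallow and loop, nothing is rolled back yet
            }
        }
        throw last;         // retries exhausted: the caller rolls back TX(0)
    }

    public static void main(String[] args) {
        int[] calls = {0};
        String reply = retry(() -> {
            if (++calls[0] < 3) {
                throw new IllegalStateException("remote resource unavailable");
            }
            return "ok";
        }, 5);
        System.out.println(reply + " after " + calls[0] + " calls");
    }
}
```
]]></programlisting>

<para>Because no exception escapes the loop until the attempts are
exhausted, the enclosing transaction is unaffected by the intermediate
failures. That is safe here only because the remote call is not a
transactional resource.</para>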

</section>

<section id="repeatRetry">
<title>Typical Repeat-Retry Pattern</title>

<para>The most typical batch processing pattern is to add a retry to the
inner block of the chunk in the Simple Batching example.
Consider this:</para>

<programlisting>
1   |  REPEAT(until=exhausted, exception=not critical) {
    |
2   |    TX {
3   |      REPEAT(size=5) {
    |
4   |        RETRY(stateful, exception=deadlock loser) {
4.1 |          input;
5   |        } PROCESS {
5.1 |          output;
6   |        } SKIP and RECOVER {
    |          notify;
    |        }
    |
    |      }
    |    }
    |
    |  }
</programlisting>

<para>The inner RETRY(4) block is marked as "stateful" - see the
typical use case for a description of a stateful
retry. This means that if the retry PROCESS(5) block fails, the
behaviour of the RETRY(4) is as follows.</para>

<itemizedlist>
<listitem>
<para>Throw an exception, rolling back the transaction TX(2) at the
chunk level, and allowing the item to be re-presented to the input
queue.</para>
</listitem>
<listitem>
<para>When the item re-appears, it might be retried depending on the
retry policy in place, executing PROCESS(5) again. The second and
subsequent attempts might fail again and rethrow the exception.</para>
</listitem>
<listitem>
<para>Eventually the item re-appears for the final time: the retry
policy disallows another attempt, so PROCESS(5) is never
executed. In this case we follow a RECOVER(6) path, effectively
"skipping" the item that was received and is being processed.</para>
</listitem>
</itemizedlist>
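
<para>The three steps above can be sketched in plain Java as a retry that
keeps a failure count per item across rollbacks. This is an illustration
only, not the Spring Batch implementation:
<classname>StatefulRetrySketch</classname>, <literal>handle</literal> and the use of
the item itself as the key are assumptions made for the example.</para>

<programlisting><![CDATA[
```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

/** Plain-Java sketch of a stateful retry; every name here is hypothetical. */
class StatefulRetrySketch {

    // Retry state lives outside the transaction, so it survives a rollback of TX(2).
    private final Map<String, Integer> failures = new HashMap<>();
    private final int maxAttempts;

    StatefulRetrySketch(int maxAttempts) {
        this.maxAttempts = maxAttempts;
    }

    /**
     * Handles one presentation of an item. Returns true if PROCESS succeeded and
     * false if the RECOVER path was taken. A failure in PROCESS is recorded and
     * rethrown so that the caller can roll back and have the item re-presented.
     */
    boolean handle(String item, Consumer<String> process, Consumer<String> recover) {
        int failed = failures.getOrDefault(item, 0);
        if (failed >= maxAttempts) {
            failures.remove(item);
            recover.accept(item);          // 6: RECOVER, PROCESS is not executed
            return false;
        }
        try {
            process.accept(item);          // 5: PROCESS
            failures.remove(item);
            return true;
        } catch (RuntimeException e) {
            failures.put(item, failed + 1);
            throw e;                       // rethrow: the caller rolls back TX(2)
        }
    }

    public static void main(String[] args) {
        StatefulRetrySketch retry = new StatefulRetrySketch(2);
        List<String> skipped = new ArrayList<>();
        for (int presentation = 1; presentation <= 3; presentation++) {
            try {
                retry.handle("bad item", item -> {
                    throw new IllegalStateException("deadlock loser");
                }, skipped::add);
            } catch (IllegalStateException e) {
                System.out.println("presentation " + presentation + " rolled back");
            }
        }
        System.out.println("skipped: " + skipped);
    }
}
```
]]></programlisting>

<para>The important design point is that the failure count is held outside
the transaction. If it were rolled back together with TX(2), the retry
would have no memory of earlier attempts when the item re-appears.</para>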

<para>Notice that the notation used for the RETRY(4) in the plan above
shows explicitly that the input step (4.1) is part of the retry.
It also makes clear that there are two alternate paths for
processing: the normal case is denoted by PROCESS(5), and the
recovery path is a separate block, RECOVER(6). The two alternate
paths are completely distinct: only one is ever taken in normal
circumstances.</para>

<para>In special cases (e.g. a special <classname>TransactionValidException</classname>
type), the retry policy might be able to determine that the
RECOVER(6) path can be taken on the last attempt after PROCESS(5)
has just failed, instead of waiting for the item to be re-presented.
This is not the default behavior because it requires detailed
knowledge of what has happened inside the PROCESS(5) block, which is
not usually available. For example, if the output included write
access before the failure, then the exception should be rethrown to
ensure transactional integrity.</para>

<para>The completion policy in the outer REPEAT(1) is crucial to the
success of the above plan. If the output(5.1) fails it may throw an
exception (it usually does, as described), in which case the
transaction TX(2) fails and the exception could propagate up through
the outer batch REPEAT(1). We do not want the whole batch to stop,
because the RETRY(4) might still be successful if we try again, so
we add exception=not critical to the outer REPEAT(1).</para>

<para>Note, however, that if the TX(2) fails and we <emphasis>do</emphasis> try again, by
virtue of the outer completion policy, the item that is next
processed in the inner REPEAT(3) is not guaranteed to be the one
that just failed. It might well be, but it depends on the
implementation of the input(4.1). Thus the output(5.1) might fail
again, on a new item or on the old one. The client of the batch
should not assume that each RETRY(4) attempt is going to process the
same items as the last one that failed. For example, if the termination
policy for REPEAT(1) is to fail after 10 attempts, it will fail
after 10 consecutive attempts, but not necessarily at the same item.
This is consistent with the overall retry strategy: it is the inner
RETRY(4) that is aware of the history of each item, and can decide
whether or not to have another attempt at it.</para>
</section>

<section id="asyncChunkProcessing">
<title>Asynchronous Chunk Processing</title>

<para>The inner batches or chunks in the typical example
above can be executed concurrently by configuring the outer batch to
use an <classname>AsyncTaskExecutor</classname>. The outer batch waits for all the
chunks to complete before completing.</para>

<programlisting>
1   |  REPEAT(until=exhausted, concurrent, exception=not critical) {
    |
2   |    TX {
3   |      REPEAT(size=5) {
    |
4   |        RETRY(stateful, exception=deadlock loser) {
4.1 |          input;
5   |        } PROCESS {
    |          output;
6   |        } RECOVER {
    |          recover;
    |        }
    |
    |      }
    |    }
    |
    |  }
</programlisting>
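
<para>The waiting behaviour of the outer batch can be sketched with the
JDK <classname>ExecutorService</classname>, used here as a stand-in for an
<classname>AsyncTaskExecutor</classname>. The example is an assumption-laden
simplification: <classname>AsyncChunkSketch</classname> is a hypothetical name,
the input is split into chunks up front rather than read inside each
chunk, and the transaction and retry are reduced to a single callback per
chunk.</para>

<programlisting><![CDATA[
```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Consumer;

/** Plain-Java sketch of concurrent chunks; every name here is hypothetical. */
class AsyncChunkSketch {

    /** Runs each chunk on a pool thread and returns how many chunks completed normally. */
    static int run(List<String> items, int chunkSize, int threads,
                   Consumer<List<String>> chunkWork) {
        List<Callable<Void>> tasks = new ArrayList<>();
        for (int start = 0; start < items.size(); start += chunkSize) {
            List<String> chunk =
                    items.subList(start, Math.min(start + chunkSize, items.size()));
            tasks.add(() -> {
                chunkWork.accept(chunk);   // stands in for TX { REPEAT(size=5) { ... } }
                return null;
            });
        }
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        int completed = 0;
        try {
            // invokeAll blocks until every chunk has finished, one way or the other.
            for (Future<Void> result : pool.invokeAll(tasks)) {
                try {
                    result.get();
                    completed++;
                } catch (ExecutionException e) {
                    // exception=not critical: a failed chunk does not stop the batch
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for chunks", e);
        } finally {
            pool.shutdown();
        }
        return completed;
    }

    public static void main(String[] args) {
        List<String> items = List.of("a", "b", "c", "d", "e", "f", "g");
        int completed = run(items, 5, 2,
                chunk -> System.out.println(Thread.currentThread().getName() + " " + chunk));
        System.out.println(completed + " chunks completed");
    }
}
```
]]></programlisting>

<para>Each chunk runs entirely on one pool thread, which is what allows
TX(2) to stay at the chunk level in this plan.</para>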

</section>

<section id="asyncItemProcessing">
<title>Asynchronous Item Processing</title>

<para>The individual items in chunks in the typical example
can also, in principle, be processed concurrently. In this case the
transaction boundary has to move to the level of the individual
item, so that each transaction is on a single thread:
</para>

<programlisting>
1   |  REPEAT(until=exhausted, exception=not critical) {
    |
2   |    REPEAT(size=5, concurrent) {
    |
3   |      TX {
4   |        RETRY(stateful, exception=deadlock loser) {
4.1 |          input;
5   |        } PROCESS {
    |          output;
6   |        } RECOVER {
    |          recover;
    |        }
    |      }
    |
    |    }
    |
    |  }
</programlisting>

<para>This plan sacrifices the optimisation benefit that the simple plan
had of having all the transactional resources chunked together. It
is only useful if the cost of the processing (5) is much higher than
the cost of transaction management (3).</para>
</section>

<section id="transactionPropagation">
<title>Interactions Between Batching and Transaction Propagation</title>

<para>There is a tighter coupling between batch-retry and TX management
than we would ideally like. In particular, a stateless retry cannot
be used to retry database operations with a transaction manager that
doesn't support NESTED propagation.
</para>

<para>For a simple example using retry without repeat, consider this:</para>

<programlisting>
1   |  TX {
    |
1.1 |    input;
1.2 |    database access;
2   |    RETRY {
3   |      TX {
3.1 |        database access;
    |      }
    |    }
    |
    |  }
</programlisting>

<para>For the reason given above, the inner transaction TX(3) can
cause the outer transaction TX(1) to fail, even if the RETRY(2) is
eventually successful: with default propagation TX(3) joins TX(1),
so a rollback of TX(3) marks the shared transaction for rollback.</para>

<para>Unfortunately the same effect percolates from the retry block up to
the surrounding repeat batch if there is one:</para>

<programlisting>
1   |  TX {
    |
2   |    REPEAT(size=5) {
2.1 |      input;
2.2 |      database access;
3   |      RETRY {
4   |        TX {
4.1 |          database access;
    |        }
    |      }
    |    }
    |
    |  }
</programlisting>

<para>Now if TX(4) rolls back it can pollute the whole batch at TX(1) and
force it to roll back at the end.</para>

<para>What about non-default propagation?</para>

<itemizedlist>
<listitem>
<para>In the last example, PROPAGATION_REQUIRES_NEW at TX(4) will
prevent the outer TX(1) from being polluted if both transactions
are eventually successful. But if TX(4) commits and TX(1) rolls
back, then TX(4) stays committed, so we violate the transaction
contract for TX(1). If TX(4) rolls back, TX(1) does not necessarily
roll back (but it probably will in practice, because the retry will
throw a rollback exception).</para>
</listitem>
<listitem>
<para>PROPAGATION_NESTED at TX(4) works as we require in the retry
case (and for a batch with skips): TX(4) can commit, but
subsequently be rolled back by the outer transaction TX(1). If
TX(4) rolls back, again TX(1) will roll back in practice. This
option is only available on some platforms, e.g. not Hibernate or
JTA, but it is the only one that works consistently.</para>
</listitem>
</itemizedlist>

<para>So NESTED is best if the retry block contains any database access.</para>
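
<para>The reason NESTED behaves well is that a failed inner attempt only
rolls back to a savepoint, leaving the outer work intact. The following
toy model illustrates the idea in plain Java. It is not how a transaction
manager is implemented: <classname>SavepointSketch</classname> and all of its
methods are hypothetical, and the "database" is a pair of in-memory
lists.</para>

<programlisting><![CDATA[
```java
import java.util.ArrayList;
import java.util.List;

/** Toy transaction with savepoints; every name here is hypothetical. */
class SavepointSketch {

    private final List<String> committed = new ArrayList<>();
    private final List<String> pending = new ArrayList<>();   // the outer TX(1)

    void write(String row) { pending.add(row); }

    void commit() { committed.addAll(pending); pending.clear(); }

    void rollback() { pending.clear(); }

    List<String> committed() { return new ArrayList<>(committed); }

    /** RETRY { TX(NESTED) { work } }: a failed attempt rolls back to its savepoint only. */
    void retryNested(Runnable work, int maxAttempts) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            int savepoint = pending.size();                        // inner TX begins
            try {
                work.run();
                return;                                            // inner TX "commits"
            } catch (RuntimeException e) {
                pending.subList(savepoint, pending.size()).clear(); // undo inner writes only
                last = e;
            }
        }
        throw last;   // retries exhausted: the caller decides what happens to the outer TX
    }

    public static void main(String[] args) {
        SavepointSketch tx = new SavepointSketch();
        int[] attempts = {0};
        tx.write("outer row");
        tx.retryNested(() -> {
            tx.write("inner row");
            if (++attempts[0] < 2) {
                throw new IllegalStateException("deadlock loser");
            }
        }, 3);
        tx.commit();
        System.out.println(tx.committed());
    }
}
```
]]></programlisting>

<para>In the model, the inner block's writes become durable only when the
outer transaction commits, and a later outer rollback discards them as
well, which is the behaviour described for PROPAGATION_NESTED above.</para>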

</section>

<section id="specialTransactionOrthonogonal">
<title>Special Case: Transactions with Orthogonal Resources</title>

<para>Default propagation is always OK for simple cases where there are no
nested database transactions. Consider this (where the SESSION and
TX are not global XA resources, so their resources are orthogonal):
</para>

<programlisting>
0   |  SESSION {
1   |    input;
2   |    RETRY {
3   |      TX {
3.1 |        database access;
    |      }
    |    }
    |  }
</programlisting>

<para>Here there is a transactional message SESSION(0), but it doesn't
participate in other transactions with
<classname>PlatformTransactionManager</classname>, so it doesn't propagate when TX(3)
starts. There is no database access outside the RETRY(2) block. If
TX(3) fails and then eventually succeeds on a retry, SESSION(0) can
commit (it can do this independently of a TX block). This is similar
to the vanilla "best-efforts-one-phase-commit" scenario: the worst
that can happen is a duplicate message when the RETRY(2) succeeds
and the SESSION(0) cannot commit, e.g. because the message system is
unavailable.</para>
</section>

<section id="statelessRetryCannotRecover">
<title>Stateless Retry Cannot Recover</title>

<para>The distinction between a stateless and a stateful retry in the
typical example above is important. It is ultimately a
transactional constraint that forces the distinction,
and this constraint also makes it obvious why the distinction
exists.
</para>

<para>We start with the observation that there is no way to skip an item
that failed and successfully commit the rest of the chunk unless we
wrap the item processing in a transaction. So we simplify the
typical batch execution plan to look like this:</para>

<programlisting>
0   |  REPEAT(until=exhausted) {
    |
1   |    TX {
2   |      REPEAT(size=5) {
    |
3   |        RETRY(stateless) {
4   |          TX {
4.1 |            input;
4.2 |            database access;
    |          }
5   |        } RECOVER {
5.1 |          skip;
    |        }
    |
    |      }
    |    }
    |
    |  }
</programlisting>

<para>Here we have a stateless RETRY(3) with a RECOVER(5) path that kicks
in after the final attempt fails. The "stateless" label just means
that the block will be repeated, up to some limit, without
rethrowing any exception. This will only work if the transaction
TX(4) has propagation NESTED.</para>

<para>If the TX(4) has default propagation properties and it rolls back,
it will pollute the outer TX(1). The inner transaction is assumed by
the transaction manager to have corrupted the transactional
resource, and so it cannot be used again.</para>

<para>Support for NESTED propagation is sufficiently rare that we choose
not to support recovery with stateless retries in current versions of
Spring Batch. The same effect can always be achieved (at the
expense of repeating more processing) using the
typical pattern above.</para>
</section>
</appendix>