<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE appendix PUBLIC "-//OASIS//DTD DocBook XML V4.4//EN"
		"http://www.oasis-open.org/docbook/xml/4.4/docbookx.dtd">
<appendix id="transactions">
	<title>Batch Processing and Transactions</title>

	<section id="transactionsNoRetry">
		<title>Simple Batching with No Retry</title>

		<para>Consider the following simple example of a nested batch with no
			retries.  This is a very common scenario for batch processing, where
			an input source is processed until exhausted, but we commit
			periodically at the end of a "chunk" of processing.</para>

		<programlisting>
1   |  REPEAT(until=exhausted) {
|
2   |    TX {
3   |      REPEAT(size=5) {
3.1 |        input;
3.2 |        output;
|      }
|    }
|
|  }
		</programlisting>

		<para>The input operation (3.1) could be a message-based receive
		(e.g. JMS), or a file-based read, but to recover and continue
		processing with a chance of completing the whole job, it must be
		transactional.  The same applies to the operation at (3.2): it must
		be either transactional or idempotent.</para>

		<para>If the chunk at REPEAT(3) fails because of a database exception at
		(3.2), then TX(2) will roll back the whole chunk.</para>
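
		<para>The chunk semantics described above can be illustrated with a small,
			self-contained Java model.  This is a sketch under stated assumptions,
			not Spring Batch code: the class <classname>SimpleChunkDemo</classname>
			and its methods are hypothetical, the transaction is modelled as an
			in-memory write buffer, and the input is assumed to be transactional in
			the sense that its read position only advances when a chunk commits.</para>

		<programlisting><![CDATA[
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

// Hypothetical toy model (not Spring Batch API) of:
//   REPEAT(until=exhausted) { TX { REPEAT(size=5) { input; output; } } }
// The "transaction" is an in-memory buffer, and the input is treated as
// transactional: its read position only advances when a chunk commits.
class SimpleChunkDemo {
    final List<Integer> committed = new ArrayList<>(); // the "database" after commits
    int committedReadPosition = 0;                     // input position as of the last commit

    // Processes chunks until the input is exhausted. An exception from a write
    // propagates before the commit step, so the chunk's buffered writes are
    // discarded and the read position stays at the start of the failed chunk.
    void run(List<Integer> input, int chunkSize, IntPredicate failsOnWrite) {
        while (committedReadPosition < input.size()) {             // REPEAT(1)
            List<Integer> txBuffer = new ArrayList<>();            // TX(2): pending writes
            int position = committedReadPosition;
            for (int i = 0; i < chunkSize && position < input.size(); i++) { // REPEAT(3)
                int item = input.get(position++);                  // input (3.1)
                if (failsOnWrite.test(item)) {
                    throw new IllegalStateException("database error writing " + item);
                }
                txBuffer.add(item);                                // output (3.2)
            }
            committed.addAll(txBuffer);          // commit the chunk's writes...
            committedReadPosition = position;    // ...and the input position, together
        }
    }

    public static void main(String[] args) {
        List<Integer> input = new ArrayList<>();
        for (int i = 0; i < 12; i++) {
            input.add(i);
        }
        SimpleChunkDemo job = new SimpleChunkDemo();
        try {
            job.run(input, 5, item -> item == 7);
        } catch (IllegalStateException e) {
            System.out.println("job failed: " + e.getMessage());
        }
        System.out.println(job.committed);             // [0, 1, 2, 3, 4]
        System.out.println(job.committedReadPosition); // 5
    }
}
]]></programlisting>

		<para>In the example run the write of item 7 fails: the first chunk (items 0
			to 4) stays committed, the second chunk is rolled back as a whole, and
			the committed read position of 5 is where a restarted job would
			resume.</para>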
	</section>

	<section id="transactionStatelessRetry">
		<title>Simple Stateless Retry</title>

		<para>It is also useful to apply a retry to an operation that is not
			transactional, such as a call to a web service or other remote
			resource.  For example:</para>

		<programlisting>
0   |  TX {
1   |    input;
1.1 |    output;
2   |    RETRY {
2.1 |      remote access;
|    }
|  }
		</programlisting>

		<para>This is actually one of the most useful applications of a retry,
			since a remote call is much more likely to fail and be retryable
			than a database update.  As long as the remote access (2.1)
			eventually succeeds, the transaction TX(0) will commit.  If the
			remote access (2.1) eventually fails, then the transaction TX(0) is
			guaranteed to roll back.</para>
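
		<para>A minimal Java sketch of this plan follows.  It is a hypothetical
			model rather than the Spring Retry API: the
			<classname>StatelessRetryDemo</classname> class, its
			<literal>retry</literal> method and the flaky remote call are invented
			for illustration, and TX(0) is again modelled as a write buffer that is
			only committed if the retry eventually succeeds.</para>

		<programlisting><![CDATA[
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical toy model (not the Spring Retry API) of:
//   TX { input; output; RETRY { remote access; } }
class StatelessRetryDemo {
    final List<String> database = new ArrayList<>(); // committed writes

    // Stateless retry: re-invokes the call on the same thread, keeping no state
    // outside this method. If every attempt fails, the last exception propagates.
    static <T> T retry(int maxAttempts, Supplier<T> call) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return call.get();
            } catch (RuntimeException e) {
                last = e; // swallow and try again, up to the limit
            }
        }
        throw last;
    }

    // TX(0): the buffered writes are committed only if RETRY(2) eventually succeeds;
    // if retry() throws, the method exits before the commit and the buffer is discarded.
    void processInTransaction(String item, Supplier<String> remoteAccess, int maxAttempts) {
        List<String> txBuffer = new ArrayList<>();
        txBuffer.add(item);                             // input (1) and output (1.1)
        txBuffer.add(retry(maxAttempts, remoteAccess)); // RETRY(2) { remote access (2.1) }
        database.addAll(txBuffer);                      // commit TX(0)
    }

    public static void main(String[] args) {
        StatelessRetryDemo demo = new StatelessRetryDemo();
        int[] calls = {0};
        // A flaky remote call: fails twice, then succeeds.
        Supplier<String> flaky = () -> {
            if (++calls[0] < 3) {
                throw new IllegalStateException("remote timeout");
            }
            return "ack";
        };
        demo.processInTransaction("item-1", flaky, 3);
        System.out.println(demo.database); // [item-1, ack]
    }
}
]]></programlisting>

		<para>Because the retry is stateless, all attempts happen inside the same
			call and the same transaction.  If the limit is reached, the last
			exception propagates out before the commit, so TX(0) rolls back.</para>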
	</section>

	<section id="repeatRetry">
		<title>Typical Repeat-Retry Pattern</title>

		<para>The most typical batch processing pattern is to add a retry to the
			inner block of the chunk in the Simple Batching example.
			Consider this:</para>

		<programlisting>
1   |  REPEAT(until=exhausted, exception=not critical) {
|
2   |    TX {
3   |      REPEAT(size=5) {
|
4   |        RETRY(stateful, exception=deadlock loser) {
4.1 |          input;
5   |        } PROCESS {
5.1 |          output;
6   |        } SKIP and RECOVER {
|          notify;
|        }
|
|      }
|    }
|
|  }
		</programlisting>

		<para>The inner RETRY(4) block is marked as "stateful", meaning that the
			retry keeps its state for each item between transactions rather than
			within a single call.  If the PROCESS(5) block fails, RETRY(4)
			behaves as follows:</para>

		<itemizedlist>
			<listitem>
				<para>Throw an exception, rolling back the transaction TX(2) at the
					chunk level, and allowing the item to be re-presented to the input
					queue.</para>
			</listitem>
			<listitem>
				<para>When the item re-appears, it might be retried depending on the
					retry policy in place, executing PROCESS(5) again.  The second and
					subsequent attempts might fail again and rethrow the exception.</para>
			</listitem>
			<listitem>
				<para>Eventually the item re-appears for the final time: the retry
					policy disallows another attempt, so PROCESS(5) is not
					executed. In this case we follow a RECOVER(6) path, effectively
					"skipping" the item that was received and is being processed.</para>
			</listitem>
		</itemizedlist>

		<para>Notice that the notation used for the RETRY(4) in the plan above
			shows explicitly that the input step (4.1) is part of the retry.
			It also makes clear that there are two alternative paths for
			processing: the normal case is denoted by PROCESS(5), and the
			recovery path is a separate block, RECOVER(6).  The two alternative
			paths are completely distinct: only one is ever taken in normal
			circumstances.</para>
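
		<para>The stateful behaviour can be sketched in Java as follows.  This is a
			hypothetical model, not the Spring Retry API:
			<classname>StatefulRetryDemo</classname> keeps the failure count for
			each item in a map that lives outside the transaction, which is what
			lets the count survive a rollback, and it assumes that items have a
			stable identity that can serve as a map key.</para>

		<programlisting><![CDATA[
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical toy model (not the Spring Retry API) of:
//   RETRY(stateful) { input; } PROCESS { output; } SKIP and RECOVER { notify; }
// The failure count per item is kept outside the transaction, so it survives
// rollbacks. Items are used as map keys, so they need stable equals/hashCode.
class StatefulRetryDemo {
    private final Map<String, Integer> failures = new HashMap<>(); // retry state per item
    private final int maxAttempts;
    final List<String> output = new ArrayList<>();
    final List<String> skipped = new ArrayList<>();

    StatefulRetryDemo(int maxAttempts) {
        this.maxAttempts = maxAttempts;
    }

    // One presentation of an item. Throwing means "roll back TX(2)".
    void handle(String item, Consumer<String> process) {
        int failed = failures.getOrDefault(item, 0);
        if (failed >= maxAttempts) {        // policy disallows another attempt:
            failures.remove(item);          // RECOVER(6) without running PROCESS(5)
            skipped.add(item);
            return;
        }
        try {
            process.accept(item);           // PROCESS(5)
            failures.remove(item);
            output.add(item);
        } catch (RuntimeException e) {
            failures.put(item, failed + 1); // remember the attempt across the rollback
            throw e;
        }
    }

    // A transactional queue: a rolled-back item is re-presented at the head.
    void drain(Deque<String> queue, Consumer<String> process) {
        while (!queue.isEmpty()) {
            String item = queue.pollFirst();
            try {
                handle(item, process);
            } catch (RuntimeException e) {
                queue.addFirst(item);
            }
        }
    }

    public static void main(String[] args) {
        StatefulRetryDemo retry = new StatefulRetryDemo(3);
        Deque<String> queue = new ArrayDeque<>(List.of("a", "b", "c"));
        retry.drain(queue, item -> {
            if (item.equals("b")) {
                throw new IllegalStateException("deadlock loser");
            }
        });
        System.out.println(retry.output);  // [a, c]
        System.out.println(retry.skipped); // [b]
    }
}
]]></programlisting>

		<para>With a limit of three attempts, item "b" is processed three times,
			each failure rolling back and re-presenting it, and on its fourth
			presentation the RECOVER path runs without calling PROCESS again.</para>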

		<para>In special cases (e.g. a special <classname>TransactionValidException</classname>
			type), the retry policy might be able to determine that the
			RECOVER(6) path can be taken on the last attempt after PROCESS(5)
			has just failed, instead of waiting for the item to be re-presented.
			This is not the default behavior because it requires detailed
			knowledge of what has happened inside the PROCESS(5) block, which is
			not usually available - e.g. if the output included write
			access before the failure, then the exception should be rethrown to
			ensure transactional integrity.</para>

		<para>The completion policy in the outer REPEAT(1) is crucial to the
			success of the above plan.  If the output(5.1) fails, it may throw an
			exception (it usually does, as described), in which case the
			transaction TX(2) fails and the exception could propagate up through
			the outer batch REPEAT(1).  We do not want the whole batch to stop,
			because the RETRY(4) might still be successful if we try again, so
			we add exception=not critical to the outer REPEAT(1).</para>

		<para>Note, however, that if the TX(2) fails and we <emphasis>do</emphasis> try again, by
			virtue of the outer completion policy, the item that is next
			processed in the inner REPEAT(3) is not guaranteed to be the one
			that just failed.  It might well be, but it depends on the
			implementation of the input(4.1).  Thus the output(5.1) might fail
			again, on a new item, or on the old one.  The client of the batch
			should not assume that each RETRY(4) attempt is going to process the
			same items as the last one that failed.  E.g. if the termination
			policy for REPEAT(1) is to fail after 10 attempts, it will fail
			after 10 consecutive attempts, but not necessarily at the same item.
			This is consistent with the overall retry strategy: it is the inner
			RETRY(4) that is aware of the history of each item, and can decide
			whether or not to have another attempt at it.</para>
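
		<para>The outer completion policy can be sketched as a loop that classifies
			exceptions.  The <classname>OuterRepeatDemo</classname> class below is a
			hypothetical illustration, not Spring Batch's exception handler API:
			an exception for which the <literal>isCritical</literal> predicate is
			false is tolerated, but the batch stops after a configurable number of
			consecutive failed chunks, regardless of which items were
			involved.</para>

		<programlisting><![CDATA[
import java.util.Iterator;
import java.util.List;
import java.util.function.Predicate;
import java.util.function.Supplier;

// Hypothetical toy model (not Spring Batch's exception handler API) of the outer
//   REPEAT(until=exhausted, exception=not critical)
// Non-critical chunk failures are tolerated, but the batch stops after a number
// of consecutive failures, which need not all involve the same item.
class OuterRepeatDemo {

    // chunk: runs one TX(2) chunk; returns false when the input is exhausted.
    // Returns the number of chunks that committed.
    static int repeat(Supplier<Boolean> chunk, Predicate<RuntimeException> isCritical,
                      int maxConsecutiveFailures) {
        int committed = 0;
        int consecutiveFailures = 0;
        while (true) {
            try {
                if (!chunk.get()) {
                    return committed;       // exhausted: normal completion
                }
                committed++;
                consecutiveFailures = 0;    // any successful chunk resets the count
            } catch (RuntimeException e) {
                // TX(2) has already rolled back; decide whether the batch goes on.
                if (isCritical.test(e) || ++consecutiveFailures >= maxConsecutiveFailures) {
                    throw e;
                }
            }
        }
    }

    public static void main(String[] args) {
        Iterator<String> outcomes =
                List.of("ok", "deadlock", "ok", "deadlock", "deadlock", "ok", "done").iterator();
        int committed = repeat(() -> {
            String next = outcomes.next();
            if (next.equals("deadlock")) {
                throw new IllegalStateException("deadlock loser");
            }
            return !next.equals("done");
        }, e -> !(e instanceof IllegalStateException), 3);
        System.out.println(committed); // 3
    }
}
]]></programlisting>

		<para>In the example the deadlock failures never occur more than twice in a
			row, so with a limit of three the batch completes with three committed
			chunks.</para>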
	</section>

	<section id="asyncChunkProcessing">
		<title>Asynchronous Chunk Processing</title>

		<para>The inner batches or chunks in the typical example
			above can be executed concurrently by configuring the outer batch to
			use an <classname>AsyncTaskExecutor</classname>.  The outer batch waits for all the
			chunks to complete before completing.</para>

		<programlisting>
1   |  REPEAT(until=exhausted, concurrent, exception=not critical) {
|
2   |    TX {
3   |      REPEAT(size=5) {
|
4   |        RETRY(stateful, exception=deadlock loser) {
4.1 |          input;
5   |        } PROCESS {
|          output;
6   |        } RECOVER {
|          recover;
|        }
|
|      }
|    }
|
|  }
		</programlisting>
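
		<para>A rough Java model of the concurrent outer batch is shown below.  It
			assumes a plain JDK <classname>ExecutorService</classname> in place of
			Spring's <classname>AsyncTaskExecutor</classname>, and the class
			<classname>AsyncChunkDemo</classname> is hypothetical.  It illustrates
			two points: the shared input must be read in a thread-safe way, and
			each chunk transaction begins and ends on a single worker thread while
			the outer batch waits for all of them.</para>

		<programlisting><![CDATA[
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical toy model of:
//   REPEAT(until=exhausted, concurrent) { TX { REPEAT(size=5) { ... } } }
// A JDK ExecutorService stands in for Spring's AsyncTaskExecutor. Each chunk,
// and therefore each transaction, runs start to finish on one worker thread.
class AsyncChunkDemo {
    private final List<Integer> input;
    private int cursor = 0;
    final List<List<Integer>> committedChunks = Collections.synchronizedList(new ArrayList<>());

    AsyncChunkDemo(List<Integer> input) {
        this.input = input;
    }

    // The input is shared by concurrent chunks, so reads must be synchronized.
    private synchronized List<Integer> readChunk(int size) {
        int end = Math.min(cursor + size, input.size());
        List<Integer> chunk = new ArrayList<>(input.subList(cursor, end));
        cursor = end;
        return chunk;
    }

    int run(int threads, int chunkSize) {
        ExecutorService executor = Executors.newFixedThreadPool(threads);
        try {
            List<Future<?>> workers = new ArrayList<>();
            for (int t = 0; t < threads; t++) {
                workers.add(executor.submit(() -> {
                    List<Integer> chunk;
                    while (!(chunk = readChunk(chunkSize)).isEmpty()) { // until=exhausted
                        // TX(2) begins, the chunk is processed, and TX(2) commits, all on this thread.
                        committedChunks.add(chunk);
                    }
                }));
            }
            for (Future<?> worker : workers) {   // the outer batch waits for every chunk
                try {
                    worker.get();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException(e);
                } catch (ExecutionException e) {
                    throw new IllegalStateException(e.getCause());
                }
            }
        } finally {
            executor.shutdown();
        }
        return committedChunks.size();
    }

    public static void main(String[] args) {
        List<Integer> items = new ArrayList<>();
        for (int i = 0; i < 23; i++) {
            items.add(i);
        }
        System.out.println(new AsyncChunkDemo(items).run(4, 5)); // 5 (chunks of 5+5+5+5+3 items)
    }
}
]]></programlisting>

		<para>The order in which chunks commit depends on thread scheduling, but
			because reads are synchronized every item lands in exactly one
			chunk.</para>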
	</section>

	<section id="asyncItemProcessing">
		<title>Asynchronous Item Processing</title>

		<para>The individual items in chunks in the typical example
			can also, in principle, be processed concurrently.  In this case the
			transaction boundary has to move to the level of the individual
			item, so that each transaction is on a single thread:
		</para>

		<programlisting>
1   |  REPEAT(until=exhausted, exception=not critical) {
|
2   |    REPEAT(size=5, concurrent) {
|
3   |      TX {
4   |        RETRY(stateful, exception=deadlock loser) {
4.1 |          input;
5   |        } PROCESS {
|          output;
6   |        } RECOVER {
|          recover;
|        }
|      }
|
|    }
|
|  }
		</programlisting>

		<para>This plan gives up an optimisation that the simple plan had: having
			all the transactional resources chunked together in one transaction.
			It is only useful if the cost of the processing (5) is much higher
			than the cost of transaction management (3).</para>
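
		<para>To make the trade-off concrete, the hypothetical
			<classname>AsyncItemDemo</classname> below (again using a JDK
			<classname>ExecutorService</classname> as a stand-in for a Spring task
			executor) runs one transaction per item and compares the count with
			the number of transactions the chunked plan would need.  The
			transaction itself is reduced to a counter in this sketch.</para>

		<programlisting><![CDATA[
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical toy model of:
//   REPEAT(until=exhausted) { REPEAT(size=5, concurrent) { TX { ... } } }
// The transaction boundary is per item, so each item pays for its own transaction.
class AsyncItemDemo {
    final AtomicInteger transactions = new AtomicInteger();

    private void processInOwnTransaction(int item) {
        // TX(3) would begin on this worker thread, RETRY/PROCESS/RECOVER would run
        // for this single item, and TX(3) would commit on the same thread.
        transactions.incrementAndGet();
    }

    void run(List<Integer> input, int chunkSize, int threads) {
        ExecutorService executor = Executors.newFixedThreadPool(threads);
        try {
            for (int start = 0; start < input.size(); start += chunkSize) {   // REPEAT(1)
                List<Future<?>> pending = new ArrayList<>();
                for (int item : input.subList(start, Math.min(start + chunkSize, input.size()))) {
                    pending.add(executor.submit(() -> processInOwnTransaction(item))); // REPEAT(2)
                }
                for (Future<?> f : pending) {
                    try {
                        f.get(); // wait for every item of this chunk
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                        throw new IllegalStateException(e);
                    } catch (ExecutionException e) {
                        throw new IllegalStateException(e.getCause());
                    }
                }
            }
        } finally {
            executor.shutdown();
        }
    }

    // For comparison: the number of transactions the chunked plan would use.
    static int chunkedTransactions(int items, int chunkSize) {
        return (items + chunkSize - 1) / chunkSize;
    }

    public static void main(String[] args) {
        List<Integer> items = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            items.add(i);
        }
        AsyncItemDemo demo = new AsyncItemDemo();
        demo.run(items, 5, 3);
        System.out.println(demo.transactions.get());              // 10
        System.out.println(chunkedTransactions(items.size(), 5)); // 2
    }
}
]]></programlisting>

		<para>For ten items and a chunk size of five, the per-item plan uses ten
			transactions where the chunked plan uses two.</para>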
	</section>

	<section id="transactionPropagation">
		<title>Interactions Between Batching and Transaction Propagation</title>

		<para>There is a tighter coupling between batch-retry and TX management
			than we would ideally like.  In particular, a stateless retry cannot
			be used to retry database operations with a transaction manager that
			doesn't support NESTED propagation.
		</para>

		<para>For a simple example using retry without repeat, consider this:</para>

		<programlisting>
1   |  TX {
|
1.1 |    input;
1.2 |    database access;
2   |    RETRY {
3   |      TX {
3.1 |        database access;
|      }
|    }
|
|  }
		</programlisting>

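		<para>With default propagation, TX(3) joins TX(1) rather than starting a
			new transaction, so a rollback of TX(3) marks the whole transaction
			rollback-only.  The inner transaction TX(3) can therefore cause the
			outer transaction TX(1) to fail, even if the RETRY(2) is
			eventually successful.</para>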

		<para>Unfortunately the same effect percolates from the retry block up to
			the surrounding repeat batch if there is one:</para>

		<programlisting>
1   |  TX {
|
2   |    REPEAT(size=5) {
2.1 |      input;
2.2 |      database access;
3   |      RETRY {
4   |        TX {
4.1 |          database access;
|        }
|      }
|    }
|
|  }
		</programlisting>

		<para>Now if TX(4) rolls back, it can pollute the whole batch at TX(1) and
			force it to roll back at the end.</para>

		<para>What about non-default propagation?</para>

		<itemizedlist>
			<listitem>
				<para>In the last example, PROPAGATION_REQUIRES_NEW at TX(4) will
					prevent the outer TX(1) from being polluted if both transactions
					are eventually successful.  But if TX(4) commits and TX(1) rolls
					back, then TX(4) stays committed, so we violate the transaction
					contract for TX(1).  If TX(4) rolls back, TX(1) does not necessarily
					roll back, but in practice it probably will, because the retry
					will throw a rollback exception.</para>
			</listitem>
			<listitem>
				<para>PROPAGATION_NESTED at TX(4) works as we require in the retry
					case (and for a batch with skips): TX(4) can commit, but
					subsequently be rolled back by the outer transaction TX(1).  If
					TX(4) rolls back, again TX(1) will roll back in practice.  This
					option is only available on some platforms (e.g. not with
					Hibernate or JTA), but it is the only one that works
					consistently.</para>
			</listitem>
		</itemizedlist>

		<para>So NESTED is best if the retry block contains any database access.</para>
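
		<para>The three propagation options can be compared with a small Java model
			of one pass through the last example: TX(1) { database access; RETRY(3)
			{ TX(4) { database access; } } }.  The
			<classname>PropagationDemo</classname> class is hypothetical and only
			simulates the outcomes; in Spring itself the settings correspond to the
			<literal>TransactionDefinition.PROPAGATION_REQUIRED</literal>,
			<literal>PROPAGATION_REQUIRES_NEW</literal> and
			<literal>PROPAGATION_NESTED</literal> constants.</para>

		<programlisting><![CDATA[
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy model (not Spring's transaction API) of:
//   TX(1) { database access (2.2); RETRY(3) { TX(4) { database access (4.1); } } }
// under three propagation settings for TX(4), named after Spring's
// PROPAGATION_REQUIRED (the default), PROPAGATION_REQUIRES_NEW and PROPAGATION_NESTED.
class PropagationDemo {
    enum Propagation { REQUIRED, REQUIRES_NEW, NESTED }

    final List<String> database = new ArrayList<>(); // committed state

    // innerFailures: how many TX(4) attempts fail before one succeeds.
    // outerFailsLater: TX(1) rolls back for an unrelated reason after the retry.
    // Returns true if TX(1) committed.
    boolean run(Propagation propagation, int innerFailures, int maxAttempts, boolean outerFailsLater) {
        List<String> tx1 = new ArrayList<>();  // uncommitted writes of TX(1)
        tx1.add("outer");
        boolean rollbackOnly = false;
        boolean innerSucceeded = false;
        for (int attempt = 1; attempt <= maxAttempts && !innerSucceeded; attempt++) { // RETRY(3)
            boolean fails = attempt <= innerFailures;
            switch (propagation) {
                case REQUIRED:      // TX(4) joins TX(1); a failure marks it rollback-only
                    if (fails) {
                        rollbackOnly = true;
                    } else {
                        tx1.add("inner");
                    }
                    break;
                case REQUIRES_NEW:  // TX(4) is independent; its commit is durable at once
                    if (!fails) {
                        database.add("inner");
                    }
                    break;
                case NESTED: {      // TX(4) runs inside a savepoint of TX(1)
                    int savepoint = tx1.size();
                    tx1.add("inner");
                    if (fails) {
                        tx1.subList(savepoint, tx1.size()).clear(); // roll back to the savepoint only
                    }
                    break;
                }
            }
            innerSucceeded = !fails;
        }
        if (!innerSucceeded || rollbackOnly || outerFailsLater) {
            return false;            // TX(1) rolls back; its uncommitted writes are discarded
        }
        database.addAll(tx1);        // TX(1) commits
        return true;
    }

    public static void main(String[] args) {
        for (Propagation p : Propagation.values()) {
            PropagationDemo ok = new PropagationDemo();
            boolean committed = ok.run(p, 1, 3, false);
            PropagationDemo late = new PropagationDemo();
            late.run(p, 1, 3, true);
            System.out.println(p + ": committed=" + committed + " " + ok.database
                    + ", after a late TX(1) rollback " + late.database);
        }
    }
}
]]></programlisting>

		<para>Running it with one failed inner attempt reproduces the behaviours
			listed above: with REQUIRED, TX(1) cannot commit even though the retry
			succeeded; with REQUIRES_NEW, the inner write stays committed after a
			later rollback of TX(1); with NESTED, the two writes commit or roll
			back together.</para>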
	</section>

	<section id="specialTransactionOrthonogonal">
		<title>Special Case: Transactions with Orthogonal Resources</title>

		<para>Default propagation is always OK for simple cases where there are no
			nested database transactions.  Consider this (where the SESSION and
			TX are not global XA resources, so their resources are orthogonal):
		</para>

		<programlisting>
0   |  SESSION {
1   |    input;
2   |    RETRY {
3   |      TX {
3.1 |        database access;
|      }
|    }
|  }
		</programlisting>

		<para>Here there is a transactional message SESSION(0), but it doesn't
			participate in other transactions with
			<classname>PlatformTransactionManager</classname>, so it does not propagate when TX(3)
			starts.  There is no database access outside the RETRY(2) block. If
			TX(3) fails and then eventually succeeds on a retry, SESSION(0) can
			commit (it can do this independently of a TX block).  This is similar
			to the vanilla "best-efforts-one-phase-commit" scenario: the worst
			that can happen is a duplicate message when the RETRY(2) succeeds
			and the SESSION(0) cannot commit, e.g. because the message system is
			unavailable.</para>
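
		<para>The duplicate-message case can be reproduced with a hypothetical Java
			model in which the message session and the database transaction are
			independent, non-XA resources.  <classname>BestEffortsDemo</classname>
			is not a real API: an in-memory queue stands in for the message system,
			and a message is only removed from it when SESSION(0) commits.</para>

		<programlisting><![CDATA[
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical toy model of:
//   SESSION(0) { input; RETRY { TX(3) { database access; } } }
// where the message session and the database transaction are separate,
// non-XA resources that commit independently (best-efforts one-phase commit).
class BestEffortsDemo {
    final Deque<String> queue = new ArrayDeque<>();  // messages not yet acknowledged
    final List<String> database = new ArrayList<>(); // committed database rows

    // dbFailures: how many TX(3) attempts roll back before one commits.
    // Returns true if SESSION(0) committed (the message was acknowledged).
    boolean receiveAndStore(int dbFailures, int maxAttempts, boolean sessionCommitFails) {
        String message = queue.peekFirst();            // input (1): received, not acknowledged
        if (message == null) {
            return false;
        }
        boolean stored = false;
        for (int attempt = 1; attempt <= maxAttempts && !stored; attempt++) { // RETRY(2)
            if (attempt > dbFailures) {
                database.add(message);                 // TX(3) commits on its own
                stored = true;
            }                                          // otherwise TX(3) rolls back alone
        }
        if (!stored || sessionCommitFails) {
            return false;                              // session rolls back: message is redelivered
        }
        queue.pollFirst();                             // SESSION(0) commits
        return true;
    }

    public static void main(String[] args) {
        BestEffortsDemo demo = new BestEffortsDemo();
        demo.queue.add("m1");
        demo.receiveAndStore(1, 3, true);  // retry succeeds, but the session cannot commit
        demo.receiveAndStore(0, 3, false); // the redelivered message is stored again
        System.out.println(demo.database); // [m1, m1]
        System.out.println(demo.queue);    // []
    }
}
]]></programlisting>

		<para>The first call stores the message but cannot acknowledge it, so the
			message is redelivered and stored a second time, which is exactly the
			duplicate described above.  No database row is lost in either
			call.</para>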
	</section>

	<section id="statelessRetryCannotRecover">
		<title>Stateless Retry Cannot Recover</title>

		<para>The distinction between a stateless and a stateful retry in the
			typical example above is important.  Ultimately, it is a
			transactional constraint that forces the distinction, and this
			constraint also makes it clear why the distinction exists.
		</para>

		<para>We start with the observation that there is no way to skip an item
			that failed and successfully commit the rest of the chunk unless we
			wrap the item processing in a transaction.  So we simplify the
			typical batch execution plan to look like this:</para>

		<programlisting>
0   |  REPEAT(until=exhausted) {
|
1   |    TX {
2   |      REPEAT(size=5) {
|
3   |        RETRY(stateless) {
4   |          TX {
4.1 |            input;
4.2 |            database access;
|          }
5   |        } RECOVER {
5.1 |          skip;
|        }
|
|      }
|    }
|
|  }
		</programlisting>

		<para>Here we have a stateless RETRY(3) with a RECOVER(5) path that kicks
			in after the final attempt fails.  The "stateless" label just means
			that the block will be repeated without rethrowing any exception up
			to some limit.  This will only work if the transaction TX(4) has
			propagation NESTED.</para>

		<para>If TX(4) has default propagation properties and it rolls back,
			it will pollute the outer TX(1).  The inner transaction is assumed by
			the transaction manager to have corrupted the transactional
			resource, and so it cannot be used again.</para>
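
		<para>The difference can be sketched with a hypothetical Java model of the
			plan above, in which one item in a chunk of five always fails.
			<classname>StatelessSkipDemo</classname> is illustrative only: its
			<literal>nested</literal> flag stands for NESTED propagation of TX(4),
			modelled as a rollback to a savepoint, and clearing it stands for
			default propagation, where a failed TX(4) marks TX(1)
			rollback-only.</para>

		<programlisting><![CDATA[
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

// Hypothetical toy model of:
//   TX(1) { REPEAT(size=5) { RETRY(stateless) { TX(4) { input; database access; } }
//           RECOVER { skip; } } }
// nested=true models PROPAGATION_NESTED (TX(4) rolls back to a savepoint);
// nested=false models default propagation (a failed TX(4) marks TX(1) rollback-only).
class StatelessSkipDemo {
    final List<Integer> database = new ArrayList<>(); // committed rows
    final List<Integer> skipped = new ArrayList<>();  // items recorded as skipped

    // Returns true if the chunk transaction TX(1) committed.
    boolean runChunk(List<Integer> chunk, IntPredicate alwaysFails, int maxAttempts, boolean nested) {
        List<Integer> tx1 = new ArrayList<>();           // uncommitted writes of TX(1)
        List<Integer> chunkSkips = new ArrayList<>();
        boolean rollbackOnly = false;
        for (int item : chunk) {                                     // REPEAT(2)
            boolean done = false;
            for (int attempt = 1; attempt <= maxAttempts && !done; attempt++) { // RETRY(3)
                int savepoint = tx1.size();                          // TX(4) begins
                tx1.add(item);                                       // database access (4.2)
                if (alwaysFails.test(item)) {
                    tx1.subList(savepoint, tx1.size()).clear();      // TX(4) rolls back
                    if (!nested) {
                        rollbackOnly = true;                         // and poisons TX(1)
                    }
                } else {
                    done = true;                                     // TX(4) completes
                }
            }
            if (!done) {
                chunkSkips.add(item);                                // RECOVER(5): skip (5.1)
            }
        }
        if (rollbackOnly) {
            return false;                                            // TX(1) cannot commit
        }
        database.addAll(tx1);
        skipped.addAll(chunkSkips);
        return true;
    }

    public static void main(String[] args) {
        List<Integer> chunk = List.of(1, 2, 3, 4, 5);
        StatelessSkipDemo nested = new StatelessSkipDemo();
        StatelessSkipDemo required = new StatelessSkipDemo();
        System.out.println(nested.runChunk(chunk, item -> item == 3, 3, true));    // true
        System.out.println(nested.database + " skipped " + nested.skipped);       // [1, 2, 4, 5] skipped [3]
        System.out.println(required.runChunk(chunk, item -> item == 3, 3, false)); // false
        System.out.println(required.database);                                    // []
    }
}
]]></programlisting>

		<para>With NESTED propagation the chunk commits four items and records one
			skip.  With default propagation the skip still happens inside the
			chunk, but TX(1) cannot commit, so none of the chunk's work
			survives.</para>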

		<para>Support for NESTED propagation is sufficiently rare that we choose
			not to support recovery with stateless retries in current versions of
			Spring Batch.  The same effect can always be achieved (at the
			expense of repeating more processing) using the
			typical pattern above.</para>
	</section>
</appendix>