spring-batch/build/reference-work/common-patterns.xml

<?xml version="1.0" encoding="UTF-8"?>
<chapter xmlns="http://docbook.org/ns/docbook" version="5.0" xml:id="patterns"
    xmlns:xlink="http://www.w3.org/1999/xlink">

  <title>Common Batch Patterns</title>

  <para>Some batch jobs can be assembled purely from off-the-shelf components
  in Spring Batch. For instance the <classname>ItemReader</classname> and
  <classname>ItemWriter</classname> implementations can be configured to cover
  a wide range of scenarios. However, for the majority of cases, custom code
  will have to be written. The main API entry points for application
  developers are the <classname>Tasklet</classname>,
  <classname>ItemReader</classname>, <classname>ItemWriter</classname> and the
  various listener interfaces. Most simple batch jobs will be able to use
  off-the-shelf input from a Spring Batch <classname>ItemReader</classname>,
  but it is often the case that there are custom concerns in the processing
  and writing, which require developers to implement an
  <classname>ItemWriter</classname> or
  <classname>ItemProcessor</classname>.</para>

  <para>Here, we provide a few examples of common patterns in custom business
  logic. These examples primarily feature the listener interfaces. It should
  be noted that an <classname>ItemReader</classname> or
  <classname>ItemWriter</classname> can implement a listener interface as
  well, if appropriate.</para>

  <section id="loggingItemProcessingAndFailures">
    <title>Logging Item Processing and Failures</title>

    <para>A common use case is the need for special handling of errors in a
    step, item by item, perhaps logging to a special channel, or inserting a
    record into a database. A chunk-oriented <classname>Step</classname>
    (created from the step factory beans) allows users to implement this use
    case with a simple <classname>ItemReadListener</classname>, for errors on
    read, and an <classname>ItemWriteListener</classname>, for errors on
    write. The below code snippets illustrate a listener that logs both read
    and write failures:</para>

    <programlisting language="java">public class ItemFailureLoggerListener extends ItemListenerSupport {

    private static Log logger = LogFactory.getLog("item.error");

    public void onReadError(Exception ex) {
        logger.error("Encountered error on read", e);
    }

    public void onWriteError(Exception ex, Object item) {
        logger.error("Encountered error on write", ex);
    }

}</programlisting>

    <para>Having implemented this listener it must be registered with the
    step:</para>

    <programlisting language="xml">&lt;step id="simpleStep"&gt;
    ...
    &lt;listeners&gt;
        &lt;listener&gt;
            &lt;bean class="org.example...ItemFailureLoggerListener"/&gt;
        &lt;/listener&gt;
    &lt;/listeners&gt;
&lt;/step&gt;</programlisting>

    <para>Remember that if your listener does anything in an
    <code>onError()</code> method, it will be inside a transaction that is
    going to be rolled back. If you need to use a transactional resource such
    as a database inside an <code>onError()</code> method, consider adding a
    declarative transaction to that method (see Spring Core Reference Guide
    for details), and giving its propagation attribute the value
    REQUIRES_NEW.</para>
  </section>

  <section id="stoppingAJobManuallyForBusinessReasons">
    <title>Stopping a Job Manually for Business Reasons</title>

    <para>Spring Batch provides a <methodname>stop</methodname>() method
    through the <classname>JobLauncher</classname> interface, but this is
    really for use by the operator rather than the application programmer.
    Sometimes it is more convenient or makes more sense to stop a job
    execution from within the business logic.</para>

    <para>The simplest thing to do is to throw a
    <classname>RuntimeException</classname> (one that isn't retried
    indefinitely or skipped). For example, a custom exception type could be
    used, as in the example below:</para>

    <programlisting language="java">public class PoisonPillItemWriter implements ItemWriter&lt;T&gt; {

    public void write(T item) throws Exception {
        if (isPoisonPill(item)) {
            throw new PoisonPillException("Posion pill detected: " + item);
       }
    }

}</programlisting>

    <para>Another simple way to stop a step from executing is to simply return
    <code>null</code> from the <classname>ItemReader</classname>:</para>

    <programlisting language="java">public class EarlyCompletionItemReader implements ItemReader&lt;T&gt; {

    private ItemReader&lt;T&gt; delegate;

    public void setDelegate(ItemReader&lt;T&gt; delegate) { ... }

    public T read() throws Exception {
        T item = delegate.read();
        if (isEndItem(item)) {
            return null; // end the step here
        }
        return item;
    }

}</programlisting>

    <para>The previous example actually relies on the fact that there is a
    default implementation of the <classname>CompletionPolicy</classname>
    strategy which signals a complete batch when the item to be processed is
    null. A more sophisticated completion policy could be implemented and
    injected into the <classname>Step</classname> through the
    <classname>SimpleStepFactoryBean</classname>:</para>

    <programlisting language="xml">&lt;step id="simpleStep"&gt;
    &lt;tasklet&gt;
        &lt;chunk reader="reader" writer="writer" commit-interval="10"
               <emphasis role="bold">chunk-completion-policy="completionPolicy"</emphasis>/&gt;
    &lt;/tasklet&gt;
&lt;/step&gt;

&lt;bean id="completionPolicy" class="org.example...SpecialCompletionPolicy"/&gt;</programlisting>

    <para>An alternative is to set a flag in the
    <classname>StepExecution</classname>, which is checked by the
    <classname>Step</classname> implementations in the framework in between
    item processing. To implement this alternative, we need access to the
    current <classname>StepExecution</classname>, and this can be achieved by
    implementing a <classname>StepListener</classname> and registering it with
    the <classname>Step</classname>. Here is an example of a listener that
    sets the flag:</para>

    <programlisting language="java">public class CustomItemWriter extends ItemListenerSupport implements StepListener {

    private StepExecution stepExecution;

    public void beforeStep(StepExecution stepExecution) {
        this.stepExecution = stepExecution;
    }

    public void afterRead(Object item) {
        if (isPoisonPill(item)) {
            stepExecution.setTerminateOnly(true);
       }
    }

}</programlisting>

    <para>The default behavior here when the flag is set is for the step to
    throw a <classname>JobInterruptedException</classname>. This can be
    controlled through the <classname>StepInterruptionPolicy</classname>, but
    the only choice is to throw or not throw an exception, so this is always
    an abnormal ending to a job.</para>
  </section>

  <section id="addingAFooterRecord">
    <title>Adding a Footer Record</title>

    <para>Often when writing to flat files, a "footer" record must be appended
    to the end of the file, after all processing has be completed. This can
    also be achieved using the <classname>FlatFileFooterCallback</classname>
    interface provided by Spring Batch. The
    <classname>FlatFileFooterCallback</classname> (and its counterpart, the
    <classname>FlatFileHeaderCallback</classname>) are optional properties of
    the <classname>FlatFileItemWriter</classname>:</para>

    <programlisting language="xml">&lt;bean id="itemWriter" class="org.spr...FlatFileItemWriter"&gt;
    &lt;property name="resource" ref="outputResource" /&gt;
    &lt;property name="lineAggregator" ref="lineAggregator"/&gt;
    <emphasis role="bold">&lt;property name="headerCallback" ref="headerCallback" /&gt;</emphasis>
    <emphasis role="bold">&lt;property name="footerCallback" ref="footerCallback" /&gt;</emphasis>
&lt;/bean&gt;</programlisting>

    <para>The footer callback interface is very simple. It has just one method
    that is called when the footer must be written:</para>

    <programlisting language="java">public interface FlatFileFooterCallback {

    void writeFooter(Writer writer) throws IOException;

}</programlisting>

    <section id="writingASummaryFooter">
      <title>Writing a Summary Footer</title>

      <para>A very common requirement involving footer records is to aggregate
      information during the output process and to append this information to
      the end of the file. This footer serves as a summarization of the file
      or provides a checksum.</para>

      <para>For example, if a batch job is writing
      <classname>Trade</classname> records to a flat file, and there is a
      requirement that the total amount from all the
      <classname>Trade</classname>s is placed in a footer, then the following
      <classname>ItemWriter</classname> implementation can be used:</para>

      <programlisting language="java">public class TradeItemWriter implements ItemWriter&lt;Trade&gt;,
                                        FlatFileFooterCallback {

    private ItemWriter&lt;Trade&gt; delegate;

    private BigDecimal totalAmount = BigDecimal.ZERO;

    public void write(List&lt;? extends Trade&gt; items) {
        BigDecimal chunkTotal = BigDecimal.ZERO;
        for (Trade trade : items) {
            chunkTotal = chunkTotal.add(trade.getAmount());
        }

        delegate.write(items);

        // After successfully writing all items
        totalAmount = totalAmount.add(chunkTotal);
    }

    public void writeFooter(Writer writer) throws IOException {
        writer.write("Total Amount Processed: " + totalAmount);
    }

    public void setDelegate(ItemWriter delegate) {...}
}</programlisting>

      <para>This <classname>TradeItemWriter</classname> stores a
      <code>totalAmount</code> value that is increased with the
      <code>amount</code> from each <classname>Trade</classname> item written.
      After the last <classname>Trade</classname> is processed, the framework
      will call <methodname>writeFooter</methodname>, which will put that
      <code>totalAmount</code> into the file. Note that the
      <methodname>write</methodname> method makes use of a temporary variable,
      <varname>chunkTotalAmount</varname>, that stores the total of the trades
      in the chunk. This is done to ensure that if a skip occurs in the
      <methodname>write</methodname> method, that the
      <property>totalAmount</property> will be left unchanged. It is only at
      the end of the <methodname>write</methodname> method, once we are
      guaranteed that no exceptions will be thrown, that we update the
      <varname>totalAmount</varname>.</para>

      <para>In order for the <methodname>writeFooter</methodname> method to be
      called, the <classname>TradeItemWriter</classname> (which implements
      <classname>FlatFileFooterCallback</classname>) must be wired into the
      <classname>FlatFileItemWriter</classname> as the
      <code>footerCallback</code>:</para>

      <programlisting language="xml">&lt;bean id="tradeItemWriter" class="..TradeItemWriter"&gt;
    &lt;property name="delegate" ref="flatFileItemWriter" /&gt;
&lt;/bean&gt;

&lt;bean id="flatFileItemWriter" class="org.spr...FlatFileItemWriter"&gt;
   &lt;property name="resource" ref="outputResource" /&gt;
   &lt;property name="lineAggregator" ref="lineAggregator"/&gt;
  <emphasis role="bold"> &lt;property name="footerCallback" ref="tradeItemWriter" /&gt;</emphasis>
&lt;/bean&gt;</programlisting>

      <para>The way that the <classname>TradeItemWriter</classname> has been
      so far will only function correctly if the <classname>Step</classname>
      is not restartable. This is because the class is stateful (since it
      stores the <code>totalAmount</code>), but the <code>totalAmount</code>
      is not persisted to the database, and therefore, it cannot be retrieved
      in the event of a restart. In order to make this class restartable, the
      <classname>ItemStream</classname> interface should be implemented along
      with the methods <methodname>open</methodname> and
      <methodname>update</methodname>:</para>

      <programlisting language="java">public void open(ExecutionContext executionContext) {
    if (executionContext.containsKey("total.amount") {
        totalAmount = (BigDecimal) executionContext.get("total.amount");
    }
}

public void update(ExecutionContext executionContext) {
    executionContext.put("total.amount", totalAmount);
}</programlisting>

      <para>The <methodname>update</methodname> method will store the most
      current version of <code>totalAmount</code> to the
      <classname>ExecutionContext</classname> just before that object is
      persisted to the database. The <methodname>open</methodname> method will
      retrieve any existing <code>totalAmount</code> from the
      <classname>ExecutionContext</classname> and use it as the starting point
      for processing, allowing the <classname>TradeItemWriter</classname> to
      pick up on restart where it left off the previous time the
      <classname>Step</classname> was executed.</para>
    </section>
  </section>

  <section id="drivingQueryBasedItemReaders">
    <title>Driving Query Based ItemReaders</title>

    <para>In the chapter on readers and writers, database input using paging
    was discussed. Many database vendors, such as DB2, have extremely
    pessimistic locking strategies that can cause issues if the table being
    read also needs to be used by other portions of the online application.
    Furthermore, opening cursors over extremely large datasets can cause
    issues on certain vendors. Therefore, many projects prefer to use a
    'Driving Query' approach to reading in data. This approach works by
    iterating over keys, rather than the entire object that needs to be
    returned, as the following example illustrates:</para>

    <mediaobject>
      <imageobject role="html">
        <imagedata align="center" fileref="images/drivingQueryExample.png"
                   scale="80" />
      </imageobject>

      <imageobject role="fo">
        <imagedata align="center" fileref="images/drivingQueryExample.png"
                   scale="35" />
      </imageobject>
    </mediaobject>

    <para>As you can see, this example uses the same 'FOO' table as was used
    in the cursor based example. However, rather than selecting the entire
    row, only the ID's were selected in the SQL statement. So, rather than a
    FOO object being returned from <classname>read</classname>, an Integer
    will be returned. This number can then be used to query for the 'details',
    which is a complete Foo object:</para>

    <mediaobject>
      <imageobject role="html">
        <imagedata align="center" fileref="images/drivingQueryJob.png"
                   scale="85" />
      </imageobject>

      <imageobject role="fo">
        <imagedata align="center" fileref="images/drivingQueryJob.png"
                   scale="65" />
      </imageobject>
    </mediaobject>

    <para>An ItemProcessor should be used to transform the key obtained from
    the driving query into a full 'Foo' object. An existing DAO can be used to
    query for the full object based on the key.</para>
  </section>

  <section id="multiLineRecords">
    <title>Multi-Line Records</title>

    <para>While it is usually the case with flat files that one each record is
    confined to a single line, it is common that a file might have records
    spanning multiple lines with multiple formats. The following excerpt from
    a file illustrates this:</para>

    <programlisting>HEA;0013100345;2007-02-15
NCU;Smith;Peter;;T;20014539;F
BAD;;Oak Street 31/A;;Small Town;00235;IL;US
FOT;2;2;267.34</programlisting>

    <para>Everything between the line starting with 'HEA' and the line
    starting with 'FOT' is considered one record. There are a few
    considerations that must be made in order to handle this situation
    correctly:</para>

    <itemizedlist>
      <listitem>
        <para>Instead of reading one record at a time, the
        <classname>ItemReader</classname> must read every line of the
        multi-line record as a group, so that it can be passed to the
        <classname>ItemWriter</classname> intact.</para>
      </listitem>

      <listitem>
        <para>Each line type may need to be tokenized differently.</para>
      </listitem>
    </itemizedlist>

    <para>Because a single record spans multiple lines, and we may not know
    how many lines there are, the <classname>ItemReader</classname> must be
    careful to always read an entire record. In order to do this, a custom
    <classname>ItemReader</classname> should be implemented as a wrapper for
    the <classname>FlatFileItemReader</classname>.</para>

    <programlisting language="xml">&lt;bean id="itemReader" class="org.spr...MultiLineTradeItemReader"&gt;
    &lt;property name="delegate"&gt;
        &lt;bean class="org.springframework.batch.item.file.FlatFileItemReader"&gt;
            &lt;property name="resource" value="data/iosample/input/multiLine.txt" /&gt;
            &lt;property name="lineMapper"&gt;
                &lt;bean class="org.spr...DefaultLineMapper"&gt;
                    &lt;property name="lineTokenizer" ref="orderFileTokenizer"/&gt;
                    &lt;property name="fieldSetMapper"&gt;
                        &lt;bean class="org.spr...PassThroughFieldSetMapper" /&gt;
                    &lt;/property&gt;
                &lt;/bean&gt;
            &lt;/property&gt;
        &lt;/bean&gt;
    &lt;/property&gt;
&lt;/bean&gt;</programlisting>

    <para>To ensure that each line is tokenized properly, which is especially
    important for fixed length input, the
    <classname>PatternMatchingCompositeLineTokenizer</classname> can be used
    on the delegate <classname>FlatFileItemReader</classname>. See <xref
    linkend="prefixMatchingLineMapper" /> for more details. The delegate
    reader will then use a <classname>PassThroughFieldSetMapper</classname> to
    deliver a <classname>FieldSet</classname> for each line back to the
    wrapping <classname>ItemReader</classname>.</para>

    <programlisting language="xml">&lt;bean id="orderFileTokenizer" class="org.spr...PatternMatchingCompositeLineTokenizer"&gt;
    &lt;property name="tokenizers"&gt;
        &lt;map&gt;
            &lt;entry key="HEA*" value-ref="headerRecordTokenizer" /&gt;
            &lt;entry key="FOT*" value-ref="footerRecordTokenizer" /&gt;
            &lt;entry key="NCU*" value-ref="customerLineTokenizer" /&gt;
            &lt;entry key="BAD*" value-ref="billingAddressLineTokenizer" /&gt;
        &lt;/map&gt;
    &lt;/property&gt;
&lt;/bean&gt;</programlisting>

    <para>This wrapper will have to be able recognize the end of a record so
    that it can continually call <methodname>read()</methodname> on its
    delegate until the end is reached. For each line that is read, the wrapper
    should build up the item to be returned. Once the footer is reached, the
    item can be returned for delivery to the
    <classname>ItemProcessor</classname> and
    <classname>ItemWriter</classname>.</para>

    <programlisting language="java">private FlatFileItemReader&lt;FieldSet&gt; delegate;

public Trade read() throws Exception {
    Trade t = null;

    for (FieldSet line = null; (line = this.delegate.read()) != null;) {
        String prefix = line.readString(0);
        if (prefix.equals("HEA")) {
            t = new Trade(); // Record must start with header
        }
        else if (prefix.equals("NCU")) {
            Assert.notNull(t, "No header was found.");
            t.setLast(line.readString(1));
            t.setFirst(line.readString(2));
            ...
        }
        else if (prefix.equals("BAD")) {
            Assert.notNull(t, "No header was found.");
            t.setCity(line.readString(4));
            t.setState(line.readString(6));
          ...
        }
        else if (prefix.equals("FOT")) {
            return t; // Record must end with footer
        }
    }
    Assert.isNull(t, "No 'END' was found.");
    return null;
}</programlisting>
  </section>

  <section id="executingSystemCommands">
    <title>Executing System Commands</title>

    <para>Many batch jobs may require that an external command be called from
    within the batch job. Such a process could be kicked off separately by the
    scheduler, but the advantage of common meta-data about the run would be
    lost. Furthermore, a multi-step job would also need to be split up into
    multiple jobs as well.</para>

    <para>Because the need is so common, Spring Batch provides a
    <classname>Tasklet</classname> implementation for calling system
    commands:</para>

    <programlisting language="xml">&lt;bean class="org.springframework.batch.core.step.tasklet.SystemCommandTasklet"&gt;
    &lt;property name="command" value="echo hello" /&gt;
    &lt;!-- 5 second timeout for the command to complete --&gt;
    &lt;property name="timeout" value="5000" /&gt;
&lt;/bean&gt;</programlisting>
  </section>

  <section id="handlingStepCompletionWhenNoInputIsFound">
    <title>Handling Step Completion When No Input is Found</title>

    <para>In many batch scenarios, finding no rows in a database or file to
    process is not exceptional. The <classname>Step</classname> is simply
    considered to have found no work and completes with 0 items read. All of
    the <classname>ItemReader</classname> implementations provided out of the
    box in Spring Batch default to this approach. This can lead to some
    confusion if nothing is written out even when input is present. (which
    usually happens if a file was misnamed, etc) For this reason, the meta
    data itself should be inspected to determine how much work the framework
    found to be processed. However, what if finding no input is considered
    exceptional? In this case, programmatically checking the meta data for no
    items processed and causing failure is the best solution. Because this is
    a common use case, a listener is provided with just this
    functionality:</para>

    <programlisting language="java">public class NoWorkFoundStepExecutionListener extends StepExecutionListenerSupport {

    public ExitStatus afterStep(StepExecution stepExecution) {
        if (stepExecution.getReadCount() == 0) {
            return ExitStatus.FAILED;
        }
        return null;
    }

}</programlisting>

    <para>The above <classname>StepExecutionListener</classname> inspects the
    readCount property of the <classname>StepExecution</classname> during the
    'afterStep' phase to determine if no items were read. If that is the case,
    an exit code of FAILED is returned, indicating that the
    <classname>Step</classname> should fail. Otherwise, null is returned,
    which will not affect the status of the
    <classname>Step</classname>.</para>
  </section>

  <section id="passingDataToFutureSteps">
    <title>Passing Data to Future Steps</title>

    <para>It is often useful to pass information from one step to another.
    This can be done using the <classname>ExecutionContext</classname>. The
    catch is that there are two <classname>ExecutionContext</classname>s: one
    at the <classname>Step</classname> level and one at the
    <classname>Job</classname> level. The <classname>Step</classname>
    <classname>ExecutionContext</classname> lives only as long as the step
    while the <classname>Job</classname>
    <classname>ExecutionContext</classname> lives through the whole
    <classname>Job</classname>. On the other hand, the
    <classname>Step</classname> <classname>ExecutionContext</classname> is
    updated every time the <classname>Step</classname> commits a chunk while
    the <classname>Job</classname> <classname>ExecutionContext</classname> is
    updated only at the end of each <classname>Step</classname>.</para>

    <para>The consequence of this separation is that all data must be placed
    in the <classname>Step</classname> <classname>ExecutionContext</classname>
    while the <classname>Step</classname> is executing. This will ensure that
    the data will be stored properly while the <classname>Step</classname> is
    on-going. If data is stored to the <classname>Job</classname>
    <classname>ExecutionContext</classname>, then it will not be persisted
    during <classname>Step</classname> execution and if the
    <classname>Step</classname> fails, that data will be lost.</para>

    <programlisting language="java">public class SavingItemWriter implements ItemWriter&lt;Object&gt; {
    private StepExecution stepExecution;

    public void write(List&lt;? extends Object&gt; items) throws Exception {
        // ...

        ExecutionContext stepContext = this.stepExecution.getExecutionContext();
        stepContext.put("someKey", someObject);
    }

    @BeforeStep
    public void saveStepExecution(StepExecution stepExecution) {
        this.stepExecution = stepExecution;
    }
}</programlisting>

    <para>To make the data available to future <classname>Step</classname>s,
    it will have to be "promoted" to the <classname>Job</classname>
    <classname>ExecutionContext</classname> after the step has finished.
    Spring Batch provides the
    <classname>ExecutionContextPromotionListener</classname> for this purpose.
    The listener must be configured with the keys related to the data in the
    <classname>ExecutionContext</classname> that must be promoted. It can
    also, optionally, be configured with a list of exit code patterns for
    which the promotion should occur ("COMPLETED" is the default). As with all
    listeners, it must be registered on the
    <classname>Step</classname>.</para>

    <programlisting language="xml">&lt;job id="job1"&gt;
    &lt;step id="step1"&gt;
        &lt;tasklet&gt;
            &lt;chunk reader="reader" writer="savingWriter" commit-interval="10"/&gt;
        &lt;/tasklet&gt;
        &lt;listeners&gt;
            <emphasis role="bold">&lt;listener ref="promotionListener"/&gt;</emphasis>
        &lt;/listeners&gt;
    &lt;/step&gt;

    &lt;step id="step2"&gt;
       ...
    &lt;/step&gt;
&lt;/job&gt;

<emphasis role="bold">&lt;beans:bean id="promotionListener" class="org.spr....ExecutionContextPromotionListener"&gt;
    &lt;beans:property name="keys" value="someKey"/&gt;
&lt;/beans:bean&gt;</emphasis></programlisting>

    <para>Finally, the saved values must be retrieved from the
    <classname>Job</classname> <classname>ExeuctionContext</classname>:</para>

    <programlisting language="java">public class RetrievingItemWriter implements ItemWriter&lt;Object&gt; {
    private Object someObject;

    public void write(List&lt;? extends Object&gt; items) throws Exception {
        // ...
    }

    @BeforeStep
    public void retrieveInterstepData(StepExecution stepExecution) {
        JobExecution jobExecution = stepExecution.getJobExecution();
        ExecutionContext jobContext = jobExecution.getExecutionContext();
        this.someObject = jobContext.get("someKey");
    }
}</programlisting>
  </section>
</chapter>