spring-batch/build/reference-work/domain.xml

<?xml version="1.0" encoding="UTF-8"?>
<chapter xmlns="http://docbook.org/ns/docbook" version="5.0" xml:id="domain"
    xmlns:xlink="http://www.w3.org/1999/xlink" xreflabel="Batch Domain Language">
  <title>The Domain Language of Batch</title>

  <para>To any experienced batch architect, the overall concepts of batch
  processing used in Spring Batch should be familiar and comfortable. There
  are "Jobs" and "Steps" and developer supplied processing units called
  ItemReaders and ItemWriters. However, because of the Spring patterns,
  operations, templates, callbacks, and idioms, there are opportunities for
  the following:<itemizedlist>
      <listitem>
        <para>significant improvement in adherence to a clear separation of
        concerns</para>
      </listitem>

      <listitem>
        <para>clearly delineated architectural layers and services provided as
        interfaces</para>
      </listitem>

      <listitem>
        <para>simple and default implementations that allow for quick adoption
        and ease of use out-of-the-box</para>
      </listitem>

      <listitem>
        <para>significantly enhanced extensibility</para>
      </listitem>
    </itemizedlist></para>

  <para>The diagram below is simplified version of the batch reference
  architecture that has been used for decades. It provides an overview of the
  components that make up the domain language of batch processing. This
  architecture framework is a blueprint that has been proven through decades
  of implementations on the last several generations of platforms
  (COBOL/Mainframe, C++/Unix, and now Java/anywhere). JCL and COBOL developers
  are likely to be as comfortable with the concepts as C++, C# and Java
  developers. Spring Batch provides a physical implementation of the layers,
  components and technical services commonly found in robust, maintainable
  systems used to address the creation of simple to complex batch
  applications, with the infrastructure and extensions to address very complex
  processing needs.</para>

  <mediaobject>
    <imageobject role="html">
      <imagedata align="center"
                 fileref="images/spring-batch-reference-model.png"
                 format="PNG" scale="100" />
    </imageobject>

    <imageobject role="fo">
      <imagedata align="center"
                 fileref="images/spring-batch-reference-model.png"
                 format="PNG" scale="55" />
    </imageobject>

    <caption><para>Figure 2.1: Batch Stereotypes</para></caption>
  </mediaobject>

  <para>The diagram above highlights the key concepts that make up the domain
  language of batch. A Job has one to many steps, which has exactly one
  ItemReader, ItemProcessor, and ItemWriter. A job needs to be launched
  (JobLauncher), and meta data about the currently running process needs to be
  stored (JobRepository).</para>

  <section id="domainJob">
    <title id="jobStereotypes">Job</title>

    <para>This section describes stereotypes relating to the concept of a
    batch job. A <classname>Job</classname> is an entity that encapsulates an
    entire batch process. As is common with other Spring projects, a
    <classname>Job</classname> will be wired together via an XML configuration
    file or Java based configuration. This configuration may be referred to as
    the "job configuration". However, <classname>Job</classname> is just the
    top of an overall hierarchy:</para>

    <mediaobject>
      <imageobject role="html">
        <imagedata align="center" fileref="images/job-heirarchy.png"
                   scale="100" />
      </imageobject>

      <imageobject role="fo">
        <imagedata align="center" fileref="images/job-heirarchy.png"
                   scale="60" />
      </imageobject>
    </mediaobject>

    <para>In Spring Batch, a Job is simply a container for Steps. It combines
    multiple steps that belong logically together in a flow and allows for
    configuration of properties global to all steps, such as restartability.
    The job configuration contains:</para>

    <itemizedlist>
      <listitem>
        <para>The simple name of the job</para>
      </listitem>

      <listitem>
        <para>Definition and ordering of Steps</para>
      </listitem>

      <listitem>
        <para>Whether or not the job is restartable</para>
      </listitem>
    </itemizedlist>

    <para>A default simple implementation of the <classname>Job</classname>
    interface is provided by Spring Batch in the form of the
    <classname>SimpleJob</classname> class which creates some standard
    functionality on top of <classname>Job</classname>, however the batch
    namespace abstracts away the need to instantiate it directly. Instead, the
    <code>&lt;job&gt;</code> tag can be used:</para>

    <programlisting language="xml">&lt;job id="footballJob"&gt;
    &lt;step id="playerload" next="gameLoad"/&gt;
    &lt;step id="gameLoad" next="playerSummarization"/&gt;
    &lt;step id="playerSummarization"/&gt;
&lt;/job&gt;</programlisting>

    <section id="domainJobInstance">
      <title id="s.2.1.2">JobInstance</title>

      <para>A <classname>JobInstance</classname> refers to the concept of a
      logical job run. Let's consider a batch job that should be run once at
      the end of the day, such as the 'EndOfDay' job from the diagram above.
      There is one 'EndOfDay' <classname>Job</classname>, but each individual
      run of the <classname>Job</classname> must be tracked separately. In the
      case of this job, there will be one logical
      <classname>JobInstance</classname> per day. For example, there will be a
      January 1st run, and a January 2nd run. If the January 1st run fails the
      first time and is run again the next day, it is still the January 1st
      run. (Usually this corresponds with the data it is processing as well,
      meaning the January 1st run processes data for January 1st, etc).
      Therefore, each <classname>JobInstance</classname> can have multiple
      executions (<classname>JobExecution</classname> is discussed in more
      detail below) and only one <classname>JobInstance</classname>
      corresponding to a particular <classname>Job</classname> and
      identifying <classname>JobParameter</classname>s can be running at a given
      time.</para>

      <para>The definition of a <classname>JobInstance</classname> has
      absolutely no bearing on the data the will be loaded. It is entirely up
      to the <classname>ItemReader</classname> implementation used to
      determine how data will be loaded. For example, in the EndOfDay
      scenario, there may be a column on the data that indicates the
      'effective date' or 'schedule date' to which the data belongs. So, the
      January 1st run would only load data from the 1st, and the January 2nd
      run would only use data from the 2nd. Because this determination will
      likely be a business decision, it is left up to the
      <classname>ItemReader</classname> to decide. What using the same
      <classname>JobInstance</classname> will determine, however, is whether
      or not the 'state' (i.e. the <classname>ExecutionContext</classname>,
      which is discussed below) from previous executions will be used. Using a
      new <classname>JobInstance</classname> will mean 'start from the
      beginning' and using an existing instance will generally mean 'start
      from where you left off'.</para>
    </section>

    <section id="domainJobParameters">
      <title id="s.2.1.3">JobParameters</title>

      <para>Having discussed <classname>JobInstance</classname> and how it
      differs from <classname>Job</classname>, the natural question to ask is:
      "how is one <classname>JobInstance</classname> distinguished from
      another?" The answer is: <classname>JobParameters</classname>.
      <classname>JobParameters</classname> is a set of parameters used to
      start a batch job. They can be used for identification or even as
      reference data during the run:</para>

      <para><mediaobject>
          <imageobject role="html">
            <imagedata align="center"
                       fileref="images/job-stereotypes-parameters.png"
                       scale="100" />
          </imageobject>

          <imageobject role="fo">
            <imagedata align="center"
                       fileref="images/job-stereotypes-parameters.png"
                       scale="60" />
          </imageobject>
        </mediaobject></para>

      <para>In the example above, where there are two instances, one for
      January 1st, and another for January 2nd, there is really only one Job,
      one that was started with a job parameter of 01-01-2008 and another that
      was started with a parameter of 01-02-2008. Thus, the contract can be
      defined as: <classname>JobInstance</classname> =
      <classname>Job</classname> + identifying <classname>JobParameters</classname>. This
      allows a developer to effectively control how a
      <classname>JobInstance</classname> is defined, since they control what
      parameters are passed in.</para>
    </section>
	<note>
		<para>Not all job parameters are required to contribute to the identification
		of a <classname>JobInstance</classname>.  By default they do, however the framework
		allows the submission of a <classname>Job</classname> with parameters that do
		not contribute to the identity of a <classname>JobInstance</classname> as well.</para>
	</note>

    <section id="domainJobExecution">
      <title id="jobExecution">JobExecution</title>

      <para>A <classname>JobExecution</classname> refers to the technical
      concept of a single attempt to run a <classname>Job</classname>. An
      execution may end in failure or success, but the
      <classname>JobInstance</classname> corresponding to a given execution
      will not be considered complete unless the execution completes
      successfully. Using the EndOfDay <classname>Job</classname> described
      above as an example, consider a <classname>JobInstance</classname> for
      01-01-2008 that failed the first time it was run. If it is run again
      with the same identifying job parameters as the first run (01-01-2008), a new
      <classname>JobExecution</classname> will be created. However, there will
      still be only one <classname>JobInstance</classname>.</para>

      <para>A <classname>Job</classname> defines what a job is and how it is
      to be executed, and <classname>JobInstance</classname> is a purely
      organizational object to group executions together, primarily to enable
      correct restart semantics. A <classname>JobExecution</classname>,
      however, is the primary storage mechanism for what actually happened
      during a run, and as such contains many more properties that must be
      controlled and persisted:</para>

      <table>
        <title>JobExecution Properties</title>

        <tgroup cols="2">
          <colspec colname="c1" colwidth="*" />

          <colspec colname="c2" colwidth="4*" />

          <tbody>
            <row>
              <entry>status</entry>

              <entry>A <classname>BatchStatus</classname> object that
              indicates the status of the execution. While running, it's
              BatchStatus.STARTED, if it fails, it's BatchStatus.FAILED, and
              if it finishes successfully, it's BatchStatus.COMPLETED</entry>
            </row>

            <row>
              <entry>startTime</entry>

              <entry>A <classname>java.util.Date</classname> representing the
              current system time when the execution was started.</entry>
            </row>

            <row>
              <entry>endTime</entry>

              <entry>A <classname>java.util.Date</classname> representing the
              current system time when the execution finished, regardless of
              whether or not it was successful.</entry>
            </row>

            <row>
              <entry>exitStatus</entry>

              <entry>The <classname>ExitStatus</classname> indicating the
              result of the run. It is most important because it contains an
              exit code that will be returned to the caller. See chapter 5 for
              more details.</entry>
            </row>

            <row>
              <entry>createTime</entry>

              <entry>A <classname>java.util.Date</classname> representing the
              current system time when the <classname>JobExecution</classname>
              was first persisted. The job may not have been started yet (and
              thus has no start time), but it will always have a createTime,
              which is required by the framework for managing job level
              <classname>ExecutionContext</classname>s.</entry>
            </row>

            <row>
              <entry>lastUpdated</entry>

              <entry>A <classname>java.util.Date</classname> representing the
              last time a <classname>JobExecution</classname> was
              persisted.</entry>
            </row>

            <row>
              <entry>executionContext</entry>

              <entry>The 'property bag' containing any user data that needs to
              be persisted between executions.</entry>
            </row>

            <row>
              <entry>failureExceptions</entry>

              <entry>The list of exceptions encountered during the execution
              of a <classname>Job</classname>. These can be useful if more
              than one exception is encountered during the failure of a
              <classname>Job</classname>.</entry>
            </row>
          </tbody>
        </tgroup>
      </table>

      <para>These properties are important because they will be persisted and
      can be used to completely determine the status of an execution. For
      example, if the EndOfDay job for 01-01 is executed at 9:00 PM, and fails
      at 9:30, the following entries will be made in the batch meta data
      tables:</para>

      <table>
        <title>BATCH_JOB_INSTANCE</title>

        <tgroup cols="2">
          <tbody>
            <row>
              <entry>JOB_INST_ID</entry>

              <entry>JOB_NAME</entry>
            </row>

            <row>
              <entry>1</entry>

              <entry>EndOfDayJob</entry>
            </row>
          </tbody>
        </tgroup>
      </table>

      <table>
        <title>BATCH_JOB_EXECUTION_PARAMS</title>

        <tgroup cols="5">
          <tbody>
            <row>
              <entry>JOB_EXECUTION_ID</entry>

              <entry>TYPE_CD</entry>

              <entry>KEY_NAME</entry>

              <entry>DATE_VAL</entry>

              <entry>IDENTIFYING</entry>
            </row>

            <row>
              <entry>1</entry>

              <entry>DATE</entry>

              <entry>schedule.Date</entry>

              <entry>2008-01-01</entry>

              <entry>TRUE</entry>
            </row>
          </tbody>
        </tgroup>
      </table>

      <table>
        <title>BATCH_JOB_EXECUTION</title>

        <tgroup cols="5">
          <tbody>
            <row>
              <entry>JOB_EXEC_ID</entry>

              <entry>JOB_INST_ID</entry>

              <entry>START_TIME</entry>

              <entry>END_TIME</entry>

              <entry>STATUS</entry>
            </row>

            <row>
              <entry>1</entry>

              <entry>1</entry>

              <entry>2008-01-01 21:00</entry>

              <entry>2008-01-01 21:30</entry>

              <entry>FAILED</entry>
            </row>
          </tbody>
        </tgroup>
      </table>

      <note>
        <para>column names may have been abbreviated or removed for clarity
        and formatting</para>
      </note>

      <para>Now that the job has failed, let's assume that it took the entire
      course of the night for the problem to be determined, so that the 'batch
      window' is now closed. Assuming the window starts at 9:00 PM, the job
      will be kicked off again for 01-01, starting where it left off and
      completing successfully at 9:30. Because it's now the next day, the
      01-02 job must be run as well, which is kicked off just afterwards at
      9:31, and completes in its normal one hour time at 10:30. There is no
      requirement that one <classname>JobInstance</classname> be kicked off
      after another, unless there is potential for the two jobs to attempt to
      access the same data, causing issues with locking at the database level.
      It is entirely up to the scheduler to determine when a
      <classname>Job</classname> should be run. Since they're separate
      <classname>JobInstance</classname>s, Spring Batch will make no attempt
      to stop them from being run concurrently. (Attempting to run the same
      <classname>JobInstance</classname> while another is already running will
      result in a <classname>JobExecutionAlreadyRunningException</classname>
      being thrown). There should now be an extra entry in both the
      <classname>JobInstance</classname> and
      <classname>JobParameters</classname> tables, and two extra entries in
      the <classname>JobExecution</classname> table:</para>

      <table>
        <title>BATCH_JOB_INSTANCE</title>

        <tgroup cols="2">
          <tbody>
            <row>
              <entry>JOB_INST_ID</entry>

              <entry>JOB_NAME</entry>
            </row>

            <row>
              <entry>1</entry>

              <entry>EndOfDayJob</entry>
            </row>

            <row>
              <entry>2</entry>

              <entry>EndOfDayJob</entry>
            </row>
          </tbody>
        </tgroup>
      </table>

      <table>
        <title>BATCH_JOB_EXECUTION_PARAMS</title>

        <tgroup cols="5">
          <tbody>
            <row>
              <entry>JOB_EXECUTION_ID</entry>

              <entry>TYPE_CD</entry>

              <entry>KEY_NAME</entry>

              <entry>DATE_VAL</entry>

              <entry>IDENTIFYING</entry>
            </row>

            <row>
              <entry>1</entry>

              <entry>DATE</entry>

              <entry>schedule.Date</entry>

              <entry>2008-01-01 00:00:00</entry>

              <entry>TRUE</entry>
            </row>

            <row>
              <entry>2</entry>

              <entry>DATE</entry>

              <entry>schedule.Date</entry>

              <entry>2008-01-01 00:00:00</entry>

              <entry>TRUE</entry>
            </row>

            <row>
              <entry>3</entry>

              <entry>DATE</entry>

              <entry>schedule.Date</entry>

              <entry>2008-01-02 00:00:00</entry>

              <entry>TRUE</entry>
            </row>
          </tbody>
        </tgroup>
      </table>

      <table>
        <title>BATCH_JOB_EXECUTION</title>

        <tgroup cols="5">
          <tbody>
            <row>
              <entry>JOB_EXEC_ID</entry>

              <entry>JOB_INST_ID</entry>

              <entry>START_TIME</entry>

              <entry>END_TIME</entry>

              <entry>STATUS</entry>
            </row>

            <row>
              <entry>1</entry>

              <entry>1</entry>

              <entry>2008-01-01 21:00</entry>

              <entry>2008-01-01 21:30</entry>

              <entry>FAILED</entry>
            </row>

            <row>
              <entry>2</entry>

              <entry>1</entry>

              <entry>2008-01-02 21:00</entry>

              <entry>2008-01-02 21:30</entry>

              <entry>COMPLETED</entry>
            </row>

            <row>
              <entry>3</entry>

              <entry>2</entry>

              <entry>2008-01-02 21:31</entry>

              <entry>2008-01-02 22:29</entry>

              <entry>COMPLETED</entry>
            </row>
          </tbody>
        </tgroup>
      </table>

      <note>
        <para>column names may have been abbreviated or removed for clarity
        and formatting</para>
      </note>
    </section>
  </section>

  <section id="domainStep">
    <title id="s.2.1">Step</title>

    <para>A <classname>Step</classname> is a domain object that encapsulates
    an independent, sequential phase of a batch job. Therefore, every
    <classname>Job</classname> is composed entirely of one or more steps. A
    <classname>Step</classname> contains all of the information necessary to
    define and control the actual batch processing. This is a necessarily
    vague description because the contents of any given
    <classname>Step</classname> are at the discretion of the developer writing
    a <classname>Job</classname>. A Step can be as simple or complex as the
    developer desires. A simple <classname>Step</classname> might load data
    from a file into the database, requiring little or no code. (depending
    upon the implementations used) A more complex <classname>Step</classname>
    may have complicated business rules that are applied as part of the
    processing. As with <classname>Job</classname>, a
    <classname>Step</classname> has an individual
    <classname>StepExecution</classname> that corresponds with a unique
    <classname>JobExecution</classname>:</para>

    <mediaobject>
      <imageobject role="html">
        <imagedata align="center" fileref="images/jobHeirarchyWithSteps.png"
                   scale="80" />
      </imageobject>

      <imageobject role="fo">
        <imagedata align="center" fileref="images/jobHeirarchyWithSteps.png"
                   scale="60" />
      </imageobject>
    </mediaobject>

    <section id="domainStepExecution">
      <title id="stepExecution">StepExecution</title>

      <para>A <classname>StepExecution</classname> represents a single attempt
      to execute a <classname>Step</classname>. A new
      <classname>StepExecution</classname> will be created each time a
      <classname>Step</classname> is run, similar to
      <classname>JobExecution</classname>. However, if a step fails to execute
      because the step before it fails, there will be no execution persisted
      for it. A <classname>StepExecution</classname> will only be created when
      its <classname>Step</classname> is actually started.</para>

      <para>Step executions are represented by objects of the
      <classname>StepExecution</classname> class. Each execution contains a
      reference to its corresponding step and
      <classname>JobExecution</classname>, and transaction related data such
      as commit and rollback count and start and end times. Additionally, each
      step execution will contain an <classname>ExecutionContext</classname>,
      which contains any data a developer needs persisted across batch runs,
      such as statistics or state information needed to restart. The following
      is a listing of the properties for
      <classname>StepExecution</classname>:</para>

      <table>
        <title>StepExecution Properties</title>

        <tgroup cols="2">
          <colspec colname="c1" colwidth="*" />

          <colspec colname="c2" colwidth="4*" />

          <tbody>
            <row>
              <entry>status</entry>

              <entry>A <classname>BatchStatus</classname> object that
              indicates the status of the execution. While it's running, the
              status is BatchStatus.STARTED, if it fails, the status is
              BatchStatus.FAILED, and if it finishes successfully, the status
              is BatchStatus.COMPLETED</entry>
            </row>

            <row>
              <entry>startTime</entry>

              <entry>A <classname>java.util.Date</classname> representing the
              current system time when the execution was started.</entry>
            </row>

            <row>
              <entry>endTime</entry>

              <entry>A <classname>java.util.Date</classname> representing the
              current system time when the execution finished, regardless of
              whether or not it was successful.</entry>
            </row>

            <row>
              <entry>exitStatus</entry>

              <entry>The <classname>ExitStatus</classname> indicating the
              result of the execution. It is most important because it
              contains an exit code that will be returned to the caller. See
              chapter 5 for more details.</entry>
            </row>

            <row>
              <entry>executionContext</entry>

              <entry>The 'property bag' containing any user data that needs to
              be persisted between executions.</entry>
            </row>

            <row>
              <entry>readCount</entry>

              <entry>The number of items that have been successfully
              read</entry>
            </row>

            <row>
              <entry>writeCount</entry>

              <entry>The number of items that have been successfully
              written</entry>
            </row>

            <row>
              <entry>commitCount</entry>

              <entry>The number transactions that have been committed for this
              execution</entry>
            </row>

            <row>
              <entry>rollbackCount</entry>

              <entry>The number of times the business transaction controlled
              by the <classname>Step</classname> has been rolled back.</entry>
            </row>

            <row>
              <entry>readSkipCount</entry>

              <entry>The number of times <methodname>read</methodname> has
              failed, resulting in a skipped item.</entry>
            </row>

            <row>
              <entry>processSkipCount</entry>

              <entry>The number of times <methodname>process</methodname> has
              failed, resulting in a skipped item.</entry>
            </row>

            <row>
              <entry>filterCount</entry>

              <entry>The number of items that have been 'filtered' by the
              <classname>ItemProcessor</classname>.</entry>
            </row>

            <row>
              <entry>writeSkipCount</entry>

              <entry>The number of times <methodname>write</methodname> has
              failed, resulting in a skipped item.</entry>
            </row>
          </tbody>
        </tgroup>
      </table>
    </section>
  </section>

  <section id="domainExecutionContext">
    <title>ExecutionContext</title>

    <para>An <classname>ExecutionContext</classname> represents a collection
    of key/value pairs that are persisted and controlled by the framework in
    order to allow developers a place to store persistent state that is scoped
    to a <classname>StepExecution</classname> or
    <classname>JobExecution</classname>. For those familiar with Quartz, it is
    very similar to <classname>JobDataMap</classname>. The best usage example
    is to facilitate restart. Using flat file input as an example, while
    processing individual lines, the framework periodically persists the
    <classname>ExecutionContext</classname> at commit points. This allows the
    <classname>ItemReader</classname> to store its state in case a fatal error
    occurs during the run, or even if the power goes out. All that is needed
    is to put the current number of lines read into the context, and the
    framework will do the rest:</para>

    <programlisting language="java">executionContext.putLong(getKey(LINES_READ_COUNT), reader.getPosition());</programlisting>

    <para>Using the EndOfDay example from the Job Stereotypes section as an
    example, assume there's one step: 'loadData', that loads a file into the
    database. After the first failed run, the meta data tables would look like
    the following:</para>

    <para><table>
        <title>BATCH_JOB_INSTANCE</title>

        <tgroup cols="2">
          <tbody>
            <row>
              <entry>JOB_INST_ID</entry>

              <entry>JOB_NAME</entry>
            </row>

            <row>
              <entry>1</entry>

              <entry>EndOfDayJob</entry>
            </row>
          </tbody>
        </tgroup>
      </table><table>
        <title>BATCH_JOB_PARAMS</title>

        <tgroup cols="4">
          <tbody>
            <row>
              <entry>JOB_INST_ID</entry>

              <entry>TYPE_CD</entry>

              <entry>KEY_NAME</entry>

              <entry>DATE_VAL</entry>
            </row>

            <row>
              <entry>1</entry>

              <entry>DATE</entry>

              <entry>schedule.Date</entry>

              <entry>2008-01-01</entry>
            </row>
          </tbody>
        </tgroup>
      </table><table>
        <title>BATCH_JOB_EXECUTION</title>

        <tgroup cols="5">
          <tbody>
            <row>
              <entry>JOB_EXEC_ID</entry>

              <entry>JOB_INST_ID</entry>

              <entry>START_TIME</entry>

              <entry>END_TIME</entry>

              <entry>STATUS</entry>
            </row>

            <row>
              <entry>1</entry>

              <entry>1</entry>

              <entry>2008-01-01 21:00</entry>

              <entry>2008-01-01 21:30</entry>

              <entry>FAILED</entry>
            </row>
          </tbody>
        </tgroup>
      </table><table>
        <title>BATCH_STEP_EXECUTION</title>

        <tgroup cols="6">
          <tbody>
            <row>
              <entry>STEP_EXEC_ID</entry>

              <entry>JOB_EXEC_ID</entry>

              <entry>STEP_NAME</entry>

              <entry>START_TIME</entry>

              <entry>END_TIME</entry>

              <entry>STATUS</entry>
            </row>

            <row>
              <entry>1</entry>

              <entry>1</entry>

              <entry>loadData</entry>

              <entry>2008-01-01 21:00</entry>

              <entry>2008-01-01 21:30</entry>

              <entry>FAILED</entry>
            </row>
          </tbody>
        </tgroup>
      </table><table>
        <title>BATCH_STEP_EXECUTION_CONTEXT</title>

        <tgroup cols="2">
          <tbody>
            <row>
              <entry>STEP_EXEC_ID</entry>

              <entry>SHORT_CONTEXT</entry>
            </row>

            <row>
              <entry>1</entry>

              <entry>{piece.count=40321}</entry>
            </row>
          </tbody>
        </tgroup>
      </table>In this case, the <classname>Step</classname> ran for 30 minutes
    and processed 40,321 'pieces', which would represent lines in a file in
    this scenario. This value will be updated just before each commit by the
    framework, and can contain multiple rows corresponding to entries within
    the <classname>ExecutionContext</classname>. Being notified before a
    commit requires one of the various <classname>StepListener</classname>s,
    or an <classname>ItemStream</classname>, which are discussed in more
    detail later in this guide. As with the previous example, it is assumed
    that the <classname>Job</classname> is restarted the next day. When it is
    restarted, the values from the <classname>ExecutionContext</classname> of
    the last run are reconstituted from the database, and when the
    <classname>ItemReader</classname> is opened, it can check to see if it has
    any stored state in the context, and initialize itself from there:</para>

    <programlisting language="java">if (executionContext.containsKey(getKey(LINES_READ_COUNT))) {
    log.debug("Initializing for restart. Restart data is: " + executionContext);

    long lineCount = executionContext.getLong(getKey(LINES_READ_COUNT));

    LineReader reader = getReader();

    Object record = "";
    while (reader.getPosition() &lt; lineCount &amp;&amp; record != null) {
        record = readLine();
    }
}</programlisting>

    <para>In this case, after the above code is executed, the current line
    will be 40,322, allowing the <classname>Step</classname> to start again
    from where it left off. The <classname>ExecutionContext</classname> can
    also be used for statistics that need to be persisted about the run
    itself. For example, if a flat file contains orders for processing that
    exist across multiple lines, it may be necessary to store how many orders
    have been processed (which is much different from than the number of lines
    read) so that an email can be sent at the end of the
    <classname>Step</classname> with the total orders processed in the body.
    The framework handles storing this for the developer, in order to
    correctly scope it with an individual <classname>JobInstance</classname>.
    It can be very difficult to know whether an existing
    <classname>ExecutionContext</classname> should be used or not. For
    example, using the 'EndOfDay' example from above, when the 01-01 run
    starts again for the second time, the framework recognizes that it is the
    same <classname>JobInstance</classname> and on an individual
    <classname>Step</classname> basis, pulls the
    <classname>ExecutionContext</classname> out of the database and hands it
    as part of the <classname>StepExecution</classname> to the
    <classname>Step</classname> itself. Conversely, for the 01-02 run the
    framework recognizes that it is a different instance, so an empty context
    must be handed to the <classname>Step</classname>. There are many of these
    types of determinations that the framework makes for the developer to
    ensure the state is given to them at the correct time. It is also
    important to note that exactly one <classname>ExecutionContext</classname>
    exists per <classname>StepExecution</classname> at any given time. Clients
    of the <classname>ExecutionContext</classname> should be careful because
    this creates a shared keyspace, so care should be taken when putting
    values in to ensure no data is overwritten. However, the
    <classname>Step</classname> stores absolutely no data in the context, so
    there is no way to adversely affect the framework.</para>

    <para>It is also important to note that there is at least one
    <classname>ExecutionContext</classname> per
    <classname>JobExecution</classname>, and one for every
    <classname>StepExecution</classname>. For example, consider the following
    code snippet:</para>

    <programlisting language="java">ExecutionContext ecStep = stepExecution.getExecutionContext();
ExecutionContext ecJob = jobExecution.getExecutionContext();
//ecStep does not equal ecJob</programlisting>

    <para>As noted in the comment, ecStep will not equal ecJob; they are two
    different <classname>ExecutionContext</classname>s. The one scoped to the
    <classname>Step</classname> will be saved at every commit point in the
    <classname>Step</classname>, whereas the one scoped to the
    <classname>Job</classname> will be saved in between every
    <classname>Step</classname> execution.</para>
  </section>

  <section id="domainJobRepository">
    <title>JobRepository</title>

    <para><classname>JobRepository</classname> is the persistence mechanism
    for all of the Stereotypes mentioned above. It provides CRUD operations
    for <classname>JobLauncher</classname>, <classname>Job</classname>, and
    <classname>Step</classname> implementations. When a
    <classname>Job</classname> is first launched, a
    <classname>JobExecution</classname> is obtained from the repository, and
    during the course of execution <classname>StepExecution</classname> and
    <classname>JobExecution</classname> implementations are persisted by
    passing them to the repository:</para>

    <programlisting language="xml">&lt;job-repository id="jobRepository"/&gt;</programlisting>
  </section>

  <section id="domainJobLauncher">
    <title>JobLauncher</title>

    <para><classname>JobLauncher </classname>represents a simple interface for
    launching a <classname>Job</classname> with a given set of
    <classname>JobParameters</classname>:</para>

    <programlisting language="java">public interface JobLauncher {

    public JobExecution run(Job job, JobParameters jobParameters)
                throws JobExecutionAlreadyRunningException, JobRestartException;
}</programlisting>

    <para>It is expected that implementations will obtain a valid
    <classname>JobExecution</classname> from the
    <classname>JobRepository</classname> and execute the
    <classname>Job</classname>.</para>
  </section>

  <section id="domainItemReader">
    <title id="s.5.1.1">Item Reader</title>

    <para><classname>ItemReader</classname> is an abstraction that represents
    the retrieval of input for a <classname>Step</classname>, one item at a
    time. When the <classname>ItemReader</classname> has exhausted the items
    it can provide, it will indicate this by returning null. More details
    about the <classname>ItemReader</classname> interface and its various
    implementations can be found in <xref
    linkend="readersAndWriters" />.</para>
  </section>

  <section id="domainItemWriter">
    <title id="s.5.1.2">Item Writer</title>

    <para><classname>ItemWriter</classname> is an abstraction that
    represents the output of a <classname>Step</classname>, one batch
    or chunk of items at a time.  Generally, an item writer has no
    knowledge of the input it will receive next, only the item that
    was passed in its current invocation. More details about the
    <classname>ItemWriter</classname> interface and its various
    implementations can be found in <xref linkend="readersAndWriters"
    />.</para>
  </section>

  <section id="domainItemProcessor">
    <title>Item Processor</title>

    <para><classname>ItemProcessor</classname> is an abstraction that
    represents the business processing of an item. While the
    <classname>ItemReader</classname> reads one item, and the
    <classname>ItemWriter</classname> writes them, the
    <classname>ItemProcessor</classname> provides access to transform or apply
    other business processing. If, while processing the item, it is determined
    that the item is not valid, returning null indicates that the item should
    not be written out. More details about the ItemProcessor interface can be
    found in <xref linkend="readersAndWriters" />.</para>
  </section>

  <section id="domainBatchNamespace">
    <title>Batch Namespace</title>

    <para>Many of the domain concepts listed above need to be configured in a
    Spring <classname>ApplicationContext</classname>. While there are
    implementations of the interfaces above that can be used in a standard
    bean definition, a namespace has been provided for ease of
    configuration:</para>

    <programlisting language="xml">&lt;beans:beans xmlns="<emphasis role="bold">http://www.springframework.org/schema/batch</emphasis>"
     xmlns:beans="http://www.springframework.org/schema/beans"
     xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
     xsi:schemaLocation="
           http://www.springframework.org/schema/beans
           http://www.springframework.org/schema/beans/spring-beans.xsd
           <emphasis role="bold">http://www.springframework.org/schema/batch
           http://www.springframework.org/schema/batch/spring-batch-2.2.xsd</emphasis>"&gt;

    &lt;job id="ioSampleJob"&gt;
        &lt;step id="step1"&gt;
            &lt;tasklet&gt;
                &lt;chunk reader="itemReader" writer="itemWriter" commit-interval="2"/&gt;
            &lt;/tasklet&gt;
        &lt;/step&gt;
    &lt;/job&gt;

&lt;/beans:beans&gt;</programlisting>

    <para>As long as the batch namespace has been declared, any of its
    elements can be used. More information on configuring a
    <classname>Job</classname> can be found in <xref
    linkend="configureJob" />. More information on configuring a Step can be
    found in <xref linkend="configureStep" />.</para>
  </section>
</chapter>