<?xml version="1.0" encoding="UTF-8"?>
<!-- spring-batch/build/reference-work/domain.xml; last commit 75ab909314 ("update") by Michael Minella, 2017-03-23 10:18:33 -05:00 -->
<chapter xmlns="http://docbook.org/ns/docbook" version="5.0" xml:id="domain"
xmlns:xlink="http://www.w3.org/1999/xlink" xreflabel="Batch Domain Language">
<title>The Domain Language of Batch</title>
<para>To any experienced batch architect, the overall concepts of batch
processing used in Spring Batch should be familiar and comfortable. There
are "Jobs" and "Steps" and developer-supplied processing units called
ItemReaders and ItemWriters. However, because of the Spring patterns,
operations, templates, callbacks, and idioms, there are opportunities for
the following:<itemizedlist>
<listitem>
<para>significant improvement in adherence to a clear separation of
concerns</para>
</listitem>
<listitem>
<para>clearly delineated architectural layers and services provided as
interfaces</para>
</listitem>
<listitem>
<para>simple and default implementations that allow for quick adoption
and ease of use out-of-the-box</para>
</listitem>
<listitem>
<para>significantly enhanced extensibility</para>
</listitem>
</itemizedlist></para>
<para>The diagram below is a simplified version of the batch reference
architecture that has been used for decades. It provides an overview of the
components that make up the domain language of batch processing. This
architecture framework is a blueprint that has been proven through decades
of implementations on the last several generations of platforms
(COBOL/Mainframe, C++/Unix, and now Java/anywhere). JCL and COBOL developers
are likely to be as comfortable with the concepts as C++, C#, and Java
developers. Spring Batch provides a physical implementation of the layers,
components, and technical services commonly found in the robust, maintainable
systems used to build batch applications ranging from simple to complex,
along with the infrastructure and extensions needed for very complex
processing needs.</para>
<mediaobject>
<imageobject role="html">
<imagedata align="center"
fileref="images/spring-batch-reference-model.png"
format="PNG" scale="100" />
</imageobject>
<imageobject role="fo">
<imagedata align="center"
fileref="images/spring-batch-reference-model.png"
format="PNG" scale="55" />
</imageobject>
<caption><para>Figure 2.1: Batch Stereotypes</para></caption>
</mediaobject>
<para>The diagram above highlights the key concepts that make up the domain
language of batch. A Job has one to many steps, each of which has exactly one
ItemReader, one ItemProcessor, and one ItemWriter. A job needs to be launched
(JobLauncher), and meta data about the currently running process needs to be
stored (JobRepository).</para>
<section id="domainJob">
<title id="jobStereotypes">Job</title>
<para>This section describes stereotypes relating to the concept of a
batch job. A <classname>Job</classname> is an entity that encapsulates an
entire batch process. As is common with other Spring projects, a
<classname>Job</classname> is wired together with either an XML configuration
file or Java-based configuration. This configuration may be referred to as
the "job configuration". However, <classname>Job</classname> is just the
top of an overall hierarchy:</para>
<mediaobject>
<imageobject role="html">
<imagedata align="center" fileref="images/job-heirarchy.png"
scale="100" />
</imageobject>
<imageobject role="fo">
<imagedata align="center" fileref="images/job-heirarchy.png"
scale="60" />
</imageobject>
</mediaobject>
<para>In Spring Batch, a Job is simply a container for Steps. It combines
multiple steps that belong logically together in a flow and allows for
configuration of properties global to all steps, such as restartability.
The job configuration contains:</para>
<itemizedlist>
<listitem>
<para>The simple name of the job</para>
</listitem>
<listitem>
<para>Definition and ordering of Steps</para>
</listitem>
<listitem>
<para>Whether or not the job is restartable</para>
</listitem>
</itemizedlist>
<para>Spring Batch provides a default implementation of the
<classname>Job</classname> interface in the form of the
<classname>SimpleJob</classname> class, which adds some standard
functionality on top of <classname>Job</classname>. However, the batch
namespace removes the need to instantiate it directly. Instead, the
<code>&lt;job&gt;</code> tag can be used:</para>
<programlisting language="xml">&lt;job id="footballJob"&gt;
&lt;step id="playerload" next="gameLoad"/&gt;
&lt;step id="gameLoad" next="playerSummarization"/&gt;
&lt;step id="playerSummarization"/&gt;
&lt;/job&gt;</programlisting>
<section id="domainJobInstance">
<title id="s.2.1.2">JobInstance</title>
<para>A <classname>JobInstance</classname> refers to the concept of a
logical job run. Let's consider a batch job that should be run once at
the end of the day, such as the 'EndOfDay' job from the diagram above.
There is one 'EndOfDay' <classname>Job</classname>, but each individual
run of the <classname>Job</classname> must be tracked separately. In the
case of this job, there will be one logical
<classname>JobInstance</classname> per day. For example, there will be a
January 1st run, and a January 2nd run. If the January 1st run fails the
first time and is run again the next day, it is still the January 1st
run. (Usually this corresponds with the data it is processing as well,
meaning the January 1st run processes data for January 1st, etc.)
Therefore, each <classname>JobInstance</classname> can have multiple
executions (<classname>JobExecution</classname> is discussed in more
detail below), and only one <classname>JobInstance</classname>
corresponding to a particular <classname>Job</classname> and
identifying <classname>JobParameters</classname> can be running at a given
time.</para>
<para>The definition of a <classname>JobInstance</classname> has
absolutely no bearing on the data to be loaded. It is entirely up
to the <classname>ItemReader</classname> implementation to
determine how data will be loaded. For example, in the EndOfDay
scenario, there may be a column on the data that indicates the
'effective date' or 'schedule date' to which the data belongs. So, the
January 1st run would only load data from the 1st, and the January 2nd
run would only use data from the 2nd. Because this determination will
likely be a business decision, it is left up to the
<classname>ItemReader</classname> to decide. What using the same
<classname>JobInstance</classname> does determine, however, is whether
or not the 'state' (i.e., the <classname>ExecutionContext</classname>,
which is discussed below) from previous executions will be used. Using a
new <classname>JobInstance</classname> means 'start from the
beginning', and using an existing instance generally means 'start
from where you left off'.</para>
</section>
<section id="domainJobParameters">
<title id="s.2.1.3">JobParameters</title>
<para>Having discussed <classname>JobInstance</classname> and how it
differs from <classname>Job</classname>, the natural question to ask is:
"how is one <classname>JobInstance</classname> distinguished from
another?" The answer is: <classname>JobParameters</classname>.
<classname>JobParameters</classname> is a set of parameters used to
start a batch job. They can be used for identification or even as
reference data during the run:</para>
<para><mediaobject>
<imageobject role="html">
<imagedata align="center"
fileref="images/job-stereotypes-parameters.png"
scale="100" />
</imageobject>
<imageobject role="fo">
<imagedata align="center"
fileref="images/job-stereotypes-parameters.png"
scale="60" />
</imageobject>
</mediaobject></para>
<para>In the example above, where there are two instances, one for
January 1st and another for January 2nd, there is really only one Job.
It was started once with a job parameter of 01-01-2008 and once with a
parameter of 01-02-2008. Thus, the contract can be
defined as: <classname>JobInstance</classname> =
<classname>Job</classname> + identifying <classname>JobParameters</classname>. This
allows a developer to effectively control how a
<classname>JobInstance</classname> is defined, since they control what
parameters are passed in.</para>
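<para>The contract above can be illustrated with a plain-Java sketch. This
is a hypothetical model for illustration only, not the Spring Batch API:
<classname>JobParamSketch</classname> and
<classname>InstanceKeySketch</classname> are invented names, and parameter
values are simplified to strings. The key keeps only the parameters flagged
as identifying, so two launches that differ only in a non-identifying
parameter resolve to the same instance, while a different schedule date
yields a new one.</para>
<programlisting language="java">import java.util.Objects;
import java.util.SortedMap;
import java.util.TreeMap;

/** A job parameter: name, value, and whether it contributes to instance identity. */
final class JobParamSketch {
    final String name;
    final String value;
    final boolean identifying;

    JobParamSketch(String name, String value, boolean identifying) {
        this.name = name;
        this.value = value;
        this.identifying = identifying;
    }
}

/** Hypothetical identity of a job instance: job name plus identifying parameters only. */
final class InstanceKeySketch {
    private final String jobName;
    private final SortedMap&lt;String, String&gt; identifyingParams;

    private InstanceKeySketch(String jobName, SortedMap&lt;String, String&gt; identifyingParams) {
        this.jobName = jobName;
        this.identifyingParams = identifyingParams;
    }

    static InstanceKeySketch of(String jobName, JobParamSketch... params) {
        // Non-identifying parameters are dropped; the sorted map ignores supply order.
        SortedMap&lt;String, String&gt; ids = new TreeMap&lt;&gt;();
        for (JobParamSketch p : params) {
            if (p.identifying) {
                ids.put(p.name, p.value);
            }
        }
        return new InstanceKeySketch(jobName, ids);
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof InstanceKeySketch)) {
            return false;
        }
        InstanceKeySketch other = (InstanceKeySketch) o;
        return jobName.equals(other.jobName) &amp;&amp; identifyingParams.equals(other.identifyingParams);
    }

    @Override
    public int hashCode() {
        return Objects.hash(jobName, identifyingParams);
    }
}</programlisting>
<para>For example, <code>InstanceKeySketch.of("EndOfDayJob", new
JobParamSketch("schedule.Date", "2008-01-01", true))</code> is equal to a key
built from the same date plus an extra non-identifying parameter, but not to
the key built for 2008-01-02.</para>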
<note>
<para>Not all job parameters are required to contribute to the identification
of a <classname>JobInstance</classname>. By default they do; however, the framework
also allows a <classname>Job</classname> to be submitted with parameters that do
not contribute to the identity of a <classname>JobInstance</classname>.</para>
</note>
</section>
<section id="domainJobExecution">
<title id="jobExecution">JobExecution</title>
<para>A <classname>JobExecution</classname> refers to the technical
concept of a single attempt to run a <classname>Job</classname>. An
execution may end in failure or success, but the
<classname>JobInstance</classname> corresponding to a given execution
will not be considered complete unless the execution completes
successfully. Using the EndOfDay <classname>Job</classname> described
above as an example, consider a <classname>JobInstance</classname> for
01-01-2008 that failed the first time it was run. If it is run again
with the same identifying job parameters as the first run (01-01-2008), a new
<classname>JobExecution</classname> will be created. However, there will
still be only one <classname>JobInstance</classname>.</para>
<para>A <classname>Job</classname> defines what a job is and how it is
to be executed, and <classname>JobInstance</classname> is a purely
organizational object to group executions together, primarily to enable
correct restart semantics. A <classname>JobExecution</classname>,
however, is the primary storage mechanism for what actually happened
during a run, and as such contains many more properties that must be
controlled and persisted:</para>
<table>
<title>JobExecution Properties</title>
<tgroup cols="2">
<colspec colname="c1" colwidth="*" />
<colspec colname="c2" colwidth="4*" />
<tbody>
<row>
<entry>status</entry>
<entry>A <classname>BatchStatus</classname> object that
indicates the status of the execution. While it is running, the
status is BatchStatus.STARTED; if it fails, the status is
BatchStatus.FAILED; and if it finishes successfully, the status is
BatchStatus.COMPLETED.</entry>
</row>
<row>
<entry>startTime</entry>
<entry>A <classname>java.util.Date</classname> representing the
current system time when the execution was started.</entry>
</row>
<row>
<entry>endTime</entry>
<entry>A <classname>java.util.Date</classname> representing the
current system time when the execution finished, regardless of
whether or not it was successful.</entry>
</row>
<row>
<entry>exitStatus</entry>
<entry>The <classname>ExitStatus</classname> indicating the
result of the run. It is the most important property because it
contains the exit code that is returned to the caller. See chapter 5
for more details.</entry>
</row>
<row>
<entry>createTime</entry>
<entry>A <classname>java.util.Date</classname> representing the
current system time when the <classname>JobExecution</classname>
was first persisted. The job may not have been started yet (and
thus has no start time), but it will always have a createTime,
which is required by the framework for managing job level
<classname>ExecutionContext</classname>s.</entry>
</row>
<row>
<entry>lastUpdated</entry>
<entry>A <classname>java.util.Date</classname> representing the
last time a <classname>JobExecution</classname> was
persisted.</entry>
</row>
<row>
<entry>executionContext</entry>
<entry>The 'property bag' containing any user data that needs to
be persisted between executions.</entry>
</row>
<row>
<entry>failureExceptions</entry>
<entry>The list of exceptions encountered during the execution
of a <classname>Job</classname>. These can be useful if more
than one exception is encountered during the failure of a
<classname>Job</classname>.</entry>
</row>
</tbody>
</tgroup>
</table>
<para>These properties are important because they will be persisted and
can be used to completely determine the status of an execution. For
example, if the EndOfDay job for 01-01 is executed at 9:00 PM, and fails
at 9:30, the following entries will be made in the batch meta data
tables:</para>
<table>
<title>BATCH_JOB_INSTANCE</title>
<tgroup cols="2">
<tbody>
<row>
<entry>JOB_INST_ID</entry>
<entry>JOB_NAME</entry>
</row>
<row>
<entry>1</entry>
<entry>EndOfDayJob</entry>
</row>
</tbody>
</tgroup>
</table>
<table>
<title>BATCH_JOB_EXECUTION_PARAMS</title>
<tgroup cols="5">
<tbody>
<row>
<entry>JOB_EXECUTION_ID</entry>
<entry>TYPE_CD</entry>
<entry>KEY_NAME</entry>
<entry>DATE_VAL</entry>
<entry>IDENTIFYING</entry>
</row>
<row>
<entry>1</entry>
<entry>DATE</entry>
<entry>schedule.Date</entry>
<entry>2008-01-01</entry>
<entry>TRUE</entry>
</row>
</tbody>
</tgroup>
</table>
<table>
<title>BATCH_JOB_EXECUTION</title>
<tgroup cols="5">
<tbody>
<row>
<entry>JOB_EXEC_ID</entry>
<entry>JOB_INST_ID</entry>
<entry>START_TIME</entry>
<entry>END_TIME</entry>
<entry>STATUS</entry>
</row>
<row>
<entry>1</entry>
<entry>1</entry>
<entry>2008-01-01 21:00</entry>
<entry>2008-01-01 21:30</entry>
<entry>FAILED</entry>
</row>
</tbody>
</tgroup>
</table>
<note>
<para>Column names may have been abbreviated or removed for clarity
and formatting.</para>
</note>
<para>Now that the job has failed, assume that it took the entire night
to determine the problem, so that the 'batch window' is now closed.
Assuming the window starts at 9:00 PM, the job will be kicked off again
for 01-01, starting where it left off and completing successfully at
9:30. Because it is now the next day, the 01-02 job must be run as well.
It is kicked off just afterwards at 9:31 and completes in roughly its
normal one-hour time, at 10:29. There is no
requirement that one <classname>JobInstance</classname> be kicked off
after another, unless there is potential for the two jobs to attempt to
access the same data, causing issues with locking at the database level.
It is entirely up to the scheduler to determine when a
<classname>Job</classname> should be run. Since they are separate
<classname>JobInstance</classname>s, Spring Batch makes no attempt
to stop them from being run concurrently. (Attempting to run the same
<classname>JobInstance</classname> while it is already running
results in a <classname>JobExecutionAlreadyRunningException</classname>
being thrown.) There should now be one extra entry in the
BATCH_JOB_INSTANCE table and two extra entries in each of the
BATCH_JOB_EXECUTION_PARAMS and BATCH_JOB_EXECUTION tables:</para>
<table>
<title>BATCH_JOB_INSTANCE</title>
<tgroup cols="2">
<tbody>
<row>
<entry>JOB_INST_ID</entry>
<entry>JOB_NAME</entry>
</row>
<row>
<entry>1</entry>
<entry>EndOfDayJob</entry>
</row>
<row>
<entry>2</entry>
<entry>EndOfDayJob</entry>
</row>
</tbody>
</tgroup>
</table>
<table>
<title>BATCH_JOB_EXECUTION_PARAMS</title>
<tgroup cols="5">
<tbody>
<row>
<entry>JOB_EXECUTION_ID</entry>
<entry>TYPE_CD</entry>
<entry>KEY_NAME</entry>
<entry>DATE_VAL</entry>
<entry>IDENTIFYING</entry>
</row>
<row>
<entry>1</entry>
<entry>DATE</entry>
<entry>schedule.Date</entry>
<entry>2008-01-01 00:00:00</entry>
<entry>TRUE</entry>
</row>
<row>
<entry>2</entry>
<entry>DATE</entry>
<entry>schedule.Date</entry>
<entry>2008-01-01 00:00:00</entry>
<entry>TRUE</entry>
</row>
<row>
<entry>3</entry>
<entry>DATE</entry>
<entry>schedule.Date</entry>
<entry>2008-01-02 00:00:00</entry>
<entry>TRUE</entry>
</row>
</tbody>
</tgroup>
</table>
<table>
<title>BATCH_JOB_EXECUTION</title>
<tgroup cols="5">
<tbody>
<row>
<entry>JOB_EXEC_ID</entry>
<entry>JOB_INST_ID</entry>
<entry>START_TIME</entry>
<entry>END_TIME</entry>
<entry>STATUS</entry>
</row>
<row>
<entry>1</entry>
<entry>1</entry>
<entry>2008-01-01 21:00</entry>
<entry>2008-01-01 21:30</entry>
<entry>FAILED</entry>
</row>
<row>
<entry>2</entry>
<entry>1</entry>
<entry>2008-01-02 21:00</entry>
<entry>2008-01-02 21:30</entry>
<entry>COMPLETED</entry>
</row>
<row>
<entry>3</entry>
<entry>2</entry>
<entry>2008-01-02 21:31</entry>
<entry>2008-01-02 22:29</entry>
<entry>COMPLETED</entry>
</row>
</tbody>
</tgroup>
</table>
<note>
<para>Column names may have been abbreviated or removed for clarity
and formatting.</para>
</note>
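<para>The timeline above can be replayed against a small in-memory model.
This is a hypothetical sketch, not the framework's
<classname>JobRepository</classname>: the class names are invented, the
instance key is simplified to the job name plus the schedule date, and an
<classname>IllegalStateException</classname> stands in for
<classname>JobExecutionAlreadyRunningException</classname>.</para>
<programlisting language="java">import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

enum RunStatusSketch { STARTED, COMPLETED, FAILED }

/** One attempt to run a job instance, i.e. one row of BATCH_JOB_EXECUTION. */
final class ExecutionRowSketch {
    final long id;
    final long instanceId;
    RunStatusSketch status = RunStatusSketch.STARTED;

    ExecutionRowSketch(long id, long instanceId) {
        this.id = id;
        this.instanceId = instanceId;
    }
}

/**
 * Hypothetical in-memory stand-in for the meta data tables. The instance key is
 * simplified to the job name plus the identifying schedule date.
 */
final class MetaDataSketch {
    private final Map&lt;String, Long&gt; instances = new HashMap&lt;&gt;();
    private final List&lt;ExecutionRowSketch&gt; executions = new ArrayList&lt;&gt;();

    /** Every launch creates a new execution; a new instance is created only for a new key. */
    ExecutionRowSketch launch(String jobName, String scheduleDate) {
        String key = jobName + "|" + scheduleDate;
        Long existing = instances.get(key);
        long instanceId;
        if (existing == null) {
            instanceId = instances.size() + 1;
            instances.put(key, instanceId);
        } else {
            instanceId = existing;
        }
        // Stand-in for JobExecutionAlreadyRunningException: one running execution per instance.
        for (ExecutionRowSketch e : executions) {
            if (e.instanceId == instanceId &amp;&amp; e.status == RunStatusSketch.STARTED) {
                throw new IllegalStateException("JobInstance " + instanceId + " is already running");
            }
        }
        ExecutionRowSketch row = new ExecutionRowSketch(executions.size() + 1, instanceId);
        executions.add(row);
        return row;
    }

    void finish(ExecutionRowSketch row, boolean succeeded) {
        row.status = succeeded ? RunStatusSketch.COMPLETED : RunStatusSketch.FAILED;
    }

    /** An instance counts as complete only once one of its executions has completed. */
    boolean isComplete(long instanceId) {
        for (ExecutionRowSketch e : executions) {
            if (e.instanceId == instanceId &amp;&amp; e.status == RunStatusSketch.COMPLETED) {
                return true;
            }
        }
        return false;
    }

    int instanceCount() {
        return instances.size();
    }

    int executionCount() {
        return executions.size();
    }
}</programlisting>
<para>Replaying the failed 01-01 run, its restart, and the 01-02 run leaves
two instances and three executions, with executions 1 and 2 belonging to
instance 1 and execution 3 to instance 2, as in the tables above. A second
launch for 01-01 while the first is still running is rejected.</para>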
</section>
</section>
<section id="domainStep">
<title id="s.2.1">Step</title>
<para>A <classname>Step</classname> is a domain object that encapsulates
an independent, sequential phase of a batch job. Therefore, every
<classname>Job</classname> is composed entirely of one or more steps. A
<classname>Step</classname> contains all of the information necessary to
define and control the actual batch processing. This is a necessarily
vague description because the contents of any given
<classname>Step</classname> are at the discretion of the developer writing
a <classname>Job</classname>. A Step can be as simple or complex as the
developer desires. A simple <classname>Step</classname> might load data
from a file into the database, requiring little or no code (depending
upon the implementations used). A more complex <classname>Step</classname>
may have complicated business rules that are applied as part of the
processing. As with <classname>Job</classname>, a
<classname>Step</classname> has an individual
<classname>StepExecution</classname> that corresponds with a unique
<classname>JobExecution</classname>:</para>
<mediaobject>
<imageobject role="html">
<imagedata align="center" fileref="images/jobHeirarchyWithSteps.png"
scale="80" />
</imageobject>
<imageobject role="fo">
<imagedata align="center" fileref="images/jobHeirarchyWithSteps.png"
scale="60" />
</imageobject>
</mediaobject>
<section id="domainStepExecution">
<title id="stepExecution">StepExecution</title>
<para>A <classname>StepExecution</classname> represents a single attempt
to execute a <classname>Step</classname>. A new
<classname>StepExecution</classname> will be created each time a
<classname>Step</classname> is run, similar to
<classname>JobExecution</classname>. However, if a step fails to execute
because the step before it fails, there will be no execution persisted
for it. A <classname>StepExecution</classname> will only be created when
its <classname>Step</classname> is actually started.</para>
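<para>The rule that a <classname>StepExecution</classname> exists only for a
step that actually started can be shown with a hypothetical plain-Java
sketch. It is not the Spring Batch API: the class names are invented, and
each step's outcome is simulated by a flag. Using the step names from the
footballJob example, if gameLoad fails, only playerload and gameLoad are
recorded; playerSummarization never starts, so nothing is recorded for
it.</para>
<programlisting language="java">import java.util.ArrayList;
import java.util.List;

/** Hypothetical step: a name plus a flag that simulates whether its work fails. */
final class StepDefSketch {
    final String name;
    final boolean fails;

    StepDefSketch(String name, boolean fails) {
        this.name = name;
        this.fails = fails;
    }
}

/** Record of one started step, i.e. one StepExecution. */
final class StepRunSketch {
    final String stepName;
    final boolean succeeded;

    StepRunSketch(String stepName, boolean succeeded) {
        this.stepName = stepName;
        this.succeeded = succeeded;
    }
}

final class SequentialJobSketch {
    /** Runs steps in order and stops at the first failure. */
    static List&lt;StepRunSketch&gt; run(StepDefSketch... steps) {
        List&lt;StepRunSketch&gt; runs = new ArrayList&lt;&gt;();
        for (StepDefSketch step : steps) {
            // An execution is recorded only because this step actually started.
            boolean ok = !step.fails;
            runs.add(new StepRunSketch(step.name, ok));
            if (!ok) {
                break; // later steps never start, so they get no execution record
            }
        }
        return runs;
    }
}</programlisting>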
<para>Step executions are represented by objects of the
<classname>StepExecution</classname> class. Each execution contains a
reference to its corresponding step and
<classname>JobExecution</classname>, and transaction-related data, such
as commit and rollback counts and start and end times. Additionally, each
step execution will contain an <classname>ExecutionContext</classname>,
which contains any data a developer needs persisted across batch runs,
such as statistics or state information needed to restart. The following
is a listing of the properties for
<classname>StepExecution</classname>:</para>
<table>
<title>StepExecution Properties</title>
<tgroup cols="2">
<colspec colname="c1" colwidth="*" />
<colspec colname="c2" colwidth="4*" />
<tbody>
<row>
<entry>status</entry>
<entry>A <classname>BatchStatus</classname> object that
indicates the status of the execution. While it is running, the
status is BatchStatus.STARTED; if it fails, the status is
BatchStatus.FAILED; and if it finishes successfully, the status
is BatchStatus.COMPLETED.</entry>
</row>
<row>
<entry>startTime</entry>
<entry>A <classname>java.util.Date</classname> representing the
current system time when the execution was started.</entry>
</row>
<row>
<entry>endTime</entry>
<entry>A <classname>java.util.Date</classname> representing the
current system time when the execution finished, regardless of
whether or not it was successful.</entry>
</row>
<row>
<entry>exitStatus</entry>
<entry>The <classname>ExitStatus</classname> indicating the
result of the execution. It is the most important property because
it contains the exit code that is returned to the caller. See
chapter 5 for more details.</entry>
</row>
<row>
<entry>executionContext</entry>
<entry>The 'property bag' containing any user data that needs to
be persisted between executions.</entry>
</row>
<row>
<entry>readCount</entry>
<entry>The number of items that have been successfully
read</entry>
</row>
<row>
<entry>writeCount</entry>
<entry>The number of items that have been successfully
written</entry>
</row>
<row>
<entry>commitCount</entry>
<entry>The number of transactions that have been committed for
this execution</entry>
</row>
<row>
<entry>rollbackCount</entry>
<entry>The number of times the business transaction controlled
by the <classname>Step</classname> has been rolled back.</entry>
</row>
<row>
<entry>readSkipCount</entry>
<entry>The number of times <methodname>read</methodname> has
failed, resulting in a skipped item.</entry>
</row>
<row>
<entry>processSkipCount</entry>
<entry>The number of times <methodname>process</methodname> has
failed, resulting in a skipped item.</entry>
</row>
<row>
<entry>filterCount</entry>
<entry>The number of items that have been 'filtered' by the
<classname>ItemProcessor</classname>.</entry>
</row>
<row>
<entry>writeSkipCount</entry>
<entry>The number of times <methodname>write</methodname> has
failed, resulting in a skipped item.</entry>
</row>
</tbody>
</tgroup>
</table>
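<para>Several of these counters can be related to a chunk-oriented loop
through a simplified, hypothetical sketch. It is not how Spring Batch's step
implementation is written: the class names are invented, a
<classname>java.util.function.Function</classname> plays the
<classname>ItemProcessor</classname> (returning null to filter an item), a
<classname>Consumer</classname> plays the <classname>ItemWriter</classname>,
one commit is counted per chunk, and skips, retries, and rollbacks are left
out.</para>
<programlisting language="java">import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;

/** Hypothetical counters mirroring a few StepExecution properties. */
final class StepCountersSketch {
    int readCount;
    int filterCount;
    int writeCount;
    int commitCount;
}

final class ChunkLoopSketch {
    /**
     * Reads up to commitInterval items per chunk, counts a null processor result as
     * a filtered item, passes the remaining items to the writer, and counts one
     * commit per chunk. Skips, retries and rollbacks are deliberately not modelled.
     */
    static &lt;I, O&gt; StepCountersSketch run(Iterable&lt;I&gt; input, Function&lt;I, O&gt; processor,
                                         Consumer&lt;List&lt;O&gt;&gt; writer, int commitInterval) {
        if (commitInterval &lt; 1) {
            throw new IllegalArgumentException("commitInterval must be at least 1");
        }
        StepCountersSketch counts = new StepCountersSketch();
        Iterator&lt;I&gt; reader = input.iterator();
        while (reader.hasNext()) {
            List&lt;O&gt; chunk = new ArrayList&lt;&gt;();
            for (int i = 0; i &lt; commitInterval &amp;&amp; reader.hasNext(); i++) {
                I item = reader.next();
                counts.readCount++;
                O result = processor.apply(item);
                if (result == null) {
                    counts.filterCount++;
                } else {
                    chunk.add(result);
                }
            }
            if (!chunk.isEmpty()) {
                writer.accept(chunk);
                counts.writeCount += chunk.size();
            }
            counts.commitCount++; // one transaction per chunk in this model
        }
        return counts;
    }
}</programlisting>
<para>With five input items, a processor that filters out even numbers, and a
commit interval of 2, this model reports a readCount of 5, a filterCount of
2, a writeCount of 3, and a commitCount of 3; the writer receives the chunks
[1], [3], and [5].</para>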
</section>
</section>
<section id="domainExecutionContext">
<title>ExecutionContext</title>
<para>An <classname>ExecutionContext</classname> represents a collection
of key/value pairs that are persisted and controlled by the framework in
order to allow developers a place to store persistent state that is scoped
to a <classname>StepExecution</classname> or
<classname>JobExecution</classname>. For those familiar with Quartz, it is
very similar to <classname>JobDataMap</classname>. The best usage example
is to facilitate restart. Using flat file input as an example, while
processing individual lines, the framework periodically persists the
<classname>ExecutionContext</classname> at commit points. This allows the
<classname>ItemReader</classname> to store its state in case a fatal error
occurs during the run, or even if the power goes out. All that is needed
is to put the current number of lines read into the context, and the
framework will do the rest:</para>
<programlisting language="java">executionContext.putLong(getKey(LINES_READ_COUNT), reader.getPosition());</programlisting>
<para>Using the EndOfDay job from the Job Stereotypes section as an
example, assume there is one step, 'loadData', that loads a file into the
database. After the first failed run, the meta data tables would look like
the following:</para>
<para><table>
<title>BATCH_JOB_INSTANCE</title>
<tgroup cols="2">
<tbody>
<row>
<entry>JOB_INST_ID</entry>
<entry>JOB_NAME</entry>
</row>
<row>
<entry>1</entry>
<entry>EndOfDayJob</entry>
</row>
</tbody>
</tgroup>
</table><table>
<title>BATCH_JOB_EXECUTION_PARAMS</title>
<tgroup cols="5">
<tbody>
<row>
<entry>JOB_EXECUTION_ID</entry>
<entry>TYPE_CD</entry>
<entry>KEY_NAME</entry>
<entry>DATE_VAL</entry>
<entry>IDENTIFYING</entry>
</row>
<row>
<entry>1</entry>
<entry>DATE</entry>
<entry>schedule.Date</entry>
<entry>2008-01-01</entry>
<entry>TRUE</entry>
</row>
</tbody>
</tgroup>
</table><table>
<title>BATCH_JOB_EXECUTION</title>
<tgroup cols="5">
<tbody>
<row>
<entry>JOB_EXEC_ID</entry>
<entry>JOB_INST_ID</entry>
<entry>START_TIME</entry>
<entry>END_TIME</entry>
<entry>STATUS</entry>
</row>
<row>
<entry>1</entry>
<entry>1</entry>
<entry>2008-01-01 21:00</entry>
<entry>2008-01-01 21:30</entry>
<entry>FAILED</entry>
</row>
</tbody>
</tgroup>
</table><table>
<title>BATCH_STEP_EXECUTION</title>
<tgroup cols="6">
<tbody>
<row>
<entry>STEP_EXEC_ID</entry>
<entry>JOB_EXEC_ID</entry>
<entry>STEP_NAME</entry>
<entry>START_TIME</entry>
<entry>END_TIME</entry>
<entry>STATUS</entry>
</row>
<row>
<entry>1</entry>
<entry>1</entry>
<entry>loadData</entry>
<entry>2008-01-01 21:00</entry>
<entry>2008-01-01 21:30</entry>
<entry>FAILED</entry>
</row>
</tbody>
</tgroup>
</table><table>
<title>BATCH_STEP_EXECUTION_CONTEXT</title>
<tgroup cols="2">
<tbody>
<row>
<entry>STEP_EXEC_ID</entry>
<entry>SHORT_CONTEXT</entry>
</row>
<row>
<entry>1</entry>
<entry>{piece.count=40321}</entry>
</row>
</tbody>
</tgroup>
</table>In this case, the <classname>Step</classname> ran for 30 minutes
and processed 40,321 'pieces', which would represent lines in a file in
this scenario. The framework updates this value just before each commit,
and the context can hold multiple entries. Being notified before a
commit requires one of the various <classname>StepListener</classname>s
or an <classname>ItemStream</classname>, both of which are discussed in more
detail later in this guide. As with the previous example, it is assumed
that the <classname>Job</classname> is restarted the next day. When it is
restarted, the values from the <classname>ExecutionContext</classname> of
the last run are reconstituted from the database, and when the
<classname>ItemReader</classname> is opened, it can check to see if it has
any stored state in the context and initialize itself from there:</para>
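<programlisting language="java">// Runs when the ItemReader is opened; getKey, getReader and readLine are the reader's own helpers
if (executionContext.containsKey(getKey(LINES_READ_COUNT))) {
    log.debug("Initializing for restart. Restart data is: " + executionContext);
    // Number of lines the previous execution had read at its last commit
    long lineCount = executionContext.getLong(getKey(LINES_READ_COUNT));
    LineReader reader = getReader();
    Object record = "";
    // Re-read and discard lines until the saved position (or the end of input) is reached
    while (reader.getPosition() &lt; lineCount &amp;&amp; record != null) {
        record = readLine();
    }
}</programlisting>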
<para>In this case, after the above code is executed, the current line
will be 40,322, allowing the <classname>Step</classname> to start again
from where it left off. The <classname>ExecutionContext</classname> can
also be used for statistics that need to be persisted about the run
itself. For example, if a flat file contains orders for processing that
span multiple lines, it may be necessary to store how many orders
have been processed (which is very different from the number of lines
read) so that an email can be sent at the end of the
<classname>Step</classname> with the total number of orders processed in
the body. The framework handles storing this for the developer, in order to
correctly scope it with an individual <classname>JobInstance</classname>.</para>
<para>It can be very difficult to know whether an existing
<classname>ExecutionContext</classname> should be used or not. For
example, using the 'EndOfDay' example from above, when the 01-01 run
starts again for the second time, the framework recognizes that it is the
same <classname>JobInstance</classname> and, on an individual
<classname>Step</classname> basis, pulls the
<classname>ExecutionContext</classname> out of the database and hands it,
as part of the <classname>StepExecution</classname>, to the
<classname>Step</classname> itself. Conversely, for the 01-02 run, the
framework recognizes that it is a different instance, so an empty context
must be handed to the <classname>Step</classname>. The framework makes many
determinations of this kind for the developer, to ensure that state is
provided at the correct time.</para>
<para>Note also that exactly one <classname>ExecutionContext</classname>
exists per <classname>StepExecution</classname> at any given time. Because
this creates a shared keyspace, clients of the
<classname>ExecutionContext</classname> should take care when putting values
in, to ensure that no data is overwritten. However, the
<classname>Step</classname> stores absolutely no data in the context, so
there is no way to adversely affect the framework.</para>
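<para>The store-then-restore cycle described above can be sketched without
the framework. In this hypothetical example,
<classname>ContextSketch</classname> is a simplified stand-in for
<classname>ExecutionContext</classname> (offering only
<methodname>putLong</methodname>, <methodname>getLong</methodname>, and
<methodname>containsKey</methodname>), the key name is an assumption, and
the <methodname>open</methodname>/<methodname>update</methodname> method
names only echo the <classname>ItemStream</classname> callbacks mentioned
earlier; the class does not implement that interface. Because the input is
an in-memory list, the reader can jump straight to the saved position,
whereas a file-based reader would re-read and discard lines as the restart
snippet above does.</para>
<programlisting language="java">import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for a step's ExecutionContext: a persisted key/value map. */
final class ContextSketch {
    private final Map&lt;String, Object&gt; values = new HashMap&lt;&gt;();

    void putLong(String key, long value) {
        values.put(key, value);
    }

    long getLong(String key) {
        return (Long) values.get(key);
    }

    boolean containsKey(String key) {
        return values.containsKey(key);
    }
}

/** Hypothetical reader that saves its position at commit points and resumes from it on restart. */
final class RestartableLineReaderSketch {
    static final String LINES_READ_COUNT = "lineReader.lines.read.count"; // assumed key name

    private final List&lt;String&gt; lines;
    private long position; // number of lines already returned

    RestartableLineReaderSketch(List&lt;String&gt; lines) {
        this.lines = lines;
    }

    /** Called when the step starts: resume from stored state if the context has any. */
    void open(ContextSketch context) {
        position = context.containsKey(LINES_READ_COUNT) ? context.getLong(LINES_READ_COUNT) : 0L;
    }

    /** Returns the next line, or null once the input is exhausted. */
    String read() {
        if (position &gt;= lines.size()) {
            return null;
        }
        return lines.get((int) position++);
    }

    /** Called just before each commit so the position is persisted with the chunk. */
    void update(ContextSketch context) {
        context.putLong(LINES_READ_COUNT, position);
    }
}</programlisting>
<para>A restarted reader opened with the context saved after three lines
resumes at the fourth line, while a reader opened with an empty context, as
for a new <classname>JobInstance</classname>, starts from the first
line.</para>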
<para>It is also important to note that there is at least one
<classname>ExecutionContext</classname> per
<classname>JobExecution</classname>, and one for every
<classname>StepExecution</classname>. For example, consider the following
code snippet:</para>
<programlisting language="java">ExecutionContext ecStep = stepExecution.getExecutionContext();
ExecutionContext ecJob = jobExecution.getExecutionContext();
//ecStep does not equal ecJob</programlisting>
<para>As noted in the comment, ecStep will not equal ecJob; they are two
different <classname>ExecutionContext</classname>s. The one scoped to the
<classname>Step</classname> will be saved at every commit point in the
<classname>Step</classname>, whereas the one scoped to the
<classname>Job</classname> will be saved in between every
<classname>Step</classname> execution.</para>
</section>
<section id="domainJobRepository">
<title>JobRepository</title>
<para><classname>JobRepository</classname> is the persistence mechanism
for all of the stereotypes mentioned above. It provides CRUD operations
for <classname>JobLauncher</classname>, <classname>Job</classname>, and
<classname>Step</classname> implementations. When a
<classname>Job</classname> is first launched, a
<classname>JobExecution</classname> is obtained from the repository, and
during the course of execution, <classname>StepExecution</classname> and
<classname>JobExecution</classname> implementations are persisted by
passing them to the repository. The batch namespace provides support for
configuring a <classname>JobRepository</classname> instance with the
<code>&lt;job-repository&gt;</code> tag:</para>
<programlisting language="xml">&lt;job-repository id="jobRepository"/&gt;</programlisting>
</section>
<section id="domainJobLauncher">
<title>JobLauncher</title>
<para><classname>JobLauncher</classname> represents a simple interface for
launching a <classname>Job</classname> with a given set of
<classname>JobParameters</classname>:</para>
<programlisting language="java">public interface JobLauncher {
    public JobExecution run(Job job, JobParameters jobParameters)
            throws JobExecutionAlreadyRunningException, JobRestartException,
                   JobInstanceAlreadyCompleteException, JobParametersInvalidException;
}</programlisting>
<para>It is expected that implementations will obtain a valid
<classname>JobExecution</classname> from the
<classname>JobRepository</classname> and execute the
<classname>Job</classname>.</para>
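<para>A synchronous launcher that follows this expectation can be sketched in
plain Java. Everything here is hypothetical and simplified: the class names
are invented, job parameters are omitted, the repository is an in-memory
list rather than a <classname>JobRepository</classname>, and a failure
inside the job is recorded on the returned execution instead of being
thrown to the caller.</para>
<programlisting language="java">import java.util.ArrayList;
import java.util.List;

enum LaunchStatusSketch { STARTING, STARTED, COMPLETED, FAILED }

/** Hypothetical execution record handed out by the repository. */
final class LaunchRecordSketch {
    final String jobName;
    LaunchStatusSketch status = LaunchStatusSketch.STARTING;

    LaunchRecordSketch(String jobName) {
        this.jobName = jobName;
    }
}

/** Hypothetical job: a name plus work that signals failure by throwing. */
final class RunnableJobSketch {
    final String name;
    final Runnable work;

    RunnableJobSketch(String name, Runnable work) {
        this.name = name;
        this.work = work;
    }
}

/** Hypothetical repository that simply keeps every execution it creates. */
final class RepositorySketch {
    final List&lt;LaunchRecordSketch&gt; executions = new ArrayList&lt;&gt;();

    LaunchRecordSketch createJobExecution(String jobName) {
        LaunchRecordSketch execution = new LaunchRecordSketch(jobName);
        executions.add(execution);
        return execution;
    }
}

/** Obtains an execution from the repository, then runs the job in the caller's thread. */
final class SyncLauncherSketch {
    private final RepositorySketch repository;

    SyncLauncherSketch(RepositorySketch repository) {
        this.repository = repository;
    }

    LaunchRecordSketch run(RunnableJobSketch job) {
        LaunchRecordSketch execution = repository.createJobExecution(job.name);
        execution.status = LaunchStatusSketch.STARTED;
        try {
            job.work.run();
            execution.status = LaunchStatusSketch.COMPLETED;
        } catch (RuntimeException e) {
            // The failure is recorded on the execution rather than propagated to the caller.
            execution.status = LaunchStatusSketch.FAILED;
        }
        return execution;
    }
}</programlisting>
<para>Launching a job whose work throws returns an execution with status
FAILED, a job whose work returns normally ends COMPLETED, and both
executions remain in the repository.</para>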
</section>
<section id="domainItemReader">
<title id="s.5.1.1">Item Reader</title>
<para><classname>ItemReader</classname> is an abstraction that represents
the retrieval of input for a <classname>Step</classname>, one item at a
time. When the <classname>ItemReader</classname> has exhausted the items
it can provide, it will indicate this by returning null. More details
about the <classname>ItemReader</classname> interface and its various
implementations can be found in <xref
linkend="readersAndWriters" />.</para>
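<para>The end-of-input contract can be sketched with a hypothetical, minimal
interface that mirrors the shape of <classname>ItemReader</classname> (the
real interface's checked exceptions are omitted, and the names below are
invented). The framework ships a similar list-backed reader,
<classname>ListItemReader</classname>; this version is only an
illustration.</para>
<programlisting language="java">import java.util.Iterator;
import java.util.List;

/** Hypothetical minimal reader contract: the next item, or null once input is exhausted. */
interface ItemReaderSketch&lt;T&gt; {
    T read();
}

/** Hands out the items of an in-memory list, one per call. */
final class ListItemReaderSketch&lt;T&gt; implements ItemReaderSketch&lt;T&gt; {
    private final Iterator&lt;T&gt; items;

    ListItemReaderSketch(List&lt;T&gt; items) {
        this.items = items.iterator();
    }

    @Override
    public T read() {
        // Null is the only end-of-input signal, so null cannot be a regular item.
        return items.hasNext() ? items.next() : null;
    }
}</programlisting>
<para>A caller typically loops until <methodname>read</methodname> returns
null; once exhausted, the reader keeps returning null.</para>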
</section>
<section id="domainItemWriter">
<title id="s.5.1.2">Item Writer</title>
<para><classname>ItemWriter</classname> is an abstraction that
represents the output of a <classname>Step</classname>, one batch
or chunk of items at a time. Generally, an item writer has no
knowledge of the input it will receive next, only the items that
were passed in its current invocation. More details about the
<classname>ItemWriter</classname> interface and its various
implementations can be found in <xref linkend="readersAndWriters"
/>.</para>
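<para>The chunk-at-a-time contract can be sketched with a hypothetical
interface shaped like <classname>ItemWriter</classname>, which receives a
<classname>List</classname> of items per call (the real interface also
declares <code>throws Exception</code>; the names below are invented). The
recording implementation treats each call as one batch, much as a writer
might issue a single batched insert per chunk.</para>
<programlisting language="java">import java.util.ArrayList;
import java.util.List;

/** Hypothetical minimal writer contract: one call per chunk of items. */
interface ItemWriterSketch&lt;T&gt; {
    void write(List&lt;? extends T&gt; items);
}

/** Keeps each chunk as one "batch", standing in for e.g. a single batched insert. */
final class RecordingItemWriterSketch&lt;T&gt; implements ItemWriterSketch&lt;T&gt; {
    final List&lt;List&lt;T&gt;&gt; batches = new ArrayList&lt;&gt;();

    @Override
    public void write(List&lt;? extends T&gt; items) {
        // The writer sees only the items of the current chunk; copy them defensively.
        batches.add(new ArrayList&lt;T&gt;(items));
    }
}</programlisting>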
</section>
<section id="domainItemProcessor">
<title>Item Processor</title>
<para><classname>ItemProcessor</classname> is an abstraction that
represents the business processing of an item. While the
<classname>ItemReader</classname> reads one item and the
<classname>ItemWriter</classname> writes them, the
<classname>ItemProcessor</classname> provides an access point to transform
the item or apply other business processing. If, while processing the item,
it is determined that the item is not valid, returning null indicates that
the item should not be written out. More details about the
<classname>ItemProcessor</classname> interface can be
found in <xref linkend="readersAndWriters" />.</para>
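<para>The filtering contract can be sketched with a hypothetical interface
shaped like <classname>ItemProcessor</classname> (the real interface's
<code>throws Exception</code> is omitted, and the names below are invented).
The example trims player names and treats a blank name as invalid, so it
returns null; in a step, such an item is counted in the
<classname>StepExecution</classname>'s filterCount instead of being
written.</para>
<programlisting language="java">/** Hypothetical minimal processor contract: transform an item, or return null to filter it. */
interface ItemProcessorSketch&lt;I, O&gt; {
    O process(I item);
}

/** Trims player names; a blank name is treated as invalid and filtered out. */
final class PlayerNameProcessorSketch implements ItemProcessorSketch&lt;String, String&gt; {
    @Override
    public String process(String rawName) {
        String name = rawName.trim();
        // Returning null means "do not pass this item on to the writer".
        return name.isEmpty() ? null : name;
    }
}</programlisting>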
</section>
<section id="domainBatchNamespace">
<title>Batch Namespace</title>
<para>Many of the domain concepts listed above need to be configured in a
Spring <classname>ApplicationContext</classname>. While there are
implementations of the interfaces above that can be used in a standard
bean definition, a namespace has been provided for ease of
configuration:</para>
<programlisting language="xml">&lt;beans:beans xmlns="<emphasis role="bold">http://www.springframework.org/schema/batch</emphasis>"
xmlns:beans="http://www.springframework.org/schema/beans"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="
http://www.springframework.org/schema/beans
http://www.springframework.org/schema/beans/spring-beans.xsd
<emphasis role="bold">http://www.springframework.org/schema/batch
http://www.springframework.org/schema/batch/spring-batch-2.2.xsd</emphasis>"&gt;
&lt;job id="ioSampleJob"&gt;
&lt;step id="step1"&gt;
&lt;tasklet&gt;
&lt;chunk reader="itemReader" writer="itemWriter" commit-interval="2"/&gt;
&lt;/tasklet&gt;
&lt;/step&gt;
&lt;/job&gt;
&lt;/beans:beans&gt;</programlisting>
<para>As long as the batch namespace has been declared, any of its
elements can be used. More information on configuring a
<classname>Job</classname> can be found in <xref
linkend="configureJob" />. More information on configuring a Step can be
found in <xref linkend="configureStep" />.</para>
</section>
</chapter>