spring-batch/build/reference/html/domain.html
Michael Minella 75ab909314 update
2017-03-23 10:18:33 -05:00


<html><head>
<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">
<title>3.&nbsp;The Domain Language of Batch</title><link rel="stylesheet" type="text/css" href="css/manual-multipage.css"><meta name="generator" content="DocBook XSL Stylesheets V1.78.1"><link rel="home" href="index.html" title="Spring Batch - Reference Documentation"><link rel="up" href="index.html" title="Spring Batch - Reference Documentation"><link rel="prev" href="whatsNew.html" title="2.&nbsp;What's New in Spring Batch 4.0"><link rel="next" href="configureJob.html" title="4.&nbsp;Configuring and Running a Job"></head><body bgcolor="white" text="black" link="#0000FF" vlink="#840084" alink="#0000FF"><div class="navheader"><table width="100%" summary="Navigation header"><tr><th colspan="3" align="center">3.&nbsp;The Domain Language of Batch</th></tr><tr><td width="20%" align="left"><a accesskey="p" href="whatsNew.html">Prev</a>&nbsp;</td><th width="60%" align="center">&nbsp;</th><td width="20%" align="right">&nbsp;<a accesskey="n" href="configureJob.html">Next</a></td></tr></table><hr></div><div class="chapter"><div class="titlepage"><div><div><h1 class="title"><a name="domain" href="#domain"></a>3.&nbsp;The Domain Language of Batch</h1></div></div></div>
<p>To any experienced batch architect, the overall concepts of batch
processing used in Spring Batch should be familiar and comfortable. There
are "Jobs" and "Steps" and developer supplied processing units called
ItemReaders and ItemWriters. However, because of the Spring patterns,
operations, templates, callbacks, and idioms, there are opportunities for
the following:</p><div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; "><li class="listitem">
<p>significant improvement in adherence to a clear separation of
concerns</p>
</li><li class="listitem">
<p>clearly delineated architectural layers and services provided as
interfaces</p>
</li><li class="listitem">
<p>simple and default implementations that allow for quick adoption
and ease of use out-of-the-box</p>
</li><li class="listitem">
<p>significantly enhanced extensibility</p>
</li></ul></div>
<p>The diagram below is a simplified version of the batch reference
architecture that has been used for decades. It provides an overview of the
components that make up the domain language of batch processing. This
architecture framework is a blueprint that has been proven through decades
of implementations on the last several generations of platforms
(COBOL/Mainframe, C++/Unix, and now Java/anywhere). JCL and COBOL developers
are likely to be as comfortable with the concepts as C++, C# and Java
developers. Spring Batch provides a physical implementation of the layers,
components and technical services commonly found in robust, maintainable
systems used to address the creation of simple to complex batch
applications, with the infrastructure and extensions to address very complex
processing needs.</p>
<div class="mediaobject" align="center"><img src="images/spring-batch-reference-model.png" align="middle"><div class="caption"><p>Figure 3.1: Batch Stereotypes</p></div></div>
<p>The diagram above highlights the key concepts that make up the domain
language of batch. A Job has one to many steps, each of which has exactly one
ItemReader, one ItemProcessor, and one ItemWriter. A job needs to be launched
(JobLauncher), and metadata about the currently running process needs to be
stored (JobRepository).</p>
<div class="section"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="domainJob" href="#domainJob"></a>3.1&nbsp;Job</h2></div></div></div>
<p>This section describes stereotypes relating to the concept of a
batch job. A <code class="classname">Job</code> is an entity that encapsulates an
entire batch process. As is common with other Spring projects, a
<code class="classname">Job</code> will be wired together via an XML configuration
file or Java based configuration. This configuration may be referred to as
the "job configuration". However, <code class="classname">Job</code> is just the
top of an overall hierarchy:</p>
<div class="mediaobject" align="center"><img src="images/job-heirarchy.png" align="middle"></div>
<p>In Spring Batch, a Job is simply a container for Steps. It combines
multiple steps that belong logically together in a flow and allows for
configuration of properties global to all steps, such as restartability.
The job configuration contains:</p>
<div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; "><li class="listitem">
<p>The simple name of the job</p>
</li><li class="listitem">
<p>Definition and ordering of Steps</p>
</li><li class="listitem">
<p>Whether or not the job is restartable</p>
</li></ul></div>
<p>A default simple implementation of the <code class="classname">Job</code>
interface is provided by Spring Batch in the form of the
<code class="classname">SimpleJob</code> class, which creates some standard
functionality on top of <code class="classname">Job</code>. However, the batch
namespace abstracts away the need to instantiate it directly. Instead, the
<code class="code">&lt;job&gt;</code> tag can be used:</p>
<pre class="programlisting"><span class="hl-tag">&lt;job</span> <span class="hl-attribute">id</span>=<span class="hl-value">"footballJob"</span><span class="hl-tag">&gt;</span>
<span class="hl-tag">&lt;step</span> <span class="hl-attribute">id</span>=<span class="hl-value">"playerload"</span> <span class="hl-attribute">next</span>=<span class="hl-value">"gameLoad"</span><span class="hl-tag">/&gt;</span>
<span class="hl-tag">&lt;step</span> <span class="hl-attribute">id</span>=<span class="hl-value">"gameLoad"</span> <span class="hl-attribute">next</span>=<span class="hl-value">"playerSummarization"</span><span class="hl-tag">/&gt;</span>
<span class="hl-tag">&lt;step</span> <span class="hl-attribute">id</span>=<span class="hl-value">"playerSummarization"</span><span class="hl-tag">/&gt;</span>
<span class="hl-tag">&lt;/job&gt;</span></pre>
<div class="section"><div class="titlepage"><div><div><h3 class="title"><a name="domainJobInstance" href="#domainJobInstance"></a>3.1.1&nbsp;JobInstance</h3></div></div></div>
<p>A <code class="classname">JobInstance</code> refers to the concept of a
logical job run. Let's consider a batch job that should be run once at
the end of the day, such as the 'EndOfDay' job from the diagram above.
There is one 'EndOfDay' <code class="classname">Job</code>, but each individual
run of the <code class="classname">Job</code> must be tracked separately. In the
case of this job, there will be one logical
<code class="classname">JobInstance</code> per day. For example, there will be a
January 1st run, and a January 2nd run. If the January 1st run fails the
first time and is run again the next day, it is still the January 1st
run. (Usually this corresponds with the data it is processing as well,
meaning the January 1st run processes data for January 1st, and so on.)
Therefore, each <code class="classname">JobInstance</code> can have multiple
executions (<code class="classname">JobExecution</code> is discussed in more
detail below) and only one <code class="classname">JobInstance</code>
corresponding to a particular <code class="classname">Job</code> and
identifying <code class="classname">JobParameter</code>s can be running at a given
time.</p>
<p>The definition of a <code class="classname">JobInstance</code> has
absolutely no bearing on the data that will be loaded. It is entirely up
to the <code class="classname">ItemReader</code> implementation used to
determine how data will be loaded. For example, in the EndOfDay
scenario, there may be a column on the data that indicates the
'effective date' or 'schedule date' to which the data belongs. So, the
January 1st run would only load data from the 1st, and the January 2nd
run would only use data from the 2nd. Because this determination will
likely be a business decision, it is left up to the
<code class="classname">ItemReader</code> to decide. What using the same
<code class="classname">JobInstance</code> will determine, however, is whether
or not the 'state' (i.e. the <code class="classname">ExecutionContext</code>,
which is discussed below) from previous executions will be used. Using a
new <code class="classname">JobInstance</code> will mean 'start from the
beginning' and using an existing instance will generally mean 'start
from where you left off'.</p>
</div>
<div class="section"><div class="titlepage"><div><div><h3 class="title"><a name="domainJobParameters" href="#domainJobParameters"></a>3.1.2&nbsp;JobParameters</h3></div></div></div>
<p>Having discussed <code class="classname">JobInstance</code> and how it
differs from <code class="classname">Job</code>, the natural question to ask is:
"how is one <code class="classname">JobInstance</code> distinguished from
another?" The answer is: <code class="classname">JobParameters</code>.
<code class="classname">JobParameters</code> is a set of parameters used to
start a batch job. They can be used for identification or even as
reference data during the run:</p>
<div class="mediaobject" align="center"><img src="images/job-stereotypes-parameters.png" align="middle"></div>
<p>In the example above, where there are two instances, one for
January 1st and another for January 2nd, there is really only one Job,
started once with a job parameter of 01-01-2008 and once with a
parameter of 01-02-2008. Thus, the contract can be
defined as: <code class="classname">JobInstance</code> =
<code class="classname">Job</code> + identifying <code class="classname">JobParameters</code>. This
allows a developer to effectively control how a
<code class="classname">JobInstance</code> is defined, since they control what
parameters are passed in.</p>
</div>
<div class="note" style="margin-left: 0.5in; margin-right: 0.5in;"><table border="0" summary="Note"><tr><td rowspan="2" align="center" valign="top" width="25"><img alt="[Note]" src="images/note.png"></td><th align="left">Note</th></tr><tr><td align="left" valign="top">
<p>Not all job parameters are required to contribute to the identification
of a <code class="classname">JobInstance</code>. By default they do; however, the framework
allows the submission of a <code class="classname">Job</code> with parameters that do
not contribute to the identity of a <code class="classname">JobInstance</code> as well.</p>
</td></tr></table></div>
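<p>As a rough, framework-free illustration of the contract
<code class="classname">JobInstance</code> = <code class="classname">Job</code> +
identifying <code class="classname">JobParameters</code>, the sketch below derives
an identity string from a job name and only those parameters flagged as
identifying. The names <code class="classname">JobInstanceKeySketch</code>,
<code class="classname">Param</code>, and <code class="methodname">instanceKey</code>
are hypothetical and are not part of the Spring Batch API; how the framework
itself computes and stores instance identity is not shown here.</p>
<pre class="programlisting">final class JobInstanceKeySketch {

    /** Hypothetical stand-in for a single job parameter; not a Spring Batch type. */
    record Param(String name, String value, boolean identifying) {}

    /**
     * Builds an identity string from the job name plus the identifying
     * parameters only. Parameters are sorted by name so that the order in
     * which they were supplied does not affect the result.
     */
    static String instanceKey(String jobName, Param... params) {
        Param[] sorted = params.clone();
        java.util.Arrays.sort(sorted, java.util.Comparator.comparing(Param::name));
        StringBuilder key = new StringBuilder(jobName);
        for (Param p : sorted) {
            if (p.identifying()) {
                key.append(';').append(p.name()).append('=').append(p.value());
            }
        }
        return key.toString();
    }
}</pre>
<p>Under these assumptions, two launches of 'EndOfDayJob' that share the same
identifying schedule.Date but differ in a non-identifying parameter map to the
same key, while a launch with a different schedule.Date maps to a new
one.</p>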
<div class="section"><div class="titlepage"><div><div><h3 class="title"><a name="domainJobExecution" href="#domainJobExecution"></a>3.1.3&nbsp;JobExecution</h3></div></div></div>
<p>A <code class="classname">JobExecution</code> refers to the technical
concept of a single attempt to run a <code class="classname">Job</code>. An
execution may end in failure or success, but the
<code class="classname">JobInstance</code> corresponding to a given execution
will not be considered complete unless the execution completes
successfully. Using the EndOfDay <code class="classname">Job</code> described
above as an example, consider a <code class="classname">JobInstance</code> for
01-01-2008 that failed the first time it was run. If it is run again
with the same identifying job parameters as the first run (01-01-2008), a new
<code class="classname">JobExecution</code> will be created. However, there will
still be only one <code class="classname">JobInstance</code>.</p>
<p>A <code class="classname">Job</code> defines what a job is and how it is
to be executed, and <code class="classname">JobInstance</code> is a purely
organizational object to group executions together, primarily to enable
correct restart semantics. A <code class="classname">JobExecution</code>,
however, is the primary storage mechanism for what actually happened
during a run, and as such contains many more properties that must be
controlled and persisted:</p>
<div class="table"><a name="d5e438" href="#d5e438"></a><p class="title"><b>Table&nbsp;3.1.&nbsp;JobExecution Properties</b></p><div class="table-contents">
<table summary="JobExecution Properties" style="border-collapse: collapse;border-top: 0.5pt solid ; border-bottom: 0.5pt solid ; border-left: 0.5pt solid ; border-right: 0.5pt solid ; "><colgroup><col class="c1"><col class="c2"></colgroup><tbody><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; ">status</td><td style="border-bottom: 0.5pt solid ; ">A <code class="classname">BatchStatus</code> object that
indicates the status of the execution. While running, it's
BatchStatus.STARTED, if it fails, it's BatchStatus.FAILED, and
if it finishes successfully, it's BatchStatus.COMPLETED</td></tr><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; ">startTime</td><td style="border-bottom: 0.5pt solid ; ">A <code class="classname">java.util.Date</code> representing the
current system time when the execution was started.</td></tr><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; ">endTime</td><td style="border-bottom: 0.5pt solid ; ">A <code class="classname">java.util.Date</code> representing the
current system time when the execution finished, regardless of
whether or not it was successful.</td></tr><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; ">exitStatus</td><td style="border-bottom: 0.5pt solid ; ">The <code class="classname">ExitStatus</code> indicating the
result of the run. It is most important because it contains an
exit code that will be returned to the caller. See chapter 5 for
more details.</td></tr><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; ">createTime</td><td style="border-bottom: 0.5pt solid ; ">A <code class="classname">java.util.Date</code> representing the
current system time when the <code class="classname">JobExecution</code>
was first persisted. The job may not have been started yet (and
thus has no start time), but it will always have a createTime,
which is required by the framework for managing job level
<code class="classname">ExecutionContext</code>s.</td></tr><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; ">lastUpdated</td><td style="border-bottom: 0.5pt solid ; ">A <code class="classname">java.util.Date</code> representing the
last time a <code class="classname">JobExecution</code> was
persisted.</td></tr><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; ">executionContext</td><td style="border-bottom: 0.5pt solid ; ">The 'property bag' containing any user data that needs to
be persisted between executions.</td></tr><tr><td style="border-right: 0.5pt solid ; ">failureExceptions</td><td style="">The list of exceptions encountered during the execution
of a <code class="classname">Job</code>. These can be useful if more
than one exception is encountered during the failure of a
<code class="classname">Job</code>.</td></tr></tbody></table>
</div></div><br class="table-break">
<p>These properties are important because they will be persisted and
can be used to completely determine the status of an execution. For
example, if the EndOfDay job for 01-01 is executed at 9:00 PM, and fails
at 9:30, the following entries will be made in the batch metadata
tables:</p>
<div class="table"><a name="d5e480" href="#d5e480"></a><p class="title"><b>Table&nbsp;3.2.&nbsp;BATCH_JOB_INSTANCE</b></p><div class="table-contents">
<table summary="BATCH_JOB_INSTANCE" style="border-collapse: collapse;border-top: 0.5pt solid ; border-bottom: 0.5pt solid ; border-left: 0.5pt solid ; border-right: 0.5pt solid ; "><colgroup><col><col></colgroup><tbody><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; ">JOB_INST_ID</td><td style="border-bottom: 0.5pt solid ; ">JOB_NAME</td></tr><tr><td style="border-right: 0.5pt solid ; ">1</td><td style="">EndOfDayJob</td></tr></tbody></table>
</div></div><br class="table-break">
<div class="table"><a name="d5e490" href="#d5e490"></a><p class="title"><b>Table&nbsp;3.3.&nbsp;BATCH_JOB_EXECUTION_PARAMS</b></p><div class="table-contents">
<table summary="BATCH_JOB_EXECUTION_PARAMS" style="border-collapse: collapse;border-top: 0.5pt solid ; border-bottom: 0.5pt solid ; border-left: 0.5pt solid ; border-right: 0.5pt solid ; "><colgroup><col><col><col><col><col></colgroup><tbody><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; ">JOB_EXECUTION_ID</td><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; ">TYPE_CD</td><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; ">KEY_NAME</td><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; ">DATE_VAL</td><td style="border-bottom: 0.5pt solid ; ">IDENTIFYING</td></tr><tr><td style="border-right: 0.5pt solid ; ">1</td><td style="border-right: 0.5pt solid ; ">DATE</td><td style="border-right: 0.5pt solid ; ">schedule.Date</td><td style="border-right: 0.5pt solid ; ">2008-01-01</td><td style="">TRUE</td></tr></tbody></table>
</div></div><br class="table-break">
<div class="table"><a name="d5e506" href="#d5e506"></a><p class="title"><b>Table&nbsp;3.4.&nbsp;BATCH_JOB_EXECUTION</b></p><div class="table-contents">
<table summary="BATCH_JOB_EXECUTION" style="border-collapse: collapse;border-top: 0.5pt solid ; border-bottom: 0.5pt solid ; border-left: 0.5pt solid ; border-right: 0.5pt solid ; "><colgroup><col><col><col><col><col></colgroup><tbody><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; ">JOB_EXEC_ID</td><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; ">JOB_INST_ID</td><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; ">START_TIME</td><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; ">END_TIME</td><td style="border-bottom: 0.5pt solid ; ">STATUS</td></tr><tr><td style="border-right: 0.5pt solid ; ">1</td><td style="border-right: 0.5pt solid ; ">1</td><td style="border-right: 0.5pt solid ; ">2008-01-01 21:00</td><td style="border-right: 0.5pt solid ; ">2008-01-01 21:30</td><td style="">FAILED</td></tr></tbody></table>
</div></div><br class="table-break">
<div class="note" style="margin-left: 0.5in; margin-right: 0.5in;"><table border="0" summary="Note"><tr><td rowspan="2" align="center" valign="top" width="25"><img alt="[Note]" src="images/note.png"></td><th align="left">Note</th></tr><tr><td align="left" valign="top">
<p>column names may have been abbreviated or removed for clarity
and formatting</p>
</td></tr></table></div>
<p>Now that the job has failed, let's assume that it took the entire
course of the night for the problem to be determined, so that the 'batch
window' is now closed. Assuming the window starts at 9:00 PM, the job
will be kicked off again for 01-01, starting where it left off and
completing successfully at 9:30. Because it's now the next day, the
01-02 job must be run as well, which is kicked off just afterwards at
9:31 and completes in roughly its normal one hour, at 10:29. There is no
requirement that one <code class="classname">JobInstance</code> be kicked off
after another, unless there is potential for the two jobs to attempt to
access the same data, causing issues with locking at the database level.
It is entirely up to the scheduler to determine when a
<code class="classname">Job</code> should be run. Since they're separate
<code class="classname">JobInstance</code>s, Spring Batch will make no attempt
to stop them from being run concurrently. (Attempting to run the same
<code class="classname">JobInstance</code> while another is already running will
result in a <code class="classname">JobExecutionAlreadyRunningException</code>
being thrown.) There should now be an extra entry in the
<code class="classname">JobInstance</code> table, and two extra entries in
each of the <code class="classname">JobParameters</code> and
<code class="classname">JobExecution</code> tables:</p>
<div class="table"><a name="d5e533" href="#d5e533"></a><p class="title"><b>Table&nbsp;3.5.&nbsp;BATCH_JOB_INSTANCE</b></p><div class="table-contents">
<table summary="BATCH_JOB_INSTANCE" style="border-collapse: collapse;border-top: 0.5pt solid ; border-bottom: 0.5pt solid ; border-left: 0.5pt solid ; border-right: 0.5pt solid ; "><colgroup><col><col></colgroup><tbody><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; ">JOB_INST_ID</td><td style="border-bottom: 0.5pt solid ; ">JOB_NAME</td></tr><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; ">1</td><td style="border-bottom: 0.5pt solid ; ">EndOfDayJob</td></tr><tr><td style="border-right: 0.5pt solid ; ">2</td><td style="">EndOfDayJob</td></tr></tbody></table>
</div></div><br class="table-break">
<div class="table"><a name="d5e546" href="#d5e546"></a><p class="title"><b>Table&nbsp;3.6.&nbsp;BATCH_JOB_EXECUTION_PARAMS</b></p><div class="table-contents">
<table summary="BATCH_JOB_EXECUTION_PARAMS" style="border-collapse: collapse;border-top: 0.5pt solid ; border-bottom: 0.5pt solid ; border-left: 0.5pt solid ; border-right: 0.5pt solid ; "><colgroup><col><col><col><col><col></colgroup><tbody><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; ">JOB_EXECUTION_ID</td><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; ">TYPE_CD</td><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; ">KEY_NAME</td><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; ">DATE_VAL</td><td style="border-bottom: 0.5pt solid ; ">IDENTIFYING</td></tr><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; ">1</td><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; ">DATE</td><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; ">schedule.Date</td><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; ">2008-01-01 00:00:00</td><td style="border-bottom: 0.5pt solid ; ">TRUE</td></tr><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; ">2</td><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; ">DATE</td><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; ">schedule.Date</td><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; ">2008-01-01 00:00:00</td><td style="border-bottom: 0.5pt solid ; ">TRUE</td></tr><tr><td style="border-right: 0.5pt solid ; ">3</td><td style="border-right: 0.5pt solid ; ">DATE</td><td style="border-right: 0.5pt solid ; ">schedule.Date</td><td style="border-right: 0.5pt solid ; ">2008-01-02 00:00:00</td><td style="">TRUE</td></tr></tbody></table>
</div></div><br class="table-break">
<div class="table"><a name="d5e574" href="#d5e574"></a><p class="title"><b>Table&nbsp;3.7.&nbsp;BATCH_JOB_EXECUTION</b></p><div class="table-contents">
<table summary="BATCH_JOB_EXECUTION" style="border-collapse: collapse;border-top: 0.5pt solid ; border-bottom: 0.5pt solid ; border-left: 0.5pt solid ; border-right: 0.5pt solid ; "><colgroup><col><col><col><col><col></colgroup><tbody><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; ">JOB_EXEC_ID</td><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; ">JOB_INST_ID</td><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; ">START_TIME</td><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; ">END_TIME</td><td style="border-bottom: 0.5pt solid ; ">STATUS</td></tr><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; ">1</td><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; ">1</td><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; ">2008-01-01 21:00</td><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; ">2008-01-01 21:30</td><td style="border-bottom: 0.5pt solid ; ">FAILED</td></tr><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; ">2</td><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; ">1</td><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; ">2008-01-02 21:00</td><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; ">2008-01-02 21:30</td><td style="border-bottom: 0.5pt solid ; ">COMPLETED</td></tr><tr><td style="border-right: 0.5pt solid ; ">3</td><td style="border-right: 0.5pt solid ; ">2</td><td style="border-right: 0.5pt solid ; ">2008-01-02 21:31</td><td style="border-right: 0.5pt solid ; ">2008-01-02 22:29</td><td style="">COMPLETED</td></tr></tbody></table>
</div></div><br class="table-break">
<div class="note" style="margin-left: 0.5in; margin-right: 0.5in;"><table border="0" summary="Note"><tr><td rowspan="2" align="center" valign="top" width="25"><img alt="[Note]" src="images/note.png"></td><th align="left">Note</th></tr><tr><td align="left" valign="top">
<p>column names may have been abbreviated or removed for clarity
and formatting</p>
</td></tr></table></div>
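<p>The relationship between one <code class="classname">JobInstance</code> and
its <code class="classname">JobExecution</code>s can be sketched without the
framework as follows. This is a simplified model under stated assumptions, not
the Spring Batch implementation: <code class="classname">JobInstanceSketch</code>
and its methods are hypothetical, state is kept in memory rather than in the
metadata tables, and a standard <code class="classname">IllegalStateException</code>
stands in for <code class="classname">JobExecutionAlreadyRunningException</code>.</p>
<pre class="programlisting">final class JobInstanceSketch {

    /** The three statuses named in the JobExecution properties table. */
    enum Status { STARTED, FAILED, COMPLETED }

    private final String key;                    // job name plus identifying parameters
    private Status[] executions = new Status[0]; // one entry per attempt, oldest first

    JobInstanceSketch(String key) {
        this.key = key;
    }

    /** Records a new attempt and returns its 1-based number within this instance. */
    int startExecution() {
        if (isRunning()) {
            // Spring Batch signals this case with JobExecutionAlreadyRunningException;
            // a standard exception keeps the sketch dependency-free.
            throw new IllegalStateException(key + " is already running");
        }
        executions = java.util.Arrays.copyOf(executions, executions.length + 1);
        executions[executions.length - 1] = Status.STARTED;
        return executions.length;
    }

    /** Marks the running attempt as COMPLETED or FAILED. */
    void finishExecution(boolean success) {
        if (!isRunning()) {
            throw new IllegalStateException(key + " has no running execution");
        }
        executions[executions.length - 1] = success ? Status.COMPLETED : Status.FAILED;
    }

    boolean isRunning() {
        if (executions.length == 0) {
            return false;
        }
        return executions[executions.length - 1] == Status.STARTED;
    }

    /** The instance is complete only once some attempt has completed successfully. */
    boolean isComplete() {
        for (Status s : executions) {
            if (s == Status.COMPLETED) {
                return true;
            }
        }
        return false;
    }

    int executionCount() {
        return executions.length;
    }
}</pre>
<p>Replaying the 01-01 scenario against this model, a failed first attempt
followed by a successful second attempt leaves one instance with two
executions, and the instance only reports itself complete after the second
attempt.</p>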
</div>
</div>
<div class="section"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="domainStep" href="#domainStep"></a>3.2&nbsp;Step</h2></div></div></div>
<p>A <code class="classname">Step</code> is a domain object that encapsulates
an independent, sequential phase of a batch job. Therefore, every
<code class="classname">Job</code> is composed entirely of one or more steps. A
<code class="classname">Step</code> contains all of the information necessary to
define and control the actual batch processing. This is a necessarily
vague description because the contents of any given
<code class="classname">Step</code> are at the discretion of the developer writing
a <code class="classname">Job</code>. A Step can be as simple or complex as the
developer desires. A simple <code class="classname">Step</code> might load data
from a file into the database, requiring little or no code (depending
upon the implementations used). A more complex <code class="classname">Step</code>
may have complicated business rules that are applied as part of the
processing. As with <code class="classname">Job</code>, a
<code class="classname">Step</code> has an individual
<code class="classname">StepExecution</code> that corresponds with a unique
<code class="classname">JobExecution</code>:</p>
<div class="mediaobject" align="center"><img src="images/jobHeirarchyWithSteps.png" align="middle"></div>
<div class="section"><div class="titlepage"><div><div><h3 class="title"><a name="domainStepExecution" href="#domainStepExecution"></a>3.2.1&nbsp;StepExecution</h3></div></div></div>
<p>A <code class="classname">StepExecution</code> represents a single attempt
to execute a <code class="classname">Step</code>. A new
<code class="classname">StepExecution</code> will be created each time a
<code class="classname">Step</code> is run, similar to
<code class="classname">JobExecution</code>. However, if a step fails to execute
because the step before it fails, there will be no execution persisted
for it. A <code class="classname">StepExecution</code> will only be created when
its <code class="classname">Step</code> is actually started.</p>
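<p>The rule that a <code class="classname">StepExecution</code> exists only for
steps that were actually started can be illustrated with a small,
framework-free sketch. <code class="classname">StepRunSketch</code>,
<code class="classname">Step</code>, and <code class="methodname">run</code> are
hypothetical names, and the strings returned here merely stand in for
persisted executions.</p>
<pre class="programlisting">final class StepRunSketch {

    /** Hypothetical step: a name plus a body that reports success or failure. */
    record Step(String name, java.util.function.BooleanSupplier body) {}

    /**
     * Runs the steps in order and returns one entry per step that was actually
     * started, in the form "name:COMPLETED" or "name:FAILED". Steps that come
     * after a failure are never started, so they produce no entry at all.
     */
    static String[] run(Step... steps) {
        String[] executions = new String[steps.length];
        int started = 0;
        for (Step step : steps) {
            boolean ok = step.body().getAsBoolean();
            executions[started++] = step.name() + (ok ? ":COMPLETED" : ":FAILED");
            if (!ok) {
                break;
            }
        }
        return java.util.Arrays.copyOf(executions, started);
    }
}</pre>
<p>If the three steps of the footballJob example are passed in and 'gameLoad'
fails, the result in this model holds entries for 'playerload' and 'gameLoad'
only; 'playerSummarization' has none.</p>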
<p>Step executions are represented by objects of the
<code class="classname">StepExecution</code> class. Each execution contains a
reference to its corresponding step and
<code class="classname">JobExecution</code>, and transaction related data such
as commit and rollback count and start and end times. Additionally, each
step execution will contain an <code class="classname">ExecutionContext</code>,
which contains any data a developer needs persisted across batch runs,
such as statistics or state information needed to restart. The following
is a listing of the properties for
<code class="classname">StepExecution</code>:</p>
<div class="table"><a name="d5e638" href="#d5e638"></a><p class="title"><b>Table&nbsp;3.8.&nbsp;StepExecution Properties</b></p><div class="table-contents">
<table summary="StepExecution Properties" style="border-collapse: collapse;border-top: 0.5pt solid ; border-bottom: 0.5pt solid ; border-left: 0.5pt solid ; border-right: 0.5pt solid ; "><colgroup><col class="c1"><col class="c2"></colgroup><tbody><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; ">status</td><td style="border-bottom: 0.5pt solid ; ">A <code class="classname">BatchStatus</code> object that
indicates the status of the execution. While it's running, the
status is BatchStatus.STARTED, if it fails, the status is
BatchStatus.FAILED, and if it finishes successfully, the status
is BatchStatus.COMPLETED</td></tr><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; ">startTime</td><td style="border-bottom: 0.5pt solid ; ">A <code class="classname">java.util.Date</code> representing the
current system time when the execution was started.</td></tr><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; ">endTime</td><td style="border-bottom: 0.5pt solid ; ">A <code class="classname">java.util.Date</code> representing the
current system time when the execution finished, regardless of
whether or not it was successful.</td></tr><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; ">exitStatus</td><td style="border-bottom: 0.5pt solid ; ">The <code class="classname">ExitStatus</code> indicating the
result of the execution. It is most important because it
contains an exit code that will be returned to the caller. See
chapter 5 for more details.</td></tr><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; ">executionContext</td><td style="border-bottom: 0.5pt solid ; ">The 'property bag' containing any user data that needs to
be persisted between executions.</td></tr><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; ">readCount</td><td style="border-bottom: 0.5pt solid ; ">The number of items that have been successfully
read</td></tr><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; ">writeCount</td><td style="border-bottom: 0.5pt solid ; ">The number of items that have been successfully
written</td></tr><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; ">commitCount</td><td style="border-bottom: 0.5pt solid ; ">The number of transactions that have been committed for this
execution</td></tr><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; ">rollbackCount</td><td style="border-bottom: 0.5pt solid ; ">The number of times the business transaction controlled
by the <code class="classname">Step</code> has been rolled back.</td></tr><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; ">readSkipCount</td><td style="border-bottom: 0.5pt solid ; ">The number of times <code class="methodname">read</code> has
failed, resulting in a skipped item.</td></tr><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; ">processSkipCount</td><td style="border-bottom: 0.5pt solid ; ">The number of times <code class="methodname">process</code> has
failed, resulting in a skipped item.</td></tr><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; ">filterCount</td><td style="border-bottom: 0.5pt solid ; ">The number of items that have been 'filtered' by the
<code class="classname">ItemProcessor</code>.</td></tr><tr><td style="border-right: 0.5pt solid ; ">writeSkipCount</td><td style="">The number of times <code class="methodname">write</code> has
failed, resulting in a skipped item.</td></tr></tbody></table>
</div></div><br class="table-break">
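<p>To make a few of these counters concrete, the following sketch tallies
readCount, filterCount, writeCount, and commitCount for a simple chunked loop.
It is an illustration under stated assumptions rather than the framework's
accounting: <code class="classname">ChunkCounterSketch</code> is a hypothetical
class, empty strings stand in for items an
<code class="classname">ItemProcessor</code> would filter, a commit is counted
once per chunk of items read, and skips and rollbacks are not modelled.</p>
<pre class="programlisting">final class ChunkCounterSketch {

    int readCount;
    int filterCount;
    int writeCount;
    int commitCount;

    /** Processes all items, committing after every chunkSize items read. */
    void run(String[] items, int chunkSize) {
        if (Integer.signum(chunkSize) != 1) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        int inChunk = 0;   // items read since the last commit
        int toWrite = 0;   // of those, items that survived filtering
        for (String item : items) {
            readCount++;
            inChunk++;
            if (item.isEmpty()) {
                filterCount++;          // filtered items are counted, not written
            } else {
                toWrite++;
            }
            if (inChunk == chunkSize) {
                writeCount += toWrite;  // "write" the chunk, then commit it
                commitCount++;
                inChunk = 0;
                toWrite = 0;
            }
        }
        if (inChunk != 0) {             // final, partially filled chunk
            writeCount += toWrite;
            commitCount++;
        }
    }
}</pre>
<p>In this model every item read is either written or filtered, so readCount
always equals writeCount plus filterCount once the loop finishes.</p>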
</div>
</div>
<div class="section"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="domainExecutionContext" href="#domainExecutionContext"></a>3.3&nbsp;ExecutionContext</h2></div></div></div>
<p>An <code class="classname">ExecutionContext</code> represents a collection
of key/value pairs that are persisted and controlled by the framework in
order to allow developers a place to store persistent state that is scoped
to a <code class="classname">StepExecution</code> or
<code class="classname">JobExecution</code>. For those familiar with Quartz, it is
very similar to <code class="classname">JobDataMap</code>. The best usage example
is to facilitate restart. Using flat file input as an example, while
processing individual lines, the framework periodically persists the
<code class="classname">ExecutionContext</code> at commit points. This allows the
<code class="classname">ItemReader</code> to store its state in case a fatal error
occurs during the run, or even if the power goes out. All that is needed
is to put the current number of lines read into the context, and the
framework will do the rest:</p>
<pre class="programlisting">executionContext.putLong(getKey(LINES_READ_COUNT), reader.getPosition());</pre>
<p>Continuing the EndOfDay example from the Job Stereotypes section,
assume there is one step, 'loadData', that loads a file into the
database. After the first failed run, the metadata tables would look like
the following:</p>
<div class="table"><a name="d5e704" href="#d5e704"></a><p class="title"><b>Table&nbsp;3.9.&nbsp;BATCH_JOB_INSTANCE</b></p><div class="table-contents">
<table summary="BATCH_JOB_INSTANCE" style="border-collapse: collapse;border-top: 0.5pt solid ; border-bottom: 0.5pt solid ; border-left: 0.5pt solid ; border-right: 0.5pt solid ; "><colgroup><col><col></colgroup><tbody><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; ">JOB_INST_ID</td><td style="border-bottom: 0.5pt solid ; ">JOB_NAME</td></tr><tr><td style="border-right: 0.5pt solid ; ">1</td><td style="">EndOfDayJob</td></tr></tbody></table>
</div></div><p><br class="table-break"></p><div class="table"><a name="d5e714" href="#d5e714"></a><p class="title"><b>Table&nbsp;3.10.&nbsp;BATCH_JOB_PARAMS</b></p><div class="table-contents">
<table summary="BATCH_JOB_PARAMS" style="border-collapse: collapse;border-top: 0.5pt solid ; border-bottom: 0.5pt solid ; border-left: 0.5pt solid ; border-right: 0.5pt solid ; "><colgroup><col><col><col><col></colgroup><tbody><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; ">JOB_INST_ID</td><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; ">TYPE_CD</td><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; ">KEY_NAME</td><td style="border-bottom: 0.5pt solid ; ">DATE_VAL</td></tr><tr><td style="border-right: 0.5pt solid ; ">1</td><td style="border-right: 0.5pt solid ; ">DATE</td><td style="border-right: 0.5pt solid ; ">schedule.Date</td><td style="">2008-01-01</td></tr></tbody></table>
</div></div><p><br class="table-break"></p><div class="table"><a name="d5e728" href="#d5e728"></a><p class="title"><b>Table&nbsp;3.11.&nbsp;BATCH_JOB_EXECUTION</b></p><div class="table-contents">
<table summary="BATCH_JOB_EXECUTION" style="border-collapse: collapse;border-top: 0.5pt solid ; border-bottom: 0.5pt solid ; border-left: 0.5pt solid ; border-right: 0.5pt solid ; "><colgroup><col><col><col><col><col></colgroup><tbody><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; ">JOB_EXEC_ID</td><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; ">JOB_INST_ID</td><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; ">START_TIME</td><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; ">END_TIME</td><td style="border-bottom: 0.5pt solid ; ">STATUS</td></tr><tr><td style="border-right: 0.5pt solid ; ">1</td><td style="border-right: 0.5pt solid ; ">1</td><td style="border-right: 0.5pt solid ; ">2008-01-01 21:00</td><td style="border-right: 0.5pt solid ; ">2008-01-01 21:30</td><td style="">FAILED</td></tr></tbody></table>
</div></div><p><br class="table-break"></p><div class="table"><a name="d5e744" href="#d5e744"></a><p class="title"><b>Table&nbsp;3.12.&nbsp;BATCH_STEP_EXECUTION</b></p><div class="table-contents">
<table summary="BATCH_STEP_EXECUTION" style="border-collapse: collapse;border-top: 0.5pt solid ; border-bottom: 0.5pt solid ; border-left: 0.5pt solid ; border-right: 0.5pt solid ; "><colgroup><col><col><col><col><col><col></colgroup><tbody><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; ">STEP_EXEC_ID</td><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; ">JOB_EXEC_ID</td><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; ">STEP_NAME</td><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; ">START_TIME</td><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; ">END_TIME</td><td style="border-bottom: 0.5pt solid ; ">STATUS</td></tr><tr><td style="border-right: 0.5pt solid ; ">1</td><td style="border-right: 0.5pt solid ; ">1</td><td style="border-right: 0.5pt solid ; ">loadData</td><td style="border-right: 0.5pt solid ; ">2008-01-01 21:00</td><td style="border-right: 0.5pt solid ; ">2008-01-01 21:30</td><td style="">FAILED</td></tr></tbody></table>
</div></div><p><br class="table-break"></p><div class="table"><a name="d5e762" href="#d5e762"></a><p class="title"><b>Table&nbsp;3.13.&nbsp;BATCH_STEP_EXECUTION_CONTEXT</b></p><div class="table-contents">
<table summary="BATCH_STEP_EXECUTION_CONTEXT" style="border-collapse: collapse;border-top: 0.5pt solid ; border-bottom: 0.5pt solid ; border-left: 0.5pt solid ; border-right: 0.5pt solid ; "><colgroup><col><col></colgroup><tbody><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; ">STEP_EXEC_ID</td><td style="border-bottom: 0.5pt solid ; ">SHORT_CONTEXT</td></tr><tr><td style="border-right: 0.5pt solid ; ">1</td><td style="">{piece.count=40321}</td></tr></tbody></table>
</div></div><p><br class="table-break">In this case, the <code class="classname">Step</code> ran for 30 minutes
and processed 40,321 'pieces', which would represent lines in a file in
this scenario. This value will be updated just before each commit by the
framework, and can contain multiple rows corresponding to entries within
the <code class="classname">ExecutionContext</code>. Being notified before a
commit requires one of the various <code class="classname">StepListener</code>s,
or an <code class="classname">ItemStream</code>, which are discussed in more
detail later in this guide. As with the previous example, it is assumed
that the <code class="classname">Job</code> is restarted the next day. When it is
restarted, the values from the <code class="classname">ExecutionContext</code> of
the last run are reconstituted from the database, and when the
<code class="classname">ItemReader</code> is opened, it can check to see if it has
any stored state in the context, and initialize itself from there:</p>
<pre class="programlisting"><span class="hl-keyword">if</span> (executionContext.containsKey(getKey(LINES_READ_COUNT))) {
    log.debug(<span class="hl-string">"Initializing for restart. Restart data is: "</span> + executionContext);
    <span class="hl-keyword">long</span> lineCount = executionContext.getLong(getKey(LINES_READ_COUNT));
    LineReader reader = getReader();
    Object record = <span class="hl-string">""</span>;
    <span class="hl-comment">// Skip the lines that were already read before the failure</span>
    <span class="hl-keyword">while</span> (reader.getPosition() &lt; lineCount &amp;&amp; record != null) {
        record = readLine();
    }
}</pre>
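<p>The save-and-restore cycle described above can also be sketched without
the framework. The following is a minimal illustration under stated
assumptions: <code class="classname">RestartableLineReader</code>, its
<code class="methodname">open</code> and <code class="methodname">update</code>
methods, and the <code class="literal">lines.read.count</code> key are
hypothetical names, and a plain <code class="classname">Map</code> stands in
for the <code class="classname">ExecutionContext</code>.</p>

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, framework-free stand-in for a restartable reader.
// A plain Map plays the role of the ExecutionContext.
class RestartableLineReader {
    static final String LINES_READ_COUNT = "lines.read.count";

    private final List<String> lines;
    private int position = 0; // number of lines consumed so far

    RestartableLineReader(List<String> lines) {
        this.lines = lines;
    }

    // Restore state: skip forward to the saved position, if there is one.
    void open(Map<String, Object> context) {
        if (context.containsKey(LINES_READ_COUNT)) {
            long saved = (Long) context.get(LINES_READ_COUNT);
            while (position < saved && read() != null) {
                // discard lines already handled by the previous run
            }
        }
    }

    // Returns the next line, or null when the input is exhausted.
    String read() {
        return position < lines.size() ? lines.get(position++) : null;
    }

    // Save state: in this sketch, called just before each commit.
    void update(Map<String, Object> context) {
        context.put(LINES_READ_COUNT, (long) position);
    }

    int getPosition() {
        return position;
    }

    public static void main(String[] args) {
        List<String> file = List.of("a", "b", "c", "d");
        Map<String, Object> context = new HashMap<>();

        // First run: two lines are read and the position is saved, then the run "fails".
        RestartableLineReader firstRun = new RestartableLineReader(file);
        firstRun.open(context);
        firstRun.read();
        firstRun.read();
        firstRun.update(context);

        // Second run: a fresh reader resumes from the saved position.
        RestartableLineReader secondRun = new RestartableLineReader(file);
        secondRun.open(context);
        System.out.println(secondRun.read());
    }
}
```

<p>The second reader never sees the lines consumed by the first run, because
the only state it needs is the single count stored under the key.</p>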
<p>In this case, after the above code is executed, the current line
will be 40,322, allowing the <code class="classname">Step</code> to start again
from where it left off. The <code class="classname">ExecutionContext</code> can
also be used for statistics that need to be persisted about the run
itself. For example, if a flat file contains orders for processing that
exist across multiple lines, it may be necessary to store how many orders
have been processed (which is quite different from the number of lines
read) so that an email can be sent at the end of the
<code class="classname">Step</code> with the total orders processed in the body.
The framework handles storing this for the developer, in order to
correctly scope it with an individual <code class="classname">JobInstance</code>.
It can be very difficult to know whether an existing
<code class="classname">ExecutionContext</code> should be used or not. For
example, using the 'EndOfDay' example from above, when the 01-01 run
starts again for the second time, the framework recognizes that it is the
same <code class="classname">JobInstance</code> and on an individual
<code class="classname">Step</code> basis, pulls the
<code class="classname">ExecutionContext</code> out of the database and hands it
as part of the <code class="classname">StepExecution</code> to the
<code class="classname">Step</code> itself. Conversely, for the 01-02 run the
framework recognizes that it is a different instance, so an empty context
must be handed to the <code class="classname">Step</code>. There are many of these
types of determinations that the framework makes for the developer to
ensure the state is given to them at the correct time. It is also
important to note that exactly one <code class="classname">ExecutionContext</code>
exists per <code class="classname">StepExecution</code> at any given time. Because
this creates a shared keyspace, clients of the
<code class="classname">ExecutionContext</code> should take care when putting
values in to ensure that no data is overwritten. However, the
<code class="classname">Step</code> stores absolutely no data in the context, so
there is no way to adversely affect the framework.</p>
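<p>One common way to avoid collisions in a shared keyspace is to qualify each
key with the name of the component that owns it, in the spirit of the
<code class="methodname">getKey</code> calls used in the snippets above. The
following is a minimal sketch of that idea only:
<code class="classname">ContextKeys</code> and the component names are
hypothetical, and a plain <code class="classname">Map</code> stands in for the
<code class="classname">ExecutionContext</code>.</p>

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: qualifies keys with the owning component's name so that
// two components sharing one context cannot overwrite each other's entries.
class ContextKeys {
    private final String owner;

    ContextKeys(String owner) {
        this.owner = owner;
    }

    String getKey(String name) {
        return owner + "." + name;
    }

    public static void main(String[] args) {
        Map<String, Object> context = new HashMap<>(); // stands in for the ExecutionContext
        ContextKeys readerKeys = new ContextKeys("orderReader");
        ContextKeys writerKeys = new ContextKeys("orderWriter");

        // Both components use the same short name, but the entries stay separate.
        context.put(readerKeys.getKey("count"), 40321L);
        context.put(writerKeys.getKey("count"), 40300L);

        System.out.println(context.size()); // prints 2
    }
}
```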
<p>It is also important to note that there is at least one
<code class="classname">ExecutionContext</code> per
<code class="classname">JobExecution</code>, and one for every
<code class="classname">StepExecution</code>. For example, consider the following
code snippet:</p>
<pre class="programlisting">ExecutionContext ecStep = stepExecution.getExecutionContext();
ExecutionContext ecJob = jobExecution.getExecutionContext();
<span class="hl-comment">//ecStep does not equal ecJob</span></pre>
<p>As noted in the comment, ecStep will not equal ecJob; they are two
different <code class="classname">ExecutionContext</code>s. The one scoped to the
<code class="classname">Step</code> will be saved at every commit point in the
<code class="classname">Step</code>, whereas the one scoped to the
<code class="classname">Job</code> will be saved in between every
<code class="classname">Step</code> execution.</p>
</div>
<div class="section"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="domainJobRepository" href="#domainJobRepository"></a>3.4&nbsp;JobRepository</h2></div></div></div>
<p><code class="classname">JobRepository</code> is the persistence mechanism
for all of the Stereotypes mentioned above. It provides CRUD operations
for <code class="classname">JobLauncher</code>, <code class="classname">Job</code>, and
<code class="classname">Step</code> implementations. When a
<code class="classname">Job</code> is first launched, a
<code class="classname">JobExecution</code> is obtained from the repository, and
during the course of execution <code class="classname">StepExecution</code> and
<code class="classname">JobExecution</code> implementations are persisted by
passing them to the repository. With the batch namespace, a repository can be
configured as follows:</p>
<pre class="programlisting"><span class="hl-tag">&lt;job-repository</span> <span class="hl-attribute">id</span>=<span class="hl-value">"jobRepository"</span><span class="hl-tag">/&gt;</span></pre>
</div>
<div class="section"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="domainJobLauncher" href="#domainJobLauncher"></a>3.5&nbsp;JobLauncher</h2></div></div></div>
<p><code class="classname">JobLauncher</code> represents a simple interface for
launching a <code class="classname">Job</code> with a given set of
<code class="classname">JobParameters</code>:</p>
<pre class="programlisting"><span class="hl-keyword">public</span> <span class="hl-keyword">interface</span> JobLauncher {
    <span class="hl-keyword">public</span> JobExecution run(Job job, JobParameters jobParameters)
        <span class="hl-keyword">throws</span> JobExecutionAlreadyRunningException, JobRestartException,
               JobInstanceAlreadyCompleteException, JobParametersInvalidException;
}</pre>
<p>It is expected that implementations will obtain a valid
<code class="classname">JobExecution</code> from the
<code class="classname">JobRepository</code> and execute the
<code class="classname">Job</code>.</p>
</div>
<div class="section"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="domainItemReader" href="#domainItemReader"></a>3.6&nbsp;Item Reader</h2></div></div></div>
<p><code class="classname">ItemReader</code> is an abstraction that represents
the retrieval of input for a <code class="classname">Step</code>, one item at a
time. When the <code class="classname">ItemReader</code> has exhausted the items
it can provide, it will indicate this by returning null. More details
about the <code class="classname">ItemReader</code> interface and its various
implementations can be found in <a class="xref" href="readersAndWriters.html" title="6.&nbsp;ItemReaders and ItemWriters">Chapter&nbsp;6, <i>ItemReaders and ItemWriters</i></a>.</p>
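<p>The "return null when exhausted" contract leads to a simple consumption
loop. The following is a framework-free sketch of that contract:
<code class="classname">SimpleItemReader</code>,
<code class="classname">ListReader</code>, and
<code class="classname">ReaderDemo</code> are hypothetical names used for
illustration and are not the Spring Batch interfaces themselves.</p>

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical single-method reader mirroring the contract described above:
// one item per call, null once the input is exhausted.
interface SimpleItemReader<T> {
    T read();
}

// Reads the elements of an in-memory list one at a time.
class ListReader<T> implements SimpleItemReader<T> {
    private final Iterator<T> iterator;

    ListReader(List<T> items) {
        this.iterator = items.iterator();
    }

    @Override
    public T read() {
        return iterator.hasNext() ? iterator.next() : null;
    }
}

class ReaderDemo {
    // Drains a reader by calling read() until it returns null.
    static <T> List<T> drain(SimpleItemReader<T> reader) {
        List<T> result = new ArrayList<>();
        T item;
        while ((item = reader.read()) != null) {
            result.add(item);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(drain(new ListReader<>(List.of("a", "b", "c"))));
    }
}
```

<p>A consequence of this contract is that null cannot be used as a
legitimate item value, since the caller would treat it as the end of the
input.</p>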
</div>
<div class="section"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="domainItemWriter" href="#domainItemWriter"></a>3.7&nbsp;Item Writer</h2></div></div></div>
<p><code class="classname">ItemWriter</code> is an abstraction that
represents the output of a <code class="classname">Step</code>, one batch
or chunk of items at a time. Generally, an <code class="classname">ItemWriter</code> has no
knowledge of the input it will receive next, only the items that
were passed in its current invocation. More details about the
<code class="classname">ItemWriter</code> interface and its various
implementations can be found in <a class="xref" href="readersAndWriters.html" title="6.&nbsp;ItemReaders and ItemWriters">Chapter&nbsp;6, <i>ItemReaders and ItemWriters</i></a>.</p>
</div>
<div class="section"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="domainItemProcessor" href="#domainItemProcessor"></a>3.8&nbsp;Item Processor</h2></div></div></div>
<p><code class="classname">ItemProcessor</code> is an abstraction that
represents the business processing of an item. While the
<code class="classname">ItemReader</code> reads one item at a time and the
<code class="classname">ItemWriter</code> writes a chunk of items at a time, the
<code class="classname">ItemProcessor</code> provides an access point to transform the item or apply
other business processing. If, while processing the item, it is determined
that the item is not valid, returning null indicates that the item should
not be written out. More details about the <code class="classname">ItemProcessor</code> interface can be
found in <a class="xref" href="readersAndWriters.html" title="6.&nbsp;ItemReaders and ItemWriters">Chapter&nbsp;6, <i>ItemReaders and ItemWriters</i></a>.</p>
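<p>How the three abstractions cooperate can be sketched as a simple
chunk-oriented loop. This is an illustration under stated assumptions, not the
framework's implementation: <code class="classname">ChunkLoopSketch</code> is
a hypothetical class, and <code class="classname">Supplier</code>,
<code class="classname">Function</code>, and
<code class="classname">Consumer</code> from
<code class="literal">java.util.function</code> stand in for the reader,
processor, and writer. For brevity it processes each item as soon as it is
read.</p>

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;
import java.util.function.Supplier;

// Framework-free sketch of a chunk-oriented loop. Supplier, Function, and Consumer
// stand in for the reader, processor, and writer; none of this is Spring Batch code.
class ChunkLoopSketch {

    // Runs the loop and returns the number of items filtered out by the processor.
    static <I, O> int run(Supplier<I> reader, Function<I, O> processor,
                          Consumer<List<O>> writer, int chunkSize) {
        int filtered = 0;
        boolean exhausted = false;
        while (!exhausted) {
            List<O> chunk = new ArrayList<>();
            for (int i = 0; i < chunkSize; i++) {
                I item = reader.get();
                if (item == null) {        // reader signals the end of the input
                    exhausted = true;
                    break;
                }
                O processed = processor.apply(item);
                if (processed == null) {   // processor filtered the item
                    filtered++;
                } else {
                    chunk.add(processed);
                }
            }
            if (!chunk.isEmpty()) {
                writer.accept(chunk);      // one write call per chunk
            }
        }
        return filtered;
    }

    public static void main(String[] args) {
        Iterator<Integer> input = List.of(1, 2, 3, 4, 5).iterator();
        List<List<String>> written = new ArrayList<>();
        int filtered = ChunkLoopSketch.<Integer, String>run(
                () -> input.hasNext() ? input.next() : null,
                n -> n % 2 == 0 ? null : "item-" + n,   // drop even numbers
                written::add,
                2);
        System.out.println(written + " filtered=" + filtered);
    }
}
```

<p>In this sketch the writer only ever sees the items that survived
processing, and the count of filtered items is tracked separately, which
corresponds to the idea behind the <code class="literal">filterCount</code>
property of a <code class="classname">StepExecution</code>.</p>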
</div>
<div class="section"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="domainBatchNamespace" href="#domainBatchNamespace"></a>3.9&nbsp;Batch Namespace</h2></div></div></div>
<p>Many of the domain concepts listed above need to be configured in a
Spring <code class="classname">ApplicationContext</code>. While there are
implementations of the interfaces above that can be used in a standard
bean definition, a namespace has been provided for ease of
configuration:</p>
<pre class="programlisting"><span class="hl-tag">&lt;beans:beans</span> <span class="hl-attribute">xmlns</span>=<span class="hl-value">"</span><span class="bold"><strong>http://www.springframework.org/schema/batch</strong></span>"
xmlns:beans="http://www.springframework.org/schema/beans"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="
http://www.springframework.org/schema/beans
http://www.springframework.org/schema/beans/spring-beans.xsd
<span class="bold"><strong>http://www.springframework.org/schema/batch
http://www.springframework.org/schema/batch/spring-batch-2.2.xsd</strong></span>"&gt;
<span class="hl-tag">&lt;job</span> <span class="hl-attribute">id</span>=<span class="hl-value">"ioSampleJob"</span><span class="hl-tag">&gt;</span>
<span class="hl-tag">&lt;step</span> <span class="hl-attribute">id</span>=<span class="hl-value">"step1"</span><span class="hl-tag">&gt;</span>
<span class="hl-tag">&lt;tasklet&gt;</span>
<span class="hl-tag">&lt;chunk</span> <span class="hl-attribute">reader</span>=<span class="hl-value">"itemReader"</span> <span class="hl-attribute">writer</span>=<span class="hl-value">"itemWriter"</span> <span class="hl-attribute">commit-interval</span>=<span class="hl-value">"2"</span><span class="hl-tag">/&gt;</span>
<span class="hl-tag">&lt;/tasklet&gt;</span>
<span class="hl-tag">&lt;/step&gt;</span>
<span class="hl-tag">&lt;/job&gt;</span>
<span class="hl-tag">&lt;/beans:beans&gt;</span></pre>
<p>As long as the batch namespace has been declared, any of its
elements can be used. More information on configuring a
<code class="classname">Job</code> can be found in <a class="xref" href="configureJob.html" title="4.&nbsp;Configuring and Running a Job">Chapter&nbsp;4, <i>Configuring and Running a Job</i></a>. More information on configuring a <code class="classname">Step</code> can be
found in <a class="xref" href="configureStep.html" title="5.&nbsp;Configuring a Step">Chapter&nbsp;5, <i>Configuring a Step</i></a>.</p>
</div>
</div></body></html>