<?xml version="1.0" encoding="UTF-8"?>
|
|
<chapter xmlns="http://docbook.org/ns/docbook" version="5.0" xml:id="domain"
|
|
xmlns:xlink="http://www.w3.org/1999/xlink" xreflabel="Batch Domain Language">
|
|
<title>The Domain Language of Batch</title>
|
|
|
|
<para>To any experienced batch architect, the overall concepts of batch
|
|
processing used in Spring Batch should be familiar and comfortable. There
|
|
are "Jobs" and "Steps" and developer supplied processing units called
|
|
ItemReaders and ItemWriters. However, because of the Spring patterns,
|
|
operations, templates, callbacks, and idioms, there are opportunities for
|
|
the following:<itemizedlist>
|
|
<listitem>
|
|
<para>significant improvement in adherence to a clear separation of
|
|
concerns</para>
|
|
</listitem>
|
|
|
|
<listitem>
|
|
<para>clearly delineated architectural layers and services provided as
|
|
interfaces</para>
|
|
</listitem>
|
|
|
|
<listitem>
|
|
<para>simple and default implementations that allow for quick adoption
|
|
and ease of use out-of-the-box</para>
|
|
</listitem>
|
|
|
|
<listitem>
|
|
<para>significantly enhanced extensibility</para>
|
|
</listitem>
|
|
</itemizedlist></para>
|
|
|
|
<para>The diagram below is a simplified version of the batch reference
|
|
architecture that has been used for decades. It provides an overview of the
|
|
components that make up the domain language of batch processing. This
|
|
architecture framework is a blueprint that has been proven through decades
|
|
of implementations on the last several generations of platforms
|
|
(COBOL/Mainframe, C++/Unix, and now Java/anywhere). JCL and COBOL developers
|
|
are likely to be as comfortable with the concepts as C++, C# and Java
|
|
developers. Spring Batch provides a physical implementation of the layers,
|
|
components and technical services commonly found in robust, maintainable
|
|
systems used to address the creation of simple to complex batch
|
|
applications, with the infrastructure and extensions to address very complex
|
|
processing needs.</para>
|
|
|
|
<mediaobject>
|
|
<imageobject role="html">
|
|
<imagedata align="center"
|
|
fileref="images/spring-batch-reference-model.png"
|
|
format="PNG" scale="100" />
|
|
</imageobject>
|
|
|
|
<imageobject role="fo">
|
|
<imagedata align="center"
|
|
fileref="images/spring-batch-reference-model.png"
|
|
format="PNG" scale="55" />
|
|
</imageobject>
|
|
|
|
<caption><para>Figure 2.1: Batch Stereotypes</para></caption>
|
|
</mediaobject>
|
|
|
|
<para>The diagram above highlights the key concepts that make up the domain
|
|
language of batch. A Job has one to many steps, each of which has exactly one
|
|
ItemReader, ItemProcessor, and ItemWriter. A job needs to be launched
|
|
(JobLauncher), and meta data about the currently running process needs to be
|
|
stored (JobRepository).</para>
|
|
|
|
<section id="domainJob">
|
|
<title id="jobStereotypes">Job</title>
|
|
|
|
<para>This section describes stereotypes relating to the concept of a
|
|
batch job. A <classname>Job</classname> is an entity that encapsulates an
|
|
entire batch process. As is common with other Spring projects, a
|
|
<classname>Job</classname> will be wired together via an XML configuration
|
|
file or Java-based configuration. This configuration may be referred to as
|
|
the "job configuration". However, <classname>Job</classname> is just the
|
|
top of an overall hierarchy:</para>
|
|
|
|
<mediaobject>
|
|
<imageobject role="html">
|
|
<imagedata align="center" fileref="images/job-heirarchy.png"
|
|
scale="100" />
|
|
</imageobject>
|
|
|
|
<imageobject role="fo">
|
|
<imagedata align="center" fileref="images/job-heirarchy.png"
|
|
scale="60" />
|
|
</imageobject>
|
|
</mediaobject>
|
|
|
|
<para>In Spring Batch, a Job is simply a container for Steps. It combines
|
|
multiple steps that belong logically together in a flow and allows for
|
|
configuration of properties global to all steps, such as restartability.
|
|
The job configuration contains:</para>
|
|
|
|
<itemizedlist>
|
|
<listitem>
|
|
<para>The simple name of the job</para>
|
|
</listitem>
|
|
|
|
<listitem>
|
|
<para>Definition and ordering of Steps</para>
|
|
</listitem>
|
|
|
|
<listitem>
|
|
<para>Whether or not the job is restartable</para>
|
|
</listitem>
|
|
</itemizedlist>
|
|
|
|
<para>A default simple implementation of the <classname>Job</classname>
|
|
interface is provided by Spring Batch in the form of the
|
|
<classname>SimpleJob</classname> class, which creates some standard
functionality on top of <classname>Job</classname>. However, the batch
|
|
namespace abstracts away the need to instantiate it directly. Instead, the
|
|
<code>&lt;job&gt;</code> tag can be used:</para>
|
|
|
|
<programlisting language="xml">&lt;job id="footballJob"&gt;
    &lt;step id="playerload" next="gameLoad"/&gt;
    &lt;step id="gameLoad" next="playerSummarization"/&gt;
    &lt;step id="playerSummarization"/&gt;
&lt;/job&gt;</programlisting>
|
|
|
|
<section id="domainJobInstance">
|
|
<title id="s.2.1.2">JobInstance</title>
|
|
|
|
<para>A <classname>JobInstance</classname> refers to the concept of a
|
|
logical job run. Let's consider a batch job that should be run once at
|
|
the end of the day, such as the 'EndOfDay' job from the diagram above.
|
|
There is one 'EndOfDay' <classname>Job</classname>, but each individual
|
|
run of the <classname>Job</classname> must be tracked separately. In the
|
|
case of this job, there will be one logical
|
|
<classname>JobInstance</classname> per day. For example, there will be a
|
|
January 1st run, and a January 2nd run. If the January 1st run fails the
|
|
first time and is run again the next day, it is still the January 1st
|
|
run. (Usually this corresponds with the data it is processing as well,
|
|
meaning the January 1st run processes data for January 1st, etc).
|
|
Therefore, each <classname>JobInstance</classname> can have multiple
|
|
executions (<classname>JobExecution</classname> is discussed in more
|
|
detail below) and only one <classname>JobInstance</classname>
|
|
corresponding to a particular <classname>Job</classname> and
|
|
identifying <classname>JobParameter</classname>s can be running at a given
|
|
time.</para>
|
|
|
|
<para>The definition of a <classname>JobInstance</classname> has
|
|
absolutely no bearing on the data that will be loaded. It is entirely up
|
|
to the <classname>ItemReader</classname> implementation used to
|
|
determine how data will be loaded. For example, in the EndOfDay
|
|
scenario, there may be a column on the data that indicates the
|
|
'effective date' or 'schedule date' to which the data belongs. So, the
|
|
January 1st run would only load data from the 1st, and the January 2nd
|
|
run would only use data from the 2nd. Because this determination will
|
|
likely be a business decision, it is left up to the
|
|
<classname>ItemReader</classname> to decide. What reusing the same
<classname>JobInstance</classname> does determine, however, is whether
or not the 'state' (that is, the <classname>ExecutionContext</classname>,
which is discussed below) from previous executions will be used. Using a
|
|
new <classname>JobInstance</classname> will mean 'start from the
|
|
beginning' and using an existing instance will generally mean 'start
|
|
from where you left off'.</para>
|
|
</section>
|
|
|
|
<section id="domainJobParameters">
|
|
<title id="s.2.1.3">JobParameters</title>
|
|
|
|
<para>Having discussed <classname>JobInstance</classname> and how it
|
|
differs from <classname>Job</classname>, the natural question to ask is:
|
|
"how is one <classname>JobInstance</classname> distinguished from
|
|
another?" The answer is: <classname>JobParameters</classname>.
|
|
<classname>JobParameters</classname> is a set of parameters used to
|
|
start a batch job. They can be used for identification or even as
|
|
reference data during the run:</para>
|
|
|
|
<para><mediaobject>
|
|
<imageobject role="html">
|
|
<imagedata align="center"
|
|
fileref="images/job-stereotypes-parameters.png"
|
|
scale="100" />
|
|
</imageobject>
|
|
|
|
<imageobject role="fo">
|
|
<imagedata align="center"
|
|
fileref="images/job-stereotypes-parameters.png"
|
|
scale="60" />
|
|
</imageobject>
|
|
</mediaobject></para>
|
|
|
|
<para>In the example above, where there are two instances, one for
|
|
January 1st, and another for January 2nd, there is really only one Job,
started twice: once with a job parameter of 01-01-2008 and once
with a parameter of 01-02-2008. Thus, the contract can be
|
|
defined as: <classname>JobInstance</classname> =
|
|
<classname>Job</classname> + identifying <classname>JobParameters</classname>. This
|
|
allows a developer to effectively control how a
|
|
<classname>JobInstance</classname> is defined, since they control what
|
|
parameters are passed in.</para>
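
<para>As a rough illustration of this contract, the following sketch models
a <classname>JobInstance</classname> key in plain Java. It is a model built
on assumptions rather than Spring Batch code: the
<classname>Param</classname> and <classname>InstanceKey</classname> classes
are hypothetical, and the framework's own implementation may differ.</para>

<programlisting language="java"><![CDATA[import java.util.Map;
import java.util.Objects;
import java.util.TreeMap;

// Hypothetical model, not part of Spring Batch: one job parameter and
// whether it takes part in identifying the JobInstance.
final class Param {
    final String value;
    final boolean identifying;

    Param(String value, boolean identifying) {
        this.value = value;
        this.identifying = identifying;
    }
}

// Hypothetical model of "JobInstance = Job + identifying JobParameters".
final class InstanceKey {
    private final String jobName;
    private final Map<String, String> identifyingValues = new TreeMap<>();

    InstanceKey(String jobName, Map<String, Param> parameters) {
        this.jobName = jobName;
        // Only identifying parameters contribute to the identity.
        for (Map.Entry<String, Param> e : parameters.entrySet()) {
            if (e.getValue().identifying) {
                identifyingValues.put(e.getKey(), e.getValue().value);
            }
        }
    }

    @Override
    public boolean equals(Object other) {
        if (!(other instanceof InstanceKey)) {
            return false;
        }
        InstanceKey that = (InstanceKey) other;
        return jobName.equals(that.jobName)
                && identifyingValues.equals(that.identifyingValues);
    }

    @Override
    public int hashCode() {
        return Objects.hash(jobName, identifyingValues);
    }
}]]></programlisting>

<para>Under this model, two launches of the same job with a
<literal>schedule.Date</literal> of 2008-01-01 yield equal keys even when a
non-identifying parameter differs, while a launch with 2008-01-02 yields a
different key.</para>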
|
|
</section>
|
|
<note>
|
|
<para>Not all job parameters are required to contribute to the identification
|
|
of a <classname>JobInstance</classname>. By default they do; however, the framework
also allows the submission of a <classname>Job</classname> with parameters that do
not contribute to the identity of a <classname>JobInstance</classname>.</para>
|
|
</note>
|
|
|
|
<section id="domainJobExecution">
|
|
<title id="jobExecution">JobExecution</title>
|
|
|
|
<para>A <classname>JobExecution</classname> refers to the technical
|
|
concept of a single attempt to run a <classname>Job</classname>. An
|
|
execution may end in failure or success, but the
|
|
<classname>JobInstance</classname> corresponding to a given execution
|
|
will not be considered complete unless the execution completes
|
|
successfully. Using the EndOfDay <classname>Job</classname> described
|
|
above as an example, consider a <classname>JobInstance</classname> for
|
|
01-01-2008 that failed the first time it was run. If it is run again
|
|
with the same identifying job parameters as the first run (01-01-2008), a new
|
|
<classname>JobExecution</classname> will be created. However, there will
|
|
still be only one <classname>JobInstance</classname>.</para>
|
|
|
|
<para>A <classname>Job</classname> defines what a job is and how it is
|
|
to be executed, and <classname>JobInstance</classname> is a purely
|
|
organizational object to group executions together, primarily to enable
|
|
correct restart semantics. A <classname>JobExecution</classname>,
|
|
however, is the primary storage mechanism for what actually happened
|
|
during a run, and as such contains many more properties that must be
|
|
controlled and persisted:</para>
|
|
|
|
<table>
|
|
<title>JobExecution Properties</title>
|
|
|
|
<tgroup cols="2">
|
|
<colspec colname="c1" colwidth="*" />
|
|
|
|
<colspec colname="c2" colwidth="4*" />
|
|
|
|
<tbody>
|
|
<row>
|
|
<entry>status</entry>
|
|
|
|
<entry>A <classname>BatchStatus</classname> object that
|
|
indicates the status of the execution. While running, it is
BatchStatus.STARTED; if it fails, it is BatchStatus.FAILED; and
if it finishes successfully, it is BatchStatus.COMPLETED.</entry>
|
|
</row>
|
|
|
|
<row>
|
|
<entry>startTime</entry>
|
|
|
|
<entry>A <classname>java.util.Date</classname> representing the
|
|
current system time when the execution was started.</entry>
|
|
</row>
|
|
|
|
<row>
|
|
<entry>endTime</entry>
|
|
|
|
<entry>A <classname>java.util.Date</classname> representing the
|
|
current system time when the execution finished, regardless of
|
|
whether or not it was successful.</entry>
|
|
</row>
|
|
|
|
<row>
|
|
<entry>exitStatus</entry>
|
|
|
|
<entry>The <classname>ExitStatus</classname> indicating the
|
|
result of the run. It is most important because it contains an
|
|
exit code that will be returned to the caller. See chapter 5 for
|
|
more details.</entry>
|
|
</row>
|
|
|
|
<row>
|
|
<entry>createTime</entry>
|
|
|
|
<entry>A <classname>java.util.Date</classname> representing the
|
|
current system time when the <classname>JobExecution</classname>
|
|
was first persisted. The job may not have been started yet (and
|
|
thus has no start time), but it will always have a createTime,
|
|
which is required by the framework for managing job level
|
|
<classname>ExecutionContext</classname>s.</entry>
|
|
</row>
|
|
|
|
<row>
|
|
<entry>lastUpdated</entry>
|
|
|
|
<entry>A <classname>java.util.Date</classname> representing the
|
|
last time a <classname>JobExecution</classname> was
|
|
persisted.</entry>
|
|
</row>
|
|
|
|
<row>
|
|
<entry>executionContext</entry>
|
|
|
|
<entry>The 'property bag' containing any user data that needs to
|
|
be persisted between executions.</entry>
|
|
</row>
|
|
|
|
<row>
|
|
<entry>failureExceptions</entry>
|
|
|
|
<entry>The list of exceptions encountered during the execution
|
|
of a <classname>Job</classname>. These can be useful if more
|
|
than one exception is encountered during the failure of a
|
|
<classname>Job</classname>.</entry>
|
|
</row>
|
|
</tbody>
|
|
</tgroup>
|
|
</table>
|
|
|
|
<para>These properties are important because they will be persisted and
|
|
can be used to completely determine the status of an execution. For
|
|
example, if the EndOfDay job for 01-01 is executed at 9:00 PM, and fails
|
|
at 9:30, the following entries will be made in the batch meta data
|
|
tables:</para>
|
|
|
|
<table>
|
|
<title>BATCH_JOB_INSTANCE</title>
|
|
|
|
<tgroup cols="2">
|
|
<tbody>
|
|
<row>
|
|
<entry>JOB_INST_ID</entry>
|
|
|
|
<entry>JOB_NAME</entry>
|
|
</row>
|
|
|
|
<row>
|
|
<entry>1</entry>
|
|
|
|
<entry>EndOfDayJob</entry>
|
|
</row>
|
|
</tbody>
|
|
</tgroup>
|
|
</table>
|
|
|
|
<table>
|
|
<title>BATCH_JOB_EXECUTION_PARAMS</title>
|
|
|
|
<tgroup cols="5">
|
|
<tbody>
|
|
<row>
|
|
<entry>JOB_EXECUTION_ID</entry>
|
|
|
|
<entry>TYPE_CD</entry>
|
|
|
|
<entry>KEY_NAME</entry>
|
|
|
|
<entry>DATE_VAL</entry>
|
|
|
|
<entry>IDENTIFYING</entry>
|
|
</row>
|
|
|
|
<row>
|
|
<entry>1</entry>
|
|
|
|
<entry>DATE</entry>
|
|
|
|
<entry>schedule.Date</entry>
|
|
|
|
<entry>2008-01-01</entry>
|
|
|
|
<entry>TRUE</entry>
|
|
</row>
|
|
</tbody>
|
|
</tgroup>
|
|
</table>
|
|
|
|
<table>
|
|
<title>BATCH_JOB_EXECUTION</title>
|
|
|
|
<tgroup cols="5">
|
|
<tbody>
|
|
<row>
|
|
<entry>JOB_EXEC_ID</entry>
|
|
|
|
<entry>JOB_INST_ID</entry>
|
|
|
|
<entry>START_TIME</entry>
|
|
|
|
<entry>END_TIME</entry>
|
|
|
|
<entry>STATUS</entry>
|
|
</row>
|
|
|
|
<row>
|
|
<entry>1</entry>
|
|
|
|
<entry>1</entry>
|
|
|
|
<entry>2008-01-01 21:00</entry>
|
|
|
|
<entry>2008-01-01 21:30</entry>
|
|
|
|
<entry>FAILED</entry>
|
|
</row>
|
|
</tbody>
|
|
</tgroup>
|
|
</table>
|
|
|
|
<note>
|
|
<para>Column names may have been abbreviated or removed for clarity
and formatting.</para>
|
|
</note>
|
|
|
|
<para>Now that the job has failed, let's assume that it took the entire
|
|
course of the night for the problem to be determined, so that the 'batch
|
|
window' is now closed. Assuming the window starts at 9:00 PM, the job
|
|
will be kicked off again for 01-01, starting where it left off and
|
|
completing successfully at 9:30. Because it's now the next day, the
|
|
01-02 job must be run as well, which is kicked off just afterwards at
|
|
9:31, and completes in its normal one-hour time at 10:30. There is no
|
|
requirement that one <classname>JobInstance</classname> be kicked off
|
|
after another, unless there is potential for the two jobs to attempt to
|
|
access the same data, causing issues with locking at the database level.
|
|
It is entirely up to the scheduler to determine when a
|
|
<classname>Job</classname> should be run. Since they're separate
|
|
<classname>JobInstance</classname>s, Spring Batch will make no attempt
|
|
to stop them from being run concurrently. (Attempting to run the same
|
|
<classname>JobInstance</classname> while another is already running will
|
|
result in a <classname>JobExecutionAlreadyRunningException</classname>
|
|
being thrown.) There should now be an extra entry in the
<classname>JobInstance</classname> table, and two extra entries in
each of the <classname>JobParameters</classname> and
<classname>JobExecution</classname> tables:</para>
|
|
|
|
<table>
|
|
<title>BATCH_JOB_INSTANCE</title>
|
|
|
|
<tgroup cols="2">
|
|
<tbody>
|
|
<row>
|
|
<entry>JOB_INST_ID</entry>
|
|
|
|
<entry>JOB_NAME</entry>
|
|
</row>
|
|
|
|
<row>
|
|
<entry>1</entry>
|
|
|
|
<entry>EndOfDayJob</entry>
|
|
</row>
|
|
|
|
<row>
|
|
<entry>2</entry>
|
|
|
|
<entry>EndOfDayJob</entry>
|
|
</row>
|
|
</tbody>
|
|
</tgroup>
|
|
</table>
|
|
|
|
<table>
|
|
<title>BATCH_JOB_EXECUTION_PARAMS</title>
|
|
|
|
<tgroup cols="5">
|
|
<tbody>
|
|
<row>
|
|
<entry>JOB_EXECUTION_ID</entry>
|
|
|
|
<entry>TYPE_CD</entry>
|
|
|
|
<entry>KEY_NAME</entry>
|
|
|
|
<entry>DATE_VAL</entry>
|
|
|
|
<entry>IDENTIFYING</entry>
|
|
</row>
|
|
|
|
<row>
|
|
<entry>1</entry>
|
|
|
|
<entry>DATE</entry>
|
|
|
|
<entry>schedule.Date</entry>
|
|
|
|
<entry>2008-01-01 00:00:00</entry>
|
|
|
|
<entry>TRUE</entry>
|
|
</row>
|
|
|
|
<row>
|
|
<entry>2</entry>
|
|
|
|
<entry>DATE</entry>
|
|
|
|
<entry>schedule.Date</entry>
|
|
|
|
<entry>2008-01-01 00:00:00</entry>
|
|
|
|
<entry>TRUE</entry>
|
|
</row>
|
|
|
|
<row>
|
|
<entry>3</entry>
|
|
|
|
<entry>DATE</entry>
|
|
|
|
<entry>schedule.Date</entry>
|
|
|
|
<entry>2008-01-02 00:00:00</entry>
|
|
|
|
<entry>TRUE</entry>
|
|
</row>
|
|
</tbody>
|
|
</tgroup>
|
|
</table>
|
|
|
|
<table>
|
|
<title>BATCH_JOB_EXECUTION</title>
|
|
|
|
<tgroup cols="5">
|
|
<tbody>
|
|
<row>
|
|
<entry>JOB_EXEC_ID</entry>
|
|
|
|
<entry>JOB_INST_ID</entry>
|
|
|
|
<entry>START_TIME</entry>
|
|
|
|
<entry>END_TIME</entry>
|
|
|
|
<entry>STATUS</entry>
|
|
</row>
|
|
|
|
<row>
|
|
<entry>1</entry>
|
|
|
|
<entry>1</entry>
|
|
|
|
<entry>2008-01-01 21:00</entry>
|
|
|
|
<entry>2008-01-01 21:30</entry>
|
|
|
|
<entry>FAILED</entry>
|
|
</row>
|
|
|
|
<row>
|
|
<entry>2</entry>
|
|
|
|
<entry>1</entry>
|
|
|
|
<entry>2008-01-02 21:00</entry>
|
|
|
|
<entry>2008-01-02 21:30</entry>
|
|
|
|
<entry>COMPLETED</entry>
|
|
</row>
|
|
|
|
<row>
|
|
<entry>3</entry>
|
|
|
|
<entry>2</entry>
|
|
|
|
<entry>2008-01-02 21:31</entry>
|
|
|
|
<entry>2008-01-02 22:30</entry>
|
|
|
|
<entry>COMPLETED</entry>
|
|
</row>
|
|
</tbody>
|
|
</tgroup>
|
|
</table>
|
|
|
|
<note>
|
|
<para>Column names may have been abbreviated or removed for clarity
and formatting.</para>
|
|
</note>
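
<para>The bookkeeping shown in the tables above can be mimicked with a small
in-memory model. This is a sketch under simplifying assumptions, not the
<classname>JobRepository</classname> implementation:
<classname>ExecutionRecord</classname> and
<classname>InMemoryJobLog</classname> are hypothetical names, statuses are
plain strings, and an <classname>IllegalStateException</classname> stands in
for <classname>JobExecutionAlreadyRunningException</classname>.</para>

<programlisting language="java"><![CDATA[import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of one row in the execution table.
final class ExecutionRecord {
    final long executionId;
    final long instanceId;
    String status = "STARTED";

    ExecutionRecord(long executionId, long instanceId) {
        this.executionId = executionId;
        this.instanceId = instanceId;
    }
}

// Hypothetical in-memory stand-in for the instance and execution tables.
final class InMemoryJobLog {
    // Key: job name plus identifying parameter, e.g. "EndOfDayJob|2008-01-01".
    private final Map<String, Long> instanceIds = new HashMap<>();
    private final List<ExecutionRecord> executions = new ArrayList<>();

    // Starts a new attempt: reuses the instance if the key is already known,
    // but always creates a new execution.
    ExecutionRecord start(String jobName, String scheduleDate) {
        String key = jobName + "|" + scheduleDate;
        Long instanceId = instanceIds.get(key);
        if (instanceId == null) {
            instanceId = (long) instanceIds.size() + 1;
            instanceIds.put(key, instanceId);
        }
        for (ExecutionRecord e : executions) {
            if (e.instanceId == instanceId && e.status.equals("STARTED")) {
                // Stands in for JobExecutionAlreadyRunningException.
                throw new IllegalStateException(
                        "instance " + instanceId + " is already running");
            }
        }
        ExecutionRecord execution =
                new ExecutionRecord(executions.size() + 1, instanceId);
        executions.add(execution);
        return execution;
    }

    int instanceCount() {
        return instanceIds.size();
    }

    int executionCount() {
        return executions.size();
    }
}]]></programlisting>

<para>Replaying the scenario against this model (a failed 01-01 run, a
successful 01-01 rerun, then a 01-02 run) leaves two instances and three
executions, which matches the row counts in the BATCH_JOB_INSTANCE and
BATCH_JOB_EXECUTION tables.</para>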
|
|
</section>
|
|
</section>
|
|
|
|
<section id="domainStep">
|
|
<title id="s.2.1">Step</title>
|
|
|
|
<para>A <classname>Step</classname> is a domain object that encapsulates
|
|
an independent, sequential phase of a batch job. Therefore, every
|
|
<classname>Job</classname> is composed entirely of one or more steps. A
|
|
<classname>Step</classname> contains all of the information necessary to
|
|
define and control the actual batch processing. This is a necessarily
|
|
vague description because the contents of any given
|
|
<classname>Step</classname> are at the discretion of the developer writing
|
|
a <classname>Job</classname>. A Step can be as simple or complex as the
|
|
developer desires. A simple <classname>Step</classname> might load data
|
|
from a file into the database, requiring little or no code (depending
upon the implementations used). A more complex <classname>Step</classname>
|
|
may have complicated business rules that are applied as part of the
|
|
processing. As with <classname>Job</classname>, a
|
|
<classname>Step</classname> has an individual
|
|
<classname>StepExecution</classname> that corresponds with a unique
|
|
<classname>JobExecution</classname>:</para>
|
|
|
|
<mediaobject>
|
|
<imageobject role="html">
|
|
<imagedata align="center" fileref="images/jobHeirarchyWithSteps.png"
|
|
scale="80" />
|
|
</imageobject>
|
|
|
|
<imageobject role="fo">
|
|
<imagedata align="center" fileref="images/jobHeirarchyWithSteps.png"
|
|
scale="60" />
|
|
</imageobject>
|
|
</mediaobject>
|
|
|
|
<section id="domainStepExecution">
|
|
<title id="stepExecution">StepExecution</title>
|
|
|
|
<para>A <classname>StepExecution</classname> represents a single attempt
|
|
to execute a <classname>Step</classname>. A new
|
|
<classname>StepExecution</classname> will be created each time a
|
|
<classname>Step</classname> is run, similar to
|
|
<classname>JobExecution</classname>. However, if a step fails to execute
|
|
because the step before it fails, there will be no execution persisted
|
|
for it. A <classname>StepExecution</classname> will only be created when
|
|
its <classname>Step</classname> is actually started.</para>
|
|
|
|
<para>Step executions are represented by objects of the
|
|
<classname>StepExecution</classname> class. Each execution contains a
|
|
reference to its corresponding step and
|
|
<classname>JobExecution</classname>, and transaction related data such
|
|
as commit and rollback count and start and end times. Additionally, each
|
|
step execution will contain an <classname>ExecutionContext</classname>,
|
|
which contains any data a developer needs persisted across batch runs,
|
|
such as statistics or state information needed to restart. The following
|
|
is a listing of the properties for
|
|
<classname>StepExecution</classname>:</para>
|
|
|
|
<table>
|
|
<title>StepExecution Properties</title>
|
|
|
|
<tgroup cols="2">
|
|
<colspec colname="c1" colwidth="*" />
|
|
|
|
<colspec colname="c2" colwidth="4*" />
|
|
|
|
<tbody>
|
|
<row>
|
|
<entry>status</entry>
|
|
|
|
<entry>A <classname>BatchStatus</classname> object that
|
|
indicates the status of the execution. While it is running, the
status is BatchStatus.STARTED; if it fails, the status is
BatchStatus.FAILED; and if it finishes successfully, the status
is BatchStatus.COMPLETED.</entry>
|
|
</row>
|
|
|
|
<row>
|
|
<entry>startTime</entry>
|
|
|
|
<entry>A <classname>java.util.Date</classname> representing the
|
|
current system time when the execution was started.</entry>
|
|
</row>
|
|
|
|
<row>
|
|
<entry>endTime</entry>
|
|
|
|
<entry>A <classname>java.util.Date</classname> representing the
|
|
current system time when the execution finished, regardless of
|
|
whether or not it was successful.</entry>
|
|
</row>
|
|
|
|
<row>
|
|
<entry>exitStatus</entry>
|
|
|
|
<entry>The <classname>ExitStatus</classname> indicating the
|
|
result of the execution. It is most important because it
|
|
contains an exit code that will be returned to the caller. See
|
|
chapter 5 for more details.</entry>
|
|
</row>
|
|
|
|
<row>
|
|
<entry>executionContext</entry>
|
|
|
|
<entry>The 'property bag' containing any user data that needs to
|
|
be persisted between executions.</entry>
|
|
</row>
|
|
|
|
<row>
|
|
<entry>readCount</entry>
|
|
|
|
<entry>The number of items that have been successfully
|
|
read</entry>
|
|
</row>
|
|
|
|
<row>
|
|
<entry>writeCount</entry>
|
|
|
|
<entry>The number of items that have been successfully
|
|
written</entry>
|
|
</row>
|
|
|
|
<row>
|
|
<entry>commitCount</entry>
|
|
|
|
<entry>The number of transactions that have been committed for this
|
|
execution</entry>
|
|
</row>
|
|
|
|
<row>
|
|
<entry>rollbackCount</entry>
|
|
|
|
<entry>The number of times the business transaction controlled
|
|
by the <classname>Step</classname> has been rolled back.</entry>
|
|
</row>
|
|
|
|
<row>
|
|
<entry>readSkipCount</entry>
|
|
|
|
<entry>The number of times <methodname>read</methodname> has
|
|
failed, resulting in a skipped item.</entry>
|
|
</row>
|
|
|
|
<row>
|
|
<entry>processSkipCount</entry>
|
|
|
|
<entry>The number of times <methodname>process</methodname> has
|
|
failed, resulting in a skipped item.</entry>
|
|
</row>
|
|
|
|
<row>
|
|
<entry>filterCount</entry>
|
|
|
|
<entry>The number of items that have been 'filtered' by the
|
|
<classname>ItemProcessor</classname>.</entry>
|
|
</row>
|
|
|
|
<row>
|
|
<entry>writeSkipCount</entry>
|
|
|
|
<entry>The number of times <methodname>write</methodname> has
|
|
failed, resulting in a skipped item.</entry>
|
|
</row>
|
|
</tbody>
|
|
</tgroup>
|
|
</table>
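
<para>To show how several of these counters relate to one another, the
following sketch runs a simplified read-process-write loop in plain Java. It
assumes that items are written and committed in fixed-size chunks and that a
processor signals a filtered item by returning null;
<classname>StepCounters</classname> and <classname>ChunkLoop</classname> are
hypothetical names, and skips and rollbacks are left out.</para>

<programlisting language="java"><![CDATA[import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Function;

// Hypothetical holder for a subset of the counters listed above.
final class StepCounters {
    int readCount;
    int filterCount;
    int writeCount;
    int commitCount;
}

final class ChunkLoop {
    // Reads items one at a time, processes them, and writes them to the sink
    // in chunks of at most chunkSize items. A null processor result means
    // that the item is filtered out (an assumption of this sketch).
    static <I, O> StepCounters run(Iterator<I> reader, Function<I, O> processor,
                                   List<O> sink, int chunkSize) {
        StepCounters counters = new StepCounters();
        while (reader.hasNext()) {
            List<O> chunk = new ArrayList<>();
            for (int i = 0; i < chunkSize && reader.hasNext(); i++) {
                I item = reader.next();
                counters.readCount++;
                O result = processor.apply(item);
                if (result == null) {
                    counters.filterCount++;
                } else {
                    chunk.add(result);
                }
            }
            sink.addAll(chunk);                  // write the whole chunk
            counters.writeCount += chunk.size();
            counters.commitCount++;              // one commit per chunk
        }
        return counters;
    }
}]]></programlisting>

<para>In this model every item that is read is either filtered or written, so
the read count equals the sum of the filter count and the write count.</para>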
|
|
</section>
|
|
</section>
|
|
|
|
<section id="domainExecutionContext">
|
|
<title>ExecutionContext</title>
|
|
|
|
<para>An <classname>ExecutionContext</classname> represents a collection
|
|
of key/value pairs that are persisted and controlled by the framework in
|
|
order to allow developers a place to store persistent state that is scoped
|
|
to a <classname>StepExecution</classname> or
|
|
<classname>JobExecution</classname>. For those familiar with Quartz, it is
|
|
very similar to <classname>JobDataMap</classname>. The best usage example
|
|
is to facilitate restart. Using flat file input as an example, while
|
|
processing individual lines, the framework periodically persists the
|
|
<classname>ExecutionContext</classname> at commit points. This allows the
|
|
<classname>ItemReader</classname> to store its state in case a fatal error
|
|
occurs during the run, or even if the power goes out. All that is needed
|
|
is to put the current number of lines read into the context, and the
|
|
framework will do the rest:</para>
|
|
|
|
<programlisting language="java">executionContext.putLong(getKey(LINES_READ_COUNT), reader.getPosition());</programlisting>
|
|
|
|
<para>Using the EndOfDay example from the Job Stereotypes section,
assume there is one step, 'loadData', that loads a file into the
|
|
database. After the first failed run, the meta data tables would look like
|
|
the following:</para>
|
|
|
|
<para><table>
|
|
<title>BATCH_JOB_INSTANCE</title>
|
|
|
|
<tgroup cols="2">
|
|
<tbody>
|
|
<row>
|
|
<entry>JOB_INST_ID</entry>
|
|
|
|
<entry>JOB_NAME</entry>
|
|
</row>
|
|
|
|
<row>
|
|
<entry>1</entry>
|
|
|
|
<entry>EndOfDayJob</entry>
|
|
</row>
|
|
</tbody>
|
|
</tgroup>
|
|
</table><table>
|
|
<title>BATCH_JOB_EXECUTION_PARAMS</title>

<tgroup cols="5">
<tbody>
<row>
<entry>JOB_EXECUTION_ID</entry>

<entry>TYPE_CD</entry>

<entry>KEY_NAME</entry>

<entry>DATE_VAL</entry>

<entry>IDENTIFYING</entry>
</row>

<row>
<entry>1</entry>

<entry>DATE</entry>

<entry>schedule.Date</entry>

<entry>2008-01-01</entry>

<entry>TRUE</entry>
</row>
|
|
</tbody>
|
|
</tgroup>
|
|
</table><table>
|
|
<title>BATCH_JOB_EXECUTION</title>
|
|
|
|
<tgroup cols="5">
|
|
<tbody>
|
|
<row>
|
|
<entry>JOB_EXEC_ID</entry>
|
|
|
|
<entry>JOB_INST_ID</entry>
|
|
|
|
<entry>START_TIME</entry>
|
|
|
|
<entry>END_TIME</entry>
|
|
|
|
<entry>STATUS</entry>
|
|
</row>
|
|
|
|
<row>
|
|
<entry>1</entry>
|
|
|
|
<entry>1</entry>
|
|
|
|
<entry>2008-01-01 21:00</entry>
|
|
|
|
<entry>2008-01-01 21:30</entry>
|
|
|
|
<entry>FAILED</entry>
|
|
</row>
|
|
</tbody>
|
|
</tgroup>
|
|
</table><table>
|
|
<title>BATCH_STEP_EXECUTION</title>
|
|
|
|
<tgroup cols="6">
|
|
<tbody>
|
|
<row>
|
|
<entry>STEP_EXEC_ID</entry>
|
|
|
|
<entry>JOB_EXEC_ID</entry>
|
|
|
|
<entry>STEP_NAME</entry>
|
|
|
|
<entry>START_TIME</entry>
|
|
|
|
<entry>END_TIME</entry>
|
|
|
|
<entry>STATUS</entry>
|
|
</row>
|
|
|
|
<row>
|
|
<entry>1</entry>
|
|
|
|
<entry>1</entry>
|
|
|
|
<entry>loadData</entry>
|
|
|
|
<entry>2008-01-01 21:00</entry>
|
|
|
|
<entry>2008-01-01 21:30</entry>
|
|
|
|
<entry>FAILED</entry>
|
|
</row>
|
|
</tbody>
|
|
</tgroup>
|
|
</table><table>
|
|
<title>BATCH_STEP_EXECUTION_CONTEXT</title>
|
|
|
|
<tgroup cols="2">
|
|
<tbody>
|
|
<row>
|
|
<entry>STEP_EXEC_ID</entry>
|
|
|
|
<entry>SHORT_CONTEXT</entry>
|
|
</row>
|
|
|
|
<row>
|
|
<entry>1</entry>
|
|
|
|
<entry>{piece.count=40321}</entry>
|
|
</row>
|
|
</tbody>
|
|
</tgroup>
|
|
</table>In this case, the <classname>Step</classname> ran for 30 minutes
|
|
and processed 40,321 'pieces', which would represent lines in a file in
|
|
this scenario. This value is updated by the framework just before each
commit, and the table can contain multiple rows corresponding to entries within
the <classname>ExecutionContext</classname>. Being notified before a
|
|
commit requires one of the various <classname>StepListener</classname>s,
|
|
or an <classname>ItemStream</classname>, which are discussed in more
|
|
detail later in this guide. As with the previous example, it is assumed
|
|
that the <classname>Job</classname> is restarted the next day. When it is
|
|
restarted, the values from the <classname>ExecutionContext</classname> of
|
|
the last run are reconstituted from the database, and when the
|
|
<classname>ItemReader</classname> is opened, it can check to see if it has
|
|
any stored state in the context, and initialize itself from there:</para>
|
|
|
|
<programlisting language="java">if (executionContext.containsKey(getKey(LINES_READ_COUNT))) {
    log.debug("Initializing for restart. Restart data is: " + executionContext);

    // Number of lines that had been read when the context was last persisted.
    long lineCount = executionContext.getLong(getKey(LINES_READ_COUNT));

    LineReader reader = getReader();

    // Skip the lines that were already handled by the previous execution.
    Object record = "";
    while (reader.getPosition() &lt; lineCount &amp;&amp; record != null) {
        record = readLine();
    }
}</programlisting>
|
|
|
|
<para>In this case, after the above code is executed, the current line
|
|
will be 40,322, allowing the <classname>Step</classname> to start again
|
|
from where it left off. The <classname>ExecutionContext</classname> can
|
|
also be used for statistics that need to be persisted about the run
|
|
itself. For example, if a flat file contains orders for processing that
|
|
exist across multiple lines, it may be necessary to store how many orders
|
|
have been processed (which is quite different from the number of lines
|
|
read) so that an email can be sent at the end of the
|
|
<classname>Step</classname> with the total orders processed in the body.
|
|
The framework handles storing this for the developer, in order to
|
|
correctly scope it with an individual <classname>JobInstance</classname>.
|
|
It can be very difficult to know whether an existing
|
|
<classname>ExecutionContext</classname> should be used or not. For
|
|
example, using the 'EndOfDay' example from above, when the 01-01 run
|
|
starts again for the second time, the framework recognizes that it is the
|
|
same <classname>JobInstance</classname> and on an individual
|
|
<classname>Step</classname> basis, pulls the
|
|
<classname>ExecutionContext</classname> out of the database and hands it
|
|
as part of the <classname>StepExecution</classname> to the
|
|
<classname>Step</classname> itself. Conversely, for the 01-02 run the
|
|
framework recognizes that it is a different instance, so an empty context
|
|
must be handed to the <classname>Step</classname>. There are many of these
|
|
types of determinations that the framework makes for the developer to
|
|
ensure the state is given to them at the correct time. It is also
|
|
important to note that exactly one <classname>ExecutionContext</classname>
|
|
exists per <classname>StepExecution</classname> at any given time. Clients
|
|
of the <classname>ExecutionContext</classname> should be careful, because
this creates a shared keyspace: values must be put in under keys that
cannot collide, so that no data is overwritten. However, the
|
|
<classname>Step</classname> stores absolutely no data in the context, so
|
|
there is no way to adversely affect the framework.</para>
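
<para>The restart mechanism described above can also be sketched without the
framework. In the following self-contained example, a plain
<classname>Map</classname> stands in for the
<classname>ExecutionContext</classname>, and the <methodname>open</methodname>
and <methodname>update</methodname> methods loosely mirror the kind of
callbacks an <classname>ItemStream</classname> receives. The
<classname>RestartableLineReader</classname> class and its key name are
hypothetical.</para>

<programlisting language="java"><![CDATA[import java.util.List;
import java.util.Map;

// Hypothetical sketch of a restartable reader; not a Spring Batch class.
final class RestartableLineReader {
    static final String LINES_READ_COUNT = "lines.read.count";

    private final List<String> lines;
    private int position;

    RestartableLineReader(List<String> lines) {
        this.lines = lines;
    }

    // Called before reading starts: resume from saved state if there is any.
    void open(Map<String, Long> context) {
        Long saved = context.get(LINES_READ_COUNT);
        position = (saved == null) ? 0 : saved.intValue();
    }

    // Returns the next line, or null when the input is exhausted.
    String read() {
        return position < lines.size() ? lines.get(position++) : null;
    }

    // Called at each commit point: record how many lines have been read.
    void update(Map<String, Long> context) {
        context.put(LINES_READ_COUNT, (long) position);
    }
}]]></programlisting>

<para>If a first run commits after two lines and then fails, a second reader
opened with the same map resumes at the third line, which is the 'start from
where you left off' behavior.</para>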
|
|
|
|
<para>It is also important to note that there is at least one
|
|
<classname>ExecutionContext</classname> per
|
|
<classname>JobExecution</classname>, and one for every
|
|
<classname>StepExecution</classname>. For example, consider the following
|
|
code snippet:</para>
|
|
|
|
<programlisting language="java">ExecutionContext ecStep = stepExecution.getExecutionContext();
|
|
ExecutionContext ecJob = jobExecution.getExecutionContext();
|
|
//ecStep does not equal ecJob</programlisting>
|
|
|
|
<para>As noted in the comment, ecStep will not equal ecJob; they are two
|
|
different <classname>ExecutionContext</classname>s. The one scoped to the
|
|
<classname>Step</classname> will be saved at every commit point in the
|
|
<classname>Step</classname>, whereas the one scoped to the
|
|
<classname>Job</classname> will be saved in between every
|
|
<classname>Step</classname> execution.</para>
|
|
</section>
|
|
|
|
<section id="domainJobRepository">
|
|
<title>JobRepository</title>
|
|
|
|
<para><classname>JobRepository</classname> is the persistence mechanism
for all of the stereotypes mentioned above. It provides CRUD operations
for <classname>JobLauncher</classname>, <classname>Job</classname>, and
<classname>Step</classname> implementations. When a
<classname>Job</classname> is first launched, a
<classname>JobExecution</classname> is obtained from the repository, and
during the course of execution <classname>StepExecution</classname> and
<classname>JobExecution</classname> implementations are persisted by
passing them to the repository. With the batch namespace, a repository can
be declared as follows:</para>
|
|
|
|
<programlisting language="xml">&lt;job-repository id="jobRepository"/&gt;</programlisting>
|
|
</section>
|
|
|
|
<section id="domainJobLauncher">
|
|
<title>JobLauncher</title>
|
|
|
|
<para><classname>JobLauncher</classname> represents a simple interface for
|
|
launching a <classname>Job</classname> with a given set of
|
|
<classname>JobParameters</classname>:</para>
|
|
|
|
<programlisting language="java">public interface JobLauncher {

    public JobExecution run(Job job, JobParameters jobParameters)
                throws JobExecutionAlreadyRunningException, JobRestartException;
}</programlisting>
|
|
|
|
<para>It is expected that implementations will obtain a valid
|
|
<classname>JobExecution</classname> from the
|
|
<classname>JobRepository</classname> and execute the
|
|
<classname>Job</classname>.</para>
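
<para>The expected flow (obtain an execution from the repository, then run
the job with it) might be sketched as follows. Every type here
(<classname>SketchExecution</classname>, <classname>SketchJob</classname>,
<classname>SketchRepository</classname>,
<classname>SketchLauncher</classname>) is a simplified stand-in invented for
illustration, with job parameters reduced to a single string. None of it is
the framework's own API or implementation.</para>

<programlisting language="java"><![CDATA[
```java
import java.util.ArrayList;
import java.util.List;

// All types in this block are hypothetical stand-ins, not Spring Batch classes.
class SketchExecution {
    final String jobName;
    final String parameters;
    String status = "STARTING";

    SketchExecution(String jobName, String parameters) {
        this.jobName = jobName;
        this.parameters = parameters;
    }
}

interface SketchJob {
    String getName();

    void execute(SketchExecution execution);
}

// In-memory stand-in for a repository: it creates and remembers executions.
class SketchRepository {
    final List<SketchExecution> saved = new ArrayList<>();

    SketchExecution createExecution(String jobName, String parameters) {
        SketchExecution execution = new SketchExecution(jobName, parameters);
        saved.add(execution);
        return execution;
    }
}

class SketchLauncher {
    private final SketchRepository repository;

    SketchLauncher(SketchRepository repository) {
        this.repository = repository;
    }

    // Obtains an execution from the repository, then hands it to the job.
    SketchExecution run(SketchJob job, String parameters) {
        SketchExecution execution = repository.createExecution(job.getName(), parameters);
        job.execute(execution);
        return execution;
    }
}

class LauncherSketch {
    public static void main(String[] args) {
        SketchRepository repository = new SketchRepository();
        SketchJob job = new SketchJob() {
            public String getName() {
                return "endOfDayJob";
            }

            public void execute(SketchExecution execution) {
                execution.status = "COMPLETED";
            }
        };
        SketchExecution execution = new SketchLauncher(repository).run(job, "schedule.date=01-01");
        System.out.println(execution.jobName + " " + execution.status);
    }
}
```
]]></programlisting>

<para>The launcher never constructs an execution itself. It always asks the
repository for one, which mirrors the division of responsibilities described
above.</para>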
|
|
</section>
|
|
|
|
<section id="domainItemReader">
|
|
<title id="s.5.1.1">Item Reader</title>
|
|
|
|
<para><classname>ItemReader</classname> is an abstraction that represents
|
|
the retrieval of input for a <classname>Step</classname>, one item at a
|
|
time. When the <classname>ItemReader</classname> has exhausted the items
|
|
it can provide, it will indicate this by returning null. More details
|
|
about the <classname>ItemReader</classname> interface and its various
|
|
implementations can be found in <xref
|
|
linkend="readersAndWriters" />.</para>
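
<para>The "return null when exhausted" contract can be illustrated with a
small list-backed reader. This is a sketch only: the
<classname>SketchItemReader</classname> interface is a simplified stand-in
(string items, no checked exceptions) rather than the framework's
<classname>ItemReader</classname>, and
<classname>ListBackedReader</classname> and
<classname>ReaderLoopSketch</classname> are hypothetical names.</para>

<programlisting language="java"><![CDATA[
```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Simplified stand-in for the reader contract; not the Spring Batch interface.
interface SketchItemReader {
    // Returns the next item, or null once the input is exhausted.
    String read();
}

class ListBackedReader implements SketchItemReader {
    private final Iterator<String> iterator;

    ListBackedReader(List<String> items) {
        this.iterator = items.iterator();
    }

    public String read() {
        return iterator.hasNext() ? iterator.next() : null;
    }
}

class ReaderLoopSketch {
    // Reads until the reader signals exhaustion and returns the number of items seen.
    static int drain(SketchItemReader reader) {
        int count = 0;
        while (reader.read() != null) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        SketchItemReader reader = new ListBackedReader(Arrays.asList("a", "b", "c"));
        System.out.println(drain(reader));
    }
}
```
]]></programlisting>

<para>Because null is the end-of-input signal, a reader following this
contract cannot also use null to represent a legitimate item.</para>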
|
|
</section>
|
|
|
|
<section id="domainItemWriter">
|
|
<title id="s.5.1.2">Item Writer</title>
|
|
|
|
<para><classname>ItemWriter</classname> is an abstraction that
|
|
represents the output of a <classname>Step</classname>, one batch
|
|
or chunk of items at a time. Generally, an
<classname>ItemWriter</classname> has no knowledge of the input it will
receive next, only the items that were passed in its current invocation.
More details about the
|
|
<classname>ItemWriter</classname> interface and its various
|
|
implementations can be found in <xref linkend="readersAndWriters"
|
|
/>.</para>
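
<para>To make the chunk-at-a-time contract concrete, the sketch below hands a
list of items to a writer in groups of at most
<literal>commitInterval</literal> items. The
<classname>SketchItemWriter</classname> interface is a simplified stand-in
for the framework's <classname>ItemWriter</classname>, and
<classname>RecordingWriter</classname> and
<classname>ChunkingSketch</classname> are hypothetical names. The sketch
assumes <literal>commitInterval</literal> is greater than zero and ignores
transactions entirely.</para>

<programlisting language="java"><![CDATA[
```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Simplified stand-in for the writer contract; not the Spring Batch interface.
interface SketchItemWriter {
    // Receives one chunk per call and knows nothing about later chunks.
    void write(List<String> items);
}

// Hypothetical writer that simply records each chunk it was given.
class RecordingWriter implements SketchItemWriter {
    final List<List<String>> chunks = new ArrayList<>();

    public void write(List<String> items) {
        chunks.add(new ArrayList<>(items));
    }
}

class ChunkingSketch {
    // Splits the input into chunks of at most commitInterval items (assumed > 0)
    // and passes each chunk to the writer in order.
    static void writeInChunks(List<String> input, int commitInterval, SketchItemWriter writer) {
        for (int start = 0; start < input.size(); start += commitInterval) {
            int end = Math.min(start + commitInterval, input.size());
            writer.write(input.subList(start, end));
        }
    }

    public static void main(String[] args) {
        RecordingWriter writer = new RecordingWriter();
        writeInChunks(Arrays.asList("a", "b", "c", "d", "e"), 2, writer);
        System.out.println(writer.chunks);
    }
}
```
]]></programlisting>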
|
|
</section>
|
|
|
|
<section id="domainItemProcessor">
|
|
<title>Item Processor</title>
|
|
|
|
<para><classname>ItemProcessor</classname> is an abstraction that
|
|
represents the business processing of an item. While the
<classname>ItemReader</classname> reads one item at a time and the
<classname>ItemWriter</classname> writes items out a chunk at a time, the
<classname>ItemProcessor</classname> provides a point at which to transform
an item or apply other business processing to it. If, while processing the
item, it is determined that the item is not valid, returning null indicates
that the item should not be written out. More details about the
<classname>ItemProcessor</classname> interface can be
found in <xref linkend="readersAndWriters" />.</para>
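
<para>The filtering behaviour can be sketched as follows: a processor
transforms valid items and returns null for invalid ones, and the calling
loop drops the nulls before anything is written. The
<classname>SketchItemProcessor</classname> interface is a simplified stand-in
for the framework's <classname>ItemProcessor</classname>. The
<classname>UppercaseNonBlankProcessor</classname> and
<classname>FilteringSketch</classname> classes, along with the "blank means
invalid" rule, are assumptions made purely for illustration.</para>

<programlisting language="java"><![CDATA[
```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Simplified stand-in for the processor contract; not the Spring Batch interface.
interface SketchItemProcessor {
    // Returns the transformed item, or null if the item should not be written.
    String process(String item);
}

// Hypothetical rule: blank items are invalid, everything else is upper-cased.
class UppercaseNonBlankProcessor implements SketchItemProcessor {
    public String process(String item) {
        String trimmed = item.trim();
        return trimmed.isEmpty() ? null : trimmed.toUpperCase(Locale.ROOT);
    }
}

class FilteringSketch {
    // Applies the processor to each item and keeps only the non-null results.
    static List<String> processAll(List<String> items, SketchItemProcessor processor) {
        List<String> output = new ArrayList<>();
        for (String item : items) {
            String result = processor.process(item);
            if (result != null) {
                output.add(result);
            }
        }
        return output;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("a", "   ", "b");
        System.out.println(processAll(input, new UppercaseNonBlankProcessor()));
    }
}
```
]]></programlisting>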
|
|
</section>
|
|
|
|
<section id="domainBatchNamespace">
|
|
<title>Batch Namespace</title>
|
|
|
|
<para>Many of the domain concepts listed above need to be configured in a
|
|
Spring <classname>ApplicationContext</classname>. While there are
|
|
implementations of the interfaces above that can be used in a standard
|
|
bean definition, a namespace has been provided for ease of
|
|
configuration:</para>
|
|
|
|
<programlisting language="xml">&lt;beans:beans xmlns="<emphasis role="bold">http://www.springframework.org/schema/batch</emphasis>"
     xmlns:beans="http://www.springframework.org/schema/beans"
     xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
     xsi:schemaLocation="
        http://www.springframework.org/schema/beans
        http://www.springframework.org/schema/beans/spring-beans.xsd
        <emphasis role="bold">http://www.springframework.org/schema/batch
        http://www.springframework.org/schema/batch/spring-batch-2.2.xsd</emphasis>"&gt;

    &lt;job id="ioSampleJob"&gt;
        &lt;step id="step1"&gt;
            &lt;tasklet&gt;
                &lt;chunk reader="itemReader" writer="itemWriter" commit-interval="2"/&gt;
            &lt;/tasklet&gt;
        &lt;/step&gt;
    &lt;/job&gt;

&lt;/beans:beans&gt;</programlisting>
|
|
|
|
<para>As long as the batch namespace has been declared, any of its
|
|
elements can be used. More information on configuring a
|
|
<classname>Job</classname> can be found in <xref
|
|
linkend="configureJob" />. More information on configuring a <classname>Step</classname> can be
|
|
found in <xref linkend="configureStep" />.</para>
|
|
</section>
|
|
</chapter>
|