BATCH-598:Tidied up the reference docs.

This commit is contained in:
lucasward
2008-04-22 21:55:22 +00:00
parent 87e004bd30
commit c973e8215a
4 changed files with 450 additions and 451 deletions

@@ -12,24 +12,24 @@
are “Jobs” and “Steps” and developer supplied processing units called
ItemReaders and ItemWriters. However, because of the Spring patterns,
operations, templates, callbacks, and idioms, there are opportunities for
<itemizedlist>
the following:<itemizedlist>
<listitem>
<para>significant improvement in adherence to a clear separation of
concerns,</para>
concerns</para>
</listitem>
<listitem>
<para>clearly delineated architectural layers and services provided
as interfaces,</para>
as interfaces</para>
</listitem>
<listitem>
<para>simple and default implementations that allowed for quick
adoption and ease of use out-of-the-box, and</para>
adoption and ease of use out-of-the-box</para>
</listitem>
<listitem>
<para>significantly enhanced extensibility.</para>
<para>significantly enhanced extensibility</para>
</listitem>
</itemizedlist></para>
@@ -44,8 +44,7 @@
physical implementation of the layers, components and technical services
commonly found in robust, maintainable systems used to address the
creation of simple to complex batch applications, with the infrastructure
and extensions to address very complex processing needs. The materials
below will walk through the details of the diagram.</para>
and extensions to address very complex processing needs.</para>
</section>
<section>
@@ -67,12 +66,14 @@
<caption><para>Figure 2.1: Batch Stereotypes</para></caption>
</mediaobject>
<para>The colors used on the above diagram are extremely important. Grey
<para>The above diagram highlights the interactions and key services
provided by the Spring Batch framework. The colors used are important to
understanding the responsibilities of a developer in Spring Batch. Grey
represents an external application such as an enterprise scheduler or a
database. It's important to note that scheduling is grey, and should thus
be considered separate from Spring Batch. Blue represents application
architecture services. In most cases these are provided by Spring Batch
with out of the box implementations, but an architecture time may make
with out of the box implementations, but an architecture team may make
specific implementations that better address their specific needs. Yellow
represents the pieces that must be configured by a developer. For example,
they need to configure their job schedule so that the job is kicked off at
@@ -87,7 +88,7 @@
which include Run, Job, Application, and Data. The primary goal for
organizing an application according to the tiers is to embed what is known
as "separation of concerns" within the system. These tiers can be
conceptual but may they prove effective in mapping the deployment of the
conceptual but may prove effective in mapping the deployment of the
artifacts onto physical components like Java runtimes and integration with
data sources and targets. Effective separation of concerns results in
reducing the impact of change to the system. The four conceptual tiers
@@ -130,9 +131,10 @@
<para>This section describes stereotypes relating to the concept of a
batch job. A job is an entity that encapsulates an entire batch process.
The file containing the job may sometimes be referred to as the "job
configuration". However, <classname>Job</classname> is just the top of an
overall hierarchy:</para>
As is common with other Spring projects, a <classname>Job</classname> will
be wired together via an XML configuration file. This file may be referred
to as the "job configuration". However, <classname>Job</classname> is just
the top of an overall hierarchy:</para>
<mediaobject>
<imageobject role="html">
@@ -148,8 +150,7 @@
<section>
<title id="s.2.1.1">Job</title>
<para>The job could be described as the heart of the Spring Batch
framework. It is represented by a Spring bean that implements the
<para>A job is represented by a Spring bean that implements the
<classname>Job</classname> interface and contains all of the information
necessary to define the operations performed by a job. A job
configuration is typically contained within a Spring XML configuration
@@ -198,9 +199,9 @@
<para>A <classname>JobInstance</classname> refers to the concept of a
logical job run. Let's consider a batch job that should be run once at
the end of the day, such as the 'EndOfDay' job from the diagram above.
There is a one 'EndOfDay' <classname>Job</classname>, but each
individual run of the <classname>Job</classname> must be tracked
separately. In the case of this job, there will be one logical
There is one 'EndOfDay' <classname>Job</classname>, but each individual
run of the <classname>Job</classname> must be tracked separately. In the
case of this job, there will be one logical
<classname>JobInstance</classname> per day. For example, there will be a
January 1st run, and a January 2nd run. If the January 1st run fails the
first time and is run again the next day, it's still the January 1st
@@ -208,30 +209,33 @@
meaning the January 1st run processes data for January 1st, etc.). That is
to say, each <classname>JobInstance</classname> can have multiple
executions. (<classname>JobExecution</classname> is discussed in more
detail below) and only one instance can be running at a given time. The
definition of a <classname>JobInstance</classname> has absolutely no
bearing on the data the will be loaded. It is entirely up to the
<classname>ItemReader</classname> implementation used to determine how
data will be loaded. For example, in the EndOfDay scenario, there may be
a column on the data that indicates the 'effective date' or 'schedule
date' to which the data belongs. So, the January 1st run would only load
data from the 1st, and the January 2nd run would only use data from the
2nd. Because this determination will likely be a business decision, it
is left up to the <classname>ItemReader</classname> to decide. What
using the same JobInstance will decide, however, is whether or not the
'state' (i.e. the ExecutionContext, which is discussed below) will be
used. Using a new instace will mean 'start from the beginning' and using
an existing instance will generally mean 'start from where you left
off'.</para>
detail below.) Only one <classname>JobExecution</classname> of a given
<classname>JobInstance</classname> can be running at any one time. The
definition of a <classname>JobInstance</classname>
has absolutely no bearing on the data that will be loaded. It is entirely
up to the <classname>ItemReader</classname> implementation used to
determine how data will be loaded. For example, in the EndOfDay
scenario, there may be a column on the data that indicates the
'effective date' or 'schedule date' to which the data belongs. So, the
January 1st run would only load data from the 1st, and the January 2nd
run would only use data from the 2nd. Because this determination will
likely be a business decision, it is left up to the
<classname>ItemReader</classname> to decide. Whether the same
<classname>JobInstance</classname> is reused does determine, however,
whether the 'state' (i.e. the ExecutionContext, which is discussed below)
from previous executions will be used. Using a new
<classname>JobInstance</classname> will mean 'start from the beginning'
and using an existing instance will generally mean 'start from where you
left off'.</para>
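<para>The link between <classname>JobInstance</classname> identity and restart state can be pictured with a small, self-contained model. This is a hypothetical sketch in plain Java, not the Spring Batch API. The class <code>JobInstanceSketch</code>, the <code>InstanceKey</code> record and the <code>schedule.date</code> parameter name are illustrative assumptions. An in-memory map stands in for the persisted <classname>ExecutionContext</classname>.</para>
<programlisting><![CDATA[import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical model, not the Spring Batch API: a logical run (JobInstance) is
 * identified by the job name plus its parameters, and saved restart state is
 * only reused when the same instance is launched again.
 */
public class JobInstanceSketch {

    // Identity of a logical run: the job name plus its identifying parameters.
    private record InstanceKey(String jobName, Map<String, String> parameters) {}

    // In-memory stand-in for persisted ExecutionContext data:
    // the last saved position for each instance.
    private final Map<InstanceKey, Integer> savedPositions = new HashMap<>();

    /** Returns the position an execution of this instance should start from. */
    public int startPosition(String jobName, Map<String, String> parameters) {
        InstanceKey key = new InstanceKey(jobName, Map.copyOf(parameters));
        // A new instance has no saved state, so it starts from the beginning.
        return savedPositions.getOrDefault(key, 0);
    }

    /** Records how far an execution of this instance got before it stopped. */
    public void savePosition(String jobName, Map<String, String> parameters, int position) {
        savedPositions.put(new InstanceKey(jobName, Map.copyOf(parameters)), position);
    }

    public static void main(String[] args) {
        JobInstanceSketch repository = new JobInstanceSketch();
        Map<String, String> jan1 = Map.of("schedule.date", "2008-01-01");
        Map<String, String> jan2 = Map.of("schedule.date", "2008-01-02");

        // The first execution of the January 1st instance fails after 500 items.
        repository.savePosition("EndOfDay", jan1, 500);

        // Running the same instance again resumes from the saved position...
        System.out.println(repository.startPosition("EndOfDay", jan1));
        // ...while the January 2nd instance starts from the beginning.
        System.out.println(repository.startPosition("EndOfDay", jan2));
    }
}]]></programlisting>
<para>Running <code>main</code> prints 500 and then 0. The January 1st instance resumes from its saved position. The January 2nd instance has different parameters, so it has no saved state and starts from the beginning.</para>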
</section>
<section>
<title id="s.2.1.3">JobParameters</title>
<para>Having discussed <classname>JobInstance</classname> and how it
differs from <classname>Job</classname>, the natural question to ask is,
"how is one JobInstance distinguished from another?" The answer is:
<classname>JobParameters</classname>.
differs from <classname>Job</classname>, the natural question to ask is:
"how is one <classname>JobInstance</classname> distinguished from
another?" The answer is: <classname>JobParameters</classname>.
<classname>JobParameters</classname> are any set of parameters used to
start a batch job, which can be used for identification or even as
reference data during the run. In the example above, where there are two
@@ -310,7 +314,7 @@
<para>These properties are important because they will be persisted and
can be used to completely determine the status of an execution. For
example, if the EndOfDay job for 01-01 is executed at 9:00 PM, and fails
at 9:30, the following entries will be in the batch meta data
at 9:30, the following entries will be made in the batch meta data
tables:</para>
<table>
@@ -405,14 +409,18 @@
completing successfully at 9:30. Because it's now the next day, the
01-02 job must be run as well, which is kicked off just afterwards at
9:31, and completes in its normal one-hour time at 10:30. There is no
requirement that one be kicked off after another, unless there is
potential for the two jobs to attempt to access the same data, causing
issues with locking at the database level. It is entirely up to the
scheduler to determine when to run. Since they're separate JobInstances,
Spring Batch will make no attempt to stop them from being run
concurrently. There should now be an extra entry in both the job
instance and job parameters table, and two extra entries in the job
execution table:</para>
requirement that one <classname>JobInstance</classname> be kicked off
after another, unless there is potential for the two jobs to attempt to
access the same data, causing issues with locking at the database level.
It is entirely up to the scheduler to determine when to run. Since
they're separate JobInstances, Spring Batch will make no attempt to stop
them from being run concurrently. (Attempting to run the same
<classname>JobInstance</classname> while it is already running will
result in a <classname>JobExecutionAlreadyRunningException</classname>
being thrown.) There should now be an extra entry in both the
<classname>JobInstance</classname> and
<classname>JobParameters</classname> tables, and two extra entries in
the <classname>JobExecution</classname> table:</para>
<table>
<title>BATCH_JOB_INSTANCE</title>
@@ -539,17 +547,19 @@
<section>
<title id="s.2.1">Step Stereotypes</title>
<para>A <classname>Step</classname> is an entity that encapsulates a
single, independent phase of a batch job. Therefore, every batch job is
composed entirely of one or more batch steps. Steps should be thought of
as unique processing streams that will be executed in sequence. For
example, if you have one step that loads a file into a database, another
that reads from the database, validates the data, preforms processing, and
then writes to another table, and another that reads from that table and
writes out to a file. Each of these steps will be performed completely
before moving on to the next step. The file will be completely read into
the database before step 2 can begin. As with Job, a Step has individual
executions, that correspond with unique JobExecutions:</para>
<para>A <classname>Step</classname> is a domain object that encapsulates
an independent, sequential phase of a batch job. Therefore, every Job is
composed entirely of one or more steps. A <classname>Step</classname>
should be thought of as a unique processing stream that will be executed
in sequence. For example, a job might have one step that loads a file
into a database, another that reads from the database, validates the
data, performs processing, and then writes to another table, and a third
that reads from that table and writes out to a file. Each of these steps
will be performed completely before moving on to the next step. The file
will be completely read into the database before step 2 can begin. As with
<classname>Job</classname>, a <classname>Step</classname> has an
individual <classname>StepExecution</classname> that corresponds with a
unique <classname>JobExecution</classname>:</para>
<mediaobject>
<imageobject role="html">
@@ -566,65 +576,52 @@
<title id="step">Step</title>
<para>A <classname>Step</classname> contains all of the information
necessary to define a discrete set of business logic within a job. This
is a necessarily vague description because the contents of any given
step are at the discretion of the developer writing a job. A step can be
as narrowly defined as a single line of code or as broadly defined as
necessary to complete the entire work of a job. There are several
factors that will affect the breadth of step configurations.</para>
<itemizedlist>
<listitem>
<para>Re-usability - step definitions can be shared between
jobs</para>
</listitem>
<listitem>
<para>Transaction Management - depending on your desired transaction
strategy, you may divide the work of your job differently between
steps</para>
</listitem>
<listitem>
<para>Extensibility - adequately granular definition of steps allows
the addition or subtraction of steps at a later time in the
appropriate position within your job configuration</para>
</listitem>
</itemizedlist>
necessary to define and control the actual batch processing. This is a
necessarily vague description because the contents of any given
<classname>Step</classname> are at the discretion of the developer
writing a <classname>Job</classname>. A Step can be as simple or complex
as the developer desires. A simple <classname>Step</classname> might
load data from a file into the database, requiring little or no code
(depending upon the implementations used). A more complex
<classname>Step</classname> may have complicated business rules that are
applied as part of the processing.</para>
<para>Steps are defined by instantiating implementations of the
<classname>Step</classname> interface. Two step implementation classes
are available in the Spring Batch framework, and they are each discussed
in detail in other sections of this guide. For most situations, the
in detail in Chapter 4 of this guide. For most situations, the
<classname>ItemOrientedStep</classname> implementation is sufficient,
but for situations where only one call is needed, such as a stored
procedure call or a wrapper around existing script, a
<classname>TaskletStep</classname> may be the better option.</para>
<classname>TaskletStep</classname> may be a better option.</para>
</section>
<section>
<title id="stepExecution">StepExecution</title>
<para>A <classname>StepExecution</classname> represents the technical
concept of a single attempt to execute a <classname>Step</classname>.
For instance, using the example from
<classname>JobExecution</classname>, if we have a job instance
"EndOfJob-01-01-2008" that fails to successfully complete its work the
first time it is run, when we attempt to run it again, a new
<classname>StepExecution</classname> will be created. Each of these step
executions may represent a different invocation of the batch framework,
but they will all correspond to the same
<para>A <classname>StepExecution</classname> represents a single attempt
to execute a <classname>Step</classname>. Using the example from
<classname>JobExecution</classname>, if there is a
<classname>JobInstance</classname> for the "EndOfDayJob", with
<classname>JobParameters</classname> of "01-01-2008" that fails to
successfully complete its work the first time it is run, when it is
executed again, a new <classname>StepExecution</classname> will be
created. Each of these step executions may represent a different
invocation of the batch framework, but they will all correspond to the
same <classname>JobInstance</classname>, just as multiple
<classname>JobExecutions</classname> belong to the same
<classname>JobInstance</classname>.</para>
<para>Step executions are represented by objects of the
<classname>StepExecution</classname> class. Each execution contains a
reference to its corresponding step and job execution, and transaction
related data such as commit and rollback count and start and end times.
Additionally, each step execution will contain an
<classname>ExecutionContext</classname>, which contains any data a
developer needs persisted across batch runs, such as statistics or state
information needed to restart. The following is a listing of the
properties for <classname>StepExecution</classname>:</para>
reference to its corresponding step and
<classname>JobExecution</classname>, and transaction related data such
as commit and rollback count and start and end times. Additionally, each
step execution will contain an <classname>ExecutionContext</classname>,
which contains any data a developer needs persisted across batch runs,
such as statistics or state information needed to restart. The following
is a listing of the properties for
<classname>StepExecution</classname>:</para>
<table>
<title>StepExecution properties</title>
@@ -635,9 +632,10 @@
<entry>status</entry>
<entry>A <classname>BatchStatus</classname> object that
indicates the status of the execution. While it's running, it's
BatchStatus.STARTED, if it fails it's BatchStatus.FAILED, and if
it finishes successfully it's BatchStatus.COMPLETED</entry>
indicates the status of the execution. While it's running, the
status is BatchStatus.STARTED, if it fails the status is
BatchStatus.FAILED, and if it finishes successfully the status
is BatchStatus.COMPLETED</entry>
</row>
<row>
@@ -659,29 +657,29 @@
<entry>exitStatus</entry>
<entry>The <classname>ExitStatus</classname> indicating the
result of the run. It is most important because it contains an
exit code that will be returned to the caller. See chapter 5 for
more details.</entry>
result of the execution. It is the most important property because
it contains an exit code that will be returned to the caller. See
chapter 5 for more details.</entry>
</row>
<row>
<entry>executionContext</entry>
<entry>The 'property bag' containing any user data that needs to
be persisted between batch runs.</entry>
be persisted between executions.</entry>
</row>
<row>
<entry>commitCount</entry>
<entry>The number of times the transaction has been committed
for this execution</entry>
<entry>The number of transactions that have been committed for this
execution</entry>
</row>
<row>
<entry>itemCount</entry>
<entry>The number of items that have been process for this
<entry>The number of items that have been processed for this
execution.</entry>
</row>
</tbody>
@@ -707,9 +705,13 @@
<programlisting>executionContext.putLong(getKey(LINES_READ_COUNT), reader.getPosition());</programlisting>
<para>When the <classname>ItemReader</classname> is opened, it can check
to see if it has any stored state in the context, and initialize itself
from there:</para>
<para>The call above will store the current number of lines read into
the ExecutionContext. It should be made just before the framework
commits. Being notified before a commit requires one of the various
StepListeners, or an ItemStream, which are discussed in more detail
later in this guide. When the <classname>ItemReader</classname> is
opened, it can check to see if it has any stored state in the context,
and initialize itself from there:</para>
<programlisting> if (executionContext.containsKey(getKey(LINES_READ_COUNT))) {
log.debug("Initializing for restart. Restart data is: " + executionContext);
@@ -759,9 +761,9 @@
<section>
<title>JobRepository</title>
<para>The <classname>JobRepository</classname> is the persistence
mechanism for all of the Stereotypes mentioned above. When a job is first
launched, a <classname>JobExecution</classname> is obtained by calling the
<para><classname>JobRepository</classname> is the persistence mechanism
for all of the Stereotypes mentioned above. When a job is first launched,
a <classname>JobExecution</classname> is obtained by calling the
repository's <methodname>createJobExecution</methodname> method, and
during the course of execution, <classname>StepExecution</classname> and
<classname>JobExecution</classname> are persisted by passing them to the

@@ -7,8 +7,8 @@
<section>
<title>Introduction</title>
<para>In Chapter 2, the overall description of the architecture was
discussed, using the following diagram as a guide:</para>
<para>In Chapter 2, the overall architecture design was discussed, using
the following diagram as a guide:</para>
<mediaobject>
<imageobject role="html">
@@ -80,7 +80,7 @@
<section>
<title>Run Tier</title>
<para>As it's name suggests, this tier is entirely concerned with actually
<para>As its name suggests, this tier is entirely concerned with actually
running the job. Regardless of whether the originator is a Scheduler or an
HTTP request, a Job must be obtained, parameters must be parsed, and
eventually a <classname>JobLauncher</classname> called:</para>
@@ -102,11 +102,11 @@
<para>For users that want to run their jobs from an enterprise
scheduler, the command line is the primary interface. This is because
most schedulers (with the exception of Quartz unless using the
NativeJob) work directly with operating system processes, primarily
kicked off with shell scripts. There are many ways to launch a Java
process besides a shell script, such as Perl, Ruby, or even 'build
tools' such as ant or maven. However, because most people are familiar
with shell scripts, this example will focus on them.</para>
<classname>NativeJob</classname>) work directly with operating system
processes, primarily kicked off with shell scripts. There are many ways
to launch a Java process besides a shell script, such as Perl, Ruby, or
even 'build tools' such as ant or maven. However, because most people
are familiar with shell scripts, this example will focus on them.</para>
<section>
<title>The CommandLineJobRunner</title>
@@ -295,7 +295,7 @@
in order to obtain an execution:</para>
<programlisting> &lt;bean id="jobLauncher"
class="org.springframework.batch.core.launch.support.SimpleJobLauncher"&gt;
class="org.springframework.batch.execution.launch.SimpleJobLauncher"&gt;
&lt;property name="jobRepository" ref="jobRepository" /&gt;
&lt;/bean&gt;</programlisting>
@@ -341,7 +341,7 @@
<classname>TaskExecutor</classname>:</para>
<programlisting> &lt;bean id="jobLauncher"
class="org.springframework.batch.core.launch.support.SimpleJobLauncher"&gt;
class="org.springframework.batch.execution.launch.SimpleJobLauncher"&gt;
&lt;property name="jobRepository" ref="jobRepository" /&gt;
&lt;property name="taskExecutor"&gt;
&lt;bean class="org.springframework.core.task.SimpleAsyncTaskExecutor" /&gt;
@@ -444,7 +444,7 @@
convenience: <classname>JobRepositoryFactoryBean</classname>.</para>
<programlisting> &lt;bean id="jobRepository"
class="org.springframework.batch.core.repository.support.JobRepositoryFactoryBean"
class="org.springframework.batch.execution.repository.JobRepositoryFactoryBean"&gt;
&lt;property name="databaseType" value="hsql" /&gt;
&lt;property name="dataSource" ref="dataSource" /&gt;
&lt;/bean&gt;</programlisting>
@@ -470,13 +470,13 @@
&lt;/bean&gt;
&lt;bean id="mapJobInstanceDao"
class="org.springframework.batch.core.repository.dao.MapJobInstanceDao" /&gt;
class="org.springframework.batch.execution.repository.dao.MapJobInstanceDao" /&gt;
&lt;bean id="mapJobExecutionDao"
class="org.springframework.batch.core.repository.dao.MapJobExecutionDao" /&gt;
class="org.springframework.batch.execution.repository.dao.MapJobExecutionDao" /&gt;
&lt;bean id="mapStepExecutionDao"
class="org.springframework.batch.core.repository.dao.MapStepExecutionDao" /&gt;</programlisting>
class="org.springframework.batch.execution.repository.dao.MapStepExecutionDao" /&gt;</programlisting>
<para>The Map* DAO implementations store the batch artifacts in a
transactional map. So, the repository and DAOs may still be used
@@ -510,9 +510,9 @@
&lt;/tx:attributes&gt;
&lt;/tx:advice&gt;</programlisting></para>
<para>This fragment can be used as is, or with almost no changes.
The isolation level in the <code>create*</code> method attiributes
is specified to ensure that when jobs are launched there if two
<para>This fragment can be used as is, with almost no changes. The
isolation level in the <code>create*</code> method attributes is
specified to ensure that when jobs are launched, if two
processes are trying to launch the same job at the same time, only
one will succeed. This is quite aggressive, and READ_COMMITTED would
work just as well; READ_UNCOMMITTED would be fine if two processes
@@ -530,15 +530,15 @@
<title>Recommendations for Indexing Meta Data Tables</title>
<para>Spring Batch provides DDL samples for the meta-data tables in
the Core jar file for several common database platforms. We do not
include index declarations inthat DDL because there are too many
variations in how people want to do that dependeing on their precise
platform, local conventions and also the business requirements of
how the jobs will be operated. So here we give some indication as to
which columns are going to be used in a WHERE clause by the Dao
ipmlementations that we provide, and how frequently they might be
used, so that individual projects can make up their own minds about
indexing.</para>
the Core jar file for several common database platforms. Index
declarations are not included in that DDL because there are too many
variations in how users may want to index, depending on their
precise platform, local conventions and also the business
requirements of how the jobs will be operated. The table below
provides some indication as to which columns are going to be used in
a WHERE clause by the DAO implementations provided by Spring Batch,
and how frequently they might be used, so that individual projects
can make up their own minds about indexing.</para>
<table>
<title>Where clauses in SQL statements (excluding primary keys) and
@@ -722,8 +722,6 @@
&lt;bean class="org.springframework.batch.core.listener.JobListenerSupport" /&gt;
&lt;/property&gt;
&lt;/bean&gt;</programlisting>
<para></para>
</section>
</section>
@@ -731,26 +729,27 @@
<title>JobFactory and Stateful Components in Steps</title>
<para>Unlike many traditional Spring applications, many of the
components of a batch application are stateful - the file readers and
writers are the obvious examples. The recommended way to deal with this
is to create a fresh <classname>ApplicationContext</classname> for each
job execution. If the job is launched from the command line with
<classname>CommandLineJobRunner</classname> this is trivial. For more
complex launching scenarios, where jobs are executed in parallel or
serially from the same process, some extra steps have to be taken to
ensure that the <classname>ApplicationContext</classname> is refreshed.
This is preferable to using prototype scope for the stateful beans
because then they would not receive lifecycle callbacks from the
container at the end of use (e.g. through destroy-method in XML).</para>
components of a batch application are stateful; the file readers and
writers are obvious examples. The recommended way to deal with this is
to create a fresh <classname>ApplicationContext</classname> for each job
execution. If the <classname>Job</classname> is launched from the
command line with <classname>CommandLineJobRunner</classname>, this is
trivial. For more complex launching scenarios, where jobs are executed
in parallel or serially from the same process, some extra steps have to
be taken to ensure that the <classname>ApplicationContext</classname> is
refreshed. This is preferable to using prototype scope for the stateful
beans because then they would not receive lifecycle callbacks from the
container at the end of use (e.g. through destroy-method in XML).</para>
<para>The strategy provided by Spring Batch to deal with this scenario
is the <classname>JobFactory</classname>, and the samples provide an
example of a specialised implementation that can load an
example of a specialized implementation that can load an
<classname>ApplicationContext</classname> and close it properly when the
job is finished. Look at the
job is finished. A relevant example is
<classname>ClassPathXmlApplicationContextJobFactory</classname> and its
use in the <code>adhoc-job-launcher-context.xml</code> and the
<code>quartz-job-launcher-context.xml</code>.</para>
<code>quartz-job-launcher-context.xml</code>, which can be found in the
Samples project.</para>
</section>
</section>
@@ -862,10 +861,7 @@
transaction. At the beginning of processing, a transaction is begun,
and each time <markup>read</markup> is called on the
<classname>ItemReader</classname>, a counter is incremented. When it
reaches 10, the transaction will be committed. This also means that if
an item is skipped it will still count as an item against the commit
interval even though it hasn't been written out. (Skipping items will
be covered in more detail later in this chapter)</para>
reaches 10, the transaction will be committed.</para>
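<para>The counting described above can be modelled in plain Java. This is a hypothetical sketch, not Spring Batch's own step implementation. <code>CommitIntervalSketch</code> and its <code>chunkSizes</code> method are illustrative names, and each loop iteration stands in for one <markup>read</markup> call. The sketch also assumes that any partial chunk left over when the input is exhausted is committed at the end.</para>
<programlisting><![CDATA[import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical model of a commit interval, not the Spring Batch implementation:
 * a counter is incremented for each item read, and the "transaction" is
 * committed whenever the counter reaches the commit interval.
 */
public class CommitIntervalSketch {

    /** Returns the number of items in each committed transaction. */
    public static List<Integer> chunkSizes(int itemCount, int commitInterval) {
        List<Integer> commits = new ArrayList<>();
        int counter = 0;
        for (int i = 0; i < itemCount; i++) {
            counter++; // one read() call on the reader
            if (counter == commitInterval) {
                commits.add(counter); // commit, then begin a new transaction
                counter = 0;
            }
        }
        if (counter > 0) {
            commits.add(counter); // assumption: the final partial chunk is committed too
        }
        return commits;
    }

    public static void main(String[] args) {
        // 25 items with a commit interval of 10.
        System.out.println(chunkSizes(25, 10));
    }
}]]></programlisting>
<para>With 25 items and a commit interval of 10, <code>main</code> prints [10, 10, 5]: two full transactions and one final transaction for the remaining five items.</para>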
</section>
<section>
@@ -885,8 +881,9 @@
manually before it can be run again. This is configurable on the
step level, since different steps have different requirements. One
Step that may only be executed once can exist as part of the same
Job as Step that can be run infinitely. Below is an example start
limit configuration:</para>
<classname>Job</classname> as a <classname>Step</classname> that can
be run an unlimited number of times. Below is an example start limit
configuration:</para>
<programlisting> &lt;bean id="simpleStep"
class="org.springframework.batch.core.step.item.SimpleStepFactoryBean" &gt;
@@ -953,27 +950,28 @@
information about football games and summarizes them. It contains
three steps: playerLoad, gameLoad, and playerSummarization. The
playerLoad <classname>Step</classname> loads player information from
a flatfile, while the <classname>gameLoad</classname> Step does the
same for games. The final step, playerSummarization, then summarizes
a flat file, while the <classname>gameLoad</classname>
<classname>Step</classname> does the same for games. The final
<classname>Step</classname>, playerSummarization, then summarizes
the statistics for each player based upon the provided games. It is
assumed that the file loaded by 'playerLoad' must be loaded only
once, but that 'gameLoad' will load any games found within a
particular directory, deleting them after they have been
successfully loaded into the database. As a result, the playerLoad
<classname>Step</classname> contains no additionaly configuration.
It can be started almost limitlessly, and if complete will be
skipped. The 'gameLoad' <classname>Step</classname>, however, needs
to be run everytime, in case extra files have been dropped since it
last executed, so it has 'allowStartIfComplete' set to 'true' in
order to always be started. (It is assumed that the database tables
games are loaded into has a process indicator on it, to ensure new
games can be properly found by the summarization step) The
summarization <classname>step</classname>, which is the most
important in the <classname>Job</classname>, is configured to have a
start limit of 3. This is useful in case it continually fails, a new
exit code will be returned to the operators that control job
execution, and it won't be allowed to start again until manual
intervention has taken place.</para>
<classname>Step</classname> contains no additional configuration. It
can be started almost limitlessly, and if complete will be skipped.
The 'gameLoad' <classname>Step</classname>, however, needs to be run
every time, in case extra files have been dropped since it last
executed, so it has 'allowStartIfComplete' set to 'true' in order to
always be started. (It is assumed that the database table games are
loaded into has a process indicator on it, to ensure new games can
be properly found by the summarization step.) The summarization
<classname>Step</classname>, which is the most important in the
<classname>Job</classname>, is configured to have a start limit of
3. This is useful because, if it continually fails, a new exit code
will be returned to the operators that control job execution, and it
won't be allowed to start again until manual intervention has taken
place.</para>
<note>
<para>This job is purely for example purposes and is not the same
@@ -1076,7 +1074,10 @@
<para>In this example, a <classname>FlatFileItemReader</classname> is
used, and if at any point a FlatFileParseException is thrown, it will
be skipped and counted against the total skip limit of 10.</para>
be skipped and counted against the total skip limit of 10. It should
be noted that any failures encountered while reading will not count
against the commit interval. In other words, the commit interval is
only incremented on writes (regardless of success or failure).</para>
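<para>The interplay between the skip limit and the commit interval can be illustrated with a small simulation. In this hypothetical sketch, <classname>SkippingChunkLoop</classname> and <classname>LineParseException</classname> are illustrative names. <classname>LineParseException</classname> merely stands in for <classname>FlatFileParseException</classname>, and lines starting with "bad" are assumed to fail parsing. Write failures, which the text says also count toward the commit interval, are not modelled.</para>

```java
// Hypothetical simulation of the skip and commit-interval rules described
// above; these classes are illustrative and not part of Spring Batch.
class LineParseException extends RuntimeException {
    LineParseException(String message) {
        super(message);
    }
}

class SkippingChunkLoop {

    private final int skipLimit;
    private final int commitInterval;
    int skipCount; // read failures skipped so far
    int commits;   // commits performed so far

    SkippingChunkLoop(int skipLimit, int commitInterval) {
        this.skipLimit = skipLimit;
        this.commitInterval = commitInterval;
    }

    // Stand-in for the reader: lines starting with "bad" fail to parse.
    private String read(String line) {
        if (line.startsWith("bad")) {
            throw new LineParseException("cannot parse: " + line);
        }
        return line.toUpperCase();
    }

    void run(String[] lines) {
        int writesInChunk = 0;
        for (String line : lines) {
            String item;
            try {
                item = read(line);
            } catch (LineParseException e) {
                skipCount++;
                if (skipCount > skipLimit) {
                    throw new IllegalStateException("skip limit exceeded", e);
                }
                continue; // a skipped read does not advance the commit interval
            }
            write(item);
            writesInChunk++; // only writes count toward the commit interval
            if (writesInChunk == commitInterval) {
                commits++;
                writesInChunk = 0;
            }
        }
        if (writesInChunk > 0) {
            commits++; // commit the final, partially filled chunk
        }
    }

    private void write(String item) {
        // A real ItemWriter would buffer or output the item here.
    }
}

class SkippingChunkLoopDemo {
    public static void main(String[] args) {
        SkippingChunkLoop loop = new SkippingChunkLoop(10, 2);
        loop.run(new String[] {"a", "bad1", "b", "c", "bad2", "d"});
        System.out.println(loop.skipCount + " skipped, " + loop.commits + " commits"); // 2 skipped, 2 commits
    }
}
```

<para>Although six lines are read, only the four successful writes advance the commit interval, so with an interval of 2 there are exactly two commits.</para>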
</section>
<section>
@@ -1111,22 +1112,23 @@
<section>
<title>Registering ItemStreams with the Step</title>
<para>The step has to take care of the
<classname>ItemStream</classname> callbacks at the necessary points in
the flow. This is vital if a step is going to be fail, and might need
to be restarted, because the <classname>ItemStream</classname>
interface is where the step gets the information it needs about
persistent state between executions. The factory beans that Spring
Batch provides for convenient configuration of
<classname>Step</classname> instances have features that allow streams
to be registered with the step when it is configured.</para>
<para>The step has to take care of <classname>ItemStream</classname>
callbacks at the necessary points in its lifecycle. This is vital if a
step fails and might need to be restarted, because the
<classname>ItemStream</classname> interface is where the step gets the
information it needs about persistent state between executions. The
factory beans that Spring Batch provides for convenient configuration
of <classname>Step</classname> instances have features that allow
streams to be registered with the step when it is configured.</para>
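<para>The callback handling described above can be pictured as a step forwarding each lifecycle call to every registered stream. This is a hypothetical sketch. <classname>SimpleItemStream</classname>, <classname>StreamCallbacks</classname> and <classname>RecordingStream</classname> are illustrative names, and <classname>java.util.Properties</classname> stands in for the <classname>ExecutionContext</classname>. The real interfaces and factory beans differ in detail.</para>

```java
import java.util.Arrays;
import java.util.Properties;

// Hypothetical, simplified stand-in for ItemStream; Properties plays the
// role of the ExecutionContext. Not the real Spring Batch interface.
interface SimpleItemStream {
    void open(Properties context);
    void update(Properties context);
    void close();
}

// Forwards every callback to each registered stream, the way a step calls
// back its reader, its writer and any extra streams it has been given.
class StreamCallbacks implements SimpleItemStream {

    private SimpleItemStream[] streams = new SimpleItemStream[0];

    void register(SimpleItemStream stream) {
        streams = Arrays.copyOf(streams, streams.length + 1);
        streams[streams.length - 1] = stream;
    }

    public void open(Properties context) {
        for (SimpleItemStream s : streams) s.open(context);
    }

    public void update(Properties context) {
        for (SimpleItemStream s : streams) s.update(context);
    }

    public void close() {
        for (SimpleItemStream s : streams) s.close();
    }
}

// A stream that records which callbacks it received.
class RecordingStream implements SimpleItemStream {
    final StringBuilder calls = new StringBuilder();
    public void open(Properties context) { calls.append("open "); }
    public void update(Properties context) { calls.append("update "); }
    public void close() { calls.append("close"); }
}

class StreamCallbacksDemo {
    public static void main(String[] args) {
        StreamCallbacks step = new StreamCallbacks();
        RecordingStream delegate = new RecordingStream(); // e.g. a reader's delegate
        step.register(delegate); // without this, the delegate would get no callbacks
        Properties context = new Properties();
        step.open(context);
        step.update(context);    // before each commit
        step.close();
        System.out.println(delegate.calls); // open update close
    }
}
```

<para>A stream that is never registered simply receives no callbacks. This is why indirect dependencies, such as delegates, need explicit registration.</para>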
<para>If the ItemReader of ItemWriter themselves implement the
ItemStream interface, then these will be registered automatically. Any
other streams need to be registered separately. This is often the case
where there are indirect dependencies, like delegates being injected
into the reader and writer. To register these they can be injected
into teh factory beans through the streams property, e.g.:</para>
<para>If the <classname>ItemReader</classname> or
<classname>ItemWriter</classname> themselves implement the
<classname>ItemStream</classname> interface, they will be registered
automatically. Any other streams need to be registered separately. This
is often the case where there are indirect dependencies, such as
delegates being injected into the reader and writer. To register them,
inject them into the factory beans through the streams property, as
illustrated below:</para>
<programlisting>&lt;bean id="step1" parent="simpleStep"
class="org.springframework.batch.core.step.item.StatefulRetryStepFactoryBean"&gt;
@@ -1160,12 +1162,13 @@
with one of many <classname>Step</classname> scoped listeners.</para>
<section>
<title>StepListener</title>
<title>StepExecutionListener</title>
<para>StepListener represents the most generic listener for
<classname>Step</classname> execution. It allows for notification
before a Step is started, after it has completed, and if any errors
are encountered during processing:</para>
<para><classname>StepExecutionListener</classname> represents the
most generic listener for <classname>Step</classname> execution. It
allows for notification before a <classname>Step</classname> is
started, after it has completed, and if any errors are encountered
during processing:</para>
<programlisting> public interface StepExecutionListener extends StepListener {
@@ -1180,7 +1183,8 @@
<methodname>onErrorInStep</methodname> and
<methodname>afterStep</methodname> in order to allow listeners the
chance to modify the exit code that is returned upon completion of a
<classname>Step</classname>. A StepListener can be applied to any
<classname>Step</classname>. A
<classname>StepExecutionListener</classname> can be applied to any
step factory bean via the listeners property:</para>
<programlisting> &lt;bean id="simpleStep"
@@ -1227,10 +1231,10 @@
<section>
<title>ItemReadListener</title>
<para>When discussing skip logic earlier, it was mentioned that it
may be beneficial to log out skipped records, so that they can be
deal with later. In the case of read errors, this can be done with
an <classname>ItemReaderListener:</classname><programlisting> public interface ItemReadListener extends StepListener {
<para>When discussing skip logic above, it was mentioned that it may
be beneficial to log out skipped records, so that they can be dealt
with later. In the case of read errors, this can be done with an
<classname>ItemReadListener</classname>:<programlisting> public interface ItemReadListener extends StepListener {
void beforeRead();
@@ -1413,9 +1417,10 @@
<classname>ItemTransformer</classname>.</para>
<para>Here we provide a few examples of common patterns in custom
business logic, mainly using the listener interfaces - but remember that
a reader or writer can implement the listener interfaces as well if that
is appropriate.</para>
business logic, mainly using the listener interfaces. It should be
noted that an <classname>ItemReader</classname> or
<classname>ItemWriter</classname> can implement the listener interfaces
as well if appropriate.</para>
</section>
<section>
@@ -1424,10 +1429,11 @@
<para>A common use case is the need for special handling of errors in a
step, item by item, perhaps logging to a special channel, or inserting a
record into a database. The ItemOrientedStep (created from the step
factory beans) allows us to implement this use case with a simple
factory beans) allows users to implement this use case with a simple
<classname>ItemReadListener</classname>, for errors on read, and an
<classname>ItemWriteListener</classname>, for errors on write.
E.g.</para>
<classname>ItemWriteListener</classname>, for errors on write. The code
snippet below illustrates a listener that logs both read and write
failures:</para>
<programlisting>public class ItemFailureLoggerListener extends ItemListenerSupport {
@@ -1443,8 +1449,8 @@
}</programlisting>
<para>Having implemented this listener it just needs to be registered
with the step, e.g.</para>
<para>Once implemented, this listener must be registered with the
step:</para>
<programlisting>&lt;bean id="simpleStep"
class="org.springframework.batch.core.step.item.SimpleStepFactoryBean" &gt;
@@ -1456,11 +1462,11 @@
<para>Remember that if your listener does anything in an
<code>onError()</code> method, it will be inside a transaction that is
going to rollback. If you need to use a transactional resource like a
database inside an <code>onError()</code> method, consider adding a
declarative transaction to that method (see Spring Core Reference Guide
for details), and giving its propagation attribute the value
REQUIRES_NEW.</para>
going to be rolled back. If you need to use a transactional resource
such as a database inside an <code>onError()</code> method, consider
adding a declarative transaction to that method (see the Spring Core
Reference Guide for details), and giving its propagation attribute the
value REQUIRES_NEW.</para>
</section>
<section>
@@ -1472,8 +1478,8 @@
sense to stop a job execution from within the business logic.</para>
<para>The simplest thing to do is to throw a RuntimeException (one that
isn't retried indefinitely or skipped), e.g. we could use a custom
exception type as in the example below</para>
isn't retried indefinitely or skipped). For example, a custom exception
type could be thrown:</para>
<programlisting>public class PoisonPillItemWriter extends AbstractItemWriter {
@@ -1488,8 +1494,8 @@
}</programlisting>
<para>Another simple way to stop a step from executing is to
return <code>null</code> from the <classname>ItemReader</classname>,
e.g.</para>
return <code>null</code> from the
<classname>ItemReader</classname>:</para>
<programlisting>public class EarlyCompletionItemReader extends AbstractItemReader {
@@ -1516,7 +1522,7 @@
strategy which signals a complete batch when the item to be processed is
null. A more sophisticated completion policy could be implemented and
injected into the <classname>Step</classname> through the
<classname>RepeatOperationsStepFactoryBean</classname>, e.g.</para>
<classname>RepeatOperationsStepFactoryBean</classname>:</para>
<programlisting>&lt;bean id="simpleStep"
class="org.springframework.batch.core.step.item.RepeatOperationsStepFactoryBean" &gt;
@@ -1533,10 +1539,10 @@
<para>An alternative is to set a flag in the
<classname>StepExecution</classname>, which is checked by the
<classname>Step</classname> implementations in the framework in between
item processing. To implement this alternative we need access to the
item processing. To implement this alternative, we need access to the
current StepExecution, and this can be achieved by implementing a
StepListener and registering it with the Step. Here is an example of a
listener that sets the flag</para>
listener that sets the flag:</para>
<programlisting>public class CustomItemWriter extends ItemListenerSupport implements StepListener {
@@ -1567,13 +1573,13 @@
<title>Adding a Footer Record</title>
<para>A very common requirement is to aggregate information during the
output process and to append a record at the end of a file summarising
output process and to append a record at the end of a file summarizing
the data, or providing a checksum. This can also be achieved with
callbacks in the step, normally as part of a custom
<classname>ItemWriter</classname>. In this case, since we are
accumulating state that we do not want to lose if the job aborts, we
probably need to implement the <classname>ItemStream</classname>
interface.</para>
<classname>ItemWriter</classname>. In this case, since a job is
accumulating state that should not be lost if the job aborts, the
<classname>ItemStream</classname> interface should be
implemented:</para>
<programlisting>public class CustomItemWriter extends AbstractItemWriter implements
ItemStream, StepListener
@@ -1616,11 +1622,12 @@
state is stored through the <classname>ItemStream</classname> interface
in the <classname>ExecutionContext</classname>. In this way we can be
sure that when the <code>open()</code> callback is received on a
restart, we always get the last value that was committed.</para>
<para>N.B. We might not implement <classname>ItemStream</classname> if
the ItemWriter is re-runnable, in the sense that it maintains its own
state in a transactional resource like a database.</para>
restart, we always get the last value that was committed, as guaranteed
by the framework. It should be noted that it is not always necessary to
implement <classname>ItemStream</classname>. For example, if the
<classname>ItemWriter</classname> is re-runnable, in the sense that it
maintains its own state in a transactional resource such as a database,
there is no need to maintain state within the writer itself.</para>
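<para>The footer pattern above can be sketched without any framework dependencies. In this hypothetical example, <classname>FooterTotalWriter</classname> and the key "footer.total" are illustrative names, and <classname>java.util.Properties</classname> stands in for the <classname>ExecutionContext</classname>. Writing the footer from <methodname>close</methodname> is also an assumption; the original example may emit it from a different callback.</para>

```java
import java.util.Properties;

// Hypothetical writer that accumulates a total for a footer record. State is
// saved and restored through ItemStream-style callbacks, with Properties
// standing in for the ExecutionContext. Not Spring Batch API.
class FooterTotalWriter {

    private static final String TOTAL_KEY = "footer.total";

    private final StringBuilder output = new StringBuilder(); // stands in for the file
    private long total;

    void open(Properties context) {
        // On a restart, resume from the last committed total; otherwise start at zero.
        total = Long.parseLong(context.getProperty(TOTAL_KEY, "0"));
    }

    void write(long amount) {
        output.append(amount).append('\n');
        total += amount;
    }

    void update(Properties context) {
        // Called before each commit, so the saved total matches the committed data.
        context.setProperty(TOTAL_KEY, Long.toString(total));
    }

    void close() {
        output.append("TOTAL,").append(total).append('\n'); // the footer record
    }

    String getOutput() {
        return output.toString();
    }
}

class FooterTotalWriterDemo {
    public static void main(String[] args) {
        Properties context = new Properties();
        FooterTotalWriter first = new FooterTotalWriter();
        first.open(context);
        first.write(10);
        first.write(20);
        first.update(context);   // commit: a total of 30 is saved
        // Suppose the job fails here and is restarted.
        FooterTotalWriter restarted = new FooterTotalWriter();
        restarted.open(context); // resumes from 30
        restarted.write(5);
        restarted.close();
        System.out.print(restarted.getOutput()); // 5, then TOTAL,35
    }
}
```

<para>Because the total is only saved in <methodname>update</methodname>, a restart never sees a value that includes uncommitted items.</para>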
</section>
</section>
</chapter>
</chapter>


@@ -10,17 +10,17 @@
<para>All batch processing can be described in its most simple form as
reading in large amounts of data, performing some type of calculation or
transformation, and writing the result out. Spring Batch provides two key
interfaces to help perform bulk reading and writing: ItemReader and
ItemWriter</para>
interfaces to help perform bulk reading and writing:
<classname>ItemReader</classname> and
<classname>ItemWriter</classname>.</para>
</section>
<section>
<title id="infrastructure.1">ItemReader</title>
<para>Although a simple concept, <emphasis
role="bold">ItemReader</emphasis>s are the means for providing data from
many different types of input. The most general examples include:
<itemizedlist>
<para>Although a simple concept, an <classname>ItemReader</classname> is
the means for providing data from many different types of input. The most
general examples include: <itemizedlist>
<listitem>
<para>Flat File- Flat File Item Readers read lines of data from a
flat file that typically describe records with fields of data
@@ -31,23 +31,24 @@
<listitem>
<para>XML - XML ItemReaders process XML independently of
technologies used for parsing, mapping and validating objects. Input
data allows for the validation of and XML file against and XSD
data allows for the validation of an XML file against an XSD
schema.</para>
</listitem>
<listitem>
<para>Database - A database resource accessed that returns
resultsets that can be mapped to objects for processing. The default
SQL Input Sources invoke a RowMapper to return objects, keep track
of the current row if restart is required, basic statistics, and
some transaction enhancements that will be explained later.</para>
<para>Database - A database resource is accessed to return
resultsets which can be mapped to objects for processing. The
default SQL Input Sources invoke a <classname>RowMapper</classname>
to return objects, keep track of the current row if restart is
required, store basic statistics, and provide some transaction
enhancements that will be explained later.</para>
</listitem>
</itemizedlist>There are many more possibilities, but we'll focus on the
basic ones for this chapter. A complete list of all available ItemReaders
can be found in Appendix A.</para>
<para>The Item Reader is a basic interface for generic input
operations:</para>
<para><classname>ItemReader</classname> is a basic interface for generic
input operations:</para>
<programlisting>public interface ItemReader {
@@ -64,7 +65,7 @@
Item, returning null if no more items are left. An item might represent a
line in a file, a row in a database, or an element in an XML file. It is
generally expected that these will be mapped to a usable domain object
(i.e. Trade or Foo, etc) but there is no requirement in the contract to do
(e.g. Trade, Foo, etc.) but there is no requirement in the contract to do
so.</para>
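<para>The <methodname>read</methodname> contract can be illustrated with a minimal in-memory reader. This is a hypothetical sketch: <classname>ArrayItemReader</classname> is an illustrative name, not a Spring Batch class, and <methodname>mark</methodname> and <methodname>reset</methodname> are omitted. Because null signals the end of input, the sketch assumes the input itself contains no null items.</para>

```java
// Hypothetical in-memory reader following the read() contract described
// above: each call returns the next item, and null once the input is
// exhausted. Illustrative only; not a Spring Batch class.
class ArrayItemReader {

    private final Object[] items;
    private int next;

    ArrayItemReader(Object... items) {
        this.items = items;
    }

    Object read() {
        if (next >= items.length) {
            return null; // signals that there are no more items
        }
        return items[next++];
    }
}

class ArrayItemReaderDemo {
    public static void main(String[] args) {
        ArrayItemReader reader = new ArrayItemReader("trade1", "trade2", "trade3");
        Object item;
        // The caller keeps reading until null marks the end of the input.
        while ((item = reader.read()) != null) {
            System.out.println(item);
        }
    }
}
```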
<para>The <methodname>mark</methodname> and <methodname>reset</methodname>
@@ -81,11 +82,11 @@
<para><classname>ItemWriter</classname> is similar in functionality to an
<classname>ItemReader</classname> with the exception that the operations
are reversed. They still need to be located, opened and closed but they
differ in the case that we write out, rather than reading in. In the case
of databases or queues these may be inserts, updates or sends. The format
of the serialization of the output source is specific for every batch
job.</para>
are reversed. Resources still need to be located, opened and closed, but
they differ in that an <classname>ItemWriter</classname> writes out,
rather than reading in. In the case of databases or queues these may be
inserts, updates or sends. The format of the serialization of the output
is specific to each batch job.</para>
<para>As with <classname>ItemReader</classname>,
<classname>ItemWriter</classname> is a fairly generic interface:</para>
@@ -101,19 +102,22 @@
</programlisting>
<para>As with <methodname>read</methodname> on
<classname>ItemReader</classname>, write provides the basic contract of
<classname>ItemWriter</classname>, it will attempt to write out the item
passed in as long as it is open. As with <methodname>mark</methodname> and
<methodname>reset</methodname>, <methodname>flush</methodname> and
<methodname>clear</methodname> are necessary due to the nature of batch
processing. Because it is generally expected that items will be 'batched'
together into a chunk, and then output, it is expected that an
<classname>ItemWriter</classname> will perform some type of buffering.
<methodname>flush</methodname> will empty the buffer by actually writing
the items out, whereas <methodname>clear</methodname> will simply throw
the contents of the buffer away. In most cases, a Step implementation will
call <methodname>flush</methodname> before a commit and
<methodname>clear</methodname> in case of rollback.</para>
<classname>ItemReader</classname>, <methodname>write</methodname> provides
the basic contract of <classname>ItemWriter</classname>; it will attempt
to write out the item passed in as long as it is open. As with
<methodname>mark</methodname> and <methodname>reset</methodname>,
<methodname>flush</methodname> and <methodname>clear</methodname> are
necessary due to the transactional nature of batch processing. Because it
is generally expected that items will be 'batched' together into a chunk,
and then output, it is expected that an <classname>ItemWriter</classname>
will perform some type of buffering. <methodname>flush</methodname> will
empty the buffer by actually writing the items out, whereas
<methodname>clear</methodname> will simply throw the contents of the
buffer away. In most cases, a Step implementation will call
<methodname>flush</methodname> before a commit and
<methodname>clear</methodname> in case of rollback. It is expected that
implementations of the <classname>Step</classname> interface will call
these methods.</para>
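<para>The buffering behaviour of <methodname>flush</methodname> and <methodname>clear</methodname> can be sketched with two string buffers. In this hypothetical example, <classname>BufferingLineWriter</classname> is an illustrative name, and a <classname>StringBuilder</classname> stands in for the output file. The calls to <methodname>flush</methodname> and <methodname>clear</methodname> in the demo play the part of the step's commit and rollback.</para>

```java
// Hypothetical buffering writer illustrating the flush/clear contract:
// write() only buffers, flush() outputs the buffer (before a commit) and
// clear() discards it (on rollback). Not a Spring Batch class.
class BufferingLineWriter {

    private final StringBuilder buffer = new StringBuilder();    // items not yet written
    private final StringBuilder committed = new StringBuilder(); // stands in for the file

    void write(Object item) {
        buffer.append(item).append('\n');
    }

    void flush() {
        committed.append(buffer);
        buffer.setLength(0);
    }

    void clear() {
        buffer.setLength(0);
    }

    String committedOutput() {
        return committed.toString();
    }
}

class BufferingLineWriterDemo {
    public static void main(String[] args) {
        BufferingLineWriter writer = new BufferingLineWriter();
        writer.write("a");
        writer.write("b");
        writer.flush(); // the chunk commits: a and b are written out
        writer.write("c");
        writer.clear(); // the chunk rolls back: c is discarded
        System.out.print(writer.committedOutput()); // prints a and b on separate lines
    }
}
```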
</section>
<section>
@@ -137,20 +141,20 @@
<para>Before describing each method, it's worth briefly mentioning the
<classname>ExecutionContext</classname>. Clients of an
<classname>ItemReader</classname> that is also an
<classname>ItemReader</classname> that also implements
<classname>ItemStream</classname> should call
<methodname>open</methodname> before any calls to
<methodname>read</methodname>, to open any resources such as files or
obtain connections. A similar restriction applies to an
<classname>ItemWriter</classname> that is also an
<classname>ItemWriter</classname> that also implements
<classname>ItemStream</classname>. As mentioned before, if expected data
is found in the <classname>ExecutionContext</classname>, it may be used to
start the <classname>ItemReader</classname> or
<classname>ItemWriter</classname> at a location other than its initial
state. Conversely, close will be called to ensure any resources allocated
during <methodname>open</methodname> will be released safely.
<methodname>update</methodname> is called primarily to ensure that any
state currently being held is loaded into the provided
state. Conversely, <methodname>close</methodname> will be called to ensure
any resources allocated during <methodname>open</methodname> will be
released safely. <methodname>update</methodname> is called primarily to
ensure that any state currently being held is loaded into the provided
<classname>ExecutionContext</classname>. This method will be called before
committing, to ensure that the current state is persisted in the database
before commit.</para>
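<para>The <methodname>open</methodname>, <methodname>update</methodname> and <methodname>close</methodname> lifecycle can be illustrated with a reader that remembers its position. This is a hypothetical sketch: <classname>RestartableLineReader</classname> and the key "reader.position" are illustrative names, and <classname>java.util.Properties</classname> stands in for the <classname>ExecutionContext</classname>.</para>

```java
import java.util.Properties;

// Hypothetical restartable reader showing the open/update/close contract.
// update() saves the current position and open() restores it, so a restart
// resumes after the last committed item. Properties stands in for the
// ExecutionContext; this is not Spring Batch code.
class RestartableLineReader {

    private static final String POSITION_KEY = "reader.position";

    private final String[] lines;
    private int position;
    private boolean opened;

    RestartableLineReader(String... lines) {
        this.lines = lines;
    }

    void open(Properties context) {
        // Start from the saved position if one exists, otherwise from the beginning.
        position = Integer.parseInt(context.getProperty(POSITION_KEY, "0"));
        opened = true;
    }

    String read() {
        if (!opened) {
            throw new IllegalStateException("open() must be called before read()");
        }
        return position >= lines.length ? null : lines[position++];
    }

    void update(Properties context) {
        // Called before a commit so the saved position matches committed work.
        context.setProperty(POSITION_KEY, Integer.toString(position));
    }

    void close() {
        opened = false; // a file-based reader would release its file handle here
    }
}

class RestartableLineReaderDemo {
    public static void main(String[] args) {
        Properties context = new Properties();
        RestartableLineReader reader = new RestartableLineReader("a", "b", "c");
        reader.open(context);
        reader.read();          // "a" is processed and committed
        reader.update(context); // position 1 is saved before the commit
        reader.close();         // suppose the job fails at this point

        RestartableLineReader restarted = new RestartableLineReader("a", "b", "c");
        restarted.open(context);
        System.out.println(restarted.read()); // b
    }
}
```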
@@ -213,38 +217,25 @@
<section>
<title id="infrastructure.1.2.1">FlatFileItemReader</title>
<para>One of the most common tasks performed in batch jobs involve
reading from some type of file. A flat file is basically any type of
file that contains at most two-dimensional (tabular) data. Reading flat
files in the Spring Batch framework is facilitated by the class
<para>A flat file is any type of file that contains at most
two-dimensional (tabular) data. Reading flat files in the Spring Batch
framework is facilitated by the class
<classname>FlatFileItemReader</classname>, which provides basic
functionality for reading and parsing flat files. In addition, there are
default implementations of the <classname>ItemReader</classname> and
<classname>ItemStream</classname> interfaces that solve the majority of
file processing needs.</para>
<para>The <classname>FlatFileItemReader</classname> class has several
properties. The three most important of these properties are
functionality for reading and parsing flat files. The
<classname>FlatFileItemReader</classname> class has several properties.
The three most important of these properties are
<classname>Resource</classname>, <classname>FieldSetMapper</classname>
and <classname>LineTokenizer</classname>, which define the resource from
which data will be read and the method by which the read data will be
converted int distinct fields. The <classname>FieldSetMapper</classname>
and <classname>LineTokenizer</classname> interfaces will be explored
more in the next sections. In addition, we'll explore integration with
the file system via the resource property. The resource property
represents a Spring Core <classname>Resource</classname>. Documentation
explaining how to create beans of this type can be found in <ulink
and <classname>LineTokenizer</classname>. The
<classname>FieldSetMapper</classname> and
<classname>LineTokenizer</classname> interfaces will be explored more in
the next sections. The resource property represents a Spring Core
<classname>Resource</classname>. Documentation explaining how to create
beans of this type can be found in <ulink
url="http://static.springframework.org/spring/docs/2.5.x/reference/resources.html"><citetitle>Spring
Framework, Chapter 4. Resources</citetitle></ulink>. Therefore, this
guide will not go into the details of creating
<classname>Resource</classname> objects except to make a couple of
points on the locating files to process within a batch environment.
Tokenizers and field set mappers will be discussed a bit later.</para>
<para>As mentioned, the location of the file is defined by the resource
property. There are only a few methods exposed through a resource
service. A resource is used to help locate, open, and close resources.
It can be as simple as: <programlisting>
<classname>Resource</classname> objects. A resource is used to locate,
open, and close resources. It can be as simple as: <programlisting>
Resource resource = new FileSystemResource("resources/trades.csv");
</programlisting></para>
@@ -259,17 +250,8 @@
process of feeding the data into the pipe from this starting
point.</para>
<para>The flat file reader uses a
<classname>ResourceLineReader</classname> object to read from the file.
Optionally, you can specify a
<classname>RecordSeparatorPolicy</classname> through the
recordSeparatorPolicy property. This can be used to configure more
low-level features, such as what constitutes the end of a line and
whether to continue quoted strings over newlines, among other
things.</para>
<para>The other properties in the flat file readers allow you to further
specify how your data will be interpreted: <table>
<para>The other properties in <classname>FlatFileItemReader</classname>
allow you to further specify how your data will be interpreted: <table>
<title>Flat File Item Reader Properties</title>
<tgroup cols="3">
@@ -324,6 +306,16 @@
AbstractLineTokenizer, field names will be set automatically
from this line</entry>
</row>
<row>
<entry align="left">recordSeparatorPolicy</entry>
<entry align="left">RecordSeparatorPolicy</entry>
<entry align="left">Determines where each record ends, for
example by continuing a record over a line ending that falls inside
a quoted string.</entry>
</row>
</tbody>
</tgroup>
</table></para>
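<para>The kind of decision a <classname>RecordSeparatorPolicy</classname> makes can be sketched by counting quote characters. In this hypothetical sketch, <classname>QuoteAwareRecords</classname> and its methods are illustrative names, not the Spring Batch interface. A physical line is assumed to end a record only when the record so far contains an even number of double quotes.</para>

```java
import java.util.Arrays;

// Hypothetical quote-aware record assembly, illustrating what a
// RecordSeparatorPolicy decides: a physical line only ends a record when
// the record so far contains an even number of double quotes. The class
// and method names are illustrative, not the Spring Batch interface.
class QuoteAwareRecords {

    static boolean isEndOfRecord(String record) {
        int quotes = 0;
        for (char c : record.toCharArray()) {
            if (c == '"') {
                quotes++;
            }
        }
        return quotes % 2 == 0; // an odd count means a quoted field is still open
    }

    // Joins physical lines into logical records.
    static String[] assemble(String[] physicalLines) {
        String[] records = new String[physicalLines.length];
        int count = 0;
        String pending = null;
        for (String line : physicalLines) {
            pending = (pending == null) ? line : pending + "\n" + line;
            if (isEndOfRecord(pending)) {
                records[count++] = pending;
                pending = null;
            }
        }
        if (pending != null) {
            throw new IllegalStateException("unterminated quoted field: " + pending);
        }
        return Arrays.copyOf(records, count);
    }
}

class QuoteAwareRecordsDemo {
    public static void main(String[] args) {
        String[] lines = {"1,\"first half", "second half\",x", "2,plain"};
        String[] records = QuoteAwareRecords.assemble(lines);
        System.out.println(records.length); // 2
    }
}
```

<para>Counting quotes also treats an escaped doubled quote as balanced, which is one reason this simple rule works for common CSV conventions.</para>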
@@ -331,16 +323,14 @@
<section>
<title>FieldSetMapper</title>
<para>Field set mappers used by the
<classname>FlatFileItemReader</classname> implement the
<classname>FieldSetMapper</classname> interface. This interface
defines a single method, mapLine, which takes a FieldSet object and
maps its contents to some Object. This object may be a custom DTO or
domain object, or it could be as simple as an array, depending on your
needs. The <classname>FieldSetMapper</classname> is used in
conjunction with the <classname>LineTokenizer</classname> to translate
a line of data from a resource into an object of the desired
type:</para>
<para>The <classname>FieldSetMapper</classname> interface defines a
single method, <methodname>mapLine</methodname>, which takes a
<classname>FieldSet</classname> object and maps its contents to an
object. This object may be a custom DTO or domain object, or it could
be as simple as an array, depending on your needs. The
<classname>FieldSetMapper</classname> is used in conjunction with the
<classname>LineTokenizer</classname> to translate a line of data from
a resource into an object of the desired type:</para>
<programlisting> public interface FieldSetMapper {
@@ -348,7 +338,7 @@
}</programlisting>
<para>As you can see, the pattern used is exatly the same as
<para>As you can see, the pattern used is exactly the same as
<classname>RowMapper</classname> used by
<classname>JdbcTemplate</classname>.</para>
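<para>The RowMapper-like pattern can be shown with simplified stand-ins. In this hypothetical sketch, <classname>SimpleFieldSet</classname>, <classname>SimpleFieldSetMapper</classname>, <classname>Trade</classname> and <classname>TradeFieldSetMapper</classname> are illustrative. The real <classname>FieldSet</classname> and <classname>FieldSetMapper</classname> types have richer APIs, and the trade fields and sample line are invented for illustration.</para>

```java
import java.math.BigDecimal;

// Hypothetical stand-ins showing the FieldSetMapper pattern; the real
// FieldSet and FieldSetMapper types in Spring Batch have richer APIs.
class SimpleFieldSet {

    private final String[] tokens;

    SimpleFieldSet(String[] tokens) {
        this.tokens = tokens;
    }

    String readString(int index) {
        return tokens[index].trim();
    }

    long readLong(int index) {
        return Long.parseLong(readString(index));
    }

    BigDecimal readBigDecimal(int index) {
        return new BigDecimal(readString(index));
    }
}

interface SimpleFieldSetMapper {
    Object mapLine(SimpleFieldSet fieldSet);
}

class Trade {
    String isin;
    long quantity;
    BigDecimal price;
    String customer;
}

// Maps one tokenized line to a Trade, much as a RowMapper maps one row.
class TradeFieldSetMapper implements SimpleFieldSetMapper {
    public Object mapLine(SimpleFieldSet fieldSet) {
        Trade trade = new Trade();
        trade.isin = fieldSet.readString(0);
        trade.quantity = fieldSet.readLong(1);
        trade.price = fieldSet.readBigDecimal(2);
        trade.customer = fieldSet.readString(3);
        return trade;
    }
}

class TradeFieldSetMapperDemo {
    public static void main(String[] args) {
        // A naive comma split stands in for a LineTokenizer here.
        String line = "UK21341EAH45,978,98.34,customer1";
        Trade trade = (Trade) new TradeFieldSetMapper().mapLine(new SimpleFieldSet(line.split(",")));
        System.out.println(trade.customer + " bought " + trade.quantity); // customer1 bought 978
    }
}
```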
</section>
@@ -367,14 +357,13 @@
FieldSet tokenize(String line);
}
</programlisting>
}</programlisting>
<para>The contract of a <classname>LineTokenizer</classname> is such
that, given a line of input (in theory the
<classname>String</classname> could encompass more than one line) a
<classname>FieldSet</classname> representing the line will be
returned. This will then be based to a
returned. This will then be passed to a
<classname>FieldSetMapper</classname>. Spring Batch contains the
following LineTokenizers:</para>
@@ -405,8 +394,8 @@
<para>Now that the basic interfaces for reading in flat files have
been defined, a simple example explaining how they work together is
helpful. In it's most simple form, the flow when reading a line form a
file is this:</para>
helpful. In its simplest form, the flow when reading a line from a
file is the following:</para>
<orderedlist>
<listitem>
@@ -479,7 +468,7 @@
}
} </programlisting></para>
<para>We can then read in from the filed by correctly constructing our
<para>We can then read from the file by correctly constructing our
FlatFileItemReader and calling read():</para>
<programlisting> FlatFileItemReader itemReader = new FlatFileItemReader();
@@ -498,11 +487,11 @@
<section>
<title>Mapping fields by name</title>
<para>There is one additional functionality that is similar in
function to a JDBC <classname>ResultSet</classname>. The names of the
fields can be injected into the <classname>LineTokenizer</classname>
to increase the readability of the mapping function. We can expose
this behavior by adding the following. First, we tell the
<para>There is one additional piece of functionality in line
tokenizers that is similar in function to a JDBC
<classname>ResultSet</classname>: the names of the fields can be
injected into the <classname>LineTokenizer</classname> to increase the
readability of the mapping function. First, we tell the
<classname>LineTokenizer</classname> what the names of the fields in
the fieldset are:</para>
@@ -562,8 +551,8 @@
is required) in the same way the Spring container will look for
setters matching a property name. Each available field in the
<classname>FieldSet</classname> will be mapped, and the resultant
<classname>Player</classname> object will be returned, only there was
no code required.</para>
<classname>Player</classname> object will be returned, with no code
required.</para>
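<para>The idea of matching field names to setters can be illustrated with plain Java reflection. This is a hypothetical sketch, not how Spring's bean wrapper works internally. <classname>ByNameFieldMapper</classname> and the simplified <classname>Player</classname> below are illustrative, the sample values are invented, and only <classname>String</classname> and <code>int</code> properties are supported.</para>

```java
import java.lang.reflect.Method;

// Hypothetical by-name mapping: for each field name, call the setter with
// the matching name (for example "lastName" -> setLastName). Only String
// and int properties are supported. This only illustrates the idea behind
// setter matching; it is not how Spring's bean wrapper is implemented.
class ByNameFieldMapper {

    static void populate(Object target, String[] names, String[] values) {
        for (int i = 0; i != names.length; i++) {
            String name = names[i];
            String setterName = "set" + Character.toUpperCase(name.charAt(0)) + name.substring(1);
            Method setter = findSetter(target, setterName);
            try {
                if (setter.getParameterTypes()[0] == int.class) {
                    setter.invoke(target, Integer.parseInt(values[i].trim()));
                } else {
                    setter.invoke(target, values[i]);
                }
            } catch (ReflectiveOperationException e) {
                throw new IllegalStateException("could not set " + name, e);
            }
        }
    }

    private static Method findSetter(Object target, String setterName) {
        for (Method m : target.getClass().getMethods()) {
            if (m.getName().equals(setterName)) {
                if (m.getParameterTypes().length == 1) {
                    return m;
                }
            }
        }
        throw new IllegalArgumentException("no setter named " + setterName);
    }
}

// Simplified, hypothetical Player with only three properties.
class Player {
    private String id;
    private String lastName;
    private int birthYear;

    public void setId(String id) { this.id = id; }
    public void setLastName(String lastName) { this.lastName = lastName; }
    public void setBirthYear(int birthYear) { this.birthYear = birthYear; }
    public String getId() { return id; }
    public String getLastName() { return lastName; }
    public int getBirthYear() { return birthYear; }
}

class ByNameFieldMapperDemo {
    public static void main(String[] args) {
        Player player = new Player();
        ByNameFieldMapper.populate(player,
                new String[] {"id", "lastName", "birthYear"},
                new String[] {"p001", "Smith", "1970"});
        System.out.println(player.getLastName() + " " + player.getBirthYear()); // Smith 1970
    }
}
```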
</section>
<section>
@@ -678,23 +667,22 @@
<para>Writing out to flat files has the same problems and issues that
reading in from a file must overcome. It must be able to write out in
either delimited or fixed length formats in a transactional
manger.</para>
manner.</para>
<section>
<title>LineAggregator</title>
<para>Just like file reading's <classname>LineTokenizer</classname>
interface is necessary to take a string and split it into tokens, file
writing must have a way to aggregate multiple fields into a single
string for writing to a file. In Spring Batch this is the
<para>Just as the <classname>LineTokenizer</classname> interface is
necessary to take a string and split it into tokens, file writing must
have a way to aggregate multiple fields into a single string for
writing to a file. In Spring Batch this is the
<classname>LineAggregator</classname>:</para>
<programlisting> public interface LineAggregator {
public String aggregate(FieldSet fieldSet);
}
</programlisting>
}</programlisting>
<para>The <classname>LineAggregator</classname> is exactly the
opposite of a <classname>LineTokenizer</classname>.
@@ -761,22 +749,22 @@
<classname>FlatFileItemWriter</classname> expresses this in
code:</para>
<programlisting>public void write(Object data) throws Exception {
FieldSet fieldSet = fieldSetCreator.mapItem(data);
getOutputState().write(lineAggregator.aggregate(fieldSet) + LINE_SEPARATOR);
}</programlisting>
<programlisting> public void write(Object data) throws Exception {
FieldSet fieldSet = fieldSetCreator.mapItem(data);
getOutputState().write(lineAggregator.aggregate(fieldSet) + LINE_SEPARATOR);
}</programlisting>
<para>A minimal configuration, using the fewest setters, would
look like the following:</para>
<programlisting>&lt;bean id="itemWriter"
class="org.springframework.batch.io.file.FlatFileItemWriter"&gt;
&lt;property name="resource"
value="file:target/test-outputs/20070122.testStream.multilineStep.txt" /&gt;
&lt;property name="fieldSetCreator"&gt;
&lt;bean class="org.springframework.batch.io.file.mapping.PassThroughFieldSetMapper"/&gt;
&lt;/property&gt;
&lt;/bean&gt;</programlisting>
<programlisting> &lt;bean id="itemWriter"
class="org.springframework.batch.io.file.FlatFileItemWriter"&gt;
&lt;property name="resource"
value="file:target/test-outputs/20070122.testStream.multilineStep.txt" /&gt;
&lt;property name="fieldSetCreator"&gt;
&lt;bean class="org.springframework.batch.io.file.mapping.PassThroughFieldSetMapper"/&gt;
&lt;/property&gt;
&lt;/bean&gt;</programlisting>
</section>
<section>
@@ -788,17 +776,17 @@
File writing isn't quite so simple. At first glance it seems like a
similar straightforward contract should exist for
<classname>FlatFileItemWriter</classname>: if the file already exists,
throw an exception, if it does not, create it and start writing. Job
restart throws a bit of a kink into this. In the normal restart
scenario, the contract is reversed, if the file exists start writing
to it from the last known good position, if it does not, throw an
exception. However, what happens if the file name for this job is
always the same? In this case, you would want to delete the file if it
exists, unless it's a restart. Because of this possibility, the
<classname>FlatFileItemWriter</classname> contains the property,
<methodname>shouldDeleteIfExists</methodname>. Setting this property
to true will cause an existing file with the same name to be deleted
when the writer is opened.</para>
throw an exception; if it does not, create it and start writing.
However, potentially restarting a <classname>Job</classname> complicates
this. In the normal restart scenario, the contract is reversed: if the
file exists, start writing to it from the last known good position; if
it does not, throw an exception. But what happens if the file name for
this job is always the same? In this case, you would want to delete the
file if it exists, unless it's a restart. Because of this possibility,
the <classname>FlatFileItemWriter</classname> contains the property
<methodname>shouldDeleteIfExists</methodname>. Setting this property to
true will cause an existing file with the same name to be deleted when
the writer is opened.</para>
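<para>The open-time rules above can be expressed as a short decision procedure using <classname>java.nio.file</classname>. This is a hypothetical sketch: <classname>OutputFilePreparer</classname> and its <methodname>prepare</methodname> parameters are illustrative and do not reflect how <classname>FlatFileItemWriter</classname> is actually implemented. The demo file name is likewise an invented example.</para>

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical sketch of the decision made when an output file is opened.
// The method name and parameters are illustrative; this is not the
// FlatFileItemWriter implementation.
class OutputFilePreparer {

    static void prepare(Path file, boolean restart, boolean shouldDeleteIfExists) {
        boolean exists = Files.exists(file);
        try {
            if (restart) {
                // Restart: writing resumes from the last known good position,
                // so the file must already be there.
                if (!exists) {
                    throw new IllegalStateException("restart expected existing file: " + file);
                }
                return;
            }
            if (exists) {
                if (!shouldDeleteIfExists) {
                    throw new IllegalStateException("output file already exists: " + file);
                }
                Files.delete(file); // fresh run with a fixed file name: start over
            }
            Files.createFile(file);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}

class OutputFilePreparerDemo {
    public static void main(String[] args) {
        Path file = java.nio.file.Paths.get(System.getProperty("java.io.tmpdir"), "trades-output.txt");
        OutputFilePreparer.prepare(file, false, true); // first run: delete any old copy, then create
        OutputFilePreparer.prepare(file, true, false); // restart: the existing file is kept
        System.out.println(Files.exists(file));        // true
    }
}
```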
</section>
</section>
</section>
@@ -819,15 +807,15 @@
only to provide callbacks).</para>
</note>
<para>Lets take a closer look how XML input and output works in batch.
First, there are a few concepts that vary from file reading and writing
but are common across Spring Batch XML processing. With XML processing
instead of lines of records (FieldSets) that need to be tokenized, it is
assumed an XML resource is a collection of 'fragments' corresponding to
individual records. Note that OXM tools are designed to work with
standalone XML documents rather than XML fragments cut out of an XML
document, therefore the Spring Batch infrastructure needs to work around
this fact (as described below).</para>
<para>Let's take a closer look at how XML input and output works in
Spring Batch. First, there are a few concepts that vary from file
reading and writing but are common across Spring Batch XML processing.
With XML processing, instead of lines of records (FieldSets) that need
to be tokenized, it is assumed that an XML resource is a collection of
'fragments' corresponding to individual records. Note that OXM tools are
designed to work with standalone XML documents rather than XML fragments
cut out of an XML document; therefore the Spring Batch infrastructure
needs to work around this fact, as described below:</para>
<para><mediaobject>
<imageobject role="fo">
@@ -843,9 +831,11 @@
<caption><para>Figure 3.1: XML Input</para></caption>
</mediaobject></para>
<para>The 'trade' tag is defined as the 'root element' in the scenario
above. Everything between '&lt;trade&gt;' and '&lt;/trade&gt;' is
considered one 'fragment'. Spring Batch uses Object/XML Mapping (OXM) to
bind fragments to objects. However, Spring Batch is not tied to any
particular XML binding technology. Typical use is to delegate to <ulink
url="http://static.springframework.org/spring-ws/site/reference/html/oxm.html"><citetitle>Spring
OXM</citetitle></ulink>, which provides a uniform abstraction for the most
popular OXM technologies. The dependency on Spring OXM is optional, and you
@@ -868,8 +858,8 @@
<caption><para>Figure 3.2: OXM Binding</para></caption>
</mediaobject></para>
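<para>To make the idea of a fragment concrete, the following sketch uses the
JDK's StAX API (<classname>javax.xml.stream</classname>) to cut every
element with a given root element name, such as 'trade', out of a larger
document and hand it over as standalone XML text, which is the kind of input
an OXM tool expects. It is not Spring Batch code:
<classname>FragmentSplitter</classname> and
<classname>FragmentHandler</classname> are hypothetical names, and
namespaces are ignored for brevity.</para>

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

/** Hypothetical callback that receives each fragment as standalone XML text. */
interface FragmentHandler {
    void handle(String fragmentXml);
}

final class FragmentSplitter {

    private FragmentSplitter() {
    }

    /** Splits xml into fragments rooted at rootElementName; returns their count. */
    static int split(String xml, String rootElementName, FragmentHandler handler) {
        try {
            return splitEvents(xml, rootElementName, handler);
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Could not split XML input", e);
        }
    }

    private static int splitEvents(String xml, String rootElementName,
                                   FragmentHandler handler) throws XMLStreamException {
        XMLEventReader reader = XMLInputFactory.newInstance()
                .createXMLEventReader(new StringReader(xml));
        XMLOutputFactory outputFactory = XMLOutputFactory.newInstance();
        int count = 0;
        int depth = 0;              // element nesting depth inside the current fragment
        StringWriter buffer = null;
        XMLEventWriter writer = null;

        while (reader.hasNext()) {
            XMLEvent event = reader.nextEvent();
            if (depth == 0) {
                if (!isRootStart(event, rootElementName)) {
                    continue;       // skip everything between fragments
                }
                // A new fragment starts: copy its events into a fresh buffer,
                // without the enclosing document's StartDocument event.
                buffer = new StringWriter();
                writer = outputFactory.createXMLEventWriter(buffer);
            }
            writer.add(event);
            if (event.isStartElement()) {
                depth++;
            } else if (event.isEndElement()) {
                depth--;
                if (depth == 0) {
                    // The fragment's root element just closed.
                    writer.flush();
                    writer.close();
                    handler.handle(buffer.toString());
                    count++;
                }
            }
        }
        reader.close();
        return count;
    }

    private static boolean isRootStart(XMLEvent event, String rootElementName) {
        if (!event.isStartElement()) {
            return false;
        }
        return event.asStartElement().getName().getLocalPart().equals(rootElementName);
    }
}
```

<para>Nesting depth, rather than the element name alone, decides where a
fragment ends, so a child element that happens to share the root element's
name does not end the fragment early.</para>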
<para>With this introduction to OXM and to how XML fragments can be used
to represent records, let's take a closer look at Item Readers and Item
Writers.</para>
<section>
@@ -901,27 +891,25 @@
&lt;price&gt;99.99&lt;/price&gt;
&lt;customer&gt;Customer3&lt;/customer&gt;
&lt;/trade&gt;
&lt;/records&gt;</programlisting></para>
<para>To be able to process the XML records, we need the following:
<itemizedlist>
<listitem>
<para>Root Element Name - Name of the root element of the fragment
that constitutes the object to be mapped. The example
configuration demonstrates this with the value 'trade'.</para>
</listitem>
<listitem>
<para>Resource - Spring Resource that represents the file to be
read.</para>
</listitem>
<listitem>
<para><classname>FragmentDeserializer</classname> - Unmarshalling
facility provided by Spring OXM for mapping the XML fragment to an
object.</para>
</listitem>
</itemizedlist></para>
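<para>The three ingredients listed above can be seen working together in
the following sketch, again built on the JDK's StAX API. In this
hypothetical <classname>StaxFragmentItemReader</classname>, the root element
name selects the fragments, a <classname>java.io.Reader</classname> stands
in for the Spring <classname>Resource</classname>, and a hand-written
<classname>FragmentMapper</classname> stands in for the
<classname>FragmentDeserializer</classname>; Spring OXM is not used. It
assumes flat fragments whose child elements hold only text, such as the
'price' and 'customer' elements of the sample, and it returns null once the
input is exhausted.</para>

```java
import java.io.Reader;
import java.math.BigDecimal;
import java.util.Properties;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

/** Hypothetical stand-in for a FragmentDeserializer: child element texts to an item. */
interface FragmentMapper {
    Object map(Properties childText);
}

/** Example item; only fields visible in the sample XML are modelled. */
final class Trade {
    final BigDecimal price;
    final String customer;

    Trade(BigDecimal price, String customer) {
        this.price = price;
        this.customer = customer;
    }

    static Trade fromFragment(Properties childText) {
        return new Trade(new BigDecimal(childText.getProperty("price")),
                childText.getProperty("customer"));
    }
}

final class StaxFragmentItemReader {
    private final String rootElementName;
    private final XMLEventReader events;
    private final FragmentMapper mapper;

    StaxFragmentItemReader(String rootElementName, Reader resource, FragmentMapper mapper) {
        this.rootElementName = rootElementName;
        this.mapper = mapper;
        try {
            this.events = XMLInputFactory.newInstance().createXMLEventReader(resource);
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Cannot open XML resource", e);
        }
    }

    /** Returns the next mapped item, or null once the input is exhausted. */
    Object read() {
        try {
            while (events.hasNext()) {
                XMLEvent event = events.nextEvent();
                if (isRootStart(event)) {
                    return mapper.map(readChildText());
                }
            }
            return null;
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Cannot read next fragment", e);
        }
    }

    // Collects the text of each child element until the fragment's root
    // element closes. Assumes flat fragments (no nested child elements).
    private Properties readChildText() throws XMLStreamException {
        Properties childText = new Properties();
        StringBuilder text = new StringBuilder();
        while (events.hasNext()) {
            XMLEvent event = events.nextEvent();
            if (event.isStartElement()) {
                text.setLength(0);
            } else if (event.isCharacters()) {
                text.append(event.asCharacters().getData());
            } else if (event.isEndElement()) {
                String name = event.asEndElement().getName().getLocalPart();
                if (name.equals(rootElementName)) {
                    return childText;   // end of this fragment
                }
                childText.setProperty(name, text.toString().trim());
            }
        }
        throw new XMLStreamException("Unterminated fragment: " + rootElementName);
    }

    private boolean isRootStart(XMLEvent event) {
        if (!event.isStartElement()) {
            return false;
        }
        return event.asStartElement().getName().getLocalPart().equals(rootElementName);
    }
}
```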
@@ -1010,12 +998,13 @@
<para>Output works symmetrically to input. The
<classname>StaxEventItemWriter</classname> needs a
<classname>Resource</classname>, a serializer, and a rootTagName. A Java
object is passed to a serializer (typically a wrapper around Spring OXM
<classname>Marshaller</classname>), which writes to a
<classname>Resource</classname> using a custom event writer that filters
the <classname>StartDocument</classname> and
<classname>EndDocument</classname> events produced for each fragment by
the OXM tools. We'll show this in an example using the
<classname>MarshallingEventWriterSerializer</classname>. The Spring
configuration for this setup looks as follows:</para>
@@ -1042,10 +1031,10 @@
&lt;/bean&gt;</parameter></programlisting>
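<para>The filtering event writer can be sketched with the JDK's StAX API as
well. <classname>NoDocumentEventWriter</classname> below is a hypothetical
<classname>XMLEventWriter</classname> decorator, not the Spring Batch
class: it drops <classname>StartDocument</classname> and
<classname>EndDocument</classname> events and forwards everything else, so
each fragment can be produced as if it were a whole document while only the
enclosing document gets an XML declaration. The
<methodname>marshalTrade</methodname> helper is an assumption that stands in
for a Spring OXM <classname>Marshaller</classname>.</para>

```java
import java.io.StringWriter;
import javax.xml.namespace.NamespaceContext;
import javax.xml.stream.XMLEventFactory;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

/**
 * Hypothetical decorator: forwards every event to the shared writer except
 * StartDocument and EndDocument, so per-fragment output cannot open or close
 * the enclosing document.
 */
final class NoDocumentEventWriter implements XMLEventWriter {
    private final XMLEventWriter delegate;

    NoDocumentEventWriter(XMLEventWriter delegate) {
        this.delegate = delegate;
    }

    @Override
    public void add(XMLEvent event) throws XMLStreamException {
        if (event.isStartDocument() || event.isEndDocument()) {
            return;   // dropped: only the outer document may declare itself
        }
        delegate.add(event);
    }

    @Override
    public void add(XMLEventReader reader) throws XMLStreamException {
        while (reader.hasNext()) {
            add(reader.nextEvent());   // route through the filter above
        }
    }

    @Override public void flush() throws XMLStreamException { delegate.flush(); }
    @Override public void close() throws XMLStreamException { delegate.close(); }
    @Override public String getPrefix(String uri) throws XMLStreamException { return delegate.getPrefix(uri); }
    @Override public void setPrefix(String prefix, String uri) throws XMLStreamException { delegate.setPrefix(prefix, uri); }
    @Override public void setDefaultNamespace(String uri) throws XMLStreamException { delegate.setDefaultNamespace(uri); }
    @Override public void setNamespaceContext(NamespaceContext context) throws XMLStreamException { delegate.setNamespaceContext(context); }
    @Override public NamespaceContext getNamespaceContext() { return delegate.getNamespaceContext(); }
}

final class TradeXmlWriterDemo {
    private static final XMLEventFactory EVENTS = XMLEventFactory.newInstance();

    private TradeXmlWriterDemo() {
    }

    // Hypothetical stand-in for a marshaller: emits one trade as a complete
    // document, including StartDocument and EndDocument events.
    static void marshalTrade(XMLEventWriter out, String customer) throws XMLStreamException {
        out.add(EVENTS.createStartDocument());
        out.add(EVENTS.createStartElement("", "", "trade"));
        out.add(EVENTS.createStartElement("", "", "customer"));
        out.add(EVENTS.createCharacters(customer));
        out.add(EVENTS.createEndElement("", "", "customer"));
        out.add(EVENTS.createEndElement("", "", "trade"));
        out.add(EVENTS.createEndDocument());
    }

    /** Writes rootTagName around one marshalled fragment per customer. */
    static String write(String rootTagName, String... customers) {
        try {
            StringWriter out = new StringWriter();
            XMLEventWriter document = XMLOutputFactory.newInstance().createXMLEventWriter(out);
            document.add(EVENTS.createStartDocument());
            document.add(EVENTS.createStartElement("", "", rootTagName));
            XMLEventWriter fragments = new NoDocumentEventWriter(document);
            for (String customer : customers) {
                marshalTrade(fragments, customer);
            }
            document.add(EVENTS.createEndElement("", "", rootTagName));
            document.add(EVENTS.createEndDocument());
            document.flush();
            return out.toString();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Could not write XML", e);
        }
    }
}
```

<para>Without the filter, each fragment's <classname>StartDocument</classname>
event would reach the shared writer and corrupt or abort the output; with it,
the fragments are simply appended inside the root tag.</para>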
<para>To summarize with a Java example, the following code illustrates
all of the points discussed, demonstrating the programmatic setup of the
required properties.</para>
<programlisting>StaxEventItemWriter staxItemWriter = new StaxEventItemWriter();
FileSystemResource resource = new FileSystemResource(File.createTempFile("StaxEventWriterOutputSourceTests", ".xml"));
Map aliases = new HashMap();
@@ -1116,13 +1105,13 @@
<classname>ResourceEditor</classname> in Spring already filters and does
placeholder replacement on system properties.)</para>
<para>Often in a batch setting it is preferable to parameterize the file
name in the <classname>JobParameters</classname> of the job, instead of
through system properties, and access it that way. To allow for this,
Spring Batch provides the
<classname>StepExecutionResourceProxy</classname>. The proxy can use the
job name, the step name, or any value from the
<classname>JobParameters</classname> by surrounding it with %:</para>
<programlisting> &lt;bean id="inputFile"
class="org.springframework.batch.core.resource.StepExecutionResourceProxy" /&gt;
@@ -1131,8 +1120,8 @@
<para>Assuming a job name of 'fooJob', a step name of 'fooStep', and
that the key-value pair 'file.name="fileName.txt"' is in the
<classname>JobParameters</classname> the job is started with, the
following filename will be passed as the <classname>Resource</classname>:
"<filename>//fooJob/fooStep/fileName.txt</filename>". It should be noted
that in order for the proxy to have access to the
<classname>StepExecution</classname>, it must be registered as a
@@ -1417,7 +1406,7 @@ itemReader.close(executionContext);</programlisting>
<classname>CustomerCredit</classname> objects in exactly the same manner
as described for the <classname>JdbcCursorItemReader</classname>,
assuming Hibernate mapping files have been created correctly for the
Customer table. The 'useStatelessSession' property defaults to true,
but has been set explicitly here to draw attention to the ability to
switch it on or off.</para>
</section>
@@ -1427,7 +1416,7 @@ itemReader.close(executionContext);</programlisting>
<title>Driving Query Based ItemReaders</title>
<para>In the previous section, cursor-based database input was
discussed. However, it isn't the only option. Many database vendors,
such as DB2, have extremely pessimistic locking strategies that can
cause issues if the table being read also needs to be used by other
portions of the online application. Furthermore, opening cursors over
@@ -1451,9 +1440,9 @@ itemReader.close(executionContext);</programlisting>
<para>As you can see, this example uses the same 'FOO' table as was used
in the cursor-based example. However, rather than selecting the entire
row, only the IDs were selected in the SQL statement. So, rather than a
FOO object being returned from <methodname>read</methodname>, an Integer
will be returned. This number can then be used to query for the
'details', which is a complete Foo object:</para>
<mediaobject>
<imageobject role="html">
@@ -1476,8 +1465,8 @@ itemReader.close(executionContext);</programlisting>
<title>KeyCollector</title>
<para>As the previous example illustrates, the
<classname>DrivingQueryItemReader</classname> is fairly simple: it just
iterates over a list of keys. The real complication is how those keys
are obtained. The
<classname>KeyCollector</classname> interface abstracts this:</para>
<programlisting> public interface KeyCollector {
@@ -1494,9 +1483,9 @@ itemReader.close(executionContext);</programlisting>
keys 1 through 1,000, and fails after processing key 500, upon
restarting, keys 500 through 1,000 should be returned. This
functionality is made possible by the
<methodname>updateContext</methodname> method, which saves the
provided key (which should be the current key being processed) in the
provided <classname>ExecutionContext</classname>. The
<methodname>retrieveKeys</methodname> method can then use this value
to retrieve a subset of the original keys:</para>
@@ -1520,10 +1509,10 @@ itemReader.close(executionContext);</programlisting>
<section>
<title>SingleColumnJdbcKeyCollector</title>
<para>The most common driving query scenario is that of input that has
only one column that represents its key. This is implemented as the
<classname>SingleColumnJdbcKeyCollector</classname> class, which has
the following options:</para>
<table>
<title>SingleColumnJdbcKeyCollector properties</title>
@@ -1717,25 +1706,24 @@ itemReader.close(executionContext);</programlisting>
example, let's assume that 20 items will be written per chunk, and the
15th item throws a DataIntegrityViolationException. As far as the Step
is concerned, all 20 items will be written out successfully, since
there's no way to know that an error will occur until they are actually
written out. Once
<classname>ItemWriter#</classname><methodname>flush</methodname>() is
called, the buffer will be emptied and the exception will be hit. At
this point, there's nothing the <classname>Step</classname> can do; the
transaction must be rolled back. Normally, this exception will cause the
item to be skipped (depending upon the skip/retry policies), and then it
won't be written out again. However, in this scenario, there's no way
for the <classname>Step</classname> to know which item caused the issue,
because the whole buffer was being written out when the failure
happened. Because this is a common enough use case, especially when
using Hibernate, Spring Batch provides an implementation to help:
<classname>HibernateAwareItemWriter</classname>. The
<classname>HibernateAwareItemWriter</classname> solves the problem in a
straightforward way: if a chunk fails the first time, on subsequent runs
it will be flushed after each item. This effectively lowers the commit
interval to one for the length of the chunk. Doing so allows for items
to be skipped reliably. The following example illustrates how to
configure the <classname>HibernateAwareItemWriter</classname>:</para>
<programlisting> &lt;bean id="hibernateItemWriter"
class="org.springframework.batch.item.database.HibernateAwareItemWriter"&gt;
@@ -1855,10 +1843,10 @@ itemReader.close(executionContext);</programlisting>
Object transform(Object item) throws Exception;
}</programlisting>
<para>An <classname>ItemTransformer</classname> is very simple: given one
object, transform it and return another. The returned object may or may
not be of the same type as the one provided. The point is that business
logic may be applied within <methodname>transform</methodname>, and that
logic is completely up to the developer to create. An
<classname>ItemTransformer</classname> is used as part of the
<classname>ItemTransformerItemWriter</classname>, which accepts an
<classname>ItemWriter</classname> and an
@@ -1920,18 +1908,19 @@ itemReader.close(executionContext);</programlisting>
<section>
<para>Note that the <classname>ItemTransformerItemWriter</classname>
and the <classname>CompositeItemWriter</classname> are examples of a
delegation pattern, which is common in Spring Batch. The delegates
themselves might implement callback interfaces like
<classname>ItemStream</classname> or
<classname>StepListener</classname>. If they do, and they are being
used in conjunction with Spring Batch Core as part of a
<classname>Step</classname> in a <classname>Job</classname>, then they
almost certainly need to be registered manually with the
<classname>Step</classname>. Registration is automatic when using the
factory beans (<classname>*StepFactoryBean</classname>), but only for
the <classname>ItemReader</classname> and
<classname>ItemWriter</classname> injected directly. The delegates are
not known to the <classname>Step</classname>, so they need to be
injected as listeners or streams (or both if appropriate).</para>
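<para>The registration pitfall can be illustrated with a few simplified,
hypothetical types. <classname>SimpleItemWriter</classname>,
<classname>SimpleItemStream</classname>,
<classname>TransformingItemWriter</classname>,
<classname>BufferedLineWriter</classname> and
<classname>SimpleStep</classname> are not Spring Batch classes; only the
shape of <classname>ItemTransformer</classname> is taken from the listing
above. The toy step opens the writer it is given directly, if that writer is
a stream, plus any streams registered explicitly, so a delegate that needs
opening fails unless it is registered.</para>

```java
/** Simplified, hypothetical writer contract (the real ItemWriter differs). */
interface SimpleItemWriter {
    void write(Object item);
}

/** Same shape as the ItemTransformer listing above. */
interface ItemTransformer {
    Object transform(Object item) throws Exception;
}

/** Simplified, hypothetical stand-in for a stream callback such as ItemStream. */
interface SimpleItemStream {
    void open();
}

/** Delegation: transform each item, then pass it to the delegate writer. */
final class TransformingItemWriter implements SimpleItemWriter {
    private final ItemTransformer transformer;
    private final SimpleItemWriter delegate;

    TransformingItemWriter(ItemTransformer transformer, SimpleItemWriter delegate) {
        this.transformer = transformer;
        this.delegate = delegate;
    }

    @Override
    public void write(Object item) {
        Object transformed;
        try {
            transformed = transformer.transform(item);
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not transform " + item, e);
        }
        delegate.write(transformed);
    }
}

/** A delegate that is also a stream: it must be opened before writing. */
final class BufferedLineWriter implements SimpleItemWriter, SimpleItemStream {
    private StringBuilder output;   // null until open() has been called

    @Override
    public void open() {
        output = new StringBuilder();
    }

    @Override
    public void write(Object item) {
        if (output == null) {
            throw new IllegalStateException("write() called before open()");
        }
        output.append(item).append('\n');
    }

    String contents() {
        return output == null ? "" : output.toString();
    }
}

/** Toy step: opens its writer if it is a stream, plus any registered streams. */
final class SimpleStep {
    private final SimpleItemWriter writer;
    private final SimpleItemStream[] streams;

    SimpleStep(SimpleItemWriter writer, SimpleItemStream... streams) {
        this.writer = writer;
        this.streams = streams;
    }

    void execute(Object... items) {
        // Like the automatic registration: only the directly injected writer
        // is inspected; its delegates are invisible to the step.
        if (writer instanceof SimpleItemStream) {
            ((SimpleItemStream) writer).open();
        }
        for (SimpleItemStream stream : streams) {
            stream.open();
        }
        for (Object item : items) {
            writer.write(item);
        }
    }
}
```

<para>Building a step with only the transforming writer leaves the buffered
delegate unopened, so the first write fails; passing the delegate to the
step as well mirrors registering it manually as a stream, and the
transformed items are written.</para>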
</section>
</section>


@@ -69,9 +69,9 @@
have teamed to collaborate on the development of Spring Batch.</para>
<para>Accenture has contributed previously proprietary batch processing
architecture frameworks, based upon decades' worth of experience in
building batch architectures with the last several generations of
platforms (i.e., COBOL/Mainframe, C++/Unix, and now Java/anywhere), to
the Spring Batch project along with committer resources to drive
support, enhancements, and the future roadmap.</para>
@@ -178,11 +178,11 @@
<para>Spring Batch is designed with extensibility and a diverse group of
end users in mind. The figure below shows a sketch of the layered
architecture that supports the extensibility and ease of use for
end-user developers. <mediaobject>
<imageobject role="fo">
<imagedata align="center"
fileref="src/site/resources/reference/images/spring-batch-layers.png"
format="PNG" />
</imageobject>
<imageobject role="html">
@@ -204,7 +204,8 @@
are built on top of a common infrastructure. This infrastructure
contains common readers and writers, and services such as the
<classname>RetryTemplate</classname>, which are used both by application
developers (<classname>ItemReader</classname> and
<classname>ItemWriter</classname>) and the core framework itself
(retry).</para>
</section>
</section>