BATCH-598:Tidied up the reference docs.
@@ -12,24 +12,24 @@
|
||||
are “Jobs” and “Steps” and developer supplied processing units called
|
||||
ItemReaders and ItemWriters. However, because of the Spring patterns,
|
||||
operations, templates, callbacks, and idioms, there are opportunities for
|
||||
<itemizedlist>
|
||||
the following:<itemizedlist>
|
||||
<listitem>
|
||||
<para>significant improvement in adherence to a clear separation of
|
||||
concerns,</para>
|
||||
concerns</para>
|
||||
</listitem>
|
||||
|
||||
<listitem>
|
||||
<para>clearly delineated architectural layers and services provided
|
||||
as interfaces,</para>
|
||||
as interfaces</para>
|
||||
</listitem>
|
||||
|
||||
<listitem>
|
||||
<para>simple and default implementations that allowed for quick
|
||||
adoption and ease of use out-of-the-box, and</para>
|
||||
adoption and ease of use out-of-the-box</para>
|
||||
</listitem>
|
||||
|
||||
<listitem>
|
||||
<para>significantly enhanced extensibility.</para>
|
||||
<para>significantly enhanced extensibility</para>
|
||||
</listitem>
|
||||
</itemizedlist></para>
|
||||
|
||||
@@ -44,8 +44,7 @@
|
||||
physical implementation of the layers, components and technical services
|
||||
commonly found in robust, maintainable systems used to address the
|
||||
creation of simple to complex batch applications, with the infrastructure
|
||||
and extensions to address very complex processing needs. The materials
|
||||
below will walk through the details of the diagram.</para>
|
||||
and extensions to address very complex processing needs.</para>
|
||||
</section>
|
||||
|
||||
<section>
|
||||
@@ -67,12 +66,14 @@
|
||||
<caption><para>Figure 2.1: Batch Stereotypes</para></caption>
|
||||
</mediaobject>
|
||||
|
||||
<para>The colors used on the above diagram are extremely important. Grey
|
||||
<para>The above diagram highlights the interactions and key services
|
||||
provided by the Spring Batch framework. The colors used are important to
|
||||
understanding the responsibilities of a developer in Spring Batch. Grey
|
||||
represents an external application such as an enterprise scheduler or a
|
||||
database. It's important to note that scheduling is grey, and should thus
|
||||
be considered separate from Spring Batch. Blue represents application
|
||||
architecture services. In most cases these are provided by Spring Batch
|
||||
with out of the box implementations, but an architecture time may make
|
||||
with out of the box implementations, but an architecture team may make
|
||||
specific implementations that better address their specific needs. Yellow
|
||||
represents the pieces that must be configured by a developer. For example,
|
||||
they need to configure their job schedule so that the job is kicked off at
|
||||
@@ -87,7 +88,7 @@
|
||||
which include Run, Job, Application, and Data. The primary goal for
|
||||
organizing an application according to the tiers is to embed what is known
|
||||
as "separation of concerns" within the system. These tiers can be
|
||||
conceptual but may they prove effective in mapping the deployment of the
|
||||
conceptual but may prove effective in mapping the deployment of the
|
||||
artifacts onto physical components like Java runtimes and integration with
|
||||
data sources and targets. Effective separation of concerns results in
|
||||
reducing the impact of change to the system. The four conceptual tiers
|
||||
@@ -130,9 +131,10 @@
|
||||
|
||||
<para>This section describes stereotypes relating to the concept of a
|
||||
batch job. A job is an entity that encapsulates an entire batch process.
|
||||
The file containing the job may sometimes be referred to as the "job
|
||||
configuration". However, <classname>Job</classname> is just the top of an
|
||||
overall hierarchy:</para>
|
||||
As is common with other Spring projects, a <classname>Job</classname> will
|
||||
be wired together via an XML configuration file. This file may be referred
|
||||
to as the "job configuration". However, <classname>Job</classname> is just
|
||||
the top of an overall hierarchy:</para>
|
||||
|
||||
<mediaobject>
|
||||
<imageobject role="html">
|
||||
@@ -148,8 +150,7 @@
|
||||
<section>
|
||||
<title id="s.2.1.1">Job</title>
|
||||
|
||||
<para>The job could be described as the heart of the Spring Batch
|
||||
framework. It is represented by a Spring bean that implements the
|
||||
<para>A job is represented by a Spring bean that implements the
|
||||
<classname>Job</classname> interface and contains all of the information
|
||||
necessary to define the operations performed by a job. A job
|
||||
configuration is typically contained within a Spring XML configuration
|
||||
@@ -198,9 +199,9 @@
|
||||
<para>A <classname>JobInstance</classname> refers to the concept of a
|
||||
logical job run. Let's consider a batch job that should be run once at
|
||||
the end of the day, such as the 'EndOfDay' job from the diagram above.
|
||||
There is a one 'EndOfDay' <classname>Job</classname>, but each
|
||||
individual run of the <classname>Job</classname> must be tracked
|
||||
separately. In the case of this job, there will be one logical
|
||||
There is one 'EndOfDay' <classname>Job</classname>, but each individual
|
||||
run of the <classname>Job</classname> must be tracked separately. In the
|
||||
case of this job, there will be one logical
|
||||
<classname>JobInstance</classname> per day. For example, there will be a
|
||||
January 1st run, and a January 2nd run. If the January 1st run fails the
|
||||
first time and is run again the next day, it's still the January 1st
|
||||
@@ -208,30 +209,33 @@
|
||||
meaning the January 1st run processes data for January 1st, etc.) That is
|
||||
to say, each <classname>JobInstance</classname> can have multiple
|
||||
executions (<classname>JobExecution</classname> is discussed in more
|
||||
detail below) and only one instance can be running at a given time. The
|
||||
definition of a <classname>JobInstance</classname> has absolutely no
|
||||
bearing on the data the will be loaded. It is entirely up to the
|
||||
<classname>ItemReader</classname> implementation used to determine how
|
||||
data will be loaded. For example, in the EndOfDay scenario, there may be
|
||||
a column on the data that indicates the 'effective date' or 'schedule
|
||||
date' to which the data belongs. So, the January 1st run would only load
|
||||
data from the 1st, and the January 2nd run would only use data from the
|
||||
2nd. Because this determination will likely be a business decision, it
|
||||
is left up to the <classname>ItemReader</classname> to decide. What
|
||||
using the same JobInstance will decide, however, is whether or not the
|
||||
'state' (i.e. the ExecutionContext, which is discussed below) will be
|
||||
used. Using a new instace will mean 'start from the beginning' and using
|
||||
an existing instance will generally mean 'start from where you left
|
||||
off'.</para>
|
||||
detail below) and only one <classname>JobInstance</classname>
|
||||
corresponding to a particular <classname>Job</classname> can be running
|
||||
at a given time. The definition of a <classname>JobInstance</classname>
|
||||
has absolutely no bearing on the data that will be loaded. It is entirely
|
||||
up to the <classname>ItemReader</classname> implementation used to
|
||||
determine how data will be loaded. For example, in the EndOfDay
|
||||
scenario, there may be a column on the data that indicates the
|
||||
'effective date' or 'schedule date' to which the data belongs. So, the
|
||||
January 1st run would only load data from the 1st, and the January 2nd
|
||||
run would only use data from the 2nd. Because this determination will
|
||||
likely be a business decision, it is left up to the
|
||||
<classname>ItemReader</classname> to decide. What using the same
|
||||
<classname>JobInstance</classname> will determine, however, is whether
|
||||
or not the 'state' (i.e. the ExecutionContext, which is discussed below)
|
||||
from previous executions will be used. Using a new
|
||||
<classname>JobInstance</classname> will mean 'start from the beginning'
|
||||
and using an existing instance will generally mean 'start from where you
|
||||
left off'.</para>
|
||||
</section>
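<para>The relationship described above can be sketched in plain Java. This is a minimal illustration under stated assumptions, not the Spring Batch API: <code>JobInstanceSketch</code>, its nested <code>Instance</code> class, the <code>launch</code> method, and the <code>lines.read.count</code> key are all hypothetical names. It assumes a logical run is identified by the job name plus a schedule date, and shows that launching again with the same pair reuses the instance and its saved state, while a new date starts from the beginning.</para>

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model, not the Spring Batch API: it only illustrates how one
// logical run (instance) can have several executions and keep state between them.
final class JobInstanceSketch {

    // One logical run, identified by job name plus schedule date (an assumption).
    static final class Instance {
        final String jobName;
        final String scheduleDate;
        // One entry per attempt to run this instance.
        final List<String> executions = new ArrayList<>();
        // State kept across attempts, standing in for an execution context.
        final Map<String, Long> executionContext = new HashMap<>();

        Instance(String jobName, String scheduleDate) {
            this.jobName = jobName;
            this.scheduleDate = scheduleDate;
        }
    }

    private final Map<String, Instance> instances = new HashMap<>();

    // Returns the existing instance for (jobName, scheduleDate) or creates one,
    // and records a new execution against it.
    Instance launch(String jobName, String scheduleDate) {
        Instance instance = instances.computeIfAbsent(
                jobName + "|" + scheduleDate,
                key -> new Instance(jobName, scheduleDate));
        instance.executions.add("execution-" + (instance.executions.size() + 1));
        return instance;
    }

    public static void main(String[] args) {
        JobInstanceSketch registry = new JobInstanceSketch();

        // First attempt of the January 1st run stops after 42 lines.
        Instance first = registry.launch("EndOfDay", "2008-01-01");
        first.executionContext.put("lines.read.count", 42L);

        // Same job and date: same instance, second execution, saved state visible.
        Instance retry = registry.launch("EndOfDay", "2008-01-01");
        System.out.println("same instance: " + (retry == first));
        System.out.println("executions: " + retry.executions.size());
        System.out.println("resume from: " + retry.executionContext.get("lines.read.count"));

        // New date: new instance with empty state, so it starts from the beginning.
        Instance nextDay = registry.launch("EndOfDay", "2008-01-02");
        System.out.println("fresh state: " + nextDay.executionContext.isEmpty());
    }
}
```

<para>Keying the registry on the job name plus its identifying parameter is the design choice doing the work here: it is what makes a re-run of January 1st count as another execution of the same instance rather than a new instance.</para>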
|
||||
|
||||
<section>
|
||||
<title id="s.2.1.3">JobParameters</title>
|
||||
|
||||
<para>Having discussed <classname>JobInstance</classname> and how it
|
||||
differs from <classname>Job</classname>, the natural question to ask is,
|
||||
"how is one JobInstance distinguished from another?" The answer is:
|
||||
<classname>JobParameters</classname>.
|
||||
differs from <classname>Job</classname>, the natural question to ask is:
|
||||
"how is one <classname>JobInstance</classname> distinguished from
|
||||
another?" The answer is: <classname>JobParameters</classname>.
|
||||
<classname>JobParameters</classname> are any set of parameters used to
|
||||
start a batch job, which can be used for identification or even as
|
||||
reference data during the run. In the example above, where there are two
|
||||
@@ -310,7 +314,7 @@
|
||||
<para>These properties are important because they will be persisted and
|
||||
can be used to completely determine the status of an execution. For
|
||||
example, if the EndOfDay job for 01-01 is executed at 9:00 PM, and fails
|
||||
at 9:30, the following entries will be in the batch meta data
|
||||
at 9:30, the following entries will be made in the batch meta data
|
||||
tables:</para>
|
||||
|
||||
<table>
|
||||
@@ -405,14 +409,18 @@
|
||||
completing successfully at 9:30. Because it's now the next day, the
|
||||
01-02 job must be run as well, which is kicked off just afterwards at
|
||||
9:31, and completes in its normal one hour time at 10:30. There is no
|
||||
requirement that one be kicked off after another, unless there is
|
||||
potential for the two jobs to attempt to access the same data, causing
|
||||
issues with locking at the database level. It is entirely up to the
|
||||
scheduler to determine when to run. Since they're separate JobInstances,
|
||||
Spring Batch will make no attempt to stop them from being run
|
||||
concurrently. There should now be an extra entry in both the job
|
||||
instance and job parameters table, and two extra entries in the job
|
||||
execution table:</para>
|
||||
requirement that one <classname>JobInstance</classname> be kicked off
|
||||
after another, unless there is potential for the two jobs to attempt to
|
||||
access the same data, causing issues with locking at the database level.
|
||||
It is entirely up to the scheduler to determine when to run. Since
|
||||
they're separate JobInstances, Spring Batch will make no attempt to stop
|
||||
them from being run concurrently. (Attempting to run the same
|
||||
<classname>JobInstance</classname> while another is already running will
|
||||
result in a <classname>JobExecutionAlreadyRunningException</classname>
|
||||
being thrown.) There should now be an extra entry in both the
|
||||
<classname>JobInstance</classname> and
|
||||
<classname>JobParameters</classname> tables, and two extra entries in
|
||||
the <classname>JobExecution</classname> table:</para>
|
||||
|
||||
<table>
|
||||
<title>BATCH_JOB_INSTANCE</title>
|
||||
@@ -539,17 +547,19 @@
|
||||
<section>
|
||||
<title id="s.2.1">Step Stereotypes</title>
|
||||
|
||||
<para>A <classname>Step</classname> is an entity that encapsulates a
|
||||
single, independent phase of a batch job. Therefore, every batch job is
|
||||
composed entirely of one or more batch steps. Steps should be thought of
|
||||
as unique processing streams that will be executed in sequence. For
|
||||
example, if you have one step that loads a file into a database, another
|
||||
that reads from the database, validates the data, preforms processing, and
|
||||
then writes to another table, and another that reads from that table and
|
||||
writes out to a file. Each of these steps will be performed completely
|
||||
before moving on to the next step. The file will be completely read into
|
||||
the database before step 2 can begin. As with Job, a Step has individual
|
||||
executions, that correspond with unique JobExecutions:</para>
|
||||
<para>A <classname>Step</classname> is a domain object that encapsulates
|
||||
an independent, sequential phase of a batch job. Therefore, every Job is
|
||||
composed entirely of one or more steps. A <classname>Step</classname>
|
||||
should be thought of as a unique processing stream that will be executed
|
||||
in sequence. For example, consider one step that loads a file into a
|
||||
database, another that reads from the database, validates the data,
|
||||
performs processing, and then writes to another table, and another that
|
||||
reads from that table and writes out to a file. Each of these steps will
|
||||
be performed completely before moving on to the next step. The file will
|
||||
be completely read into the database before step 2 can begin. As with
|
||||
<classname>Job</classname>, a <classname>Step</classname> has an
|
||||
individual <classname>StepExecution</classname> that corresponds with a
|
||||
unique <classname>JobExecution</classname>:</para>
|
||||
|
||||
<mediaobject>
|
||||
<imageobject role="html">
|
||||
@@ -566,65 +576,52 @@
|
||||
<title id="step">Step</title>
|
||||
|
||||
<para>A <classname>Step</classname> contains all of the information
|
||||
necessary to define a discrete set of business logic within a job. This
|
||||
is a necessarily vague description because the contents of any given
|
||||
step are at the discretion of the developer writing a job. A step can be
|
||||
as narrowly defined as a single line of code or as broadly defined as
|
||||
necessary to complete the entire work of a job. There are several
|
||||
factors that will affect the breadth of step configurations.</para>
|
||||
|
||||
<itemizedlist>
|
||||
<listitem>
|
||||
<para>Re-usability - step definitions can be shared between
|
||||
jobs</para>
|
||||
</listitem>
|
||||
|
||||
<listitem>
|
||||
<para>Transaction Management - depending on your desired transaction
|
||||
strategy, you may divide the work of your job differently between
|
||||
steps</para>
|
||||
</listitem>
|
||||
|
||||
<listitem>
|
||||
<para>Extensibility - adequately granular definition of steps allows
|
||||
the addition or subtraction of steps at a later time in the
|
||||
appropriate position within your job configuration</para>
|
||||
</listitem>
|
||||
</itemizedlist>
|
||||
necessary to define and control the actual batch processing. This is a
|
||||
necessarily vague description because the contents of any given
|
||||
<classname>Step</classname> are at the discretion of the developer
|
||||
writing a <classname>Job</classname>. A Step can be as simple or complex
|
||||
as the developer desires. A simple <classname>Step</classname> might
|
||||
load data from a file into the database, requiring little or no code
|
||||
(depending upon the implementations used). A more complex
|
||||
<classname>Step</classname> may have complicated business rules that are
|
||||
applied as part of the processing.</para>
|
||||
|
||||
<para>Steps are defined by instantiating implementations of the
|
||||
<classname>Step</classname> interface. Two step implementation classes
|
||||
are available in the Spring Batch framework, and they are each discussed
|
||||
in detail in other sections of this guide. For most situations, the
|
||||
in detail in Chapter 4 of this guide. For most situations, the
|
||||
<classname>ItemOrientedStep</classname> implementation is sufficient,
|
||||
but for situations where only one call is needed, such as a stored
|
||||
procedure call or a wrapper around existing script, a
|
||||
<classname>TaskletStep</classname> may be the better option.</para>
|
||||
<classname>TaskletStep</classname> may be a better option.</para>
|
||||
</section>
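<para>The two step styles mentioned above can be contrasted with a small plain-Java sketch. It does not use the Spring Batch <classname>Step</classname>, <classname>ItemOrientedStep</classname>, or <classname>TaskletStep</classname> types; <code>SketchStep</code>, <code>itemOriented</code>, <code>tasklet</code>, and <code>runJob</code> are hypothetical names chosen only for illustration. The assumption is that an item-oriented step loops over items one at a time, whereas a tasklet-style step makes a single call.</para>

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical sketch; none of these types belong to Spring Batch.
final class StepStyleSketch {

    // Stand-in for a step: one unit of work inside a job.
    interface SketchStep {
        void execute();
    }

    // Item-oriented style: read items one at a time, process, and write each one.
    static SketchStep itemOriented(Iterator<String> reader, List<String> writer) {
        return () -> {
            while (reader.hasNext()) {
                writer.add(reader.next().toUpperCase());
            }
        };
    }

    // Tasklet style: a single call, such as a wrapper around a stored procedure.
    static SketchStep tasklet(Runnable singleCall) {
        return singleCall::run;
    }

    // Steps run strictly in sequence; each one finishes before the next begins.
    static void runJob(List<SketchStep> steps) {
        for (SketchStep step : steps) {
            step.execute();
        }
    }

    public static void main(String[] args) {
        List<String> table = new ArrayList<>();
        List<SketchStep> steps = Arrays.asList(
                itemOriented(Arrays.asList("a", "b").iterator(), table),
                tasklet(() -> table.add("procedure-called")));
        runJob(steps);
        System.out.println(table);
    }
}
```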
|
||||
|
||||
<section>
|
||||
<title id="stepExecution">StepExecution</title>
|
||||
|
||||
<para>A <classname>StepExecution</classname> represents the technical
|
||||
concept of a single attempt to execute a <classname>Step</classname>.
|
||||
For instance, using the example from
|
||||
<classname>JobExecution</classname>, if we have a job instance
|
||||
"EndOfJob-01-01-2008" that fails to successfully complete its work the
|
||||
first time it is run, when we attempt to run it again, a new
|
||||
<classname>StepExecution</classname> will be created. Each of these step
|
||||
executions may represent a different invocation of the batch framework,
|
||||
but they will all correspond to the same
|
||||
<para>A <classname>StepExecution</classname> represents a single attempt
|
||||
to execute a <classname>Step</classname>. Using the example from
|
||||
<classname>JobExecution</classname>, if there is a
|
||||
<classname>JobInstance</classname> for the "EndOfDayJob", with
|
||||
<classname>JobParameters</classname> of "01-01-2008" that fails to
|
||||
successfully complete its work the first time it is run, when it is
|
||||
executed again, a new <classname>StepExecution</classname> will be
|
||||
created. Each of these step executions may represent a different
|
||||
invocation of the batch framework, but they will all correspond to the
|
||||
same <classname>JobInstance</classname>, just as multiple
|
||||
<classname>JobExecutions</classname> belong to the same
|
||||
<classname>JobInstance</classname>.</para>
|
||||
|
||||
<para>Step executions are represented by objects of the
|
||||
<classname>StepExecution</classname> class. Each execution contains a
|
||||
reference to its corresponding step and job execution, and transaction
|
||||
related data such as commit and rollback count and start and end times.
|
||||
Additionally, each step execution will contain an
|
||||
<classname>ExecutionContext</classname>, which contains any data a
|
||||
developer needs persisted across batch runs, such as statistics or state
|
||||
information needed to restart. The following is a listing of the
|
||||
properties for <classname>StepExecution</classname>:</para>
|
||||
reference to its corresponding step and
|
||||
<classname>JobExecution</classname>, and transaction related data such
|
||||
as commit and rollback count and start and end times. Additionally, each
|
||||
step execution will contain an <classname>ExecutionContext</classname>,
|
||||
which contains any data a developer needs persisted across batch runs,
|
||||
such as statistics or state information needed to restart. The following
|
||||
is a listing of the properties for
|
||||
<classname>StepExecution</classname>:</para>
|
||||
|
||||
<table>
|
||||
<title>StepExecution properties</title>
|
||||
@@ -635,9 +632,10 @@
|
||||
<entry>status</entry>
|
||||
|
||||
<entry>A <classname>BatchStatus</classname> object that
|
||||
indicates the status of the execution. While it's running, it's
|
||||
BatchStatus.STARTED, if it fails it's BatchStatus.FAILED, and if
|
||||
it finishes successfully it's BatchStatus.COMPLETED</entry>
|
||||
indicates the status of the execution. While it's running, the
|
||||
status is BatchStatus.STARTED, if it fails the status is
|
||||
BatchStatus.FAILED, and if it finishes successfully the status
|
||||
is BatchStatus.COMPLETED</entry>
|
||||
</row>
|
||||
|
||||
<row>
|
||||
@@ -659,29 +657,29 @@
|
||||
<entry>exitStatus</entry>
|
||||
|
||||
<entry>The <classname>ExitStatus</classname> indicating the
|
||||
result of the run. It is most important because it contains an
|
||||
exit code that will be returned to the caller. See chapter 5 for
|
||||
more details.</entry>
|
||||
result of the execution. It is most important because it
|
||||
contains an exit code that will be returned to the caller. See
|
||||
chapter 5 for more details.</entry>
|
||||
</row>
|
||||
|
||||
<row>
|
||||
<entry>executionContext</entry>
|
||||
|
||||
<entry>The 'property bag' containing any user data that needs to
|
||||
be persisted between batch runs.</entry>
|
||||
be persisted between executions.</entry>
|
||||
</row>
|
||||
|
||||
<row>
|
||||
<entry>commitCount</entry>
|
||||
|
||||
<entry>The number of times the transaction has been committed
|
||||
for this execution</entry>
|
||||
<entry>The number of transactions that have been committed for this
|
||||
execution</entry>
|
||||
</row>
|
||||
|
||||
<row>
|
||||
<entry>itemCount</entry>
|
||||
|
||||
<entry>The number of items that have been process for this
|
||||
<entry>The number of items that have been processed for this
|
||||
execution.</entry>
|
||||
</row>
|
||||
</tbody>
|
||||
@@ -707,9 +705,13 @@
|
||||
|
||||
<programlisting>executionContext.putLong(getKey(LINES_READ_COUNT), reader.getPosition());</programlisting>
|
||||
|
||||
<para>When the <classname>ItemReader</classname> is opened, it can check
|
||||
to see if it has any stored state in the context, and initialize itself
|
||||
from there:</para>
|
||||
<para>The call above will store the current number of lines read into
|
||||
the ExecutionContext. It should be made just before the framework
|
||||
commits. Being notified before a commit requires one of the various
|
||||
StepListeners, or an ItemStream, which are discussed in more detail
|
||||
later in this guide. When the <classname>ItemReader</classname> is
|
||||
opened, it can check to see if it has any stored state in the context,
|
||||
and initialize itself from there:</para>
|
||||
|
||||
<programlisting> if (executionContext.containsKey(getKey(LINES_READ_COUNT))) {
|
||||
log.debug("Initializing for restart. Restart data is: " + executionContext);
|
||||
@@ -759,9 +761,9 @@
|
||||
<section>
|
||||
<title>JobRepository</title>
|
||||
|
||||
<para>The <classname>JobRepository</classname> is the persistence
|
||||
mechanism for all of the Stereotypes mentioned above. When a job is first
|
||||
launched, a <classname>JobExecution</classname> is obtained by calling the
|
||||
<para><classname>JobRepository</classname> is the persistence mechanism
|
||||
for all of the Stereotypes mentioned above. When a job is first launched,
|
||||
a <classname>JobExecution</classname> is obtained by calling the
|
||||
repository's <methodname>createJobExecution</methodname> method, and
|
||||
during the course of execution, <classname>StepExecution</classname> and
|
||||
<classname>JobExecution</classname> are persisted by passing them to the
|
||||
|
||||
@@ -7,8 +7,8 @@
|
||||
<section>
|
||||
<title>Introduction</title>
|
||||
|
||||
<para>In Chapter 2, the overall description of the architecture was
|
||||
discussed, using the following diagram as a guide:</para>
|
||||
<para>In Chapter 2, the overall architecture design was discussed, using
|
||||
the following diagram as a guide:</para>
|
||||
|
||||
<mediaobject>
|
||||
<imageobject role="html">
|
||||
@@ -80,7 +80,7 @@
|
||||
<section>
|
||||
<title>Run Tier</title>
|
||||
|
||||
<para>As it's name suggests, this tier is entirely concerned with actually
|
||||
<para>As its name suggests, this tier is entirely concerned with actually
|
||||
running the job. Regardless of whether the originator is a Scheduler or an
|
||||
HTTP request, a Job must be obtained, parameters must be parsed, and
|
||||
eventually a <classname>JobLauncher</classname> called:</para>
|
||||
@@ -102,11 +102,11 @@
|
||||
<para>For users that want to run their jobs from an enterprise
|
||||
scheduler, the command line is the primary interface. This is because
|
||||
most schedulers (with the exception of Quartz unless using the
|
||||
NativeJob) work directly with operating system processes, primarily
|
||||
kicked off with shell scripts. There are many ways to launch a Java
|
||||
process besides a shell script, such as Perl, Ruby, or even 'build
|
||||
tools' such as ant or maven. However, because most people are familiar
|
||||
with shell scripts, this example will focus on them.</para>
|
||||
<classname>NativeJob</classname>) work directly with operating system
|
||||
processes, primarily kicked off with shell scripts. There are many ways
|
||||
to launch a Java process besides a shell script, such as Perl, Ruby, or
|
||||
even 'build tools' such as Ant or Maven. However, because most people
|
||||
are familiar with shell scripts, this example will focus on them.</para>
|
||||
|
||||
<section>
|
||||
<title>The CommandLineJobRunner</title>
|
||||
@@ -295,7 +295,7 @@
|
||||
in order to obtain an execution:</para>
|
||||
|
||||
<programlisting> <bean id="jobLauncher"
|
||||
class="org.springframework.batch.core.launch.support.SimpleJobLauncher">
|
||||
class="org.springframework.batch.execution.launch.SimpleJobLauncher">
|
||||
<property name="jobRepository" ref="jobRepository" />
|
||||
</bean></programlisting>
|
||||
|
||||
@@ -341,7 +341,7 @@
|
||||
<classname>TaskExecutor</classname>:</para>
|
||||
|
||||
<programlisting> <bean id="jobLauncher"
|
||||
class="org.springframework.batch.core.launch.support.SimpleJobLauncher">
|
||||
class="org.springframework.batch.execution.launch.SimpleJobLauncher">
|
||||
<property name="jobRepository" ref="jobRepository" />
|
||||
<property name="taskExecutor">
|
||||
<bean class="org.springframework.core.task.SimpleAsyncTaskExecutor" />
|
||||
@@ -444,7 +444,7 @@
|
||||
convenience: <classname>JobRepositoryFactoryBean</classname>.</para>
|
||||
|
||||
<programlisting> <bean id="jobRepository"
|
||||
class="org.springframework.batch.core.repository.support.JobRepositoryFactoryBean"
|
||||
class="org.springframework.batch.execution.repository.JobRepositoryFactoryBean">
|
||||
<property name="databaseType" value="hsql" />
|
||||
<property name="dataSource" ref="dataSource" />
|
||||
</bean></programlisting>
|
||||
@@ -470,13 +470,13 @@
|
||||
</bean>
|
||||
|
||||
<bean id="mapJobInstanceDao"
|
||||
class="org.springframework.batch.core.repository.dao.MapJobInstanceDao" />
|
||||
class="org.springframework.batch.execution.repository.dao.MapJobInstanceDao" />
|
||||
|
||||
<bean id="mapJobExecutionDao"
|
||||
class="org.springframework.batch.core.repository.dao.MapJobExecutionDao" />
|
||||
class="org.springframework.batch.execution.repository.dao.MapJobExecutionDao" />
|
||||
|
||||
<bean id="mapStepExecutionDao"
|
||||
class="org.springframework.batch.core.repository.dao.MapStepExecutionDao" /></programlisting>
|
||||
class="org.springframework.batch.execution.repository.dao.MapStepExecutionDao" /></programlisting>
|
||||
|
||||
<para>The Map* DAO implementations store the batch artifacts in a
|
||||
transactional map. So, the repository and DAOs may still be used
|
||||
@@ -510,9 +510,9 @@
|
||||
</tx:attributes>
|
||||
</tx:advice></programlisting></para>
|
||||
|
||||
<para>This fragment can be used as is, or with almost no changes.
|
||||
The isolation level in the <code>create*</code> method attiributes
|
||||
is specified to ensure that when jobs are launched there if two
|
||||
<para>This fragment can be used as is, with almost no changes. The
|
||||
isolation level in the <code>create*</code> method attributes is
|
||||
specified to ensure that when jobs are launched, if two
|
||||
processes are trying to launch the same job at the same time, only
|
||||
one will succeed. This is quite aggressive, and READ_COMMITTED would
|
||||
work just as well; READ_UNCOMMITTED would be fine if two processes
|
||||
@@ -530,15 +530,15 @@
|
||||
<title>Recommendations for Indexing Meta Data Tables</title>
|
||||
|
||||
<para>Spring Batch provides DDL samples for the meta-data tables in
|
||||
the Core jar file for several common database platforms. We do not
|
||||
include index declarations inthat DDL because there are too many
|
||||
variations in how people want to do that dependeing on their precise
|
||||
platform, local conventions and also the business requirements of
|
||||
how the jobs will be operated. So here we give some indication as to
|
||||
which columns are going to be used in a WHERE clause by the Dao
|
||||
ipmlementations that we provide, and how frequently they might be
|
||||
used, so that individual projects can make up their own minds about
|
||||
indexing.</para>
|
||||
the Core jar file for several common database platforms. Index
|
||||
declarations are not included in that DDL because there are too many
|
||||
variations in how users may want to index depending on their
|
||||
precise platform, local conventions and also the business
|
||||
requirements of how the jobs will be operated. The table below
|
||||
provides some indication as to which columns are going to be used in
|
||||
a WHERE clause by the DAO implementations provided by Spring Batch,
|
||||
and how frequently they might be used, so that individual projects
|
||||
can make up their own minds about indexing.</para>
|
||||
|
||||
<table>
|
||||
<title>Where clauses in SQL statements (excluding primary keys) and
|
||||
@@ -722,8 +722,6 @@
|
||||
<bean class="org.springframework.batch.core.listener.JobListenerSupport" />
|
||||
</property>
|
||||
</bean></programlisting>
|
||||
|
||||
<para></para>
|
||||
</section>
|
||||
</section>
|
||||
|
||||
@@ -731,26 +729,27 @@
|
||||
<title>JobFactory and Stateful Components in Steps</title>
|
||||
|
||||
<para>Unlike many traditional Spring applications, many of the
|
||||
components of a batch application are stateful - the file readers and
|
||||
writers are the obvious examples. The recommended way to deal with this
|
||||
is to create a fresh <classname>ApplicationContext</classname> for each
|
||||
job execution. If the job is launched from the command line with
|
||||
<classname>CommandLineJobRunner</classname> this is trivial. For more
|
||||
complex launching scenarios, where jobs are executed in parallel or
|
||||
serially from the same process, some extra steps have to be taken to
|
||||
ensure that the <classname>ApplicationContext</classname> is refreshed.
|
||||
This is preferable to using prototype scope for the stateful beans
|
||||
because then they would not receive lifecycle callbacks from the
|
||||
container at the end of use (e.g. through destroy-method in XML).</para>
|
||||
components of a batch application are stateful; the file readers and
|
||||
writers are obvious examples. The recommended way to deal with this is
|
||||
to create a fresh <classname>ApplicationContext</classname> for each job
|
||||
execution. If the <classname>Job</classname> is launched from the
|
||||
command line with <classname>CommandLineJobRunner</classname> this is
|
||||
trivial. For more complex launching scenarios, where jobs are executed
|
||||
in parallel or serially from the same process, some extra steps have to
|
||||
be taken to ensure that the <classname>ApplicationContext</classname> is
|
||||
refreshed. This is preferable to using prototype scope for the stateful
|
||||
beans because then they would not receive lifecycle callbacks from the
|
||||
container at the end of use (e.g. through destroy-method in XML).</para>
|
||||
|
||||
<para>The strategy provided by Spring Batch to deal with this scenario
|
||||
is the <classname>JobFactory</classname>, and the samples provide an
|
||||
example of a specialised implementation that can load an
|
||||
example of a specialized implementation that can load an
|
||||
<classname>ApplicationContext</classname> and close it properly when the
|
||||
job is finished. Look at the
|
||||
job is finished. A relevant example is
|
||||
<classname>ClassPathXmlApplicationContextJobFactory</classname> and its
|
||||
use in the <code>adhoc-job-launcher-context.xml</code> and the
|
||||
<code>quartz-job-launcher-context.xml</code>.</para>
|
||||
<code>quartz-job-launcher-context.xml</code>, which can be found in the
|
||||
Samples project.</para>
|
||||
</section>
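<para>The idea of building a fresh context for each job execution, and closing it when the job is finished, can be sketched as follows. This is plain Java under stated assumptions and does not use Spring's <classname>ApplicationContext</classname> or the Spring Batch <classname>JobFactory</classname>; <code>SketchContext</code>, <code>runOnce</code>, and the <code>linesRead</code> field are hypothetical. The point shown is that state held by a stateful component does not leak from one execution into the next, and that the close callback always fires.</para>

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical sketch; SketchContext only stands in for a container of stateful beans.
final class FreshContextSketch {

    static final class SketchContext implements AutoCloseable {
        int linesRead = 0;        // state owned by a stateful reader
        boolean closed = false;   // set when the end-of-use callback fires

        @Override
        public void close() {
            closed = true;
        }
    }

    // Builds a new context for this execution, uses it, and always closes it.
    static int runOnce(Supplier<SketchContext> contextFactory, int linesToRead,
                       List<SketchContext> audit) {
        try (SketchContext context = contextFactory.get()) {
            audit.add(context);
            for (int i = 0; i < linesToRead; i++) {
                context.linesRead++;
            }
            return context.linesRead;
        }
    }

    public static void main(String[] args) {
        List<SketchContext> audit = new ArrayList<>();
        // Two executions in the same process; each gets its own context.
        System.out.println(runOnce(SketchContext::new, 5, audit));
        System.out.println(runOnce(SketchContext::new, 3, audit));
        System.out.println(audit.get(0).closed && audit.get(1).closed);
    }
}
```

<para>Had both executions shared a single context, the second run would have started with the first run's line count, which is the kind of problem the fresh-context recommendation is meant to avoid.</para>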
|
||||
</section>
|
||||
|
||||
@@ -862,10 +861,7 @@
|
||||
transaction. At the beginning of processing a transaction is begun,
|
||||
and each time <markup>read</markup> is called on the
|
||||
<classname>ItemReader</classname>, a counter is incremented. When it
|
||||
reaches 10, the transaction will be committed. This also means that if
|
||||
an item is skipped it will still count as an item against the commit
|
||||
interval even though it hasn't been written out. (Skipping items will
|
||||
be covered in more detail later in this chapter)</para>
|
||||
reaches 10, the transaction will be committed.</para>
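The counting behaviour described above can be sketched roughly as follows. This is a minimal illustration, not the framework's implementation: `RecordingTransaction` and `CommitIntervalLoop` are hypothetical names, a plain `Iterator` stands in for the `ItemReader`, and committing a final, partially filled chunk is an assumption made for the sketch.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/** Hypothetical stand-in for a transaction; it only records each commit. */
class RecordingTransaction {
    final List<Integer> committedChunkSizes = new ArrayList<>();

    void commit(int itemsInChunk) {
        committedChunkSizes.add(itemsInChunk);
    }
}

/** Illustrative loop: count every read and commit when the interval is reached. */
class CommitIntervalLoop {
    static RecordingTransaction run(Iterator<String> reader, int commitInterval) {
        RecordingTransaction tx = new RecordingTransaction();
        int count = 0;
        while (reader.hasNext()) {
            reader.next();            // stands in for ItemReader.read()
            count++;                  // the counter is incremented on every read
            if (count == commitInterval) {
                tx.commit(count);     // interval reached: commit, then start counting again
                count = 0;
            }
        }
        if (count > 0) {
            tx.commit(count);         // assumption: the last partial chunk is also committed
        }
        return tx;
    }
}
```

With 25 items and a commit interval of 10, this sketch records three commits covering 10, 10 and 5 items.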
|
||||
</section>
|
||||
|
||||
<section>
|
||||
@@ -885,8 +881,9 @@
|
||||
manually before it can be run again. This is configurable on the
|
||||
step level, since different steps have different requirements. One
|
||||
Step that may only be executed once can exist as part of the same
|
||||
Job as Step that can be run infinitely. Below is an example start
|
||||
limit configuration:</para>
|
||||
<classname>Job</classname> as a <classname>Step</classname> that can
|
||||
be run infinitely. Below is an example start limit
|
||||
configuration:</para>
|
||||
|
||||
<programlisting> <bean id="simpleStep"
|
||||
class="org.springframework.batch.core.step.item.SimpleStepFactoryBean" >
|
||||
@@ -953,27 +950,28 @@
|
||||
information about football games and summarizes them. It contains
|
||||
three steps: playerLoad, gameLoad, and playerSummarization. The
|
||||
playerLoad <classname>Step</classname> loads player information from
|
||||
a flatfile, while the <classname>gameLoad</classname> Step does the
|
||||
same for games. The final step, playerSummarization, then summarizes
|
||||
a flat file, while the <classname>gameLoad</classname>
|
||||
<classname>Step</classname> does the same for games. The final
|
||||
<classname>Step</classname>, playerSummarization, then summarizes
|
||||
the statistics for each player based upon the provided games. It is
|
||||
assumed that the file loaded by 'playerLoad' must be loaded only
|
||||
once, but that 'gameLoad' will load any games found within a
|
||||
particular directory, deleting them after they have been
|
||||
successfully loaded into the database. As a result, the playerLoad
|
||||
<classname>Step</classname> contains no additionaly configuration.
|
||||
It can be started almost limitlessly, and if complete will be
|
||||
skipped. The 'gameLoad' <classname>Step</classname>, however, needs
|
||||
to be run everytime, in case extra files have been dropped since it
|
||||
last executed, so it has 'allowStartIfComplete' set to 'true' in
|
||||
order to always be started. (It is assumed that the database tables
|
||||
games are loaded into has a process indicator on it, to ensure new
|
||||
games can be properly found by the summarization step) The
|
||||
summarization <classname>step</classname>, which is the most
|
||||
important in the <classname>Job</classname>, is configured to have a
|
||||
start limit of 3. This is useful in case it continually fails, a new
|
||||
exit code will be returned to the operators that control job
|
||||
execution, and it won't be allowed to start again until manual
|
||||
intervention has taken place.</para>
|
||||
<classname>Step</classname> contains no additional configuration. It
|
||||
can be started almost limitlessly, and if complete will be skipped.
|
||||
The 'gameLoad' <classname>Step</classname>, however, needs to be run
|
||||
every time, in case extra files have been dropped since it last
|
||||
executed, so it has 'allowStartIfComplete' set to 'true' in order to
|
||||
always be started. (It is assumed that the database table games are
|
||||
loaded into has a process indicator on it, to ensure new games can
|
||||
be properly found by the summarization step.) The summarization
|
||||
<classname>Step</classname>, which is the most important in the
|
||||
<classname>Job</classname>, is configured to have a start limit of
|
||||
3. This is useful in case it continually fails: a new exit code will
|
||||
be returned to the operators that control job execution, and it
|
||||
won't be allowed to start again until manual intervention has taken
|
||||
place.</para>
|
||||
|
||||
<note>
|
||||
<para>This job is purely for example purposes and is not the same
|
||||
@@ -1076,7 +1074,10 @@
|
||||
|
||||
<para>In this example, a <classname>FlatFileItemReader</classname> is
|
||||
used, and if at any point a FlatFileParseException is thrown, it will
|
||||
be skipped and counted against the total skip limit of 10.</para>
|
||||
be skipped and counted against the total skip limit of 10. It should
|
||||
be noted that any failures encountered while reading will not count
|
||||
against the commit interval. In other words, the commit interval is
|
||||
only incremented on writes (regardless of success or failure).</para>
|
||||
</section>
|
||||
|
||||
<section>
|
||||
@@ -1111,22 +1112,23 @@
|
||||
<section>
|
||||
<title>Registering ItemStreams with the Step</title>
|
||||
|
||||
<para>The step has to take care of the
|
||||
<classname>ItemStream</classname> callbacks at the necessary points in
|
||||
the flow. This is vital if a step is going to be fail, and might need
|
||||
to be restarted, because the <classname>ItemStream</classname>
|
||||
interface is where the step gets the information it needs about
|
||||
persistent state between executions. The factory beans that Spring
|
||||
Batch provides for convenient configuration of
|
||||
<classname>Step</classname> instances have features that allow streams
|
||||
to be registered with the step when it is configured.</para>
|
||||
<para>The step has to take care of <classname>ItemStream</classname>
|
||||
callbacks at the necessary points in its lifecycle. This is vital if a
|
||||
step fails, and might need to be restarted, because the
|
||||
<classname>ItemStream</classname> interface is where the step gets the
|
||||
information it needs about persistent state between executions. The
|
||||
factory beans that Spring Batch provides for convenient configuration
|
||||
of <classname>Step</classname> instances have features that allow
|
||||
streams to be registered with the step when it is configured.</para>
|
||||
|
||||
<para>If the ItemReader of ItemWriter themselves implement the
|
||||
ItemStream interface, then these will be registered automatically. Any
|
||||
other streams need to be registered separately. This is often the case
|
||||
where there are indirect dependencies, like delegates being injected
|
||||
into the reader and writer. To register these they can be injected
|
||||
into teh factory beans through the streams property, e.g.:</para>
|
||||
<para>If the <classname>ItemReader</classname> or
|
||||
<classname>ItemWriter</classname> themselves implement the ItemStream
|
||||
interface, then these will be registered automatically. Any other
|
||||
streams need to be registered separately. This is often the case where
|
||||
there are indirect dependencies, like delegates being injected into
|
||||
the reader and writer. To register these they can be injected into the
|
||||
factory beans through the streams property, as illustrated
|
||||
below:</para>
|
||||
|
||||
<programlisting><bean id="step1" parent="simpleStep"
|
||||
class="org.springframework.batch.core.step.item.StatefulRetryStepFactoryBean">
|
||||
@@ -1160,12 +1162,13 @@
|
||||
with one of many <classname>Step</classname> scoped listeners.</para>
|
||||
|
||||
<section>
|
||||
<title>StepListener</title>
|
||||
<title>StepExecutionListener</title>
|
||||
|
||||
<para>StepListener represents the most generic listener for
|
||||
<classname>Step</classname> execution. It allows for notification
|
||||
before a Step is started, after it has completed, and if any errors
|
||||
are encountered during processing:</para>
|
||||
<para><classname>StepExecutionListener</classname> represents the
|
||||
most generic listener for <classname>Step</classname> execution. It
|
||||
allows for notification before a <classname>Step</classname> is
|
||||
started, after it has completed, and if any errors are encountered
|
||||
during processing:</para>
|
||||
|
||||
<programlisting> public interface StepExecutionListener extends StepListener {
|
||||
|
||||
@@ -1180,7 +1183,8 @@
|
||||
<methodname>onErrorInStep</methodname> and
|
||||
<methodname>afterStep</methodname> in order to allow listeners the
|
||||
chance to modify the exit code that is returned upon completion of a
|
||||
<classname>Step</classname>. A StepListener can be applied to any
|
||||
<classname>Step</classname>. A
|
||||
<classname>StepExecutionListener</classname> can be applied to any
|
||||
step factory bean via the listeners property:</para>
|
||||
|
||||
<programlisting> <bean id="simpleStep"
|
||||
@@ -1227,10 +1231,10 @@
|
||||
<section>
|
||||
<title>ItemReadListener</title>
|
||||
|
||||
<para>When discussing skip logic earlier, it was mentioned that it
|
||||
may be beneficial to log out skipped records, so that they can be
|
||||
deal with later. In the case of read errors, this can be done with
|
||||
an <classname>ItemReaderListener:</classname><programlisting> public interface ItemReadListener extends StepListener {
|
||||
<para>When discussing skip logic above, it was mentioned that it may
|
||||
be beneficial to log out skipped records, so that they can be dealt
|
||||
with later. In the case of read errors, this can be done with an
|
||||
<classname>ItemReadListener</classname>:<programlisting> public interface ItemReadListener extends StepListener {
|
||||
|
||||
void beforeRead();
|
||||
|
||||
@@ -1413,9 +1417,10 @@
|
||||
<classname>ItemTransformer</classname>.</para>
|
||||
|
||||
<para>Here we provide a few examples of common patterns in custom
|
||||
business logic, mainly using the listener interfaces - but remember that
|
||||
a reader or writer can implement the listener interfaces as well if that
|
||||
is appropriate.</para>
|
||||
business logic, mainly using the listener interfaces. It should be
|
||||
noted that an <classname>ItemReader</classname> or
|
||||
<classname>ItemWriter</classname> can implement the listener interfaces
|
||||
as well if appropriate.</para>
|
||||
</section>
|
||||
|
||||
<section>
|
||||
@@ -1424,10 +1429,11 @@
|
||||
<para>A common use case is the need for special handling of errors in a
|
||||
step, item by item, perhaps logging to a special channel, or inserting a
|
||||
record into a database. The ItemOrientedStep (created from the step
|
||||
factory beans) allows us to implement this use case with a simple
|
||||
factory beans) allows users to implement this use case with a simple
|
||||
<classname>ItemReadListener</classname>, for errors on read, and an
|
||||
<classname>ItemWriteListener</classname>, for errors on write.
|
||||
E.g.</para>
|
||||
<classname>ItemWriteListener</classname>, for errors on write. The below
|
||||
code snippets illustrate a listener that logs both read and write
|
||||
failures:</para>
|
||||
|
||||
<programlisting>public class ItemFailureLoggerListener extends ItemListenerSupport {
|
||||
|
||||
@@ -1443,8 +1449,8 @@
|
||||
|
||||
}</programlisting>
|
||||
|
||||
<para>Having implemented this listener it just needs to be registered
|
||||
with the step, e.g.</para>
|
||||
<para>Having implemented this listener it must be registered with the
|
||||
step:</para>
|
||||
|
||||
<programlisting><bean id="simpleStep"
|
||||
class="org.springframework.batch.core.step.item.SimpleStepFactoryBean" >
|
||||
@@ -1456,11 +1462,11 @@
|
||||
|
||||
<para>Remember that if your listener does anything in an
|
||||
<code>onError()</code> method, it will be inside a transaction that is
|
||||
going to rollback. If you need to use a transactional resource like a
|
||||
database inside an <code>onError()</code> method, consider adding a
|
||||
declarative transaction to that method (see Spring Core Reference Guide
|
||||
for details), and giving its propagation attribute the value
|
||||
REQUIRES_NEW.</para>
|
||||
going to be rolled back. If you need to use a transactional resource
|
||||
such as a database inside an <code>onError()</code> method, consider
|
||||
adding a declarative transaction to that method (see Spring Core
|
||||
Reference Guide for details), and giving its propagation attribute the
|
||||
value REQUIRES_NEW.</para>
|
||||
</section>
|
||||
|
||||
<section>
|
||||
@@ -1472,8 +1478,8 @@
|
||||
sense to stop a job execution from within the business logic.</para>
|
||||
|
||||
<para>The simplest thing to do is to throw a RuntimeException (one that
|
||||
isn't retried indefinitely or skipped), e.g. we could use a custom
|
||||
exception type as in the example below</para>
|
||||
isn't retried indefinitely or skipped). For example, a custom exception
|
||||
type could be used, as in the example below:</para>
|
||||
|
||||
<programlisting>public class PoisonPillItemWriter extends AbstractItemWriter {
|
||||
|
||||
@@ -1488,8 +1494,8 @@
|
||||
}</programlisting>
|
||||
|
||||
<para>Another simple way to stop a step from executing is to simply
|
||||
return <code>null</code> from the <classname>ItemReader</classname>,
|
||||
e.g.</para>
|
||||
return <code>null</code> from the
|
||||
<classname>ItemReader</classname>:</para>
|
||||
|
||||
<programlisting>public class EarlyCompletionItemReader extends AbstractItemReader {
|
||||
|
||||
@@ -1516,7 +1522,7 @@
|
||||
strategy which signals a complete batch when the item to be processed is
|
||||
null. A more sophisticated completion policy could be implemented and
|
||||
injected into the <classname>Step</classname> through the
|
||||
<classname>RepeatOperationsStepFactoryBean</classname>, e.g.</para>
|
||||
<classname>RepeatOperationsStepFactoryBean</classname>:</para>
|
||||
|
||||
<programlisting><bean id="simpleStep"
|
||||
class="org.springframework.batch.core.step.item.RepeatOperationsStepFactoryBean" >
|
||||
@@ -1533,10 +1539,10 @@
|
||||
<para>An alternative is to set a flag in the
|
||||
<classname>StepExecution</classname>, which is checked by the
|
||||
<classname>Step</classname> implementations in the framework in between
|
||||
item processing. To implement this alternative we need access to the
|
||||
item processing. To implement this alternative, we need access to the
|
||||
current StepExecution, and this can be achieved by implementing a
|
||||
StepListener and registering it with the Step. Here is an example of a
|
||||
listener that sets the flag</para>
|
||||
listener that sets the flag:</para>
|
||||
|
||||
<programlisting>public class CustomItemWriter extends ItemListenerSupport implements StepListener {
|
||||
|
||||
@@ -1567,13 +1573,13 @@
|
||||
<title>Adding a Footer Record</title>
|
||||
|
||||
<para>A very common requirement is to aggregate information during the
|
||||
output process and to append a record at the end of a file summarising
|
||||
output process and to append a record at the end of a file summarizing
|
||||
the data, or providing a checksum. This can also be achieved with
|
||||
callbacks in the step, normally as part of a custom
|
||||
<classname>ItemWriter</classname>. In this case, since we are
|
||||
accumulating state that we do not want to lose if the job aborts, we
|
||||
probably need to implement the <classname>ItemStream</classname>
|
||||
interface.</para>
|
||||
<classname>ItemWriter</classname>. In this case, since a job is
|
||||
accumulating state that should not be lost if the job aborts, the
|
||||
<classname>ItemStream</classname> interface should be
|
||||
implemented:</para>
|
||||
|
||||
<programlisting>public class CustomItemWriter extends AbstractItemWriter implements
|
||||
ItemStream, StepListener
|
||||
@@ -1616,11 +1622,12 @@
|
||||
state is stored through the <classname>ItemStream</classname> interface
|
||||
in the <classname>ExecutionContext</classname>. In this way we can be
|
||||
sure that when the <code>open()</code> callback is received on a
|
||||
restart, we always get the last value that was committed.</para>
|
||||
|
||||
<para>N.B. We might not implement <classname>ItemStream</classname> if
|
||||
the ItemWriter is re-runnable, in the sense that it maintains its own
|
||||
state in a transactional resource like a database.</para>
|
||||
restart, the framework guarantees we always get the last value that was
|
||||
committed. It should be noted that it is not always necessary to
|
||||
implement ItemStream. For example, if the ItemWriter is re-runnable, in
|
||||
the sense that it maintains its own state in a transactional resource
|
||||
like a database, there is no need to maintain state within the writer
|
||||
itself.</para>
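The restart behaviour described above can be sketched as follows. This is a simplified illustration under stated assumptions: a plain `Map` stands in for the `ExecutionContext`, and `FooterTotalWriter` and the key `footer.total` are hypothetical names, not part of the framework.

```java
import java.util.HashMap;
import java.util.Map;

/** Illustrative writer that keeps a running total for a footer record. */
class FooterTotalWriter {
    private static final String TOTAL_KEY = "footer.total"; // hypothetical key name
    private long total;

    /** Restores the last saved total, if an earlier execution stored one. */
    void open(Map<String, Object> executionContext) {
        Object saved = executionContext.get(TOTAL_KEY);
        total = (saved == null) ? 0L : (Long) saved;
    }

    /** Accumulates state while items are written. */
    void write(long amount) {
        total += amount;
    }

    /** Called before a commit so the current total is saved with the chunk. */
    void update(Map<String, Object> executionContext) {
        executionContext.put(TOTAL_KEY, total);
    }

    /** Builds the footer line from the accumulated total. */
    String footer() {
        return "TOTAL:" + total;
    }
}
```

In this sketch, anything written after the last `update` call is not saved in the context, so a new writer instance opened against the same context resumes from the last saved total rather than from the in-memory one.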
|
||||
</section>
|
||||
</section>
|
||||
</chapter>
|
||||
</chapter>
|
||||
@@ -10,17 +10,17 @@
|
||||
<para>All batch processing can be described in its most simple form as
|
||||
reading in large amounts of data, performing some type of calculation or
|
||||
transformation, and writing the result out. Spring Batch provides two key
|
||||
interfaces to help perform bulk reading and writing: ItemReader and
|
||||
ItemWriter</para>
|
||||
interfaces to help perform bulk reading and writing:
|
||||
<classname>ItemReader</classname> and
|
||||
<classname>ItemWriter</classname>.</para>
|
||||
</section>
|
||||
|
||||
<section>
|
||||
<title id="infrastructure.1">ItemReader</title>
|
||||
|
||||
<para>Although a simple concept, <emphasis
|
||||
role="bold">ItemReader</emphasis>s are the means for providing data from
|
||||
many different types of input. The most general examples include:
|
||||
<itemizedlist>
|
||||
<para>Although a simple concept, an <classname>ItemReader</classname> is
|
||||
the means for providing data from many different types of input. The most
|
||||
general examples include: <itemizedlist>
|
||||
<listitem>
|
||||
<para>Flat File- Flat File Item Readers read lines of data from a
|
||||
flat file that typically describe records with fields of data
|
||||
@@ -31,23 +31,24 @@
|
||||
<listitem>
|
||||
<para>XML - XML ItemReaders process XML independently of
|
||||
technologies used for parsing, mapping and validating objects. Input
|
||||
data allows for the validation of and XML file against and XSD
|
||||
data allows for the validation of an XML file against an XSD
|
||||
schema.</para>
|
||||
</listitem>
|
||||
|
||||
<listitem>
|
||||
<para>Database - A database resource accessed that returns
|
||||
resultsets that can be mapped to objects for processing. The default
|
||||
SQL Input Sources invoke a RowMapper to return objects, keep track
|
||||
of the current row if restart is required, basic statistics, and
|
||||
some transaction enhancements that will be explained later.</para>
|
||||
<para>Database - A database resource is accessed that returns
|
||||
resultsets which can be mapped to objects for processing. The
|
||||
default SQL Input Sources invoke a <classname>RowMapper</classname>
|
||||
to return objects, keep track of the current row if restart is
|
||||
required, basic statistics, and some transaction enhancements that
|
||||
will be explained later.</para>
|
||||
</listitem>
|
||||
</itemizedlist>There are many more possibilities, but we'll focus on the
|
||||
basic ones for this chapter. A complete list of all available ItemReaders
|
||||
can be found in Appendix A.</para>
|
||||
|
||||
<para>The Item Reader is a basic interface for generic input
|
||||
operations:</para>
|
||||
<para><classname>ItemReader</classname> is a basic interface for generic
|
||||
input operations:</para>
|
||||
|
||||
<programlisting>public interface ItemReader {
|
||||
|
||||
@@ -64,7 +65,7 @@
|
||||
Item, returning null if no more items are left. An item might represent a
|
||||
line in a file, a row in a database, or an element in an XML file. It is
|
||||
generally expected that these will be mapped to a usable domain object
|
||||
(i.e. Trade or Foo, etc) but there is no requirement in the contract to do
|
||||
(e.g. Trade, Foo, etc.) but there is no requirement in the contract to do
|
||||
so.</para>
|
||||
|
||||
<para>The <methodname>mark</methodname> and <methodname>reset</methodname>
|
||||
@@ -81,11 +82,11 @@
|
||||
|
||||
<para><classname>ItemWriter</classname> is similar in functionality to an
|
||||
<classname>ItemReader</classname> with the exception that the operations
|
||||
are reversed. They still need to be located, opened and closed but they
|
||||
differ in the case that we write out, rather than reading in. In the case
|
||||
of databases or queues these may be inserts, updates or sends. The format
|
||||
of the serialization of the output source is specific for every batch
|
||||
job.</para>
|
||||
are reversed. Resources still need to be located, opened and closed but
|
||||
they differ in the case that an <classname>ItemWriter</classname> writes
|
||||
out, rather than reading in. In the case of databases or queues these may
|
||||
be inserts, updates or sends. The format of the serialization of the
|
||||
output is specific for every batch job.</para>
|
||||
|
||||
<para>As with <classname>ItemReader</classname>,
|
||||
<classname>ItemWriter</classname> is a fairly generic interface:</para>
|
||||
@@ -101,19 +102,22 @@
|
||||
</programlisting>
|
||||
|
||||
<para>As with <methodname>read</methodname> on
|
||||
<classname>ItemReader</classname>, write provides the basic contract of
|
||||
<classname>ItemWriter</classname>, it will attempt to write out the item
|
||||
passed in as long as it is open. As with <methodname>mark</methodname> and
|
||||
<methodname>reset</methodname>, <methodname>flush</methodname> and
|
||||
<methodname>clear</methodname> are necessary due to the nature of batch
|
||||
processing. Because it is generally expected that items will be 'batched'
|
||||
together into a chunk, and then output, it is expected that an
|
||||
<classname>ItemWriter</classname> will perform some type of buffering.
|
||||
<methodname>flush</methodname> will empty the buffer by actually writing
|
||||
the items out, whereas <methodname>clear</methodname> will simply throw
|
||||
the contents of the buffer away. In most cases, a Step implementation will
|
||||
call <methodname>flush</methodname> before a commit and
|
||||
<methodname>clear</methodname> in case of rollback.</para>
|
||||
<classname>ItemReader</classname>, <methodname>write</methodname> provides
|
||||
the basic contract of <classname>ItemWriter</classname>; it will attempt
|
||||
to write out the item passed in as long as it is open. As with
|
||||
<methodname>mark</methodname> and <methodname>reset</methodname>,
|
||||
<methodname>flush</methodname> and <methodname>clear</methodname> are
|
||||
necessary due to the transactional nature of batch processing. Because it
|
||||
is generally expected that items will be 'batched' together into a chunk,
|
||||
and then output, it is expected that an <classname>ItemWriter</classname>
|
||||
will perform some type of buffering. <methodname>flush</methodname> will
|
||||
empty the buffer by actually writing the items out, whereas
|
||||
<methodname>clear</methodname> will simply throw the contents of the
|
||||
buffer away. In most cases, a Step implementation will call
|
||||
<methodname>flush</methodname> before a commit and
|
||||
<methodname>clear</methodname> in case of rollback. It is expected that
|
||||
implementations of the <classname>Step</classname> interface will call
|
||||
these methods.</para>
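The buffering contract described above can be illustrated with a small sketch. `BufferingWriter` is a hypothetical class written for this example; it does not implement the real `ItemWriter` interface, and an in-memory list stands in for the actual output resource.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative writer that buffers items until flush() or clear() is called. */
class BufferingWriter {
    private final List<String> buffer = new ArrayList<>();
    private final List<String> output = new ArrayList<>(); // stands in for a file or table

    /** Buffers the item instead of writing it out immediately. */
    void write(String item) {
        buffer.add(item);
    }

    /** Called before a commit: actually writes the buffered items out. */
    void flush() {
        output.addAll(buffer);
        buffer.clear();
    }

    /** Called on rollback: throws the buffered items away. */
    void clear() {
        buffer.clear();
    }

    /** Returns everything that has actually been written out. */
    List<String> written() {
        return output;
    }
}
```

In this sketch, items written before a `flush` reach the output, while items written before a `clear` are discarded, which mirrors the commit and rollback cases respectively.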
|
||||
</section>
|
||||
|
||||
<section>
|
||||
@@ -137,20 +141,20 @@
|
||||
|
||||
<para>Before describing each method, it's worth briefly mentioning the
|
||||
<classname>ExecutionContext</classname>. Clients of an
|
||||
<classname>ItemReader</classname> that is also an
|
||||
<classname>ItemReader</classname> that also implements
|
||||
<classname>ItemStream</classname> should call
|
||||
<methodname>open</methodname> before any calls to
|
||||
<methodname>read</methodname>, to open any resources such as files or
|
||||
obtain connections. A similar restriction applies to an
|
||||
<classname>ItemWriter</classname> that is also an
|
||||
<classname>ItemWriter</classname> that also implements
|
||||
<classname>ItemStream</classname>. As mentioned before, if expected data
|
||||
is found in the <classname>ExecutionContext</classname>, it may be used to
|
||||
start the <classname>ItemReader</classname> or
|
||||
<classname>ItemWriter</classname> at a location other than its initial
|
||||
state. Conversely, close will be called to ensure any resources allocated
|
||||
during <methodname>open</methodname> will be released safely.
|
||||
<methodname>update</methodname> is called primarily to ensure that any
|
||||
state currently being held is loaded into the provided
|
||||
state. Conversely, <methodname>close</methodname> will be called to ensure
|
||||
any resources allocated during <methodname>open</methodname> will be
|
||||
released safely. <methodname>update</methodname> is called primarily to
|
||||
ensure that any state currently being held is loaded into the provided
|
||||
<classname>ExecutionContext</classname>. This method will be called before
|
||||
committing, to ensure that the current state is persisted in the database
|
||||
before commit.</para>
|
||||
@@ -213,38 +217,25 @@
|
||||
<section>
|
||||
<title id="infrastructure.1.2.1">FlatFileItemReader</title>
|
||||
|
||||
<para>One of the most common tasks performed in batch jobs involve
|
||||
reading from some type of file. A flat file is basically any type of
|
||||
file that contains at most two-dimensional (tabular) data. Reading flat
|
||||
files in the Spring Batch framework is facilitated by the class
|
||||
<para>A flat file is any type of file that contains at most
|
||||
two-dimensional (tabular) data. Reading flat files in the Spring Batch
|
||||
framework is facilitated by the class
|
||||
<classname>FlatFileItemReader</classname>, which provides basic
|
||||
functionality for reading and parsing flat files. In addition, there are
|
||||
default implementations of the <classname>ItemReader</classname> and
|
||||
<classname>ItemStream</classname> interfaces that solve the majority of
|
||||
file processing needs.</para>
|
||||
|
||||
<para>The <classname>FlatFileItemReader</classname> class has several
|
||||
properties. The three most important of these properties are
|
||||
functionality for reading and parsing flat files.
|
||||
The <classname>FlatFileItemReader</classname> class has several properties.
|
||||
The three most important of these properties are
|
||||
<classname>Resource</classname>, <classname>FieldSetMapper</classname>
|
||||
and <classname>LineTokenizer</classname>, which define the resource from
|
||||
which data will be read and the method by which the read data will be
|
||||
converted int distinct fields. The <classname>FieldSetMapper</classname>
|
||||
and <classname>LineTokenizer</classname> interfaces will be explored
|
||||
more in the next sections. In addition, we'll explore integration with
|
||||
the file system via the resource property. The resource property
|
||||
represents a Spring Core <classname>Resource</classname>. Documentation
|
||||
explaining how to create beans of this type can be found in <ulink
|
||||
and <classname>LineTokenizer</classname>. The
|
||||
<classname>FieldSetMapper</classname> and
|
||||
<classname>LineTokenizer</classname> interfaces will be explored more in
|
||||
the next sections. The resource property represents a Spring Core
|
||||
<classname>Resource</classname>. Documentation explaining how to create
|
||||
beans of this type can be found in <ulink
|
||||
url="http://static.springframework.org/spring/docs/2.5.x/reference/resources.html"><citetitle>Spring
|
||||
Framework, Chapter 4. Resources</citetitle></ulink>. Therefore, this
|
||||
guide will not go into the details of creating
|
||||
<classname>Resource</classname> objects except to make a couple of
|
||||
points on the locating files to process within a batch environment.
|
||||
Tokenizers and field set mappers will be discussed a bit later.</para>
|
||||
|
||||
<para>As mentioned, the location of the file is defined by the resource
|
||||
property. There are only a few methods exposed through a resource
|
||||
service. A resource is used to help locate, open, and close resources.
|
||||
It can be as simple as: <programlisting>
|
||||
<classname>Resource</classname> objects. A resource is used to locate,
|
||||
open, and close resources. It can be as simple as: <programlisting>
|
||||
Resource resource = new FileSystemResource("resources/trades.csv");
|
||||
</programlisting></para>
|
||||
|
||||
@@ -259,17 +250,8 @@
|
||||
process of feeding the data into the pipe from this starting
|
||||
point.</para>
|
||||
|
||||
<para>The flat file reader uses a
|
||||
<classname>ResourceLineReader</classname> object to read from the file.
|
||||
Optionally, you can specify a
|
||||
<classname>RecordSeparatorPolicy</classname> through the
|
||||
recordSeparatorPolicy property. This can be used to configure more
|
||||
low-level features, such as what constitutes the end of a line and
|
||||
whether to continue quoted strings over newlines, among other
|
||||
things.</para>
|
||||
|
||||
<para>The other properties in the flat file readers allow you to further
|
||||
specify how your data will be interpreted: <table>
|
||||
<para>The other properties in <classname>FlatFileItemReader</classname>
|
||||
allow you to further specify how your data will be interpreted: <table>
|
||||
<title>Flat File Item Reader Properties</title>
|
||||
|
||||
<tgroup cols="3">
|
||||
@@ -324,6 +306,16 @@
|
||||
AbstractLineTokenizer, field names will be set automatically
|
||||
from this line</entry>
|
||||
</row>
|
||||
|
||||
<row>
|
||||
<entry align="left">recordSeparatorPolicy</entry>
|
||||
|
||||
<entry align="left">RecordSeparatorPolicy</entry>
|
||||
|
||||
<entry align="left">Used to determine where the line endings
|
||||
are and do things like continue over a line ending if inside a
|
||||
quoted string.</entry>
|
||||
</row>
|
||||
</tbody>
|
||||
</tgroup>
|
||||
</table></para>
|
||||
@@ -331,16 +323,14 @@
|
||||
<section>
|
||||
<title>FieldSetMapper</title>
|
||||
|
||||
<para>Field set mappers used by the
|
||||
<classname>FlatFileItemReader</classname> implement the
|
||||
<classname>FieldSetMapper</classname> interface. This interface
|
||||
defines a single method, mapLine, which takes a FieldSet object and
|
||||
maps its contents to some Object. This object may be a custom DTO or
|
||||
domain object, or it could be as simple as an array, depending on your
|
||||
needs. The <classname>FieldSetMapper</classname> is used in
|
||||
conjunction with the <classname>LineTokenizer</classname> to translate
|
||||
a line of data from a resource into an object of the desired
|
||||
type:</para>
|
||||
<para>The <classname>FieldSetMapper</classname> interface defines a
|
||||
single method, <methodname>mapLine</methodname>, which takes a
|
||||
<classname>FieldSet</classname> object and maps its contents to an
|
||||
object. This object may be a custom DTO or domain object, or it could
|
||||
be as simple as an array, depending on your needs. The
|
||||
<classname>FieldSetMapper</classname> is used in conjunction with the
|
||||
<classname>LineTokenizer</classname> to translate a line of data from
|
||||
a resource into an object of the desired type:</para>
|
||||
|
||||
<programlisting> public interface FieldSetMapper {
|
||||
|
||||
@@ -348,7 +338,7 @@
|
||||
|
||||
}</programlisting>
|
||||
|
||||
<para>As you can see, the pattern used is exatly the same as
|
||||
<para>As you can see, the pattern used is exactly the same as
|
||||
<classname>RowMapper</classname> used by
|
||||
<classname>JdbcTemplate</classname>.</para>
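The division of labour between tokenizing a line and mapping its fields can be sketched as follows. This is a simplified illustration, not the framework's API: a plain `List` of strings stands in for `FieldSet`, and `Player`, `CommaTokenizer` and `PlayerMapper` are hypothetical names invented for the example.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical domain object produced by the mapper. */
class Player {
    final String id;
    final String lastName;
    final int birthYear;

    Player(String id, String lastName, int birthYear) {
        this.id = id;
        this.lastName = lastName;
        this.birthYear = birthYear;
    }
}

/** Stands in for a line tokenizer: splits one line into its fields. */
class CommaTokenizer {
    static List<String> tokenize(String line) {
        return Arrays.asList(line.split(",", -1)); // -1 keeps trailing empty fields
    }
}

/** Stands in for a field set mapper: turns the fields into a domain object. */
class PlayerMapper {
    static Player mapLine(List<String> fields) {
        return new Player(
                fields.get(0).trim(),
                fields.get(1).trim(),
                Integer.parseInt(fields.get(2).trim()));
    }
}
```

Keeping the two steps separate means the same mapper could be reused with a differently delimited or fixed-length input, as long as the fields arrive in the same order.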
|
||||
</section>
|
||||
@@ -367,14 +357,13 @@
|
||||
|
||||
FieldSet tokenize(String line);
|
||||
|
||||
}
|
||||
</programlisting>
|
||||
}</programlisting>
|
||||
|
||||
<para>The contract of a <classname>LineTokenizer</classname> is such
|
||||
that, given a line of input (in theory the
|
||||
<classname>String</classname> could encompass more than one line) a
|
||||
<classname>FieldSet</classname> representing the line will be
|
||||
returned. This will then be based to a
|
||||
returned. This will then be passed to a
|
||||
<classname>FieldSetMapper</classname>. Spring Batch contains the
|
||||
following LineTokenizers:</para>
|
||||
|
||||
@@ -405,8 +394,8 @@
|
||||
|
||||
<para>Now that the basic interfaces for reading in flat files have
|
||||
been defined, a simple example explaining how they work together is
|
||||
helpful. In it's most simple form, the flow when reading a line form a
|
||||
file is this:</para>
|
||||
helpful. In its simplest form, the flow when reading a line from a
|
||||
file is the following:</para>
|
||||
|
||||
<orderedlist>
|
||||
<listitem>
|
||||
@@ -479,7 +468,7 @@
|
||||
}
|
||||
} </programlisting></para>
|
||||
|
||||
<para>We can then read in from the filed by correctly constructing our
|
||||
<para>We can then read in from the file by correctly constructing our
|
||||
FlatFileItemReader and calling read():</para>
|
||||
|
||||
<programlisting> FlatFileItemReader itemReader = new FlatFileItemReader();
|
||||
@@ -498,11 +487,11 @@
|
||||
<section>
|
||||
<title>Mapping fields by name</title>
|
||||
|
||||
<para>There is one additional functionality that is similar in
|
||||
function to a JDBC <classname>ResultSet</classname>. The names of the
|
||||
fields can be injected into the <classname>LineTokenizer</classname>
|
||||
to increase the readability of the mapping function. We can expose
|
||||
this behavior by adding the following. First, we tell the
|
||||
<para>There is one additional piece of line tokenizer functionality that is
|
||||
similar in function to a JDBC <classname>ResultSet</classname>. The
|
||||
names of the fields can be injected into the
|
||||
<classname>LineTokenizer</classname> to increase the readability of
|
||||
the mapping function. First, we tell the
|
||||
<classname>LineTokenizer</classname> what the names of the fields in
|
||||
the fieldset are:</para>
|
||||
|
||||
@@ -562,8 +551,8 @@
|
||||
is required) in the same way the Spring container will look for
|
||||
setters matching a property name. Each available field in the
|
||||
<classname>FieldSet</classname> will be mapped, and the resultant
|
||||
<classname>Player</classname> object will be returned, only there was
|
||||
no code required.</para>
|
||||
<classname>Player</classname> object will be returned, with no code
|
||||
required.</para>
|
||||
</section>
|
||||
|
||||
<section>
|
||||
@@ -678,23 +667,22 @@
|
||||
<para>Writing out to flat files has the same problems and issues that
|
||||
reading in from a file must overcome. It must be able to write out in
|
||||
either delimited or fixed length formats in a transactional
|
||||
manger.</para>
|
||||
manner.</para>
|
||||
|
||||
<section>
|
||||
<title>LineAggregator</title>
|
||||
|
||||
<para>Just like file reading's <classname>LineTokenizer</classname>
|
||||
interface is necessary to take a string and split it into tokens, file
|
||||
writing must have a way to aggregate multiple fields into a single
|
||||
string for writing to a file. In Spring Batch this is the
|
||||
<para>Just as the <classname>LineTokenizer</classname> interface is
|
||||
necessary to take a string and split it into tokens, file writing must
|
||||
have a way to aggregate multiple fields into a single string for
|
||||
writing to a file. In Spring Batch this is the
|
||||
<classname>LineAggregator</classname>:</para>
|
||||
|
||||
<programlisting> public interface LineAggregator {
|
||||
|
||||
public String aggregate(FieldSet fieldSet);
|
||||
|
||||
}
|
||||
</programlisting>
|
||||
}</programlisting>
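<para>As a rough illustration of the aggregation contract shown above, the
following sketch joins tokens into a delimited line and into a fixed length
line. It is an assumption-laden example: LineAggregatorSketch and its two
methods are hypothetical, and they operate on a plain String array rather
than on a FieldSet.</para>

```java
// Hypothetical helper for illustration only; NOT part of Spring Batch.
final class LineAggregatorSketch {

    /** Delimited format: tokens are joined with the given delimiter. */
    static String aggregateDelimited(String[] tokens, String delimiter) {
        return String.join(delimiter, tokens);
    }

    /** Fixed length format: each token is right-padded with spaces to its column width. */
    static String aggregateFixedLength(String[] tokens, int[] widths) {
        if (tokens.length != widths.length) {
            throw new IllegalArgumentException("Expected one width per token");
        }
        StringBuilder line = new StringBuilder();
        for (int i = 0; i < tokens.length; i++) {
            String token = tokens[i];
            if (token.length() > widths[i]) {
                throw new IllegalArgumentException(
                        "Token '" + token + "' is longer than its column width " + widths[i]);
            }
            line.append(token);
            // Pad the remainder of the column with spaces.
            for (int pad = token.length(); pad < widths[i]; pad++) {
                line.append(' ');
            }
        }
        return line.toString();
    }
}
```

<para>The sketch rejects a token that is wider than its column instead of
truncating it silently. That is a design choice made for this example
only.</para>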
|
||||
|
||||
<para>The <classname>LineAggregator</classname> is exactly the
|
||||
opposite of a <classname>LineTokenizer</classname>.
|
||||
@@ -761,22 +749,22 @@
|
||||
<classname>FlatFileItemWriter</classname> expresses this in
|
||||
code:</para>
|
||||
|
||||
<programlisting>public void write(Object data) throws Exception {
|
||||
FieldSet fieldSet = fieldSetCreator.mapItem(data);
|
||||
getOutputState().write(lineAggregator.aggregate(fieldSet) + LINE_SEPARATOR);
|
||||
}</programlisting>
|
||||
<programlisting> public void write(Object data) throws Exception {
|
||||
FieldSet fieldSet = fieldSetCreator.mapItem(data);
|
||||
getOutputState().write(lineAggregator.aggregate(fieldSet) + LINE_SEPARATOR);
|
||||
}</programlisting>
|
||||
|
||||
<para>A simple configuration with the smallest number of setters
|
||||
would look like the following:</para>
|
||||
|
||||
<programlisting><bean id="itemWriter"
|
||||
class="org.springframework.batch.io.file.FlatFileItemWriter">
|
||||
<property name="resource"
|
||||
value="file:target/test-outputs/20070122.testStream.multilineStep.txt" />
|
||||
<property name="fieldSetCreator">
|
||||
<bean class="org.springframework.batch.io.file.mapping.PassThroughFieldSetMapper"/>
|
||||
</property>
|
||||
</bean></programlisting>
|
||||
<programlisting> <bean id="itemWriter"
|
||||
class="org.springframework.batch.io.file.FlatFileItemWriter">
|
||||
<property name="resource"
|
||||
value="file:target/test-outputs/20070122.testStream.multilineStep.txt" />
|
||||
<property name="fieldSetCreator">
|
||||
<bean class="org.springframework.batch.io.file.mapping.PassThroughFieldSetMapper"/>
|
||||
</property>
|
||||
</bean></programlisting>
|
||||
</section>
|
||||
|
||||
<section>
|
||||
@@ -788,17 +776,17 @@
|
||||
File writing isn't quite so simple. At first glance it seems like a
|
||||
similarly straightforward contract should exist for
|
||||
<classname>FlatFileItemWriter</classname>, if the file already exists,
|
||||
throw an exception, if it does not, create it and start writing. Job
|
||||
restart throws a bit of a kink into this. In the normal restart
|
||||
scenario, the contract is reversed, if the file exists start writing
|
||||
to it from the last known good position, if it does not, throw an
|
||||
exception. However, what happens if the file name for this job is
|
||||
always the same? In this case, you would want to delete the file if it
|
||||
exists, unless it's a restart. Because of this possibility, the
|
||||
<classname>FlatFileItemWriter</classname> contains the property,
|
||||
<methodname>shouldDeleteIfExists</methodname>. Setting this property
|
||||
to true will cause an existing file with the same name to be deleted
|
||||
when the writer is opened.</para>
|
||||
throw an exception, if it does not, create it and start writing.
|
||||
However, potentially restarting a <classname>Job</classname> can cause
|
||||
issues. In the normal restart scenario, the contract is reversed, if
|
||||
the file exists start writing to it from the last known good position,
|
||||
if it does not, throw an exception. However, what happens if the file
|
||||
name for this job is always the same? In this case, you would want to
|
||||
delete the file if it exists, unless it's a restart. Because of this
|
||||
possibility, the <classname>FlatFileItemWriter</classname> contains
|
||||
the property, <methodname>shouldDeleteIfExists</methodname>. Setting
|
||||
this property to true will cause an existing file with the same name
|
||||
to be deleted when the writer is opened.</para>
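<para>The decision described above can be summarized as a small sketch. It
assumes a hypothetical OutputFilePolicySketch helper written for this
example. It is not the FlatFileItemWriter implementation, and the restart
flag is simply passed in rather than derived from any framework state.</para>

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical helper for illustration only; NOT the FlatFileItemWriter implementation.
final class OutputFilePolicySketch {

    /**
     * Prepares the output file before writing starts.
     *
     * @return true if writing should resume in an existing file (restart),
     *         false if writing starts in a newly created, empty file
     */
    static boolean prepare(File file, boolean restart, boolean shouldDeleteIfExists) {
        if (restart) {
            // Restart: the file must already exist so writing can continue
            // from the last known good position.
            if (!file.exists()) {
                throw new IllegalStateException("Cannot restart, output file is missing: " + file);
            }
            return true;
        }
        if (file.exists()) {
            // Fresh run: an existing file is an error unless deletion was requested.
            if (!shouldDeleteIfExists) {
                throw new IllegalStateException("Output file already exists: " + file);
            }
            if (!file.delete()) {
                throw new IllegalStateException("Could not delete existing output file: " + file);
            }
        }
        try {
            if (!file.createNewFile()) {
                throw new IllegalStateException("Could not create output file: " + file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return false;
    }
}
```

<para>Note that the restart check comes first, so shouldDeleteIfExists never
deletes a file that a restarted job still needs.</para>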
|
||||
</section>
|
||||
</section>
|
||||
</section>
|
||||
@@ -819,15 +807,15 @@
|
||||
only to provide callbacks).</para>
|
||||
</note>
|
||||
|
||||
<para>Lets take a closer look how XML input and output works in batch.
|
||||
First, there are a few concepts that vary from file reading and writing
|
||||
but are common across Spring Batch XML processing. With XML processing
|
||||
instead of lines of records (FieldSets) that need to be tokenized, it is
|
||||
assumed an XML resource is a collection of 'fragments' corresponding to
|
||||
individual records. Note that OXM tools are designed to work with
|
||||
standalone XML documents rather than XML fragments cut out of an XML
|
||||
document, therefore the Spring Batch infrastructure needs to work around
|
||||
this fact (as described below).</para>
|
||||
<para>Let's take a closer look at how XML input and output work in Spring
|
||||
Batch. First, there are a few concepts that vary from file reading and
|
||||
writing but are common across Spring Batch XML processing. With XML
|
||||
processing instead of lines of records (FieldSets) that need to be
|
||||
tokenized, it is assumed an XML resource is a collection of 'fragments'
|
||||
corresponding to individual records. Note that OXM tools are designed to
|
||||
work with standalone XML documents rather than XML fragments cut out of an
|
||||
XML document, therefore the Spring Batch infrastructure needs to work
|
||||
around this fact, as described below:</para>
|
||||
|
||||
<para><mediaobject>
|
||||
<imageobject role="fo">
|
||||
@@ -843,9 +831,11 @@
|
||||
<caption><para>Figure 3.1: XML Input</para></caption>
|
||||
</mediaobject></para>
|
||||
|
||||
<para>Spring Batch uses Object/XML Mapping (OXM) to bind fragments to
|
||||
objects. However, Spring Batch is not tied to any particular xml binding
|
||||
technology. Typical use is to delegate to <ulink
|
||||
<para>The 'trade' tag is defined as the 'root element' in the scenario
|
||||
above. Everything between '<trade>' and '</trade>' is
|
||||
considered one 'fragment'. Spring Batch uses Object/XML Mapping (OXM) to
|
||||
bind fragments to objects. However, Spring Batch is not tied to any
|
||||
particular XML binding technology. Typical use is to delegate to <ulink
|
||||
url="http://static.springframework.org/spring-ws/site/reference/html/oxm.html"><citetitle>Spring
|
||||
OXM</citetitle></ulink>, which provides uniform abstraction for the most
|
||||
popular OXM technologies. The dependency on Spring OXM is optional and you
|
||||
@@ -868,8 +858,8 @@
|
||||
<caption><para>Figure 3.2: OXM Binding</para></caption>
|
||||
</mediaobject></para>
|
||||
|
||||
<para>Now with and introduction into OXM and how one can use XML fragments
|
||||
to represent records, let's take a closer look at Item Readers and Item
|
||||
<para>Now with an introduction to OXM and how one can use XML fragments to
|
||||
represent records, let's take a closer look at Item Readers and Item
|
||||
Writers.</para>
|
||||
|
||||
<section>
|
||||
@@ -901,27 +891,25 @@
|
||||
<price>99.99</price>
|
||||
<customer>Customer3</customer>
|
||||
</trade>
|
||||
</records>
|
||||
</programlisting></para>
|
||||
</records></programlisting></para>
|
||||
|
||||
<para>To be able to process the XML records we need the following:
|
||||
<itemizedlist>
|
||||
<listitem>
|
||||
<para>Root Element Name - this is name of the root element of the
|
||||
fragment that constitutes the object to be mapped. The example
|
||||
<para>Root Element Name - Name of the root element of the fragment
|
||||
that constitutes the object to be mapped. The example
|
||||
configuration demonstrates this with the value of trade.</para>
|
||||
</listitem>
|
||||
|
||||
<listitem>
|
||||
<para>Resource - This is a Spring Resource that in the case of
|
||||
this example will abstract the details of opening a file for
|
||||
reading content.</para>
|
||||
<para>Resource - Spring Resource that represents the file to be
|
||||
read.</para>
|
||||
</listitem>
|
||||
|
||||
<listitem>
|
||||
<para><classname>FragmentDeserializer</classname> - this is the
|
||||
UnMarshalling facility provided by Spring OXM for mapping the XML
|
||||
fragment to an object.</para>
|
||||
<para><classname>FragmentDeserializer</classname> - UnMarshalling
|
||||
facility provided by Spring OXM for mapping the XML fragment to an
|
||||
object.</para>
|
||||
</listitem>
|
||||
</itemizedlist></para>
|
||||
|
||||
@@ -1010,12 +998,13 @@
|
||||
|
||||
<para>Output works symmetrically to input. The
|
||||
<classname>StaxEventItemWriter</classname> needs a
|
||||
<classname>Resource</classname>, a serializer, and a rootTagName. A java
|
||||
<classname>Resource</classname>, a serializer, and a rootTagName. A Java
|
||||
object is passed to a serializer (typically a wrapper around Spring OXM
|
||||
<classname>Marshaller</classname>) which writes to output using a custom
|
||||
event writer that filters the StartDocument and EndDocument events
|
||||
produced for each fragment by the OXM tools. We'll show this in an
|
||||
example using the
|
||||
<classname>Marshaller</classname>) which writes to a
|
||||
<classname>Resource</classname> using a custom event writer that filters
|
||||
the <classname>StartDocument</classname> and
|
||||
<classname>EndDocument</classname> events produced for each fragment by
|
||||
the OXM tools. We'll show this in an example using the
|
||||
<classname>MarshallingEventWriterSerializer</classname>. The Spring
|
||||
configuration for this setup looks as follows:</para>
|
||||
|
||||
@@ -1042,10 +1031,10 @@
|
||||
</bean></parameter></programlisting>
|
||||
|
||||
<para>To summarize with a Java example, the following code illustrates
|
||||
all of the points discussed. The code demonstrates the programmatic
|
||||
setup of the required properties.</para>
|
||||
all of the points discussed, demonstrating the programmatic setup of the
|
||||
required properties.</para>
|
||||
|
||||
<programlisting>StaxEventItemWriter staxItemWriter = new StaxEventItemWriter()
|
||||
<programlisting> StaxEventItemWriter staxItemWriter = new StaxEventItemWriter();
|
||||
FileSystemResource resource = new FileSystemResource(File.createTempFile("StaxEventWriterOutputSourceTests", "xml"));
|
||||
|
||||
Map aliases = new HashMap();
|
||||
@@ -1116,13 +1105,13 @@
|
||||
<classname>ResourceEditor</classname> in Spring already filters and does
|
||||
placeholder replacement on system properties.)</para>
|
||||
|
||||
<para>Often in a batch setting it is preferable to parameterise the file
|
||||
<para>Often in a batch setting it is preferable to parameterize the file
|
||||
name in the <classname>JobParameters</classname> of the job, instead of
|
||||
through system properties, and access them that way. To allow for this,
|
||||
Spring Batch provides the
|
||||
<classname>StepExecutionResourceProxy</classname>. The proxy can use
|
||||
either job name, step name, or any values from the
|
||||
<classname>JobParameters</classname>, by surround them with %:</para>
|
||||
<classname>JobParameters</classname>, by surrounding them with %:</para>
|
||||
|
||||
<programlisting> <bean id="inputFile"
|
||||
class="org.springframework.batch.core.resource.StepExecutionResourceProxy" />
|
||||
@@ -1131,8 +1120,8 @@
|
||||
|
||||
<para>Assuming a job name of 'fooJob', and a step name of 'fooStep', and
|
||||
the key-value pair of 'file.name="fileName.txt"' is in the
|
||||
<classname>JobParameters</classname> the job is start with, the following
|
||||
filename will be passed as the <classname>Resource</classname>:
|
||||
<classname>JobParameters</classname> the job is started with, the
|
||||
following filename will be passed as the <classname>Resource</classname>:
|
||||
"<filename>//fooJob/fooStep/fileName.txt</filename>". It should be noted
|
||||
that in order for the proxy to have access to the
|
||||
<classname>StepExecution</classname>, it must be registered as a
|
||||
@@ -1417,7 +1406,7 @@ itemReader.close(executionContext);</programlisting>
|
||||
<classname>CustomerCredit</classname> objects in the exact same manner
|
||||
as described by the <classname>JdbcCursorItemReader</classname>,
|
||||
assuming hibernate mapping files have been created correctly for the
|
||||
Customer table. The 'useStatelessSession' property default to true,
|
||||
Customer table. The 'useStatelessSession' property defaults to true,
|
||||
but has been added here to draw attention to the ability to switch it
|
||||
on or off.</para>
|
||||
</section>
|
||||
@@ -1427,7 +1416,7 @@ itemReader.close(executionContext);</programlisting>
|
||||
<title>Driving Query Based ItemReaders</title>
|
||||
|
||||
<para>In the previous section, Cursor based database input was
|
||||
discussed. However, this isn't the only option. Many database vendors,
|
||||
discussed. However, it isn't the only option. Many database vendors,
|
||||
such as DB2, have extremely pessimistic locking strategies that can
|
||||
cause issues if the table being read also needs to be used by other
|
||||
portions of the online application. Furthermore, opening cursors over
|
||||
@@ -1451,9 +1440,9 @@ itemReader.close(executionContext);</programlisting>
|
||||
<para>As you can see, this example uses the same 'FOO' table as was used
|
||||
in the cursor based example. However, rather than selecting the entire
|
||||
row, only the IDs were selected in the SQL statement. So, rather than a
|
||||
FOO object being returned from read(), an Integer will be returned. This
|
||||
number can then be used to query for the 'details', which is a complete
|
||||
Foo object:</para>
|
||||
FOO object being returned from <methodname>read</methodname>, an Integer
|
||||
will be returned. This number can then be used to query for the
|
||||
'details', which is a complete Foo object:</para>
|
||||
|
||||
<mediaobject>
|
||||
<imageobject role="html">
|
||||
@@ -1476,8 +1465,8 @@ itemReader.close(executionContext);</programlisting>
|
||||
<title>KeyCollector</title>
|
||||
|
||||
<para>As the previous example illustrates, the DrivingQueryItemReader
|
||||
is fairly simple. It simply iteratoes over a list of keys. However,
|
||||
the real complication is how those keys are obtained. The
|
||||
is fairly simple. It simply iterates over a list of keys. However, the
|
||||
real complication is how those keys are obtained. The
|
||||
<classname>KeyCollector</classname> interface abstracts this:</para>
|
||||
|
||||
<programlisting> public interface KeyCollector {
|
||||
@@ -1494,9 +1483,9 @@ itemReader.close(executionContext);</programlisting>
|
||||
keys 1 through 1,000, and fails after processing key 500, upon
|
||||
restarting keys 500 through 1,000 should be returned. This
|
||||
functionality is made possible by the
|
||||
<methodname>saveState</methodname> method, which saves the provided
|
||||
key (which should be the current key being processed) in the provided
|
||||
<classname>ExecutionContext</classname>. The
|
||||
<methodname>updateContext</methodname> method, which saves the
|
||||
provided key (which should be the current key being processed) in the
|
||||
provided <classname>ExecutionContext</classname>. The
|
||||
<methodname>retrieveKeys</methodname> method can then use this value
|
||||
to retrieve a subset of the original keys:</para>
|
||||
|
||||
@@ -1520,10 +1509,10 @@ itemReader.close(executionContext);</programlisting>
|
||||
<section>
|
||||
<title>SingleColumnJdbcKeyCollector</title>
|
||||
|
||||
<para>The most common driving query scenario is that of a input that
|
||||
has only one column that represents it's key. This is implemented as
|
||||
the <classname>SingleColumnJdbcKeyCollector</classname> class, which
|
||||
has the following options:</para>
|
||||
<para>The most common driving query scenario is that of input that has
|
||||
only one column that represents its key. This is implemented as the
|
||||
<classname>SingleColumnJdbcKeyCollector</classname> class, which has
|
||||
the following options:</para>
|
||||
|
||||
<table>
|
||||
<title>SingleColumnJdbcKeyCollector properties</title>
|
||||
@@ -1717,25 +1706,24 @@ itemReader.close(executionContext);</programlisting>
|
||||
example, let's assume that 20 items will be written per chunk, and the
|
||||
15th item throws a DataIntegrityViolationException. As far as the Step
|
||||
is concerned, all 20 items will be written out successfully, since
|
||||
there's no way to know that and error will occur until they are actually
|
||||
there's no way to know that an error will occur until they are actually
|
||||
written out. Once
|
||||
<classname>ItemWriter#</classname><methodname>flush</methodname>() is
|
||||
called, the buffer will be emptied and the exception will be hit. At
|
||||
this point, there's nothing the Step can do, the transaction must be
|
||||
rolled back. Normally, this exception will cause the Item to be skipped
|
||||
(depending upon the skip/retry policies), and then it won't be written
|
||||
out again. However, in this scenario, there's no way for it to know
|
||||
which item caused the issue, the whole buffer was being written out when
|
||||
the failure happened. Because this is a common enough use case,
|
||||
especially when using Hibernate, Spring Batch provides an implementation
|
||||
to help: <classname>HibernateAwareItemWriter</classname>. The
|
||||
<classname>HibernateAwareItemWriter</classname> solves the problem in a
|
||||
straightforward way: if a chunk fails the first time, on subsequent runs
|
||||
it will be flushed and the transaction committed after each time. This
|
||||
effectively lowers the commit interval to one for the length of the
|
||||
chunk. Doing so allows for items to be skipped reliably. The following
|
||||
example illustrates how to configure the
|
||||
<classname>HibernateAwareItemWriter</classname>:</para>
|
||||
this point, there's nothing the <classname>Step</classname> can do, the
|
||||
transaction must be rolled back. Normally, this exception will cause the
|
||||
Item to be skipped (depending upon the skip/retry policies), and then it
|
||||
won't be written out again. However, in this scenario, there's no way
|
||||
for it to know which item caused the issue, the whole buffer was being
|
||||
written out when the failure happened. Because this is a common enough
|
||||
use case, especially when using Hibernate, Spring Batch provides an
|
||||
implementation to help: <classname>HibernateAwareItemWriter</classname>.
|
||||
The <classname>HibernateAwareItemWriter</classname> solves the problem
|
||||
in a straightforward way: if a chunk fails the first time, on subsequent
|
||||
runs it will be flushed after each item. This effectively lowers
|
||||
the commit interval to one for the length of the chunk. Doing so allows
|
||||
for items to be skipped reliably. The following example illustrates how
|
||||
to configure the <classname>HibernateAwareItemWriter</classname>:</para>
|
||||
|
||||
<programlisting> <bean id="hibernateItemWriter"
|
||||
class="org.springframework.batch.item.database.HibernateAwareItemWriter">
|
||||
@@ -1855,10 +1843,10 @@ itemReader.close(executionContext);</programlisting>
|
||||
Object transform(Object item) throws Exception;
|
||||
}</programlisting>
|
||||
|
||||
<para>An ItemTransformer interface is very simple, given one object,
|
||||
transorm it and return another. The object provided may or may not be of
|
||||
the same type. The point is that business logic may be applied within
|
||||
transform, and is completely up to the developer to create. An
|
||||
<para>An ItemTransformer is very simple: given one object, transform it and
|
||||
return another. The object provided may or may not be of the same type.
|
||||
The point is that business logic may be applied within transform, and is
|
||||
completely up to the developer to create. An
|
||||
<classname>ItemTransformer</classname> is used as part of the
|
||||
<classname>ItemTransformerItemWriter</classname>, which accepts an
|
||||
<classname>ItemWriter</classname> and an
|
||||
@@ -1920,18 +1908,19 @@ itemReader.close(executionContext);</programlisting>
|
||||
<section>
|
||||
<para>Note that the <classname>ItemTransformerItemWriter</classname>
|
||||
and the <classname>CompositeItemWriter</classname> are examples of a
|
||||
delegation pattern, which is quite common usage in Spring Batch. The
|
||||
delegates themselves might implement callback interfaces like
|
||||
delegation pattern, which is common in Spring Batch. The delegates
|
||||
themselves might implement callback interfaces like
|
||||
<classname>ItemStream</classname> or
|
||||
<classname>StepListener</classname>. If they do, and they are being
|
||||
used in conjunction with Spring Batch Core as part of a step in a job,
|
||||
then they almost certainly need to be registered manually with the
|
||||
used in conjunction with Spring Batch Core as part of a
|
||||
<classname>Step</classname> in a <classname>Job</classname>, then they
|
||||
almost certainly need to be registered manually with the
|
||||
<classname>Step</classname>. Registration is automatic when using the
|
||||
factory beans (<classname>*StepFactoryBean</classname>), but only for
|
||||
the <classname>ItemReader</classname> and
|
||||
<classname>ItemWriter</classname> injected directly - the delegates
|
||||
are not known to the step, so they need to be injected as listeners or
|
||||
streams (or both if appropriate).</para>
|
||||
<classname>ItemWriter</classname> injected directly. The delegates are
|
||||
not known to the <classname>Step</classname>, so they need to be
|
||||
injected as listeners or streams (or both if appropriate).</para>
|
||||
</section>
|
||||
</section>
|
||||
|
||||
|
||||
@@ -69,9 +69,9 @@
|
||||
have teamed to collaborate on the development of Spring Batch.</para>
|
||||
|
||||
<para>Accenture has contributed previously proprietary batch processing
|
||||
architecture frameworks -- based upon decades worth of experience in
|
||||
architecture frameworks, based upon decades' worth of experience in
|
||||
building batch architectures with the last several generations of
|
||||
platforms (i.e., COBOL/Mainframe, C++/Unix, and now Java/anywhere) -- to
|
||||
platforms (i.e., COBOL/Mainframe, C++/Unix, and now Java/anywhere), to
|
||||
the Spring Batch project along with committer resources to drive
|
||||
support, enhancements, and the future roadmap.</para>
|
||||
|
||||
@@ -178,11 +178,11 @@
|
||||
<para>Spring Batch is designed with extensibility and a diverse group of
|
||||
end users in mind. The figure below shows a sketch of the layered
|
||||
architecture that supports the extensibility and ease of use for
|
||||
end-user developers.
|
||||
<mediaobject>
|
||||
end-user developers. <mediaobject>
|
||||
<imageobject role="fo">
|
||||
<imagedata align="center" fileref="src/site/resources/reference/images/spring-batch-layers.png"
|
||||
format="PNG"/>
|
||||
<imagedata align="center"
|
||||
fileref="src/site/resources/reference/images/spring-batch-layers.png"
|
||||
format="PNG" />
|
||||
</imageobject>
|
||||
|
||||
<imageobject role="html">
|
||||
@@ -204,7 +204,8 @@
|
||||
are built on top of a common infrastructure. This infrastructure
|
||||
contains common readers and writers, and services such as the
|
||||
<classname>RetryTemplate</classname>, which are used both by application
|
||||
developers(readers and writers) and the core framework itself.
|
||||
developers (<classname>ItemReader</classname> and
|
||||
<classname>ItemWriter</classname>) and the core framework itself.
|
||||
(retry)</para>
|
||||
</section>
|
||||
</section>
|
||||
|
||||