spring-batch/build/reference-work/readersAndWriters.xml
Michael Minella 75ab909314 update
2017-03-23 10:18:33 -05:00

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE chapter PUBLIC "-//OASIS//DTD DocBook XML V4.4//EN"
"http://www.oasis-open.org/docbook/xml/4.4/docbookx.dtd">
<chapter id="readersAndWriters">
<title>ItemReaders and ItemWriters</title>
<para>All batch processing can be described in its simplest form as
reading in large amounts of data, performing some type of calculation or
transformation, and writing the result out. Spring Batch provides three key
interfaces to help perform bulk reading and writing:
<classname>ItemReader</classname>, <classname>ItemProcessor</classname> and
<classname>ItemWriter</classname>.</para>
<section id="itemReader">
<title id="infrastructure.1">ItemReader</title>
<para>Although a simple concept, an <classname>ItemReader</classname> is
the means for providing data from many different types of input. The most
general examples include: <itemizedlist>
<listitem>
<para>Flat File - Flat file ItemReaders read lines of data from a
flat file that typically describe records with fields of data
defined by fixed positions in the file or delimited by some special
character (for example, a comma).</para>
</listitem>
<listitem>
<para>XML - XML ItemReaders process XML independently of
technologies used for parsing, mapping and validating objects. Input
data allows for the validation of an XML file against an XSD
schema.</para>
</listitem>
<listitem>
<para>Database - A database resource is accessed to return
resultsets which can be mapped to objects for processing. The
default SQL ItemReaders invoke a <classname>RowMapper</classname> to
return objects, keep track of the current row if restart is
required, store basic statistics, and provide some transaction
enhancements that will be explained later.</para>
</listitem>
</itemizedlist>There are many more possibilities, but we'll focus on the
basic ones for this chapter. A complete list of all available ItemReaders
can be found in Appendix A.</para>
<para><classname>ItemReader</classname> is a basic interface for generic
input operations:</para>
<programlisting language="java">public interface ItemReader&lt;T&gt; {
T read() throws Exception, UnexpectedInputException, ParseException;
}</programlisting>
<para>The <methodname>read</methodname> method defines the most essential
contract of the <classname>ItemReader</classname>; calling it returns one
Item or null if no more items are left. An item might represent a line in
a file, a row in a database, or an element in an XML file. It is generally
expected that these will be mapped to a usable domain object (e.g. Trade,
Foo, etc.), but there is no requirement in the contract to do so.</para>
<para>It is expected that implementations of the
<classname>ItemReader</classname> interface will be forward only. However,
if the underlying resource is transactional (such as a JMS queue) then
calling read may return the same logical item on subsequent calls in a
rollback scenario. It is also worth noting that a lack of items to process
by an <classname>ItemReader</classname> will not cause an exception to be
thrown. For example, a database <classname>ItemReader</classname> that is
configured with a query that returns 0 results will simply return null on
the first invocation of <methodname>read</methodname>.</para>
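<para>The contract can be illustrated with a minimal sketch. The
<classname>ArrayBackedReader</classname> class below is hypothetical and,
so that it compiles without Spring Batch on the classpath, does not
implement the real <classname>ItemReader</classname> interface. It only
mirrors the forward-only, null-terminated behavior of
<methodname>read</methodname>:</para>
<programlisting language="java">// Hypothetical stand-in for the read() contract. It does not implement
// the Spring Batch ItemReader interface so that it compiles on its own.
class ArrayBackedReader {

    private final String[] items;
    private int next = 0;

    ArrayBackedReader(String[] items) {
        this.items = items.clone();
    }

    // Returns the next item, or null once the input is exhausted.
    // An empty input is not an error: the first call simply returns null.
    String read() {
        if (next == items.length) {
            return null;
        }
        return items[next++];
    }

    // Typical driving loop: keep reading until null signals the end.
    static int countItems(ArrayBackedReader reader) {
        int count = 0;
        while (reader.read() != null) {
            count++;
        }
        return count;
    }
}</programlisting>
<para>A caller never asks how many items remain. It calls
<methodname>read</methodname> until null is returned, which is why an
empty input needs no special handling.</para>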
</section>
<section id="itemWriter">
<title id="infrastructure.1.4">ItemWriter</title>
<para><classname>ItemWriter</classname> is similar in functionality to an
<classname>ItemReader</classname>, but with inverse operations. Resources
still need to be located, opened and closed but they differ in that an
<classname>ItemWriter</classname> writes out, rather than reading in. In
the case of databases or queues these may be inserts, updates, or sends.
The format of the serialization of the output is specific to each batch
job.</para>
<para>As with <classname>ItemReader</classname>,
<classname>ItemWriter</classname> is a fairly generic interface:</para>
<programlisting language="java">public interface ItemWriter&lt;T&gt; {
void write(List&lt;? extends T&gt; items) throws Exception;
}</programlisting>
<para>As with <methodname>read</methodname> on
<classname>ItemReader</classname>, <methodname>write</methodname> provides
the basic contract of <classname>ItemWriter</classname>; it will attempt
to write out the list of items passed in as long as it is open. Because it
is generally expected that items will be 'batched' together into a chunk
and then output, the interface accepts a list of items, rather than an
item by itself. After writing out the list, any flushing that may be
necessary can be performed before returning from the write method. For
example, if writing to a Hibernate DAO, multiple calls to write can be
made, one for each item. The writer can then call flush on the Hibernate
Session before returning.</para>
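<para>The write-then-flush sequence can be sketched as follows. The
<classname>ChunkWriterSketch</classname> class is hypothetical: it takes a
<classname>String</classname> array instead of a list and uses a
<classname>StringBuilder</classname> in place of a real output resource,
so it is an illustration of the idea rather than an implementation of
<classname>ItemWriter</classname>:</para>
<programlisting language="java">// Hypothetical stand-in for an ItemWriter. StringBuilders play the role
// of the output resource so the sketch needs no Spring Batch types.
class ChunkWriterSketch {

    private final StringBuilder pending = new StringBuilder();
    private final StringBuilder flushed = new StringBuilder();
    private int flushCount = 0;

    // Receives one chunk: buffer every item, then flush once before returning.
    void write(String[] items) {
        for (String item : items) {
            pending.append(item).append('\n');
        }
        flush();
    }

    // Moves buffered output to its destination, as a session flush would.
    private void flush() {
        flushed.append(pending);
        pending.setLength(0);
        flushCount++;
    }

    String output() {
        return flushed.toString();
    }

    int flushCount() {
        return flushCount;
    }
}</programlisting>
<para>In this sketch the number of flushes equals the number of chunks
written, not the number of items, which is the benefit of passing items
to the writer in batches.</para>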
</section>
<section id="itemProcessor">
<title>ItemProcessor</title>
<para>The <classname>ItemReader</classname> and
<classname>ItemWriter</classname> interfaces are both very useful for
their specific tasks, but what if you want to insert business logic before
writing? One option for both reading and writing is to use the composite
pattern: create an <classname>ItemWriter</classname> that contains another
<classname>ItemWriter</classname>, or an <classname>ItemReader</classname>
that contains another <classname>ItemReader</classname>. For
example:</para>
<programlisting language="java">public class CompositeItemWriter&lt;T&gt; implements ItemWriter&lt;T&gt; {

    // The writer that receives the items once the business logic has run
    private ItemWriter&lt;T&gt; itemWriter;

    public CompositeItemWriter(ItemWriter&lt;T&gt; itemWriter) {
        this.itemWriter = itemWriter;
    }

    public void write(List&lt;? extends T&gt; items) throws Exception {
        //Add business logic here
        itemWriter.write(items);
    }

    public void setDelegate(ItemWriter&lt;T&gt; itemWriter){
        this.itemWriter = itemWriter;
    }
}</programlisting>
<para>The class above contains another <classname>ItemWriter</classname>
to which it delegates after having provided some business logic. This
pattern could easily be used for an <classname>ItemReader</classname> as
well, perhaps to obtain more reference data based upon the input that was
provided by the main <classname>ItemReader</classname>. It is also useful
if you need to control the call to <methodname>write</methodname> yourself.
However, if you only want to 'transform' the item passed in for writing
before it is actually written, there isn't much need to call
<methodname>write</methodname> yourself: you just want to modify the item.
For this scenario, Spring Batch provides the
<classname>ItemProcessor</classname> interface:</para>
<programlisting language="java">public interface ItemProcessor&lt;I, O&gt; {
O process(I item) throws Exception;
}</programlisting>
<para>An <classname>ItemProcessor</classname> is very simple; given one
object, transform it and return another. The provided object may or may
not be of the same type. The point is that business logic may be applied
within process, and is completely up to the developer to create. An
<classname>ItemProcessor</classname> can be wired directly into a step.
For example, assume that an <classname>ItemReader</classname> provides a
class of type Foo, and that it needs to be converted to type Bar before
being written out. An <classname>ItemProcessor</classname> can be written
that performs the conversion:</para>
<programlisting language="java">public class Foo {}
public class Bar {
public Bar(Foo foo) {}
}
public class FooProcessor implements ItemProcessor&lt;Foo,Bar&gt;{
public Bar process(Foo foo) throws Exception {
//Perform simple transformation, convert a Foo to a Bar
return new Bar(foo);
}
}
public class BarWriter implements ItemWriter&lt;Bar&gt;{
public void write(List&lt;? extends Bar&gt; bars) throws Exception {
//write bars
}
}</programlisting>
<para>In the very simple example above, there is a class
<classname>Foo</classname>, a class <classname>Bar</classname>, and a
class <classname>FooProcessor</classname> that adheres to the
<classname>ItemProcessor</classname> interface. The transformation is
simple, but any type of transformation could be done here. The
<classname>BarWriter</classname> will be used to write out
<classname>Bar</classname> objects, throwing an exception if any other
type is provided. Similarly, the <classname>FooProcessor</classname> will
throw an exception if anything but a <classname>Foo</classname> is
provided. The <classname>FooProcessor</classname> can then be injected
into a <classname>Step</classname>:</para>
<programlisting language="xml">&lt;job id="ioSampleJob"&gt;
&lt;step id="step1"&gt;
&lt;tasklet&gt;
&lt;chunk reader="fooReader" processor="fooProcessor" writer="barWriter"
commit-interval="2"/&gt;
&lt;/tasklet&gt;
&lt;/step&gt;
&lt;/job&gt;</programlisting>
<section id="chainingItemProcessors">
<title>Chaining ItemProcessors</title>
<para>Performing a single transformation is useful in many scenarios,
but what if you want to 'chain' together multiple
<classname>ItemProcessor</classname>s? This can be accomplished using
the composite pattern mentioned previously. To extend the previous
single-transformation example, <classname>Foo</classname> will be
transformed to <classname>Bar</classname>, which will be transformed to
<classname>Foobar</classname> and written out:</para>
<programlisting language="java">public class Foo {}

public class Bar {
    public Bar(Foo foo) {}
}

public class Foobar {
    public Foobar(Bar bar) {}
}

public class FooProcessor implements ItemProcessor&lt;Foo,Bar&gt;{
    public Bar process(Foo foo) throws Exception {
        //Perform simple transformation, convert a Foo to a Bar
        return new Bar(foo);
    }
}

public class BarProcessor implements ItemProcessor&lt;Bar,Foobar&gt;{
    public Foobar process(Bar bar) throws Exception {
        //Convert a Bar to a Foobar
        return new Foobar(bar);
    }
}

public class FoobarWriter implements ItemWriter&lt;Foobar&gt;{
    public void write(List&lt;? extends Foobar&gt; items) throws Exception {
        //write items
    }
}</programlisting>
<para>A <classname>FooProcessor</classname> and
<classname>BarProcessor</classname> can be 'chained' together to give
the resultant <classname>Foobar</classname>:</para>
<programlisting language="java">CompositeItemProcessor&lt;Foo,Foobar&gt; compositeProcessor =
        new CompositeItemProcessor&lt;Foo,Foobar&gt;();
//Delegates run in list order: Foo to Bar, then Bar to Foobar
List&lt;ItemProcessor&lt;?, ?&gt;&gt; itemProcessors = new ArrayList&lt;ItemProcessor&lt;?, ?&gt;&gt;();
itemProcessors.add(new FooProcessor());
itemProcessors.add(new BarProcessor());
compositeProcessor.setDelegates(itemProcessors);</programlisting>
<para>Just as with the previous example, the composite processor can be
configured into the <classname>Step</classname>:</para>
<programlisting language="xml">&lt;job id="ioSampleJob"&gt;
    &lt;step id="step1"&gt;
        &lt;tasklet&gt;
            &lt;chunk reader="fooReader" processor="compositeProcessor" writer="foobarWriter"
                   commit-interval="2"/&gt;
        &lt;/tasklet&gt;
    &lt;/step&gt;
&lt;/job&gt;

&lt;bean id="compositeProcessor"
      class="org.springframework.batch.item.support.CompositeItemProcessor"&gt;
    &lt;property name="delegates"&gt;
        &lt;list&gt;
            &lt;bean class="..FooProcessor" /&gt;
            &lt;bean class="..BarProcessor" /&gt;
        &lt;/list&gt;
    &lt;/property&gt;
&lt;/bean&gt;</programlisting>
</section>
<section id="filiteringRecords">
<title>Filtering Records</title>
<para>One typical use for an item processor is to filter out records
before they are passed to the ItemWriter. Filtering is an action
distinct from skipping; skipping indicates that a record is invalid
whereas filtering simply indicates that a record should not be
written.</para>
<para>For example, consider a batch job that reads a file containing
three different types of records: records to insert, records to update,
and records to delete. If record deletion is not supported by the
system, then we would not want to send any "delete" records to the
<classname>ItemWriter</classname>. But, since these records are not
actually bad records, we would want to filter them out, rather than
skip. As a result, the ItemWriter would receive only "insert" and
"update" records.</para>
<para>To filter a record, one simply returns "null" from the
<classname>ItemProcessor</classname>. The framework will detect that the
result is "null" and avoid adding that item to the list of records
delivered to the <classname>ItemWriter</classname>. As usual, an
exception thrown from the <classname>ItemProcessor</classname> will
result in a skip.</para>
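<para>The effect of returning null can be sketched without any framework
classes. In the hypothetical <classname>FilteringSketch</classname> below,
<methodname>process</methodname> stands in for an
<classname>ItemProcessor</classname> and
<methodname>processChunk</methodname> stands in for the framework's chunk
handling. The record format (an operation name followed by a colon) is an
assumption made for the example:</para>
<programlisting language="java">// Hypothetical sketch: each record string is prefixed with its operation.
class FilteringSketch {

    // Stand-in for ItemProcessor#process: returning null filters the record.
    static String process(String record) {
        if (record.startsWith("delete:")) {
            return null; // filtered, not skipped: the record is valid but unwanted
        }
        return record;
    }

    // Stand-in for the framework's chunk handling: null results never
    // reach the writer. Returns the items a writer would receive.
    static String[] processChunk(String[] chunk) {
        String[] buffer = new String[chunk.length];
        int kept = 0;
        for (String record : chunk) {
            String result = process(record);
            if (result != null) {
                buffer[kept++] = result;
            }
        }
        return java.util.Arrays.copyOf(buffer, kept);
    }
}</programlisting>
<para>Given a chunk of one "insert", one "delete", and one "update"
record, the writer in this sketch would receive only the "insert" and
"update" records, in their original order.</para>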
</section>
<section id="faultTolerant">
<title>Fault Tolerance</title>
<para>When a chunk is rolled back, items that have been cached
during reading may be reprocessed. If a step is configured to
be fault tolerant (uses skip or retry processing typically),
any ItemProcessor used should be implemented in a way that is
idempotent. Typically that would consist of performing no changes
on the input item for the ItemProcessor and only updating the
instance that is the result.</para>
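<para>A minimal sketch of an idempotent transformation follows. The
<classname>Price</classname> and <classname>TaxedPrice</classname> types
and the ten percent surcharge are hypothetical, and the static
<methodname>process</methodname> method only mirrors the shape of
<classname>ItemProcessor</classname> rather than implementing it:</para>
<programlisting language="java">// Hypothetical sketch of an idempotent transformation.
class IdempotentProcessorSketch {

    // Hypothetical input type.
    static final class Price {
        final long cents;

        Price(long cents) {
            this.cents = cents;
        }
    }

    // Hypothetical output type.
    static final class TaxedPrice {
        final long cents;

        TaxedPrice(long cents) {
            this.cents = cents;
        }
    }

    // Reads the input but never modifies it, so processing the same item
    // again after a rollback produces the same result.
    static TaxedPrice process(Price item) {
        return new TaxedPrice(item.cents + item.cents / 10);
    }
}</programlisting>
<para>Had <methodname>process</methodname> instead added the surcharge to
the input item itself, a second pass over a cached item after a rollback
would apply the surcharge twice.</para>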
</section>
</section>
<section id="itemStream">
<title>ItemStream</title>
<para>Both <classname>ItemReader</classname>s and
<classname>ItemWriter</classname>s serve their individual purposes well,
but there is a common concern among both of them that necessitates another
interface. In general, as part of the scope of a batch job, readers and
writers need to be opened and closed, and they require a mechanism for
persisting state:</para>
<programlisting language="java">public interface ItemStream {
void open(ExecutionContext executionContext) throws ItemStreamException;
void update(ExecutionContext executionContext) throws ItemStreamException;
void close() throws ItemStreamException;
}</programlisting>
<para>Before describing each method, we should mention the
<classname>ExecutionContext</classname>. Clients of an
<classname>ItemReader</classname> that also implement
<classname>ItemStream</classname> should call
<methodname>open</methodname> before any calls to
<methodname>read</methodname> in order to open any resources such as files
or to obtain connections. A similar restriction applies to an
<classname>ItemWriter</classname> that implements
<classname>ItemStream</classname>. As mentioned in Chapter 2, if expected
data is found in the <classname>ExecutionContext</classname>, it may be
used to start the <classname>ItemReader</classname> or
<classname>ItemWriter</classname> at a location other than its initial
state. Conversely, <methodname>close</methodname> will be called to ensure
that any resources allocated during <methodname>open</methodname> will be
released safely. <methodname>update</methodname> is called primarily to
ensure that any state currently being held is loaded into the provided
<classname>ExecutionContext</classname>. This method will be called before
committing, to ensure that the current state is persisted in the database
before commit.</para>
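<para>The life cycle can be sketched with a reader that remembers its
position. Everything in the example is an assumption made for
illustration: <classname>RestartableReader</classname> is hypothetical,
<classname>java.util.Properties</classname> stands in for the
<classname>ExecutionContext</classname>, and the key name
<literal>reader.position</literal> is invented:</para>
<programlisting language="java">import java.util.Properties;

// Hypothetical stand-in for an ItemReader that also implements ItemStream.
// java.util.Properties plays the role of the ExecutionContext.
class RestartableReader {

    private static final String KEY = "reader.position"; // hypothetical key name

    private final String[] items;
    private int position;

    RestartableReader(String[] items) {
        this.items = items.clone();
    }

    // open: resume from saved state if present, else start at the beginning.
    void open(Properties context) {
        position = Integer.parseInt(context.getProperty(KEY, "0"));
    }

    // Returns the next item, or null when the input is exhausted.
    String read() {
        return position == items.length ? null : items[position++];
    }

    // update: copy the current state into the context before a commit.
    void update(Properties context) {
        context.setProperty(KEY, Integer.toString(position));
    }

    // close: release resources. There is nothing to release in this sketch.
    void close() {
    }
}</programlisting>
<para>If one instance reads an item, calls <methodname>update</methodname>,
and is closed, a second instance opened with the same context continues
with the following item instead of starting over.</para>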
<para>In the special case where the client of an
<classname>ItemStream</classname> is a <classname>Step</classname> (from
the Spring Batch Core), an <classname>ExecutionContext</classname> is
created for each <classname>StepExecution</classname> to allow users to
store the state of a particular execution, with the expectation that it
will be returned if the same <classname>JobInstance</classname> is started
again. For those familiar with Quartz, the semantics are very similar to a
Quartz <classname>JobDataMap</classname>.</para>
</section>
<section id="delegatePatternAndRegistering">
<title>The Delegate Pattern and Registering with the Step</title>
<para>Note that the <classname>CompositeItemWriter</classname> is an
example of the delegation pattern, which is common in Spring Batch. The
delegates themselves might implement callback interfaces, such as
<classname>StepListener</classname>.
If they do, and they are being used in conjunction with Spring Batch Core
as part of a <classname>Step</classname> in a <classname>Job</classname>,
then they almost certainly need to be registered manually with the
<classname>Step</classname>. A reader, writer, or processor that is
directly wired into the Step will be registered automatically if it
implements <classname>ItemStream</classname> or a
<classname>StepListener</classname> interface. But because the delegates
are not known to the <classname>Step</classname>, they need to be injected
as listeners or streams (or both if appropriate):</para>
<programlisting language="xml">&lt;job id="ioSampleJob"&gt;
&lt;step id="step1"&gt;
&lt;tasklet&gt;
&lt;chunk reader="fooReader" processor="fooProcessor" writer="compositeItemWriter"
commit-interval="2"&gt;
&lt;streams&gt;
&lt;stream ref="barWriter" /&gt;
&lt;/streams&gt;
&lt;/chunk&gt;
&lt;/tasklet&gt;
&lt;/step&gt;
&lt;/job&gt;
&lt;bean id="compositeItemWriter" class="...CustomCompositeItemWriter"&gt;
&lt;property name="delegate" ref="barWriter" /&gt;
&lt;/bean&gt;
&lt;bean id="barWriter" class="...BarWriter" /&gt;</programlisting>
</section>
<section id="flatFiles">
<title id="infrastructure.1.2">Flat Files</title>
<para>One of the most common mechanisms for interchanging bulk data has
always been the flat file. Unlike XML, which has an agreed upon standard
for defining how it is structured (XSD), anyone reading a flat file must
understand ahead of time exactly how the file is structured. In general,
all flat files fall into two types: Delimited and Fixed Length. Delimited
files are those in which fields are separated by a delimiter, such as a
comma. Fixed Length files have fields that are a set length.</para>
<section id="fieldSet">
<title>The FieldSet</title>
<para>When working with flat files in Spring Batch, regardless of
whether it is for input or output, one of the most important classes is
the <classname>FieldSet</classname>. Many architectures and libraries
contain abstractions for helping you read in from a file, but they
usually return a String or an array of Strings. This really only gets
you halfway there. A <classname>FieldSet</classname> is Spring Batch's
abstraction for enabling the binding of fields from a file resource. It
allows developers to work with file input in much the same way as they
would work with database input. A <classname>FieldSet</classname> is
conceptually very similar to a JDBC <classname>ResultSet</classname>.
A <classname>FieldSet</classname> requires only one argument, a
<classname>String</classname> array of tokens. Optionally, you can also
configure the names of the fields so that the fields may be accessed
either by index or by name, as patterned after
<classname>ResultSet</classname>:</para>
<programlisting language="java">String[] tokens = new String[]{"foo", "1", "true"};
FieldSet fs = new DefaultFieldSet(tokens);
String name = fs.readString(0);
int value = fs.readInt(1);
boolean booleanValue = fs.readBoolean(2);</programlisting>
<para>There are many more options on the <classname>FieldSet</classname>
interface, such as <classname>Date</classname>, long,
<classname>BigDecimal</classname>, etc. The biggest advantage of the
<classname>FieldSet</classname> is that it provides consistent parsing
of flat file input. Rather than each batch job parsing differently in
potentially unexpected ways, it can be consistent, both when handling
errors caused by a format exception, or when doing simple data
conversions.</para>
</section>
<section id="flatFileItemReader">
<title id="infrastructure.1.2.1">FlatFileItemReader</title>
<para>A flat file is any type of file that contains at most
two-dimensional (tabular) data. Reading flat files in the Spring Batch
framework is facilitated by the class
<classname>FlatFileItemReader</classname>, which provides basic
functionality for reading and parsing flat files. The two most important
required dependencies of <classname>FlatFileItemReader</classname> are
<classname>Resource</classname> and <classname>LineMapper</classname>.
The <classname>LineMapper</classname> interface will be
explored more in the next sections. The resource property represents a
Spring Core <classname>Resource</classname>. Documentation explaining
how to create beans of this type can be found in <ulink
url="http://docs.spring.io/spring/docs/3.2.x/spring-framework-reference/html/resources.html"><citetitle>Spring
Framework, Chapter 5. Resources</citetitle></ulink>. Therefore, this
guide will not go into the details of creating
<classname>Resource</classname> objects. However, a simple example of a
file system resource can be found below:
</para>
<programlisting language="java">Resource resource = new FileSystemResource("resources/trades.csv");</programlisting>
<para>In complex batch environments, the directory structures are often
managed by the EAI infrastructure, where drop zones for external
interfaces are established for moving files from FTP locations to batch
processing locations and vice versa. File moving utilities are beyond
the scope of the Spring Batch architecture, but it is not unusual for
batch job streams to include file moving utilities as steps in the job
stream. The batch architecture only needs to know how to locate the
files to be processed. Spring Batch begins the process
of feeding the data into the pipe from this starting point. However,
<ulink
url="http://projects.spring.io/spring-integration/"><citetitle>Spring
Integration</citetitle></ulink> provides many of these types of
services.</para>
<para>The other properties in <classname>FlatFileItemReader</classname>
allow you to further specify how your data will be interpreted: <table>
<title>FlatFileItemReader Properties</title>
<tgroup cols="3">
<colspec align="center" />
<thead>
<row>
<entry align="center">Property</entry>
<entry align="center">Type</entry>
<entry align="center">Description</entry>
</row>
</thead>
<tbody>
<row>
<entry align="left">comments</entry>
<entry align="left">String[]</entry>
<entry align="left">Specifies line prefixes that indicate
comment rows</entry>
</row>
<row>
<entry align="left">encoding</entry>
<entry align="left">String</entry>
<entry align="left">Specifies what text encoding to use -
default is "ISO-8859-1"</entry>
</row>
<row>
<entry align="left">lineMapper</entry>
<entry align="left">LineMapper</entry>
<entry align="left">Converts a <classname>String</classname>
to an <classname>Object</classname> representing the
item.</entry>
</row>
<row>
<entry align="left">linesToSkip</entry>
<entry align="left">int</entry>
<entry align="left">Number of lines to ignore at the top of
the file</entry>
</row>
<row>
<entry align="left">recordSeparatorPolicy</entry>
<entry align="left">RecordSeparatorPolicy</entry>
<entry align="left">Used to determine where the line endings
are and do things like continue over a line ending if inside a
quoted string.</entry>
</row>
<row>
<entry align="left">resource</entry>
<entry align="left">Resource</entry>
<entry align="left">The resource from which to read.</entry>
</row>
<row>
<entry align="left">skippedLinesCallback</entry>
<entry align="left">LineCallbackHandler</entry>
<entry align="left">Interface which passes the raw line
content of the lines in the file to be skipped. If linesToSkip
is set to 2, then this interface will be called twice.</entry>
</row>
<row>
<entry align="left">strict</entry>
<entry align="left">boolean</entry>
<entry align="left">In strict mode, the reader will throw an
exception on <methodname>open(ExecutionContext)</methodname> if
the input resource does not exist.</entry>
</row>
</tbody>
</tgroup>
</table></para>
<section id="lineMapper">
<title>LineMapper</title>
<para>As with <classname>RowMapper</classname>, which takes a low
level construct such as <classname>ResultSet</classname> and returns
an <classname>Object</classname>, flat file processing requires the
same construct to convert a <classname>String</classname> line into an
<classname>Object</classname>:
</para>
<programlisting language="java">public interface LineMapper&lt;T&gt; {
T mapLine(String line, int lineNumber) throws Exception;
}</programlisting>
<para>The basic contract is that, given the current line and the line
number with which it is associated, the mapper should return a
resulting domain object. This is similar to
<classname>RowMapper</classname> in that each line is associated with
its line number, just as each row in a
<classname>ResultSet</classname> is tied to its row number. This
allows the line number to be tied to the resulting domain object for
identity comparison or for more informative logging. However, unlike
<classname>RowMapper</classname>, the
<classname>LineMapper</classname> is given a raw line which, as
discussed above, only gets you halfway there. The line must be
tokenized into a <classname>FieldSet</classname>, which can then be
mapped to an object, as described below.</para>
</section>
<section id="lineTokenizer">
<title>LineTokenizer</title>
<para>An abstraction for turning a line of input into a
<classname>FieldSet</classname> is necessary because there can be many
formats of flat file data that need to be converted to a
<classname>FieldSet</classname>. In Spring Batch, this interface is
the <classname>LineTokenizer</classname>:</para>
<programlisting language="java">public interface LineTokenizer {
FieldSet tokenize(String line);
}</programlisting>
<para>The contract of a <classname>LineTokenizer</classname> is such
that, given a line of input (in theory the
<classname>String</classname> could encompass more than one line), a
<classname>FieldSet</classname> representing the line will be
returned. This <classname>FieldSet</classname> can then be passed to a
<classname>FieldSetMapper</classname>. Spring Batch contains the
following <classname>LineTokenizer</classname> implementations:</para>
<itemizedlist>
<listitem>
<para><classname>DelimitedLineTokenizer</classname> - Used for
files where fields in a record are separated by a delimiter. The
most common delimiter is a comma, but pipes or semicolons are
often used as well.</para>
</listitem>
<listitem>
<para><classname>FixedLengthTokenizer</classname> - Used for files
where fields in a record are each a 'fixed width'. The width of
each field must be defined for each record type.</para>
</listitem>
<listitem>
<para><classname>PatternMatchingCompositeLineTokenizer</classname>
- Determines which among a list of
<classname>LineTokenizer</classname>s should be used on a
particular line by checking against a pattern.</para>
</listitem>
</itemizedlist>
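<para>The difference between the delimited and fixed-width styles can be
sketched in plain Java. The <classname>TokenizingSketch</classname> class
below is hypothetical and returns <classname>String</classname> arrays
rather than a <classname>FieldSet</classname>. It also ignores quoting,
which a real delimited tokenizer has to handle, and it assumes that
fixed-width ranges are given as 1-based, inclusive column pairs:</para>
<programlisting language="java">// Hypothetical sketch of the two tokenizing styles. It returns plain
// String arrays rather than a Spring Batch FieldSet.
class TokenizingSketch {

    // Delimited: split on the literal delimiter, keeping empty fields.
    // Quoted fields are deliberately not handled in this sketch.
    static String[] delimited(String line, String delimiter) {
        return line.split(java.util.regex.Pattern.quote(delimiter), -1);
    }

    // Fixed width: each range is a pair of 1-based, inclusive columns.
    // Padding around each field is trimmed.
    static String[] fixedWidth(String line, int[][] ranges) {
        String[] tokens = new String[ranges.length];
        for (int i = 0; i != ranges.length; i++) {
            tokens[i] = line.substring(ranges[i][0] - 1, ranges[i][1]).trim();
        }
        return tokens;
    }
}</programlisting>
<para>With a delimited line, the field boundaries are found in the data
itself. With a fixed-width line, they come entirely from the configured
ranges, which is why the width of each field must be known ahead of
time.</para>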
</section>
<section id="fieldSetMapper">
<title>FieldSetMapper</title>
<para>The <classname>FieldSetMapper</classname> interface defines a
single method, <methodname>mapFieldSet</methodname>, which takes a
<classname>FieldSet</classname> object and maps its contents to an
object. This object may be a custom DTO, a domain object, or a simple
array, depending on the needs of the job. The
<classname>FieldSetMapper</classname> is used in conjunction with the
<classname>LineTokenizer</classname> to translate a line of data from
a resource into an object of the desired type:</para>
<programlisting language="java">public interface FieldSetMapper&lt;T&gt; {
T mapFieldSet(FieldSet fieldSet);
}</programlisting>
<para>The pattern used is the same as the
<classname>RowMapper</classname> used by
<classname>JdbcTemplate</classname>.</para>
</section>
<section id="defaultLineMapper">
<title>DefaultLineMapper</title>
<para>Now that the basic interfaces for reading in flat files have
been defined, it becomes clear that three basic steps are
required:<orderedlist>
<listitem>
<para>Read one line from the file.</para>
</listitem>
<listitem>
<para>Pass the string line into the
<methodname>LineTokenizer#tokenize</methodname>() method, in
order to retrieve a <classname>FieldSet</classname>.</para>
</listitem>
<listitem>
<para>Pass the <classname>FieldSet</classname> returned from
tokenizing to a <classname>FieldSetMapper</classname>, returning
the result from the <methodname>ItemReader#read</methodname>()
method.</para>
</listitem>
</orderedlist></para>
<para>The two interfaces described above represent two separate tasks:
converting a line into a <classname>FieldSet</classname>, and mapping
a <classname>FieldSet</classname> to a domain object. Because the
input of a <classname>LineTokenizer</classname> matches the input of
the <classname>LineMapper</classname> (a line), and the output of a
<classname>FieldSetMapper</classname> matches the output of the
<classname>LineMapper</classname>, a default implementation that uses
both a <classname>LineTokenizer</classname> and
<classname>FieldSetMapper</classname> is provided. The
<classname>DefaultLineMapper</classname> represents the behavior most
users will need:</para>
<programlisting language="java">public class DefaultLineMapper&lt;T&gt; implements LineMapper&lt;T&gt;, InitializingBean {
private LineTokenizer tokenizer;
private FieldSetMapper&lt;T&gt; fieldSetMapper;
public T mapLine(String line, int lineNumber) throws Exception {
<emphasis role="bold">return fieldSetMapper.mapFieldSet(tokenizer.tokenize(line));</emphasis>
}
public void setLineTokenizer(LineTokenizer tokenizer) {
this.tokenizer = tokenizer;
}
public void setFieldSetMapper(FieldSetMapper&lt;T&gt; fieldSetMapper) {
this.fieldSetMapper = fieldSetMapper;
}
}</programlisting>
<para>The above functionality is provided in a default implementation,
rather than being built into the reader itself (as was done in
previous versions of the framework) in order to allow users greater
flexibility in controlling the parsing process, especially if access
to the raw line is needed.</para>
</section>
<section id="simpleDelimitedFileReadingExample">
<title>Simple Delimited File Reading Example</title>
<para>The following example will be used to illustrate this using an
actual domain scenario. This particular batch job reads in football
players from the following file:
</para>
<programlisting>ID,lastName,firstName,position,birthYear,debutYear
AbduKa00,Abdul-Jabbar,Karim,rb,1974,1996
AbduRa00,Abdullah,Rabih,rb,1975,1999
AberWa00,Abercrombie,Walter,rb,1959,1982
AbraDa00,Abramowicz,Danny,wr,1945,1967
AdamBo00,Adams,Bob,te,1946,1969
AdamCh00,Adams,Charlie,wr,1979,2003</programlisting>
<para>The contents of this file will be mapped to the following
<classname>Player</classname> domain object:
</para>
<programlisting language="java">public class Player implements Serializable {
private String ID;
private String lastName;
private String firstName;
private String position;
private int birthYear;
private int debutYear;
public String toString() {
return "PLAYER:ID=" + ID + ",Last Name=" + lastName +
",First Name=" + firstName + ",Position=" + position +
",Birth Year=" + birthYear + ",DebutYear=" +
debutYear;
}
// setters and getters...
}</programlisting>
<para>In order to map a <classname>FieldSet</classname> into a
<classname>Player</classname> object, a
<classname>FieldSetMapper</classname> that returns players needs to be
defined:</para>
<programlisting language="java">protected static class PlayerFieldSetMapper implements FieldSetMapper&lt;Player&gt; {
public Player mapFieldSet(FieldSet fieldSet) {
Player player = new Player();
player.setID(fieldSet.readString(0));
player.setLastName(fieldSet.readString(1));
player.setFirstName(fieldSet.readString(2));
player.setPosition(fieldSet.readString(3));
player.setBirthYear(fieldSet.readInt(4));
player.setDebutYear(fieldSet.readInt(5));
return player;
}
}</programlisting>
<para>The file can then be read by correctly constructing a
<classname>FlatFileItemReader</classname> and calling
<methodname>read</methodname>:</para>
<programlisting language="java">FlatFileItemReader&lt;Player&gt; itemReader = new FlatFileItemReader&lt;Player&gt;();
itemReader.setResource(new FileSystemResource("resources/players.csv"));
//Skip the header line (ID,lastName,...) so that it is not mapped to a Player
itemReader.setLinesToSkip(1);
//DelimitedLineTokenizer defaults to comma as its delimiter
DefaultLineMapper&lt;Player&gt; lineMapper = new DefaultLineMapper&lt;Player&gt;();
lineMapper.setLineTokenizer(new DelimitedLineTokenizer());
lineMapper.setFieldSetMapper(new PlayerFieldSetMapper());
itemReader.setLineMapper(lineMapper);
itemReader.open(new ExecutionContext());
Player player = itemReader.read();</programlisting>
<para>Each call to <methodname>read</methodname> will return a new
Player object from each line in the file. When the end of the file is
reached, null will be returned.</para>
</section>
<section id="mappingFieldsByName">
<title>Mapping Fields by Name</title>
<para>Both <classname>DelimitedLineTokenizer</classname> and
<classname>FixedLengthTokenizer</classname> provide one additional
piece of functionality, similar in function to a JDBC
<classname>ResultSet</classname>: the names of the fields can be
injected into either of these <classname>LineTokenizer</classname>
implementations to increase the readability of the mapping function.
First, the column names of all fields in the flat file are injected
into the tokenizer:</para>
<programlisting language="java">tokenizer.setNames(new String[] {"ID", "lastName","firstName","position","birthYear","debutYear"}); </programlisting>
<para>A <classname>FieldSetMapper</classname> can use this information
as follows:</para>
<programlisting language="java">public class PlayerMapper implements FieldSetMapper&lt;Player&gt; {
public Player mapFieldSet(FieldSet fs) {
if(fs == null){
return null;
}
Player player = new Player();
player.setID(fs.readString(<emphasis role="bold">"ID"</emphasis>));
player.setLastName(fs.readString(<emphasis role="bold">"lastName"</emphasis>));
player.setFirstName(fs.readString(<emphasis role="bold">"firstName"</emphasis>));
player.setPosition(fs.readString(<emphasis role="bold">"position"</emphasis>));
player.setDebutYear(fs.readInt(<emphasis role="bold">"debutYear"</emphasis>));
player.setBirthYear(fs.readInt(<emphasis role="bold">"birthYear"</emphasis>));
return player;
}
}</programlisting>
</section>
<section id="beanWrapperFieldSetMapper">
<title>Automapping FieldSets to Domain Objects</title>
<para>For many, having to write a specific
<classname>FieldSetMapper</classname> is as cumbersome as
writing a specific <classname>RowMapper</classname> for a
<classname>JdbcTemplate</classname>. Spring Batch makes this easier by
providing a <classname>FieldSetMapper</classname> that automatically
maps fields by matching a field name with a setter on the object using
the JavaBean specification. Again using the football example, the
<classname>BeanWrapperFieldSetMapper</classname> configuration looks
like the following:</para>
<programlisting language="xml">&lt;bean id="fieldSetMapper"
class="org.springframework.batch.item.file.mapping.BeanWrapperFieldSetMapper"&gt;
&lt;property name="prototypeBeanName" value="player" /&gt;
&lt;/bean&gt;
&lt;bean id="player"
class="org.springframework.batch.sample.domain.Player"
scope="prototype" /&gt;</programlisting>
<para>For each entry in the <classname>FieldSet</classname>, the
mapper will look for a corresponding setter on a new instance of the
<classname>Player</classname> object (for this reason, prototype scope
is required) in the same way the Spring container will look for
setters matching a property name. Each available field in the
<classname>FieldSet</classname> will be mapped, and the resultant
<classname>Player</classname> object will be returned, with no code
required.</para>
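<para>The setter matching described above can be illustrated with plain
JavaBeans introspection. This is only a minimal sketch, not the
<classname>BeanWrapperFieldSetMapper</classname> implementation: the
<classname>SetterMappingSketch</classname> class, its nested
<classname>Player</classname> bean, and the
<methodname>populate</methodname> method are hypothetical names, and
only <classname>String</classname> and <literal>int</literal>
properties are converted:</para>
<programlisting language="java">import java.beans.Introspector;
import java.beans.PropertyDescriptor;
import java.lang.reflect.Method;

class SetterMappingSketch {

    // Hypothetical, reduced stand-in for the Player domain object.
    public static class Player {
        private String lastName;
        private int birthYear;
        public String getLastName() { return lastName; }
        public void setLastName(String lastName) { this.lastName = lastName; }
        public int getBirthYear() { return birthYear; }
        public void setBirthYear(int birthYear) { this.birthYear = birthYear; }
    }

    // Calls the setter whose property name equals each field name.
    // Names without a matching setter are ignored in this sketch.
    static void populate(Object target, String[] names, String[] values) {
        try {
            PropertyDescriptor[] properties =
                    Introspector.getBeanInfo(target.getClass()).getPropertyDescriptors();
            for (int i = 0; i != names.length; i++) {
                for (PropertyDescriptor property : properties) {
                    Method setter = property.getWriteMethod();
                    if (setter == null || !property.getName().equals(names[i])) {
                        continue;
                    }
                    Object value = values[i];
                    if (property.getPropertyType() == int.class) {
                        value = Integer.valueOf(values[i]);
                    }
                    setter.invoke(target, value);
                }
            }
        }
        catch (Exception e) {
            throw new IllegalStateException("Could not populate " + target, e);
        }
    }
}</programlisting>
<para>Passing the names <literal>lastName</literal> and
<literal>birthYear</literal> with the values <literal>Adams</literal>
and <literal>1946</literal> populates both properties of a new
<classname>Player</classname> without any mapping code specific to that
class.</para>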
</section>
<section id="fixedLengthFileFormats">
<title>Fixed Length File Formats</title>
<para>So far, only delimited files have been discussed in much detail.
However, they represent only half of the file reading picture. Many
organizations that use flat files use fixed length formats. An example
fixed length file is below:</para>
<programlisting>UK21341EAH4121131.11customer1
UK21341EAH4221232.11customer2
UK21341EAH4321333.11customer3
UK21341EAH4421434.11customer4
UK21341EAH4521535.11customer5</programlisting>
<para>While this looks like one large field, it actually represents four
distinct fields:</para>
<orderedlist>
<listitem>
<para>ISIN: Unique identifier for the item being ordered - 12
characters long.</para>
</listitem>
<listitem>
<para>Quantity: Number of this item being ordered - 3 characters
long.</para>
</listitem>
<listitem>
<para>Price: Price of the item - 5 characters long.</para>
</listitem>
<listitem>
<para>Customer: Id of the customer ordering the item - 9
characters long.</para>
</listitem>
</orderedlist>
<para>When configuring the
<classname>FixedLengthTokenizer</classname>, each of these lengths
must be provided in the form of ranges:</para>
<programlisting language="xml">&lt;bean id="fixedLengthLineTokenizer"
class="org.springframework.batch.item.file.transform.FixedLengthTokenizer"&gt;
&lt;property name="names" value="ISIN,Quantity,Price,Customer" /&gt;
&lt;property name="columns" value="1-12, 13-15, 16-20, 21-29" /&gt;
&lt;/bean&gt;</programlisting>
<para>Because the <classname>FixedLengthTokenizer</classname> uses
the same <classname>LineTokenizer</classname> interface as discussed
above, it will return the same <classname>FieldSet</classname> as if a
delimiter had been used. This allows the same approaches to be used in
handling its output, such as using the
<classname>BeanWrapperFieldSetMapper</classname>.</para>
<para><note>
<para>Supporting the above syntax for ranges requires that a
specialized property editor,
<classname>RangeArrayPropertyEditor</classname>, be configured in
the <classname>ApplicationContext</classname>. However, this bean
is automatically declared in an
<classname>ApplicationContext</classname> where the batch
namespace is used.</para>
</note></para>
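<para>To make the column ranges concrete, the following sketch slices
a line using 1-based, inclusive ranges like those configured above. It
is plain Java written for illustration only: the
<classname>FixedLengthSketch</classname> class and its
<methodname>slice</methodname> method are hypothetical and are not how
the tokenizer itself is implemented:</para>
<programlisting language="java">class FixedLengthSketch {

    // Each range is {start, end}: 1-based and inclusive, as in "1-12, 13-15".
    static String[] slice(String line, int[][] ranges) {
        String[] tokens = new String[ranges.length];
        int index = 0;
        for (int[] range : ranges) {
            // substring() is 0-based with an exclusive end index
            tokens[index++] = line.substring(range[0] - 1, range[1]);
        }
        return tokens;
    }
}</programlisting>
<para>Slicing the first line of the sample file with the ranges 1-12,
13-15, 16-20 and 21-29 yields <literal>UK21341EAH41</literal>,
<literal>211</literal>, <literal>31.11</literal> and
<literal>customer1</literal>.</para>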
</section>
<section id="prefixMatchingLineMapper">
<title>Multiple Record Types within a Single File</title>
<para>All of the file reading examples up to this point have made
a key assumption for simplicity's sake: all of the records in a file
have the same format. However, this may not always be the case. It is
very common for a file to have records with different formats that
need to be tokenized differently and mapped to different objects. The
following excerpt from a file illustrates this:</para>
<programlisting>USER;Smith;Peter;;T;20014539;F
LINEA;1044391041ABC037.49G201XX1383.12H
LINEB;2134776319DEF422.99M005LI</programlisting>
<para>In this file we have three types of records, "USER", "LINEA",
and "LINEB". A "USER" line corresponds to a User object. "LINEA" and
"LINEB" both correspond to Line objects, though a "LINEA" has more
information than a "LINEB".</para>
<para>The <classname>ItemReader</classname> will read each line
individually, but we must specify different
<classname>LineTokenizer</classname> and
<classname>FieldSetMapper</classname> objects so that the
<classname>ItemWriter</classname> will receive the correct items. The
<classname>PatternMatchingCompositeLineMapper</classname> makes this
easy by allowing maps of patterns to
<classname>LineTokenizer</classname>s and patterns to
<classname>FieldSetMapper</classname>s to be configured:</para>
<programlisting language="xml">&lt;bean id="orderFileLineMapper"
class="org.spr...PatternMatchingCompositeLineMapper"&gt;
&lt;property name="tokenizers"&gt;
&lt;map&gt;
&lt;entry key="USER*" value-ref="userTokenizer" /&gt;
&lt;entry key="LINEA*" value-ref="lineATokenizer" /&gt;
&lt;entry key="LINEB*" value-ref="lineBTokenizer" /&gt;
&lt;/map&gt;
&lt;/property&gt;
&lt;property name="fieldSetMappers"&gt;
&lt;map&gt;
&lt;entry key="USER*" value-ref="userFieldSetMapper" /&gt;
&lt;entry key="LINE*" value-ref="lineFieldSetMapper" /&gt;
&lt;/map&gt;
&lt;/property&gt;
&lt;/bean&gt;</programlisting>
<para>In this example, "LINEA" and "LINEB" have separate
<classname>LineTokenizer</classname>s but they both use the same
<classname>FieldSetMapper</classname>.</para>
<para>The <classname>PatternMatchingCompositeLineMapper</classname>
makes use of the <classname>PatternMatcher</classname>'s
<methodname>match</methodname> method in order to select the correct
delegate for each line. The <classname>PatternMatcher</classname>
allows for two wildcard characters with special meaning: the question
mark ("?") will match exactly one character, while the asterisk ("*")
will match zero or more characters. Note that in the configuration
above, all patterns end with an asterisk, making them effectively
prefixes to lines. The <classname>PatternMatcher</classname> will
always match the most specific pattern possible, regardless of the
order in the configuration. So if "LINE*" and "LINEA*" were both
listed as patterns, "LINEA" would match pattern "LINEA*", while
"LINEB" would match pattern "LINE*". Additionally, a single asterisk
("*") can serve as a default by matching any line not matched by any
other pattern.</para>
<programlisting language="xml">&lt;entry key="*" value-ref="defaultLineTokenizer" /&gt;</programlisting>
<para>There is also a
<classname>PatternMatchingCompositeLineTokenizer</classname> that can
be used for tokenization alone.</para>
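<para>The meaning of the two wildcards can be sketched as a small
recursive matcher. This is an illustration in plain Java, not the
<classname>PatternMatcher</classname> source: the
<classname>WildcardSketch</classname> class and its
<methodname>matches</methodname> method are hypothetical names, and the
selection of the most specific pattern is not covered here:</para>
<programlisting language="java">class WildcardSketch {

    // '?' matches exactly one character, '*' matches zero or more characters.
    static boolean matches(String pattern, String text) {
        if (pattern.isEmpty()) {
            return text.isEmpty();
        }
        char first = pattern.charAt(0);
        if (first == '*') {
            // Option 1: the asterisk matches nothing.
            if (matches(pattern.substring(1), text)) {
                return true;
            }
            // Option 2: the asterisk consumes one character and is tried again.
            if (text.isEmpty()) {
                return false;
            }
            return matches(pattern, text.substring(1));
        }
        if (text.isEmpty()) {
            return false;
        }
        if (first == '?' || first == text.charAt(0)) {
            return matches(pattern.substring(1), text.substring(1));
        }
        return false;
    }
}</programlisting>
<para>With this sketch, the pattern <literal>LINEA*</literal> matches
the "LINEA" record from the sample file but not the "LINEB" record,
while <literal>LINE?;*</literal> and <literal>*</literal> match
both.</para>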
<para>It is also common for a flat file to contain records that each
span multiple lines. To handle this situation, a more complex strategy
is required. A demonstration of this common pattern can be found in
<xref linkend="multiLineRecords" />.</para>
</section>
<section id="exceptionHandlingInFlatFiles">
<title>Exception Handling in Flat Files</title>
<para>There are many scenarios in which tokenizing a line may cause
exceptions to be thrown. Many flat files are imperfect and contain
records that aren't formatted correctly. Many users choose to skip
these erroneous lines, logging the issue, the original line, and the
line number. These logs can later be inspected manually or by another
batch job. For this reason, Spring Batch provides a hierarchy of
exceptions for handling parse exceptions:
<classname>FlatFileParseException</classname> and
<classname>FlatFileFormatException</classname>.
<classname>FlatFileParseException</classname> is thrown by the
<classname>FlatFileItemReader</classname> when any errors are
encountered while trying to read a file.
<classname>FlatFileFormatException</classname> is thrown by
implementations of the <classname>LineTokenizer</classname> interface,
and indicates a more specific error encountered while
tokenizing.</para>
<section id="incorrectTokenCountException">
<title>IncorrectTokenCountException</title>
<para>Both <classname>DelimitedLineTokenizer</classname> and
<classname>FixedLengthTokenizer</classname> have the ability to
specify column names that can be used for creating a
<classname>FieldSet</classname>. However, if the number of column
names doesn't match the number of columns found while tokenizing a
line, the <classname>FieldSet</classname> can't be created, and an
<classname>IncorrectTokenCountException</classname> is thrown, which
contains the number of tokens encountered and the number
expected:</para>
<programlisting language="java">tokenizer.setNames(new String[] {"A", "B", "C", "D"});
try {
tokenizer.tokenize("a,b,c");
}
catch(IncorrectTokenCountException e){
assertEquals(4, e.getExpectedCount());
assertEquals(3, e.getActualCount());
}</programlisting>
<para>Because the tokenizer was configured with 4 column names, but
only 3 tokens were found in the line, an
<classname>IncorrectTokenCountException</classname> was
thrown.</para>
</section>
<section id="incorrectLineLengthException">
<title>IncorrectLineLengthException</title>
<para>Files formatted in a fixed length format have additional
requirements when parsing because, unlike a delimited format, each
column must strictly adhere to its predefined width. If the length
of a line does not match the upper bound of the highest configured
range, an exception is thrown:</para>
<programlisting language="java">tokenizer.setColumns(new Range[] { new Range(1, 5),
new Range(6, 10),
new Range(11, 15) });
try {
tokenizer.tokenize("12345");
fail("Expected IncorrectLineLengthException");
}
catch (IncorrectLineLengthException ex) {
assertEquals(15, ex.getExpectedLength());
assertEquals(5, ex.getActualLength());
}</programlisting>
<para>The configured ranges for the tokenizer above are: 1-5, 6-10,
and 11-15, thus the total length of the line expected is 15.
However, in this case a line of length 5 was passed in, causing an
<classname>IncorrectLineLengthException</classname> to be thrown.
Throwing an exception here rather than only mapping the first column
allows the processing of the line to fail earlier, and with more
information than it would if it failed while trying to read in
column 2 in a <classname>FieldSetMapper</classname>. However, there
are scenarios where the length of the line isn't always constant.
For this reason, validation of line length can be turned off via the
'strict' property:</para>
<programlisting language="java">tokenizer.setColumns(new Range[] { new Range(1, 5), new Range(6, 10) });
<emphasis role="bold">tokenizer.setStrict(false);</emphasis>
FieldSet tokens = tokenizer.tokenize("12345");
assertEquals("12345", tokens.readString(0));
assertEquals("", tokens.readString(1));</programlisting>
<para>The above example is almost identical to the one before it,
except that tokenizer.setStrict(false) was called. This setting
tells the tokenizer to not enforce line lengths when tokenizing the
line. A <classname>FieldSet</classname> is now correctly created and
returned. However, it will only contain empty tokens for the
remaining values.</para>
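<para>The difference between strict and lenient handling can be
sketched in plain Java as follows. The
<classname>LenientFixedLengthSketch</classname> class is hypothetical
and throws a generic <classname>IllegalArgumentException</classname>
in place of the <classname>IncorrectLineLengthException</classname>;
it is meant only to show the idea, not the tokenizer's actual
code:</para>
<programlisting language="java">class LenientFixedLengthSketch {

    // Each range is {start, end}: 1-based and inclusive.
    static String[] tokenize(String line, int[][] ranges, boolean strict) {
        int expectedLength = 0;
        for (int[] range : ranges) {
            expectedLength = Math.max(expectedLength, range[1]);
        }
        if (strict) {
            if (line.length() != expectedLength) {
                throw new IllegalArgumentException("Expected line length "
                        + expectedLength + " but was " + line.length());
            }
        }
        String[] tokens = new String[ranges.length];
        int index = 0;
        for (int[] range : ranges) {
            // Clamp both bounds so that missing columns become empty tokens.
            int start = Math.min(range[0] - 1, line.length());
            int end = Math.min(range[1], line.length());
            tokens[index++] = line.substring(start, end);
        }
        return tokens;
    }
}</programlisting>
<para>In lenient mode, the line "12345" with the ranges 1-5 and 6-10
produces the tokens "12345" and "". In strict mode, the same call
fails because the expected length is 10.</para>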
</section>
</section>
</section>
<section id="flatFileItemWriter">
<title>FlatFileItemWriter</title>
<para>Writing out to flat files has the same problems and issues that
reading in from a file must overcome. A step must be able to write out
in either delimited or fixed length formats in a transactional
manner.</para>
<section id="lineAggregator">
<title>LineAggregator</title>
<para>Just as the <classname>LineTokenizer</classname> interface is
necessary to take a <classname>String</classname> and turn it into a
<classname>FieldSet</classname>, file writing must have a way to
aggregate multiple fields into a single string for writing to a file.
In Spring Batch this is the
<classname>LineAggregator</classname>:</para>
<programlisting language="java">public interface LineAggregator&lt;T&gt; {
public String aggregate(T item);
}</programlisting>
<para>The <classname>LineAggregator</classname> is the opposite of a
<classname>LineTokenizer</classname>.
<classname>LineTokenizer</classname> takes a
<classname>String</classname> and returns a
<classname>FieldSet</classname>, whereas
<classname>LineAggregator</classname> takes an
<classname>item</classname> and returns a
<classname>String</classname>.</para>
<section id="PassThroughLineAggregator">
<title>PassThroughLineAggregator</title>
<para>The most basic implementation of the LineAggregator interface
is the <classname>PassThroughLineAggregator</classname>, which
simply assumes that the object is already a string, or that its
string representation is acceptable for writing:</para>
<programlisting language="java">public class PassThroughLineAggregator&lt;T&gt; implements LineAggregator&lt;T&gt; {
public String aggregate(T item) {
return item.toString();
}
}</programlisting>
<para>The above implementation is useful if direct control of
creating the string is required, but the advantages of a
<classname>FlatFileItemWriter</classname>, such as transaction and
restart support, are necessary.</para>
</section>
</section>
<section id="SimplifiedFileWritingExample">
<title>Simplified File Writing Example</title>
<para>Now that the <classname>LineAggregator</classname> interface and
its most basic implementation,
<classname>PassThroughLineAggregator</classname>, have been defined,
the basic flow of writing can be explained:</para>
<orderedlist>
<listitem>
<para>The object to be written is passed to the
<classname>LineAggregator</classname> in order to obtain a
<classname>String</classname>.</para>
</listitem>
<listitem>
<para>The returned <classname>String</classname> is written to the
configured file.</para>
</listitem>
</orderedlist>
<para>The following excerpt from the
<classname>FlatFileItemWriter</classname> expresses this in
code:</para>
<programlisting language="java">public void write(T item) throws Exception {
write(lineAggregator.aggregate(item) + LINE_SEPARATOR);
}</programlisting>
<para>A simple configuration would look like the following:</para>
<programlisting language="xml">&lt;bean id="itemWriter" class="org.spr...FlatFileItemWriter"&gt;
&lt;property name="resource" value="file:target/test-outputs/output.txt" /&gt;
&lt;property name="lineAggregator"&gt;
&lt;bean class="org.spr...PassThroughLineAggregator"/&gt;
&lt;/property&gt;
&lt;/bean&gt;</programlisting>
</section>
<section id="FieldExtractor">
<title>FieldExtractor</title>
<para>The above example may be useful for the most basic uses of
writing to a file. However, most users of the
<classname>FlatFileItemWriter</classname> will have a domain object
that needs to be written out, and thus must be converted into a line.
In file reading, the following was required:<orderedlist>
<listitem>
<para>Read one line from the file.</para>
</listitem>
<listitem>
<para>Pass the string line into the
<methodname>LineTokenizer#tokenize</methodname>() method, in
order to retrieve a <classname>FieldSet</classname></para>
</listitem>
<listitem>
<para>Pass the <classname>FieldSet</classname> returned from
tokenizing to a <classname>FieldSetMapper</classname>, returning
the result from the <methodname>ItemReader#read</methodname>()
method</para>
</listitem>
</orderedlist></para>
<para>File writing has similar, but inverse steps:</para>
<orderedlist>
<listitem>
<para>Pass the item to be written to the writer.</para>
</listitem>
<listitem>
<para>Convert the fields on the item into an array.</para>
</listitem>
<listitem>
<para>Aggregate the resulting array into a line.</para>
</listitem>
</orderedlist>
<para>Because there is no way for the framework to know which fields
from the object need to be written out, a
<classname>FieldExtractor</classname> must be written to accomplish
the task of turning the item into an array:</para>
<programlisting language="java">public interface FieldExtractor&lt;T&gt; {
Object[] extract(T item);
}</programlisting>
<para>Implementations of the <classname>FieldExtractor</classname>
interface should create an array from the fields of the provided
object, which can then be written out with a delimiter between the
elements, or as part of a fixed-width line.</para>
<section id="PassThroughFieldExtractor">
<title>PassThroughFieldExtractor</title>
<para>There are many cases where a collection, such as an array,
<classname>Collection</classname>, or
<classname>FieldSet</classname>, needs to be written out.
"Extracting" an array from one of these collection types is very
straightforward: simply convert the collection to an array.
Therefore, the <classname>PassThroughFieldExtractor</classname>
should be used in this scenario. Note that if the
object passed in is not a type of collection, then the
<classname>PassThroughFieldExtractor</classname> will return an
array containing solely the item to be extracted.</para>
</section>
<section id="BeanWrapperFieldExtractor">
<title>BeanWrapperFieldExtractor</title>
<para>As with the <classname>BeanWrapperFieldSetMapper</classname>
described in the file reading section, it is often preferable to
configure how to convert a domain object to an object array, rather
than writing the conversion yourself. The
<classname>BeanWrapperFieldExtractor</classname> provides just this
type of functionality:</para>
<programlisting language="java">BeanWrapperFieldExtractor&lt;Name&gt; extractor = new BeanWrapperFieldExtractor&lt;Name&gt;();
extractor.setNames(new String[] { "first", "last", "born" });
String first = "Alan";
String last = "Turing";
int born = 1912;
Name n = new Name(first, last, born);
Object[] values = extractor.extract(n);
assertEquals(first, values[0]);
assertEquals(last, values[1]);
assertEquals(born, values[2]);</programlisting>
<para>This extractor implementation has only one required property,
the names of the fields to map. Just as the
<classname>BeanWrapperFieldSetMapper</classname> needs field names
to map fields on the <classname>FieldSet</classname> to setters on
the provided object, the
<classname>BeanWrapperFieldExtractor</classname> needs names to map
to getters for creating an object array. It is worth noting that the
order of the names determines the order of the fields within the
array.</para>
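<para>The getter lookup can be illustrated with plain JavaBeans
introspection. The following is a sketch under stated assumptions, not
the <classname>BeanWrapperFieldExtractor</classname> implementation:
the <classname>GetterExtractionSketch</classname> class and its nested
<classname>Name</classname> bean are hypothetical, and a name without
a matching getter simply leaves a <literal>null</literal> in the
array:</para>
<programlisting language="java">import java.beans.Introspector;
import java.beans.PropertyDescriptor;

class GetterExtractionSketch {

    // Hypothetical bean standing in for the Name class used above.
    public static class Name {
        private final String first;
        private final String last;
        private final int born;
        public Name(String first, String last, int born) {
            this.first = first;
            this.last = last;
            this.born = born;
        }
        public String getFirst() { return first; }
        public String getLast() { return last; }
        public int getBorn() { return born; }
    }

    // Reads the named properties through their getters, in the order given.
    static Object[] extract(Object item, String[] names) {
        Object[] values = new Object[names.length];
        try {
            PropertyDescriptor[] properties =
                    Introspector.getBeanInfo(item.getClass()).getPropertyDescriptors();
            for (int i = 0; i != names.length; i++) {
                for (PropertyDescriptor property : properties) {
                    if (property.getName().equals(names[i])) {
                        values[i] = property.getReadMethod().invoke(item);
                    }
                }
            }
        }
        catch (Exception e) {
            throw new IllegalStateException("Could not extract from " + item, e);
        }
        return values;
    }
}</programlisting>
<para>Because the outer loop follows the <literal>names</literal>
array, reordering the names reorders the extracted values
accordingly.</para>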
</section>
</section>
<section id="delimitedFileWritingExample">
<title>Delimited File Writing Example</title>
<para>The most basic flat file format is one in which all fields are
separated by a delimiter. This can be accomplished using a
<classname>DelimitedLineAggregator</classname>. The example below
writes out a simple domain object that represents a credit to a
customer account:</para>
<programlisting language="java">public class CustomerCredit {
private int id;
private String name;
private BigDecimal credit;
//getters and setters removed for clarity
}</programlisting>
<para>Because a domain object is being used, an implementation of the
FieldExtractor interface must be provided, along with the delimiter to
use:</para>
<programlisting language="xml">&lt;bean id="itemWriter" class="org.springframework.batch.item.file.FlatFileItemWriter"&gt;
&lt;property name="resource" ref="outputResource" /&gt;
&lt;property name="lineAggregator"&gt;
&lt;bean class="org.spr...DelimitedLineAggregator"&gt;
&lt;property name="delimiter" value=","/&gt;
&lt;property name="fieldExtractor"&gt;
&lt;bean class="org.spr...BeanWrapperFieldExtractor"&gt;
&lt;property name="names" value="name,credit"/&gt;
&lt;/bean&gt;
&lt;/property&gt;
&lt;/bean&gt;
&lt;/property&gt;
&lt;/bean&gt;</programlisting>
<para>In this case, the
<classname>BeanWrapperFieldExtractor</classname> described earlier in
this chapter is used to turn the name and credit fields within
<classname>CustomerCredit</classname> into an object array, which is
then written out with commas between each field.</para>
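<para>The aggregation step itself amounts to joining the extracted
values with the delimiter. A minimal sketch in plain Java, using the
hypothetical <classname>DelimitedAggregationSketch</classname> class
rather than the actual <classname>DelimitedLineAggregator</classname>,
might look like this:</para>
<programlisting language="java">class DelimitedAggregationSketch {

    // Joins the string form of each field, placing the delimiter between fields.
    static String aggregate(Object[] fields, String delimiter) {
        StringBuilder line = new StringBuilder();
        boolean first = true;
        for (Object field : fields) {
            if (!first) {
                line.append(delimiter);
            }
            line.append(field);
            first = false;
        }
        return line.toString();
    }
}</programlisting>
<para>For an array holding the name <literal>customer1</literal> and a
credit of <literal>31.11</literal>, the result is
<literal>customer1,31.11</literal>.</para>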
</section>
<section id="fixedWidthFileWritingExample">
<title>Fixed Width File Writing Example</title>
<para>Delimited is not the only type of flat file format. Many prefer
to use a set width for each column to delineate between fields, which
is usually referred to as 'fixed width'. Spring Batch supports this in
file writing via the <classname>FormatterLineAggregator</classname>.
Using the same <classname>CustomerCredit</classname> domain object
described above, it can be configured as follows:</para>
<programlisting language="xml">&lt;bean id="itemWriter" class="org.springframework.batch.item.file.FlatFileItemWriter"&gt;
&lt;property name="resource" ref="outputResource" /&gt;
&lt;property name="lineAggregator"&gt;
&lt;bean class="org.spr...FormatterLineAggregator"&gt;
&lt;property name="fieldExtractor"&gt;
&lt;bean class="org.spr...BeanWrapperFieldExtractor"&gt;
&lt;property name="names" value="name,credit" /&gt;
&lt;/bean&gt;
&lt;/property&gt;
&lt;property name="format" value="%-9s%-2.0f" /&gt;
&lt;/bean&gt;
&lt;/property&gt;
&lt;/bean&gt;</programlisting>
<para>Most of the above example should look familiar. However, the
value of the format property is new:</para>
<programlisting language="xml">&lt;property name="format" value="%-9s%-2.0f" /&gt;</programlisting>
<para>The underlying implementation is built using the same
<classname>Formatter</classname> added as part of Java 5. The Java
<classname>Formatter</classname> is based on the
<methodname>printf</methodname> functionality of the C programming
language. Most details on how to configure a formatter can be found in
the javadoc of <ulink
url="http://java.sun.com/j2se/1.5.0/docs/api/java/util/Formatter.html"><citetitle>Formatter</citetitle></ulink>.</para>
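<para>The effect of the format string can be tried directly with the
JDK. The sketch below calls <methodname>String.format</methodname>,
which uses the same <classname>Formatter</classname> syntax; the
<classname>FormatSketch</classname> class is a hypothetical helper,
and the fixed <classname>Locale</classname> is an assumption made to
keep the output predictable:</para>
<programlisting language="java">import java.math.BigDecimal;
import java.util.Locale;

class FormatSketch {

    static String format(String name, BigDecimal credit) {
        // %-9s   left-justified string, padded to at least 9 characters
        // %-2.0f left-justified number, at least 2 characters, no fraction digits
        return String.format(Locale.ROOT, "%-9s%-2.0f", name, credit);
    }
}</programlisting>
<para>For the name <literal>Smith</literal> and a credit of
<literal>31.11</literal>, this returns <literal>Smith</literal>
followed by four spaces and <literal>31</literal>. Note that a width
in a format string is a minimum: a longer name is not truncated by
the format string alone.</para>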
</section>
<section id="handlingFileCreation">
<title>Handling File Creation</title>
<para><classname>FlatFileItemReader</classname> has a very simple
relationship with file resources. When the reader is initialized, it
opens the file if it exists, and throws an exception if it does not.
File writing isn't quite so simple. At first glance, it seems like a
similarly straightforward contract should exist for
<classname>FlatFileItemWriter</classname>: if the file already exists,
throw an exception, and if it does not, create it and start writing.
However, potentially restarting a <classname>Job</classname> can cause
issues. In normal restart scenarios, the contract is reversed: if the
file exists, start writing to it from the last known good position,
and if it does not, throw an exception. But what happens if the
file name for this job is always the same? In this case, you would
want to delete the file if it exists, unless it's a restart. Because
of this possibility, the <classname>FlatFileItemWriter</classname>
contains the property <methodname>shouldDeleteIfExists</methodname>.
Setting this property to true will cause an existing file with the
same name to be deleted when the writer is opened.</para>
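<para>The contract described in this section can be summarized as a
small decision function. This is only a restatement of the prose in
code: the <classname>FileCreationSketch</classname> class and the
returned action names are hypothetical and do not correspond to
anything in the <classname>FlatFileItemWriter</classname>
itself:</para>
<programlisting language="java">class FileCreationSketch {

    static String decide(boolean fileExists, boolean restart, boolean shouldDeleteIfExists) {
        if (restart) {
            // On restart the file must already be there to continue writing.
            return fileExists ? "APPEND_FROM_LAST_POSITION" : "FAIL";
        }
        if (!fileExists) {
            return "CREATE";
        }
        // Fresh run with a leftover file of the same name.
        return shouldDeleteIfExists ? "DELETE_AND_CREATE" : "FAIL";
    }
}</programlisting>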
</section>
</section>
</section>
<section id="xmlReadingWriting">
<title id="infrastructure.2.3">XML Item Readers and Writers</title>
<para>Spring Batch provides transactional infrastructure for both reading
XML records and mapping them to Java objects as well as writing Java
objects as XML records.</para>
<note>
<title>Constraints on streaming XML</title>
<para>The StAX API is used for I/O as other standard XML parsing APIs do
not fit batch processing requirements (DOM loads the whole input into
memory at once and SAX controls the parsing process allowing the user
only to provide callbacks).</para>
</note>
<para>Let's take a closer look at how XML input and output works in Spring
Batch. First, there are a few concepts that vary from file reading and
writing but are common across Spring Batch XML processing. With XML
processing, instead of lines of records (FieldSets) that need to be
tokenized, it is assumed an XML resource is a collection of 'fragments'
corresponding to individual records:</para>
<para><mediaobject>
<imageobject role="html">
<imagedata align="center" fileref="images/xmlinput.png" format="PNG"
scale="65" />
</imageobject>
<imageobject role="fo">
<imagedata align="center" fileref="images/xmlinput.png" format="PNG"
scale="45" />
</imageobject>
<caption><para>Figure 3.1: XML Input</para></caption>
</mediaobject></para>
<para>The 'trade' tag is defined as the 'root element' in the scenario
above. Everything between '&lt;trade&gt;' and '&lt;/trade&gt;' is
considered one 'fragment'. Spring Batch uses Object/XML Mapping (OXM) to
bind fragments to objects. However, Spring Batch is not tied to any
particular XML binding technology. Typical use is to delegate to <ulink
url="http://docs.spring.io/spring-ws/site/reference/html/oxm.html"><citetitle>Spring
OXM</citetitle></ulink>, which provides uniform abstraction for the most
popular OXM technologies. The dependency on Spring OXM is optional and you
can choose to implement Spring Batch specific interfaces if desired. The
relationship to the technologies that OXM supports can be shown as the
following:</para>
<para><mediaobject>
<imageobject role="html">
<imagedata align="center" fileref="images/oxm-fragments.png"
format="PNG" scale="60" />
</imageobject>
<imageobject role="fo">
<imagedata align="center" fileref="images/oxm-fragments.png"
format="PNG" scale="45" />
</imageobject>
<caption><para>Figure 3.2: OXM Binding</para></caption>
</mediaobject></para>
<para>Now with an introduction to OXM and how one can use XML fragments to
represent records, let's take a closer look at readers and writers.</para>
<section id="StaxEventItemReader">
<title>StaxEventItemReader</title>
<para>The <classname>StaxEventItemReader</classname> configuration
provides a typical setup for the processing of records from an XML input
stream. First, let's examine a set of XML records that the
<classname>StaxEventItemReader</classname> can process.</para>
<programlisting language="xml">&lt;?xml version="1.0" encoding="UTF-8"?&gt;
&lt;records&gt;
&lt;trade xmlns="http://springframework.org/batch/sample/io/oxm/domain"&gt;
&lt;isin&gt;XYZ0001&lt;/isin&gt;
&lt;quantity&gt;5&lt;/quantity&gt;
&lt;price&gt;11.39&lt;/price&gt;
&lt;customer&gt;Customer1&lt;/customer&gt;
&lt;/trade&gt;
&lt;trade xmlns="http://springframework.org/batch/sample/io/oxm/domain"&gt;
&lt;isin&gt;XYZ0002&lt;/isin&gt;
&lt;quantity&gt;2&lt;/quantity&gt;
&lt;price&gt;72.99&lt;/price&gt;
&lt;customer&gt;Customer2c&lt;/customer&gt;
&lt;/trade&gt;
&lt;trade xmlns="http://springframework.org/batch/sample/io/oxm/domain"&gt;
&lt;isin&gt;XYZ0003&lt;/isin&gt;
&lt;quantity&gt;9&lt;/quantity&gt;
&lt;price&gt;99.99&lt;/price&gt;
&lt;customer&gt;Customer3&lt;/customer&gt;
&lt;/trade&gt;
&lt;/records&gt;</programlisting>
<para>To be able to process the XML records the following is needed:
<itemizedlist>
<listitem>
<para>Root Element Name - Name of the root element of the fragment
that constitutes the object to be mapped. The example
configuration demonstrates this with the value of trade.</para>
</listitem>
<listitem>
<para>Resource - Spring Resource that represents the file to be
read.</para>
</listitem>
<listitem>
<para><classname>Unmarshaller</classname> - Unmarshalling
facility provided by Spring OXM for mapping the XML fragment to an
object.</para>
</listitem>
</itemizedlist></para>
<programlisting language="xml">&lt;bean id="itemReader" class="org.springframework.batch.item.xml.StaxEventItemReader"&gt;
&lt;property name="fragmentRootElementName" value="trade" /&gt;
&lt;property name="resource" value="data/iosample/input/input.xml" /&gt;
&lt;property name="unmarshaller" ref="tradeMarshaller" /&gt;
&lt;/bean&gt;</programlisting>
<para>Notice that in this example we have chosen to use an
<classname>XStreamMarshaller</classname> which accepts an alias passed
in as a map with the first key and value being the name of the fragment
(i.e. root element) and the object type to bind. Then, similar to a
<classname>FieldSet</classname>, the names of the other elements that
map to fields within the object type are described as key/value pairs in
the map. In the configuration file we can use a Spring configuration
utility to describe the required alias as follows:</para>
<programlisting language="xml">&lt;bean id="tradeMarshaller"
class="org.springframework.oxm.xstream.XStreamMarshaller"&gt;
&lt;property name="aliases"&gt;
<emphasis role="bold"> &lt;util:map id="aliases"&gt;
&lt;entry key="trade"
value="org.springframework.batch.sample.domain.Trade" /&gt;
&lt;entry key="price" value="java.math.BigDecimal" /&gt;
&lt;entry key="customer" value="java.lang.String" /&gt;
&lt;/util:map&gt;</emphasis>
&lt;/property&gt;
&lt;/bean&gt;</programlisting>
<para>On input the reader reads the XML resource until it recognizes
that a new fragment is about to start (by matching the tag name by
default). The reader creates a standalone XML document from the fragment
(or at least makes it appear so) and passes the document to a
deserializer (typically a wrapper around a Spring OXM
<classname>Unmarshaller</classname>) to map the XML to a Java
object.</para>
<para>In summary, this procedure is analogous to the following scripted
Java code which uses the injection provided by the Spring
configuration:</para>
<programlisting language="java">// xmlResource is assumed to be a String holding the XML records shown above
StaxEventItemReader&lt;Trade&gt; xmlStaxEventItemReader = new StaxEventItemReader&lt;Trade&gt;();
Resource resource = new ByteArrayResource(xmlResource.getBytes());
Map&lt;String, String&gt; aliases = new HashMap&lt;String, String&gt;();
aliases.put("trade","org.springframework.batch.sample.domain.Trade");
aliases.put("price","java.math.BigDecimal");
aliases.put("customer","java.lang.String");
XStreamMarshaller unmarshaller = new XStreamMarshaller();
unmarshaller.setAliases(aliases);
xmlStaxEventItemReader.setUnmarshaller(unmarshaller);
xmlStaxEventItemReader.setResource(resource);
xmlStaxEventItemReader.setFragmentRootElementName("trade");
xmlStaxEventItemReader.open(new ExecutionContext());
boolean hasNext = true;
Trade trade = null;
while (hasNext) {
    trade = xmlStaxEventItemReader.read();
    if (trade == null) {
        hasNext = false;
    }
    else {
        System.out.println(trade);
    }
}</programlisting>
</section>
<section id="StaxEventItemWriter">
<title>StaxEventItemWriter</title>
<para>Output works symmetrically to input. The
<classname>StaxEventItemWriter</classname> needs a
<classname>Resource</classname>, a marshaller, and a <literal>rootTagName</literal>. A Java
object is passed to a marshaller (typically a standard Spring OXM
<classname>Marshaller</classname>) which writes to a
<classname>Resource</classname> using a custom event writer that filters
the <classname>StartDocument</classname> and
<classname>EndDocument</classname> events produced for each fragment by
the OXM tools. We'll show this in an example using an
<classname>XStreamMarshaller</classname>. The Spring
configuration for this setup looks as follows:</para>
<programlisting language="xml">&lt;bean id="itemWriter" class="org.springframework.batch.item.xml.StaxEventItemWriter"&gt;
&lt;property name="resource" ref="outputResource" /&gt;
&lt;property name="marshaller" ref="customerCreditMarshaller" /&gt;
&lt;property name="rootTagName" value="customers" /&gt;
&lt;property name="overwriteOutput" value="true" /&gt;
&lt;/bean&gt;</programlisting>
<para>The configuration sets up the three required properties and
sets the optional <literal>overwriteOutput</literal> property to
<literal>true</literal>, mentioned earlier in the chapter for specifying
whether an existing file can be overwritten. Note that the marshaller
used for the writer is configured in the same way as the one used in the
reading example from earlier in the chapter:</para>
<programlisting language="xml">&lt;bean id="customerCreditMarshaller"
class="org.springframework.oxm.xstream.XStreamMarshaller"&gt;
&lt;property name="aliases"&gt;
&lt;util:map id="aliases"&gt;
&lt;entry key="customer"
value="org.springframework.batch.sample.domain.CustomerCredit" /&gt;
&lt;entry key="credit" value="java.math.BigDecimal" /&gt;
&lt;entry key="name" value="java.lang.String" /&gt;
&lt;/util:map&gt;
&lt;/property&gt;
&lt;/bean&gt;</programlisting>
<para>To summarize with a Java example, the following code illustrates
all of the points discussed, demonstrating the programmatic setup of the
required properties:</para>
<programlisting language="java">StaxEventItemWriter&lt;CustomerCredit&gt; staxItemWriter = new StaxEventItemWriter&lt;CustomerCredit&gt;();
FileSystemResource resource = new FileSystemResource("data/outputFile.xml");
Map&lt;String, String&gt; aliases = new HashMap&lt;String, String&gt;();
aliases.put("customer","org.springframework.batch.sample.domain.CustomerCredit");
aliases.put("credit","java.math.BigDecimal");
aliases.put("name","java.lang.String");
// declared as XStreamMarshaller because setAliases is not part of the Marshaller interface
XStreamMarshaller marshaller = new XStreamMarshaller();
marshaller.setAliases(aliases);
staxItemWriter.setResource(resource);
staxItemWriter.setMarshaller(marshaller);
staxItemWriter.setRootTagName("customers");
staxItemWriter.setOverwriteOutput(true);
ExecutionContext executionContext = new ExecutionContext();
staxItemWriter.open(executionContext);
CustomerCredit credit = new CustomerCredit();
credit.setCredit(new BigDecimal("11.39"));
credit.setName("Customer1");
// write() takes a list of items (one chunk)
staxItemWriter.write(Collections.singletonList(credit));
// closing the writer finishes the document by writing the closing root tag
staxItemWriter.close();</programlisting>
</section>
</section>
<section id="multiFileInput">
<title>Multi-File Input</title>
<para>It is a common requirement to process multiple files within a single
<classname>Step</classname>. Assuming the files all have the same
formatting, the <classname>MultiResourceItemReader</classname> supports
this type of input for both XML and flat file processing. Consider the
following files in a directory:</para>
<programlisting>file-1.txt file-2.txt ignored.txt</programlisting>
<para>file-1.txt and file-2.txt are formatted the same and for business
reasons should be processed together. The
<classname>MultiResourceItemReader</classname> can be used to read in both
files by using wildcards:</para>
<programlisting language="xml">&lt;bean id="multiResourceReader" class="org.spr...MultiResourceItemReader"&gt;
&lt;property name="resources" value="classpath:data/input/file-*.txt" /&gt;
&lt;property name="delegate" ref="flatFileItemReader" /&gt;
&lt;/bean&gt;</programlisting>
<para>The referenced delegate is a simple
<classname>FlatFileItemReader</classname>. The above configuration will
read input from both files, handling rollback and restart scenarios. It
should be noted that, as with any <classname>ItemReader</classname>,
adding extra input (in this case a file) could cause potential issues when
restarting. It is recommended that batch jobs work with their own
individual directories until completed successfully.</para>
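<para>The delegation idea behind reading several files through one reader
can be sketched in plain Java. This is only an illustration under
simplifying assumptions: <classname>MultiResourceSketch</classname>,
<classname>LineReader</classname> and <classname>MultiReader</classname>
are hypothetical names invented for this example, each "file" is an
in-memory array of lines, and restart handling is left out entirely. It is
not the <classname>MultiResourceItemReader</classname>
implementation.</para>
<programlisting language="java">// Illustrative stand-in only; this is not the MultiResourceItemReader implementation.
class MultiResourceSketch {

    /** Reads the lines of one "file", one per call; null signals the end. */
    static class LineReader {
        private final String[] lines;
        private int next = 0;

        LineReader(String[] lines) {
            this.lines = lines;
        }

        String read() {
            return next == lines.length ? null : lines[next++];
        }
    }

    /** Reads several "files" in order by handing each one to a fresh delegate. */
    static class MultiReader {
        private final String[][] resources;
        private int currentResource = 0;
        private LineReader delegate;

        MultiReader(String[][] resources) {
            this.resources = resources;
        }

        String read() {
            while (currentResource != resources.length) {
                if (delegate == null) {
                    delegate = new LineReader(resources[currentResource]);
                }
                String item = delegate.read();
                if (item != null) {
                    return item;
                }
                // current file is exhausted: move on to the next one
                delegate = null;
                currentResource++;
            }
            return null; // all files are exhausted
        }
    }

    public static void main(String[] args) {
        String[][] files = { { "file-1 line 1", "file-1 line 2" }, { "file-2 line 1" } };
        MultiReader reader = new MultiReader(files);
        String line;
        while ((line = reader.read()) != null) {
            System.out.println(line);
        }
    }
}</programlisting>
<para>From the caller's point of view the two files look like one
continuous input, which is why adding a file between a failed run and its
restart can shift the position at which reading resumes.</para>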
</section>
<section id="database">
<title id="infrastructure.2.2">Database</title>
<para>As in most enterprise application styles, a database is the central
storage mechanism for batch. However, batch differs from other application
styles due to the sheer size of the datasets with which the system must
work. If a SQL statement returns 1 million rows, the result set probably
holds all returned results in memory until all rows have been read. Spring
Batch provides two types of solutions for this problem: cursor based and
paging database ItemReaders.</para>
<section id="cursorBasedItemReaders">
<title>Cursor Based ItemReaders</title>
<para>Using a database cursor is generally the default approach of most
batch developers, because it is the database's solution to the problem
of 'streaming' relational data. The Java
<classname>ResultSet</classname> class is essentially an
object-oriented mechanism for manipulating a cursor. A
<classname>ResultSet</classname> maintains a cursor to the current row
of data. Calling <methodname>next</methodname> on a
<classname>ResultSet</classname> moves this cursor to the next row.
Spring Batch cursor based ItemReaders open a cursor on
initialization, and move the cursor forward one row for every call to
<methodname>read</methodname>, returning a mapped object that can be
used for processing. The <methodname>close</methodname> method is then
called to ensure all resources are freed up. The Spring core
<classname>JdbcTemplate</classname> handles resource cleanup by using
the callback pattern to completely map all rows in a
<classname>ResultSet</classname> and close it before returning control
to the method caller. In batch, however, closing must wait until the
step is complete. Below is a generic diagram of how a cursor based
<classname>ItemReader</classname> works, and while a SQL statement is
used as an example since it is so widely known, any technology could
implement the basic approach:</para>
<mediaobject>
<imageobject role="html">
<imagedata align="center" fileref="images/cursorExample.png"
scale="65" />
</imageobject>
<imageobject role="fo">
<imagedata align="center" fileref="images/cursorExample.png"
scale="35" />
</imageobject>
</mediaobject>
<para>This example illustrates the basic pattern. Given a 'FOO' table,
which has three columns: ID, NAME, and BAR, select all rows with an ID
greater than 1 but less than 7. This puts the beginning of the cursor
(row 1) on ID 2. The result of this row should be a completely mapped
Foo object. Calling <methodname>read</methodname>() again moves the
cursor to the next row, which is the Foo with an ID of 3. The results of
these reads will be written out after each
<methodname>read</methodname>, thus allowing the objects to be garbage
collected (assuming no instance variables are maintaining references to
them).</para>
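<para>The pattern described above (one row per call to
<methodname>read</methodname>, with <literal>null</literal> marking the end
of the data) can be sketched in plain Java. This is a minimal illustration
under stated assumptions: <classname>CursorSketch</classname>,
<classname>Foo</classname> and <classname>FooCursorReader</classname> are
hypothetical names invented for this example, the selected rows are held
in an array instead of being fetched from a database, and no Spring Batch
or JDBC classes are involved.</para>
<programlisting language="java">// Illustrative stand-in only; this is not a Spring Batch class.
class CursorSketch {

    /** Minimal row type for the FOO table described above. */
    static class Foo {
        final int id;
        final String name;
        final String bar;

        Foo(int id, String name, String bar) {
            this.id = id;
            this.name = name;
            this.bar = bar;
        }
    }

    /** Holds a position into the selected rows, as a cursor would. */
    static class FooCursorReader {
        private final Foo[] rows;
        private int position = 0; // index of the row the next read() returns

        FooCursorReader(Foo[] rows) {
            this.rows = rows;
        }

        /** Returns the current row and advances by one; null signals the end. */
        Foo read() {
            if (position == rows.length) {
                return null;
            }
            return rows[position++];
        }
    }

    /** Stand-in for the query result: IDs 2 to 6, in ID order. */
    static Foo[] selectedRows() {
        Foo[] rows = new Foo[5];
        for (int i = 0; i != rows.length; i++) {
            int id = i + 2;
            rows[i] = new Foo(id, "name" + id, "bar" + id);
        }
        return rows;
    }

    public static void main(String[] args) {
        FooCursorReader reader = new FooCursorReader(selectedRows());
        Foo foo;
        while ((foo = reader.read()) != null) {
            // each item can be written out here; nothing keeps a reference to it afterwards
            System.out.println(foo.id + " " + foo.name + " " + foo.bar);
        }
    }
}</programlisting>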
<section id="JdbcCursorItemReader">
<title>JdbcCursorItemReader</title>
<para><classname>JdbcCursorItemReader</classname> is the JDBC
implementation of the cursor based technique. It works directly with a
<classname>ResultSet</classname> and requires a SQL statement to run
against a connection obtained from a
<classname>DataSource</classname>. The following database schema will
be used as an example:</para>
<programlisting language="sql">CREATE TABLE CUSTOMER (
ID BIGINT IDENTITY PRIMARY KEY,
NAME VARCHAR(45),
CREDIT FLOAT
);</programlisting>
<para>Many people prefer to use a domain object for each row, so we'll
use an implementation of the <classname>RowMapper</classname>
interface to map a <classname>CustomerCredit</classname>
object:</para>
<programlisting language="java">import java.sql.ResultSet;
import java.sql.SQLException;
import org.springframework.jdbc.core.RowMapper;

public class CustomerCreditRowMapper implements RowMapper&lt;CustomerCredit&gt; {

    public static final String ID_COLUMN = "id";
    public static final String NAME_COLUMN = "name";
    public static final String CREDIT_COLUMN = "credit";

    // Called once per row; maps the current row to a new CustomerCredit
    public CustomerCredit mapRow(ResultSet rs, int rowNum) throws SQLException {
        CustomerCredit customerCredit = new CustomerCredit();
        customerCredit.setId(rs.getInt(ID_COLUMN));
        customerCredit.setName(rs.getString(NAME_COLUMN));
        customerCredit.setCredit(rs.getBigDecimal(CREDIT_COLUMN));
        return customerCredit;
    }
}</programlisting>
<para>Because <classname>JdbcTemplate</classname> is so familiar to
users of Spring, and the <classname>JdbcCursorItemReader</classname>
shares key interfaces with it, it is useful to see an example of how
to read in this data with <classname>JdbcTemplate</classname>, in
order to contrast it with the <classname>ItemReader</classname>. For
the purposes of this example, let's assume there are 1,000 rows in the
CUSTOMER table. The first example will be using
<classname>JdbcTemplate</classname>:</para>
<programlisting language="java">// For simplicity's sake, assume a dataSource has already been obtained
JdbcTemplate jdbcTemplate = new JdbcTemplate(dataSource);
List customerCredits = jdbcTemplate.query("SELECT ID, NAME, CREDIT from CUSTOMER",
        new CustomerCreditRowMapper());</programlisting>
<para>After running this code snippet the customerCredits list will
contain 1,000 <classname>CustomerCredit</classname> objects. In the
query method, a connection will be obtained from the
<classname>DataSource</classname>, the provided SQL will be run
against it, and the <methodname>mapRow</methodname> method will be
called for each row in the <classname>ResultSet</classname>. Let's
contrast this with the approach of the
<classname>JdbcCursorItemReader</classname>:</para>
<programlisting language="java">JdbcCursorItemReader itemReader = new JdbcCursorItemReader();
itemReader.setDataSource(dataSource);
itemReader.setSql("SELECT ID, NAME, CREDIT from CUSTOMER");
itemReader.setRowMapper(new CustomerCreditRowMapper());
int counter = 0;
ExecutionContext executionContext = new ExecutionContext();
itemReader.open(executionContext);
// read() returns null once the cursor is exhausted, so only real rows are counted
Object customerCredit = itemReader.read();
while (customerCredit != null) {
    counter++;
    customerCredit = itemReader.read();
}
itemReader.close();</programlisting>
<para>After running this code snippet the counter will equal 1,000. If
the code above had put the returned customerCredit into a list, the
result would have been exactly the same as with the
<classname>JdbcTemplate</classname> example. However, the big
advantage of the <classname>ItemReader</classname> is that it allows
items to be 'streamed'. The <methodname>read</methodname> method can
be called once, and the item written out via an
<classname>ItemWriter</classname>, and then the next item obtained via
<methodname>read</methodname>. This allows item reading and writing to
be done in 'chunks' and committed periodically, which is the essence
of high performance batch processing. Furthermore, it is very easily
configured for injection into a Spring Batch
<classname>Step</classname>:</para>
<programlisting language="xml">&lt;bean id="itemReader" class="org.spr...JdbcCursorItemReader"&gt;
&lt;property name="dataSource" ref="dataSource"/&gt;
&lt;property name="sql" value="select ID, NAME, CREDIT from CUSTOMER"/&gt;
&lt;property name="rowMapper"&gt;
&lt;bean class="org.springframework.batch.sample.domain.CustomerCreditRowMapper"/&gt;
&lt;/property&gt;
&lt;/bean&gt;</programlisting>
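<para>The 'read one item at a time, write and commit in chunks' idea
mentioned above can be sketched in plain Java. This is only an
illustration under simplifying assumptions:
<classname>ChunkSketch</classname>,
<methodname>processInChunks</methodname> and
<methodname>write</methodname> are hypothetical names invented for this
example, a commit is modeled as a simple counter, and this is not how a
Spring Batch <classname>Step</classname> is actually implemented.</para>
<programlisting language="java">// Illustrative stand-in only; this is not how a Spring Batch Step is implemented.
class ChunkSketch {

    /** Reads rows one at a time and "commits" after every commitInterval items. */
    static int processInChunks(String[] rows, int commitInterval) {
        String[] chunk = new String[commitInterval];
        int inChunk = 0; // items read since the last commit
        int commits = 0;
        for (String row : rows) {      // stands in for repeated read() calls
            chunk[inChunk++] = row;
            if (inChunk == commitInterval) {
                write(chunk, inChunk); // hand the whole chunk to the writer
                commits++;             // then commit the transaction
                inChunk = 0;
            }
        }
        if (inChunk != 0) {            // final, partially filled chunk
            write(chunk, inChunk);
            commits++;
        }
        return commits;
    }

    /** Stand-in for an ItemWriter; only the first count entries are valid. */
    static void write(String[] chunk, int count) {
        for (int i = 0; i != count; i++) {
            System.out.println("writing " + chunk[i]);
        }
    }

    public static void main(String[] args) {
        String[] rows = { "r1", "r2", "r3", "r4", "r5" };
        System.out.println("commits: " + processInChunks(rows, 2));
    }
}</programlisting>
<para>At no point does the sketch hold more than
<literal>commitInterval</literal> items, which is the property that
distinguishes this approach from loading the whole result into a
list.</para>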
<section id="JdbcCursorItemReaderProperties">
<title>Additional Properties</title>
<para>Because there are so many varying options for opening a cursor
in Java, there are many properties on the
<classname>JdbcCursorItemReader</classname> that can be set:</para>
<table>
<title>JdbcCursorItemReader Properties</title>
<tgroup cols="2">
<tbody>
<row>
<entry>ignoreWarnings</entry>
<entry>Determines whether SQLWarnings are logged or cause an
exception. The default is true, meaning that warnings are
logged.</entry>
</row>
<row>
<entry>fetchSize</entry>
<entry>Gives the Jdbc driver a hint as to the number of rows
that should be fetched from the database when more rows are
needed by the <classname>ResultSet</classname> object used
by the <classname>ItemReader</classname>. By default, no
hint is given.</entry>
</row>
<row>
<entry>maxRows</entry>
<entry>Sets the limit for the maximum number of rows the
underlying <classname>ResultSet</classname> can hold at any
one time.</entry>
</row>
<row>
<entry>queryTimeout</entry>
<entry>Sets the number of seconds the driver will wait for a
<classname>Statement</classname> object to execute. If the
limit is exceeded, a
<classname>DataAccessException</classname> is thrown.
(Consult your driver vendor documentation for
details.)</entry>
</row>
<row>
<entry>verifyCursorPosition</entry>
<entry>Because the same <classname>ResultSet</classname>
held by the <classname>ItemReader</classname> is passed to
the <classname>RowMapper</classname>, it is possible for
users to call <methodname>ResultSet.next</methodname>()
themselves, which could cause issues with the reader's
internal count. Setting this value to true will cause an
exception to be thrown if the cursor position is not the
same after the <classname>RowMapper</classname> call as it
was before.</entry>
</row>
<row>
<entry>saveState</entry>
<entry>Indicates whether or not the reader's state should be
saved in the <classname>ExecutionContext</classname>
provided by
<methodname>ItemStream#update</methodname>(<classname>ExecutionContext</classname>).
The default value is true.</entry>
</row>
<row>
<entry>driverSupportsAbsolute</entry>
<entry>Defaults to false. Indicates whether the JDBC driver
supports setting the absolute row on a
<classname>ResultSet</classname>. It is recommended that
this be set to true for JDBC drivers that support
<methodname>ResultSet.absolute</methodname>(), as it may
improve performance, especially if a step fails while
working with a large data set.</entry>
</row>
<row>
<entry>useSharedExtendedConnection</entry>
<entry>Defaults to false. Indicates whether the connection
used for the cursor should be used by all other processing,
thus sharing the same transaction. If this is set to false,
the cursor will be opened using its own connection and will
not participate in any transactions started for the rest of
the step processing. If you set this flag to true, you must
wrap the <classname>DataSource</classname> in an
<classname>ExtendedConnectionDataSourceProxy</classname> to
prevent the connection from being closed and released after
each commit. When this option is set to true, the statement
used to open the cursor will be created with both the
'READ_ONLY' and 'HOLD_CURSORS_OVER_COMMIT' options. This
allows holding the cursor open over transaction starts and
commits performed in the step processing. To use this
feature you need a database that supports it and a JDBC
driver supporting JDBC 3.0 or later.</entry>
</row>
</tbody>
</tgroup>
</table>
</section>
</section>
<section id="HibernateCursorItemReader">
<title>HibernateCursorItemReader</title>
<para>Just as normal Spring users make important decisions about
whether or not to use ORM solutions, which affect whether they
use a <classname>JdbcTemplate</classname> or a
<classname>HibernateTemplate</classname>, Spring Batch users have the
same options. <classname>HibernateCursorItemReader</classname> is the
Hibernate implementation of the cursor technique. Hibernate's usage in
batch has been fairly controversial, largely because
Hibernate was originally developed to support online application
styles. However, that doesn't mean it can't be used for batch
processing. The easiest approach is to use a
<classname>StatelessSession</classname> rather than a standard
session. This removes all of the caching and dirty checking that
Hibernate employs and that can cause issues in a batch scenario. For
more information on the differences between stateless and normal
Hibernate sessions, refer to the documentation of your specific
Hibernate release. The <classname>HibernateCursorItemReader</classname>
allows you to declare an HQL statement and pass in a
<classname>SessionFactory</classname>. It then passes back one item
per call to <methodname>read</methodname> in the same basic fashion as
the <classname>JdbcCursorItemReader</classname>. Below is an example
using the same 'customer credit' example as the JDBC
reader:</para>
<programlisting language="java">HibernateCursorItemReader itemReader = new HibernateCursorItemReader();
itemReader.setQueryString("from CustomerCredit");
// For simplicity's sake, assume sessionFactory has already been obtained
itemReader.setSessionFactory(sessionFactory);
itemReader.setUseStatelessSession(true);
int counter = 0;
ExecutionContext executionContext = new ExecutionContext();
itemReader.open(executionContext);
// read() returns null once the cursor is exhausted, so only real rows are counted
Object customerCredit = itemReader.read();
while (customerCredit != null) {
    counter++;
    customerCredit = itemReader.read();
}
itemReader.close();</programlisting>
<para>This configured <classname>ItemReader</classname> will return
<classname>CustomerCredit</classname> objects in the exact same manner
as described by the <classname>JdbcCursorItemReader</classname>,
assuming Hibernate mapping files have been created correctly for the
Customer table. The 'useStatelessSession' property defaults to true,
but has been added here to draw attention to the ability to switch it
on or off. It is also worth noting that the fetch size of the
underlying cursor can be set via the 'fetchSize' property. As with
<classname>JdbcCursorItemReader</classname>, configuration is
straightforward:</para>
<programlisting language="xml">&lt;bean id="itemReader"
class="org.springframework.batch.item.database.HibernateCursorItemReader"&gt;
&lt;property name="sessionFactory" ref="sessionFactory" /&gt;
&lt;property name="queryString" value="from CustomerCredit" /&gt;
&lt;/bean&gt;</programlisting>
</section>
<section id="StoredProcedureItemReader">
<title>StoredProcedureItemReader</title>
<para>Sometimes it is necessary to obtain the cursor data using a
stored procedure. The <classname>StoredProcedureItemReader</classname>
works like the <classname>JdbcCursorItemReader</classname> except that
instead of executing a query to obtain a cursor we execute a stored
procedure that returns a cursor. The stored procedure can return the
cursor in three different ways:</para>
<orderedlist>
<listitem>
<para>as a returned ResultSet (used by SQL Server, Sybase, DB2,
Derby and MySQL)</para>
</listitem>
<listitem>
<para>as a ref-cursor returned as an out parameter (used by Oracle
and PostgreSQL)</para>
</listitem>
<listitem>
<para>as the return value of a stored function call</para>
</listitem>
</orderedlist>
<para>Below is a basic example configuration using the same 'customer
credit' example as earlier:</para>
<programlisting language="xml">&lt;bean id="reader" class="o.s.batch.item.database.StoredProcedureItemReader"&gt;
&lt;property name="dataSource" ref="dataSource"/&gt;
&lt;property name="procedureName" value="sp_customer_credit"/&gt;
&lt;property name="rowMapper"&gt;
&lt;bean class="org.springframework.batch.sample.domain.CustomerCreditRowMapper"/&gt;
&lt;/property&gt;
&lt;/bean&gt;
</programlisting>
<para>This example relies on the stored procedure to provide a
ResultSet as a returned result (option 1 above). </para>
<para>If the stored procedure returned a ref-cursor (option 2) then we
would need to provide the position of the out parameter that is the
returned ref-cursor. Here is an example where the first parameter is
the returned ref-cursor:</para>
<programlisting language="xml">&lt;bean id="reader" class="o.s.batch.item.database.StoredProcedureItemReader"&gt;
&lt;property name="dataSource" ref="dataSource"/&gt;
&lt;property name="procedureName" value="sp_customer_credit"/&gt;
&lt;property name="refCursorPosition" value="1"/&gt;
&lt;property name="rowMapper"&gt;
&lt;bean class="org.springframework.batch.sample.domain.CustomerCreditRowMapper"/&gt;
&lt;/property&gt;
&lt;/bean&gt;
</programlisting>
<para>If the cursor was returned from a stored function (option 3) we
would need to set the property "<varname>function</varname>" to
<literal>true</literal>. It defaults to <literal>false</literal>. Here
is what that would look like:</para>
<programlisting language="xml">&lt;bean id="reader" class="o.s.batch.item.database.StoredProcedureItemReader"&gt;
&lt;property name="dataSource" ref="dataSource"/&gt;
&lt;property name="procedureName" value="sp_customer_credit"/&gt;
&lt;property name="function" value="true"/&gt;
&lt;property name="rowMapper"&gt;
&lt;bean class="org.springframework.batch.sample.domain.CustomerCreditRowMapper"/&gt;
&lt;/property&gt;
&lt;/bean&gt;
</programlisting>
<para>In all of these cases we need to define a
<classname>RowMapper</classname> as well as a
<classname>DataSource</classname> and the actual procedure
name.</para>
<para>If the stored procedure or function takes parameters, they
must be declared and set via the parameters property. Here is an
example for Oracle that declares three parameters. The first one is
the out parameter that returns the ref-cursor; the second and third
are in parameters that each take a value of type INTEGER:</para>
<programlisting language="xml">&lt;bean id="reader" class="o.s.batch.item.database.StoredProcedureItemReader"&gt;
&lt;property name="dataSource" ref="dataSource"/&gt;
&lt;property name="procedureName" value="spring.cursor_func"/&gt;
&lt;property name="parameters"&gt;
&lt;list&gt;
&lt;bean class="org.springframework.jdbc.core.SqlOutParameter"&gt;
&lt;constructor-arg index="0" value="newid"/&gt;
&lt;constructor-arg index="1"&gt;
&lt;util:constant static-field="oracle.jdbc.OracleTypes.CURSOR"/&gt;
&lt;/constructor-arg&gt;
&lt;/bean&gt;
&lt;bean class="org.springframework.jdbc.core.SqlParameter"&gt;
&lt;constructor-arg index="0" value="amount"/&gt;
&lt;constructor-arg index="1"&gt;
&lt;util:constant static-field="java.sql.Types.INTEGER"/&gt;
&lt;/constructor-arg&gt;
&lt;/bean&gt;
&lt;bean class="org.springframework.jdbc.core.SqlParameter"&gt;
&lt;constructor-arg index="0" value="custid"/&gt;
&lt;constructor-arg index="1"&gt;
&lt;util:constant static-field="java.sql.Types.INTEGER"/&gt;
&lt;/constructor-arg&gt;
&lt;/bean&gt;
&lt;/list&gt;
&lt;/property&gt;
&lt;property name="refCursorPosition" value="1"/&gt;
&lt;property name="rowMapper" ref="rowMapper"/&gt;
&lt;property name="preparedStatementSetter" ref="parameterSetter"/&gt;
&lt;/bean&gt;</programlisting>
<para>In addition to the parameter declarations we need to specify a
<classname>PreparedStatementSetter</classname> implementation that
sets the parameter values for the call. This works the same as for the
<classname>JdbcCursorItemReader</classname> above. All the additional
properties listed in <xref linkend="JdbcCursorItemReaderProperties" />
apply to the <classname>StoredProcedureItemReader</classname> as well.
</para>
</section>
</section>
<section id="pagingItemReaders">
<title>Paging ItemReaders</title>
<para>An alternative to using a database cursor is to execute multiple
queries, where each query brings back a portion of the results. We
refer to this portion as a page. Each query that is executed must
specify the starting row number and the number of rows that we want
returned for the page.</para>
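<para>The paging idea can be sketched in plain Java. This is a minimal
illustration under stated assumptions:
<classname>PagingSketch</classname> and
<classname>PagingReader</classname> are hypothetical names invented for
this example, the "table" is an in-memory array, and each call to
<methodname>queryPage</methodname> stands in for one database query that
specifies a starting row and a page size. It does not use or reproduce any
Spring Batch class.</para>
<programlisting language="java">import java.util.Arrays;

// Illustrative stand-in only; this is not a Spring Batch class.
class PagingSketch {

    static class PagingReader {
        private final String[] table; // stands in for the database table
        private final int pageSize;
        private String[] page;        // rows of the page currently held in memory
        private int current = 0;      // index of the next row within the page
        private int queriesExecuted = 0;

        PagingReader(String[] table, int pageSize) {
            this.table = table;
            this.pageSize = pageSize;
        }

        /** Stand-in for one query: return up to pageSize rows starting at startRow. */
        private String[] queryPage(int startRow) {
            int from = Math.min(startRow, table.length);
            int to = Math.min(startRow + pageSize, table.length);
            queriesExecuted++;
            return Arrays.copyOfRange(table, from, to);
        }

        /** Returns one item per call; null signals the end of the data. */
        String read() {
            if (page == null || current == pageSize) {
                // no page yet, or the current page is used up: fetch the next one
                page = queryPage(queriesExecuted * pageSize);
                current = 0;
            }
            if (current == page.length) {
                return null; // short or empty page: nothing left to read
            }
            return page[current++];
        }

        int getQueriesExecuted() {
            return queriesExecuted;
        }
    }

    public static void main(String[] args) {
        PagingReader reader = new PagingReader(new String[] { "a", "b", "c", "d", "e" }, 2);
        String item;
        while ((item = reader.read()) != null) {
            System.out.println(item);
        }
        System.out.println("queries executed: " + reader.getQueriesExecuted());
    }
}</programlisting>
<para>Only one page is held in memory at a time, and the caller never sees
the page boundaries.</para>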
<section id="JdbcPagingItemReader">
<title>JdbcPagingItemReader</title>
<para>One implementation of a paging <classname>ItemReader</classname>
is the <classname>JdbcPagingItemReader</classname>. The
<classname>JdbcPagingItemReader</classname> needs a
<classname>PagingQueryProvider</classname> responsible for providing
the SQL queries used to retrieve the rows making up a page. Since each
database has its own strategy for providing paging support, we need to
use a different <classname>PagingQueryProvider</classname> for each
supported database type. There is also the
<classname>SqlPagingQueryProviderFactoryBean</classname> that will
auto-detect the database that is being used and determine the
appropriate <classname>PagingQueryProvider</classname> implementation.
This simplifies the configuration and is the recommended best
practice.</para>
<para>The <classname>SqlPagingQueryProviderFactoryBean</classname>
requires that you specify a select clause and a from clause. You can
also provide an optional where clause. These clauses will be used to
build an SQL statement combined with the required sortKey.</para>
<para>After the reader has been opened, it will pass back one item per
call to <methodname>read</methodname> in the same basic fashion as any
other <classname>ItemReader</classname>. The paging happens behind the
scenes when additional rows are needed.</para>
<para>Below is an example configuration using a similar 'customer
credit' example as the cursor based ItemReaders above:</para>
<programlisting language="xml">&lt;bean id="itemReader" class="org.spr...JdbcPagingItemReader"&gt;
&lt;property name="dataSource" ref="dataSource"/&gt;
&lt;property name="queryProvider"&gt;
&lt;bean class="org.spr...SqlPagingQueryProviderFactoryBean"&gt;
&lt;property name="selectClause" value="select id, name, credit"/&gt;
&lt;property name="fromClause" value="from customer"/&gt;
&lt;property name="whereClause" value="where status=:status"/&gt;
&lt;property name="sortKey" value="id"/&gt;
&lt;/bean&gt;
&lt;/property&gt;
&lt;property name="parameterValues"&gt;
&lt;map&gt;
&lt;entry key="status" value="NEW"/&gt;
&lt;/map&gt;
&lt;/property&gt;
&lt;property name="pageSize" value="1000"/&gt;
&lt;property name="rowMapper" ref="customerMapper"/&gt;
&lt;/bean&gt;</programlisting>
<para>This configured <classname>ItemReader</classname> will return
<classname>CustomerCredit</classname> objects using the
<classname>RowMapper</classname> that must be specified. The
'pageSize' property determines the number of entities read from the
database for each query execution.</para>
<para>The 'parameterValues' property can be used to specify a Map of
parameter values for the query. If you use named parameters in the
where clause the key for each entry should match the name of the named
parameter. If you use a traditional '?' placeholder then the key for
each entry should be the number of the placeholder, starting with
1.</para>
</section>
<section id="JpaPagingItemReader">
<title>JpaPagingItemReader</title>
<para>Another implementation of a paging
<classname>ItemReader</classname> is the
<classname>JpaPagingItemReader</classname>. JPA doesn't have a concept
similar to the Hibernate <classname>StatelessSession</classname> so we
have to use other features provided by the JPA specification. Since
JPA supports paging, this is a natural choice when it comes to using
JPA for batch processing. After each page is read, the entities will
become detached and the persistence context will be cleared in order
to allow the entities to be garbage collected once the page is
processed.</para>
<para>The <classname>JpaPagingItemReader</classname> allows you to
declare a JPQL statement and pass in an
<classname>EntityManagerFactory</classname>. It will then pass back
one item per call to <methodname>read</methodname> in the same basic
fashion as any other <classname>ItemReader</classname>. The paging
happens behind the scenes when additional entities are needed. Below
is an example configuration using the same 'customer credit' example
as the JDBC reader above:</para>
<programlisting language="xml">&lt;bean id="itemReader" class="org.spr...JpaPagingItemReader"&gt;
&lt;property name="entityManagerFactory" ref="entityManagerFactory"/&gt;
&lt;property name="queryString" value="select c from CustomerCredit c"/&gt;
&lt;property name="pageSize" value="1000"/&gt;
&lt;/bean&gt;</programlisting>
<para>This configured <classname>ItemReader</classname> will return
<classname>CustomerCredit</classname> objects in the exact same manner
as described by the <classname>JdbcPagingItemReader</classname> above,
assuming the <classname>CustomerCredit</classname> class has the correct
JPA annotations or ORM mapping file. The 'pageSize' property determines
the number of entities read from the database for each query
execution.</para>
</section>
<section id="IbatisPagingItemReader">
<title>IbatisPagingItemReader</title>
<note>This reader is deprecated as of Spring Batch 3.0.</note>
<para>If you use iBATIS for your data access then you can use the
<classname>IbatisPagingItemReader</classname> which, as the name
indicates, is an implementation of a paging
<classname>ItemReader</classname>. iBATIS doesn't have direct support
for reading rows in pages, but by providing a couple of standard
variables you can add paging support to your iBATIS queries.</para>
<para>Here is an example of a configuration for an
<classname>IbatisPagingItemReader</classname> reading CustomerCredits
as in the examples above:</para>
<programlisting language="xml">&lt;bean id="itemReader" class="org.spr...IbatisPagingItemReader"&gt;
&lt;property name="sqlMapClient" ref="sqlMapClient"/&gt;
&lt;property name="queryId" value="getPagedCustomerCredits"/&gt;
&lt;property name="pageSize" value="1000"/&gt;
&lt;/bean&gt;</programlisting>
<para>The <classname>IbatisPagingItemReader</classname> configuration
above references an iBATIS query called "getPagedCustomerCredits".
Here is an example of what that query should look like for
MySQL:</para>
<programlisting language="xml">&lt;select id="getPagedCustomerCredits" resultMap="customerCreditResult"&gt;
select id, name, credit from customer order by id asc LIMIT #_skiprows#, #_pagesize#
&lt;/select&gt;</programlisting>
<para>The <classname>_skiprows</classname> and
<classname>_pagesize</classname> variables are provided by the
<classname>IbatisPagingItemReader</classname> and there is also a
<classname>_page</classname> variable that can be used if necessary.
The syntax for the paging queries varies with the database used. Here
is an example for Oracle (unfortunately we need to use CDATA for some
operators since this belongs in an XML document):</para>
<programlisting language="xml">&lt;select id="getPagedCustomerCredits" resultMap="customerCreditResult"&gt;
    select * from (
        select * from (
            select t.id, t.name, t.credit, ROWNUM ROWNUM_ from customer t order by id
        ) where ROWNUM_ &lt;![CDATA[ &gt; ]]&gt; ( #_page# * #_pagesize# )
    ) where ROWNUM &lt;![CDATA[ &lt;= ]]&gt; #_pagesize#
&lt;/select&gt;</programlisting>
</section>
</section>
<section id="databaseItemWriters">
<title>Database ItemWriters</title>
<para>While both flat files and XML have specific ItemWriters, there is
no exact equivalent in the database world. This is because transactions
provide all the functionality that is needed. ItemWriters are necessary
for files because they must act as if they're transactional, keeping
track of written items and flushing or clearing at the appropriate
times. Databases have no need for this functionality, since the write is
already contained in a transaction. Users can create their own DAOs that
implement the <classname>ItemWriter</classname> interface or use one
from a custom <classname>ItemWriter</classname> that's written for
generic processing concerns. Either way, they should work without any
issues. One thing to look out for is the performance and error handling
capabilities that are provided by batching the outputs. This is most
common when using Hibernate as an <classname>ItemWriter</classname>, but
the same issues can arise when using JDBC batch mode. Batching database
output doesn't have any inherent flaws, assuming we are careful to flush
and there are no errors in the data. However, any errors while writing
out can cause confusion because there is no way to know which individual
item caused an exception, or even if any individual item was
responsible, as illustrated below:</para>
<para><mediaobject>
<imageobject role="html">
<imagedata align="center" fileref="images/errorOnFlush.png"
scale="95" />
</imageobject>
<imageobject role="fo">
<imagedata align="center" fileref="images/errorOnFlush.png"
scale="80" />
</imageobject>
</mediaobject>If items are buffered before being written out, any
errors encountered will not be thrown until the buffer is flushed just
before a commit. For example, let's assume that 20 items will be written
per chunk, and the 15th item throws a
<classname>DataIntegrityViolationException</classname>. As far as the
<classname>Step</classname> is concerned, all 20 items will be written out
successfully, since there's no way to know that an error will occur
until they are actually written out. Once
<classname>Session</classname>#<methodname>flush()</methodname> is
called, the buffer will be emptied and the exception will be hit. At
this point, there's nothing the <classname>Step</classname> can do: the
transaction must be rolled back. Normally, this exception might cause
the item to be skipped (depending upon the skip/retry policies), and
then it won't be written out again. However, in the batched scenario,
there's no way for the <classname>Step</classname> to know which item
caused the issue, because the whole buffer was being written out when
the failure happened. The only way to solve this issue is to flush after
each item:</para>
<mediaobject>
<imageobject>
<imagedata align="center" fileref="images/errorOnWrite.png"
scale="95" />
</imageobject>
<imageobject role="fo">
<imagedata align="center" fileref="images/errorOnWrite.png"
scale="80" />
</imageobject>
</mediaobject>
<para>This is a common use case, especially when using Hibernate, and
the simple guideline for implementations of
<classname>ItemWriter</classname> is to flush on each call to
<methodname>write()</methodname>. Doing so allows items to be
skipped reliably, with Spring Batch internally taking care of the
granularity of the calls to <classname>ItemWriter</classname> after an
error.</para>
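<para>The effect of flushing inside <methodname>write()</methodname> can
be imitated in plain Java. The following is a minimal sketch under stated
assumptions, not Spring Batch or Hibernate code:
<classname>BufferingSink</classname> is a hypothetical stand-in for a
buffering session, and <methodname>processChunk</methodname> only
approximates what a step does after a failed write (roll back, then retry
the chunk one item at a time). Because <methodname>write</methodname>
flushes before it returns, the failure surfaces inside the call that wrote
the bad item, so that item can be identified and skipped:</para>
<programlisting language="java">import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class FlushOnWriteSketch {

    /** Hypothetical stand-in for a buffering session (not a real Hibernate type). */
    static class BufferingSink {
        private final List&lt;String&gt; buffer = new ArrayList&lt;String&gt;();
        private final List&lt;String&gt; pending = new ArrayList&lt;String&gt;();
        final List&lt;String&gt; committed = new ArrayList&lt;String&gt;();

        void save(String item) {
            buffer.add(item); // nothing is checked until flush()
        }

        // Pushes buffered items to the store; a "bad" item only fails here.
        void flush() {
            try {
                for (String item : buffer) {
                    if (item.startsWith("bad")) {
                        throw new IllegalStateException("constraint violated");
                    }
                    pending.add(item);
                }
            } finally {
                buffer.clear();
            }
        }

        void commit() {
            committed.addAll(pending);
            pending.clear();
        }

        void rollback() {
            buffer.clear();
            pending.clear();
        }
    }

    /** Follows the guideline above: flush before write() returns. */
    static void write(BufferingSink sink, List&lt;String&gt; items) {
        for (String item : items) {
            sink.save(item);
        }
        sink.flush();
    }

    /** Simplified imitation of a step: on failure, roll back and retry item by item. */
    static List&lt;String&gt; processChunk(BufferingSink sink, List&lt;String&gt; chunk) {
        List&lt;String&gt; skipped = new ArrayList&lt;String&gt;();
        try {
            write(sink, chunk);
            sink.commit();
        } catch (IllegalStateException wholeChunkFailed) {
            sink.rollback();
            for (String item : chunk) {
                try {
                    write(sink, Collections.singletonList(item));
                    sink.commit();
                } catch (IllegalStateException itemFailed) {
                    sink.rollback();
                    skipped.add(item); // the failing item is now known
                }
            }
        }
        return skipped;
    }

    public static void main(String[] args) {
        BufferingSink sink = new BufferingSink();
        List&lt;String&gt; skipped = processChunk(sink, Arrays.asList("a", "bad-b", "c"));
        System.out.println("skipped=" + skipped + " committed=" + sink.committed);
    }
}</programlisting>
<para>Running <methodname>main</methodname> prints
<literal>skipped=[bad-b] committed=[a, c]</literal>. If
<methodname>write</methodname> did not flush, the exception would only
appear at commit time, outside any single <methodname>write</methodname>
call, and the retry loop would have nothing to attribute it to.</para>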
</section>
</section>
<section id="reusingExistingServices">
<title>Reusing Existing Services</title>
<para>Batch systems are often used in conjunction with other application
styles. The most common is an online system, but a batch system may also
support integration or even a thick client application by moving the bulk
data that each application style uses. For this reason, many users want to
reuse existing DAOs or other services within their batch jobs. The Spring
container itself makes this fairly easy by allowing any necessary class to
be injected. However, there may be cases where the existing service needs
to act as an <classname>ItemReader</classname> or
<classname>ItemWriter</classname>, either to satisfy the dependency of
another Spring Batch class, or because it truly is the main
<classname>ItemReader</classname> for a step. It is fairly trivial to
write an adapter class for each service that needs wrapping, but because
it is such a common concern, Spring Batch provides implementations:
<classname>ItemReaderAdapter</classname> and
<classname>ItemWriterAdapter</classname>. Both classes follow the
standard Spring pattern of invoking a method on a delegate object and are
fairly simple to set up. Below is an example of the reader:</para>
<programlisting language="xml">&lt;bean id="itemReader" class="org.springframework.batch.item.adapter.ItemReaderAdapter"&gt;
&lt;property name="targetObject" ref="fooService" /&gt;
&lt;property name="targetMethod" value="generateFoo" /&gt;
&lt;/bean&gt;
&lt;bean id="fooService" class="org.springframework.batch.item.sample.FooService" /&gt;</programlisting>
<para>One important point to note is that the contract of the targetMethod
must be the same as the contract for <methodname>read</methodname>: it
returns null when the input is exhausted and an
<classname>Object</classname> otherwise. Anything else will prevent the
framework from knowing when processing should end, either causing an
infinite loop or incorrect failure, depending upon the implementation of
the <classname>ItemWriter</classname>. The
<classname>ItemWriter</classname> implementation is equally simple:</para>
<programlisting language="xml">&lt;bean id="itemWriter" class="org.springframework.batch.item.adapter.ItemWriterAdapter"&gt;
&lt;property name="targetObject" ref="fooService" /&gt;
&lt;property name="targetMethod" value="processFoo" /&gt;
&lt;/bean&gt;
&lt;bean id="fooService" class="org.springframework.batch.item.sample.FooService" /&gt;
</programlisting>
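<para>To make the contract of the target methods concrete, here is a
minimal sketch of what a service behind these adapters could look like.
The <classname>FooService</classname> body, its
<classname>String</classname> item type and its sample data are
assumptions made for illustration; only the method names
<methodname>generateFoo</methodname> and
<methodname>processFoo</methodname> come from the configuration above.
The sketch assumes that the writer adapter hands one item to the target
method per invocation:</para>
<programlisting language="java">import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

/** Hypothetical service; only the null-when-exhausted behaviour matters here. */
public class FooService {

    private final Iterator&lt;String&gt; foos;

    /** No-arg constructor so the bean definition above can instantiate it. */
    public FooService() {
        this(Arrays.asList("foo1", "foo2")); // illustrative sample data
    }

    public FooService(List&lt;String&gt; foos) {
        this.foos = foos.iterator();
    }

    /** Same contract as read(): the next item, or null once the data is exhausted. */
    public String generateFoo() {
        return foos.hasNext() ? foos.next() : null;
    }

    /** Target for the writer adapter: receives a single item. */
    public void processFoo(String foo) {
        System.out.println("processing " + foo);
    }
}</programlisting>
<para>Note that null is reserved as the end-of-data signal, so a service
like this must never return null for a real item.</para>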
</section>
<section id="validatingInput">
<title id="infrastructure.5">Validating Input</title>
<para>During the course of this chapter, multiple approaches to parsing
input have been discussed. Each major implementation will throw an
exception if the input is not 'well-formed'. The
<classname>FixedLengthTokenizer</classname> will throw an exception if a
range of data is missing. Similarly, attempting to access an index in a
<classname>RowMapper</classname> or <classname>FieldSetMapper</classname>
that doesn't exist or is in a different format than the one expected will
cause an exception to be thrown. All of these types of exceptions will be
thrown before <methodname>read</methodname> returns. However, they don't
address the issue of whether or not the returned item is valid. For
example, if one of the fields is an age, it obviously cannot be negative.
A negative age will still parse correctly, because the value exists and is
a number, so no exception is thrown. Since there is already a plethora of
validation frameworks, Spring Batch does not attempt to provide yet
another, but rather provides a very simple interface that can be
implemented by any number of frameworks:</para>
<programlisting language="java">public interface Validator {
void validate(Object value) throws ValidationException;
}</programlisting>
<para>The contract is that the <methodname>validate</methodname> method
will throw an exception if the object is invalid, and return normally if
it is valid. Spring Batch provides an out-of-the-box
<classname>ItemProcessor</classname>, the
<classname>ValidatingItemProcessor</classname>, that applies a validator
to each item:</para>
<programlisting language="xml">&lt;bean class="org.springframework.batch.item.validator.ValidatingItemProcessor"&gt;
&lt;property name="validator" ref="validator" /&gt;
&lt;/bean&gt;
&lt;bean id="validator"
class="org.springframework.batch.item.validator.SpringValidator"&gt;
&lt;property name="validator"&gt;
&lt;bean id="orderValidator"
class="org.springmodules.validation.valang.ValangValidator"&gt;
&lt;property name="valang"&gt;
&lt;value&gt;
&lt;![CDATA[
{ orderId : ? &gt; 0 AND ? &lt;= 9999999999 : 'Incorrect order ID' : 'error.order.id' }
{ totalLines : ? = size(lineItems) : 'Bad count of order lines'
: 'error.order.lines.badcount'}
{ customer.registered : customer.businessCustomer = FALSE OR ? = TRUE
: 'Business customer must be registered'
: 'error.customer.registration'}
{ customer.companyName : customer.businessCustomer = FALSE OR ? HAS TEXT
: 'Company name for business customer is mandatory'
:'error.customer.companyname'}
]]&gt;
&lt;/value&gt;
&lt;/property&gt;
&lt;/bean&gt;
&lt;/property&gt;
&lt;/bean&gt;</programlisting>
<para>This example shows a simple
<classname>ValangValidator</classname> that is used to validate an order
object. The intent is not to show Valang functionality as much as to show
how a validator could be added.</para>
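<para>A validator does not have to come from a validation framework. As a
minimal sketch of the age example mentioned earlier, a hand-written
implementation could look like the following. So that the listing
compiles on its own, <classname>Validator</classname> and
<classname>ValidationException</classname> are declared here as local
stand-ins that mirror the interface shown above; a real implementation
would use the Spring Batch types instead.
<classname>Person</classname> and
<classname>PersonValidator</classname> are hypothetical names:</para>
<programlisting language="java">public class AgeValidationSketch {

    /** Local stand-in mirroring the Validator interface shown above. */
    interface Validator {
        void validate(Object value) throws ValidationException;
    }

    /** Local stand-in for the framework's ValidationException. */
    static class ValidationException extends RuntimeException {
        ValidationException(String message) {
            super(message);
        }
    }

    /** Hypothetical item type with an age field. */
    static class Person {
        final String name;
        final int age;

        Person(String name, int age) {
            this.name = name;
            this.age = age;
        }
    }

    /** Rejects items that parsed correctly but carry an impossible age. */
    static class PersonValidator implements Validator {
        public void validate(Object value) throws ValidationException {
            Person person = (Person) value;
            if (person.age &lt; 0) {
                throw new ValidationException(
                        "Age cannot be negative for " + person.name + ": " + person.age);
            }
        }
    }

    public static void main(String[] args) {
        Validator validator = new PersonValidator();
        validator.validate(new Person("Ann", 30)); // valid: returns normally
        try {
            validator.validate(new Person("Bob", -1));
        } catch (ValidationException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}</programlisting>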
</section>
<section id="process-indicator">
<title>Preventing State Persistence</title>
<para>By default, all of the <classname>ItemReader</classname> and
<classname>ItemWriter</classname> implementations store their current
state in the <classname>ExecutionContext</classname> before it is
committed. However, this may not always be the desired behavior. For
example, many developers choose to make their database readers
'rerunnable' by using a process indicator. An extra column is added to the
input data to indicate whether or not it has been processed. When a
particular record is being read (or written out) the processed flag is
flipped from false to true. The SQL statement can then contain an extra
condition in the where clause, such as "where PROCESSED_IND = false",
thereby ensuring that only unprocessed records will be returned in the
case of a restart. In this scenario, it is preferable to not store any
state, such as the current row number, since it will be irrelevant upon
restart. For this reason, all readers and writers include the 'saveState'
property:</para>
<programlisting language="xml">&lt;bean id="playerSummarizationSource" class="org.spr...JdbcCursorItemReader"&gt;
&lt;property name="dataSource" ref="dataSource" /&gt;
&lt;property name="rowMapper"&gt;
&lt;bean class="org.springframework.batch.sample.PlayerSummaryMapper" /&gt;
&lt;/property&gt;
<emphasis role="bold">&lt;property name="saveState" value="false" /&gt;</emphasis>
&lt;property name="sql"&gt;
&lt;value&gt;
SELECT games.player_id, games.year_no, SUM(COMPLETES),
SUM(ATTEMPTS), SUM(PASSING_YARDS), SUM(PASSING_TD),
SUM(INTERCEPTIONS), SUM(RUSHES), SUM(RUSH_YARDS),
SUM(RECEPTIONS), SUM(RECEPTIONS_YARDS), SUM(TOTAL_TD)
from games, players where players.player_id =
games.player_id group by games.player_id, games.year_no
&lt;/value&gt;
&lt;/property&gt;
&lt;/bean&gt;</programlisting>
<para>The <classname>ItemReader</classname> configured above will not make
any entries in the <classname>ExecutionContext</classname> for any
executions in which it participates.</para>
</section>
<section id="customReadersWriters">
<title id="infrastructure.1.1">Creating Custom ItemReaders and
ItemWriters</title>
<para>So far in this chapter the basic contracts that exist for reading
and writing in Spring Batch and some common implementations have been
discussed. However, these are all fairly generic, and there are many
potential scenarios that may not be covered by out of the box
implementations. This section will show, using a simple example, how to
create a custom <classname>ItemReader</classname> and
<classname>ItemWriter</classname> implementation and implement their
contracts correctly. The <classname>ItemReader</classname> will also
implement <classname>ItemStream</classname>, in order to illustrate how to
make a reader or writer restartable.</para>
<section id="customReader">
<title>Custom ItemReader Example</title>
<para>For the purpose of this example, a simple
<classname>ItemReader</classname> implementation that reads from a
provided list will be created. We'll start out by implementing the most
basic contract of <classname>ItemReader</classname>,
<methodname>read</methodname>:</para>
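<programlisting language="java">import java.util.List;

import org.springframework.batch.item.ItemReader;
import org.springframework.batch.item.ParseException;
import org.springframework.batch.item.UnexpectedInputException;

public class CustomItemReader&lt;T&gt; implements ItemReader&lt;T&gt; {

    private final List&lt;T&gt; items;

    public CustomItemReader(List&lt;T&gt; items) {
        this.items = items;
    }

    // Returns the next item, or null once the list is exhausted.
    public T read() throws Exception, UnexpectedInputException, ParseException {
        if (!items.isEmpty()) {
            return items.remove(0); // consumes the list passed to the constructor
        }
        return null;
    }
}</programlisting>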
<para>This very simple class takes a list of items, and returns them one
at a time, removing each from the list. When the list is empty, it
returns null, thus satisfying the most basic requirements of an
<classname>ItemReader</classname>, as illustrated below:</para>
<programlisting language="java">List&lt;String&gt; items = new ArrayList&lt;String&gt;();
items.add("1");
items.add("2");
items.add("3");
ItemReader&lt;String&gt; itemReader = new CustomItemReader&lt;String&gt;(items);
assertEquals("1", itemReader.read());
assertEquals("2", itemReader.read());
assertEquals("3", itemReader.read());
assertNull(itemReader.read());</programlisting>
<section id="restartableReader">
<title>Making the <classname>ItemReader</classname>
Restartable</title>
<para>The final challenge now is to make the
<classname>ItemReader</classname> restartable. Currently, if the power
goes out, and processing begins again, the
<classname>ItemReader</classname> must start at the beginning. This is
actually valid in many scenarios, but it is sometimes preferable that
a batch job starts where it left off. The key discriminant is often
whether the reader is stateful or stateless. A stateless reader does
not need to worry about restartability, but a stateful one has to try
and reconstitute its last known state on restart. For this reason, we
recommend that you keep custom readers stateless if possible, so you
don't have to worry about restartability.</para>
<para>If you do need to store state, then the
<classname>ItemStream</classname> interface should be used:</para>
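<programlisting language="java">import java.util.List;

import org.springframework.batch.item.ExecutionContext;
import org.springframework.batch.item.ItemReader;
import org.springframework.batch.item.ItemStream;
import org.springframework.batch.item.ItemStreamException;
import org.springframework.batch.item.ParseException;
import org.springframework.batch.item.UnexpectedInputException;

public class CustomItemReader&lt;T&gt; implements ItemReader&lt;T&gt;, ItemStream {

    private static final String CURRENT_INDEX = "current.index";

    private final List&lt;T&gt; items;
    private int currentIndex = 0;

    public CustomItemReader(List&lt;T&gt; items) {
        this.items = items;
    }

    // Returns the item at the current index and advances, or null when exhausted.
    public T read() throws Exception, UnexpectedInputException, ParseException {
        if (currentIndex &lt; items.size()) {
            return items.get(currentIndex++);
        }
        return null;
    }

    // Restores the saved index on restart, or starts from the beginning.
    public void open(ExecutionContext executionContext) throws ItemStreamException {
        if (executionContext.containsKey(CURRENT_INDEX)) {
            currentIndex = (int) executionContext.getLong(CURRENT_INDEX);
        }
        else {
            currentIndex = 0;
        }
    }

    // Saves the current index so that a restart can resume from it.
    public void update(ExecutionContext executionContext) throws ItemStreamException {
        executionContext.putLong(CURRENT_INDEX, currentIndex);
    }

    public void close() throws ItemStreamException {}
}</programlisting>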
<para>On each call to the <classname>ItemStream</classname>
<methodname>update</methodname> method, the current index of the
<classname>ItemReader</classname> will be stored in the provided
<classname>ExecutionContext</classname> with a key of 'current.index'.
When the <classname>ItemStream</classname> <methodname>open</methodname>
method is called, the <classname>ExecutionContext</classname> is
checked to see if it contains an entry with that key. If the key is
found, then the current index is moved to that location. This is a
fairly trivial example, but it still meets the general
contract:</para>
<programlisting language="java">ExecutionContext executionContext = new ExecutionContext();
((ItemStream)itemReader).open(executionContext);
assertEquals("1", itemReader.read());
((ItemStream)itemReader).update(executionContext);
List&lt;String&gt; items = new ArrayList&lt;String&gt;();
items.add("1");
items.add("2");
items.add("3");
itemReader = new CustomItemReader&lt;String&gt;(items);
((ItemStream)itemReader).open(executionContext);
assertEquals("2", itemReader.read());</programlisting>
<para>Most ItemReaders have much more sophisticated restart logic. The
<classname>JdbcCursorItemReader</classname>, for example, stores the
row id of the last processed row in the Cursor.</para>
<para>It is also worth noting that the key used within the
<classname>ExecutionContext</classname> should not be trivial. That is
because the same <classname>ExecutionContext</classname> is used for
all <classname>ItemStream</classname>s within a
<classname>Step</classname>. In most cases, simply prepending the key
with the class name should be enough to guarantee uniqueness. However,
in the rare cases where two of the same type of
<classname>ItemStream</classname> are used in the same step (which can
happen if two files are needed for output), a more specific name will
be needed. For this reason, many of the Spring Batch
<classname>ItemReader</classname> and
<classname>ItemWriter</classname> implementations have a
<methodname>setName()</methodname> method that allows this key name
to be overridden.</para>
</section>
</section>
<section id="customWriter">
<title>Custom ItemWriter Example</title>
<para>Implementing a custom <classname>ItemWriter</classname> is similar
in many ways to the <classname>ItemReader</classname> example above, but
differs in enough ways as to warrant its own example. However, adding
restartability is essentially the same, so it won't be covered in this
example. As with the <classname>ItemReader</classname> example, a
<classname>List</classname> will be used in order to keep the example as
simple as possible:</para>
<programlisting language="java">public class CustomItemWriter&lt;T&gt; implements ItemWriter&lt;T&gt; {
List&lt;T&gt; output = TransactionAwareProxyFactory.createTransactionalList();
public void write(List&lt;? extends T&gt; items) throws Exception {
output.addAll(items);
}
public List&lt;T&gt; getOutput() {
return output;
}
}</programlisting>
<section id="restartableWriter">
<title>Making the <classname>ItemWriter</classname>
Restartable</title>
<para>To make the ItemWriter restartable we would follow the same
process as for the <classname>ItemReader</classname>, adding and
implementing the <classname>ItemStream</classname> interface to
synchronize the execution context. In the example we might have to
count the number of items processed and add that as a footer record.
If we needed to do that, we could implement
<classname>ItemStream</classname> in our
<classname>ItemWriter</classname> so that the counter was
reconstituted from the execution context if the stream was
re-opened.</para>
<para>In many realistic cases, a custom ItemWriter either delegates to
another writer that is itself restartable (e.g. when writing to a
file), or writes to a transactional resource, in which case it is
stateless and does not need to be restartable. When you have a stateful
writer you should probably be sure to implement
<classname>ItemStream</classname> as well as
<classname>ItemWriter</classname>. Remember also that the client of
the writer needs to be aware of the <classname>ItemStream</classname>,
so you may need to register it as a stream in the configuration
XML.</para>
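<para>The footer counter described above could be sketched roughly as
follows. To keep the listing runnable without the framework, a plain
<classname>Map</classname> stands in for the
<classname>ExecutionContext</classname>, and the
<methodname>open</methodname>, <methodname>write</methodname> and
<methodname>update</methodname> methods only mirror the shape of the
<classname>ItemStream</classname> and <classname>ItemWriter</classname>
methods rather than implementing those interfaces. The class name, the
key and the footer text are hypothetical; the key is prefixed with the
class name, following the earlier advice about key uniqueness:</para>
<programlisting language="java">import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class CountingWriterSketch {

    // Key prefixed with the class name to avoid clashes with other streams.
    private static final String WRITTEN_COUNT = "CountingWriterSketch.written.count";

    private final List&lt;String&gt; output = new ArrayList&lt;String&gt;();
    private long writtenCount = 0;

    /** Mirrors ItemStream.open: restore the counter if a previous run saved one. */
    public void open(Map&lt;String, Object&gt; executionContext) {
        Object saved = executionContext.get(WRITTEN_COUNT);
        if (saved != null) {
            writtenCount = (Long) saved;
        }
        else {
            writtenCount = 0;
        }
    }

    /** Mirrors ItemWriter.write: store the items and count them. */
    public void write(List&lt;String&gt; items) {
        output.addAll(items);
        writtenCount += items.size();
    }

    /** Mirrors ItemStream.update: save the counter so a restart can resume it. */
    public void update(Map&lt;String, Object&gt; executionContext) {
        executionContext.put(WRITTEN_COUNT, writtenCount);
    }

    /** Footer record based on the count across all runs. */
    public String footer() {
        return "Total items: " + writtenCount;
    }
}</programlisting>
<para>If one instance writes two items and calls
<methodname>update</methodname>, a second instance opened with the same
context and given one more item reports a footer of
<literal>Total items: 3</literal>.</para>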
</section>
</section>
</section>
</chapter>