<html><head>
<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">
<title>6. ItemReaders and ItemWriters</title><link rel="stylesheet" type="text/css" href="css/manual-multipage.css"><meta name="generator" content="DocBook XSL Stylesheets V1.78.1"><link rel="home" href="index.html" title="Spring Batch - Reference Documentation"><link rel="up" href="index.html" title="Spring Batch - Reference Documentation"><link rel="prev" href="configureStep.html" title="5. Configuring a Step"><link rel="next" href="scalability.html" title="7. Scaling and Parallel Processing"></head><body bgcolor="white" text="black" link="#0000FF" vlink="#840084" alink="#0000FF"><div class="navheader"><table width="100%" summary="Navigation header"><tr><th colspan="3" align="center">6. ItemReaders and ItemWriters</th></tr><tr><td width="20%" align="left"><a accesskey="p" href="configureStep.html">Prev</a> </td><th width="60%" align="center"> </th><td width="20%" align="right"> <a accesskey="n" href="scalability.html">Next</a></td></tr></table><hr></div><div class="chapter"><div class="titlepage"><div><div><h1 class="title"><a name="readersAndWriters" href="#readersAndWriters"></a>6. ItemReaders and ItemWriters</h1></div></div></div><p>All batch processing can be described in its most simple form as
reading in large amounts of data, performing some type of calculation or
transformation, and writing the result out. Spring Batch provides three key
interfaces to help perform bulk reading and writing:
<code class="classname">ItemReader</code>, <code class="classname">ItemProcessor</code> and
<code class="classname">ItemWriter</code>.</p><div class="section"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="itemReader" href="#itemReader"></a>6.1 ItemReader</h2></div></div></div><p>Although a simple concept, an <code class="classname">ItemReader</code> is
the means for providing data from many different types of input. The most
general examples include: </p><div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; "><li class="listitem"><p>Flat File - Flat file ItemReaders read lines of data from a
flat file that typically describe records with fields of data
defined by fixed positions in the file or delimited by some special
character (e.g. a comma).</p></li><li class="listitem"><p>XML - XML ItemReaders process XML independently of
technologies used for parsing, mapping and validating objects. Input
data allows for the validation of an XML file against an XSD
schema.</p></li><li class="listitem"><p>Database - A database resource is accessed to return
resultsets which can be mapped to objects for processing. The
default SQL ItemReaders invoke a <code class="classname">RowMapper</code> to
return objects, keep track of the current row if restart is
required, store basic statistics, and provide some transaction
enhancements that will be explained later.</p></li></ul></div><p>There are many more possibilities, but we'll focus on the
basic ones for this chapter. A complete list of all available ItemReaders
can be found in Appendix A.</p><p><code class="classname">ItemReader</code> is a basic interface for generic
input operations:</p><pre class="programlisting"><span class="hl-keyword">public</span> <span class="hl-keyword">interface</span> ItemReader<T> {

    T read() <span class="hl-keyword">throws</span> Exception, UnexpectedInputException, ParseException;

}</pre><p>The <code class="methodname">read</code> method defines the most essential
contract of the <code class="classname">ItemReader</code>; calling it returns one
item or null if no more items are left. An item might represent a line in
a file, a row in a database, or an element in an XML file. It is generally
expected that these will be mapped to a usable domain object (e.g. Trade,
Foo, etc) but there is no requirement in the contract to do so.</p><p>It is expected that implementations of the
<code class="classname">ItemReader</code> interface will be forward only. However,
if the underlying resource is transactional (such as a JMS queue) then
calling read may return the same logical item on subsequent calls in a
rollback scenario. It is also worth noting that a lack of items to process
by an <code class="classname">ItemReader</code> will not cause an exception to be
thrown. For example, a database <code class="classname">ItemReader</code> that is
configured with a query that returns 0 results will simply return null on
the first invocation of <code class="methodname">read</code>.</p></div><div class="section"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="itemWriter" href="#itemWriter"></a>6.2 ItemWriter</h2></div></div></div><p><code class="classname">ItemWriter</code> is similar in functionality to an
<code class="classname">ItemReader</code>, but with inverse operations. Resources
still need to be located, opened and closed but they differ in that an
<code class="classname">ItemWriter</code> writes out, rather than reading in. In
the case of databases or queues these may be inserts, updates, or sends.
The format of the serialization of the output is specific to each batch
job.</p><p>As with <code class="classname">ItemReader</code>,
<code class="classname">ItemWriter</code> is a fairly generic interface:</p><pre class="programlisting"><span class="hl-keyword">public</span> <span class="hl-keyword">interface</span> ItemWriter<T> {

    <span class="hl-keyword">void</span> write(List<? <span class="hl-keyword">extends</span> T> items) <span class="hl-keyword">throws</span> Exception;

}</pre><p>As with <code class="methodname">read</code> on
<code class="classname">ItemReader</code>, <code class="methodname">write</code> provides
the basic contract of <code class="classname">ItemWriter</code>; it will attempt
to write out the list of items passed in as long as it is open. Because it
is generally expected that items will be 'batched' together into a chunk
and then output, the interface accepts a list of items, rather than an
item by itself. After writing out the list, any flushing that may be
necessary can be performed before returning from the write method. For
example, if writing to a Hibernate DAO, multiple calls to write can be
made, one for each item. The writer can then call flush on the Hibernate
Session before returning.</p></div><div class="section"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="itemProcessor" href="#itemProcessor"></a>6.3 ItemProcessor</h2></div></div></div><p>The <code class="classname">ItemReader</code> and
<code class="classname">ItemWriter</code> interfaces are both very useful for
their specific tasks, but what if you want to insert business logic before
writing? One option for both reading and writing is to use the composite
pattern: create an <code class="classname">ItemWriter</code> that contains another
<code class="classname">ItemWriter</code>, or an <code class="classname">ItemReader</code>
that contains another <code class="classname">ItemReader</code>. For
example:</p><pre class="programlisting"><span class="hl-keyword">public</span> <span class="hl-keyword">class</span> CompositeItemWriter<T> <span class="hl-keyword">implements</span> ItemWriter<T> {

    ItemWriter<T> itemWriter;

    <span class="hl-keyword">public</span> CompositeItemWriter(ItemWriter<T> itemWriter) {
        <span class="hl-keyword">this</span>.itemWriter = itemWriter;
    }

    <span class="hl-keyword">public</span> <span class="hl-keyword">void</span> write(List<? <span class="hl-keyword">extends</span> T> items) <span class="hl-keyword">throws</span> Exception {
        <span class="hl-comment">//Add business logic here</span>
        itemWriter.write(items);
    }

    <span class="hl-keyword">public</span> <span class="hl-keyword">void</span> setDelegate(ItemWriter<T> itemWriter){
        <span class="hl-keyword">this</span>.itemWriter = itemWriter;
    }
}</pre><p>The class above contains another <code class="classname">ItemWriter</code>
to which it delegates after having provided some business logic. This
pattern could easily be used for an <code class="classname">ItemReader</code> as
well, perhaps to obtain more reference data based upon the input that was
provided by the main <code class="classname">ItemReader</code>. It is also useful
if you need to control the call to <code class="classname">write</code> yourself.
However, if you only want to 'transform' the item passed in for writing
before it is actually written, there isn't much need to call
<code class="methodname">write</code> yourself: you just want to modify the item.
For this scenario, Spring Batch provides the
<code class="classname">ItemProcessor</code> interface:</p><pre class="programlisting"><span class="hl-keyword">public</span> <span class="hl-keyword">interface</span> ItemProcessor<I, O> {

    O process(I item) <span class="hl-keyword">throws</span> Exception;
}</pre><p>An <code class="classname">ItemProcessor</code> is very simple; given one
object, transform it and return another. The provided object may or may
not be of the same type. The point is that business logic may be applied
within process, and that logic is completely up to the developer to create. An
<code class="classname">ItemProcessor</code> can be wired directly into a step.
For example, assume an <code class="classname">ItemReader</code> provides a
class of type Foo that needs to be converted to type Bar before being
written out. An <code class="classname">ItemProcessor</code> can be written that
performs the conversion:</p><pre class="programlisting"><span class="hl-keyword">public</span> <span class="hl-keyword">class</span> Foo {}

<span class="hl-keyword">public</span> <span class="hl-keyword">class</span> Bar {
    <span class="hl-keyword">public</span> Bar(Foo foo) {}
}

<span class="hl-keyword">public</span> <span class="hl-keyword">class</span> FooProcessor <span class="hl-keyword">implements</span> ItemProcessor<Foo,Bar>{
    <span class="hl-keyword">public</span> Bar process(Foo foo) <span class="hl-keyword">throws</span> Exception {
        <span class="hl-comment">//Perform simple transformation, convert a Foo to a Bar</span>
        <span class="hl-keyword">return</span> <span class="hl-keyword">new</span> Bar(foo);
    }
}

<span class="hl-keyword">public</span> <span class="hl-keyword">class</span> BarWriter <span class="hl-keyword">implements</span> ItemWriter<Bar>{
    <span class="hl-keyword">public</span> <span class="hl-keyword">void</span> write(List<? <span class="hl-keyword">extends</span> Bar> bars) <span class="hl-keyword">throws</span> Exception {
        <span class="hl-comment">//write bars</span>
    }
}</pre><p>In the very simple example above, there is a class
<code class="classname">Foo</code>, a class <code class="classname">Bar</code>, and a
class <code class="classname">FooProcessor</code> that adheres to the
<code class="classname">ItemProcessor</code> interface. The transformation is
simple, but any type of transformation could be done here. The
<code class="classname">BarWriter</code> will be used to write out
<code class="classname">Bar</code> objects, throwing an exception if any other
type is provided. Similarly, the <code class="classname">FooProcessor</code> will
throw an exception if anything but a <code class="classname">Foo</code> is
provided. The <code class="classname">FooProcessor</code> can then be injected
into a <code class="classname">Step</code>:</p><pre class="programlisting"><span class="hl-tag"><job</span> <span class="hl-attribute">id</span>=<span class="hl-value">"ioSampleJob"</span><span class="hl-tag">></span>
    <span class="hl-tag"><step</span> <span class="hl-attribute">name</span>=<span class="hl-value">"step1"</span><span class="hl-tag">></span>
        <span class="hl-tag"><tasklet></span>
            <span class="hl-tag"><chunk</span> <span class="hl-attribute">reader</span>=<span class="hl-value">"fooReader"</span> <span class="hl-attribute">processor</span>=<span class="hl-value">"fooProcessor"</span> <span class="hl-attribute">writer</span>=<span class="hl-value">"barWriter"</span>
                   <span class="hl-attribute">commit-interval</span>=<span class="hl-value">"2"</span><span class="hl-tag">/></span>
        <span class="hl-tag"></tasklet></span>
    <span class="hl-tag"></step></span>
<span class="hl-tag"></job></span></pre><div class="section"><div class="titlepage"><div><div><h3 class="title"><a name="chainingItemProcessors" href="#chainingItemProcessors"></a>6.3.1 Chaining ItemProcessors</h3></div></div></div><p>Performing a single transformation is useful in many scenarios,
but what if you want to 'chain' together multiple
<code class="classname">ItemProcessor</code>s? This can be accomplished using
the composite pattern mentioned previously. To update the previous
single-transformation example, <code class="classname">Foo</code> will be
transformed to <code class="classname">Bar</code>, which will be transformed to
<code class="classname">Foobar</code> and written out:</p><pre class="programlisting"><span class="hl-keyword">public</span> <span class="hl-keyword">class</span> Foo {}

<span class="hl-keyword">public</span> <span class="hl-keyword">class</span> Bar {
    <span class="hl-keyword">public</span> Bar(Foo foo) {}
}

<span class="hl-keyword">public</span> <span class="hl-keyword">class</span> Foobar{
    <span class="hl-keyword">public</span> Foobar(Bar bar) {}
}

<span class="hl-keyword">public</span> <span class="hl-keyword">class</span> FooProcessor <span class="hl-keyword">implements</span> ItemProcessor<Foo,Bar>{
    <span class="hl-keyword">public</span> Bar process(Foo foo) <span class="hl-keyword">throws</span> Exception {
        <span class="hl-comment">//Perform simple transformation, convert a Foo to a Bar</span>
        <span class="hl-keyword">return</span> <span class="hl-keyword">new</span> Bar(foo);
    }
}

<span class="hl-keyword">public</span> <span class="hl-keyword">class</span> BarProcessor <span class="hl-keyword">implements</span> ItemProcessor<Bar,Foobar>{
    <span class="hl-keyword">public</span> Foobar process(Bar bar) <span class="hl-keyword">throws</span> Exception {
        <span class="hl-keyword">return</span> <span class="hl-keyword">new</span> Foobar(bar);
    }
}

<span class="hl-keyword">public</span> <span class="hl-keyword">class</span> FoobarWriter <span class="hl-keyword">implements</span> ItemWriter<Foobar>{
    <span class="hl-keyword">public</span> <span class="hl-keyword">void</span> write(List<? <span class="hl-keyword">extends</span> Foobar> items) <span class="hl-keyword">throws</span> Exception {
        <span class="hl-comment">//write items</span>
    }
}</pre><p>A <code class="classname">FooProcessor</code> and
<code class="classname">BarProcessor</code> can be 'chained' together to give
the resultant <code class="classname">Foobar</code>:</p><pre class="programlisting">CompositeItemProcessor<Foo,Foobar> compositeProcessor =
    <span class="hl-keyword">new</span> CompositeItemProcessor<Foo,Foobar>();
List itemProcessors = <span class="hl-keyword">new</span> ArrayList();
itemProcessors.add(<span class="hl-keyword">new</span> FooProcessor());
itemProcessors.add(<span class="hl-keyword">new</span> BarProcessor());
compositeProcessor.setDelegates(itemProcessors);</pre><p>Just as with the previous example, the composite processor can be
configured into the <code class="classname">Step</code>:</p><pre class="programlisting"><span class="hl-tag"><job</span> <span class="hl-attribute">id</span>=<span class="hl-value">"ioSampleJob"</span><span class="hl-tag">></span>
    <span class="hl-tag"><step</span> <span class="hl-attribute">name</span>=<span class="hl-value">"step1"</span><span class="hl-tag">></span>
        <span class="hl-tag"><tasklet></span>
            <span class="hl-tag"><chunk</span> <span class="hl-attribute">reader</span>=<span class="hl-value">"fooReader"</span> <span class="hl-attribute">processor</span>=<span class="hl-value">"compositeItemProcessor"</span> <span class="hl-attribute">writer</span>=<span class="hl-value">"foobarWriter"</span>
                   <span class="hl-attribute">commit-interval</span>=<span class="hl-value">"2"</span><span class="hl-tag">/></span>
        <span class="hl-tag"></tasklet></span>
    <span class="hl-tag"></step></span>
<span class="hl-tag"></job></span>

<span class="hl-tag"><bean</span> <span class="hl-attribute">id</span>=<span class="hl-value">"compositeItemProcessor"</span>
      <span class="hl-attribute">class</span>=<span class="hl-value">"org.springframework.batch.item.support.CompositeItemProcessor"</span><span class="hl-tag">></span>
    <span class="hl-tag"><property</span> <span class="hl-attribute">name</span>=<span class="hl-value">"delegates"</span><span class="hl-tag">></span>
        <span class="hl-tag"><list></span>
            <span class="hl-tag"><bean</span> <span class="hl-attribute">class</span>=<span class="hl-value">"..FooProcessor"</span><span class="hl-tag"> /></span>
            <span class="hl-tag"><bean</span> <span class="hl-attribute">class</span>=<span class="hl-value">"..BarProcessor"</span><span class="hl-tag"> /></span>
        <span class="hl-tag"></list></span>
    <span class="hl-tag"></property></span>
<span class="hl-tag"></bean></span></pre></div><div class="section"><div class="titlepage"><div><div><h3 class="title"><a name="filiteringRecords" href="#filiteringRecords"></a>6.3.2 Filtering Records</h3></div></div></div><p>One typical use for an item processor is to filter out records
before they are passed to the ItemWriter. Filtering is an action
distinct from skipping; skipping indicates that a record is invalid
whereas filtering simply indicates that a record should not be
written.</p><p>For example, consider a batch job that reads a file containing
three different types of records: records to insert, records to update,
and records to delete. If record deletion is not supported by the
system, then we would not want to send any "delete" records to the
<code class="classname">ItemWriter</code>. But, since these records are not
actually bad records, we would want to filter them out, rather than
skip. As a result, the ItemWriter would receive only "insert" and
"update" records.</p><p>To filter a record, one simply returns "null" from the
<code class="classname">ItemProcessor</code>. The framework will detect that the
result is "null" and avoid adding that item to the list of records
delivered to the <code class="classname">ItemWriter</code>. As usual, an
exception thrown from the <code class="classname">ItemProcessor</code> will
result in a skip.</p></div><div class="section"><div class="titlepage"><div><div><h3 class="title"><a name="faultTolerant" href="#faultTolerant"></a>6.3.3 Fault Tolerance</h3></div></div></div><p>When a chunk is rolled back, items that have been cached
during reading may be reprocessed. If a step is configured to
be fault tolerant (uses skip or retry processing typically),
any ItemProcessor used should be implemented in a way that is
idempotent. Typically that would consist of performing no changes
on the input item for the ItemProcessor and only updating the
instance that is the result.</p></div></div><div class="section"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="itemStream" href="#itemStream"></a>6.4 ItemStream</h2></div></div></div><p>Both <code class="classname">ItemReader</code>s and
<code class="classname">ItemWriter</code>s serve their individual purposes well,
but there is a common concern among both of them that necessitates another
interface. In general, as part of the scope of a batch job, readers and
writers need to be opened and closed, and they require a mechanism for persisting
state:</p><pre class="programlisting"><span class="hl-keyword">public</span> <span class="hl-keyword">interface</span> ItemStream {

    <span class="hl-keyword">void</span> open(ExecutionContext executionContext) <span class="hl-keyword">throws</span> ItemStreamException;

    <span class="hl-keyword">void</span> update(ExecutionContext executionContext) <span class="hl-keyword">throws</span> ItemStreamException;

    <span class="hl-keyword">void</span> close() <span class="hl-keyword">throws</span> ItemStreamException;
}</pre><p>Before describing each method, we should mention the
<code class="classname">ExecutionContext</code>. Clients of an
<code class="classname">ItemReader</code> that also implements
<code class="classname">ItemStream</code> should call
<code class="methodname">open</code> before any calls to
<code class="methodname">read</code> in order to open any resources such as files
or to obtain connections. A similar restriction applies to an
<code class="classname">ItemWriter</code> that implements
<code class="classname">ItemStream</code>. As mentioned in Chapter 2, if expected
data is found in the <code class="classname">ExecutionContext</code>, it may be
used to start the <code class="classname">ItemReader</code> or
<code class="classname">ItemWriter</code> at a location other than its initial
state. Conversely, <code class="methodname">close</code> will be called to ensure
that any resources allocated during <code class="methodname">open</code> will be
released safely. <code class="methodname">update</code> is called primarily to
ensure that any state currently being held is loaded into the provided
<code class="classname">ExecutionContext</code>. This method will be called before
committing, to ensure that the current state is persisted in the database
before commit.</p><p>In the special case where the client of an
<code class="classname">ItemStream</code> is a <code class="classname">Step</code> (from
the Spring Batch Core), an <code class="classname">ExecutionContext</code> is
created for each <code class="classname">StepExecution</code> to allow users to
store the state of a particular execution, with the expectation that it
will be returned if the same <code class="classname">JobInstance</code> is started
again. For those familiar with Quartz, the semantics are very similar to a
Quartz <code class="classname">JobDataMap</code>.</p></div><div class="section"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="delegatePatternAndRegistering" href="#delegatePatternAndRegistering"></a>6.5 The Delegate Pattern and Registering with the Step</h2></div></div></div><p>Note that the <code class="classname">CompositeItemWriter</code> is an
example of the delegation pattern, which is common in Spring Batch. The
delegates themselves might implement callback interfaces, such as <code class="classname">StepListener</code>.
If they do, and they are being used in conjunction with Spring Batch Core
as part of a <code class="classname">Step</code> in a <code class="classname">Job</code>,
then they almost certainly need to be registered manually with the
<code class="classname">Step</code>. A reader, writer, or processor that is
directly wired into the Step will be registered automatically if it
implements <code class="classname">ItemStream</code> or a
<code class="classname">StepListener</code> interface. But because the delegates
are not known to the <code class="classname">Step</code>, they need to be injected
as listeners or streams (or both if appropriate):</p><pre class="programlisting"><span class="hl-tag"><job</span> <span class="hl-attribute">id</span>=<span class="hl-value">"ioSampleJob"</span><span class="hl-tag">></span>
    <span class="hl-tag"><step</span> <span class="hl-attribute">name</span>=<span class="hl-value">"step1"</span><span class="hl-tag">></span>
        <span class="hl-tag"><tasklet></span>
            <span class="hl-tag"><chunk</span> <span class="hl-attribute">reader</span>=<span class="hl-value">"fooReader"</span> <span class="hl-attribute">processor</span>=<span class="hl-value">"fooProcessor"</span> <span class="hl-attribute">writer</span>=<span class="hl-value">"compositeItemWriter"</span>
                   <span class="hl-attribute">commit-interval</span>=<span class="hl-value">"2"</span><span class="hl-tag">></span>
                <span class="hl-tag"><streams></span>
                    <span class="hl-tag"><stream</span> <span class="hl-attribute">ref</span>=<span class="hl-value">"barWriter"</span><span class="hl-tag"> /></span>
                <span class="hl-tag"></streams></span>
            <span class="hl-tag"></chunk></span>
        <span class="hl-tag"></tasklet></span>
    <span class="hl-tag"></step></span>
<span class="hl-tag"></job></span>

<span class="hl-tag"><bean</span> <span class="hl-attribute">id</span>=<span class="hl-value">"compositeItemWriter"</span> <span class="hl-attribute">class</span>=<span class="hl-value">"...CustomCompositeItemWriter"</span><span class="hl-tag">></span>
    <span class="hl-tag"><property</span> <span class="hl-attribute">name</span>=<span class="hl-value">"delegate"</span> <span class="hl-attribute">ref</span>=<span class="hl-value">"barWriter"</span><span class="hl-tag"> /></span>
<span class="hl-tag"></bean></span>

<span class="hl-tag"><bean</span> <span class="hl-attribute">id</span>=<span class="hl-value">"barWriter"</span> <span class="hl-attribute">class</span>=<span class="hl-value">"...BarWriter"</span><span class="hl-tag"> /></span></pre></div><div class="section"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="flatFiles" href="#flatFiles"></a>6.6 Flat Files</h2></div></div></div><p>One of the most common mechanisms for interchanging bulk data has
always been the flat file. Unlike XML, which has an agreed upon standard
for defining how it is structured (XSD), anyone reading a flat file must
understand ahead of time exactly how the file is structured. In general,
all flat files fall into two types: Delimited and Fixed Length. Delimited
files are those in which fields are separated by a delimiter, such as a
comma. Fixed Length files have fields that are a set length.</p><div class="section"><div class="titlepage"><div><div><h3 class="title"><a name="fieldSet" href="#fieldSet"></a>6.6.1 The FieldSet</h3></div></div></div><p>When working with flat files in Spring Batch, regardless of
whether it is for input or output, one of the most important classes is
the <code class="classname">FieldSet</code>. Many architectures and libraries
contain abstractions for helping you read in from a file, but they
usually return a String or an array of Strings. This really only gets
you halfway there. A <code class="classname">FieldSet</code> is Spring Batch's
abstraction for enabling the binding of fields from a file resource. It
allows developers to work with file input in much the same way as they
would work with database input. A <code class="classname">FieldSet</code> is
conceptually very similar to a JDBC <code class="classname">ResultSet</code>.
FieldSets only require one argument, a <code class="classname">String</code>
array of tokens. Optionally, you can also configure the names of the
fields so that the fields may be accessed either by index or name as
patterned after <code class="classname">ResultSet</code>:</p><pre class="programlisting">String[] tokens = <span class="hl-keyword">new</span> String[]{<span class="hl-string">"foo"</span>, <span class="hl-string">"1"</span>, <span class="hl-string">"true"</span>};
FieldSet fs = <span class="hl-keyword">new</span> DefaultFieldSet(tokens);
String name = fs.readString(<span class="hl-number">0</span>);
<span class="hl-keyword">int</span> value = fs.readInt(<span class="hl-number">1</span>);
<span class="hl-keyword">boolean</span> booleanValue = fs.readBoolean(<span class="hl-number">2</span>);</pre><p>There are many more options on the <code class="classname">FieldSet</code>
interface, such as <code class="classname">Date</code>, long,
<code class="classname">BigDecimal</code>, etc. The biggest advantage of the
<code class="classname">FieldSet</code> is that it provides consistent parsing
of flat file input. Rather than each batch job parsing differently in
potentially unexpected ways, it can be consistent, both when handling
errors caused by a format exception, or when doing simple data
conversions.</p></div><div class="section"><div class="titlepage"><div><div><h3 class="title"><a name="flatFileItemReader" href="#flatFileItemReader"></a>6.6.2 FlatFileItemReader</h3></div></div></div><p>A flat file is any type of file that contains at most
two-dimensional (tabular) data. Reading flat files in the Spring Batch
framework is facilitated by the class
<code class="classname">FlatFileItemReader</code>, which provides basic
functionality for reading and parsing flat files. The two most important
required dependencies of <code class="classname">FlatFileItemReader</code> are
<code class="classname">Resource</code> and <code class="classname">LineMapper</code>.
The <code class="classname">LineMapper</code> interface will be
explored more in the next sections. The resource property represents a
Spring Core <code class="classname">Resource</code>. Documentation explaining
how to create beans of this type can be found in <a class="ulink" href="http://docs.spring.io/spring/docs/3.2.x/spring-framework-reference/html/resources.html" target="_top"><em class="citetitle">Spring
Framework, Chapter 5. Resources</em></a>. Therefore, this
guide will not go into the details of creating
<code class="classname">Resource</code> objects. However, a simple example of a
file system resource can be found below:
</p><pre class="programlisting">Resource resource = <span class="hl-keyword">new</span> FileSystemResource(<span class="hl-string">"resources/trades.csv"</span>);</pre><p>In complex batch environments the directory structures are often
managed by the EAI infrastructure where drop zones for external
interfaces are established for moving files from FTP locations to batch
processing locations and vice versa. File moving utilities are beyond
the scope of the Spring Batch architecture but it is not unusual for
batch job streams to include file moving utilities as steps in the job
stream. The batch architecture only needs to know
how to locate the files to be processed. Spring Batch begins the process
of feeding the data into the pipe from this starting point. However,
<a class="ulink" href="http://projects.spring.io/spring-integration/" target="_top"><em class="citetitle">Spring
Integration</em></a> provides many of these types of
services.</p><p>The other properties in <code class="classname">FlatFileItemReader</code>
allow you to further specify how your data will be interpreted: </p><div class="table"><a name="d5e2230" href="#d5e2230"></a><p class="title"><b>Table 6.1. FlatFileItemReader Properties</b></p><div class="table-contents"><table summary="FlatFileItemReader Properties" style="border-collapse: collapse;border-top: 0.5pt solid ; border-bottom: 0.5pt solid ; border-left: 0.5pt solid ; border-right: 0.5pt solid ; "><colgroup><col align="center"><col><col></colgroup><thead><tr><th style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; " align="center">Property</th><th style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; " align="center">Type</th><th style="border-bottom: 0.5pt solid ; " align="center">Description</th></tr></thead><tbody><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; " align="left">comments</td><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; " align="left">String[]</td><td style="border-bottom: 0.5pt solid ; " align="left">Specifies line prefixes that indicate
|
|
comment rows</td></tr><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; " align="left">encoding</td><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; " align="left">String</td><td style="border-bottom: 0.5pt solid ; " align="left">Specifies what text encoding to use -
|
|
default is "ISO-8859-1"</td></tr><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; " align="left">lineMapper</td><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; " align="left">LineMapper</td><td style="border-bottom: 0.5pt solid ; " align="left">Converts a <code class="classname">String</code>
|
|
to an <code class="classname">Object</code> representing the
|
|
item.</td></tr><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; " align="left">linesToSkip</td><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; " align="left">int</td><td style="border-bottom: 0.5pt solid ; " align="left">Number of lines to ignore at the top of
|
|
the file</td></tr><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; " align="left">recordSeparatorPolicy</td><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; " align="left">RecordSeparatorPolicy</td><td style="border-bottom: 0.5pt solid ; " align="left">Used to determine where the line endings
|
|
are and do things like continue over a line ending if inside a
|
|
quoted string.</td></tr><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; " align="left">resource</td><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; " align="left">Resource</td><td style="border-bottom: 0.5pt solid ; " align="left">The resource from which to read.</td></tr><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; " align="left">skippedLinesCallback</td><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; " align="left">LineCallbackHandler</td><td style="border-bottom: 0.5pt solid ; " align="left">Interface which passes the raw line
|
|
content of the lines in the file to be skipped. If linesToSkip
|
|
is set to 2, then this interface will be called twice.</td></tr><tr><td style="border-right: 0.5pt solid ; " align="left">strict</td><td style="border-right: 0.5pt solid ; " align="left">boolean</td><td style="" align="left">In strict mode, the reader will throw an
|
|
exception on <code class="methodname">open(ExecutionContext)</code> if the input resource does not
|
|
exist.</td></tr></tbody></table></div></div><p><br class="table-break"></p><div class="section"><div class="titlepage"><div><div><h4 class="title"><a name="lineMapper" href="#lineMapper"></a>LineMapper</h4></div></div></div><p>As with <code class="classname">RowMapper</code>, which takes a low
|
|
level construct such as <code class="classname">ResultSet</code> and returns
|
|
an <code class="classname">Object</code>, flat file processing requires the
|
|
same construct to convert a <code class="classname">String</code> line into an
<code class="classname">Object</code>:
</p><pre class="programlisting"><span class="hl-keyword">public</span> <span class="hl-keyword">interface</span> LineMapper<T> {

    T mapLine(String line, <span class="hl-keyword">int</span> lineNumber) <span class="hl-keyword">throws</span> Exception;

}</pre><p>The basic contract is that, given the current line and the line
|
|
number with which it is associated, the mapper should return a
|
|
resulting domain object. This is similar to
|
|
<code class="classname">RowMapper</code> in that each line is associated with
|
|
its line number, just as each row in a
|
|
<code class="classname">ResultSet</code> is tied to its row number. This
|
|
allows the line number to be tied to the resulting domain object for
|
|
identity comparison or for more informative logging. However, unlike
|
|
<code class="classname">RowMapper</code>, the
|
|
<code class="classname">LineMapper</code> is given a raw line which, as
|
|
discussed above, only gets you halfway there. The line must be
|
|
tokenized into a <code class="classname">FieldSet</code>, which can then be
|
|
mapped to an object, as described below.</p></div><div class="section"><div class="titlepage"><div><div><h4 class="title"><a name="lineTokenizer" href="#lineTokenizer"></a>LineTokenizer</h4></div></div></div><p>An abstraction for turning a line of input into a
|
|
<code class="classname">FieldSet</code> is necessary because there can be many
|
|
formats of flat file data that need to be converted to a
|
|
<code class="classname">FieldSet</code>. In Spring Batch, this interface is
|
|
the <code class="classname">LineTokenizer</code>:</p><pre class="programlisting"><span class="hl-keyword">public</span> <span class="hl-keyword">interface</span> LineTokenizer {

    FieldSet tokenize(String line);

}</pre><p>The contract of a <code class="classname">LineTokenizer</code> is such
|
|
that, given a line of input (in theory the
|
|
<code class="classname">String</code> could encompass more than one line), a
|
|
<code class="classname">FieldSet</code> representing the line will be
|
|
returned. This <code class="classname">FieldSet</code> can then be passed to a
|
|
<code class="classname">FieldSetMapper</code>. Spring Batch contains the
|
|
following <code class="classname">LineTokenizer</code> implementations:</p><div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; "><li class="listitem"><p><code class="classname">DelimitedLineTokenizer</code> - Used for
|
|
files where fields in a record are separated by a delimiter. The
|
|
most common delimiter is a comma, but pipes or semicolons are
|
|
often used as well.</p></li><li class="listitem"><p><code class="classname">FixedLengthTokenizer</code> - Used for files
|
|
where fields in a record are each a 'fixed width'. The width of
|
|
each field must be defined for each record type.</p></li><li class="listitem"><p><code class="classname">PatternMatchingCompositeLineTokenizer</code>
|
|
- Determines which among a list of
|
|
<code class="classname">LineTokenizer</code>s should be used on a
|
|
particular line by checking against a pattern.</p></li></ul></div></div><div class="section"><div class="titlepage"><div><div><h4 class="title"><a name="fieldSetMapper" href="#fieldSetMapper"></a>FieldSetMapper</h4></div></div></div><p>The <code class="classname">FieldSetMapper</code> interface defines a
|
|
single method, <code class="methodname">mapFieldSet</code>, which takes a
|
|
<code class="classname">FieldSet</code> object and maps its contents to an
|
|
object. This object may be a custom DTO, a domain object, or a simple
|
|
array, depending on the needs of the job. The
|
|
<code class="classname">FieldSetMapper</code> is used in conjunction with the
|
|
<code class="classname">LineTokenizer</code> to translate a line of data from
|
|
a resource into an object of the desired type:</p><pre class="programlisting"><span class="hl-keyword">public</span> <span class="hl-keyword">interface</span> FieldSetMapper<T> {

    T mapFieldSet(FieldSet fieldSet);

}</pre><p>The pattern used is the same as the
|
|
<code class="classname">RowMapper</code> used by
|
|
<code class="classname">JdbcTemplate</code>.</p></div><div class="section"><div class="titlepage"><div><div><h4 class="title"><a name="defaultLineMapper" href="#defaultLineMapper"></a>DefaultLineMapper</h4></div></div></div><p>Now that the basic interfaces for reading in flat files have
|
|
been defined, it becomes clear that three basic steps are
|
|
required:</p><div class="orderedlist"><ol class="orderedlist" type="1"><li class="listitem"><p>Read one line from the file.</p></li><li class="listitem"><p>Pass the string line into the
|
|
<code class="methodname">LineTokenizer#tokenize</code>() method, in
|
|
order to retrieve a <code class="classname">FieldSet</code>.</p></li><li class="listitem"><p>Pass the <code class="classname">FieldSet</code> returned from
|
|
tokenizing to a <code class="classname">FieldSetMapper</code>, returning
|
|
the result from the <code class="methodname">ItemReader#read</code>()
|
|
method.</p></li></ol></div><p>The two interfaces described above represent two separate tasks:
|
|
converting a line into a <code class="classname">FieldSet</code>, and mapping
|
|
a <code class="classname">FieldSet</code> to a domain object. Because the
|
|
input of a <code class="classname">LineTokenizer</code> matches the input of
|
|
the <code class="classname">LineMapper</code> (a line), and the output of a
|
|
<code class="classname">FieldSetMapper</code> matches the output of the
|
|
<code class="classname">LineMapper</code>, a default implementation that uses
|
|
both a <code class="classname">LineTokenizer</code> and
|
|
<code class="classname">FieldSetMapper</code> is provided. The
|
|
<code class="classname">DefaultLineMapper</code> represents the behavior most
|
|
users will need:</p><pre class="programlisting"><span class="hl-keyword">public</span> <span class="hl-keyword">class</span> DefaultLineMapper<T> <span class="hl-keyword">implements</span> LineMapper<T>, InitializingBean {

    <span class="hl-keyword">private</span> LineTokenizer tokenizer;

    <span class="hl-keyword">private</span> FieldSetMapper<T> fieldSetMapper;

    <span class="hl-keyword">public</span> T mapLine(String line, <span class="hl-keyword">int</span> lineNumber) <span class="hl-keyword">throws</span> Exception {
        <span class="bold"><strong>return fieldSetMapper.mapFieldSet(tokenizer.tokenize(line));</strong></span>
    }

    <span class="hl-keyword">public</span> <span class="hl-keyword">void</span> setLineTokenizer(LineTokenizer tokenizer) {
        <span class="hl-keyword">this</span>.tokenizer = tokenizer;
    }

    <span class="hl-keyword">public</span> <span class="hl-keyword">void</span> setFieldSetMapper(FieldSetMapper<T> fieldSetMapper) {
        <span class="hl-keyword">this</span>.fieldSetMapper = fieldSetMapper;
    }
}</pre><p>The above functionality is provided in a default implementation,
|
|
rather than being built into the reader itself (as was done in
|
|
previous versions of the framework) in order to allow users greater
|
|
flexibility in controlling the parsing process, especially if access
|
|
to the raw line is needed.</p></div><div class="section"><div class="titlepage"><div><div><h4 class="title"><a name="simpleDelimitedFileReadingExample" href="#simpleDelimitedFileReadingExample"></a>Simple Delimited File Reading Example</h4></div></div></div><p>The following example will be used to illustrate this using an
|
|
actual domain scenario. This particular batch job reads in football
|
|
players from the following file:
</p><pre class="programlisting">ID,lastName,firstName,position,birthYear,debutYear
AbduKa00,Abdul-Jabbar,Karim,rb,1974,1996
AbduRa00,Abdullah,Rabih,rb,1975,1999
AberWa00,Abercrombie,Walter,rb,1959,1982
AbraDa00,Abramowicz,Danny,wr,1945,1967
AdamBo00,Adams,Bob,te,1946,1969
AdamCh00,Adams,Charlie,wr,1979,2003</pre><p>The contents of this file will be mapped to the following
|
|
<code class="classname">Player</code> domain object:
</p><pre class="programlisting"><span class="hl-keyword">public</span> <span class="hl-keyword">class</span> Player <span class="hl-keyword">implements</span> Serializable {

    <span class="hl-keyword">private</span> String ID;
    <span class="hl-keyword">private</span> String lastName;
    <span class="hl-keyword">private</span> String firstName;
    <span class="hl-keyword">private</span> String position;
    <span class="hl-keyword">private</span> <span class="hl-keyword">int</span> birthYear;
    <span class="hl-keyword">private</span> <span class="hl-keyword">int</span> debutYear;

    <span class="hl-keyword">public</span> String toString() {
        <span class="hl-keyword">return</span> <span class="hl-string">"PLAYER:ID="</span> + ID + <span class="hl-string">",Last Name="</span> + lastName +
            <span class="hl-string">",First Name="</span> + firstName + <span class="hl-string">",Position="</span> + position +
            <span class="hl-string">",Birth Year="</span> + birthYear + <span class="hl-string">",DebutYear="</span> +
            debutYear;
    }

    <span class="hl-comment">// setters and getters...</span>
}</pre><p>In order to map a <code class="classname">FieldSet</code> into a
|
|
<code class="classname">Player</code> object, a
|
|
<code class="classname">FieldSetMapper</code> that returns players needs to be
|
|
defined:</p><pre class="programlisting"><span class="hl-keyword">protected</span> <span class="hl-keyword">static</span> <span class="hl-keyword">class</span> PlayerFieldSetMapper <span class="hl-keyword">implements</span> FieldSetMapper<Player> {
    <span class="hl-keyword">public</span> Player mapFieldSet(FieldSet fieldSet) {
        Player player = <span class="hl-keyword">new</span> Player();

        player.setID(fieldSet.readString(<span class="hl-number">0</span>));
        player.setLastName(fieldSet.readString(<span class="hl-number">1</span>));
        player.setFirstName(fieldSet.readString(<span class="hl-number">2</span>));
        player.setPosition(fieldSet.readString(<span class="hl-number">3</span>));
        player.setBirthYear(fieldSet.readInt(<span class="hl-number">4</span>));
        player.setDebutYear(fieldSet.readInt(<span class="hl-number">5</span>));

        <span class="hl-keyword">return</span> player;
    }
}</pre><p>The file can then be read by correctly constructing a
|
|
<code class="classname">FlatFileItemReader</code> and calling
|
|
<code class="methodname">read</code>:</p><pre class="programlisting">FlatFileItemReader<Player> itemReader = <span class="hl-keyword">new</span> FlatFileItemReader<Player>();
itemReader.setResource(<span class="hl-keyword">new</span> FileSystemResource(<span class="hl-string">"resources/players.csv"</span>));
<span class="hl-comment">// Skip the header row (ID,lastName,...) so that it is not mapped to a Player</span>
itemReader.setLinesToSkip(<span class="hl-number">1</span>);
<span class="hl-comment">//DelimitedLineTokenizer defaults to comma as its delimiter</span>
DefaultLineMapper<Player> lineMapper = <span class="hl-keyword">new</span> DefaultLineMapper<Player>();
lineMapper.setLineTokenizer(<span class="hl-keyword">new</span> DelimitedLineTokenizer());
lineMapper.setFieldSetMapper(<span class="hl-keyword">new</span> PlayerFieldSetMapper());
itemReader.setLineMapper(lineMapper);
itemReader.open(<span class="hl-keyword">new</span> ExecutionContext());
Player player = itemReader.read();</pre><p>Each call to <code class="methodname">read</code> will return a new
|
|
Player object from each line in the file. When the end of the file is
|
|
reached, null will be returned.</p></div><div class="section"><div class="titlepage"><div><div><h4 class="title"><a name="mappingFieldsByName" href="#mappingFieldsByName"></a>Mapping Fields by Name</h4></div></div></div><p>There is one additional piece of functionality that is allowed
|
|
by both <code class="classname">DelimitedLineTokenizer</code> and
|
|
<code class="classname">FixedLengthTokenizer</code> that is similar in
|
|
function to a JDBC <code class="classname">ResultSet</code>. The names of the
|
|
fields can be injected into either of these
|
|
<code class="classname">LineTokenizer</code> implementations to increase the
|
|
readability of the mapping function. First, the column names of all
|
|
fields in the flat file are injected into the tokenizer:</p><pre class="programlisting">tokenizer.setNames(<span class="hl-keyword">new</span> String[] {<span class="hl-string">"ID"</span>, <span class="hl-string">"lastName"</span>,<span class="hl-string">"firstName"</span>,<span class="hl-string">"position"</span>,<span class="hl-string">"birthYear"</span>,<span class="hl-string">"debutYear"</span>}); </pre><p>A <code class="classname">FieldSetMapper</code> can use this information
|
|
as follows:</p><pre class="programlisting"><span class="hl-keyword">public</span> <span class="hl-keyword">class</span> PlayerMapper <span class="hl-keyword">implements</span> FieldSetMapper<Player> {
    <span class="hl-keyword">public</span> Player mapFieldSet(FieldSet fs) {

        <span class="hl-keyword">if</span> (fs == null) {
            <span class="hl-keyword">return</span> null;
        }

        Player player = <span class="hl-keyword">new</span> Player();
        player.setID(fs.readString(<span class="bold"><strong>"ID"</strong></span>));
        player.setLastName(fs.readString(<span class="bold"><strong>"lastName"</strong></span>));
        player.setFirstName(fs.readString(<span class="bold"><strong>"firstName"</strong></span>));
        player.setPosition(fs.readString(<span class="bold"><strong>"position"</strong></span>));
        player.setDebutYear(fs.readInt(<span class="bold"><strong>"debutYear"</strong></span>));
        player.setBirthYear(fs.readInt(<span class="bold"><strong>"birthYear"</strong></span>));

        <span class="hl-keyword">return</span> player;
    }
}</pre></div><div class="section"><div class="titlepage"><div><div><h4 class="title"><a name="beanWrapperFieldSetMapper" href="#beanWrapperFieldSetMapper"></a>Automapping FieldSets to Domain Objects</h4></div></div></div><p>For many, having to write a specific
|
|
<code class="classname">FieldSetMapper</code> is just as cumbersome as
|
|
writing a specific <code class="classname">RowMapper</code> for a
|
|
<code class="classname">JdbcTemplate</code>. Spring Batch makes this easier by
|
|
providing a <code class="classname">FieldSetMapper</code> that automatically
|
|
maps fields by matching a field name with a setter on the object using
|
|
the JavaBean specification. Again using the football example, the
|
|
<code class="classname">BeanWrapperFieldSetMapper</code> configuration looks
|
|
like the following:</p><pre class="programlisting"><span class="hl-tag"><bean</span> <span class="hl-attribute">id</span>=<span class="hl-value">"fieldSetMapper"</span>
      <span class="hl-attribute">class</span>=<span class="hl-value">"org.springframework.batch.item.file.mapping.BeanWrapperFieldSetMapper"</span><span class="hl-tag">></span>
    <span class="hl-tag"><property</span> <span class="hl-attribute">name</span>=<span class="hl-value">"prototypeBeanName"</span> <span class="hl-attribute">value</span>=<span class="hl-value">"player"</span><span class="hl-tag"> /></span>
<span class="hl-tag"></bean></span>

<span class="hl-tag"><bean</span> <span class="hl-attribute">id</span>=<span class="hl-value">"player"</span>
      <span class="hl-attribute">class</span>=<span class="hl-value">"org.springframework.batch.sample.domain.Player"</span>
      <span class="hl-attribute">scope</span>=<span class="hl-value">"prototype"</span><span class="hl-tag"> /></span></pre><p>For each entry in the <code class="classname">FieldSet</code>, the
|
|
mapper will look for a corresponding setter on a new instance of the
|
|
<code class="classname">Player</code> object (for this reason, prototype scope
|
|
is required) in the same way the Spring container will look for
|
|
setters matching a property name. Each available field in the
|
|
<code class="classname">FieldSet</code> will be mapped, and the resultant
|
|
<code class="classname">Player</code> object will be returned, with no code
|
|
required.</p></div><div class="section"><div class="titlepage"><div><div><h4 class="title"><a name="fixedLengthFileFormats" href="#fixedLengthFileFormats"></a>Fixed Length File Formats</h4></div></div></div><p>So far, only delimited files have been discussed in much detail.
However, they represent only half of the file reading picture. Many
|
|
organizations that use flat files use fixed length formats. An example
|
|
fixed length file is below:</p><pre class="programlisting">UK21341EAH4121131.11customer1
UK21341EAH4221232.11customer2
UK21341EAH4321333.11customer3
UK21341EAH4421434.11customer4
UK21341EAH4521535.11customer5</pre><p>While this looks like one large field, it actually represents four
distinct fields:</p><div class="orderedlist"><ol class="orderedlist" type="1"><li class="listitem"><p>ISIN: Unique identifier for the item being ordered - 12
|
|
characters long.</p></li><li class="listitem"><p>Quantity: Number of this item being ordered - 3 characters
|
|
long.</p></li><li class="listitem"><p>Price: Price of the item - 5 characters long.</p></li><li class="listitem"><p>Customer: Id of the customer ordering the item - 9
|
|
characters long.</p></li></ol></div><p>When configuring the
|
|
<code class="classname">FixedLengthTokenizer</code>, each of these lengths
must be provided in the form of ranges:</p><pre class="programlisting"><span class="hl-tag"><bean</span> <span class="hl-attribute">id</span>=<span class="hl-value">"fixedLengthLineTokenizer"</span>
      <span class="hl-attribute">class</span>=<span class="hl-value">"org.springframework.batch.item.file.transform.FixedLengthTokenizer"</span><span class="hl-tag">></span>
    <span class="hl-tag"><property</span> <span class="hl-attribute">name</span>=<span class="hl-value">"names"</span> <span class="hl-attribute">value</span>=<span class="hl-value">"ISIN,Quantity,Price,Customer"</span><span class="hl-tag"> /></span>
    <span class="hl-tag"><property</span> <span class="hl-attribute">name</span>=<span class="hl-value">"columns"</span> <span class="hl-attribute">value</span>=<span class="hl-value">"1-12, 13-15, 16-20, 21-29"</span><span class="hl-tag"> /></span>
<span class="hl-tag"></bean></span></pre><p>Because the <code class="classname">FixedLengthTokenizer</code> uses
|
|
the same <code class="classname">LineTokenizer</code> interface as discussed
|
|
above, it will return the same <code class="classname">FieldSet</code> as if a
|
|
delimiter had been used. This allows the same approaches to be used in
|
|
handling its output, such as using the
|
|
<code class="classname">BeanWrapperFieldSetMapper</code>.</p><div class="note" style="margin-left: 0.5in; margin-right: 0.5in;"><table border="0" summary="Note"><tr><td rowspan="2" align="center" valign="top" width="25"><img alt="[Note]" src="images/note.png"></td><th align="left">Note</th></tr><tr><td align="left" valign="top"><p>Supporting the above syntax for ranges requires that a
|
|
specialized property editor,
|
|
<code class="classname">RangeArrayPropertyEditor</code>, be configured in
|
|
the <code class="classname">ApplicationContext</code>. However, this bean
|
|
is automatically declared in an
|
|
<code class="classname">ApplicationContext</code> where the batch
|
|
namespace is used.</p></td></tr></table></div></div><div class="section"><div class="titlepage"><div><div><h4 class="title"><a name="prefixMatchingLineMapper" href="#prefixMatchingLineMapper"></a>Multiple Record Types within a Single File</h4></div></div></div><p>All of the file reading examples up to this point have made
|
|
a key assumption for simplicity's sake: all of the records in a file
|
|
have the same format. However, this may not always be the case. It is
|
|
very common that a file might have records with different formats that
|
|
need to be tokenized differently and mapped to different objects. The
|
|
following excerpt from a file illustrates this:</p><pre class="programlisting">USER;Smith;Peter;;T;20014539;F
LINEA;1044391041ABC037.49G201XX1383.12H
LINEB;2134776319DEF422.99M005LI</pre><p>In this file we have three types of records, "USER", "LINEA",
|
|
and "LINEB". A "USER" line corresponds to a User object. "LINEA" and
|
|
"LINEB" both correspond to Line objects, though a "LINEA" has more
|
|
information than a "LINEB".</p><p>The <code class="classname">ItemReader</code> will read each line
|
|
individually, but we must specify different
|
|
<code class="classname">LineTokenizer</code> and
|
|
<code class="classname">FieldSetMapper</code> objects so that the
|
|
<code class="classname">ItemWriter</code> will receive the correct items. The
|
|
<code class="classname">PatternMatchingCompositeLineMapper</code> makes this
|
|
easy by allowing maps of patterns to
|
|
<code class="classname">LineTokenizer</code>s and patterns to
|
|
<code class="classname">FieldSetMapper</code>s to be configured:</p><pre class="programlisting"><span class="hl-tag"><bean</span> <span class="hl-attribute">id</span>=<span class="hl-value">"orderFileLineMapper"</span>
      <span class="hl-attribute">class</span>=<span class="hl-value">"org.spr...PatternMatchingCompositeLineMapper"</span><span class="hl-tag">></span>
    <span class="hl-tag"><property</span> <span class="hl-attribute">name</span>=<span class="hl-value">"tokenizers"</span><span class="hl-tag">></span>
        <span class="hl-tag"><map></span>
            <span class="hl-tag"><entry</span> <span class="hl-attribute">key</span>=<span class="hl-value">"USER*"</span> <span class="hl-attribute">value-ref</span>=<span class="hl-value">"userTokenizer"</span><span class="hl-tag"> /></span>
            <span class="hl-tag"><entry</span> <span class="hl-attribute">key</span>=<span class="hl-value">"LINEA*"</span> <span class="hl-attribute">value-ref</span>=<span class="hl-value">"lineATokenizer"</span><span class="hl-tag"> /></span>
            <span class="hl-tag"><entry</span> <span class="hl-attribute">key</span>=<span class="hl-value">"LINEB*"</span> <span class="hl-attribute">value-ref</span>=<span class="hl-value">"lineBTokenizer"</span><span class="hl-tag"> /></span>
        <span class="hl-tag"></map></span>
    <span class="hl-tag"></property></span>
    <span class="hl-tag"><property</span> <span class="hl-attribute">name</span>=<span class="hl-value">"fieldSetMappers"</span><span class="hl-tag">></span>
        <span class="hl-tag"><map></span>
            <span class="hl-tag"><entry</span> <span class="hl-attribute">key</span>=<span class="hl-value">"USER*"</span> <span class="hl-attribute">value-ref</span>=<span class="hl-value">"userFieldSetMapper"</span><span class="hl-tag"> /></span>
            <span class="hl-tag"><entry</span> <span class="hl-attribute">key</span>=<span class="hl-value">"LINE*"</span> <span class="hl-attribute">value-ref</span>=<span class="hl-value">"lineFieldSetMapper"</span><span class="hl-tag"> /></span>
        <span class="hl-tag"></map></span>
    <span class="hl-tag"></property></span>
<span class="hl-tag"></bean></span></pre><p>In this example, "LINEA" and "LINEB" have separate
|
|
<code class="classname">LineTokenizer</code>s but they both use the same
|
|
<code class="classname">FieldSetMapper</code>.</p><p>The <code class="classname">PatternMatchingCompositeLineMapper</code>
|
|
makes use of the <code class="classname">PatternMatcher</code>'s
|
|
<code class="classname">match</code> method in order to select the correct
|
|
delegate for each line. The <code class="classname">PatternMatcher</code>
|
|
allows for two wildcard characters with special meaning: the question
|
|
mark ("?") will match exactly one character, while the asterisk ("*")
|
|
will match zero or more characters. Note that in the configuration
|
|
above, all patterns end with an asterisk, making them effectively
|
|
prefixes to lines. The <code class="classname">PatternMatcher</code> will
|
|
always match the most specific pattern possible, regardless of the
|
|
order in the configuration. So if "LINE*" and "LINEA*" were both
|
|
listed as patterns, "LINEA" would match pattern "LINEA*", while
|
|
"LINEB" would match pattern "LINE*". Additionally, a single asterisk
|
|
("*") can serve as a default by matching any line not matched by any
|
|
other pattern.</p><pre class="programlisting"><span class="hl-tag"><entry</span> <span class="hl-attribute">key</span>=<span class="hl-value">"*"</span> <span class="hl-attribute">value-ref</span>=<span class="hl-value">"defaultLineTokenizer"</span><span class="hl-tag"> /></span></pre><p>There is also a
|
|
<code class="classname">PatternMatchingCompositeLineTokenizer</code> that can
|
|
be used for tokenization alone.</p><p>It is also common for a flat file to contain records that each
|
|
span multiple lines. To handle this situation, a more complex strategy
|
|
is required. A demonstration of this common pattern can be found in
|
|
<a class="xref" href="patterns.html#multiLineRecords" title="11.5 Multi-Line Records">Section 11.5, “Multi-Line Records”</a>.</p></div><div class="section"><div class="titlepage"><div><div><h4 class="title"><a name="exceptionHandlingInFlatFiles" href="#exceptionHandlingInFlatFiles"></a>Exception Handling in Flat Files</h4></div></div></div><p>There are many scenarios when tokenizing a line may cause
|
|
exceptions to be thrown. Many flat files are imperfect and contain
|
|
records that aren't formatted correctly. Many users choose to skip
|
|
these erroneous lines, logging out the issue, original line, and line
|
|
number. These logs can later be inspected manually or by another batch
|
|
job. For this reason, Spring Batch provides a hierarchy of exceptions
|
|
for handling parse exceptions:
|
|
<code class="classname">FlatFileParseException</code> and
|
|
<code class="classname">FlatFileFormatException</code>.
|
|
<code class="classname">FlatFileParseException</code> is thrown by the
|
|
<code class="classname">FlatFileItemReader</code> when any errors are
|
|
encountered while trying to read a file.
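</p><p>As a rough illustration of the skip-and-log approach described above, the following plain-Java sketch records each malformed line together with its line number and then carries on with the next line. It is only a sketch under stated assumptions: <code class="classname">SkipAndLogSketch</code>, <code class="classname">LineParseException</code>, <code class="methodname">parse</code> and <code class="methodname">readAll</code> are hypothetical names invented for this example and are not Spring Batch classes, and the two-token "name,year" format is made up for the demonstration:</p><pre class="programlisting">import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SkipAndLogSketch {

    /** Hypothetical stand-in for a parse exception that carries the raw line and its number. */
    static class LineParseException extends RuntimeException {
        final String input;
        final int lineNumber;

        LineParseException(String message, String input, int lineNumber) {
            super(message);
            this.input = input;
            this.lineNumber = lineNumber;
        }
    }

    /** Parses a "name,year" line and fails with context when the token count is wrong. */
    static String parse(String line, int lineNumber) {
        String[] tokens = line.split(",");
        if (tokens.length != 2) {
            throw new LineParseException(
                    "expected 2 tokens but found " + tokens.length, line, lineNumber);
        }
        return tokens[0] + " (" + tokens[1] + ")";
    }

    /** Returns the parsed items; problems are collected in 'log' instead of aborting the run. */
    static List<String> readAll(List<String> lines, List<String> log) {
        List<String> items = new ArrayList<>();
        int lineNumber = 0;
        for (String line : lines) {
            lineNumber++;
            try {
                items.add(parse(line, lineNumber));
            } catch (LineParseException e) {
                // Keep the issue, the line number, and the original line for later inspection.
                log.add("line " + e.lineNumber + ": " + e.getMessage() + " [" + e.input + "]");
            }
        }
        return items;
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        List<String> items = readAll(Arrays.asList("Adams,1946", "broken", "Abdullah,1975"), log);
        System.out.println(items);
        System.out.println(log);
    }
}</pre><p>With the three sample lines in <code class="methodname">main</code>, the second line ends up in the log and the other two are still processed.</p><p>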
|
|
<code class="classname">FlatFileFormatException</code> is thrown by
|
|
implementations of the <code class="classname">LineTokenizer</code> interface,
|
|
and indicates a more specific error encountered while
|
|
tokenizing.</p><div class="section"><div class="titlepage"><div><div><h5 class="title"><a name="incorrectTokenCountException" href="#incorrectTokenCountException"></a>IncorrectTokenCountException</h5></div></div></div><p>Both <code class="classname">DelimitedLineTokenizer</code> and
|
|
<code class="classname">FixedLengthTokenizer</code> have the ability to
|
|
specify column names that can be used for creating a
|
|
<code class="classname">FieldSet</code>. However, if the number of column
|
|
names doesn't match the number of columns found while tokenizing a
|
|
line, the <code class="classname">FieldSet</code> can't be created, and an
|
|
<code class="classname">IncorrectTokenCountException</code> is thrown, which
|
|
contains the number of tokens encountered, and the number
|
|
expected:</p><pre class="programlisting">tokenizer.setNames(<span class="hl-keyword">new</span> String[] {<span class="hl-string">"A"</span>, <span class="hl-string">"B"</span>, <span class="hl-string">"C"</span>, <span class="hl-string">"D"</span>});

<span class="hl-keyword">try</span> {
    tokenizer.tokenize(<span class="hl-string">"a,b,c"</span>);
    fail(<span class="hl-string">"Expected IncorrectTokenCountException"</span>);
}
<span class="hl-keyword">catch</span> (IncorrectTokenCountException e) {
    assertEquals(<span class="hl-number">4</span>, e.getExpectedCount());
    assertEquals(<span class="hl-number">3</span>, e.getActualCount());
}</pre><p>Because the tokenizer was configured with 4 column names, but
only 3 tokens were found in the line, an
|
|
<code class="classname">IncorrectTokenCountException</code> was
|
|
thrown.</p></div><div class="section"><div class="titlepage"><div><div><h5 class="title"><a name="incorrectLineLengthException" href="#incorrectLineLengthException"></a>IncorrectLineLengthException</h5></div></div></div><p>Files formatted in a fixed length format have additional
|
|
requirements when parsing because, unlike a delimited format, each
|
|
column must strictly adhere to its predefined width. If the total
line length does not equal the highest upper bound of the configured column ranges, an
|
|
exception is thrown:</p><pre class="programlisting">tokenizer.setColumns(<span class="hl-keyword">new</span> Range[] { <span class="hl-keyword">new</span> Range(<span class="hl-number">1</span>, <span class="hl-number">5</span>),
                                   <span class="hl-keyword">new</span> Range(<span class="hl-number">6</span>, <span class="hl-number">10</span>),
                                   <span class="hl-keyword">new</span> Range(<span class="hl-number">11</span>, <span class="hl-number">15</span>) });
<span class="hl-keyword">try</span> {
    tokenizer.tokenize(<span class="hl-string">"12345"</span>);
    fail(<span class="hl-string">"Expected IncorrectLineLengthException"</span>);
}
<span class="hl-keyword">catch</span> (IncorrectLineLengthException ex) {
    assertEquals(<span class="hl-number">15</span>, ex.getExpectedLength());
    assertEquals(<span class="hl-number">5</span>, ex.getActualLength());
}</pre><p>The configured ranges for the tokenizer above are: 1-5, 6-10,
|
|
and 11-15, thus the total length of the line expected is 15.
|
|
However, in this case a line of length 5 was passed in, causing an
|
|
<code class="classname">IncorrectLineLengthException</code> to be thrown.
|
|
Throwing an exception here rather than only mapping the first column
|
|
allows the processing of the line to fail earlier, and with more
|
|
information than it would if it failed while trying to read in
|
|
column 2 in a <code class="classname">FieldSetMapper</code>. However, there
|
|
are scenarios where the length of the line isn't always constant.
|
|
For this reason, validation of line length can be turned off via the
|
|
'strict' property:</p><pre class="programlisting">tokenizer.setColumns(<span class="hl-keyword">new</span> Range[] { <span class="hl-keyword">new</span> Range(<span class="hl-number">1</span>, <span class="hl-number">5</span>), <span class="hl-keyword">new</span> Range(<span class="hl-number">6</span>, <span class="hl-number">10</span>) });
|
|
<span class="bold"><strong>tokenizer.setStrict(false);</strong></span>
|
|
FieldSet tokens = tokenizer.tokenize(<span class="hl-string">"12345"</span>);
|
|
assertEquals(<span class="hl-string">"12345"</span>, tokens.readString(<span class="hl-number">0</span>));
|
|
assertEquals(<span class="hl-string">""</span>, tokens.readString(<span class="hl-number">1</span>));</pre><p>The above example is almost identical to the one before it,
|
|
except that tokenizer.setStrict(false) was called. This setting
|
|
tells the tokenizer to not enforce line lengths when tokenizing the
|
|
line. A <code class="classname">FieldSet</code> is now correctly created and
|
|
returned. However, it will only contain empty tokens for the
|
|
remaining values.</p></div></div></div><div class="section"><div class="titlepage"><div><div><h3 class="title"><a name="flatFileItemWriter" href="#flatFileItemWriter"></a>6.6.3 FlatFileItemWriter</h3></div></div></div><p>Writing out to flat files has the same problems and issues that
|
|
reading in from a file must overcome. A step must be able to write out
|
|
in either delimited or fixed length formats in a transactional
|
|
manner.</p><div class="section"><div class="titlepage"><div><div><h4 class="title"><a name="lineAggregator" href="#lineAggregator"></a>LineAggregator</h4></div></div></div><p>Just as the <code class="classname">LineTokenizer</code> interface is
|
|
necessary to take a <code class="classname">String</code> and turn it into a
<code class="classname">FieldSet</code>, file writing must have a way to
|
|
aggregate multiple fields into a single string for writing to a file.
|
|
In Spring Batch this is the
|
|
<code class="classname">LineAggregator</code>:</p><pre class="programlisting"><span class="hl-keyword">public</span> <span class="hl-keyword">interface</span> LineAggregator<T> {
|
|
|
|
<span class="hl-keyword">public</span> String aggregate(T item);
|
|
|
|
}</pre><p>The <code class="classname">LineAggregator</code> is the opposite of a
|
|
<code class="classname">LineTokenizer</code>.
|
|
<code class="classname">LineTokenizer</code> takes a
|
|
<code class="classname">String</code> and returns a
|
|
<code class="classname">FieldSet</code>, whereas
|
|
<code class="classname">LineAggregator</code> takes an
|
|
<code class="classname">item</code> and returns a
|
|
<code class="classname">String</code>.</p><div class="section"><div class="titlepage"><div><div><h5 class="title"><a name="PassThroughLineAggregator" href="#PassThroughLineAggregator"></a>PassThroughLineAggregator</h5></div></div></div><p>The most basic implementation of the LineAggregator interface
|
|
is the <code class="classname">PassThroughLineAggregator</code>, which
|
|
simply assumes that the object is already a string, or that its
|
|
string representation is acceptable for writing:</p><pre class="programlisting"><span class="hl-keyword">public</span> <span class="hl-keyword">class</span> PassThroughLineAggregator<T> <span class="hl-keyword">implements</span> LineAggregator<T> {
|
|
|
|
<span class="hl-keyword">public</span> String aggregate(T item) {
|
|
<span class="hl-keyword">return</span> item.toString();
|
|
}
|
|
}</pre><p>The above implementation is useful if direct control of
|
|
creating the string is required, but the advantages of a
|
|
<code class="classname">FlatFileItemWriter</code>, such as transaction and
|
|
restart support, are necessary.</p></div></div><div class="section"><div class="titlepage"><div><div><h4 class="title"><a name="SimplifiedFileWritingExample" href="#SimplifiedFileWritingExample"></a>Simplified File Writing Example</h4></div></div></div><p>Now that the <code class="classname">LineAggregator</code> interface and
|
|
its most basic implementation,
|
|
<code class="classname">PassThroughLineAggregator</code>, have been defined,
|
|
the basic flow of writing can be explained:</p><div class="orderedlist"><ol class="orderedlist" type="1"><li class="listitem"><p>The object to be written is passed to the
|
|
<code class="classname">LineAggregator</code> in order to obtain a
|
|
<code class="classname">String</code>.</p></li><li class="listitem"><p>The returned <code class="classname">String</code> is written to the
|
|
configured file.</p></li></ol></div><p>The following excerpt from the
|
|
<code class="classname">FlatFileItemWriter</code> expresses this in
|
|
code:</p><pre class="programlisting"><span class="hl-keyword">public</span> <span class="hl-keyword">void</span> write(T item) <span class="hl-keyword">throws</span> Exception {
|
|
write(lineAggregator.aggregate(item) + LINE_SEPARATOR);
|
|
}</pre><p>A simple configuration would look like the following:</p><pre class="programlisting"><span class="hl-tag"><bean</span> <span class="hl-attribute">id</span>=<span class="hl-value">"itemWriter"</span> <span class="hl-attribute">class</span>=<span class="hl-value">"org.spr...FlatFileItemWriter"</span><span class="hl-tag">></span>
|
|
<span class="hl-tag"><property</span> <span class="hl-attribute">name</span>=<span class="hl-value">"resource"</span> <span class="hl-attribute">value</span>=<span class="hl-value">"file:target/test-outputs/output.txt"</span><span class="hl-tag"> /></span>
|
|
<span class="hl-tag"><property</span> <span class="hl-attribute">name</span>=<span class="hl-value">"lineAggregator"</span><span class="hl-tag">></span>
|
|
<span class="hl-tag"><bean</span> <span class="hl-attribute">class</span>=<span class="hl-value">"org.spr...PassThroughLineAggregator"</span><span class="hl-tag">/></span>
|
|
<span class="hl-tag"></property></span>
|
|
<span class="hl-tag"></bean></span></pre></div><div class="section"><div class="titlepage"><div><div><h4 class="title"><a name="FieldExtractor" href="#FieldExtractor"></a>FieldExtractor</h4></div></div></div><p>The above example may be useful for the most basic uses of a
|
|
file writer. However, most users of the
|
|
<code class="classname">FlatFileItemWriter</code> will have a domain object
|
|
that needs to be written out, and thus must be converted into a line.
|
|
In file reading, the following was required:</p><div class="orderedlist"><ol class="orderedlist" type="1"><li class="listitem"><p>Read one line from the file.</p></li><li class="listitem"><p>Pass the string line into the
|
|
<code class="methodname">LineTokenizer#tokenize</code>() method, in
|
|
order to retrieve a <code class="classname">FieldSet</code>.</p></li><li class="listitem"><p>Pass the <code class="classname">FieldSet</code> returned from
tokenizing to a <code class="classname">FieldSetMapper</code>, returning
the result from the <code class="methodname">ItemReader#read</code>()
method.</p></li></ol></div><p>File writing has similar but inverse steps:</p><div class="orderedlist"><ol class="orderedlist" type="1"><li class="listitem"><p>Pass the item to be written to the writer.</p></li><li class="listitem"><p>Convert the fields on the item into an array.</p></li><li class="listitem"><p>Aggregate the resulting array into a line.</p></li></ol></div><p>Because there is no way for the framework to know which fields
|
|
from the object need to be written out, a
|
|
<code class="classname">FieldExtractor</code> must be written to accomplish
|
|
the task of turning the item into an array:</p><pre class="programlisting"><span class="hl-keyword">public</span> <span class="hl-keyword">interface</span> FieldExtractor<T> {
|
|
|
|
Object[] extract(T item);
|
|
|
|
}</pre><p>Implementations of the <code class="classname">FieldExtractor</code>
|
|
interface should create an array from the fields of the provided
|
|
object, which can then be written out with a delimiter between the
|
|
elements, or as part of a fixed-width line.</p><div class="section"><div class="titlepage"><div><div><h5 class="title"><a name="PassThroughFieldExtractor" href="#PassThroughFieldExtractor"></a>PassThroughFieldExtractor</h5></div></div></div><p>There are many cases where a collection, such as an array,
|
|
<code class="classname">Collection</code>, or
|
|
<code class="classname">FieldSet</code>, needs to be written out.
|
|
"Extracting" an array from one of these collection types is very
|
|
straightforward: simply convert the collection to an array.
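</p><p>That conversion can be sketched in plain Java as shown here. This is a simplified illustration rather than the framework's source: the class name <code class="classname">PassThroughSketch</code> and its static <code class="methodname">extract</code> method are hypothetical, and <code class="classname">FieldSet</code> handling is left out. Arrays are returned as they are, a <code class="classname">Collection</code> is converted with <code class="methodname">toArray</code>(), and any other object is wrapped in a one-element array.</p>

```java
import java.util.Arrays;
import java.util.Collection;

// Hypothetical stand-in for illustration only; not the Spring Batch class.
public class PassThroughSketch {

    // Turns the item into an Object[] without inspecting individual fields.
    static Object[] extract(Object item) {
        if (item instanceof Object[]) {
            return (Object[]) item;                  // already an array
        }
        if (item instanceof Collection) {
            return ((Collection<?>) item).toArray(); // elements in iteration order
        }
        return new Object[] { item };                // anything else: one-element array
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(extract(Arrays.asList("a", "b", "c"))));
        System.out.println(Arrays.toString(extract("single")));
    }
}
```

<p>Because nothing is looked up by name, the order of the output simply follows the order of the input collection.</p><p>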
|
|
Therefore, the <code class="classname">PassThroughFieldExtractor</code>
|
|
should be used in this scenario. Note that if the
|
|
object passed in is not a type of collection, then the
|
|
<code class="classname">PassThroughFieldExtractor</code> will return an
|
|
array containing solely the item to be extracted.</p></div><div class="section"><div class="titlepage"><div><div><h5 class="title"><a name="BeanWrapperFieldExtractor" href="#BeanWrapperFieldExtractor"></a>BeanWrapperFieldExtractor</h5></div></div></div><p>As with the <code class="classname">BeanWrapperFieldSetMapper</code>
|
|
described in the file reading section, it is often preferable to
|
|
configure how to convert a domain object to an object array, rather
|
|
than writing the conversion yourself. The
|
|
<code class="classname">BeanWrapperFieldExtractor</code> provides just this
|
|
type of functionality:</p><pre class="programlisting">BeanWrapperFieldExtractor<Name> extractor = <span class="hl-keyword">new</span> BeanWrapperFieldExtractor<Name>();
|
|
extractor.setNames(<span class="hl-keyword">new</span> String[] { <span class="hl-string">"first"</span>, <span class="hl-string">"last"</span>, <span class="hl-string">"born"</span> });
|
|
|
|
String first = <span class="hl-string">"Alan"</span>;
|
|
String last = <span class="hl-string">"Turing"</span>;
|
|
<span class="hl-keyword">int</span> born = <span class="hl-number">1912</span>;
|
|
|
|
Name n = <span class="hl-keyword">new</span> Name(first, last, born);
|
|
Object[] values = extractor.extract(n);
|
|
|
|
assertEquals(first, values[<span class="hl-number">0</span>]);
|
|
assertEquals(last, values[<span class="hl-number">1</span>]);
|
|
assertEquals(born, values[<span class="hl-number">2</span>]);</pre><p>This extractor implementation has only one required property,
|
|
the names of the fields to map. Just as the
|
|
<code class="classname">BeanWrapperFieldSetMapper</code> needs field names
|
|
to map fields on the <code class="classname">FieldSet</code> to setters on
|
|
the provided object, the
|
|
<code class="classname">BeanWrapperFieldExtractor</code> needs names to map
|
|
to getters for creating an object array. It is worth noting that the
|
|
order of the names determines the order of the fields within the
|
|
array.</p></div></div><div class="section"><div class="titlepage"><div><div><h4 class="title"><a name="delimitedFileWritingExample" href="#delimitedFileWritingExample"></a>Delimited File Writing Example</h4></div></div></div><p>The most basic flat file format is one in which all fields are
|
|
separated by a delimiter. This can be accomplished using a
|
|
<code class="classname">DelimitedLineAggregator</code>. The example below
|
|
writes out a simple domain object that represents a credit to a
|
|
customer account:</p><pre class="programlisting"><span class="hl-keyword">public</span> <span class="hl-keyword">class</span> CustomerCredit {
|
|
|
|
<span class="hl-keyword">private</span> <span class="hl-keyword">int</span> id;
|
|
<span class="hl-keyword">private</span> String name;
|
|
<span class="hl-keyword">private</span> BigDecimal credit;
|
|
|
|
<span class="hl-comment">//getters and setters removed for clarity</span>
|
|
}</pre><p>Because a domain object is being used, an implementation of the
|
|
FieldExtractor interface must be provided, along with the delimiter to
|
|
use:</p><pre class="programlisting"><span class="hl-tag"><bean</span> <span class="hl-attribute">id</span>=<span class="hl-value">"itemWriter"</span> <span class="hl-attribute">class</span>=<span class="hl-value">"org.springframework.batch.item.file.FlatFileItemWriter"</span><span class="hl-tag">></span>
|
|
<span class="hl-tag"><property</span> <span class="hl-attribute">name</span>=<span class="hl-value">"resource"</span> <span class="hl-attribute">ref</span>=<span class="hl-value">"outputResource"</span><span class="hl-tag"> /></span>
|
|
<span class="hl-tag"><property</span> <span class="hl-attribute">name</span>=<span class="hl-value">"lineAggregator"</span><span class="hl-tag">></span>
|
|
<span class="hl-tag"><bean</span> <span class="hl-attribute">class</span>=<span class="hl-value">"org.spr...DelimitedLineAggregator"</span><span class="hl-tag">></span>
|
|
<span class="hl-tag"><property</span> <span class="hl-attribute">name</span>=<span class="hl-value">"delimiter"</span> <span class="hl-attribute">value</span>=<span class="hl-value">","</span><span class="hl-tag">/></span>
|
|
<span class="hl-tag"><property</span> <span class="hl-attribute">name</span>=<span class="hl-value">"fieldExtractor"</span><span class="hl-tag">></span>
|
|
<span class="hl-tag"><bean</span> <span class="hl-attribute">class</span>=<span class="hl-value">"org.spr...BeanWrapperFieldExtractor"</span><span class="hl-tag">></span>
|
|
<span class="hl-tag"><property</span> <span class="hl-attribute">name</span>=<span class="hl-value">"names"</span> <span class="hl-attribute">value</span>=<span class="hl-value">"name,credit"</span><span class="hl-tag">/></span>
|
|
<span class="hl-tag"></bean></span>
|
|
<span class="hl-tag"></property></span>
|
|
<span class="hl-tag"></bean></span>
|
|
<span class="hl-tag"></property></span>
|
|
<span class="hl-tag"></bean></span></pre><p>In this case, the
|
|
<code class="classname">BeanWrapperFieldExtractor</code> described earlier in
|
|
this chapter is used to turn the name and credit fields within
|
|
<code class="classname">CustomerCredit</code> into an object array, which is
|
|
then written out with commas between each field.</p></div><div class="section"><div class="titlepage"><div><div><h4 class="title"><a name="fixedWidthFileWritingExample" href="#fixedWidthFileWritingExample"></a>Fixed Width File Writing Example</h4></div></div></div><p>Delimited is not the only type of flat file format. Many prefer
|
|
to use a set width for each column to delineate between fields, which
|
|
is usually referred to as 'fixed width'. Spring Batch supports this in
|
|
file writing via the <code class="classname">FormatterLineAggregator</code>.
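</p><p>To make the idea of fixed-width output concrete, the sketch below pads two values to fixed column widths with a printf-style pattern. It is only an illustration under stated assumptions: the class name <code class="classname">FixedWidthSketch</code> and the method <code class="methodname">formatLine</code> are hypothetical, plain <code class="methodname">String.format</code> stands in for the aggregator, and <code class="literal">Locale.ROOT</code> is chosen here only to keep the output stable across environments.</p>

```java
import java.math.BigDecimal;
import java.util.Locale;

// Hypothetical helper for illustration; it does not use Spring Batch classes.
public class FixedWidthSketch {

    // A 9-character left-justified text column followed by a
    // left-justified numeric column with no decimal places.
    static final String FORMAT = "%-9s%-2.0f";

    static String formatLine(String name, BigDecimal credit) {
        return String.format(Locale.ROOT, FORMAT, name, credit);
    }

    public static void main(String[] args) {
        // Brackets make the padding visible in the console output.
        System.out.println("[" + formatLine("Smith", new BigDecimal("42.75")) + "]");
    }
}
```

<p>In this pattern the widths are minimums: a name longer than 9 characters is not truncated, so the pattern has to be sized for the widest expected value.</p><p>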
|
|
Using the same <code class="classname">CustomerCredit</code> domain object
|
|
described above, it can be configured as follows:</p><pre class="programlisting"><span class="hl-tag"><bean</span> <span class="hl-attribute">id</span>=<span class="hl-value">"itemWriter"</span> <span class="hl-attribute">class</span>=<span class="hl-value">"org.springframework.batch.item.file.FlatFileItemWriter"</span><span class="hl-tag">></span>
|
|
<span class="hl-tag"><property</span> <span class="hl-attribute">name</span>=<span class="hl-value">"resource"</span> <span class="hl-attribute">ref</span>=<span class="hl-value">"outputResource"</span><span class="hl-tag"> /></span>
|
|
<span class="hl-tag"><property</span> <span class="hl-attribute">name</span>=<span class="hl-value">"lineAggregator"</span><span class="hl-tag">></span>
|
|
<span class="hl-tag"><bean</span> <span class="hl-attribute">class</span>=<span class="hl-value">"org.spr...FormatterLineAggregator"</span><span class="hl-tag">></span>
|
|
<span class="hl-tag"><property</span> <span class="hl-attribute">name</span>=<span class="hl-value">"fieldExtractor"</span><span class="hl-tag">></span>
|
|
<span class="hl-tag"><bean</span> <span class="hl-attribute">class</span>=<span class="hl-value">"org.spr...BeanWrapperFieldExtractor"</span><span class="hl-tag">></span>
|
|
<span class="hl-tag"><property</span> <span class="hl-attribute">name</span>=<span class="hl-value">"names"</span> <span class="hl-attribute">value</span>=<span class="hl-value">"name,credit"</span><span class="hl-tag"> /></span>
|
|
<span class="hl-tag"></bean></span>
|
|
<span class="hl-tag"></property></span>
|
|
<span class="hl-tag"><property</span> <span class="hl-attribute">name</span>=<span class="hl-value">"format"</span> <span class="hl-attribute">value</span>=<span class="hl-value">"%-9s%-2.0f"</span><span class="hl-tag"> /></span>
|
|
<span class="hl-tag"></bean></span>
|
|
<span class="hl-tag"></property></span>
|
|
<span class="hl-tag"></bean></span></pre><p>Most of the above example should look familiar. However, the
|
|
value of the format property is new:</p><pre class="programlisting"><span class="hl-tag"><property</span> <span class="hl-attribute">name</span>=<span class="hl-value">"format"</span> <span class="hl-attribute">value</span>=<span class="hl-value">"%-9s%-2.0f"</span><span class="hl-tag"> /></span></pre><p>The underlying implementation is built using the same
|
|
<code class="classname">Formatter</code> added as part of Java 5. The Java
|
|
<code class="classname">Formatter</code> is based on the
|
|
<code class="methodname">printf</code> functionality of the C programming
|
|
language. Most details on how to configure a formatter can be found in
|
|
the javadoc of <a class="ulink" href="http://java.sun.com/j2se/1.5.0/docs/api/java/util/Formatter.html" target="_top"><em class="citetitle">Formatter</em></a>.</p></div><div class="section"><div class="titlepage"><div><div><h4 class="title"><a name="handlingFileCreation" href="#handlingFileCreation"></a>Handling File Creation</h4></div></div></div><p><code class="classname">FlatFileItemReader</code> has a very simple
|
|
relationship with file resources. When the reader is initialized, it
|
|
opens the file if it exists, and throws an exception if it does not.
|
|
File writing isn't quite so simple. At first glance it seems like a
|
|
similar, straightforward contract should exist for
|
|
<code class="classname">FlatFileItemWriter</code>: if the file already exists,
|
|
throw an exception, and if it does not, create it and start writing.
|
|
However, potentially restarting a <code class="classname">Job</code> can cause
|
|
issues. In normal restart scenarios, the contract is reversed: if the
|
|
file exists, start writing to it from the last known good position,
|
|
and if it does not, throw an exception. However, what happens if the
|
|
file name for this job is always the same? In this case, you would
|
|
want to delete the file if it exists, unless it's a restart. Because
|
|
of this possibility, the <code class="classname">FlatFileItemWriter</code>
|
|
contains the property, <code class="methodname">shouldDeleteIfExists</code>.
|
|
Setting this property to true will cause an existing file with the
|
|
same name to be deleted when the writer is opened.</p></div></div></div><div class="section"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="xmlReadingWriting" href="#xmlReadingWriting"></a>6.7 XML Item Readers and Writers</h2></div></div></div><p>Spring Batch provides transactional infrastructure for both reading
|
|
XML records and mapping them to Java objects as well as writing Java
|
|
objects as XML records.</p><div class="note" style="margin-left: 0.5in; margin-right: 0.5in;"><table border="0" summary="Note: Constraints on streaming XML"><tr><td rowspan="2" align="center" valign="top" width="25"><img alt="[Note]" src="images/note.png"></td><th align="left">Constraints on streaming XML</th></tr><tr><td align="left" valign="top"><p>The StAX API is used for I/O as other standard XML parsing APIs do
|
|
not fit batch processing requirements (DOM loads the whole input into
|
|
memory at once and SAX controls the parsing process allowing the user
|
|
only to provide callbacks).</p></td></tr></table></div><p>Let's take a closer look at how XML input and output work in Spring
|
|
Batch. First, there are a few concepts that vary from file reading and
|
|
writing but are common across Spring Batch XML processing. With XML
|
|
processing, instead of lines of records (FieldSets) that need to be
|
|
tokenized, it is assumed an XML resource is a collection of 'fragments'
|
|
corresponding to individual records:</p><div class="mediaobject" align="center"><img src="images/xmlinput.png" align="middle"><div class="caption"><p>Figure 3.1: XML Input</p></div></div><p>The 'trade' tag is defined as the 'root element' in the scenario
|
|
above. Everything between '<trade>' and '</trade>' is
|
|
considered one 'fragment'. Spring Batch uses Object/XML Mapping (OXM) to
|
|
bind fragments to objects. However, Spring Batch is not tied to any
|
|
particular XML binding technology. Typical use is to delegate to <a class="ulink" href="http://docs.spring.io/spring-ws/site/reference/html/oxm.html" target="_top"><em class="citetitle">Spring
|
|
OXM</em></a>, which provides uniform abstraction for the most
|
|
popular OXM technologies. The dependency on Spring OXM is optional and you
|
|
can choose to implement Spring Batch specific interfaces if desired. The
|
|
relationship to the technologies that OXM supports can be shown as the
|
|
following:</p><div class="mediaobject" align="center"><img src="images/oxm-fragments.png" align="middle"><div class="caption"><p>Figure 3.2: OXM Binding</p></div></div><p>Now with an introduction to OXM and how one can use XML fragments to
|
|
represent records, let's take a closer look at readers and writers.</p><div class="section"><div class="titlepage"><div><div><h3 class="title"><a name="StaxEventItemReader" href="#StaxEventItemReader"></a>6.7.1 StaxEventItemReader</h3></div></div></div><p>The <code class="classname">StaxEventItemReader</code> configuration
|
|
provides a typical setup for the processing of records from an XML input
|
|
stream. First, let's examine a set of XML records that the
|
|
<code class="classname">StaxEventItemReader</code> can process.</p><pre class="programlisting"><span class="hl-directive" style="color: maroon"><?xml version="1.0" encoding="UTF-8"?></span>
|
|
<span class="hl-tag"><records></span>
|
|
<span class="hl-tag"><trade</span> <span class="hl-attribute">xmlns</span>=<span class="hl-value">"http://springframework.org/batch/sample/io/oxm/domain"</span><span class="hl-tag">></span>
|
|
<span class="hl-tag"><isin></span>XYZ0001<span class="hl-tag"></isin></span>
|
|
<span class="hl-tag"><quantity></span>5<span class="hl-tag"></quantity></span>
|
|
<span class="hl-tag"><price></span>11.39<span class="hl-tag"></price></span>
|
|
<span class="hl-tag"><customer></span>Customer1<span class="hl-tag"></customer></span>
|
|
<span class="hl-tag"></trade></span>
|
|
<span class="hl-tag"><trade</span> <span class="hl-attribute">xmlns</span>=<span class="hl-value">"http://springframework.org/batch/sample/io/oxm/domain"</span><span class="hl-tag">></span>
|
|
<span class="hl-tag"><isin></span>XYZ0002<span class="hl-tag"></isin></span>
|
|
<span class="hl-tag"><quantity></span>2<span class="hl-tag"></quantity></span>
|
|
<span class="hl-tag"><price></span>72.99<span class="hl-tag"></price></span>
|
|
<span class="hl-tag"><customer></span>Customer2c<span class="hl-tag"></customer></span>
|
|
<span class="hl-tag"></trade></span>
|
|
<span class="hl-tag"><trade</span> <span class="hl-attribute">xmlns</span>=<span class="hl-value">"http://springframework.org/batch/sample/io/oxm/domain"</span><span class="hl-tag">></span>
|
|
<span class="hl-tag"><isin></span>XYZ0003<span class="hl-tag"></isin></span>
|
|
<span class="hl-tag"><quantity></span>9<span class="hl-tag"></quantity></span>
|
|
<span class="hl-tag"><price></span>99.99<span class="hl-tag"></price></span>
|
|
<span class="hl-tag"><customer></span>Customer3<span class="hl-tag"></customer></span>
|
|
<span class="hl-tag"></trade></span>
|
|
<span class="hl-tag"></records></span></pre><p>To be able to process the XML records the following is needed:
|
|
</p><div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; "><li class="listitem"><p>Root Element Name - Name of the root element of the fragment
|
|
that constitutes the object to be mapped. The example
|
|
configuration demonstrates this with the value of trade.</p></li><li class="listitem"><p>Resource - Spring Resource that represents the file to be
|
|
read.</p></li><li class="listitem"><p><code class="classname">Unmarshaller</code> - Unmarshalling
|
|
facility provided by Spring OXM for mapping the XML fragment to an
|
|
object.</p></li></ul></div><pre class="programlisting"><span class="hl-tag"><bean</span> <span class="hl-attribute">id</span>=<span class="hl-value">"itemReader"</span> <span class="hl-attribute">class</span>=<span class="hl-value">"org.springframework.batch.item.xml.StaxEventItemReader"</span><span class="hl-tag">></span>
|
|
<span class="hl-tag"><property</span> <span class="hl-attribute">name</span>=<span class="hl-value">"fragmentRootElementName"</span> <span class="hl-attribute">value</span>=<span class="hl-value">"trade"</span><span class="hl-tag"> /></span>
|
|
<span class="hl-tag"><property</span> <span class="hl-attribute">name</span>=<span class="hl-value">"resource"</span> <span class="hl-attribute">value</span>=<span class="hl-value">"data/iosample/input/input.xml"</span><span class="hl-tag"> /></span>
|
|
<span class="hl-tag"><property</span> <span class="hl-attribute">name</span>=<span class="hl-value">"unmarshaller"</span> <span class="hl-attribute">ref</span>=<span class="hl-value">"tradeMarshaller"</span><span class="hl-tag"> /></span>
|
|
<span class="hl-tag"></bean></span></pre><p>Notice that in this example we have chosen to use an
|
|
<code class="classname">XStreamMarshaller</code> which accepts an alias passed
|
|
in as a map with the first key and value being the name of the fragment
|
|
(i.e. root element) and the object type to bind. Then, similar to a
|
|
<code class="classname">FieldSet</code>, the names of the other elements that
|
|
map to fields within the object type are described as key/value pairs in
|
|
the map. In the configuration file we can use a Spring configuration
|
|
utility to describe the required alias as follows:</p><pre class="programlisting"><span class="hl-tag"><bean</span> <span class="hl-attribute">id</span>=<span class="hl-value">"tradeMarshaller"</span>
|
|
<span class="hl-attribute">class</span>=<span class="hl-value">"org.springframework.oxm.xstream.XStreamMarshaller"</span><span class="hl-tag">></span>
|
|
<span class="hl-tag"><property</span> <span class="hl-attribute">name</span>=<span class="hl-value">"aliases"</span><span class="hl-tag">></span>
|
|
<span class="bold"><strong> <util:map id="aliases">
|
|
<entry key="trade"
|
|
value="org.springframework.batch.sample.domain.Trade" />
|
|
<entry key="price" value="java.math.BigDecimal" />
|
|
<entry key="customer" value="java.lang.String" />
|
|
</util:map></strong></span>
|
|
<span class="hl-tag"></property></span>
|
|
<span class="hl-tag"></bean></span></pre><p>On input the reader reads the XML resource until it recognizes
|
|
that a new fragment is about to start (by matching the tag name by
|
|
default). The reader creates a standalone XML document from the fragment
|
|
(or at least makes it appear so) and passes the document to a
|
|
deserializer (typically a wrapper around a Spring OXM
|
|
<code class="classname">Unmarshaller</code>) to map the XML to a Java
|
|
object.</p><p>In summary, this procedure is analogous to the following scripted
|
|
Java code which uses the injection provided by the Spring
|
|
configuration:</p><pre class="programlisting">StaxEventItemReader xmlStaxEventItemReader = <span class="hl-keyword">new</span> StaxEventItemReader();
Resource resource = <span class="hl-keyword">new</span> ByteArrayResource(xmlResource.getBytes());

<span class="hl-comment">// alias each element name to the type it is bound to</span>
Map aliases = <span class="hl-keyword">new</span> HashMap();
aliases.put(<span class="hl-string">"trade"</span>,<span class="hl-string">"org.springframework.batch.sample.domain.Trade"</span>);
aliases.put(<span class="hl-string">"price"</span>,<span class="hl-string">"java.math.BigDecimal"</span>);
aliases.put(<span class="hl-string">"customer"</span>,<span class="hl-string">"java.lang.String"</span>);
XStreamMarshaller unmarshaller = <span class="hl-keyword">new</span> XStreamMarshaller();
unmarshaller.setAliases(aliases);
xmlStaxEventItemReader.setUnmarshaller(unmarshaller);
xmlStaxEventItemReader.setResource(resource);
xmlStaxEventItemReader.setFragmentRootElementName(<span class="hl-string">"trade"</span>);
xmlStaxEventItemReader.open(<span class="hl-keyword">new</span> ExecutionContext());

<span class="hl-comment">// read() returns null once the input is exhausted</span>
Trade trade = (Trade) xmlStaxEventItemReader.read();
<span class="hl-keyword">while</span> (trade != <span class="hl-keyword">null</span>) {
    System.out.println(trade);
    trade = (Trade) xmlStaxEventItemReader.read();
}</pre></div><div class="section"><div class="titlepage"><div><div><h3 class="title"><a name="StaxEventItemWriter" href="#StaxEventItemWriter"></a>6.7.2 StaxEventItemWriter</h3></div></div></div><p>Output works symmetrically to input. The
|
|
<code class="classname">StaxEventItemWriter</code> needs a
|
|
<code class="classname">Resource</code>, a marshaller, and a <code class="literal">rootTagName</code>. A Java
|
|
object is passed to a marshaller (typically a standard Spring OXM
|
|
<code class="classname">Marshaller</code>) which writes to a
|
|
<code class="classname">Resource</code> using a custom event writer that filters
|
|
the <code class="classname">StartDocument</code> and
|
|
<code class="classname">EndDocument</code> events produced for each fragment by
|
|
the OXM tools. We'll show this in an example using the
|
|
<code class="classname">MarshallingEventWriterSerializer</code>. The Spring
|
|
configuration for this setup looks as follows:</p><pre class="programlisting"><span class="hl-tag"><bean</span> <span class="hl-attribute">id</span>=<span class="hl-value">"itemWriter"</span> <span class="hl-attribute">class</span>=<span class="hl-value">"org.springframework.batch.item.xml.StaxEventItemWriter"</span><span class="hl-tag">></span>
|
|
<span class="hl-tag"><property</span> <span class="hl-attribute">name</span>=<span class="hl-value">"resource"</span> <span class="hl-attribute">ref</span>=<span class="hl-value">"outputResource"</span><span class="hl-tag"> /></span>
|
|
<span class="hl-tag"><property</span> <span class="hl-attribute">name</span>=<span class="hl-value">"marshaller"</span> <span class="hl-attribute">ref</span>=<span class="hl-value">"customerCreditMarshaller"</span><span class="hl-tag"> /></span>
|
|
<span class="hl-tag"><property</span> <span class="hl-attribute">name</span>=<span class="hl-value">"rootTagName"</span> <span class="hl-attribute">value</span>=<span class="hl-value">"customers"</span><span class="hl-tag"> /></span>
|
|
<span class="hl-tag"><property</span> <span class="hl-attribute">name</span>=<span class="hl-value">"overwriteOutput"</span> <span class="hl-attribute">value</span>=<span class="hl-value">"true"</span><span class="hl-tag"> /></span>
|
|
<span class="hl-tag"></bean></span></pre><p>The configuration sets up the three required properties and
|
|
also sets the optional <code class="literal">overwriteOutput</code> property to <code class="literal">true</code> which, as mentioned earlier in the
chapter, specifies whether an existing file can be overwritten. Note
that the marshaller used for the writer is the same as
|
|
the one used in the reading example from earlier in the chapter:</p><pre class="programlisting"><span class="hl-tag"><bean</span> <span class="hl-attribute">id</span>=<span class="hl-value">"customerCreditMarshaller"</span>
|
|
<span class="hl-attribute">class</span>=<span class="hl-value">"org.springframework.oxm.xstream.XStreamMarshaller"</span><span class="hl-tag">></span>
|
|
<span class="hl-tag"><property</span> <span class="hl-attribute">name</span>=<span class="hl-value">"aliases"</span><span class="hl-tag">></span>
|
|
<span class="hl-tag"><util:map</span> <span class="hl-attribute">id</span>=<span class="hl-value">"aliases"</span><span class="hl-tag">></span>
|
|
<span class="hl-tag"><entry</span> <span class="hl-attribute">key</span>=<span class="hl-value">"customer"</span>
|
|
<span class="hl-attribute">value</span>=<span class="hl-value">"org.springframework.batch.sample.domain.CustomerCredit"</span><span class="hl-tag"> /></span>
|
|
<span class="hl-tag"><entry</span> <span class="hl-attribute">key</span>=<span class="hl-value">"credit"</span> <span class="hl-attribute">value</span>=<span class="hl-value">"java.math.BigDecimal"</span><span class="hl-tag"> /></span>
|
|
<span class="hl-tag"><entry</span> <span class="hl-attribute">key</span>=<span class="hl-value">"name"</span> <span class="hl-attribute">value</span>=<span class="hl-value">"java.lang.String"</span><span class="hl-tag"> /></span>
|
|
<span class="hl-tag"></util:map></span>
|
|
<span class="hl-tag"></property></span>
|
|
<span class="hl-tag"></bean></span></pre><p>To summarize with a Java example, the following code illustrates
|
|
all of the points discussed, demonstrating the programmatic setup of the
|
|
required properties:</p><pre class="programlisting"><span class="hl-comment">// Imports (java.math.BigDecimal, java.util.*, Spring Batch and Spring OXM classes) are omitted for brevity</span>
StaxEventItemWriter staxItemWriter = <span class="hl-keyword">new</span> StaxEventItemWriter();
FileSystemResource resource = <span class="hl-keyword">new</span> FileSystemResource(<span class="hl-string">"data/outputFile.xml"</span>);

<span class="hl-comment">// Map each XML element name to the class it represents</span>
Map aliases = <span class="hl-keyword">new</span> HashMap();
aliases.put(<span class="hl-string">"customer"</span>,<span class="hl-string">"org.springframework.batch.sample.domain.CustomerCredit"</span>);
aliases.put(<span class="hl-string">"credit"</span>,<span class="hl-string">"java.math.BigDecimal"</span>);
aliases.put(<span class="hl-string">"name"</span>,<span class="hl-string">"java.lang.String"</span>);
XStreamMarshaller marshaller = <span class="hl-keyword">new</span> XStreamMarshaller();
marshaller.setAliases(aliases);

staxItemWriter.setResource(resource);
staxItemWriter.setMarshaller(marshaller);
staxItemWriter.setRootTagName(<span class="hl-string">"customers"</span>);
staxItemWriter.setOverwriteOutput(true);

ExecutionContext executionContext = <span class="hl-keyword">new</span> ExecutionContext();
staxItemWriter.open(executionContext);
CustomerCredit credit = <span class="hl-keyword">new</span> CustomerCredit();
credit.setCredit(<span class="hl-keyword">new</span> BigDecimal(<span class="hl-string">"11.39"</span>));
credit.setName(<span class="hl-string">"Customer1"</span>);
<span class="hl-comment">// write() takes a list of items, so wrap the single item</span>
staxItemWriter.write(Collections.singletonList(credit));
<span class="hl-comment">// Closing the writer writes the closing root tag and releases the file</span>
staxItemWriter.close();</pre></div></div><div class="section"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="multiFileInput" href="#multiFileInput"></a>6.8 Multi-File Input</h2></div></div></div><p>It is a common requirement to process multiple files within a single
|
|
<code class="classname">Step</code>. Assuming the files all have the same
|
|
formatting, the <code class="classname">MultiResourceItemReader</code> supports
|
|
this type of input for both XML and flat file processing. Consider the
|
|
following files in a directory:</p><pre class="programlisting">file-1.txt file-2.txt ignored.txt</pre><p>file-1.txt and file-2.txt are formatted the same and for business
|
|
reasons should be processed together. The
|
|
<code class="classname">MultiResourceItemReader</code> can be used to read in both
|
|
files by using wildcards:</p><pre class="programlisting"><span class="hl-tag"><bean</span> <span class="hl-attribute">id</span>=<span class="hl-value">"multiResourceReader"</span> <span class="hl-attribute">class</span>=<span class="hl-value">"org.spr...MultiResourceItemReader"</span><span class="hl-tag">></span>
|
|
<span class="hl-tag"><property</span> <span class="hl-attribute">name</span>=<span class="hl-value">"resources"</span> <span class="hl-attribute">value</span>=<span class="hl-value">"classpath:data/input/file-*.txt"</span><span class="hl-tag"> /></span>
|
|
<span class="hl-tag"><property</span> <span class="hl-attribute">name</span>=<span class="hl-value">"delegate"</span> <span class="hl-attribute">ref</span>=<span class="hl-value">"flatFileItemReader"</span><span class="hl-tag"> /></span>
|
|
<span class="hl-tag"></bean></span></pre><p>The referenced delegate is a simple
|
|
<code class="classname">FlatFileItemReader</code>. The above configuration will
|
|
read input from both files, handling rollback and restart scenarios. It
|
|
should be noted that, as with any <code class="classname">ItemReader</code>,
|
|
adding extra input (in this case a file) could cause potential issues when
|
|
restarting. It is recommended that batch jobs work with their own
|
|
individual directories until completed successfully.</p></div><div class="section"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="database" href="#database"></a>6.9 Database</h2></div></div></div><p>Like most enterprise application styles, a database is the central
|
|
storage mechanism for batch. However, batch differs from other application
|
|
styles due to the sheer size of the datasets with which the system must
|
|
work. If a SQL statement returns 1 million rows, the result set probably
|
|
holds all returned results in memory until all rows have been read. Spring
|
|
Batch provides two types of solutions for this problem: Cursor and Paging
|
|
database ItemReaders.</p><div class="section"><div class="titlepage"><div><div><h3 class="title"><a name="cursorBasedItemReaders" href="#cursorBasedItemReaders"></a>6.9.1 Cursor Based ItemReaders</h3></div></div></div><p>Using a database cursor is generally the default approach of most
|
|
batch developers, because it is the database's solution to the problem
|
|
of 'streaming' relational data. The Java
|
|
<code class="classname">ResultSet</code> class is essentially an object-oriented
mechanism for manipulating a cursor. A
|
|
<code class="classname">ResultSet</code> maintains a cursor to the current row
|
|
of data. Calling <code class="methodname">next</code> on a
|
|
<code class="classname">ResultSet</code> moves this cursor to the next row.
|
|
Spring Batch cursor-based ItemReaders open a cursor on
|
|
initialization, and move the cursor forward one row for every call to
|
|
<code class="methodname">read</code>, returning a mapped object that can be
|
|
used for processing. The <code class="methodname">close</code> method will then
|
|
be called to ensure all resources are freed up. The Spring core
|
|
<code class="classname">JdbcTemplate</code> gets around this problem by using
|
|
the callback pattern to completely map all rows in a
|
|
<code class="classname">ResultSet</code> and close before returning control back
|
|
to the method caller. However, in batch this must wait until the step is
|
|
complete. Below is a generic diagram of how a cursor based
|
|
<code class="classname">ItemReader</code> works, and while a SQL statement is
|
|
used as an example since it is so widely known, any technology could
|
|
implement the basic approach:</p><div class="mediaobject" align="center"><img src="images/cursorExample.png" align="middle"></div><p>This example illustrates the basic pattern. Given a 'FOO' table,
|
|
which has three columns: ID, NAME, and BAR, select all rows with an ID
|
|
greater than 1 but less than 7. This puts the beginning of the cursor
|
|
(row 1) on ID 2. The result of this row should be a completely mapped
|
|
Foo object. Calling <code class="methodname">read</code>() again moves the
|
|
cursor to the next row, which is the Foo with an ID of 3. The results of
|
|
these reads will be written out after each
|
|
<code class="methodname">read</code>, thus allowing the objects to be garbage
|
|
collected (assuming no instance variables are maintaining references to
|
|
them).</p><div class="section"><div class="titlepage"><div><div><h4 class="title"><a name="JdbcCursorItemReader" href="#JdbcCursorItemReader"></a>JdbcCursorItemReader</h4></div></div></div><p><code class="classname">JdbcCursorItemReader</code> is the Jdbc
|
|
implementation of the cursor based technique. It works directly with a
|
|
<code class="classname">ResultSet</code> and requires a SQL statement to run
|
|
against a connection obtained from a
|
|
<code class="classname">DataSource</code>. The following database schema will
|
|
be used as an example:</p><pre class="programlisting"><span class="hl-keyword">CREATE</span> <span class="hl-keyword">TABLE</span> CUSTOMER (
|
|
ID <span class="hl-keyword">BIGINT</span> <span class="hl-keyword">IDENTITY</span> <span class="hl-keyword">PRIMARY</span> <span class="hl-keyword">KEY</span>,
|
|
<span class="hl-keyword">NAME</span> <span class="hl-keyword">VARCHAR</span>(<span class="hl-number">45</span>),
|
|
CREDIT <span class="hl-keyword">FLOAT</span>
|
|
);</pre><p>Many people prefer to use a domain object for each row, so we'll
|
|
use an implementation of the <code class="classname">RowMapper</code>
|
|
interface to map a <code class="classname">CustomerCredit</code>
|
|
object:</p><pre class="programlisting"><span class="hl-keyword">public</span> <span class="hl-keyword">class</span> CustomerCreditRowMapper <span class="hl-keyword">implements</span> RowMapper {
|
|
|
|
<span class="hl-keyword">public</span> <span class="hl-keyword">static</span> <span class="hl-keyword">final</span> String ID_COLUMN = <span class="hl-string">"id"</span>;
|
|
<span class="hl-keyword">public</span> <span class="hl-keyword">static</span> <span class="hl-keyword">final</span> String NAME_COLUMN = <span class="hl-string">"name"</span>;
|
|
<span class="hl-keyword">public</span> <span class="hl-keyword">static</span> <span class="hl-keyword">final</span> String CREDIT_COLUMN = <span class="hl-string">"credit"</span>;
|
|
|
|
<span class="hl-keyword">public</span> Object mapRow(ResultSet rs, <span class="hl-keyword">int</span> rowNum) <span class="hl-keyword">throws</span> SQLException {
|
|
CustomerCredit customerCredit = <span class="hl-keyword">new</span> CustomerCredit();
|
|
|
|
customerCredit.setId(rs.getInt(ID_COLUMN));
|
|
customerCredit.setName(rs.getString(NAME_COLUMN));
|
|
customerCredit.setCredit(rs.getBigDecimal(CREDIT_COLUMN));
|
|
|
|
<span class="hl-keyword">return</span> customerCredit;
|
|
}
|
|
}</pre><p>Because <code class="classname">JdbcTemplate</code> is so familiar to
|
|
users of Spring, and the <code class="classname">JdbcCursorItemReader</code>
|
|
shares key interfaces with it, it is useful to see an example of how
|
|
to read in this data with <code class="classname">JdbcTemplate</code>, in
|
|
order to contrast it with the <code class="classname">ItemReader</code>. For
|
|
the purposes of this example, let's assume there are 1,000 rows in the
|
|
CUSTOMER table. The first example uses
|
|
<code class="classname">JdbcTemplate</code>:</p><pre class="programlisting"><span class="hl-comment">//For simplicity sake, assume a dataSource has already been obtained</span>
|
|
JdbcTemplate jdbcTemplate = <span class="hl-keyword">new</span> JdbcTemplate(dataSource);
|
|
List customerCredits = jdbcTemplate.query(<span class="hl-string">"SELECT ID, NAME, CREDIT from CUSTOMER"</span>,
|
|
<span class="hl-keyword">new</span> CustomerCreditRowMapper());</pre><p>After running this code snippet the customerCredits list will
|
|
contain 1,000 <code class="classname">CustomerCredit</code> objects. In the
|
|
query method, a connection will be obtained from the
|
|
<code class="classname">DataSource</code>, the provided SQL will be run
|
|
against it, and the <code class="methodname">mapRow</code> method will be
|
|
called for each row in the <code class="classname">ResultSet</code>. Let's
|
|
contrast this with the approach of the
|
|
<code class="classname">JdbcCursorItemReader</code>:</p><pre class="programlisting">JdbcCursorItemReader itemReader = <span class="hl-keyword">new</span> JdbcCursorItemReader();
|
|
itemReader.setDataSource(dataSource);
|
|
itemReader.setSql(<span class="hl-string">"SELECT ID, NAME, CREDIT from CUSTOMER"</span>);
|
|
itemReader.setRowMapper(<span class="hl-keyword">new</span> CustomerCreditRowMapper());
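<span class="hl-comment">// Optional tuning, shown only as a sketch: these setters correspond to the fetchSize and</span>
<span class="hl-comment">// verifyCursorPosition properties. The value 100 is an assumed, illustrative hint.</span>
itemReader.setFetchSize(<span class="hl-number">100</span>);
itemReader.setVerifyCursorPosition(true);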
|
|
<span class="hl-keyword">int</span> counter = <span class="hl-number">0</span>;
|
|
ExecutionContext executionContext = <span class="hl-keyword">new</span> ExecutionContext();
|
|
itemReader.open(executionContext);
|
|
<span class="hl-comment">// Count only real items: read() returns null once the input is exhausted</span>
Object customerCredit = itemReader.read();
<span class="hl-keyword">while</span> (customerCredit != null) {
    counter++;
    customerCredit = itemReader.read();
}
|
|
itemReader.close();</pre><p>After running this code snippet, the counter will equal 1,000. If
|
|
the code above had put the returned customerCredit into a list, the
|
|
result would have been exactly the same as with the
|
|
<code class="classname">JdbcTemplate</code> example. However, the big
|
|
advantage of the <code class="classname">ItemReader</code> is that it allows
|
|
items to be 'streamed'. The <code class="methodname">read</code> method can
|
|
be called once, and the item written out via an
|
|
<code class="classname">ItemWriter</code>, and then the next item obtained via
|
|
<code class="methodname">read</code>. This allows item reading and writing to
|
|
be done in 'chunks' and committed periodically, which is the essence
|
|
of high performance batch processing. Furthermore, it is very easily
|
|
configured for injection into a Spring Batch
|
|
<code class="classname">Step</code>:</p><pre class="programlisting"><span class="hl-tag"><bean</span> <span class="hl-attribute">id</span>=<span class="hl-value">"itemReader"</span> <span class="hl-attribute">class</span>=<span class="hl-value">"org.spr...JdbcCursorItemReader"</span><span class="hl-tag">></span>
|
|
<span class="hl-tag"><property</span> <span class="hl-attribute">name</span>=<span class="hl-value">"dataSource"</span> <span class="hl-attribute">ref</span>=<span class="hl-value">"dataSource"</span><span class="hl-tag">/></span>
|
|
<span class="hl-tag"><property</span> <span class="hl-attribute">name</span>=<span class="hl-value">"sql"</span> <span class="hl-attribute">value</span>=<span class="hl-value">"select ID, NAME, CREDIT from CUSTOMER"</span><span class="hl-tag">/></span>
|
|
<span class="hl-tag"><property</span> <span class="hl-attribute">name</span>=<span class="hl-value">"rowMapper"</span><span class="hl-tag">></span>
|
|
<span class="hl-tag"><bean</span> <span class="hl-attribute">class</span>=<span class="hl-value">"org.springframework.batch.sample.domain.CustomerCreditRowMapper"</span><span class="hl-tag">/></span>
|
|
<span class="hl-tag"></property></span>
|
|
<span class="hl-tag"></bean></span></pre><div class="section"><div class="titlepage"><div><div><h5 class="title"><a name="JdbcCursorItemReaderProperties" href="#JdbcCursorItemReaderProperties"></a>Additional Properties</h5></div></div></div><p>Because there are so many varying options for opening a cursor
|
|
in Java, there are many properties on the
|
|
<code class="classname">JdbcCursorItemReader</code> that can be set:</p><div class="table"><a name="d5e2752" href="#d5e2752"></a><p class="title"><b>Table 6.2. JdbcCursorItemReader Properties</b></p><div class="table-contents"><table summary="JdbcCursorItemReader Properties" style="border-collapse: collapse;border-top: 0.5pt solid ; border-bottom: 0.5pt solid ; border-left: 0.5pt solid ; border-right: 0.5pt solid ; "><colgroup><col><col></colgroup><tbody><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; ">ignoreWarnings</td><td style="border-bottom: 0.5pt solid ; ">Determines whether or not SQLWarnings are logged or
|
|
cause an exception - default is true</td></tr><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; ">fetchSize</td><td style="border-bottom: 0.5pt solid ; ">Gives the Jdbc driver a hint as to the number of rows
|
|
that should be fetched from the database when more rows are
|
|
needed by the <code class="classname">ResultSet</code> object used
|
|
by the <code class="classname">ItemReader</code>. By default, no
|
|
hint is given.</td></tr><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; ">maxRows</td><td style="border-bottom: 0.5pt solid ; ">Sets the limit for the maximum number of rows the
|
|
underlying <code class="classname">ResultSet</code> can hold at any
|
|
one time.</td></tr><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; ">queryTimeout</td><td style="border-bottom: 0.5pt solid ; ">Sets the number of seconds the driver will wait for a
<code class="classname">Statement</code> object to execute.
If the limit is exceeded, a
<code class="classname">DataAccessException</code> is thrown.
|
|
(Consult your driver vendor documentation for
|
|
details).</td></tr><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; ">verifyCursorPosition</td><td style="border-bottom: 0.5pt solid ; ">Because the same <code class="classname">ResultSet</code>
|
|
held by the <code class="classname">ItemReader</code> is passed to
|
|
the <code class="classname">RowMapper</code>, it is possible for
|
|
users to call <code class="methodname">ResultSet.next</code>()
|
|
themselves, which could cause issues with the reader's
|
|
internal count. Setting this value to true will cause an
|
|
exception to be thrown if the cursor position is not the
|
|
same after the <code class="classname">RowMapper</code> call as it
|
|
was before.</td></tr><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; ">saveState</td><td style="border-bottom: 0.5pt solid ; ">Indicates whether or not the reader's state should be
|
|
saved in the <code class="classname">ExecutionContext</code>
|
|
provided by
|
|
<code class="methodname">ItemStream#update</code>(<code class="classname">ExecutionContext</code>)
|
|
The default value is true.</td></tr><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; ">driverSupportsAbsolute</td><td style="border-bottom: 0.5pt solid ; ">Defaults to false. Indicates whether the Jdbc driver
|
|
supports setting the absolute row on a
|
|
<code class="classname">ResultSet</code>. It is recommended that
|
|
this is set to true for JDBC drivers that support
|
|
<code class="methodname">ResultSet.absolute</code>() as it may
|
|
improve performance, especially if a step fails while
|
|
working with a large data set.</td></tr><tr><td style="border-right: 0.5pt solid ; ">useSharedExtendedConnection</td><td style="">Defaults to false. Indicates whether the connection
|
|
used for the cursor should be used by all other processing
|
|
thus sharing the same transaction. If this is set to false,
|
|
which is the default, then the cursor will be opened using
|
|
its own connection and will not participate in any
|
|
transactions started for the rest of the step processing. If
|
|
you set this flag to true then you must wrap the
|
|
<code class="classname">DataSource</code> in an
|
|
<code class="classname">ExtendedConnectionDataSourceProxy</code> to
|
|
prevent the connection from being closed and released after
|
|
each commit. When you set this option to true then the
|
|
statement used to open the cursor will be created with both
|
|
'READ_ONLY' and 'HOLD_CURSORS_OVER_COMMIT' options. This
|
|
allows holding the cursor open over transaction start and
|
|
commits performed in the step processing. To use this
|
|
feature you need a database that supports this and a Jdbc
|
|
driver supporting Jdbc 3.0 or later.</td></tr></tbody></table></div></div><br class="table-break"></div></div><div class="section"><div class="titlepage"><div><div><h4 class="title"><a name="HibernateCursorItemReader" href="#HibernateCursorItemReader"></a>HibernateCursorItemReader</h4></div></div></div><p>Just as normal Spring users make important decisions about
|
|
whether or not to use ORM solutions, which affect whether or not they
|
|
use a <code class="classname">JdbcTemplate</code> or a
|
|
<code class="classname">HibernateTemplate</code>, Spring Batch users have the
|
|
same options. <code class="classname">HibernateCursorItemReader</code> is the
|
|
Hibernate implementation of the cursor technique. Hibernate's usage in
|
|
batch has been fairly controversial. This has largely been because
|
|
Hibernate was originally developed to support online application
|
|
styles. However, that doesn't mean it can't be used for batch
|
|
processing. The easiest approach for solving this problem is to use a
|
|
<code class="classname">StatelessSession</code> rather than a standard
|
|
session. This removes all of the caching and dirty checking that Hibernate
employs, which can cause issues in a batch scenario. For more
information on the differences between stateless and normal Hibernate
sessions, refer to the documentation of your specific Hibernate
|
|
release. The <code class="classname">HibernateCursorItemReader</code> allows
|
|
you to declare an HQL statement and pass in a
|
|
<code class="classname">SessionFactory</code>, which will pass back one item
|
|
per call to <code class="methodname">read</code> in the same basic fashion as
|
|
the <code class="classname">JdbcCursorItemReader</code>. Below is an example
|
|
configuration using the same 'customer credit' example as the JDBC
|
|
reader:</p><pre class="programlisting">HibernateCursorItemReader itemReader = <span class="hl-keyword">new</span> HibernateCursorItemReader();
|
|
itemReader.setQueryString(<span class="hl-string">"from CustomerCredit"</span>);
|
|
<span class="hl-comment">// For simplicity's sake, assume a sessionFactory has already been obtained.</span>
|
|
itemReader.setSessionFactory(sessionFactory);
|
|
itemReader.setUseStatelessSession(true);
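<span class="hl-comment">// Optional, shown only as a sketch: hint at how many rows the underlying cursor fetches at a time.</span>
<span class="hl-comment">// The value 100 is an assumption for illustration, not a recommended setting.</span>
itemReader.setFetchSize(<span class="hl-number">100</span>);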
|
|
<span class="hl-keyword">int</span> counter = <span class="hl-number">0</span>;
|
|
ExecutionContext executionContext = <span class="hl-keyword">new</span> ExecutionContext();
|
|
itemReader.open(executionContext);
|
|
<span class="hl-comment">// Count only real items: read() returns null once the input is exhausted</span>
Object customerCredit = itemReader.read();
<span class="hl-keyword">while</span> (customerCredit != null) {
    counter++;
    customerCredit = itemReader.read();
}
|
|
itemReader.close();</pre><p>This configured <code class="classname">ItemReader</code> will return
|
|
<code class="classname">CustomerCredit</code> objects in the exact same manner
|
|
as described by the <code class="classname">JdbcCursorItemReader</code>,
|
|
assuming Hibernate mapping files have been created correctly for the
|
|
Customer table. The 'useStatelessSession' property defaults to true,
|
|
but has been added here to draw attention to the ability to switch it
|
|
on or off. It is also worth noting that the fetchSize of the
|
|
underlying cursor can be set via the fetchSize property. As with
|
|
<code class="classname">JdbcCursorItemReader</code>, configuration is
|
|
straightforward:</p><pre class="programlisting"><span class="hl-tag"><bean</span> <span class="hl-attribute">id</span>=<span class="hl-value">"itemReader"</span>
|
|
<span class="hl-attribute">class</span>=<span class="hl-value">"org.springframework.batch.item.database.HibernateCursorItemReader"</span><span class="hl-tag">></span>
|
|
<span class="hl-tag"><property</span> <span class="hl-attribute">name</span>=<span class="hl-value">"sessionFactory"</span> <span class="hl-attribute">ref</span>=<span class="hl-value">"sessionFactory"</span><span class="hl-tag"> /></span>
|
|
<span class="hl-tag"><property</span> <span class="hl-attribute">name</span>=<span class="hl-value">"queryString"</span> <span class="hl-attribute">value</span>=<span class="hl-value">"from CustomerCredit"</span><span class="hl-tag"> /></span>
|
|
<span class="hl-tag"></bean></span></pre></div><div class="section"><div class="titlepage"><div><div><h4 class="title"><a name="StoredProcedureItemReader" href="#StoredProcedureItemReader"></a>StoredProcedureItemReader</h4></div></div></div><p>Sometimes it is necessary to obtain the cursor data using a
|
|
stored procedure. The <code class="classname">StoredProcedureItemReader</code>
|
|
works like the <code class="classname">JdbcCursorItemReader</code> except that
|
|
instead of executing a query to obtain a cursor we execute a stored
|
|
procedure that returns a cursor. The stored procedure can return the
|
|
cursor in three different ways:</p><div class="orderedlist"><ol class="orderedlist" type="1"><li class="listitem"><p>as a returned ResultSet (used by SQL Server, Sybase, DB2,
|
|
Derby and MySQL)</p></li><li class="listitem"><p>as a ref-cursor returned as an out parameter (used by Oracle
|
|
and PostgreSQL)</p></li><li class="listitem"><p>as the return value of a stored function call</p></li></ol></div><p>Below is a basic example configuration using the same 'customer
|
|
credit' example as earlier:</p><pre class="programlisting"><span class="hl-tag"><bean</span> <span class="hl-attribute">id</span>=<span class="hl-value">"reader"</span> <span class="hl-attribute">class</span>=<span class="hl-value">"o.s.batch.item.database.StoredProcedureItemReader"</span><span class="hl-tag">></span>
|
|
<span class="hl-tag"><property</span> <span class="hl-attribute">name</span>=<span class="hl-value">"dataSource"</span> <span class="hl-attribute">ref</span>=<span class="hl-value">"dataSource"</span><span class="hl-tag">/></span>
|
|
<span class="hl-tag"><property</span> <span class="hl-attribute">name</span>=<span class="hl-value">"procedureName"</span> <span class="hl-attribute">value</span>=<span class="hl-value">"sp_customer_credit"</span><span class="hl-tag">/></span>
|
|
<span class="hl-tag"><property</span> <span class="hl-attribute">name</span>=<span class="hl-value">"rowMapper"</span><span class="hl-tag">></span>
|
|
<span class="hl-tag"><bean</span> <span class="hl-attribute">class</span>=<span class="hl-value">"org.springframework.batch.sample.domain.CustomerCreditRowMapper"</span><span class="hl-tag">/></span>
|
|
<span class="hl-tag"></property></span>
|
|
<span class="hl-tag"></bean></span>
|
|
</pre><p>This example relies on the stored procedure to provide a
|
|
ResultSet as a returned result (option 1 above). </p><p>If the stored procedure returned a ref-cursor (option 2) then we
|
|
would need to provide the position of the out parameter that is the
|
|
returned ref-cursor. Here is an example where the first parameter is
|
|
the returned ref-cursor:</p><pre class="programlisting"><span class="hl-tag"><bean</span> <span class="hl-attribute">id</span>=<span class="hl-value">"reader"</span> <span class="hl-attribute">class</span>=<span class="hl-value">"o.s.batch.item.database.StoredProcedureItemReader"</span><span class="hl-tag">></span>
|
|
<span class="hl-tag"><property</span> <span class="hl-attribute">name</span>=<span class="hl-value">"dataSource"</span> <span class="hl-attribute">ref</span>=<span class="hl-value">"dataSource"</span><span class="hl-tag">/></span>
|
|
<span class="hl-tag"><property</span> <span class="hl-attribute">name</span>=<span class="hl-value">"procedureName"</span> <span class="hl-attribute">value</span>=<span class="hl-value">"sp_customer_credit"</span><span class="hl-tag">/></span>
|
|
<span class="hl-tag"><property</span> <span class="hl-attribute">name</span>=<span class="hl-value">"refCursorPosition"</span> <span class="hl-attribute">value</span>=<span class="hl-value">"1"</span><span class="hl-tag">/></span>
|
|
<span class="hl-tag"><property</span> <span class="hl-attribute">name</span>=<span class="hl-value">"rowMapper"</span><span class="hl-tag">></span>
|
|
<span class="hl-tag"><bean</span> <span class="hl-attribute">class</span>=<span class="hl-value">"org.springframework.batch.sample.domain.CustomerCreditRowMapper"</span><span class="hl-tag">/></span>
|
|
<span class="hl-tag"></property></span>
|
|
<span class="hl-tag"></bean></span>
|
|
</pre><p>If the cursor was returned from a stored function (option 3) we
|
|
would need to set the property "<code class="varname">function</code>" to
|
|
<code class="literal">true</code>. It defaults to <code class="literal">false</code>. Here
|
|
is what that would look like:</p><pre class="programlisting"><span class="hl-tag"><bean</span> <span class="hl-attribute">id</span>=<span class="hl-value">"reader"</span> <span class="hl-attribute">class</span>=<span class="hl-value">"o.s.batch.item.database.StoredProcedureItemReader"</span><span class="hl-tag">></span>
|
|
<span class="hl-tag"><property</span> <span class="hl-attribute">name</span>=<span class="hl-value">"dataSource"</span> <span class="hl-attribute">ref</span>=<span class="hl-value">"dataSource"</span><span class="hl-tag">/></span>
|
|
<span class="hl-tag"><property</span> <span class="hl-attribute">name</span>=<span class="hl-value">"procedureName"</span> <span class="hl-attribute">value</span>=<span class="hl-value">"sp_customer_credit"</span><span class="hl-tag">/></span>
|
|
<span class="hl-tag"><property</span> <span class="hl-attribute">name</span>=<span class="hl-value">"function"</span> <span class="hl-attribute">value</span>=<span class="hl-value">"true"</span><span class="hl-tag">/></span>
|
|
<span class="hl-tag"><property</span> <span class="hl-attribute">name</span>=<span class="hl-value">"rowMapper"</span><span class="hl-tag">></span>
|
|
<span class="hl-tag"><bean</span> <span class="hl-attribute">class</span>=<span class="hl-value">"org.springframework.batch.sample.domain.CustomerCreditRowMapper"</span><span class="hl-tag">/></span>
|
|
<span class="hl-tag"></property></span>
|
|
<span class="hl-tag"></bean></span>
|
|
</pre><p>In all of these cases we need to define a
|
|
<code class="classname">RowMapper</code> as well as a
|
|
<code class="classname">DataSource</code> and the actual procedure
|
|
name.</p><p>If the stored procedure or function takes in parameters, then they
|
|
must be declared and set via the parameters property. Here is an
|
|
example for Oracle that declares three parameters. The first one is
|
|
the out parameter that returns the ref-cursor, the second and third
|
|
are in parameters that each take a value of type INTEGER:</p><pre class="programlisting"><span class="hl-tag"><bean</span> <span class="hl-attribute">id</span>=<span class="hl-value">"reader"</span> <span class="hl-attribute">class</span>=<span class="hl-value">"o.s.batch.item.database.StoredProcedureItemReader"</span><span class="hl-tag">></span>
|
|
<span class="hl-tag"><property</span> <span class="hl-attribute">name</span>=<span class="hl-value">"dataSource"</span> <span class="hl-attribute">ref</span>=<span class="hl-value">"dataSource"</span><span class="hl-tag">/></span>
|
|
<span class="hl-tag"><property</span> <span class="hl-attribute">name</span>=<span class="hl-value">"procedureName"</span> <span class="hl-attribute">value</span>=<span class="hl-value">"spring.cursor_func"</span><span class="hl-tag">/></span>
|
|
<span class="hl-tag"><property</span> <span class="hl-attribute">name</span>=<span class="hl-value">"parameters"</span><span class="hl-tag">></span>
|
|
<span class="hl-tag"><list></span>
|
|
<span class="hl-tag"><bean</span> <span class="hl-attribute">class</span>=<span class="hl-value">"org.springframework.jdbc.core.SqlOutParameter"</span><span class="hl-tag">></span>
|
|
<span class="hl-tag"><constructor-arg</span> <span class="hl-attribute">index</span>=<span class="hl-value">"0"</span> <span class="hl-attribute">value</span>=<span class="hl-value">"newid"</span><span class="hl-tag">/></span>
|
|
<span class="hl-tag"><constructor-arg</span> <span class="hl-attribute">index</span>=<span class="hl-value">"1"</span><span class="hl-tag">></span>
|
|
<span class="hl-tag"><util:constant</span> <span class="hl-attribute">static-field</span>=<span class="hl-value">"oracle.jdbc.OracleTypes.CURSOR"</span><span class="hl-tag">/></span>
|
|
<span class="hl-tag"></constructor-arg></span>
|
|
<span class="hl-tag"></bean></span>
|
|
<span class="hl-tag"><bean</span> <span class="hl-attribute">class</span>=<span class="hl-value">"org.springframework.jdbc.core.SqlParameter"</span><span class="hl-tag">></span>
|
|
<span class="hl-tag"><constructor-arg</span> <span class="hl-attribute">index</span>=<span class="hl-value">"0"</span> <span class="hl-attribute">value</span>=<span class="hl-value">"amount"</span><span class="hl-tag">/></span>
|
|
<span class="hl-tag"><constructor-arg</span> <span class="hl-attribute">index</span>=<span class="hl-value">"1"</span><span class="hl-tag">></span>
|
|
<span class="hl-tag"><util:constant</span> <span class="hl-attribute">static-field</span>=<span class="hl-value">"java.sql.Types.INTEGER"</span><span class="hl-tag">/></span>
|
|
<span class="hl-tag"></constructor-arg></span>
|
|
<span class="hl-tag"></bean></span>
|
|
<span class="hl-tag"><bean</span> <span class="hl-attribute">class</span>=<span class="hl-value">"org.springframework.jdbc.core.SqlParameter"</span><span class="hl-tag">></span>
|
|
<span class="hl-tag"><constructor-arg</span> <span class="hl-attribute">index</span>=<span class="hl-value">"0"</span> <span class="hl-attribute">value</span>=<span class="hl-value">"custid"</span><span class="hl-tag">/></span>
|
|
<span class="hl-tag"><constructor-arg</span> <span class="hl-attribute">index</span>=<span class="hl-value">"1"</span><span class="hl-tag">></span>
|
|
<span class="hl-tag"><util:constant</span> <span class="hl-attribute">static-field</span>=<span class="hl-value">"java.sql.Types.INTEGER"</span><span class="hl-tag">/></span>
|
|
<span class="hl-tag"></constructor-arg></span>
|
|
<span class="hl-tag"></bean></span>
|
|
<span class="hl-tag"></list></span>
|
|
<span class="hl-tag"></property></span>
|
|
<span class="hl-tag"><property</span> <span class="hl-attribute">name</span>=<span class="hl-value">"refCursorPosition"</span> <span class="hl-attribute">value</span>=<span class="hl-value">"1"</span><span class="hl-tag">/></span>
|
|
<span class="hl-tag"><property</span> <span class="hl-attribute">name</span>=<span class="hl-value">"rowMapper"</span> <span class="hl-attribute">ref</span>=<span class="hl-value">"rowMapper"</span><span class="hl-tag">/></span>
|
|
<span class="hl-tag"><property</span> <span class="hl-attribute">name</span>=<span class="hl-value">"preparedStatementSetter"</span> <span class="hl-attribute">ref</span>=<span class="hl-value">"parameterSetter"</span><span class="hl-tag">/></span>
|
|
<span class="hl-tag"></bean></span></pre><p>In addition to the parameter declarations we need to specify a
|
|
<code class="classname">PreparedStatementSetter</code> implementation that
|
|
sets the parameter values for the call. This works the same as for the
|
|
<code class="classname">JdbcCursorItemReader</code> above. All the additional
|
|
properties listed in <a class="xref" href="readersAndWriters.html#JdbcCursorItemReaderProperties" title="Additional Properties">the section called “Additional Properties”</a>
|
|
apply to the <code class="classname">StoredProcedureItemReader</code> as well.
|
|
</p></div></div><div class="section"><div class="titlepage"><div><div><h3 class="title"><a name="pagingItemReaders" href="#pagingItemReaders"></a>6.9.2 Paging ItemReaders</h3></div></div></div><p>An alternative to using a database cursor is executing multiple
|
|
queries, each of which brings back a portion of the results. We
|
|
refer to this portion as a page. Each query that is executed must
|
|
specify the starting row number and the number of rows that we want
|
|
returned for the page.</p><div class="section"><div class="titlepage"><div><div><h4 class="title"><a name="JdbcPagingItemReader" href="#JdbcPagingItemReader"></a>JdbcPagingItemReader</h4></div></div></div><p>One implementation of a paging <code class="classname">ItemReader</code>
|
|
is the <code class="classname">JdbcPagingItemReader</code>. The
|
|
<code class="classname">JdbcPagingItemReader</code> needs a
|
|
<code class="classname">PagingQueryProvider</code> responsible for providing
|
|
the SQL queries used to retrieve the rows making up a page. Since each
|
|
database has its own strategy for providing paging support, we need to
|
|
use a different <code class="classname">PagingQueryProvider</code> for each
|
|
supported database type. There is also the
|
|
<code class="classname">SqlPagingQueryProviderFactoryBean</code> that will
|
|
auto-detect the database that is being used and determine the
|
|
appropriate <code class="classname">PagingQueryProvider</code> implementation.
|
|
This simplifies the configuration and is the recommended best
|
|
practice.</p><p>The <code class="classname">SqlPagingQueryProviderFactoryBean</code>
|
|
requires that you specify a select clause and a from clause. You can
|
|
also provide an optional where clause. These clauses will be used to
|
|
build an SQL statement combined with the required sortKey.</p><p>After the reader has been opened, it will pass back one item per
|
|
call to <code class="methodname">read</code> in the same basic fashion as any
|
|
other <code class="classname">ItemReader</code>. The paging happens behind the
|
|
scenes when additional rows are needed.</p><p>Below is an example configuration using a similar 'customer
|
|
credit' example as the cursor based ItemReaders above:</p><pre class="programlisting"><span class="hl-tag"><bean</span> <span class="hl-attribute">id</span>=<span class="hl-value">"itemReader"</span> <span class="hl-attribute">class</span>=<span class="hl-value">"org.spr...JdbcPagingItemReader"</span><span class="hl-tag">></span>
|
|
<span class="hl-tag"><property</span> <span class="hl-attribute">name</span>=<span class="hl-value">"dataSource"</span> <span class="hl-attribute">ref</span>=<span class="hl-value">"dataSource"</span><span class="hl-tag">/></span>
|
|
<span class="hl-tag"><property</span> <span class="hl-attribute">name</span>=<span class="hl-value">"queryProvider"</span><span class="hl-tag">></span>
|
|
<span class="hl-tag"><bean</span> <span class="hl-attribute">class</span>=<span class="hl-value">"org.spr...SqlPagingQueryProviderFactoryBean"</span><span class="hl-tag">></span>
|
|
<span class="hl-tag"><property</span> <span class="hl-attribute">name</span>=<span class="hl-value">"selectClause"</span> <span class="hl-attribute">value</span>=<span class="hl-value">"select id, name, credit"</span><span class="hl-tag">/></span>
|
|
<span class="hl-tag"><property</span> <span class="hl-attribute">name</span>=<span class="hl-value">"fromClause"</span> <span class="hl-attribute">value</span>=<span class="hl-value">"from customer"</span><span class="hl-tag">/></span>
|
|
<span class="hl-tag"><property</span> <span class="hl-attribute">name</span>=<span class="hl-value">"whereClause"</span> <span class="hl-attribute">value</span>=<span class="hl-value">"where status=:status"</span><span class="hl-tag">/></span>
|
|
<span class="hl-tag"><property</span> <span class="hl-attribute">name</span>=<span class="hl-value">"sortKey"</span> <span class="hl-attribute">value</span>=<span class="hl-value">"id"</span><span class="hl-tag">/></span>
|
|
<span class="hl-tag"></bean></span>
|
|
<span class="hl-tag"></property></span>
|
|
<span class="hl-tag"><property</span> <span class="hl-attribute">name</span>=<span class="hl-value">"parameterValues"</span><span class="hl-tag">></span>
|
|
<span class="hl-tag"><map></span>
|
|
<span class="hl-tag"><entry</span> <span class="hl-attribute">key</span>=<span class="hl-value">"status"</span> <span class="hl-attribute">value</span>=<span class="hl-value">"NEW"</span><span class="hl-tag">/></span>
|
|
<span class="hl-tag"></map></span>
|
|
<span class="hl-tag"></property></span>
|
|
<span class="hl-tag"><property</span> <span class="hl-attribute">name</span>=<span class="hl-value">"pageSize"</span> <span class="hl-attribute">value</span>=<span class="hl-value">"1000"</span><span class="hl-tag">/></span>
|
|
<span class="hl-tag"><property</span> <span class="hl-attribute">name</span>=<span class="hl-value">"rowMapper"</span> <span class="hl-attribute">ref</span>=<span class="hl-value">"customerMapper"</span><span class="hl-tag">/></span>
|
|
<span class="hl-tag"></bean></span></pre><p>This configured <code class="classname">ItemReader</code> will return
|
|
<code class="classname">CustomerCredit</code> objects using the
|
|
<code class="classname">RowMapper</code> that must be specified. The
|
|
'pageSize' property determines the number of entities read from the
|
|
database for each query execution.</p><p>The 'parameterValues' property can be used to specify a Map of
|
|
parameter values for the query. If you use named parameters in the
where clause, the key for each entry should match the name of the named
parameter. If you use a traditional '?' placeholder, then the key for
|
|
each entry should be the number of the placeholder, starting with
|
|
1.</p></div><div class="section"><div class="titlepage"><div><div><h4 class="title"><a name="JpaPagingItemReader" href="#JpaPagingItemReader"></a>JpaPagingItemReader</h4></div></div></div><p>Another implementation of a paging
|
|
<code class="classname">ItemReader</code> is the
|
|
<code class="classname">JpaPagingItemReader</code>. JPA doesn't have a concept
|
|
similar to the Hibernate <code class="classname">StatelessSession</code> so we
|
|
have to use other features provided by the JPA specification. Since
|
|
JPA supports paging, this is a natural choice when it comes to using
|
|
JPA for batch processing. After each page is read, the entities will
|
|
become detached and the persistence context will be cleared in order
|
|
to allow the entities to be garbage collected once the page is
|
|
processed.</p><p>The <code class="classname">JpaPagingItemReader</code> allows you to
|
|
declare a JPQL statement and pass in a
|
|
<code class="classname">EntityManagerFactory</code>. It will then pass back
|
|
one item per call to <code class="methodname">read</code> in the same basic
|
|
fashion as any other <code class="classname">ItemReader</code>. The paging
|
|
happens behind the scenes when additional entities are needed. Below
|
|
is an example configuration using the same 'customer credit' example
|
|
as the JDBC reader above:</p><pre class="programlisting"><span class="hl-tag"><bean</span> <span class="hl-attribute">id</span>=<span class="hl-value">"itemReader"</span> <span class="hl-attribute">class</span>=<span class="hl-value">"org.spr...JpaPagingItemReader"</span><span class="hl-tag">></span>
|
|
<span class="hl-tag"><property</span> <span class="hl-attribute">name</span>=<span class="hl-value">"entityManagerFactory"</span> <span class="hl-attribute">ref</span>=<span class="hl-value">"entityManagerFactory"</span><span class="hl-tag">/></span>
|
|
<span class="hl-tag"><property</span> <span class="hl-attribute">name</span>=<span class="hl-value">"queryString"</span> <span class="hl-attribute">value</span>=<span class="hl-value">"select c from CustomerCredit c"</span><span class="hl-tag">/></span>
|
|
<span class="hl-tag"><property</span> <span class="hl-attribute">name</span>=<span class="hl-value">"pageSize"</span> <span class="hl-attribute">value</span>=<span class="hl-value">"1000"</span><span class="hl-tag">/></span>
|
|
<span class="hl-tag"></bean></span></pre><p>This configured <code class="classname">ItemReader</code> will return
|
|
<code class="classname">CustomerCredit</code> objects in the exact same manner
|
|
as described by the <code class="classname">JdbcPagingItemReader</code> above,
|
|
assuming the <code class="classname">CustomerCredit</code> object has the correct JPA annotations or ORM
|
|
mapping file. The 'pageSize' property determines the number of
|
|
entities read from the database for each query execution.</p></div><div class="section"><div class="titlepage"><div><div><h4 class="title"><a name="IbatisPagingItemReader" href="#IbatisPagingItemReader"></a>IbatisPagingItemReader</h4></div></div></div><div class="note" style="margin-left: 0.5in; margin-right: 0.5in;"><table border="0" summary="Note"><tr><td rowspan="2" align="center" valign="top" width="25"><img alt="[Note]" src="images/note.png"></td><th align="left">Note</th></tr><tr><td align="left" valign="top">This reader is deprecated as of Spring Batch 3.0.</td></tr></table></div><p>If you use IBATIS for your data access then you can use the
|
|
<code class="classname">IbatisPagingItemReader</code> which, as the name
|
|
indicates, is an implementation of a paging
|
|
<code class="classname">ItemReader</code>. IBATIS doesn't have direct support
|
|
for reading rows in pages but by providing a couple of standard
|
|
variables you can add paging support to your IBATIS queries.</p><p>Here is an example of a configuration for an
|
|
<code class="classname">IbatisPagingItemReader</code> reading CustomerCredits
|
|
as in the examples above:</p><pre class="programlisting"><span class="hl-tag"><bean</span> <span class="hl-attribute">id</span>=<span class="hl-value">"itemReader"</span> <span class="hl-attribute">class</span>=<span class="hl-value">"org.spr...IbatisPagingItemReader"</span><span class="hl-tag">></span>
|
|
<span class="hl-tag"><property</span> <span class="hl-attribute">name</span>=<span class="hl-value">"sqlMapClient"</span> <span class="hl-attribute">ref</span>=<span class="hl-value">"sqlMapClient"</span><span class="hl-tag">/></span>
|
|
<span class="hl-tag"><property</span> <span class="hl-attribute">name</span>=<span class="hl-value">"queryId"</span> <span class="hl-attribute">value</span>=<span class="hl-value">"getPagedCustomerCredits"</span><span class="hl-tag">/></span>
|
|
<span class="hl-tag"><property</span> <span class="hl-attribute">name</span>=<span class="hl-value">"pageSize"</span> <span class="hl-attribute">value</span>=<span class="hl-value">"1000"</span><span class="hl-tag">/></span>
|
|
<span class="hl-tag"></bean></span></pre><p>The <code class="classname">IbatisPagingItemReader</code> configuration
|
|
above references an IBATIS query called "getPagedCustomerCredits".
|
|
Here is an example of what that query should look like for
|
|
MySQL.</p><pre class="programlisting"><span class="hl-tag"><select</span> <span class="hl-attribute">id</span>=<span class="hl-value">"getPagedCustomerCredits"</span> <span class="hl-attribute">resultMap</span>=<span class="hl-value">"customerCreditResult"</span><span class="hl-tag">></span>
|
|
select id, name, credit from customer order by id asc LIMIT #_skiprows#, #_pagesize#
|
|
<span class="hl-tag"></select></span></pre><p>The <code class="classname">_skiprows</code> and
|
|
<code class="classname">_pagesize</code> variables are provided by the
|
|
<code class="classname">IbatisPagingItemReader</code> and there is also a
|
|
<code class="classname">_page</code> variable that can be used if necessary.
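</p><p>As a rough illustration of how these three variables relate, the following Java sketch computes them for a given page. It assumes that pages are numbered from zero and that <code class="classname">_skiprows</code> equals <code class="classname">_page</code> multiplied by <code class="classname">_pagesize</code>, which is consistent with how the queries in this section use them. <code class="classname">PagingVariables</code> is a hypothetical helper written for this example, not a Spring Batch class.</p>

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Illustrative helper, not part of Spring Batch: computes the values a paging
 * reader could hand to a query for a given zero-based page.
 */
final class PagingVariables {

    private PagingVariables() {
    }

    /**
     * Assumption: pages are numbered from 0 and _skiprows = _page * _pagesize.
     */
    static Map<String, Integer> forPage(int page, int pageSize) {
        if (page < 0 || pageSize <= 0) {
            throw new IllegalArgumentException("page must be >= 0 and pageSize must be > 0");
        }
        Map<String, Integer> variables = new LinkedHashMap<>();
        variables.put("_page", page);
        variables.put("_pagesize", pageSize);
        // multiplyExact fails loudly instead of silently overflowing on huge inputs
        variables.put("_skiprows", Math.multiplyExact(page, pageSize));
        return variables;
    }
}
```

<p>Under these assumptions, the third page (page 2) with a page size of 1000 skips the first 2000 rows, so a <code class="classname">LIMIT #_skiprows#, #_pagesize#</code> clause would return rows 2001 through 3000.</p><p>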
|
|
The syntax for the paging queries varies with the database used. Here
|
|
is an example for Oracle (unfortunately we need to use CDATA for some
|
|
operators since this belongs in an XML document):</p><pre class="programlisting"><span class="hl-tag"><select</span> <span class="hl-attribute">id</span>=<span class="hl-value">"getPagedCustomerCredits"</span> <span class="hl-attribute">resultMap</span>=<span class="hl-value">"customerCreditResult"</span><span class="hl-tag">></span>
|
|
select * from (
|
|
select * from (
|
|
select t.id, t.name, t.credit, ROWNUM ROWNUM_ from customer t order by id
|
|
) where ROWNUM_ <span class="hl-tag"><![CDATA[</span> > <span class="hl-tag">]]></span> ( #_page# * #_pagesize# )
|
|
) where ROWNUM <span class="hl-tag"><![CDATA[</span> <= <span class="hl-tag">]]></span> #_pagesize#
|
|
<span class="hl-tag"></select></span></pre></div></div><div class="section"><div class="titlepage"><div><div><h3 class="title"><a name="databaseItemWriters" href="#databaseItemWriters"></a>6.9.3 Database ItemWriters</h3></div></div></div><p>While both Flat Files and XML have specific ItemWriters, there is
|
|
no exact equivalent in the database world. This is because transactions
|
|
provide all the functionality that is needed. ItemWriters are necessary
|
|
for files because they must act as if they're transactional, keeping
|
|
track of written items and flushing or clearing at the appropriate
|
|
times. Databases have no need for this functionality, since the write is
|
|
already contained in a transaction. Users can create their own DAOs that
implement the <code class="classname">ItemWriter</code> interface or use one
from a custom <code class="classname">ItemWriter</code> that's written for
generic processing concerns. Either way, they should work without any
issues. One thing to look out for is the performance and error handling
capabilities that are provided by batching the outputs. This is most
common when using Hibernate as an <code class="classname">ItemWriter</code>, but
the same issues can arise when using JDBC batch mode. Batching database
|
|
output doesn't have any inherent flaws, assuming we are careful to flush
|
|
and there are no errors in the data. However, any errors while writing
|
|
out can cause confusion because there is no way to know which individual
|
|
item caused an exception, or even if any individual item was
|
|
responsible, as illustrated below:</p><div class="mediaobject" align="center"><img src="images/errorOnFlush.png" align="middle"></div><p>If items are buffered before being written out, any
|
|
errors encountered will not be thrown until the buffer is flushed just
before a commit. For example, let's assume that 20 items will be written
per chunk, and the 15th item throws a DataIntegrityViolationException.
As far as the Step is concerned, all 20 items will be written out
successfully, since there's no way to know that an error will occur
until they are actually written out. Once
<code class="classname">Session#</code><code class="methodname">flush</code>() is
called, the buffer will be emptied and the exception will be hit. At
this point, there's nothing the <code class="classname">Step</code> can do: the
transaction must be rolled back. Normally, this exception might cause
the item to be skipped (depending upon the skip/retry policies), and
then it won't be written out again. However, in the batched scenario,
there's no way for the Step to know which item caused the issue, because the whole
buffer was being written out when the failure happened. The only way to
|
|
solve this issue is to flush after each item:</p><div class="mediaobject" align="center"><img src="images/errorOnWrite.png" align="middle"></div><p>This is a common use case, especially when using Hibernate, and
|
|
the simple guideline for implementations of
|
|
<code class="classname">ItemWriter</code> is to flush on each call to
|
|
<code class="methodname">write()</code>. Doing so allows for items to be
|
|
skipped reliably, with Spring Batch taking care internally of the
|
|
granularity of the calls to <code class="classname">ItemWriter</code> after an
|
|
error.</p></div></div><div class="section"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="reusingExistingServices" href="#reusingExistingServices"></a>6.10 Reusing Existing Services</h2></div></div></div><p>Batch systems are often used in conjunction with other application
|
|
styles. The most common is an online system, but it may also support
|
|
integration or even a thick client application by moving necessary bulk
|
|
data that each application style uses. For this reason, it is common that
|
|
many users want to reuse existing DAOs or other services within their
|
|
batch jobs. The Spring container itself makes this fairly easy by allowing
|
|
any necessary class to be injected. However, there may be cases where the
|
|
existing service needs to act as an <code class="classname">ItemReader</code> or
|
|
<code class="classname">ItemWriter</code>, either to satisfy the dependency of
|
|
another Spring Batch class, or because it truly is the main
|
|
<code class="classname">ItemReader</code> for a step. It is fairly trivial to
|
|
write an adapter class for each service that needs wrapping, but because
it is such a common concern, Spring Batch provides implementations:
<code class="classname">ItemReaderAdapter</code> and
<code class="classname">ItemWriterAdapter</code>. Both classes implement the
standard Spring method-invoking delegate pattern and are fairly simple
|
|
to set up. Below is an example of the reader:</p><pre class="programlisting"><span class="hl-tag"><bean</span> <span class="hl-attribute">id</span>=<span class="hl-value">"itemReader"</span> <span class="hl-attribute">class</span>=<span class="hl-value">"org.springframework.batch.item.adapter.ItemReaderAdapter"</span><span class="hl-tag">></span>
|
|
<span class="hl-tag"><property</span> <span class="hl-attribute">name</span>=<span class="hl-value">"targetObject"</span> <span class="hl-attribute">ref</span>=<span class="hl-value">"fooService"</span><span class="hl-tag"> /></span>
|
|
<span class="hl-tag"><property</span> <span class="hl-attribute">name</span>=<span class="hl-value">"targetMethod"</span> <span class="hl-attribute">value</span>=<span class="hl-value">"generateFoo"</span><span class="hl-tag"> /></span>
|
|
<span class="hl-tag"></bean></span>
|
|
|
|
<span class="hl-tag"><bean</span> <span class="hl-attribute">id</span>=<span class="hl-value">"fooService"</span> <span class="hl-attribute">class</span>=<span class="hl-value">"org.springframework.batch.item.sample.FooService"</span><span class="hl-tag"> /></span></pre><p>One important point to note is that the contract of the targetMethod
|
|
must be the same as the contract for <code class="methodname">read</code>: when
|
|
exhausted it will return null, otherwise an <code class="classname">Object</code>.
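</p><p>A minimal sketch of a service method that honours this contract is shown here. The <code class="classname">FooService</code> body, its list-based constructor and the <code class="classname">ReadLoop</code> helper are assumptions made for illustration only; just the method name <code class="methodname">generateFoo</code> is taken from the configuration above, and the loop is a simplified stand-in for what a step does, not the framework's actual code.</p>

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/** Hypothetical service; only the method name matches the XML configuration. */
class FooService {

    private final Iterator<String> foos;

    FooService(List<String> foos) {
        // Copy the input so later changes to the caller's list cannot affect reading.
        this.foos = new ArrayList<>(foos).iterator();
    }

    /** Returns the next item, or null once the data is exhausted, as read() does. */
    public String generateFoo() {
        return foos.hasNext() ? foos.next() : null;
    }
}

/** Simplified stand-in for the caller of the adapted method; not Spring Batch code. */
final class ReadLoop {

    private ReadLoop() {
    }

    /** Keeps calling the service until it signals exhaustion by returning null. */
    static List<String> drain(FooService service) {
        List<String> items = new ArrayList<>();
        String item;
        while ((item = service.generateFoo()) != null) {
            items.add(item);
        }
        return items;
    }
}
```

<p>Because <code class="methodname">generateFoo</code> keeps returning null after the last item, the loop terminates; a method that threw an exception or returned a placeholder object at the end of the data would not give the caller a clean stopping signal.</p><p>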
|
|
Anything else will prevent the framework from knowing when processing
|
|
should end, either causing an infinite loop or incorrect failure,
|
|
depending upon the implementation of the
|
|
<code class="classname">ItemWriter</code>. The <code class="classname">ItemWriter</code>
|
|
implementation is equally simple:</p><pre class="programlisting"><span class="hl-tag"><bean</span> <span class="hl-attribute">id</span>=<span class="hl-value">"itemWriter"</span> <span class="hl-attribute">class</span>=<span class="hl-value">"org.springframework.batch.item.adapter.ItemWriterAdapter"</span><span class="hl-tag">></span>
|
|
<span class="hl-tag"><property</span> <span class="hl-attribute">name</span>=<span class="hl-value">"targetObject"</span> <span class="hl-attribute">ref</span>=<span class="hl-value">"fooService"</span><span class="hl-tag"> /></span>
|
|
<span class="hl-tag"><property</span> <span class="hl-attribute">name</span>=<span class="hl-value">"targetMethod"</span> <span class="hl-attribute">value</span>=<span class="hl-value">"processFoo"</span><span class="hl-tag"> /></span>
|
|
<span class="hl-tag"></bean></span>
|
|
|
|
<span class="hl-tag"><bean</span> <span class="hl-attribute">id</span>=<span class="hl-value">"fooService"</span> <span class="hl-attribute">class</span>=<span class="hl-value">"org.springframework.batch.item.sample.FooService"</span><span class="hl-tag"> /></span>
|
|
</pre></div><div class="section"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="validatingInput" href="#validatingInput"></a>6.11 Validating Input</h2></div></div></div><p>During the course of this chapter, multiple approaches to parsing
|
|
input have been discussed. Each major implementation will throw an
exception if the input is not 'well-formed'. The
<code class="classname">FixedLengthTokenizer</code> will throw an exception if a
range of data is missing. Similarly, attempting to access an index in a
<code class="classname">RowMapper</code> or <code class="classname">FieldSetMapper</code>
|
|
that doesn't exist or is in a different format than the one expected will
|
|
cause an exception to be thrown. All of these types of exceptions will be
|
|
thrown before <code class="methodname">read</code> returns. However, they don't
|
|
address the issue of whether or not the returned item is valid. For
|
|
example, if one of the fields is an age, it obviously cannot be negative.
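</p><p>A check of this kind might look like the following sketch. The <code class="classname">Validator</code> interface and <code class="classname">ValidationException</code> declared here are local stand-ins that mirror the Spring Batch types of the same names, included only so the example compiles on its own; <code class="classname">Person</code> and <code class="classname">AgeValidator</code> are hypothetical names invented for this example.</p>

```java
/** Local stand-in mirroring the exception used by the validation contract. */
class ValidationException extends RuntimeException {
    ValidationException(String message) {
        super(message);
    }
}

/** Local stand-in for the single-method validation interface. */
interface Validator {
    void validate(Object value) throws ValidationException;
}

/** Hypothetical item type produced by a reader. */
final class Person {
    final String name;
    final int age;

    Person(String name, int age) {
        this.name = name;
        this.age = age;
    }
}

/** Rejects any Person whose age is negative; returns normally otherwise. */
class AgeValidator implements Validator {
    @Override
    public void validate(Object value) throws ValidationException {
        if (!(value instanceof Person)) {
            throw new ValidationException("Expected a Person but got: " + value);
        }
        Person person = (Person) value;
        if (person.age < 0) {
            throw new ValidationException(
                    "Age must not be negative for " + person.name + ": " + person.age);
        }
    }
}
```

<p>Here the item was read and mapped without error, and only the business rule is checked: a valid item passes through silently, while an invalid one raises an exception.</p><p>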
|
|
A negative age will parse correctly, because the field exists and is a number, so no
exception is thrown. Since there is already a plethora of validation
|
|
frameworks, Spring Batch does not attempt to provide yet another, but
|
|
rather provides a very simple interface that can be implemented by any
|
|
number of frameworks:</p><pre class="programlisting"><span class="hl-keyword">public</span> <span class="hl-keyword">interface</span> Validator {
|
|
|
|
<span class="hl-keyword">void</span> validate(Object value) <span class="hl-keyword">throws</span> ValidationException;
|
|
|
|
}</pre><p>The contract is that the <code class="methodname">validate</code> method
|
|
will throw an exception if the object is invalid, and return normally if
|
|
it is valid. Spring Batch provides an out-of-the-box
<code class="classname">ItemProcessor</code>:</p><pre class="programlisting"><span class="hl-tag"><bean</span> <span class="hl-attribute">class</span>=<span class="hl-value">"org.springframework.batch.item.validator.ValidatingItemProcessor"</span><span class="hl-tag">></span>
|
|
<span class="hl-tag"><property</span> <span class="hl-attribute">name</span>=<span class="hl-value">"validator"</span> <span class="hl-attribute">ref</span>=<span class="hl-value">"validator"</span><span class="hl-tag"> /></span>
|
|
<span class="hl-tag"></bean></span>
|
|
|
|
<span class="hl-tag"><bean</span> <span class="hl-attribute">id</span>=<span class="hl-value">"validator"</span>
|
|
<span class="hl-attribute">class</span>=<span class="hl-value">"org.springframework.batch.item.validator.SpringValidator"</span><span class="hl-tag">></span>
|
|
<span class="hl-tag"><property</span> <span class="hl-attribute">name</span>=<span class="hl-value">"validator"</span><span class="hl-tag">></span>
|
|
<span class="hl-tag"><bean</span> <span class="hl-attribute">id</span>=<span class="hl-value">"orderValidator"</span>
|
|
<span class="hl-attribute">class</span>=<span class="hl-value">"org.springmodules.validation.valang.ValangValidator"</span><span class="hl-tag">></span>
|
|
<span class="hl-tag"><property</span> <span class="hl-attribute">name</span>=<span class="hl-value">"valang"</span><span class="hl-tag">></span>
|
|
<span class="hl-tag"><value></span>
|
|
<span class="hl-tag"><![CDATA[</span>
|
|
{ orderId : ? > 0 AND ? <= 9999999999 : 'Incorrect order ID' : 'error.order.id' }
|
|
{ totalLines : ? = size(lineItems) : 'Bad count of order lines'
|
|
: 'error.order.lines.badcount'}
|
|
{ customer.registered : customer.businessCustomer = FALSE OR ? = TRUE
|
|
: 'Business customer must be registered'
|
|
: 'error.customer.registration'}
|
|
{ customer.companyName : customer.businessCustomer = FALSE OR ? HAS TEXT
|
|
: 'Company name for business customer is mandatory'
|
|
:'error.customer.companyname'}
|
|
<span class="hl-tag">]]></span>
|
|
<span class="hl-tag"></value></span>
|
|
<span class="hl-tag"></property></span>
|
|
<span class="hl-tag"></bean></span>
|
|
<span class="hl-tag"></property></span>
|
|
<span class="hl-tag"></bean></span></pre><p>This example shows a simple
|
|
<code class="classname">ValangValidator</code> that is used to validate an order
|
|
object. The intent is not to show Valang functionality as much as to show
|
|
how a validator could be added.</p></div><div class="section"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="process-indicator" href="#process-indicator"></a>6.12 Preventing State Persistence</h2></div></div></div><p>By default, all of the <code class="classname">ItemReader</code> and
|
|
<code class="classname">ItemWriter</code> implementations store their current
|
|
state in the <code class="classname">ExecutionContext</code> before it is
|
|
committed. However, this may not always be the desired behavior. For
|
|
example, many developers choose to make their database readers
|
|
'rerunnable' by using a process indicator. An extra column is added to the
|
|
input data to indicate whether or not it has been processed. When a
|
|
particular record is being read (or written out) the processed flag is
|
|
flipped from false to true. The SQL statement can then contain an extra
condition in the where clause, such as "where PROCESSED_IND = false",
|
|
thereby ensuring that only unprocessed records will be returned in the
|
|
case of a restart. In this scenario, it is preferable to not store any
|
|
state, such as the current row number, since it will be irrelevant upon
|
|
restart. For this reason, all readers and writers include the 'saveState'
|
|
property:</p><pre class="programlisting"><span class="hl-tag"><bean</span> <span class="hl-attribute">id</span>=<span class="hl-value">"playerSummarizationSource"</span> <span class="hl-attribute">class</span>=<span class="hl-value">"org.spr...JdbcCursorItemReader"</span><span class="hl-tag">></span>
|
|
<span class="hl-tag"><property</span> <span class="hl-attribute">name</span>=<span class="hl-value">"dataSource"</span> <span class="hl-attribute">ref</span>=<span class="hl-value">"dataSource"</span><span class="hl-tag"> /></span>
|
|
<span class="hl-tag"><property</span> <span class="hl-attribute">name</span>=<span class="hl-value">"rowMapper"</span><span class="hl-tag">></span>
|
|
<span class="hl-tag"><bean</span> <span class="hl-attribute">class</span>=<span class="hl-value">"org.springframework.batch.sample.PlayerSummaryMapper"</span><span class="hl-tag"> /></span>
|
|
<span class="hl-tag"></property></span>
|
|
<span class="bold"><strong><property name="saveState" value="false" /></strong></span>
|
|
<span class="hl-tag"><property</span> <span class="hl-attribute">name</span>=<span class="hl-value">"sql"</span><span class="hl-tag">></span>
|
|
<span class="hl-tag"><value></span>
|
|
SELECT games.player_id, games.year_no, SUM(COMPLETES),
|
|
SUM(ATTEMPTS), SUM(PASSING_YARDS), SUM(PASSING_TD),
|
|
SUM(INTERCEPTIONS), SUM(RUSHES), SUM(RUSH_YARDS),
|
|
SUM(RECEPTIONS), SUM(RECEPTIONS_YARDS), SUM(TOTAL_TD)
|
|
from games, players where players.player_id =
|
|
games.player_id group by games.player_id, games.year_no
|
|
<span class="hl-tag"></value></span>
|
|
<span class="hl-tag"></property></span>
|
|
<span class="hl-tag"></bean></span></pre><p>The <code class="classname">ItemReader</code> configured above will not make
|
|
any entries in the <code class="classname">ExecutionContext</code> for any
|
|
executions in which it participates.</p></div><div class="section"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="customReadersWriters" href="#customReadersWriters"></a>6.13 Creating Custom ItemReaders and
|
|
ItemWriters</h2></div></div></div><p>So far in this chapter the basic contracts that exist for reading
|
|
and writing in Spring Batch and some common implementations have been
|
|
discussed. However, these are all fairly generic, and there are many
|
|
potential scenarios that may not be covered by the out-of-the-box
|
|
implementations. This section will show, using a simple example, how to
|
|
create a custom <code class="classname">ItemReader</code> and
|
|
<code class="classname">ItemWriter</code> implementation and implement their
|
|
contracts correctly. The <code class="classname">ItemReader</code> will also
|
|
implement <code class="classname">ItemStream</code>, in order to illustrate how to
|
|
make a reader or writer restartable.</p><div class="section"><div class="titlepage"><div><div><h3 class="title"><a name="customReader" href="#customReader"></a>6.13.1 Custom ItemReader Example</h3></div></div></div><p>For the purpose of this example, a simple
|
|
<code class="classname">ItemReader</code> implementation that reads from a
|
|
provided list will be created. We'll start out by implementing the most
|
|
basic contract of <code class="classname">ItemReader</code>,
|
|
<code class="methodname">read</code>:</p><pre class="programlisting"><span class="hl-keyword">public</span> <span class="hl-keyword">class</span> CustomItemReader<T> <span class="hl-keyword">implements</span> ItemReader<T>{
|
|
|
|
List<T> items;
|
|
|
|
<span class="hl-keyword">public</span> CustomItemReader(List<T> items) {
|
|
<span class="hl-keyword">this</span>.items = items;
|
|
}
|
|
|
|
<span class="hl-keyword">public</span> T read() <span class="hl-keyword">throws</span> Exception, UnexpectedInputException,
|
|
            ParseException {
|
|
|
|
<span class="hl-keyword">if</span> (!items.isEmpty()) {
|
|
<span class="hl-keyword">return</span> items.remove(<span class="hl-number">0</span>);
|
|
}
|
|
<span class="hl-keyword">return</span> null;
|
|
}
|
|
}</pre><p>This very simple class takes a list of items, and returns them one
|
|
at a time, removing each from the list. When the list is empty, it
|
|
returns null, thus satisfying the most basic requirements of an
|
|
<code class="classname">ItemReader</code>, as illustrated below:</p><pre class="programlisting">List<String> items = <span class="hl-keyword">new</span> ArrayList<String>();
|
|
items.add(<span class="hl-string">"1"</span>);
|
|
items.add(<span class="hl-string">"2"</span>);
|
|
items.add(<span class="hl-string">"3"</span>);
|
|
|
|
ItemReader<String> itemReader = <span class="hl-keyword">new</span> CustomItemReader<String>(items);
|
|
assertEquals(<span class="hl-string">"1"</span>, itemReader.read());
|
|
assertEquals(<span class="hl-string">"2"</span>, itemReader.read());
|
|
assertEquals(<span class="hl-string">"3"</span>, itemReader.read());
|
|
assertNull(itemReader.read());</pre><div class="section"><div class="titlepage"><div><div><h4 class="title"><a name="restartableReader" href="#restartableReader"></a>Making the <code class="classname">ItemReader</code>
|
|
Restartable</h4></div></div></div><p>The final challenge now is to make the
|
|
<code class="classname">ItemReader</code> restartable. Currently, if the power
|
|
goes out, and processing begins again, the
|
|
<code class="classname">ItemReader</code> must start at the beginning. This is
|
|
actually valid in many scenarios, but it is sometimes preferable that
|
|
a batch job starts where it left off. The key discriminant is often
|
|
whether the reader is stateful or stateless. A stateless reader does
|
|
not need to worry about restartability, but a stateful one has to try
to reconstitute its last known state on restart. For this reason, we
|
|
recommend that you keep custom readers stateless if possible, so you
|
|
don't have to worry about restartability.</p><p>If you do need to store state, then the
|
|
<code class="classname">ItemStream</code> interface should be used:</p><pre class="programlisting"><span class="hl-keyword">public</span> <span class="hl-keyword">class</span> CustomItemReader<T> <span class="hl-keyword">implements</span> ItemReader<T>, ItemStream {
|
|
|
|
List<T> items;
|
|
<span class="hl-keyword">int</span> currentIndex = <span class="hl-number">0</span>;
|
|
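    // Key under which the index is saved; real code should prefix it with the class name to avoid collisions.
    <span class="hl-keyword">private</span> <span class="hl-keyword">static</span> <span class="hl-keyword">final</span> String CURRENT_INDEX = <span class="hl-string">"current.index"</span>;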
|
|
|
|
<span class="hl-keyword">public</span> CustomItemReader(List<T> items) {
|
|
<span class="hl-keyword">this</span>.items = items;
|
|
}
|
|
|
|
<span class="hl-keyword">public</span> T read() <span class="hl-keyword">throws</span> Exception, UnexpectedInputException,
|
|
ParseException {
|
|
|
|
<span class="hl-keyword">if</span> (currentIndex < items.size()) {
|
|
<span class="hl-keyword">return</span> items.get(currentIndex++);
|
|
}
|
|
|
|
<span class="hl-keyword">return</span> null;
|
|
}
|
|
|
|
<span class="hl-keyword">public</span> <span class="hl-keyword">void</span> open(ExecutionContext executionContext) <span class="hl-keyword">throws</span> ItemStreamException {
|
|
<span class="hl-keyword">if</span>(executionContext.containsKey(CURRENT_INDEX)){
|
|
            currentIndex = (<span class="hl-keyword">int</span>) executionContext.getLong(CURRENT_INDEX);
|
|
}
|
|
<span class="hl-keyword">else</span>{
|
|
currentIndex = <span class="hl-number">0</span>;
|
|
}
|
|
}
|
|
|
|
<span class="hl-keyword">public</span> <span class="hl-keyword">void</span> update(ExecutionContext executionContext) <span class="hl-keyword">throws</span> ItemStreamException {
|
|
        executionContext.putLong(CURRENT_INDEX, currentIndex);
|
|
}
|
|
|
|
<span class="hl-keyword">public</span> <span class="hl-keyword">void</span> close() <span class="hl-keyword">throws</span> ItemStreamException {}
|
|
}</pre><p>On each call to the <code class="classname">ItemStream</code>
|
|
<code class="methodname">update</code> method, the current index of the
|
|
<code class="classname">ItemReader</code> will be stored in the provided
|
|
<code class="classname">ExecutionContext</code> with a key of 'current.index'.
|
|
When the <code class="classname">ItemStream</code> <code class="methodname">open</code>
|
|
method is called, the <code class="classname">ExecutionContext</code> is
|
|
checked to see if it contains an entry with that key. If the key is
|
|
found, then the current index is moved to that location. This is a
|
|
fairly trivial example, but it still meets the general
|
|
contract:</p><pre class="programlisting">ExecutionContext executionContext = <span class="hl-keyword">new</span> ExecutionContext();
|
|
((ItemStream)itemReader).open(executionContext);
|
|
assertEquals(<span class="hl-string">"1"</span>, itemReader.read());
|
|
((ItemStream)itemReader).update(executionContext);
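// Not part of the original example: a sketch of checking what update() saved,
// assuming the CustomItemReader shown above and JUnit's assertEquals.
// After one read, the index of the next unread item (1) is stored under 'current.index'.
assertEquals(1L, executionContext.getLong(<span class="hl-string">"current.index"</span>));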
|
|
|
|
List<String> items = <span class="hl-keyword">new</span> ArrayList<String>();
|
|
items.add(<span class="hl-string">"1"</span>);
|
|
items.add(<span class="hl-string">"2"</span>);
|
|
items.add(<span class="hl-string">"3"</span>);
|
|
itemReader = <span class="hl-keyword">new</span> CustomItemReader<String>(items);
|
|
|
|
((ItemStream)itemReader).open(executionContext);
|
|
assertEquals(<span class="hl-string">"2"</span>, itemReader.read());</pre><p>Most ItemReaders have much more sophisticated restart logic. The
|
|
<code class="classname">JdbcCursorItemReader</code>, for example, stores the
|
|
row id of the last processed row in the Cursor.</p><p>It is also worth noting that the key used within the
|
|
<code class="classname">ExecutionContext</code> should not be trivial. That is
|
|
because the same <code class="classname">ExecutionContext</code> is used for
|
|
all <code class="classname">ItemStream</code>s within a
|
|
<code class="classname">Step</code>. In most cases, simply prepending the key
|
|
with the class name should be enough to guarantee uniqueness. However,
|
|
in the rare cases where two of the same type of
|
|
<code class="classname">ItemStream</code> are used in the same step (which can
|
|
happen if two files are needed for output), then a more specific name will
|
|
be needed. For this reason, many of the Spring Batch
|
|
<code class="classname">ItemReader</code> and
|
|
<code class="classname">ItemWriter</code> implementations have a
|
|
<code class="methodname">setName</code>() property that allows this key name
|
|
to be overridden.</p></div></div><div class="section"><div class="titlepage"><div><div><h3 class="title"><a name="customWriter" href="#customWriter"></a>6.13.2 Custom ItemWriter Example</h3></div></div></div><p>Implementing a custom <code class="classname">ItemWriter</code> is similar
|
|
in many ways to the <code class="classname">ItemReader</code> example above, but
|
|
differs enough to warrant its own example. However, adding
|
|
restartability is essentially the same, so it won't be covered in this
|
|
example. As with the <code class="classname">ItemReader</code> example, a
|
|
<code class="classname">List</code> will be used in order to keep the example as
|
|
simple as possible:</p><pre class="programlisting"><span class="hl-keyword">public</span> <span class="hl-keyword">class</span> CustomItemWriter<T> <span class="hl-keyword">implements</span> ItemWriter<T> {
|
|
|
|
List<T> output = TransactionAwareProxyFactory.createTransactionalList();
|
|
|
|
<span class="hl-keyword">public</span> <span class="hl-keyword">void</span> write(List<? <span class="hl-keyword">extends</span> T> items) <span class="hl-keyword">throws</span> Exception {
|
|
output.addAll(items);
|
|
}
|
|
|
|
<span class="hl-keyword">public</span> List<T> getOutput() {
|
|
<span class="hl-keyword">return</span> output;
|
|
}
|
|
}</pre><div class="section"><div class="titlepage"><div><div><h4 class="title"><a name="restartableWriter" href="#restartableWriter"></a>Making the <code class="classname">ItemWriter</code>
|
|
Restartable</h4></div></div></div><p>To make the <code class="classname">ItemWriter</code> restartable, we would follow the same
|
|
process as for the <code class="classname">ItemReader</code>, adding and
|
|
implementing the <code class="classname">ItemStream</code> interface to
|
|
synchronize the execution context. In the example we might have to
|
|
count the number of items processed and add that as a footer record.
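</p><p>A minimal sketch of such a counting writer might look like the following. It is an illustration rather than framework code: the class name <code class="classname">CountingItemWriter</code>, the key name, and the <code class="methodname">getWrittenCount</code> accessor are hypothetical, the plain <code class="classname">ArrayList</code> is not transaction aware, footer output is left out, and the <code class="methodname">write</code> signature assumes the same Spring Batch version as the examples above:</p><pre class="programlisting">import java.util.ArrayList;
import java.util.List;

import org.springframework.batch.item.ExecutionContext;
import org.springframework.batch.item.ItemStream;
import org.springframework.batch.item.ItemStreamException;
import org.springframework.batch.item.ItemWriter;

// Hypothetical example class; names are illustrative only.
public class CountingItemWriter<T> implements ItemWriter<T>, ItemStream {

    // Key prefixed with the class name to reduce the chance of collisions.
    private static final String WRITTEN_COUNT = "CountingItemWriter.written.count";

    private final List<T> output = new ArrayList<T>();
    private long writtenCount = 0;

    public void write(List<? extends T> items) throws Exception {
        output.addAll(items);
        writtenCount += items.size();
    }

    public void open(ExecutionContext executionContext) throws ItemStreamException {
        // Restore the counter on restart; start from zero on a fresh run.
        writtenCount = executionContext.containsKey(WRITTEN_COUNT)
                ? executionContext.getLong(WRITTEN_COUNT) : 0;
    }

    public void update(ExecutionContext executionContext) throws ItemStreamException {
        // Save the counter so that it can be restored by open().
        executionContext.putLong(WRITTEN_COUNT, writtenCount);
    }

    public void close() throws ItemStreamException {
        // A real writer might emit the footer record here; omitted in this sketch.
    }

    public long getWrittenCount() {
        return writtenCount;
    }
}</pre><p>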
|
|
If we needed to do that, we could implement
|
|
<code class="classname">ItemStream</code> in our
|
|
<code class="classname">ItemWriter</code> so that the counter was
|
|
reconstituted from the execution context if the stream was
|
|
re-opened.</p><p>In many realistic cases, custom ItemWriters also delegate to
|
|
another writer that itself is restartable (e.g. when writing to a
|
|
file), or else they write to a transactional resource and so do not need
to be restartable because they are stateless. When you have a stateful
|
|
writer, you should also implement
|
|
<code class="classname">ItemStream</code> as well as
|
|
<code class="classname">ItemWriter</code>. Remember also that the client of
|
|
the writer needs to be aware of the <code class="classname">ItemStream</code>,
|
|
so you may need to register it as a stream in the configuration
|
|
xml.</p></div></div></div></div><div class="navfooter"><hr><table width="100%" summary="Navigation footer"><tr><td width="40%" align="left"><a accesskey="p" href="configureStep.html">Prev</a> </td><td width="20%" align="center"> </td><td width="40%" align="right"> <a accesskey="n" href="scalability.html">Next</a></td></tr><tr><td width="40%" align="left" valign="top">5. Configuring a Step </td><td width="20%" align="center"><a accesskey="h" href="index.html">Home</a></td><td width="40%" align="right" valign="top"> 7. Scaling and Parallel Processing</td></tr></table></div></body></html> |