Files
spring-batch/spring-batch-docs/modules/ROOT/pages/processor.adoc
2023-11-21 23:56:04 +01:00

401 lines
12 KiB
Plaintext

[[itemProcessor]]
= Item processing
The xref:readersAndWriters.adoc[ItemReader and ItemWriter interfaces] are both very useful for their specific
tasks, but what if you want to insert business logic before writing? One option for both
reading and writing is to use the composite pattern: Create an `ItemWriter` that contains
another `ItemWriter` or an `ItemReader` that contains another `ItemReader`. The following
code shows an example:
[source, java]
----
public class CompositeItemWriter<T> implements ItemWriter<T> {
ItemWriter<T> itemWriter;
public CompositeItemWriter(ItemWriter<T> itemWriter) {
this.itemWriter = itemWriter;
}
public void write(Chunk<? extends T> items) throws Exception {
//Add business logic here
itemWriter.write(items);
}
public void setDelegate(ItemWriter<T> itemWriter){
this.itemWriter = itemWriter;
}
}
----
The preceding class contains another `ItemWriter` to which it delegates after having
provided some business logic. This pattern could easily be used for an `ItemReader` as
well, perhaps to obtain more reference data based on the input that was provided by the
main `ItemReader`. It is also useful if you need to control the call to `write` yourself.
However, if you only want to "`transform`" the item passed in for writing before it is
actually written, you need not `write` yourself. You can just modify the item. For this
scenario, Spring Batch provides the `ItemProcessor` interface, as the following
interface definition shows:
[source, java]
----
public interface ItemProcessor<I, O> {
O process(I item) throws Exception;
}
----
An `ItemProcessor` is simple. Given one object, transform it and return another. The
provided object may or may not be of the same type. The point is that business logic may
be applied within the process, and it is completely up to the developer to create that
logic. An `ItemProcessor` can be wired directly into a step. For example, assume an
`ItemReader` provides a class of type `Foo` and that it needs to be converted to type `Bar`
before being written out. The following example shows an `ItemProcessor` that performs
the conversion:
[source, java]
----
public class Foo {}
public class Bar {
public Bar(Foo foo) {}
}
public class FooProcessor implements ItemProcessor<Foo, Bar> {
public Bar process(Foo foo) throws Exception {
//Perform simple transformation, convert a Foo to a Bar
return new Bar(foo);
}
}
public class BarWriter implements ItemWriter<Bar> {
public void write(Chunk<? extends Bar> bars) throws Exception {
//write bars
}
}
----
In the preceding example, there is a class named `Foo`, a class named `Bar`, and a class
named `FooProcessor` that adheres to the `ItemProcessor` interface. The transformation is
simple, but any type of transformation could be done here. The `BarWriter` writes `Bar`
objects, throwing an exception if any other type is provided. Similarly, the
`FooProcessor` throws an exception if anything but a `Foo` is provided. The
`FooProcessor` can then be injected into a `Step`, as the following example shows:
[tabs]
====
Java::
+
.Java Configuration
[source, java]
----
@Bean
public Job ioSampleJob(JobRepository jobRepository, Step step1) {
return new JobBuilder("ioSampleJob", jobRepository)
.start(step1)
.build();
}
@Bean
public Step step1(JobRepository jobRepository, PlatformTransactionManager transactionManager) {
return new StepBuilder("step1", jobRepository)
.<Foo, Bar>chunk(2, transactionManager)
.reader(fooReader())
.processor(fooProcessor())
.writer(barWriter())
.build();
}
----
XML::
+
.XML Configuration
[source, xml]
----
<job id="ioSampleJob">
<step name="step1">
<tasklet>
<chunk reader="fooReader" processor="fooProcessor" writer="barWriter"
commit-interval="2"/>
</tasklet>
</step>
</job>
----
====
A difference between `ItemProcessor` and `ItemReader` or `ItemWriter` is that an `ItemProcessor`
is optional for a `Step`.
[[chainingItemProcessors]]
== Chaining ItemProcessors
Performing a single transformation is useful in many scenarios, but what if you want to
"`chain`" together multiple `ItemProcessor` implementations? You can do so by using
the composite pattern mentioned previously. To update the previous, single
transformation, example, `Foo` is transformed to `Bar`, which is transformed to `Foobar`
and written out, as the following example shows:
[source, java]
----
public class Foo {}
public class Bar {
public Bar(Foo foo) {}
}
public class Foobar {
public Foobar(Bar bar) {}
}
public class FooProcessor implements ItemProcessor<Foo, Bar> {
public Bar process(Foo foo) throws Exception {
//Perform simple transformation, convert a Foo to a Bar
return new Bar(foo);
}
}
public class BarProcessor implements ItemProcessor<Bar, Foobar> {
public Foobar process(Bar bar) throws Exception {
return new Foobar(bar);
}
}
public class FoobarWriter implements ItemWriter<Foobar>{
public void write(Chunk<? extends Foobar> items) throws Exception {
//write items
}
}
----
A `FooProcessor` and a `BarProcessor` can be 'chained' together to give the resultant
`Foobar`, as shown in the following example:
[source, java]
----
CompositeItemProcessor<Foo,Foobar> compositeProcessor =
new CompositeItemProcessor<Foo,Foobar>();
List itemProcessors = new ArrayList();
itemProcessors.add(new FooProcessor());
itemProcessors.add(new BarProcessor());
compositeProcessor.setDelegates(itemProcessors);
----
Just as with the previous example, you can configure the composite processor into the
`Step`:
[tabs]
====
Java::
+
.Java Configuration
[source, java]
----
@Bean
public Job ioSampleJob(JobRepository jobRepository, Step step1) {
return new JobBuilder("ioSampleJob", jobRepository)
.start(step1)
.build();
}
@Bean
public Step step1(JobRepository jobRepository, PlatformTransactionManager transactionManager) {
return new StepBuilder("step1", jobRepository)
.<Foo, Foobar>chunk(2, transactionManager)
.reader(fooReader())
.processor(compositeProcessor())
.writer(foobarWriter())
.build();
}
@Bean
public CompositeItemProcessor compositeProcessor() {
List<ItemProcessor> delegates = new ArrayList<>(2);
delegates.add(new FooProcessor());
delegates.add(new BarProcessor());
CompositeItemProcessor processor = new CompositeItemProcessor();
processor.setDelegates(delegates);
return processor;
}
----
XML::
+
.XML Configuration
[source, xml]
----
<job id="ioSampleJob">
<step name="step1">
<tasklet>
<chunk reader="fooReader" processor="compositeItemProcessor" writer="foobarWriter"
commit-interval="2"/>
</tasklet>
</step>
</job>
<bean id="compositeItemProcessor"
class="org.springframework.batch.item.support.CompositeItemProcessor">
<property name="delegates">
<list>
<bean class="..FooProcessor" />
<bean class="..BarProcessor" />
</list>
</property>
</bean>
----
====
[[filteringRecords]]
== Filtering Records
One typical use for an item processor is to filter out records before they are passed to
the `ItemWriter`. Filtering is an action distinct from skipping. Skipping indicates that
a record is invalid, while filtering indicates that a record should not be
written.
For example, consider a batch job that reads a file containing three different types of
records: records to insert, records to update, and records to delete. If record deletion
is not supported by the system, we would not want to send any deletable records to
the `ItemWriter`. However, since these records are not actually bad records, we would want to
filter them out rather than skip them. As a result, the `ItemWriter` would receive only
insertable and updatable records.
To filter a record, you can return `null` from the `ItemProcessor`. The framework detects
that the result is `null` and avoids adding that item to the list of records delivered to
the `ItemWriter`. An exception thrown from the `ItemProcessor` results in a
skip.
[[validatingInput]]
== Validating Input
The xref:readersAndWriters.adoc[ItemReaders and ItemWriters] chapter discusses multiple approaches to parsing input.
Each major implementation throws an exception if it is not "`well formed.`" The
`FixedLengthTokenizer` throws an exception if a range of data is missing. Similarly,
attempting to access an index in a `RowMapper` or `FieldSetMapper` that does not exist or
is in a different format than the one expected causes an exception to be thrown. All of
these types of exceptions are thrown before `read` returns. However, they do not address
the issue of whether or not the returned item is valid. For example, if one of the fields
is an age, it cannot be negative. It may parse correctly, because it exists and
is a number, but it does not cause an exception. Since there are already a plethora of
validation frameworks, Spring Batch does not attempt to provide yet another. Rather, it
provides a simple interface, called `Validator`, that you can implement by any number of
frameworks, as the following interface definition shows:
[source, java]
----
public interface Validator<T> {
void validate(T value) throws ValidationException;
}
----
The contract is that the `validate` method throws an exception if the object is invalid
and returns normally if it is valid. Spring Batch provides an
`ValidatingItemProcessor`, as the following bean definition shows:
[tabs]
====
Java::
+
.Java Configuration
[source, java]
----
@Bean
public ValidatingItemProcessor itemProcessor() {
ValidatingItemProcessor processor = new ValidatingItemProcessor();
processor.setValidator(validator());
return processor;
}
@Bean
public SpringValidator validator() {
SpringValidator validator = new SpringValidator();
validator.setValidator(new TradeValidator());
return validator;
}
----
XML::
+
.XML Configuration
[source,xml]
----
<bean class="org.springframework.batch.item.validator.ValidatingItemProcessor">
<property name="validator" ref="validator" />
</bean>
<bean id="validator" class="org.springframework.batch.item.validator.SpringValidator">
<property name="validator">
<bean class="org.springframework.batch.samples.domain.trade.internal.validator.TradeValidator"/>
</property>
</bean>
----
====
You can also use the `BeanValidatingItemProcessor` to validate items annotated with
the Bean Validation API (JSR-303) annotations. For example, consider the following type `Person`:
[source, java]
----
class Person {
@NotEmpty
private String name;
public Person(String name) {
this.name = name;
}
public String getName() {
return name;
}
public void setName(String name) {
this.name = name;
}
}
----
You can validate items by declaring a `BeanValidatingItemProcessor` bean in your
application context and register it as a processor in your chunk-oriented step:
[source, java]
----
@Bean
public BeanValidatingItemProcessor<Person> beanValidatingItemProcessor() throws Exception {
BeanValidatingItemProcessor<Person> beanValidatingItemProcessor = new BeanValidatingItemProcessor<>();
beanValidatingItemProcessor.setFilter(true);
return beanValidatingItemProcessor;
}
----
[[faultTolerant]]
== Fault Tolerance
When a chunk is rolled back, items that have been cached during reading may be
reprocessed. If a step is configured to be fault-tolerant (typically by using skip or
retry processing), any `ItemProcessor` used should be implemented in a way that is
idempotent. Typically that would consist of performing no changes on the input item for
the `ItemProcessor` and updating only the
instance that is the result.