From c973e8215ade6a0cd901d2e484c9b93940ac86ab Mon Sep 17 00:00:00 2001 From: lucasward Date: Tue, 22 Apr 2008 21:55:22 +0000 Subject: [PATCH] BATCH-598:Tidied up the reference docs. --- docs/src/site/docbook/reference/core.xml | 232 +++++----- docs/src/site/docbook/reference/execution.xml | 249 +++++------ .../docbook/reference/readersAndWriters.xml | 405 +++++++++--------- .../docbook/reference/spring-batch-intro.xml | 15 +- 4 files changed, 450 insertions(+), 451 deletions(-) diff --git a/docs/src/site/docbook/reference/core.xml b/docs/src/site/docbook/reference/core.xml index 94a78b8b0..a39de91e6 100644 --- a/docs/src/site/docbook/reference/core.xml +++ b/docs/src/site/docbook/reference/core.xml @@ -12,24 +12,24 @@ are “Jobs” and “Steps” and developer supplied processing units called ItemReaders and ItemWriters. However, because of the Spring patterns, operations, templates, callbacks, and idioms, there are opportunities for - + the following: significant improvement in adherence to a clear separation of - concerns, + concerns clearly delineated architectural layers and services provided - as interfaces, + as interfaces simple and default implementations that allowed for quick - adoption and ease of use out-of-the-box, and + adoption and ease of use out-of-the-box - significantly enhanced extensibility. + significantly enhanced extensibility @@ -44,8 +44,7 @@ physical implementation of the layers, components and technical services commonly found in robust, maintainable systems used to address the creation of simple to complex batch applications, with the infrastructure - and extensions to address very complex processing needs. The materials - below will walk through the details of the diagram. + and extensions to address very complex processing needs.
@@ -67,12 +66,14 @@ Figure 2.1: Batch Stereotypes - The colors used on the above diagram are extremely important. Grey + The above diagram highlights the interactions and key services + provided by the Spring Batch framework. The colors used are important to + understanding the responsibilities of a developer in Spring Batch. Grey represents an external application such as an enterprise scheduler or a database. It's important to note that scheduling is grey, and should thus be considered separate from Spring Batch. Blue represents application architecture services. In most cases these are provided by Spring Batch - with out of the box implementations, but an architecture time may make + with out of the box implementations, but an architecture team may make specific implementations that better address their specific needs. Yellow represents the pieces that must be configured by a developer. For example, they need to configure their job schedule so that the job is kicked off at @@ -87,7 +88,7 @@ which include Run, Job, Application, and Data. The primary goal for organizing an application according to the tiers is to embed what is known as "separation of concerns" within the system. These tiers can be - conceptual but may they prove effective in mapping the deployment of the + conceptual but may prove effective in mapping the deployment of the artifacts onto physical components like Java runtimes and integration with data sources and targets. Effective separation of concerns results in reducing the impact of change to the system. The four conceptual tiers @@ -130,9 +131,10 @@ This section describes stereotypes relating to the concept of a batch job. A job is an entity that encapsulates an entire batch process. - The file containing the job may sometimes be referred to as the "job - configuration". However, Job is just the top of an - overall hierarchy: + As is common with other Spring projects, a Job will + be wired together via an XML configuration file. 
This file may be referred + to as the "job configuration". However, Job is just + the top of an overall hierarchy: @@ -148,8 +150,7 @@
Job - The job could be described as the heart of the Spring Batch - framework. It is represented by a Spring bean that implements the + A job is represented by a Spring bean that implements the Job interface and contains all of the information necessary to define the operations performed by a job. A job configuration is typically contained within a Spring XML configuration @@ -198,9 +199,9 @@ A JobInstance refers to the concept of a logical job run. Let's consider a batch job that should be run once at the end of the day, such as the 'EndOfDay' job from the diagram above. - There is a one 'EndOfDay' Job, but each - individual run of the Job must be tracked - separately. In the case of this job, there will be one logical + There is one 'EndOfDay' Job, but each individual + run of the Job must be tracked separately. In the + case of this job, there will be one logical JobInstance per day. For example, there will be a January 1st run, and a January 2nd run. If the January 1st run fails the first time and is run again the next day, it's still the January 1st @@ -208,30 +209,33 @@ meaning the January 1st run processes data for January 1st, etc) That is to say, each JobInstance can have multiple executions. (JobExecution is discussed in more - detail below) and only one instance can be running at a given time. The - definition of a JobInstance has absolutely no - bearing on the data the will be loaded. It is entirely up to the - ItemReader implementation used to determine how - data will be loaded. For example, in the EndOfDay scenario, there may be - a column on the data that indicates the 'effective date' or 'schedule - date' to which the data belongs. So, the January 1st run would only load - data from the 1st, and the January 2nd run would only use data from the - 2nd. Because this determination will likely be a business decision, it - is left up to the ItemReader to decide. What - using the same JobInstance will decide, however, is whether or not the - 'state' (i.e. 
the ExecutionContext, which is discussed below) will be - used. Using a new instace will mean 'start from the beginning' and using - an existing instance will generally mean 'start from where you left - off'. + detail below) and only one JobInstance + corresponding to a particular Job can be running + at a given time. The definition of a JobInstance + has absolutely no bearing on the data that will be loaded. It is entirely + up to the ItemReader implementation used to + determine how data will be loaded. For example, in the EndOfDay + scenario, there may be a column on the data that indicates the + 'effective date' or 'schedule date' to which the data belongs. So, the + January 1st run would only load data from the 1st, and the January 2nd + run would only use data from the 2nd. Because this determination will + likely be a business decision, it is left up to the + ItemReader to decide. What using the same + JobInstance will determine, however, is whether + or not the 'state' (i.e. the ExecutionContext, which is discussed below) + from previous executions will be used. Using a new + JobInstance will mean 'start from the beginning' + and using an existing instance will generally mean 'start from where you + left off'.
JobParameters Having discussed JobInstance and how it - differs from Job, the natural question to ask is, - "how is one JobInstance distinguished from another?" The answer is: - JobParameters. + differs from Job, the natural question to ask is: + "how is one JobInstance distinguished from + another?" The answer is: JobParameters. JobParameters are any set of parameters used to start a batch job, which can be used for identification or even as reference data during the run. In the example above, where there are two @@ -310,7 +314,7 @@ These properties are important because they will be persisted and can be used to completely determine the status of an execution. For example, if the EndOfDay job for 01-01 is executed at 9:00 PM, and fails - at 9:30, the following entries will be in the batch meta data + at 9:30, the following entries will be made in the batch meta data tables: @@ -405,14 +409,18 @@ completing successfully at 9:30. Because it's now the next day, the 01-02 job must be run as well, which is kicked off just afterwards at 9:31, and completes in it's normal one hour time at 10:30. There is no - requirement that one be kicked off after another, unless there is - potential for the two jobs to attempt to access the same data, causing - issues with locking at the database level. It is entirely up to the - scheduler to determine when to run. Since they're separate JobInstances, - Spring Batch will make no attempt to stop them from being run - concurrently. There should now be an extra entry in both the job - instance and job parameters table, and two extra entries in the job - execution table: + requirement that one JobInstance be kicked off + after another, unless there is potential for the two jobs to attempt to + access the same data, causing issues with locking at the database level. + It is entirely up to the scheduler to determine when to run. 
Since + they're separate JobInstances, Spring Batch will make no attempt to stop + them from being run concurrently. (Attempting to run the same + JobInstance while another is already running will + result in a JobExecutionAlreadyRunningException + being thrown.) There should now be an extra entry in both the + JobInstance and + JobParameters tables, and two extra entries in + the JobExecution table:
BATCH_JOB_INSTANCE @@ -539,17 +547,19 @@
Step Stereotypes - A Step is an entity that encapsulates a - single, independent phase of a batch job. Therefore, every batch job is - composed entirely of one or more batch steps. Steps should be thought of - as unique processing streams that will be executed in sequence. For - example, if you have one step that loads a file into a database, another - that reads from the database, validates the data, preforms processing, and - then writes to another table, and another that reads from that table and - writes out to a file. Each of these steps will be performed completely - before moving on to the next step. The file will be completely read into - the database before step 2 can begin. As with Job, a Step has individual - executions, that correspond with unique JobExecutions: + A Step is a domain object that encapsulates + an independent, sequential phase of a batch job. Therefore, every Job is + composed entirely of one or more steps. A Step + should be thought of as a unique processing stream that will be executed + in sequence. For example, consider a job with one step that loads a file into a + database, another that reads from the database, validates the data, + performs processing, and then writes to another table, and another that + reads from that table and writes out to a file. Each of these steps will + be performed completely before moving on to the next step. The file will + be completely read into the database before step 2 can begin. As with + Job, a Step has an + individual StepExecution that corresponds with a + unique JobExecution: @@ -566,65 +576,52 @@ Step A Step contains all of the information - necessary to define a discrete set of business logic within a job. This - is a necessarily vague description because the contents of any given - step are at the discretion of the developer writing a job. A step can be - as narrowly defined as a single line of code or as broadly defined as - necessary to complete the entire work of a job.
There are several - factors that will affect the breadth of step configurations. - - - - Re-usability - step definitions can be shared between - jobs - - - - Transaction Management - depending on your desired transaction - strategy, you may divide the work of your job differently between - steps - - - - Extensibility - adequately granular definition of steps allows - the addition or subtraction of steps at a later time in the - appropriate position within your job configuration - - + necessary to define and control the actual batch processing. This is a + necessarily vague description because the contents of any given + Step are at the discretion of the developer + writing a Job. A Step can be as simple or complex + as the developer desires. A simple Step might + load data from a file into the database, requiring little or no code + (depending upon the implementations used). A more complex + Step may have complicated business rules that are + applied as part of the processing. Steps are defined by instantiating implementations of the Step interface. Two step implementation classes are available in the Spring Batch framework, and they are each discussed - in detail in other sections of this guide. For most situations, the + in detail in Chapter 4 of this guide. For most situations, the ItemOrientedStep implementation is sufficient, but for situations where only one call is needed, such as a stored procedure call or a wrapper around existing script, a - TaskletStep may be the better option. + TaskletStep may be a better option.
StepExecution - A StepExecution represents the technical - concept of a single attempt to execute a Step. - For instance, using the example from - JobExecution, if we have a job instance - "EndOfJob-01-01-2008" that fails to successfully complete its work the - first time it is run, when we attempt to run it again, a new - StepExecution will be created. Each of these step - executions may represent a different invocation of the batch framework, - but they will all correspond to the same + A StepExecution represents a single attempt + to execute a Step. Using the example from + JobExecution, if there is a + JobInstance for the "EndOfDayJob", with + JobParameters of "01-01-2008" that fails to + successfully complete its work the first time it is run, when it is + executed again, a new StepExecution will be + created. Each of these step executions may represent a different + invocation of the batch framework, but they will all correspond to the + same JobInstance, just as multiple + JobExecutions belong to the same JobInstance. Step executions are represented by objects of the StepExecution class. Each execution contains a - reference to its corresponding step and job execution, and transaction - related data such as commit and rollback count and start and end times. - Additionally, each step execution will contain an - ExecutionContext, which contains any data a - developer needs persisted across batch runs, such as statistics or state - information needed to restart. The following is a listing of the - properties for StepExecution: + reference to its corresponding step and + JobExecution, and transaction related data such + as commit and rollback count and start and end times. Additionally, each + step execution will contain an ExecutionContext, + which contains any data a developer needs persisted across batch runs, + such as statistics or state information needed to restart. The following + is a listing of the properties for + StepExecution:
StepExecution properties @@ -635,9 +632,10 @@ status A BatchStatus object that - indicates the status of the execution. While it's running, it's - BatchStatus.STARTED, if it fails it's BatchStatus.FAILED, and if - it finishes successfully it's BatchStatus.COMPLETED + indicates the status of the execution. While it's running, the + status is BatchStatus.STARTED, if it fails the status is + BatchStatus.FAILED, and if it finishes successfully the status + is BatchStatus.COMPLETED @@ -659,29 +657,29 @@ exitStatus The ExitStatus indicating the - result of the run. It is most important because it contains an - exit code that will be returned to the caller. See chapter 5 for - more details. + result of the execution. It is most important because it + contains an exit code that will be returned to the caller. See + chapter 5 for more details. executionContext The 'property bag' containing any user data that needs to - be persisted between batch runs. + be persisted between executions. commitCount - The number of times the transaction has been committed - for this execution + The number of transactions that have been committed for this + execution itemCount - The number of items that have been process for this + The number of items that have been processed for this execution. @@ -707,9 +705,13 @@ executionContext.putLong(getKey(LINES_READ_COUNT), reader.getPosition()); - When the ItemReader is opened, it can check - to see if it has any stored state in the context, and initialize itself - from there: + The call above will store the current number of lines read into + the ExecutionContext. It should be made just before the framework + commits. Being notified before a commit requires one of the various + StepListeners, or an ItemStream, which are discussed in more detail + later in this guide.
When the ItemReader is + opened, it can check to see if it has any stored state in the context, + and initialize itself from there: if (executionContext.containsKey(getKey(LINES_READ_COUNT))) { log.debug("Initializing for restart. Restart data is: " + executionContext); @@ -759,9 +761,9 @@
JobRepository - The JobRepository is the persistence - mechanism for all of the Stereotypes mentioned above. When a job is first - launched, a JobExecution is obtained by calling the + JobRepository is the persistence mechanism + for all of the Stereotypes mentioned above. When a job is first launched, + a JobExecution is obtained by calling the repository's createJobExecution method, and during the course of execution, StepExecution and JobExecution are persisted by passing them to the diff --git a/docs/src/site/docbook/reference/execution.xml b/docs/src/site/docbook/reference/execution.xml index ed4525364..c71f3dc8a 100644 --- a/docs/src/site/docbook/reference/execution.xml +++ b/docs/src/site/docbook/reference/execution.xml @@ -7,8 +7,8 @@
Introduction - In Chapter 2, the overall description of the architecture was - discussed, using the following diagram as a guide: + In Chapter 2, the overall architecture design was discussed, using + the following diagram as a guide: @@ -80,7 +80,7 @@
Run Tier - As it's name suggests, this tier is entirely concerned with actually + As its name suggests, this tier is entirely concerned with actually running the job. Regardless of whether the originator is a Scheduler or an HTTP request, a Job must be obtained, parameters must be parsed, and eventually a JobLauncher called: @@ -102,11 +102,11 @@ For users that want to run their jobs from an enterprise scheduler, the command line is the primary interface. This is because most schedulers (with the exception of Quartz unless using the - NativeJob) work directly with operating system processes, primarily - kicked off with shell scripts. There are many ways to launch a Java - process besides a shell script, such as Perl, Ruby, or even 'build - tools' such as ant or maven. However, because most people are familiar - with shell scripts, this example will focus on them. + NativeJob) work directly with operating system + processes, primarily kicked off with shell scripts. There are many ways + to launch a Java process besides a shell script, such as Perl, Ruby, or + even 'build tools' such as ant or maven. However, because most people + are familiar with shell scripts, this example will focus on them.
The CommandLineJobRunner @@ -295,7 +295,7 @@ in order to obtain an execution: <bean id="jobLauncher" - class="org.springframework.batch.core.launch.support.SimpleJobLauncher"> + class="org.springframework.batch.execution.launch.SimpleJobLauncher"> <property name="jobRepository" ref="jobRepository" /> </bean> @@ -341,7 +341,7 @@ TaskExecutor: <bean id="jobLauncher" - class="org.springframework.batch.core.launch.support.SimpleJobLauncher"> + class="org.springframework.batch.execution.launch.SimpleJobLauncher"> <property name="jobRepository" ref="jobRepository" /> <property name="taskExecutor"> <bean class="org.springframework.core.task.SimpleAsyncTaskExecutor" /> @@ -444,7 +444,7 @@ convenience: JobRepositoryFactoryBean. <bean id="jobRepository" - class="org.springframework.batch.core.repository.support.JobRepositoryFactoryBean" + class="org.springframework.batch.execution.repository.JobRepositoryFactoryBean" <property name="databaseType" value="hsql" /> <property name="dataSource" value="dataSource" /> </bean> @@ -470,13 +470,13 @@ </bean> <bean id="mapJobInstanceDao" - class="org.springframework.batch.core.repository.dao.MapJobInstanceDao" /> + class="org.springframework.batch.execution.repository.dao.MapJobInstanceDao" /> <bean id="mapJobExecutionDao" - class="org.springframework.batch.core.repository.dao.MapJobExecutionDao" /> + class="org.springframework.batch.execution.repository.dao.MapJobExecutionDao" /> <bean id="mapStepExecutionDao" - class="org.springframework.batch.core.repository.dao.MapStepExecutionDao" /> + class="org.springframework.batch.execution.repository.dao.MapStepExecutionDao" /> The Map* DAO implementations store the batch artifacts in a transactional map. So, the repository and DAOs may still be used @@ -510,9 +510,9 @@ </tx:attributes> </tx:advice> - This fragment can be used as is, or with almost no changes. 
The isolation level in the create* method attiributes - is specified to ensure that when jobs are launched there if two + This fragment can be used as is, with almost no changes. The + isolation level in the create* method attributes is + specified to ensure that when jobs are launched, if two processes are trying to launch the same job at the same time, only one will succeed. This is quite aggressive, and READ_COMMITTED would work just as well; READ_UNCOMMITTED would be fine if two processes @@ -530,15 +530,15 @@ Recommendations for Indexing Meta Data Tables Spring Batch provides DDL samples for the meta-data tables in - the Core jar file for several common database platforms. We do not - include index declarations inthat DDL because there are too many - variations in how people want to do that dependeing on their precise - platform, local conventions and also the business requirements of - how the jobs will be operated. So here we give some indication as to - which columns are going to be used in a WHERE clause by the Dao - ipmlementations that we provide, and how frequently they might be - used, so that individual projects can make up their own minds about - indexing. + the Core jar file for several common database platforms. Index + declarations are not included in that DDL because there are too many + variations in how users may want to index depending on their + precise platform, local conventions and also the business + requirements of how the jobs will be operated. The table below + provides some indication as to which columns are going to be used in + a WHERE clause by the DAO implementations provided by Spring Batch, + and how frequently they might be used, so that individual projects + can make up their own minds about indexing.
Where clauses in SQL statements (exluding primary keys) and @@ -722,8 +722,6 @@ <bean class="org.springframework.batch.core.listener.JobListenerSupport" /> </property> </bean></programlisting> - - <para></para> </section> </section> @@ -731,26 +729,27 @@ <title>JobFactory and Stateful Components in StepsUnlike many traditional Spring applications, many of the - components of a batch application are stateful - the file readers and - writers are the obvious examples. The recommended way to deal with this - is to create a fresh ApplicationContext for each - job execution. If the job is launched from the command line with - CommandLineJobRunner this is trivial. For more - complex launching scenarios, where jobs are executed in parallel or - serially from the same process, some extra steps have to be taken to - ensure that the ApplicationContext is refreshed. - This is preferable to using prototype scope for the stateful beans - because then they would not receive lifecycle callbacks from the - container at the end of use (e.g. through destroy-method in XML). + components of a batch application are stateful, the file readers and + writers are obvious examples. The recommended way to deal with this is + to create a fresh ApplicationContext for each job + execution. If the Job is launched from the + command line with CommandLineJobRunner this is + trivial. For more complex launching scenarios, where jobs are executed + in parallel or serially from the same process, some extra steps have to + be taken to ensure that the ApplicationContext is + refreshed. This is preferable to using prototype scope for the stateful + beans because then they would not receive lifecycle callbacks from the + container at the end of use. (e.g. 
through destroy-method in XML) The strategy provided by Spring Batch to deal with this scenario is the JobFactory, and the samples provide an - example of a specialised implementation that can load an + example of a specialized implementation that can load an ApplicationContext and close it properly when the - job is finished. Look at the + job is finished. A relevant example is ClassPathXmlApplicationContextJobFactory and its use in the adhoc-job-launcher-context.xml and the - quartz-job-launcher-context.xml. + quartz-job-launcher-context.xml, which can be found in the + Samples project. @@ -862,10 +861,7 @@ transaction. At the beginning of processing a transaction is begun, and each time read is called on the ItemReader, a counter is incremented. When it - reaches 10, the transaction will be committed. This also means that if - an item is skipped it will still count as an item against the commit - interval even though it hasn't been written out. (Skipping items will - be covered in more detail later in this chapter) + reaches 10, the transaction will be committed.
@@ -885,8 +881,9 @@ manually before it can be run again. This is configurable on the step level, since different steps have different requirements. One Step that may only be executed once can exist as part of the same - Job as Step that can be run infinitely. Below is an example start - limit configuration: + Job as a Step that can + be run infinitely. Below is an example start limit + configuration: <bean id="simpleStep" class="org.springframework.batch.core.step.item.SimpleStepFactoryBean" > @@ -953,27 +950,28 @@ information about football games and summarizes them. It contains three steps: playerLoad, gameLoad, and playerSummarization. The playerLoad Step loads player information from - a flatfile, while the gameLoad Step does the - same for games. The final step, playerSummarization, then summarizes + a flat file, while the gameLoad + Step does the same for games. The final + Step, playerSummarization, then summarizes the statistics for each player based upon the provided games. It is assumed that the file loaded by 'playerLoad' must be loaded only once, but that 'gameLoad' will load any games found within a particular directory, deleting them after they have been successfully loaded into the database. As a result, the playerLoad - Step contains no additionaly configuration. - It can be started almost limitlessly, and if complete will be - skipped. The 'gameLoad' Step, however, needs - to be run everytime, in case extra files have been dropped since it - last executed, so it has 'allowStartIfComplete' set to 'true' in - order to always be started. (It is assumed that the database tables - games are loaded into has a process indicator on it, to ensure new - games can be properly found by the summarization step) The - summarization step, which is the most - important in the Job, is configured to have a - start limit of 3.
This is useful in case it continually fails, a new - exit code will be returned to the operators that control job - execution, and it won't be allowed to start again until manual - intervention has taken place. + Step contains no additional configuration. It + can be started almost limitlessly, and if complete will be skipped. + The 'gameLoad' Step, however, needs to be run + every time, in case extra files have been dropped since it last + executed, so it has 'allowStartIfComplete' set to 'true' in order to + always be started. (It is assumed that the database table games are + loaded into has a process indicator on it, to ensure new games can + be properly found by the summarization step.) The summarization + step, which is the most important in the + Job, is configured to have a start limit of + 3. This is useful because, if it continually fails, a new exit code will + be returned to the operators that control job execution, and it + won't be allowed to start again until manual intervention has taken + place. This job is purely for example purposes and is not the same @@ -1076,7 +1074,10 @@ In this example, a FlatFileItemReader is used, and if at any point a FlatFileParseException is thrown, it will - be skipped and counted against the total skip limit of 10. + be skipped and counted against the total skip limit of 10. It should + be noted that any failures encountered while reading will not count + against the commit interval. In other words, the commit interval is + only incremented on writes (regardless of success or failure).
@@ -1111,22 +1112,23 @@
Registering ItemStreams with the Step - The step has to take care of the - ItemStream callbacks at the necessary points in - the flow. This is vital if a step is going to be fail, and might need - to be restarted, because the ItemStream - interface is where the step gets the information it needs about - persistent state between executions. The factory beans that Spring - Batch provides for convenient configuration of - Step instances have features that allow streams - to be registered with the step when it is configured. + The step has to take care of ItemStream + callbacks at the necessary points in its lifecycle. This is vital if a + step fails, and might need to be restarted, because the + ItemStream interface is where the step gets the + information it needs about persistent state between executions. The + factory beans that Spring Batch provides for convenient configuration + of Step instances have features that allow + streams to be registered with the step when it is configured. - If the ItemReader of ItemWriter themselves implement the - ItemStream interface, then these will be registered automatically. Any - other streams need to be registered separately. This is often the case - where there are indirect dependencies, like delegates being injected - into the reader and writer. To register these they can be injected - into teh factory beans through the streams property, e.g.: + If the ItemReader or + ItemWriter themselves implement the ItemStream + interface, then these will be registered automatically. Any other + streams need to be registered separately. This is often the case where + there are indirect dependencies, like delegates being injected into + the reader and writer. 
To register these they can be injected into the + factory beans through the streams property, as illustrated + below: <bean id="step1" parent="simpleStep" class="org.springframework.batch.core.step.item.StatefulRetryStepFactoryBean"> @@ -1160,12 +1162,13 @@ with one of many Step scoped listeners.
- StepListener + StepExecutionListener - StepListener represents the most generic listener for - Step execution. It allows for notification - before a Step is started, after it has completed, and if any errors - are encountered during processing: + StepExecutionListener represents the + most generic listener for Step execution. It + allows for notification before a Step is + started, after it has completed, and if any errors are encountered + during processing: public interface StepExecutionListener extends StepListener { @@ -1180,7 +1183,8 @@ onErrorInStep and afterStep in order to allow listeners the chance to modify the exit code that is returned upon completion of a - Step. A StepListener can be applied to any + Step. A + StepExecutionListener can be applied to any step factory bean via the listeners property: <bean id="simpleStep" @@ -1227,10 +1231,10 @@
ItemReadListener - When discussing skip logic earlier, it was mentioned that it - may be beneficial to log out skipped records, so that they can be - deal with later. In the case of read errors, this can be done with - an ItemReaderListener: public interface ItemReadListener extends StepListener { + When discussing skip logic above, it was mentioned that it may + be beneficial to log out skipped records, so that they can be dealt + with later. In the case of read errors, this can be done with an + ItemReadListener: public interface ItemReadListener extends StepListener { void beforeRead(); @@ -1413,9 +1417,10 @@ ItemTransformer. Here we provide a few examples of common patterns in custom - business logic, mainly using the listener interfaces - but remember that - a reader or writer can implement the listener interfaces as well if that - is appropriate. + business logic, mainly using the listener interfaces. It should be + noted that an ItemReader or + ItemWriter can implement the listener interfaces + as well if appropriate. 
@@ -1424,10 +1429,11 @@ A common use case is the need for special handling of errors in a step, item by item, perhaps logging to a special channel, or inserting a record into a database. The ItemOrientedStep (created from the step - factory beans) allows us to implement this use case with a simple + factory beans) allows users to implement this use case with a simple ItemReadListener, for errors on read, and an - ItemWriteListener, for errors on write. - E.g. + ItemWriteListener, for errors on write. The below + code snippets illustrate a listener that logs both read and write + failures: public class ItemFailureLoggerListener extends ItemListenerSupport { @@ -1443,8 +1449,8 @@ } - Having implemented this listener it just needs to be registered - with the step, e.g. + Having implemented this listener it must be registered with the + step: <bean id="simpleStep" class="org.springframework.batch.core.step.item.SimpleStepFactoryBean" > @@ -1456,11 +1462,11 @@ Remember that if your listener does anything in an onError() method, it will be inside a transaction that is - going to rollback. If you need to use a transactional resource like a - database inside an onError() method, consider adding a - declarative transaction to that method (see Spring Core Reference Guide - for details), and giving its propagation attribute the value - REQUIRES_NEW. + going to be rolled back. If you need to use a transactional resource + such as a database inside an onError() method, consider + adding a declarative transaction to that method (see Spring Core + Reference Guide for details), and giving its propagation attribute the + value REQUIRES_NEW.
@@ -1472,8 +1478,8 @@ sense to stop a job execution from within the business logic. The simplest thing to do is to throw a RuntimeException (one that - isn't retried indefinitely or skipped), e.g. we could use a custom - exception type as in the example below + isn't retried indefinitely or skipped). For example, a custom exception + type could be used, as in the example below: public class PoisonPillItemWriter extends AbstractItemWriter { @@ -1488,8 +1494,8 @@ } Another simple way to stop a step from executing is to simply - return null from the ItemReader, - e.g. + return null from the + ItemReader: public class EarlyCompletionItemReader extends AbstractItemReader { @@ -1516,7 +1522,7 @@ strategy which signals a complete batch when the item to be processed is null. A more sophisticated completion policy could be implemented and injected into the Step through the - RepeatOperationsStepFactoryBean, e.g. + RepeatOperationsStepFactoryBean: <bean id="simpleStep" class="org.springframework.batch.core.step.item.RepeatOperationsStepFactoryBean" > @@ -1533,10 +1539,10 @@ An alternative is to set a flag in the StepExecution, which is checked by the Step implementations in the framework in between - item processing. To implement this alternative we need access to the + item processing. To implement this alternative, we need access to the current StepExecution, and this can be achieved by implementing a StepListener and registering it with the Step. Here is an example of a - listener that sets the flag + listener that sets the flag: public class CustomItemWriter extends ItemListenerSupport implements StepListener { @@ -1567,13 +1573,13 @@ Adding a Footer Record A very common requirement is to aggregate information during the - output process and to append a record at the end of a file summarising + output process and to append a record at the end of a file summarizing the data, or providing a checksum. 
This can also be achieved with a callbacks in the step, normally as part of a custom - ItemWriter. In this case, since we are - accumulating state that we do not want to lose if the job aborts, we - probably need to implement the ItemStream - interface. + ItemWriter. In this case, since a job is + accumulating state that should not be lost if the job aborts, the + ItemStream interface should be + implemented: public class CustomItemWriter extends AbstractItemWriter implements ItemStream, StepListener @@ -1616,11 +1622,12 @@ state is stored through the ItemStream interface in the ExecutionContext. In this way we can be sure that when the open() callback is received on a - restart, we always get the last value that was committed. - - N.B. We might not implement ItemStream if - the ItemWriter is re-runnable, in the sense that it maintains its own - state in a transactional resource like a database. + restart, the framework guarantees we always get the last value that was + committed. It should be noted that it is not always necessary to + implement ItemStream. For example, if the ItemWriter is re-runnable, in + the sense that it maintains its own state in a transactional resource + like a database, there is no need to maintain state within the writer + itself. 
- + \ No newline at end of file diff --git a/docs/src/site/docbook/reference/readersAndWriters.xml b/docs/src/site/docbook/reference/readersAndWriters.xml index e04395883..dd8ddf5a8 100644 --- a/docs/src/site/docbook/reference/readersAndWriters.xml +++ b/docs/src/site/docbook/reference/readersAndWriters.xml @@ -10,17 +10,17 @@ All batch processing can be described in its most simple form as reading in large amounts of data, performing some type of calculation or transformation, and writing the result out. Spring Batch provides two key - interfaces to help perform bulk reading and writing: ItemReader and - ItemWriter + interfaces to help perform bulk reading and writing: + ItemReader and + ItemWriter.
ItemReader - Although a simple concept, ItemReaders are the means for providing data from - many different types of input. The most general examples include: - + Although a simple concept, an ItemReader is + the means for providing data from many different types of input. The most + general examples include: Flat File- Flat File Item Readers read lines of data from a flat file that typically describe records with fields of data @@ -31,23 +31,24 @@ XML - XML ItemReaders process XML independently of technologies used for parsing, mapping and validating objects. Input - data allows for the validation of and XML file against and XSD + data allows for the validation of an XML file against an XSD schema. - Database - A database resource accessed that returns - resultsets that can be mapped to objects for processing. The default - SQL Input Sources invoke a RowMapper to return objects, keep track - of the current row if restart is required, basic statistics, and - some transaction enhancements that will be explained later. + Database - A database resource is accessed that returns + resultsets which can be mapped to objects for processing. The + default SQL Input Sources invoke a RowMapper + to return objects, keep track of the current row if restart is + required, basic statistics, and some transaction enhancements that + will be explained later. There are many more possibilities, but we'll focus on the basic ones for this chapter. A complete list of all available ItemReaders can be found in Appendix A. - The Item Reader is a basic interface for generic input - operations: + ItemReader is a basic interface for generic + input operations: public interface ItemReader { @@ -64,7 +65,7 @@ Item, returning null if no more items are left. An item might represent a line in a file, a row in a database, or an element in an XML file. It is generally expected that these will be mapped to a usable domain object - (i.e. 
Trade or Foo, etc) but there is no requirement in the contract to do + (i.e. Trade, Foo, etc) but there is no requirement in the contract to do so. The mark and reset @@ -81,11 +82,11 @@ ItemWriter is similar in functionality to an ItemReader with the exception that the operations - are reversed. They still need to be located, opened and closed but they - differ in the case that we write out, rather than reading in. In the case - of databases or queues these may be inserts, updates or sends. The format - of the serialization of the output source is specific for every batch - job. + are reversed. Resources still need to be located, opened and closed but + they differ in the case that an ItemWriter writes + out, rather than reading in. In the case of databases or queues these may + be inserts, updates or sends. The format of the serialization of the + output is specific for every batch job. As with ItemReader, ItemWriter is a fairly generic interface: @@ -101,19 +102,22 @@ As with read on - ItemReader, write provides the basic contract of - ItemWriter, it will attempt to write out the item - passed in as long as it is open. As with mark and - reset, flush and - clear are necessary due to the nature of batch - processing. Because it is generally expected that items will be 'batched' - together into a chunk, and then output, it is expected that an - ItemWriter will perform some type of buffering. - flush will empty the buffer by actually writing - the items out, whereas clear will simply throw - the contents of the buffer away. In most cases, a Step implementation will - call flush before a commit and - clear in case of rollback. + ItemReader, write provides + the basic contract of ItemWriter, it will attempt + to write out the item passed in as long as it is open. As with + mark and reset, + flush and clear are + necessary due to the transactional nature of batch processing. 
Because it + is generally expected that items will be 'batched' together into a chunk, + and then output, it is expected that an ItemWriter + will perform some type of buffering. flush will + empty the buffer by actually writing the items out, whereas + clear will simply throw the contents of the + buffer away. In most cases, a Step implementation will call + flush before a commit and + clear in case of rollback. It is expected that + implementations of the Step interface will call + these methods.
@@ -137,20 +141,20 @@ Before describing each method, it's worth briefly mentioning the ExecutionContext. Clients of an - ItemReader that is also an + ItemReader that also implements ItemStream should call open before any calls to read, to open any resources such as files or obtain connections. A similar restriction applies to an - ItemWriter that is also an + ItemWriter that also implements ItemStream. As mentioned before, if expected data is found in the ExecutionContext, it may be used to start the ItemReader or ItemWriter at a location other than its initial - state. Conversely, close will be called to ensure any resources allocated - during open will be released safely. - update is called primarily to ensure that any - state currently being held is loaded into the provided + state. Conversely, close will be called to ensure + any resources allocated during open will be + released safely. update is called primarily to + ensure that any state currently being held is loaded into the provided ExecutionContext. This method will be called before committing, to ensure that the current state is persisted in the database before commit. @@ -213,38 +217,25 @@ 
FlatFileItemReader - One of the most common tasks performed in batch jobs involve - reading from some type of file. A flat file is basically any type of - file that contains at most two-dimensional (tabular) data. Reading flat - files in the Spring Batch framework is facilitated by the class + A flat file is any type of file that contains at most + two-dimensional (tabular) data. Reading flat files in the Spring Batch + framework is facilitated by the class FlatFileItemReader, which provides basic - functionality for reading and parsing flat files. In addition, there are - default implementations of the ItemReader and - ItemStream interfaces that solve the majority of - file processing needs. - - The FlatFileItemReader class has several - properties. The three most important of these properties are + functionality for reading and parsing flat files. + FlatFileItemReader class has several properties. + The three most important of these properties are Resource, FieldSetMapper - and LineTokenizer, which define the resource from - which data will be read and the method by which the read data will be - converted int distinct fields. The FieldSetMapper - and LineTokenizer interfaces will be explored - more in the next sections. In addition, we'll explore integration with - the file system via the resource property. The resource property - represents a Spring Core Resource. Documentation - explaining how to create beans of this type can be found in LineTokenizer. The + FieldSetMapper and + LineTokenizer interfaces will be explored more in + the next sections. The resource property represents a Spring Core + Resource. Documentation explaining how to create + beans of this type can be found in Spring Framework, Chapter 4.Resources. Therefore, this guide will not go into the details of creating - Resource objects except to make a couple of - points on the locating files to process within a batch environment. - Tokenizers and field set mappers will be discussed a bit later. 
- - As mentioned, the location of the file is defined by the resource - property. There are only a few methods exposed through a resource - service. A resource is used to help locate, open, and close resources. - It can be as simple as: + Resource objects. A resource is used to locate, + open, and close resources. It can be as simple as: Resource resource = new FileSystemResource("resources/trades.csv"); @@ -259,17 +250,8 @@ process of feeding the data into the pipe from this starting point. - The flat file reader uses a - ResourceLineReader object to read from the file. - Optionally, you can specify a - RecordSeparatorPolicy through the - recordSeparatorPolicy property. This can be used to configure more - low-level features, such as what constitutes the end of a line and - whether to continue quoted strings over newlines, among other - things. - - The other properties in the flat file readers allow you to further - specify how your data will be interpreted:
+ The other properties in FlatFileItemReader + allow you to further specify how your data will be interpreted:
Flat File Item Reader Properties @@ -324,6 +306,16 @@ AbstractLineTokenizer, field names will be set automatically from this line + + + recordSeparatorPolicy + + RecordSeparatorPolicy + + Used to determine where the line endings + are and do things like continue over a line ending if inside a + quoted string. +
@@ -331,16 +323,14 @@
FieldSetMapper - Field set mappers used by the - FlatFileItemReader implement the - FieldSetMapper interface. This interface - defines a single method, mapLine, which takes a FieldSet object and - maps its contents to some Object. This object may be a custom DTO or - domain object, or it could be as simple as an array, depending on your - needs. The FieldSetMapper is used in - conjunction with the LineTokenizer to translate - a line of data from a resource into an object of the desired - type: + The FieldSetMapper interface defines a + single method, mapLine, which takes a + FieldSet object and maps its contents to an + object. This object may be a custom DTO or domain object, or it could + be as simple as an array, depending on your needs. The + FieldSetMapper is used in conjunction with the + LineTokenizer to translate a line of data from + a resource into an object of the desired type: public interface FieldSetMapper { @@ -348,7 +338,7 @@ } - As you can see, the pattern used is exatly the same as + As you can see, the pattern used is exactly the same as RowMapper used by JdbcTemplate.
@@ -367,14 +357,13 @@ FieldSet tokenize(String line); - } - + } The contract of a LineTokenizer is such that, given a line of input (in theory the String could encompass more than one line) a FieldSet representing the line will be - returned. This will then be based to a + returned. This will then be passed to a FieldSetMapper. Spring Batch contains the following LineTokenizers: @@ -405,8 +394,8 @@ Now that the basic interfaces for reading in flat files have been defined, a simple example explaining how they work together is - helpful. In it's most simple form, the flow when reading a line form a - file is this: + helpful. In its simplest form, the flow when reading a line from a + file is the following: @@ -479,7 +468,7 @@ } } - We can then read in from the filed by correctly constructing our + We can then read in from the file by correctly constructing our FlatFileItemReader and calling read(): FlatFileItemReader itemReader = new FlatFileItemReader(); @@ -498,11 +487,11 @@ 
Mapping fields by name - There is one additional functionality that is similar in - function to a JDBC ResultSet. The names of the - fields can be injected into the LineTokenizer - to increase the readability of the mapping function. We can expose - this behavior by adding the following. First, we tell the + There is one additional piece of functionality in line tokenizers that is + similar in function to a JDBC ResultSet. The + names of the fields can be injected into the + LineTokenizer to increase the readability of + the mapping function. First, we tell the LineTokenizer what the names of the fields in the fieldset are: @@ -562,8 +551,8 @@ is required) in the same way the Spring container will look for setters matching a property name. Each available field in the FieldSet will be mapped, and the resultant - Player object will be returned, only there was - no code required. + Player object will be returned, with no code + required. 
@@ -678,23 +667,22 @@ Writing out to flat files has the same problems and issues that reading in from a file must overcome. It must be able to write out in either delimited or fixed length formats in a transactional - manger. + manner.
LineAggregator - Just like file reading's LineTokenizer - interface is necessary to take a string and split it into tokens, file - writing must have a way to aggregate multiple fields into a single - string for writing to a file. In Spring Batch this is the + Just as the LineTokenizer interface is + necessary to take a string and split it into tokens, file writing must + have a way to aggregate multiple fields into a single string for + writing to a file. In Spring Batch this is the LineAggregator: public interface LineAggregator { public String aggregate(FieldSet fieldSet); - } - + } The LineAggregator is exactly the opposite of a LineTokenizer. @@ -761,22 +749,22 @@ FlatFileItemWriter expresses this in code: - public void write(Object data) throws Exception { - FieldSet fieldSet = fieldSetCreator.mapItem(data); - getOutputState().write(lineAggregator.aggregate(fieldSet) + LINE_SEPARATOR); -} + public void write(Object data) throws Exception { + FieldSet fieldSet = fieldSetCreator.mapItem(data); + getOutputState().write(lineAggregator.aggregate(fieldSet) + LINE_SEPARATOR); + } A simple configuration with the smallest ammount of setters would look like the following: - <bean id="itemWriter" - class="org.springframework.batch.io.file.FlatFileItemWriter"> - <property name="resource" - value="file:target/test-outputs/20070122.testStream.multilineStep.txt" /> - <property name="fieldSetCreator"> - <bean class="org.springframework.batch.io.file.mapping.PassThroughFieldSetMapper"/> - </property> -</bean> + <bean id="itemWriter" + class="org.springframework.batch.io.file.FlatFileItemWriter"> + <property name="resource" + value="file:target/test-outputs/20070122.testStream.multilineStep.txt" /> + <property name="fieldSetCreator"> + <bean class="org.springframework.batch.io.file.mapping.PassThroughFieldSetMapper"/> + </property> + </bean>
@@ -788,17 +776,17 @@ File writing isn't quite so simple. At first glance it seems like a similar straight forward contract should exist for FlatFileItemWriter, if the file already exists, - throw an exception, if it does not, create it and start writing. Job - restart throws a bit of a kink into this. In the normal restart - scenario, the contract is reversed, if the file exists start writing - to it from the last known good position, if it does not, throw an - exception. However, what happens if the file name for this job is - always the same? In this case, you would want to delete the file if it - exists, unless it's a restart. Because of this possibility, the - FlatFileItemWriter contains the property, - shouldDeleteIfExists. Setting this property - to true will cause an existing file with the same name to be deleted - when the writer is opened. + throw an exception, if it does not, create it and start writing. + However, potentially restarting a Job can cause + issues. In the normal restart scenario, the contract is reversed, if + the file exists start writing to it from the last known good position, + if it does not, throw an exception. However, what happens if the file + name for this job is always the same? In this case, you would want to + delete the file if it exists, unless it's a restart. Because of this + possibility, the FlatFileItemWriter contains + the property, shouldDeleteIfExists. Setting + this property to true will cause an existing file with the same name + to be deleted when the writer is opened.
@@ -819,15 +807,15 @@ only to provide callbacks).
- Lets take a closer look how XML input and output works in batch. - First, there are a few concepts that vary from file reading and writing - but are common across Spring Batch XML processing. With XML processing - instead of lines of records (FieldSets) that need to be tokenized, it is - assumed an XML resource is a collection of 'fragments' corresponding to - individual records. Note that OXM tools are designed to work with - standalone XML documents rather than XML fragments cut out of an XML - document, therefore the Spring Batch infrastructure needs to work around - this fact (as described below). + Let's take a closer look at how XML input and output work in Spring + Batch. First, there are a few concepts that vary from file reading and + writing but are common across Spring Batch XML processing. With XML + processing instead of lines of records (FieldSets) that need to be + tokenized, it is assumed an XML resource is a collection of 'fragments' + corresponding to individual records. Note that OXM tools are designed to + work with standalone XML documents rather than XML fragments cut out of an + XML document, therefore the Spring Batch infrastructure needs to work + around this fact, as described below: @@ -843,9 +831,11 @@ Figure 3.1: XML Input - Spring Batch uses Object/XML Mapping (OXM) to bind fragments to - objects. However, Spring Batch is not tied to any particular xml binding - technology. Typical use is to delegate to + The 'trade' tag is defined as the 'root element' in the scenario + above. Everything between '<trade>' and '</trade>' is + considered one 'fragment'. Spring Batch uses Object/XML Mapping (OXM) to + bind fragments to objects. However, Spring Batch is not tied to any + particular XML binding technology. Typical use is to delegate to Spring OXM, which provides uniform abstraction for the most popular OXM technologies. 
The dependency on Spring OXM is optional and you @@ -868,8 +858,8 @@ Figure 3.2: OXM Binding - Now with and introduction into OXM and how one can use XML fragments - to represent records, let's take a closer look at Item Readers and Item + Now with an introduction to OXM and how one can use XML fragments to + represent records, let's take a closer look at Item Readers and Item Writers.
@@ -901,27 +891,25 @@ <price>99.99</price> <customer>Customer3</customer> </trade> -</records> - +</records> To be able to process the XML records we need the following: - Root Element Name - this is name of the root element of the - fragment that constitutes the object to be mapped. The example + Root Element Name - Name of the root element of the fragment + that constitutes the object to be mapped. The example configuration demonstrates this with the value of trade. - Resource - This is a Spring Resource that in the case of - this example will abstract the details of opening a file for - reading content. + Resource - Spring Resource that represents the file to be + read. - FragmentDeserializer - this is the - UnMarshalling facility provided by Spring OXM for mapping the XML - fragment to an object. + FragmentDeserializer - UnMarshalling + facility provided by Spring OXM for mapping the XML fragment to an + object. @@ -1010,12 +998,13 @@ Output works symmetrically to input. The StaxEventItemWriter needs a - Resource, a serializer, and a rootTagName. A java + Resource, a serializer, and a rootTagName. A Java object is passed to a serializer (typically a wrapper around Spring OXM - Marshaller) which writes to output using a custom - event writer that filters the StartDocument and EndDocument events - produced for each fragment by the OXM tools. We'll show this in an - example using the + Marshaller) which writes to a + Resource using a custom event writer that filters + the StartDocument and + EndDocument events produced for each fragment by + the OXM tools. We'll show this in an example using the MarshallingEventWriterSerializer. The Spring configuration for this setup looks as follows: @@ -1042,10 +1031,10 @@ </bean> To summarize with a Java example, the following code illustrates - all of the points discussed. The code demonstrates the programmatic - setup of the required properties. 
+ all of the points discussed, demonstrating the programmatic setup of the + required properties. - StaxEventItemWriter staxItemWriter = new StaxEventItemWriter() + StaxEventItemWriter staxItemWriter = new StaxEventItemWriter() FileSystemResource resource = new FileSystemResource(File.createTempFile("StaxEventWriterOutputSourceTests", "xml")) Map aliases = new HashMap(); @@ -1116,13 +1105,13 @@ ResourceEditor in Spring already filters and does placeholder replacement on system properties.) - Often in a batch setting it is preferable to parameterise the file + Often in a batch setting it is preferable to parameterize the file name in the JobParameters of the job, instead of through system properties, and access them that way. To allow for this, Spring Batch provides the StepExecutionResourceProxy. The proxy can use either job name, step name, or any values from the - JobParameters, by surround them with %: + JobParameters, by surrounding them with %: <bean id="inputFile" class="org.springframework.batch.core.resource.StepExecutionResourceProxy" /> @@ -1131,8 +1120,8 @@ Assuming a job name of 'fooJob', and a step name of 'fooStep', and the key-value pair of 'file.name="fileName.txt"' is in the - JobParameters the job is start with, the following - filename will be passed as the Resource: + JobParameters the job is started with, the + following filename will be passed as the Resource: "//fooJob/fooStep/fileName.txt". It should be noted that in order for the proxy to have access to the StepExecution, it must be registered as a @@ -1417,7 +1406,7 @@ itemReader.close(executionContext); CustomerCredit objects in the exact same manner as described by the JdbcCursorItemReader, assuming hibernate mapping files have been created correctly for the - Customer table. The 'useStatelessSession' property default to true, + Customer table. The 'useStatelessSession' property defaults to true, but has been added here to draw attention to the ability to switch it on or off.
@@ -1427,7 +1416,7 @@ itemReader.close(executionContext); Driving Query Based ItemReaders In the previous section, Cursor based database input was - discussed. However, this isn't the only option. Many database vendors, + discussed. However, it isn't the only option. Many database vendors, such as DB2, have extremely pessimistic locking strategies that can cause issues if the table being read also needs to be used by other portions of the online application. Furthermore, opening cursors over @@ -1451,9 +1440,9 @@ itemReader.close(executionContext); As you can see, this example uses the same 'FOO' table as was used in the cursor based example. However, rather than selecting the entire row, only the ID's were selected in the SQL statement. So, rather than a - FOO object being returned from read(), an Integer will be returned. This - number can then be used to query for the 'details', which is a complete - Foo object: + FOO object being returned from read, an Integer + will be returned. This number can then be used to query for the + 'details', which is a complete Foo object: @@ -1476,8 +1465,8 @@ itemReader.close(executionContext); KeyCollector As the previous example illustrates, the DrivingQueryItemReader - is fairly simple. It simply iteratoes over a list of keys. However, - the real complication is how those keys are obtained. The + is fairly simple. It simply iterates over a list of keys. However, the + real complication is how those keys are obtained. The KeyCollector interface abstracts this: public interface KeyCollector { @@ -1494,9 +1483,9 @@ itemReader.close(executionContext); keys 1 through 1,000, and fails after processing key 500, upon restarting keys 500 through 1,000 should be returned. This functionality is made possible by the - saveState method, which saves the provided - key (which should be the current key being processed) in the provided - ExecutionContext. 
The + updateContext method, which saves the + provided key (which should be the current key being processed) in the + provided ExecutionContext. The retrieveKeys method can then use this value to retrieve a subset of the original keys:
@@ -1520,10 +1509,10 @@ itemReader.close(executionContext);
SingleColumnJdbcKeyCollector - The most common driving query scenario is that of a input that - has only one column that represents it's key. This is implemented as - the SingleColumnJdbcKeyCollector class, which - has the following options: + The most common driving query scenario is that of input that has + only one column that represents its key. This is implemented as the + SingleColumnJdbcKeyCollector class, which has + the following options: SinglecolumnJdbcKeyCollector properties @@ -1717,25 +1706,24 @@ itemReader.close(executionContext); example, let's assume that 20 items will be written per chunk, and the 15th item throws a DataIntegrityViolationException. As far as the Step is concerned, all 20 item will be written out successfully, since - there's no way to know that and error will occur until they are actually + there's no way to know that an error will occur until they are actually written out. Once ItemWriter#flush() is called, the buffer will be emptied and the exception will be hit. At - this point, there's nothing the Step can do, the transaction must be - rolled back. Normally, this exception will cause the Item to be skipped - (depending upon the skip/retry policies), and then it won't be written - out again. However, in this scenario, there's no way for it to know - which item caused the issue, the whole buffer was being written out when - the failure happened. Because this is a common enough use case, - especially when using Hibernate, Spring Batch provides an implementation - to help: HibernateAwareItemWriter. The - HibernateAwareItemWriter solves the problem in a - straightforward way: if a chunk fails the first time, on subsequent runs - it will be flushed and the transaction committed after each time. This - effectively lowers the commit interval to one for the length of the - chunk. Doing so allows for items to be skipped reliably. 
The following - example illustrates how to configure the - HibernateAwareItemWriter: + this point, there's nothing the Step can do; the + transaction must be rolled back. Normally, this exception will cause the + Item to be skipped (depending upon the skip/retry policies), and then it + won't be written out again. However, in this scenario, there's no way + for it to know which item caused the issue, because the whole buffer was being + written out when the failure happened. Because this is a common enough + use case, especially when using Hibernate, Spring Batch provides an + implementation to help: HibernateAwareItemWriter. + The HibernateAwareItemWriter solves the problem + in a straightforward way: if a chunk fails the first time, on subsequent + runs it will be flushed after each item. This effectively lowers + the commit interval to one for the length of the chunk. Doing so allows + for items to be skipped reliably. The following example illustrates how + to configure the HibernateAwareItemWriter: <bean id="hibernateItemWriter" class="org.springframework.batch.item.database.HibernateAwareItemWriter"> @@ -1855,10 +1843,10 @@ itemReader.close(executionContext); Object transform(Object item) throws Exception; } - An ItemTransformer interface is very simple, given one object, - transorm it and return another. The object provided may or may not be of - the same type. The point is that business logic may be applied within - transform, and is completely up to the developer to create. An + An ItemTransformer is very simple: given one object, transform it and + return another. The object provided may or may not be of the same type. + The point is that business logic may be applied within transform, and is + completely up to the developer to create. An ItemTransformer is used as part of the ItemTransformerItemWriter, which accepts an ItemWriter and an @@ -1920,18 +1908,19 @@ itemReader.close(executionContext);
Note that the ItemTransformerItemWriter and the CompositeItemWriter are examples of a - delegation pattern, which is quite common usage in Spring Batch. The - delegates themselves might implement callback interfaces like + delegation pattern, which is common in Spring Batch. The delegates + themselves might implement callback interfaces like ItemStream or StepListener. If they do, and they are being - used in conjunction with Spring Batch Core as part of a step in a job, - then they almost certainly need to be registered manually with the + used in conjunction with Spring Batch Core as part of a + Step in a Job, then they + almost certainly need to be registered manually with the Step. Registration is automatic when using the factory beans (*StepFactoryBean), but only for the ItemReader and - ItemWriter injected directly - the delegates - are not known to the step, so they need to be injected as listeners or - streams (or both if appropriate). + ItemWriter injected directly. The delegates are + not known to the Step, so they need to be + injected as listeners or streams (or both if appropriate).
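The delegation pattern described in this hunk can be sketched in plain Java. This is a minimal illustration, not the Spring Batch implementation: ItemTransformer is copied from the listing shown earlier in the patch, while SimpleWriter, TransformingWriter, and ListWriter are hypothetical stand-ins invented for the example (the real ItemWriter and ItemTransformerItemWriter have more to them).

```java
import java.util.ArrayList;
import java.util.List;

public class DelegationSketch {

    /** Same shape as the ItemTransformer interface shown in the reference text. */
    interface ItemTransformer {
        Object transform(Object item) throws Exception;
    }

    /** Hypothetical, simplified stand-in for an item writer (assumption for this sketch). */
    interface SimpleWriter {
        void write(Object item) throws Exception;
    }

    /** Delegating writer: transforms each item, then passes the result to its delegate. */
    static class TransformingWriter implements SimpleWriter {
        private final ItemTransformer transformer;
        private final SimpleWriter delegate;

        TransformingWriter(ItemTransformer transformer, SimpleWriter delegate) {
            this.transformer = transformer;
            this.delegate = delegate;
        }

        @Override
        public void write(Object item) throws Exception {
            delegate.write(transformer.transform(item));
        }
    }

    /** Hypothetical delegate that collects written items in memory. */
    static class ListWriter implements SimpleWriter {
        final List<Object> items = new ArrayList<>();

        @Override
        public void write(Object item) {
            items.add(item);
        }
    }

    public static void main(String[] args) throws Exception {
        ListWriter sink = new ListWriter();
        SimpleWriter writer =
                new TransformingWriter(item -> item.toString().toUpperCase(), sink);
        writer.write("foo");
        writer.write("bar");
        System.out.println(sink.items); // prints [FOO, BAR]
    }
}
```

Only the outer TransformingWriter is visible to whatever calls write; the inner ListWriter is reachable solely through it. That is the reason the text says a delegate implementing a callback interface has to be registered with the Step separately.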
diff --git a/docs/src/site/docbook/reference/spring-batch-intro.xml b/docs/src/site/docbook/reference/spring-batch-intro.xml index 8d188272b..a5ec6f266 100644 --- a/docs/src/site/docbook/reference/spring-batch-intro.xml +++ b/docs/src/site/docbook/reference/spring-batch-intro.xml @@ -69,9 +69,9 @@ have teamed to collaborate on the development of Spring Batch.
Accenture has contributed previously proprietary batch processing - architecture frameworks -- based upon decades worth of experience in + architecture frameworks, based upon decades' worth of experience in building batch architectures with the last several generations of - platforms (i.e., COBOL/Mainframe, C++/Unix, and now Java/anywhere) -- to + platforms (i.e., COBOL/Mainframe, C++/Unix, and now Java/anywhere), to the Spring Batch project along with committer resources to drive support, enhancements, and the future roadmap. @@ -178,11 +178,11 @@ Spring Batch is designed with extensibility and a diverse group of end users in mind. The figure below shows a sketch of the layered architecture that supports the extensibility and ease of use for - end-user developers. - + end-user developers. - + @@ -204,7 +204,8 @@ are built on top of a common infrastructure. This infrastructure contains common readers and writers, and services such as the RetryTemplate, which are used both by application - developers(readers and writers) and the core framework itself. + developers (ItemReader and + ItemWriter) and the core framework itself. (retry)