<html><head>
<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">
<title>Spring Batch - Reference Documentation</title><link rel="stylesheet" type="text/css" href="css/manual-singlepage.css"><meta name="generator" content="DocBook XSL Stylesheets V1.78.1"></head><body bgcolor="white" text="black" link="#0000FF" vlink="#840084" alink="#0000FF"><div class="book"><div class="titlepage"><div><div><h1 class="title"><a name="spring-batch-reference"></a>Spring Batch - Reference Documentation</h1></div><div><div class="authorgroup"><h2>Authors</h2>
<span class="author"><span class="firstname">Lucas</span> <span class="surname">Ward</span></span>
, <span class="author"><span class="firstname">Dave</span> <span class="surname">Syer</span></span>
, <span class="author"><span class="firstname">Thomas</span> <span class="surname">Risberg</span></span>
, <span class="author"><span class="firstname">Robert</span> <span class="surname">Kasanicky</span></span>
, <span class="author"><span class="firstname">Dan</span> <span class="surname">Garrette</span></span>
, <span class="author"><span class="firstname">Wayne</span> <span class="surname">Lund</span></span>
, <span class="author"><span class="firstname">Michael</span> <span class="surname">Minella</span></span>
, <span class="author"><span class="firstname">Chris</span> <span class="surname">Schaefer</span></span>
, <span class="author"><span class="firstname">Gunnar</span> <span class="surname">Hillert</span></span>
</div></div><div><p class="releaseinfo">4.0.0.BUILD-SNAPSHOT</p></div><div><p class="copyright">Copyright © 2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017
Pivotal, Inc. All Rights Reserved.
</p></div><div><div class="legalnotice"><a name="d5e45" href="#d5e45"></a>
<p>Copies of this document may be made for your own use and for
distribution to others, provided that you do not charge any fee for such
copies and further provided that each copy contains this Copyright
Notice, whether distributed in print or electronically.</p>
</div></div></div><hr></div><div class="toc"><p><b>Table of Contents</b></p><dl class="toc"><dt><span class="chapter"><a href="#spring-batch-intro">1. Spring Batch Introduction</a></span></dt><dd><dl><dt><span class="section"><a href="#springBatchBackground">1.1. Background</a></span></dt><dt><span class="section"><a href="#springBatchUsageScenarios">1.2. Usage Scenarios</a></span></dt><dt><span class="section"><a href="#springBatchArchitecture">1.3. Spring Batch Architecture</a></span></dt><dt><span class="section"><a href="#batchArchitectureConsiderations">1.4. General Batch Principles and Guidelines</a></span></dt><dt><span class="section"><a href="#batchProcessingStrategy">1.5. Batch Processing Strategies</a></span></dt></dl></dd><dt><span class="chapter"><a href="#whatsNew">2. What's New in Spring Batch 4.0</a></span></dt><dd><dl><dt><span class="section"><a href="#whatsNewJava">2.1. Java 8 Requirement</a></span></dt><dt><span class="section"><a href="#whatsNewDependencies">2.2. Dependencies re-baseline</a></span></dt><dt><span class="section"><a href="#whatsNewBuilders">2.3. Provide builders for the ItemReaders and ItemWriters</a></span></dt></dl></dd><dt><span class="chapter"><a href="#domain">3. The Domain Language of Batch</a></span></dt><dd><dl><dt><span class="section"><a href="#domainJob">3.1. Job</a></span></dt><dd><dl><dt><span class="section"><a href="#domainJobInstance">3.1.1. JobInstance</a></span></dt><dt><span class="section"><a href="#domainJobParameters">3.1.2. JobParameters</a></span></dt><dt><span class="section"><a href="#domainJobExecution">3.1.3. JobExecution</a></span></dt></dl></dd><dt><span class="section"><a href="#domainStep">3.2. Step</a></span></dt><dd><dl><dt><span class="section"><a href="#domainStepExecution">3.2.1. StepExecution</a></span></dt></dl></dd><dt><span class="section"><a href="#domainExecutionContext">3.3. ExecutionContext</a></span></dt><dt><span class="section"><a href="#domainJobRepository">3.4. 
JobRepository</a></span></dt><dt><span class="section"><a href="#domainJobLauncher">3.5. JobLauncher</a></span></dt><dt><span class="section"><a href="#domainItemReader">3.6. Item Reader</a></span></dt><dt><span class="section"><a href="#domainItemWriter">3.7. Item Writer</a></span></dt><dt><span class="section"><a href="#domainItemProcessor">3.8. Item Processor</a></span></dt><dt><span class="section"><a href="#domainBatchNamespace">3.9. Batch Namespace</a></span></dt></dl></dd><dt><span class="chapter"><a href="#configureJob">4. Configuring and Running a Job</a></span></dt><dd><dl><dt><span class="section"><a href="#configuringAJob">4.1. Configuring a Job</a></span></dt><dd><dl><dt><span class="section"><a href="#restartability">4.1.1. Restartability</a></span></dt><dt><span class="section"><a href="#interceptingJobExecution">4.1.2. Intercepting Job Execution</a></span></dt><dt><span class="section"><a href="#inheritingFromAParentJob">4.1.3. Inheriting from a Parent Job</a></span></dt><dt><span class="section"><a href="#d5e953">4.1.4. JobParametersValidator</a></span></dt></dl></dd><dt><span class="section"><a href="#javaConfig">4.2. Java Config</a></span></dt><dt><span class="section"><a href="#configuringJobRepository">4.3. Configuring a JobRepository</a></span></dt><dd><dl><dt><span class="section"><a href="#txConfigForJobRepository">4.3.1. Transaction Configuration for the JobRepository</a></span></dt><dt><span class="section"><a href="#repositoryTablePrefix">4.3.2. Changing the Table Prefix</a></span></dt><dt><span class="section"><a href="#inMemoryRepository">4.3.3. In-Memory Repository</a></span></dt><dt><span class="section"><a href="#nonStandardDatabaseTypesInRepository">4.3.4. Non-standard Database Types in a Repository</a></span></dt></dl></dd><dt><span class="section"><a href="#configuringJobLauncher">4.4. Configuring a JobLauncher</a></span></dt><dt><span class="section"><a href="#runningAJob">4.5. 
Running a Job</a></span></dt><dd><dl><dt><span class="section"><a href="#runningJobsFromCommandLine">4.5.1. Running Jobs from the Command Line</a></span></dt><dd><dl><dt><span class="section"><a href="#commandLineJobRunner">The CommandLineJobRunner</a></span></dt><dt><span class="section"><a href="#exitCodes">ExitCodes</a></span></dt></dl></dd><dt><span class="section"><a href="#runningJobsFromWebContainer">4.5.2. Running Jobs from within a Web Container</a></span></dt></dl></dd><dt><span class="section"><a href="#advancedMetaData">4.6. Advanced Meta-Data Usage</a></span></dt><dd><dl><dt><span class="section"><a href="#queryingRepository">4.6.1. Querying the Repository</a></span></dt><dt><span class="section"><a href="#d5e1215">4.6.2. JobRegistry</a></span></dt><dd><dl><dt><span class="section"><a href="#d5e1220">JobRegistryBeanPostProcessor</a></span></dt><dt><span class="section"><a href="#d5e1225">AutomaticJobRegistrar</a></span></dt></dl></dd><dt><span class="section"><a href="#JobOperator">4.6.3. JobOperator</a></span></dt><dt><span class="section"><a href="#JobParametersIncrementer">4.6.4. JobParametersIncrementer</a></span></dt><dt><span class="section"><a href="#stoppingAJob">4.6.5. Stopping a Job</a></span></dt><dt><span class="section"><a href="#d5e1303">4.6.6. Aborting a Job</a></span></dt></dl></dd></dl></dd><dt><span class="chapter"><a href="#configureStep">5. Configuring a Step</a></span></dt><dd><dl><dt><span class="section"><a href="#chunkOrientedProcessing">5.1. Chunk-Oriented Processing</a></span></dt><dd><dl><dt><span class="section"><a href="#configuringAStep">5.1.1. Configuring a Step</a></span></dt><dt><span class="section"><a href="#InheritingFromParentStep">5.1.2. 
Inheriting from a Parent Step</a></span></dt><dd><dl><dt><span class="section"><a href="#abstractStep">Abstract Step</a></span></dt><dt><span class="section"><a href="#mergingListsOnStep">Merging Lists</a></span></dt></dl></dd><dt><span class="section"><a href="#commitInterval">5.1.3. The Commit Interval</a></span></dt><dt><span class="section"><a href="#stepRestart">5.1.4. Configuring a Step for Restart</a></span></dt><dd><dl><dt><span class="section"><a href="#startLimit">Setting a StartLimit</a></span></dt><dt><span class="section"><a href="#allowStartIfComplete">Restarting a completed step</a></span></dt><dt><span class="section"><a href="#stepRestartExample">Step Restart Configuration Example</a></span></dt></dl></dd><dt><span class="section"><a href="#configuringSkip">5.1.5. Configuring Skip Logic</a></span></dt><dt><span class="section"><a href="#retryLogic">5.1.6. Configuring Retry Logic</a></span></dt><dt><span class="section"><a href="#controllingRollback">5.1.7. Controlling Rollback</a></span></dt><dd><dl><dt><span class="section"><a href="#transactionalReaders">Transactional Readers</a></span></dt></dl></dd><dt><span class="section"><a href="#transactionAttributes">5.1.8. Transaction Attributes</a></span></dt><dt><span class="section"><a href="#registeringItemStreams">5.1.9. Registering ItemStreams with the Step</a></span></dt><dt><span class="section"><a href="#interceptingStepExecution">5.1.10. 
Intercepting Step Execution</a></span></dt><dd><dl><dt><span class="section"><a href="#stepExecutionListener">StepExecutionListener</a></span></dt><dt><span class="section"><a href="#chunkListener">ChunkListener</a></span></dt><dt><span class="section"><a href="#itemReadListener">ItemReadListener</a></span></dt><dt><span class="section"><a href="#itemProcessListener">ItemProcessListener</a></span></dt><dt><span class="section"><a href="#itemWriteListener">ItemWriteListener</a></span></dt><dt><span class="section"><a href="#skipListener">SkipListener</a></span></dt></dl></dd></dl></dd><dt><span class="section"><a href="#taskletStep">5.2. TaskletStep</a></span></dt><dd><dl><dt><span class="section"><a href="#taskletAdapter">5.2.1. TaskletAdapter</a></span></dt><dt><span class="section"><a href="#exampleTaskletImplementation">5.2.2. Example Tasklet Implementation</a></span></dt></dl></dd><dt><span class="section"><a href="#controllingStepFlow">5.3. Controlling Step Flow</a></span></dt><dd><dl><dt><span class="section"><a href="#SequentialFlow">5.3.1. Sequential Flow</a></span></dt><dt><span class="section"><a href="#conditionalFlow">5.3.2. Conditional Flow</a></span></dt><dd><dl><dt><span class="section"><a href="#batchStatusVsExitStatus">Batch Status vs. Exit Status</a></span></dt></dl></dd><dt><span class="section"><a href="#configuringForStop">5.3.3. Configuring for Stop</a></span></dt><dd><dl><dt><span class="section"><a href="#endElement">The 'End' Element</a></span></dt><dt><span class="section"><a href="#failElement">The 'Fail' Element</a></span></dt><dt><span class="section"><a href="#stopElement">The 'Stop' Element</a></span></dt></dl></dd><dt><span class="section"><a href="#programmaticFlowDecisions">5.3.4. Programmatic Flow Decisions</a></span></dt><dt><span class="section"><a href="#split-flows">5.3.5. Split Flows</a></span></dt><dt><span class="section"><a href="#external-flows">5.3.6. Externalizing Flow Definitions and Dependencies Between
Jobs</a></span></dt></dl></dd><dt><span class="section"><a href="#late-binding">5.4. Late Binding of Job and Step Attributes</a></span></dt><dd><dl><dt><span class="section"><a href="#step-scope">5.4.1. Step Scope</a></span></dt><dt><span class="section"><a href="#job-scope">5.4.2. Job Scope</a></span></dt></dl></dd></dl></dd><dt><span class="chapter"><a href="#readersAndWriters">6. ItemReaders and ItemWriters</a></span></dt><dd><dl><dt><span class="section"><a href="#itemReader">6.1. ItemReader</a></span></dt><dt><span class="section"><a href="#itemWriter">6.2. ItemWriter</a></span></dt><dt><span class="section"><a href="#itemProcessor">6.3. ItemProcessor</a></span></dt><dd><dl><dt><span class="section"><a href="#chainingItemProcessors">6.3.1. Chaining ItemProcessors</a></span></dt><dt><span class="section"><a href="#filiteringRecords">6.3.2. Filtering Records</a></span></dt><dt><span class="section"><a href="#faultTolerant">6.3.3. Fault Tolerance</a></span></dt></dl></dd><dt><span class="section"><a href="#itemStream">6.4. ItemStream</a></span></dt><dt><span class="section"><a href="#delegatePatternAndRegistering">6.5. The Delegate Pattern and Registering with the Step</a></span></dt><dt><span class="section"><a href="#flatFiles">6.6. Flat Files</a></span></dt><dd><dl><dt><span class="section"><a href="#fieldSet">6.6.1. The FieldSet</a></span></dt><dt><span class="section"><a href="#flatFileItemReader">6.6.2. 
FlatFileItemReader</a></span></dt><dd><dl><dt><span class="section"><a href="#lineMapper">LineMapper</a></span></dt><dt><span class="section"><a href="#lineTokenizer">LineTokenizer</a></span></dt><dt><span class="section"><a href="#fieldSetMapper">FieldSetMapper</a></span></dt><dt><span class="section"><a href="#defaultLineMapper">DefaultLineMapper</a></span></dt><dt><span class="section"><a href="#simpleDelimitedFileReadingExample">Simple Delimited File Reading Example</a></span></dt><dt><span class="section"><a href="#mappingFieldsByName">Mapping Fields by Name</a></span></dt><dt><span class="section"><a href="#beanWrapperFieldSetMapper">Automapping FieldSets to Domain Objects</a></span></dt><dt><span class="section"><a href="#fixedLengthFileFormats">Fixed Length File Formats</a></span></dt><dt><span class="section"><a href="#prefixMatchingLineMapper">Multiple Record Types within a Single File</a></span></dt><dt><span class="section"><a href="#exceptionHandlingInFlatFiles">Exception Handling in Flat Files</a></span></dt></dl></dd><dt><span class="section"><a href="#flatFileItemWriter">6.6.3. FlatFileItemWriter</a></span></dt><dd><dl><dt><span class="section"><a href="#lineAggregator">LineAggregator</a></span></dt><dt><span class="section"><a href="#SimplifiedFileWritingExample">Simplified File Writing Example</a></span></dt><dt><span class="section"><a href="#FieldExtractor">FieldExtractor</a></span></dt><dt><span class="section"><a href="#delimitedFileWritingExample">Delimited File Writing Example</a></span></dt><dt><span class="section"><a href="#fixedWidthFileWritingExample">Fixed Width File Writing Example</a></span></dt><dt><span class="section"><a href="#handlingFileCreation">Handling File Creation</a></span></dt></dl></dd></dl></dd><dt><span class="section"><a href="#xmlReadingWriting">6.7. XML Item Readers and Writers</a></span></dt><dd><dl><dt><span class="section"><a href="#StaxEventItemReader">6.7.1. 
StaxEventItemReader</a></span></dt><dt><span class="section"><a href="#StaxEventItemWriter">6.7.2. StaxEventItemWriter</a></span></dt></dl></dd><dt><span class="section"><a href="#multiFileInput">6.8. Multi-File Input</a></span></dt><dt><span class="section"><a href="#database">6.9. Database</a></span></dt><dd><dl><dt><span class="section"><a href="#cursorBasedItemReaders">6.9.1. Cursor Based ItemReaders</a></span></dt><dd><dl><dt><span class="section"><a href="#JdbcCursorItemReader">JdbcCursorItemReader</a></span></dt><dt><span class="section"><a href="#HibernateCursorItemReader">HibernateCursorItemReader</a></span></dt><dt><span class="section"><a href="#StoredProcedureItemReader">StoredProcedureItemReader</a></span></dt></dl></dd><dt><span class="section"><a href="#pagingItemReaders">6.9.2. Paging ItemReaders</a></span></dt><dd><dl><dt><span class="section"><a href="#JdbcPagingItemReader">JdbcPagingItemReader</a></span></dt><dt><span class="section"><a href="#JpaPagingItemReader">JpaPagingItemReader</a></span></dt><dt><span class="section"><a href="#IbatisPagingItemReader">IbatisPagingItemReader</a></span></dt></dl></dd><dt><span class="section"><a href="#databaseItemWriters">6.9.3. Database ItemWriters</a></span></dt></dl></dd><dt><span class="section"><a href="#reusingExistingServices">6.10. Reusing Existing Services</a></span></dt><dt><span class="section"><a href="#validatingInput">6.11. Validating Input</a></span></dt><dt><span class="section"><a href="#process-indicator">6.12. Preventing State Persistence</a></span></dt><dt><span class="section"><a href="#customReadersWriters">6.13. Creating Custom ItemReaders and
ItemWriters</a></span></dt><dd><dl><dt><span class="section"><a href="#customReader">6.13.1. Custom ItemReader Example</a></span></dt><dd><dl><dt><span class="section"><a href="#restartableReader">Making the <code class="classname">ItemReader</code>
Restartable</a></span></dt></dl></dd><dt><span class="section"><a href="#customWriter">6.13.2. Custom ItemWriter Example</a></span></dt><dd><dl><dt><span class="section"><a href="#restartableWriter">Making the <code class="classname">ItemWriter</code>
Restartable</a></span></dt></dl></dd></dl></dd></dl></dd><dt><span class="chapter"><a href="#scalability">7. Scaling and Parallel Processing</a></span></dt><dd><dl><dt><span class="section"><a href="#multithreadedStep">7.1. Multi-threaded Step</a></span></dt><dt><span class="section"><a href="#scalabilityParallelSteps">7.2. Parallel Steps</a></span></dt><dt><span class="section"><a href="#remoteChunking">7.3. Remote Chunking</a></span></dt><dt><span class="section"><a href="#partitioning">7.4. Partitioning</a></span></dt><dd><dl><dt><span class="section"><a href="#partitionHandler">7.4.1. PartitionHandler</a></span></dt><dt><span class="section"><a href="#stepExecutionSplitter">7.4.2. Partitioner</a></span></dt><dt><span class="section"><a href="#bindingInputDataToSteps">7.4.3. Binding Input Data to Steps</a></span></dt></dl></dd></dl></dd><dt><span class="chapter"><a href="#repeat">8. Repeat</a></span></dt><dd><dl><dt><span class="section"><a href="#repeatTemplate">8.1. RepeatTemplate</a></span></dt><dd><dl><dt><span class="section"><a href="#repeatContext">8.1.1. RepeatContext</a></span></dt><dt><span class="section"><a href="#repeatStatus">8.1.2. RepeatStatus</a></span></dt></dl></dd><dt><span class="section"><a href="#completionPolicies">8.2. Completion Policies</a></span></dt><dt><span class="section"><a href="#repeatExceptionHandling">8.3. Exception Handling</a></span></dt><dt><span class="section"><a href="#repeatListeners">8.4. Listeners</a></span></dt><dt><span class="section"><a href="#repeatParallelProcessing">8.5. Parallel Processing</a></span></dt><dt><span class="section"><a href="#declarativeIteration">8.6. Declarative Iteration</a></span></dt></dl></dd><dt><span class="chapter"><a href="#retry">9. Retry</a></span></dt><dd><dl><dt><span class="section"><a href="#retryTemplate">9.1. RetryTemplate</a></span></dt><dd><dl><dt><span class="section"><a href="#retryContext">9.1.1. 
RetryContext</a></span></dt><dt><span class="section"><a href="#recoveryCallback">9.1.2. RecoveryCallback</a></span></dt><dt><span class="section"><a href="#statelessRetry">9.1.3. Stateless Retry</a></span></dt><dt><span class="section"><a href="#statefulRetry">9.1.4. Stateful Retry</a></span></dt></dl></dd><dt><span class="section"><a href="#retryPolicies">9.2. Retry Policies</a></span></dt><dt><span class="section"><a href="#backoffPolicies">9.3. Backoff Policies</a></span></dt><dt><span class="section"><a href="#retryListeners">9.4. Listeners</a></span></dt><dt><span class="section"><a href="#declarativeRetry">9.5. Declarative Retry</a></span></dt></dl></dd><dt><span class="chapter"><a href="#testing">10. Unit Testing</a></span></dt><dd><dl><dt><span class="section"><a href="#creatingUnitTestClass">10.1. Creating a Unit Test Class</a></span></dt><dt><span class="section"><a href="#endToEndTesting">10.2. End-To-End Testing of Batch Jobs</a></span></dt><dt><span class="section"><a href="#testingIndividualSteps">10.3. Testing Individual Steps</a></span></dt><dt><span class="section"><a href="#d5e3514">10.4. Testing Step-Scoped Components</a></span></dt><dt><span class="section"><a href="#validatingOutputFiles">10.5. Validating Output Files</a></span></dt><dt><span class="section"><a href="#mockingDomainObjects">10.6. Mocking Domain Objects</a></span></dt></dl></dd><dt><span class="chapter"><a href="#patterns">11. Common Batch Patterns</a></span></dt><dd><dl><dt><span class="section"><a href="#loggingItemProcessingAndFailures">11.1. Logging Item Processing and Failures</a></span></dt><dt><span class="section"><a href="#stoppingAJobManuallyForBusinessReasons">11.2. Stopping a Job Manually for Business Reasons</a></span></dt><dt><span class="section"><a href="#addingAFooterRecord">11.3. Adding a Footer Record</a></span></dt><dd><dl><dt><span class="section"><a href="#writingASummaryFooter">11.3.1. 
Writing a Summary Footer</a></span></dt></dl></dd><dt><span class="section"><a href="#drivingQueryBasedItemReaders">11.4. Driving Query Based ItemReaders</a></span></dt><dt><span class="section"><a href="#multiLineRecords">11.5. Multi-Line Records</a></span></dt><dt><span class="section"><a href="#executingSystemCommands">11.6. Executing System Commands</a></span></dt><dt><span class="section"><a href="#handlingStepCompletionWhenNoInputIsFound">11.7. Handling Step Completion When No Input is Found</a></span></dt><dt><span class="section"><a href="#passingDataToFutureSteps">11.8. Passing Data to Future Steps</a></span></dt></dl></dd><dt><span class="chapter"><a href="#jsr-352">12. JSR-352 Support</a></span></dt><dd><dl><dt><span class="section"><a href="#jsrGeneralNotes">12.1. General Notes Spring Batch and JSR-352</a></span></dt><dt><span class="section"><a href="#jsrSetup">12.2. Setup</a></span></dt><dd><dl><dt><span class="section"><a href="#jsrSetupContexts">12.2.1. Application Contexts</a></span></dt><dt><span class="section"><a href="#jsrSetupLaunching">12.2.2. Launching a JSR-352 based job</a></span></dt></dl></dd><dt><span class="section"><a href="#dependencyInjection">12.3. Dependency Injection</a></span></dt><dt><span class="section"><a href="#jsrJobProperties">12.4. Batch Properties</a></span></dt><dd><dl><dt><span class="section"><a href="#jsrPropertySupport">12.4.1. Property Support</a></span></dt><dt><span class="section"><a href="#jsrBatchPropertyAnnotation">12.4.2. <code class="classname">@BatchProperty</code> annotation</a></span></dt><dt><span class="section"><a href="#jsrPropertySubstitution">12.4.3. Property Substitution</a></span></dt></dl></dd><dt><span class="section"><a href="#jsrProcessingModels">12.5. Processing Models</a></span></dt><dd><dl><dt><span class="section"><a href="#d5e3942">12.5.1. Item based processing</a></span></dt><dt><span class="section"><a href="#d5e3952">12.5.2. 
Custom checkpointing</a></span></dt></dl></dd><dt><span class="section"><a href="#jsrRunningAJob">12.6. Running a job</a></span></dt><dt><span class="section"><a href="#jsrContexts">12.7. Contexts</a></span></dt><dt><span class="section"><a href="#jsrStepFlow">12.8. Step Flow</a></span></dt><dt><span class="section"><a href="#jsrScaling">12.9. Scaling a JSR-352 batch job</a></span></dt><dd><dl><dt><span class="section"><a href="#jsrPartitioning">12.9.1. Partitioning</a></span></dt></dl></dd><dt><span class="section"><a href="#jsrTesting">12.10. Testing</a></span></dt></dl></dd><dt><span class="chapter"><a href="#springBatchIntegration">13. Spring Batch Integration</a></span></dt><dd><dl><dt><span class="sect1"><a href="#spring-batch-integration-introduction">13.1. Spring Batch Integration Introduction</a></span></dt><dd><dl><dt><span class="sect2"><a href="#namespace-support">13.1.1. Namespace Support</a></span></dt><dt><span class="sect2"><a href="#launching-batch-jobs-through-messages">13.1.2. Launching Batch Jobs through Messages</a></span></dt><dd><dl><dt><span class="sect3"><a href="#transforming-a-file-into-a-joblaunchrequest">Transforming a file into a JobLaunchRequest</a></span></dt><dt><span class="sect3"><a href="#the-jobexecution-response">The JobExecution Response</a></span></dt><dt><span class="sect3"><a href="#spring-batch-integration-configuration">Spring Batch Integration Configuration</a></span></dt><dt><span class="sect3"><a href="#example-itemreader-configuration">Example ItemReader Configuration</a></span></dt></dl></dd><dt><span class="sect2"><a href="#providing-feedback-with-informational-messages">13.1.3. Providing Feedback with Informational Messages</a></span></dt><dt><span class="sect2"><a href="#asynchronous-processors">13.1.4. Asynchronous Processors</a></span></dt><dt><span class="sect2"><a href="#externalizing-batch-process-execution">13.1.5. 
Externalizing Batch Process Execution</a></span></dt><dd><dl><dt><span class="sect3"><a href="#remote-chunking">Remote Chunking</a></span></dt><dt><span class="sect3"><a href="#remote-partitioning">Remote Partitioning</a></span></dt></dl></dd></dl></dd></dl></dd><dt><span class="appendix"><a href="#listOfReadersAndWriters">A. List of ItemReaders and ItemWriters</a></span></dt><dd><dl><dt><span class="section"><a href="#itemReadersAppendix">A.1. Item Readers</a></span></dt><dt><span class="section"><a href="#itemWritersAppendix">A.2. Item Writers</a></span></dt></dl></dd><dt><span class="appendix"><a href="#metaDataSchema">B. Meta-Data Schema</a></span></dt><dd><dl><dt><span class="section"><a href="#metaDataSchemaOverview">B.1. Overview</a></span></dt><dd><dl><dt><span class="section"><a href="#exampleDDLScripts">B.1.1. Example DDL Scripts</a></span></dt><dt><span class="section"><a href="#metaDataVersion">B.1.2. Version</a></span></dt><dt><span class="section"><a href="#metaDataIdentity">B.1.3. Identity</a></span></dt></dl></dd><dt><span class="section"><a href="#metaDataBatchJobInstance">B.2. BATCH_JOB_INSTANCE</a></span></dt><dt><span class="section"><a href="#metaDataBatchJobParams">B.3. BATCH_JOB_EXECUTION_PARAMS</a></span></dt><dt><span class="section"><a href="#metaDataBatchJobExecution">B.4. BATCH_JOB_EXECUTION</a></span></dt><dt><span class="section"><a href="#metaDataBatchStepExecution">B.5. BATCH_STEP_EXECUTION</a></span></dt><dt><span class="section"><a href="#metaDataBatchJobExecutionContext">B.6. BATCH_JOB_EXECUTION_CONTEXT</a></span></dt><dt><span class="section"><a href="#metaDataBatchStepExecutionContext">B.7. BATCH_STEP_EXECUTION_CONTEXT</a></span></dt><dt><span class="section"><a href="#metaDataArchiving">B.8. Archiving</a></span></dt><dt><span class="section"><a href="#multiByteCharacters">B.9. International and Multi-byte Characters</a></span></dt><dt><span class="section"><a href="#recommendationsForIndexingMetaDataTables">B.10. 
Recommendations for Indexing Meta Data Tables</a></span></dt></dl></dd><dt><span class="appendix"><a href="#transactions">C. Batch Processing and Transactions</a></span></dt><dd><dl><dt><span class="section"><a href="#transactionsNoRetry">C.1. Simple Batching with No Retry</a></span></dt><dt><span class="section"><a href="#transactionStatelessRetry">C.2. Simple Stateless Retry</a></span></dt><dt><span class="section"><a href="#repeatRetry">C.3. Typical Repeat-Retry Pattern</a></span></dt><dt><span class="section"><a href="#asyncChunkProcessing">C.4. Asynchronous Chunk Processing</a></span></dt><dt><span class="section"><a href="#asyncItemProcessing">C.5. Asynchronous Item Processing</a></span></dt><dt><span class="section"><a href="#transactionPropagation">C.6. Interactions Between Batching and Transaction Propagation</a></span></dt><dt><span class="section"><a href="#specialTransactionOrthonogonal">C.7. Special Case: Transactions with Orthogonal Resources</a></span></dt><dt><span class="section"><a href="#statelessRetryCannotRecover">C.8. Stateless Retry Cannot Recover</a></span></dt></dl></dd><dt><span class="glossary"><a href="#glossary">Glossary</a></span></dt></dl></div>
<div class="chapter"><div class="titlepage"><div><div><h1 class="title"><a name="spring-batch-intro" href="#spring-batch-intro"></a>1. Spring Batch Introduction</h1></div></div></div><p>Many applications within the enterprise domain require bulk processing
to perform business operations in mission-critical environments. These
business operations include automated, complex processing of large volumes
of information that is most efficiently processed without user interaction.
These operations typically include time-based events (e.g., month-end
calculations, notices, or correspondence), periodic application of complex
business rules processed repetitively across very large data sets (e.g.,
insurance benefit determination or rate adjustments), or the integration of
information that is received from internal and external systems and that
typically requires formatting, validation, and processing in a transactional
manner into the system of record. Batch processing is used to process
billions of transactions every day for enterprises.</p><p>Spring Batch is a lightweight, comprehensive batch framework designed
to enable the development of robust batch applications vital for the daily
operations of enterprise systems. Spring Batch builds upon the productivity,
POJO-based development approach, and general ease-of-use capabilities people
have come to know from the Spring Framework, while making it easy for
developers to access and leverage more advanced enterprise services when
necessary. Spring Batch is not a scheduling framework. There are many good
enterprise schedulers available in both the commercial and open source
spaces, such as Quartz, Tivoli, and Control-M. Spring Batch is intended to work in
conjunction with a scheduler, not replace a scheduler.</p><p>Spring Batch provides reusable functions that are essential in
processing large volumes of records, including logging/tracing, transaction
management, job processing statistics, job restart, skip, and resource
management. It also provides more advanced technical services and features
that enable extremely high-volume and high-performance batch jobs
through optimization and partitioning techniques. Simple as well as complex,
high-volume batch jobs can leverage the framework in a highly scalable
manner to process significant volumes of information.</p><div class="section"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="springBatchBackground" href="#springBatchBackground"></a>1.1 Background</h2></div></div></div><p>While open source software projects and associated communities have
focused greater attention on web-based and SOA messaging-based
architecture frameworks, there has been a notable lack of focus on
reusable architecture frameworks to accommodate Java-based batch
processing needs, despite the continued need to handle such processing within
enterprise IT environments. The lack of a standard, reusable batch
architecture has resulted in the proliferation of many one-off, in-house
solutions developed within client enterprise IT functions.</p><p>SpringSource and Accenture have collaborated to change this.
Accenture's hands-on industry and technical experience in implementing
batch architectures, SpringSource's depth of technical experience, and
Spring's proven programming model together make a natural and powerful
partnership to create high-quality, market-relevant software aimed at
filling an important gap in enterprise Java. Both companies are also
currently working with a number of clients to solve similar problems by
developing Spring-based batch architecture solutions. This work has provided
useful additional detail and real-life constraints, helping to ensure that
the solution can be applied to the real-world problems posed by clients.
For these reasons and many more, SpringSource and Accenture have teamed up to
collaborate on the development of Spring Batch.</p><p>Accenture has contributed previously proprietary batch processing
architecture frameworks, based upon decades' worth of experience in
building batch architectures with the last several generations of
platforms (i.e., COBOL/Mainframe, C++/Unix, and now Java/anywhere), to the
Spring Batch project along with committer resources to drive support,
enhancements, and the future roadmap.</p><p>The collaborative effort between Accenture and SpringSource aims to
promote the standardization of software processing approaches, frameworks,
and tools that can be consistently leveraged by enterprise users when
creating batch applications. Companies and government agencies desiring to
deliver standard, proven solutions to their enterprise IT environments
will benefit from Spring Batch.</p></div><div class="section"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="springBatchUsageScenarios" href="#springBatchUsageScenarios"></a>1.2 Usage Scenarios</h2></div></div></div><p>A typical batch program generally reads a large number of records
from a database, file, or queue, processes the data in some fashion, and
then writes back data in a modified form. Spring Batch automates this
basic batch iteration, providing the capability to process similar
transactions as a set, typically in an offline environment without any
user interaction. Batch jobs are part of most IT projects and Spring Batch
is the only open source framework that provides a robust, enterprise-scale
solution.</p><p>Business Scenarios </p><div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; "><li class="listitem"><p>Commit batch process periodically</p></li><li class="listitem"><p>Concurrent batch processing: parallel processing of a
job</p></li><li class="listitem"><p>Staged, enterprise message-driven processing</p></li><li class="listitem"><p>Massively parallel batch processing</p></li><li class="listitem"><p>Manual or scheduled restart after failure</p></li><li class="listitem"><p>Sequential processing of dependent steps (with extensions to
workflow-driven batches)</p></li><li class="listitem"><p>Partial processing: skip records (e.g. on rollback)</p></li><li class="listitem"><p>Whole-batch transaction: for cases with a small batch size or
existing stored procedures/scripts</p></li></ul></div><p>Technical Objectives </p><div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; "><li class="listitem"><p>Batch developers use the Spring programming model: concentrate
on business logic; let the framework take care of
infrastructure.</p></li><li class="listitem"><p>Clear separation of concerns between the infrastructure, the
batch execution environment, and the batch application.</p></li><li class="listitem"><p>Provide common, core execution services as interfaces that all
projects can implement.</p></li><li class="listitem"><p>Provide simple and default implementations of the core
execution interfaces that can be used ‘out of the box’.</p></li><li class="listitem"><p>Easy to configure, customize, and extend services, by
leveraging the Spring Framework in all layers.</p></li><li class="listitem"><p>All existing core services should be easy to replace or
extend, without any impact to the infrastructure layer.</p></li><li class="listitem"><p>Provide a simple deployment model, with the architecture JARs
completely separate from the application, built using Maven.</p></li></ul></div></div><div class="section"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="springBatchArchitecture" href="#springBatchArchitecture"></a>1.3 Spring Batch Architecture</h2></div></div></div><p></p><p>Spring Batch is designed with extensibility and a diverse group of
end users in mind. The figure below shows a sketch of the layered
architecture that supports the extensibility and ease of use for end-user
developers. </p><div class="mediaobject" align="center"><img src="images/spring-batch-layers.png" align="middle"><div class="caption"><p>Figure 1.1: Spring Batch Layered
Architecture</p></div></div><p>This layered architecture highlights three major high-level
components: Application, Core, and Infrastructure. The application
contains all batch jobs and custom code written by developers using Spring
Batch. The Batch Core contains the core runtime classes necessary to
launch and control a batch job. It includes things such as a
<code class="classname">JobLauncher</code>, <code class="classname">Job</code>, and
<code class="classname">Step</code> implementations. Both Application and Core are
built on top of a common infrastructure. This infrastructure contains
common readers and writers, and services such as the
<code class="classname">RetryTemplate</code>, which are used both by application
developers (<code class="classname">ItemReader</code> and
<code class="classname">ItemWriter</code>) and the core framework itself
(retry).</p></div><div class="section"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="batchArchitectureConsiderations" href="#batchArchitectureConsiderations"></a>1.4 General Batch Principles and Guidelines</h2></div></div></div><p>The following key principles, guidelines, and general considerations should be taken into account when building a batch solution.</p><div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; "><li class="listitem"><p>A batch architecture typically affects on-line architecture and vice versa. Design with both architectures and environments in mind, using common building blocks when possible.</p></li><li class="listitem"><p>Simplify as much as possible and avoid building complex logical structures in single batch applications.</p></li><li class="listitem"><p>Process data as close to where the data physically resides as possible or vice versa (i.e., keep your data where your processing occurs).</p></li><li class="listitem"><p>Minimize system resource use, especially I/O. Perform as many operations as possible in internal memory.</p></li><li class="listitem"><p>Review application I/O (analyze SQL statements) to ensure that unnecessary physical I/O is avoided. In particular, look for the following four common flaws:
</p><div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: circle; "><li class="listitem"><p>Reading data for every transaction when the data could be read once and kept cached or in the working storage;</p></li><li class="listitem"><p>Rereading data for a transaction where the data was read earlier in the same transaction;</p></li><li class="listitem"><p>Causing unnecessary table or index scans;</p></li><li class="listitem"><p>Not specifying key values in the WHERE clause of an SQL statement.</p></li></ul></div><p>
</p></li><li class="listitem"><p>Do not do things twice in a batch run. For instance, if you need data summarization for reporting purposes, increment stored totals if possible when data is being initially processed, so your reporting application does not have to reprocess the same data.</p></li><li class="listitem"><p>Allocate enough memory at the beginning of a batch application to avoid time-consuming reallocation during the process.</p></li><li class="listitem"><p>Always assume the worst with regard to data integrity. Insert adequate checks and record validation to maintain data integrity.</p></li><li class="listitem"><p>Implement checksums for internal validation where possible. For example, flat files should have a trailer record giving the total number of records in the file and an aggregate of the key fields.</p></li><li class="listitem"><p>Plan and execute stress tests as early as possible in a production-like environment with realistic data volumes.</p></li><li class="listitem"><p>In large batch systems backups can be challenging, especially if the system is running concurrently with on-line processing on a 24-7 basis. Database backups are typically well taken care of in the on-line design, but file backups should be considered to be just as important. If the system depends on flat files, file backup procedures should not only be in place and documented, but regularly tested as well.</p></li></ul></div></div><div class="section"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="batchProcessingStrategy" href="#batchProcessingStrategy"></a>1.5 Batch Processing Strategies</h2></div></div></div><p>To help design and implement batch systems, basic batch application building blocks and patterns should be provided to the designers and programmers in the form of sample structure charts and code shells. 
When starting to design a batch job, the business logic should be decomposed into a series of steps which can be implemented using the following standard building blocks:</p><div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; "><li class="listitem"><p><span class="emphasis"><em>Conversion Applications:</em></span> For each type of file supplied by or generated to an external system, a conversion application will need to be created to convert the transaction records supplied into a standard format required for processing. This type of batch application can partly or entirely consist of translation utility modules (see Basic Batch Services).</p></li><li class="listitem"><p><span class="emphasis"><em>Validation Applications:</em></span> Validation applications ensure that all input/output records are correct and consistent. Validation is typically based on file headers and trailers, checksums and validation algorithms as well as record level cross-checks.</p></li><li class="listitem"><p><span class="emphasis"><em>Extract Applications:</em></span> An application that reads a set of records from a database or input file, selects records based on predefined rules, and writes the records to an output file.</p></li><li class="listitem"><p><span class="emphasis"><em>Extract/Update Applications:</em></span> An application that reads records from a database or an input file, and makes changes to a database or an output file driven by the data found in each input record.</p></li><li class="listitem"><p><span class="emphasis"><em>Processing and Updating Applications:</em></span> An application that performs processing on input transactions from an extract or a validation application. 
The processing will usually involve reading a database to obtain data required for processing, potentially updating the database and creating records for output processing.</p></li><li class="listitem"><p><span class="emphasis"><em>Output/Format Applications:</em></span> An application that reads an input file, restructures the data from its records according to a standard format, and produces an output file for printing or transmission to another program or system.</p></li></ul></div><p>Additionally, a basic application shell should be provided for business logic that cannot be built using the previously mentioned building blocks.</p><p>In addition to the main building blocks, each application may use one or more standard utility steps, such as:</p><div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; "><li class="listitem"><p>Sort - A program that reads an input file and produces an output file where records have been re-sequenced according to a sort key field in the records. Sorts are usually performed by standard system utilities.</p></li><li class="listitem"><p>Split - A program that reads a single input file, and writes each record to one of several output files based on a field value. Splits can be tailored or performed by parameter-driven standard system utilities.</p></li><li class="listitem"><p>Merge - A program that reads records from multiple input files and produces one output file with combined data from the input files. 
Merges can be tailored or performed by parameter-driven standard system utilities.</p></li></ul></div><p>Batch applications can additionally be categorized by their input source:</p><div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; "><li class="listitem"><p>Database-driven applications are driven by rows or values retrieved from the database.</p></li><li class="listitem"><p>File-driven applications are driven by records or values retrieved from a file.</p></li><li class="listitem"><p>Message-driven applications are driven by messages retrieved from a message queue.</p></li></ul></div><p>The foundation of any batch system is the processing strategy. Factors affecting the selection of the strategy include estimated batch system volume, concurrency with on-line systems or with other batch systems, and available batch windows. (With more enterprises wanting to be up and running 24x7, there may be no obvious batch window.)</p><p>Typical processing options for batch are:</p><div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; "><li class="listitem"><p>Normal processing in a batch window during off-line hours</p></li><li class="listitem"><p>Concurrent batch / on-line processing</p></li><li class="listitem"><p>Parallel processing of many different batch runs or jobs at the same time</p></li><li class="listitem"><p>Partitioning (i.e. processing of many instances of the same job at the same time)</p></li><li class="listitem"><p>A combination of these</p></li></ul></div><p>The order in the list above reflects the implementation complexity, processing in a batch window being the easiest and partitioning the most complex to implement.</p><p>Some or all of these options may be supported by a commercial scheduler.</p><p>In the following section these processing options are discussed in more detail. 
It is important to notice that the commit and locking strategy adopted by batch processes depends on the type of processing performed and that, as a rule of thumb, the on-line locking strategy should use the same principles. Therefore, the batch architecture cannot be simply an afterthought when designing an overall architecture.</p><p>The locking strategy can use only normal database locks, or an additional custom locking service can be implemented in the architecture. The locking service would track database locking (for example by storing the necessary information in a dedicated db-table) and give or deny permissions to the application programs requesting a db operation. Retry logic could also be implemented by this architecture to avoid aborting a batch job in case of a lock situation.</p><p><span class="bold"><strong>1. Normal processing in a batch window</strong></span>
For simple batch processes running in a separate batch window, where the data being updated is not required by on-line users or other batch processes, concurrency is not an issue and a single commit can be done at the end of the batch run.</p><p>In most cases a more robust approach is more appropriate. A thing to keep in mind is that batch systems have a tendency to grow as time goes by, both in terms of complexity and the data volumes they will handle. If no locking strategy is in place and the system still relies on a single commit point, modifying the batch programs can be painful. Therefore, even with the simplest batch systems, consider the need for commit logic for restart-recovery options as well as the information concerning the more complex cases below.</p><p><span class="bold"><strong>2. Concurrent batch / on-line processing</strong></span>
Batch applications processing data that can simultaneously be updated by on-line users should not lock any data (either in the database or in files) which could be required by on-line users for more than a few seconds. Also, updates should be committed to the database at the end of every few transactions. This minimizes the portion of data that is unavailable to other processes and the elapsed time the data is unavailable.</p><p>Another option to minimize physical locking is to implement logical row-level locking using either an Optimistic Locking Pattern or a Pessimistic Locking Pattern.</p><div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; "><li class="listitem"><p>Optimistic locking assumes a low likelihood of record contention. It typically means inserting a timestamp column in each database table used concurrently by both batch and on-line processing. When an application fetches a row for processing, it also fetches the timestamp. As the application then tries to update the processed row, the update uses the original timestamp in the WHERE clause. If the timestamp matches, the data and the timestamp will be updated successfully. If the timestamp does not match, this indicates that another application has updated the same row between the fetch and the update attempt and therefore the update cannot be performed.</p></li><li class="listitem"><p>Pessimistic locking is any locking strategy that assumes there is a high likelihood of record contention and therefore either a physical or logical lock needs to be obtained at retrieval time. One type of pessimistic logical locking uses a dedicated lock-column in the database table. When an application retrieves the row for update, it sets a flag in the lock column. With the flag in place, other applications attempting to retrieve the same row will logically fail. 
When the application that set the flag updates the row, it also clears the flag, enabling the row to be retrieved by other applications. Please note, that the integrity of data must be maintained also between the initial fetch and the setting of the flag, for example by using db locks (e.g., SELECT FOR UPDATE). Note also that this method suffers from the same downside as physical locking except that it is somewhat easier to manage building a time-out mechanism that will get the lock released if the user goes to lunch while the record is locked.</p></li></ul></div><p>These patterns are not necessarily suitable for batch processing, but they might be used for concurrent batch and on-line processing (e.g. in cases where the database doesn't support row-level locking). As a general rule, optimistic locking is more suitable for on-line applications, while pessimistic locking is more suitable for batch applications. Whenever logical locking is used, the same scheme must be used for all applications accessing data entities protected by logical locks.</p><p>Note that both of these solutions only address locking a single record. Often we may need to lock a logically related group of records. With physical locks, you have to manage these very carefully in order to avoid potential deadlocks. With logical locks, it is usually best to build a logical lock manager that understands the logical record groups you want to protect and can ensure that locks are coherent and non-deadlocking. This logical lock manager usually uses its own tables for lock management, contention reporting, time-out mechanism, etc.</p><p><span class="bold"><strong>3. Parallel Processing</strong></span>
Parallel processing allows multiple batch runs / jobs to run in parallel to minimize the total elapsed batch processing time. This is not a problem as long as the jobs are not sharing the same files, db-tables or index spaces. If they do, this service should be implemented using partitioned data. Another option is to build an architecture module for maintaining interdependencies using a control table. A control table should contain a row for each shared resource and whether it is in use by an application or not. The batch architecture or the application in a parallel job would then retrieve information from that table to determine if it can get access to the resource it needs or not.</p><p>If the data access is not a problem, parallel processing can be implemented through the use of additional threads to process in parallel. In the mainframe environment, parallel job classes have traditionally been used, in order to ensure adequate CPU time for all the processes. Regardless, the solution has to be robust enough to ensure time slices for all the running processes.</p><p>Other key issues in parallel processing include load balancing and the availability of general system resources such as files, database buffer pools etc. Also note that the control table itself can easily become a critical resource.</p><p><span class="bold"><strong>4. Partitioning</strong></span>
Using partitioning allows multiple versions of large batch applications to run concurrently. The purpose of this is to reduce the elapsed time required to process long batch jobs. Processes which can be successfully partitioned are those where the input file can be split and/or the main database tables partitioned to allow the application to run against different sets of data.</p><p>In addition, processes which are partitioned must be designed to only process their assigned data set. A partitioning architecture has to be closely tied to the database design and the database partitioning strategy. Please note, that the database partitioning doesn't necessarily mean physical partitioning of the database, although in most cases this is advisable. The following picture illustrates the partitioning approach:
</p><div class="mediaobject" align="center"><img src="images/partitioned.png" align="middle"><div class="caption"><p>Figure 1.2: Partitioned Process</p></div></div><p>
</p><p>The architecture should be flexible enough to allow dynamic configuration of the number of partitions. Both automatic and user controlled configuration should be considered. Automatic configuration may be based on parameters such as the input file size and/or the number of input records.</p><p><span class="bold"><strong>4.1 Partitioning Approaches</strong></span>
The following lists some of the possible partitioning approaches. Selecting a partitioning approach has to be done on a case-by-case basis.</p><p><span class="emphasis"><em>1. Fixed and Even Break-Up of Record Set</em></span></p><p>This involves breaking the input record set into a number of evenly sized portions (e.g. 10, where each portion holds exactly 1/10th of the entire record set). Each portion is then processed by one instance of the batch/extract application.</p><p>In order to use this approach, preprocessing will be required to split the recordset up. The result of this split will be a lower and upper bound placement number which can be used as input to the batch/extract application in order to restrict its processing to its portion alone.</p><p>Preprocessing could be a large overhead as it has to calculate and determine the bounds of each portion of the record set.</p><p><span class="emphasis"><em>2. Breakup by a Key Column</em></span></p><p>This involves breaking up the input record set by a key column such as a location code, and assigning data from each key to a batch instance. In order to achieve this, column values can be either:</p><p><span class="emphasis"><em>(a) Assigned to a batch instance via a partitioning table (see below for details).</em></span></p><p><span class="emphasis"><em>(b) Assigned to a batch instance by a portion of the value (e.g. values 0000-0999, 1000-1999, etc.)</em></span></p><p>Under option (a), addition of new values will mean a manual reconfiguration of the batch/extract to ensure that the new value is added to a particular instance.</p><p>Under option (b), all values are covered by an instance of the batch job. However, the number of values processed by one instance is dependent on the distribution of column values (i.e. there may be a large number of locations in the 0000-0999 range, and few in the 1000-1999 range). 
Under this option, the data range should be designed with partitioning in mind.</p><p>Under both options, the optimal even distribution of records to batch instances cannot be realized. There is no dynamic configuration of the number of batch instances used.</p><p><span class="emphasis"><em>3. Breakup by Views</em></span></p><p>This approach is basically breakup by a key column, but on the database level. It involves breaking up the recordset into views. These views will be used by each instance of the batch application during its processing. The breakup will be done by grouping the data.</p><p>With this option, each instance of a batch application will have to be configured to hit a particular view (instead of the master table). Also, with the addition of new data values, this new group of data will have to be included in a view. There is no dynamic configuration capability, as a change in the number of instances will result in a change to the views.</p><p><span class="emphasis"><em>4. Addition of a Processing Indicator</em></span></p><p>This involves the addition of a new column to the input table, which acts as an indicator. As a preprocessing step, all indicators would be marked as non-processed. During the record fetch stage of the batch application, records are read on the condition that the record is marked non-processed, and once they are read (with lock), they are marked processing. When that record is completed, the indicator is updated to either complete or error. Many instances of a batch application can be started without a change, as the additional column ensures that a record is only processed once.</p><p>With this option, I/O on the table increases dynamically. In the case of an updating batch application, this impact is reduced, as a write will have to occur anyway.</p><p><span class="emphasis"><em>5. Extract Table to a Flat File</em></span></p><p>This involves the extraction of the table into a file. 
This file can then be split into multiple segments and used as input to the batch instances.</p><p>With this option, the additional overhead of extracting the table into a file, and splitting it, may cancel out the effect of multi-partitioning. Dynamic configuration can be achieved via changing the file splitting script.</p><p><span class="emphasis"><em>6. Use of a Hashing Column</em></span></p><p>This scheme involves the addition of a hash column (key/index) to the database tables used to retrieve the driver record. This hash column will have an indicator to determine which instance of the batch application will process this particular row. For example, if there are three batch instances to be started, then an indicator of 'A' will mark that row for processing by instance 1, an indicator of 'B' will mark that row for processing by instance 2, etc.</p><p>The procedure used to retrieve the records would then have an additional WHERE clause to select all rows marked by a particular indicator. The inserts in this table would involve the addition of the marker field, which would be defaulted to one of the instances (e.g. 'A').</p><p>A simple batch application would be used to update the indicators so as to redistribute the load between the different instances. When a sufficiently large number of new rows have been added, this batch can be run (anytime, except in the batch window) to redistribute the new rows to other instances.</p><p>Additional instances of the batch application only require the running of the batch application as above to redistribute the indicators to cater for a new number of instances.</p><p><span class="bold"><strong>4.2 Database and Application Design Principles</strong></span></p><p>An architecture that supports multi-partitioned applications which run against partitioned database tables using the key column approach should include a central partition repository for storing partition parameters. 
This provides flexibility and ensures maintainability. The repository will generally consist of a single table known as the partition table.</p><p>Information stored in the partition table will be static and in general should be maintained by the DBA. The table should consist of one row of information for each partition of a multi-partitioned application. The table should have columns for: Program ID Code, Partition Number (Logical ID of the partition), Low Value of the db key column for this partition, High Value of the db key column for this partition.</p><p>On program start-up the program id and partition number should be passed to the application from the architecture (Control Processing Tasklet). These variables are used to read the partition table, to determine what range of data the application is to process (if a key column approach is used). In addition the partition number must be used throughout the processing to:</p><div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; "><li class="listitem"><p>Add to the output files/database updates in order for the merge process to work properly</p></li><li class="listitem"><p>Report normal processing to the batch log and any errors that occur during execution to the architecture error handler</p></li></ul></div><p><span class="bold"><strong>4.3 Minimizing Deadlocks</strong></span></p><p>When applications run in parallel or partitioned, contention in database resources and deadlocks may occur. It is critical that the database design team eliminates potential contention situations as far as possible as part of the database design.</p><p>Also ensure that the database index tables are designed with deadlock prevention and performance in mind.</p><p>Deadlocks or hot spots often occur in administration or architecture tables such as log tables, control tables, and lock tables. The implications of these should be taken into account as well. 
A realistic stress test is crucial for identifying the possible bottlenecks in the architecture.</p><p>To minimize the impact of conflicts on data, the architecture should provide services such as wait-and-retry intervals when attaching to a database or when encountering a deadlock. This means a built-in mechanism that reacts to certain database return codes and, instead of raising an error immediately, waits a predetermined amount of time and retries the database operation.</p><p><span class="bold"><strong>4.4 Parameter Passing and Validation</strong></span></p><p>The partition architecture should be relatively transparent to application developers. The architecture should perform all tasks associated with running the application in a partitioned mode, including:</p><div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; "><li class="listitem"><p>Retrieve partition parameters before application start-up</p></li><li class="listitem"><p>Validate partition parameters before application start-up</p></li><li class="listitem"><p>Pass parameters to application at start-up</p></li></ul></div><p>The validation should include checks to ensure that:</p><div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; "><li class="listitem"><p>the application has sufficient partitions to cover the whole data range</p></li><li class="listitem"><p>there are no gaps between partitions</p></li></ul></div><p>If the database is partitioned, some additional validation may be necessary to ensure that a single partition does not span database partitions.</p><p>Also, the architecture should take into consideration the consolidation of partitions. 
Key questions include:</p><div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; "><li class="listitem"><p>Must all the partitions be finished before going into the next job step?</p></li><li class="listitem"><p>What happens if one of the partitions aborts?</p></li></ul></div></div></div>
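<p>The partition-parameter validation described under 4.4 can be sketched in plain Java as follows. This is a minimal illustration and not part of Spring Batch: the class name <code class="classname">PartitionRangeValidator</code>, its nested <code class="classname">PartitionRange</code> type, and the use of inclusive numeric key ranges are assumptions made for the example. The sketch also rejects overlapping ranges, a check the text does not list but without which a row could be processed by two partitions.</p>

```java
import java.util.Arrays;

/**
 * Hypothetical sketch (not a Spring Batch class): validates the key ranges
 * read from a partition table before a partitioned job is started.
 */
final class PartitionRangeValidator {

    /** One partition-table row: an inclusive range of key-column values. */
    static final class PartitionRange {
        final int partitionNumber;
        final long lowValue;
        final long highValue;

        PartitionRange(int partitionNumber, long lowValue, long highValue) {
            this.partitionNumber = partitionNumber;
            this.lowValue = lowValue;
            this.highValue = highValue;
        }
    }

    /**
     * Throws IllegalStateException unless the ranges cover every key from
     * minKey to maxKey (inclusive) with no gaps and no overlaps.
     * Assumes every highValue is below Long.MAX_VALUE.
     */
    static void validate(PartitionRange[] ranges, long minKey, long maxKey) {
        // Sort a copy by low value so that neighbouring partitions are adjacent.
        PartitionRange[] sorted = ranges.clone();
        Arrays.sort(sorted, (a, b) -> Long.compare(a.lowValue, b.lowValue));

        long nextExpected = minKey; // first key not yet covered by any partition
        for (PartitionRange range : sorted) {
            if (range.lowValue > range.highValue) {
                throw new IllegalStateException("Partition " + range.partitionNumber
                        + " has a low value above its high value");
            }
            if (range.lowValue > nextExpected) {
                throw new IllegalStateException("Gap before partition " + range.partitionNumber
                        + ": keys " + nextExpected + " to " + (range.lowValue - 1) + " are unassigned");
            }
            if (range.lowValue < nextExpected) {
                throw new IllegalStateException("Partition " + range.partitionNumber
                        + " starts at " + range.lowValue + " but " + nextExpected
                        + " was expected (overlap or out-of-range start)");
            }
            nextExpected = range.highValue + 1;
        }
        if (nextExpected <= maxKey) {
            throw new IllegalStateException("Keys " + nextExpected + " to " + maxKey
                    + " are not covered by any partition");
        }
    }

    public static void main(String[] args) {
        // Two partitions mirroring the 0000-0999 / 1000-1999 example above.
        PartitionRange[] ranges = {
            new PartitionRange(1, 0, 999),
            new PartitionRange(2, 1000, 1999)
        };
        validate(ranges, 0, 1999);
        System.out.println("Partition table is valid");
    }
}
```

<p>In practice the ranges would be loaded from the partition table and checked by the architecture before any partition is started, so that a misconfigured table fails fast instead of silently skipping rows.</p>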
<div class="chapter"><div class="titlepage"><div><div><h1 class="title"><a name="whatsNew" href="#whatsNew"></a>2. What's New in Spring Batch 4.0</h1></div></div></div><p>The Spring Batch 4.0 release has three major themes:</p><div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; "><li class="listitem"><p>Java 8 Requirement</p></li><li class="listitem"><p>Dependencies re-baseline</p></li><li class="listitem"><p>Builders for <code class="classname">ItemReaders</code> and <code class="classname">ItemWriters</code></p></li></ul></div><div class="section"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="whatsNewJava" href="#whatsNewJava"></a>2.1 Java 8 Requirement</h2></div></div></div><p>Spring Batch has historically followed Spring Framework's baselines for both
the Java version and third-party dependencies. With Spring Batch 4, the Spring
Framework version is being upgraded to Spring Framework 5. As such, the Java
version requirement for Spring Batch is also increasing to Java 8.
</p></div><div class="section"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="whatsNewDependencies" href="#whatsNewDependencies"></a>2.2 Dependencies re-baseline</h2></div></div></div><p>In order to continue to integrate with supported versions of the third-party
libraries that Spring Batch uses, Spring Batch 4 is updating its dependencies across
the board. The new dependency versions are in alignment with Spring Framework 5.
</p></div><div class="section"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="whatsNewBuilders" href="#whatsNewBuilders"></a>2.3 Provide builders for the ItemReaders and ItemWriters</h2></div></div></div><p>Spring Batch 4 is adding builders for the <code class="classname">ItemReaders</code>
and <code class="classname">ItemWriters</code> that come with the framework. As of this release, builders are available for the
<code class="classname">FlatFileItemReader</code>, <code class="classname">FlatFileItemWriter</code>, <code class="classname">JdbcCursorItemReader</code>, and
<code class="classname">JdbcBatchItemWriter</code>. More information can be found in the Javadoc
for Spring Batch.</p></div></div>
<div class="chapter"><div class="titlepage"><div><div><h1 class="title"><a name="domain" href="#domain"></a>3. The Domain Language of Batch</h1></div></div></div>
<p>To any experienced batch architect, the overall concepts of batch
processing used in Spring Batch should be familiar and comfortable. There
are "Jobs" and "Steps" and developer-supplied processing units called
ItemReaders and ItemWriters. However, because of the Spring patterns,
operations, templates, callbacks, and idioms, there are opportunities for
the following:</p><div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; "><li class="listitem">
<p>significant improvement in adherence to a clear separation of
concerns</p>
</li><li class="listitem">
<p>clearly delineated architectural layers and services provided as
interfaces</p>
</li><li class="listitem">
<p>simple and default implementations that allow for quick adoption
and ease of use out-of-the-box</p>
</li><li class="listitem">
<p>significantly enhanced extensibility</p>
</li></ul></div>
<p>The diagram below is a simplified version of the batch reference
architecture that has been used for decades. It provides an overview of the
components that make up the domain language of batch processing. This
architecture framework is a blueprint that has been proven through decades
of implementations on the last several generations of platforms
(COBOL/Mainframe, C++/Unix, and now Java/anywhere). JCL and COBOL developers
are likely to be as comfortable with the concepts as C++, C#, and Java
developers. Spring Batch provides a physical implementation of the layers,
components, and technical services commonly found in robust, maintainable
systems used to address the creation of simple to complex batch
applications, with the infrastructure and extensions to address very complex
processing needs.</p>
<div class="mediaobject" align="center"><img src="images/spring-batch-reference-model.png" align="middle"><div class="caption"><p>Figure 3.1: Batch Stereotypes</p></div></div>
<p>The diagram above highlights the key concepts that make up the domain
language of batch. A Job has one to many steps, each of which has exactly one
ItemReader, one ItemProcessor, and one ItemWriter. A job needs to be launched
(JobLauncher), and metadata about the currently running process needs to be
stored (JobRepository).</p>
<div class="section"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="domainJob" href="#domainJob"></a>3.1 Job</h2></div></div></div>
<p>This section describes stereotypes relating to the concept of a
batch job. A <code class="classname">Job</code> is an entity that encapsulates an
entire batch process. As is common with other Spring projects, a
<code class="classname">Job</code> is wired together with either an XML configuration
file or Java-based configuration. This configuration may be referred to as
the "job configuration". However, <code class="classname">Job</code> is just the
top of an overall hierarchy:</p>
<div class="mediaobject" align="center"><img src="images/job-heirarchy.png" align="middle"></div>
<p>In Spring Batch, a Job is simply a container for Steps. It combines
multiple steps that belong logically together in a flow and allows for
configuration of properties global to all steps, such as restartability.
The job configuration contains:</p>
<div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; "><li class="listitem">
<p>The simple name of the job</p>
</li><li class="listitem">
<p>Definition and ordering of Steps</p>
</li><li class="listitem">
<p>Whether or not the job is restartable</p>
</li></ul></div>
<p>A default simple implementation of the <code class="classname">Job</code>
interface is provided by Spring Batch in the form of the
<code class="classname">SimpleJob</code> class, which creates some standard
functionality on top of <code class="classname">Job</code>. However, the batch
namespace abstracts away the need to instantiate it directly. Instead, the
<code class="code">&lt;job&gt;</code> tag can be used:</p>
<pre class="programlisting"><span class="hl-tag">&lt;job</span> <span class="hl-attribute">id</span>=<span class="hl-value">"footballJob"</span><span class="hl-tag">&gt;</span>
    <span class="hl-tag">&lt;step</span> <span class="hl-attribute">id</span>=<span class="hl-value">"playerload"</span> <span class="hl-attribute">next</span>=<span class="hl-value">"gameLoad"</span><span class="hl-tag">/&gt;</span>
    <span class="hl-tag">&lt;step</span> <span class="hl-attribute">id</span>=<span class="hl-value">"gameLoad"</span> <span class="hl-attribute">next</span>=<span class="hl-value">"playerSummarization"</span><span class="hl-tag">/&gt;</span>
    <span class="hl-tag">&lt;step</span> <span class="hl-attribute">id</span>=<span class="hl-value">"playerSummarization"</span><span class="hl-tag">/&gt;</span>
<span class="hl-tag">&lt;/job&gt;</span></pre>
<div class="section"><div class="titlepage"><div><div><h3 class="title"><a name="domainJobInstance" href="#domainJobInstance"></a>3.1.1 JobInstance</h3></div></div></div>
<p>A <code class="classname">JobInstance</code> refers to the concept of a
logical job run. Consider a batch job that should be run once at
the end of the day, such as the 'EndOfDay' job from the diagram above.
There is one 'EndOfDay' <code class="classname">Job</code>, but each individual
run of the <code class="classname">Job</code> must be tracked separately. In the
case of this job, there will be one logical
<code class="classname">JobInstance</code> per day. For example, there will be a
January 1st run and a January 2nd run. If the January 1st run fails the
first time and is run again the next day, it is still the January 1st
run. (Usually this corresponds with the data it is processing as well,
meaning the January 1st run processes data for January 1st, etc.)
Therefore, each <code class="classname">JobInstance</code> can have multiple
executions (<code class="classname">JobExecution</code> is discussed in more
detail below), and only one <code class="classname">JobInstance</code>
corresponding to a particular <code class="classname">Job</code> and
identifying <code class="classname">JobParameter</code>s can be running at a given
time.</p>
<p>The definition of a <code class="classname">JobInstance</code> has
absolutely no bearing on the data that will be loaded. It is entirely up
to the <code class="classname">ItemReader</code> implementation used to
determine how data will be loaded. For example, in the EndOfDay
scenario, there may be a column in the data that indicates the
'effective date' or 'schedule date' to which the data belongs. So, the
January 1st run would only load data from the 1st, and the January 2nd
run would only use data from the 2nd. Because this determination will
likely be a business decision, it is left up to the
<code class="classname">ItemReader</code> to decide. What using the same
<code class="classname">JobInstance</code> will determine, however, is whether
or not the 'state' (i.e., the <code class="classname">ExecutionContext</code>,
which is discussed below) from previous executions will be used. Using a
new <code class="classname">JobInstance</code> will mean 'start from the
beginning' and using an existing instance will generally mean 'start
from where you left off'.</p>
</div>
<div class="section"><div class="titlepage"><div><div><h3 class="title"><a name="domainJobParameters" href="#domainJobParameters"></a>3.1.2 JobParameters</h3></div></div></div>
<p>Having discussed <code class="classname">JobInstance</code> and how it
differs from <code class="classname">Job</code>, the natural question to ask is:
"how is one <code class="classname">JobInstance</code> distinguished from
another?" The answer is: <code class="classname">JobParameters</code>.
A <code class="classname">JobParameters</code> object holds a set of parameters used to
start a batch job. They can be used for identification or even as
reference data during the run:</p>
<div class="mediaobject" align="center"><img src="images/job-stereotypes-parameters.png" align="middle"></div>
<p>In the example above, where there are two instances, one for
January 1st and another for January 2nd, there is really only one Job,
with one instance that was started with a job parameter of 01-01-2008 and another that
was started with a parameter of 01-02-2008. Thus, the contract can be
defined as: <code class="classname">JobInstance</code> =
<code class="classname">Job</code> + identifying <code class="classname">JobParameters</code>. This
allows a developer to effectively control how a
<code class="classname">JobInstance</code> is defined, since they control what
parameters are passed in.</p>
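<p>The contract above can be illustrated with a minimal, plain-Java sketch. It does
not use the Spring Batch API: the <code class="classname">Param</code> class, the
<code class="methodname">instanceKey</code> method, and the
<code class="literal">requestedBy</code> parameter are hypothetical names introduced only for
illustration, and the string key is an assumption about how identity could be modelled,
not how the framework stores it.</p>
<pre class="programlisting">import java.util.Arrays;
import java.util.stream.Collectors;

public class JobInstanceIdentitySketch {

    /** Hypothetical stand-in for one job parameter; not the Spring Batch class. */
    static final class Param {
        final String key;
        final String value;
        final boolean identifying;

        Param(String key, String value, boolean identifying) {
            this.key = key;
            this.value = value;
            this.identifying = identifying;
        }
    }

    /** Builds an identity key from the job name plus the identifying parameters only. */
    static String instanceKey(String jobName, Param... params) {
        String identifying = Arrays.stream(params)
                .filter(p -> p.identifying)
                .map(p -> p.key + "=" + p.value)
                .sorted()
                .collect(Collectors.joining(";"));
        return jobName + "[" + identifying + "]";
    }

    public static void main(String[] args) {
        String jan1 = instanceKey("EndOfDayJob",
                new Param("schedule.Date", "2008-01-01", true));
        // Same identifying parameter plus a non-identifying one: still the January 1st instance.
        String jan1Again = instanceKey("EndOfDayJob",
                new Param("schedule.Date", "2008-01-01", true),
                new Param("requestedBy", "operator", false));
        String jan2 = instanceKey("EndOfDayJob",
                new Param("schedule.Date", "2008-01-02", true));

        System.out.println(jan1.equals(jan1Again)); // true
        System.out.println(jan1.equals(jan2));      // false
    }
}</pre>
<p>In this model, two launches map to the same logical run exactly when the job name and
the identifying parameters match, which is why changing the schedule date yields a
different instance.</p>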
</div>
<div class="note" style="margin-left: 0.5in; margin-right: 0.5in;"><table border="0" summary="Note"><tr><td rowspan="2" align="center" valign="top" width="25"><img alt="[Note]" src="images/note.png"></td><th align="left">Note</th></tr><tr><td align="left" valign="top">
<p>Not all job parameters are required to contribute to the identification
of a <code class="classname">JobInstance</code>. By default, they do. However, the framework
also allows the submission of a <code class="classname">Job</code> with parameters that do
not contribute to the identity of a <code class="classname">JobInstance</code>.</p>
</td></tr></table></div>
<div class="section"><div class="titlepage"><div><div><h3 class="title"><a name="domainJobExecution" href="#domainJobExecution"></a>3.1.3 JobExecution</h3></div></div></div>
<p>A <code class="classname">JobExecution</code> refers to the technical
concept of a single attempt to run a <code class="classname">Job</code>. An
execution may end in failure or success, but the
<code class="classname">JobInstance</code> corresponding to a given execution
will not be considered complete unless the execution completes
successfully. Using the EndOfDay <code class="classname">Job</code> described
above as an example, consider a <code class="classname">JobInstance</code> for
01-01-2008 that failed the first time it was run. If it is run again
with the same identifying job parameters as the first run (01-01-2008), a new
<code class="classname">JobExecution</code> will be created. However, there will
still be only one <code class="classname">JobInstance</code>.</p>
<p>A <code class="classname">Job</code> defines what a job is and how it is
to be executed, and <code class="classname">JobInstance</code> is a purely
organizational object to group executions together, primarily to enable
correct restart semantics. A <code class="classname">JobExecution</code>,
however, is the primary storage mechanism for what actually happened
during a run, and as such contains many more properties that must be
controlled and persisted:</p>
<div class="table"><a name="d5e438" href="#d5e438"></a><p class="title"><b>Table 3.1. JobExecution Properties</b></p><div class="table-contents">
<table summary="JobExecution Properties" style="border-collapse: collapse;border-top: 0.5pt solid ; border-bottom: 0.5pt solid ; border-left: 0.5pt solid ; border-right: 0.5pt solid ; "><colgroup><col class="c1"><col class="c2"></colgroup><tbody><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; ">status</td><td style="border-bottom: 0.5pt solid ; ">A <code class="classname">BatchStatus</code> object that
indicates the status of the execution. While running, it is
BatchStatus.STARTED. If it fails, it is BatchStatus.FAILED. If it
finishes successfully, it is BatchStatus.COMPLETED.</td></tr><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; ">startTime</td><td style="border-bottom: 0.5pt solid ; ">A <code class="classname">java.util.Date</code> representing the
current system time when the execution was started.</td></tr><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; ">endTime</td><td style="border-bottom: 0.5pt solid ; ">A <code class="classname">java.util.Date</code> representing the
current system time when the execution finished, regardless of
whether or not it was successful.</td></tr><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; ">exitStatus</td><td style="border-bottom: 0.5pt solid ; ">The <code class="classname">ExitStatus</code> indicating the
result of the run. It is most important because it contains an
exit code that will be returned to the caller. See chapter 5 for
more details.</td></tr><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; ">createTime</td><td style="border-bottom: 0.5pt solid ; ">A <code class="classname">java.util.Date</code> representing the
current system time when the <code class="classname">JobExecution</code>
was first persisted. The job may not have been started yet (and
thus has no start time), but it will always have a createTime,
which is required by the framework for managing job-level
<code class="classname">ExecutionContext</code>s.</td></tr><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; ">lastUpdated</td><td style="border-bottom: 0.5pt solid ; ">A <code class="classname">java.util.Date</code> representing the
last time a <code class="classname">JobExecution</code> was
persisted.</td></tr><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; ">executionContext</td><td style="border-bottom: 0.5pt solid ; ">The 'property bag' containing any user data that needs to
be persisted between executions.</td></tr><tr><td style="border-right: 0.5pt solid ; ">failureExceptions</td><td style="">The list of exceptions encountered during the execution
of a <code class="classname">Job</code>. These can be useful if more
than one exception is encountered during the failure of a
<code class="classname">Job</code>.</td></tr></tbody></table>
</div></div><br class="table-break">
<p>These properties are important because they will be persisted and
can be used to completely determine the status of an execution. For
example, if the EndOfDay job for 01-01 is executed at 9:00 PM and fails
at 9:30, the following entries will be made in the batch metadata
tables:</p>
<div class="table"><a name="d5e480" href="#d5e480"></a><p class="title"><b>Table 3.2. BATCH_JOB_INSTANCE</b></p><div class="table-contents">
<table summary="BATCH_JOB_INSTANCE" style="border-collapse: collapse;border-top: 0.5pt solid ; border-bottom: 0.5pt solid ; border-left: 0.5pt solid ; border-right: 0.5pt solid ; "><colgroup><col><col></colgroup><tbody><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; ">JOB_INST_ID</td><td style="border-bottom: 0.5pt solid ; ">JOB_NAME</td></tr><tr><td style="border-right: 0.5pt solid ; ">1</td><td style="">EndOfDayJob</td></tr></tbody></table>
</div></div><br class="table-break">
<div class="table"><a name="d5e490" href="#d5e490"></a><p class="title"><b>Table 3.3. BATCH_JOB_EXECUTION_PARAMS</b></p><div class="table-contents">
<table summary="BATCH_JOB_EXECUTION_PARAMS" style="border-collapse: collapse;border-top: 0.5pt solid ; border-bottom: 0.5pt solid ; border-left: 0.5pt solid ; border-right: 0.5pt solid ; "><colgroup><col><col><col><col><col></colgroup><tbody><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; ">JOB_EXECUTION_ID</td><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; ">TYPE_CD</td><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; ">KEY_NAME</td><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; ">DATE_VAL</td><td style="border-bottom: 0.5pt solid ; ">IDENTIFYING</td></tr><tr><td style="border-right: 0.5pt solid ; ">1</td><td style="border-right: 0.5pt solid ; ">DATE</td><td style="border-right: 0.5pt solid ; ">schedule.Date</td><td style="border-right: 0.5pt solid ; ">2008-01-01</td><td style="">TRUE</td></tr></tbody></table>
</div></div><br class="table-break">
<div class="table"><a name="d5e506" href="#d5e506"></a><p class="title"><b>Table 3.4. BATCH_JOB_EXECUTION</b></p><div class="table-contents">
<table summary="BATCH_JOB_EXECUTION" style="border-collapse: collapse;border-top: 0.5pt solid ; border-bottom: 0.5pt solid ; border-left: 0.5pt solid ; border-right: 0.5pt solid ; "><colgroup><col><col><col><col><col></colgroup><tbody><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; ">JOB_EXEC_ID</td><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; ">JOB_INST_ID</td><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; ">START_TIME</td><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; ">END_TIME</td><td style="border-bottom: 0.5pt solid ; ">STATUS</td></tr><tr><td style="border-right: 0.5pt solid ; ">1</td><td style="border-right: 0.5pt solid ; ">1</td><td style="border-right: 0.5pt solid ; ">2008-01-01 21:00</td><td style="border-right: 0.5pt solid ; ">2008-01-01 21:30</td><td style="">FAILED</td></tr></tbody></table>
</div></div><br class="table-break">
<div class="note" style="margin-left: 0.5in; margin-right: 0.5in;"><table border="0" summary="Note"><tr><td rowspan="2" align="center" valign="top" width="25"><img alt="[Note]" src="images/note.png"></td><th align="left">Note</th></tr><tr><td align="left" valign="top">
<p>Column names may have been abbreviated or removed for clarity
and formatting.</p>
</td></tr></table></div>
<p>Now that the job has failed, let's assume that it took the entire
course of the night for the problem to be determined, so that the 'batch
window' is now closed. Assuming the window starts at 9:00 PM, the job
will be kicked off again for 01-01, starting where it left off and
completing successfully at 9:30. Because it's now the next day, the
01-02 job must be run as well, which is kicked off just afterwards at
9:31 and completes in its normal time of about one hour, at 10:29. There is no
requirement that one <code class="classname">JobInstance</code> be kicked off
after another, unless there is potential for the two jobs to attempt to
access the same data, causing issues with locking at the database level.
It is entirely up to the scheduler to determine when a
<code class="classname">Job</code> should be run. Since they're separate
<code class="classname">JobInstance</code>s, Spring Batch will make no attempt
to stop them from being run concurrently. (Attempting to run the same
<code class="classname">JobInstance</code> while another is already running will
result in a <code class="classname">JobExecutionAlreadyRunningException</code>
being thrown.) There should now be an extra entry in the
<code class="classname">JobInstance</code> table, and two extra entries in both the
<code class="classname">JobParameters</code> and
<code class="classname">JobExecution</code> tables:</p>
<div class="table"><a name="d5e533" href="#d5e533"></a><p class="title"><b>Table 3.5. BATCH_JOB_INSTANCE</b></p><div class="table-contents">
<table summary="BATCH_JOB_INSTANCE" style="border-collapse: collapse;border-top: 0.5pt solid ; border-bottom: 0.5pt solid ; border-left: 0.5pt solid ; border-right: 0.5pt solid ; "><colgroup><col><col></colgroup><tbody><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; ">JOB_INST_ID</td><td style="border-bottom: 0.5pt solid ; ">JOB_NAME</td></tr><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; ">1</td><td style="border-bottom: 0.5pt solid ; ">EndOfDayJob</td></tr><tr><td style="border-right: 0.5pt solid ; ">2</td><td style="">EndOfDayJob</td></tr></tbody></table>
</div></div><br class="table-break">
<div class="table"><a name="d5e546" href="#d5e546"></a><p class="title"><b>Table 3.6. BATCH_JOB_EXECUTION_PARAMS</b></p><div class="table-contents">
<table summary="BATCH_JOB_EXECUTION_PARAMS" style="border-collapse: collapse;border-top: 0.5pt solid ; border-bottom: 0.5pt solid ; border-left: 0.5pt solid ; border-right: 0.5pt solid ; "><colgroup><col><col><col><col><col></colgroup><tbody><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; ">JOB_EXECUTION_ID</td><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; ">TYPE_CD</td><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; ">KEY_NAME</td><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; ">DATE_VAL</td><td style="border-bottom: 0.5pt solid ; ">IDENTIFYING</td></tr><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; ">1</td><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; ">DATE</td><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; ">schedule.Date</td><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; ">2008-01-01 00:00:00</td><td style="border-bottom: 0.5pt solid ; ">TRUE</td></tr><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; ">2</td><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; ">DATE</td><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; ">schedule.Date</td><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; ">2008-01-01 00:00:00</td><td style="border-bottom: 0.5pt solid ; ">TRUE</td></tr><tr><td style="border-right: 0.5pt solid ; ">3</td><td style="border-right: 0.5pt solid ; ">DATE</td><td style="border-right: 0.5pt solid ; ">schedule.Date</td><td style="border-right: 0.5pt solid ; ">2008-01-02 00:00:00</td><td style="">TRUE</td></tr></tbody></table>
</div></div><br class="table-break">
<div class="table"><a name="d5e574" href="#d5e574"></a><p class="title"><b>Table 3.7. BATCH_JOB_EXECUTION</b></p><div class="table-contents">
<table summary="BATCH_JOB_EXECUTION" style="border-collapse: collapse;border-top: 0.5pt solid ; border-bottom: 0.5pt solid ; border-left: 0.5pt solid ; border-right: 0.5pt solid ; "><colgroup><col><col><col><col><col></colgroup><tbody><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; ">JOB_EXEC_ID</td><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; ">JOB_INST_ID</td><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; ">START_TIME</td><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; ">END_TIME</td><td style="border-bottom: 0.5pt solid ; ">STATUS</td></tr><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; ">1</td><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; ">1</td><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; ">2008-01-01 21:00</td><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; ">2008-01-01 21:30</td><td style="border-bottom: 0.5pt solid ; ">FAILED</td></tr><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; ">2</td><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; ">1</td><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; ">2008-01-02 21:00</td><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; ">2008-01-02 21:30</td><td style="border-bottom: 0.5pt solid ; ">COMPLETED</td></tr><tr><td style="border-right: 0.5pt solid ; ">3</td><td style="border-right: 0.5pt solid ; ">2</td><td style="border-right: 0.5pt solid ; ">2008-01-02 21:31</td><td style="border-right: 0.5pt solid ; ">2008-01-02 22:29</td><td style="">COMPLETED</td></tr></tbody></table>
</div></div><br class="table-break">
<div class="note" style="margin-left: 0.5in; margin-right: 0.5in;"><table border="0" summary="Note"><tr><td rowspan="2" align="center" valign="top" width="25"><img alt="[Note]" src="images/note.png"></td><th align="left">Note</th></tr><tr><td align="left" valign="top">
<p>Column names may have been abbreviated or removed for clarity
and formatting.</p>
</td></tr></table></div>
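<p>The relationship between the execution rows and their instances can be sketched in
plain Java. This is only an in-memory model of the BATCH_JOB_EXECUTION rows shown above,
not the Spring Batch API: the <code class="classname">ExecutionRow</code> class, the
<code class="classname">Status</code> enum, and both helper methods are hypothetical names
introduced for illustration.</p>
<pre class="programlisting">public class JobExecutionHistorySketch {

    /** Mirrors the status names used in the tables above; not the framework's enum. */
    enum Status { STARTED, FAILED, COMPLETED }

    /** Hypothetical in-memory row of the BATCH_JOB_EXECUTION table. */
    static final class ExecutionRow {
        final long executionId;
        final long instanceId;
        final Status status;

        ExecutionRow(long executionId, long instanceId, Status status) {
            this.executionId = executionId;
            this.instanceId = instanceId;
            this.status = status;
        }
    }

    /** An instance is complete once any of its executions has completed successfully. */
    static boolean isInstanceComplete(long instanceId, ExecutionRow... rows) {
        for (ExecutionRow row : rows) {
            if (row.instanceId == instanceId) {
                if (row.status == Status.COMPLETED) {
                    return true;
                }
            }
        }
        return false;
    }

    /** Counts how many attempts were made for the given instance. */
    static int countExecutions(long instanceId, ExecutionRow... rows) {
        int count = 0;
        for (ExecutionRow row : rows) {
            if (row.instanceId == instanceId) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        ExecutionRow first = new ExecutionRow(1, 1, Status.FAILED);
        ExecutionRow second = new ExecutionRow(2, 1, Status.COMPLETED);
        ExecutionRow third = new ExecutionRow(3, 2, Status.COMPLETED);

        System.out.println(isInstanceComplete(1, first));                // false
        System.out.println(isInstanceComplete(1, first, second, third)); // true
        System.out.println(countExecutions(1, first, second, third));    // 2
    }
}</pre>
<p>After the first night, the 01-01 instance has one failed execution and is not complete.
After the restart, it has two executions and is complete, while the 01-02 instance
completes on its first attempt.</p>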
</div>
</div>
<div class="section"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="domainStep" href="#domainStep"></a>3.2 Step</h2></div></div></div>
<p>A <code class="classname">Step</code> is a domain object that encapsulates
an independent, sequential phase of a batch job. Therefore, every
<code class="classname">Job</code> is composed entirely of one or more steps. A
<code class="classname">Step</code> contains all of the information necessary to
define and control the actual batch processing. This is a necessarily
vague description because the contents of any given
<code class="classname">Step</code> are at the discretion of the developer writing
a <code class="classname">Job</code>. A Step can be as simple or complex as the
developer desires. A simple <code class="classname">Step</code> might load data
from a file into the database, requiring little or no code (depending
upon the implementations used). A more complex <code class="classname">Step</code>
may have complicated business rules that are applied as part of the
processing. As with <code class="classname">Job</code>, a
<code class="classname">Step</code> has an individual
<code class="classname">StepExecution</code> that corresponds to a unique
<code class="classname">JobExecution</code>:</p>
<div class="mediaobject" align="center"><img src="images/jobHeirarchyWithSteps.png" align="middle"></div>
<div class="section"><div class="titlepage"><div><div><h3 class="title"><a name="domainStepExecution" href="#domainStepExecution"></a>3.2.1 StepExecution</h3></div></div></div>
<p>A <code class="classname">StepExecution</code> represents a single attempt
to execute a <code class="classname">Step</code>. A new
<code class="classname">StepExecution</code> will be created each time a
<code class="classname">Step</code> is run, similar to
<code class="classname">JobExecution</code>. However, if a step fails to execute
because the step before it fails, there will be no execution persisted
for it. A <code class="classname">StepExecution</code> will only be created when
its <code class="classname">Step</code> is actually started.</p>
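<p>The rule that an execution exists only for steps that were actually started can be
sketched as follows. This is a plain-Java illustration, not the Spring Batch API: the
<code class="methodname">stepsWithExecutions</code> method is a hypothetical name, and the
sketch assumes a simple sequential flow in which a failed step ends the run.</p>
<pre class="programlisting">import java.util.Arrays;

public class StepExecutionCreationSketch {

    /**
     * Returns the names of the steps for which an execution would be recorded,
     * assuming the steps run in order and the run stops at the failing step.
     * Pass null as failingStep to model a run in which no step fails.
     */
    static String[] stepsWithExecutions(String[] stepNames, String failingStep) {
        String[] started = new String[stepNames.length];
        int count = 0;
        for (String name : stepNames) {
            started[count++] = name;        // the step starts, so an execution is recorded
            if (name.equals(failingStep)) {
                break;                      // later steps never start and record nothing
            }
        }
        return Arrays.copyOf(started, count);
    }

    public static void main(String[] args) {
        String[] steps = {"playerload", "gameLoad", "playerSummarization"};

        // gameLoad fails, so playerSummarization is never started.
        System.out.println(Arrays.toString(stepsWithExecutions(steps, "gameLoad")));
        // prints [playerload, gameLoad]
    }
}</pre>
<p>Note that the failing step itself still has an execution, because it was started. Only
the steps after it are missing.</p>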
<p>Step executions are represented by objects of the
<code class="classname">StepExecution</code> class. Each execution contains a
reference to its corresponding step and
<code class="classname">JobExecution</code>, and transaction-related data, such
as commit and rollback counts and start and end times. Additionally, each
step execution will contain an <code class="classname">ExecutionContext</code>,
which contains any data a developer needs persisted across batch runs,
such as statistics or state information needed to restart. The following
is a listing of the properties for
<code class="classname">StepExecution</code>:</p>
<div class="table"><a name="d5e638" href="#d5e638"></a><p class="title"><b>Table 3.8. StepExecution Properties</b></p><div class="table-contents">
<table summary="StepExecution Properties" style="border-collapse: collapse;border-top: 0.5pt solid ; border-bottom: 0.5pt solid ; border-left: 0.5pt solid ; border-right: 0.5pt solid ; "><colgroup><col class="c1"><col class="c2"></colgroup><tbody><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; ">status</td><td style="border-bottom: 0.5pt solid ; ">A <code class="classname">BatchStatus</code> object that
indicates the status of the execution. While it is running, the
status is BatchStatus.STARTED. If it fails, the status is
BatchStatus.FAILED. If it finishes successfully, the status
is BatchStatus.COMPLETED.</td></tr><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; ">startTime</td><td style="border-bottom: 0.5pt solid ; ">A <code class="classname">java.util.Date</code> representing the
current system time when the execution was started.</td></tr><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; ">endTime</td><td style="border-bottom: 0.5pt solid ; ">A <code class="classname">java.util.Date</code> representing the
current system time when the execution finished, regardless of
whether or not it was successful.</td></tr><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; ">exitStatus</td><td style="border-bottom: 0.5pt solid ; ">The <code class="classname">ExitStatus</code> indicating the
result of the execution. It is most important because it
contains an exit code that will be returned to the caller. See
chapter 5 for more details.</td></tr><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; ">executionContext</td><td style="border-bottom: 0.5pt solid ; ">The 'property bag' containing any user data that needs to
be persisted between executions.</td></tr><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; ">readCount</td><td style="border-bottom: 0.5pt solid ; ">The number of items that have been successfully
read.</td></tr><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; ">writeCount</td><td style="border-bottom: 0.5pt solid ; ">The number of items that have been successfully
written.</td></tr><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; ">commitCount</td><td style="border-bottom: 0.5pt solid ; ">The number of transactions that have been committed for this
execution.</td></tr><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; ">rollbackCount</td><td style="border-bottom: 0.5pt solid ; ">The number of times the business transaction controlled
by the <code class="classname">Step</code> has been rolled back.</td></tr><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; ">readSkipCount</td><td style="border-bottom: 0.5pt solid ; ">The number of times <code class="methodname">read</code> has
failed, resulting in a skipped item.</td></tr><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; ">processSkipCount</td><td style="border-bottom: 0.5pt solid ; ">The number of times <code class="methodname">process</code> has
failed, resulting in a skipped item.</td></tr><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; ">filterCount</td><td style="border-bottom: 0.5pt solid ; ">The number of items that have been 'filtered' by the
<code class="classname">ItemProcessor</code>.</td></tr><tr><td style="border-right: 0.5pt solid ; ">writeSkipCount</td><td style="">The number of times <code class="methodname">write</code> has
failed, resulting in a skipped item.</td></tr></tbody></table>
</div></div><br class="table-break">
|
|
</div>
|
|
</div>
|
|
|
|
<div class="section"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="domainExecutionContext" href="#domainExecutionContext"></a>3.3 ExecutionContext</h2></div></div></div>
|
|
|
|
|
|
<p>An <code class="classname">ExecutionContext</code> represents a collection
|
|
of key/value pairs that are persisted and controlled by the framework in
|
|
order to allow developers a place to store persistent state that is scoped
|
|
to a <code class="classname">StepExecution</code> or
|
|
<code class="classname">JobExecution</code>. For those familiar with Quartz, it is
|
|
very similar to <code class="classname">JobDataMap</code>. The best usage example
|
|
is to facilitate restart. Using flat file input as an example, while
|
|
processing individual lines, the framework periodically persists the
|
|
<code class="classname">ExecutionContext</code> at commit points. This allows the
|
|
<code class="classname">ItemReader</code> to store its state in case a fatal error
|
|
occurs during the run, or even if the power goes out. All that is needed
|
|
is to put the current number of lines read into the context, and the
|
|
framework will do the rest:</p>
|
|
|
|
<pre class="programlisting">executionContext.putLong(getKey(LINES_READ_COUNT), reader.getPosition());</pre>
|
|
|
|
<p>Using the EndOfDay example from the Job Stereotypes section, assume
there is one step, 'loadData', that loads a file into the database. After
the first failed run, the meta data tables would look like the
following:</p>
|
|
|
|
<div class="table"><a name="d5e704" href="#d5e704"></a><p class="title"><b>Table 3.9. BATCH_JOB_INSTANCE</b></p><div class="table-contents">
|
|
|
|
|
|
<table summary="BATCH_JOB_INSTANCE" style="border-collapse: collapse;border-top: 0.5pt solid ; border-bottom: 0.5pt solid ; border-left: 0.5pt solid ; border-right: 0.5pt solid ; "><colgroup><col><col></colgroup><tbody><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; ">JOB_INST_ID</td><td style="border-bottom: 0.5pt solid ; ">JOB_NAME</td></tr><tr><td style="border-right: 0.5pt solid ; ">1</td><td style="">EndOfDayJob</td></tr></tbody></table>
|
|
</div></div><p><br class="table-break"></p><div class="table"><a name="d5e714" href="#d5e714"></a><p class="title"><b>Table 3.10. BATCH_JOB_PARAMS</b></p><div class="table-contents">
|
|
|
|
|
|
<table summary="BATCH_JOB_PARAMS" style="border-collapse: collapse;border-top: 0.5pt solid ; border-bottom: 0.5pt solid ; border-left: 0.5pt solid ; border-right: 0.5pt solid ; "><colgroup><col><col><col><col></colgroup><tbody><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; ">JOB_INST_ID</td><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; ">TYPE_CD</td><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; ">KEY_NAME</td><td style="border-bottom: 0.5pt solid ; ">DATE_VAL</td></tr><tr><td style="border-right: 0.5pt solid ; ">1</td><td style="border-right: 0.5pt solid ; ">DATE</td><td style="border-right: 0.5pt solid ; ">schedule.Date</td><td style="">2008-01-01</td></tr></tbody></table>
|
|
</div></div><p><br class="table-break"></p><div class="table"><a name="d5e728" href="#d5e728"></a><p class="title"><b>Table 3.11. BATCH_JOB_EXECUTION</b></p><div class="table-contents">
|
|
|
|
|
|
<table summary="BATCH_JOB_EXECUTION" style="border-collapse: collapse;border-top: 0.5pt solid ; border-bottom: 0.5pt solid ; border-left: 0.5pt solid ; border-right: 0.5pt solid ; "><colgroup><col><col><col><col><col></colgroup><tbody><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; ">JOB_EXEC_ID</td><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; ">JOB_INST_ID</td><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; ">START_TIME</td><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; ">END_TIME</td><td style="border-bottom: 0.5pt solid ; ">STATUS</td></tr><tr><td style="border-right: 0.5pt solid ; ">1</td><td style="border-right: 0.5pt solid ; ">1</td><td style="border-right: 0.5pt solid ; ">2008-01-01 21:00</td><td style="border-right: 0.5pt solid ; ">2008-01-01 21:30</td><td style="">FAILED</td></tr></tbody></table>
|
|
</div></div><p><br class="table-break"></p><div class="table"><a name="d5e744" href="#d5e744"></a><p class="title"><b>Table 3.12. BATCH_STEP_EXECUTION</b></p><div class="table-contents">
|
|
|
|
|
|
<table summary="BATCH_STEP_EXECUTION" style="border-collapse: collapse;border-top: 0.5pt solid ; border-bottom: 0.5pt solid ; border-left: 0.5pt solid ; border-right: 0.5pt solid ; "><colgroup><col><col><col><col><col><col></colgroup><tbody><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; ">STEP_EXEC_ID</td><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; ">JOB_EXEC_ID</td><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; ">STEP_NAME</td><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; ">START_TIME</td><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; ">END_TIME</td><td style="border-bottom: 0.5pt solid ; ">STATUS</td></tr><tr><td style="border-right: 0.5pt solid ; ">1</td><td style="border-right: 0.5pt solid ; ">1</td><td style="border-right: 0.5pt solid ; ">loadData</td><td style="border-right: 0.5pt solid ; ">2008-01-01 21:00</td><td style="border-right: 0.5pt solid ; ">2008-01-01 21:30</td><td style="">FAILED</td></tr></tbody></table>
|
|
</div></div><p><br class="table-break"></p><div class="table"><a name="d5e762" href="#d5e762"></a><p class="title"><b>Table 3.13. BATCH_STEP_EXECUTION_CONTEXT</b></p><div class="table-contents">
|
|
|
|
|
|
<table summary="BATCH_STEP_EXECUTION_CONTEXT" style="border-collapse: collapse;border-top: 0.5pt solid ; border-bottom: 0.5pt solid ; border-left: 0.5pt solid ; border-right: 0.5pt solid ; "><colgroup><col><col></colgroup><tbody><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; ">STEP_EXEC_ID</td><td style="border-bottom: 0.5pt solid ; ">SHORT_CONTEXT</td></tr><tr><td style="border-right: 0.5pt solid ; ">1</td><td style="">{piece.count=40321}</td></tr></tbody></table>
|
|
</div></div><p><br class="table-break">In this case, the <code class="classname">Step</code> ran for 30 minutes
|
|
and processed 40,321 'pieces', which would represent lines in a file in
|
|
this scenario. This value will be updated just before each commit by the
|
|
framework, and can contain multiple rows corresponding to entries within
|
|
the <code class="classname">ExecutionContext</code>. Being notified before a
|
|
commit requires one of the various <code class="classname">StepListener</code>s,
|
|
or an <code class="classname">ItemStream</code>, which are discussed in more
|
|
detail later in this guide. As with the previous example, it is assumed
|
|
that the <code class="classname">Job</code> is restarted the next day. When it is
|
|
restarted, the values from the <code class="classname">ExecutionContext</code> of
|
|
the last run are reconstituted from the database, and when the
|
|
<code class="classname">ItemReader</code> is opened, it can check to see if it has
|
|
any stored state in the context, and initialize itself from there:</p>
|
|
|
|
<pre class="programlisting"><span class="hl-keyword">if</span> (executionContext.containsKey(getKey(LINES_READ_COUNT))) {
    log.debug(<span class="hl-string">"Initializing for restart. Restart data is: "</span> + executionContext);

    <span class="hl-keyword">long</span> lineCount = executionContext.getLong(getKey(LINES_READ_COUNT));

    LineReader reader = getReader();

    Object record = <span class="hl-string">""</span>;
    <span class="hl-keyword">while</span> (reader.getPosition() < lineCount && record != null) {
        record = readLine();
    }
}</pre>
|
|
|
|
<p>In this case, after the above code is executed, the current line
|
|
will be 40,322, allowing the <code class="classname">Step</code> to start again
|
|
from where it left off. The <code class="classname">ExecutionContext</code> can
|
|
also be used for statistics that need to be persisted about the run
|
|
itself. For example, if a flat file contains orders for processing that
|
|
exist across multiple lines, it may be necessary to store how many orders
|
|
have been processed (which is very different from the number of lines
|
|
read) so that an email can be sent at the end of the
|
|
<code class="classname">Step</code> with the total orders processed in the body.
|
|
The framework handles storing this for the developer, in order to
|
|
correctly scope it with an individual <code class="classname">JobInstance</code>.
|
|
It can be very difficult to know whether an existing
|
|
<code class="classname">ExecutionContext</code> should be used or not. For
|
|
example, using the 'EndOfDay' example from above, when the 01-01 run
|
|
starts again for the second time, the framework recognizes that it is the
|
|
same <code class="classname">JobInstance</code> and on an individual
|
|
<code class="classname">Step</code> basis, pulls the
|
|
<code class="classname">ExecutionContext</code> out of the database and hands it
|
|
as part of the <code class="classname">StepExecution</code> to the
|
|
<code class="classname">Step</code> itself. Conversely, for the 01-02 run the
|
|
framework recognizes that it is a different instance, so an empty context
|
|
must be handed to the <code class="classname">Step</code>. There are many of these
|
|
types of determinations that the framework makes for the developer to
|
|
ensure the state is given to them at the correct time. It is also
|
|
important to note that exactly one <code class="classname">ExecutionContext</code>
|
|
exists per <code class="classname">StepExecution</code> at any given time. Clients
|
|
of the <code class="classname">ExecutionContext</code> should be careful, because
this creates a shared keyspace: when putting values in, make sure that no
existing data is overwritten. However, the
|
|
<code class="classname">Step</code> stores absolutely no data in the context, so
|
|
there is no way to adversely affect the framework.</p>
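<p>The save-and-restore cycle described above can be sketched as an
<code class="classname">ItemStream</code>. This is a minimal illustration under
stated assumptions, not the framework's own reader code: the class name
<code class="classname">LineCountingStream</code>, its key, and the
<code class="methodname">lineRead</code> helper are hypothetical. The key is prefixed
with the class name to reduce the risk of collisions in the shared keyspace:</p>

<pre class="programlisting">import org.springframework.batch.item.ExecutionContext;
import org.springframework.batch.item.ItemStream;
import org.springframework.batch.item.ItemStreamException;

// Hypothetical helper that tracks how many lines have been read.
public class LineCountingStream implements ItemStream {

    // Class-name prefix keeps this key apart from keys written by other components.
    private static final String LINES_READ_COUNT =
            LineCountingStream.class.getSimpleName() + ".lines.read.count";

    private long linesRead;

    @Override
    public void open(ExecutionContext executionContext) throws ItemStreamException {
        // On a restart, the context saved at the last commit point is handed back.
        if (executionContext.containsKey(LINES_READ_COUNT)) {
            linesRead = executionContext.getLong(LINES_READ_COUNT);
        }
        else {
            linesRead = 0;
        }
    }

    @Override
    public void update(ExecutionContext executionContext) throws ItemStreamException {
        // Called by the framework before the context is persisted.
        executionContext.putLong(LINES_READ_COUNT, linesRead);
    }

    @Override
    public void close() throws ItemStreamException {
        // Nothing to release in this sketch.
    }

    // Hypothetical hook: call once for every line that has been read.
    public void lineRead() {
        linesRead++;
    }

    public long getLinesRead() {
        return linesRead;
    }
}</pre>

<p>A reader that owns such a stream would skip forward by
<code class="methodname">getLinesRead()</code> lines after
<code class="methodname">open</code> has been called, in the same way as the restart
snippet shown earlier in this section.</p>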
|
|
|
|
<p>It is also important to note that there is at least one
|
|
<code class="classname">ExecutionContext</code> per
|
|
<code class="classname">JobExecution</code>, and one for every
|
|
<code class="classname">StepExecution</code>. For example, consider the following
|
|
code snippet:</p>
|
|
|
|
<pre class="programlisting">ExecutionContext ecStep = stepExecution.getExecutionContext();
ExecutionContext ecJob = jobExecution.getExecutionContext();
<span class="hl-comment">//ecStep does not equal ecJob</span></pre>
|
|
|
|
<p>As noted in the comment, ecStep will not equal ecJob; they are two
|
|
different <code class="classname">ExecutionContext</code>s. The one scoped to the
|
|
<code class="classname">Step</code> will be saved at every commit point in the
|
|
<code class="classname">Step</code>, whereas the one scoped to the
|
|
<code class="classname">Job</code> will be saved in between every
|
|
<code class="classname">Step</code> execution.</p>
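<p>Because the two contexts are separate, a value that a later
<code class="classname">Step</code> needs has to be copied from the step-scoped
context into the job-scoped one. The following is a minimal sketch of that idea
using a <code class="classname">StepExecutionListener</code>; the class name
<code class="classname">OrderCountPromotionListener</code> and the key are
hypothetical. Spring Batch also provides an
<code class="classname">ExecutionContextPromotionListener</code> for the same
purpose.</p>

<pre class="programlisting">import org.springframework.batch.core.ExitStatus;
import org.springframework.batch.core.StepExecution;
import org.springframework.batch.core.StepExecutionListener;
import org.springframework.batch.item.ExecutionContext;

// Hypothetical listener that copies one entry from the step context to the job context.
public class OrderCountPromotionListener implements StepExecutionListener {

    // Hypothetical key, assumed to be written by the step's reader or processor.
    private static final String ORDER_COUNT = "orders.processed.count";

    @Override
    public void beforeStep(StepExecution stepExecution) {
        // Nothing to prepare in this sketch.
    }

    @Override
    public ExitStatus afterStep(StepExecution stepExecution) {
        ExecutionContext ecStep = stepExecution.getExecutionContext();
        ExecutionContext ecJob = stepExecution.getJobExecution().getExecutionContext();

        if (ecStep.containsKey(ORDER_COUNT)) {
            ecJob.putLong(ORDER_COUNT, ecStep.getLong(ORDER_COUNT));
        }

        // Returning null leaves the exit status of the step unchanged.
        return null;
    }
}</pre>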
|
|
</div>
|
|
|
|
<div class="section"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="domainJobRepository" href="#domainJobRepository"></a>3.4 JobRepository</h2></div></div></div>
|
|
|
|
|
|
<p><code class="classname">JobRepository</code> is the persistence mechanism
|
|
for all of the Stereotypes mentioned above. It provides CRUD operations
|
|
for <code class="classname">JobLauncher</code>, <code class="classname">Job</code>, and
|
|
<code class="classname">Step</code> implementations. When a
|
|
<code class="classname">Job</code> is first launched, a
|
|
<code class="classname">JobExecution</code> is obtained from the repository, and
|
|
during the course of execution <code class="classname">StepExecution</code> and
|
|
<code class="classname">JobExecution</code> implementations are persisted by
|
|
passing them to the repository:</p>
|
|
|
|
<pre class="programlisting"><span class="hl-tag"><job-repository</span> <span class="hl-attribute">id</span>=<span class="hl-value">"jobRepository"</span><span class="hl-tag">/></span></pre>
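<p>In Java, the interaction with the repository might look like the following
sketch. It assumes the <code class="methodname">createJobExecution(String, JobParameters)</code>
and <code class="methodname">update(JobExecution)</code> methods of
<code class="classname">JobRepository</code>; the class
<code class="classname">RepositoryWalkthrough</code> is hypothetical. In normal use
these calls are made by the <code class="classname">JobLauncher</code> and the
<code class="classname">Job</code>, not by application code.</p>

<pre class="programlisting">import org.springframework.batch.core.BatchStatus;
import org.springframework.batch.core.ExitStatus;
import org.springframework.batch.core.JobExecution;
import org.springframework.batch.core.JobExecutionException;
import org.springframework.batch.core.JobParameters;
import org.springframework.batch.core.repository.JobRepository;

// Hypothetical class that shows the create/update cycle against the repository.
public class RepositoryWalkthrough {

    private final JobRepository jobRepository;

    public RepositoryWalkthrough(JobRepository jobRepository) {
        this.jobRepository = jobRepository;
    }

    public JobExecution recordCompletedRun(String jobName, JobParameters jobParameters)
            throws JobExecutionException {
        // Obtain a new JobExecution (and, if needed, a new JobInstance) from the repository.
        JobExecution jobExecution = jobRepository.createJobExecution(jobName, jobParameters);

        // ... the steps of the job would run here ...

        // Persist the outcome by passing the execution back to the repository.
        jobExecution.setStatus(BatchStatus.COMPLETED);
        jobExecution.setExitStatus(ExitStatus.COMPLETED);
        jobRepository.update(jobExecution);
        return jobExecution;
    }
}</pre>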
|
|
</div>
|
|
|
|
<div class="section"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="domainJobLauncher" href="#domainJobLauncher"></a>3.5 JobLauncher</h2></div></div></div>
|
|
|
|
|
|
<p><code class="classname">JobLauncher</code> represents a simple interface for
|
|
launching a <code class="classname">Job</code> with a given set of
|
|
<code class="classname">JobParameters</code>:</p>
|
|
|
|
<pre class="programlisting"><span class="hl-keyword">public</span> <span class="hl-keyword">interface</span> JobLauncher {

    <span class="hl-keyword">public</span> JobExecution run(Job job, JobParameters jobParameters)
                <span class="hl-keyword">throws</span> JobExecutionAlreadyRunningException, JobRestartException,
                       JobInstanceAlreadyCompleteException, JobParametersInvalidException;
}</pre>
|
|
|
|
<p>It is expected that implementations will obtain a valid
|
|
<code class="classname">JobExecution</code> from the
|
|
<code class="classname">JobRepository</code> and execute the
|
|
<code class="classname">Job</code>.</p>
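<p>A caller might use the interface as in the following sketch. The class name
<code class="classname">EndOfDayLauncher</code> is hypothetical, and the
<code class="classname">JobLauncher</code> and <code class="classname">Job</code>
are assumed to be injected from the <code class="classname">ApplicationContext</code>.
The parameter key matches the 'schedule.Date' entry used in the EndOfDay
example:</p>

<pre class="programlisting">import java.util.Date;

import org.springframework.batch.core.Job;
import org.springframework.batch.core.JobExecution;
import org.springframework.batch.core.JobExecutionException;
import org.springframework.batch.core.JobParameters;
import org.springframework.batch.core.JobParametersBuilder;
import org.springframework.batch.core.launch.JobLauncher;

// Hypothetical caller that launches the EndOfDay job for a given schedule date.
public class EndOfDayLauncher {

    private final JobLauncher jobLauncher;

    private final Job endOfDayJob;

    public EndOfDayLauncher(JobLauncher jobLauncher, Job endOfDayJob) {
        this.jobLauncher = jobLauncher;
        this.endOfDayJob = endOfDayJob;
    }

    public JobExecution launch(Date scheduleDate) throws JobExecutionException {
        // The same parameters identify the same JobInstance, so launching
        // again with the same date after a failure is treated as a restart.
        JobParameters jobParameters = new JobParametersBuilder()
                .addDate("schedule.Date", scheduleDate)
                .toJobParameters();

        return jobLauncher.run(endOfDayJob, jobParameters);
    }
}</pre>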
|
|
</div>
|
|
|
|
<div class="section"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="domainItemReader" href="#domainItemReader"></a>3.6 Item Reader</h2></div></div></div>
|
|
|
|
|
|
<p><code class="classname">ItemReader</code> is an abstraction that represents
|
|
the retrieval of input for a <code class="classname">Step</code>, one item at a
|
|
time. When the <code class="classname">ItemReader</code> has exhausted the items
|
|
it can provide, it will indicate this by returning null. More details
|
|
about the <code class="classname">ItemReader</code> interface and its various
|
|
implementations can be found in <a class="xref" href="#readersAndWriters" title="6. ItemReaders and ItemWriters">Chapter 6, <i>ItemReaders and ItemWriters</i></a>.</p>
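<p>The contract of returning null when the input is exhausted can be
illustrated with a minimal in-memory reader. This is a sketch for
illustration only; the class <code class="classname">ArrayItemReader</code> is
hypothetical and is not one of the implementations shipped with the
framework:</p>

<pre class="programlisting">import org.springframework.batch.item.ItemReader;

// Hypothetical reader that hands out the elements of an array one at a time.
public class ArrayItemReader implements ItemReader&lt;String&gt; {

    private final String[] items;

    private int next;

    public ArrayItemReader(String... items) {
        this.items = items.clone();
    }

    @Override
    public String read() {
        // Returning null tells the Step that there is no more input.
        if (next == items.length) {
            return null;
        }
        String item = items[next];
        next++;
        return item;
    }
}</pre>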
|
|
</div>
|
|
|
|
<div class="section"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="domainItemWriter" href="#domainItemWriter"></a>3.7 Item Writer</h2></div></div></div>
|
|
|
|
|
|
<p><code class="classname">ItemWriter</code> is an abstraction that
|
|
represents the output of a <code class="classname">Step</code>, one batch
|
|
or chunk of items at a time. Generally, an item writer has no
|
|
knowledge of the input it will receive next, only the items that
were passed in its current invocation. More details about the
|
|
<code class="classname">ItemWriter</code> interface and its various
|
|
implementations can be found in <a class="xref" href="#readersAndWriters" title="6. ItemReaders and ItemWriters">Chapter 6, <i>ItemReaders and ItemWriters</i></a>.</p>
|
|
</div>
|
|
|
|
<div class="section"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="domainItemProcessor" href="#domainItemProcessor"></a>3.8 Item Processor</h2></div></div></div>
|
|
|
|
|
|
<p><code class="classname">ItemProcessor</code> is an abstraction that
|
|
represents the business processing of an item. While the
|
|
<code class="classname">ItemReader</code> reads one item at a time, and the
<code class="classname">ItemWriter</code> writes them in chunks, the
<code class="classname">ItemProcessor</code> provides a place to transform an item or apply
|
|
other business processing. If, while processing the item, it is determined
|
|
that the item is not valid, returning null indicates that the item should
|
|
not be written out. More details about the ItemProcessor interface can be
|
|
found in <a class="xref" href="#readersAndWriters" title="6. ItemReaders and ItemWriters">Chapter 6, <i>ItemReaders and ItemWriters</i></a>.</p>
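<p>Both uses of an <code class="classname">ItemProcessor</code>, transforming an
item and filtering it out by returning null, can be sketched as follows. The
class <code class="classname">BlankLineFilter</code> and its rules are
hypothetical and serve only as an illustration:</p>

<pre class="programlisting">import org.springframework.batch.item.ItemProcessor;

// Hypothetical processor that drops blank lines and upper-cases the rest.
public class BlankLineFilter implements ItemProcessor&lt;String, String&gt; {

    @Override
    public String process(String item) {
        String trimmed = item.trim();

        // Returning null means the item is filtered and is not passed to the ItemWriter.
        if (trimmed.isEmpty()) {
            return null;
        }

        // Any other return value is the transformed item that will be written.
        return trimmed.toUpperCase();
    }
}</pre>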
|
|
</div>
|
|
|
|
<div class="section"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="domainBatchNamespace" href="#domainBatchNamespace"></a>3.9 Batch Namespace</h2></div></div></div>
|
|
|
|
|
|
<p>Many of the domain concepts listed above need to be configured in a
|
|
Spring <code class="classname">ApplicationContext</code>. While there are
|
|
implementations of the interfaces above that can be used in a standard
|
|
bean definition, a namespace has been provided for ease of
|
|
configuration:</p>
|
|
|
|
<pre class="programlisting"><span class="hl-tag"><beans:beans</span> <span class="hl-attribute">xmlns</span>=<span class="hl-value">"</span><span class="bold"><strong>http://www.springframework.org/schema/batch</strong></span>"
     xmlns:beans="http://www.springframework.org/schema/beans"
     xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
     xsi:schemaLocation="
           http://www.springframework.org/schema/beans
           http://www.springframework.org/schema/beans/spring-beans.xsd
           <span class="bold"><strong>http://www.springframework.org/schema/batch
           http://www.springframework.org/schema/batch/spring-batch-2.2.xsd</strong></span>">

<span class="hl-tag"><job</span> <span class="hl-attribute">id</span>=<span class="hl-value">"ioSampleJob"</span><span class="hl-tag">></span>
    <span class="hl-tag"><step</span> <span class="hl-attribute">id</span>=<span class="hl-value">"step1"</span><span class="hl-tag">></span>
        <span class="hl-tag"><tasklet></span>
            <span class="hl-tag"><chunk</span> <span class="hl-attribute">reader</span>=<span class="hl-value">"itemReader"</span> <span class="hl-attribute">writer</span>=<span class="hl-value">"itemWriter"</span> <span class="hl-attribute">commit-interval</span>=<span class="hl-value">"2"</span><span class="hl-tag">/></span>
        <span class="hl-tag"></tasklet></span>
    <span class="hl-tag"></step></span>
<span class="hl-tag"></job></span>

<span class="hl-tag"></beans:beans></span></pre>
|
|
|
|
<p>As long as the batch namespace has been declared, any of its
|
|
elements can be used. More information on configuring a
|
|
<code class="classname">Job</code> can be found in <a class="xref" href="#configureJob" title="4. Configuring and Running a Job">Chapter 4, <i>Configuring and Running a Job</i></a>. More information on configuring a Step can be
|
|
found in <a class="xref" href="#configureStep" title="5. Configuring a Step">Chapter 5, <i>Configuring a Step</i></a>.</p>
|
|
</div>
|
|
</div>
|
|
|
|
<div class="chapter"><div class="titlepage"><div><div><h1 class="title"><a name="configureJob" href="#configureJob"></a>4. Configuring and Running a Job</h1></div></div></div><p>In the <a class="link" href="#domain" title="3. The Domain Language of Batch">domain section</a>, the overall
|
|
architecture design was discussed, using the following diagram as a
|
|
guide:</p><div class="mediaobject" align="center"><img src="images/spring-batch-reference-model.png" align="middle"></div><p>While the <code class="classname">Job</code> object may seem like a simple
|
|
container for steps, there are many configuration options of which a
|
|
developer must be aware. Furthermore, there are many considerations for
|
|
how a <code class="classname">Job</code> will be run and how its meta-data will be
|
|
stored during that run. This chapter will explain the various configuration
|
|
options and runtime concerns of a <code class="classname">Job</code>.</p><div class="section"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="configuringAJob" href="#configuringAJob"></a>4.1 Configuring a Job</h2></div></div></div><p>There are multiple implementations of the <a class="link" href="#"><code class="classname">Job</code></a> interface; however, the namespace
abstracts away the differences in configuration. It has only three
required dependencies: a name, a <code class="classname">JobRepository</code>, and
|
|
a list of <code class="classname">Step</code>s.</p><pre class="programlisting"><span class="hl-tag"><job</span> <span class="hl-attribute">id</span>=<span class="hl-value">"footballJob"</span><span class="hl-tag">></span>
    <span class="hl-tag"><step</span> <span class="hl-attribute">id</span>=<span class="hl-value">"playerload"</span> <span class="hl-attribute">parent</span>=<span class="hl-value">"s1"</span> <span class="hl-attribute">next</span>=<span class="hl-value">"gameLoad"</span><span class="hl-tag">/></span>
    <span class="hl-tag"><step</span> <span class="hl-attribute">id</span>=<span class="hl-value">"gameLoad"</span> <span class="hl-attribute">parent</span>=<span class="hl-value">"s2"</span> <span class="hl-attribute">next</span>=<span class="hl-value">"playerSummarization"</span><span class="hl-tag">/></span>
    <span class="hl-tag"><step</span> <span class="hl-attribute">id</span>=<span class="hl-value">"playerSummarization"</span> <span class="hl-attribute">parent</span>=<span class="hl-value">"s3"</span><span class="hl-tag">/></span>
<span class="hl-tag"></job></span></pre><p>The examples here use a parent bean definition to create the steps;
|
|
see the section on <a class="link" href="#configureStep" title="5. Configuring a Step">step configuration</a>
|
|
for more options declaring specific step details inline. The XML namespace
|
|
defaults to referencing a repository with an id of 'jobRepository', which
|
|
is a sensible default. However, this can be overridden explicitly:</p><pre class="programlisting"><span class="hl-tag"><job</span> <span class="hl-attribute">id</span>=<span class="hl-value">"footballJob"</span> <span class="bold"><strong>job-repository="specialRepository"</strong></span>>
    <span class="hl-tag"><step</span> <span class="hl-attribute">id</span>=<span class="hl-value">"playerload"</span> <span class="hl-attribute">parent</span>=<span class="hl-value">"s1"</span> <span class="hl-attribute">next</span>=<span class="hl-value">"gameLoad"</span><span class="hl-tag">/></span>
    <span class="hl-tag"><step</span> <span class="hl-attribute">id</span>=<span class="hl-value">"gameLoad"</span> <span class="hl-attribute">parent</span>=<span class="hl-value">"s2"</span> <span class="hl-attribute">next</span>=<span class="hl-value">"playerSummarization"</span><span class="hl-tag">/></span>
    <span class="hl-tag"><step</span> <span class="hl-attribute">id</span>=<span class="hl-value">"playerSummarization"</span> <span class="hl-attribute">parent</span>=<span class="hl-value">"s3"</span><span class="hl-tag">/></span>
<span class="hl-tag"></job></span></pre><p>In addition to steps, a job configuration can contain other elements
|
|
that help with parallelisation (<code class="literal"><split/></code>),
|
|
declarative flow control (<code class="literal"><decision/></code>) and
|
|
externalization of flow definitions
|
|
(<code class="literal"><flow/></code>).</p><div class="section"><div class="titlepage"><div><div><h3 class="title"><a name="restartability" href="#restartability"></a>4.1.1 Restartability</h3></div></div></div><p>One key issue when executing a batch job concerns the behavior of
|
|
a <code class="classname">Job</code> when it is restarted. The launching of a
|
|
<code class="classname">Job</code> is considered to be a 'restart' if a
|
|
<code class="classname">JobExecution</code> already exists for the particular
|
|
<code class="classname">JobInstance</code>. Ideally, all jobs should be able to
|
|
start up where they left off, but there are scenarios where this is not
|
|
possible. <span class="bold"><strong>It is entirely up to the developer to
|
|
ensure that a new <code class="classname">JobInstance</code> is created in this
|
|
scenario</strong></span>. However, Spring Batch does provide some help. If a
|
|
<code class="classname">Job</code> should never be restarted, but should always
|
|
be run as part of a new <code class="classname">JobInstance</code>, then the
|
|
restartable property may be set to 'false':</p><pre class="programlisting"><span class="hl-tag"><job</span> <span class="hl-attribute">id</span>=<span class="hl-value">"footballJob"</span> <span class="bold"><strong>restartable="false"</strong></span>>
    ...
<span class="hl-tag"></job></span></pre><p>To phrase it another way, setting restartable to false means "this
|
|
Job does not support being started again". Restarting a Job that is not
|
|
restartable will cause a <code class="classname">JobRestartException</code> to
|
|
be thrown:</p><pre class="programlisting">SimpleJob job = <span class="hl-keyword">new</span> SimpleJob();
job.setRestartable(false);

JobParameters jobParameters = <span class="hl-keyword">new</span> JobParameters();

JobExecution firstExecution = jobRepository.createJobExecution(job, jobParameters);
jobRepository.saveOrUpdate(firstExecution);

<span class="hl-keyword">try</span> {
    jobRepository.createJobExecution(job, jobParameters);
    fail();
}
<span class="hl-keyword">catch</span> (JobRestartException e) {
    <span class="hl-comment">// expected</span>
}</pre><p>This snippet of JUnit code shows how attempting to create a
|
|
<code class="classname">JobExecution</code> the first time for a non-restartable
<code class="classname">Job</code> will cause no issues. However, the second
|
|
attempt will throw a <code class="classname">JobRestartException</code>.</p></div><div class="section"><div class="titlepage"><div><div><h3 class="title"><a name="interceptingJobExecution" href="#interceptingJobExecution"></a>4.1.2 Intercepting Job Execution</h3></div></div></div><p>During the course of the execution of a
|
|
<code class="classname">Job</code>, it may be useful to be notified of various
|
|
events in its lifecycle so that custom code may be executed. The
|
|
<code class="classname">SimpleJob</code> allows for this by calling a
|
|
<code class="classname">JobExecutionListener</code> at the appropriate time:</p><pre class="programlisting"><span class="hl-keyword">public</span> <span class="hl-keyword">interface</span> JobExecutionListener {

    <span class="hl-keyword">void</span> beforeJob(JobExecution jobExecution);

    <span class="hl-keyword">void</span> afterJob(JobExecution jobExecution);

}</pre><p><code class="classname">JobExecutionListener</code>s can be added to a
|
|
<code class="classname">SimpleJob</code> via the listeners element on the
|
|
job:</p><pre class="programlisting"><span class="hl-tag"><job</span> <span class="hl-attribute">id</span>=<span class="hl-value">"footballJob"</span><span class="hl-tag">></span>
    <span class="hl-tag"><step</span> <span class="hl-attribute">id</span>=<span class="hl-value">"playerload"</span> <span class="hl-attribute">parent</span>=<span class="hl-value">"s1"</span> <span class="hl-attribute">next</span>=<span class="hl-value">"gameLoad"</span><span class="hl-tag">/></span>
    <span class="hl-tag"><step</span> <span class="hl-attribute">id</span>=<span class="hl-value">"gameLoad"</span> <span class="hl-attribute">parent</span>=<span class="hl-value">"s2"</span> <span class="hl-attribute">next</span>=<span class="hl-value">"playerSummarization"</span><span class="hl-tag">/></span>
    <span class="hl-tag"><step</span> <span class="hl-attribute">id</span>=<span class="hl-value">"playerSummarization"</span> <span class="hl-attribute">parent</span>=<span class="hl-value">"s3"</span><span class="hl-tag">/></span>
<span class="bold"><strong>    <listeners>
        <listener ref="sampleListener"/>
    </listeners>
</strong></span><span class="hl-tag"></job></span></pre><p>It should be noted that <code class="methodname">afterJob</code> will be
|
|
called regardless of the success or failure of the
|
|
<code class="classname">Job</code>. If success or failure needs to be determined
|
|
it can be obtained from the <code class="classname">JobExecution</code>:</p><pre class="programlisting"><span class="hl-keyword">public</span> <span class="hl-keyword">void</span> afterJob(JobExecution jobExecution){
    <span class="hl-keyword">if</span>( jobExecution.getStatus() == BatchStatus.COMPLETED ){
        <span class="hl-comment">//job success</span>
    }
    <span class="hl-keyword">else</span> <span class="hl-keyword">if</span>(jobExecution.getStatus() == BatchStatus.FAILED){
        <span class="hl-comment">//job failure</span>
    }
}</pre><p>The annotations corresponding to this interface are:</p><div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; "><li class="listitem"><p><code class="classname">@BeforeJob</code></p></li><li class="listitem"><p><code class="classname">@AfterJob</code></p></li></ul></div></div><div class="section"><div class="titlepage"><div><div><h3 class="title"><a name="inheritingFromAParentJob" href="#inheritingFromAParentJob"></a>4.1.3 Inheriting from a Parent Job</h3></div></div></div><p>If a group of <code class="classname">Job</code>s share similar, but not
|
|
identical, configurations, then it may be helpful to define a "parent"
|
|
<code class="classname">Job</code> from which the concrete
|
|
<code class="classname">Job</code>s may inherit properties. Similar to class
|
|
inheritance in Java, the "child" <code class="classname">Job</code> will combine
|
|
its elements and attributes with the parent's.</p><p>In the following example, "baseJob" is an abstract
|
|
<code class="classname">Job</code> definition that defines only a list of
|
|
listeners. The <code class="classname">Job</code> "job1" is a concrete
|
|
definition that inherits the list of listeners from "baseJob" and merges
|
|
it with its own list of listeners to produce a
|
|
<code class="classname">Job</code> with two listeners and one
|
|
<code class="classname">Step</code>, "step1".</p><pre class="programlisting"><span class="hl-tag"><job</span> <span class="hl-attribute">id</span>=<span class="hl-value">"baseJob"</span> <span class="hl-attribute">abstract</span>=<span class="hl-value">"true"</span><span class="hl-tag">></span>
    <span class="hl-tag"><listeners></span>
        <span class="hl-tag"><listener</span> <span class="hl-attribute">ref</span>=<span class="hl-value">"listenerOne"</span><span class="hl-tag">/></span>
    <span class="hl-tag"></listeners></span>
<span class="hl-tag"></job></span>

<span class="hl-tag"><job</span> <span class="hl-attribute">id</span>=<span class="hl-value">"job1"</span> <span class="hl-attribute">parent</span>=<span class="hl-value">"baseJob"</span><span class="hl-tag">></span>
    <span class="hl-tag"><step</span> <span class="hl-attribute">id</span>=<span class="hl-value">"step1"</span> <span class="hl-attribute">parent</span>=<span class="hl-value">"standaloneStep"</span><span class="hl-tag">/></span>

    <span class="hl-tag"><listeners</span> <span class="hl-attribute">merge</span>=<span class="hl-value">"true"</span><span class="hl-tag">></span>
        <span class="hl-tag"><listener</span> <span class="hl-attribute">ref</span>=<span class="hl-value">"listenerTwo"</span><span class="hl-tag">/></span>
    <span class="hl-tag"></listeners></span>
<span class="hl-tag"></job></span></pre><p>Please see the section on <a class="link" href="#InheritingFromParentStep" title="5.1.2 Inheriting from a Parent Step">Inheriting from a Parent Step</a>
|
|
for more detailed information.</p></div><div class="section"><div class="titlepage"><div><div><h3 class="title"><a name="d5e953" href="#d5e953"></a>4.1.4 JobParametersValidator</h3></div></div></div><p>A job declared in the XML namespace or using any subclass of
|
|
AbstractJob can optionally declare a validator for the job parameters at
|
|
runtime. This is useful when, for instance, you need to assert that a job
|
|
is started with all its mandatory parameters. There is a
|
|
DefaultJobParametersValidator that can be used to constrain combinations
|
|
of simple mandatory and optional parameters, and for more complex
|
|
constraints you can implement the interface yourself. A validator is
configured in the XML namespace through a child
element of the job, for example:</p><pre class="programlisting"><span class="hl-tag"><job</span> <span class="hl-attribute">id</span>=<span class="hl-value">"job1"</span> <span class="hl-attribute">parent</span>=<span class="hl-value">"baseJob3"</span><span class="hl-tag">></span>
|
|
<span class="hl-tag"><step</span> <span class="hl-attribute">id</span>=<span class="hl-value">"step1"</span> <span class="hl-attribute">parent</span>=<span class="hl-value">"standaloneStep"</span><span class="hl-tag">/></span>
|
|
<span class="hl-tag"><validator</span> <span class="hl-attribute">ref</span>=<span class="hl-value">"parametersValidator"</span><span class="hl-tag">/></span>
|
|
<span class="hl-tag"></job></span></pre><p>The validator can be specified as a reference (as above) or as a
|
|
nested bean definition in the beans namespace.</p></div></div><div class="section"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="javaConfig" href="#javaConfig"></a>4.2 Java Config</h2></div></div></div><p>Spring 3 brought the ability to configure applications via Java instead
of XML. As of Spring Batch 2.2.0, batch jobs can be configured using the same
Java config. There are two components for the Java-based configuration:
the <code class="classname">@EnableBatchProcessing</code> annotation and two builders.</p><p>The <code class="classname">@EnableBatchProcessing</code> annotation works similarly to the other
|
|
<code class="classname">@Enable*</code> annotations in the Spring family. In this case,
|
|
<code class="classname">@EnableBatchProcessing</code> provides a base configuration for
|
|
building batch jobs. Within this base configuration, an instance of
|
|
<code class="classname">StepScope</code> is created in addition to a number of beans made
|
|
available to be autowired:
|
|
</p><div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; "><li class="listitem"><p><code class="classname">JobRepository</code> - bean name "jobRepository"</p></li><li class="listitem"><p><code class="classname">JobLauncher</code> - bean name "jobLauncher"</p></li><li class="listitem"><p><code class="classname">JobRegistry</code> - bean name "jobRegistry"</p></li><li class="listitem"><p><code class="classname">PlatformTransactionManager</code> - bean name "transactionManager"</p></li><li class="listitem"><p><code class="classname">JobBuilderFactory</code> - bean name "jobBuilders"</p></li><li class="listitem"><p><code class="classname">StepBuilderFactory</code> - bean name "stepBuilders"</p></li></ul></div><p>The core interface for this configuration is the <code class="classname">BatchConfigurer</code>.
|
|
The default implementation provides the beans mentioned above and requires a
|
|
<code class="classname">DataSource</code> as a bean within the context to be provided. This data
|
|
source will be used by the <code class="classname">JobRepository</code>.
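</p><p>As a minimal sketch of the <code class="classname">DataSource</code> requirement, the configuration class below registers an embedded HSQLDB database initialized with the schema scripts bundled in spring-batch-core. The class name <code class="classname">DataSourceConfiguration</code> is illustrative, and the sketch assumes that spring-jdbc and the HSQLDB driver are on the classpath; it is one possible setup, not the only supported one.</p>

```java
import javax.sql.DataSource;

import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.jdbc.datasource.embedded.EmbeddedDatabaseBuilder;
import org.springframework.jdbc.datasource.embedded.EmbeddedDatabaseType;

// Illustrative class name; any @Configuration class that exposes a DataSource bean will do.
@Configuration
public class DataSourceConfiguration {

    // The default BatchConfigurer picks up this bean and uses it for the JobRepository.
    @Bean
    public DataSource dataSource() {
        return new EmbeddedDatabaseBuilder()
                .setType(EmbeddedDatabaseType.HSQL)
                // Drop and re-create the Spring Batch meta-data tables on startup.
                .addScript("classpath:org/springframework/batch/core/schema-drop-hsqldb.sql")
                .addScript("classpath:org/springframework/batch/core/schema-hsqldb.sql")
                .build();
    }
}
```

<p>Because this database lives in memory, all job meta-data is lost when the JVM exits, so a setup like this is better suited to tests and experiments than to production.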
|
|
</p><div class="note" style="margin-left: 0.5in; margin-right: 0.5in;"><table border="0" summary="Note"><tr><td rowspan="2" align="center" valign="top" width="25"><img alt="[Note]" src="images/note.png"></td><th align="left">Note</th></tr><tr><td align="left" valign="top"><p>Only one configuration class needs to have the
|
|
<code class="classname">@EnableBatchProcessing</code> annotation. Once you have a class
|
|
annotated with it, you will have all of the above available.</p></td></tr></table></div><p>With the base configuration in place, a user can use the provided builder factories
|
|
to configure a job. Below is an example of a two step job configured via the
|
|
<code class="classname">JobBuilderFactory</code> and the <code class="classname">StepBuilderFactory</code>.</p><pre class="programlisting"><em><span class="hl-annotation" style="color: gray">@Configuration</span></em>
|
|
<em><span class="hl-annotation" style="color: gray">@EnableBatchProcessing</span></em>
|
|
<em><span class="hl-annotation" style="color: gray">@Import(DataSourceConfiguration.class)</span></em>
|
|
<span class="hl-keyword">public</span> <span class="hl-keyword">class</span> AppConfig {
|
|
|
|
<em><span class="hl-annotation" style="color: gray">@Autowired</span></em>
|
|
<span class="hl-keyword">private</span> JobBuilderFactory jobs;
|
|
|
|
<em><span class="hl-annotation" style="color: gray">@Autowired</span></em>
|
|
<span class="hl-keyword">private</span> StepBuilderFactory steps;
|
|
|
|
<em><span class="hl-annotation" style="color: gray">@Bean</span></em>
|
|
<span class="hl-keyword">public</span> Job job(<em><span class="hl-annotation" style="color: gray">@Qualifier("step1")</span></em> Step step1, <em><span class="hl-annotation" style="color: gray">@Qualifier("step2")</span></em> Step step2) {
|
|
<span class="hl-keyword">return</span> jobs.get(<span class="hl-string">"myJob"</span>).start(step1).next(step2).build();
|
|
}
|
|
|
|
<em><span class="hl-annotation" style="color: gray">@Bean</span></em>
|
|
<span class="hl-keyword">protected</span> Step step1(ItemReader<Person> reader, ItemProcessor<Person, Person> processor, ItemWriter<Person> writer) {
|
|
<span class="hl-keyword">return</span> steps.get(<span class="hl-string">"step1"</span>)
|
|
.<Person, Person> chunk(<span class="hl-number">10</span>)
|
|
.reader(reader)
|
|
.processor(processor)
|
|
.writer(writer)
|
|
.build();
|
|
}
|
|
|
|
<em><span class="hl-annotation" style="color: gray">@Bean</span></em>
|
|
<span class="hl-keyword">protected</span> Step step2(Tasklet tasklet) {
|
|
<span class="hl-keyword">return</span> steps.get(<span class="hl-string">"step2"</span>)
|
|
.tasklet(tasklet)
|
|
.build();
|
|
}
|
|
}</pre></div><div class="section"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="configuringJobRepository" href="#configuringJobRepository"></a>4.3 Configuring a JobRepository</h2></div></div></div><p>As described earlier, the <a class="link" href="#">
|
|
<code class="classname">JobRepository</code>
|
|
</a> is used for basic CRUD operations of the various persisted
|
|
domain objects within Spring Batch, such as
|
|
<code class="classname">JobExecution</code> and
|
|
<code class="classname">StepExecution</code>. It is required by many of the major
|
|
framework features, such as the <code class="classname">JobLauncher</code>,
|
|
<code class="classname">Job</code>, and <code class="classname">Step</code>. The batch
|
|
namespace abstracts away many of the implementation details of the
|
|
<code class="classname">JobRepository</code> implementations and their
|
|
collaborators. However, there are still a few configuration options
|
|
available:</p><pre class="programlisting"><span class="hl-tag"><job-repository</span> <span class="hl-attribute">id</span>=<span class="hl-value">"jobRepository"</span>
|
|
<span class="hl-attribute">data-source</span>=<span class="hl-value">"dataSource"</span>
|
|
<span class="hl-attribute">transaction-manager</span>=<span class="hl-value">"transactionManager"</span>
|
|
<span class="hl-attribute">isolation-level-for-create</span>=<span class="hl-value">"SERIALIZABLE"</span>
|
|
<span class="hl-attribute">table-prefix</span>=<span class="hl-value">"BATCH_"</span>
|
|
<span class="hl-attribute">max-varchar-length</span>=<span class="hl-value">"2500"</span><span class="hl-tag">/></span></pre><p>None of the configuration options listed above are required except
|
|
the id. If they are not set, the defaults shown above will be used. They
|
|
are shown above for awareness purposes. The
|
|
<code class="literal">max-varchar-length</code> defaults to 2500, which is the
|
|
length of the long <code class="literal">VARCHAR</code> columns in the <a class="link" href="#metaDataSchemaOverview" title="B.1 Overview">sample schema scripts</a>
used to store things like exit code descriptions. If you don't modify the schema and you don't use multi-byte characters, you shouldn't need to change it.</p>
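<p>For applications that are configured in Java rather than through the namespace, roughly the same settings can be applied to a <code class="classname">JobRepositoryFactoryBean</code> directly. The following is a sketch under stated assumptions: the class name <code class="classname">JobRepositoryConfig</code> is hypothetical, and the <code class="classname">DataSource</code> and <code class="classname">PlatformTransactionManager</code> are assumed to be beans defined elsewhere in the context.</p>

```java
import javax.sql.DataSource;

import org.springframework.batch.core.repository.JobRepository;
import org.springframework.batch.core.repository.support.JobRepositoryFactoryBean;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.transaction.PlatformTransactionManager;

// Hypothetical configuration class, shown only to illustrate the factory bean's setters.
@Configuration
public class JobRepositoryConfig {

    @Bean
    public JobRepository jobRepository(DataSource dataSource,
            PlatformTransactionManager transactionManager) throws Exception {
        JobRepositoryFactoryBean factory = new JobRepositoryFactoryBean();
        factory.setDataSource(dataSource);
        factory.setTransactionManager(transactionManager);
        // Counterpart of isolation-level-for-create; note the ISOLATION_ prefix.
        factory.setIsolationLevelForCreate("ISOLATION_SERIALIZABLE");
        // Counterpart of table-prefix.
        factory.setTablePrefix("BATCH_");
        // Counterpart of max-varchar-length; 2500 is the documented default.
        factory.setMaxVarCharLength(2500);
        // Validate the settings before asking the factory for the repository.
        factory.afterPropertiesSet();
        return factory.getObject();
    }
}
```

<p>This sketch is intended for a context that does not also use <code class="classname">@EnableBatchProcessing</code>, since that annotation already provides a bean named "jobRepository".</p>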
|
|
|
|
<div class="section"><div class="titlepage"><div><div><h3 class="title"><a name="txConfigForJobRepository" href="#txConfigForJobRepository"></a>4.3.1 Transaction Configuration for the JobRepository</h3></div></div></div><p>If the namespace is used, transactional advice will be
|
|
automatically created around the repository. This is to ensure that the
|
|
batch meta data, including state that is necessary for restarts after a
|
|
failure, is persisted correctly. The behavior of the framework is not
|
|
well defined if the repository methods are not transactional. The
|
|
isolation level in the <code class="code">create*</code> method attributes is
|
|
specified separately to ensure that when jobs are launched, if two
|
|
processes are trying to launch the same job at the same time, only one
|
|
will succeed. The default isolation level for that method is
|
|
SERIALIZABLE, which is quite aggressive: READ_COMMITTED would work just
|
|
as well; READ_UNCOMMITTED would be fine if two processes are not likely
|
|
to collide in this way. However, since a call to the
|
|
<code class="classname">create*</code> method is quite short, it is unlikely
|
|
that SERIALIZABLE will cause problems, as long as the database
|
|
platform supports it. However, this can be overridden:</p><p>
|
|
</p><pre class="programlisting"><span class="hl-tag"><job-repository</span> <span class="hl-attribute">id</span>=<span class="hl-value">"jobRepository"</span>
|
|
<span class="bold"><strong>isolation-level-for-create="REPEATABLE_READ"</strong></span> /></pre><p>
|
|
</p><p>If the namespace or factory beans aren't used then it is also
|
|
essential to configure the transactional behavior of the repository
|
|
using AOP:</p><p>
|
|
</p><pre class="programlisting"><span class="hl-tag"><aop:config></span>
|
|
<span class="hl-tag"><aop:advisor</span>
|
|
<span class="hl-attribute">pointcut</span>=<span class="hl-value">"execution(* org.springframework.batch.core..*Repository+.*(..))"</span>
<span class="hl-attribute">advice-ref</span>=<span class="hl-value">"txAdvice"</span><span class="hl-tag">/></span>
|
|
<span class="hl-tag"></aop:config></span>
|
|
|
|
<span class="hl-tag"><tx:advice</span> <span class="hl-attribute">id</span>=<span class="hl-value">"txAdvice"</span> <span class="hl-attribute">transaction-manager</span>=<span class="hl-value">"transactionManager"</span><span class="hl-tag">></span>
|
|
<span class="hl-tag"><tx:attributes></span>
|
|
<span class="hl-tag"><tx:method</span> <span class="hl-attribute">name</span>=<span class="hl-value">"*"</span><span class="hl-tag"> /></span>
|
|
<span class="hl-tag"></tx:attributes></span>
|
|
<span class="hl-tag"></tx:advice></span></pre><p>
|
|
</p><p>This fragment can be used as is, with almost no changes. Remember
|
|
also to include the appropriate namespace declarations and to make sure
|
|
spring-tx and spring-aop (or the whole of spring) are on the
|
|
classpath.</p></div><div class="section"><div class="titlepage"><div><div><h3 class="title"><a name="repositoryTablePrefix" href="#repositoryTablePrefix"></a>4.3.2 Changing the Table Prefix</h3></div></div></div><p>Another modifiable property of the
|
|
<code class="classname">JobRepository</code> is the table prefix of the
|
|
meta-data tables. By default they are all prefaced with BATCH_.
|
|
BATCH_JOB_EXECUTION and BATCH_STEP_EXECUTION are two examples. However,
|
|
there are potential reasons to modify this prefix. If the schema name
needs to be prepended to the table names, or if more than one set of
|
|
meta data tables is needed within the same schema, then the table prefix
|
|
will need to be changed:</p><pre class="programlisting"><span class="hl-tag"><job-repository</span> <span class="hl-attribute">id</span>=<span class="hl-value">"jobRepository"</span>
|
|
<span class="bold"><strong>table-prefix="SYSTEM.TEST_"</strong></span> /></pre><p>Given the above changes, every query to the meta data tables will
|
|
be prefixed with "SYSTEM.TEST_". BATCH_JOB_EXECUTION will be referred to
|
|
as SYSTEM.TEST_JOB_EXECUTION.</p><div class="note" style="margin-left: 0.5in; margin-right: 0.5in;"><table border="0" summary="Note"><tr><td rowspan="2" align="center" valign="top" width="25"><img alt="[Note]" src="images/note.png"></td><th align="left">Note</th></tr><tr><td align="left" valign="top"><p>Only the table prefix is configurable. The table and column
|
|
names are not.</p></td></tr></table></div></div><div class="section"><div class="titlepage"><div><div><h3 class="title"><a name="inMemoryRepository" href="#inMemoryRepository"></a>4.3.3 In-Memory Repository</h3></div></div></div><p>There are scenarios in which you may not want to persist your
|
|
domain objects to the database. One reason may be speed; storing domain
|
|
objects at each commit point takes extra time. Another reason may be
|
|
that you just don't need to persist status for a particular job. For
|
|
this reason, Spring Batch provides an in-memory Map version of the job
|
|
repository:</p><pre class="programlisting"><span class="hl-tag"><bean</span> <span class="hl-attribute">id</span>=<span class="hl-value">"jobRepository"</span>
|
|
<span class="hl-attribute">class</span>=<span class="hl-value">"org.springframework.batch.core.repository.support.MapJobRepositoryFactoryBean"</span><span class="hl-tag">></span>
|
|
<span class="hl-tag"><property</span> <span class="hl-attribute">name</span>=<span class="hl-value">"transactionManager"</span> <span class="hl-attribute">ref</span>=<span class="hl-value">"transactionManager"</span><span class="hl-tag">/></span>
|
|
<span class="hl-tag"></bean></span></pre><p>Note that the in-memory repository is volatile and so does not
|
|
allow restart between JVM instances. It also cannot prevent two
job instances with the same parameters from being launched simultaneously, and
|
|
is not suitable for use in a multi-threaded Job, or a locally
|
|
partitioned Step. So use the database version of the repository wherever
|
|
you need those features.</p><p>However, the in-memory repository does require a transaction manager to be defined
|
|
because there are rollback semantics within the repository, and because
|
|
the business logic might still be transactional (e.g. RDBMS access). For
|
|
testing purposes many people find the
|
|
<code class="classname">ResourcelessTransactionManager</code> useful.</p></div><div class="section"><div class="titlepage"><div><div><h3 class="title"><a name="nonStandardDatabaseTypesInRepository" href="#nonStandardDatabaseTypesInRepository"></a>4.3.4 Non-standard Database Types in a Repository</h3></div></div></div><p>If you are using a database platform that is not in the list of
|
|
supported platforms, you may be able to use one of the supported types,
|
|
if the SQL variant is close enough. To do this you can use the raw
|
|
<code class="classname">JobRepositoryFactoryBean</code> instead of the namespace
|
|
shortcut and use it to set the database type to the closest
|
|
match:</p><pre class="programlisting"><span class="hl-tag"><bean</span> <span class="hl-attribute">id</span>=<span class="hl-value">"jobRepository"</span> <span class="hl-attribute">class</span>=<span class="hl-value">"org...JobRepositoryFactoryBean"</span><span class="hl-tag">></span>
|
|
<span class="hl-tag"><property</span> <span class="hl-attribute">name</span>=<span class="hl-value">"databaseType"</span> <span class="hl-attribute">value</span>=<span class="hl-value">"db2"</span><span class="hl-tag">/></span>
|
|
<span class="hl-tag"><property</span> <span class="hl-attribute">name</span>=<span class="hl-value">"dataSource"</span> <span class="hl-attribute">ref</span>=<span class="hl-value">"dataSource"</span><span class="hl-tag">/></span>
|
|
<span class="hl-tag"></bean></span></pre><p>(The <code class="classname">JobRepositoryFactoryBean</code> tries to
|
|
auto-detect the database type from the <code class="classname">DataSource</code>
|
|
if it is not specified.) The major differences between platforms are
|
|
mainly accounted for by the strategy for incrementing primary keys, so
|
|
often it might be necessary to override the
|
|
<code class="literal">incrementerFactory</code> as well (using one of the standard
|
|
implementations from the Spring Framework).</p><p>If even that doesn't work, or you are not using an RDBMS, then the
|
|
only option may be to implement the various <code class="classname">Dao</code>
|
|
interfaces that the <code class="classname">SimpleJobRepository</code> depends
|
|
on and wire one up manually in the normal Spring way.</p></div></div><div class="section"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="configuringJobLauncher" href="#configuringJobLauncher"></a>4.4 Configuring a JobLauncher</h2></div></div></div><p>The most basic implementation of the
|
|
<code class="classname">JobLauncher</code> interface is the
|
|
<code class="classname">SimpleJobLauncher</code>. Its only required dependency is
|
|
a <code class="classname">JobRepository</code>, in order to obtain an
|
|
execution:</p><pre class="programlisting"><span class="hl-tag"><bean</span> <span class="hl-attribute">id</span>=<span class="hl-value">"jobLauncher"</span>
|
|
<span class="hl-attribute">class</span>=<span class="hl-value">"org.springframework.batch.core.launch.support.SimpleJobLauncher"</span><span class="hl-tag">></span>
|
|
<span class="hl-tag"><property</span> <span class="hl-attribute">name</span>=<span class="hl-value">"jobRepository"</span> <span class="hl-attribute">ref</span>=<span class="hl-value">"jobRepository"</span><span class="hl-tag"> /></span>
|
|
<span class="hl-tag"></bean></span></pre><p>Once a <a class="link" href="#jobExecution"><code class="classname">JobExecution</code></a> is
|
|
obtained, it is passed to the execute method of
|
|
<code class="classname">Job</code>, ultimately returning the
|
|
<code class="classname">JobExecution</code> to the caller:</p><div class="mediaobject" align="center"><img src="images/job-launcher-sequence-sync.png" align="middle"></div><p>The sequence is straightforward and works well when launched from a
|
|
scheduler. However, issues arise when trying to launch from an HTTP
|
|
request. In this scenario, the launching needs to be done asynchronously
|
|
so that the <code class="classname">SimpleJobLauncher</code> returns immediately
|
|
to its caller. This is because it is not good practice to keep an HTTP
|
|
request open for the amount of time needed by long running processes such
|
|
as batch. An example sequence is below:</p><div class="mediaobject" align="center"><img src="images/job-launcher-sequence-async.png" align="middle"></div><p>The <code class="classname">SimpleJobLauncher</code> can easily be
|
|
configured to allow for this scenario by configuring a
|
|
<code class="classname">TaskExecutor</code>:</p><pre class="programlisting"><span class="hl-tag"><bean</span> <span class="hl-attribute">id</span>=<span class="hl-value">"jobLauncher"</span>
|
|
<span class="hl-attribute">class</span>=<span class="hl-value">"org.springframework.batch.core.launch.support.SimpleJobLauncher"</span><span class="hl-tag">></span>
|
|
<span class="hl-tag"><property</span> <span class="hl-attribute">name</span>=<span class="hl-value">"jobRepository"</span> <span class="hl-attribute">ref</span>=<span class="hl-value">"jobRepository"</span><span class="hl-tag"> /></span>
|
|
<span class="hl-tag"><property</span> <span class="hl-attribute">name</span>=<span class="hl-value">"taskExecutor"</span><span class="hl-tag">></span>
|
|
<span class="hl-tag"><bean</span> <span class="hl-attribute">class</span>=<span class="hl-value">"org.springframework.core.task.SimpleAsyncTaskExecutor"</span><span class="hl-tag"> /></span>
|
|
<span class="hl-tag"></property></span>
|
|
<span class="hl-tag"></bean></span></pre><p>Any implementation of the Spring <code class="classname">TaskExecutor</code>
|
|
interface can be used to control how jobs are asynchronously
|
|
executed.</p></div><div class="section"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="runningAJob" href="#runningAJob"></a>4.5 Running a Job</h2></div></div></div><p>At a minimum, launching a batch job requires two things: the
|
|
<code class="classname">Job</code> to be launched and a
|
|
<code class="classname">JobLauncher</code>. Both can be contained within the same
|
|
context or different contexts. For example, if launching a job from the
|
|
command line, a new JVM will be instantiated for each Job, and thus every
|
|
job will have its own <code class="classname">JobLauncher</code>. However, if
|
|
running from within a web container within the scope of an
|
|
<code class="classname">HttpRequest</code>, there will usually be one
|
|
<code class="classname">JobLauncher</code>, configured for asynchronous job
|
|
launching, that multiple requests will invoke to launch their jobs.</p><div class="section"><div class="titlepage"><div><div><h3 class="title"><a name="runningJobsFromCommandLine" href="#runningJobsFromCommandLine"></a>4.5.1 Running Jobs from the Command Line</h3></div></div></div><p>For users that want to run their jobs from an enterprise
|
|
scheduler, the command line is the primary interface. This is because
|
|
most schedulers (with the exception of Quartz unless using the
|
|
<code class="classname">NativeJob</code>) work directly with operating system
|
|
processes, primarily kicked off with shell scripts. There are many ways
|
|
to launch a Java process besides a shell script, such as Perl, Ruby, or
|
|
even 'build tools' such as Ant or Maven. However, because most people
|
|
are familiar with shell scripts, this example will focus on them.</p><div class="section"><div class="titlepage"><div><div><h4 class="title"><a name="commandLineJobRunner" href="#commandLineJobRunner"></a>The CommandLineJobRunner</h4></div></div></div><p>Because the script launching the job must kick off a Java
|
|
Virtual Machine, there needs to be a class with a main method to act
|
|
as the primary entry point. Spring Batch provides an implementation
|
|
that serves just this purpose:
|
|
<code class="classname">CommandLineJobRunner</code>. It's important to note
|
|
that this is just one way to bootstrap your application, but there are
|
|
many ways to launch a Java process, and this class should in no way be
|
|
viewed as definitive. The <code class="classname">CommandLineJobRunner</code>
|
|
performs four tasks:</p><div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; "><li class="listitem"><p>Load the appropriate
|
|
<code class="classname">ApplicationContext</code></p></li><li class="listitem"><p>Parse command line arguments into
|
|
<code class="classname">JobParameters</code></p></li><li class="listitem"><p>Locate the appropriate job based on arguments</p></li><li class="listitem"><p>Use the <code class="classname">JobLauncher</code> provided in the
|
|
application context to launch the job.</p></li></ul></div><p>All of these tasks are accomplished using only the arguments
|
|
passed in. The following are required arguments:</p><div class="table"><a name="d5e1113" href="#d5e1113"></a><p class="title"><b>Table 4.1. CommandLineJobRunner arguments</b></p><div class="table-contents"><table summary="CommandLineJobRunner arguments" style="border-collapse: collapse;border-top: 0.5pt solid ; border-bottom: 0.5pt solid ; border-left: 0.5pt solid ; border-right: 0.5pt solid ; "><colgroup><col><col></colgroup><tbody><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; ">jobPath</td><td style="border-bottom: 0.5pt solid ; ">The location of the XML file that will be used to
|
|
create an <code class="classname">ApplicationContext</code>. This file
|
|
should contain everything needed to run the complete
|
|
<code class="classname">Job</code></td></tr><tr><td style="border-right: 0.5pt solid ; ">jobName</td><td style="">The name of the job to be run.</td></tr></tbody></table></div></div><br class="table-break"><p>These arguments must be passed in with the path first and the
|
|
name second. All arguments after these are considered to be
|
|
JobParameters and must be in the format of 'name=value':</p><pre class="screen"><code class="prompt">bash$</code> java CommandLineJobRunner endOfDayJob.xml endOfDay schedule.date(date)=2007/05/05</pre><p>In most cases you would want to use a manifest to declare your
|
|
main class in a jar, but for simplicity, the class was used directly.
|
|
This example is using the same 'EndOfDay' example from the <a class="link" href="#domain" title="3. The Domain Language of Batch">domain section</a>. The first argument is
|
|
'endOfDayJob.xml', which is the Spring
|
|
<code class="classname">ApplicationContext</code> containing the
|
|
<code class="classname">Job</code>. The second argument, 'endOfDay' represents
|
|
the job name. The final argument, 'schedule.date(date)=2007/05/05'
|
|
will be converted into <code class="classname">JobParameters</code>. An
|
|
example of the XML configuration is below:</p><pre class="programlisting"><span class="hl-tag"><job</span> <span class="hl-attribute">id</span>=<span class="hl-value">"endOfDay"</span><span class="hl-tag">></span>
|
|
<span class="hl-tag"><step</span> <span class="hl-attribute">id</span>=<span class="hl-value">"step1"</span> <span class="hl-attribute">parent</span>=<span class="hl-value">"simpleStep"</span><span class="hl-tag"> /></span>
|
|
<span class="hl-tag"></job></span>
|
|
|
|
<span class="hl-comment"><!-- Launcher details removed for clarity --></span>
|
|
<span class="hl-tag"><beans:bean</span> <span class="hl-attribute">id</span>=<span class="hl-value">"jobLauncher"</span>
|
|
<span class="hl-attribute">class</span>=<span class="hl-value">"org.springframework.batch.core.launch.support.SimpleJobLauncher"</span><span class="hl-tag"> /></span></pre><p>This example is overly simplistic, since there are many more
|
|
requirements to run a batch job in Spring Batch in general, but it
|
|
serves to show the two main requirements of the
|
|
<code class="classname">CommandLineJobRunner</code>:
|
|
<code class="classname">Job</code> and
|
|
<code class="classname">JobLauncher</code>.</p></div><div class="section"><div class="titlepage"><div><div><h4 class="title"><a name="exitCodes" href="#exitCodes"></a>ExitCodes</h4></div></div></div><p>When launching a batch job from the command-line, an enterprise
|
|
scheduler is often used. Most schedulers are fairly dumb and work only
|
|
at the process level. This means that they only know about some
|
|
operating system process such as a shell script that they're invoking.
|
|
In this scenario, the only way to communicate back to the scheduler
|
|
about the success or failure of a job is through return codes. A
|
|
return code is a number that is returned to a scheduler by the process
|
|
that indicates the result of the run. In the simplest case: 0 is
|
|
success and 1 is failure. However, there may be more complex
|
|
scenarios: if job A returns 4, kick off job B, and if it returns 5, kick
|
|
off job C. This type of behavior is configured at the scheduler level,
|
|
but it is important that a processing framework such as Spring Batch
|
|
provide a way to return a numeric representation of the 'Exit Code'
|
|
for a particular batch job. In Spring Batch this is encapsulated
|
|
within an <code class="classname">ExitStatus</code>, which is covered in more
|
|
detail in Chapter 5. For the purposes of discussing exit codes, the
|
|
only important thing to know is that an
|
|
<code class="classname">ExitStatus</code> has an exit code property that is
|
|
set by the framework (or the developer) and is returned as part of the
|
|
<code class="classname">JobExecution</code> returned from the
|
|
<code class="classname">JobLauncher</code>. The
|
|
<code class="classname">CommandLineJobRunner</code> converts this string value
|
|
to a number using the <code class="classname">ExitCodeMapper</code>
|
|
interface:</p><pre class="programlisting"><span class="hl-keyword">public</span> <span class="hl-keyword">interface</span> ExitCodeMapper {
|
|
|
|
<span class="hl-keyword">public</span> <span class="hl-keyword">int</span> intValue(String exitCode);
|
|
|
|
}</pre><p>The essential contract of an
|
|
<code class="classname">ExitCodeMapper</code> is that, given a string exit
|
|
code, a number representation will be returned. The default
|
|
implementation used by the job runner is the SimpleJvmExitCodeMapper
|
|
that returns 0 for completion, 1 for generic errors, and 2 for any job
|
|
runner errors such as not being able to find a
|
|
<code class="classname">Job</code> in the provided context. If anything more
|
|
complex than the 3 values above is needed, then a custom
|
|
implementation of the <code class="classname">ExitCodeMapper</code> interface
|
|
must be supplied. Because the
|
|
<code class="classname">CommandLineJobRunner</code> is the class that creates
|
|
an <code class="classname">ApplicationContext</code>, and thus cannot be
|
|
'wired together', any values that need to be overwritten must be
|
|
autowired. This means that if an implementation of
|
|
<code class="classname">ExitCodeMapper</code> is found within the BeanFactory,
|
|
it will be injected into the runner after the context is created. All
|
|
that needs to be done to provide your own
|
|
<code class="classname">ExitCodeMapper</code> is to declare the implementation
|
|
as a root level bean and ensure that it is part of the
|
|
<code class="classname">ApplicationContext</code> that is loaded by the
|
|
runner.</p></div></div><div class="section"><div class="titlepage"><div><div><h3 class="title"><a name="runningJobsFromWebContainer" href="#runningJobsFromWebContainer"></a>4.5.2 Running Jobs from within a Web Container</h3></div></div></div><p>Historically, offline processing such as batch jobs has been
|
|
launched from the command-line, as described above. However, there are
|
|
many cases where launching from an <code class="classname">HttpRequest</code> is
|
|
a better option. Such use cases include reporting, ad-hoc job
running, and web application support. Because a batch job by definition
is long running, the most important concern is launching the
|
|
job asynchronously:</p><div class="mediaobject" align="center"><img src="images/launch-from-request.png" align="middle"></div><p>The controller in this case is a Spring MVC controller. More
|
|
information on Spring MVC can be found here: <a class="ulink" href="http://docs.spring.io/spring/docs/3.2.x/spring-framework-reference/html/mvc.html" target="_top">http://docs.spring.io/spring/docs/3.2.x/spring-framework-reference/html/mvc.html</a>.
|
|
The controller launches a <code class="classname">Job</code> using a
|
|
<code class="classname">JobLauncher</code> that has been configured to launch
|
|
<a class="link" href="#">asynchronously</a>, which
|
|
immediately returns a <code class="classname">JobExecution</code>. The
|
|
<code class="classname">Job</code> will likely still be running; however, this
|
|
nonblocking behaviour allows the controller to return immediately, which
|
|
is required when handling an <code class="classname">HttpRequest</code>. An
|
|
example is below:</p><pre class="programlisting"><em><span class="hl-annotation" style="color: gray">@Controller</span></em>
|
|
<span class="hl-keyword">public</span> <span class="hl-keyword">class</span> JobLauncherController {
|
|
|
|
<em><span class="hl-annotation" style="color: gray">@Autowired</span></em>
|
|
JobLauncher jobLauncher;
|
|
|
|
<em><span class="hl-annotation" style="color: gray">@Autowired</span></em>
|
|
Job job;
|
|
|
|
<em><span class="hl-annotation" style="color: gray">@RequestMapping("/jobLauncher.html")</span></em>
|
|
<span class="hl-keyword">public</span> <span class="hl-keyword">void</span> handle() <span class="hl-keyword">throws</span> Exception{
|
|
jobLauncher.run(job, <span class="hl-keyword">new</span> JobParameters());
|
|
}
|
|
}</pre></div></div><div class="section"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="advancedMetaData" href="#advancedMetaData"></a>4.6 Advanced Meta-Data Usage</h2></div></div></div><p>So far, both the JobLauncher and JobRepository interfaces have been
|
|
discussed. Together, they represent simple launching of a job, and basic
|
|
CRUD operations of batch domain objects:</p><div class="mediaobject" align="center"><img src="images/job-repository.png" align="middle"></div><p>A <code class="classname">JobLauncher</code> uses the
|
|
<code class="classname">JobRepository</code> to create new
|
|
<code class="classname">JobExecution</code> objects and run them.
|
|
<code class="classname">Job</code> and <code class="classname">Step</code> implementations
|
|
later use the same <code class="classname">JobRepository</code> for basic updates
|
|
of the same executions during the running of a <code class="classname">Job</code>.
|
|
The basic operations suffice for simple scenarios, but in a large batch
|
|
environment with hundreds of batch jobs and complex scheduling
|
|
requirements, more advanced access to the meta-data is required:</p><div class="mediaobject" align="center"><img src="images/job-repository-advanced.png" align="middle"></div><p>The <code class="classname">JobExplorer</code> and
|
|
<code class="classname">JobOperator</code> interfaces, which will be discussed
|
|
below, add additional functionality for querying and controlling the meta
|
|
data.</p><div class="section"><div class="titlepage"><div><div><h3 class="title"><a name="queryingRepository" href="#queryingRepository"></a>4.6.1 Querying the Repository</h3></div></div></div><p>The most basic need before any advanced features is the ability to
|
|
query the repository for existing executions. This functionality is
|
|
provided by the <code class="classname">JobExplorer</code> interface:</p><pre class="programlisting"><span class="hl-keyword">public</span> <span class="hl-keyword">interface</span> JobExplorer {
|
|
|
|
List<JobInstance> getJobInstances(String jobName, <span class="hl-keyword">int</span> start, <span class="hl-keyword">int</span> count);
|
|
|
|
JobExecution getJobExecution(Long executionId);
|
|
|
|
StepExecution getStepExecution(Long jobExecutionId, Long stepExecutionId);
|
|
|
|
JobInstance getJobInstance(Long instanceId);
|
|
|
|
List<JobExecution> getJobExecutions(JobInstance jobInstance);
|
|
|
|
Set<JobExecution> findRunningJobExecutions(String jobName);
|
|
}</pre><p>As is evident from the method signatures above,
|
|
<code class="classname">JobExplorer</code> is a read-only version of the
|
|
<code class="classname">JobRepository</code>, and like the
|
|
<code class="classname">JobRepository</code>, it can be easily configured via a
|
|
factory bean:</p><pre class="programlisting"><span class="hl-tag"><bean</span> <span class="hl-attribute">id</span>=<span class="hl-value">"jobExplorer"</span> <span class="hl-attribute">class</span>=<span class="hl-value">"org.spr...JobExplorerFactoryBean"</span>
|
|
<span class="hl-attribute">p:dataSource-ref</span>=<span class="hl-value">"dataSource"</span><span class="hl-tag"> /></span></pre><p><a class="link" href="#repositoryTablePrefix" title="4.3.2 Changing the Table Prefix">Earlier in this
|
|
chapter</a>, it was mentioned that the table prefix of the
|
|
<code class="classname">JobRepository</code> can be modified to allow for
|
|
different versions or schemas. Because the
|
|
<code class="classname">JobExplorer</code> is working with the same tables, it
|
|
too needs the ability to set a prefix:</p><pre class="programlisting"><span class="hl-tag"><bean</span> <span class="hl-attribute">id</span>=<span class="hl-value">"jobExplorer"</span> <span class="hl-attribute">class</span>=<span class="hl-value">"org.spr...JobExplorerFactoryBean"</span>
|
|
<span class="hl-attribute">p:dataSource-ref</span>=<span class="hl-value">"dataSource"</span> <span class="bold"><strong>p:tablePrefix="BATCH_" </strong></span>/></pre></div><div class="section"><div class="titlepage"><div><div><h3 class="title"><a name="d5e1215" href="#d5e1215"></a>4.6.2 JobRegistry</h3></div></div></div><p>A JobRegistry (and its parent interface JobLocator) is not
|
|
mandatory, but it can be useful if you want to keep track of which jobs
|
|
are available in the context. It is also useful for collecting jobs
|
|
centrally in an application context when they have been created
|
|
elsewhere (e.g. in child contexts). Custom JobRegistry implementations
|
|
can also be used to manipulate the names and other properties of the
|
|
jobs that are registered. There is only one implementation provided by
|
|
the framework and this is based on a simple map from job name to job
|
|
instance. It is configured simply like this:</p><pre class="programlisting"><span class="hl-tag"><bean</span> <span class="hl-attribute">id</span>=<span class="hl-value">"jobRegistry"</span> <span class="hl-attribute">class</span>=<span class="hl-value">"org.spr...MapJobRegistry"</span><span class="hl-tag"> /></span></pre><p>There are two ways to populate a JobRegistry automatically: using
|
|
a bean post processor and using a registrar lifecycle component. These
|
|
two mechanisms are described in the following sections.</p><div class="section"><div class="titlepage"><div><div><h4 class="title"><a name="d5e1220" href="#d5e1220"></a>JobRegistryBeanPostProcessor</h4></div></div></div><p>This is a bean post-processor that can register all jobs as they
|
|
are created:</p><pre class="programlisting"><span class="hl-tag"><bean</span> <span class="hl-attribute">id</span>=<span class="hl-value">"jobRegistryBeanPostProcessor"</span> <span class="hl-attribute">class</span>=<span class="hl-value">"org.spr...JobRegistryBeanPostProcessor"</span><span class="hl-tag">></span>
|
|
<span class="hl-tag"><property</span> <span class="hl-attribute">name</span>=<span class="hl-value">"jobRegistry"</span> <span class="hl-attribute">ref</span>=<span class="hl-value">"jobRegistry"</span><span class="hl-tag">/></span>
|
|
<span class="hl-tag"></bean></span></pre><p>Although it is not strictly necessary, the post-processor in the
|
|
example has been given an id so that it can be included in child
|
|
contexts (e.g. as a parent bean definition) and cause all jobs created
|
|
there to also be registered automatically.</p></div><div class="section"><div class="titlepage"><div><div><h4 class="title"><a name="d5e1225" href="#d5e1225"></a>AutomaticJobRegistrar</h4></div></div></div><p>This is a lifecycle component that creates child contexts and
|
|
registers jobs from those contexts as they are created. One advantage
|
|
of doing this is that, while the job names in the child contexts still
|
|
have to be globally unique in the registry, their dependencies can
|
|
have "natural" names. So for example, you can create a set of XML
|
|
configuration files each having only one <code class="classname">Job</code>,
|
|
but all having different definitions of an
|
|
<code class="classname">ItemReader</code> with the same bean name, e.g.
|
|
"reader". If all those files were imported into the same context, the
|
|
reader definitions would clash and override one another, but with the
|
|
automatic registrar this is avoided. This makes it easier to
|
|
integrate jobs contributed from separate modules of an
|
|
application.</p><pre class="programlisting"><span class="hl-tag"><bean</span> <span class="hl-attribute">class</span>=<span class="hl-value">"org.spr...AutomaticJobRegistrar"</span><span class="hl-tag">></span>
|
|
<span class="hl-tag"><property</span> <span class="hl-attribute">name</span>=<span class="hl-value">"applicationContextFactories"</span><span class="hl-tag">></span>
|
|
<span class="hl-tag"><bean</span> <span class="hl-attribute">class</span>=<span class="hl-value">"org.spr...ClasspathXmlApplicationContextsFactoryBean"</span><span class="hl-tag">></span>
|
|
<span class="hl-tag"><property</span> <span class="hl-attribute">name</span>=<span class="hl-value">"resources"</span> <span class="hl-attribute">value</span>=<span class="hl-value">"classpath*:/config/job*.xml"</span><span class="hl-tag"> /></span>
|
|
<span class="hl-tag"></bean></span>
|
|
<span class="hl-tag"></property></span>
|
|
<span class="hl-tag"><property</span> <span class="hl-attribute">name</span>=<span class="hl-value">"jobLoader"</span><span class="hl-tag">></span>
|
|
<span class="hl-tag"><bean</span> <span class="hl-attribute">class</span>=<span class="hl-value">"org.spr...DefaultJobLoader"</span><span class="hl-tag">></span>
|
|
<span class="hl-tag"><property</span> <span class="hl-attribute">name</span>=<span class="hl-value">"jobRegistry"</span> <span class="hl-attribute">ref</span>=<span class="hl-value">"jobRegistry"</span><span class="hl-tag"> /></span>
|
|
<span class="hl-tag"></bean></span>
|
|
<span class="hl-tag"></property></span>
|
|
<span class="hl-tag"></bean></span></pre><p>The registrar has two mandatory properties: one is an array of
|
|
<code class="classname">ApplicationContextFactory</code> (here created from a
|
|
convenient factory bean), and the other is a
|
|
<code class="classname">JobLoader</code>. The <code class="classname">JobLoader</code>
|
|
is responsible for managing the lifecycle of the child contexts and
|
|
registering jobs in the <code class="classname">JobRegistry</code>.</p><p>The <code class="classname">ApplicationContextFactory</code> is
|
|
responsible for creating the child context and the most common usage
|
|
would be as above using a
|
|
<code class="classname">ClassPathXmlApplicationContextFactory</code>. One of
|
|
the features of this factory is that by default it copies some of the
|
|
configuration down from the parent context to the child. So for
|
|
instance you don't have to re-define the
|
|
<code class="classname">PropertyPlaceholderConfigurer</code> or AOP
|
|
configuration in the child, if it should be the same as the
|
|
parent.</p><p>The <code class="classname">AutomaticJobRegistrar</code> can be used in
|
|
conjunction with a <code class="classname">JobRegistryBeanPostProcessor</code>
|
|
if desired (as long as the <code class="classname">DefaultJobLoader</code> is
|
|
used as well). For instance, this might be desirable if there are jobs
|
|
defined in the main parent context as well as in the child
|
|
locations.</p></div></div><div class="section"><div class="titlepage"><div><div><h3 class="title"><a name="JobOperator" href="#JobOperator"></a>4.6.3 JobOperator</h3></div></div></div><p>As previously discussed, the <code class="classname">JobRepository</code>
|
|
provides CRUD operations on the meta-data, and the
|
|
<code class="classname">JobExplorer</code> provides read-only operations on the
|
|
meta-data. However, those operations are most useful when used together
|
|
to perform common monitoring tasks such as stopping, restarting, or
|
|
summarizing a Job, as is commonly done by batch operators. Spring Batch
|
|
provides for these types of operations via the
|
|
<code class="classname">JobOperator</code> interface:</p><pre class="programlisting"><span class="hl-keyword">public</span> <span class="hl-keyword">interface</span> JobOperator {
|
|
|
|
List<Long> getExecutions(<span class="hl-keyword">long</span> instanceId) <span class="hl-keyword">throws</span> NoSuchJobInstanceException;
|
|
|
|
List<Long> getJobInstances(String jobName, <span class="hl-keyword">int</span> start, <span class="hl-keyword">int</span> count)
|
|
<span class="hl-keyword">throws</span> NoSuchJobException;
|
|
|
|
Set<Long> getRunningExecutions(String jobName) <span class="hl-keyword">throws</span> NoSuchJobException;
|
|
|
|
String getParameters(<span class="hl-keyword">long</span> executionId) <span class="hl-keyword">throws</span> NoSuchJobExecutionException;
|
|
|
|
Long start(String jobName, String parameters)
|
|
<span class="hl-keyword">throws</span> NoSuchJobException, JobInstanceAlreadyExistsException;
|
|
|
|
Long restart(<span class="hl-keyword">long</span> executionId)
|
|
<span class="hl-keyword">throws</span> JobInstanceAlreadyCompleteException, NoSuchJobExecutionException,
|
|
NoSuchJobException, JobRestartException;
|
|
|
|
Long startNextInstance(String jobName)
|
|
<span class="hl-keyword">throws</span> NoSuchJobException, JobParametersNotFoundException, JobRestartException,
|
|
JobExecutionAlreadyRunningException, JobInstanceAlreadyCompleteException;
|
|
|
|
<span class="hl-keyword">boolean</span> stop(<span class="hl-keyword">long</span> executionId)
|
|
<span class="hl-keyword">throws</span> NoSuchJobExecutionException, JobExecutionNotRunningException;
|
|
|
|
String getSummary(<span class="hl-keyword">long</span> executionId) <span class="hl-keyword">throws</span> NoSuchJobExecutionException;
|
|
|
|
Map<Long, String> getStepExecutionSummaries(<span class="hl-keyword">long</span> executionId)
|
|
<span class="hl-keyword">throws</span> NoSuchJobExecutionException;
|
|
|
|
Set<String> getJobNames();
|
|
|
|
}</pre><p>The above operations represent methods from many different
|
|
interfaces, such as <code class="classname">JobLauncher</code>,
|
|
<code class="classname">JobRepository</code>,
|
|
<code class="classname">JobExplorer</code>, and
|
|
<code class="classname">JobRegistry</code>. For this reason, the provided
|
|
implementation of <code class="classname">JobOperator</code>,
|
|
<code class="classname">SimpleJobOperator</code>, has many dependencies:</p><pre class="programlisting"><span class="hl-tag"><bean</span> <span class="hl-attribute">id</span>=<span class="hl-value">"jobOperator"</span> <span class="hl-attribute">class</span>=<span class="hl-value">"org.spr...SimpleJobOperator"</span><span class="hl-tag">></span>
|
|
<span class="hl-tag"><property</span> <span class="hl-attribute">name</span>=<span class="hl-value">"jobExplorer"</span><span class="hl-tag">></span>
|
|
<span class="hl-tag"><bean</span> <span class="hl-attribute">class</span>=<span class="hl-value">"org.spr...JobExplorerFactoryBean"</span><span class="hl-tag">></span>
|
|
<span class="hl-tag"><property</span> <span class="hl-attribute">name</span>=<span class="hl-value">"dataSource"</span> <span class="hl-attribute">ref</span>=<span class="hl-value">"dataSource"</span><span class="hl-tag"> /></span>
|
|
<span class="hl-tag"></bean></span>
|
|
<span class="hl-tag"></property></span>
|
|
<span class="hl-tag"><property</span> <span class="hl-attribute">name</span>=<span class="hl-value">"jobRepository"</span> <span class="hl-attribute">ref</span>=<span class="hl-value">"jobRepository"</span><span class="hl-tag"> /></span>
|
|
<span class="hl-tag"><property</span> <span class="hl-attribute">name</span>=<span class="hl-value">"jobRegistry"</span> <span class="hl-attribute">ref</span>=<span class="hl-value">"jobRegistry"</span><span class="hl-tag"> /></span>
|
|
<span class="hl-tag"><property</span> <span class="hl-attribute">name</span>=<span class="hl-value">"jobLauncher"</span> <span class="hl-attribute">ref</span>=<span class="hl-value">"jobLauncher"</span><span class="hl-tag"> /></span>
|
|
<span class="hl-tag"></bean></span></pre><div class="note" style="margin-left: 0.5in; margin-right: 0.5in;"><table border="0" summary="Note"><tr><td rowspan="2" align="center" valign="top" width="25"><img alt="[Note]" src="images/note.png"></td><th align="left">Note</th></tr><tr><td align="left" valign="top">
|
|
If you set the table prefix on the job repository, don't forget to set it on the job explorer as well.
|
|
</td></tr></table></div></div><div class="section"><div class="titlepage"><div><div><h3 class="title"><a name="JobParametersIncrementer" href="#JobParametersIncrementer"></a>4.6.4 JobParametersIncrementer</h3></div></div></div><p>Most of the methods on <code class="classname">JobOperator</code> are
|
|
self-explanatory, and more detailed explanations can be found in the
|
|
<a class="ulink" href="http://docs.spring.io/spring-batch/apidocs/org/springframework/batch/core/launch/JobOperator.html" target="_top">javadoc
|
|
of the interface</a>. However, the
|
|
<code class="methodname">startNextInstance</code> method is worth noting. This
|
|
method will always start a new instance of a <code class="classname">Job</code>.
|
|
This can be extremely useful if there are serious issues in a
|
|
<code class="classname">JobExecution</code> and the <code class="classname">Job</code>
|
|
needs to be started over again from the beginning. Unlike
|
|
<code class="classname">JobLauncher</code> though, which requires a new
|
|
<code class="classname">JobParameters</code> object that will trigger a new
|
|
<code class="classname">JobInstance</code> if the parameters are different from
|
|
any previous set of parameters, the
|
|
<code class="methodname">startNextInstance</code> method will use the
|
|
<code class="classname">JobParametersIncrementer</code> tied to the
|
|
<code class="classname">Job</code> to force the <code class="classname">Job</code> to a
|
|
new instance:</p><pre class="programlisting"><span class="hl-keyword">public</span> <span class="hl-keyword">interface</span> JobParametersIncrementer {
|
|
|
|
JobParameters getNext(JobParameters parameters);
|
|
|
|
}</pre><p>The contract of <code class="classname">JobParametersIncrementer</code> is
|
|
that, given a <a class="link" href="#"><code class="classname">JobParameters</code></a>
|
|
object, it will return the 'next' <code class="classname">JobParameters</code>
|
|
object by incrementing any necessary values it may contain. This
|
|
strategy is useful because the framework has no way of knowing what
|
|
changes to the <code class="classname">JobParameters</code> make it the 'next'
|
|
instance. For example, if the only value in
|
|
<code class="classname">JobParameters</code> is a date, and the next instance
|
|
should be created, should that value be incremented by one day? Or one
|
|
week (if the job is weekly for instance)? The same can be said for any
|
|
numerical values that help to identify the <code class="classname">Job</code>,
|
|
as shown below:</p><pre class="programlisting"><span class="hl-keyword">public</span> <span class="hl-keyword">class</span> SampleIncrementer <span class="hl-keyword">implements</span> JobParametersIncrementer {
|
|
|
|
<span class="hl-keyword">public</span> JobParameters getNext(JobParameters parameters) {
|
|
<span class="hl-keyword">if</span> (parameters==null || parameters.isEmpty()) {
|
|
<span class="hl-keyword">return</span> <span class="hl-keyword">new</span> JobParametersBuilder().addLong(<span class="hl-string">"run.id"</span>, <span class="hl-number">1L</span>).toJobParameters();
|
|
}
|
|
<span class="hl-keyword">long</span> id = parameters.getLong(<span class="hl-string">"run.id"</span>,<span class="hl-number">1L</span>) + <span class="hl-number">1</span>;
|
|
<span class="hl-keyword">return</span> <span class="hl-keyword">new</span> JobParametersBuilder().addLong(<span class="hl-string">"run.id"</span>, id).toJobParameters();
|
|
}
|
|
}</pre><p>In this example, the value with a key of 'run.id' is used to
|
|
discriminate between <code class="classname">JobInstances</code>. If the
|
|
<code class="classname">JobParameters</code> passed in is null or empty, it can be
|
|
assumed that the <code class="classname">Job</code> has never been run before
|
|
and thus its initial state can be returned. However, if not, the old
|
|
value is obtained, incremented by one, and returned. An incrementer can
|
|
be associated with <code class="classname">Job</code> via the 'incrementer'
|
|
attribute in the namespace:</p><pre class="programlisting"><span class="hl-tag"><job</span> <span class="hl-attribute">id</span>=<span class="hl-value">"footballJob"</span> <span class="bold"><strong>incrementer="sampleIncrementer"</strong></span>>
|
|
...
|
|
<span class="hl-tag"></job></span></pre></div><div class="section"><div class="titlepage"><div><div><h3 class="title"><a name="stoppingAJob" href="#stoppingAJob"></a>4.6.5 Stopping a Job</h3></div></div></div><p>One of the most common use cases of
|
|
<code class="classname">JobOperator</code> is gracefully stopping a
|
|
<code class="classname">Job</code>:</p><pre class="programlisting">Set<Long> executions = jobOperator.getRunningExecutions(<span class="hl-string">"sampleJob"</span>);
|
|
jobOperator.stop(executions.iterator().next());</pre><p>The shutdown is not immediate, since there is no way to force
|
|
immediate shutdown, especially if the execution is currently in
|
|
developer code that the framework has no control over, such as a
|
|
business service. However, as soon as control is returned to the
|
|
framework, it will set the status of the current
|
|
<code class="classname">StepExecution</code> to
|
|
<code class="classname">BatchStatus.STOPPED</code>, save it, then do the same
|
|
for the <code class="classname">JobExecution</code> before finishing.</p></div><div class="section"><div class="titlepage"><div><div><h3 class="title"><a name="d5e1303" href="#d5e1303"></a>4.6.6 Aborting a Job</h3></div></div></div><p>A job execution which is <code class="classname">FAILED</code> can be
|
|
restarted (if the Job is restartable). A job execution whose status is
|
|
<code class="classname">ABANDONED</code> will not be restarted by the framework.
|
|
The <code class="classname">ABANDONED</code> status is also used in step
|
|
executions to mark them as skippable in a restarted job execution: if a
|
|
job is executing and encounters a step that has been marked
|
|
<code class="classname">ABANDONED</code> in the previous failed job execution, it
|
|
will move on to the next step (as determined by the job flow definition
|
|
and the step execution exit status).</p><p>If the process died (<code class="literal">"kill -9"</code> or server
|
|
failure) the job is, of course, not running, but the JobRepository has
|
|
no way of knowing because no-one told it before the process died. You
|
|
have to tell it manually that you know that the execution either failed
|
|
or should be considered aborted (change its status to
|
|
<code class="classname">FAILED</code> or <code class="classname">ABANDONED</code>) - it's
|
|
a business decision and there is no way to automate it. Only change the
|
|
status to <code class="classname">FAILED</code> if it is not restartable, or if
|
|
you know the restart data is valid. There is a utility in Spring Batch
|
|
Admin <code class="classname">JobService</code> to abort a job execution.</p></div></div></div>
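<p>The restart rules described above can be summarized in a small, self-contained sketch. It uses plain Java only: <code class="classname">RestartRules</code>, its <code class="classname">Status</code> enum, and both methods are hypothetical names introduced for illustration and are not part of the Spring Batch API. Only the two statuses discussed in this section are modeled, so treat it as an approximation of the behaviour rather than the framework's implementation:</p>
<pre class="programlisting">
```java
// Illustrative sketch only: RestartRules and Status are hypothetical names,
// not Spring Batch classes. Only FAILED and ABANDONED are modeled.
public class RestartRules {

    enum Status { FAILED, ABANDONED }

    // A FAILED job execution can be restarted if the job is restartable;
    // an ABANDONED job execution is never restarted.
    static boolean jobCanRestart(Status lastJobStatus, boolean jobIsRestartable) {
        if (lastJobStatus == Status.ABANDONED) {
            return false;
        }
        return jobIsRestartable;
    }

    // On restart, a step marked ABANDONED in the previous execution is
    // passed over; a FAILED step is executed again.
    static boolean stepRunsOnRestart(Status previousStepStatus) {
        return previousStepStatus != Status.ABANDONED;
    }

    public static void main(String[] args) {
        System.out.println(jobCanRestart(Status.FAILED, true));
        System.out.println(jobCanRestart(Status.ABANDONED, true));
        System.out.println(stepRunsOnRestart(Status.ABANDONED));
    }
}
```
</pre>
<p>The sketch mirrors the manual decision described above: marking a dead execution as <code class="classname">FAILED</code> leaves it eligible for restart, whereas marking it as <code class="classname">ABANDONED</code> does not.</p>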
|
|
|
|
<div class="chapter"><div class="titlepage"><div><div><h1 class="title"><a name="configureStep" href="#configureStep"></a>5. Configuring a Step</h1></div></div></div><p>As discussed in <a class="xref" href="#domain" title="3. The Domain Language of Batch">Batch Domain Language</a>, a
|
|
<code class="classname">Step</code> is a domain object that encapsulates an
|
|
independent, sequential phase of a batch job and contains all of the
|
|
information necessary to define and control the actual batch processing.
|
|
This is a necessarily vague description because the contents of any given
|
|
<code class="classname">Step</code> are at the discretion of the developer writing a
|
|
<code class="classname">Job</code>. A Step can be as simple or complex as the
|
|
developer desires. A simple <code class="classname">Step</code> might load data from
|
|
a file into the database, requiring little or no code (depending upon the
implementations used). A more complex <code class="classname">Step</code> may have
|
|
complicated business rules that are applied as part of the
|
|
processing.</p><div class="mediaobject" align="center"><img src="images/step.png" align="middle"></div><div class="section"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="chunkOrientedProcessing" href="#chunkOrientedProcessing"></a>5.1 Chunk-Oriented Processing</h2></div></div></div><p>Spring Batch uses a 'Chunk Oriented' processing style within its
|
|
most common implementation. Chunk oriented processing refers to reading
|
|
the data one item at a time and creating 'chunks' that will be written out
|
|
within a transaction boundary. One item is read in from an
|
|
<code class="classname">ItemReader</code>, handed to an
|
|
<code class="classname">ItemProcessor</code>, and aggregated. Once the number of
|
|
items read equals the commit interval, the entire chunk is written out via
|
|
the ItemWriter, and then the transaction is committed.</p><div class="mediaobject" align="center"><img src="images/chunk-oriented-processing.png" align="middle"></div><p>Below is a code representation of the same concepts shown
|
|
above:</p><pre class="programlisting">List items = <span class="hl-keyword">new</span> ArrayList();
|
|
<span class="hl-keyword">for</span>(<span class="hl-keyword">int</span> i = <span class="hl-number">0</span>; i < commitInterval; i++){
|
|
Object item = itemReader.read();
|
|
Object processedItem = itemProcessor.process(item);
|
|
items.add(processedItem);
|
|
}
|
|
itemWriter.write(items);</pre><div class="section"><div class="titlepage"><div><div><h3 class="title"><a name="configuringAStep" href="#configuringAStep"></a>5.1.1 Configuring a Step</h3></div></div></div><p>Despite the relatively short list of required dependencies for a
|
|
<code class="classname">Step</code>, it is an extremely complex class that can
|
|
potentially contain many collaborators. In order to ease configuration,
|
|
the Spring Batch namespace can be used:</p><pre class="programlisting"><span class="hl-tag"><job</span> <span class="hl-attribute">id</span>=<span class="hl-value">"sampleJob"</span> <span class="hl-attribute">job-repository</span>=<span class="hl-value">"jobRepository"</span><span class="hl-tag">></span>
|
|
<span class="hl-tag"><step</span> <span class="hl-attribute">id</span>=<span class="hl-value">"step1"</span><span class="hl-tag">></span>
|
|
<span class="hl-tag"><tasklet</span> <span class="hl-attribute">transaction-manager</span>=<span class="hl-value">"transactionManager"</span><span class="hl-tag">></span>
|
|
<span class="hl-tag"><chunk</span> <span class="hl-attribute">reader</span>=<span class="hl-value">"itemReader"</span> <span class="hl-attribute">writer</span>=<span class="hl-value">"itemWriter"</span> <span class="hl-attribute">commit-interval</span>=<span class="hl-value">"10"</span><span class="hl-tag">/></span>
|
|
<span class="hl-tag"></tasklet></span>
|
|
<span class="hl-tag"></step></span>
|
|
<span class="hl-tag"></job></span></pre><p>The configuration above represents the only required dependencies
|
|
to create an item-oriented step:</p><div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; "><li class="listitem"><p>reader - The <code class="classname">ItemReader</code> that provides
|
|
items for processing.</p></li><li class="listitem"><p>writer - The <code class="classname">ItemWriter</code> that
|
|
processes the items provided by the
|
|
<code class="classname">ItemReader</code>.</p></li><li class="listitem"><p>transaction-manager - Spring's
|
|
<code class="classname">PlatformTransactionManager</code> that will be
|
|
used to begin and commit transactions during processing.</p></li><li class="listitem"><p>job-repository - The <code class="classname">JobRepository</code>
|
|
that will be used to periodically store the
|
|
<code class="classname">StepExecution</code> and
|
|
<code class="classname">ExecutionContext</code> during processing (just
|
|
before committing). For an in-line <step/> (one defined
|
|
within a <job/>) it is an attribute on the <job/>
|
|
element; for a standalone step, it is defined as an attribute of
|
|
the <tasklet/>.</p></li><li class="listitem"><p>commit-interval - The number of items that will be processed
|
|
before the transaction is committed.</p></li></ul></div><p>Note that job-repository defaults to
|
|
"jobRepository" and transaction-manager defaults to "transactionManager".
|
|
Furthermore, the <code class="classname">ItemProcessor</code> is optional, not
|
|
required, since the item could be directly passed from the reader to the
|
|
writer.</p></div><div class="section"><div class="titlepage"><div><div><h3 class="title"><a name="InheritingFromParentStep" href="#InheritingFromParentStep"></a>5.1.2 Inheriting from a Parent Step</h3></div></div></div><p>If a group of <code class="classname">Step</code>s share similar
|
|
configurations, then it may be helpful to define a "parent"
|
|
<code class="classname">Step</code> from which the concrete
|
|
<code class="classname">Step</code>s may inherit properties. Similar to class
|
|
inheritance in Java, the "child" <code class="classname">Step</code> will
|
|
combine its elements and attributes with the parent's. The child will
|
|
also override any of the parent's settings that it redefines.</p><p>In the following example, the <code class="classname">Step</code>
|
|
"concreteStep1" will inherit from "parentStep". It will be instantiated
|
|
with 'itemReader', 'itemProcessor', 'itemWriter', startLimit=5, and
|
|
allowStartIfComplete=true. Additionally, the commitInterval will be '5'
|
|
since it is overridden by the "concreteStep1":</p><pre class="programlisting"><span class="hl-tag"><step</span> <span class="hl-attribute">id</span>=<span class="hl-value">"parentStep"</span><span class="hl-tag">></span>
|
|
<span class="hl-tag"><tasklet</span> <span class="hl-attribute">allow-start-if-complete</span>=<span class="hl-value">"true"</span><span class="hl-tag">></span>
|
|
<span class="hl-tag"><chunk</span> <span class="hl-attribute">reader</span>=<span class="hl-value">"itemReader"</span> <span class="hl-attribute">writer</span>=<span class="hl-value">"itemWriter"</span> <span class="hl-attribute">commit-interval</span>=<span class="hl-value">"10"</span><span class="hl-tag">/></span>
|
|
<span class="hl-tag"></tasklet></span>
|
|
<span class="hl-tag"></step></span>
|
|
|
|
<span class="hl-tag"><step</span> <span class="hl-attribute">id</span>=<span class="hl-value">"concreteStep1"</span> <span class="hl-attribute">parent</span>=<span class="hl-value">"parentStep"</span><span class="hl-tag">></span>
|
|
<span class="hl-tag"><tasklet</span> <span class="hl-attribute">start-limit</span>=<span class="hl-value">"5"</span><span class="hl-tag">></span>
|
|
<span class="hl-tag"><chunk</span> <span class="hl-attribute">processor</span>=<span class="hl-value">"itemProcessor"</span> <span class="hl-attribute">commit-interval</span>=<span class="hl-value">"5"</span><span class="hl-tag">/></span>
|
|
<span class="hl-tag"></tasklet></span>
|
|
<span class="hl-tag"></step></span></pre><p>The id attribute is still required on the step within the job
|
|
element. This is for two reasons:</p><div class="orderedlist"><ol class="orderedlist" type="1"><li class="listitem"><p>The id will be used as the step name when persisting the
|
|
StepExecution. If the same standalone step is referenced in more
|
|
than one step in the job, an error will occur.</p></li><li class="listitem"><p>When creating job flows, as described later in this chapter,
|
|
the next attribute should be referring to the step in the flow,
|
|
not the standalone step.</p></li></ol></div><div class="section"><div class="titlepage"><div><div><h4 class="title"><a name="abstractStep" href="#abstractStep"></a>Abstract Step</h4></div></div></div><p>Sometimes it may be necessary to define a parent
|
|
<code class="classname">Step</code> that is not a complete
|
|
<code class="classname">Step</code> configuration. If, for instance, the
|
|
reader, writer, and tasklet attributes are left off of a
|
|
<code class="classname">Step</code> configuration, then initialization will
|
|
fail. If a parent must be defined without these properties, then the
|
|
"abstract" attribute should be used. An "abstract"
|
|
<code class="classname">Step</code> will not be instantiated; it is used only
|
|
for extending.</p><p>In the following example, the <code class="classname">Step</code>
|
|
"abstractParentStep" would not instantiate if it were not declared to
|
|
be abstract. The <code class="classname">Step</code> "concreteStep2" will have
|
|
'itemReader', 'itemWriter', and commitInterval=10.</p><pre class="programlisting"><span class="hl-tag"><step</span> <span class="hl-attribute">id</span>=<span class="hl-value">"abstractParentStep"</span> <span class="hl-attribute">abstract</span>=<span class="hl-value">"true"</span><span class="hl-tag">></span>
|
|
<span class="hl-tag"><tasklet></span>
|
|
<span class="hl-tag"><chunk</span> <span class="hl-attribute">commit-interval</span>=<span class="hl-value">"10"</span><span class="hl-tag">/></span>
|
|
<span class="hl-tag"></tasklet></span>
|
|
<span class="hl-tag"></step></span>
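
<span class="hl-comment"><!-- Illustrative sketch, not part of the original example: several steps inside a
     job can extend the same abstract parent. The job id, the step ids, and the
     reader/writer bean names below are hypothetical. Each step in the job declares
     its own id and inherits commit-interval="10" from abstractParentStep. --></span>
<span class="hl-tag"><job</span> <span class="hl-attribute">id</span>=<span class="hl-value">"abstractParentSketchJob"</span><span class="hl-tag">></span>
    <span class="hl-tag"><step</span> <span class="hl-attribute">id</span>=<span class="hl-value">"loadCustomers"</span> <span class="hl-attribute">parent</span>=<span class="hl-value">"abstractParentStep"</span> <span class="hl-attribute">next</span>=<span class="hl-value">"loadOrders"</span><span class="hl-tag">></span>
        <span class="hl-tag"><tasklet></span>
            <span class="hl-tag"><chunk</span> <span class="hl-attribute">reader</span>=<span class="hl-value">"customerReader"</span> <span class="hl-attribute">writer</span>=<span class="hl-value">"customerWriter"</span><span class="hl-tag">/></span>
        <span class="hl-tag"></tasklet></span>
    <span class="hl-tag"></step></span>
    <span class="hl-tag"><step</span> <span class="hl-attribute">id</span>=<span class="hl-value">"loadOrders"</span> <span class="hl-attribute">parent</span>=<span class="hl-value">"abstractParentStep"</span><span class="hl-tag">></span>
        <span class="hl-tag"><tasklet></span>
            <span class="hl-tag"><chunk</span> <span class="hl-attribute">reader</span>=<span class="hl-value">"orderReader"</span> <span class="hl-attribute">writer</span>=<span class="hl-value">"orderWriter"</span><span class="hl-tag">/></span>
        <span class="hl-tag"></tasklet></span>
    <span class="hl-tag"></step></span>
<span class="hl-tag"></job></span>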
|
|
|
|
<span class="hl-tag"><step</span> <span class="hl-attribute">id</span>=<span class="hl-value">"concreteStep2"</span> <span class="hl-attribute">parent</span>=<span class="hl-value">"abstractParentStep"</span><span class="hl-tag">></span>
|
|
<span class="hl-tag"><tasklet></span>
|
|
<span class="hl-tag"><chunk</span> <span class="hl-attribute">reader</span>=<span class="hl-value">"itemReader"</span> <span class="hl-attribute">writer</span>=<span class="hl-value">"itemWriter"</span><span class="hl-tag">/></span>
|
|
<span class="hl-tag"></tasklet></span>
|
|
<span class="hl-tag"></step></span></pre></div><div class="section"><div class="titlepage"><div><div><h4 class="title"><a name="mergingListsOnStep" href="#mergingListsOnStep"></a>Merging Lists</h4></div></div></div><p>Some of the configurable elements on
|
|
<code class="classname">Step</code>s are lists; the <listeners/>
|
|
element, for instance. If both the parent and child
|
|
<code class="classname">Step</code>s declare a <listeners/> element,
|
|
then the child's list will override the parent's. In order to allow a
|
|
child to add additional listeners to the list defined by the parent,
|
|
every list element has a "merge" attribute. If the element specifies
|
|
that merge="true", then the child's list will be combined with the
|
|
parent's instead of overriding it.</p><p>In the following example, the <code class="classname">Step</code>
"concreteStep3" will be created with two listeners:
|
|
<code class="classname">listenerOne</code> and
|
|
<code class="classname">listenerTwo</code>:</p><pre class="programlisting"><span class="hl-tag"><step</span> <span class="hl-attribute">id</span>=<span class="hl-value">"listenersParentStep"</span> <span class="hl-attribute">abstract</span>=<span class="hl-value">"true"</span><span class="hl-tag">></span>
|
|
<span class="hl-tag"><listeners></span>
|
|
<span class="hl-tag"><listener</span> <span class="hl-attribute">ref</span>=<span class="hl-value">"listenerOne"</span><span class="hl-tag">/></span>
|
|
<span class="hl-tag"></listeners></span>
|
|
<span class="hl-tag"></step></span>
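
<span class="hl-comment"><!-- Illustrative contrast, not part of the original example: "overridingStep" is a
     hypothetical step that omits merge="true", so its listeners element replaces the
     parent's list and only listenerTwo is registered. --></span>
<span class="hl-tag"><step</span> <span class="hl-attribute">id</span>=<span class="hl-value">"overridingStep"</span> <span class="hl-attribute">parent</span>=<span class="hl-value">"listenersParentStep"</span><span class="hl-tag">></span>
    <span class="hl-tag"><tasklet></span>
        <span class="hl-tag"><chunk</span> <span class="hl-attribute">reader</span>=<span class="hl-value">"itemReader"</span> <span class="hl-attribute">writer</span>=<span class="hl-value">"itemWriter"</span> <span class="hl-attribute">commit-interval</span>=<span class="hl-value">"5"</span><span class="hl-tag">/></span>
    <span class="hl-tag"></tasklet></span>
    <span class="hl-tag"><listeners></span>
        <span class="hl-tag"><listener</span> <span class="hl-attribute">ref</span>=<span class="hl-value">"listenerTwo"</span><span class="hl-tag">/></span>
    <span class="hl-tag"></listeners></span>
<span class="hl-tag"></step></span>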
|
|
|
|
<span class="hl-tag"><step</span> <span class="hl-attribute">id</span>=<span class="hl-value">"concreteStep3"</span> <span class="hl-attribute">parent</span>=<span class="hl-value">"listenersParentStep"</span><span class="hl-tag">></span>
|
|
<span class="hl-tag"><tasklet></span>
|
|
<span class="hl-tag"><chunk</span> <span class="hl-attribute">reader</span>=<span class="hl-value">"itemReader"</span> <span class="hl-attribute">writer</span>=<span class="hl-value">"itemWriter"</span> <span class="hl-attribute">commit-interval</span>=<span class="hl-value">"5"</span><span class="hl-tag">/></span>
|
|
<span class="hl-tag"></tasklet></span>
|
|
<span class="hl-tag"><listeners</span> <span class="hl-attribute">merge</span>=<span class="hl-value">"true"</span><span class="hl-tag">></span>
|
|
<span class="hl-tag"><listener</span> <span class="hl-attribute">ref</span>=<span class="hl-value">"listenerTwo"</span><span class="hl-tag">/></span>
|
|
<span class="hl-tag"></listeners></span>
|
|
<span class="hl-tag"></step></span></pre></div></div><div class="section"><div class="titlepage"><div><div><h3 class="title"><a name="commitInterval" href="#commitInterval"></a>5.1.3 The Commit Interval</h3></div></div></div><p>As mentioned above, a step reads in and writes out items,
|
|
periodically committing using the supplied
|
|
<code class="classname">PlatformTransactionManager</code>. With a
|
|
commit-interval of 1, it will commit after writing each individual item.
|
|
This is less than ideal in many situations, since beginning and
|
|
committing a transaction is expensive. Ideally, it is preferable to
|
|
process as many items as possible in each transaction, which is
|
|
completely dependent upon the type of data being processed and the
|
|
resources with which the step is interacting. For this reason, the
|
|
number of items that are processed within a commit can be
|
|
configured.</p><pre class="programlisting"><span class="hl-tag"><job</span> <span class="hl-attribute">id</span>=<span class="hl-value">"sampleJob"</span><span class="hl-tag">></span>
|
|
<span class="hl-tag"><step</span> <span class="hl-attribute">id</span>=<span class="hl-value">"step1"</span><span class="hl-tag">></span>
|
|
<span class="hl-tag"><tasklet></span>
|
|
<span class="hl-tag"><chunk</span> <span class="hl-attribute">reader</span>=<span class="hl-value">"itemReader"</span> <span class="hl-attribute">writer</span>=<span class="hl-value">"itemWriter"</span> <span class="bold"><strong>commit-interval="10"</strong></span>/>
|
|
<span class="hl-tag"></tasklet></span>
|
|
<span class="hl-tag"></step></span>
|
|
<span class="hl-tag"></job></span></pre><p>In the example above, 10 items will be processed within each
|
|
transaction. At the beginning of processing, a transaction is begun, and
|
|
each time <span class="markup">read</span> is called on the
|
|
<code class="classname">ItemReader</code>, a counter is incremented. When it
|
|
reaches 10, the list of aggregated items is passed to the
|
|
<code class="classname">ItemWriter</code>, and the transaction will be
|
|
committed.</p></div><div class="section"><div class="titlepage"><div><div><h3 class="title"><a name="stepRestart" href="#stepRestart"></a>5.1.4 Configuring a Step for Restart</h3></div></div></div><p>In <a class="xref" href="#configureJob" title="4. Configuring and Running a Job">Chapter 4, <i>Configuring and Running a Job</i></a>, restarting a
|
|
<code class="classname">Job</code> was discussed. Restart has numerous impacts
|
|
on steps, and as such may require some specific configuration.</p><div class="section"><div class="titlepage"><div><div><h4 class="title"><a name="startLimit" href="#startLimit"></a>Setting a StartLimit</h4></div></div></div><p>There are many scenarios where you may want to control the
|
|
number of times a <code class="classname">Step</code> may be started. For
|
|
example, a particular <code class="classname">Step</code> might need to be
|
|
configured so that it only runs once because it invalidates some
|
|
resource that must be fixed manually before it can be run again. This
|
|
is configurable on the step level, since different steps may have
|
|
different requirements. A <code class="classname">Step</code> that may only be
|
|
executed once can exist as part of the same <code class="classname">Job</code>
|
|
as a <code class="classname">Step</code> that can be run infinitely. Below is
|
|
an example start limit configuration:</p><pre class="programlisting"><span class="hl-tag"><step</span> <span class="hl-attribute">id</span>=<span class="hl-value">"step1"</span><span class="hl-tag">></span>
|
|
<span class="hl-tag"><tasklet</span> <span class="hl-attribute">start-limit</span>=<span class="hl-value">"1"</span><span class="hl-tag">></span>
|
|
<span class="hl-tag"><chunk</span> <span class="hl-attribute">reader</span>=<span class="hl-value">"itemReader"</span> <span class="hl-attribute">writer</span>=<span class="hl-value">"itemWriter"</span> <span class="hl-attribute">commit-interval</span>=<span class="hl-value">"10"</span><span class="hl-tag">/></span>
|
|
<span class="hl-tag"></tasklet></span>
|
|
<span class="hl-tag"></step></span></pre><p>The simple step above can be run only once. Attempting to run it
|
|
again will cause a <code class="classname">StartLimitExceededException</code> to be thrown. It should be noted that
|
|
the default value for the start-limit is
|
|
<code class="classname">Integer.MAX_VALUE</code>.</p></div><div class="section"><div class="titlepage"><div><div><h4 class="title"><a name="allowStartIfComplete" href="#allowStartIfComplete"></a>Restarting a completed step</h4></div></div></div><p>In the case of a restartable job, there may be one or more steps
|
|
that should always be run, regardless of whether or not they were
|
|
successful the first time. An example might be a validation step, or a
|
|
<code class="classname">Step</code> that cleans up resources before
|
|
processing. During normal processing of a restarted job, any step with
|
|
a status of 'COMPLETED', meaning it has already been completed
|
|
successfully, will be skipped. Setting allow-start-if-complete to
|
|
"true" overrides this so that the step will always run:</p><pre class="programlisting"><span class="hl-tag"><step</span> <span class="hl-attribute">id</span>=<span class="hl-value">"step1"</span><span class="hl-tag">></span>
|
|
<span class="hl-tag"><tasklet</span> <span class="hl-attribute">allow-start-if-complete</span>=<span class="hl-value">"true"</span><span class="hl-tag">></span>
|
|
<span class="hl-tag"><chunk</span> <span class="hl-attribute">reader</span>=<span class="hl-value">"itemReader"</span> <span class="hl-attribute">writer</span>=<span class="hl-value">"itemWriter"</span> <span class="hl-attribute">commit-interval</span>=<span class="hl-value">"10"</span><span class="hl-tag">/></span>
|
|
<span class="hl-tag"></tasklet></span>
|
|
<span class="hl-tag"></step></span></pre></div><div class="section"><div class="titlepage"><div><div><h4 class="title"><a name="stepRestartExample" href="#stepRestartExample"></a>Step Restart Configuration Example</h4></div></div></div><pre class="programlisting"><span class="hl-tag"><job</span> <span class="hl-attribute">id</span>=<span class="hl-value">"footballJob"</span> <span class="hl-attribute">restartable</span>=<span class="hl-value">"true"</span><span class="hl-tag">></span>
|
|
<span class="hl-tag"><step</span> <span class="hl-attribute">id</span>=<span class="hl-value">"playerLoad"</span> <span class="hl-attribute">next</span>=<span class="hl-value">"gameLoad"</span><span class="hl-tag">></span>
|
|
<span class="hl-tag"><tasklet></span>
|
|
<span class="hl-tag"><chunk</span> <span class="hl-attribute">reader</span>=<span class="hl-value">"playerFileItemReader"</span> <span class="hl-attribute">writer</span>=<span class="hl-value">"playerWriter"</span>
|
|
<span class="hl-attribute">commit-interval</span>=<span class="hl-value">"10"</span><span class="hl-tag"> /></span>
|
|
<span class="hl-tag"></tasklet></span>
|
|
<span class="hl-tag"></step></span>
|
|
<span class="hl-tag"><step</span> <span class="hl-attribute">id</span>=<span class="hl-value">"gameLoad"</span> <span class="hl-attribute">next</span>=<span class="hl-value">"playerSummarization"</span><span class="hl-tag">></span>
|
|
<span class="hl-tag"><tasklet</span> <span class="hl-attribute">allow-start-if-complete</span>=<span class="hl-value">"true"</span><span class="hl-tag">></span>
|
|
<span class="hl-tag"><chunk</span> <span class="hl-attribute">reader</span>=<span class="hl-value">"gameFileItemReader"</span> <span class="hl-attribute">writer</span>=<span class="hl-value">"gameWriter"</span>
|
|
<span class="hl-attribute">commit-interval</span>=<span class="hl-value">"10"</span><span class="hl-tag">/></span>
|
|
<span class="hl-tag"></tasklet></span>
|
|
<span class="hl-tag"></step></span>
|
|
<span class="hl-tag"><step</span> <span class="hl-attribute">id</span>=<span class="hl-value">"playerSummarization"</span><span class="hl-tag">></span>
|
|
<span class="hl-tag"><tasklet</span> <span class="hl-attribute">start-limit</span>=<span class="hl-value">"3"</span><span class="hl-tag">></span>
|
|
<span class="hl-tag"><chunk</span> <span class="hl-attribute">reader</span>=<span class="hl-value">"playerSummarizationSource"</span> <span class="hl-attribute">writer</span>=<span class="hl-value">"summaryWriter"</span>
|
|
<span class="hl-attribute">commit-interval</span>=<span class="hl-value">"10"</span><span class="hl-tag">/></span>
|
|
<span class="hl-tag"></tasklet></span>
|
|
<span class="hl-tag"></step></span>
|
|
<span class="hl-tag"></job></span></pre><p>The above example configuration is for a job that loads in
|
|
information about football games and summarizes them. It contains
|
|
three steps: playerLoad, gameLoad, and playerSummarization. The
|
|
playerLoad <code class="classname">Step</code> loads player information from a
|
|
flat file, while the gameLoad <code class="classname">Step</code> does the
|
|
same for games. The final <code class="classname">Step</code>,
|
|
playerSummarization, then summarizes the statistics for each player
|
|
based upon the provided games. It is assumed that the file loaded by
|
|
'playerLoad' must be loaded only once, but that 'gameLoad' will load
|
|
any games found within a particular directory, deleting them after
|
|
they have been successfully loaded into the database. As a result, the
|
|
playerLoad <code class="classname">Step</code> contains no additional
|
|
configuration. It can be started almost limitlessly, and if complete
|
|
will be skipped. The 'gameLoad' <code class="classname">Step</code>, however,
|
|
needs to be run every time in case extra files have been dropped since
|
|
it last executed. It has 'allow-start-if-complete' set to 'true' in
|
|
order to always be started. (It is assumed that the database table
that games are loaded into has a process indicator on it, to ensure new
games can be properly found by the summarization step.) The
|
|
summarization <code class="classname">Step</code>, which is the most important
|
|
in the <code class="classname">Job</code>, is configured to have a start limit
|
|
of 3. This is useful because if the step continually fails, a new exit
|
|
code will be returned to the operators that control job execution, and
|
|
it won't be allowed to start again until manual intervention has taken
|
|
place.</p><div class="note" style="margin-left: 0.5in; margin-right: 0.5in;"><table border="0" summary="Note"><tr><td rowspan="2" align="center" valign="top" width="25"><img alt="[Note]" src="images/note.png"></td><th align="left">Note</th></tr><tr><td align="left" valign="top"><p>This job is purely for example purposes and is not the same as
|
|
the footballJob found in the samples project.</p></td></tr></table></div><p>Run 1:</p><div class="orderedlist"><ol class="orderedlist" type="1"><li class="listitem"><p>playerLoad is executed and completes successfully, adding
|
|
400 players to the 'PLAYERS' table.</p></li><li class="listitem"><p>gameLoad is executed and processes 11 files' worth of game
|
|
data, loading their contents into the 'GAMES' table.</p></li><li class="listitem"><p>playerSummarization begins processing and fails after 5
|
|
minutes.</p></li></ol></div><p>Run 2:</p><div class="orderedlist"><ol class="orderedlist" type="1"><li class="listitem"><p>playerLoad is not run, since it has already completed
|
|
successfully, and allow-start-if-complete is 'false' (the
|
|
default).</p></li><li class="listitem"><p>gameLoad is executed again and processes another 2 files,
|
|
loading their contents into the 'GAMES' table as well (with a
|
|
process indicator indicating they have yet to be processed).</p></li><li class="listitem"><p>playerSummarization begins processing of all remaining game
|
|
data (filtering using the process indicator) and fails again after
|
|
30 minutes.</p></li></ol></div><p>Run 3:</p><div class="orderedlist"><ol class="orderedlist" type="1"><li class="listitem"><p>playerLoad is not run, since it has already completed
|
|
successfully, and allow-start-if-complete is 'false' (the
|
|
default).</p></li><li class="listitem"><p>gameLoad is executed again and processes another 2 files,
|
|
loading their contents into the 'GAMES' table as well (with a
|
|
process indicator indicating they have yet to be processed).</p></li><li class="listitem"><p>playerSummarization is started for the third time, which is the
last start permitted by its start limit of 3. If it fails again, the next run does not
start it and the job fails immediately. At that point, the limit must either be raised, or the
|
|
<code class="classname">Job</code> must be executed as a new
|
|
<code class="classname">JobInstance</code>.</p></li></ol></div></div></div><div class="section"><div class="titlepage"><div><div><h3 class="title"><a name="configuringSkip" href="#configuringSkip"></a>5.1.5 Configuring Skip Logic</h3></div></div></div><p>There are many scenarios where errors encountered while processing
|
|
should not result in <code class="classname">Step</code> failure, but should be
|
|
skipped instead. This is usually a decision that must be made by someone
|
|
who understands the data itself and what meaning it has. Financial data,
|
|
for example, may not be skippable because it results in money being
|
|
transferred, which needs to be completely accurate. Loading a list of
|
|
vendors, on the other hand, might allow for skips. If a vendor is not
|
|
loaded because it was formatted incorrectly or was missing necessary
|
|
information, then there probably won't be issues. Usually these bad
|
|
records are logged as well, which will be covered later when discussing
|
|
listeners.
|
|
</p><pre class="programlisting"><span class="hl-tag"><step</span> <span class="hl-attribute">id</span>=<span class="hl-value">"step1"</span><span class="hl-tag">></span>
|
|
<span class="hl-tag"><tasklet></span>
|
|
<span class="hl-tag"><chunk</span> <span class="hl-attribute">reader</span>=<span class="hl-value">"flatFileItemReader"</span> <span class="hl-attribute">writer</span>=<span class="hl-value">"itemWriter"</span>
|
|
<span class="hl-attribute">commit-interval</span>=<span class="hl-value">"10"</span> <span class="bold"><strong>skip-limit="10"</strong></span>>
|
|
<span class="bold"><strong><skippable-exception-classes>
|
|
<include class="org.springframework.batch.item.file.FlatFileParseException"/>
|
|
</skippable-exception-classes></strong></span>
|
|
<span class="hl-tag"></chunk></span>
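        <span class="hl-comment"><!-- Hypothetical addition, not part of the original example: "skipLoggingListener"
             is an assumed bean implementing SkipListener that logs each skipped record. --></span>
        <span class="hl-tag"><listeners></span>
            <span class="hl-tag"><listener</span> <span class="hl-attribute">ref</span>=<span class="hl-value">"skipLoggingListener"</span><span class="hl-tag">/></span>
        <span class="hl-tag"></listeners></span>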
|
|
<span class="hl-tag"></tasklet></span>
|
|
<span class="hl-tag"></step></span></pre><p>In this example, a <code class="classname">FlatFileItemReader</code> is
|
|
used, and if at any point a
|
|
<code class="classname">FlatFileParseException</code> is thrown, it will be
|
|
skipped and counted against the total skip limit of 10. Separate counts
|
|
are made of skips on read, process and write inside the step execution,
|
|
and the limit applies across all. Once the skip limit is reached, the
|
|
next exception found will cause the step to fail.</p><p>One problem with the example above is that any other exception
|
|
besides a <code class="classname">FlatFileParseException</code> will cause the
|
|
<code class="classname">Job</code> to fail. In certain scenarios this may be the
|
|
correct behavior. However, in other scenarios it may be easier to
|
|
identify which exceptions should cause failure and skip everything
|
|
else:
|
|
</p><pre class="programlisting"><span class="hl-tag"><step</span> <span class="hl-attribute">id</span>=<span class="hl-value">"step1"</span><span class="hl-tag">></span>
|
|
<span class="hl-tag"><tasklet></span>
|
|
<span class="hl-tag"><chunk</span> <span class="hl-attribute">reader</span>=<span class="hl-value">"flatFileItemReader"</span> <span class="hl-attribute">writer</span>=<span class="hl-value">"itemWriter"</span>
|
|
<span class="hl-attribute">commit-interval</span>=<span class="hl-value">"10"</span> <span class="bold"><strong>skip-limit="10"</strong></span>>
|
|
<span class="bold"><strong> <skippable-exception-classes>
|
|
<include class="java.lang.Exception"/>
|
|
<exclude class="java.io.FileNotFoundException"/>
|
|
</skippable-exception-classes>
|
|
</strong></span> <span class="hl-tag"></chunk></span>
|
|
<span class="hl-tag"></tasklet></span>
|
|
<span class="hl-tag"></step></span></pre><p>By 'including' <code class="classname">java.lang.Exception</code> as a
|
|
skippable exception class, the configuration indicates that all
|
|
<code class="classname">Exception</code>s are skippable. However, by 'excluding'
|
|
<code class="classname">java.io.FileNotFoundException</code>, the configuration
|
|
refines the list of skippable exception classes to be all
|
|
<code class="classname">Exception</code>s <span class="emphasis"><em>except</em></span>
|
|
<code class="classname">FileNotFoundException</code>. Any excluded exception
|
|
classes will be fatal if encountered (i.e. not skipped).</p><p>For any exception encountered, the skippability will be determined
|
|
by the nearest superclass in the class hierarchy. Any unclassified
|
|
exception will be treated as 'fatal'. The order of the
|
|
<code class="code"><include/></code> and <code class="code"><exclude/></code> elements
|
|
does not matter.</p></div><div class="section"><div class="titlepage"><div><div><h3 class="title"><a name="retryLogic" href="#retryLogic"></a>5.1.6 Configuring Retry Logic</h3></div></div></div><p>In most cases you want an exception to cause either a skip or
|
|
<code class="classname">Step</code> failure. However, not all exceptions are
|
|
deterministic. If a <code class="classname">FlatFileParseException</code> is
|
|
encountered while reading, it will always be thrown for that record;
|
|
resetting the <code class="classname">ItemReader</code> will not help. However,
|
|
for other exceptions, such as a
|
|
<code class="classname">DeadlockLoserDataAccessException</code>, which indicates
|
|
that the current process has attempted to update a record that another
|
|
process holds a lock on, waiting and trying again might result in
|
|
success. In this case, retry should be configured:</p><pre class="programlisting"><span class="hl-tag"><step</span> <span class="hl-attribute">id</span>=<span class="hl-value">"step1"</span><span class="hl-tag">></span>
|
|
<span class="hl-tag"><tasklet></span>
|
|
<span class="hl-tag"><chunk</span> <span class="hl-attribute">reader</span>=<span class="hl-value">"itemReader"</span> <span class="hl-attribute">writer</span>=<span class="hl-value">"itemWriter"</span>
|
|
<span class="hl-attribute">commit-interval</span>=<span class="hl-value">"2"</span> <span class="bold"><strong>retry-limit="3"</strong></span>>
|
|
<span class="bold"><strong><retryable-exception-classes>
|
|
<include class="org.springframework.dao.DeadlockLoserDataAccessException"/>
|
|
</retryable-exception-classes></strong></span>
|
|
<span class="hl-tag"></chunk></span>
|
|
<span class="hl-tag"></tasklet></span>
|
|
<span class="hl-tag"></step></span></pre><p>The <code class="classname">Step</code> allows a limit for the number of
|
|
times an individual item can be retried, and a list of exceptions that
|
|
are 'retryable'. More details on how retry works can be found in <a class="xref" href="#retry" title="9. Retry">Chapter 9, <i>Retry</i></a>.</p></div><div class="section"><div class="titlepage"><div><div><h3 class="title"><a name="controllingRollback" href="#controllingRollback"></a>5.1.7 Controlling Rollback</h3></div></div></div><p>By default, regardless of retry or skip, any exceptions thrown
|
|
from the <code class="classname">ItemWriter</code> will cause the transaction
|
|
controlled by the <code class="classname">Step</code> to roll back. If skip is
|
|
configured as described above, exceptions thrown from the
|
|
<code class="classname">ItemReader</code> will not cause a rollback. However,
|
|
there are many scenarios in which exceptions thrown from the
|
|
<code class="classname">ItemWriter</code> should not cause a rollback because no
|
|
action has taken place to invalidate the transaction. For this reason,
|
|
the <code class="classname">Step</code> can be configured with a list of
|
|
exceptions that should not cause rollback.</p><pre class="programlisting"><span class="hl-tag"><step</span> <span class="hl-attribute">id</span>=<span class="hl-value">"step1"</span><span class="hl-tag">></span>
|
|
<span class="hl-tag"><tasklet></span>
|
|
<span class="hl-tag"><chunk</span> <span class="hl-attribute">reader</span>=<span class="hl-value">"itemReader"</span> <span class="hl-attribute">writer</span>=<span class="hl-value">"itemWriter"</span> <span class="hl-attribute">commit-interval</span>=<span class="hl-value">"2"</span><span class="hl-tag">/></span>
|
|
<span class="hl-tag"><no-rollback-exception-classes></span>
|
|
<span class="hl-tag"><include</span> <span class="hl-attribute">class</span>=<span class="hl-value">"org.springframework.batch.item.validator.ValidationException"</span><span class="hl-tag">/></span>
|
|
<span class="hl-tag"></no-rollback-exception-classes></span>
|
|
<span class="hl-tag"></tasklet></span>
|
|
<span class="hl-tag"></step></span></pre><div class="section"><div class="titlepage"><div><div><h4 class="title"><a name="transactionalReaders" href="#transactionalReaders"></a>Transactional Readers</h4></div></div></div><p>The basic contract of the <code class="classname">ItemReader</code> is
|
|
that it is forward only. The step buffers reader input, so that in the
|
|
case of a rollback the items don't need to be re-read from the reader.
|
|
However, there are certain scenarios in which the reader is built on
|
|
top of a transactional resource, such as a JMS queue. In this case,
|
|
since the queue is tied to the transaction that is rolled back, the
|
|
messages that have been pulled from the queue will be put back on. For
|
|
this reason, the step can be configured to not buffer the
|
|
items:</p><pre class="programlisting"><span class="hl-tag"><step</span> <span class="hl-attribute">id</span>=<span class="hl-value">"step1"</span><span class="hl-tag">></span>
|
|
<span class="hl-tag"><tasklet></span>
|
|
<span class="hl-tag"><chunk</span> <span class="hl-attribute">reader</span>=<span class="hl-value">"itemReader"</span> <span class="hl-attribute">writer</span>=<span class="hl-value">"itemWriter"</span> <span class="hl-attribute">commit-interval</span>=<span class="hl-value">"2"</span>
|
|
<span class="bold"><strong> is-reader-transactional-queue="true"</strong></span>/>
|
|
<span class="hl-tag"></tasklet></span>
|
|
<span class="hl-tag"></step></span></pre></div></div><div class="section"><div class="titlepage"><div><div><h3 class="title"><a name="transactionAttributes" href="#transactionAttributes"></a>5.1.8 Transaction Attributes</h3></div></div></div><p>Transaction attributes can be used to control the isolation,
|
|
propagation, and timeout settings. More information on setting
|
|
transaction attributes can be found in the Spring core
|
|
documentation.</p><pre class="programlisting"><span class="hl-tag"><step</span> <span class="hl-attribute">id</span>=<span class="hl-value">"step1"</span><span class="hl-tag">></span>
|
|
<span class="hl-tag"><tasklet></span>
|
|
<span class="hl-tag"><chunk</span> <span class="hl-attribute">reader</span>=<span class="hl-value">"itemReader"</span> <span class="hl-attribute">writer</span>=<span class="hl-value">"itemWriter"</span> <span class="hl-attribute">commit-interval</span>=<span class="hl-value">"2"</span><span class="hl-tag">/></span>
|
|
<span class="hl-tag"><transaction-attributes</span> <span class="hl-attribute">isolation</span>=<span class="hl-value">"DEFAULT"</span>
|
|
<span class="hl-attribute">propagation</span>=<span class="hl-value">"REQUIRED"</span>
|
|
<span class="hl-attribute">timeout</span>=<span class="hl-value">"30"</span><span class="hl-tag">/></span>
|
|
<span class="hl-tag"></tasklet></span>
|
|
<span class="hl-tag"></step></span></pre></div><div class="section"><div class="titlepage"><div><div><h3 class="title"><a name="registeringItemStreams" href="#registeringItemStreams"></a>5.1.9 Registering ItemStreams with the Step</h3></div></div></div><p>The step has to take care of <code class="classname">ItemStream</code>
|
|
callbacks at the necessary points in its lifecycle. (For more
information on the <code class="classname">ItemStream</code> interface, please
refer to <a class="xref" href="#itemStream" title="6.4 ItemStream">Section 6.4, “ItemStream”</a>.) This is vital if a step fails,
|
|
and might need to be restarted, because the
|
|
<code class="classname">ItemStream</code> interface is where the step gets the
|
|
information it needs about persistent state between executions.</p><p>If the <code class="classname">ItemReader</code>,
|
|
<code class="classname">ItemProcessor</code>, or
|
|
<code class="classname">ItemWriter</code> itself implements the
|
|
<code class="classname">ItemStream</code> interface, then these will be
|
|
registered automatically. Any other streams need to be registered
|
|
separately. This is often the case where there are indirect dependencies
|
|
such as delegates being injected into the reader and writer. A stream
|
|
can be registered on the <code class="classname">Step</code> through the
|
|
'streams' element, as illustrated below:</p><pre class="programlisting"><span class="hl-tag"><step</span> <span class="hl-attribute">id</span>=<span class="hl-value">"step1"</span><span class="hl-tag">></span>
|
|
<span class="hl-tag"><tasklet></span>
|
|
<span class="hl-tag"><chunk</span> <span class="hl-attribute">reader</span>=<span class="hl-value">"itemReader"</span> <span class="hl-attribute">writer</span>=<span class="hl-value">"compositeWriter"</span> <span class="hl-attribute">commit-interval</span>=<span class="hl-value">"2"</span><span class="hl-tag">></span>
|
|
<span class="bold"><strong><streams>
|
|
<stream ref="fileItemWriter1"/>
|
|
<stream ref="fileItemWriter2"/>
|
|
</streams></strong></span>
|
|
<span class="hl-tag"></chunk></span>
|
|
<span class="hl-tag"></tasklet></span>
|
|
<span class="hl-tag"></step></span>
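
<span class="hl-comment"><!-- Illustrative sketch, not part of the original example: one possible definition of
     a delegate. The resource path is hypothetical, and fileItemWriter2 would be defined
     the same way. FlatFileItemWriter implements ItemStream, which is why the delegate
     has to be registered in the streams element above. --></span>
<span class="hl-tag"><beans:bean</span> <span class="hl-attribute">id</span>=<span class="hl-value">"fileItemWriter1"</span>
            <span class="hl-attribute">class</span>=<span class="hl-value">"org.springframework.batch.item.file.FlatFileItemWriter"</span><span class="hl-tag">></span>
    <span class="hl-tag"><beans:property</span> <span class="hl-attribute">name</span>=<span class="hl-value">"resource"</span> <span class="hl-attribute">value</span>=<span class="hl-value">"file:target/output/items1.txt"</span><span class="hl-tag">/></span>
    <span class="hl-tag"><beans:property</span> <span class="hl-attribute">name</span>=<span class="hl-value">"lineAggregator"</span><span class="hl-tag">></span>
        <span class="hl-tag"><beans:bean</span> <span class="hl-attribute">class</span>=<span class="hl-value">"org.springframework.batch.item.file.transform.PassThroughLineAggregator"</span><span class="hl-tag">/></span>
    <span class="hl-tag"></beans:property></span>
<span class="hl-tag"></beans:bean></span>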
|
|
|
|
<span class="hl-tag"><beans:bean</span> <span class="hl-attribute">id</span>=<span class="hl-value">"compositeWriter"</span>
|
|
<span class="hl-attribute">class</span>=<span class="hl-value">"org.springframework.batch.item.support.CompositeItemWriter"</span><span class="hl-tag">></span>
|
|
<span class="hl-tag"><beans:property</span> <span class="hl-attribute">name</span>=<span class="hl-value">"delegates"</span><span class="hl-tag">></span>
|
|
<span class="hl-tag"><beans:list></span>
|
|
<span class="hl-tag"><beans:ref</span> <span class="hl-attribute">bean</span>=<span class="hl-value">"fileItemWriter1"</span><span class="hl-tag"> /></span>
|
|
<span class="hl-tag"><beans:ref</span> <span class="hl-attribute">bean</span>=<span class="hl-value">"fileItemWriter2"</span><span class="hl-tag"> /></span>
|
|
<span class="hl-tag"></beans:list></span>
|
|
<span class="hl-tag"></beans:property></span>
|
|
<span class="hl-tag"></beans:bean></span></pre><p>In the example above, the
|
|
<code class="classname">CompositeItemWriter</code> is not an
|
|
<code class="classname">ItemStream</code>, but both of its delegates are.
|
|
Therefore, both delegate writers must be explicitly registered as
|
|
streams in order for the framework to handle them correctly. The
|
|
<code class="classname">ItemReader</code> does not need to be explicitly
|
|
registered as a stream because it is a direct property of the
|
|
<code class="classname">Step</code>. The step will now be restartable and the
|
|
state of the reader and writer will be correctly persisted in the event
|
|
of a failure.</p></div><div class="section"><div class="titlepage"><div><div><h3 class="title"><a name="interceptingStepExecution" href="#interceptingStepExecution"></a>5.1.10 Intercepting Step Execution</h3></div></div></div><p>Just as with the <code class="classname">Job</code>, there are many events
|
|
during the execution of a <code class="classname">Step</code> where a user may
|
|
need to perform some functionality. For example, in order to write out
|
|
to a flat file that requires a footer, the
|
|
<code class="classname">ItemWriter</code> needs to be notified when the
|
|
<code class="classname">Step</code> has been completed, so that the footer can
|
|
be written. This can be accomplished with one of many
|
|
<code class="classname">Step</code> scoped listeners.</p><p>Any class that implements one of the extensions
|
|
of <code class="classname">StepListener</code> (but not that interface
|
|
itself since it is empty) can be applied to a step via the
|
|
listeners element. The listeners element is valid inside a
|
|
step, tasklet or chunk declaration. It is recommended that you
|
|
declare the listener at the level at which its function applies,
|
|
or if it is multi-featured
|
|
(e.g. <code class="classname">StepExecutionListener</code>
|
|
and <code class="classname">ItemReadListener</code>) then declare it at
|
|
the most granular level that it applies (chunk in the example
|
|
given).</p><pre class="programlisting"><span class="hl-tag"><step</span> <span class="hl-attribute">id</span>=<span class="hl-value">"step1"</span><span class="hl-tag">></span>
|
|
<span class="hl-tag"><tasklet></span>
|
|
<span class="hl-tag"><chunk</span> <span class="hl-attribute">reader</span>=<span class="hl-value">"reader"</span> <span class="hl-attribute">writer</span>=<span class="hl-value">"writer"</span> <span class="hl-attribute">commit-interval</span>=<span class="hl-value">"10"</span><span class="hl-tag">/></span>
|
|
<span class="hl-tag"><listeners></span>
|
|
<span class="hl-tag"><listener</span> <span class="hl-attribute">ref</span>=<span class="hl-value">"chunkListener"</span><span class="hl-tag">/></span>
|
|
<span class="hl-tag"></listeners></span>
|
|
<span class="hl-tag"></tasklet></span>
|
|
<span class="hl-tag"></step></span></pre><p>An <code class="classname">ItemReader</code>,
|
|
<code class="classname">ItemWriter</code> or
|
|
<code class="classname">ItemProcessor</code> that itself implements one of the
|
|
<code class="classname">StepListener</code> interfaces will be registered
|
|
automatically with the <code class="classname">Step</code> if using the
|
|
namespace <code class="literal"><step></code> element, or one of the
|
|
<code class="classname">*StepFactoryBean</code> factories. This only applies to
|
|
components directly injected into the <code class="classname">Step</code>: if
|
|
the listener is nested inside another component, it needs to be
|
|
explicitly registered (as described above).</p><p>In addition to the <code class="classname">StepListener</code> interfaces,
|
|
annotations are provided to address the same concerns. Plain old Java
|
|
objects can have methods with these annotations that are then converted
|
|
into the corresponding <code class="classname">StepListener</code> type. It is
|
|
also common to annotate custom implementations of chunk components like
|
|
<code class="classname">ItemReader</code> or <code class="classname">ItemWriter</code>
|
|
or <code class="classname">Tasklet</code>. The annotations are analysed by the
|
|
XML parser for the <code class="code"><listener/></code> elements, so all you
|
|
need to do is use the XML namespace to register the listeners with a
|
|
step.</p><div class="section"><div class="titlepage"><div><div><h4 class="title"><a name="stepExecutionListener" href="#stepExecutionListener"></a>StepExecutionListener</h4></div></div></div><p><code class="classname">StepExecutionListener</code> represents the most
|
|
generic listener for <code class="classname">Step</code> execution. It allows
|
|
for notification before a <code class="classname">Step</code> is started and
|
|
after it ends, whether it ended normally or failed:</p><pre class="programlisting"><span class="hl-keyword">public</span> <span class="hl-keyword">interface</span> StepExecutionListener <span class="hl-keyword">extends</span> StepListener {
|
|
|
|
<span class="hl-keyword">void</span> beforeStep(StepExecution stepExecution);
|
|
|
|
ExitStatus afterStep(StepExecution stepExecution);
|
|
|
|
}</pre><p><code class="classname">ExitStatus</code> is the return type of
|
|
<code class="methodname">afterStep</code> in order to allow listeners the
|
|
chance to modify the exit code that is returned upon completion of a
|
|
<code class="classname">Step</code>.</p><p>The annotations corresponding to this interface are:</p><div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; "><li class="listitem"><p><code class="classname">@BeforeStep</code></p></li><li class="listitem"><p><code class="classname">@AfterStep</code></p></li></ul></div></div><div class="section"><div class="titlepage"><div><div><h4 class="title"><a name="chunkListener" href="#chunkListener"></a>ChunkListener</h4></div></div></div><p>A chunk is defined as the items processed within the scope of a
|
|
transaction. Committing a transaction, at each commit interval,
|
|
commits a 'chunk'. A <code class="classname">ChunkListener</code> can be
|
|
useful to perform logic before a chunk begins processing or after a
|
|
chunk has completed successfully:</p><pre class="programlisting"><span class="hl-keyword">public</span> <span class="hl-keyword">interface</span> ChunkListener <span class="hl-keyword">extends</span> StepListener {
|
|
|
|
<span class="hl-keyword">void</span> beforeChunk();
|
|
<span class="hl-keyword">void</span> afterChunk();
|
|
|
|
}</pre><p>The <code class="methodname">beforeChunk</code> method is called after
|
|
the transaction is started, but before <code class="methodname">read</code>
|
|
is called on the <code class="classname">ItemReader</code>. Conversely,
|
|
<code class="methodname">afterChunk</code> is called after the chunk has been
|
|
committed (and not at all if there is a rollback).</p><p>The annotations corresponding to this interface are:</p><div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; "><li class="listitem"><p><code class="classname">@BeforeChunk</code></p></li><li class="listitem"><p><code class="classname">@AfterChunk</code></p></li></ul></div><p>A <code class="classname">ChunkListener</code> can be applied
|
|
when there is no chunk declaration: it is
|
|
the <code class="classname">TaskletStep</code> that is responsible for
|
|
calling the <code class="classname">ChunkListener</code> so it applies
|
|
to a non-item-oriented tasklet as well (called before and
|
|
after the tasklet).</p></div><div class="section"><div class="titlepage"><div><div><h4 class="title"><a name="itemReadListener" href="#itemReadListener"></a>ItemReadListener</h4></div></div></div><p>When discussing skip logic above, it was mentioned that it may
|
|
be beneficial to log the skipped records, so that they can be dealt
|
|
with later. In the case of read errors, this can be done with an
|
|
<code class="classname">ItemReadListener</code>:
|
|
</p><pre class="programlisting"><span class="hl-keyword">public</span> <span class="hl-keyword">interface</span> ItemReadListener<T> <span class="hl-keyword">extends</span> StepListener {
|
|
|
|
<span class="hl-keyword">void</span> beforeRead();
|
|
<span class="hl-keyword">void</span> afterRead(T item);
|
|
<span class="hl-keyword">void</span> onReadError(Exception ex);
|
|
|
|
}</pre><p>The <code class="methodname">beforeRead</code> method will be called
|
|
before each call to <code class="methodname">read</code> on the
|
|
<code class="classname">ItemReader</code>. The
|
|
<code class="methodname">afterRead</code> method will be called after each
|
|
successful call to <code class="methodname">read</code>, and will be passed
|
|
the item that was read. If there was an error while reading, the
|
|
<code class="classname">onReadError</code> method will be called. The
|
|
exception encountered will be provided so that it can be
|
|
logged.</p><p>The annotations corresponding to this interface are:</p><div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; "><li class="listitem"><p><code class="classname">@BeforeRead</code></p></li><li class="listitem"><p><code class="classname">@AfterRead</code></p></li><li class="listitem"><p><code class="classname">@OnReadError</code></p></li></ul></div></div><div class="section"><div class="titlepage"><div><div><h4 class="title"><a name="itemProcessListener" href="#itemProcessListener"></a>ItemProcessListener</h4></div></div></div><p>Just as with the <code class="classname">ItemReadListener</code>, the
|
|
processing of an item can be 'listened' to:</p><pre class="programlisting"><span class="hl-keyword">public</span> <span class="hl-keyword">interface</span> ItemProcessListener<T, S> <span class="hl-keyword">extends</span> StepListener {
|
|
|
|
<span class="hl-keyword">void</span> beforeProcess(T item);
|
|
<span class="hl-keyword">void</span> afterProcess(T item, S result);
|
|
<span class="hl-keyword">void</span> onProcessError(T item, Exception e);
|
|
|
|
}</pre><p>The <code class="methodname">beforeProcess</code> method will be called
|
|
before <code class="methodname">process</code> on the
|
|
<code class="classname">ItemProcessor</code>, and is handed the item that will
|
|
be processed. The <code class="methodname">afterProcess</code> method will be
|
|
called after the item has been successfully processed. If there was an
|
|
error while processing, the <code class="methodname">onProcessError</code>
|
|
method will be called. The exception encountered and the item that was
|
|
attempted to be processed will be provided, so that they can be
|
|
logged.</p><p>The annotations corresponding to this interface are:</p><div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; "><li class="listitem"><p><code class="classname">@BeforeProcess</code></p></li><li class="listitem"><p><code class="classname">@AfterProcess</code></p></li><li class="listitem"><p><code class="classname">@OnProcessError</code></p></li></ul></div></div><div class="section"><div class="titlepage"><div><div><h4 class="title"><a name="itemWriteListener" href="#itemWriteListener"></a>ItemWriteListener</h4></div></div></div><p>The writing of an item can be 'listened' to with the
|
|
<code class="classname">ItemWriteListener</code>:</p><pre class="programlisting"><span class="hl-keyword">public</span> <span class="hl-keyword">interface</span> ItemWriteListener<S> <span class="hl-keyword">extends</span> StepListener {
|
|
|
|
<span class="hl-keyword">void</span> beforeWrite(List<? <span class="hl-keyword">extends</span> S> items);
|
|
<span class="hl-keyword">void</span> afterWrite(List<? <span class="hl-keyword">extends</span> S> items);
|
|
<span class="hl-keyword">void</span> onWriteError(Exception exception, List<? <span class="hl-keyword">extends</span> S> items);
|
|
|
|
}</pre><p>The <code class="methodname">beforeWrite</code> method will be called
|
|
before <code class="methodname">write</code> on the
|
|
<code class="classname">ItemWriter</code>, and is handed the list of items that
will be written. The <code class="methodname">afterWrite</code> method will be called
after the items have been successfully written. If there was an error
while writing, the <code class="methodname">onWriteError</code> method will
be called. The exception encountered and the items that were attempted
|
|
to be written will be provided, so that they can be logged.</p><p>The annotations corresponding to this interface are:</p><div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; "><li class="listitem"><p><code class="classname">@BeforeWrite</code></p></li><li class="listitem"><p><code class="classname">@AfterWrite</code></p></li><li class="listitem"><p><code class="classname">@OnWriteError</code></p></li></ul></div></div><div class="section"><div class="titlepage"><div><div><h4 class="title"><a name="skipListener" href="#skipListener"></a>SkipListener</h4></div></div></div><p><code class="classname">ItemReadListener</code>,
|
|
<code class="classname">ItemProcessListener</code>, and
|
|
<code class="classname">ItemWriteListener</code> all provide mechanisms for
|
|
being notified of errors, but none will inform you that a record has
|
|
actually been skipped. <code class="methodname">onWriteError</code>, for
|
|
example, will be called even if an item is retried and successful. For
|
|
this reason, there is a separate interface for tracking skipped
|
|
items:</p><pre class="programlisting"><span class="hl-keyword">public</span> <span class="hl-keyword">interface</span> SkipListener<T,S> <span class="hl-keyword">extends</span> StepListener {
|
|
|
|
<span class="hl-keyword">void</span> onSkipInRead(Throwable t);
|
|
<span class="hl-keyword">void</span> onSkipInProcess(T item, Throwable t);
|
|
<span class="hl-keyword">void</span> onSkipInWrite(S item, Throwable t);
|
|
|
|
}</pre><p><code class="methodname">onSkipInRead</code> will be called whenever an
|
|
item is skipped while reading. It should be noted that rollbacks may
|
|
cause the same item to be registered as skipped more than once.
|
|
<code class="methodname">onSkipInWrite</code> will be called when an item is
|
|
skipped while writing. Because the item has been read successfully
|
|
(and not skipped), the item itself is also provided as an
|
|
argument.</p><p>The annotations corresponding to this interface are:</p><div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; "><li class="listitem"><p><code class="classname">@OnSkipInRead</code></p></li><li class="listitem"><p><code class="classname">@OnSkipInWrite</code></p></li><li class="listitem"><p><code class="classname">@OnSkipInProcess</code></p></li></ul></div><div class="section"><div class="titlepage"><div><div><h5 class="title"><a name="skipListenersAndTransactions" href="#skipListenersAndTransactions"></a>SkipListeners and Transactions</h5></div></div></div><p>One of the most common use cases for a
|
|
<code class="classname">SkipListener</code> is to log out a skipped item, so
|
|
that another batch process or even a human process can be used to
|
|
evaluate and fix the issue leading to the skip. Because there are
|
|
many cases in which the original transaction may be rolled back,
|
|
Spring Batch makes two guarantees:</p><div class="orderedlist"><ol class="orderedlist" type="1"><li class="listitem"><p>The appropriate skip method (depending on when the error
|
|
happened) will only be called once per item.</p></li><li class="listitem"><p>The <code class="classname">SkipListener</code> will always be
|
|
called just before the transaction is committed. This is to
|
|
ensure that any transactional resources called by the listener are
|
|
not rolled back by a failure within the
|
|
<code class="classname">ItemWriter</code>.</p></li></ol></div></div></div></div></div><div class="section"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="taskletStep" href="#taskletStep"></a>5.2 TaskletStep</h2></div></div></div><p>Chunk-oriented processing is not the only way to process in a
|
|
<code class="classname">Step</code>. What if a <code class="classname">Step</code> must
|
|
consist of a simple stored procedure call? You could implement the call as
|
|
an <code class="classname">ItemReader</code> and return null after the procedure
|
|
finishes, but it is a bit unnatural since there would need to be a no-op
|
|
<code class="classname">ItemWriter</code>. Spring Batch provides the
|
|
<code class="classname">TaskletStep</code> for this scenario.</p><p>The <code class="classname">Tasklet</code> is a simple interface that has
|
|
one method, <code class="methodname">execute</code>, which will be called
|
|
repeatedly by the <code class="classname">TaskletStep</code> until it either
|
|
returns <code class="literal">RepeatStatus.FINISHED</code> or throws an exception to
|
|
signal a failure. Each call to the <code class="classname">Tasklet</code> is
|
|
wrapped in a transaction. <code class="classname">Tasklet</code> implementors
|
|
might call a stored procedure, a script, or a simple SQL update statement.
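</p><p>The calling contract just described can be modeled in plain Java. The sketch below is only an illustration under stated assumptions: <code class="classname">Status</code>, <code class="classname">SketchTasklet</code>, <code class="classname">PurgeTasklet</code> and <code class="methodname">run</code> are hypothetical local stand-ins rather than Spring Batch types, and the transaction that wraps each call is represented by a comment only.</p>

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

public class TaskletLoopSketch {

    /** Hypothetical stand-in for the framework's repeat status values. */
    enum Status { CONTINUABLE, FINISHED }

    /** Hypothetical stand-in for the tasklet contract: one unit of work per call. */
    interface SketchTasklet {
        Status execute() throws Exception;
    }

    /** Handles one pending entry per call, so the driver has to call it repeatedly. */
    static class PurgeTasklet implements SketchTasklet {
        private final Deque<String> pending;
        int calls = 0;

        PurgeTasklet(List<String> names) {
            this.pending = new ArrayDeque<>(names);
        }

        @Override
        public Status execute() {
            calls++;
            pending.poll(); // "process" one entry; returns null if nothing is left
            return pending.isEmpty() ? Status.FINISHED : Status.CONTINUABLE;
        }
    }

    /** Mimics the step: call until FINISHED; a thrown exception propagates as a failure. */
    static int run(SketchTasklet tasklet) throws Exception {
        int invocations = 0;
        Status status;
        do {
            // In the real framework each call would run inside its own transaction.
            status = tasklet.execute();
            invocations++;
        } while (status == Status.CONTINUABLE);
        return invocations;
    }

    public static void main(String[] args) throws Exception {
        PurgeTasklet tasklet = new PurgeTasklet(Arrays.asList("a.csv", "b.csv", "c.csv"));
        System.out.println(run(tasklet)); // prints 3: one call per entry
    }
}
```

<p>The driver loop is the point of the sketch: a tasklet that reports a continuable status is invoked again, so work can be split into several short units rather than one long call.</p><p>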
|
|
To create a <code class="classname">TaskletStep</code>, the 'ref' attribute of the
|
|
<tasklet/> element should reference a bean defining a
|
|
<code class="classname">Tasklet</code> object; no <chunk/> element should be
|
|
used within the <tasklet/>:</p><pre class="programlisting"><span class="hl-tag"><step</span> <span class="hl-attribute">id</span>=<span class="hl-value">"step1"</span><span class="hl-tag">></span>
|
|
<span class="hl-tag"><tasklet</span> <span class="hl-attribute">ref</span>=<span class="hl-value">"myTasklet"</span><span class="hl-tag">/></span>
|
|
<span class="hl-tag"></step></span></pre><div class="note" style="margin-left: 0.5in; margin-right: 0.5in;"><table border="0" summary="Note"><tr><td rowspan="2" align="center" valign="top" width="25"><img alt="[Note]" src="images/note.png"></td><th align="left">Note</th></tr><tr><td align="left" valign="top"><p><code class="classname">TaskletStep</code> will automatically register the
|
|
tasklet as a <code class="classname">StepListener</code> if it implements this
|
|
interface</p></td></tr></table></div><div class="section"><div class="titlepage"><div><div><h3 class="title"><a name="taskletAdapter" href="#taskletAdapter"></a>5.2.1 TaskletAdapter</h3></div></div></div><p>As with other adapters for the <code class="classname">ItemReader</code>
|
|
and <code class="classname">ItemWriter</code> interfaces, the
|
|
<code class="classname">Tasklet</code> interface contains an implementation that
|
|
allows for adapting itself to any pre-existing class:
|
|
<code class="classname">TaskletAdapter</code>. An example where this may be
|
|
useful is an existing DAO that is used to update a flag on a set of
|
|
records. The <code class="classname">TaskletAdapter</code> can be used to call
|
|
this class without having to write an adapter for the
|
|
<code class="classname">Tasklet</code> interface:</p><pre class="programlisting"><span class="hl-tag"><bean</span> <span class="hl-attribute">id</span>=<span class="hl-value">"myTasklet"</span> <span class="hl-attribute">class</span>=<span class="hl-value">"o.s.b.core.step.tasklet.MethodInvokingTaskletAdapter"</span><span class="hl-tag">></span>
|
|
<span class="hl-tag"><property</span> <span class="hl-attribute">name</span>=<span class="hl-value">"targetObject"</span><span class="hl-tag">></span>
|
|
<span class="hl-tag"><bean</span> <span class="hl-attribute">class</span>=<span class="hl-value">"org.mycompany.FooDao"</span><span class="hl-tag">/></span>
|
|
<span class="hl-tag"></property></span>
|
|
<span class="hl-tag"><property</span> <span class="hl-attribute">name</span>=<span class="hl-value">"targetMethod"</span> <span class="hl-attribute">value</span>=<span class="hl-value">"updateFoo"</span><span class="hl-tag"> /></span>
|
|
<span class="hl-tag"></bean></span></pre></div><div class="section"><div class="titlepage"><div><div><h3 class="title"><a name="exampleTaskletImplementation" href="#exampleTaskletImplementation"></a>5.2.2 Example Tasklet Implementation</h3></div></div></div><p>Many batch jobs contain steps that must be done before the main
|
|
processing begins in order to set up various resources or after
|
|
processing has completed to clean up those resources. In the case of a
|
|
job that works heavily with files, it is often necessary to delete
|
|
certain files locally after they have been uploaded successfully to
|
|
another location. The example below, taken from the Spring Batch samples
|
|
project, is a <code class="classname">Tasklet</code> implementation with just
|
|
such a responsibility:</p><pre class="programlisting"><span class="hl-keyword">public</span> <span class="hl-keyword">class</span> FileDeletingTasklet <span class="hl-keyword">implements</span> Tasklet, InitializingBean {
|
|
|
|
<span class="hl-keyword">private</span> Resource directory;
|
|
|
|
<span class="hl-keyword">public</span> RepeatStatus execute(StepContribution contribution,
|
|
ChunkContext chunkContext) <span class="hl-keyword">throws</span> Exception {
|
|
File dir = directory.getFile();
|
|
Assert.state(dir.isDirectory());
|
|
|
|
File[] files = dir.listFiles();
|
|
<span class="hl-keyword">for</span> (<span class="hl-keyword">int</span> i = <span class="hl-number">0</span>; i < files.length; i++) {
|
|
<span class="hl-keyword">boolean</span> deleted = files[i].delete();
|
|
<span class="hl-keyword">if</span> (!deleted) {
|
|
<span class="hl-keyword">throw</span> <span class="hl-keyword">new</span> UnexpectedJobExecutionException(<span class="hl-string">"Could not delete file "</span> +
|
|
files[i].getPath());
|
|
}
|
|
}
|
|
<span class="hl-keyword">return</span> RepeatStatus.FINISHED;
|
|
}
|
|
|
|
<span class="hl-keyword">public</span> <span class="hl-keyword">void</span> setDirectoryResource(Resource directory) {
|
|
<span class="hl-keyword">this</span>.directory = directory;
|
|
}
|
|
|
|
<span class="hl-keyword">public</span> <span class="hl-keyword">void</span> afterPropertiesSet() <span class="hl-keyword">throws</span> Exception {
|
|
Assert.notNull(directory, <span class="hl-string">"directory must be set"</span>);
|
|
}
|
|
}</pre><p>The above <code class="classname">Tasklet</code> implementation will
|
|
delete all files within a given directory. It should be noted that the
|
|
<code class="methodname">execute</code> method will only be called once. All
|
|
that is left is to reference the <code class="classname">Tasklet</code> from the
|
|
<code class="classname">Step</code>:</p><pre class="programlisting"><span class="hl-tag"><job</span> <span class="hl-attribute">id</span>=<span class="hl-value">"taskletJob"</span><span class="hl-tag">></span>
|
|
<span class="hl-tag"><step</span> <span class="hl-attribute">id</span>=<span class="hl-value">"deleteFilesInDir"</span><span class="hl-tag">></span>
|
|
<span class="hl-tag"><tasklet</span> <span class="hl-attribute">ref</span>=<span class="hl-value">"fileDeletingTasklet"</span><span class="hl-tag">/></span>
|
|
<span class="hl-tag"></step></span>
|
|
<span class="hl-tag"></job></span>
|
|
|
|
<span class="hl-tag"><beans:bean</span> <span class="hl-attribute">id</span>=<span class="hl-value">"fileDeletingTasklet"</span>
|
|
<span class="hl-attribute">class</span>=<span class="hl-value">"org.springframework.batch.sample.tasklet.FileDeletingTasklet"</span><span class="hl-tag">></span>
|
|
<span class="hl-tag"><beans:property</span> <span class="hl-attribute">name</span>=<span class="hl-value">"directoryResource"</span><span class="hl-tag">></span>
|
|
<span class="hl-tag"><beans:bean</span> <span class="hl-attribute">id</span>=<span class="hl-value">"directory"</span>
|
|
<span class="hl-attribute">class</span>=<span class="hl-value">"org.springframework.core.io.FileSystemResource"</span><span class="hl-tag">></span>
|
|
<span class="hl-tag"><beans:constructor-arg</span> <span class="hl-attribute">value</span>=<span class="hl-value">"target/test-outputs/test-dir"</span><span class="hl-tag"> /></span>
|
|
<span class="hl-tag"></beans:bean></span>
|
|
<span class="hl-tag"></beans:property></span>
|
|
<span class="hl-tag"></beans:bean></span></pre></div></div><div class="section"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="controllingStepFlow" href="#controllingStepFlow"></a>5.3 Controlling Step Flow</h2></div></div></div><p>With the ability to group steps together within an owning job comes
|
|
the need to be able to control how the job 'flows' from one step to
|
|
another. The failure of a <code class="classname">Step</code> doesn't necessarily
|
|
mean that the <code class="classname">Job</code> should fail. Furthermore, there
|
|
may be more than one type of 'success' which determines which
|
|
<code class="classname">Step</code> should be executed next. Depending upon how a
|
|
group of Steps is configured, certain steps may not even be processed at
|
|
all.</p><div class="section"><div class="titlepage"><div><div><h3 class="title"><a name="SequentialFlow" href="#SequentialFlow"></a>5.3.1 Sequential Flow</h3></div></div></div><p>The simplest flow scenario is a job where all of the steps execute
|
|
sequentially:</p><div class="mediaobject" align="center"><img src="images/sequential-flow.png" align="middle"></div><p>This can be achieved using the 'next' attribute of the step
|
|
element:</p><pre class="programlisting"><span class="hl-tag"><job</span> <span class="hl-attribute">id</span>=<span class="hl-value">"job"</span><span class="hl-tag">></span>
|
|
<span class="hl-tag"><step</span> <span class="hl-attribute">id</span>=<span class="hl-value">"stepA"</span> <span class="hl-attribute">parent</span>=<span class="hl-value">"s1"</span> <span class="hl-attribute">next</span>=<span class="hl-value">"stepB"</span><span class="hl-tag"> /></span>
|
|
<span class="hl-tag"><step</span> <span class="hl-attribute">id</span>=<span class="hl-value">"stepB"</span> <span class="hl-attribute">parent</span>=<span class="hl-value">"s2"</span> <span class="hl-attribute">next</span>=<span class="hl-value">"stepC"</span><span class="hl-tag">/></span>
|
|
<span class="hl-tag"><step</span> <span class="hl-attribute">id</span>=<span class="hl-value">"stepC"</span> <span class="hl-attribute">parent</span>=<span class="hl-value">"s3"</span><span class="hl-tag"> /></span>
|
|
<span class="hl-tag"></job></span></pre><p>In the scenario above, 'step A' will execute
|
|
first because it is the first <code class="classname">Step</code> listed. If
|
|
'step A' completes normally, then 'step B' will execute, and so on.
|
|
However, if 'step A' fails, then the entire <code class="classname">Job</code>
|
|
will fail and 'step B' will not execute.</p><div class="note" style="margin-left: 0.5in; margin-right: 0.5in;"><table border="0" summary="Note"><tr><td rowspan="2" align="center" valign="top" width="25"><img alt="[Note]" src="images/note.png"></td><th align="left">Note</th></tr><tr><td align="left" valign="top"><p>With the Spring Batch namespace, the first step listed in the
|
|
configuration will <span class="emphasis"><em>always</em></span> be the first step
|
|
executed by the <code class="classname">Job</code>. The order of the other
|
|
step elements does not matter, but the first step must always appear
|
|
first in the xml.</p></td></tr></table></div></div><div class="section"><div class="titlepage"><div><div><h3 class="title"><a name="conditionalFlow" href="#conditionalFlow"></a>5.3.2 Conditional Flow</h3></div></div></div><p>In the example above, there are only two possibilities:</p><div class="orderedlist"><ol class="orderedlist" type="1"><li class="listitem"><p>The <code class="classname">Step</code> is successful and the next
|
|
<code class="classname">Step</code> should be executed.</p></li><li class="listitem"><p>The <code class="classname">Step</code> failed and thus the
|
|
<code class="classname">Job</code> should fail.</p></li></ol></div><p>In many cases, this may be sufficient. However, what about a
|
|
scenario in which the failure of a <code class="classname">Step</code> should
|
|
trigger a different <code class="classname">Step</code>, rather than causing
|
|
failure? </p><div class="mediaobject" align="center"><img src="images/conditional-flow.png" align="middle"></div><p><a name="nextElement" href="#nextElement"></a>In order to handle more complex scenarios, the
|
|
Spring Batch namespace allows transition elements to be defined within
|
|
the step element. One such transition is the "next" element. Like the
|
|
"next" attribute, the "next" element will tell the
|
|
<code class="classname">Job</code> which <code class="classname">Step</code> to execute
|
|
next. However, unlike the attribute, any number of "next" elements are
|
|
allowed on a given <code class="classname">Step</code>, and there is no default
|
|
behavior in the case of failure. This means that if transition elements are
|
|
used, then all of the behavior for the <code class="classname">Step</code>'s
|
|
transitions must be defined explicitly. Note also that a single step
|
|
cannot have both a "next" attribute and a transition element.</p><p>The next element specifies a pattern to match and the step to
|
|
execute next:</p><pre class="programlisting"><span class="hl-tag"><job</span> <span class="hl-attribute">id</span>=<span class="hl-value">"job"</span><span class="hl-tag">></span>
|
|
<span class="hl-tag"><step</span> <span class="hl-attribute">id</span>=<span class="hl-value">"stepA"</span> <span class="hl-attribute">parent</span>=<span class="hl-value">"s1"</span><span class="hl-tag">></span>
|
|
<span class="hl-tag"><next</span> <span class="hl-attribute">on</span>=<span class="hl-value">"*"</span> <span class="hl-attribute">to</span>=<span class="hl-value">"stepB"</span><span class="hl-tag"> /></span>
|
|
<span class="hl-tag"><next</span> <span class="hl-attribute">on</span>=<span class="hl-value">"FAILED"</span> <span class="hl-attribute">to</span>=<span class="hl-value">"stepC"</span><span class="hl-tag"> /></span>
|
|
<span class="hl-tag"></step></span>
|
|
<span class="hl-tag"><step</span> <span class="hl-attribute">id</span>=<span class="hl-value">"stepB"</span> <span class="hl-attribute">parent</span>=<span class="hl-value">"s2"</span> <span class="hl-attribute">next</span>=<span class="hl-value">"stepC"</span><span class="hl-tag"> /></span>
|
|
<span class="hl-tag"><step</span> <span class="hl-attribute">id</span>=<span class="hl-value">"stepC"</span> <span class="hl-attribute">parent</span>=<span class="hl-value">"s3"</span><span class="hl-tag"> /></span>
|
|
<span class="hl-tag"></job></span></pre><p>The "on" attribute of a transition element uses a simple
|
|
pattern-matching scheme to match the <code class="classname">ExitStatus</code>
|
|
that results from the execution of the <code class="classname">Step</code>. Only
|
|
two special characters are allowed in the pattern:</p><div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; "><li class="listitem"><p>"*" will match zero or more characters</p></li><li class="listitem"><p>"?" will match exactly one character</p></li></ul></div><p>For example, "c*t" will match "cat" and "count", while "c?t" will
|
|
match "cat" but not "count".</p><p>While there is no limit to the number of transition elements on a
|
|
<code class="classname">Step</code>, if the <code class="classname">Step</code>'s
|
|
execution results in an <code class="classname">ExitStatus</code> that is not
|
|
covered by an element, then the framework will throw an exception and
|
|
the <code class="classname">Job</code> will fail. The framework will
|
|
automatically order transitions from most specific to
|
|
least specific. This means that even if the elements were swapped for
|
|
"stepA" in the example above, an <code class="classname">ExitStatus</code> of
|
|
"FAILED" would still go to "stepC".</p><div class="section"><div class="titlepage"><div><div><h4 class="title"><a name="batchStatusVsExitStatus" href="#batchStatusVsExitStatus"></a>Batch Status vs. Exit Status</h4></div></div></div><p>When configuring a <code class="classname">Job</code> for conditional
|
|
flow, it is important to understand the difference between
|
|
<code class="classname">BatchStatus</code> and
|
|
<code class="classname">ExitStatus</code>. <code class="classname">BatchStatus</code>
|
|
is an enumeration that is a property of both
|
|
<code class="classname">JobExecution</code> and
|
|
<code class="classname">StepExecution</code> and is used by the framework to
|
|
record the status of a <code class="classname">Job</code> or
|
|
<code class="classname">Step</code>. It can be one of the following values:
|
|
COMPLETED, STARTING, STARTED, STOPPING, STOPPED, FAILED, ABANDONED or
|
|
UNKNOWN. Most of them are self-explanatory: COMPLETED is the status
|
|
set when a step or job has completed successfully, FAILED is set when
|
|
it fails, and so on. The example above contains the following 'next'
|
|
element:</p><pre class="programlisting"><span class="hl-tag"><next</span> <span class="hl-attribute">on</span>=<span class="hl-value">"FAILED"</span> <span class="hl-attribute">to</span>=<span class="hl-value">"stepC"</span><span class="hl-tag"> /></span></pre><p>At first glance, it would appear that the 'on' attribute
references the <code class="classname">BatchStatus</code> of the
<code class="classname">Step</code> to which it belongs. However, it actually
references the <code class="classname">ExitStatus</code> of the
<code class="classname">Step</code>. As the name implies,
<code class="classname">ExitStatus</code> represents the status of a
<code class="classname">Step</code> after it finishes execution. More
specifically, the 'next' element above references the exit code of the
<code class="classname">ExitStatus</code>. In plain English, it says:
"go to stepC if the exit code is FAILED". By default, the exit code is
|
|
always the same as the <code class="classname">BatchStatus</code> for the
|
|
Step, which is why the entry above works. However, what if the exit
|
|
code needs to be different? A good example comes from the skip sample
|
|
job within the samples project:</p><pre class="programlisting"><span class="hl-tag"><step</span> <span class="hl-attribute">id</span>=<span class="hl-value">"step1"</span> <span class="hl-attribute">parent</span>=<span class="hl-value">"s1"</span><span class="hl-tag">></span>
|
|
<span class="hl-tag"><end</span> <span class="hl-attribute">on</span>=<span class="hl-value">"FAILED"</span><span class="hl-tag"> /></span>
|
|
<span class="hl-tag"><next</span> <span class="hl-attribute">on</span>=<span class="hl-value">"COMPLETED WITH SKIPS"</span> <span class="hl-attribute">to</span>=<span class="hl-value">"errorPrint1"</span><span class="hl-tag"> /></span>
|
|
<span class="hl-tag"><next</span> <span class="hl-attribute">on</span>=<span class="hl-value">"*"</span> <span class="hl-attribute">to</span>=<span class="hl-value">"step2"</span><span class="hl-tag"> /></span>
|
|
<span class="hl-tag"></step></span></pre><p>The above step has three possibilities:</p><div class="orderedlist"><ol class="orderedlist" type="1"><li class="listitem"><p>The <code class="classname">Step</code> failed, in which case the
|
|
job should fail.</p></li><li class="listitem"><p>The <code class="classname">Step</code> completed
|
|
successfully.</p></li><li class="listitem"><p>The <code class="classname">Step</code> completed successfully, but
|
|
with an exit code of 'COMPLETED WITH SKIPS'. In this case, a
|
|
different step should be run to handle the errors.</p></li></ol></div><p>The above configuration will work. However, something needs to
|
|
change the exit code based on the condition of the execution having
|
|
skipped records:</p><pre class="programlisting"><span class="hl-keyword">public</span> <span class="hl-keyword">class</span> SkipCheckingListener <span class="hl-keyword">extends</span> StepExecutionListenerSupport {
|
|
<span class="hl-keyword">public</span> ExitStatus afterStep(StepExecution stepExecution) {
|
|
String exitCode = stepExecution.getExitStatus().getExitCode();
|
|
<span class="hl-keyword">if</span> (!exitCode.equals(ExitStatus.FAILED.getExitCode()) &&
|
|
stepExecution.getSkipCount() > <span class="hl-number">0</span>) {
|
|
<span class="hl-keyword">return</span> <span class="hl-keyword">new</span> ExitStatus(<span class="hl-string">"COMPLETED WITH SKIPS"</span>);
|
|
}
|
|
<span class="hl-keyword">else</span> {
|
|
<span class="hl-keyword">return</span> null;
|
|
}
|
|
}
|
|
}</pre><p>The above code is a <code class="classname">StepExecutionListener</code>
|
|
that first checks to make sure the <code class="classname">Step</code> was
|
|
successful and then checks whether the skip count on the
|
|
<code class="classname">StepExecution</code> is higher than 0. If both
|
|
conditions are met, a new <code class="classname">ExitStatus</code> with an
|
|
exit code of "COMPLETED WITH SKIPS" is returned.</p></div></div><div class="section"><div class="titlepage"><div><div><h3 class="title"><a name="configuringForStop" href="#configuringForStop"></a>5.3.3 Configuring for Stop</h3></div></div></div><p>After the discussion of <a class="link" href="#batchStatusVsExitStatus" title="Batch Status vs. Exit Status"><code class="classname">BatchStatus</code> and
|
|
<code class="classname">ExitStatus</code></a>, one might wonder how the
|
|
<code class="classname">BatchStatus</code> and <code class="classname">ExitStatus</code>
|
|
are determined for the <code class="classname">Job</code>. While these statuses
|
|
are determined for the <code class="classname">Step</code> by the code that is
|
|
executed, the statuses for the <code class="classname">Job</code> will be
|
|
determined based on the configuration.</p><p>So far, all of the job configurations discussed have had at least
|
|
one final <code class="classname">Step</code> with no transitions. For example,
|
|
after the following step executes, the <code class="classname">Job</code> will
|
|
end:</p><pre class="programlisting"><span class="hl-tag"><step</span> <span class="hl-attribute">id</span>=<span class="hl-value">"stepC"</span> <span class="hl-attribute">parent</span>=<span class="hl-value">"s3"</span><span class="hl-tag">/></span></pre><p>If no transitions are defined for a <code class="classname">Step</code>,
|
|
then the <code class="classname">Job</code>'s statuses will be defined as
|
|
follows:</p><div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; "><li class="listitem"><p>If the <code class="classname">Step</code> ends with
|
|
<code class="classname">ExitStatus</code> FAILED, then the
|
|
<code class="classname">Job</code>'s <code class="classname">BatchStatus</code> and
|
|
<code class="classname">ExitStatus</code> will both be FAILED.</p></li><li class="listitem"><p>Otherwise, the <code class="classname">Job</code>'s
|
|
<code class="classname">BatchStatus</code> and
|
|
<code class="classname">ExitStatus</code> will both be COMPLETED.</p></li></ul></div><p>While this method of terminating a batch job is sufficient for
|
|
some batch jobs, such as a simple sequential step job, custom defined
|
|
job-stopping scenarios may be required. For this purpose, Spring Batch
|
|
provides three transition elements to stop a <code class="classname">Job</code>
|
|
(in addition to the <a class="link" href="#nextElement">"next" element</a>
|
|
that we discussed previously). Each of these stopping elements will stop
|
|
a <code class="classname">Job</code> with a particular
|
|
<code class="classname">BatchStatus</code>. It is important to note that the
|
|
stop transition elements will have no effect on either the
|
|
<code class="classname">BatchStatus</code> or <code class="classname">ExitStatus</code>
|
|
of any <code class="classname">Step</code>s in the <code class="classname">Job</code>:
|
|
these elements will only affect the final statuses of the
|
|
<code class="classname">Job</code>. For example, it is possible for every step
|
|
in a job to have a status of FAILED but the job to have a status of
|
|
COMPLETED, or vice versa.</p><div class="section"><div class="titlepage"><div><div><h4 class="title"><a name="endElement" href="#endElement"></a>The 'End' Element</h4></div></div></div><p>The 'end' element instructs a <code class="classname">Job</code> to stop
|
|
with a <code class="classname">BatchStatus</code> of COMPLETED. A
|
|
<code class="classname">Job</code> that has finished with status COMPLETED
|
|
cannot be restarted (the framework will throw a
|
|
<code class="classname">JobInstanceAlreadyCompleteException</code>). The 'end'
|
|
element also allows for an optional 'exit-code' attribute that can be
|
|
used to customize the <code class="classname">ExitStatus</code> of the
|
|
<code class="classname">Job</code>. If no 'exit-code' attribute is given, then
|
|
the <code class="classname">ExitStatus</code> will be "COMPLETED" by default,
|
|
to match the <code class="classname">BatchStatus</code>.</p><p>In the following scenario, if step2 fails, then the
|
|
<code class="classname">Job</code> will stop with a
|
|
<code class="classname">BatchStatus</code> of COMPLETED and an
|
|
<code class="classname">ExitStatus</code> of "COMPLETED" and step3 will not
|
|
execute; otherwise, execution will move to step3. Note that if step2
|
|
fails, the <code class="classname">Job</code> will not be restartable (because
|
|
the status is COMPLETED).</p><pre class="programlisting"><span class="hl-tag"><step</span> <span class="hl-attribute">id</span>=<span class="hl-value">"step1"</span> <span class="hl-attribute">parent</span>=<span class="hl-value">"s1"</span> <span class="hl-attribute">next</span>=<span class="hl-value">"step2"</span><span class="hl-tag">/></span>
|
|
|
|
<span class="hl-tag"><step</span> <span class="hl-attribute">id</span>=<span class="hl-value">"step2"</span> <span class="hl-attribute">parent</span>=<span class="hl-value">"s2"</span><span class="hl-tag">></span>
|
|
<span class="hl-tag"><end</span> <span class="hl-attribute">on</span>=<span class="hl-value">"FAILED"</span><span class="hl-tag">/></span>
|
|
<span class="hl-tag"><next</span> <span class="hl-attribute">on</span>=<span class="hl-value">"*"</span> <span class="hl-attribute">to</span>=<span class="hl-value">"step3"</span><span class="hl-tag">/></span>
|
|
<span class="hl-tag"></step></span>
|
|
|
|
<span class="hl-tag"><step</span> <span class="hl-attribute">id</span>=<span class="hl-value">"step3"</span> <span class="hl-attribute">parent</span>=<span class="hl-value">"s3"</span><span class="hl-tag">/></span></pre></div><div class="section"><div class="titlepage"><div><div><h4 class="title"><a name="failElement" href="#failElement"></a>The 'Fail' Element</h4></div></div></div><p>The 'fail' element instructs a <code class="classname">Job</code> to
|
|
stop with a <code class="classname">BatchStatus</code> of FAILED. Unlike the
|
|
'end' element, the 'fail' element will not prevent the
|
|
<code class="classname">Job</code> from being restarted. The 'fail' element
|
|
also allows for an optional 'exit-code' attribute that can be used to
|
|
customize the <code class="classname">ExitStatus</code> of the
|
|
<code class="classname">Job</code>. If no 'exit-code' attribute is given, then
|
|
the <code class="classname">ExitStatus</code> will be "FAILED" by default, to
|
|
match the <code class="classname">BatchStatus</code>.</p><p>In the following scenario, if step2 fails, then the
|
|
<code class="classname">Job</code> will stop with a
|
|
<code class="classname">BatchStatus</code> of FAILED and an
|
|
<code class="classname">ExitStatus</code> of "EARLY TERMINATION" and step3
|
|
will not execute; otherwise, execution will move to step3.
|
|
Additionally, if step2 fails, and the <code class="classname">Job</code> is
|
|
restarted, then execution will begin again on step2.</p><pre class="programlisting"><span class="hl-tag"><step</span> <span class="hl-attribute">id</span>=<span class="hl-value">"step1"</span> <span class="hl-attribute">parent</span>=<span class="hl-value">"s1"</span> <span class="hl-attribute">next</span>=<span class="hl-value">"step2"</span><span class="hl-tag">/></span>
|
|
|
|
<span class="hl-tag"><step</span> <span class="hl-attribute">id</span>=<span class="hl-value">"step2"</span> <span class="hl-attribute">parent</span>=<span class="hl-value">"s2"</span><span class="hl-tag">></span>
|
|
<span class="hl-tag"><fail</span> <span class="hl-attribute">on</span>=<span class="hl-value">"FAILED"</span> <span class="hl-attribute">exit-code</span>=<span class="hl-value">"EARLY TERMINATION"</span><span class="hl-tag">/></span>
|
|
<span class="hl-tag"><next</span> <span class="hl-attribute">on</span>=<span class="hl-value">"*"</span> <span class="hl-attribute">to</span>=<span class="hl-value">"step3"</span><span class="hl-tag">/></span>
|
|
<span class="hl-tag"></step></span>
|
|
|
|
<span class="hl-tag"><step</span> <span class="hl-attribute">id</span>=<span class="hl-value">"step3"</span> <span class="hl-attribute">parent</span>=<span class="hl-value">"s3"</span><span class="hl-tag">/></span></pre></div><div class="section"><div class="titlepage"><div><div><h4 class="title"><a name="stopElement" href="#stopElement"></a>The 'Stop' Element</h4></div></div></div><p>The 'stop' element instructs a <code class="classname">Job</code> to
|
|
stop with a <code class="classname">BatchStatus</code> of STOPPED. Stopping a
|
|
<code class="classname">Job</code> can provide a temporary break in processing
|
|
so that the operator can take some action before restarting the
|
|
<code class="classname">Job</code>. The 'stop' element requires a 'restart'
|
|
attribute that specifies the step where execution should pick up when
|
|
the <code class="classname">Job</code> is restarted.</p><p>In the following scenario, if step1 finishes with COMPLETED, then
|
|
the job will then stop. Once it is restarted, execution will begin on
|
|
step2.</p><pre class="programlisting"><span class="hl-tag"><step</span> <span class="hl-attribute">id</span>=<span class="hl-value">"step1"</span> <span class="hl-attribute">parent</span>=<span class="hl-value">"s1"</span><span class="hl-tag">></span>
|
|
<span class="hl-tag"><stop</span> <span class="hl-attribute">on</span>=<span class="hl-value">"COMPLETED"</span> <span class="hl-attribute">restart</span>=<span class="hl-value">"step2"</span><span class="hl-tag">/></span>
|
|
<span class="hl-tag"></step></span>
|
|
|
|
<span class="hl-tag"><step</span> <span class="hl-attribute">id</span>=<span class="hl-value">"step2"</span> <span class="hl-attribute">parent</span>=<span class="hl-value">"s2"</span><span class="hl-tag">/></span></pre></div></div><div class="section"><div class="titlepage"><div><div><h3 class="title"><a name="programmaticFlowDecisions" href="#programmaticFlowDecisions"></a>5.3.4 Programmatic Flow Decisions</h3></div></div></div><p>In some situations, more information than the
|
|
<code class="classname">ExitStatus</code> may be required to decide which step
|
|
to execute next. In this case, a
|
|
<code class="classname">JobExecutionDecider</code> can be used to assist in the
|
|
decision.</p><pre class="programlisting"><span class="hl-keyword">public</span> <span class="hl-keyword">class</span> MyDecider <span class="hl-keyword">implements</span> JobExecutionDecider {
    <span class="hl-keyword">public</span> FlowExecutionStatus decide(JobExecution jobExecution, StepExecution stepExecution) {
        <span class="hl-comment">// someCondition is a placeholder for an application-specific check</span>
        <span class="hl-keyword">if</span> (someCondition) {
            <span class="hl-keyword">return</span> FlowExecutionStatus.FAILED;
        }
        <span class="hl-keyword">else</span> {
            <span class="hl-keyword">return</span> FlowExecutionStatus.COMPLETED;
        }
    }
}</pre><p>In the job configuration, a "decision" tag will specify the
|
|
decider to use as well as all of the transitions.</p><pre class="programlisting"><span class="hl-tag"><job</span> <span class="hl-attribute">id</span>=<span class="hl-value">"job"</span><span class="hl-tag">></span>
|
|
<span class="hl-tag"><step</span> <span class="hl-attribute">id</span>=<span class="hl-value">"step1"</span> <span class="hl-attribute">parent</span>=<span class="hl-value">"s1"</span> <span class="hl-attribute">next</span>=<span class="hl-value">"decision"</span><span class="hl-tag"> /></span>
|
|
|
|
<span class="hl-tag"><decision</span> <span class="hl-attribute">id</span>=<span class="hl-value">"decision"</span> <span class="hl-attribute">decider</span>=<span class="hl-value">"decider"</span><span class="hl-tag">></span>
|
|
<span class="hl-tag"><next</span> <span class="hl-attribute">on</span>=<span class="hl-value">"FAILED"</span> <span class="hl-attribute">to</span>=<span class="hl-value">"step2"</span><span class="hl-tag"> /></span>
|
|
<span class="hl-tag"><next</span> <span class="hl-attribute">on</span>=<span class="hl-value">"COMPLETED"</span> <span class="hl-attribute">to</span>=<span class="hl-value">"step3"</span><span class="hl-tag"> /></span>
|
|
<span class="hl-tag"></decision></span>
|
|
|
|
<span class="hl-tag"><step</span> <span class="hl-attribute">id</span>=<span class="hl-value">"step2"</span> <span class="hl-attribute">parent</span>=<span class="hl-value">"s2"</span> <span class="hl-attribute">next</span>=<span class="hl-value">"step3"</span><span class="hl-tag">/></span>
|
|
<span class="hl-tag"><step</span> <span class="hl-attribute">id</span>=<span class="hl-value">"step3"</span> <span class="hl-attribute">parent</span>=<span class="hl-value">"s3"</span><span class="hl-tag"> /></span>
|
|
<span class="hl-tag"></job></span>
|
|
|
|
<span class="hl-tag"><beans:bean</span> <span class="hl-attribute">id</span>=<span class="hl-value">"decider"</span> <span class="hl-attribute">class</span>=<span class="hl-value">"com.MyDecider"</span><span class="hl-tag">/></span></pre></div><div class="section"><div class="titlepage"><div><div><h3 class="title"><a name="split-flows" href="#split-flows"></a>5.3.5 Split Flows</h3></div></div></div><p>Every scenario described so far has involved a
|
|
<code class="classname">Job</code> that executes its
|
|
<code class="classname">Step</code>s one at a time in a linear fashion. In
|
|
addition to this typical style, the Spring Batch namespace also allows
|
|
for a job to be configured with parallel flows using the 'split'
|
|
element. As is seen below, the 'split' element contains one or more
|
|
'flow' elements, where entire separate flows can be defined. A 'split'
|
|
element may also contain any of the previously discussed transition
|
|
elements such as the 'next' attribute or the 'next', 'end', 'fail', or
|
|
'stop' elements.</p><pre class="programlisting"><span class="hl-tag"><split</span> <span class="hl-attribute">id</span>=<span class="hl-value">"split1"</span> <span class="hl-attribute">next</span>=<span class="hl-value">"step4"</span><span class="hl-tag">></span>
|
|
<span class="hl-tag"><flow></span>
|
|
<span class="hl-tag"><step</span> <span class="hl-attribute">id</span>=<span class="hl-value">"step1"</span> <span class="hl-attribute">parent</span>=<span class="hl-value">"s1"</span> <span class="hl-attribute">next</span>=<span class="hl-value">"step2"</span><span class="hl-tag">/></span>
|
|
<span class="hl-tag"><step</span> <span class="hl-attribute">id</span>=<span class="hl-value">"step2"</span> <span class="hl-attribute">parent</span>=<span class="hl-value">"s2"</span><span class="hl-tag">/></span>
|
|
<span class="hl-tag"></flow></span>
|
|
<span class="hl-tag"><flow></span>
|
|
<span class="hl-tag"><step</span> <span class="hl-attribute">id</span>=<span class="hl-value">"step3"</span> <span class="hl-attribute">parent</span>=<span class="hl-value">"s3"</span><span class="hl-tag">/></span>
|
|
<span class="hl-tag"></flow></span>
|
|
<span class="hl-tag"></split></span>
|
|
<span class="hl-tag"><step</span> <span class="hl-attribute">id</span>=<span class="hl-value">"step4"</span> <span class="hl-attribute">parent</span>=<span class="hl-value">"s4"</span><span class="hl-tag">/></span></pre></div><div class="section"><div class="titlepage"><div><div><h3 class="title"><a name="external-flows" href="#external-flows"></a>5.3.6 Externalizing Flow Definitions and Dependencies Between
|
|
Jobs</h3></div></div></div><p>Part of the flow in a job can be externalized as a separate bean
|
|
definition, and then re-used. There are two ways to do this. The
|
|
first is to simply declare the flow as a reference to one defined
|
|
elsewhere:</p><pre class="programlisting"><span class="hl-tag"><job</span> <span class="hl-attribute">id</span>=<span class="hl-value">"job"</span><span class="hl-tag">></span>
|
|
<span class="hl-tag"><flow</span> <span class="hl-attribute">id</span>=<span class="hl-value">"job1.flow1"</span> <span class="hl-attribute">parent</span>=<span class="hl-value">"flow1"</span> <span class="hl-attribute">next</span>=<span class="hl-value">"step3"</span><span class="hl-tag">/></span>
|
|
<span class="hl-tag"><step</span> <span class="hl-attribute">id</span>=<span class="hl-value">"step3"</span> <span class="hl-attribute">parent</span>=<span class="hl-value">"s3"</span><span class="hl-tag">/></span>
|
|
<span class="hl-tag"></job></span>
|
|
|
|
<span class="hl-tag"><flow</span> <span class="hl-attribute">id</span>=<span class="hl-value">"flow1"</span><span class="hl-tag">></span>
|
|
<span class="hl-tag"><step</span> <span class="hl-attribute">id</span>=<span class="hl-value">"step1"</span> <span class="hl-attribute">parent</span>=<span class="hl-value">"s1"</span> <span class="hl-attribute">next</span>=<span class="hl-value">"step2"</span><span class="hl-tag">/></span>
|
|
<span class="hl-tag"><step</span> <span class="hl-attribute">id</span>=<span class="hl-value">"step2"</span> <span class="hl-attribute">parent</span>=<span class="hl-value">"s2"</span><span class="hl-tag">/></span>
|
|
<span class="hl-tag"></flow></span></pre><p>The effect of defining an external flow like this is simply to
|
|
insert the steps from the external flow into the job as if they had been
|
|
declared inline. In this way many jobs can refer to the same template
|
|
flow and compose such templates into different logical flows. This is
|
|
also a good way to separate the integration testing of the individual
|
|
flows.</p><p>The other form of an externalized flow is to use a
|
|
<code class="classname">JobStep</code>. A <code class="classname">JobStep</code> is
|
|
similar to a <code class="classname">FlowStep</code>, but actually creates and
|
|
launches a separate job execution for the steps in the flow specified.
|
|
Here is an example:</p><pre class="programlisting"><span class="hl-tag"><job</span> <span class="hl-attribute">id</span>=<span class="hl-value">"jobStepJob"</span> <span class="hl-attribute">restartable</span>=<span class="hl-value">"true"</span><span class="hl-tag">></span>
|
|
<span class="hl-tag"><step</span> <span class="hl-attribute">id</span>=<span class="hl-value">"jobStepJob.step1"</span><span class="hl-tag">></span>
|
|
<span class="hl-tag"><job</span> <span class="hl-attribute">ref</span>=<span class="hl-value">"</span><span class="bold"><strong>job</strong></span>" job-launcher="jobLauncher"
|
|
job-parameters-extractor="jobParametersExtractor"/>
|
|
<span class="hl-tag"></step></span>
|
|
<span class="hl-tag"></job></span>
|
|
|
|
<span class="hl-tag"><job</span> <span class="hl-attribute">id</span>=<span class="hl-value">"</span><span class="bold"><strong>job</strong></span>" restartable="true">...<span class="hl-tag"></job></span>
|
|
|
|
<span class="hl-tag"><bean</span> <span class="hl-attribute">id</span>=<span class="hl-value">"jobParametersExtractor"</span> <span class="hl-attribute">class</span>=<span class="hl-value">"org.spr...DefaultJobParametersExtractor"</span><span class="hl-tag">></span>
|
|
<span class="hl-tag"><property</span> <span class="hl-attribute">name</span>=<span class="hl-value">"keys"</span> <span class="hl-attribute">value</span>=<span class="hl-value">"input.file"</span><span class="hl-tag">/></span>
|
|
<span class="hl-tag"></bean></span></pre><p>The job parameters extractor is a strategy that determines how
|
|
the <code class="classname">ExecutionContext</code> for the
|
|
<code class="classname">Step</code> is converted into
|
|
<code class="classname">JobParameters</code> for the Job that is executed. The
|
|
<code class="classname">JobStep</code> is useful when you want to have some more
|
|
granular options for monitoring and reporting on jobs and steps. Using
|
|
<code class="classname">JobStep</code> is also often a good answer to the
|
|
question: "How do I create dependencies between jobs?". It is a good way
|
|
to break up a large system into smaller modules and control the flow of
|
|
jobs.</p></div></div><div class="section"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="late-binding" href="#late-binding"></a>5.4 Late Binding of Job and Step Attributes</h2></div></div></div><p>Both the XML and Flat File examples above use the Spring
|
|
<code class="classname">Resource</code> abstraction to obtain a file. This works
|
|
because <code class="classname">Resource</code> has a <span class="markup">getFile</span>
|
|
method, which returns a <code class="classname">java.io.File</code>. Both XML and
|
|
Flat File resources can be configured using standard Spring
|
|
constructs:</p><pre class="programlisting"><span class="hl-tag"><bean</span> <span class="hl-attribute">id</span>=<span class="hl-value">"flatFileItemReader"</span>
|
|
<span class="hl-attribute">class</span>=<span class="hl-value">"org.springframework.batch.item.file.FlatFileItemReader"</span><span class="hl-tag">></span>
|
|
<span class="hl-tag"><property</span> <span class="hl-attribute">name</span>=<span class="hl-value">"resource"</span>
|
|
<span class="hl-attribute">value</span>=<span class="hl-value">"file://outputs/20070122.testStream.CustomerReportStep.TEMP.txt"</span><span class="hl-tag"> /></span>
|
|
<span class="hl-tag"></bean></span></pre><p>The above <code class="classname">Resource</code> will load the file from
|
|
the file system location specified. Note that absolute locations have to
|
|
start with a double slash ("//"). In most Spring applications, this
solution is good enough because the names of these resources are known at compile
|
|
time. However, in batch scenarios, the file name may need to be determined
|
|
at runtime as a parameter to the job. This could be solved using '-D'
|
|
parameters, i.e. a system property:</p><pre class="programlisting"><span class="hl-tag"><bean</span> <span class="hl-attribute">id</span>=<span class="hl-value">"flatFileItemReader"</span>
|
|
<span class="hl-attribute">class</span>=<span class="hl-value">"org.springframework.batch.item.file.FlatFileItemReader"</span><span class="hl-tag">></span>
|
|
<span class="hl-tag"><property</span> <span class="hl-attribute">name</span>=<span class="hl-value">"resource"</span> <span class="hl-attribute">value</span>=<span class="hl-value">"${input.file.name}"</span><span class="hl-tag"> /></span>
|
|
<span class="hl-tag"></bean></span></pre><p>All that would be required for this solution to work would be a
|
|
system argument (-Dinput.file.name="file://file.txt"). (Note that although
|
|
a <code class="classname">PropertyPlaceholderConfigurer</code> can be used here,
|
|
it is not necessary if the system property is always set because the
|
|
<code class="classname">ResourceEditor</code> in Spring already filters and does
|
|
placeholder replacement on system properties.)</p><p>Often in a batch setting it is preferable to parameterize the file
|
|
name in the <a class="link" href="#"><code class="classname">JobParameters</code></a> of the
|
|
job, instead of through system properties, and access them that way. To
|
|
accomplish this, Spring Batch allows for the late binding of various Job
|
|
and Step attributes:</p><pre class="programlisting"><span class="hl-tag"><bean</span> <span class="hl-attribute">id</span>=<span class="hl-value">"flatFileItemReader"</span> <span class="hl-attribute">scope</span>=<span class="hl-value">"step"</span>
|
|
<span class="hl-attribute">class</span>=<span class="hl-value">"org.springframework.batch.item.file.FlatFileItemReader"</span><span class="hl-tag">></span>
|
|
<span class="hl-tag"><property</span> <span class="hl-attribute">name</span>=<span class="hl-value">"resource"</span> <span class="hl-attribute">value</span>=<span class="hl-value">"</span><span class="bold"><strong>#{jobParameters['input.file.name']}</strong></span>" />
|
|
<span class="hl-tag"></bean></span></pre><p>Both the <code class="classname">JobExecution</code> and
|
|
<code class="classname">StepExecution</code> level
|
|
<code class="classname">ExecutionContext</code> can be accessed in the same
|
|
way:</p><pre class="programlisting"><span class="hl-tag"><bean</span> <span class="hl-attribute">id</span>=<span class="hl-value">"flatFileItemReader"</span> <span class="hl-attribute">scope</span>=<span class="hl-value">"step"</span>
|
|
<span class="hl-attribute">class</span>=<span class="hl-value">"org.springframework.batch.item.file.FlatFileItemReader"</span><span class="hl-tag">></span>
|
|
<span class="hl-tag"><property</span> <span class="hl-attribute">name</span>=<span class="hl-value">"resource"</span> <span class="hl-attribute">value</span>=<span class="hl-value">"</span><span class="bold"><strong>#{jobExecutionContext['input.file.name']}</strong></span>" />
|
|
<span class="hl-tag"></bean></span></pre><pre class="programlisting"><span class="hl-tag"><bean</span> <span class="hl-attribute">id</span>=<span class="hl-value">"flatFileItemReader"</span> <span class="hl-attribute">scope</span>=<span class="hl-value">"step"</span>
|
|
<span class="hl-attribute">class</span>=<span class="hl-value">"org.springframework.batch.item.file.FlatFileItemReader"</span><span class="hl-tag">></span>
|
|
<span class="hl-tag"><property</span> <span class="hl-attribute">name</span>=<span class="hl-value">"resource"</span> <span class="hl-attribute">value</span>=<span class="hl-value">"</span><span class="bold"><strong>#{stepExecutionContext['input.file.name']}</strong></span>" />
|
|
<span class="hl-tag"></bean></span></pre><div class="note" style="margin-left: 0.5in; margin-right: 0.5in;"><table border="0" summary="Note"><tr><td rowspan="2" align="center" valign="top" width="25"><img alt="[Note]" src="images/note.png"></td><th align="left">Note</th></tr><tr><td align="left" valign="top"><p>Any bean that uses late-binding must be declared with
|
|
scope="step". See <a class="xref" href="#step-scope" title="5.4.1 Step Scope">Section 5.4.1, “Step Scope”</a> for more
|
|
information.</p></td></tr></table></div><div class="note" style="margin-left: 0.5in; margin-right: 0.5in;"><table border="0" summary="Note"><tr><td rowspan="2" align="center" valign="top" width="25"><img alt="[Note]" src="images/note.png"></td><th align="left">Note</th></tr><tr><td align="left" valign="top"><p>If you are using Spring 3.0 (or above) the expressions in
|
|
step-scoped beans are in the Spring Expression Language, a powerful
|
|
general purpose language with many interesting features. To provide
|
|
backward compatibility, if Spring Batch detects the presence of older
|
|
versions of Spring it uses a native expression language that is less
|
|
powerful, and has slightly different parsing rules. The main difference
|
|
is that the map keys in the example above do not need to be quoted with
|
|
Spring 2.5, but the quotes are mandatory in Spring 3.0.</p></td></tr></table></div><div class="section"><div class="titlepage"><div><div><h3 class="title"><a name="step-scope" href="#step-scope"></a>5.4.1 Step Scope</h3></div></div></div><p>All of the late binding examples from above have a scope of "step"
|
|
declared on the bean definition:</p><pre class="programlisting"><span class="hl-tag"><bean</span> <span class="hl-attribute">id</span>=<span class="hl-value">"flatFileItemReader"</span> <span class="bold"><strong>scope="step"</strong></span>
|
|
class="org.springframework.batch.item.file.FlatFileItemReader">
|
|
<span class="hl-tag"><property</span> <span class="hl-attribute">name</span>=<span class="hl-value">"resource"</span> <span class="hl-attribute">value</span>=<span class="hl-value">"#{jobParameters['input.file.name']}"</span><span class="hl-tag"> /></span>
|
|
<span class="hl-tag"></bean></span></pre><p>A scope of <code class="classname">Step</code> is required for late binding because the bean cannot be instantiated until the <code class="classname">Step</code> starts, which is when the attributes become available. Because the scope is not part of the Spring container by default, it must be added explicitly, either by using the <code class="literal">batch</code> namespace:</p><pre class="programlisting"><span class="hl-tag"><beans</span> <span class="hl-attribute">xmlns</span>=<span class="hl-value">"http://www.springframework.org/schema/beans"</span>
|
|
<span class="hl-attribute">xmlns:batch</span>=<span class="hl-value">"http://www.springframework.org/schema/batch"</span>
|
|
<span class="hl-attribute">xmlns:xsi</span>=<span class="hl-value">"http://www.w3.org/2001/XMLSchema-instance"</span>
|
|
<span class="hl-attribute">xsi:schemaLocation</span>=<span class="hl-value">"..."</span><span class="hl-tag">></span>
|
|
<span class="hl-tag"><batch:job</span> <span class="hl-attribute">.../></span>
|
|
<span class="hl-attribute">...</span>
|
|
<span class="hl-attribute"></beans></span></pre><p>or by including a bean definition explicitly for the <code class="classname">StepScope</code> (but not both):</p><pre class="programlisting"><span class="hl-tag"><bean</span> <span class="hl-attribute">class</span>=<span class="hl-value">"org.springframework.batch.core.scope.StepScope"</span><span class="hl-tag"> /></span></pre></div><div class="section"><div class="titlepage"><div><div><h3 class="title"><a name="job-scope" href="#job-scope"></a>5.4.2 Job Scope</h3></div></div></div><p>Job scope, introduced in Spring Batch 3.0, is similar to Step scope in configuration but is a scope for the Job context, so there is only one instance of such a bean per executing job. Additionally, support is provided for late binding of references accessible from the JobContext using #{..} placeholders. Using this feature, bean properties can be pulled from the job or job execution context and the job parameters. For example:
|
|
</p><pre class="programlisting"><span class="hl-tag"><bean</span> <span class="hl-attribute">id</span>=<span class="hl-value">"..."</span> <span class="hl-attribute">class</span>=<span class="hl-value">"..."</span> <span class="bold"><strong>scope="job"</strong></span>>
|
|
<span class="hl-tag"><property</span> <span class="hl-attribute">name</span>=<span class="hl-value">"name"</span> <span class="hl-attribute">value</span>=<span class="hl-value">"#{jobParameters['input']}"</span><span class="hl-tag"> /></span>
|
|
<span class="hl-tag"></bean></span>
|
|
</pre><pre class="programlisting"><span class="hl-tag"><bean</span> <span class="hl-attribute">id</span>=<span class="hl-value">"..."</span> <span class="hl-attribute">class</span>=<span class="hl-value">"..."</span> <span class="bold"><strong>scope="job"</strong></span>>
|
|
<span class="hl-tag"><property</span> <span class="hl-attribute">name</span>=<span class="hl-value">"name"</span> <span class="hl-attribute">value</span>=<span class="hl-value">"#{jobExecutionContext['input.name']}.txt"</span><span class="hl-tag"> /></span>
|
|
<span class="hl-tag"></bean></span>
|
|
</pre><p>Because it is not part of the Spring container by default, the scope
|
|
must be added explicitly, either by using the <code class="literal">batch</code> namespace:</p><pre class="programlisting"><span class="hl-tag"><beans</span> <span class="hl-attribute">xmlns</span>=<span class="hl-value">"http://www.springframework.org/schema/beans"</span>
|
|
<span class="hl-attribute">xmlns:batch</span>=<span class="hl-value">"http://www.springframework.org/schema/batch"</span>
|
|
<span class="hl-attribute">xmlns:xsi</span>=<span class="hl-value">"http://www.w3.org/2001/XMLSchema-instance"</span>
|
|
<span class="hl-attribute">xsi:schemaLocation</span>=<span class="hl-value">"..."</span><span class="hl-tag">></span>
|
|
|
|
<span class="hl-tag"><batch:job</span> <span class="hl-attribute">.../></span>
|
|
<span class="hl-attribute">...</span>
|
|
<span class="hl-attribute"></beans></span></pre><p>Or by including a bean definition explicitly for the <code class="classname">JobScope</code> (but not both):</p><pre class="programlisting"><span class="hl-tag"><bean</span> <span class="hl-attribute">class</span>=<span class="hl-value">"org.springframework.batch.core.scope.JobScope"</span><span class="hl-tag"> /></span></pre></div></div></div>
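<p>To make the "one instance per executing job" behaviour concrete, the following plain-Java sketch caches a lazily created object per job execution id and builds it from the job parameters only when the job first asks for it. It is a conceptual illustration only: <code class="classname">JobScopeSketch</code> and its methods are hypothetical names introduced here, and this is not how Spring Batch implements <code class="classname">JobScope</code>.</p>

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/**
 * Conceptual stand-in for a job scope (hypothetical, not a Spring Batch class).
 * It keeps one instance per job execution id and creates that instance late,
 * from the job parameters that are only known once the job is running.
 */
public class JobScopeSketch {

    // Plays the role of the bean definition: builds the bean from the job parameters.
    private final Function<Map<String, String>, Object> beanFactory;

    // One cached instance per job execution id.
    private final Map<Long, Object> instances = new HashMap<>();

    public JobScopeSketch(Function<Map<String, String>, Object> beanFactory) {
        this.beanFactory = beanFactory;
    }

    // Creates the bean on first use within a job execution, then reuses it.
    public synchronized Object get(long jobExecutionId, Map<String, String> jobParameters) {
        return instances.computeIfAbsent(jobExecutionId, id -> beanFactory.apply(jobParameters));
    }

    // Discards the instance once the job execution has finished.
    public synchronized void remove(long jobExecutionId) {
        instances.remove(jobExecutionId);
    }
}
```

<p>In this sketch, two lookups with the same execution id return the same object, while a different execution id yields a new object built from that execution's own parameters.</p>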
|
|
|
|
<div class="chapter"><div class="titlepage"><div><div><h1 class="title"><a name="readersAndWriters" href="#readersAndWriters"></a>6. ItemReaders and ItemWriters</h1></div></div></div><p>All batch processing can be described in its most simple form as
|
|
reading in large amounts of data, performing some type of calculation or
|
|
transformation, and writing the result out. Spring Batch provides three key
|
|
interfaces to help perform bulk reading and writing:
|
|
<code class="classname">ItemReader</code>, <code class="classname">ItemProcessor</code> and
|
|
<code class="classname">ItemWriter</code>.</p><div class="section"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="itemReader" href="#itemReader"></a>6.1 ItemReader</h2></div></div></div><p>Although a simple concept, an <code class="classname">ItemReader</code> is
|
|
the means for providing data from many different types of input. The most
|
|
general examples include: </p><div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; "><li class="listitem"><p>Flat File - Flat file ItemReaders read lines of data from a flat file. Each line typically describes a record whose fields are defined by fixed positions in the file or delimited by a special character (such as a comma).</p></li><li class="listitem"><p>XML - XML ItemReaders process XML independently of
|
|
technologies used for parsing, mapping and validating objects. Input
|
|
data allows for the validation of an XML file against an XSD
|
|
schema.</p></li><li class="listitem"><p>Database - A database resource is accessed to return
|
|
resultsets which can be mapped to objects for processing. The
|
|
default SQL ItemReaders invoke a <code class="classname">RowMapper</code> to
|
|
return objects, keep track of the current row if restart is
|
|
required, store basic statistics, and provide some transaction
|
|
enhancements that will be explained later.</p></li></ul></div><p>There are many more possibilities, but we'll focus on the
|
|
basic ones for this chapter. A complete list of all available ItemReaders
|
|
can be found in Appendix A.</p><p><code class="classname">ItemReader</code> is a basic interface for generic
|
|
input operations:</p><pre class="programlisting"><span class="hl-keyword">public</span> <span class="hl-keyword">interface</span> ItemReader<T> {
|
|
|
|
T read() <span class="hl-keyword">throws</span> Exception, UnexpectedInputException, ParseException;
|
|
|
|
}</pre><p>The <code class="methodname">read</code> method defines the most essential
|
|
contract of the <code class="classname">ItemReader</code>; calling it returns one
|
|
Item or null if no more items are left. An item might represent a line in
|
|
a file, a row in a database, or an element in an XML file. It is generally
|
|
expected that these will be mapped to a usable domain object (such as Trade or Foo), but there is no requirement in the contract to do so.</p><p>It is expected that implementations of the
|
|
<code class="classname">ItemReader</code> interface will be forward only. However,
|
|
if the underlying resource is transactional (such as a JMS queue) then
|
|
calling read may return the same logical item on subsequent calls in a
|
|
rollback scenario. It is also worth noting that a lack of items to process
|
|
by an <code class="classname">ItemReader</code> will not cause an exception to be
|
|
thrown. For example, a database <code class="classname">ItemReader</code> that is
|
|
configured with a query that returns 0 results will simply return null on
|
|
the first invocation of <code class="methodname">read</code>.</p></div><div class="section"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="itemWriter" href="#itemWriter"></a>6.2 ItemWriter</h2></div></div></div><p><code class="classname">ItemWriter</code> is similar in functionality to an
|
|
<code class="classname">ItemReader</code>, but with inverse operations. Resources
|
|
still need to be located, opened and closed but they differ in that an
|
|
<code class="classname">ItemWriter</code> writes out, rather than reading in. In
|
|
the case of databases or queues these may be inserts, updates, or sends.
|
|
The format of the serialization of the output is specific to each batch
|
|
job.</p><p>As with <code class="classname">ItemReader</code>,
|
|
<code class="classname">ItemWriter</code> is a fairly generic interface:</p><pre class="programlisting"><span class="hl-keyword">public</span> <span class="hl-keyword">interface</span> ItemWriter<T> {
|
|
|
|
<span class="hl-keyword">void</span> write(List<? <span class="hl-keyword">extends</span> T> items) <span class="hl-keyword">throws</span> Exception;
|
|
|
|
}</pre><p>As with <code class="methodname">read</code> on
|
|
<code class="classname">ItemReader</code>, <code class="methodname">write</code> provides
|
|
the basic contract of <code class="classname">ItemWriter</code>; it will attempt
|
|
to write out the list of items passed in as long as it is open. Because it
|
|
is generally expected that items will be 'batched' together into a chunk
|
|
and then output, the interface accepts a list of items, rather than an
|
|
item by itself. After writing out the list, any flushing that may be
|
|
necessary can be performed before returning from the write method. For
|
|
example, if writing to a Hibernate DAO, multiple calls to write can be
|
|
made, one for each item. The writer can then call flush on the Hibernate Session before returning.</p></div><div class="section"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="itemProcessor" href="#itemProcessor"></a>6.3 ItemProcessor</h2></div></div></div><p>The <code class="classname">ItemReader</code> and
|
|
<code class="classname">ItemWriter</code> interfaces are both very useful for
|
|
their specific tasks, but what if you want to insert business logic before
|
|
writing? One option for both reading and writing is to use the composite
|
|
pattern: create an <code class="classname">ItemWriter</code> that contains another
|
|
<code class="classname">ItemWriter</code>, or an <code class="classname">ItemReader</code>
|
|
that contains another <code class="classname">ItemReader</code>. For
|
|
example:</p><pre class="programlisting"><span class="hl-keyword">import</span> java.util.List;

<span class="hl-keyword">import</span> org.springframework.batch.item.ItemWriter;

<span class="hl-keyword">public</span> <span class="hl-keyword">class</span> CompositeItemWriter<T> <span class="hl-keyword">implements</span> ItemWriter<T> {

    <span class="hl-comment">// The writer that performs the actual output</span>
    <span class="hl-keyword">private</span> ItemWriter<T> itemWriter;

    <span class="hl-keyword">public</span> CompositeItemWriter(ItemWriter<T> itemWriter) {
        <span class="hl-keyword">this</span>.itemWriter = itemWriter;
    }

    <span class="hl-keyword">public</span> <span class="hl-keyword">void</span> write(List<? <span class="hl-keyword">extends</span> T> items) <span class="hl-keyword">throws</span> Exception {
        <span class="hl-comment">// Add business logic here, then pass the whole chunk to the delegate</span>
        itemWriter.write(items);
    }

    <span class="hl-keyword">public</span> <span class="hl-keyword">void</span> setDelegate(ItemWriter<T> itemWriter) {
        <span class="hl-keyword">this</span>.itemWriter = itemWriter;
    }
}</pre><p>The class above contains another <code class="classname">ItemWriter</code>
|
|
to which it delegates after having provided some business logic. This
|
|
pattern could easily be used for an <code class="classname">ItemReader</code> as
|
|
well, perhaps to obtain more reference data based upon the input that was
|
|
provided by the main <code class="classname">ItemReader</code>. It is also useful
|
|
if you need to control the call to <code class="classname">write</code> yourself.
|
|
However, if you only want to 'transform' the item passed in for writing
|
|
before it is actually written, there isn't much need to call
|
|
<code class="methodname">write</code> yourself: you just want to modify the item.
|
|
For this scenario, Spring Batch provides the
|
|
<code class="classname">ItemProcessor</code> interface:</p><pre class="programlisting"><span class="hl-keyword">public</span> <span class="hl-keyword">interface</span> ItemProcessor<I, O> {
|
|
|
|
O process(I item) <span class="hl-keyword">throws</span> Exception;
|
|
}</pre><p>An <code class="classname">ItemProcessor</code> is very simple; given one
|
|
object, transform it and return another. The provided object may or may
|
|
not be of the same type. The point is that business logic may be applied
|
|
within process, and is completely up to the developer to create. An
|
|
<code class="classname">ItemProcessor</code> can be wired directly into a step. For example, assume an <code class="classname">ItemReader</code> provides a class of type Foo that needs to be converted to type Bar before being written out. An <code class="classname">ItemProcessor</code> can be written that
|
|
performs the conversion:</p><pre class="programlisting"><span class="hl-keyword">public</span> <span class="hl-keyword">class</span> Foo {}
|
|
|
|
<span class="hl-keyword">public</span> <span class="hl-keyword">class</span> Bar {
|
|
<span class="hl-keyword">public</span> Bar(Foo foo) {}
|
|
}
|
|
|
|
<span class="hl-keyword">public</span> <span class="hl-keyword">class</span> FooProcessor <span class="hl-keyword">implements</span> ItemProcessor<Foo,Bar>{
|
|
<span class="hl-keyword">public</span> Bar process(Foo foo) <span class="hl-keyword">throws</span> Exception {
|
|
<span class="hl-comment">//Perform simple transformation, convert a Foo to a Bar</span>
|
|
<span class="hl-keyword">return</span> <span class="hl-keyword">new</span> Bar(foo);
|
|
}
|
|
}
|
|
|
|
<span class="hl-keyword">public</span> <span class="hl-keyword">class</span> BarWriter <span class="hl-keyword">implements</span> ItemWriter<Bar>{
|
|
<span class="hl-keyword">public</span> <span class="hl-keyword">void</span> write(List<? <span class="hl-keyword">extends</span> Bar> bars) <span class="hl-keyword">throws</span> Exception {
|
|
<span class="hl-comment">//write bars</span>
|
|
}
|
|
}</pre><p>In the very simple example above, there is a class
|
|
<code class="classname">Foo</code>, a class <code class="classname">Bar</code>, and a
|
|
class <code class="classname">FooProcessor</code> that adheres to the
|
|
<code class="classname">ItemProcessor</code> interface. The transformation is
|
|
simple, but any type of transformation could be done here. The
|
|
<code class="classname">BarWriter</code> will be used to write out
|
|
<code class="classname">Bar</code> objects, throwing an exception if any other
|
|
type is provided. Similarly, the <code class="classname">FooProcessor</code> will
|
|
throw an exception if anything but a <code class="classname">Foo</code> is
|
|
provided. The <code class="classname">FooProcessor</code> can then be injected
|
|
into a <code class="classname">Step</code>:</p><pre class="programlisting"><span class="hl-tag"><job</span> <span class="hl-attribute">id</span>=<span class="hl-value">"ioSampleJob"</span><span class="hl-tag">></span>
|
|
<span class="hl-tag"><step</span> <span class="hl-attribute">id</span>=<span class="hl-value">"step1"</span><span class="hl-tag">></span>
|
|
<span class="hl-tag"><tasklet></span>
|
|
<span class="hl-tag"><chunk</span> <span class="hl-attribute">reader</span>=<span class="hl-value">"fooReader"</span> <span class="hl-attribute">processor</span>=<span class="hl-value">"fooProcessor"</span> <span class="hl-attribute">writer</span>=<span class="hl-value">"barWriter"</span>
|
|
<span class="hl-attribute">commit-interval</span>=<span class="hl-value">"2"</span><span class="hl-tag">/></span>
|
|
<span class="hl-tag"></tasklet></span>
|
|
<span class="hl-tag"></step></span>
|
|
<span class="hl-tag"></job></span></pre><div class="section"><div class="titlepage"><div><div><h3 class="title"><a name="chainingItemProcessors" href="#chainingItemProcessors"></a>6.3.1 Chaining ItemProcessors</h3></div></div></div><p>Performing a single transformation is useful in many scenarios,
|
|
but what if you want to 'chain' together multiple
|
|
<code class="classname">ItemProcessor</code>s? This can be accomplished using
|
|
the composite pattern mentioned previously. To extend the previous single-transformation example, <code class="classname">Foo</code> will be
|
|
transformed to <code class="classname">Bar</code>, which will be transformed to
|
|
<code class="classname">Foobar</code> and written out:</p><pre class="programlisting"><span class="hl-keyword">public</span> <span class="hl-keyword">class</span> Foo {}
|
|
|
|
<span class="hl-keyword">public</span> <span class="hl-keyword">class</span> Bar {
|
|
<span class="hl-keyword">public</span> Bar(Foo foo) {}
|
|
}
|
|
|
|
<span class="hl-keyword">public</span> <span class="hl-keyword">class</span> Foobar{
|
|
<span class="hl-keyword">public</span> Foobar(Bar bar) {}
|
|
}
|
|
|
|
<span class="hl-keyword">public</span> <span class="hl-keyword">class</span> FooProcessor <span class="hl-keyword">implements</span> ItemProcessor<Foo,Bar>{
|
|
<span class="hl-keyword">public</span> Bar process(Foo foo) <span class="hl-keyword">throws</span> Exception {
|
|
<span class="hl-comment">//Perform simple transformation, convert a Foo to a Bar</span>
|
|
<span class="hl-keyword">return</span> <span class="hl-keyword">new</span> Bar(foo);
|
|
}
|
|
}
|
|
|
|
<span class="hl-keyword">public</span> <span class="hl-keyword">class</span> BarProcessor <span class="hl-keyword">implements</span> ItemProcessor<Bar,Foobar>{
    <span class="hl-keyword">public</span> Foobar process(Bar bar) <span class="hl-keyword">throws</span> Exception {
        <span class="hl-comment">//Perform simple transformation, convert a Bar to a Foobar</span>
        <span class="hl-keyword">return</span> <span class="hl-keyword">new</span> Foobar(bar);
    }
}

<span class="hl-keyword">public</span> <span class="hl-keyword">class</span> FoobarWriter <span class="hl-keyword">implements</span> ItemWriter<Foobar>{
    <span class="hl-keyword">public</span> <span class="hl-keyword">void</span> write(List<? <span class="hl-keyword">extends</span> Foobar> items) <span class="hl-keyword">throws</span> Exception {
|
|
<span class="hl-comment">//write items</span>
|
|
}
|
|
}</pre><p>A <code class="classname">FooProcessor</code> and
|
|
<code class="classname">BarProcessor</code> can be 'chained' together to give
|
|
the resultant <code class="classname">Foobar</code>:</p><pre class="programlisting">CompositeItemProcessor<Foo,Foobar> compositeProcessor =
        <span class="hl-keyword">new</span> CompositeItemProcessor<Foo,Foobar>();
<span class="hl-comment">// Delegates run in list order: Foo to Bar, then Bar to Foobar</span>
List itemProcessors = <span class="hl-keyword">new</span> ArrayList();
itemProcessors.add(<span class="hl-keyword">new</span> FooProcessor());
itemProcessors.add(<span class="hl-keyword">new</span> BarProcessor());
compositeProcessor.setDelegates(itemProcessors);</pre><p>Just as with the previous example, the composite processor can be
|
|
configured into the <code class="classname">Step</code>:</p><pre class="programlisting"><span class="hl-tag"><job</span> <span class="hl-attribute">id</span>=<span class="hl-value">"ioSampleJob"</span><span class="hl-tag">></span>
|
|
<span class="hl-tag"><step</span> <span class="hl-attribute">id</span>=<span class="hl-value">"step1"</span><span class="hl-tag">></span>
|
|
<span class="hl-tag"><tasklet></span>
|
|
<span class="hl-tag"><chunk</span> <span class="hl-attribute">reader</span>=<span class="hl-value">"fooReader"</span> <span class="hl-attribute">processor</span>=<span class="hl-value">"compositeItemProcessor"</span> <span class="hl-attribute">writer</span>=<span class="hl-value">"foobarWriter"</span>
|
|
<span class="hl-attribute">commit-interval</span>=<span class="hl-value">"2"</span><span class="hl-tag">/></span>
|
|
<span class="hl-tag"></tasklet></span>
|
|
<span class="hl-tag"></step></span>
|
|
<span class="hl-tag"></job></span>
|
|
|
|
<span class="hl-tag"><bean</span> <span class="hl-attribute">id</span>=<span class="hl-value">"compositeItemProcessor"</span>
|
|
<span class="hl-attribute">class</span>=<span class="hl-value">"org.springframework.batch.item.support.CompositeItemProcessor"</span><span class="hl-tag">></span>
|
|
<span class="hl-tag"><property</span> <span class="hl-attribute">name</span>=<span class="hl-value">"delegates"</span><span class="hl-tag">></span>
|
|
<span class="hl-tag"><list></span>
|
|
<span class="hl-tag"><bean</span> <span class="hl-attribute">class</span>=<span class="hl-value">"..FooProcessor"</span><span class="hl-tag"> /></span>
|
|
<span class="hl-tag"><bean</span> <span class="hl-attribute">class</span>=<span class="hl-value">"..BarProcessor"</span><span class="hl-tag"> /></span>
|
|
<span class="hl-tag"></list></span>
|
|
<span class="hl-tag"></property></span>
|
|
<span class="hl-tag"></bean></span></pre></div><div class="section"><div class="titlepage"><div><div><h3 class="title"><a name="filiteringRecords" href="#filiteringRecords"></a>6.3.2 Filtering Records</h3></div></div></div><p>One typical use for an item processor is to filter out records
|
|
before they are passed to the ItemWriter. Filtering is an action
|
|
distinct from skipping; skipping indicates that a record is invalid
|
|
whereas filtering simply indicates that a record should not be
|
|
written.</p><p>For example, consider a batch job that reads a file containing
|
|
three different types of records: records to insert, records to update,
|
|
and records to delete. If record deletion is not supported by the
|
|
system, then we would not want to send any "delete" records to the
|
|
<code class="classname">ItemWriter</code>. But, since these records are not
|
|
actually bad records, we would want to filter them out, rather than
|
|
skip. As a result, the ItemWriter would receive only "insert" and
|
|
"update" records.</p><p>To filter a record, one simply returns "null" from the
|
|
<code class="classname">ItemProcessor</code>. The framework will detect that the
|
|
result is "null" and avoid adding that item to the list of records
|
|
delivered to the <code class="classname">ItemWriter</code>. As usual, an
|
|
exception thrown from the <code class="classname">ItemProcessor</code> will
|
|
result in a skip.</p></div><div class="section"><div class="titlepage"><div><div><h3 class="title"><a name="faultTolerant" href="#faultTolerant"></a>6.3.3 Fault Tolerance</h3></div></div></div><p>When a chunk is rolled back, items that have been cached
|
|
during reading may be reprocessed. If a step is configured to
|
|
be fault tolerant (uses skip or retry processing typically),
|
|
any ItemProcessor used should be implemented in a way that is
|
|
idempotent. Typically, this means making no changes to the input item in the ItemProcessor and updating only the instance that is returned as the result.</p></div></div><div class="section"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="itemStream" href="#itemStream"></a>6.4 ItemStream</h2></div></div></div><p>Both <code class="classname">ItemReader</code>s and
|
|
<code class="classname">ItemWriter</code>s serve their individual purposes well,
|
|
but there is a common concern among both of them that necessitates another
|
|
interface. In general, as part of the scope of a batch job, readers and
|
|
writers need to be opened, closed, and require a mechanism for persisting
|
|
state:</p><pre class="programlisting"><span class="hl-keyword">public</span> <span class="hl-keyword">interface</span> ItemStream {
|
|
|
|
<span class="hl-keyword">void</span> open(ExecutionContext executionContext) <span class="hl-keyword">throws</span> ItemStreamException;
|
|
|
|
<span class="hl-keyword">void</span> update(ExecutionContext executionContext) <span class="hl-keyword">throws</span> ItemStreamException;
|
|
|
|
<span class="hl-keyword">void</span> close() <span class="hl-keyword">throws</span> ItemStreamException;
|
|
}</pre><p>Before describing each method, we should mention the
|
|
<code class="classname">ExecutionContext</code>. Clients of an
|
|
<code class="classname">ItemReader</code> that also implements
|
|
<code class="classname">ItemStream</code> should call
|
|
<code class="methodname">open</code> before any calls to
|
|
<code class="methodname">read</code> in order to open any resources such as files
|
|
or to obtain connections. A similar restriction applies to an
|
|
<code class="classname">ItemWriter</code> that implements
|
|
<code class="classname">ItemStream</code>. As mentioned in Chapter 2, if expected
|
|
data is found in the <code class="classname">ExecutionContext</code>, it may be
|
|
used to start the <code class="classname">ItemReader</code> or
|
|
<code class="classname">ItemWriter</code> at a location other than its initial
|
|
state. Conversely, <code class="methodname">close</code> will be called to ensure
|
|
that any resources allocated during <code class="methodname">open</code> will be
|
|
released safely. <code class="methodname">update</code> is called primarily to
|
|
ensure that any state currently being held is loaded into the provided
|
|
<code class="classname">ExecutionContext</code>. This method is called before committing, to ensure that the current state is persisted in the database.</p><p>In the special case where the client of an
|
|
<code class="classname">ItemStream</code> is a <code class="classname">Step</code> (from
|
|
the Spring Batch Core), an <code class="classname">ExecutionContext</code> is
|
|
created for each <code class="classname">StepExecution</code> to allow users to
|
|
store the state of a particular execution, with the expectation that it
|
|
will be returned if the same <code class="classname">JobInstance</code> is started
|
|
again. For those familiar with Quartz, the semantics are very similar to a
|
|
Quartz <code class="classname">JobDataMap</code>.</p></div><div class="section"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="delegatePatternAndRegistering" href="#delegatePatternAndRegistering"></a>6.5 The Delegate Pattern and Registering with the Step</h2></div></div></div><p>Note that the <code class="classname">CompositeItemWriter</code> is an
|
|
example of the delegation pattern, which is common in Spring Batch. The
|
|
delegates themselves might implement callback interfaces, such as <code class="classname">StepListener</code>.
|
|
If they do, and they are being used in conjunction with Spring Batch Core
|
|
as part of a <code class="classname">Step</code> in a <code class="classname">Job</code>,
|
|
then they almost certainly need to be registered manually with the
|
|
<code class="classname">Step</code>. A reader, writer, or processor that is
|
|
directly wired into the Step will be registered automatically if it
|
|
implements <code class="classname">ItemStream</code> or a
|
|
<code class="classname">StepListener</code> interface. But because the delegates
|
|
are not known to the <code class="classname">Step</code>, they need to be injected
|
|
as listeners or streams (or both if appropriate):</p><pre class="programlisting"><span class="hl-tag"><job</span> <span class="hl-attribute">id</span>=<span class="hl-value">"ioSampleJob"</span><span class="hl-tag">></span>
|
|
<span class="hl-tag"><step</span> <span class="hl-attribute">id</span>=<span class="hl-value">"step1"</span><span class="hl-tag">></span>
|
|
<span class="hl-tag"><tasklet></span>
|
|
<span class="hl-tag"><chunk</span> <span class="hl-attribute">reader</span>=<span class="hl-value">"fooReader"</span> <span class="hl-attribute">processor</span>=<span class="hl-value">"fooProcessor"</span> <span class="hl-attribute">writer</span>=<span class="hl-value">"compositeItemWriter"</span>
|
|
<span class="hl-attribute">commit-interval</span>=<span class="hl-value">"2"</span><span class="hl-tag">></span>
|
|
<span class="hl-tag"><streams></span>
|
|
<span class="hl-tag"><stream</span> <span class="hl-attribute">ref</span>=<span class="hl-value">"barWriter"</span><span class="hl-tag"> /></span>
|
|
<span class="hl-tag"></streams></span>
|
|
<span class="hl-tag"></chunk></span>
|
|
<span class="hl-tag"></tasklet></span>
|
|
<span class="hl-tag"></step></span>
|
|
<span class="hl-tag"></job></span>
|
|
|
|
<span class="hl-tag"><bean</span> <span class="hl-attribute">id</span>=<span class="hl-value">"compositeItemWriter"</span> <span class="hl-attribute">class</span>=<span class="hl-value">"...CustomCompositeItemWriter"</span><span class="hl-tag">></span>
|
|
<span class="hl-tag"><property</span> <span class="hl-attribute">name</span>=<span class="hl-value">"delegate"</span> <span class="hl-attribute">ref</span>=<span class="hl-value">"barWriter"</span><span class="hl-tag"> /></span>
|
|
<span class="hl-tag"></bean></span>
|
|
|
|
<span class="hl-tag"><bean</span> <span class="hl-attribute">id</span>=<span class="hl-value">"barWriter"</span> <span class="hl-attribute">class</span>=<span class="hl-value">"...BarWriter"</span><span class="hl-tag"> /></span></pre></div><div class="section"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="flatFiles" href="#flatFiles"></a>6.6 Flat Files</h2></div></div></div><p>One of the most common mechanisms for interchanging bulk data has
|
|
always been the flat file. Unlike XML, which has an agreed upon standard
|
|
for defining how it is structured (XSD), anyone reading a flat file must
|
|
understand ahead of time exactly how the file is structured. In general,
|
|
all flat files fall into two types: Delimited and Fixed Length. Delimited
|
|
files are those in which fields are separated by a delimiter, such as a
|
|
comma. Fixed Length files have fields that are a set length.</p><div class="section"><div class="titlepage"><div><div><h3 class="title"><a name="fieldSet" href="#fieldSet"></a>6.6.1 The FieldSet</h3></div></div></div><p>When working with flat files in Spring Batch, regardless of
|
|
whether it is for input or output, one of the most important classes is
|
|
the <code class="classname">FieldSet</code>. Many architectures and libraries
|
|
contain abstractions for helping you read in from a file, but they
|
|
usually return a String or an array of Strings. This really only gets
|
|
you halfway there. A <code class="classname">FieldSet</code> is Spring Batch's
|
|
abstraction for enabling the binding of fields from a file resource. It
|
|
allows developers to work with file input in much the same way as they
|
|
would work with database input. A <code class="classname">FieldSet</code> is
|
|
conceptually very similar to a JDBC <code class="classname">ResultSet</code>. A <code class="classname">FieldSet</code> requires only one argument, a <code class="classname">String</code> array of tokens. Optionally, you can also configure the names of the
|
|
fields so that the fields may be accessed either by index or name as
|
|
patterned after <code class="classname">ResultSet</code>:</p><pre class="programlisting">String[] tokens = <span class="hl-keyword">new</span> String[]{<span class="hl-string">"foo"</span>, <span class="hl-string">"1"</span>, <span class="hl-string">"true"</span>};
|
|
FieldSet fs = <span class="hl-keyword">new</span> DefaultFieldSet(tokens);
|
|
String name = fs.readString(<span class="hl-number">0</span>);
|
|
<span class="hl-keyword">int</span> value = fs.readInt(<span class="hl-number">1</span>);
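<span class="hl-comment">// Sketch, not part of the original listing: DefaultFieldSet also has a two-argument</span>
<span class="hl-comment">// constructor that takes field names, so the same tokens can be read by name.</span>
<span class="hl-comment">// The names "name", "value", and "flag" are illustrative assumptions.</span>
FieldSet namedFs = <span class="hl-keyword">new</span> DefaultFieldSet(tokens, <span class="hl-keyword">new</span> String[]{<span class="hl-string">"name"</span>, <span class="hl-string">"value"</span>, <span class="hl-string">"flag"</span>});
<span class="hl-keyword">int</span> valueByName = namedFs.readInt(<span class="hl-string">"value"</span>);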
|
|
<span class="hl-keyword">boolean</span> booleanValue = fs.readBoolean(<span class="hl-number">2</span>);</pre><p>There are many more options on the <code class="classname">FieldSet</code>
|
|
interface, such as <code class="classname">Date</code>, long,
|
|
<code class="classname">BigDecimal</code>, etc. The biggest advantage of the
|
|
<code class="classname">FieldSet</code> is that it provides consistent parsing
|
|
of flat file input. Rather than each batch job parsing differently in
|
|
potentially unexpected ways, it can be consistent, both when handling
|
|
errors caused by a format exception and when doing simple data
|
|
conversions.</p></div><div class="section"><div class="titlepage"><div><div><h3 class="title"><a name="flatFileItemReader" href="#flatFileItemReader"></a>6.6.2 FlatFileItemReader</h3></div></div></div><p>A flat file is any type of file that contains at most
|
|
two-dimensional (tabular) data. Reading flat files in the Spring Batch
|
|
framework is facilitated by the class
|
|
<code class="classname">FlatFileItemReader</code>, which provides basic
|
|
functionality for reading and parsing flat files. The two most important
|
|
required dependencies of <code class="classname">FlatFileItemReader</code> are
|
|
<code class="classname">Resource</code> and <code class="classname">LineMapper</code>. The <code class="classname">LineMapper</code> interface will be
|
|
explored more in the next sections. The resource property represents a
|
|
Spring Core <code class="classname">Resource</code>. Documentation explaining
|
|
how to create beans of this type can be found in <a class="ulink" href="http://docs.spring.io/spring/docs/3.2.x/spring-framework-reference/html/resources.html" target="_top"><em class="citetitle">Spring
|
|
Framework, Chapter 5. Resources</em></a>. Therefore, this
|
|
guide will not go into the details of creating
|
|
<code class="classname">Resource</code> objects. However, a simple example of a
|
|
file system resource can be found below:
|
|
</p><pre class="programlisting">Resource resource = <span class="hl-keyword">new</span> FileSystemResource(<span class="hl-string">"resources/trades.csv"</span>);</pre><p>In complex batch environments the directory structures are often
|
|
managed by the EAI infrastructure where drop zones for external
|
|
interfaces are established for moving files from FTP locations to batch
processing locations and vice versa. File moving utilities are beyond
the scope of the Spring Batch architecture, but it is not unusual for
batch job streams to include file moving utilities as steps in the job
stream. The batch architecture only needs to know
how to locate the files to be processed. Spring Batch begins the process
|
|
of feeding the data into the pipe from this starting point. However,
|
|
<a class="ulink" href="http://projects.spring.io/spring-integration/" target="_top"><em class="citetitle">Spring
|
|
Integration</em></a> provides many of these types of
|
|
services.</p><p>The other properties in <code class="classname">FlatFileItemReader</code>
|
|
allow you to further specify how your data will be interpreted: </p><div class="table"><a name="d5e2230" href="#d5e2230"></a><p class="title"><b>Table 6.1. FlatFileItemReader Properties</b></p><div class="table-contents"><table summary="FlatFileItemReader Properties" style="border-collapse: collapse;border-top: 0.5pt solid ; border-bottom: 0.5pt solid ; border-left: 0.5pt solid ; border-right: 0.5pt solid ; "><colgroup><col align="center"><col><col></colgroup><thead><tr><th style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; " align="center">Property</th><th style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; " align="center">Type</th><th style="border-bottom: 0.5pt solid ; " align="center">Description</th></tr></thead><tbody><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; " align="left">comments</td><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; " align="left">String[]</td><td style="border-bottom: 0.5pt solid ; " align="left">Specifies line prefixes that indicate
|
|
comment rows</td></tr><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; " align="left">encoding</td><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; " align="left">String</td><td style="border-bottom: 0.5pt solid ; " align="left">Specifies what text encoding to use -
|
|
default is "ISO-8859-1"</td></tr><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; " align="left">lineMapper</td><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; " align="left">LineMapper</td><td style="border-bottom: 0.5pt solid ; " align="left">Converts a <code class="classname">String</code>
|
|
to an <code class="classname">Object</code> representing the
|
|
item.</td></tr><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; " align="left">linesToSkip</td><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; " align="left">int</td><td style="border-bottom: 0.5pt solid ; " align="left">Number of lines to ignore at the top of
|
|
the file</td></tr><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; " align="left">recordSeparatorPolicy</td><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; " align="left">RecordSeparatorPolicy</td><td style="border-bottom: 0.5pt solid ; " align="left">Used to determine where the line endings
|
|
are and do things like continue over a line ending if inside a
|
|
quoted string.</td></tr><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; " align="left">resource</td><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; " align="left">Resource</td><td style="border-bottom: 0.5pt solid ; " align="left">The resource from which to read.</td></tr><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; " align="left">skippedLinesCallback</td><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; " align="left">LineCallbackHandler</td><td style="border-bottom: 0.5pt solid ; " align="left">Interface which passes the raw line
|
|
content of the lines in the file to be skipped. If linesToSkip
|
|
is set to 2, then this interface will be called twice.</td></tr><tr><td style="border-right: 0.5pt solid ; " align="left">strict</td><td style="border-right: 0.5pt solid ; " align="left">boolean</td><td style="" align="left">In strict mode, the reader will throw an
|
|
exception when it is opened if the input resource does not
|
|
exist.</td></tr></tbody></table></div></div><p><br class="table-break"></p><div class="section"><div class="titlepage"><div><div><h4 class="title"><a name="lineMapper" href="#lineMapper"></a>LineMapper</h4></div></div></div><p>As with <code class="classname">RowMapper</code>, which takes a low
|
|
level construct such as <code class="classname">ResultSet</code> and returns
|
|
an <code class="classname">Object</code>, flat file processing requires the
|
|
same construct to convert a <code class="classname">String</code> line into an
|
|
<code class="classname">Object</code>:
|
|
</p><pre class="programlisting"><span class="hl-keyword">public</span> <span class="hl-keyword">interface</span> LineMapper<T> {
|
|
|
|
T mapLine(String line, <span class="hl-keyword">int</span> lineNumber) <span class="hl-keyword">throws</span> Exception;
|
|
|
|
}</pre><p>The basic contract is that, given the current line and the line
|
|
number with which it is associated, the mapper should return a
|
|
resulting domain object. This is similar to
|
|
<code class="classname">RowMapper</code> in that each line is associated with
|
|
its line number, just as each row in a
|
|
<code class="classname">ResultSet</code> is tied to its row number. This
|
|
allows the line number to be tied to the resulting domain object for
|
|
identity comparison or for more informative logging. However, unlike
|
|
<code class="classname">RowMapper</code>, the
|
|
<code class="classname">LineMapper</code> is given a raw line which, as
|
|
discussed above, only gets you halfway there. The line must be
|
|
tokenized into a <code class="classname">FieldSet</code>, which can then be
|
|
mapped to an object, as described below.</p></div><div class="section"><div class="titlepage"><div><div><h4 class="title"><a name="lineTokenizer" href="#lineTokenizer"></a>LineTokenizer</h4></div></div></div><p>An abstraction for turning a line of input into a
|
|
<code class="classname">FieldSet</code> is necessary because there can be many
|
|
formats of flat file data that need to be converted to a
|
|
<code class="classname">FieldSet</code>. In Spring Batch, this interface is
|
|
the <code class="classname">LineTokenizer</code>:</p><pre class="programlisting"><span class="hl-keyword">public</span> <span class="hl-keyword">interface</span> LineTokenizer {
|
|
|
|
FieldSet tokenize(String line);
|
|
|
|
}</pre><p>The contract of a <code class="classname">LineTokenizer</code> is such
|
|
that, given a line of input (in theory the
|
|
<code class="classname">String</code> could encompass more than one line), a
|
|
<code class="classname">FieldSet</code> representing the line will be
|
|
returned. This <code class="classname">FieldSet</code> can then be passed to a
|
|
<code class="classname">FieldSetMapper</code>. Spring Batch contains the
|
|
following <code class="classname">LineTokenizer</code> implementations:</p><div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; "><li class="listitem"><p><code class="classname">DelimitedLineTokenizer</code> - Used for
|
|
files where fields in a record are separated by a delimiter. The
|
|
most common delimiter is a comma, but pipes or semicolons are
|
|
often used as well.</p></li><li class="listitem"><p><code class="classname">FixedLengthTokenizer</code> - Used for files
|
|
where fields in a record are each a 'fixed width'. The width of
|
|
each field must be defined for each record type.</p></li><li class="listitem"><p><code class="classname">PatternMatchingCompositeLineTokenizer</code>
|
|
- Determines which among a list of
|
|
<code class="classname">LineTokenizer</code>s should be used on a
|
|
particular line by checking against a pattern.</p></li></ul></div></div><div class="section"><div class="titlepage"><div><div><h4 class="title"><a name="fieldSetMapper" href="#fieldSetMapper"></a>FieldSetMapper</h4></div></div></div><p>The <code class="classname">FieldSetMapper</code> interface defines a
|
|
single method, <code class="methodname">mapFieldSet</code>, which takes a
|
|
<code class="classname">FieldSet</code> object and maps its contents to an
|
|
object. This object may be a custom DTO, a domain object, or a simple
|
|
array, depending on the needs of the job. The
|
|
<code class="classname">FieldSetMapper</code> is used in conjunction with the
|
|
<code class="classname">LineTokenizer</code> to translate a line of data from
|
|
a resource into an object of the desired type:</p><pre class="programlisting"><span class="hl-keyword">public</span> <span class="hl-keyword">interface</span> FieldSetMapper<T> {
|
|
|
|
T mapFieldSet(FieldSet fieldSet);
|
|
|
|
}</pre><p>The pattern used is the same as the
|
|
<code class="classname">RowMapper</code> used by
|
|
<code class="classname">JdbcTemplate</code>.</p></div><div class="section"><div class="titlepage"><div><div><h4 class="title"><a name="defaultLineMapper" href="#defaultLineMapper"></a>DefaultLineMapper</h4></div></div></div><p>Now that the basic interfaces for reading in flat files have
|
|
been defined, it becomes clear that three basic steps are
|
|
required:</p><div class="orderedlist"><ol class="orderedlist" type="1"><li class="listitem"><p>Read one line from the file.</p></li><li class="listitem"><p>Pass the string line into the
|
|
<code class="methodname">LineTokenizer#tokenize</code>() method, in
|
|
order to retrieve a <code class="classname">FieldSet</code>.</p></li><li class="listitem"><p>Pass the <code class="classname">FieldSet</code> returned from
|
|
tokenizing to a <code class="classname">FieldSetMapper</code>, returning
|
|
the result from the <code class="methodname">ItemReader#read</code>()
|
|
method.</p></li></ol></div><p>The two interfaces described above represent two separate tasks:
|
|
converting a line into a <code class="classname">FieldSet</code>, and mapping
|
|
a <code class="classname">FieldSet</code> to a domain object. Because the
|
|
input of a <code class="classname">LineTokenizer</code> matches the input of
|
|
the <code class="classname">LineMapper</code> (a line), and the output of a
|
|
<code class="classname">FieldSetMapper</code> matches the output of the
|
|
<code class="classname">LineMapper</code>, a default implementation that uses
|
|
both a <code class="classname">LineTokenizer</code> and
|
|
<code class="classname">FieldSetMapper</code> is provided. The
|
|
<code class="classname">DefaultLineMapper</code> represents the behavior most
|
|
users will need:</p><pre class="programlisting"><span class="hl-keyword">public</span> <span class="hl-keyword">class</span> DefaultLineMapper<T> <span class="hl-keyword">implements</span> LineMapper<T>, InitializingBean {
|
|
|
|
<span class="hl-keyword">private</span> LineTokenizer tokenizer;
|
|
|
|
<span class="hl-keyword">private</span> FieldSetMapper<T> fieldSetMapper;
|
|
|
|
<span class="hl-keyword">public</span> T mapLine(String line, <span class="hl-keyword">int</span> lineNumber) <span class="hl-keyword">throws</span> Exception {
|
|
<span class="bold"><strong>return fieldSetMapper.mapFieldSet(tokenizer.tokenize(line));</strong></span>
|
|
}
|
|
|
|
<span class="hl-keyword">public</span> <span class="hl-keyword">void</span> setLineTokenizer(LineTokenizer tokenizer) {
|
|
<span class="hl-keyword">this</span>.tokenizer = tokenizer;
|
|
}
|
|
|
|
<span class="hl-keyword">public</span> <span class="hl-keyword">void</span> setFieldSetMapper(FieldSetMapper<T> fieldSetMapper) {
|
|
<span class="hl-keyword">this</span>.fieldSetMapper = fieldSetMapper;
|
|
}
|
|
}</pre><p>The above functionality is provided in a default implementation,
|
|
rather than being built into the reader itself (as was done in
|
|
previous versions of the framework) in order to allow users greater
|
|
flexibility in controlling the parsing process, especially if access
|
|
to the raw line is needed.</p></div><div class="section"><div class="titlepage"><div><div><h4 class="title"><a name="simpleDelimitedFileReadingExample" href="#simpleDelimitedFileReadingExample"></a>Simple Delimited File Reading Example</h4></div></div></div><p>The following example will be used to illustrate this using an
|
|
actual domain scenario. This particular batch job reads in football
|
|
players from the following file:
|
|
</p><pre class="programlisting">ID,lastName,firstName,position,birthYear,debutYear
AbduKa00,Abdul-Jabbar,Karim,rb,1974,1996
AbduRa00,Abdullah,Rabih,rb,1975,1999
AberWa00,Abercrombie,Walter,rb,1959,1982
AbraDa00,Abramowicz,Danny,wr,1945,1967
AdamBo00,Adams,Bob,te,1946,1969
AdamCh00,Adams,Charlie,wr,1979,2003</pre><p>The contents of this file will be mapped to the following
|
|
<code class="classname">Player</code> domain object:
|
|
</p><pre class="programlisting"><span class="hl-keyword">public</span> <span class="hl-keyword">class</span> Player <span class="hl-keyword">implements</span> Serializable {
|
|
|
|
<span class="hl-keyword">private</span> String ID;
|
|
<span class="hl-keyword">private</span> String lastName;
|
|
<span class="hl-keyword">private</span> String firstName;
|
|
<span class="hl-keyword">private</span> String position;
|
|
<span class="hl-keyword">private</span> <span class="hl-keyword">int</span> birthYear;
|
|
<span class="hl-keyword">private</span> <span class="hl-keyword">int</span> debutYear;
|
|
|
|
<span class="hl-keyword">public</span> String toString() {
|
|
<span class="hl-keyword">return</span> <span class="hl-string">"PLAYER:ID="</span> + ID + <span class="hl-string">",Last Name="</span> + lastName +
|
|
<span class="hl-string">",First Name="</span> + firstName + <span class="hl-string">",Position="</span> + position +
|
|
<span class="hl-string">",Birth Year="</span> + birthYear + <span class="hl-string">",DebutYear="</span> +
|
|
debutYear;
|
|
}
|
|
|
|
<span class="hl-comment">// setters and getters...</span>
|
|
}</pre><p>In order to map a <code class="classname">FieldSet</code> into a
|
|
<code class="classname">Player</code> object, a
|
|
<code class="classname">FieldSetMapper</code> that returns players needs to be
|
|
defined:</p><pre class="programlisting"><span class="hl-keyword">protected</span> <span class="hl-keyword">static</span> <span class="hl-keyword">class</span> PlayerFieldSetMapper <span class="hl-keyword">implements</span> FieldSetMapper<Player> {
|
|
<span class="hl-keyword">public</span> Player mapFieldSet(FieldSet fieldSet) {
|
|
Player player = <span class="hl-keyword">new</span> Player();
|
|
|
|
player.setID(fieldSet.readString(<span class="hl-number">0</span>));
|
|
player.setLastName(fieldSet.readString(<span class="hl-number">1</span>));
|
|
player.setFirstName(fieldSet.readString(<span class="hl-number">2</span>));
|
|
player.setPosition(fieldSet.readString(<span class="hl-number">3</span>));
|
|
player.setBirthYear(fieldSet.readInt(<span class="hl-number">4</span>));
|
|
player.setDebutYear(fieldSet.readInt(<span class="hl-number">5</span>));
|
|
|
|
<span class="hl-keyword">return</span> player;
|
|
}
|
|
}</pre><p>The file can then be read by correctly constructing a
|
|
<code class="classname">FlatFileItemReader</code> and calling
|
|
<code class="methodname">read</code>:</p><pre class="programlisting">FlatFileItemReader<Player> itemReader = <span class="hl-keyword">new</span> FlatFileItemReader<Player>();
|
|
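itemReader.setResource(<span class="hl-keyword">new</span> FileSystemResource(<span class="hl-string">"resources/players.csv"</span>));
<span class="hl-comment">//Skip the header row so that it is not mapped to a Player</span>
itemReader.setLinesToSkip(<span class="hl-number">1</span>);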
|
|
<span class="hl-comment">//DelimitedLineTokenizer defaults to comma as its delimiter</span>
|
|
DefaultLineMapper<Player> lineMapper = <span class="hl-keyword">new</span> DefaultLineMapper<Player>();
|
|
lineMapper.setLineTokenizer(<span class="hl-keyword">new</span> DelimitedLineTokenizer());
|
|
lineMapper.setFieldSetMapper(<span class="hl-keyword">new</span> PlayerFieldSetMapper());
|
|
itemReader.setLineMapper(lineMapper);
|
|
itemReader.open(<span class="hl-keyword">new</span> ExecutionContext());
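<span class="hl-comment">// Sketch, not part of the original listing: read() returns null once the input is</span>
<span class="hl-comment">// exhausted, so a caller would typically loop until then and close the reader, e.g.</span>
<span class="hl-comment">//   for (Player p = itemReader.read(); p != null; p = itemReader.read()) { /* use p */ }</span>
<span class="hl-comment">//   itemReader.close();</span>
<span class="hl-comment">// The single call below reads only the first record.</span>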
|
|
Player player = itemReader.read();</pre><p>Each call to <code class="methodname">read</code> will return a new
|
|
Player object from each line in the file. When the end of the file is
|
|
reached, null will be returned.</p></div><div class="section"><div class="titlepage"><div><div><h4 class="title"><a name="mappingFieldsByName" href="#mappingFieldsByName"></a>Mapping Fields by Name</h4></div></div></div><p>There is one additional piece of functionality that is allowed
|
|
by both <code class="classname">DelimitedLineTokenizer</code> and
|
|
<code class="classname">FixedLengthTokenizer</code> that is similar in
|
|
function to a JDBC <code class="classname">ResultSet</code>. The names of the
|
|
fields can be injected into either of these
|
|
<code class="classname">LineTokenizer</code> implementations to increase the
|
|
readability of the mapping function. First, the column names of all
|
|
fields in the flat file are injected into the tokenizer:</p><pre class="programlisting">tokenizer.setNames(<span class="hl-keyword">new</span> String[] {<span class="hl-string">"ID"</span>, <span class="hl-string">"lastName"</span>,<span class="hl-string">"firstName"</span>,<span class="hl-string">"position"</span>,<span class="hl-string">"birthYear"</span>,<span class="hl-string">"debutYear"</span>}); </pre><p>A <code class="classname">FieldSetMapper</code> can use this information
|
|
as follows:</p><pre class="programlisting"><span class="hl-keyword">public</span> <span class="hl-keyword">class</span> PlayerMapper <span class="hl-keyword">implements</span> FieldSetMapper<Player> {
|
|
<span class="hl-keyword">public</span> Player mapFieldSet(FieldSet fs) {
|
|
|
|
<span class="hl-keyword">if</span>(fs == null){
|
|
<span class="hl-keyword">return</span> null;
|
|
}
|
|
|
|
Player player = <span class="hl-keyword">new</span> Player();
|
|
player.setID(fs.readString(<span class="bold"><strong>"ID"</strong></span>));
|
|
player.setLastName(fs.readString(<span class="bold"><strong>"lastName"</strong></span>));
|
|
player.setFirstName(fs.readString(<span class="bold"><strong>"firstName"</strong></span>));
|
|
player.setPosition(fs.readString(<span class="bold"><strong>"position"</strong></span>));
|
|
player.setDebutYear(fs.readInt(<span class="bold"><strong>"debutYear"</strong></span>));
|
|
player.setBirthYear(fs.readInt(<span class="bold"><strong>"birthYear"</strong></span>));
|
|
|
|
<span class="hl-keyword">return</span> player;
|
|
}
|
|
}</pre></div><div class="section"><div class="titlepage"><div><div><h4 class="title"><a name="beanWrapperFieldSetMapper" href="#beanWrapperFieldSetMapper"></a>Automapping FieldSets to Domain Objects</h4></div></div></div><p>For many, having to write a specific
|
|
<code class="classname">FieldSetMapper</code> is as cumbersome as
|
|
writing a specific <code class="classname">RowMapper</code> for a
|
|
<code class="classname">JdbcTemplate</code>. Spring Batch makes this easier by
|
|
providing a <code class="classname">FieldSetMapper</code> that automatically
|
|
maps fields by matching a field name with a setter on the object using
|
|
the JavaBean specification. Again using the football example, the
|
|
<code class="classname">BeanWrapperFieldSetMapper</code> configuration looks
|
|
like the following:</p><pre class="programlisting"><span class="hl-tag"><bean</span> <span class="hl-attribute">id</span>=<span class="hl-value">"fieldSetMapper"</span>
|
|
<span class="hl-attribute">class</span>=<span class="hl-value">"org.springframework.batch.item.file.mapping.BeanWrapperFieldSetMapper"</span><span class="hl-tag">></span>
|
|
<span class="hl-tag"><property</span> <span class="hl-attribute">name</span>=<span class="hl-value">"prototypeBeanName"</span> <span class="hl-attribute">value</span>=<span class="hl-value">"player"</span><span class="hl-tag"> /></span>
|
|
<span class="hl-tag"></bean></span>
|
|
|
|
<span class="hl-tag"><bean</span> <span class="hl-attribute">id</span>=<span class="hl-value">"player"</span>
|
|
<span class="hl-attribute">class</span>=<span class="hl-value">"org.springframework.batch.sample.domain.Player"</span>
|
|
<span class="hl-attribute">scope</span>=<span class="hl-value">"prototype"</span><span class="hl-tag"> /></span></pre><p>For each entry in the <code class="classname">FieldSet</code>, the
|
|
mapper will look for a corresponding setter on a new instance of the
|
|
<code class="classname">Player</code> object (for this reason, prototype scope
|
|
is required) in the same way the Spring container will look for
|
|
setters matching a property name. Each available field in the
|
|
<code class="classname">FieldSet</code> will be mapped, and the resultant
|
|
<code class="classname">Player</code> object will be returned, with no code
|
|
required.</p></div><div class="section"><div class="titlepage"><div><div><h4 class="title"><a name="fixedLengthFileFormats" href="#fixedLengthFileFormats"></a>Fixed Length File Formats</h4></div></div></div><p>So far only delimited files have been discussed in much detail,
|
|
but they represent only half of the file reading picture. Many
|
|
organizations that use flat files use fixed length formats. An example
|
|
fixed length file is below:</p><pre class="programlisting">UK21341EAH4121131.11customer1
|
|
UK21341EAH4221232.11customer2
|
|
UK21341EAH4321333.11customer3
|
|
UK21341EAH4421434.11customer4
|
|
UK21341EAH4521535.11customer5</pre><p>While this looks like one large field, it actually represents four
distinct fields:</p><div class="orderedlist"><ol class="orderedlist" type="1"><li class="listitem"><p>ISIN: Unique identifier for the item being ordered - 12
|
|
characters long.</p></li><li class="listitem"><p>Quantity: Number of this item being ordered - 3 characters
|
|
long.</p></li><li class="listitem"><p>Price: Price of the item - 5 characters long.</p></li><li class="listitem"><p>Customer: Id of the customer ordering the item - 9
|
|
characters long.</p></li></ol></div><p>When configuring the
|
|
<code class="classname">FixedLengthTokenizer</code>, each of these lengths
|
|
must be provided in the form of ranges:</p><pre class="programlisting"><span class="hl-tag"><bean</span> <span class="hl-attribute">id</span>=<span class="hl-value">"fixedLengthLineTokenizer"</span>
|
|
      <span class="hl-attribute">class</span>=<span class="hl-value">"org.springframework.batch.item.file.transform.FixedLengthTokenizer"</span><span class="hl-tag">></span>
|
|
<span class="hl-tag"><property</span> <span class="hl-attribute">name</span>=<span class="hl-value">"names"</span> <span class="hl-attribute">value</span>=<span class="hl-value">"ISIN,Quantity,Price,Customer"</span><span class="hl-tag"> /></span>
|
|
<span class="hl-tag"><property</span> <span class="hl-attribute">name</span>=<span class="hl-value">"columns"</span> <span class="hl-attribute">value</span>=<span class="hl-value">"1-12, 13-15, 16-20, 21-29"</span><span class="hl-tag"> /></span>
|
|
<span class="hl-tag"></bean></span></pre><p>Because the <code class="classname">FixedLengthTokenizer</code> uses
|
|
the same <code class="classname">LineTokenizer</code> interface as discussed
|
|
above, it will return the same <code class="classname">FieldSet</code> as if a
|
|
delimiter had been used. This allows the same approaches to be used in
|
|
handling its output, such as using the
|
|
<code class="classname">BeanWrapperFieldSetMapper</code>.</p><div class="note" style="margin-left: 0.5in; margin-right: 0.5in;"><table border="0" summary="Note"><tr><td rowspan="2" align="center" valign="top" width="25"><img alt="[Note]" src="images/note.png"></td><th align="left">Note</th></tr><tr><td align="left" valign="top"><p>Supporting the above syntax for ranges requires that a
|
|
specialized property editor,
|
|
<code class="classname">RangeArrayPropertyEditor</code>, be configured in
|
|
the <code class="classname">ApplicationContext</code>. However, this bean
|
|
is automatically declared in an
|
|
<code class="classname">ApplicationContext</code> where the batch
|
|
namespace is used.</p></td></tr></table></div></div><div class="section"><div class="titlepage"><div><div><h4 class="title"><a name="prefixMatchingLineMapper" href="#prefixMatchingLineMapper"></a>Multiple Record Types within a Single File</h4></div></div></div><p>All of the file reading examples up to this point have made
|
|
a key assumption for simplicity's sake: all of the records in a file
|
|
have the same format. However, this may not always be the case. It is
|
|
very common that a file might have records with different formats that
|
|
need to be tokenized differently and mapped to different objects. The
|
|
following excerpt from a file illustrates this:</p><pre class="programlisting">USER;Smith;Peter;;T;20014539;F
|
|
LINEA;1044391041ABC037.49G201XX1383.12H
|
|
LINEB;2134776319DEF422.99M005LI</pre><p>In this file we have three types of records, "USER", "LINEA",
|
|
and "LINEB". A "USER" line corresponds to a User object. "LINEA" and
|
|
"LINEB" both correspond to Line objects, though a "LINEA" has more
|
|
information than a "LINEB".</p><p>The <code class="classname">ItemReader</code> will read each line
|
|
individually, but we must specify different
|
|
<code class="classname">LineTokenizer</code> and
|
|
<code class="classname">FieldSetMapper</code> objects so that the
|
|
<code class="classname">ItemWriter</code> will receive the correct items. The
|
|
<code class="classname">PatternMatchingCompositeLineMapper</code> makes this
|
|
easy by allowing maps of patterns to
|
|
<code class="classname">LineTokenizer</code>s and patterns to
|
|
<code class="classname">FieldSetMapper</code>s to be configured:</p><pre class="programlisting"><span class="hl-tag"><bean</span> <span class="hl-attribute">id</span>=<span class="hl-value">"orderFileLineMapper"</span>
|
|
<span class="hl-attribute">class</span>=<span class="hl-value">"org.spr...PatternMatchingCompositeLineMapper"</span><span class="hl-tag">></span>
|
|
<span class="hl-tag"><property</span> <span class="hl-attribute">name</span>=<span class="hl-value">"tokenizers"</span><span class="hl-tag">></span>
|
|
<span class="hl-tag"><map></span>
|
|
<span class="hl-tag"><entry</span> <span class="hl-attribute">key</span>=<span class="hl-value">"USER*"</span> <span class="hl-attribute">value-ref</span>=<span class="hl-value">"userTokenizer"</span><span class="hl-tag"> /></span>
|
|
<span class="hl-tag"><entry</span> <span class="hl-attribute">key</span>=<span class="hl-value">"LINEA*"</span> <span class="hl-attribute">value-ref</span>=<span class="hl-value">"lineATokenizer"</span><span class="hl-tag"> /></span>
|
|
<span class="hl-tag"><entry</span> <span class="hl-attribute">key</span>=<span class="hl-value">"LINEB*"</span> <span class="hl-attribute">value-ref</span>=<span class="hl-value">"lineBTokenizer"</span><span class="hl-tag"> /></span>
|
|
<span class="hl-tag"></map></span>
|
|
<span class="hl-tag"></property></span>
|
|
<span class="hl-tag"><property</span> <span class="hl-attribute">name</span>=<span class="hl-value">"fieldSetMappers"</span><span class="hl-tag">></span>
|
|
<span class="hl-tag"><map></span>
|
|
<span class="hl-tag"><entry</span> <span class="hl-attribute">key</span>=<span class="hl-value">"USER*"</span> <span class="hl-attribute">value-ref</span>=<span class="hl-value">"userFieldSetMapper"</span><span class="hl-tag"> /></span>
|
|
<span class="hl-tag"><entry</span> <span class="hl-attribute">key</span>=<span class="hl-value">"LINE*"</span> <span class="hl-attribute">value-ref</span>=<span class="hl-value">"lineFieldSetMapper"</span><span class="hl-tag"> /></span>
|
|
<span class="hl-tag"></map></span>
|
|
<span class="hl-tag"></property></span>
|
|
<span class="hl-tag"></bean></span></pre><p>In this example, "LINEA" and "LINEB" have separate
|
|
<code class="classname">LineTokenizer</code>s but they both use the same
|
|
<code class="classname">FieldSetMapper</code>.</p><p>The <code class="classname">PatternMatchingCompositeLineMapper</code>
|
|
makes use of the <code class="classname">PatternMatcher</code>'s
|
|
<code class="classname">match</code> method in order to select the correct
|
|
delegate for each line. The <code class="classname">PatternMatcher</code>
|
|
allows for two wildcard characters with special meaning: the question
|
|
mark ("?") will match exactly one character, while the asterisk ("*")
|
|
will match zero or more characters. Note that in the configuration
|
|
above, all patterns end with an asterisk, making them effectively
|
|
prefixes to lines. The <code class="classname">PatternMatcher</code> will
|
|
always match the most specific pattern possible, regardless of the
|
|
order in the configuration. So if "LINE*" and "LINEA*" were both
|
|
listed as patterns, "LINEA" would match pattern "LINEA*", while
|
|
"LINEB" would match pattern "LINE*". Additionally, a single asterisk
|
|
("*") can serve as a default by matching any line not matched by any
|
|
other pattern.</p><pre class="programlisting"><span class="hl-tag"><entry</span> <span class="hl-attribute">key</span>=<span class="hl-value">"*"</span> <span class="hl-attribute">value-ref</span>=<span class="hl-value">"defaultLineTokenizer"</span><span class="hl-tag"> /></span></pre><p>There is also a
|
|
<code class="classname">PatternMatchingCompositeLineTokenizer</code> that can
|
|
be used for tokenization alone.</p><p>It is also common for a flat file to contain records that each
|
|
span multiple lines. To handle this situation, a more complex strategy
|
|
is required. A demonstration of this common pattern can be found in
|
|
<a class="xref" href="#multiLineRecords" title="11.5 Multi-Line Records">Section 11.5, “Multi-Line Records”</a>.</p></div><div class="section"><div class="titlepage"><div><div><h4 class="title"><a name="exceptionHandlingInFlatFiles" href="#exceptionHandlingInFlatFiles"></a>Exception Handling in Flat Files</h4></div></div></div><p>There are many scenarios when tokenizing a line may cause
|
|
exceptions to be thrown. Many flat files are imperfect and contain
|
|
records that aren't formatted correctly. Many users choose to skip
|
|
          these erroneous lines, logging the issue, the original line, and the line
|
|
number. These logs can later be inspected manually or by another batch
|
|
job. For this reason, Spring Batch provides a hierarchy of exceptions
|
|
for handling parse exceptions:
|
|
<code class="classname">FlatFileParseException</code> and
|
|
<code class="classname">FlatFileFormatException</code>.
|
|
<code class="classname">FlatFileParseException</code> is thrown by the
|
|
<code class="classname">FlatFileItemReader</code> when any errors are
|
|
encountered while trying to read a file.
|
|
<code class="classname">FlatFileFormatException</code> is thrown by
|
|
implementations of the <code class="classname">LineTokenizer</code> interface,
|
|
and indicates a more specific error encountered while
|
|
tokenizing.</p><div class="section"><div class="titlepage"><div><div><h5 class="title"><a name="incorrectTokenCountException" href="#incorrectTokenCountException"></a>IncorrectTokenCountException</h5></div></div></div><p>Both <code class="classname">DelimitedLineTokenizer</code> and
|
|
<code class="classname">FixedLengthLineTokenizer</code> have the ability to
|
|
specify column names that can be used for creating a
|
|
<code class="classname">FieldSet</code>. However, if the number of column
|
|
names doesn't match the number of columns found while tokenizing a
|
|
            line, the <code class="classname">FieldSet</code> can't be created, and an
|
|
<code class="classname">IncorrectTokenCountException</code> is thrown, which
|
|
contains the number of tokens encountered, and the number
|
|
expected:</p><pre class="programlisting">tokenizer.setNames(<span class="hl-keyword">new</span> String[] {<span class="hl-string">"A"</span>, <span class="hl-string">"B"</span>, <span class="hl-string">"C"</span>, <span class="hl-string">"D"</span>});
|
|
|
|
<span class="hl-keyword">try</span> {
|
|
    tokenizer.tokenize(<span class="hl-string">"a,b,c"</span>);
    fail(<span class="hl-string">"Expected IncorrectTokenCountException"</span>);
|
|
}
|
|
<span class="hl-keyword">catch</span>(IncorrectTokenCountException e){
|
|
assertEquals(<span class="hl-number">4</span>, e.getExpectedCount());
|
|
assertEquals(<span class="hl-number">3</span>, e.getActualCount());
|
|
}</pre><p>Because the tokenizer was configured with 4 column names, but
|
|
            only 3 tokens were found in the line, an
|
|
<code class="classname">IncorrectTokenCountException</code> was
|
|
thrown.</p></div><div class="section"><div class="titlepage"><div><div><h5 class="title"><a name="incorrectLineLengthException" href="#incorrectLineLengthException"></a>IncorrectLineLengthException</h5></div></div></div><p>Files formatted in a fixed length format have additional
|
|
requirements when parsing because, unlike a delimited format, each
|
|
            column must strictly adhere to its predefined width. If the total
            line length doesn't equal the highest upper bound among the configured column ranges, an
|
|
exception is thrown:</p><pre class="programlisting">tokenizer.setColumns(<span class="hl-keyword">new</span> Range[] { <span class="hl-keyword">new</span> Range(<span class="hl-number">1</span>, <span class="hl-number">5</span>),
|
|
<span class="hl-keyword">new</span> Range(<span class="hl-number">6</span>, <span class="hl-number">10</span>),
|
|
<span class="hl-keyword">new</span> Range(<span class="hl-number">11</span>, <span class="hl-number">15</span>) });
|
|
<span class="hl-keyword">try</span> {
|
|
tokenizer.tokenize(<span class="hl-string">"12345"</span>);
|
|
fail(<span class="hl-string">"Expected IncorrectLineLengthException"</span>);
|
|
}
|
|
<span class="hl-keyword">catch</span> (IncorrectLineLengthException ex) {
|
|
assertEquals(<span class="hl-number">15</span>, ex.getExpectedLength());
|
|
assertEquals(<span class="hl-number">5</span>, ex.getActualLength());
|
|
}</pre><p>The configured ranges for the tokenizer above are: 1-5, 6-10,
|
|
and 11-15, thus the total length of the line expected is 15.
|
|
However, in this case a line of length 5 was passed in, causing an
|
|
<code class="classname">IncorrectLineLengthException</code> to be thrown.
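</p><p>The expected length of 15 comes from the highest upper bound among the configured ranges. The following is a minimal sketch of that check in plain Java, assuming every range is closed (has an explicit end). It is not the tokenizer's actual implementation, and the class and method names are hypothetical:</p><pre class="programlisting">public class LineLengthCheckSketch {

    // Each row is {start, end}, 1-based and inclusive, mirroring new Range(start, end).
    static int expectedLength(int[][] ranges) {
        int max = 0;
        for (int[] range : ranges) {
            max = Math.max(max, range[1]);
        }
        return max;
    }

    // Rejects a line whose length differs from the expected length,
    // reporting both values in the message.
    static void checkStrict(String line, int[][] ranges) {
        int expected = expectedLength(ranges);
        if (line.length() != expected) {
            throw new IllegalArgumentException(
                    "Expected length " + expected + " but was " + line.length());
        }
    }

    public static void main(String[] args) {
        int[][] ranges = { {1, 5}, {6, 10}, {11, 15} };
        System.out.println(expectedLength(ranges)); // prints 15
        checkStrict("12345", ranges);               // throws: expected 15, actual 5
    }
}</pre><p>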
|
|
Throwing an exception here rather than only mapping the first column
|
|
allows the processing of the line to fail earlier, and with more
|
|
information than it would if it failed while trying to read in
|
|
column 2 in a <code class="classname">FieldSetMapper</code>. However, there
|
|
are scenarios where the length of the line isn't always constant.
|
|
For this reason, validation of line length can be turned off via the
|
|
'strict' property:</p><pre class="programlisting">tokenizer.setColumns(<span class="hl-keyword">new</span> Range[] { <span class="hl-keyword">new</span> Range(<span class="hl-number">1</span>, <span class="hl-number">5</span>), <span class="hl-keyword">new</span> Range(<span class="hl-number">6</span>, <span class="hl-number">10</span>) });
|
|
<span class="bold"><strong>tokenizer.setStrict(false);</strong></span>
|
|
FieldSet tokens = tokenizer.tokenize(<span class="hl-string">"12345"</span>);
|
|
assertEquals(<span class="hl-string">"12345"</span>, tokens.readString(<span class="hl-number">0</span>));
|
|
assertEquals(<span class="hl-string">""</span>, tokens.readString(<span class="hl-number">1</span>));</pre><p>The above example is almost identical to the one before it,
|
|
except that tokenizer.setStrict(false) was called. This setting
|
|
tells the tokenizer to not enforce line lengths when tokenizing the
|
|
line. A <code class="classname">FieldSet</code> is now correctly created and
|
|
returned. However, it will only contain empty tokens for the
|
|
remaining values.</p></div></div></div><div class="section"><div class="titlepage"><div><div><h3 class="title"><a name="flatFileItemWriter" href="#flatFileItemWriter"></a>6.6.3 FlatFileItemWriter</h3></div></div></div><p>Writing out to flat files has the same problems and issues that
|
|
reading in from a file must overcome. A step must be able to write out
|
|
in either delimited or fixed length formats in a transactional
|
|
manner.</p><div class="section"><div class="titlepage"><div><div><h4 class="title"><a name="lineAggregator" href="#lineAggregator"></a>LineAggregator</h4></div></div></div><p>Just as the <code class="classname">LineTokenizer</code> interface is
|
|
          necessary to take a <code class="classname">String</code> and turn it into a
          <code class="classname">FieldSet</code>, file writing must have a way to
|
|
aggregate multiple fields into a single string for writing to a file.
|
|
In Spring Batch this is the
|
|
<code class="classname">LineAggregator</code>:</p><pre class="programlisting"><span class="hl-keyword">public</span> <span class="hl-keyword">interface</span> LineAggregator<T> {
|
|
|
|
<span class="hl-keyword">public</span> String aggregate(T item);
|
|
|
|
}</pre><p>The <code class="classname">LineAggregator</code> is the opposite of a
|
|
<code class="classname">LineTokenizer</code>.
|
|
<code class="classname">LineTokenizer</code> takes a
|
|
<code class="classname">String</code> and returns a
|
|
<code class="classname">FieldSet</code>, whereas
|
|
<code class="classname">LineAggregator</code> takes an
|
|
<code class="classname">item</code> and returns a
|
|
<code class="classname">String</code>.</p><div class="section"><div class="titlepage"><div><div><h5 class="title"><a name="PassThroughLineAggregator" href="#PassThroughLineAggregator"></a>PassThroughLineAggregator</h5></div></div></div><p>The most basic implementation of the LineAggregator interface
|
|
is the <code class="classname">PassThroughLineAggregator</code>, which
|
|
simply assumes that the object is already a string, or that its
|
|
string representation is acceptable for writing:</p><pre class="programlisting"><span class="hl-keyword">public</span> <span class="hl-keyword">class</span> PassThroughLineAggregator<T> <span class="hl-keyword">implements</span> LineAggregator<T> {
|
|
|
|
<span class="hl-keyword">public</span> String aggregate(T item) {
|
|
<span class="hl-keyword">return</span> item.toString();
|
|
}
|
|
}</pre><p>The above implementation is useful if direct control of
|
|
creating the string is required, but the advantages of a
|
|
<code class="classname">FlatFileItemWriter</code>, such as transaction and
|
|
restart support, are necessary.</p></div></div><div class="section"><div class="titlepage"><div><div><h4 class="title"><a name="SimplifiedFileWritingExample" href="#SimplifiedFileWritingExample"></a>Simplified File Writing Example</h4></div></div></div><p>Now that the <code class="classname">LineAggregator</code> interface and
|
|
its most basic implementation,
|
|
<code class="classname">PassThroughLineAggregator</code>, have been defined,
|
|
the basic flow of writing can be explained:</p><div class="orderedlist"><ol class="orderedlist" type="1"><li class="listitem"><p>The object to be written is passed to the
|
|
<code class="classname">LineAggregator</code> in order to obtain a
|
|
<code class="classname">String</code>.</p></li><li class="listitem"><p>The returned <code class="classname">String</code> is written to the
|
|
configured file.</p></li></ol></div><p>The following excerpt from the
|
|
<code class="classname">FlatFileItemWriter</code> expresses this in
|
|
code:</p><pre class="programlisting"><span class="hl-keyword">public</span> <span class="hl-keyword">void</span> write(T item) <span class="hl-keyword">throws</span> Exception {
|
|
write(lineAggregator.aggregate(item) + LINE_SEPARATOR);
|
|
}</pre><p>A simple configuration would look like the following:</p><pre class="programlisting"><span class="hl-tag"><bean</span> <span class="hl-attribute">id</span>=<span class="hl-value">"itemWriter"</span> <span class="hl-attribute">class</span>=<span class="hl-value">"org.spr...FlatFileItemWriter"</span><span class="hl-tag">></span>
|
|
<span class="hl-tag"><property</span> <span class="hl-attribute">name</span>=<span class="hl-value">"resource"</span> <span class="hl-attribute">value</span>=<span class="hl-value">"file:target/test-outputs/output.txt"</span><span class="hl-tag"> /></span>
|
|
<span class="hl-tag"><property</span> <span class="hl-attribute">name</span>=<span class="hl-value">"lineAggregator"</span><span class="hl-tag">></span>
|
|
<span class="hl-tag"><bean</span> <span class="hl-attribute">class</span>=<span class="hl-value">"org.spr...PassThroughLineAggregator"</span><span class="hl-tag">/></span>
|
|
<span class="hl-tag"></property></span>
|
|
<span class="hl-tag"></bean></span></pre></div><div class="section"><div class="titlepage"><div><div><h4 class="title"><a name="FieldExtractor" href="#FieldExtractor"></a>FieldExtractor</h4></div></div></div><p>The above example may be useful for the most basic uses of a
|
|
          file writer. However, most users of the
|
|
<code class="classname">FlatFileItemWriter</code> will have a domain object
|
|
that needs to be written out, and thus must be converted into a line.
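</p><p>As a rough illustration of what that conversion involves, a hand-written version for a single domain type might look like the sketch below. It uses plain Java only, and the <code class="classname">Account</code> class and the <code class="methodname">toLine</code> helper are hypothetical names introduced for this example:</p><pre class="programlisting">import java.math.BigDecimal;

public class HandWrittenLineSketch {

    // Hypothetical domain object with two fields to be written out.
    static final class Account {
        final String owner;
        final BigDecimal balance;

        Account(String owner, BigDecimal balance) {
            this.owner = owner;
            this.balance = balance;
        }
    }

    // Picks the fields to write and joins them with a comma.
    static String toLine(Account account) {
        return account.owner + "," + account.balance.toPlainString();
    }

    public static void main(String[] args) {
        // prints Smith,12.50
        System.out.println(toLine(new Account("Smith", new BigDecimal("12.50"))));
    }
}</pre><p>Code of this kind has to be repeated for each domain type and output format. The rest of this section describes a more reusable approach that separates choosing the fields from joining them into a line.</p><p>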
|
|
In file reading, the following was required:</p><div class="orderedlist"><ol class="orderedlist" type="1"><li class="listitem"><p>Read one line from the file.</p></li><li class="listitem"><p>Pass the string line into the
|
|
<code class="methodname">LineTokenizer#tokenize</code>() method, in
|
|
order to retrieve a <code class="classname">FieldSet</code></p></li><li class="listitem"><p>Pass the <code class="classname">FieldSet</code> returned from
|
|
tokenizing to a <code class="classname">FieldSetMapper</code>, returning
|
|
the result from the <code class="methodname">ItemReader#read</code>()
|
|
                method</p></li></ol></div><p>File writing has similar, but inverse, steps:</p><div class="orderedlist"><ol class="orderedlist" type="1"><li class="listitem"><p>Pass the item to be written to the writer</p></li><li class="listitem"><p>Convert the fields on the item into an array</p></li><li class="listitem"><p>Aggregate the resulting array into a line</p></li></ol></div><p>Because there is no way for the framework to know which fields
|
|
from the object need to be written out, a
|
|
<code class="classname">FieldExtractor</code> must be written to accomplish
|
|
the task of turning the item into an array:</p><pre class="programlisting"><span class="hl-keyword">public</span> <span class="hl-keyword">interface</span> FieldExtractor<T> {
|
|
|
|
Object[] extract(T item);
|
|
|
|
}</pre><p>Implementations of the <code class="classname">FieldExtractor</code>
|
|
interface should create an array from the fields of the provided
|
|
object, which can then be written out with a delimiter between the
|
|
          elements, or as part of a fixed-width line.</p><div class="section"><div class="titlepage"><div><div><h5 class="title"><a name="PassThroughFieldExtractor" href="#PassThroughFieldExtractor"></a>PassThroughFieldExtractor</h5></div></div></div><p>There are many cases where a collection, such as an array,
|
|
<code class="classname">Collection</code>, or
|
|
            <code class="classname">FieldSet</code>, needs to be written out.
            "Extracting" an array from one of these collection types is very
|
|
straightforward: simply convert the collection to an array.
|
|
Therefore, the <code class="classname">PassThroughFieldExtractor</code>
|
|
            should be used in this scenario. It should be noted that if the
|
|
object passed in is not a type of collection, then the
|
|
<code class="classname">PassThroughFieldExtractor</code> will return an
|
|
array containing solely the item to be extracted.</p></div><div class="section"><div class="titlepage"><div><div><h5 class="title"><a name="BeanWrapperFieldExtractor" href="#BeanWrapperFieldExtractor"></a>BeanWrapperFieldExtractor</h5></div></div></div><p>As with the <code class="classname">BeanWrapperFieldSetMapper</code>
|
|
described in the file reading section, it is often preferable to
|
|
configure how to convert a domain object to an object array, rather
|
|
than writing the conversion yourself. The
|
|
<code class="classname">BeanWrapperFieldExtractor</code> provides just this
|
|
type of functionality:</p><pre class="programlisting">BeanWrapperFieldExtractor<Name> extractor = <span class="hl-keyword">new</span> BeanWrapperFieldExtractor<Name>();
|
|
extractor.setNames(<span class="hl-keyword">new</span> String[] { <span class="hl-string">"first"</span>, <span class="hl-string">"last"</span>, <span class="hl-string">"born"</span> });
|
|
|
|
String first = <span class="hl-string">"Alan"</span>;
|
|
String last = <span class="hl-string">"Turing"</span>;
|
|
<span class="hl-keyword">int</span> born = <span class="hl-number">1912</span>;
|
|
|
|
Name n = <span class="hl-keyword">new</span> Name(first, last, born);
|
|
Object[] values = extractor.extract(n);
|
|
|
|
assertEquals(first, values[<span class="hl-number">0</span>]);
|
|
assertEquals(last, values[<span class="hl-number">1</span>]);
|
|
assertEquals(born, values[<span class="hl-number">2</span>]);</pre><p>This extractor implementation has only one required property,
|
|
the names of the fields to map. Just as the
|
|
<code class="classname">BeanWrapperFieldSetMapper</code> needs field names
|
|
to map fields on the <code class="classname">FieldSet</code> to setters on
|
|
the provided object, the
|
|
<code class="classname">BeanWrapperFieldExtractor</code> needs names to map
|
|
to getters for creating an object array. It is worth noting that the
|
|
order of the names determines the order of the fields within the
|
|
array.</p></div></div><div class="section"><div class="titlepage"><div><div><h4 class="title"><a name="delimitedFileWritingExample" href="#delimitedFileWritingExample"></a>Delimited File Writing Example</h4></div></div></div><p>The most basic flat file format is one in which all fields are
|
|
separated by a delimiter. This can be accomplished using a
|
|
<code class="classname">DelimitedLineAggregator</code>. The example below
|
|
writes out a simple domain object that represents a credit to a
|
|
customer account:</p><pre class="programlisting"><span class="hl-keyword">public</span> <span class="hl-keyword">class</span> CustomerCredit {
|
|
|
|
<span class="hl-keyword">private</span> <span class="hl-keyword">int</span> id;
|
|
<span class="hl-keyword">private</span> String name;
|
|
<span class="hl-keyword">private</span> BigDecimal credit;
|
|
|
|
<span class="hl-comment">//getters and setters removed for clarity</span>
|
|
}</pre><p>Because a domain object is being used, an implementation of the
|
|
FieldExtractor interface must be provided, along with the delimiter to
|
|
use:</p><pre class="programlisting"><span class="hl-tag"><bean</span> <span class="hl-attribute">id</span>=<span class="hl-value">"itemWriter"</span> <span class="hl-attribute">class</span>=<span class="hl-value">"org.springframework.batch.item.file.FlatFileItemWriter"</span><span class="hl-tag">></span>
|
|
<span class="hl-tag"><property</span> <span class="hl-attribute">name</span>=<span class="hl-value">"resource"</span> <span class="hl-attribute">ref</span>=<span class="hl-value">"outputResource"</span><span class="hl-tag"> /></span>
|
|
<span class="hl-tag"><property</span> <span class="hl-attribute">name</span>=<span class="hl-value">"lineAggregator"</span><span class="hl-tag">></span>
|
|
<span class="hl-tag"><bean</span> <span class="hl-attribute">class</span>=<span class="hl-value">"org.spr...DelimitedLineAggregator"</span><span class="hl-tag">></span>
|
|
<span class="hl-tag"><property</span> <span class="hl-attribute">name</span>=<span class="hl-value">"delimiter"</span> <span class="hl-attribute">value</span>=<span class="hl-value">","</span><span class="hl-tag">/></span>
|
|
<span class="hl-tag"><property</span> <span class="hl-attribute">name</span>=<span class="hl-value">"fieldExtractor"</span><span class="hl-tag">></span>
|
|
<span class="hl-tag"><bean</span> <span class="hl-attribute">class</span>=<span class="hl-value">"org.spr...BeanWrapperFieldExtractor"</span><span class="hl-tag">></span>
|
|
<span class="hl-tag"><property</span> <span class="hl-attribute">name</span>=<span class="hl-value">"names"</span> <span class="hl-attribute">value</span>=<span class="hl-value">"name,credit"</span><span class="hl-tag">/></span>
|
|
<span class="hl-tag"></bean></span>
|
|
<span class="hl-tag"></property></span>
|
|
<span class="hl-tag"></bean></span>
|
|
<span class="hl-tag"></property></span>
|
|
<span class="hl-tag"></bean></span></pre><p>In this case, the
|
|
<code class="classname">BeanWrapperFieldExtractor</code> described earlier in
|
|
this chapter is used to turn the name and credit fields within
|
|
<code class="classname">CustomerCredit</code> into an object array, which is
|
|
then written out with commas between each field.</p></div><div class="section"><div class="titlepage"><div><div><h4 class="title"><a name="fixedWidthFileWritingExample" href="#fixedWidthFileWritingExample"></a>Fixed Width File Writing Example</h4></div></div></div><p>Delimited is not the only type of flat file format. Many prefer
|
|
to use a set width for each column to delineate between fields, which
|
|
is usually referred to as 'fixed width'. Spring Batch supports this in
|
|
file writing via the <code class="classname">FormatterLineAggregator</code>.
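</p><p>The format strings this aggregator accepts follow <code class="classname">java.util.Formatter</code> syntax. As a minimal sketch of how such a string lays out columns, the following plain Java (no Spring Batch involved; the class and method names are hypothetical) applies the format string used in this section directly:</p><pre class="programlisting">import java.math.BigDecimal;
import java.util.Locale;

public class FixedWidthFormatSketch {

    // %-9s   left-justifies the name and pads it with spaces to at least 9 characters.
    // %-2.0f left-justifies the credit, rounds it to zero decimal places,
    //        and pads it to at least 2 characters.
    static String formatLine(String name, BigDecimal credit) {
        return String.format(Locale.ROOT, "%-9s%-2.0f", name, credit);
    }

    public static void main(String[] args) {
        // Brackets make the padding visible: prints [Alan     7 ]
        System.out.println("[" + formatLine("Alan", new BigDecimal("7")) + "]");
    }
}</pre><p>A width in a format string is a minimum, so a value longer than its column is written in full rather than truncated. Adding a precision to a string conversion (for example <code class="literal">%-9.9s</code>) also caps its length. The same considerations apply to the format string given to the <code class="classname">FormatterLineAggregator</code>.</p><p>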
|
|
Using the same <code class="classname">CustomerCredit</code> domain object
|
|
described above, it can be configured as follows:</p><pre class="programlisting"><span class="hl-tag"><bean</span> <span class="hl-attribute">id</span>=<span class="hl-value">"itemWriter"</span> <span class="hl-attribute">class</span>=<span class="hl-value">"org.springframework.batch.item.file.FlatFileItemWriter"</span><span class="hl-tag">></span>
|
|
<span class="hl-tag"><property</span> <span class="hl-attribute">name</span>=<span class="hl-value">"resource"</span> <span class="hl-attribute">ref</span>=<span class="hl-value">"outputResource"</span><span class="hl-tag"> /></span>
|
|
<span class="hl-tag"><property</span> <span class="hl-attribute">name</span>=<span class="hl-value">"lineAggregator"</span><span class="hl-tag">></span>
|
|
<span class="hl-tag"><bean</span> <span class="hl-attribute">class</span>=<span class="hl-value">"org.spr...FormatterLineAggregator"</span><span class="hl-tag">></span>
|
|
<span class="hl-tag"><property</span> <span class="hl-attribute">name</span>=<span class="hl-value">"fieldExtractor"</span><span class="hl-tag">></span>
|
|
<span class="hl-tag"><bean</span> <span class="hl-attribute">class</span>=<span class="hl-value">"org.spr...BeanWrapperFieldExtractor"</span><span class="hl-tag">></span>
|
|
<span class="hl-tag"><property</span> <span class="hl-attribute">name</span>=<span class="hl-value">"names"</span> <span class="hl-attribute">value</span>=<span class="hl-value">"name,credit"</span><span class="hl-tag"> /></span>
|
|
<span class="hl-tag"></bean></span>
|
|
<span class="hl-tag"></property></span>
|
|
<span class="hl-tag"><property</span> <span class="hl-attribute">name</span>=<span class="hl-value">"format"</span> <span class="hl-attribute">value</span>=<span class="hl-value">"%-9s%-2.0f"</span><span class="hl-tag"> /></span>
|
|
<span class="hl-tag"></bean></span>
|
|
<span class="hl-tag"></property></span>
|
|
<span class="hl-tag"></bean></span></pre><p>Most of the above example should look familiar. However, the
|
|
value of the format property is new:</p><pre class="programlisting"><span class="hl-tag"><property</span> <span class="hl-attribute">name</span>=<span class="hl-value">"format"</span> <span class="hl-attribute">value</span>=<span class="hl-value">"%-9s%-2.0f"</span><span class="hl-tag"> /></span></pre><p>The underlying implementation is built using the same
|
|
<code class="classname">Formatter</code> added as part of Java 5. The Java
|
|
<code class="classname">Formatter</code> is based on the
|
|
<code class="methodname">printf</code> functionality of the C programming
|
|
language. Most details on how to configure a formatter can be found in
|
|
the javadoc of <a class="ulink" href="http://java.sun.com/j2se/1.5.0/docs/api/java/util/Formatter.html" target="_top"><em class="citetitle">Formatter</em></a>.</p></div><div class="section"><div class="titlepage"><div><div><h4 class="title"><a name="handlingFileCreation" href="#handlingFileCreation"></a>Handling File Creation</h4></div></div></div><p><code class="classname">FlatFileItemReader</code> has a very simple
|
|
relationship with file resources. When the reader is initialized, it
|
|
opens the file if it exists, and throws an exception if it does not.
|
|
          File writing isn't quite so simple. At first glance it seems like a
          similarly straightforward contract should exist for
|
|
<code class="classname">FlatFileItemWriter</code>: if the file already exists,
|
|
throw an exception, and if it does not, create it and start writing.
|
|
However, potentially restarting a <code class="classname">Job</code> can cause
|
|
issues. In normal restart scenarios, the contract is reversed: if the
|
|
file exists, start writing to it from the last known good position,
|
|
and if it does not, throw an exception. However, what happens if the
|
|
file name for this job is always the same? In this case, you would
|
|
want to delete the file if it exists, unless it's a restart. Because
|
|
of this possibility, the <code class="classname">FlatFileItemWriter</code>
|
|
contains the property, <code class="methodname">shouldDeleteIfExists</code>.
|
|
Setting this property to true will cause an existing file with the
|
|
same name to be deleted when the writer is opened.</p></div></div></div><div class="section"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="xmlReadingWriting" href="#xmlReadingWriting"></a>6.7 XML Item Readers and Writers</h2></div></div></div><p>Spring Batch provides transactional infrastructure for both reading
|
|
XML records and mapping them to Java objects as well as writing Java
|
|
objects as XML records.</p><div class="note" style="margin-left: 0.5in; margin-right: 0.5in;"><table border="0" summary="Note: Constraints on streaming XML"><tr><td rowspan="2" align="center" valign="top" width="25"><img alt="[Note]" src="images/note.png"></td><th align="left">Constraints on streaming XML</th></tr><tr><td align="left" valign="top"><p>The StAX API is used for I/O as other standard XML parsing APIs do
|
|
not fit batch processing requirements (DOM loads the whole input into
|
|
memory at once and SAX controls the parsing process allowing the user
|
|
        only to provide callbacks).</p></td></tr></table></div><p>Let's take a closer look at how XML input and output work in Spring
|
|
Batch. First, there are a few concepts that vary from file reading and
|
|
writing but are common across Spring Batch XML processing. With XML
|
|
processing, instead of lines of records (FieldSets) that need to be
|
|
tokenized, it is assumed an XML resource is a collection of 'fragments'
|
|
corresponding to individual records:</p><div class="mediaobject" align="center"><img src="images/xmlinput.png" align="middle"><div class="caption"><p>Figure 3.1: XML Input</p></div></div><p>The 'trade' tag is defined as the 'root element' in the scenario
|
|
above. Everything between '<trade>' and '</trade>' is
|
|
considered one 'fragment'. Spring Batch uses Object/XML Mapping (OXM) to
|
|
bind fragments to objects. However, Spring Batch is not tied to any
|
|
particular XML binding technology. Typical use is to delegate to <a class="ulink" href="http://docs.spring.io/spring-ws/site/reference/html/oxm.html" target="_top"><em class="citetitle">Spring
|
|
OXM</em></a>, which provides uniform abstraction for the most
|
|
popular OXM technologies. The dependency on Spring OXM is optional and you
|
|
can choose to implement Spring Batch specific interfaces if desired. The
|
|
relationship to the technologies that OXM supports can be shown as the
|
|
following:</p><div class="mediaobject" align="center"><img src="images/oxm-fragments.png" align="middle"><div class="caption"><p>Figure 3.2: OXM Binding</p></div></div><p>Now with an introduction to OXM and how one can use XML fragments to
|
|
represent records, let's take a closer look at readers and writers.</p><div class="section"><div class="titlepage"><div><div><h3 class="title"><a name="StaxEventItemReader" href="#StaxEventItemReader"></a>6.7.1 StaxEventItemReader</h3></div></div></div><p>The <code class="classname">StaxEventItemReader</code> configuration
|
|
provides a typical setup for the processing of records from an XML input
|
|
        stream. First, let's examine a set of XML records that the
|
|
<code class="classname">StaxEventItemReader</code> can process.</p><pre class="programlisting"><span class="hl-directive" style="color: maroon"><?xml version="1.0" encoding="UTF-8"?></span>
|
|
<span class="hl-tag"><records></span>
|
|
<span class="hl-tag"><trade</span> <span class="hl-attribute">xmlns</span>=<span class="hl-value">"http://springframework.org/batch/sample/io/oxm/domain"</span><span class="hl-tag">></span>
|
|
<span class="hl-tag"><isin></span>XYZ0001<span class="hl-tag"></isin></span>
|
|
<span class="hl-tag"><quantity></span>5<span class="hl-tag"></quantity></span>
|
|
<span class="hl-tag"><price></span>11.39<span class="hl-tag"></price></span>
|
|
<span class="hl-tag"><customer></span>Customer1<span class="hl-tag"></customer></span>
|
|
<span class="hl-tag"></trade></span>
|
|
<span class="hl-tag"><trade</span> <span class="hl-attribute">xmlns</span>=<span class="hl-value">"http://springframework.org/batch/sample/io/oxm/domain"</span><span class="hl-tag">></span>
|
|
<span class="hl-tag"><isin></span>XYZ0002<span class="hl-tag"></isin></span>
|
|
<span class="hl-tag"><quantity></span>2<span class="hl-tag"></quantity></span>
|
|
<span class="hl-tag"><price></span>72.99<span class="hl-tag"></price></span>
|
|
<span class="hl-tag"><customer></span>Customer2c<span class="hl-tag"></customer></span>
|
|
<span class="hl-tag"></trade></span>
|
|
<span class="hl-tag"><trade</span> <span class="hl-attribute">xmlns</span>=<span class="hl-value">"http://springframework.org/batch/sample/io/oxm/domain"</span><span class="hl-tag">></span>
|
|
<span class="hl-tag"><isin></span>XYZ0003<span class="hl-tag"></isin></span>
|
|
<span class="hl-tag"><quantity></span>9<span class="hl-tag"></quantity></span>
|
|
<span class="hl-tag"><price></span>99.99<span class="hl-tag"></price></span>
|
|
<span class="hl-tag"><customer></span>Customer3<span class="hl-tag"></customer></span>
|
|
<span class="hl-tag"></trade></span>
|
|
<span class="hl-tag"></records></span></pre><p>To be able to process the XML records the following is needed:
|
|
</p><div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; "><li class="listitem"><p>Root Element Name - Name of the root element of the fragment
|
|
that constitutes the object to be mapped. The example
|
|
configuration demonstrates this with the value of trade.</p></li><li class="listitem"><p>Resource - Spring Resource that represents the file to be
|
|
read.</p></li><li class="listitem"><p><code class="classname">Unmarshaller</code> - Unmarshalling
|
|
facility provided by Spring OXM for mapping the XML fragment to an
|
|
object.</p></li></ul></div><pre class="programlisting"><span class="hl-tag"><bean</span> <span class="hl-attribute">id</span>=<span class="hl-value">"itemReader"</span> <span class="hl-attribute">class</span>=<span class="hl-value">"org.springframework.batch.item.xml.StaxEventItemReader"</span><span class="hl-tag">></span>
|
|
<span class="hl-tag"><property</span> <span class="hl-attribute">name</span>=<span class="hl-value">"fragmentRootElementName"</span> <span class="hl-attribute">value</span>=<span class="hl-value">"trade"</span><span class="hl-tag"> /></span>
|
|
<span class="hl-tag"><property</span> <span class="hl-attribute">name</span>=<span class="hl-value">"resource"</span> <span class="hl-attribute">value</span>=<span class="hl-value">"data/iosample/input/input.xml"</span><span class="hl-tag"> /></span>
|
|
<span class="hl-tag"><property</span> <span class="hl-attribute">name</span>=<span class="hl-value">"unmarshaller"</span> <span class="hl-attribute">ref</span>=<span class="hl-value">"tradeMarshaller"</span><span class="hl-tag"> /></span>
|
|
<span class="hl-tag"></bean></span></pre><p>Notice that in this example we have chosen to use an
|
|
<code class="classname">XStreamMarshaller</code> which accepts an alias passed
|
|
in as a map with the first key and value being the name of the fragment
|
|
(i.e. root element) and the object type to bind. Then, similar to a
|
|
<code class="classname">FieldSet</code>, the names of the other elements that
|
|
map to fields within the object type are described as key/value pairs in
|
|
the map. In the configuration file we can use a Spring configuration
|
|
utility to describe the required alias as follows:</p><pre class="programlisting"><span class="hl-tag"><bean</span> <span class="hl-attribute">id</span>=<span class="hl-value">"tradeMarshaller"</span>
|
|
<span class="hl-attribute">class</span>=<span class="hl-value">"org.springframework.oxm.xstream.XStreamMarshaller"</span><span class="hl-tag">></span>
|
|
<span class="hl-tag"><property</span> <span class="hl-attribute">name</span>=<span class="hl-value">"aliases"</span><span class="hl-tag">></span>
|
|
<span class="bold"><strong> <util:map id="aliases">
|
|
<entry key="trade"
|
|
value="org.springframework.batch.sample.domain.Trade" />
|
|
<entry key="price" value="java.math.BigDecimal" />
|
|
<entry key="customer" value="java.lang.String" />
|
|
</util:map></strong></span>
|
|
<span class="hl-tag"></property></span>
|
|
<span class="hl-tag"></bean></span></pre><p>On input the reader reads the XML resource until it recognizes
|
|
that a new fragment is about to start (by matching the tag name by
|
|
default). The reader creates a standalone XML document from the fragment
|
|
(or at least makes it appear so) and passes the document to a
|
|
deserializer (typically a wrapper around a Spring OXM
|
|
<code class="classname">Unmarshaller</code>) to map the XML to a Java
|
|
object.</p><p>In summary, this procedure is analogous to the following scripted
|
|
Java code which uses the injection provided by the Spring
|
|
configuration:</p><pre class="programlisting">StaxEventItemReader xmlStaxEventItemReader = <span class="hl-keyword">new</span> StaxEventItemReader();
<span class="hl-comment">// xmlResource is assumed to be a String holding the XML input</span>
Resource resource = <span class="hl-keyword">new</span> ByteArrayResource(xmlResource.getBytes());
|
|
|
|
Map aliases = <span class="hl-keyword">new</span> HashMap();
|
|
aliases.put(<span class="hl-string">"trade"</span>,<span class="hl-string">"org.springframework.batch.sample.domain.Trade"</span>);
|
|
aliases.put(<span class="hl-string">"price"</span>,<span class="hl-string">"java.math.BigDecimal"</span>);
|
|
aliases.put(<span class="hl-string">"customer"</span>,<span class="hl-string">"java.lang.String"</span>);
|
|
XStreamMarshaller unmarshaller = <span class="hl-keyword">new</span> XStreamMarshaller();
|
|
unmarshaller.setAliases(aliases);
|
|
xmlStaxEventItemReader.setUnmarshaller(unmarshaller);
|
|
xmlStaxEventItemReader.setResource(resource);
|
|
xmlStaxEventItemReader.setFragmentRootElementName(<span class="hl-string">"trade"</span>);
|
|
xmlStaxEventItemReader.open(<span class="hl-keyword">new</span> ExecutionContext());
|
|
|
|
<span class="hl-keyword">boolean</span> hasNext = true;

<span class="hl-comment">// The "trade" fragment is bound to the Trade class by the alias map above</span>
Trade trade = null;

<span class="hl-keyword">while</span> (hasNext) {
    trade = (Trade) xmlStaxEventItemReader.read();
    <span class="hl-keyword">if</span> (trade == null) {
        hasNext = false;
    }
    <span class="hl-keyword">else</span> {
        System.out.println(trade);
    }
|
|
}</pre></div><div class="section"><div class="titlepage"><div><div><h3 class="title"><a name="StaxEventItemWriter" href="#StaxEventItemWriter"></a>6.7.2 StaxEventItemWriter</h3></div></div></div><p>Output works symmetrically to input. The
|
|
<code class="classname">StaxEventItemWriter</code> needs a
|
|
<code class="classname">Resource</code>, a marshaller, and a <code class="literal">rootTagName</code>. A Java
|
|
object is passed to a marshaller (typically a standard Spring OXM
|
|
<code class="classname">Marshaller</code>) which writes to a
|
|
<code class="classname">Resource</code> using a custom event writer that filters
|
|
the <code class="classname">StartDocument</code> and
|
|
<code class="classname">EndDocument</code> events produced for each fragment by
|
|
the OXM tools. The Spring
|
|
configuration for this setup looks as follows:</p><pre class="programlisting"><span class="hl-tag"><bean</span> <span class="hl-attribute">id</span>=<span class="hl-value">"itemWriter"</span> <span class="hl-attribute">class</span>=<span class="hl-value">"org.springframework.batch.item.xml.StaxEventItemWriter"</span><span class="hl-tag">></span>
|
|
<span class="hl-tag"><property</span> <span class="hl-attribute">name</span>=<span class="hl-value">"resource"</span> <span class="hl-attribute">ref</span>=<span class="hl-value">"outputResource"</span><span class="hl-tag"> /></span>
|
|
<span class="hl-tag"><property</span> <span class="hl-attribute">name</span>=<span class="hl-value">"marshaller"</span> <span class="hl-attribute">ref</span>=<span class="hl-value">"customerCreditMarshaller"</span><span class="hl-tag"> /></span>
|
|
<span class="hl-tag"><property</span> <span class="hl-attribute">name</span>=<span class="hl-value">"rootTagName"</span> <span class="hl-attribute">value</span>=<span class="hl-value">"customers"</span><span class="hl-tag"> /></span>
|
|
<span class="hl-tag"><property</span> <span class="hl-attribute">name</span>=<span class="hl-value">"overwriteOutput"</span> <span class="hl-attribute">value</span>=<span class="hl-value">"true"</span><span class="hl-tag"> /></span>
|
|
<span class="hl-tag"></bean></span></pre><p>The configuration sets up the three required properties and
|
|
optionally sets overwriteOutput=true, mentioned earlier in the
chapter for specifying whether an existing file can be overwritten. Note
that the marshaller used for the writer is of the same type as
|
|
the one used in the reading example from earlier in the chapter:</p><pre class="programlisting"><span class="hl-tag"><bean</span> <span class="hl-attribute">id</span>=<span class="hl-value">"customerCreditMarshaller"</span>
|
|
<span class="hl-attribute">class</span>=<span class="hl-value">"org.springframework.oxm.xstream.XStreamMarshaller"</span><span class="hl-tag">></span>
|
|
<span class="hl-tag"><property</span> <span class="hl-attribute">name</span>=<span class="hl-value">"aliases"</span><span class="hl-tag">></span>
|
|
<span class="hl-tag"><util:map</span> <span class="hl-attribute">id</span>=<span class="hl-value">"aliases"</span><span class="hl-tag">></span>
|
|
<span class="hl-tag"><entry</span> <span class="hl-attribute">key</span>=<span class="hl-value">"customer"</span>
|
|
<span class="hl-attribute">value</span>=<span class="hl-value">"org.springframework.batch.sample.domain.CustomerCredit"</span><span class="hl-tag"> /></span>
|
|
<span class="hl-tag"><entry</span> <span class="hl-attribute">key</span>=<span class="hl-value">"credit"</span> <span class="hl-attribute">value</span>=<span class="hl-value">"java.math.BigDecimal"</span><span class="hl-tag"> /></span>
|
|
<span class="hl-tag"><entry</span> <span class="hl-attribute">key</span>=<span class="hl-value">"name"</span> <span class="hl-attribute">value</span>=<span class="hl-value">"java.lang.String"</span><span class="hl-tag"> /></span>
|
|
<span class="hl-tag"></util:map></span>
|
|
<span class="hl-tag"></property></span>
|
|
<span class="hl-tag"></bean></span></pre><p>To summarize with a Java example, the following code illustrates
|
|
all of the points discussed, demonstrating the programmatic setup of the
|
|
required properties:</p><pre class="programlisting">StaxEventItemWriter staxItemWriter = <span class="hl-keyword">new</span> StaxEventItemWriter();
FileSystemResource resource = <span class="hl-keyword">new</span> FileSystemResource(<span class="hl-string">"data/outputFile.xml"</span>);
|
|
|
|
Map aliases = <span class="hl-keyword">new</span> HashMap();
|
|
aliases.put(<span class="hl-string">"customer"</span>,<span class="hl-string">"org.springframework.batch.sample.domain.CustomerCredit"</span>);
|
|
aliases.put(<span class="hl-string">"credit"</span>,<span class="hl-string">"java.math.BigDecimal"</span>);
|
|
aliases.put(<span class="hl-string">"name"</span>,<span class="hl-string">"java.lang.String"</span>);
|
|
XStreamMarshaller marshaller = <span class="hl-keyword">new</span> XStreamMarshaller();
|
|
marshaller.setAliases(aliases);
|
|
|
|
staxItemWriter.setResource(resource);
|
|
staxItemWriter.setMarshaller(marshaller);
|
|
staxItemWriter.setRootTagName(<span class="hl-string">"customers"</span>);
|
|
staxItemWriter.setOverwriteOutput(true);
|
|
|
|
ExecutionContext executionContext = <span class="hl-keyword">new</span> ExecutionContext();
|
|
staxItemWriter.open(executionContext);
|
|
CustomerCredit credit = <span class="hl-keyword">new</span> CustomerCredit();
credit.setCredit(<span class="hl-keyword">new</span> BigDecimal(<span class="hl-string">"11.39"</span>));
credit.setName(<span class="hl-string">"Customer1"</span>);
<span class="hl-comment">// write() takes a list of items; close() finishes the document</span>
staxItemWriter.write(Collections.singletonList(credit));
staxItemWriter.close();</pre></div></div><div class="section"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="multiFileInput" href="#multiFileInput"></a>6.8 Multi-File Input</h2></div></div></div><p>It is a common requirement to process multiple files within a single
|
|
<code class="classname">Step</code>. Assuming the files all have the same
|
|
formatting, the <code class="classname">MultiResourceItemReader</code> supports
|
|
this type of input for both XML and flat file processing. Consider the
|
|
following files in a directory:</p><pre class="programlisting">file-1.txt file-2.txt ignored.txt</pre><p>file-1.txt and file-2.txt are formatted the same and for business
|
|
reasons should be processed together. The
|
|
<code class="classname">MultiResourceItemReader</code> can be used to read in both
|
|
files by using wildcards:</p><pre class="programlisting"><span class="hl-tag"><bean</span> <span class="hl-attribute">id</span>=<span class="hl-value">"multiResourceReader"</span> <span class="hl-attribute">class</span>=<span class="hl-value">"org.spr...MultiResourceItemReader"</span><span class="hl-tag">></span>
|
|
<span class="hl-tag"><property</span> <span class="hl-attribute">name</span>=<span class="hl-value">"resources"</span> <span class="hl-attribute">value</span>=<span class="hl-value">"classpath:data/input/file-*.txt"</span><span class="hl-tag"> /></span>
|
|
<span class="hl-tag"><property</span> <span class="hl-attribute">name</span>=<span class="hl-value">"delegate"</span> <span class="hl-attribute">ref</span>=<span class="hl-value">"flatFileItemReader"</span><span class="hl-tag"> /></span>
|
|
<span class="hl-tag"></bean></span></pre><p>The referenced delegate is a simple
|
|
<code class="classname">FlatFileItemReader</code>. The above configuration will
|
|
read input from both files, handling rollback and restart scenarios. It
|
|
should be noted that, as with any <code class="classname">ItemReader</code>,
|
|
adding extra input (in this case a file) could cause potential issues when
|
|
restarting. It is recommended that batch jobs work with their own
|
|
individual directories until completed successfully.</p></div><div class="section"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="database" href="#database"></a>6.9 Database</h2></div></div></div><p>Like most enterprise application styles, a database is the central
|
|
storage mechanism for batch. However, batch differs from other application
|
|
styles due to the sheer size of the datasets with which the system must
|
|
work. If a SQL statement returns 1 million rows, the result set probably
|
|
holds all returned results in memory until all rows have been read. Spring
|
|
Batch provides two types of solutions for this problem: Cursor and Paging
|
|
database ItemReaders.</p><div class="section"><div class="titlepage"><div><div><h3 class="title"><a name="cursorBasedItemReaders" href="#cursorBasedItemReaders"></a>6.9.1 Cursor Based ItemReaders</h3></div></div></div><p>Using a database cursor is generally the default approach of most
|
|
batch developers, because it is the database's solution to the problem
|
|
of 'streaming' relational data. The Java
|
|
<code class="classname">ResultSet</code> class is essentially an object
|
|
oriented mechanism for manipulating a cursor. A
|
|
<code class="classname">ResultSet</code> maintains a cursor to the current row
|
|
of data. Calling <code class="methodname">next</code> on a
|
|
<code class="classname">ResultSet</code> moves this cursor to the next row.
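</p><p>As a minimal sketch of the cursor idea, the following plain Java
stand-in walks an in-memory array one row at a time. The
<code class="classname">ToyCursor</code> class and its sample data are
hypothetical and are not part of JDBC or Spring Batch; they only mimic how
<code class="methodname">next</code> advances a cursor and how a reader
might hand back one row per call, returning null once the rows are
exhausted.</p><pre class="programlisting">public class ToyCursorDemo {

    /** Hypothetical in-memory stand-in for a database cursor. */
    static class ToyCursor {
        private final String[] rows;
        private int position = -1; // before the first row, like a fresh ResultSet

        ToyCursor(String[] rows) {
            this.rows = rows;
        }

        /** Moves to the next row; returns false when no row is left. */
        boolean next() {
            if (position + 1 == rows.length) {
                return false;
            }
            position++;
            return true;
        }

        /** Returns the row the cursor currently points at. */
        String current() {
            return rows[position];
        }

        /** Reader-style access: one row per call, null when exhausted. */
        String read() {
            return next() ? current() : null;
        }
    }

    public static void main(String[] args) {
        ToyCursor cursor = new ToyCursor(new String[] {"foo2", "foo3", "foo4"});
        String item;
        int count = 0;
        while ((item = cursor.read()) != null) {
            System.out.println(item); // each item can be processed, then discarded
            count++;
        }
        System.out.println("rows read: " + count);
    }
}</pre><p>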
|
|
Spring Batch cursor based ItemReaders open a cursor on
|
|
initialization, and move the cursor forward one row for every call to
|
|
<code class="methodname">read</code>, returning a mapped object that can be
|
|
used for processing. The <code class="methodname">close</code> method will then
|
|
be called to ensure all resources are freed up. The Spring core
|
|
<code class="classname">JdbcTemplate</code> gets around this problem by using
|
|
the callback pattern to completely map all rows in a
|
|
<code class="classname">ResultSet</code> and close before returning control back
|
|
to the method caller. However, in batch this must wait until the step is
|
|
complete. Below is a generic diagram of how a cursor based
|
|
<code class="classname">ItemReader</code> works, and while a SQL statement is
|
|
used as an example since it is so widely known, any technology could
|
|
implement the basic approach:</p><div class="mediaobject" align="center"><img src="images/cursorExample.png" align="middle"></div><p>This example illustrates the basic pattern. Given a 'FOO' table,
|
|
which has three columns: ID, NAME, and BAR, select all rows with an ID
|
|
greater than 1 but less than 7. This puts the beginning of the cursor
|
|
(row 1) on ID 2. The result of this row should be a completely mapped
|
|
Foo object. Calling <code class="methodname">read</code>() again moves the
|
|
cursor to the next row, which is the Foo with an ID of 3. The results of
|
|
these reads will be written out after each
|
|
<code class="methodname">read</code>, thus allowing the objects to be garbage
|
|
collected (assuming no instance variables are maintaining references to
|
|
them).</p><div class="section"><div class="titlepage"><div><div><h4 class="title"><a name="JdbcCursorItemReader" href="#JdbcCursorItemReader"></a>JdbcCursorItemReader</h4></div></div></div><p><code class="classname">JdbcCursorItemReader</code> is the Jdbc
|
|
implementation of the cursor based technique. It works directly with a
|
|
<code class="classname">ResultSet</code> and requires a SQL statement to run
|
|
against a connection obtained from a
|
|
<code class="classname">DataSource</code>. The following database schema will
|
|
be used as an example:</p><pre class="programlisting"><span class="hl-keyword">CREATE</span> <span class="hl-keyword">TABLE</span> CUSTOMER (
|
|
ID <span class="hl-keyword">BIGINT</span> <span class="hl-keyword">IDENTITY</span> <span class="hl-keyword">PRIMARY</span> <span class="hl-keyword">KEY</span>,
|
|
<span class="hl-keyword">NAME</span> <span class="hl-keyword">VARCHAR</span>(<span class="hl-number">45</span>),
|
|
CREDIT <span class="hl-keyword">FLOAT</span>
|
|
);</pre><p>Many people prefer to use a domain object for each row, so we'll
|
|
use an implementation of the <code class="classname">RowMapper</code>
|
|
interface to map a <code class="classname">CustomerCredit</code>
|
|
object:</p><pre class="programlisting"><span class="hl-keyword">public</span> <span class="hl-keyword">class</span> CustomerCreditRowMapper <span class="hl-keyword">implements</span> RowMapper {
|
|
|
|
<span class="hl-keyword">public</span> <span class="hl-keyword">static</span> <span class="hl-keyword">final</span> String ID_COLUMN = <span class="hl-string">"id"</span>;
|
|
<span class="hl-keyword">public</span> <span class="hl-keyword">static</span> <span class="hl-keyword">final</span> String NAME_COLUMN = <span class="hl-string">"name"</span>;
|
|
<span class="hl-keyword">public</span> <span class="hl-keyword">static</span> <span class="hl-keyword">final</span> String CREDIT_COLUMN = <span class="hl-string">"credit"</span>;
|
|
|
|
<span class="hl-keyword">public</span> Object mapRow(ResultSet rs, <span class="hl-keyword">int</span> rowNum) <span class="hl-keyword">throws</span> SQLException {
|
|
CustomerCredit customerCredit = <span class="hl-keyword">new</span> CustomerCredit();
|
|
|
|
customerCredit.setId(rs.getInt(ID_COLUMN));
|
|
customerCredit.setName(rs.getString(NAME_COLUMN));
|
|
customerCredit.setCredit(rs.getBigDecimal(CREDIT_COLUMN));
|
|
|
|
<span class="hl-keyword">return</span> customerCredit;
|
|
}
|
|
}</pre><p>Because <code class="classname">JdbcTemplate</code> is so familiar to
|
|
users of Spring, and the <code class="classname">JdbcCursorItemReader</code>
|
|
shares key interfaces with it, it is useful to see an example of how
|
|
to read in this data with <code class="classname">JdbcTemplate</code>, in
|
|
order to contrast it with the <code class="classname">ItemReader</code>. For
|
|
the purposes of this example, let's assume there are 1,000 rows in the
|
|
CUSTOMER table. The first example uses
|
|
<code class="classname">JdbcTemplate</code>:</p><pre class="programlisting"><span class="hl-comment">// For simplicity's sake, assume a dataSource has already been obtained</span>
|
|
JdbcTemplate jdbcTemplate = <span class="hl-keyword">new</span> JdbcTemplate(dataSource);
|
|
List customerCredits = jdbcTemplate.query(<span class="hl-string">"SELECT ID, NAME, CREDIT from CUSTOMER"</span>,
|
|
<span class="hl-keyword">new</span> CustomerCreditRowMapper());</pre><p>After running this code snippet the customerCredits list will
|
|
contain 1,000 <code class="classname">CustomerCredit</code> objects. In the
|
|
query method, a connection will be obtained from the
|
|
<code class="classname">DataSource</code>, the provided SQL will be run
|
|
against it, and the <code class="methodname">mapRow</code> method will be
|
|
called for each row in the <code class="classname">ResultSet</code>. Let's
|
|
contrast this with the approach of the
|
|
<code class="classname">JdbcCursorItemReader</code>:</p><pre class="programlisting">JdbcCursorItemReader itemReader = <span class="hl-keyword">new</span> JdbcCursorItemReader();
|
|
itemReader.setDataSource(dataSource);
|
|
itemReader.setSql(<span class="hl-string">"SELECT ID, NAME, CREDIT from CUSTOMER"</span>);
|
|
itemReader.setRowMapper(<span class="hl-keyword">new</span> CustomerCreditRowMapper());
|
|
<span class="hl-keyword">int</span> counter = <span class="hl-number">0</span>;
|
|
ExecutionContext executionContext = <span class="hl-keyword">new</span> ExecutionContext();
|
|
itemReader.open(executionContext);
|
|
<span class="hl-comment">// Read before counting so the final null read is not counted</span>
Object customerCredit = itemReader.read();
<span class="hl-keyword">while</span> (customerCredit != null) {
    counter++;
    customerCredit = itemReader.read();
}
itemReader.close();</pre><p>After running this code snippet the counter will equal 1,000. If
|
|
the code above had put the returned customerCredit into a list, the
|
|
result would have been exactly the same as with the
|
|
<code class="classname">JdbcTemplate</code> example. However, the big
|
|
advantage of the <code class="classname">ItemReader</code> is that it allows
|
|
items to be 'streamed'. The <code class="methodname">read</code> method can
|
|
be called once, and the item written out via an
|
|
<code class="classname">ItemWriter</code>, and then the next item obtained via
|
|
<code class="methodname">read</code>. This allows item reading and writing to
|
|
be done in 'chunks' and committed periodically, which is the essence
|
|
of high performance batch processing. Furthermore, it is very easily
|
|
configured for injection into a Spring Batch
|
|
<code class="classname">Step</code>:</p><pre class="programlisting"><span class="hl-tag"><bean</span> <span class="hl-attribute">id</span>=<span class="hl-value">"itemReader"</span> <span class="hl-attribute">class</span>=<span class="hl-value">"org.spr...JdbcCursorItemReader"</span><span class="hl-tag">></span>
|
|
<span class="hl-tag"><property</span> <span class="hl-attribute">name</span>=<span class="hl-value">"dataSource"</span> <span class="hl-attribute">ref</span>=<span class="hl-value">"dataSource"</span><span class="hl-tag">/></span>
|
|
<span class="hl-tag"><property</span> <span class="hl-attribute">name</span>=<span class="hl-value">"sql"</span> <span class="hl-attribute">value</span>=<span class="hl-value">"select ID, NAME, CREDIT from CUSTOMER"</span><span class="hl-tag">/></span>
|
|
<span class="hl-tag"><property</span> <span class="hl-attribute">name</span>=<span class="hl-value">"rowMapper"</span><span class="hl-tag">></span>
|
|
<span class="hl-tag"><bean</span> <span class="hl-attribute">class</span>=<span class="hl-value">"org.springframework.batch.sample.domain.CustomerCreditRowMapper"</span><span class="hl-tag">/></span>
|
|
<span class="hl-tag"></property></span>
|
|
<span class="hl-tag"></bean></span></pre><div class="section"><div class="titlepage"><div><div><h5 class="title"><a name="JdbcCursorItemReaderProperties" href="#JdbcCursorItemReaderProperties"></a>Additional Properties</h5></div></div></div><p>Because there are so many varying options for opening a cursor
|
|
in Java, there are many properties on the
|
|
<code class="classname">JdbcCursorItemReader</code> that can be set:</p><div class="table"><a name="d5e2752" href="#d5e2752"></a><p class="title"><b>Table 6.2. JdbcCursorItemReader Properties</b></p><div class="table-contents"><table summary="JdbcCursorItemReader Properties" style="border-collapse: collapse;border-top: 0.5pt solid ; border-bottom: 0.5pt solid ; border-left: 0.5pt solid ; border-right: 0.5pt solid ; "><colgroup><col><col></colgroup><tbody><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; ">ignoreWarnings</td><td style="border-bottom: 0.5pt solid ; ">Determines whether or not SQLWarnings are logged or
|
|
cause an exception - default is true</td></tr><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; ">fetchSize</td><td style="border-bottom: 0.5pt solid ; ">Gives the Jdbc driver a hint as to the number of rows
|
|
that should be fetched from the database when more rows are
|
|
needed by the <code class="classname">ResultSet</code> object used
|
|
by the <code class="classname">ItemReader</code>. By default, no
|
|
hint is given.</td></tr><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; ">maxRows</td><td style="border-bottom: 0.5pt solid ; ">Sets the limit for the maximum number of rows the
|
|
underlying <code class="classname">ResultSet</code> can hold at any
|
|
one time.</td></tr><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; ">queryTimeout</td><td style="border-bottom: 0.5pt solid ; ">Sets the number of seconds the driver will wait for a
<code class="classname">Statement</code> object to execute. If the limit is exceeded, a
<code class="classname">DataAccessException</code> is thrown.
|
|
(Consult your driver vendor documentation for
|
|
details).</td></tr><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; ">verifyCursorPosition</td><td style="border-bottom: 0.5pt solid ; ">Because the same <code class="classname">ResultSet</code>
|
|
held by the <code class="classname">ItemReader</code> is passed to
|
|
the <code class="classname">RowMapper</code>, it is possible for
|
|
users to call <code class="methodname">ResultSet.next</code>()
|
|
themselves, which could cause issues with the reader's
|
|
internal count. Setting this value to true will cause an
|
|
exception to be thrown if the cursor position is not the
|
|
same after the <code class="classname">RowMapper</code> call as it
|
|
was before.</td></tr><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; ">saveState</td><td style="border-bottom: 0.5pt solid ; ">Indicates whether or not the reader's state should be
|
|
saved in the <code class="classname">ExecutionContext</code>
|
|
provided by
|
|
<code class="methodname">ItemStream#update</code>(<code class="classname">ExecutionContext</code>)
|
|
The default value is true.</td></tr><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; ">driverSupportsAbsolute</td><td style="border-bottom: 0.5pt solid ; ">Defaults to false. Indicates whether the Jdbc driver
|
|
supports setting the absolute row on a
|
|
<code class="classname">ResultSet</code>. It is recommended that
|
|
this is set to true for Jdbc drivers that support
|
|
<code class="methodname">ResultSet.absolute</code>() as it may
|
|
improve performance, especially if a step fails while
|
|
working with a large data set.</td></tr><tr><td style="border-right: 0.5pt solid ; ">useSharedExtendedConnection</td><td style="">Defaults to false. Indicates whether the connection
|
|
used for the cursor should be used by all other processing
|
|
thus sharing the same transaction. If this is set to false,
|
|
which is the default, then the cursor will be opened using
|
|
its own connection and will not participate in any
|
|
transactions started for the rest of the step processing. If
|
|
you set this flag to true then you must wrap the
|
|
<code class="classname">DataSource</code> in an
|
|
<code class="classname">ExtendedConnectionDataSourceProxy</code> to
|
|
prevent the connection from being closed and released after
|
|
each commit. When you set this option to true then the
|
|
statement used to open the cursor will be created with both
|
|
'READ_ONLY' and 'HOLD_CURSORS_OVER_COMMIT' options. This
|
|
allows holding the cursor open over transaction start and
|
|
commits performed in the step processing. To use this
|
|
feature you need a database that supports this and a Jdbc
|
|
driver supporting Jdbc 3.0 or later.</td></tr></tbody></table></div></div><br class="table-break"></div></div><div class="section"><div class="titlepage"><div><div><h4 class="title"><a name="HibernateCursorItemReader" href="#HibernateCursorItemReader"></a>HibernateCursorItemReader</h4></div></div></div><p>Just as normal Spring users make important decisions about
|
|
whether or not to use ORM solutions, which affect whether or not they
|
|
use a <code class="classname">JdbcTemplate</code> or a
|
|
<code class="classname">HibernateTemplate</code>, Spring Batch users have the
|
|
same options. <code class="classname">HibernateCursorItemReader</code> is the
|
|
Hibernate implementation of the cursor technique. Hibernate's usage in
|
|
batch has been fairly controversial. This has largely been because
|
|
Hibernate was originally developed to support online application
|
|
styles. However, that doesn't mean it can't be used for batch
|
|
processing. The easiest approach for solving this problem is to use a
|
|
<code class="classname">StatelessSession</code> rather than a standard
|
|
session. This removes all of the caching and dirty checking Hibernate
employs that can cause issues in a batch scenario. For more
information on the differences between stateless and normal Hibernate
sessions, refer to the documentation of your specific Hibernate
|
|
release. The <code class="classname">HibernateCursorItemReader</code> allows
|
|
you to declare an HQL statement and pass in a
|
|
<code class="classname">SessionFactory</code>, which will pass back one item
|
|
per call to <code class="methodname">read</code> in the same basic fashion as
|
|
the <code class="classname">JdbcCursorItemReader</code>. Below is an example
|
|
configuration using the same 'customer credit' example as the JDBC
|
|
reader:</p><pre class="programlisting">HibernateCursorItemReader itemReader = <span class="hl-keyword">new</span> HibernateCursorItemReader();
|
|
itemReader.setQueryString(<span class="hl-string">"from CustomerCredit"</span>);
|
|
<span class="hl-comment">// For simplicity's sake, assume sessionFactory has already been obtained.</span>
|
|
itemReader.setSessionFactory(sessionFactory);
|
|
itemReader.setUseStatelessSession(true);
|
|
<span class="hl-keyword">int</span> counter = <span class="hl-number">0</span>;
|
|
ExecutionContext executionContext = <span class="hl-keyword">new</span> ExecutionContext();
|
|
itemReader.open(executionContext);
|
|
<span class="hl-comment">// Read before counting so the final null read is not counted</span>
Object customerCredit = itemReader.read();
<span class="hl-keyword">while</span> (customerCredit != null) {
    counter++;
    customerCredit = itemReader.read();
}
itemReader.close();</pre><p>This configured <code class="classname">ItemReader</code> will return
|
|
<code class="classname">CustomerCredit</code> objects in the exact same manner
|
|
as described by the <code class="classname">JdbcCursorItemReader</code>,
|
|
assuming Hibernate mapping files have been created correctly for the
|
|
Customer table. The 'useStatelessSession' property defaults to true,
|
|
but has been added here to draw attention to the ability to switch it
|
|
on or off. It is also worth noting that the fetchSize of the
|
|
underlying cursor can be set via the fetchSize property. As with
|
|
<code class="classname">JdbcCursorItemReader</code>, configuration is
|
|
straightforward:</p><pre class="programlisting"><span class="hl-tag"><bean</span> <span class="hl-attribute">id</span>=<span class="hl-value">"itemReader"</span>
|
|
<span class="hl-attribute">class</span>=<span class="hl-value">"org.springframework.batch.item.database.HibernateCursorItemReader"</span><span class="hl-tag">></span>
|
|
<span class="hl-tag"><property</span> <span class="hl-attribute">name</span>=<span class="hl-value">"sessionFactory"</span> <span class="hl-attribute">ref</span>=<span class="hl-value">"sessionFactory"</span><span class="hl-tag"> /></span>
|
|
<span class="hl-tag"><property</span> <span class="hl-attribute">name</span>=<span class="hl-value">"queryString"</span> <span class="hl-attribute">value</span>=<span class="hl-value">"from CustomerCredit"</span><span class="hl-tag"> /></span>
|
|
<span class="hl-tag"></bean></span></pre></div><div class="section"><div class="titlepage"><div><div><h4 class="title"><a name="StoredProcedureItemReader" href="#StoredProcedureItemReader"></a>StoredProcedureItemReader</h4></div></div></div><p>Sometimes it is necessary to obtain the cursor data using a
|
|
stored procedure. The <code class="classname">StoredProcedureItemReader</code>
|
|
works like the <code class="classname">JdbcCursorItemReader</code> except that
|
|
instead of executing a query to obtain a cursor we execute a stored
|
|
procedure that returns a cursor. The stored procedure can return the
|
|
cursor in three different ways:</p><div class="orderedlist"><ol class="orderedlist" type="1"><li class="listitem"><p>as a returned ResultSet (used by SQL Server, Sybase, DB2,
|
|
Derby and MySQL)</p></li><li class="listitem"><p>as a ref-cursor returned as an out parameter (used by Oracle
|
|
and PostgreSQL)</p></li><li class="listitem"><p>as the return value of a stored function call</p></li></ol></div><p>Below is a basic example configuration using the same 'customer
|
|
credit' example as earlier:</p><pre class="programlisting"><span class="hl-tag"><bean</span> <span class="hl-attribute">id</span>=<span class="hl-value">"reader"</span> <span class="hl-attribute">class</span>=<span class="hl-value">"o.s.batch.item.database.StoredProcedureItemReader"</span><span class="hl-tag">></span>
|
|
<span class="hl-tag"><property</span> <span class="hl-attribute">name</span>=<span class="hl-value">"dataSource"</span> <span class="hl-attribute">ref</span>=<span class="hl-value">"dataSource"</span><span class="hl-tag">/></span>
|
|
<span class="hl-tag"><property</span> <span class="hl-attribute">name</span>=<span class="hl-value">"procedureName"</span> <span class="hl-attribute">value</span>=<span class="hl-value">"sp_customer_credit"</span><span class="hl-tag">/></span>
|
|
<span class="hl-tag"><property</span> <span class="hl-attribute">name</span>=<span class="hl-value">"rowMapper"</span><span class="hl-tag">></span>
|
|
<span class="hl-tag"><bean</span> <span class="hl-attribute">class</span>=<span class="hl-value">"org.springframework.batch.sample.domain.CustomerCreditRowMapper"</span><span class="hl-tag">/></span>
|
|
<span class="hl-tag"></property></span>
|
|
<span class="hl-tag"></bean></span>
|
|
</pre><p>This example relies on the stored procedure to provide a
ResultSet as a returned result (option 1 above).</p><p>If the stored procedure returns a ref-cursor (option 2), we
need to provide the position of the out parameter that holds the
returned ref-cursor. Here is an example where the first parameter is
the returned ref-cursor:</p><pre class="programlisting"><span class="hl-tag"><bean</span> <span class="hl-attribute">id</span>=<span class="hl-value">"reader"</span> <span class="hl-attribute">class</span>=<span class="hl-value">"o.s.batch.item.database.StoredProcedureItemReader"</span><span class="hl-tag">></span>
    <span class="hl-tag"><property</span> <span class="hl-attribute">name</span>=<span class="hl-value">"dataSource"</span> <span class="hl-attribute">ref</span>=<span class="hl-value">"dataSource"</span><span class="hl-tag">/></span>
    <span class="hl-tag"><property</span> <span class="hl-attribute">name</span>=<span class="hl-value">"procedureName"</span> <span class="hl-attribute">value</span>=<span class="hl-value">"sp_customer_credit"</span><span class="hl-tag">/></span>
    <span class="hl-tag"><property</span> <span class="hl-attribute">name</span>=<span class="hl-value">"refCursorPosition"</span> <span class="hl-attribute">value</span>=<span class="hl-value">"1"</span><span class="hl-tag">/></span>
    <span class="hl-tag"><property</span> <span class="hl-attribute">name</span>=<span class="hl-value">"rowMapper"</span><span class="hl-tag">></span>
        <span class="hl-tag"><bean</span> <span class="hl-attribute">class</span>=<span class="hl-value">"org.springframework.batch.sample.domain.CustomerCreditRowMapper"</span><span class="hl-tag">/></span>
    <span class="hl-tag"></property></span>
<span class="hl-tag"></bean></span>
</pre><p>If the cursor is returned from a stored function (option 3), we
need to set the property "<code class="varname">function</code>" to
<code class="literal">true</code>. It defaults to <code class="literal">false</code>. Here
is what that looks like:</p><pre class="programlisting"><span class="hl-tag"><bean</span> <span class="hl-attribute">id</span>=<span class="hl-value">"reader"</span> <span class="hl-attribute">class</span>=<span class="hl-value">"o.s.batch.item.database.StoredProcedureItemReader"</span><span class="hl-tag">></span>
    <span class="hl-tag"><property</span> <span class="hl-attribute">name</span>=<span class="hl-value">"dataSource"</span> <span class="hl-attribute">ref</span>=<span class="hl-value">"dataSource"</span><span class="hl-tag">/></span>
    <span class="hl-tag"><property</span> <span class="hl-attribute">name</span>=<span class="hl-value">"procedureName"</span> <span class="hl-attribute">value</span>=<span class="hl-value">"sp_customer_credit"</span><span class="hl-tag">/></span>
    <span class="hl-tag"><property</span> <span class="hl-attribute">name</span>=<span class="hl-value">"function"</span> <span class="hl-attribute">value</span>=<span class="hl-value">"true"</span><span class="hl-tag">/></span>
    <span class="hl-tag"><property</span> <span class="hl-attribute">name</span>=<span class="hl-value">"rowMapper"</span><span class="hl-tag">></span>
        <span class="hl-tag"><bean</span> <span class="hl-attribute">class</span>=<span class="hl-value">"org.springframework.batch.sample.domain.CustomerCreditRowMapper"</span><span class="hl-tag">/></span>
    <span class="hl-tag"></property></span>
<span class="hl-tag"></bean></span>
</pre><p>In all of these cases we need to define a
<code class="classname">RowMapper</code> as well as a
<code class="classname">DataSource</code> and the actual procedure
name.</p><p>If the stored procedure or function takes in parameters, they
must be declared and set via the parameters property. Here is an
example for Oracle that declares three parameters. The first one is
the out parameter that returns the ref-cursor; the second and third
are in parameters that each take a value of type INTEGER:</p><pre class="programlisting"><span class="hl-tag"><bean</span> <span class="hl-attribute">id</span>=<span class="hl-value">"reader"</span> <span class="hl-attribute">class</span>=<span class="hl-value">"o.s.batch.item.database.StoredProcedureItemReader"</span><span class="hl-tag">></span>
    <span class="hl-tag"><property</span> <span class="hl-attribute">name</span>=<span class="hl-value">"dataSource"</span> <span class="hl-attribute">ref</span>=<span class="hl-value">"dataSource"</span><span class="hl-tag">/></span>
    <span class="hl-tag"><property</span> <span class="hl-attribute">name</span>=<span class="hl-value">"procedureName"</span> <span class="hl-attribute">value</span>=<span class="hl-value">"spring.cursor_func"</span><span class="hl-tag">/></span>
    <span class="hl-tag"><property</span> <span class="hl-attribute">name</span>=<span class="hl-value">"parameters"</span><span class="hl-tag">></span>
        <span class="hl-tag"><list></span>
            <span class="hl-tag"><bean</span> <span class="hl-attribute">class</span>=<span class="hl-value">"org.springframework.jdbc.core.SqlOutParameter"</span><span class="hl-tag">></span>
                <span class="hl-tag"><constructor-arg</span> <span class="hl-attribute">index</span>=<span class="hl-value">"0"</span> <span class="hl-attribute">value</span>=<span class="hl-value">"newid"</span><span class="hl-tag">/></span>
                <span class="hl-tag"><constructor-arg</span> <span class="hl-attribute">index</span>=<span class="hl-value">"1"</span><span class="hl-tag">></span>
                    <span class="hl-tag"><util:constant</span> <span class="hl-attribute">static-field</span>=<span class="hl-value">"oracle.jdbc.OracleTypes.CURSOR"</span><span class="hl-tag">/></span>
                <span class="hl-tag"></constructor-arg></span>
            <span class="hl-tag"></bean></span>
            <span class="hl-tag"><bean</span> <span class="hl-attribute">class</span>=<span class="hl-value">"org.springframework.jdbc.core.SqlParameter"</span><span class="hl-tag">></span>
                <span class="hl-tag"><constructor-arg</span> <span class="hl-attribute">index</span>=<span class="hl-value">"0"</span> <span class="hl-attribute">value</span>=<span class="hl-value">"amount"</span><span class="hl-tag">/></span>
                <span class="hl-tag"><constructor-arg</span> <span class="hl-attribute">index</span>=<span class="hl-value">"1"</span><span class="hl-tag">></span>
                    <span class="hl-tag"><util:constant</span> <span class="hl-attribute">static-field</span>=<span class="hl-value">"java.sql.Types.INTEGER"</span><span class="hl-tag">/></span>
                <span class="hl-tag"></constructor-arg></span>
            <span class="hl-tag"></bean></span>
            <span class="hl-tag"><bean</span> <span class="hl-attribute">class</span>=<span class="hl-value">"org.springframework.jdbc.core.SqlParameter"</span><span class="hl-tag">></span>
                <span class="hl-tag"><constructor-arg</span> <span class="hl-attribute">index</span>=<span class="hl-value">"0"</span> <span class="hl-attribute">value</span>=<span class="hl-value">"custid"</span><span class="hl-tag">/></span>
                <span class="hl-tag"><constructor-arg</span> <span class="hl-attribute">index</span>=<span class="hl-value">"1"</span><span class="hl-tag">></span>
                    <span class="hl-tag"><util:constant</span> <span class="hl-attribute">static-field</span>=<span class="hl-value">"java.sql.Types.INTEGER"</span><span class="hl-tag">/></span>
                <span class="hl-tag"></constructor-arg></span>
            <span class="hl-tag"></bean></span>
        <span class="hl-tag"></list></span>
    <span class="hl-tag"></property></span>
    <span class="hl-tag"><property</span> <span class="hl-attribute">name</span>=<span class="hl-value">"refCursorPosition"</span> <span class="hl-attribute">value</span>=<span class="hl-value">"1"</span><span class="hl-tag">/></span>
    <span class="hl-tag"><property</span> <span class="hl-attribute">name</span>=<span class="hl-value">"rowMapper"</span> <span class="hl-attribute">ref</span>=<span class="hl-value">"rowMapper"</span><span class="hl-tag">/></span>
    <span class="hl-tag"><property</span> <span class="hl-attribute">name</span>=<span class="hl-value">"preparedStatementSetter"</span> <span class="hl-attribute">ref</span>=<span class="hl-value">"parameterSetter"</span><span class="hl-tag">/></span>
<span class="hl-tag"></bean></span></pre><p>In addition to the parameter declarations we need to specify a
<code class="classname">PreparedStatementSetter</code> implementation that
sets the parameter values for the call. This works the same as for the
<code class="classname">JdbcCursorItemReader</code> above. All the additional
properties listed in <a class="xref" href="#JdbcCursorItemReaderProperties" title="Additional Properties">the section called “Additional Properties”</a>
apply to the <code class="classname">StoredProcedureItemReader</code> as well.
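</p><p>The <code class="literal">parameterSetter</code> bean referenced in the Oracle configuration above is not defined in this section. As a minimal sketch, it could be declared as follows, assuming a hypothetical <code class="classname">com.example.batch.CustomerCreditParameterSetter</code> class (not part of Spring Batch) that implements <code class="classname">PreparedStatementSetter</code>. The class name, its <code class="literal">amount</code> and <code class="literal">custId</code> properties, and the values shown are illustrative assumptions:</p><pre class="programlisting"><span class="hl-comment"><!-- Hypothetical class: implements PreparedStatementSetter and binds the two in parameters --></span>
<span class="hl-tag"><bean</span> <span class="hl-attribute">id</span>=<span class="hl-value">"parameterSetter"</span> <span class="hl-attribute">class</span>=<span class="hl-value">"com.example.batch.CustomerCreditParameterSetter"</span><span class="hl-tag">></span>
    <span class="hl-comment"><!-- Illustrative values only --></span>
    <span class="hl-tag"><property</span> <span class="hl-attribute">name</span>=<span class="hl-value">"amount"</span> <span class="hl-attribute">value</span>=<span class="hl-value">"100"</span><span class="hl-tag">/></span>
    <span class="hl-tag"><property</span> <span class="hl-attribute">name</span>=<span class="hl-value">"custId"</span> <span class="hl-attribute">value</span>=<span class="hl-value">"42"</span><span class="hl-tag">/></span>
<span class="hl-tag"></bean></span>
</pre><p>Because the ref-cursor out parameter is declared at position 1 in that configuration, such a setter would presumably bind its values to the remaining placeholders, positions 2 and 3.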
</p></div></div><div class="section"><div class="titlepage"><div><div><h3 class="title"><a name="pagingItemReaders" href="#pagingItemReaders"></a>6.9.2 Paging ItemReaders</h3></div></div></div><p>An alternative to using a database cursor is to execute multiple
queries, where each query brings back a portion of the results. We
refer to this portion as a page. Each query that is executed must
specify the starting row number and the number of rows that we want
returned for the page.</p><div class="section"><div class="titlepage"><div><div><h4 class="title"><a name="JdbcPagingItemReader" href="#JdbcPagingItemReader"></a>JdbcPagingItemReader</h4></div></div></div><p>One implementation of a paging <code class="classname">ItemReader</code>
is the <code class="classname">JdbcPagingItemReader</code>. The
<code class="classname">JdbcPagingItemReader</code> needs a
<code class="classname">PagingQueryProvider</code> responsible for providing
the SQL queries used to retrieve the rows making up a page. Since each
database has its own strategy for providing paging support, we need to
use a different <code class="classname">PagingQueryProvider</code> for each
supported database type. There is also the
<code class="classname">SqlPagingQueryProviderFactoryBean</code> that will
auto-detect the database that is being used and determine the
appropriate <code class="classname">PagingQueryProvider</code> implementation.
This simplifies the configuration and is the recommended best
practice.</p><p>The <code class="classname">SqlPagingQueryProviderFactoryBean</code>
requires that you specify a select clause and a from clause. You can
also provide an optional where clause. These clauses, together with the
required sortKey, are used to build an SQL statement.</p><p>After the reader has been opened, it will pass back one item per
call to <code class="methodname">read</code> in the same basic fashion as any
other <code class="classname">ItemReader</code>. The paging happens behind the
scenes when additional rows are needed.</p><p>Below is an example configuration using a similar 'customer
credit' example as the cursor-based ItemReaders above:</p><pre class="programlisting"><span class="hl-tag"><bean</span> <span class="hl-attribute">id</span>=<span class="hl-value">"itemReader"</span> <span class="hl-attribute">class</span>=<span class="hl-value">"org.spr...JdbcPagingItemReader"</span><span class="hl-tag">></span>
    <span class="hl-tag"><property</span> <span class="hl-attribute">name</span>=<span class="hl-value">"dataSource"</span> <span class="hl-attribute">ref</span>=<span class="hl-value">"dataSource"</span><span class="hl-tag">/></span>
    <span class="hl-tag"><property</span> <span class="hl-attribute">name</span>=<span class="hl-value">"queryProvider"</span><span class="hl-tag">></span>
        <span class="hl-tag"><bean</span> <span class="hl-attribute">class</span>=<span class="hl-value">"org.spr...SqlPagingQueryProviderFactoryBean"</span><span class="hl-tag">></span>
            <span class="hl-tag"><property</span> <span class="hl-attribute">name</span>=<span class="hl-value">"selectClause"</span> <span class="hl-attribute">value</span>=<span class="hl-value">"select id, name, credit"</span><span class="hl-tag">/></span>
            <span class="hl-tag"><property</span> <span class="hl-attribute">name</span>=<span class="hl-value">"fromClause"</span> <span class="hl-attribute">value</span>=<span class="hl-value">"from customer"</span><span class="hl-tag">/></span>
            <span class="hl-tag"><property</span> <span class="hl-attribute">name</span>=<span class="hl-value">"whereClause"</span> <span class="hl-attribute">value</span>=<span class="hl-value">"where status=:status"</span><span class="hl-tag">/></span>
            <span class="hl-tag"><property</span> <span class="hl-attribute">name</span>=<span class="hl-value">"sortKey"</span> <span class="hl-attribute">value</span>=<span class="hl-value">"id"</span><span class="hl-tag">/></span>
        <span class="hl-tag"></bean></span>
    <span class="hl-tag"></property></span>
    <span class="hl-tag"><property</span> <span class="hl-attribute">name</span>=<span class="hl-value">"parameterValues"</span><span class="hl-tag">></span>
        <span class="hl-tag"><map></span>
            <span class="hl-tag"><entry</span> <span class="hl-attribute">key</span>=<span class="hl-value">"status"</span> <span class="hl-attribute">value</span>=<span class="hl-value">"NEW"</span><span class="hl-tag">/></span>
        <span class="hl-tag"></map></span>
    <span class="hl-tag"></property></span>
    <span class="hl-tag"><property</span> <span class="hl-attribute">name</span>=<span class="hl-value">"pageSize"</span> <span class="hl-attribute">value</span>=<span class="hl-value">"1000"</span><span class="hl-tag">/></span>
    <span class="hl-tag"><property</span> <span class="hl-attribute">name</span>=<span class="hl-value">"rowMapper"</span> <span class="hl-attribute">ref</span>=<span class="hl-value">"customerMapper"</span><span class="hl-tag">/></span>
<span class="hl-tag"></bean></span></pre><p>This configured <code class="classname">ItemReader</code> will return
<code class="classname">CustomerCredit</code> objects using the
<code class="classname">RowMapper</code>, which must be specified. The
'pageSize' property determines the number of entities read from the
database for each query execution.</p><p>The 'parameterValues' property can be used to specify a Map of
parameter values for the query. If you use named parameters in the
where clause, the key for each entry should match the name of the named
parameter. If you use a traditional '?' placeholder, the key for
each entry should be the number of the placeholder, starting with
1.</p></div><div class="section"><div class="titlepage"><div><div><h4 class="title"><a name="JpaPagingItemReader" href="#JpaPagingItemReader"></a>JpaPagingItemReader</h4></div></div></div><p>Another implementation of a paging
<code class="classname">ItemReader</code> is the
<code class="classname">JpaPagingItemReader</code>. JPA doesn't have a concept
similar to the Hibernate <code class="classname">StatelessSession</code>, so we
have to use other features provided by the JPA specification. Since
JPA supports paging, this is a natural choice when it comes to using
JPA for batch processing. After each page is read, the entities will
become detached and the persistence context will be cleared in order
to allow the entities to be garbage collected once the page is
processed.</p><p>The <code class="classname">JpaPagingItemReader</code> allows you to
declare a JPQL statement and pass in an
<code class="classname">EntityManagerFactory</code>. It will then pass back
one item per call to <code class="methodname">read</code> in the same basic
fashion as any other <code class="classname">ItemReader</code>. The paging
happens behind the scenes when additional entities are needed. Below
is an example configuration using the same 'customer credit' example
as the JDBC reader above:</p><pre class="programlisting"><span class="hl-tag"><bean</span> <span class="hl-attribute">id</span>=<span class="hl-value">"itemReader"</span> <span class="hl-attribute">class</span>=<span class="hl-value">"org.spr...JpaPagingItemReader"</span><span class="hl-tag">></span>
    <span class="hl-tag"><property</span> <span class="hl-attribute">name</span>=<span class="hl-value">"entityManagerFactory"</span> <span class="hl-attribute">ref</span>=<span class="hl-value">"entityManagerFactory"</span><span class="hl-tag">/></span>
    <span class="hl-tag"><property</span> <span class="hl-attribute">name</span>=<span class="hl-value">"queryString"</span> <span class="hl-attribute">value</span>=<span class="hl-value">"select c from CustomerCredit c"</span><span class="hl-tag">/></span>
    <span class="hl-tag"><property</span> <span class="hl-attribute">name</span>=<span class="hl-value">"pageSize"</span> <span class="hl-attribute">value</span>=<span class="hl-value">"1000"</span><span class="hl-tag">/></span>
<span class="hl-tag"></bean></span></pre><p>This configured <code class="classname">ItemReader</code> will return
<code class="classname">CustomerCredit</code> objects in the exact same manner
as described by the <code class="classname">JdbcPagingItemReader</code> above,
assuming the CustomerCredit object has the correct JPA annotations or ORM
mapping file. The 'pageSize' property determines the number of
entities read from the database for each query execution.</p></div><div class="section"><div class="titlepage"><div><div><h4 class="title"><a name="IbatisPagingItemReader" href="#IbatisPagingItemReader"></a>IbatisPagingItemReader</h4></div></div></div><div class="note" style="margin-left: 0.5in; margin-right: 0.5in;"><table border="0" summary="Note"><tr><td rowspan="2" align="center" valign="top" width="25"><img alt="[Note]" src="images/note.png"></td><th align="left">Note</th></tr><tr><td align="left" valign="top">This reader is deprecated as of Spring Batch 3.0.</td></tr></table></div><p>If you use iBATIS for your data access then you can use the
<code class="classname">IbatisPagingItemReader</code> which, as the name
indicates, is an implementation of a paging
<code class="classname">ItemReader</code>. iBATIS doesn't have direct support
for reading rows in pages, but by providing a couple of standard
variables you can add paging support to your iBATIS queries.</p><p>Here is an example of a configuration for an
<code class="classname">IbatisPagingItemReader</code> reading CustomerCredits
as in the examples above:</p><pre class="programlisting"><span class="hl-tag"><bean</span> <span class="hl-attribute">id</span>=<span class="hl-value">"itemReader"</span> <span class="hl-attribute">class</span>=<span class="hl-value">"org.spr...IbatisPagingItemReader"</span><span class="hl-tag">></span>
    <span class="hl-tag"><property</span> <span class="hl-attribute">name</span>=<span class="hl-value">"sqlMapClient"</span> <span class="hl-attribute">ref</span>=<span class="hl-value">"sqlMapClient"</span><span class="hl-tag">/></span>
    <span class="hl-tag"><property</span> <span class="hl-attribute">name</span>=<span class="hl-value">"queryId"</span> <span class="hl-attribute">value</span>=<span class="hl-value">"getPagedCustomerCredits"</span><span class="hl-tag">/></span>
    <span class="hl-tag"><property</span> <span class="hl-attribute">name</span>=<span class="hl-value">"pageSize"</span> <span class="hl-attribute">value</span>=<span class="hl-value">"1000"</span><span class="hl-tag">/></span>
<span class="hl-tag"></bean></span></pre><p>The <code class="classname">IbatisPagingItemReader</code> configuration
above references an iBATIS query called "getPagedCustomerCredits".
Here is an example of what that query should look like for
MySQL:</p><pre class="programlisting"><span class="hl-tag"><select</span> <span class="hl-attribute">id</span>=<span class="hl-value">"getPagedCustomerCredits"</span> <span class="hl-attribute">resultMap</span>=<span class="hl-value">"customerCreditResult"</span><span class="hl-tag">></span>
    select id, name, credit from customer order by id asc LIMIT #_skiprows#, #_pagesize#
<span class="hl-tag"></select></span></pre><p>The <code class="classname">_skiprows</code> and
<code class="classname">_pagesize</code> variables are provided by the
<code class="classname">IbatisPagingItemReader</code> and there is also a
<code class="classname">_page</code> variable that can be used if necessary.
The syntax for the paging queries varies with the database used. Here
is an example for Oracle (unfortunately we need to use CDATA for some
operators since this belongs in an XML document):</p><pre class="programlisting"><span class="hl-tag"><select</span> <span class="hl-attribute">id</span>=<span class="hl-value">"getPagedCustomerCredits"</span> <span class="hl-attribute">resultMap</span>=<span class="hl-value">"customerCreditResult"</span><span class="hl-tag">></span>
    select * from (
        select * from (
            select t.id, t.name, t.credit, ROWNUM ROWNUM_ from customer t order by id
        ) where ROWNUM_ <span class="hl-tag"><![CDATA[</span> > <span class="hl-tag">]]></span> ( #_page# * #_pagesize# )
    ) where ROWNUM <span class="hl-tag"><![CDATA[</span> <= <span class="hl-tag">]]></span> #_pagesize#
<span class="hl-tag"></select></span></pre></div></div><div class="section"><div class="titlepage"><div><div><h3 class="title"><a name="databaseItemWriters" href="#databaseItemWriters"></a>6.9.3 Database ItemWriters</h3></div></div></div><p>While both Flat Files and XML have specific ItemWriters, there is
no exact equivalent in the database world. This is because transactions
provide all the functionality that is needed. ItemWriters are necessary
for files because they must act as if they're transactional, keeping
track of written items and flushing or clearing at the appropriate
times. Databases have no need for this functionality, since the write is
already contained in a transaction. Users can create their own DAOs that
implement the <code class="classname">ItemWriter</code> interface or use one
from a custom <code class="classname">ItemWriter</code> that's written for
generic processing concerns. Either way, they should work without any
issues. One thing to look out for is the performance and error handling
behavior that results from batching the outputs. This is most
common when using Hibernate as an <code class="classname">ItemWriter</code>, but
the same issues can arise when using JDBC batch mode. Batching database
output doesn't have any inherent flaws, assuming we are careful to flush
and there are no errors in the data. However, any errors while writing
out can cause confusion because there is no way to know which individual
item caused an exception, or even if any individual item was
responsible, as illustrated below:</p><div class="mediaobject" align="center"><img src="images/errorOnFlush.png" align="middle"></div><p>If items are buffered before being written out, any
errors encountered will not be thrown until the buffer is flushed just
before a commit. For example, let's assume that 20 items will be written
per chunk, and the 15th item throws a DataIntegrityViolationException.
As far as the Step is concerned, all 20 items will be written out
successfully, since there's no way to know that an error will occur
until they are actually written out. Once
<code class="classname">Session#</code><code class="methodname">flush</code>() is
called, the buffer will be emptied and the exception will be hit. At
this point, there's nothing the <code class="classname">Step</code> can do: the
transaction must be rolled back. Normally, this exception might cause
the item to be skipped (depending upon the skip/retry policies), and
then it won't be written out again. However, in the batched scenario,
there's no way for it to know which item caused the issue, because the whole
buffer was being written out when the failure happened. The only way to
solve this issue is to flush after each item:</p><div class="mediaobject" align="center"><img src="images/errorOnWrite.png" align="middle"></div><p>This is a common use case, especially when using Hibernate, and
the simple guideline for implementations of
<code class="classname">ItemWriter</code> is to flush on each call to
<code class="methodname">write()</code>. Doing so allows for items to be
skipped reliably, with Spring Batch taking care internally of the
granularity of the calls to <code class="classname">ItemWriter</code> after an
error.</p></div></div><div class="section"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="reusingExistingServices" href="#reusingExistingServices"></a>6.10 Reusing Existing Services</h2></div></div></div><p>Batch systems are often used in conjunction with other application
styles. The most common is an online system, but it may also support
integration or even a thick client application by moving necessary bulk
data that each application style uses. For this reason, it is common that
many users want to reuse existing DAOs or other services within their
batch jobs. The Spring container itself makes this fairly easy by allowing
any necessary class to be injected. However, there may be cases where the
existing service needs to act as an <code class="classname">ItemReader</code> or
<code class="classname">ItemWriter</code>, either to satisfy the dependency of
another Spring Batch class, or because it truly is the main
<code class="classname">ItemReader</code> for a step. It is fairly trivial to
write an adapter class for each service that needs wrapping, but because
it is such a common concern, Spring Batch provides implementations:
<code class="classname">ItemReaderAdapter</code> and
<code class="classname">ItemWriterAdapter</code>. Both classes follow the
standard Spring pattern of invoking a method on a delegate object and are fairly simple
to set up. Below is an example of the reader:</p><pre class="programlisting"><span class="hl-tag"><bean</span> <span class="hl-attribute">id</span>=<span class="hl-value">"itemReader"</span> <span class="hl-attribute">class</span>=<span class="hl-value">"org.springframework.batch.item.adapter.ItemReaderAdapter"</span><span class="hl-tag">></span>
    <span class="hl-tag"><property</span> <span class="hl-attribute">name</span>=<span class="hl-value">"targetObject"</span> <span class="hl-attribute">ref</span>=<span class="hl-value">"fooService"</span><span class="hl-tag"> /></span>
    <span class="hl-tag"><property</span> <span class="hl-attribute">name</span>=<span class="hl-value">"targetMethod"</span> <span class="hl-attribute">value</span>=<span class="hl-value">"generateFoo"</span><span class="hl-tag"> /></span>
<span class="hl-tag"></bean></span>

<span class="hl-tag"><bean</span> <span class="hl-attribute">id</span>=<span class="hl-value">"fooService"</span> <span class="hl-attribute">class</span>=<span class="hl-value">"org.springframework.batch.item.sample.FooService"</span><span class="hl-tag"> /></span></pre><p>One important point to note is that the contract of the targetMethod
must be the same as the contract for <code class="methodname">read</code>: when
exhausted it will return null, otherwise an <code class="classname">Object</code>.
Anything else will prevent the framework from knowing when processing
should end, either causing an infinite loop or incorrect failure,
depending upon the implementation of the
<code class="classname">ItemWriter</code>. The <code class="classname">ItemWriter</code>
implementation is equally simple:</p><pre class="programlisting"><span class="hl-tag"><bean</span> <span class="hl-attribute">id</span>=<span class="hl-value">"itemWriter"</span> <span class="hl-attribute">class</span>=<span class="hl-value">"org.springframework.batch.item.adapter.ItemWriterAdapter"</span><span class="hl-tag">></span>
    <span class="hl-tag"><property</span> <span class="hl-attribute">name</span>=<span class="hl-value">"targetObject"</span> <span class="hl-attribute">ref</span>=<span class="hl-value">"fooService"</span><span class="hl-tag"> /></span>
    <span class="hl-tag"><property</span> <span class="hl-attribute">name</span>=<span class="hl-value">"targetMethod"</span> <span class="hl-attribute">value</span>=<span class="hl-value">"processFoo"</span><span class="hl-tag"> /></span>
<span class="hl-tag"></bean></span>

<span class="hl-tag"><bean</span> <span class="hl-attribute">id</span>=<span class="hl-value">"fooService"</span> <span class="hl-attribute">class</span>=<span class="hl-value">"org.springframework.batch.item.sample.FooService"</span><span class="hl-tag"> /></span>
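
<span class="hl-comment"><!-- Sketch only, not part of the original example: one way the two adapter beans might be
     wired into a chunk-oriented step. It assumes the Spring Batch XML namespace is the default
     namespace, that a jobRepository bean exists, and that the itemReader bean from the earlier
     listing is defined in the same context. The job id, step id and commit-interval are hypothetical. --></span>
<span class="hl-tag"><job</span> <span class="hl-attribute">id</span>=<span class="hl-value">"fooJob"</span><span class="hl-tag">></span>
    <span class="hl-tag"><step</span> <span class="hl-attribute">id</span>=<span class="hl-value">"fooStep"</span><span class="hl-tag">></span>
        <span class="hl-tag"><tasklet></span>
            <span class="hl-comment"><!-- generateFoo supplies the items; processFoo receives each of them --></span>
            <span class="hl-tag"><chunk</span> <span class="hl-attribute">reader</span>=<span class="hl-value">"itemReader"</span> <span class="hl-attribute">writer</span>=<span class="hl-value">"itemWriter"</span> <span class="hl-attribute">commit-interval</span>=<span class="hl-value">"10"</span><span class="hl-tag">/></span>
        <span class="hl-tag"></tasklet></span>
    <span class="hl-tag"></step></span>
<span class="hl-tag"></job></span>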
</pre></div><div class="section"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="validatingInput" href="#validatingInput"></a>6.11 Validating Input</h2></div></div></div><p>During the course of this chapter, multiple approaches to parsing
input have been discussed. Each major implementation will throw an
exception if the input is not 'well-formed'. The
<code class="classname">FixedLengthTokenizer</code> will throw an exception if a
range of data is missing. Similarly, attempting to access an index in a
<code class="classname">RowMapper</code> or <code class="classname">FieldSetMapper</code>
that doesn't exist or is in a different format than the one expected will
cause an exception to be thrown. All of these types of exceptions will be
thrown before <code class="methodname">read</code> returns. However, they don't
address the issue of whether or not the returned item is valid. For
example, if one of the fields is an age, it obviously cannot be negative.
A negative value will parse correctly, because it exists and is a number, so it won't
cause an exception. Since there are already a plethora of Validation
frameworks, Spring Batch does not attempt to provide yet another, but
rather provides a very simple interface that can be implemented by any
number of frameworks:</p><pre class="programlisting"><span class="hl-keyword">public</span> <span class="hl-keyword">interface</span> Validator {

    <span class="hl-keyword">void</span> validate(Object value) <span class="hl-keyword">throws</span> ValidationException;

}</pre><p>The contract is that the <code class="methodname">validate</code> method
will throw an exception if the object is invalid, and return normally if
it is valid. Spring Batch provides an out-of-the-box
<code class="classname">ValidatingItemProcessor</code>:</p><pre class="programlisting"><span class="hl-tag"><bean</span> <span class="hl-attribute">class</span>=<span class="hl-value">"org.springframework.batch.item.validator.ValidatingItemProcessor"</span><span class="hl-tag">></span>
    <span class="hl-tag"><property</span> <span class="hl-attribute">name</span>=<span class="hl-value">"validator"</span> <span class="hl-attribute">ref</span>=<span class="hl-value">"validator"</span><span class="hl-tag"> /></span>
<span class="hl-tag"></bean></span>

<span class="hl-tag"><bean</span> <span class="hl-attribute">id</span>=<span class="hl-value">"validator"</span>
      <span class="hl-attribute">class</span>=<span class="hl-value">"org.springframework.batch.item.validator.SpringValidator"</span><span class="hl-tag">></span>
    <span class="hl-tag"><property</span> <span class="hl-attribute">name</span>=<span class="hl-value">"validator"</span><span class="hl-tag">></span>
        <span class="hl-tag"><bean</span> <span class="hl-attribute">id</span>=<span class="hl-value">"orderValidator"</span>
              <span class="hl-attribute">class</span>=<span class="hl-value">"org.springmodules.validation.valang.ValangValidator"</span><span class="hl-tag">></span>
            <span class="hl-tag"><property</span> <span class="hl-attribute">name</span>=<span class="hl-value">"valang"</span><span class="hl-tag">></span>
                <span class="hl-tag"><value></span>
                    <span class="hl-tag"><![CDATA[</span>
    { orderId : ? > 0 AND ? <= 9999999999 : 'Incorrect order ID' : 'error.order.id' }
    { totalLines : ? = size(lineItems) : 'Bad count of order lines'
                 : 'error.order.lines.badcount'}
    { customer.registered : customer.businessCustomer = FALSE OR ? = TRUE
                          : 'Business customer must be registered'
                          : 'error.customer.registration'}
    { customer.companyName : customer.businessCustomer = FALSE OR ? HAS TEXT
                           : 'Company name for business customer is mandatory'
                           :'error.customer.companyname'}
                    <span class="hl-tag">]]></span>
                <span class="hl-tag"></value></span>
            <span class="hl-tag"></property></span>
        <span class="hl-tag"></bean></span>
    <span class="hl-tag"></property></span>
<span class="hl-tag"></bean></span></pre><p>This example shows a simple
|
|
<code class="classname">ValangValidator</code> that is used to validate an order
|
|
object. The intent is not to show Valang functionality as much as to show
|
|
how a validator could be added.</p></div><div class="section"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="process-indicator" href="#process-indicator"></a>6.12 Preventing State Persistence</h2></div></div></div><p>By default, all of the <code class="classname">ItemReader</code> and
|
|
<code class="classname">ItemWriter</code> implementations store their current
|
|
state in the <code class="classname">ExecutionContext</code> before it is
|
|
committed. However, this may not always be the desired behavior. For
|
|
example, many developers choose to make their database readers
|
|
'rerunnable' by using a process indicator. An extra column is added to the
|
|
input data to indicate whether or not it has been processed. When a
|
|
particular record is being read (or written out) the processed flag is
|
|
flipped from false to true. The SQL statement can then contain an extra
|
|
condition in the where clause, such as "where PROCESSED_IND = false",
|
|
thereby ensuring that only unprocessed records will be returned in the
|
|
case of a restart. In this scenario, it is preferable to not store any
|
|
state, such as the current row number, since it will be irrelevant upon
|
|
restart. For this reason, all readers and writers include the 'saveState'
|
|
property:</p><pre class="programlisting"><span class="hl-tag"><bean</span> <span class="hl-attribute">id</span>=<span class="hl-value">"playerSummarizationSource"</span> <span class="hl-attribute">class</span>=<span class="hl-value">"org.spr...JdbcCursorItemReader"</span><span class="hl-tag">></span>
|
|
<span class="hl-tag"><property</span> <span class="hl-attribute">name</span>=<span class="hl-value">"dataSource"</span> <span class="hl-attribute">ref</span>=<span class="hl-value">"dataSource"</span><span class="hl-tag"> /></span>
|
|
<span class="hl-tag"><property</span> <span class="hl-attribute">name</span>=<span class="hl-value">"rowMapper"</span><span class="hl-tag">></span>
|
|
<span class="hl-tag"><bean</span> <span class="hl-attribute">class</span>=<span class="hl-value">"org.springframework.batch.sample.PlayerSummaryMapper"</span><span class="hl-tag"> /></span>
|
|
<span class="hl-tag"></property></span>
|
|
<span class="bold"><strong><property name="saveState" value="false" /></strong></span>
|
|
<span class="hl-tag"><property</span> <span class="hl-attribute">name</span>=<span class="hl-value">"sql"</span><span class="hl-tag">></span>
|
|
<span class="hl-tag"><value></span>
|
|
SELECT games.player_id, games.year_no, SUM(COMPLETES),
|
|
SUM(ATTEMPTS), SUM(PASSING_YARDS), SUM(PASSING_TD),
|
|
SUM(INTERCEPTIONS), SUM(RUSHES), SUM(RUSH_YARDS),
|
|
SUM(RECEPTIONS), SUM(RECEPTIONS_YARDS), SUM(TOTAL_TD)
|
|
from games, players where players.player_id =
|
|
games.player_id group by games.player_id, games.year_no
|
|
<span class="hl-tag"></value></span>
|
|
<span class="hl-tag"></property></span>
|
|
<span class="hl-tag"></bean></span></pre><p>The <code class="classname">ItemReader</code> configured above will not make
|
|
any entries in the <code class="classname">ExecutionContext</code> for any
|
|
executions in which it participates.</p></div><div class="section"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="customReadersWriters" href="#customReadersWriters"></a>6.13 Creating Custom ItemReaders and
|
|
ItemWriters</h2></div></div></div><p>So far in this chapter the basic contracts that exist for reading
|
|
and writing in Spring Batch and some common implementations have been
|
|
discussed. However, these are all fairly generic, and there are many
|
|
potential scenarios that may not be covered by out of the box
|
|
implementations. This section will show, using a simple example, how to
|
|
create a custom <code class="classname">ItemReader</code> and
|
|
<code class="classname">ItemWriter</code> implementation and implement their
|
|
contracts correctly. The <code class="classname">ItemReader</code> will also
|
|
implement <code class="classname">ItemStream</code>, in order to illustrate how to
|
|
make a reader or writer restartable.</p><div class="section"><div class="titlepage"><div><div><h3 class="title"><a name="customReader" href="#customReader"></a>6.13.1 Custom ItemReader Example</h3></div></div></div><p>For the purpose of this example, a simple
|
|
<code class="classname">ItemReader</code> implementation that reads from a
|
|
provided list will be created. We'll start out by implementing the most
|
|
basic contract of <code class="classname">ItemReader</code>,
|
|
<code class="methodname">read</code>:</p><pre class="programlisting"><span class="hl-keyword">public</span> <span class="hl-keyword">class</span> CustomItemReader<T> <span class="hl-keyword">implements</span> ItemReader<T>{
|
|
|
|
List<T> items;
|
|
|
|
<span class="hl-keyword">public</span> CustomItemReader(List<T> items) {
|
|
<span class="hl-keyword">this</span>.items = items;
|
|
}
|
|
|
|
<span class="hl-keyword">public</span> T read() <span class="hl-keyword">throws</span> Exception, UnexpectedInputException,
|
|
NonTransientResourceException, ParseException {
|
|
|
|
<span class="hl-keyword">if</span> (!items.isEmpty()) {
|
|
<span class="hl-keyword">return</span> items.remove(<span class="hl-number">0</span>);
|
|
}
|
|
<span class="hl-keyword">return</span> null;
|
|
}
|
|
}</pre><p>This very simple class takes a list of items, and returns them one
|
|
at a time, removing each from the list. When the list is empty, it
|
|
returns null, thus satisfying the most basic requirements of an
|
|
<code class="classname">ItemReader</code>, as illustrated below:</p><pre class="programlisting">List<String> items = <span class="hl-keyword">new</span> ArrayList<String>();
|
|
items.add(<span class="hl-string">"1"</span>);
|
|
items.add(<span class="hl-string">"2"</span>);
|
|
items.add(<span class="hl-string">"3"</span>);
|
|
|
|
ItemReader<String> itemReader = <span class="hl-keyword">new</span> CustomItemReader<String>(items);
|
|
assertEquals(<span class="hl-string">"1"</span>, itemReader.read());
|
|
assertEquals(<span class="hl-string">"2"</span>, itemReader.read());
|
|
assertEquals(<span class="hl-string">"3"</span>, itemReader.read());
|
|
assertNull(itemReader.read());</pre><div class="section"><div class="titlepage"><div><div><h4 class="title"><a name="restartableReader" href="#restartableReader"></a>Making the <code class="classname">ItemReader</code>
|
|
Restartable</h4></div></div></div><p>The final challenge now is to make the
|
|
<code class="classname">ItemReader</code> restartable. Currently, if the power
|
|
goes out, and processing begins again, the
|
|
<code class="classname">ItemReader</code> must start at the beginning. This is
|
|
actually valid in many scenarios, but it is sometimes preferable that
|
|
a batch job starts where it left off. The key discriminant is often
|
|
whether the reader is stateful or stateless. A stateless reader does
|
|
not need to worry about restartability, but a stateful one has to try
|
|
and reconstitute its last known state on restart. For this reason, we
|
|
recommend that you keep custom readers stateless if possible, so you
|
|
don't have to worry about restartability.</p><p>If you do need to store state, then the
|
|
<code class="classname">ItemStream</code> interface should be used:</p><pre class="programlisting"><span class="hl-keyword">public</span> <span class="hl-keyword">class</span> CustomItemReader<T> <span class="hl-keyword">implements</span> ItemReader<T>, ItemStream {
|
|
|
|
List<T> items;
|
|
<span class="hl-keyword">int</span> currentIndex = <span class="hl-number">0</span>;
|
|
<span class="hl-keyword">private</span> <span class="hl-keyword">static</span> <span class="hl-keyword">final</span> String CURRENT_INDEX = <span class="hl-string">"current.index"</span>;
|
|
|
|
<span class="hl-keyword">public</span> CustomItemReader(List<T> items) {
|
|
<span class="hl-keyword">this</span>.items = items;
|
|
}
|
|
|
|
<span class="hl-keyword">public</span> T read() <span class="hl-keyword">throws</span> Exception, UnexpectedInputException,
|
|
ParseException {
|
|
|
|
<span class="hl-keyword">if</span> (currentIndex < items.size()) {
|
|
<span class="hl-keyword">return</span> items.get(currentIndex++);
|
|
}
|
|
|
|
<span class="hl-keyword">return</span> null;
|
|
}
|
|
|
|
<span class="hl-keyword">public</span> <span class="hl-keyword">void</span> open(ExecutionContext executionContext) <span class="hl-keyword">throws</span> ItemStreamException {
|
|
<span class="hl-keyword">if</span>(executionContext.containsKey(CURRENT_INDEX)){
|
|
currentIndex = (<span class="hl-keyword">int</span>) executionContext.getLong(CURRENT_INDEX);
|
|
}
|
|
<span class="hl-keyword">else</span>{
|
|
currentIndex = <span class="hl-number">0</span>;
|
|
}
|
|
}
|
|
|
|
<span class="hl-keyword">public</span> <span class="hl-keyword">void</span> update(ExecutionContext executionContext) <span class="hl-keyword">throws</span> ItemStreamException {
|
|
executionContext.putLong(CURRENT_INDEX, currentIndex);
|
|
}
|
|
|
|
<span class="hl-keyword">public</span> <span class="hl-keyword">void</span> close() <span class="hl-keyword">throws</span> ItemStreamException {}
|
|
}</pre><p>On each call to the <code class="classname">ItemStream</code>
|
|
<code class="methodname">update</code> method, the current index of the
|
|
<code class="classname">ItemReader</code> will be stored in the provided
|
|
<code class="classname">ExecutionContext</code> with a key of 'current.index'.
|
|
When the <code class="classname">ItemStream</code> <code class="classname">open</code>
|
|
method is called, the <code class="classname">ExecutionContext</code> is
|
|
checked to see if it contains an entry with that key. If the key is
|
|
found, then the current index is moved to that location. This is a
|
|
fairly trivial example, but it still meets the general
|
|
contract:</p><pre class="programlisting">ExecutionContext executionContext = <span class="hl-keyword">new</span> ExecutionContext();
|
|
((ItemStream)itemReader).open(executionContext);
|
|
assertEquals(<span class="hl-string">"1"</span>, itemReader.read());
|
|
((ItemStream)itemReader).update(executionContext);
|
|
|
|
List<String> items = <span class="hl-keyword">new</span> ArrayList<String>();
|
|
items.add(<span class="hl-string">"1"</span>);
|
|
items.add(<span class="hl-string">"2"</span>);
|
|
items.add(<span class="hl-string">"3"</span>);
|
|
itemReader = <span class="hl-keyword">new</span> CustomItemReader<String>(items);
|
|
|
|
((ItemStream)itemReader).open(executionContext);
|
|
assertEquals(<span class="hl-string">"2"</span>, itemReader.read());</pre><p>Most ItemReaders have much more sophisticated restart logic. The
|
|
<code class="classname">JdbcCursorItemReader</code>, for example, stores the
|
|
row id of the last processed row in the Cursor.</p><p>It is also worth noting that the key used within the
|
|
<code class="classname">ExecutionContext</code> should not be trivial. That is
|
|
because the same <code class="classname">ExecutionContext</code> is used for
|
|
all <code class="classname">ItemStream</code>s within a
|
|
<code class="classname">Step</code>. In most cases, simply prepending the key
|
|
with the class name should be enough to guarantee uniqueness. However,
|
|
in the rare cases where two of the same type of
|
|
<code class="classname">ItemStream</code> are used in the same step (which can
|
|
happen if two files are needed for output), then a more specific name will
|
|
be needed. For this reason, many of the Spring Batch
|
|
<code class="classname">ItemReader</code> and
|
|
<code class="classname">ItemWriter</code> implementations have a
|
|
<code class="methodname">setName</code>() property that allows this key name
|
|
to be overridden.</p></div></div><div class="section"><div class="titlepage"><div><div><h3 class="title"><a name="customWriter" href="#customWriter"></a>6.13.2 Custom ItemWriter Example</h3></div></div></div><p>Implementing a custom <code class="classname">ItemWriter</code> is similar
|
|
in many ways to the <code class="classname">ItemReader</code> example above, but
|
|
differs in enough ways as to warrant its own example. However, adding
|
|
restartability is essentially the same, so it won't be covered in this
|
|
example. As with the <code class="classname">ItemReader</code> example, a
|
|
<code class="classname">List</code> will be used in order to keep the example as
|
|
simple as possible:</p><pre class="programlisting"><span class="hl-keyword">public</span> <span class="hl-keyword">class</span> CustomItemWriter<T> <span class="hl-keyword">implements</span> ItemWriter<T> {
|
|
|
|
List<T> output = TransactionAwareProxyFactory.createTransactionalList();
|
|
|
|
<span class="hl-keyword">public</span> <span class="hl-keyword">void</span> write(List<? <span class="hl-keyword">extends</span> T> items) <span class="hl-keyword">throws</span> Exception {
|
|
output.addAll(items);
|
|
}
|
|
|
|
<span class="hl-keyword">public</span> List<T> getOutput() {
|
|
<span class="hl-keyword">return</span> output;
|
|
}
|
|
}</pre><div class="section"><div class="titlepage"><div><div><h4 class="title"><a name="restartableWriter" href="#restartableWriter"></a>Making the <code class="classname">ItemWriter</code>
|
|
Restartable</h4></div></div></div><p>To make the ItemWriter restartable we would follow the same
|
|
process as for the <code class="classname">ItemReader</code>, adding and
|
|
implementing the <code class="classname">ItemStream</code> interface to
|
|
synchronize the execution context. In the example we might have to
|
|
count the number of items processed and add that as a footer record.
|
|
If we needed to do that, we could implement
|
|
<code class="classname">ItemStream</code> in our
|
|
<code class="classname">ItemWriter</code> so that the counter was
|
|
reconstituted from the execution context if the stream was
|
|
re-opened.</p><p>In many realistic cases, custom ItemWriters also delegate to
|
|
another writer that itself is restartable (e.g. when writing to a
|
|
file), or else write to a transactional resource and so do not need
to be restartable because they are stateless. When you have a stateful
|
|
writer you should probably also be sure to implement
|
|
<code class="classname">ItemStream</code> as well as
|
|
<code class="classname">ItemWriter</code>. Remember also that the client of
|
|
the writer needs to be aware of the <code class="classname">ItemStream</code>,
|
|
so you may need to register it as a stream in the configuration
|
|
xml.</p></div></div></div></div>
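<p>The restartable writer with a footer counter described above can be sketched as follows. This is only an illustration under stated assumptions, not the framework's implementation: <code class="classname">SimpleContext</code> is a hypothetical stand-in for <code class="classname">ExecutionContext</code>, declared here so that the sketch compiles without Spring Batch on the classpath, and <code class="classname">CountingItemWriter</code> is a hypothetical name. In a real job the class would implement both <code class="classname">ItemWriter</code> and <code class="classname">ItemStream</code>.</p>

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for ExecutionContext: only the calls this sketch needs.
class SimpleContext {
    private final Map<String, Long> values = new HashMap<>();

    boolean containsKey(String key) { return values.containsKey(key); }
    long getLong(String key) { return values.get(key); }
    void putLong(String key, long value) { values.put(key, value); }
}

// Hypothetical writer that counts what it has written so that a footer
// record can report the total, even after a restart.
class CountingItemWriter<T> {
    // Key prefixed with the class name to reduce the risk of collisions.
    private static final String WRITTEN_COUNT = "CountingItemWriter.written.count";

    private final List<T> output = new ArrayList<>();
    private long writtenCount = 0;

    // Mirrors ItemStream.open: restore the counter if a previous run saved it.
    public void open(SimpleContext context) {
        writtenCount = context.containsKey(WRITTEN_COUNT) ? context.getLong(WRITTEN_COUNT) : 0;
    }

    // Mirrors ItemWriter.write: store the chunk and advance the counter.
    public void write(List<? extends T> items) {
        output.addAll(items);
        writtenCount += items.size();
    }

    // Mirrors ItemStream.update: save the counter in the context.
    public void update(SimpleContext context) {
        context.putLong(WRITTEN_COUNT, writtenCount);
    }

    // Mirrors ItemStream.close: nothing to release in this sketch.
    public void close() { }

    public long getWrittenCount() { return writtenCount; }

    public String footer() { return "Total items written: " + writtenCount; }
}
```

<p>A second instance opened with the same context resumes counting from the stored value, so the footer reflects items written both before and after the restart.</p>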
|
|
|
|
<div class="chapter"><div class="titlepage"><div><div><h1 class="title"><a name="scalability" href="#scalability"></a>7. Scaling and Parallel Processing</h1></div></div></div><p>Many batch processing problems can be solved with single threaded,
|
|
single process jobs, so it is always a good idea to properly check if that
|
|
meets your needs before thinking about more complex implementations. Measure
|
|
the performance of a realistic job and see if the simplest implementation
|
|
meets your needs first: you can read and write a file of several hundred
|
|
megabytes in well under a minute, even with standard hardware.</p><p>When you are ready to start implementing a job with some parallel
|
|
processing, Spring Batch offers a range of options, which are described in
|
|
this chapter, although some features are covered elsewhere. At a high level
|
|
there are two modes of parallel processing: single process, multi-threaded;
|
|
and multi-process. These break down into categories as well, as
|
|
follows:</p><div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; "><li class="listitem"><p>Multi-threaded Step (single process)</p></li><li class="listitem"><p>Parallel Steps (single process)</p></li><li class="listitem"><p>Remote Chunking of Step (multi process)</p></li><li class="listitem"><p>Partitioning a Step (single or multi process)</p></li></ul></div><p>We review the single-process options first, and then the
|
|
multi-process options.</p><div class="section"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="multithreadedStep" href="#multithreadedStep"></a>7.1 Multi-threaded Step</h2></div></div></div><p>The simplest way to start parallel processing is to add a
|
|
<code class="classname">TaskExecutor</code> to your Step configuration, e.g. as an
|
|
attribute of the <code class="literal">tasklet</code>:</p><pre class="programlisting"><span class="hl-tag"><step</span> <span class="hl-attribute">id</span>=<span class="hl-value">"loading"</span><span class="hl-tag">></span>
|
|
<span class="hl-tag"><tasklet</span> <span class="hl-attribute">task-executor</span>=<span class="hl-value">"taskExecutor"</span><span class="hl-tag">></span>...<span class="hl-tag"></tasklet></span>
|
|
<span class="hl-tag"></step></span></pre><p>In this example the taskExecutor is a reference to another bean
|
|
definition, implementing the <code class="classname">TaskExecutor</code>
|
|
interface. <code class="classname">TaskExecutor</code> is a standard Spring
|
|
interface, so consult the Spring User Guide for details of available
|
|
implementations. The simplest multi-threaded
|
|
<code class="classname">TaskExecutor</code> is a
|
|
<code class="classname">SimpleAsyncTaskExecutor</code>.</p><p>The result of the above configuration will be that the Step
|
|
executes by reading, processing and writing each chunk of items
|
|
(each commit interval) in a separate thread of execution. Note
|
|
that this means there is no fixed order for the items to be
|
|
processed, and a chunk might contain items that are
|
|
non-consecutive compared to the single-threaded case. In addition
|
|
to any limits placed by the task executor (e.g. if it is backed by
|
|
a thread pool), there is a throttle limit in the tasklet
|
|
configuration which defaults to 4. You may need to increase this
|
|
to ensure that a thread pool is fully utilised, e.g.</p><pre class="programlisting"><span class="hl-tag"><step</span> <span class="hl-attribute">id</span>=<span class="hl-value">"loading"</span><span class="hl-tag">></span> <span class="hl-tag"><tasklet</span>
|
|
<span class="hl-attribute">task-executor</span>=<span class="hl-value">"taskExecutor"</span>
|
|
<span class="hl-attribute">throttle-limit</span>=<span class="hl-value">"20"</span><span class="hl-tag">></span>...<span class="hl-tag"></tasklet></span>
|
|
<span class="hl-tag"></step></span></pre><p>Note also that there may be limits placed on concurrency by
|
|
any pooled resources used in your step, such as
|
|
a <code class="classname">DataSource</code>. Be sure to make the pool in
|
|
those resources at least as large as the desired number of
|
|
concurrent threads in the step.</p><p>There are some practical limitations of using multi-threaded Steps
|
|
for some common Batch use cases. Many participants in a Step (e.g. readers
|
|
and writers) are stateful, and if the state is not segregated by thread,
|
|
then those components are not usable in a multi-threaded Step. In
|
|
particular most of the off-the-shelf readers and writers from Spring Batch
|
|
are not designed for multi-threaded use. It is, however, possible to work
|
|
with stateless or thread safe readers and writers, and there is a sample
|
|
(parallelJob) in the Spring Batch Samples that shows the use of a process
|
|
indicator (see <a class="xref" href="#process-indicator" title="6.12 Preventing State Persistence">Section 6.12, “Preventing State Persistence”</a>) to keep
|
|
track of items that have been processed in a database input table.</p><p>Spring Batch provides some implementations of
|
|
<code class="classname">ItemWriter</code> and
|
|
<code class="classname">ItemReader</code>. Usually the Javadocs say
whether they are thread safe or not, or what you have to do to
|
|
avoid problems in a concurrent environment. If there is no
|
|
information in Javadocs, you can check the implementation to see
|
|
if there is any state. If a reader is not thread safe, it may
|
|
still be efficient to use it in your own synchronizing delegator.
|
|
You can synchronize the call to <code class="literal">read()</code> and as
|
|
long as the processing and writing is the most expensive part of
|
|
the chunk your step may still complete much faster than in a
|
|
single threaded configuration.
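A minimal sketch of such a synchronizing delegator follows. It rests on some assumptions: <code class="classname">SimpleItemReader</code> is a hypothetical stand-in for the <code class="classname">ItemReader</code> contract, declared only so that the sketch compiles without the framework, and <code class="classname">ListReader</code> and <code class="classname">SynchronizedReader</code> are hypothetical names.

```java
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for the ItemReader contract.
interface SimpleItemReader<T> {
    T read() throws Exception; // returns null when the input is exhausted
}

// A reader that is not thread safe: the iterator is shared mutable state.
class ListReader<T> implements SimpleItemReader<T> {
    private final Iterator<T> iterator;

    ListReader(List<T> items) {
        this.iterator = items.iterator();
    }

    @Override
    public T read() {
        return iterator.hasNext() ? iterator.next() : null;
    }
}

// Wraps any reader and lets only one thread at a time call read().
class SynchronizedReader<T> implements SimpleItemReader<T> {
    private final SimpleItemReader<T> delegate;

    SynchronizedReader(SimpleItemReader<T> delegate) {
        this.delegate = delegate;
    }

    @Override
    public synchronized T read() throws Exception {
        return delegate.read();
    }
}
```

Only the call to <code class="literal">read()</code> is serialized here, so processing and writing can still run concurrently across threads. The wrapper does not make the reader restartable, because the order in which threads consume items is not fixed.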
|
|
</p></div><div class="section"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="scalabilityParallelSteps" href="#scalabilityParallelSteps"></a>7.2 Parallel Steps</h2></div></div></div><p>As long as the application logic that needs to be parallelized can
|
|
be split into distinct responsibilities and assigned to individual steps,
|
|
then it can be parallelized in a single process. Parallel Step execution
|
|
is easy to configure and use. For example, to execute steps
|
|
<code class="literal">(step1,step2)</code> in parallel with
|
|
<code class="literal">step3</code>, you could configure a flow like this:</p><pre class="programlisting"><span class="hl-tag"><job</span> <span class="hl-attribute">id</span>=<span class="hl-value">"job1"</span><span class="hl-tag">></span>
|
|
<span class="hl-tag"><split</span> <span class="hl-attribute">id</span>=<span class="hl-value">"split1"</span> <span class="hl-attribute">task-executor</span>=<span class="hl-value">"taskExecutor"</span> <span class="hl-attribute">next</span>=<span class="hl-value">"step4"</span><span class="hl-tag">></span>
|
|
<span class="hl-tag"><flow></span>
|
|
<span class="hl-tag"><step</span> <span class="hl-attribute">id</span>=<span class="hl-value">"step1"</span> <span class="hl-attribute">parent</span>=<span class="hl-value">"s1"</span> <span class="hl-attribute">next</span>=<span class="hl-value">"step2"</span><span class="hl-tag">/></span>
|
|
<span class="hl-tag"><step</span> <span class="hl-attribute">id</span>=<span class="hl-value">"step2"</span> <span class="hl-attribute">parent</span>=<span class="hl-value">"s2"</span><span class="hl-tag">/></span>
|
|
<span class="hl-tag"></flow></span>
|
|
<span class="hl-tag"><flow></span>
|
|
<span class="hl-tag"><step</span> <span class="hl-attribute">id</span>=<span class="hl-value">"step3"</span> <span class="hl-attribute">parent</span>=<span class="hl-value">"s3"</span><span class="hl-tag">/></span>
|
|
<span class="hl-tag"></flow></span>
|
|
<span class="hl-tag"></split></span>
|
|
<span class="hl-tag"><step</span> <span class="hl-attribute">id</span>=<span class="hl-value">"step4"</span> <span class="hl-attribute">parent</span>=<span class="hl-value">"s4"</span><span class="hl-tag">/></span>
|
|
<span class="hl-tag"></job></span>
|
|
|
|
<span class="hl-tag"><beans:bean</span> <span class="hl-attribute">id</span>=<span class="hl-value">"taskExecutor"</span> <span class="hl-attribute">class</span>=<span class="hl-value">"org.spr...SimpleAsyncTaskExecutor"</span><span class="hl-tag">/></span></pre><p>The configurable "task-executor" attribute is used to specify which
|
|
TaskExecutor implementation should be used to execute the individual
|
|
flows. The default is <code class="classname">SyncTaskExecutor</code>, but an
|
|
asynchronous TaskExecutor is required to run the steps in parallel. Note
|
|
that the job will ensure that every flow in the split completes before
|
|
aggregating the exit statuses and transitioning.</p><p>See the section on <a class="xref" href="#split-flows" title="5.3.5 Split Flows">Section 5.3.5, “Split Flows”</a> for more
|
|
detail.</p></div><div class="section"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="remoteChunking" href="#remoteChunking"></a>7.3 Remote Chunking</h2></div></div></div><p>In Remote Chunking the Step processing is split across multiple
|
|
processes, communicating with each other through some middleware. Here is
|
|
a picture of the pattern in action:</p><div class="mediaobject" align="center"><img src="images/remote-chunking.png" align="middle"></div><p>The Master component is a single process, and the Slaves are
|
|
multiple remote processes. Clearly this pattern works best if the Master
|
|
is not a bottleneck, so the processing must be more expensive than the
|
|
reading of items (this is often the case in practice).</p><p>The Master is just an implementation of a Spring Batch
|
|
<code class="classname">Step</code>, with the ItemWriter replaced with a generic
|
|
version that knows how to send chunks of items to the middleware as
|
|
messages. The Slaves are standard listeners for whatever middleware is
|
|
being used (e.g. with JMS they would be
|
|
<code class="classname">MessageListener</code>s), and their role is to process
|
|
the chunks of items using a standard <code class="classname">ItemWriter</code> or
|
|
<code class="classname">ItemProcessor</code> plus
|
|
<code class="classname">ItemWriter</code>, through the
|
|
<code class="classname">ChunkProcessor</code> interface. One of the advantages of
|
|
using this pattern is that the reader, processor and writer components are
|
|
off-the-shelf (the same as would be used for a local execution of the
|
|
step). The items are divided up dynamically and work is shared through the
|
|
middleware, so if the listeners are all eager consumers, then load
|
|
balancing is automatic.</p><p>The middleware has to be durable, with guaranteed delivery and
|
|
single consumer for each message. JMS is the obvious candidate, but other
|
|
options exist in the grid computing and shared memory product space (e.g.
|
|
Java Spaces).</p></div><div class="section"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="partitioning" href="#partitioning"></a>7.4 Partitioning</h2></div></div></div><p>Spring Batch also provides an SPI for partitioning a Step execution
|
|
and executing it remotely. In this case the remote participants are simply
|
|
Step instances that could just as easily have been configured and used for
|
|
local processing. Here is a picture of the pattern in action:</p><div class="mediaobject" align="center"><img src="images/partitioning-overview.png" align="middle"></div><p>The Job is executing on the left hand side as a sequence of Steps,
|
|
and one of the Steps is labelled as a Master. The Slaves in this picture
|
|
are all identical instances of a Step, which could in fact take the place
|
|
of the Master resulting in the same outcome for the Job. The Slaves are
|
|
typically going to be remote services, but could also be local threads of
|
|
execution. The messages sent by the Master to the Slaves in this pattern
|
|
do not need to be durable, or have guaranteed delivery: Spring Batch
|
|
meta-data in the <code class="classname">JobRepository</code> will ensure that
|
|
each Slave is executed once and only once for each Job execution.</p><p>The SPI in Spring Batch consists of a special implementation of Step
|
|
(the <code class="classname">PartitionStep</code>), and two strategy interfaces
|
|
that need to be implemented for the specific environment. The strategy
|
|
interfaces are <code class="classname">PartitionHandler</code> and
|
|
<code class="classname">StepExecutionSplitter</code>, and their role is shown in
|
|
the sequence diagram below:</p><div class="mediaobject" align="center"><img src="images/partitioning-spi.png" align="middle"></div><p>The Step on the right in this case is the "remote" Slave, so
|
|
potentially there are many objects and/or processes playing this role, and
|
|
the PartitionStep is shown driving the execution. The PartitionStep
|
|
configuration looks like this:</p><pre class="programlisting"><span class="hl-tag"><step</span> <span class="hl-attribute">id</span>=<span class="hl-value">"step1.master"</span><span class="hl-tag">></span>
|
|
<span class="hl-tag"><partition</span> <span class="hl-attribute">step</span>=<span class="hl-value">"step1"</span> <span class="hl-attribute">partitioner</span>=<span class="hl-value">"partitioner"</span><span class="hl-tag">></span>
|
|
<span class="hl-tag"><handler</span> <span class="hl-attribute">grid-size</span>=<span class="hl-value">"10"</span> <span class="hl-attribute">task-executor</span>=<span class="hl-value">"taskExecutor"</span><span class="hl-tag">/></span>
|
|
<span class="hl-tag"></partition></span>
|
|
<span class="hl-tag"></step></span></pre><p>Similar to the multi-threaded step's throttle-limit
|
|
attribute, the grid-size attribute prevents the task executor from
|
|
being saturated with requests from a single step.</p><p>There is a simple example which can be copied and extended in the
|
|
unit test suite for Spring Batch Samples (see
|
|
<code class="classname">*PartitionJob.xml</code> configuration).</p><p>Spring Batch creates step executions for the partitions called
|
|
"step1:partition0", etc., so many people prefer to call the master step
|
|
"step1:master" for consistency. With Spring 3.0 you can do this using an
|
|
alias for the step (specifying the <code class="literal">name</code> attribute
|
|
instead of the <code class="literal">id</code>). </p><div class="section"><div class="titlepage"><div><div><h3 class="title"><a name="partitionHandler" href="#partitionHandler"></a>7.4.1 PartitionHandler</h3></div></div></div><p>The <code class="classname">PartitionHandler</code> is the component that
|
|
knows about the fabric of the remoting or grid environment. It is able
|
|
to send <code class="classname">StepExecution</code> requests to the remote
|
|
Steps, wrapped in some fabric-specific format, like a DTO. It does not
|
|
have to know how to split up the input data, or how to aggregate the
|
|
result of multiple Step executions. Generally speaking it probably also
|
|
doesn't need to know about resilience or failover, since those are
|
|
features of the fabric in many cases, and anyway Spring Batch always
|
|
provides restartability independent of the fabric: a failed Job can
|
|
always be restarted and only the failed Steps will be
|
|
re-executed.</p><p>The <code class="classname">PartitionHandler</code> interface can have
|
|
specialized implementations for a variety of fabric types: e.g. simple
|
|
RMI remoting, EJB remoting, custom web service, JMS, Java Spaces, shared
|
|
memory grids (like Terracotta or Coherence), grid execution fabrics
|
|
(like GridGain). Spring Batch does not contain implementations for any
|
|
proprietary grid or remoting fabrics.</p><p>Spring Batch does however provide a useful implementation of
|
|
<code class="classname">PartitionHandler</code> that executes Steps locally in
|
|
separate threads of execution, using the
|
|
<code class="classname">TaskExecutor</code> strategy from Spring. The
|
|
implementation is called
|
|
<code class="classname">TaskExecutorPartitionHandler</code>, and it is the
|
|
default for a step configured with the XML namespace as above. It can
|
|
also be configured explicitly like this:</p><pre class="programlisting"><span class="hl-tag"><step</span> <span class="hl-attribute">id</span>=<span class="hl-value">"step1.master"</span><span class="hl-tag">></span>
|
|
<span class="hl-tag"><partition</span> <span class="hl-attribute">step</span>=<span class="hl-value">"step1"</span> <span class="hl-attribute">handler</span>=<span class="hl-value">"handler"</span><span class="hl-tag">/></span>
|
|
<span class="hl-tag"></step></span>
|
|
|
|
<span class="hl-tag"><bean</span> <span class="hl-attribute">class</span>=<span class="hl-value">"org.spr...TaskExecutorPartitionHandler"</span><span class="hl-tag">></span>
|
|
<span class="hl-tag"><property</span> <span class="hl-attribute">name</span>=<span class="hl-value">"taskExecutor"</span> <span class="hl-attribute">ref</span>=<span class="hl-value">"taskExecutor"</span><span class="hl-tag">/></span>
|
|
<span class="hl-tag"><property</span> <span class="hl-attribute">name</span>=<span class="hl-value">"step"</span> <span class="hl-attribute">ref</span>=<span class="hl-value">"step1"</span><span class="hl-tag"> /></span>
|
|
<span class="hl-tag"><property</span> <span class="hl-attribute">name</span>=<span class="hl-value">"gridSize"</span> <span class="hl-attribute">value</span>=<span class="hl-value">"10"</span><span class="hl-tag"> /></span>
|
|
<span class="hl-tag"></bean></span></pre><p>The <code class="literal">gridSize</code> determines the number of separate
|
|
step executions to create, so it can be matched to the size of the
|
|
thread pool in the <code class="classname">TaskExecutor</code>, or else it can
|
|
be set to be larger than the number of threads available, in which case
|
|
the blocks of work are smaller.</p><p>The <code class="classname">TaskExecutorPartitionHandler</code> is quite
|
|
useful for IO intensive Steps, like copying large numbers of files or
|
|
replicating filesystems into content management systems. It can also be
|
|
used for remote execution by providing a Step implementation that is a
|
|
proxy for a remote invocation (e.g. using Spring Remoting).</p></div><div class="section"><div class="titlepage"><div><div><h3 class="title"><a name="stepExecutionSplitter" href="#stepExecutionSplitter"></a>7.4.2 Partitioner</h3></div></div></div><p>The Partitioner has a simpler responsibility: to generate
|
|
execution contexts as input parameters for new step executions only (no
|
|
need to worry about restarts). It has a single method:</p><pre class="programlisting"><span class="hl-keyword">public</span> <span class="hl-keyword">interface</span> Partitioner {
|
|
Map<String, ExecutionContext> partition(<span class="hl-keyword">int</span> gridSize);
|
|
}</pre><p>The return value from this method associates a unique name for
|
|
each step execution (the <code class="classname">String</code>), with input
|
|
parameters in the form of an <code class="classname">ExecutionContext</code>.
|
|
The names show up later in the Batch meta data as the step name in the
|
|
partitioned <code class="classname">StepExecutions</code>. The
|
|
<code class="classname">ExecutionContext</code> is just a bag of name-value
|
|
pairs, so it might contain a range of primary keys, or line numbers, or
|
|
the location of an input file. The remote <code class="classname">Step</code>
|
|
then normally binds to the context input using <code class="literal">#{...}</code>
|
|
placeholders (late binding in step scope), as illustrated in the next
|
|
section.</p><p>The names of the step executions (the keys in the
|
|
<code class="classname">Map</code> returned by
|
|
<code class="classname">Partitioner</code>) need to be unique amongst the step
|
|
executions of a Job, but do not have any other specific requirements.
|
|
The easiest way to do this, and to make the names meaningful for users,
|
|
is to use a prefix+suffix naming convention, where the prefix is the
|
|
name of the step that is being executed (which itself is unique in the
|
|
<code class="classname">Job</code>), and the suffix is just a counter. There is
|
|
a <code class="classname">SimplePartitioner</code> in the framework that uses
|
|
this convention.</p><p>An optional interface
|
|
<code class="classname">PartitionNameProvider</code> can be used to
|
|
provide the partition names separately from the partitions
|
|
themselves. If a <code class="classname">Partitioner</code> implements
|
|
this interface then on a restart only the names will be queried.
|
|
If partitioning is expensive this can be a useful optimisation.
|
|
Obviously the names provided by the
|
|
<code class="classname">PartitionNameProvider</code> must match those
|
|
provided by the <code class="classname">Partitioner</code>.</p></div><div class="section"><div class="titlepage"><div><div><h3 class="title"><a name="bindingInputDataToSteps" href="#bindingInputDataToSteps"></a>7.4.3 Binding Input Data to Steps</h3></div></div></div><p>It is very efficient for the steps that are executed by the
|
|
PartitionHandler to have identical configuration, and for their input
|
|
parameters to be bound at runtime from the ExecutionContext. This is
|
|
easy to do with the StepScope feature of Spring Batch (covered in more
|
|
detail in the section on <a class="xref" href="#late-binding" title="5.4 Late Binding of Job and Step Attributes">Late Binding</a>). For example
|
|
if the <code class="classname">Partitioner</code> creates
|
|
<code class="classname">ExecutionContext</code> instances with an attribute key
|
|
<code class="literal">fileName</code>, pointing to a different file (or
|
|
directory) for each step invocation, the
|
|
<code class="classname">Partitioner</code> output might look like this:</p><div class="table"><a name="d5e3165" href="#d5e3165"></a><p class="title"><b>Table 7.1. Example step execution name to execution context provided by
|
|
Partitioner targeting directory processing</b></p><div class="table-contents"><table summary="Example step execution name to execution context provided by
 Partitioner targeting directory processing" style="border-collapse: collapse;border-top: 0.5pt solid ; border-bottom: 0.5pt solid ; border-left: 0.5pt solid ; border-right: 0.5pt solid ; "><colgroup><col><col></colgroup><tbody><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; "><span class="bold"><strong>Step Execution Name
|
|
(key)</strong></span></td><td style="border-bottom: 0.5pt solid ; "><span class="bold"><strong>ExecutionContext
|
|
(value)</strong></span></td></tr><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; ">filecopy:partition0</td><td style="border-bottom: 0.5pt solid ; ">fileName=/home/data/one</td></tr><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; ">filecopy:partition1</td><td style="border-bottom: 0.5pt solid ; ">fileName=/home/data/two</td></tr><tr><td style="border-right: 0.5pt solid ; ">filecopy:partition2</td><td style="">fileName=/home/data/three</td></tr></tbody></table></div></div><br class="table-break"><p>Then the file name can be bound to a step using late binding to
|
|
the execution context:</p><pre class="programlisting"><span class="hl-tag"><bean</span> <span class="hl-attribute">id</span>=<span class="hl-value">"itemReader"</span> <span class="hl-attribute">scope</span>=<span class="hl-value">"step"</span>
|
|
<span class="hl-attribute">class</span>=<span class="hl-value">"org.spr...MultiResourceItemReader"</span><span class="hl-tag">></span>
|
|
    <span class="hl-tag"><property</span> <span class="hl-attribute">name</span>=<span class="hl-value">"resources"</span> <span class="hl-attribute">value</span>=<span class="hl-value">"</span><span class="bold"><strong>#{stepExecutionContext[fileName]}/*</strong></span>"/>
|
|
<span class="hl-tag"></bean></span></pre></div></div></div>
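<p>The naming convention and the <code class="literal">fileName</code> binding described above can be sketched as follows. This is a minimal, framework-free illustration, not the Spring Batch implementation: a plain <code class="classname">Map</code> stands in for <code class="classname">ExecutionContext</code>, and the class name <code class="classname">DirectoryPartitionerSketch</code> and the input directories are hypothetical. A real implementation would implement <code class="classname">Partitioner</code> and return <code class="classname">ExecutionContext</code> instances.</p>

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: a Map<String, Object> stands in for Spring Batch's
// ExecutionContext so that the example runs without the framework.
class DirectoryPartitionerSketch {

    private final String stepName;
    private final List<String> directories;

    DirectoryPartitionerSketch(String stepName, List<String> directories) {
        this.stepName = stepName;
        this.directories = directories;
    }

    // Mirrors the shape of Partitioner.partition(int gridSize): one uniquely
    // named entry per step execution. gridSize is ignored in this sketch
    // because the number of partitions follows the number of directories.
    Map<String, Map<String, Object>> partition(int gridSize) {
        Map<String, Map<String, Object>> partitions = new LinkedHashMap<>();
        int counter = 0;
        for (String directory : directories) {
            Map<String, Object> context = new LinkedHashMap<>();
            context.put("fileName", directory);
            // prefix (step name) + suffix (counter) keeps the names unique
            partitions.put(stepName + ":partition" + counter, context);
            counter++;
        }
        return partitions;
    }

    public static void main(String[] args) {
        DirectoryPartitionerSketch partitioner = new DirectoryPartitionerSketch(
                "filecopy",
                Arrays.asList("/home/data/one", "/home/data/two", "/home/data/three"));
        partitioner.partition(3)
                .forEach((name, context) -> System.out.println(name + " -> " + context));
    }
}
```

<p>Each key plays the role of a step execution name, and each value carries the attribute that a step-scoped bean would read through <code class="literal">#{stepExecutionContext[fileName]}</code>.</p>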
|
|
|
|
<div class="chapter"><div class="titlepage"><div><div><h1 class="title"><a name="repeat" href="#repeat"></a>8. Repeat</h1></div></div></div><div class="section"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="repeatTemplate" href="#repeatTemplate"></a>8.1 RepeatTemplate</h2></div></div></div><p>Batch processing is about repetitive actions - either as a simple
|
|
optimization, or as part of a job. To strategize and generalize the
|
|
repetition as well as to provide what amounts to an iterator framework,
|
|
Spring Batch has the <code class="classname">RepeatOperations</code> interface.
|
|
The <code class="classname">RepeatOperations</code> interface looks like
|
|
this:</p><pre class="programlisting"><span class="hl-keyword">public</span> <span class="hl-keyword">interface</span> RepeatOperations {
|
|
|
|
RepeatStatus iterate(RepeatCallback callback) <span class="hl-keyword">throws</span> RepeatException;
|
|
|
|
}</pre><p>The callback is a simple interface that allows you to insert
|
|
some business logic to be repeated:</p><pre class="programlisting"><span class="hl-keyword">public</span> <span class="hl-keyword">interface</span> RepeatCallback {
|
|
|
|
RepeatStatus doInIteration(RepeatContext context) <span class="hl-keyword">throws</span> Exception;
|
|
|
|
}</pre><p>The callback is executed repeatedly until the implementation
|
|
decides that the iteration should end. The return value in these
|
|
interfaces is an enumeration that can either be
|
|
<code class="code">RepeatStatus.CONTINUABLE</code> or
|
|
<code class="code">RepeatStatus.FINISHED</code>. A <code class="classname">RepeatStatus</code>
|
|
conveys information to the caller of the repeat operations about whether
|
|
there is any more work to do. Generally speaking, implementations of
|
|
<code class="classname">RepeatOperations</code> should inspect the
|
|
<code class="classname">RepeatStatus</code> and use it as part of the decision to
|
|
end the iteration. Any callback that wishes to signal to the caller that
|
|
there is no more work to do can return
|
|
<code class="code">RepeatStatus.FINISHED</code>.</p><p>The simplest general purpose implementation of
|
|
<code class="classname">RepeatOperations</code> is
|
|
<code class="classname">RepeatTemplate</code>. It could be used like this:</p><pre class="programlisting">RepeatTemplate template = <span class="hl-keyword">new</span> RepeatTemplate();
|
|
|
|
template.setCompletionPolicy(<span class="hl-keyword">new</span> SimpleCompletionPolicy(<span class="hl-number">2</span>));
|
|
|
|
template.iterate(<span class="hl-keyword">new</span> RepeatCallback() {
|
|
|
|
    <span class="hl-keyword">public</span> RepeatStatus doInIteration(RepeatContext context) {
        <span class="hl-comment">// Do stuff in batch...</span>
        <span class="hl-keyword">return</span> RepeatStatus.CONTINUABLE;
|
|
}
|
|
|
|
});</pre><p>In the example we return <code class="code">RepeatStatus.CONTINUABLE</code> to
|
|
show that there is more work to do. The callback can also return
|
|
<code class="code">RepeatStatus.FINISHED</code> if it wants to signal to the caller that
|
|
there is no more work to do. Some iterations can be terminated by
|
|
considerations intrinsic to the work being done in the callback, others
|
|
are effectively infinite loops as far as the callback is concerned and the
|
|
completion decision is delegated to an external policy as in the case
|
|
above.</p><div class="section"><div class="titlepage"><div><div><h3 class="title"><a name="repeatContext" href="#repeatContext"></a>8.1.1 RepeatContext</h3></div></div></div><p>The method parameter for the <code class="classname">RepeatCallback</code>
|
|
is a <code class="classname">RepeatContext</code>. Many callbacks will simply
|
|
ignore the context, but if necessary it can be used as an attribute bag
|
|
to store transient data for the duration of the iteration. After the
|
|
<code class="methodname">iterate</code> method returns, the context will no
|
|
longer exist.</p><p>A <code class="classname">RepeatContext</code> will have a parent context
|
|
if there is a nested iteration in progress. The parent context is
|
|
occasionally useful for storing data that need to be shared between
|
|
calls to <code class="methodname">iterate</code>. This is the case for instance
|
|
if you want to count the number of occurrences of an event in the
|
|
iteration and remember it across subsequent calls.</p></div><div class="section"><div class="titlepage"><div><div><h3 class="title"><a name="repeatStatus" href="#repeatStatus"></a>8.1.2 RepeatStatus</h3></div></div></div><p><code class="classname">RepeatStatus</code> is an enumeration used by
|
|
Spring Batch to indicate whether processing has finished. These are
|
|
the possible <code class="classname">RepeatStatus</code> values:</p><div class="table"><a name="d5e3224" href="#d5e3224"></a><p class="title"><b>Table 8.1. RepeatStatus Values</b></p><div class="table-contents"><table summary="RepeatStatus Values" style="border-collapse: collapse;border-top: 0.5pt solid ; border-bottom: 0.5pt solid ; border-left: 0.5pt solid ; border-right: 0.5pt solid ; "><colgroup><col><col></colgroup><tbody><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; "><span class="bold"><strong>Value</strong></span></td><td style="border-bottom: 0.5pt solid ; "><span class="bold"><strong>Description</strong></span></td></tr><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; ">CONTINUABLE</td><td style="border-bottom: 0.5pt solid ; ">There is more work to do.</td></tr><tr><td style="border-right: 0.5pt solid ; ">FINISHED</td><td style="">No more repetitions should take place.</td></tr></tbody></table></div></div><br class="table-break"><p><code class="classname">RepeatStatus</code> values can also be combined
|
|
with a logical AND operation using the <code class="methodname">and</code>()
|
|
method in <code class="classname">RepeatStatus</code>. The effect of this is to
|
|
do a logical AND on the continuable flag. In other words, if either
|
|
status is <code class="code">FINISHED</code>, then the result will be
|
|
<code class="code">FINISHED</code>.</p></div></div><div class="section"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="completionPolicies" href="#completionPolicies"></a>8.2 Completion Policies</h2></div></div></div><p>Inside a <code class="classname">RepeatTemplate</code> the termination of
|
|
the loop in the <code class="methodname">iterate</code> method is determined by a
|
|
<code class="classname">CompletionPolicy</code> which is also a factory for the
|
|
<code class="classname">RepeatContext</code>. The
|
|
<code class="classname">RepeatTemplate</code> has the responsibility to use the
|
|
current policy to create a <code class="classname">RepeatContext</code> and pass
|
|
that in to the <code class="classname">RepeatCallback</code> at every stage in the
|
|
iteration. After a callback completes its
|
|
<code class="methodname">doInIteration</code>, the
|
|
<code class="classname">RepeatTemplate</code> has to make a call to the
|
|
<code class="classname">CompletionPolicy</code> to ask it to update its state
|
|
(which will be stored in the <code class="classname">RepeatContext</code>). Then
|
|
it asks the policy if the iteration is complete.</p><p>Spring Batch provides some simple general purpose implementations of
|
|
<code class="classname">CompletionPolicy</code>. The
|
|
<code class="classname">SimpleCompletionPolicy</code> just allows an execution up
|
|
to a fixed number of times (with <code class="code">RepeatStatus.FINISHED</code>
|
|
forcing early completion at any time).</p><p>Users might need to implement their own completion policies for more
|
|
complicated decisions. For example, a batch processing window that
|
|
prevents batch jobs from executing once the online systems are in use
|
|
would require a custom policy.</p></div><div class="section"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="repeatExceptionHandling" href="#repeatExceptionHandling"></a>8.3 Exception Handling</h2></div></div></div><p>If there is an exception thrown inside a
|
|
<code class="classname">RepeatCallback</code>, the
|
|
<code class="classname">RepeatTemplate</code> consults an
|
|
<code class="classname">ExceptionHandler</code> which can decide whether or not to
|
|
re-throw the exception.</p><pre class="programlisting"><span class="hl-keyword">public</span> <span class="hl-keyword">interface</span> ExceptionHandler {
|
|
|
|
<span class="hl-keyword">void</span> handleException(RepeatContext context, Throwable throwable)
|
|
<span class="hl-keyword">throws</span> RuntimeException;
|
|
|
|
}</pre><p>A common use case is to count the number of exceptions of a
|
|
given type, and fail when a limit is reached. For this purpose Spring
|
|
Batch provides the <code class="classname">SimpleLimitExceptionHandler</code> and
|
|
the slightly more flexible
|
|
<code class="classname">RethrowOnThresholdExceptionHandler</code>. The
|
|
<code class="classname">SimpleLimitExceptionHandler</code> has a limit property
|
|
and an exception type that should be compared with the current exception -
|
|
all subclasses of the provided type are also counted. Exceptions of the
|
|
given type are ignored until the limit is reached, and then rethrown.
|
|
Those of other types are always rethrown.</p><p>An important optional property of the
|
|
<code class="classname">SimpleLimitExceptionHandler</code> is the boolean flag
|
|
<code class="code">useParent</code>. It is false by default, so the limit is only
|
|
accounted for in the current <code class="classname">RepeatContext</code>. When
|
|
set to true, the limit is kept across sibling contexts in a nested
|
|
iteration (e.g. a set of chunks inside a step).</p></div><div class="section"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="repeatListeners" href="#repeatListeners"></a>8.4 Listeners</h2></div></div></div><p>Often it is useful to be able to receive additional callbacks for
|
|
cross cutting concerns across a number of different iterations. For this
|
|
purpose Spring Batch provides the <code class="classname">RepeatListener</code>
|
|
interface. The <code class="classname">RepeatTemplate</code> allows users to
|
|
register <code class="classname">RepeatListener</code>s, and they will be given
|
|
callbacks with the <code class="classname">RepeatContext</code> and
|
|
<code class="classname">RepeatStatus</code> where available during the
|
|
iteration.</p><p>The interface looks like this:</p><pre class="programlisting"><span class="hl-keyword">public</span> <span class="hl-keyword">interface</span> RepeatListener {
|
|
<span class="hl-keyword">void</span> before(RepeatContext context);
|
|
<span class="hl-keyword">void</span> after(RepeatContext context, RepeatStatus result);
|
|
<span class="hl-keyword">void</span> open(RepeatContext context);
|
|
<span class="hl-keyword">void</span> onError(RepeatContext context, Throwable e);
|
|
<span class="hl-keyword">void</span> close(RepeatContext context);
|
|
}</pre><p>The <code class="methodname">open</code> and
|
|
<code class="methodname">close</code> callbacks come before and after the entire
|
|
iteration. <code class="methodname">before</code>, <code class="methodname">after</code>
|
|
and <code class="methodname">onError</code> apply to the individual
|
|
RepeatCallback calls.</p><p>Note that when there is more than one listener, they are in a list,
|
|
so there is an order. In this case <code class="methodname">open</code> and
|
|
<code class="methodname">before</code> are called in the same order while
|
|
<code class="methodname">after</code>, <code class="methodname">onError</code> and
|
|
<code class="methodname">close</code> are called in reverse order.</p></div><div class="section"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="repeatParallelProcessing" href="#repeatParallelProcessing"></a>8.5 Parallel Processing</h2></div></div></div><p>Implementations of <code class="classname">RepeatOperations</code> are not
|
|
restricted to executing the callback sequentially. It is quite important
|
|
that some implementations are able to execute their callbacks in parallel.
|
|
To this end, Spring Batch provides the
|
|
<code class="classname">TaskExecutorRepeatTemplate</code>, which uses the Spring
|
|
<code class="classname">TaskExecutor</code> strategy to run the
|
|
<code class="classname">RepeatCallback</code>. The default is to use a
|
|
<code class="classname">SyncTaskExecutor</code>, which has the effect of
|
|
executing the whole iteration in the same thread (the same as a normal
|
|
<code class="classname">RepeatTemplate</code>).</p></div><div class="section"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="declarativeIteration" href="#declarativeIteration"></a>8.6 Declarative Iteration</h2></div></div></div><p>Sometimes there is some business processing that you know you want
|
|
to repeat every time it happens. The classic example of this is the
|
|
optimization of a message pipeline - it is more efficient to process a
|
|
batch of messages, if they are arriving frequently, than to bear the cost
|
|
of a separate transaction for every message. Spring Batch provides an AOP
|
|
interceptor that wraps a method call in a
|
|
<code class="classname">RepeatOperations</code> for just this purpose. The
|
|
<code class="classname">RepeatOperationsInterceptor</code> executes the
|
|
intercepted method and repeats according to the
|
|
<code class="classname">CompletionPolicy</code> in the provided
|
|
<code class="classname">RepeatTemplate</code>.</p><p>Here is an example of declarative iteration using the Spring AOP
|
|
namespace to repeat a service call to a method called
|
|
<code class="methodname">processMessage</code> (for more detail on how to
|
|
configure AOP interceptors see the Spring User Guide):</p><pre class="programlisting"><span class="hl-tag"><aop:config></span>
|
|
<span class="hl-tag"><aop:pointcut</span> <span class="hl-attribute">id</span>=<span class="hl-value">"transactional"</span>
|
|
<span class="hl-attribute">expression</span>=<span class="hl-value">"execution(* com..*Service.processMessage(..))"</span><span class="hl-tag"> /></span>
|
|
<span class="hl-tag"><aop:advisor</span> <span class="hl-attribute">pointcut-ref</span>=<span class="hl-value">"transactional"</span>
|
|
<span class="hl-attribute">advice-ref</span>=<span class="hl-value">"retryAdvice"</span> <span class="hl-attribute">order</span>=<span class="hl-value">"-1"</span><span class="hl-tag">/></span>
|
|
<span class="hl-tag"></aop:config></span>
|
|
|
|
<span class="hl-tag"><bean</span> <span class="hl-attribute">id</span>=<span class="hl-value">"retryAdvice"</span> <span class="hl-attribute">class</span>=<span class="hl-value">"org.spr...RepeatOperationsInterceptor"</span><span class="hl-tag">/></span></pre><p>The example above uses a default
|
|
<code class="classname">RepeatTemplate</code> inside the interceptor. To change
|
|
the policies, listeners etc. you only need to inject an instance of
|
|
<code class="classname">RepeatTemplate</code> into the interceptor.</p><p>If the intercepted method returns <code class="code">void</code> then the
|
|
interceptor always returns <code class="code">RepeatStatus.CONTINUABLE</code> (so there is a danger of
an infinite loop if the <code class="classname">CompletionPolicy</code> does not
have a finite end point). Otherwise it returns
<code class="code">RepeatStatus.CONTINUABLE</code> until the return value from the
intercepted method is <code class="code">null</code>, at which point it returns
<code class="code">RepeatStatus.FINISHED</code>. So the business logic inside the target
|
|
method can signal that there is no more work to do by returning
|
|
<code class="code">null</code>, or by throwing an exception that is re-thrown by the
|
|
<code class="classname">ExceptionHandler</code> in the provided
|
|
<code class="classname">RepeatTemplate</code>.</p></div></div>
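<p>The interplay between the callback's status and a completion policy, as described in this chapter, can be sketched with a small loop. This is a framework-free illustration under stated assumptions, not the <code class="classname">RepeatTemplate</code> implementation: the nested <code class="classname">Status</code> enum stands in for <code class="classname">RepeatStatus</code>, the <code class="literal">maxIterations</code> argument stands in for a simple completion policy, and the class name <code class="classname">RepeatLoopSketch</code> is hypothetical.</p>

```java
import java.util.function.Supplier;

// Hypothetical sketch of a repeat loop; none of these types are Spring Batch types.
class RepeatLoopSketch {

    // Stand-in for RepeatStatus.
    enum Status {
        CONTINUABLE, FINISHED;

        // Logical AND on the continuable flag: if either side is FINISHED,
        // the result is FINISHED.
        Status and(Status other) {
            return (this == CONTINUABLE && other == CONTINUABLE) ? CONTINUABLE : FINISHED;
        }
    }

    // Calls the callback until it reports FINISHED or until maxIterations
    // calls have been made. Returns the number of times the callback ran.
    static int iterate(Supplier<Status> callback, int maxIterations) {
        int count = 0;
        Status status = Status.CONTINUABLE;
        while (status == Status.CONTINUABLE && count < maxIterations) {
            status = status.and(callback.get());
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // A callback that always wants to continue is stopped by the limit.
        System.out.println(iterate(() -> Status.CONTINUABLE, 2)); // prints 2
        // A callback that reports FINISHED ends the loop after one call.
        System.out.println(iterate(() -> Status.FINISHED, 2));    // prints 1
    }
}
```

<p>The first call shows why a callback that never reports completion, such as an intercepted <code class="code">void</code> method, depends on the policy having a finite end point: without the limit the loop would not terminate.</p>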
|
|
|
|
<div class="chapter"><div class="titlepage"><div><div><h1 class="title"><a name="retry" href="#retry"></a>9. Retry</h1></div></div></div><div class="section"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="retryTemplate" href="#retryTemplate"></a>9.1 RetryTemplate</h2></div></div></div><div class="note" style="margin-left: 0.5in; margin-right: 0.5in;"><table border="0" summary="Note"><tr><td rowspan="2" align="center" valign="top" width="25"><img alt="[Note]" src="images/note.png"></td><th align="left">Note</th></tr><tr><td align="left" valign="top"><p>The retry functionality was pulled out of Spring Batch as of 2.2.0.
|
|
It is now part of a new library, Spring Retry.</p></td></tr></table></div><p>To make processing more robust and less prone to failure, sometimes
|
|
it helps to automatically retry a failed operation in case it might
|
|
succeed on a subsequent attempt. Errors that are susceptible to this kind
|
|
of treatment are transient in nature. For example a remote call to a web
|
|
service or RMI service that fails because of a network glitch or a
|
|
<code class="classname">DeadlockLoserDataAccessException</code> in a database update may
|
|
resolve themselves after a short wait. To automate the retry of such
|
|
operations Spring Batch has the <code class="classname">RetryOperations</code>
|
|
strategy. The <code class="classname">RetryOperations</code> interface looks like
|
|
this:</p><pre class="programlisting"><span class="hl-keyword">public</span> <span class="hl-keyword">interface</span> RetryOperations {
|
|
|
|
<T> T execute(RetryCallback<T> retryCallback) <span class="hl-keyword">throws</span> Exception;
|
|
|
|
<T> T execute(RetryCallback<T> retryCallback, RecoveryCallback<T> recoveryCallback)
|
|
<span class="hl-keyword">throws</span> Exception;
|
|
|
|
<T> T execute(RetryCallback<T> retryCallback, RetryState retryState)
|
|
<span class="hl-keyword">throws</span> Exception, ExhaustedRetryException;
|
|
|
|
<T> T execute(RetryCallback<T> retryCallback, RecoveryCallback<T> recoveryCallback,
|
|
RetryState retryState) <span class="hl-keyword">throws</span> Exception;
|
|
|
|
}</pre><p>The basic callback is a simple interface that allows you to
|
|
insert some business logic to be retried:</p><pre class="programlisting"><span class="hl-keyword">public</span> <span class="hl-keyword">interface</span> RetryCallback<T> {
|
|
|
|
T doWithRetry(RetryContext context) <span class="hl-keyword">throws</span> Throwable;
|
|
|
|
}</pre><p>The callback is executed and if it fails (by throwing an
|
|
<code class="classname">Exception</code>), it will be retried until either it is
|
|
successful, or the implementation decides to abort. There are a number of
|
|
overloaded <code class="methodname">execute</code> methods in the
|
|
<code class="classname">RetryOperations</code> interface dealing with various use
|
|
cases for recovery when all retry attempts are exhausted, and also with
|
|
retry state, which allows clients and implementations to store information
|
|
between calls (more on this later).</p><p>The simplest general purpose implementation of
|
|
<code class="classname">RetryOperations</code> is
|
|
<code class="classname">RetryTemplate</code>. It could be used like this:</p><pre class="programlisting">RetryTemplate template = <span class="hl-keyword">new</span> RetryTemplate();
|
|
|
|
TimeoutRetryPolicy policy = <span class="hl-keyword">new</span> TimeoutRetryPolicy();
|
|
policy.setTimeout(<span class="hl-number">30000L</span>);
|
|
|
|
template.setRetryPolicy(policy);
|
|
|
|
Foo result = template.execute(<span class="hl-keyword">new</span> RetryCallback<Foo>() {
|
|
|
|
<span class="hl-keyword">public</span> Foo doWithRetry(RetryContext context) {
|
|
<span class="hl-comment">// Do stuff that might fail, e.g. webservice operation</span>
|
|
<span class="hl-keyword">return</span> result;
|
|
}
|
|
|
|
});</pre><p>In the example we execute a web service call and return the result
|
|
to the user. If that call fails then it is retried until a timeout is
|
|
reached.</p><div class="section"><div class="titlepage"><div><div><h3 class="title"><a name="retryContext" href="#retryContext"></a>9.1.1 RetryContext</h3></div></div></div><p>The method parameter for the <code class="classname">RetryCallback</code>
|
|
is a <code class="classname">RetryContext</code>. Many callbacks will simply
|
|
ignore the context, but if necessary it can be used as an attribute bag
|
|
to store data for the duration of the iteration.</p><p>A <code class="classname">RetryContext</code> will have a parent context
|
|
if there is a nested retry in progress in the same thread. The parent
|
|
context is occasionally useful for storing data that need to be shared
|
|
between calls to <code class="methodname">execute</code>.</p></div><div class="section"><div class="titlepage"><div><div><h3 class="title"><a name="recoveryCallback" href="#recoveryCallback"></a>9.1.2 RecoveryCallback</h3></div></div></div><p>When a retry is exhausted the
|
|
<code class="classname">RetryOperations</code> can pass control to a different
|
|
callback, the <code class="classname">RecoveryCallback</code>. To use this
|
|
feature clients just pass in the callbacks together to the same method,
|
|
for example:</p><pre class="programlisting">Foo foo = template.execute(<span class="hl-keyword">new</span> RetryCallback<Foo>() {
|
|
    <span class="hl-keyword">public</span> Foo doWithRetry(RetryContext context) {
        <span class="hl-comment">// business logic here</span>
    }
  },
  <span class="hl-keyword">new</span> RecoveryCallback<Foo>() {
    <span class="hl-keyword">public</span> Foo recover(RetryContext context) <span class="hl-keyword">throws</span> Exception {
        <span class="hl-comment">// recover logic here</span>
    }
|
|
});</pre><p>If the business logic does not succeed before the template
|
|
decides to abort, then the client is given the chance to do some
|
|
alternate processing through the recovery callback.</p></div><div class="section"><div class="titlepage"><div><div><h3 class="title"><a name="statelessRetry" href="#statelessRetry"></a>9.1.3 Stateless Retry</h3></div></div></div><p>In the simplest case, a retry is just a while loop: the
|
|
<code class="classname">RetryTemplate</code> can just keep trying until it
|
|
either succeeds or fails. The <code class="classname">RetryContext</code>
|
|
contains some state to determine whether to retry or abort, but this
|
|
state is on the stack and there is no need to store it anywhere
|
|
globally, so we call this stateless retry. The distinction between
|
|
stateless and stateful retry is contained in the implementation of the
|
|
<code class="classname">RetryPolicy</code> (the
|
|
<code class="classname">RetryTemplate</code> can handle both). In a stateless
|
|
retry, the callback is always executed in the same thread on retry as
|
|
when it failed.</p></div><div class="section"><div class="titlepage"><div><div><h3 class="title"><a name="statefulRetry" href="#statefulRetry"></a>9.1.4 Stateful Retry</h3></div></div></div><p>Where the failure has caused a transactional resource to become
|
|
invalid, there are some special considerations. This does not apply to a
|
|
simple remote call because there is no transactional resource (usually),
|
|
but it does sometimes apply to a database update, especially when using
|
|
Hibernate. In this case it only makes sense to rethrow the exception
|
|
that caused the failure immediately so that the transaction can roll
|
|
back and we can start a new valid one.</p><p>In these cases a stateless retry is not good enough because the
|
|
re-throw and roll back necessarily involve leaving the
|
|
<code class="code">RetryOperations.execute()</code> method and potentially losing the
|
|
context that was on the stack. To avoid losing it we have to introduce a
|
|
storage strategy to lift it off the stack and put it (at a minimum) in
|
|
heap storage. For this purpose Spring Batch provides a storage strategy
|
|
<code class="classname">RetryContextCache</code> which can be injected into the
|
|
<code class="classname">RetryTemplate</code>. The default implementation of the
|
|
<code class="classname">RetryContextCache</code> is in memory, using a simple
|
|
<code class="classname">Map</code>. Advanced usage with multiple processes in a
|
|
clustered environment might also consider implementing the
|
|
<code class="classname">RetryContextCache</code> with a cluster cache of some
|
|
sort (though, even in a clustered environment this might be
|
|
overkill).</p><p>Part of the responsibility of the
|
|
<code class="classname">RetryOperations</code> is to recognize the failed
|
|
operations when they come back in a new execution (and usually wrapped
|
|
in a new transaction). To facilitate this, Spring Batch provides the
|
|
<code class="classname">RetryState</code> abstraction. This works in conjunction
|
|
with special <code class="classname">execute</code> methods in the
|
|
<code class="classname">RetryOperations</code>.</p><p>The way the failed operations are recognized is by identifying the
|
|
state across multiple invocations of the retry. To identify the state,
|
|
the user can provide a <code class="classname">RetryState</code> object that is
|
|
responsible for returning a unique key identifying the item. The
|
|
identifier is used as a key in the
|
|
<code class="classname">RetryContextCache</code>.</p><div class="warning" style="margin-left: 0.5in; margin-right: 0.5in;"><table border="0" summary="Warning"><tr><td rowspan="2" align="center" valign="top" width="25"><img alt="[Warning]" src="images/warning.png"></td><th align="left">Warning</th></tr><tr><td align="left" valign="top"><p>Be very careful with the implementation of
|
|
<code class="code">Object.equals()</code> and <code class="code">Object.hashCode()</code> in the
|
|
key returned by <code class="classname">RetryState</code>. The best advice is
|
|
to use a business key to identify the items. In the case of a JMS
|
|
message the message ID can be used.</p></td></tr></table></div><p>When the retry is exhausted there is also the option to handle the
|
|
failed item in a different way, instead of calling the
|
|
<code class="classname">RetryCallback</code> (which is presumed now to be likely
|
|
to fail). Just like in the stateless case, this option is provided by
|
|
the <code class="classname">RecoveryCallback</code>, which can be provided by
|
|
passing it in to the <code class="classname">execute</code> method of
|
|
<code class="classname">RetryOperations</code>.</p><p>The decision to retry or not is actually delegated to a regular
|
|
<code class="classname">RetryPolicy</code>, so the usual concerns about limits
|
|
and timeouts can be injected there (see below).</p></div></div><div class="section"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="retryPolicies" href="#retryPolicies"></a>9.2 Retry Policies</h2></div></div></div><p>Inside a <code class="classname">RetryTemplate</code> the decision to retry
|
|
or fail in the <code class="methodname">execute</code> method is determined by a
|
|
<code class="classname">RetryPolicy</code> which is also a factory for the
|
|
<code class="classname">RetryContext</code>. The
|
|
<code class="classname">RetryTemplate</code> has the responsibility to use the
|
|
current policy to create a <code class="classname">RetryContext</code> and pass
|
|
that in to the <code class="classname">RetryCallback</code> at every attempt.
|
|
After a callback fails the <code class="classname">RetryTemplate</code> has to
|
|
make a call to the <code class="classname">RetryPolicy</code> to ask it to update
|
|
its state (which will be stored in the
|
|
<code class="classname">RetryContext</code>), and then it asks the policy if
|
|
another attempt can be made. If another attempt cannot be made (e.g. a
|
|
limit is reached or a timeout is detected) then the policy is also
|
|
responsible for handling the exhausted state. Simple implementations will
|
|
just throw <code class="classname">RetryExhaustedException</code> which will cause
|
|
any enclosing transaction to be rolled back. More sophisticated
|
|
implementations might attempt to take some recovery action, in which case
|
|
the transaction can remain intact.</p><div class="tip" style="margin-left: 0.5in; margin-right: 0.5in;"><table border="0" summary="Tip"><tr><td rowspan="2" align="center" valign="top" width="25"><img alt="[Tip]" src="images/tip.png"></td><th align="left">Tip</th></tr><tr><td align="left" valign="top"><p>Failures are inherently either retryable or not - if the same
|
|
exception is always going to be thrown from the business logic, it
|
|
doesn't help to retry it. So don't retry on all exception types - try to
|
|
focus on only those exceptions that you expect to be retryable. It's not
|
|
usually harmful to the business logic to retry more aggressively, but
|
|
it's wasteful because if a failure is deterministic there will be time
|
|
spent retrying something that you know in advance is fatal.</p></td></tr></table></div><p>Spring Batch provides some simple general purpose implementations of
|
|
stateless <code class="classname">RetryPolicy</code>, for example a
|
|
<code class="classname">SimpleRetryPolicy</code>, and the
|
|
<code class="classname">TimeoutRetryPolicy</code> used in the example
|
|
above.</p><p>The <code class="classname">SimpleRetryPolicy</code> just allows a retry on
|
|
any of a named list of exception types, up to a fixed number of times. It
|
|
also has a list of "fatal" exceptions that should never be retried, and
|
|
this list overrides the retryable list so that it can be used to give
|
|
finer control over the retry behavior:</p><pre class="programlisting">SimpleRetryPolicy policy = <span class="hl-keyword">new</span> SimpleRetryPolicy();
|
|
<span class="hl-comment">// Set the max retry attempts</span>
|
|
policy.setMaxAttempts(<span class="hl-number">5</span>);
|
|
<span class="hl-comment">// Retry on all exceptions (this is the default)</span>
|
|
policy.setRetryableExceptions(<span class="hl-keyword">new</span> Class[] {Exception.<span class="hl-keyword">class</span>});
|
|
<span class="hl-comment">// ... but never retry IllegalStateException</span>
|
|
policy.setFatalExceptions(<span class="hl-keyword">new</span> Class[] {IllegalStateException.<span class="hl-keyword">class</span>});
|
|
|
|
<span class="hl-comment">// Use the policy...</span>
|
|
RetryTemplate template = <span class="hl-keyword">new</span> RetryTemplate();
|
|
template.setRetryPolicy(policy);
|
|
template.execute(<span class="hl-keyword">new</span> RetryCallback<Foo>() {
|
|
<span class="hl-keyword">public</span> Foo doWithRetry(RetryContext context) {
|
|
<span class="hl-comment">// business logic here</span>
|
|
}
|
|
});</pre><p>There is also a more flexible implementation called
|
|
<code class="classname">ExceptionClassifierRetryPolicy</code>, which allows the
|
|
user to configure different retry behavior for an arbitrary set of
|
|
exception types through the <code class="classname">ExceptionClassifier</code>
|
|
abstraction. The policy works by calling on the classifier to convert an
|
|
exception into a delegate <code class="classname">RetryPolicy</code>, so for
|
|
example, one exception type can be retried more times before failure than
|
|
another by mapping it to a different policy.</p><p>Users might need to implement their own retry policies for more
|
|
customized decisions, for instance, when there is a well-known,
|
|
solution-specific, classification of exceptions into retryable and not
|
|
retryable.</p></div><div class="section"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="backoffPolicies" href="#backoffPolicies"></a>9.3 Backoff Policies</h2></div></div></div><p>When retrying after a transient failure it often helps to wait a bit
|
|
before trying again, because usually the failure is caused by some problem
|
|
that will only be resolved by waiting. If a
|
|
<code class="classname">RetryCallback</code> fails, the
|
|
<code class="classname">RetryTemplate</code> can pause execution according to the
|
|
<code class="classname">BackOffPolicy</code> in place.</p><pre class="programlisting"><span class="hl-keyword">public</span> <span class="hl-keyword">interface</span> BackOffPolicy {
|
|
|
|
BackOffContext start(RetryContext context);
|
|
|
|
<span class="hl-keyword">void</span> backOff(BackOffContext backOffContext)
|
|
<span class="hl-keyword">throws</span> BackOffInterruptedException;
|
|
|
|
}</pre><p>A <code class="classname">BackOffPolicy</code> is free to implement
|
|
the backOff in any way it chooses. The policies provided by Spring Batch
|
|
out of the box all use <code class="code">Object.wait()</code>. A common use case is to
|
|
back off with an exponentially increasing wait period, to avoid two retries
|
|
getting into lock step and both failing - this is a lesson learned from
|
|
Ethernet. For this purpose Spring Batch provides the
|
|
<code class="classname">ExponentialBackOffPolicy</code>.</p></div><div class="section"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="retryListeners" href="#retryListeners"></a>9.4 Listeners</h2></div></div></div><p>Often it is useful to be able to receive additional callbacks for
|
|
cross cutting concerns across a number of different retries. For this
|
|
purpose Spring Batch provides the <code class="classname">RetryListener</code>
|
|
interface. The <code class="classname">RetryTemplate</code> allows users to
|
|
register <code class="classname">RetryListener</code>s, and they will be given
|
|
callbacks with the <code class="classname">RetryContext</code> and
|
|
<code class="classname">Throwable</code> where available during the
|
|
iteration.</p><p>The interface looks like this:</p><pre class="programlisting"><span class="hl-keyword">public</span> <span class="hl-keyword">interface</span> RetryListener {
|
|
|
|
<T> <span class="hl-keyword">void</span> open(RetryContext context, RetryCallback<T> callback);
|
|
|
|
<T> <span class="hl-keyword">void</span> onError(RetryContext context, RetryCallback<T> callback, Throwable e);
|
|
|
|
<T> <span class="hl-keyword">void</span> close(RetryContext context, RetryCallback<T> callback, Throwable e);
|
|
}</pre><p>The <code class="methodname">open</code> and
|
|
<code class="methodname">close</code> callbacks come before and after the entire
|
|
retry in the simplest case and <code class="methodname">onError</code> applies to
|
|
the individual <code class="classname">RetryCallback</code> calls. The
|
|
<code class="methodname">close</code> method might also receive a
|
|
<code class="classname">Throwable</code>; if there has been an error it is the
|
|
last one thrown by the <code class="classname">RetryCallback</code>.</p><p>Note that when there is more than one listener, they are in a list,
|
|
so there is an order. In this case <code class="methodname">open</code> will be
|
|
called in the same order while <code class="methodname">onError</code> and
|
|
<code class="methodname">close</code> will be called in reverse order.</p></div><div class="section"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="declarativeRetry" href="#declarativeRetry"></a>9.5 Declarative Retry</h2></div></div></div><p>Sometimes there is some business processing that you know you want
|
|
to retry every time it happens. The classic example of this is the remote
|
|
service call. Spring Batch provides an AOP interceptor that wraps a method
|
|
call in a <code class="classname">RetryOperations</code> for just this purpose.
|
|
The <code class="classname">RetryOperationsInterceptor</code> executes the
|
|
intercepted method and retries on failure according to the
|
|
<code class="classname">RetryPolicy</code> in the provided
|
|
<code class="classname">RetryTemplate</code>.</p><p>Here is an example of declarative retry using the Spring AOP
|
|
namespace to retry a service call to a method called
|
|
<code class="methodname">remoteCall</code> (for more detail on how to configure
|
|
AOP interceptors see the Spring User Guide):</p><pre class="programlisting"><span class="hl-tag"><aop:config></span>
|
|
<span class="hl-tag"><aop:pointcut</span> <span class="hl-attribute">id</span>=<span class="hl-value">"transactional"</span>
|
|
<span class="hl-attribute">expression</span>=<span class="hl-value">"execution(* com..*Service.remoteCall(..))"</span><span class="hl-tag"> /></span>
|
|
<span class="hl-tag"><aop:advisor</span> <span class="hl-attribute">pointcut-ref</span>=<span class="hl-value">"transactional"</span>
|
|
<span class="hl-attribute">advice-ref</span>=<span class="hl-value">"retryAdvice"</span> <span class="hl-attribute">order</span>=<span class="hl-value">"-1"</span><span class="hl-tag">/></span>
|
|
<span class="hl-tag"></aop:config></span>
|
|
|
|
<span class="hl-tag"><bean</span> <span class="hl-attribute">id</span>=<span class="hl-value">"retryAdvice"</span>
|
|
<span class="hl-attribute">class</span>=<span class="hl-value">"org.springframework.batch.retry.interceptor.RetryOperationsInterceptor"</span><span class="hl-tag">/></span></pre><p>The example above uses a default
|
|
<code class="classname">RetryTemplate</code> inside the interceptor. To change the
|
|
policies or listeners, you only need to inject an instance of
|
|
<code class="classname">RetryTemplate</code> into the interceptor.</p></div></div>
|
|
|
|
<div class="chapter"><div class="titlepage"><div><div><h1 class="title"><a name="testing" href="#testing"></a>10. Unit Testing</h1></div></div></div><p>Just as with other application styles, it is extremely important to
|
|
unit test any code written as part of a batch job. The Spring core
|
|
documentation covers how to unit and integration test with Spring in great
|
|
detail, so it won't be repeated here. It is important, however, to think
|
|
about how to 'end to end' test a batch job, which is what this chapter will
|
|
focus on. The spring-batch-test project includes classes that will help
|
|
facilitate this end-to-end test approach.</p><div class="section"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="creatingUnitTestClass" href="#creatingUnitTestClass"></a>10.1 Creating a Unit Test Class</h2></div></div></div><p>In order for the unit test to run a batch job, the framework must
|
|
load the job's ApplicationContext. Two annotations are used to trigger
|
|
this:</p><div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; "><li class="listitem"><p><code class="classname">@RunWith(SpringJUnit4ClassRunner.class)</code>:
|
|
Indicates that the class should use Spring's JUnit facilities.</p></li><li class="listitem"><p><code class="classname">@ContextConfiguration(locations = {...})</code>:
|
|
Indicates which XML files contain the ApplicationContext.</p></li></ul></div><pre class="programlisting"><em><span class="hl-annotation" style="color: gray">@RunWith(SpringJUnit4ClassRunner.class)</span></em>
|
|
<em><span class="hl-annotation" style="color: gray">@ContextConfiguration(locations = { "/simple-job-launcher-context.xml",
|
|
"/jobs/skipSampleJob.xml" })</span></em>
|
|
<span class="hl-keyword">public</span> <span class="hl-keyword">class</span> SkipSampleFunctionalTests { ... }</pre></div><div class="section"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="endToEndTesting" href="#endToEndTesting"></a>10.2 End-To-End Testing of Batch Jobs</h2></div></div></div><p>'End To End' testing can be defined as testing the complete run of a
|
|
batch job from beginning to end. This allows for a test that sets up a
|
|
test condition, executes the job, and verifies the end result.</p><p>In the example below, the batch job reads from the database and
|
|
writes to a flat file. The test method begins by setting up the database
|
|
with test data. It clears the CUSTOMER table and then inserts 10 new
|
|
records. The test then launches the <code class="classname">Job</code> using the
|
|
<code class="methodname">launchJob()</code> method. The
|
|
<code class="methodname">launchJob()</code> method is provided by the
|
|
<code class="classname">JobLauncherTestUtils</code> class. Also provided by the
|
|
utils class is <code class="methodname">launchJob(JobParameters)</code>, which
|
|
allows the test to give particular parameters. The
|
|
<code class="methodname">launchJob()</code> method returns the
|
|
<code class="classname">JobExecution</code> object which is useful for asserting
|
|
particular information about the <code class="classname">Job</code> run. In the
|
|
case below, the test verifies that the <code class="classname">Job</code> ended
|
|
with status "COMPLETED".</p><pre class="programlisting"><em><span class="hl-annotation" style="color: gray">@RunWith(SpringJUnit4ClassRunner.class)</span></em>
|
|
<em><span class="hl-annotation" style="color: gray">@ContextConfiguration(locations = { "/simple-job-launcher-context.xml",
|
|
"/jobs/skipSampleJob.xml" })</span></em>
|
|
<span class="hl-keyword">public</span> <span class="hl-keyword">class</span> SkipSampleFunctionalTests {
|
|
|
|
<em><span class="hl-annotation" style="color: gray">@Autowired</span></em>
|
|
<span class="hl-keyword">private</span> JobLauncherTestUtils jobLauncherTestUtils;
|
|
|
|
<span class="hl-keyword">private</span> SimpleJdbcTemplate simpleJdbcTemplate;
|
|
|
|
<em><span class="hl-annotation" style="color: gray">@Autowired</span></em>
|
|
<span class="hl-keyword">public</span> <span class="hl-keyword">void</span> setDataSource(DataSource dataSource) {
|
|
<span class="hl-keyword">this</span>.simpleJdbcTemplate = <span class="hl-keyword">new</span> SimpleJdbcTemplate(dataSource);
|
|
}
|
|
|
|
<em><span class="hl-annotation" style="color: gray">@Test</span></em>
|
|
<span class="hl-keyword">public</span> <span class="hl-keyword">void</span> testJob() <span class="hl-keyword">throws</span> Exception {
|
|
simpleJdbcTemplate.update(<span class="hl-string">"delete from CUSTOMER"</span>);
|
|
<span class="hl-keyword">for</span> (<span class="hl-keyword">int</span> i = <span class="hl-number">1</span>; i <= <span class="hl-number">10</span>; i++) {
|
|
simpleJdbcTemplate.update(<span class="hl-string">"insert into CUSTOMER values (?, 0, ?, 100000)"</span>,
|
|
i, <span class="hl-string">"customer"</span> + i);
|
|
}
|
|
|
|
JobExecution jobExecution = jobLauncherTestUtils.launchJob();
|
|
|
|
|
|
Assert.assertEquals(<span class="hl-string">"COMPLETED"</span>, jobExecution.getExitStatus().getExitCode());
|
|
}
|
|
}</pre></div><div class="section"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="testingIndividualSteps" href="#testingIndividualSteps"></a>10.3 Testing Individual Steps</h2></div></div></div><p>For complex batch jobs, test cases in the end-to-end testing
|
|
approach may become unmanageable. In these cases, it may be more useful to
|
|
have test cases to test individual steps on their own. The
|
|
<code class="classname">JobLauncherTestUtils</code> class contains a method
|
|
<code class="methodname">launchStep</code> that takes a step name and runs just
|
|
that particular <code class="classname">Step</code>. This approach allows for more
|
|
targeted tests by allowing the test to set up data for just that step and
|
|
to validate its results directly.</p><pre class="programlisting">JobExecution jobExecution = jobLauncherTestUtils.launchStep(<span class="hl-string">"loadFileStep"</span>);</pre></div><div class="section"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="d5e3514" href="#d5e3514"></a>10.4 Testing Step-Scoped Components</h2></div></div></div><p>Often the components that are configured for your steps at runtime
|
|
use step scope and late binding to inject context from the step or job
|
|
execution. These are tricky to test as standalone components unless you
|
|
have a way to set the context as if they were in a step execution. That is
|
|
the goal of two components in Spring Batch: the
|
|
<code class="classname">StepScopeTestExecutionListener</code> and the
|
|
<code class="classname">StepScopeTestUtils</code>.</p><p>The listener is declared at the class level, and its job is to
|
|
create a step execution context for each test method. For example:</p><pre class="programlisting"><em><span class="hl-annotation" style="color: gray">@ContextConfiguration</span></em>
|
|
<em><span class="hl-annotation" style="color: gray">@TestExecutionListeners( { DependencyInjectionTestExecutionListener.class,
|
|
StepScopeTestExecutionListener.class })</span></em>
|
|
<em><span class="hl-annotation" style="color: gray">@RunWith(SpringJUnit4ClassRunner.class)</span></em>
|
|
<span class="hl-keyword">public</span> <span class="hl-keyword">class</span> StepScopeTestExecutionListenerIntegrationTests {
|
|
|
|
<span class="hl-comment">// This component is defined step-scoped, so it cannot be injected unless</span>
|
|
<span class="hl-comment">// a step is active...</span>
|
|
<em><span class="hl-annotation" style="color: gray">@Autowired</span></em>
|
|
<span class="hl-keyword">private</span> ItemReader<String> reader;
|
|
|
|
<span class="hl-keyword">public</span> StepExecution getStepExecution() {
|
|
StepExecution execution = MetaDataInstanceFactory.createStepExecution();
|
|
execution.getExecutionContext().putString(<span class="hl-string">"input.data"</span>, <span class="hl-string">"foo,bar,spam"</span>);
|
|
<span class="hl-keyword">return</span> execution;
|
|
}
|
|
|
|
<em><span class="hl-annotation" style="color: gray">@Test</span></em>
|
|
<span class="hl-keyword">public</span> <span class="hl-keyword">void</span> testReader() <span class="hl-keyword">throws</span> Exception {
|
|
<span class="hl-comment">// The reader is initialized and bound to the input data</span>
|
|
assertNotNull(reader.read());
|
|
}
|
|
|
|
}</pre><p>There are two <code class="classname">TestExecutionListeners</code>, one
|
|
from the regular Spring Test framework, which handles dependency injection
|
|
from the configured application context, injecting the reader, and the
|
|
other is the Spring Batch
|
|
<code class="classname">StepScopeTestExecutionListener</code>. It works by looking
|
|
for a factory method in the test case for a
|
|
<code class="classname">StepExecution</code>, and using that as the context for
|
|
the test method, as if that execution were active in a Step at runtime. The
|
|
factory method is detected by its signature (it just has to return a
|
|
<code class="classname">StepExecution</code>). If a factory method is not provided
|
|
then a default <code class="classname">StepExecution</code> is created.</p><p>The listener approach is convenient if you want the duration of the
|
|
step scope to be the execution of the test method. For a more flexible,
|
|
but more invasive approach you can use the
|
|
<code class="classname">StepScopeTestUtils</code>. For example, to count the
|
|
number of items available in the reader above:</p><pre class="programlisting"><span class="hl-keyword">int</span> count = StepScopeTestUtils.doInStepScope(stepExecution,
|
|
<span class="hl-keyword">new</span> Callable<Integer>() {
|
|
<span class="hl-keyword">public</span> Integer call() <span class="hl-keyword">throws</span> Exception {
|
|
|
|
<span class="hl-keyword">int</span> count = <span class="hl-number">0</span>;
|
|
|
|
<span class="hl-keyword">while</span> (reader.read() != null) {
|
|
count++;
|
|
}
|
|
<span class="hl-keyword">return</span> count;
|
|
}
|
|
});</pre></div><div class="section"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="validatingOutputFiles" href="#validatingOutputFiles"></a>10.5 Validating Output Files</h2></div></div></div><p>When a batch job writes to the database, it is easy to query the
|
|
database to verify that the output is as expected. However, if the batch
|
|
job writes to a file, it is equally important that the output be verified.
|
|
Spring Batch provides a class <code class="classname">AssertFile</code> to
|
|
facilitate the verification of output files. The method
|
|
<code class="methodname">assertFileEquals</code> takes two
|
|
<code class="classname">File</code> objects (or two
|
|
<code class="classname">Resource</code> objects) and asserts, line by line, that
|
|
the two files have the same content. Therefore, it is possible to create a
|
|
file with the expected output and to compare it to the actual
|
|
result:</p><pre class="programlisting"><span class="hl-keyword">private</span> <span class="hl-keyword">static</span> <span class="hl-keyword">final</span> String EXPECTED_FILE = <span class="hl-string">"src/main/resources/data/input.txt"</span>;
|
|
<span class="hl-keyword">private</span> <span class="hl-keyword">static</span> <span class="hl-keyword">final</span> String OUTPUT_FILE = <span class="hl-string">"target/test-outputs/output.txt"</span>;
|
|
|
|
AssertFile.assertFileEquals(<span class="hl-keyword">new</span> FileSystemResource(EXPECTED_FILE),
|
|
<span class="hl-keyword">new</span> FileSystemResource(OUTPUT_FILE));</pre></div><div class="section"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="mockingDomainObjects" href="#mockingDomainObjects"></a>10.6 Mocking Domain Objects</h2></div></div></div><p>Another common issue encountered while writing unit and integration
|
|
tests for Spring Batch components is how to mock domain objects. A good
|
|
example is a <code class="classname">StepExecutionListener</code>, as illustrated
|
|
below:</p><pre class="programlisting"><span class="hl-keyword">public</span> <span class="hl-keyword">class</span> NoWorkFoundStepExecutionListener <span class="hl-keyword">extends</span> StepExecutionListenerSupport {
|
|
|
|
<span class="hl-keyword">public</span> ExitStatus afterStep(StepExecution stepExecution) {
|
|
<span class="hl-keyword">if</span> (stepExecution.getReadCount() == <span class="hl-number">0</span>) {
|
|
<span class="hl-keyword">throw</span> <span class="hl-keyword">new</span> NoWorkFoundException(<span class="hl-string">"Step has not processed any items"</span>);
|
|
}
|
|
<span class="hl-keyword">return</span> stepExecution.getExitStatus();
|
|
}
|
|
}</pre><p>The above listener is provided by the framework and checks a
|
|
<code class="classname">StepExecution</code> for an empty read count, thus
|
|
signifying that no work was done. While this example is fairly simple, it
|
|
serves to illustrate the types of problems that may be encountered when
|
|
attempting to unit test classes that implement interfaces requiring Spring
|
|
Batch domain objects. Consider the above listener's unit test:</p><pre class="programlisting"><span class="hl-keyword">private</span> NoWorkFoundStepExecutionListener tested = <span class="hl-keyword">new</span> NoWorkFoundStepExecutionListener();
|
|
|
|
<em><span class="hl-annotation" style="color: gray">@Test</span></em>
|
|
<span class="hl-keyword">public</span> <span class="hl-keyword">void</span> testAfterStep() {
|
|
<span class="bold"><strong>StepExecution stepExecution = new StepExecution("NoProcessingStep",
|
|
new JobExecution(new JobInstance(1L, new JobParameters(),
|
|
"NoProcessingJob")));</strong></span>
|
|
|
|
stepExecution.setReadCount(<span class="hl-number">0</span>);
|
|
|
|
<span class="hl-keyword">try</span> {
|
|
tested.afterStep(stepExecution);
|
|
fail();
|
|
} <span class="hl-keyword">catch</span> (NoWorkFoundException e) {
|
|
assertEquals(<span class="hl-string">"Step has not processed any items"</span>, e.getMessage());
|
|
}
|
|
}</pre><p>Because the Spring Batch domain model follows good object-oriented
|
|
principles, the StepExecution requires a
|
|
<code class="classname">JobExecution</code>, which requires a
|
|
<code class="classname">JobInstance</code> and
|
|
<code class="classname">JobParameters</code> in order to create a valid
|
|
<code class="classname">StepExecution</code>. While this is good in a solid domain
|
|
model, it does make creating stub objects for unit testing verbose. To
|
|
address this issue, the Spring Batch test module includes a factory for
|
|
creating domain objects: <code class="classname">MetaDataInstanceFactory</code>.
|
|
Given this factory, the unit test can be updated to be more
|
|
concise:</p><pre class="programlisting"><span class="hl-keyword">private</span> NoWorkFoundStepExecutionListener tested = <span class="hl-keyword">new</span> NoWorkFoundStepExecutionListener();
|
|
|
|
<em><span class="hl-annotation" style="color: gray">@Test</span></em>
|
|
<span class="hl-keyword">public</span> <span class="hl-keyword">void</span> testAfterStep() {
|
|
<span class="bold"><strong>StepExecution stepExecution = MetaDataInstanceFactory.createStepExecution();</strong></span>
|
|
|
|
stepExecution.setReadCount(<span class="hl-number">0</span>);
|
|
|
|
<span class="hl-keyword">try</span> {
|
|
tested.afterStep(stepExecution);
|
|
fail();
|
|
} <span class="hl-keyword">catch</span> (NoWorkFoundException e) {
|
|
assertEquals(<span class="hl-string">"Step has not processed any items"</span>, e.getMessage());
|
|
}
|
|
}</pre><p>The above method for creating a simple
|
|
<code class="classname">StepExecution</code> is just one convenience method
|
|
available within the factory. A full method listing can be found in its
|
|
<a class="ulink" href="http://docs.spring.io/spring-batch/apidocs/org/springframework/batch/test/MetaDataInstanceFactory.html" target="_top">Javadoc</a>.</p></div></div>
|
|
|
|
<div class="chapter"><div class="titlepage"><div><div><h1 class="title"><a name="patterns" href="#patterns"></a>11. Common Batch Patterns</h1></div></div></div>
|
|
|
|
|
|
|
|
<p>Some batch jobs can be assembled purely from off-the-shelf components
|
|
in Spring Batch. For instance the <code class="classname">ItemReader</code> and
|
|
<code class="classname">ItemWriter</code> implementations can be configured to cover
|
|
a wide range of scenarios. However, for the majority of cases, custom code
|
|
will have to be written. The main API entry points for application
|
|
developers are the <code class="classname">Tasklet</code>,
|
|
<code class="classname">ItemReader</code>, <code class="classname">ItemWriter</code> and the
|
|
various listener interfaces. Most simple batch jobs will be able to use
|
|
off-the-shelf input from a Spring Batch <code class="classname">ItemReader</code>,
|
|
but it is often the case that there are custom concerns in the processing
|
|
and writing, which require developers to implement an
|
|
<code class="classname">ItemWriter</code> or
|
|
<code class="classname">ItemProcessor</code>.</p>
|
|
|
|
<p>Here, we provide a few examples of common patterns in custom business
|
|
logic. These examples primarily feature the listener interfaces. It should
|
|
be noted that an <code class="classname">ItemReader</code> or
|
|
<code class="classname">ItemWriter</code> can implement a listener interface as
|
|
well, if appropriate.</p>
|
|
|
|
<div class="section"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="loggingItemProcessingAndFailures" href="#loggingItemProcessingAndFailures"></a>11.1 Logging Item Processing and Failures</h2></div></div></div>
|
|
|
|
|
|
<p>A common use case is the need for special handling of errors in a
|
|
step, item by item, perhaps logging to a special channel, or inserting a
|
|
record into a database. A chunk-oriented <code class="classname">Step</code>
|
|
(created from the step factory beans) allows users to implement this use
|
|
case with a simple <code class="classname">ItemReadListener</code>, for errors on
|
|
read, and an <code class="classname">ItemWriteListener</code>, for errors on
|
|
write. The code snippet below illustrates a listener that logs both read
|
|
and write failures:</p>
|
|
|
|
<pre class="programlisting"><span class="hl-keyword">public</span> <span class="hl-keyword">class</span> ItemFailureLoggerListener <span class="hl-keyword">extends</span> ItemListenerSupport<Object, Object> {

    <span class="hl-keyword">private</span> <span class="hl-keyword">static</span> Log logger = LogFactory.getLog(<span class="hl-string">"item.error"</span>);

    <span class="hl-keyword">public</span> <span class="hl-keyword">void</span> onReadError(Exception ex) {
        logger.error(<span class="hl-string">"Encountered error on read"</span>, ex);
    }

    <span class="hl-keyword">public</span> <span class="hl-keyword">void</span> onWriteError(Exception ex, List<? <span class="hl-keyword">extends</span> Object> items) {
        logger.error(<span class="hl-string">"Encountered error on write"</span>, ex);
    }

}</pre>
|
|
|
|
<p>Once implemented, this listener must be registered with the step:</p>
|
|
|
|
<pre class="programlisting"><span class="hl-tag"><step</span> <span class="hl-attribute">id</span>=<span class="hl-value">"simpleStep"</span><span class="hl-tag">></span>
|
|
...
|
|
<span class="hl-tag"><listeners></span>
|
|
<span class="hl-tag"><listener></span>
|
|
<span class="hl-tag"><bean</span> <span class="hl-attribute">class</span>=<span class="hl-value">"org.example...ItemFailureLoggerListener"</span><span class="hl-tag">/></span>
|
|
<span class="hl-tag"></listener></span>
|
|
<span class="hl-tag"></listeners></span>
|
|
<span class="hl-tag"></step></span></pre>
|
|
|
|
<p>Remember that if your listener does anything in an
|
|
<code class="code">onError()</code> method, it will be inside a transaction that is
|
|
going to be rolled back. If you need to use a transactional resource such
|
|
as a database inside an <code class="code">onError()</code> method, consider adding a
|
|
declarative transaction to that method (see Spring Core Reference Guide
|
|
for details), and giving its propagation attribute the value
|
|
REQUIRES_NEW.</p>
|
|
</div>
|
|
|
|
<div class="section"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="stoppingAJobManuallyForBusinessReasons" href="#stoppingAJobManuallyForBusinessReasons"></a>11.2 Stopping a Job Manually for Business Reasons</h2></div></div></div>
|
|
|
|
|
|
<p>Spring Batch provides a <code class="methodname">stop</code>() method
|
|
through the <code class="classname">JobOperator</code> interface, but this is
|
|
really for use by the operator rather than the application programmer.
|
|
Sometimes it is more convenient or makes more sense to stop a job
|
|
execution from within the business logic.</p>
|
|
|
|
<p>The simplest thing to do is to throw a
|
|
<code class="classname">RuntimeException</code> (one that isn't retried
|
|
indefinitely or skipped). For example, a custom exception type could be
|
|
used, as in the example below:</p>
|
|
|
|
<pre class="programlisting"><span class="hl-keyword">public</span> <span class="hl-keyword">class</span> PoisonPillItemWriter<T> <span class="hl-keyword">implements</span> ItemWriter<T> {

    <span class="hl-keyword">public</span> <span class="hl-keyword">void</span> write(List<? <span class="hl-keyword">extends</span> T> items) <span class="hl-keyword">throws</span> Exception {
        <span class="hl-keyword">for</span> (T item : items) {
            <span class="hl-keyword">if</span> (isPoisonPill(item)) {
                <span class="hl-keyword">throw</span> <span class="hl-keyword">new</span> PoisonPillException(<span class="hl-string">"Poison pill detected: "</span> + item);
            }
        }
    }

}</pre>
|
|
|
|
<p>Another simple way to stop a step from executing is to simply return
|
|
<code class="code">null</code> from the <code class="classname">ItemReader</code>:</p>
|
|
|
|
<pre class="programlisting"><span class="hl-keyword">public</span> <span class="hl-keyword">class</span> EarlyCompletionItemReader<T> <span class="hl-keyword">implements</span> ItemReader<T> {
|
|
|
|
<span class="hl-keyword">private</span> ItemReader<T> delegate;
|
|
|
|
<span class="hl-keyword">public</span> <span class="hl-keyword">void</span> setDelegate(ItemReader<T> delegate) { ... }
|
|
|
|
<span class="hl-keyword">public</span> T read() <span class="hl-keyword">throws</span> Exception {
|
|
T item = delegate.read();
|
|
<span class="hl-keyword">if</span> (isEndItem(item)) {
|
|
<span class="hl-keyword">return</span> null; <span class="hl-comment">// end the step here</span>
|
|
}
|
|
<span class="hl-keyword">return</span> item;
|
|
}
|
|
|
|
}</pre>
|
|
|
|
<p>The previous example actually relies on the fact that there is a
|
|
default implementation of the <code class="classname">CompletionPolicy</code>
|
|
strategy which signals a complete batch when the item to be processed is
|
|
null. A more sophisticated completion policy could be implemented and
|
|
injected into the <code class="classname">Step</code> through the
|
|
<code class="classname">SimpleStepFactoryBean</code>:</p>
|
|
|
|
<pre class="programlisting"><span class="hl-tag"><step</span> <span class="hl-attribute">id</span>=<span class="hl-value">"simpleStep"</span><span class="hl-tag">></span>
|
|
<span class="hl-tag"><tasklet></span>
|
|
<span class="hl-tag"><chunk</span> <span class="hl-attribute">reader</span>=<span class="hl-value">"reader"</span> <span class="hl-attribute">writer</span>=<span class="hl-value">"writer"</span> <span class="hl-attribute">commit-interval</span>=<span class="hl-value">"10"</span>
|
|
<span class="bold"><strong>chunk-completion-policy="completionPolicy"</strong></span>/>
|
|
<span class="hl-tag"></tasklet></span>
|
|
<span class="hl-tag"></step></span>
|
|
|
|
<span class="hl-tag"><bean</span> <span class="hl-attribute">id</span>=<span class="hl-value">"completionPolicy"</span> <span class="hl-attribute">class</span>=<span class="hl-value">"org.example...SpecialCompletionPolicy"</span><span class="hl-tag">/></span></pre>
|
|
|
|
<p>An alternative is to set a flag in the
|
|
<code class="classname">StepExecution</code>, which is checked by the
|
|
<code class="classname">Step</code> implementations in the framework in between
|
|
item processing. To implement this alternative, we need access to the
|
|
current <code class="classname">StepExecution</code>, and this can be achieved by
|
|
implementing a <code class="classname">StepListener</code> and registering it with
|
|
the <code class="classname">Step</code>. Here is an example of a listener that
|
|
sets the flag:</p>
|
|
|
|
<pre class="programlisting"><span class="hl-keyword">public</span> <span class="hl-keyword">class</span> CustomItemWriter <span class="hl-keyword">extends</span> ItemListenerSupport <span class="hl-keyword">implements</span> StepListener {
|
|
|
|
<span class="hl-keyword">private</span> StepExecution stepExecution;
|
|
|
|
<span class="hl-keyword">public</span> <span class="hl-keyword">void</span> beforeStep(StepExecution stepExecution) {
|
|
<span class="hl-keyword">this</span>.stepExecution = stepExecution;
|
|
}
|
|
|
|
<span class="hl-keyword">public</span> <span class="hl-keyword">void</span> afterRead(Object item) {
|
|
<span class="hl-keyword">if</span> (isPoisonPill(item)) {
|
|
            stepExecution.setTerminateOnly();
|
|
}
|
|
}
|
|
|
|
}</pre>
|
|
|
|
<p>The default behavior here when the flag is set is for the step to
|
|
throw a <code class="classname">JobInterruptedException</code>. This can be
|
|
controlled through the <code class="classname">StepInterruptionPolicy</code>, but
|
|
the only choice is to throw or not throw an exception, so this is always
|
|
an abnormal ending to a job.</p>
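<p>The flag-based approach can be pictured with a small, framework-free
sketch. It is only an illustration under stated assumptions:
<code class="classname">SketchStepExecution</code>,
<code class="classname">SketchInterruptedException</code> and
<code class="classname">TerminateFlagSketch</code> are hypothetical stand-ins
written for this example, not Spring Batch classes, and the loop only loosely
mirrors how a step checks the flag between items.</p>

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for StepExecution: only the terminate flag is modeled.
class SketchStepExecution {
    private volatile boolean terminateOnly;

    void setTerminateOnly() { terminateOnly = true; }

    boolean isTerminateOnly() { return terminateOnly; }
}

// Hypothetical stand-in for JobInterruptedException.
class SketchInterruptedException extends Exception {
    SketchInterruptedException(String message) { super(message); }
}

class TerminateFlagSketch {

    // Handles items one by one and checks the flag before each item.
    // Returns the number of items handled if no stop was requested in time.
    static int run(List<String> items, SketchStepExecution execution)
            throws SketchInterruptedException {
        int processed = 0;
        for (String item : items) {
            if (execution.isTerminateOnly()) {
                throw new SketchInterruptedException("Stopped after " + processed + " items");
            }
            if (item.equals("POISON")) {
                execution.setTerminateOnly(); // business logic requests a stop
            }
            processed++;
        }
        return processed;
    }

    public static void main(String[] args) {
        try {
            run(Arrays.asList("a", "POISON", "b"), new SketchStepExecution());
        } catch (SketchInterruptedException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

<p>In this sketch the item that sets the flag is still handled, and the
interruption surfaces before the next item, which is why the run ends
abnormally rather than completing.</p>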
|
|
</div>
|
|
|
|
<div class="section"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="addingAFooterRecord" href="#addingAFooterRecord"></a>11.3 Adding a Footer Record</h2></div></div></div>
|
|
|
|
|
|
<p>Often when writing to flat files, a "footer" record must be appended
|
|
to the end of the file, after all processing has been completed. This can
be achieved using the <code class="classname">FlatFileFooterCallback</code>
|
|
interface provided by Spring Batch. The
|
|
<code class="classname">FlatFileFooterCallback</code> (and its counterpart, the
|
|
<code class="classname">FlatFileHeaderCallback</code>) are optional properties of
|
|
the <code class="classname">FlatFileItemWriter</code>:</p>
|
|
|
|
<pre class="programlisting"><span class="hl-tag"><bean</span> <span class="hl-attribute">id</span>=<span class="hl-value">"itemWriter"</span> <span class="hl-attribute">class</span>=<span class="hl-value">"org.spr...FlatFileItemWriter"</span><span class="hl-tag">></span>
|
|
<span class="hl-tag"><property</span> <span class="hl-attribute">name</span>=<span class="hl-value">"resource"</span> <span class="hl-attribute">ref</span>=<span class="hl-value">"outputResource"</span><span class="hl-tag"> /></span>
|
|
<span class="hl-tag"><property</span> <span class="hl-attribute">name</span>=<span class="hl-value">"lineAggregator"</span> <span class="hl-attribute">ref</span>=<span class="hl-value">"lineAggregator"</span><span class="hl-tag">/></span>
|
|
<span class="bold"><strong><property name="headerCallback" ref="headerCallback" /></strong></span>
|
|
<span class="bold"><strong><property name="footerCallback" ref="footerCallback" /></strong></span>
|
|
<span class="hl-tag"></bean></span></pre>
|
|
|
|
<p>The footer callback interface is very simple. It has just one method
|
|
that is called when the footer must be written:</p>
|
|
|
|
<pre class="programlisting"><span class="hl-keyword">public</span> <span class="hl-keyword">interface</span> FlatFileFooterCallback {
|
|
|
|
<span class="hl-keyword">void</span> writeFooter(Writer writer) <span class="hl-keyword">throws</span> IOException;
|
|
|
|
}</pre>
|
|
|
|
<div class="section"><div class="titlepage"><div><div><h3 class="title"><a name="writingASummaryFooter" href="#writingASummaryFooter"></a>11.3.1 Writing a Summary Footer</h3></div></div></div>
|
|
|
|
|
|
<p>A very common requirement involving footer records is to aggregate
|
|
information during the output process and to append this information to
|
|
the end of the file. This footer serves as a summarization of the file
|
|
or provides a checksum.</p>
|
|
|
|
<p>For example, if a batch job is writing
|
|
<code class="classname">Trade</code> records to a flat file, and there is a
|
|
requirement that the total amount from all the
|
|
<code class="classname">Trade</code>s is placed in a footer, then the following
|
|
<code class="classname">ItemWriter</code> implementation can be used:</p>
|
|
|
|
<pre class="programlisting"><span class="hl-keyword">public</span> <span class="hl-keyword">class</span> TradeItemWriter <span class="hl-keyword">implements</span> ItemWriter<Trade>,
|
|
FlatFileFooterCallback {
|
|
|
|
<span class="hl-keyword">private</span> ItemWriter<Trade> delegate;
|
|
|
|
<span class="hl-keyword">private</span> BigDecimal totalAmount = BigDecimal.ZERO;
|
|
|
|
    <span class="hl-keyword">public</span> <span class="hl-keyword">void</span> write(List<? <span class="hl-keyword">extends</span> Trade> items) <span class="hl-keyword">throws</span> Exception {
|
|
BigDecimal chunkTotal = BigDecimal.ZERO;
|
|
<span class="hl-keyword">for</span> (Trade trade : items) {
|
|
chunkTotal = chunkTotal.add(trade.getAmount());
|
|
}
|
|
|
|
delegate.write(items);
|
|
|
|
<span class="hl-comment">// After successfully writing all items</span>
|
|
totalAmount = totalAmount.add(chunkTotal);
|
|
}
|
|
|
|
<span class="hl-keyword">public</span> <span class="hl-keyword">void</span> writeFooter(Writer writer) <span class="hl-keyword">throws</span> IOException {
|
|
writer.write(<span class="hl-string">"Total Amount Processed: "</span> + totalAmount);
|
|
}
|
|
|
|
    <span class="hl-keyword">public</span> <span class="hl-keyword">void</span> setDelegate(ItemWriter<Trade> delegate) {...}
|
|
}</pre>
|
|
|
|
<p>This <code class="classname">TradeItemWriter</code> stores a
|
|
<code class="code">totalAmount</code> value that is increased with the
|
|
<code class="code">amount</code> from each <code class="classname">Trade</code> item written.
|
|
After the last <code class="classname">Trade</code> is processed, the framework
|
|
will call <code class="methodname">writeFooter</code>, which will put that
|
|
<code class="code">totalAmount</code> into the file. Note that the
|
|
<code class="methodname">write</code> method makes use of a temporary variable,
|
|
<code class="varname">chunkTotal</code>, that stores the total of the trades
in the chunk. This ensures that, if a skip occurs in the
<code class="methodname">write</code> method, the
<span class="property">totalAmount</span> is left unchanged. It is only at
|
|
the end of the <code class="methodname">write</code> method, once we are
|
|
guaranteed that no exceptions will be thrown, that we update the
|
|
<code class="varname">totalAmount</code>.</p>
|
|
|
|
<p>In order for the <code class="methodname">writeFooter</code> method to be
|
|
called, the <code class="classname">TradeItemWriter</code> (which implements
|
|
<code class="classname">FlatFileFooterCallback</code>) must be wired into the
|
|
<code class="classname">FlatFileItemWriter</code> as the
|
|
<code class="code">footerCallback</code>:</p>
|
|
|
|
<pre class="programlisting"><span class="hl-tag"><bean</span> <span class="hl-attribute">id</span>=<span class="hl-value">"tradeItemWriter"</span> <span class="hl-attribute">class</span>=<span class="hl-value">"..TradeItemWriter"</span><span class="hl-tag">></span>
|
|
<span class="hl-tag"><property</span> <span class="hl-attribute">name</span>=<span class="hl-value">"delegate"</span> <span class="hl-attribute">ref</span>=<span class="hl-value">"flatFileItemWriter"</span><span class="hl-tag"> /></span>
|
|
<span class="hl-tag"></bean></span>
|
|
|
|
<span class="hl-tag"><bean</span> <span class="hl-attribute">id</span>=<span class="hl-value">"flatFileItemWriter"</span> <span class="hl-attribute">class</span>=<span class="hl-value">"org.spr...FlatFileItemWriter"</span><span class="hl-tag">></span>
|
|
<span class="hl-tag"><property</span> <span class="hl-attribute">name</span>=<span class="hl-value">"resource"</span> <span class="hl-attribute">ref</span>=<span class="hl-value">"outputResource"</span><span class="hl-tag"> /></span>
|
|
<span class="hl-tag"><property</span> <span class="hl-attribute">name</span>=<span class="hl-value">"lineAggregator"</span> <span class="hl-attribute">ref</span>=<span class="hl-value">"lineAggregator"</span><span class="hl-tag">/></span>
|
|
<span class="bold"><strong> <property name="footerCallback" ref="tradeItemWriter" /></strong></span>
|
|
<span class="hl-tag"></bean></span></pre>
|
|
|
|
<p>As written so far, the <code class="classname">TradeItemWriter</code>
functions correctly only if the <code class="classname">Step</code>
|
|
is not restartable. This is because the class is stateful (since it
|
|
stores the <code class="code">totalAmount</code>), but the <code class="code">totalAmount</code>
|
|
is not persisted to the database, and therefore, it cannot be retrieved
|
|
in the event of a restart. In order to make this class restartable, the
|
|
<code class="classname">ItemStream</code> interface should be implemented along
|
|
with the methods <code class="methodname">open</code> and
|
|
<code class="methodname">update</code>:</p>
|
|
|
|
<pre class="programlisting"><span class="hl-keyword">public</span> <span class="hl-keyword">void</span> open(ExecutionContext executionContext) {
|
|
    <span class="hl-keyword">if</span> (executionContext.containsKey(<span class="hl-string">"total.amount"</span>)) {
|
|
totalAmount = (BigDecimal) executionContext.get(<span class="hl-string">"total.amount"</span>);
|
|
}
|
|
}
|
|
|
|
<span class="hl-keyword">public</span> <span class="hl-keyword">void</span> update(ExecutionContext executionContext) {
|
|
executionContext.put(<span class="hl-string">"total.amount"</span>, totalAmount);
|
|
}</pre>
|
|
|
|
<p>The <code class="methodname">update</code> method will store the most
|
|
current version of <code class="code">totalAmount</code> to the
|
|
<code class="classname">ExecutionContext</code> just before that object is
|
|
persisted to the database. The <code class="methodname">open</code> method will
|
|
retrieve any existing <code class="code">totalAmount</code> from the
|
|
<code class="classname">ExecutionContext</code> and use it as the starting point
|
|
for processing, allowing the <code class="classname">TradeItemWriter</code> to
|
|
pick up on restart where it left off the previous time the
|
|
<code class="classname">Step</code> was executed.</p>
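<p>The interplay of <code class="methodname">open</code>,
<code class="methodname">write</code> and
<code class="methodname">update</code> across a restart can be tried out
without the framework. The following is a minimal sketch under stated
assumptions: <code class="classname">SketchContext</code> is a hypothetical
map-backed stand-in for the <code class="classname">ExecutionContext</code>,
and <code class="classname">RunningTotal</code> keeps only the totalling logic
of the <code class="classname">TradeItemWriter</code>, working on plain
amounts instead of <code class="classname">Trade</code> objects.</p>

```java
import java.math.BigDecimal;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the framework's ExecutionContext, backed by a plain map.
class SketchContext {
    private final Map<String, Object> values = new HashMap<>();

    boolean containsKey(String key) { return values.containsKey(key); }

    Object get(String key) { return values.get(key); }

    void put(String key, Object value) { values.put(key, value); }
}

// Keeps a running total that can be restored after a simulated restart.
class RunningTotal {
    private static final String KEY = "total.amount";

    private BigDecimal totalAmount = BigDecimal.ZERO;

    // Restore the total saved by a previous execution, if there is one.
    void open(SketchContext context) {
        if (context.containsKey(KEY)) {
            totalAmount = (BigDecimal) context.get(KEY);
        }
    }

    // Sum the chunk first; touch the running total only once the chunk is done.
    void write(List<BigDecimal> amounts) {
        BigDecimal chunkTotal = BigDecimal.ZERO;
        for (BigDecimal amount : amounts) {
            chunkTotal = chunkTotal.add(amount);
        }
        totalAmount = totalAmount.add(chunkTotal);
    }

    // Store the current total so that it can be persisted with the context.
    void update(SketchContext context) {
        context.put(KEY, totalAmount);
    }

    BigDecimal getTotalAmount() { return totalAmount; }
}

class RunningTotalDemo {
    public static void main(String[] args) {
        SketchContext context = new SketchContext();

        RunningTotal first = new RunningTotal();
        first.open(context);
        first.write(Arrays.asList(new BigDecimal("10.50"), new BigDecimal("4.50")));
        first.update(context);

        // Simulated restart: a fresh instance picks up the saved total.
        RunningTotal restarted = new RunningTotal();
        restarted.open(context);
        restarted.write(Arrays.asList(new BigDecimal("5.00")));
        System.out.println(restarted.getTotalAmount()); // prints 20.00
    }
}
```

<p>Without the call to <code class="methodname">open</code>, the restarted
instance would begin again at zero and the footer would under-report the
total.</p>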
|
|
</div>
|
|
</div>
|
|
|
|
<div class="section"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="drivingQueryBasedItemReaders" href="#drivingQueryBasedItemReaders"></a>11.4 Driving Query Based ItemReaders</h2></div></div></div>
|
|
|
|
|
|
<p>In the chapter on readers and writers, database input using paging
|
|
was discussed. Many database vendors, such as DB2, have extremely
|
|
pessimistic locking strategies that can cause issues if the table being
|
|
read also needs to be used by other portions of the online application.
|
|
Furthermore, opening cursors over extremely large datasets can cause
|
|
issues on certain vendors. Therefore, many projects prefer to use a
|
|
'Driving Query' approach to reading in data. This approach works by
|
|
iterating over keys, rather than the entire object that needs to be
|
|
returned, as the following example illustrates:</p>
|
|
|
|
<div class="mediaobject" align="center"><img src="images/drivingQueryExample.png" align="middle"></div>
|
|
|
|
<p>As you can see, this example uses the same 'FOO' table as was used
|
|
in the cursor based example. However, rather than selecting the entire
|
|
row, only the IDs were selected in the SQL statement. So, rather than a
<code class="classname">Foo</code> object being returned from <code class="methodname">read</code>, an Integer
|
|
will be returned. This number can then be used to query for the 'details',
|
|
which is a complete Foo object:</p>
|
|
|
|
<div class="mediaobject" align="center"><img src="images/drivingQueryJob.png" align="middle"></div>
|
|
|
|
<p>An ItemProcessor should be used to transform the key obtained from
|
|
the driving query into a full 'Foo' object. An existing DAO can be used to
|
|
query for the full object based on the key.</p>
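<p>The key-to-object step can be sketched in plain Java. Everything named
here is a hypothetical illustration rather than part of Spring Batch:
<code class="classname">Foo</code> is a two-field stand-in for a row of the
FOO table, <code class="classname">InMemoryFooDao</code> replaces a real DAO
with a map, and <code class="classname">KeyToFooProcessor</code> plays the
role that an <code class="classname">ItemProcessor</code> would play in a
real step.</p>

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical domain object standing in for a full row of the FOO table.
class Foo {
    final int id;
    final String name;

    Foo(int id, String name) {
        this.id = id;
        this.name = name;
    }
}

// Hypothetical DAO; a real one would query the database by primary key.
class InMemoryFooDao {
    private final Map<Integer, Foo> rows = new HashMap<>();

    void save(Foo foo) { rows.put(foo.id, foo); }

    Foo findById(int id) { return rows.get(id); }
}

// Plays the role of the ItemProcessor: turns a key into the full object.
class KeyToFooProcessor {
    private final InMemoryFooDao dao;

    KeyToFooProcessor(InMemoryFooDao dao) { this.dao = dao; }

    Foo process(Integer id) { return dao.findById(id); }
}

class DrivingQueryDemo {
    public static void main(String[] args) {
        InMemoryFooDao dao = new InMemoryFooDao();
        dao.save(new Foo(1, "first"));
        dao.save(new Foo(2, "second"));

        // The driving query returns only keys, never full rows.
        List<Integer> keys = Arrays.asList(1, 2);

        KeyToFooProcessor processor = new KeyToFooProcessor(dao);
        List<Foo> details = new ArrayList<>();
        for (Integer key : keys) {
            details.add(processor.process(key));
        }
        for (Foo foo : details) {
            System.out.println(foo.id + " " + foo.name);
        }
    }
}
```

<p>Because each full row is fetched only when its key is processed, the
driving query itself holds nothing but the list of keys.</p>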
|
|
</div>
|
|
|
|
<div class="section"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="multiLineRecords" href="#multiLineRecords"></a>11.5 Multi-Line Records</h2></div></div></div>
|
|
|
|
|
|
<p>While it is usually the case with flat files that each record is
|
|
confined to a single line, it is common that a file might have records
|
|
spanning multiple lines with multiple formats. The following excerpt from
|
|
a file illustrates this:</p>
|
|
|
|
<pre class="programlisting">HEA;0013100345;2007-02-15
|
|
NCU;Smith;Peter;;T;20014539;F
|
|
BAD;;Oak Street 31/A;;Small Town;00235;IL;US
|
|
FOT;2;2;267.34</pre>
|
|
|
|
<p>Everything between the line starting with 'HEA' and the line
|
|
starting with 'FOT' is considered one record. There are a few
|
|
considerations that must be made in order to handle this situation
|
|
correctly:</p>
|
|
|
|
<div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; "><li class="listitem">
|
|
<p>Instead of reading one record at a time, the
|
|
<code class="classname">ItemReader</code> must read every line of the
|
|
multi-line record as a group, so that it can be passed to the
|
|
<code class="classname">ItemWriter</code> intact.</p>
|
|
</li><li class="listitem">
|
|
<p>Each line type may need to be tokenized differently.</p>
|
|
</li></ul></div>
|
|
|
|
<p>Because a single record spans multiple lines, and we may not know
|
|
how many lines there are, the <code class="classname">ItemReader</code> must be
|
|
careful to always read an entire record. In order to do this, a custom
|
|
<code class="classname">ItemReader</code> should be implemented as a wrapper for
|
|
the <code class="classname">FlatFileItemReader</code>.</p>
|
|
|
|
<pre class="programlisting"><span class="hl-tag"><bean</span> <span class="hl-attribute">id</span>=<span class="hl-value">"itemReader"</span> <span class="hl-attribute">class</span>=<span class="hl-value">"org.spr...MultiLineTradeItemReader"</span><span class="hl-tag">></span>
|
|
<span class="hl-tag"><property</span> <span class="hl-attribute">name</span>=<span class="hl-value">"delegate"</span><span class="hl-tag">></span>
|
|
<span class="hl-tag"><bean</span> <span class="hl-attribute">class</span>=<span class="hl-value">"org.springframework.batch.item.file.FlatFileItemReader"</span><span class="hl-tag">></span>
|
|
<span class="hl-tag"><property</span> <span class="hl-attribute">name</span>=<span class="hl-value">"resource"</span> <span class="hl-attribute">value</span>=<span class="hl-value">"data/iosample/input/multiLine.txt"</span><span class="hl-tag"> /></span>
|
|
<span class="hl-tag"><property</span> <span class="hl-attribute">name</span>=<span class="hl-value">"lineMapper"</span><span class="hl-tag">></span>
|
|
<span class="hl-tag"><bean</span> <span class="hl-attribute">class</span>=<span class="hl-value">"org.spr...DefaultLineMapper"</span><span class="hl-tag">></span>
|
|
<span class="hl-tag"><property</span> <span class="hl-attribute">name</span>=<span class="hl-value">"lineTokenizer"</span> <span class="hl-attribute">ref</span>=<span class="hl-value">"orderFileTokenizer"</span><span class="hl-tag">/></span>
|
|
<span class="hl-tag"><property</span> <span class="hl-attribute">name</span>=<span class="hl-value">"fieldSetMapper"</span><span class="hl-tag">></span>
|
|
<span class="hl-tag"><bean</span> <span class="hl-attribute">class</span>=<span class="hl-value">"org.spr...PassThroughFieldSetMapper"</span><span class="hl-tag"> /></span>
|
|
<span class="hl-tag"></property></span>
|
|
<span class="hl-tag"></bean></span>
|
|
<span class="hl-tag"></property></span>
|
|
<span class="hl-tag"></bean></span>
|
|
<span class="hl-tag"></property></span>
|
|
<span class="hl-tag"></bean></span></pre>
|
|
|
|
<p>To ensure that each line is tokenized properly, which is especially
|
|
important for fixed length input, the
|
|
<code class="classname">PatternMatchingCompositeLineTokenizer</code> can be used
|
|
on the delegate <code class="classname">FlatFileItemReader</code>. See <a class="xref" href="#prefixMatchingLineMapper" title="Multiple Record Types within a Single File">the section called “Multiple Record Types within a Single File”</a> for more details. The delegate
|
|
reader will then use a <code class="classname">PassThroughFieldSetMapper</code> to
|
|
deliver a <code class="classname">FieldSet</code> for each line back to the
|
|
wrapping <code class="classname">ItemReader</code>.</p>
|
|
|
|
<pre class="programlisting"><span class="hl-tag"><bean</span> <span class="hl-attribute">id</span>=<span class="hl-value">"orderFileTokenizer"</span> <span class="hl-attribute">class</span>=<span class="hl-value">"org.spr...PatternMatchingCompositeLineTokenizer"</span><span class="hl-tag">></span>
|
|
<span class="hl-tag"><property</span> <span class="hl-attribute">name</span>=<span class="hl-value">"tokenizers"</span><span class="hl-tag">></span>
|
|
<span class="hl-tag"><map></span>
|
|
<span class="hl-tag"><entry</span> <span class="hl-attribute">key</span>=<span class="hl-value">"HEA*"</span> <span class="hl-attribute">value-ref</span>=<span class="hl-value">"headerRecordTokenizer"</span><span class="hl-tag"> /></span>
|
|
<span class="hl-tag"><entry</span> <span class="hl-attribute">key</span>=<span class="hl-value">"FOT*"</span> <span class="hl-attribute">value-ref</span>=<span class="hl-value">"footerRecordTokenizer"</span><span class="hl-tag"> /></span>
|
|
<span class="hl-tag"><entry</span> <span class="hl-attribute">key</span>=<span class="hl-value">"NCU*"</span> <span class="hl-attribute">value-ref</span>=<span class="hl-value">"customerLineTokenizer"</span><span class="hl-tag"> /></span>
|
|
<span class="hl-tag"><entry</span> <span class="hl-attribute">key</span>=<span class="hl-value">"BAD*"</span> <span class="hl-attribute">value-ref</span>=<span class="hl-value">"billingAddressLineTokenizer"</span><span class="hl-tag"> /></span>
|
|
<span class="hl-tag"></map></span>
|
|
<span class="hl-tag"></property></span>
|
|
<span class="hl-tag"></bean></span></pre>
|
|
|
|
<p>This wrapper will have to be able to recognize the end of a record so
|
|
that it can continually call <code class="methodname">read()</code> on its
|
|
delegate until the end is reached. For each line that is read, the wrapper
|
|
should build up the item to be returned. Once the footer is reached, the
|
|
item can be returned for delivery to the
|
|
<code class="classname">ItemProcessor</code> and
|
|
<code class="classname">ItemWriter</code>.</p>
|
|
|
|
<pre class="programlisting"><span class="hl-keyword">private</span> FlatFileItemReader<FieldSet> delegate;
|
|
|
|
<span class="hl-keyword">public</span> Trade read() <span class="hl-keyword">throws</span> Exception {
|
|
Trade t = null;
|
|
|
|
<span class="hl-keyword">for</span> (FieldSet line = null; (line = <span class="hl-keyword">this</span>.delegate.read()) != null;) {
|
|
String prefix = line.readString(<span class="hl-number">0</span>);
|
|
<span class="hl-keyword">if</span> (prefix.equals(<span class="hl-string">"HEA"</span>)) {
|
|
t = <span class="hl-keyword">new</span> Trade(); <span class="hl-comment">// Record must start with header</span>
|
|
}
|
|
<span class="hl-keyword">else</span> <span class="hl-keyword">if</span> (prefix.equals(<span class="hl-string">"NCU"</span>)) {
|
|
Assert.notNull(t, <span class="hl-string">"No header was found."</span>);
|
|
t.setLast(line.readString(<span class="hl-number">1</span>));
|
|
t.setFirst(line.readString(<span class="hl-number">2</span>));
|
|
...
|
|
}
|
|
<span class="hl-keyword">else</span> <span class="hl-keyword">if</span> (prefix.equals(<span class="hl-string">"BAD"</span>)) {
|
|
Assert.notNull(t, <span class="hl-string">"No header was found."</span>);
|
|
t.setCity(line.readString(<span class="hl-number">4</span>));
|
|
t.setState(line.readString(<span class="hl-number">6</span>));
|
|
...
|
|
}
|
|
<span class="hl-keyword">else</span> <span class="hl-keyword">if</span> (prefix.equals(<span class="hl-string">"FOT"</span>)) {
|
|
<span class="hl-keyword">return</span> t; <span class="hl-comment">// Record must end with footer</span>
|
|
}
|
|
}
|
|
    Assert.isNull(t, <span class="hl-string">"No 'FOT' footer was found."</span>);
|
|
<span class="hl-keyword">return</span> null;
|
|
}</pre>
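<p>The grouping logic can also be exercised on its own, without a
<code class="classname">FlatFileItemReader</code> or any tokenizer. The
sketch below is an illustration under stated assumptions:
<code class="classname">OrderRecord</code> and
<code class="classname">MultiLineRecordReader</code> are hypothetical names,
lines are split on ';' by hand instead of being tokenized into a
<code class="classname">FieldSet</code>, and only three fields are
collected.</p>

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical, simplified item: only three fields are collected.
class OrderRecord {
    String orderId;
    String lastName;
    String city;
}

// Groups the lines from 'HEA' to 'FOT' into one OrderRecord per read() call.
class MultiLineRecordReader {
    private final Iterator<String> lines;

    MultiLineRecordReader(Iterator<String> lines) { this.lines = lines; }

    // Returns the next complete record, or null when the input is exhausted.
    OrderRecord read() {
        OrderRecord record = null;
        while (lines.hasNext()) {
            // A limit of -1 keeps empty fields, so field positions stay stable.
            String[] fields = lines.next().split(";", -1);
            String prefix = fields[0];
            if (prefix.equals("HEA")) {
                record = new OrderRecord(); // record must start with header
                record.orderId = fields[1];
            } else if (record == null) {
                throw new IllegalStateException("No header was found.");
            } else if (prefix.equals("NCU")) {
                record.lastName = fields[1];
            } else if (prefix.equals("BAD")) {
                record.city = fields[4];
            } else if (prefix.equals("FOT")) {
                return record; // record must end with footer
            }
        }
        if (record != null) {
            throw new IllegalStateException("No 'FOT' footer was found.");
        }
        return null;
    }
}

class MultiLineDemo {
    public static void main(String[] args) {
        List<String> input = Arrays.asList(
                "HEA;0013100345;2007-02-15",
                "NCU;Smith;Peter;;T;20014539;F",
                "BAD;;Oak Street 31/A;;Small Town;00235;IL;US",
                "FOT;2;2;267.34");
        MultiLineRecordReader reader = new MultiLineRecordReader(input.iterator());
        OrderRecord record = reader.read();
        System.out.println(record.orderId + " " + record.lastName + " " + record.city);
    }
}
```

<p>A second call to <code class="methodname">read</code> on the same input
returns <code class="code">null</code>, which is the signal that no further
records are available.</p>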
|
|
</div>
|
|
|
|
<div class="section"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="executingSystemCommands" href="#executingSystemCommands"></a>11.6 Executing System Commands</h2></div></div></div>
|
|
|
|
|
|
<p>Many batch jobs may require that an external command be called from
|
|
within the batch job. Such a process could be kicked off separately by the
|
|
scheduler, but the advantage of common meta-data about the run would be
|
|
lost. Furthermore, a multi-step job would need to be split up into
multiple jobs.</p>
|
|
|
|
<p>Because the need is so common, Spring Batch provides a
|
|
<code class="classname">Tasklet</code> implementation for calling system
|
|
commands:</p>
|
|
|
|
<pre class="programlisting"><span class="hl-tag"><bean</span> <span class="hl-attribute">class</span>=<span class="hl-value">"org.springframework.batch.core.step.tasklet.SystemCommandTasklet"</span><span class="hl-tag">></span>
|
|
<span class="hl-tag"><property</span> <span class="hl-attribute">name</span>=<span class="hl-value">"command"</span> <span class="hl-attribute">value</span>=<span class="hl-value">"echo hello"</span><span class="hl-tag"> /></span>
|
|
<span class="hl-comment"><!-- 5 second timeout for the command to complete --></span>
|
|
<span class="hl-tag"><property</span> <span class="hl-attribute">name</span>=<span class="hl-value">"timeout"</span> <span class="hl-attribute">value</span>=<span class="hl-value">"5000"</span><span class="hl-tag"> /></span>
|
|
<span class="hl-tag"></bean></span></pre>
|
|
</div>
|
|
|
|
<div class="section"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="handlingStepCompletionWhenNoInputIsFound" href="#handlingStepCompletionWhenNoInputIsFound"></a>11.7 Handling Step Completion When No Input is Found</h2></div></div></div>
|
|
|
|
|
|
<p>In many batch scenarios, finding no rows in a database or file to
|
|
process is not exceptional. The <code class="classname">Step</code> is simply
|
|
considered to have found no work and completes with 0 items read. All of
|
|
the <code class="classname">ItemReader</code> implementations provided out of the
|
|
box in Spring Batch default to this approach. This can lead to some
|
|
confusion if nothing is written out even when input is present (which
usually happens if a file was misnamed, for example). For this reason, the meta
|
|
data itself should be inspected to determine how much work the framework
|
|
found to be processed. However, what if finding no input is considered
|
|
exceptional? In this case, programmatically checking the meta data for no
|
|
items processed and causing failure is the best solution. Because this is
|
|
a common use case, a listener is provided with just this
|
|
functionality:</p>
|
|
|
|
<pre class="programlisting"><span class="hl-keyword">public</span> <span class="hl-keyword">class</span> NoWorkFoundStepExecutionListener <span class="hl-keyword">extends</span> StepExecutionListenerSupport {
|
|
|
|
<span class="hl-keyword">public</span> ExitStatus afterStep(StepExecution stepExecution) {
|
|
<span class="hl-keyword">if</span> (stepExecution.getReadCount() == <span class="hl-number">0</span>) {
|
|
<span class="hl-keyword">return</span> ExitStatus.FAILED;
|
|
}
|
|
<span class="hl-keyword">return</span> null;
|
|
}
|
|
|
|
}</pre>
|
|
|
|
<p>The above <code class="classname">StepExecutionListener</code> inspects the
|
|
readCount property of the <code class="classname">StepExecution</code> during the
|
|
'afterStep' phase to determine if no items were read. If that is the case,
|
|
an exit code of FAILED is returned, indicating that the
|
|
<code class="classname">Step</code> should fail. Otherwise, null is returned,
|
|
which will not affect the status of the
|
|
<code class="classname">Step</code>.</p>
|
|
</div>
|
|
|
|
<div class="section"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="passingDataToFutureSteps" href="#passingDataToFutureSteps"></a>11.8 Passing Data to Future Steps</h2></div></div></div>
|
|
|
|
|
|
<p>It is often useful to pass information from one step to another.
|
|
This can be done using the <code class="classname">ExecutionContext</code>. The
|
|
catch is that there are two <code class="classname">ExecutionContext</code>s: one
|
|
at the <code class="classname">Step</code> level and one at the
|
|
<code class="classname">Job</code> level. The <code class="classname">Step</code>
|
|
<code class="classname">ExecutionContext</code> lives only as long as the step
|
|
while the <code class="classname">Job</code>
|
|
<code class="classname">ExecutionContext</code> lives through the whole
|
|
<code class="classname">Job</code>. On the other hand, the
|
|
<code class="classname">Step</code> <code class="classname">ExecutionContext</code> is
|
|
updated every time the <code class="classname">Step</code> commits a chunk while
|
|
the <code class="classname">Job</code> <code class="classname">ExecutionContext</code> is
|
|
updated only at the end of each <code class="classname">Step</code>.</p>
|
|
|
|
<p>The consequence of this separation is that all data must be placed
|
|
in the <code class="classname">Step</code> <code class="classname">ExecutionContext</code>
|
|
while the <code class="classname">Step</code> is executing. This will ensure that
|
|
the data will be stored properly while the <code class="classname">Step</code> is
|
|
on-going. If data is stored to the <code class="classname">Job</code>
|
|
<code class="classname">ExecutionContext</code>, then it will not be persisted
|
|
during <code class="classname">Step</code> execution and if the
|
|
<code class="classname">Step</code> fails, that data will be lost.</p>
|
|
|
|
<pre class="programlisting"><span class="hl-keyword">public</span> <span class="hl-keyword">class</span> SavingItemWriter <span class="hl-keyword">implements</span> ItemWriter<Object> {
|
|
<span class="hl-keyword">private</span> StepExecution stepExecution;
|
|
|
|
<span class="hl-keyword">public</span> <span class="hl-keyword">void</span> write(List<? <span class="hl-keyword">extends</span> Object> items) <span class="hl-keyword">throws</span> Exception {
|
|
<span class="hl-comment">// ...</span>
|
|
|
|
ExecutionContext stepContext = <span class="hl-keyword">this</span>.stepExecution.getExecutionContext();
|
|
stepContext.put(<span class="hl-string">"someKey"</span>, someObject);
|
|
}
|
|
|
|
<em><span class="hl-annotation" style="color: gray">@BeforeStep</span></em>
|
|
<span class="hl-keyword">public</span> <span class="hl-keyword">void</span> saveStepExecution(StepExecution stepExecution) {
|
|
<span class="hl-keyword">this</span>.stepExecution = stepExecution;
|
|
}
|
|
}</pre>
|
|
|
|
<p>To make the data available to future <code class="classname">Step</code>s,
|
|
it will have to be "promoted" to the <code class="classname">Job</code>
|
|
<code class="classname">ExecutionContext</code> after the step has finished.
|
|
Spring Batch provides the
|
|
<code class="classname">ExecutionContextPromotionListener</code> for this purpose.
|
|
The listener must be configured with the keys related to the data in the
|
|
<code class="classname">ExecutionContext</code> that must be promoted. It can
|
|
also, optionally, be configured with a list of exit code patterns for
|
|
which the promotion should occur ("COMPLETED" is the default). As with all
|
|
listeners, it must be registered on the
|
|
<code class="classname">Step</code>.</p>
|
|
|
|
<pre class="programlisting"><span class="hl-tag"><job</span> <span class="hl-attribute">id</span>=<span class="hl-value">"job1"</span><span class="hl-tag">></span>
|
|
<span class="hl-tag"><step</span> <span class="hl-attribute">id</span>=<span class="hl-value">"step1"</span><span class="hl-tag">></span>
|
|
<span class="hl-tag"><tasklet></span>
|
|
<span class="hl-tag"><chunk</span> <span class="hl-attribute">reader</span>=<span class="hl-value">"reader"</span> <span class="hl-attribute">writer</span>=<span class="hl-value">"savingWriter"</span> <span class="hl-attribute">commit-interval</span>=<span class="hl-value">"10"</span><span class="hl-tag">/></span>
|
|
<span class="hl-tag"></tasklet></span>
|
|
<span class="hl-tag"><listeners></span>
|
|
<span class="bold"><strong><listener ref="promotionListener"/></strong></span>
|
|
<span class="hl-tag"></listeners></span>
|
|
<span class="hl-tag"></step></span>
|
|
|
|
<span class="hl-tag"><step</span> <span class="hl-attribute">id</span>=<span class="hl-value">"step2"</span><span class="hl-tag">></span>
|
|
...
|
|
<span class="hl-tag"></step></span>
|
|
<span class="hl-tag"></job></span>
|
|
|
|
<span class="bold"><strong><beans:bean id="promotionListener" class="org.spr....ExecutionContextPromotionListener">
|
|
<beans:property name="keys" value="someKey"/>
|
|
</beans:bean></strong></span></pre>
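<p>Conceptually, promotion copies the configured keys from the step-level context into the
job-level context once the step has finished with a matching exit code. The following
plain-Java sketch models only that idea and is not the framework's implementation: the class
<code class="classname">PromotionSketch</code> and its <code class="code">promote</code> method are hypothetical, both contexts are
stood in for by <code class="classname">java.util.Properties</code>, and exit codes are compared by exact match
rather than by pattern.</p>

<pre class="programlisting">import java.util.Properties;

public class PromotionSketch {

    // Copies each listed key from stepContext to jobContext, but only when the
    // step's exit code equals one of the configured statuses.
    public static void promote(String[] keys, String[] statuses, String exitCode,
                               Properties stepContext, Properties jobContext) {
        for (String status : statuses) {
            if (status.equals(exitCode)) {
                for (String key : keys) {
                    if (stepContext.containsKey(key)) {
                        jobContext.put(key, stepContext.get(key));
                    }
                }
                return;
            }
        }
    }

    public static void main(String[] args) {
        Properties stepContext = new Properties();
        Properties jobContext = new Properties();
        stepContext.put("someKey", "someValue");

        // The step finished with COMPLETED, so "someKey" is copied to the job context.
        promote(new String[] {"someKey"}, new String[] {"COMPLETED"}, "COMPLETED",
                stepContext, jobContext);
        System.out.println(jobContext.get("someKey"));
    }
}</pre>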
|
|
|
|
<p>Finally, the saved values must be retrieved from the
|
|
<code class="classname">Job</code> <code class="classname">ExecutionContext</code>:</p>
|
|
|
|
<pre class="programlisting"><span class="hl-keyword">public</span> <span class="hl-keyword">class</span> RetrievingItemWriter <span class="hl-keyword">implements</span> ItemWriter<Object> {
|
|
<span class="hl-keyword">private</span> Object someObject;
|
|
|
|
<span class="hl-keyword">public</span> <span class="hl-keyword">void</span> write(List<? <span class="hl-keyword">extends</span> Object> items) <span class="hl-keyword">throws</span> Exception {
|
|
<span class="hl-comment">// ...</span>
|
|
}
|
|
|
|
<em><span class="hl-annotation" style="color: gray">@BeforeStep</span></em>
|
|
<span class="hl-keyword">public</span> <span class="hl-keyword">void</span> retrieveInterstepData(StepExecution stepExecution) {
|
|
JobExecution jobExecution = stepExecution.getJobExecution();
|
|
ExecutionContext jobContext = jobExecution.getExecutionContext();
|
|
<span class="hl-keyword">this</span>.someObject = jobContext.get(<span class="hl-string">"someKey"</span>);
|
|
}
|
|
}</pre>
|
|
</div>
|
|
</div>
|
|
|
|
<div class="chapter"><div class="titlepage"><div><div><h1 class="title"><a name="jsr-352" href="#jsr-352"></a>12. JSR-352 Support</h1></div></div></div><p>As of Spring Batch 3.0 support for JSR-352 has been fully implemented. This section is not a replacement for
|
|
the spec itself; instead, it explains how the JSR-352 specific concepts apply to Spring Batch.
|
|
Additional information on JSR-352 can be found via the
|
|
JCP here: <a class="ulink" href="https://jcp.org/en/jsr/detail?id=352" target="_top">https://jcp.org/en/jsr/detail?id=352</a></p><div class="section"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="jsrGeneralNotes" href="#jsrGeneralNotes"></a>12.1 General Notes on Spring Batch and JSR-352</h2></div></div></div><p>Spring Batch and JSR-352 are structurally the same. They both have jobs that are made up of steps. They
|
|
both have readers, processors, writers, and listeners. However, their interactions are subtly different.
|
|
For example, the <code class="code">org.springframework.batch.core.SkipListener#onSkipInWrite(S item, Throwable t)</code>
|
|
within Spring Batch receives two parameters: the item that was skipped and the Exception that caused the
|
|
skip. The JSR-352 version of the same method
|
|
(<code class="classname">javax.batch.api.chunk.listener.SkipWriteListener#onSkipWriteItem(List<Object> items, Exception ex)</code>)
|
|
also receives two parameters. However the first one is a <code class="classname">List</code> of all the items
|
|
within the current chunk with the second being the <code class="classname">Exception</code> that caused the skip.
|
|
Because of these differences, it is important to note that there are two paths to execute a job within
|
|
Spring Batch: either a traditional Spring Batch job or a JSR-352 based job. While the use of Spring Batch
|
|
artifacts (readers, writers, etc) will work within a job configured via JSR-352's JSL and executed via the
|
|
<code class="classname">JsrJobOperator</code>, they will behave according to the rules of JSR-352. It is also
|
|
important to note that batch artifacts that have been developed against the JSR-352 interfaces will not work
|
|
within a traditional Spring Batch job.</p></div><div class="section"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="jsrSetup" href="#jsrSetup"></a>12.2 Setup</h2></div></div></div><div class="section"><div class="titlepage"><div><div><h3 class="title"><a name="jsrSetupContexts" href="#jsrSetupContexts"></a>12.2.1 Application Contexts</h3></div></div></div><p>All JSR-352 based jobs within Spring Batch consist of two application contexts: a parent context that
contains beans related to the infrastructure of Spring Batch such as the <code class="classname">JobRepository</code>,
<code class="classname">PlatformTransactionManager</code>, etc., and a child context that consists of the configuration
|
|
of the job to be run. The parent context is defined via the <code class="classname">baseContext.xml</code> provided
|
|
by the framework. This context may be overridden via the <code class="classname">JSR-352-BASE-CONTEXT</code> system
|
|
property.</p><div class="note" style="margin-left: 0.5in; margin-right: 0.5in;"><table border="0" summary="Note"><tr><td rowspan="2" align="center" valign="top" width="25"><img alt="[Note]" src="images/note.png"></td><th align="left">Note</th></tr><tr><td align="left" valign="top"><p>The base context is not processed by the JSR-352 processors for things like property injection so
|
|
no components requiring that additional processing should be configured there.
|
|
</p></td></tr></table></div></div><div class="section"><div class="titlepage"><div><div><h3 class="title"><a name="jsrSetupLaunching" href="#jsrSetupLaunching"></a>12.2.2 Launching a JSR-352 based job</h3></div></div></div><p>JSR-352 requires a very simple path to executing a batch job. The following code is all that is needed to
|
|
execute your first batch job:
|
|
</p><pre class="programlisting">JobOperator jobOperator = BatchRuntime.getJobOperator();
|
|
jobOperator.start(<span class="hl-string">"myJob"</span>, <span class="hl-keyword">new</span> Properties());</pre><p>While that is convenient for developers, the devil is in the details. Spring Batch bootstraps a bit of
|
|
infrastructure behind the scenes that a developer may want to override. The following is bootstrapped the
|
|
first time <code class="code">BatchRuntime.getJobOperator()</code> is called:
|
|
</p><div class="informaltable"><table style="border-collapse: collapse;border-top: 0.5pt solid ; border-bottom: 0.5pt solid ; border-left: 0.5pt solid ; border-right: 0.5pt solid ; "><colgroup><col align="left"><col align="left"><col align="left"></colgroup><tbody><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; " align="left">
|
|
<span class="bold"><strong>Bean Name</strong></span>
|
|
</td><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; " align="left">
|
|
<span class="bold"><strong>Default Configuration</strong></span>
|
|
</td><td style="border-bottom: 0.5pt solid ; " align="left">
|
|
<span class="bold"><strong>Notes</strong></span>
|
|
</td></tr><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; " align="left">
|
|
dataSource
|
|
</td><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; " align="left">
|
|
Apache DBCP BasicDataSource with configured values.
|
|
</td><td style="border-bottom: 0.5pt solid ; " align="left">
|
|
By default, HSQLDB is bootstrapped.
|
|
</td></tr><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; " align="left">
|
|
<code class="code">transactionManager</code>
|
|
</td><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; " align="left">
|
|
<code class="code">org.springframework.jdbc.datasource.DataSourceTransactionManager</code>
|
|
</td><td style="border-bottom: 0.5pt solid ; " align="left">
|
|
References the dataSource bean defined above.
|
|
</td></tr><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; " align="left">
|
|
A DataSource initializer
|
|
</td><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; " align="left">
|
|
</td><td style="border-bottom: 0.5pt solid ; " align="left">
|
|
This is configured to execute the scripts configured via the
|
|
<code class="code">batch.drop.script</code> and <code class="code">batch.schema.script</code> properties. By
|
|
default, the schema scripts for HSQLDB are executed. This behavior can be disabled via the
|
|
<code class="code">batch.data.source.init</code> property.
|
|
</td></tr><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; " align="left">
|
|
jobRepository
|
|
</td><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; " align="left">
|
|
A JDBC based <code class="code">SimpleJobRepository</code>.
|
|
</td><td style="border-bottom: 0.5pt solid ; " align="left">
|
|
This <code class="code">JobRepository</code> uses the previously mentioned data source and transaction
|
|
manager. The schema's table prefix is configurable (defaults to BATCH_) via the
|
|
<code class="code">batch.table.prefix</code> property.
|
|
</td></tr><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; " align="left">
|
|
jobLauncher
|
|
</td><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; " align="left">
|
|
<code class="code">org.springframework.batch.core.launch.support.SimpleJobLauncher</code>
|
|
</td><td style="border-bottom: 0.5pt solid ; " align="left">
|
|
Used to launch jobs.
|
|
</td></tr><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; " align="left">
|
|
batchJobOperator
|
|
</td><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; " align="left">
|
|
<code class="code">org.springframework.batch.core.launch.support.SimpleJobOperator</code>
|
|
</td><td style="border-bottom: 0.5pt solid ; " align="left">
|
|
The <code class="code">JsrJobOperator</code> wraps this to provide most of its functionality.
|
|
</td></tr><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; " align="left">
|
|
jobExplorer
|
|
</td><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; " align="left">
|
|
<code class="code">org.springframework.batch.core.explore.support.JobExplorerFactoryBean</code>
|
|
</td><td style="border-bottom: 0.5pt solid ; " align="left">
|
|
Used to address lookup functionality provided by the <code class="code">JsrJobOperator</code>.
|
|
</td></tr><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; " align="left">
|
|
jobParametersConverter
|
|
</td><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; " align="left">
|
|
<code class="code">org.springframework.batch.core.jsr.JsrJobParametersConverter</code>
|
|
</td><td style="border-bottom: 0.5pt solid ; " align="left">
|
|
JSR-352 specific implementation of the <code class="code">JobParametersConverter</code>.
|
|
</td></tr><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; " align="left">
|
|
jobRegistry
|
|
</td><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; " align="left">
|
|
<code class="code">org.springframework.batch.core.configuration.support.MapJobRegistry</code>
|
|
</td><td style="border-bottom: 0.5pt solid ; " align="left">
|
|
Used by the <code class="code">SimpleJobOperator</code>.
|
|
</td></tr><tr><td style="border-right: 0.5pt solid ; " align="left">
|
|
placeholderProperties
|
|
</td><td style="border-right: 0.5pt solid ; " align="left">
|
|
<code class="code">org.springframework.beans.factory.config.PropertyPlaceholderConfigurer</code>
|
|
</td><td style="" align="left">
|
|
Loads the properties file <code class="code">batch-${ENVIRONMENT:hsql}.properties</code> to configure
|
|
the properties mentioned above. ENVIRONMENT is a System property (defaults to hsql)
|
|
that can be used to select any of the databases that Spring Batch currently supports.
|
|
</td></tr></tbody></table></div><p>
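</p><p>The placeholder <code class="code">batch-${ENVIRONMENT:hsql}.properties</code> means that the file name depends
on whether the ENVIRONMENT system property is set. The sketch below only illustrates that
defaulting rule in plain Java and is not how the framework resolves the placeholder; the class
<code class="classname">PropertiesFileNameSketch</code> and its method are hypothetical, and the value
<code class="code">mysql</code> mentioned in the comment is used purely as an example.</p>

<pre class="programlisting">import java.util.Properties;

public class PropertiesFileNameSketch {

    // Mirrors the ${ENVIRONMENT:hsql} placeholder: use the ENVIRONMENT value
    // when it is present, otherwise fall back to "hsql".
    public static String propertiesFileName(Properties systemProperties) {
        String environment = systemProperties.getProperty("ENVIRONMENT", "hsql");
        return "batch-" + environment + ".properties";
    }

    public static void main(String[] args) {
        // For example, start the JVM with -DENVIRONMENT=mysql to pick another file.
        System.out.println(propertiesFileName(System.getProperties()));
    }
}</pre><p>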
|
|
</p><div class="note" style="margin-left: 0.5in; margin-right: 0.5in;"><table border="0" summary="Note"><tr><td rowspan="2" align="center" valign="top" width="25"><img alt="[Note]" src="images/note.png"></td><th align="left">Note</th></tr><tr><td align="left" valign="top"><p>None of the above beans are optional for executing JSR-352 based jobs. All may be overridden to
|
|
provide customized functionality as needed.
|
|
</p></td></tr></table></div></div></div><div class="section"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="dependencyInjection" href="#dependencyInjection"></a>12.3 Dependency Injection</h2></div></div></div><p>JSR-352 is based heavily on the Spring Batch programming model. As such, while not explicitly requiring a
|
|
formal dependency injection implementation, DI of some kind is implied. Spring Batch supports all three
|
|
methods for loading batch artifacts defined by JSR-352:</p><div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; "><li class="listitem"><p>Implementation Specific Loader - Spring Batch is built upon Spring and so supports Spring
|
|
dependency injection within JSR-352 batch jobs.</p></li><li class="listitem"><p>Archive Loader - JSR-352 defines the existence of a batch.xml file that provides mappings between a
|
|
logical name and a class name. This file must be found within the /META-INF/ directory if it is
|
|
used.</p></li><li class="listitem"><p>Thread Context Class Loader - JSR-352 allows configurations to specify batch artifact
|
|
implementations in their JSL by providing the fully qualified class name inline. Spring Batch
|
|
supports this as well in JSR-352 configured jobs.</p></li></ul></div><p>Using Spring dependency injection within a JSR-352 based batch job consists of configuring batch
|
|
artifacts using a Spring application context as beans. Once the beans have been defined, a job can refer to
|
|
them as it would any bean defined within the batch.xml.</p><pre class="programlisting"><span class="hl-directive" style="color: maroon"><?xml version="1.0" encoding="UTF-8"?></span>
|
|
<span class="hl-tag"><beans</span> <span class="hl-attribute">xmlns</span>=<span class="hl-value">"http://www.springframework.org/schema/beans"</span>
|
|
<span class="hl-attribute">xmlns:xsi</span>=<span class="hl-value">"http://www.w3.org/2001/XMLSchema-instance"</span>
|
|
<span class="hl-attribute">xsi:schemaLocation</span>=<span class="hl-value">"http://www.springframework.org/schema/beans
|
|
http://www.springframework.org/schema/beans/spring-beans.xsd
|
|
http://xmlns.jcp.org/xml/ns/javaee
|
|
http://xmlns.jcp.org/xml/ns/javaee/jobXML_1_0.xsd"</span><span class="hl-tag">></span>
|
|
|
|
<span class="hl-comment"><!-- javax.batch.api.Batchlet implementation --></span>
|
|
<span class="hl-tag"><bean</span> <span class="hl-attribute">id</span>=<span class="hl-value">"fooBatchlet"</span> <span class="hl-attribute">class</span>=<span class="hl-value">"io.spring.FooBatchlet"</span><span class="hl-tag">></span>
|
|
<span class="hl-tag"><property</span> <span class="hl-attribute">name</span>=<span class="hl-value">"prop"</span> <span class="hl-attribute">value</span>=<span class="hl-value">"bar"</span><span class="hl-tag">/></span>
|
|
<span class="hl-tag"></bean></span>
|
|
|
|
<span class="hl-comment"><!-- Job is defined using the JSL schema provided in JSR-352 --></span>
|
|
<span class="hl-tag"><job</span> <span class="hl-attribute">id</span>=<span class="hl-value">"fooJob"</span> <span class="hl-attribute">xmlns</span>=<span class="hl-value">"http://xmlns.jcp.org/xml/ns/javaee"</span> <span class="hl-attribute">version</span>=<span class="hl-value">"1.0"</span><span class="hl-tag">></span>
|
|
<span class="hl-tag"><step</span> <span class="hl-attribute">id</span>=<span class="hl-value">"step1"</span><span class="hl-tag">></span>
|
|
<span class="hl-tag"><batchlet</span> <span class="hl-attribute">ref</span>=<span class="hl-value">"fooBatchlet"</span><span class="hl-tag">/></span>
|
|
<span class="hl-tag"></step></span>
|
|
<span class="hl-tag"></job></span>
|
|
<span class="hl-tag"></beans></span>
|
|
</pre><p>The assembly of Spring contexts (imports, etc) works with JSR-352 jobs just as it would with any other
|
|
Spring based application. The only difference with a JSR-352 based job is that the entry point for the
|
|
context definition will be the job definition found in /META-INF/batch-jobs/.</p><p>To use the thread context class loader approach, all you need to do is provide the fully qualified class
|
|
name as the ref. It is important to note that when using this approach or the batch.xml approach, the class
|
|
referenced requires a no argument constructor which will be used to create the bean.</p><pre class="programlisting"><span class="hl-directive" style="color: maroon"><?xml version="1.0" encoding="UTF-8"?></span>
|
|
<span class="hl-tag"><job</span> <span class="hl-attribute">id</span>=<span class="hl-value">"fooJob"</span> <span class="hl-attribute">xmlns</span>=<span class="hl-value">"http://xmlns.jcp.org/xml/ns/javaee"</span> <span class="hl-attribute">version</span>=<span class="hl-value">"1.0"</span><span class="hl-tag">></span>
|
|
<span class="hl-tag"><step</span> <span class="hl-attribute">id</span>=<span class="hl-value">"step1"</span><span class="hl-tag"> ></span>
|
|
<span class="hl-tag"><batchlet</span> <span class="hl-attribute">ref</span>=<span class="hl-value">"io.spring.FooBatchlet"</span><span class="hl-tag"> /></span>
|
|
<span class="hl-tag"></step></span>
|
|
<span class="hl-tag"></job></span>
|
|
</pre></div><div class="section"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="jsrJobProperties" href="#jsrJobProperties"></a>12.4 Batch Properties</h2></div></div></div><div class="section"><div class="titlepage"><div><div><h3 class="title"><a name="jsrPropertySupport" href="#jsrPropertySupport"></a>12.4.1 Property Support</h3></div></div></div><p>JSR-352 allows for properties to be defined at the Job, Step and batch artifact level by way of
|
|
configuration in the JSL. Batch properties are configured at each level in the following way:</p><pre class="programlisting"><span class="hl-tag"><properties></span>
|
|
<span class="hl-tag"><property</span> <span class="hl-attribute">name</span>=<span class="hl-value">"propertyName1"</span> <span class="hl-attribute">value</span>=<span class="hl-value">"propertyValue1"</span><span class="hl-tag">/></span>
|
|
<span class="hl-tag"><property</span> <span class="hl-attribute">name</span>=<span class="hl-value">"propertyName2"</span> <span class="hl-attribute">value</span>=<span class="hl-value">"propertyValue2"</span><span class="hl-tag">/></span>
|
|
<span class="hl-tag"></properties></span></pre><p>
|
|
Properties may be configured on any batch artifact.</p></div><div class="section"><div class="titlepage"><div><div><h3 class="title"><a name="jsrBatchPropertyAnnotation" href="#jsrBatchPropertyAnnotation"></a>12.4.2 <code class="classname">@BatchProperty</code> annotation</h3></div></div></div><p>Properties are referenced in batch artifacts by annotating class fields with the
|
|
<code class="classname">@BatchProperty</code> and <code class="classname">@Inject</code> annotations (both annotations
|
|
are required by the spec). As defined by JSR-352, fields for properties must be String typed. Any type
|
|
conversion is up to the implementing developer to perform.</p><p>A <code class="classname">javax.batch.api.chunk.ItemReader</code> artifact could be configured with a
|
|
properties block such as the one described above and accessed as such:</p><pre class="programlisting"><span class="hl-keyword">public</span> <span class="hl-keyword">class</span> MyItemReader <span class="hl-keyword">extends</span> AbstractItemReader {
|
|
<em><span class="hl-annotation" style="color: gray">@Inject</span></em>
|
|
<em><span class="hl-annotation" style="color: gray">@BatchProperty</span></em>
|
|
<span class="hl-keyword">private</span> String propertyName1;
|
|
|
|
...
|
|
}</pre><p>
|
|
The value of the field "propertyName1" will be "propertyValue1".</p></div><div class="section"><div class="titlepage"><div><div><h3 class="title"><a name="jsrPropertySubstitution" href="#jsrPropertySubstitution"></a>12.4.3 Property Substitution</h3></div></div></div><p>Property substitution is provided by way of operators and simple conditional expressions. The general
|
|
usage is #{operator['key']}.</p><p>Supported operators:</p><p>
|
|
</p><div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; "><li class="listitem"><p>jobParameters - access job parameter values that the job was started/restarted with.
|
|
</p></li><li class="listitem"><p>jobProperties - access properties configured at the job level of the JSL.</p></li><li class="listitem"><p>systemProperties - access named system properties.</p></li><li class="listitem"><p>partitionPlan - access named property from the partition plan of a partitioned step.
|
|
</p></li></ul></div><p>
|
|
</p><pre class="programlisting">#{jobParameters['unresolving.prop']}?:#{systemProperties['file.separator']}</pre><p>
|
|
The left hand side of the <code class="code">?:</code> operator is the expected value and the right hand side is the default value. In
this example, the result resolves to the value of the system property file.separator, because
#{jobParameters['unresolving.prop']} is assumed not to be resolvable. If neither expression can be
resolved, an empty String is returned. Multiple conditions can be used, separated by a
|
|
';'.
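</p><p>The fallback rule described above can be modeled in plain Java. The following is only a sketch
of the resolution order, not the code that Spring Batch or any JSR-352 runtime uses: the class
<code class="classname">SubstitutionSketch</code> and the method <code class="code">firstResolvable</code> are hypothetical, and a
null or empty argument is assumed to stand in for an expression that could not be resolved.</p>

<pre class="programlisting">import java.util.Properties;

public class SubstitutionSketch {

    // Returns the first candidate that resolved to a non-empty value,
    // or an empty String when none of the candidates did.
    public static String firstResolvable(String... candidates) {
        for (String candidate : candidates) {
            if (candidate == null) {
                continue;
            }
            if (!candidate.isEmpty()) {
                return candidate;
            }
        }
        return "";
    }

    public static void main(String[] args) {
        // No job parameter named 'unresolving.prop' was supplied.
        Properties jobParameters = new Properties();

        String expected = jobParameters.getProperty("unresolving.prop");
        String fallback = System.getProperty("file.separator");

        // Falls back to the system property because the job parameter is missing.
        System.out.println(firstResolvable(expected, fallback));
    }
}</pre><p>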
|
|
</p></div></div><div class="section"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="jsrProcessingModels" href="#jsrProcessingModels"></a>12.5 Processing Models</h2></div></div></div><p>JSR-352 provides the same two basic processing models that Spring Batch does:</p><p>
|
|
</p><div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; "><li class="listitem"><p>Item based processing - Using a <code class="classname">javax.batch.api.chunk.ItemReader</code>, an
optional <code class="classname">javax.batch.api.chunk.ItemProcessor</code>, and a
|
|
<code class="classname">javax.batch.api.chunk.ItemWriter</code>.</p></li><li class="listitem"><p>Task based processing - Using a <code class="classname">javax.batch.api.Batchlet</code>
|
|
implementation. This processing model is the same as the
|
|
<code class="classname">org.springframework.batch.core.step.tasklet.Tasklet</code> based processing
|
|
currently available.</p></li></ul></div><p>
|
|
</p><div class="section"><div class="titlepage"><div><div><h3 class="title"><a name="d5e3942" href="#d5e3942"></a>12.5.1 Item based processing</h3></div></div></div><p>Item based processing in this context means that the chunk size is set by the number of items read by an
|
|
<code class="classname">ItemReader</code>. To configure a step this way, specify the
|
|
<code class="classname">item-count</code> (which defaults to 10) and optionally configure the
|
|
<code class="classname">checkpoint-policy</code> as item (this is the default).
|
|
</p><pre class="programlisting">...
|
|
<span class="hl-tag"><step</span> <span class="hl-attribute">id</span>=<span class="hl-value">"step1"</span><span class="hl-tag">></span>
|
|
<span class="hl-tag"><chunk</span> <span class="hl-attribute">checkpoint-policy</span>=<span class="hl-value">"item"</span> <span class="hl-attribute">item-count</span>=<span class="hl-value">"3"</span><span class="hl-tag">></span>
|
|
<span class="hl-tag"><reader</span> <span class="hl-attribute">ref</span>=<span class="hl-value">"fooReader"</span><span class="hl-tag">/></span>
|
|
<span class="hl-tag"><processor</span> <span class="hl-attribute">ref</span>=<span class="hl-value">"fooProcessor"</span><span class="hl-tag">/></span>
|
|
<span class="hl-tag"><writer</span> <span class="hl-attribute">ref</span>=<span class="hl-value">"fooWriter"</span><span class="hl-tag">/></span>
|
|
<span class="hl-tag"></chunk></span>
|
|
<span class="hl-tag"></step></span>
|
|
...</pre><p>
|
|
If item based checkpointing is chosen, an additional attribute <code class="classname">time-limit</code> is
|
|
supported. This sets a time limit for how long the number of items specified has to be processed. If
|
|
the timeout is reached, the chunk will complete with however many items have been read by then
|
|
regardless of what the <code class="classname">item-count</code> is configured to be.
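</p><p>The interaction between <code class="classname">item-count</code> and <code class="classname">time-limit</code> can be
pictured as a check that is made after each item is read. The sketch below is a simplified
plain-Java model, not the framework's chunk handling: <code class="classname">ChunkBoundarySketch</code> and
<code class="code">isChunkComplete</code> are hypothetical names, the time limit is assumed to be expressed in
seconds, and a value of 0 is assumed to mean that no time limit applies.</p>

<pre class="programlisting">public class ChunkBoundarySketch {

    // A chunk ends once item-count items have been read, or once the
    // time limit has elapsed, whichever happens first.
    public static boolean isChunkComplete(int itemsRead, int itemCount,
                                          long elapsedSeconds, long timeLimitSeconds) {
        if (itemsRead >= itemCount) {
            return true;
        }
        if (timeLimitSeconds == 0) {
            return false; // assumed: 0 disables the time limit
        }
        return elapsedSeconds >= timeLimitSeconds;
    }

    public static void main(String[] args) {
        // item-count="3" combined with a hypothetical time-limit of 5 seconds
        System.out.println(isChunkComplete(3, 3, 1, 5)); // item count reached
        System.out.println(isChunkComplete(1, 3, 5, 5)); // time limit reached first
        System.out.println(isChunkComplete(1, 3, 1, 5)); // chunk still open
    }
}</pre><p>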
|
|
</p></div><div class="section"><div class="titlepage"><div><div><h3 class="title"><a name="d5e3952" href="#d5e3952"></a>12.5.2 Custom checkpointing</h3></div></div></div><p>JSR-352 calls the process around the commit interval within a step "checkpointing". Item based
|
|
checkpointing is one approach as mentioned above. However, this will not be robust enough in many
|
|
cases. Because of this, the spec allows for the implementation of a custom checkpointing algorithm by
|
|
implementing the <code class="classname">javax.batch.api.chunk.CheckpointAlgorithm</code> interface. This
|
|
capability is functionally the same as Spring Batch's custom completion policy. To use an
|
|
implementation of <code class="classname">CheckpointAlgorithm</code>, configure your step with the custom
|
|
<code class="classname">checkpoint-policy</code> as shown below where fooCheckpointer refers to an
|
|
implementation of <code class="classname">CheckpointAlgorithm</code>.
|
|
</p><pre class="programlisting">...
|
|
<span class="hl-tag"><step</span> <span class="hl-attribute">id</span>=<span class="hl-value">"step1"</span><span class="hl-tag">></span>
|
|
<span class="hl-tag"><chunk</span> <span class="hl-attribute">checkpoint-policy</span>=<span class="hl-value">"custom"</span><span class="hl-tag">></span>
|
|
<span class="hl-tag"><checkpoint-algorithm</span> <span class="hl-attribute">ref</span>=<span class="hl-value">"fooCheckpointer"</span><span class="hl-tag">/></span>
|
|
<span class="hl-tag"><reader</span> <span class="hl-attribute">ref</span>=<span class="hl-value">"fooReader"</span><span class="hl-tag">/></span>
|
|
<span class="hl-tag"><processor</span> <span class="hl-attribute">ref</span>=<span class="hl-value">"fooProcessor"</span><span class="hl-tag">/></span>
|
|
<span class="hl-tag"><writer</span> <span class="hl-attribute">ref</span>=<span class="hl-value">"fooWriter"</span><span class="hl-tag">/></span>
|
|
<span class="hl-tag"></chunk></span>
|
|
<span class="hl-tag"></step></span>
|
|
...</pre></div></div><div class="section"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="jsrRunningAJob" href="#jsrRunningAJob"></a>12.6 Running a job</h2></div></div></div><p>The entrance to executing a JSR-352 based job is through the
|
|
<code class="classname">javax.batch.operations.JobOperator</code>. Spring Batch provides its own implementation of
|
|
this interface (<code class="classname">org.springframework.batch.core.jsr.launch.JsrJobOperator</code>). This
|
|
implementation is loaded via the <code class="classname">javax.batch.runtime.BatchRuntime</code>. Launching a
|
|
JSR-352 based batch job is implemented as follows:</p><pre class="programlisting">
|
|
JobOperator jobOperator = BatchRuntime.getJobOperator();
|
|
<span class="hl-keyword">long</span> jobExecutionId = jobOperator.start(<span class="hl-string">"fooJob"</span>, <span class="hl-keyword">new</span> Properties());
|
|
</pre><p>The above code does the following:</p><p>
|
|
</p><div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; "><li class="listitem"><p>Bootstraps a base ApplicationContext - In order to provide batch functionality, the framework
|
|
needs some infrastructure bootstrapped. This occurs once per JVM. The components that are
|
|
bootstrapped are similar to those provided by <code class="classname">@EnableBatchProcessing</code>.
|
|
Specific details can be found in the javadoc for the <code class="classname">JsrJobOperator</code>.
|
|
</p></li><li class="listitem"><p>Loads an <code class="classname">ApplicationContext</code> for the job requested - In the example
|
|
above, the framework will look in /META-INF/batch-jobs for a file named fooJob.xml and load a
|
|
context that is a child of the shared context mentioned previously.</p></li><li class="listitem"><p>Launches the job - The job defined within the context will be executed asynchronously. The
|
|
<code class="classname">JobExecution</code>'s id will be returned.</p></li></ul></div><p>
|
|
</p><div class="note" style="margin-left: 0.5in; margin-right: 0.5in;"><table border="0" summary="Note"><tr><td rowspan="2" align="center" valign="top" width="25"><img alt="[Note]" src="images/note.png"></td><th align="left">Note</th></tr><tr><td align="left" valign="top"><p>All JSR-352 based batch jobs are executed asynchronously.</p></td></tr></table></div><p>When <code class="classname">JobOperator#start</code> is called using <code class="classname">SimpleJobOperator</code>,
|
|
Spring Batch determines if the call is an initial run or a retry of a previously executed run. Using the
|
|
JSR-352 based <code class="classname">JobOperator#start(String jobXMLName, Properties jobParameters)</code>, the
|
|
framework will always create a new <code class="classname">JobInstance</code> (JSR-352 job parameters are
|
|
non-identifying). In order to restart a job, a call to
|
|
<code class="classname">JobOperator#restart(long executionId, Properties restartParameters)</code> is required.
|
|
</p></div><div class="section"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="jsrContexts" href="#jsrContexts"></a>12.7 Contexts</h2></div></div></div><p>JSR-352 defines two context objects that are used to interact with the meta-data of a job or step from
|
|
within a batch artifact: <code class="classname">javax.batch.runtime.context.JobContext</code> and
|
|
<code class="classname">javax.batch.runtime.context.StepContext</code>. Both of these are available in any step
|
|
level artifact (<code class="classname">Batchlet</code>, <code class="classname">ItemReader</code>, etc) with the
|
|
<code class="classname">JobContext</code> being available to job level artifacts as well
|
|
(<code class="classname">JobListener</code> for example).</p><p>To obtain a reference to the <code class="classname">JobContext</code> or <code class="classname">StepContext</code>
|
|
within the current scope, simply use the <code class="classname">@Inject</code> annotation:</p><pre class="programlisting"><em><span class="hl-annotation" style="color: gray">@Inject</span></em>
|
|
JobContext jobContext;
|
|
</pre><div class="note" style="margin-left: 0.5in; margin-right: 0.5in;"><table border="0" summary="Note: @Autowired for JSR-352 contexts"><tr><td rowspan="2" align="center" valign="top" width="25"><img alt="[Note]" src="images/note.png"></td><th align="left">@Autowired for JSR-352 contexts</th></tr><tr><td align="left" valign="top"><p>Using Spring's @Autowired is not supported for the injection of these contexts.</p></td></tr></table></div><p>In Spring Batch, the <code class="classname">JobContext</code> and <code class="classname">StepContext</code> wrap their
|
|
corresponding execution objects (<code class="classname">JobExecution</code> and
|
|
<code class="classname">StepExecution</code> respectively). Data stored via
|
|
<code class="classname">StepContext#setPersistentUserData(Serializable data)</code> is stored in the
|
|
Spring Batch <code class="classname">StepExecution#executionContext</code>.</p></div><div class="section"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="jsrStepFlow" href="#jsrStepFlow"></a>12.8 Step Flow</h2></div></div></div><p>Within a JSR-352 based job, the flow of steps works similarly as it does within Spring Batch.
|
|
However, there are a few subtle differences:</p><p>
|
|
</p><div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; "><li class="listitem"><p>Decisions are steps - In a regular Spring Batch job, a decision is a state that does not
have an independent <code class="classname">StepExecution</code> or any of the rights and
responsibilities that go along with being a full step. However, with JSR-352, a decision
is a step just like any other and will behave just as any other step (transactionality,
|
|
it gets a <code class="classname">StepExecution</code>, etc). This means that they are treated the
|
|
same as any other step on restarts as well.</p></li><li class="listitem"><p><code class="classname">next</code> attribute and step transitions - In a regular job, these are
not allowed to appear together in the same step. JSR-352 allows both to be used in the
same step, with the next attribute taking precedence in evaluation.</p></li><li class="listitem"><p>Transition element ordering - In a standard Spring Batch job, transition elements are
|
|
sorted from most specific to least specific and evaluated in that order. JSR-352 jobs
|
|
evaluate transition elements in the order they are specified in the XML.</p></li></ul></div><p>
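As a rough illustration of the transition-ordering difference, the following plain-Java sketch models transitions as pattern/target pairs. It is not Spring Batch or JSR-352 code: the <code class="classname">TransitionOrderingSketch</code> class, its method names, and the wildcard-count notion of specificity are assumptions made for this example, and the comparison Spring Batch actually performs is more detailed.</p>

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical model, for illustration only; not part of either API.
final class TransitionOrderingSketch {

    /** A transition: an exit-status pattern ('*' and '?' wildcards) and a target step. */
    static final class Transition {
        final String on;
        final String to;

        Transition(String on, String to) {
            this.on = on;
            this.to = to;
        }

        /** Returns true if the whole exit status matches the pattern. */
        boolean matches(String exitStatus) {
            StringBuilder regex = new StringBuilder();
            for (char c : on.toCharArray()) {
                if (c == '*') {
                    regex.append(".*");
                } else if (c == '?') {
                    regex.append('.');
                } else {
                    regex.append(Pattern.quote(String.valueOf(c)));
                }
            }
            return exitStatus.matches(regex.toString());
        }

        /** Simplified specificity (an assumption): fewer wildcards means more specific. */
        int wildcards() {
            int count = 0;
            for (char c : on.toCharArray()) {
                if (c == '*' || c == '?') {
                    count++;
                }
            }
            return count;
        }
    }

    /** Returns the target of the first matching transition, or null if none matches. */
    static String firstMatch(List<Transition> ordered, String exitStatus) {
        for (Transition t : ordered) {
            if (t.matches(exitStatus)) {
                return t.to;
            }
        }
        return null;
    }

    /** JSR-352 style: evaluate in the order the transitions were declared. */
    static String jsrStyle(List<Transition> declared, String exitStatus) {
        return firstMatch(declared, exitStatus);
    }

    /** Spring Batch style: sort most specific first, then evaluate. */
    static String springBatchStyle(List<Transition> declared, String exitStatus) {
        List<Transition> sorted = new ArrayList<>(declared);
        sorted.sort(Comparator.comparingInt(Transition::wildcards)); // stable sort
        return firstMatch(sorted, exitStatus);
    }
}
```

<p>With a transition on <code class="classname">*</code> declared before a transition on <code class="classname">FAILED</code>, an exit status of FAILED resolves to the first target under declaration-order evaluation and to the second under specificity ordering.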
|
|
</p></div><div class="section"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="jsrScaling" href="#jsrScaling"></a>12.9 Scaling a JSR-352 batch job</h2></div></div></div><p>Traditional Spring Batch jobs have four ways of scaling (the last two capable of being executed across
|
|
multiple JVMs):
|
|
</p><div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; "><li class="listitem"><p>Split - Running multiple steps in parallel.</p></li><li class="listitem"><p>Multiple threads - Executing a single step via multiple threads.</p></li><li class="listitem"><p>Partitioning - Dividing the data up for parallel processing (master/slave).</p></li><li class="listitem"><p>Remote Chunking - Executing the processor piece of logic remotely.</p></li></ul></div><p>
|
|
</p><p>JSR-352 provides two options for scaling batch jobs. Both options support only a single JVM:
|
|
</p><div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; "><li class="listitem"><p>Split - Same as Spring Batch</p></li><li class="listitem"><p>Partitioning - Conceptually the same as in Spring Batch, but implemented slightly differently.
|
|
</p></li></ul></div><p>
|
|
</p><div class="section"><div class="titlepage"><div><div><h3 class="title"><a name="jsrPartitioning" href="#jsrPartitioning"></a>12.9.1 Partitioning</h3></div></div></div><p>Conceptually, partitioning in JSR-352 is the same as it is in Spring Batch. Meta-data is provided
|
|
to each slave to identify the input to be processed with the slaves reporting back to the master the
|
|
results upon completion. However, there are some important differences:
|
|
</p><div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; "><li class="listitem"><p>Partitioned <code class="classname">Batchlet</code> - This will run multiple instances of the
|
|
configured <code class="classname">Batchlet</code> on multiple threads. Each instance will have
|
|
its own set of properties as provided by the JSL or the
<code class="classname">PartitionPlan</code>.</p></li><li class="listitem"><p><code class="classname">PartitionPlan</code> - With Spring Batch's partitioning, an
|
|
<code class="classname">ExecutionContext</code> is provided for each partition. With JSR-352, a
|
|
single <code class="classname">javax.batch.api.partition.PartitionPlan</code> is provided with an
|
|
array of <code class="classname">Properties</code> providing the meta-data for each partition.
|
|
</p></li><li class="listitem"><p><code class="classname">PartitionMapper</code> - JSR-352 provides two ways to generate partition
|
|
meta-data. One is via the JSL (partition properties). The second is via an implementation
|
|
of the <code class="classname">javax.batch.api.partition.PartitionMapper</code> interface.
|
|
Functionally, this interface is similar to the
|
|
<code class="classname">org.springframework.batch.core.partition.support.Partitioner</code>
|
|
interface provided by Spring Batch in that it provides a way to programmatically generate
|
|
meta-data for partitioning.</p></li><li class="listitem"><p><code class="classname">StepExecution</code>s - In Spring Batch, partitioned steps are run as
|
|
master/slave. Within JSR-352, the same configuration occurs. However, the slave steps do
|
|
not get official <code class="classname">StepExecution</code>s. Because of that, calls to
|
|
<code class="classname">JsrJobOperator#getStepExecutions(long jobExecutionId)</code> will only
|
|
return the <code class="classname">StepExecution</code> for the master. </p><div class="note" style="margin-left: 0.5in; margin-right: 0.5in;"><table border="0" summary="Note"><tr><td rowspan="2" align="center" valign="top" width="25"><img alt="[Note]" src="images/note.png"></td><th align="left">Note</th></tr><tr><td align="left" valign="top"><p>The child
|
|
<code class="classname">StepExecution</code>s still exist in the job repository and are available
|
|
via the <code class="classname">JobExplorer</code> and Spring Batch Admin.</p></td></tr></table></div><p>
|
|
</p></li><li class="listitem"><p>Compensating logic - Since Spring Batch implements the master/slave logic of
|
|
partitioning using steps, <code class="classname">StepExecutionListener</code>s can be used to
|
|
handle compensating logic if something goes wrong. However, since the slaves in JSR-352
are not full steps, JSR-352 provides a collection of other components that supply
compensating logic when errors occur and dynamically set the exit status. These components include the following:
|
|
</p><div class="informaltable"><table style="border-collapse: collapse;border-top: 0.5pt solid ; border-bottom: 0.5pt solid ; border-left: 0.5pt solid ; border-right: 0.5pt solid ; "><colgroup><col align="left"><col align="left"></colgroup><tbody><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; " align="left">
|
|
<span class="bold"><strong>Artifact Interface</strong></span>
|
|
</td><td style="border-bottom: 0.5pt solid ; " align="left">
|
|
<span class="bold"><strong>Description</strong></span>
|
|
</td></tr><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; " align="left"><code class="classname">javax.batch.api.partition.PartitionCollector</code></td><td style="border-bottom: 0.5pt solid ; " align="left">Provides a way for slave steps to send information back to the
|
|
master. There is one instance per slave thread.</td></tr><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; " align="left"><code class="classname">javax.batch.api.partition.PartitionAnalyzer</code></td><td style="border-bottom: 0.5pt solid ; " align="left">End point that receives the information collected by the
|
|
<code class="classname">PartitionCollector</code> as well as the resulting
|
|
statuses from a completed partition.</td></tr><tr><td style="border-right: 0.5pt solid ; " align="left"><code class="classname">javax.batch.api.partition.PartitionReducer</code></td><td style="" align="left">Provides a way to supply compensating logic for a partitioned
|
|
step.</td></tr></tbody></table></div><p>
|
|
</p></li></ul></div><p>
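To make the per-partition meta-data more concrete, the following plain-Java sketch builds the kind of <code class="classname">Properties</code> array a <code class="classname">PartitionMapper</code> could hand to a <code class="classname">PartitionPlan</code>. The class name, the <code class="classname">startId</code> and <code class="classname">endId</code> property names, and the idea of splitting a numeric id range are all assumptions chosen for illustration; only <code class="classname">java.util.Properties</code> is used, so nothing here depends on the batch API itself.</p>

```java
import java.util.Properties;

// Hypothetical helper, for illustration only.
final class PartitionPropertiesSketch {

    /**
     * Splits the inclusive id range [minId, maxId] into contiguous slices,
     * one Properties object per partition. Earlier partitions absorb any remainder.
     * If there are more partitions than ids, the extra partitions get an empty
     * slice, signalled by endId being smaller than startId.
     */
    static Properties[] partitionProperties(long minId, long maxId, int partitions) {
        if (partitions < 1 || maxId < minId) {
            throw new IllegalArgumentException("need at least one partition and a non-empty range");
        }
        long total = maxId - minId + 1;
        long base = total / partitions;
        long remainder = total % partitions;

        Properties[] result = new Properties[partitions];
        long start = minId;
        for (int i = 0; i < partitions; i++) {
            long size = base + (i < remainder ? 1 : 0);
            long end = start + size - 1;
            Properties props = new Properties();
            props.setProperty("startId", Long.toString(start)); // assumed property name
            props.setProperty("endId", Long.toString(end));     // assumed property name
            result[i] = props;
            start = end + 1;
        }
        return result;
    }
}
```

<p>In a real mapper, an array like this would typically be passed to <code class="classname">PartitionPlan#setPartitionProperties(Properties[])</code>, and each partition's artifacts would read their own entries through the JSL.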
|
|
</p></div></div><div class="section"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="jsrTesting" href="#jsrTesting"></a>12.10 Testing</h2></div></div></div><p>Since all JSR-352 based jobs are executed asynchronously, it can be difficult to determine when a job has
|
|
completed. To help with testing, Spring Batch provides the
|
|
<code class="classname">org.springframework.batch.core.jsr.JsrTestUtils</code>. This utility class provides the
|
|
ability to start a job and restart a job and wait for it to complete. Once the job completes, the
|
|
associated <code class="classname">JobExecution</code> is returned.</p></div></div>
|
|
|
|
<div class="chapter"><div class="titlepage"><div><div><h1 class="title"><a name="springBatchIntegration" href="#springBatchIntegration"></a>13. Spring Batch Integration</h1></div></div></div><div class="sect1"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="spring-batch-integration-introduction" href="#spring-batch-integration-introduction"></a>13.1. Spring Batch Integration Introduction</h2></div></div></div><p>
|
|
Many users of Spring Batch may encounter requirements that are
|
|
outside the scope of Spring Batch, yet may be efficiently and
|
|
concisely implemented using Spring Integration. Conversely, Spring
Integration users may encounter Spring Batch requirements and need a way
|
|
to efficiently integrate both frameworks. In this context several
|
|
patterns and use-cases emerge and Spring Batch Integration will
|
|
address those requirements.
|
|
</p><p>
|
|
The line between Spring Batch and Spring Integration is not always
|
|
clear, but there are guidelines that one can follow. Principally,
|
|
these are: think about granularity, and apply common patterns. Some
|
|
of those common patterns are described in this reference manual
|
|
section.
|
|
</p><p>
|
|
Adding messaging to a batch process enables automation of
|
|
operations, and also separation and strategizing of key concerns.
|
|
For example, a message might trigger a job to execute, and then the
|
|
sending of the message can be exposed in a variety of ways. Or when
|
|
a job completes or fails that might trigger a message to be sent,
|
|
and the consumers of those messages might have operational concerns
|
|
that have nothing to do with the application itself. Messaging can
|
|
also be embedded in a job, for example reading or writing items for
|
|
processing via channels. Remote partitioning and remote chunking
|
|
provide methods to distribute workloads over a number of workers.
|
|
</p><p>
|
|
Some key concepts that we will cover are:
|
|
</p><div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; "><li class="listitem"><p>
|
|
<a class="link" href="#namespace-support" title="13.1.1. Namespace Support">Namespace Support</a>
|
|
</p></li><li class="listitem"><p>
|
|
<a class="link" href="#launching-batch-jobs-through-messages" title="13.1.2. Launching Batch Jobs through Messages">Launching
|
|
Batch Jobs through Messages</a>
|
|
</p></li><li class="listitem"><p>
|
|
<a class="link" href="#providing-feedback-with-informational-messages" title="13.1.3. Providing Feedback with Informational Messages">Providing
|
|
Feedback with Informational Messages</a>
|
|
</p></li><li class="listitem"><p>
|
|
<a class="link" href="#asynchronous-processors" title="13.1.4. Asynchronous Processors">Asynchronous
|
|
Processors</a>
|
|
</p></li><li class="listitem"><p>
|
|
<a class="link" href="#externalizing-batch-process-execution" title="13.1.5. Externalizing Batch Process Execution">Externalizing
|
|
Batch Process Execution</a>
|
|
</p></li></ul></div><div class="sect2"><div class="titlepage"><div><div><h3 class="title"><a name="namespace-support" href="#namespace-support"></a>13.1.1. Namespace Support</h3></div></div></div><p>
|
|
Dedicated XML namespace support was added in Spring Batch
Integration 1.3 to make configuration easier.
To activate the namespace, add the following
|
|
namespace declarations to your Spring XML Application Context
|
|
file:
|
|
</p><pre class="programlisting"><span class="hl-tag"><beans</span> <span class="hl-attribute">xmlns</span>=<span class="hl-value">"http://www.springframework.org/schema/beans"</span>
|
|
<span class="hl-attribute">xmlns:xsi</span>=<span class="hl-value">"http://www.w3.org/2001/XMLSchema-instance"</span>
|
|
<span class="hl-attribute">xmlns:batch-int</span>=<span class="hl-value">"http://www.springframework.org/schema/batch-integration"</span>
|
|
<span class="hl-attribute">xsi:schemaLocation</span>=<span class="hl-value">"
|
|
http://www.springframework.org/schema/batch-integration
|
|
http://www.springframework.org/schema/batch-integration/spring-batch-integration.xsd"</span><span class="hl-tag">></span>
|
|
|
|
...
|
|
|
|
<span class="hl-tag"></beans></span></pre><p>
|
|
A fully configured Spring XML Application Context file for Spring
|
|
Batch Integration may look like the following:
|
|
</p><pre class="programlisting"><span class="hl-tag"><beans</span> <span class="hl-attribute">xmlns</span>=<span class="hl-value">"http://www.springframework.org/schema/beans"</span>
|
|
<span class="hl-attribute">xmlns:xsi</span>=<span class="hl-value">"http://www.w3.org/2001/XMLSchema-instance"</span>
|
|
<span class="hl-attribute">xmlns:int</span>=<span class="hl-value">"http://www.springframework.org/schema/integration"</span>
|
|
<span class="hl-attribute">xmlns:batch</span>=<span class="hl-value">"http://www.springframework.org/schema/batch"</span>
|
|
<span class="hl-attribute">xmlns:batch-int</span>=<span class="hl-value">"http://www.springframework.org/schema/batch-integration"</span>
|
|
<span class="hl-attribute">xsi:schemaLocation</span>=<span class="hl-value">"
|
|
http://www.springframework.org/schema/batch-integration
|
|
http://www.springframework.org/schema/batch-integration/spring-batch-integration.xsd
|
|
http://www.springframework.org/schema/batch
|
|
http://www.springframework.org/schema/batch/spring-batch.xsd
|
|
http://www.springframework.org/schema/beans
|
|
http://www.springframework.org/schema/beans/spring-beans.xsd
|
|
http://www.springframework.org/schema/integration
|
|
http://www.springframework.org/schema/integration/spring-integration.xsd"</span><span class="hl-tag">></span>
|
|
|
|
...
|
|
|
|
<span class="hl-tag"></beans></span></pre><p>
|
|
Appending version numbers to the referenced XSD file is also
|
|
allowed but, as a version-less declaration will always use the
|
|
latest schema, we generally don't recommend appending the version
|
|
number to the XSD name. Adding a version number
could create issues when updating the Spring Batch
|
|
Integration dependencies as they may require more recent versions
|
|
of the XML schema.
|
|
</p></div><div class="sect2"><div class="titlepage"><div><div><h3 class="title"><a name="launching-batch-jobs-through-messages" href="#launching-batch-jobs-through-messages"></a>13.1.2. Launching Batch Jobs through Messages</h3></div></div></div><p>
|
|
When starting batch jobs using the core Spring Batch API, you
have two options:
|
|
</p><div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; "><li class="listitem"><p>
|
|
Command line via the <code class="classname">CommandLineJobRunner</code>
|
|
</p></li><li class="listitem"><p>
|
|
Programmatically via either
|
|
<code class="classname">JobOperator.start()</code> or
|
|
<code class="classname">JobLauncher.run()</code>.
|
|
</p></li></ul></div><p>
|
|
For example, you may want to use the
|
|
<code class="classname">CommandLineJobRunner</code> when invoking Batch Jobs
|
|
using a shell script. Alternatively, you may use the
|
|
<code class="classname">JobOperator</code> directly, for example when using
|
|
Spring Batch as part of a web application. However, what about
|
|
more complex use-cases? Maybe you need to poll a remote (S)FTP
|
|
server to retrieve the data for the Batch Job. Or your application
|
|
has to support multiple different data sources simultaneously. For
|
|
example, you may receive data files not only via the web, but also
|
|
FTP etc. Maybe additional transformation of the input files is
|
|
needed before invoking Spring Batch.
|
|
</p><p>
|
|
Therefore, it would be much more powerful to execute the batch job
|
|
using Spring Integration and its numerous adapters. For example,
|
|
you can use a <span class="emphasis"><em>File Inbound Channel Adapter</em></span> to
|
|
monitor a directory in the file-system and start the Batch Job as
|
|
soon as the input file arrives. Additionally you can create Spring
|
|
Integration flows that use multiple different adapters to easily
|
|
ingest data for your Batch Jobs from multiple sources
|
|
simultaneously using configuration only. Implementing all these
|
|
scenarios with Spring Integration is easy, as it allows for
decoupled, event-driven execution of the
|
|
<code class="classname">JobLauncher</code>.
|
|
</p><p>
|
|
Spring Batch Integration provides the
|
|
<code class="classname">JobLaunchingMessageHandler</code> class that you can
|
|
use to launch batch jobs. The input for the
|
|
<code class="classname">JobLaunchingMessageHandler</code> is provided by a
|
|
Spring Integration message whose payload is of type
|
|
<code class="classname">JobLaunchRequest</code>. This class is a wrapper around the Job
|
|
that needs to be launched as well as the <code class="classname">JobParameters</code>
|
|
necessary to launch the Batch job.
|
|
</p><p>
|
|
The following image illustrates the typical Spring Integration
|
|
message flow in order to start a Batch job. The
|
|
<a class="ulink" href="http://www.eaipatterns.com/toc.html" target="_top">EIP (Enterprise Integration Patterns) website</a>
|
|
provides a full overview of messaging icons and their descriptions.
|
|
</p><div class="mediaobject" align="center"><img src="images/launch-batch-job.png" align="middle"></div><div class="sect3"><div class="titlepage"><div><div><h4 class="title"><a name="transforming-a-file-into-a-joblaunchrequest" href="#transforming-a-file-into-a-joblaunchrequest"></a>Transforming a file into a JobLaunchRequest</h4></div></div></div><pre class="programlisting"><span class="hl-keyword">package</span> io.spring.sbi;
|
|
|
|
<span class="hl-keyword">import</span> org.springframework.batch.core.Job;
|
|
<span class="hl-keyword">import</span> org.springframework.batch.core.JobParametersBuilder;
|
|
<span class="hl-keyword">import</span> org.springframework.batch.integration.launch.JobLaunchRequest;
|
|
<span class="hl-keyword">import</span> org.springframework.integration.annotation.Transformer;
|
|
<span class="hl-keyword">import</span> org.springframework.messaging.Message;
|
|
|
|
<span class="hl-keyword">import</span> java.io.File;
|
|
|
|
<span class="hl-keyword">public</span> <span class="hl-keyword">class</span> FileMessageToJobRequest {
|
|
<span class="hl-keyword">private</span> Job job;
|
|
<span class="hl-keyword">private</span> String fileParameterName;
|
|
|
|
<span class="hl-keyword">public</span> <span class="hl-keyword">void</span> setFileParameterName(String fileParameterName) {
|
|
<span class="hl-keyword">this</span>.fileParameterName = fileParameterName;
|
|
}
|
|
|
|
<span class="hl-keyword">public</span> <span class="hl-keyword">void</span> setJob(Job job) {
|
|
<span class="hl-keyword">this</span>.job = job;
|
|
}
|
|
|
|
<em><span class="hl-annotation" style="color: gray">@Transformer</span></em>
|
|
<span class="hl-keyword">public</span> JobLaunchRequest toRequest(Message<File> message) {
|
|
JobParametersBuilder jobParametersBuilder =
|
|
<span class="hl-keyword">new</span> JobParametersBuilder();
|
|
|
|
jobParametersBuilder.addString(fileParameterName,
|
|
message.getPayload().getAbsolutePath());
|
|
|
|
<span class="hl-keyword">return</span> <span class="hl-keyword">new</span> JobLaunchRequest(job, jobParametersBuilder.toJobParameters());
|
|
}
|
|
}</pre></div><div class="sect3"><div class="titlepage"><div><div><h4 class="title"><a name="the-jobexecution-response" href="#the-jobexecution-response"></a>The JobExecution Response</h4></div></div></div><p>
|
|
When a Batch Job is being executed, a
|
|
<code class="classname">JobExecution</code> instance is returned. This
|
|
instance can be used to determine the status of an execution. If
|
|
a <code class="classname">JobExecution</code> was able to be created
|
|
successfully, it will always be returned, regardless of whether
|
|
or not the actual execution was successful.
|
|
</p><p>
|
|
The exact behavior on how the <code class="classname">JobExecution</code>
|
|
instance is returned depends on the provided
|
|
<code class="classname">TaskExecutor</code>. If a
|
|
<code class="classname">synchronous</code> (single-threaded)
|
|
<code class="classname">TaskExecutor</code> implementation is used, the
|
|
<code class="classname">JobExecution</code> response is only returned
|
|
<code class="classname">after</code> the job completes. When using an
|
|
<code class="classname">asynchronous</code>
|
|
<code class="classname">TaskExecutor</code>, the
|
|
<code class="classname">JobExecution</code> instance is returned
|
|
immediately. Users can then take the <code class="classname">id</code> of the
<code class="classname">JobExecution</code> instance
(<code class="classname">JobExecution.getId()</code>) and query the
|
|
<code class="classname">JobRepository</code> for the job's updated status
|
|
using the <code class="classname">JobExplorer</code>. For more
|
|
information, please refer to the <code class="classname">Spring
|
|
Batch</code> reference documentation on
|
|
<a class="ulink" href="http://docs.spring.io/spring-batch/reference/html/configureJob.html#queryingRepository" target="_top">Querying
|
|
the Repository</a>.
|
|
</p><p>
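The timing difference between the two kinds of executor can be sketched with the JDK's own <code class="classname">java.util.concurrent.Executor</code>. This is an analogy rather than Spring code: the <code class="classname">LaunchTimingSketch</code> class, its <code class="classname">Execution</code> handle, and the two-value status enum are assumptions standing in for the launcher and the <code class="classname">JobExecution</code>.</p>

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical stand-in for a job launcher, for illustration only.
final class LaunchTimingSketch {

    enum Status { STARTING, COMPLETED }

    /** Minimal handle returned to the caller, loosely analogous to a JobExecution. */
    static final class Execution {
        final AtomicReference<Status> status = new AtomicReference<>(Status.STARTING);
        private final CountDownLatch done = new CountDownLatch(1);

        /** Waits up to timeoutMillis for completion; returns false on timeout or interrupt. */
        boolean awaitDone(long timeoutMillis) {
            try {
                return done.await(timeoutMillis, TimeUnit.MILLISECONDS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }

    /**
     * Hands the job to the executor and returns the handle.
     * With a synchronous executor the job has already finished when this returns;
     * with an asynchronous one the handle may come back while the job is still running.
     */
    static Execution launch(Executor executor, Runnable job) {
        Execution execution = new Execution();
        executor.execute(() -> {
            job.run();
            execution.status.set(Status.COMPLETED);
            execution.done.countDown();
        });
        return execution;
    }
}
```

<p>Calling <code class="classname">launch(Runnable::run, job)</code> runs the job in the caller's thread, so the returned handle already reports completion. Passing a thread-pool executor instead returns the handle at once, and the caller has to check back later, which is the role the <code class="classname">JobExplorer</code> query plays in the asynchronous case.</p><p>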
|
|
The following configuration will create a file
|
|
<code class="classname">inbound-channel-adapter</code> to listen for CSV
|
|
files in the provided directory, hand them off to our
|
|
transformer (<code class="classname">FileMessageToJobRequest</code>),
|
|
launch the job via the <span class="emphasis"><em>Job Launching
|
|
Gateway</em></span> then simply log the output of the
|
|
<code class="classname">JobExecution</code> via the
|
|
<code class="classname">logging-channel-adapter</code>.
|
|
</p></div><div class="sect3"><div class="titlepage"><div><div><h4 class="title"><a name="spring-batch-integration-configuration" href="#spring-batch-integration-configuration"></a>Spring Batch Integration Configuration</h4></div></div></div><pre class="programlisting"><span class="hl-tag"><int:channel</span> <span class="hl-attribute">id</span>=<span class="hl-value">"inboundFileChannel"</span><span class="hl-tag">/></span>
|
|
<span class="hl-tag"><int:channel</span> <span class="hl-attribute">id</span>=<span class="hl-value">"outboundJobRequestChannel"</span><span class="hl-tag">/></span>
|
|
<span class="hl-tag"><int:channel</span> <span class="hl-attribute">id</span>=<span class="hl-value">"jobLaunchReplyChannel"</span><span class="hl-tag">/></span>
|
|
|
|
<span class="hl-tag"><int-file:inbound-channel-adapter</span> <span class="hl-attribute">id</span>=<span class="hl-value">"filePoller"</span>
|
|
<span class="hl-attribute">channel</span>=<span class="hl-value">"inboundFileChannel"</span>
|
|
<span class="hl-attribute">directory</span>=<span class="hl-value">"file:/tmp/myfiles/"</span>
|
|
<span class="hl-attribute">filename-pattern</span>=<span class="hl-value">"*.csv"</span><span class="hl-tag">></span>
|
|
<span class="hl-tag"><int:poller</span> <span class="hl-attribute">fixed-rate</span>=<span class="hl-value">"1000"</span><span class="hl-tag">/></span>
|
|
<span class="hl-tag"></int-file:inbound-channel-adapter></span>
|
|
|
|
<span class="hl-tag"><int:transformer</span> <span class="hl-attribute">input-channel</span>=<span class="hl-value">"inboundFileChannel"</span>
|
|
<span class="hl-attribute">output-channel</span>=<span class="hl-value">"outboundJobRequestChannel"</span><span class="hl-tag">></span>
|
|
<span class="hl-tag"><bean</span> <span class="hl-attribute">class</span>=<span class="hl-value">"io.spring.sbi.FileMessageToJobRequest"</span><span class="hl-tag">></span>
|
|
<span class="hl-tag"><property</span> <span class="hl-attribute">name</span>=<span class="hl-value">"job"</span> <span class="hl-attribute">ref</span>=<span class="hl-value">"personJob"</span><span class="hl-tag">/></span>
|
|
<span class="hl-tag"><property</span> <span class="hl-attribute">name</span>=<span class="hl-value">"fileParameterName"</span> <span class="hl-attribute">value</span>=<span class="hl-value">"input.file.name"</span><span class="hl-tag">/></span>
|
|
<span class="hl-tag"></bean></span>
|
|
<span class="hl-tag"></int:transformer></span>
|
|
|
|
<span class="hl-tag"><batch-int:job-launching-gateway</span> <span class="hl-attribute">request-channel</span>=<span class="hl-value">"outboundJobRequestChannel"</span>
|
|
<span class="hl-attribute">reply-channel</span>=<span class="hl-value">"jobLaunchReplyChannel"</span><span class="hl-tag">/></span>
|
|
|
|
<span class="hl-tag"><int:logging-channel-adapter</span> <span class="hl-attribute">channel</span>=<span class="hl-value">"jobLaunchReplyChannel"</span><span class="hl-tag">/></span></pre><p>
|
|
Now that we are polling for files and launching jobs, we need to
configure, for example, our Spring Batch
<code class="classname">ItemReader</code> to use the file located at the path
given by the job parameter "input.file.name":
|
|
</p></div><div class="sect3"><div class="titlepage"><div><div><h4 class="title"><a name="example-itemreader-configuration" href="#example-itemreader-configuration"></a>Example ItemReader Configuration</h4></div></div></div><pre class="programlisting"><span class="hl-tag"><bean</span> <span class="hl-attribute">id</span>=<span class="hl-value">"itemReader"</span> <span class="hl-attribute">class</span>=<span class="hl-value">"org.springframework.batch.item.file.FlatFileItemReader"</span>
|
|
<span class="hl-attribute">scope</span>=<span class="hl-value">"step"</span><span class="hl-tag">></span>
|
|
<span class="hl-tag"><property</span> <span class="hl-attribute">name</span>=<span class="hl-value">"resource"</span> <span class="hl-attribute">value</span>=<span class="hl-value">"file://#{jobParameters['input.file.name']}"</span><span class="hl-tag">/></span>
|
|
...
|
|
<span class="hl-tag"></bean></span></pre><p>
|
|
The main points of interest here are injecting the value of
|
|
<code class="classname">#{jobParameters['input.file.name']}</code>
|
|
as the Resource property value and setting the ItemReader bean
|
|
to be of <span class="emphasis"><em>Step scope</em></span> to take advantage of
|
|
the late binding support which allows access to the
|
|
<code class="classname">jobParameters</code> variable.
|
|
</p><div class="sect4"><div class="titlepage"><div><div><h5 class="title"><a name="available-attributes-of-the-job-launching-gateway" href="#available-attributes-of-the-job-launching-gateway"></a>Available Attributes of the Job-Launching Gateway</h5></div></div></div><div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; "><li class="listitem"><p>
|
|
<code class="classname">id</code> Identifies the
|
|
underlying Spring bean definition, which is an instance of
|
|
either:
|
|
</p><div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: circle; "><li class="listitem"><p>
|
|
<code class="classname">EventDrivenConsumer</code>
|
|
</p></li><li class="listitem"><p>
|
|
<code class="classname">PollingConsumer</code>
|
|
</p></li></ul></div><p>
|
|
The exact implementation depends on whether the component's
|
|
input channel is a:
|
|
</p><div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: circle; "><li class="listitem"><p>
|
|
<code class="classname">SubscribableChannel</code> or
|
|
</p></li><li class="listitem"><p>
|
|
<code class="classname">PollableChannel</code>
|
|
</p></li></ul></div></li></ul></div><div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; "><li class="listitem"><p>
|
|
<code class="classname">auto-startup</code>
|
|
Boolean flag to indicate that the endpoint should start automatically on
|
|
startup. The default is <span class="emphasis"><em>true</em></span>.
|
|
</p></li><li class="listitem"><p>
|
|
<code class="classname">request-channel</code>
|
|
The input <code class="classname">MessageChannel</code> of this endpoint.
|
|
</p></li><li class="listitem"><p>
|
|
<code class="classname">reply-channel</code> <code class="classname">MessageChannel</code>
|
|
to which the resulting <code class="classname">JobExecution</code> payload will be sent.
|
|
</p></li><li class="listitem"><p>
|
|
<code class="classname">reply-timeout</code>
|
|
Allows you to specify how long this gateway will wait for the reply message
|
|
to be sent successfully to the reply channel before throwing
|
|
an exception. This attribute only applies when the channel
|
|
might block, for example when using a bounded queue channel
|
|
that is currently full. Also, keep in mind that when sending to a
|
|
<code class="classname">DirectChannel</code>, the invocation will occur
|
|
in the sender's thread. Therefore, the failing of the send
|
|
operation may be caused by other components further downstream.
|
|
The <code class="classname">reply-timeout</code> attribute maps to the
|
|
<code class="classname">sendTimeout</code> property of the underlying
|
|
<code class="classname">MessagingTemplate</code> instance. The attribute
|
|
will default, if not specified, to <span class="emphasis"><em>-1</em></span>,
|
|
meaning that by default, the Gateway will wait indefinitely.
|
|
The value is specified in milliseconds.
|
|
</p></li><li class="listitem"><p>
|
|
<code class="classname">job-launcher</code>
|
|
Pass in a
|
|
custom
|
|
<code class="classname">JobLauncher</code>
|
|
bean reference. This
|
|
attribute is optional. If not specified, the adapter will
|
|
re-use the instance that is registered under the id
|
|
<code class="classname">jobLauncher</code>. If no default instance
|
|
exists, an exception is thrown.
|
|
</p></li><li class="listitem"><p>
|
|
<code class="classname">order</code>
|
|
Specifies the order
|
|
for invocation when this endpoint is connected as a subscriber
|
|
to a <code class="classname">SubscribableChannel</code>.
|
|
</p></li></ul></div></div><div class="sect4"><div class="titlepage"><div><div><h5 class="title"><a name="sub-elements" href="#sub-elements"></a>Sub-Elements</h5></div></div></div><p>
|
|
When this Gateway is receiving messages from a
|
|
<code class="classname">PollableChannel</code>, you must either provide
|
|
a global default Poller or provide a Poller sub-element to the
|
|
<code class="classname">Job Launching Gateway</code>:
|
|
</p><pre class="programlisting"><span class="hl-tag"><batch-int:job-launching-gateway</span> <span class="hl-attribute">request-channel</span>=<span class="hl-value">"queueChannel"</span>
|
|
<span class="hl-attribute">reply-channel</span>=<span class="hl-value">"replyChannel"</span> <span class="hl-attribute">job-launcher</span>=<span class="hl-value">"jobLauncher"</span><span class="hl-tag">></span>
|
|
<span class="hl-tag"><int:poller</span> <span class="hl-attribute">fixed-rate</span>=<span class="hl-value">"1000"</span><span class="hl-tag">/></span>
|
|
<span class="hl-tag"></batch-int:job-launching-gateway></span></pre></div></div></div><div class="sect2"><div class="titlepage"><div><div><h3 class="title"><a name="providing-feedback-with-informational-messages" href="#providing-feedback-with-informational-messages"></a>13.1.3. Providing Feedback with Informational Messages</h3></div></div></div><p>
|
|
As Spring Batch jobs can run for a long time, providing progress
information is often critical. For example, stakeholders may want
to be notified if some or all parts of a batch job have failed.
|
|
Spring Batch provides support for this information being gathered
|
|
through:
|
|
</p><div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; "><li class="listitem"><p>
|
|
Active polling or
|
|
</p></li><li class="listitem"><p>
|
|
Event-driven, using listeners.
|
|
</p></li></ul></div><p>
|
|
When starting a Spring Batch job asynchronously, e.g. by using the
|
|
<code class="classname">Job Launching Gateway</code>, a
|
|
<code class="classname">JobExecution</code> instance is returned. Thus,
|
|
<code class="classname">JobExecution.getId()</code> can be used to
|
|
continuously poll for status updates by retrieving updated
|
|
instances of the <code class="classname">JobExecution</code> from the
|
|
<code class="classname">JobRepository</code> using the
|
|
<code class="classname">JobExplorer</code>. However, this is considered
|
|
sub-optimal and an event-driven approach should be preferred.
|
|
</p><p>
|
|
Therefore, Spring Batch provides listeners such as:
|
|
</p><div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; "><li class="listitem"><p>
|
|
StepListener
|
|
</p></li><li class="listitem"><p>
|
|
ChunkListener
|
|
</p></li><li class="listitem"><p>
|
|
JobExecutionListener
|
|
</p></li></ul></div><p>
|
|
In the following example, a Spring Batch job was configured with a
|
|
<code class="classname">StepExecutionListener</code>. Thus, Spring
|
|
Integration receives and processes any before-step or after-step
events. For example, the received
|
|
<code class="classname">StepExecution</code> can be inspected using a
|
|
<code class="classname">Router</code>. Based on the results of that
|
|
inspection, various things can occur, for example routing a message
to a Mail Outbound Channel Adapter, so that an email notification
|
|
can be sent out based on some condition.
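</p><p>
The routing idea described above can be sketched as follows. This is only an
illustration, assuming the <code class="classname">StepExecution</code> is the
message payload and that the router is the only consumer of
<code class="classname">stepExecutionsChannel</code>. The channel names
<code class="classname">failedStepsChannel</code> and
<code class="classname">otherStepsChannel</code> are hypothetical and would have to be
defined elsewhere, for example with a Mail Outbound Channel Adapter behind the first one:
</p><pre class="programlisting"><span class="hl-tag"><int:router</span> <span class="hl-attribute">input-channel</span>=<span class="hl-value">"stepExecutionsChannel"</span>
    <span class="hl-attribute">expression</span>=<span class="hl-value">"payload.exitStatus.exitCode"</span>
    <span class="hl-attribute">resolution-required</span>=<span class="hl-value">"false"</span>
    <span class="hl-attribute">default-output-channel</span>=<span class="hl-value">"otherStepsChannel"</span><span class="hl-tag">></span>
  <span class="hl-tag"><int:mapping</span> <span class="hl-attribute">value</span>=<span class="hl-value">"FAILED"</span> <span class="hl-attribute">channel</span>=<span class="hl-value">"failedStepsChannel"</span><span class="hl-tag">/></span>
<span class="hl-tag"></int:router></span></pre><p>
In this sketch, a step whose exit code is <code class="classname">FAILED</code> is routed to
the notification channel, while every other exit code falls through to the default
output channel because <code class="classname">resolution-required</code> is set to
<span class="emphasis"><em>false</em></span>.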
|
|
</p><div class="mediaobject" align="center"><img src="images/handling-informational-messages.png" align="middle"></div><p>
|
|
Below is an example of how a listener is configured to send a
|
|
message to a <code class="classname">Gateway</code> for
|
|
<code class="classname">StepExecution</code> events and log its output to a
|
|
<code class="classname">logging-channel-adapter</code>:
|
|
</p><p>
|
|
First create the notifications integration beans:
|
|
</p><pre class="programlisting"><span class="hl-tag"><int:channel</span> <span class="hl-attribute">id</span>=<span class="hl-value">"stepExecutionsChannel"</span><span class="hl-tag">/></span>
|
|
|
|
<span class="hl-tag"><int:gateway</span> <span class="hl-attribute">id</span>=<span class="hl-value">"notificationExecutionsListener"</span>
|
|
<span class="hl-attribute">service-interface</span>=<span class="hl-value">"org.springframework.batch.core.StepExecutionListener"</span>
|
|
<span class="hl-attribute">default-request-channel</span>=<span class="hl-value">"stepExecutionsChannel"</span><span class="hl-tag">/></span>
|
|
|
|
<span class="hl-tag"><int:logging-channel-adapter</span> <span class="hl-attribute">channel</span>=<span class="hl-value">"stepExecutionsChannel"</span><span class="hl-tag">/></span></pre><p>
|
|
Then modify your job to add a step level listener:
|
|
</p><pre class="programlisting"><span class="hl-tag"><job</span> <span class="hl-attribute">id</span>=<span class="hl-value">"importPayments"</span><span class="hl-tag">></span>
|
|
<span class="hl-tag"><step</span> <span class="hl-attribute">id</span>=<span class="hl-value">"step1"</span><span class="hl-tag">></span>
|
|
<span class="hl-tag"><tasklet</span> <span class="hl-attribute">..</span><span class="hl-tag">></span>
|
|
<span class="hl-attribute"><chunk</span> <span class="hl-attribute">../></span>
|
|
<span class="hl-attribute"><listeners></span>
|
|
<span class="hl-attribute"><listener</span> <span class="hl-attribute">ref</span>=<span class="hl-value">"notificationExecutionsListener"</span><span class="hl-tag">/></span>
|
|
<span class="hl-tag"></listeners></span>
|
|
<span class="hl-tag"></tasklet></span>
|
|
...
|
|
<span class="hl-tag"></step></span>
|
|
<span class="hl-tag"></job></span></pre></div><div class="sect2"><div class="titlepage"><div><div><h3 class="title"><a name="asynchronous-processors" href="#asynchronous-processors"></a>13.1.4. Asynchronous Processors</h3></div></div></div><p>
|
|
Asynchronous Processors help you to scale the processing of
|
|
items. In the asynchronous processor use-case, an
|
|
<code class="classname">AsyncItemProcessor</code> serves as a dispatcher,
|
|
executing the <code class="classname">ItemProcessor</code>'s logic for an
|
|
item on a new thread. The <code class="classname">Future</code> is passed to
|
|
the <code class="classname">AsyncItemWriter</code> to be written once the
|
|
processor completes.
|
|
</p><p>
|
|
Therefore, you can increase performance by using asynchronous item
|
|
processing, basically allowing you to implement
|
|
<span class="emphasis"><em>fork-join</em></span> scenarios. The
|
|
<code class="classname">AsyncItemWriter</code> will gather the results and
|
|
write back the chunk as soon as all the results become available.
|
|
</p><p>
|
|
Configuring both the <code class="classname">AsyncItemProcessor</code>
and <code class="classname">AsyncItemWriter</code> is simple. First, the
<code class="classname">AsyncItemProcessor</code>:
|
|
</p><pre class="programlisting"><span class="hl-tag"><bean</span> <span class="hl-attribute">id</span>=<span class="hl-value">"processor"</span>
|
|
<span class="hl-attribute">class</span>=<span class="hl-value">"org.springframework.batch.integration.async.AsyncItemProcessor"</span><span class="hl-tag">></span>
|
|
<span class="hl-tag"><property</span> <span class="hl-attribute">name</span>=<span class="hl-value">"delegate"</span><span class="hl-tag">></span>
|
|
<span class="hl-tag"><bean</span> <span class="hl-attribute">class</span>=<span class="hl-value">"your.ItemProcessor"</span><span class="hl-tag">/></span>
|
|
<span class="hl-tag"></property></span>
|
|
<span class="hl-tag"><property</span> <span class="hl-attribute">name</span>=<span class="hl-value">"taskExecutor"</span><span class="hl-tag">></span>
|
|
<span class="hl-tag"><bean</span> <span class="hl-attribute">class</span>=<span class="hl-value">"org.springframework.core.task.SimpleAsyncTaskExecutor"</span><span class="hl-tag">/></span>
|
|
<span class="hl-tag"></property></span>
|
|
<span class="hl-tag"></bean></span></pre><p>
|
|
The property "<code class="classname">delegate</code>" is actually
|
|
a reference to your <code class="classname">ItemProcessor</code> bean and
|
|
the "<code class="classname">taskExecutor</code>" property is a
|
|
reference to the <code class="classname">TaskExecutor</code> of your choice.
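</p><p>
The <code class="classname">SimpleAsyncTaskExecutor</code> used above starts a new
thread for each task and does not reuse threads. As a hedged alternative, assuming a
bounded pool suits your workload, a
<code class="classname">ThreadPoolTaskExecutor</code> can be supplied instead. The
pool sizes below are illustrative values only, not recommendations:
</p><pre class="programlisting"><span class="hl-tag"><bean</span> <span class="hl-attribute">id</span>=<span class="hl-value">"processor"</span>
    <span class="hl-attribute">class</span>=<span class="hl-value">"org.springframework.batch.integration.async.AsyncItemProcessor"</span><span class="hl-tag">></span>
  <span class="hl-tag"><property</span> <span class="hl-attribute">name</span>=<span class="hl-value">"delegate"</span><span class="hl-tag">></span>
    <span class="hl-tag"><bean</span> <span class="hl-attribute">class</span>=<span class="hl-value">"your.ItemProcessor"</span><span class="hl-tag">/></span>
  <span class="hl-tag"></property></span>
  <span class="hl-tag"><property</span> <span class="hl-attribute">name</span>=<span class="hl-value">"taskExecutor"</span><span class="hl-tag">></span>
    <span class="hl-tag"><bean</span> <span class="hl-attribute">class</span>=<span class="hl-value">"org.springframework.scheduling.concurrent.ThreadPoolTaskExecutor"</span><span class="hl-tag">></span>
      <span class="hl-tag"><property</span> <span class="hl-attribute">name</span>=<span class="hl-value">"corePoolSize"</span> <span class="hl-attribute">value</span>=<span class="hl-value">"4"</span><span class="hl-tag">/></span>
      <span class="hl-tag"><property</span> <span class="hl-attribute">name</span>=<span class="hl-value">"maxPoolSize"</span> <span class="hl-attribute">value</span>=<span class="hl-value">"4"</span><span class="hl-tag">/></span>
    <span class="hl-tag"></bean></span>
  <span class="hl-tag"></property></span>
<span class="hl-tag"></bean></span></pre><p>
With this variant, at most four items are processed concurrently and further items
wait in the executor's queue, which keeps the number of threads predictable.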
|
|
</p><p>
|
|
Then we configure the <code class="classname">AsyncItemWriter</code>:
|
|
</p><pre class="programlisting"><span class="hl-tag"><bean</span> <span class="hl-attribute">id</span>=<span class="hl-value">"itemWriter"</span>
|
|
<span class="hl-attribute">class</span>=<span class="hl-value">"org.springframework.batch.integration.async.AsyncItemWriter"</span><span class="hl-tag">></span>
|
|
<span class="hl-tag"><property</span> <span class="hl-attribute">name</span>=<span class="hl-value">"delegate"</span><span class="hl-tag">></span>
|
|
<span class="hl-tag"><bean</span> <span class="hl-attribute">class</span>=<span class="hl-value">"your.ItemWriter"</span><span class="hl-tag">/></span>
|
|
<span class="hl-tag"></property></span>
|
|
<span class="hl-tag"></bean></span></pre><p>
|
|
Again, the property "<code class="classname">delegate</code>" is
|
|
actually a reference to your <code class="classname">ItemWriter</code> bean.
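</p><p>
As a sketch of how the two beans above might be wired into a step, the following job
references them by their ids. The job id <code class="classname">asyncJob</code>, the
<code class="classname">itemReader</code> bean, and the commit interval are
hypothetical and stand in for your own configuration:
</p><pre class="programlisting"><span class="hl-tag"><job</span> <span class="hl-attribute">id</span>=<span class="hl-value">"asyncJob"</span><span class="hl-tag">></span>
  <span class="hl-tag"><step</span> <span class="hl-attribute">id</span>=<span class="hl-value">"step1"</span><span class="hl-tag">></span>
    <span class="hl-tag"><tasklet></span>
      <span class="hl-tag"><chunk</span> <span class="hl-attribute">reader</span>=<span class="hl-value">"itemReader"</span> <span class="hl-attribute">processor</span>=<span class="hl-value">"processor"</span>
          <span class="hl-attribute">writer</span>=<span class="hl-value">"itemWriter"</span> <span class="hl-attribute">commit-interval</span>=<span class="hl-value">"10"</span><span class="hl-tag">/></span>
    <span class="hl-tag"></tasklet></span>
  <span class="hl-tag"></step></span>
<span class="hl-tag"></job></span></pre><p>
Because the <code class="classname">AsyncItemProcessor</code> hands a
<code class="classname">Future</code> to the writer rather than the processed item
itself, the two components are intended to be used together in the same step.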
|
|
</p></div><div class="sect2"><div class="titlepage"><div><div><h3 class="title"><a name="externalizing-batch-process-execution" href="#externalizing-batch-process-execution"></a>13.1.5. Externalizing Batch Process Execution</h3></div></div></div><p>
|
|
The integration approaches discussed so far suggest use-cases
|
|
where Spring Integration wraps Spring Batch like an outer-shell.
|
|
However, Spring Batch can also use Spring Integration internally.
|
|
Using this approach, Spring Batch users can delegate the
|
|
processing of items or even chunks to outside processes. This
|
|
allows you to offload complex processing. Spring Batch Integration
|
|
provides dedicated support for:
|
|
</p><div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; "><li class="listitem"><p>
|
|
Remote Chunking
|
|
</p></li><li class="listitem"><p>
|
|
Remote Partitioning
|
|
</p></li></ul></div><div class="sect3"><div class="titlepage"><div><div><h4 class="title"><a name="remote-chunking" href="#remote-chunking"></a>Remote Chunking</h4></div></div></div><div class="mediaobject" align="center"><img src="images/remote-chunking-sbi.png" align="middle"></div><p>
|
|
Taking things one step further, one can also externalize the
|
|
chunk processing using the
|
|
<code class="classname">ChunkMessageChannelItemWriter</code>, which is
provided by Spring Batch Integration. It sends items out
and collects the results. Once sent, Spring Batch continues the
|
|
process of reading and grouping items, without waiting for the results.
|
|
Rather, it is the responsibility of the <code class="classname">ChunkMessageChannelItemWriter</code>
|
|
to gather the results and integrate them back into the Spring Batch process.
|
|
</p><p>
|
|
Using Spring Integration you have full
|
|
control over the concurrency of your processes, for instance by
|
|
using a <code class="classname">QueueChannel</code> instead of a
|
|
<code class="classname">DirectChannel</code>. Furthermore, by relying on
|
|
Spring Integration's rich collection of Channel Adapters (for example,
|
|
JMS or AMQP), you can distribute chunks of a Batch job to
|
|
external systems for processing.
|
|
</p><p>
|
|
A simple job with a step to be remotely chunked would have a
|
|
configuration similar to the following:
|
|
</p><pre class="programlisting"><span class="hl-tag"><job</span> <span class="hl-attribute">id</span>=<span class="hl-value">"personJob"</span><span class="hl-tag">></span>
|
|
<span class="hl-tag"><step</span> <span class="hl-attribute">id</span>=<span class="hl-value">"step1"</span><span class="hl-tag">></span>
|
|
<span class="hl-tag"><tasklet></span>
|
|
<span class="hl-tag"><chunk</span> <span class="hl-attribute">reader</span>=<span class="hl-value">"itemReader"</span> <span class="hl-attribute">writer</span>=<span class="hl-value">"itemWriter"</span> <span class="hl-attribute">commit-interval</span>=<span class="hl-value">"200"</span><span class="hl-tag">/></span>
|
|
<span class="hl-tag"></tasklet></span>
|
|
...
|
|
<span class="hl-tag"></step></span>
|
|
<span class="hl-tag"></job></span></pre><p>
|
|
The ItemReader reference would point to the bean you would like
|
|
to use for reading data on the master. The ItemWriter reference
|
|
points to a special ItemWriter
|
|
"<code class="classname">ChunkMessageChannelItemWriter</code>"
|
|
as described above. The processor (if any) is left off the
|
|
master configuration as it is configured on the slave. The
|
|
following configuration provides a basic master setup. It's
|
|
advised to check any additional component properties such as
|
|
throttle limits and so on when implementing your use case.
|
|
</p><pre class="programlisting"><span class="hl-tag"><bean</span> <span class="hl-attribute">id</span>=<span class="hl-value">"connectionFactory"</span> <span class="hl-attribute">class</span>=<span class="hl-value">"org.apache.activemq.ActiveMQConnectionFactory"</span><span class="hl-tag">></span>
|
|
<span class="hl-tag"><property</span> <span class="hl-attribute">name</span>=<span class="hl-value">"brokerURL"</span> <span class="hl-attribute">value</span>=<span class="hl-value">"tcp://localhost:61616"</span><span class="hl-tag">/></span>
|
|
<span class="hl-tag"></bean></span>
|
|
|
|
<span class="hl-tag"><int-jms:outbound-channel-adapter</span> <span class="hl-attribute">id</span>=<span class="hl-value">"requests"</span> <span class="hl-attribute">destination-name</span>=<span class="hl-value">"requests"</span><span class="hl-tag">/></span>
|
|
|
|
<span class="hl-tag"><bean</span> <span class="hl-attribute">id</span>=<span class="hl-value">"messagingTemplate"</span>
|
|
<span class="hl-attribute">class</span>=<span class="hl-value">"org.springframework.integration.core.MessagingTemplate"</span><span class="hl-tag">></span>
|
|
<span class="hl-tag"><property</span> <span class="hl-attribute">name</span>=<span class="hl-value">"defaultChannel"</span> <span class="hl-attribute">ref</span>=<span class="hl-value">"requests"</span><span class="hl-tag">/></span>
|
|
<span class="hl-tag"><property</span> <span class="hl-attribute">name</span>=<span class="hl-value">"receiveTimeout"</span> <span class="hl-attribute">value</span>=<span class="hl-value">"2000"</span><span class="hl-tag">/></span>
|
|
<span class="hl-tag"></bean></span>
|
|
|
|
<span class="hl-tag"><bean</span> <span class="hl-attribute">id</span>=<span class="hl-value">"itemWriter"</span>
|
|
<span class="hl-attribute">class</span>=<span class="hl-value">"org.springframework.batch.integration.chunk.ChunkMessageChannelItemWriter"</span>
|
|
<span class="hl-attribute">scope</span>=<span class="hl-value">"step"</span><span class="hl-tag">></span>
|
|
<span class="hl-tag"><property</span> <span class="hl-attribute">name</span>=<span class="hl-value">"messagingOperations"</span> <span class="hl-attribute">ref</span>=<span class="hl-value">"messagingTemplate"</span><span class="hl-tag">/></span>
|
|
<span class="hl-tag"><property</span> <span class="hl-attribute">name</span>=<span class="hl-value">"replyChannel"</span> <span class="hl-attribute">ref</span>=<span class="hl-value">"replies"</span><span class="hl-tag">/></span>
|
|
<span class="hl-tag"></bean></span>
|
|
|
|
<span class="hl-tag"><bean</span> <span class="hl-attribute">id</span>=<span class="hl-value">"chunkHandler"</span>
|
|
<span class="hl-attribute">class</span>=<span class="hl-value">"org.springframework.batch.integration.chunk.RemoteChunkHandlerFactoryBean"</span><span class="hl-tag">></span>
|
|
<span class="hl-tag"><property</span> <span class="hl-attribute">name</span>=<span class="hl-value">"chunkWriter"</span> <span class="hl-attribute">ref</span>=<span class="hl-value">"itemWriter"</span><span class="hl-tag">/></span>
|
|
<span class="hl-tag"><property</span> <span class="hl-attribute">name</span>=<span class="hl-value">"step"</span> <span class="hl-attribute">ref</span>=<span class="hl-value">"step1"</span><span class="hl-tag">/></span>
|
|
<span class="hl-tag"></bean></span>
|
|
|
|
<span class="hl-tag"><int:channel</span> <span class="hl-attribute">id</span>=<span class="hl-value">"replies"</span><span class="hl-tag">></span>
|
|
<span class="hl-tag"><int:queue/></span>
|
|
<span class="hl-tag"></int:channel></span>
|
|
|
|
<span class="hl-tag"><int-jms:message-driven-channel-adapter</span> <span class="hl-attribute">id</span>=<span class="hl-value">"jmsReplies"</span>
|
|
<span class="hl-attribute">destination-name</span>=<span class="hl-value">"replies"</span>
|
|
<span class="hl-attribute">channel</span>=<span class="hl-value">"replies"</span><span class="hl-tag">/></span></pre><p>
|
|
This configuration provides us with a number of beans. We
|
|
configure our messaging middleware using ActiveMQ and
|
|
inbound/outbound JMS adapters provided by Spring Integration. As
|
|
shown, our <code class="classname">itemWriter</code> bean, which is
referenced by our job step, uses the
|
|
<code class="classname">ChunkMessageChannelItemWriter</code> for writing chunks over the
|
|
configured middleware.
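</p><p>
The master setup above leaves the throttle settings at their defaults. As a hedged
sketch, the <code class="classname">itemWriter</code> bean could set the
<code class="classname">throttleLimit</code> and
<code class="classname">maxWaitTimeouts</code> properties of
<code class="classname">ChunkMessageChannelItemWriter</code> explicitly. The values
shown are illustrative assumptions and should be tuned for your environment:
</p><pre class="programlisting"><span class="hl-tag"><bean</span> <span class="hl-attribute">id</span>=<span class="hl-value">"itemWriter"</span>
    <span class="hl-attribute">class</span>=<span class="hl-value">"org.springframework.batch.integration.chunk.ChunkMessageChannelItemWriter"</span>
    <span class="hl-attribute">scope</span>=<span class="hl-value">"step"</span><span class="hl-tag">></span>
  <span class="hl-tag"><property</span> <span class="hl-attribute">name</span>=<span class="hl-value">"messagingOperations"</span> <span class="hl-attribute">ref</span>=<span class="hl-value">"messagingTemplate"</span><span class="hl-tag">/></span>
  <span class="hl-tag"><property</span> <span class="hl-attribute">name</span>=<span class="hl-value">"replyChannel"</span> <span class="hl-attribute">ref</span>=<span class="hl-value">"replies"</span><span class="hl-tag">/></span>
  <span class="hl-tag"><property</span> <span class="hl-attribute">name</span>=<span class="hl-value">"throttleLimit"</span> <span class="hl-attribute">value</span>=<span class="hl-value">"10"</span><span class="hl-tag">/></span>
  <span class="hl-tag"><property</span> <span class="hl-attribute">name</span>=<span class="hl-value">"maxWaitTimeouts"</span> <span class="hl-attribute">value</span>=<span class="hl-value">"30"</span><span class="hl-tag">/></span>
<span class="hl-tag"></bean></span></pre><p>
The throttle limit caps the number of chunk requests that may be pending at once, so
that the slaves are not overwhelmed. The maximum wait timeouts value bounds how many
receive timeouts the writer tolerates at the end of the step while waiting for
outstanding replies.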
|
|
</p><p>
|
|
Now let's move on to the slave configuration:
|
|
</p><pre class="programlisting"><span class="hl-tag"><bean</span> <span class="hl-attribute">id</span>=<span class="hl-value">"connectionFactory"</span> <span class="hl-attribute">class</span>=<span class="hl-value">"org.apache.activemq.ActiveMQConnectionFactory"</span><span class="hl-tag">></span>
|
|
<span class="hl-tag"><property</span> <span class="hl-attribute">name</span>=<span class="hl-value">"brokerURL"</span> <span class="hl-attribute">value</span>=<span class="hl-value">"tcp://localhost:61616"</span><span class="hl-tag">/></span>
|
|
<span class="hl-tag"></bean></span>
|
|
|
|
<span class="hl-tag"><int:channel</span> <span class="hl-attribute">id</span>=<span class="hl-value">"requests"</span><span class="hl-tag">/></span>
|
|
<span class="hl-tag"><int:channel</span> <span class="hl-attribute">id</span>=<span class="hl-value">"replies"</span><span class="hl-tag">/></span>
|
|
|
|
<span class="hl-tag"><int-jms:message-driven-channel-adapter</span> <span class="hl-attribute">id</span>=<span class="hl-value">"jmsIn"</span>
|
|
<span class="hl-attribute">destination-name</span>=<span class="hl-value">"requests"</span>
|
|
<span class="hl-attribute">channel</span>=<span class="hl-value">"requests"</span><span class="hl-tag">/></span>
|
|
|
|
<span class="hl-tag"><int-jms:outbound-channel-adapter</span> <span class="hl-attribute">id</span>=<span class="hl-value">"outgoingReplies"</span>
|
|
<span class="hl-attribute">destination-name</span>=<span class="hl-value">"replies"</span>
|
|
<span class="hl-attribute">channel</span>=<span class="hl-value">"replies"</span><span class="hl-tag">></span>
|
|
<span class="hl-tag"></int-jms:outbound-channel-adapter></span>
|
|
|
|
<span class="hl-tag"><int:service-activator</span> <span class="hl-attribute">id</span>=<span class="hl-value">"serviceActivator"</span>
|
|
<span class="hl-attribute">input-channel</span>=<span class="hl-value">"requests"</span>
|
|
<span class="hl-attribute">output-channel</span>=<span class="hl-value">"replies"</span>
|
|
<span class="hl-attribute">ref</span>=<span class="hl-value">"chunkProcessorChunkHandler"</span>
|
|
<span class="hl-attribute">method</span>=<span class="hl-value">"handleChunk"</span><span class="hl-tag">/></span>
|
|
|
|
<span class="hl-tag"><bean</span> <span class="hl-attribute">id</span>=<span class="hl-value">"chunkProcessorChunkHandler"</span>
|
|
<span class="hl-attribute">class</span>=<span class="hl-value">"org.springframework.batch.integration.chunk.ChunkProcessorChunkHandler"</span><span class="hl-tag">></span>
|
|
<span class="hl-tag"><property</span> <span class="hl-attribute">name</span>=<span class="hl-value">"chunkProcessor"</span><span class="hl-tag">></span>
|
|
<span class="hl-tag"><bean</span> <span class="hl-attribute">class</span>=<span class="hl-value">"org.springframework.batch.core.step.item.SimpleChunkProcessor"</span><span class="hl-tag">></span>
|
|
<span class="hl-tag"><property</span> <span class="hl-attribute">name</span>=<span class="hl-value">"itemWriter"</span><span class="hl-tag">></span>
|
|
<span class="hl-tag"><bean</span> <span class="hl-attribute">class</span>=<span class="hl-value">"io.spring.sbi.PersonItemWriter"</span><span class="hl-tag">/></span>
|
|
<span class="hl-tag"></property></span>
|
|
<span class="hl-tag"><property</span> <span class="hl-attribute">name</span>=<span class="hl-value">"itemProcessor"</span><span class="hl-tag">></span>
|
|
<span class="hl-tag"><bean</span> <span class="hl-attribute">class</span>=<span class="hl-value">"io.spring.sbi.PersonItemProcessor"</span><span class="hl-tag">/></span>
|
|
<span class="hl-tag"></property></span>
|
|
<span class="hl-tag"></bean></span>
|
|
<span class="hl-tag"></property></span>
|
|
<span class="hl-tag"></bean></span></pre><p>
|
|
Most of these configuration items should look familiar from the
|
|
master configuration. Slaves do not need access to things like
the Spring Batch <code class="classname">JobRepository</code> or
to the actual job configuration file. The main bean of interest
|
|
is the
|
|
"<code class="classname">chunkProcessorChunkHandler</code>". The
|
|
<code class="classname">chunkProcessor</code> property of
|
|
<code class="classname">ChunkProcessorChunkHandler</code> takes a
|
|
configured <code class="classname">SimpleChunkProcessor</code> which is
|
|
where you would provide a reference to your
|
|
<code class="classname">ItemWriter</code> and optionally your
|
|
<code class="classname">ItemProcessor</code> that will run on the slave
|
|
when it receives chunks from the master.
|
|
</p><p>
|
|
For more information, please also consult the Spring Batch
|
|
manual, specifically the chapter on
|
|
<a class="ulink" href="http://docs.spring.io/spring-batch/reference/html/scalability.html#remoteChunking" target="_top">Remote
|
|
Chunking</a>.
|
|
</p></div><div class="sect3"><div class="titlepage"><div><div><h4 class="title"><a name="remote-partitioning" href="#remote-partitioning"></a>Remote Partitioning</h4></div></div></div><div class="mediaobject" align="center"><img src="images/remote-partitioning.png" align="middle"></div><p>
|
|
Remote Partitioning, on the other hand, is useful when it is not the
processing of items but the associated I/O that
is the bottleneck. Using Remote Partitioning, work can
|
|
be farmed out to slaves that execute complete Spring Batch
|
|
steps. Thus, each slave has its own
|
|
<code class="classname">ItemReader</code>,
|
|
<code class="classname">ItemProcessor</code> and
|
|
<code class="classname">ItemWriter</code>. For this purpose, Spring Batch
|
|
Integration provides the
|
|
<code class="classname">MessageChannelPartitionHandler</code>.
|
|
</p><p>
|
|
This implementation of the <code class="classname">PartitionHandler</code>
|
|
interface uses <code class="classname">MessageChannel</code> instances to
|
|
send instructions to remote workers and receive their responses.
|
|
This provides a nice abstraction from the transports (for example, JMS
|
|
or AMQP) being used to communicate with the remote workers.
|
|
</p><p>
|
|
The reference manual section
|
|
<a class="ulink" href="http://docs.spring.io/spring-batch/reference/html/scalability.html#partitioning" target="_top">Remote
|
|
Partitioning</a> provides an overview of the concepts and
|
|
components needed to configure Remote Partitioning and shows an
|
|
example of using the default
|
|
<code class="classname">TaskExecutorPartitionHandler</code> to partition
|
|
in separate local threads of execution. For Remote Partitioning
|
|
to multiple JVMs, two additional components are required:
|
|
</p><div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; "><li class="listitem"><p>
|
|
Remoting fabric or grid environment
|
|
</p></li><li class="listitem"><p>
|
|
A PartitionHandler implementation that supports the desired
|
|
remoting fabric or grid environment
|
|
</p></li></ul></div><p>
|
|
Similar to Remote Chunking, JMS can be used as the "remoting
fabric". The PartitionHandler implementation to use,
as described above, is the
|
|
<code class="classname">MessageChannelPartitionHandler</code>. The example
|
|
shown below assumes an existing partitioned job and focuses on
|
|
the <code class="classname">MessageChannelPartitionHandler</code> and JMS
|
|
configuration:
|
|
</p><pre class="programlisting"><span class="hl-tag"><bean</span> <span class="hl-attribute">id</span>=<span class="hl-value">"partitionHandler"</span>
|
|
<span class="hl-attribute">class</span>=<span class="hl-value">"org.springframework.batch.integration.partition.MessageChannelPartitionHandler"</span><span class="hl-tag">></span>
|
|
<span class="hl-tag"><property</span> <span class="hl-attribute">name</span>=<span class="hl-value">"stepName"</span> <span class="hl-attribute">value</span>=<span class="hl-value">"step1"</span><span class="hl-tag">/></span>
|
|
<span class="hl-tag"><property</span> <span class="hl-attribute">name</span>=<span class="hl-value">"gridSize"</span> <span class="hl-attribute">value</span>=<span class="hl-value">"3"</span><span class="hl-tag">/></span>
|
|
<span class="hl-tag"><property</span> <span class="hl-attribute">name</span>=<span class="hl-value">"replyChannel"</span> <span class="hl-attribute">ref</span>=<span class="hl-value">"outbound-replies"</span><span class="hl-tag">/></span>
|
|
<span class="hl-tag"><property</span> <span class="hl-attribute">name</span>=<span class="hl-value">"messagingOperations"</span><span class="hl-tag">></span>
|
|
<span class="hl-tag"><bean</span> <span class="hl-attribute">class</span>=<span class="hl-value">"org.springframework.integration.core.MessagingTemplate"</span><span class="hl-tag">></span>
|
|
<span class="hl-tag"><property</span> <span class="hl-attribute">name</span>=<span class="hl-value">"defaultChannel"</span> <span class="hl-attribute">ref</span>=<span class="hl-value">"outbound-requests"</span><span class="hl-tag">/></span>
|
|
<span class="hl-tag"><property</span> <span class="hl-attribute">name</span>=<span class="hl-value">"receiveTimeout"</span> <span class="hl-attribute">value</span>=<span class="hl-value">"100000"</span><span class="hl-tag">/></span>
|
|
<span class="hl-tag"></bean></span>
|
|
<span class="hl-tag"></property></span>
|
|
<span class="hl-tag"></bean></span>
|
|
|
|
<span class="hl-tag"><int:channel</span> <span class="hl-attribute">id</span>=<span class="hl-value">"outbound-requests"</span><span class="hl-tag">/></span>
|
|
<span class="hl-tag"><int-jms:outbound-channel-adapter</span> <span class="hl-attribute">destination</span>=<span class="hl-value">"requestsQueue"</span>
|
|
<span class="hl-attribute">channel</span>=<span class="hl-value">"outbound-requests"</span><span class="hl-tag">/></span>
|
|
|
|
<span class="hl-tag"><int:channel</span> <span class="hl-attribute">id</span>=<span class="hl-value">"inbound-requests"</span><span class="hl-tag">/></span>
|
|
<span class="hl-tag"><int-jms:message-driven-channel-adapter</span> <span class="hl-attribute">destination</span>=<span class="hl-value">"requestsQueue"</span>
|
|
<span class="hl-attribute">channel</span>=<span class="hl-value">"inbound-requests"</span><span class="hl-tag">/></span>
|
|
|
|
<span class="hl-tag"><bean</span> <span class="hl-attribute">id</span>=<span class="hl-value">"stepExecutionRequestHandler"</span>
|
|
<span class="hl-attribute">class</span>=<span class="hl-value">"org.springframework.batch.integration.partition.StepExecutionRequestHandler"</span><span class="hl-tag">></span>
|
|
<span class="hl-tag"><property</span> <span class="hl-attribute">name</span>=<span class="hl-value">"jobExplorer"</span> <span class="hl-attribute">ref</span>=<span class="hl-value">"jobExplorer"</span><span class="hl-tag">/></span>
|
|
<span class="hl-tag"><property</span> <span class="hl-attribute">name</span>=<span class="hl-value">"stepLocator"</span> <span class="hl-attribute">ref</span>=<span class="hl-value">"stepLocator"</span><span class="hl-tag">/></span>
|
|
<span class="hl-tag"></bean></span>
|
|
|
|
<span class="hl-tag"><int:service-activator</span> <span class="hl-attribute">ref</span>=<span class="hl-value">"stepExecutionRequestHandler"</span> <span class="hl-attribute">input-channel</span>=<span class="hl-value">"inbound-requests"</span>
|
|
<span class="hl-attribute">output-channel</span>=<span class="hl-value">"outbound-staging"</span><span class="hl-tag">/></span>
|
|
|
|
<span class="hl-tag"><int:channel</span> <span class="hl-attribute">id</span>=<span class="hl-value">"outbound-staging"</span><span class="hl-tag">/></span>
|
|
<span class="hl-tag"><int-jms:outbound-channel-adapter</span> <span class="hl-attribute">destination</span>=<span class="hl-value">"stagingQueue"</span>
|
|
<span class="hl-attribute">channel</span>=<span class="hl-value">"outbound-staging"</span><span class="hl-tag">/></span>
|
|
|
|
<span class="hl-tag"><int:channel</span> <span class="hl-attribute">id</span>=<span class="hl-value">"inbound-staging"</span><span class="hl-tag">/></span>
|
|
<span class="hl-tag"><int-jms:message-driven-channel-adapter</span> <span class="hl-attribute">destination</span>=<span class="hl-value">"stagingQueue"</span>
|
|
<span class="hl-attribute">channel</span>=<span class="hl-value">"inbound-staging"</span><span class="hl-tag">/></span>
|
|
|
|
<span class="hl-tag"><int:aggregator</span> <span class="hl-attribute">ref</span>=<span class="hl-value">"partitionHandler"</span> <span class="hl-attribute">input-channel</span>=<span class="hl-value">"inbound-staging"</span>
|
|
<span class="hl-attribute">output-channel</span>=<span class="hl-value">"outbound-replies"</span><span class="hl-tag">/></span>
|
|
|
|
<span class="hl-tag"><int:channel</span> <span class="hl-attribute">id</span>=<span class="hl-value">"outbound-replies"</span><span class="hl-tag">></span>
|
|
<span class="hl-tag"><int:queue/></span>
|
|
<span class="hl-tag"></int:channel></span>
|
|
|
|
<span class="hl-tag"><bean</span> <span class="hl-attribute">id</span>=<span class="hl-value">"stepLocator"</span>
|
|
<span class="hl-attribute">class</span>=<span class="hl-value">"org.springframework.batch.integration.partition.BeanFactoryStepLocator"</span><span class="hl-tag"> /></span></pre><p>
|
|
Also ensure the partition <code class="classname">handler</code> attribute
|
|
maps to the <code class="classname">partitionHandler</code> bean:
|
|
</p><pre class="programlisting"><span class="hl-tag"><job</span> <span class="hl-attribute">id</span>=<span class="hl-value">"personJob"</span><span class="hl-tag">></span>
|
|
<span class="hl-tag"><step</span> <span class="hl-attribute">id</span>=<span class="hl-value">"step1.master"</span><span class="hl-tag">></span>
|
|
<span class="hl-tag"><partition</span> <span class="hl-attribute">partitioner</span>=<span class="hl-value">"partitioner"</span> <span class="hl-attribute">handler</span>=<span class="hl-value">"partitionHandler"</span><span class="hl-tag">/></span>
|
|
...
|
|
<span class="hl-tag"></step></span>
|
|
<span class="hl-tag"></job></span></pre></div></div></div></div>
|
|
|
|
<div class="appendix"><div class="titlepage"><div><div><h1 class="title"><a name="listOfReadersAndWriters" href="#listOfReadersAndWriters"></a>Appendix A. List of ItemReaders and ItemWriters</h1></div></div></div><div class="section"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="itemReadersAppendix" href="#itemReadersAppendix"></a>A.1 Item Readers</h2></div></div></div><div class="table"><a name="d5e4408" href="#d5e4408"></a><p class="title"><b>Table A.1. Available Item Readers</b></p><div class="table-contents"><table summary="Available Item Readers" style="border-collapse: collapse;border-top: 0.5pt solid ; border-bottom: 0.5pt solid ; border-left: 0.5pt solid ; border-right: 0.5pt solid ; "><colgroup><col align="center"><col></colgroup><thead><tr><th style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; " align="center">Item Reader</th><th style="border-bottom: 0.5pt solid ; " align="center">Description</th></tr></thead><tbody><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; " align="left">AbstractItemCountingItemStreamItemReader</td><td style="border-bottom: 0.5pt solid ; " align="left">Abstract base class that provides basic
|
|
restart capabilities by counting the number of items returned from
|
|
an <code class="classname">ItemReader</code>.</td></tr><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; " align="left">AggregateItemReader</td><td style="border-bottom: 0.5pt solid ; " align="left">An ItemReader that delivers a list as its
|
|
item, storing up objects from the injected ItemReader until they
|
|
are ready to be packed out as a collection. This ItemReader should
|
|
mark the beginning and end of records with the constant values in
|
|
FieldSetMapper AggregateItemReader#<span class="bold"><strong>BEGIN_RECORD</strong></span> and
|
|
AggregateItemReader#<span class="bold"><strong>END_RECORD</strong></span></td></tr><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; " align="left">AmqpItemReader</td><td style="border-bottom: 0.5pt solid ; " align="left">Given a Spring AmqpTemplate it provides
|
|
synchronous receive methods. The receiveAndConvert() method
|
|
lets you receive POJO objects. </td></tr><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; " align="left">FlatFileItemReader</td><td style="border-bottom: 0.5pt solid ; " align="left">Reads from a flat file. Includes ItemStream
|
|
and Skippable functionality. See section on Read from a
|
|
File</td></tr><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; " align="left">HibernateCursorItemReader</td><td style="border-bottom: 0.5pt solid ; " align="left">Reads from a cursor based on an HQL query. See
|
|
section on Reading from a Database</td></tr><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; " align="left">HibernatePagingItemReader</td><td style="border-bottom: 0.5pt solid ; " align="left">Reads from a paginated HQL query</td></tr><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; " align="left">IbatisPagingItemReader</td><td style="border-bottom: 0.5pt solid ; " align="left">Reads via iBATIS based on a query. Pages
|
|
through the rows so that large datasets can be read without
|
|
running out of memory. See HOWTO - Read from a Database. This
|
|
ItemReader is now deprecated as of Spring Batch 3.0.</td></tr><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; " align="left">ItemReaderAdapter</td><td style="border-bottom: 0.5pt solid ; " align="left">Adapts any class to the
|
|
<code class="classname">ItemReader</code> interface.</td></tr><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; " align="left">JdbcCursorItemReader</td><td style="border-bottom: 0.5pt solid ; " align="left">Reads from a database cursor via JDBC. See
|
|
HOWTO - Read from a Database</td></tr><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; " align="left">JdbcPagingItemReader</td><td style="border-bottom: 0.5pt solid ; " align="left">Given a SQL statement, pages through the rows,
|
|
such that large datasets can be read without running out of
|
|
memory</td></tr><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; " align="left">JmsItemReader</td><td style="border-bottom: 0.5pt solid ; " align="left">Given a Spring JmsOperations object and a JMS
|
|
Destination or destination name to send errors, provides items
|
|
received through the injected JmsOperations receive()
|
|
method</td></tr><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; " align="left">JpaPagingItemReader</td><td style="border-bottom: 0.5pt solid ; " align="left">Given a JPQL statement, pages through the
|
|
rows, such that large datasets can be read without running out of
|
|
memory</td></tr><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; " align="left">ListItemReader</td><td style="border-bottom: 0.5pt solid ; " align="left">Provides the items from a list, one at a
|
|
time</td></tr><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; " align="left">MongoItemReader</td><td style="border-bottom: 0.5pt solid ; " align="left">Given a MongoOperations object and JSON based MongoDB
|
|
query, provides items received from the MongoOperations find method</td></tr><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; " align="left">Neo4jItemReader</td><td style="border-bottom: 0.5pt solid ; " align="left">Given a Neo4jOperations object and the components of a
|
|
Cypher query, items are returned as the result of the Neo4jOperations.query
|
|
method</td></tr><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; " align="left">RepositoryItemReader</td><td style="border-bottom: 0.5pt solid ; " align="left">Given a Spring Data PagingAndSortingRepository object,
|
|
a Sort and the name of the method to execute, returns items provided by the
|
|
Spring Data repository implementation</td></tr><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; " align="left">StoredProcedureItemReader</td><td style="border-bottom: 0.5pt solid ; " align="left">Reads from a database cursor resulting from the
|
|
execution of a database stored procedure. See HOWTO - Read from a
|
|
Database</td></tr><tr><td style="border-right: 0.5pt solid ; " align="left">StaxEventItemReader</td><td style="" align="left">Reads via StAX. See HOWTO - Read from a
|
|
File</td></tr></tbody></table></div></div><br class="table-break"></div><div class="section"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="itemWritersAppendix" href="#itemWritersAppendix"></a>A.2 Item Writers</h2></div></div></div><div class="table"><a name="d5e4477" href="#d5e4477"></a><p class="title"><b>Table A.2. Available Item Writers</b></p><div class="table-contents"><table summary="Available Item Writers" style="border-collapse: collapse;border-top: 0.5pt solid ; border-bottom: 0.5pt solid ; border-left: 0.5pt solid ; border-right: 0.5pt solid ; "><colgroup><col align="center"><col></colgroup><thead><tr><th style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; " align="center">Item Writer</th><th style="border-bottom: 0.5pt solid ; " align="center">Description</th></tr></thead><tbody><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; " align="left">AbstractItemStreamItemWriter</td><td style="border-bottom: 0.5pt solid ; " align="left">Abstract base class that combines the
|
|
<code class="classname">ItemStream</code> and
|
|
<code class="classname">ItemWriter</code> interfaces.</td></tr><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; " align="left">AmqpItemWriter</td><td style="border-bottom: 0.5pt solid ; " align="left">Given a Spring AmqpTemplate it provides
|
|
a synchronous send method. The convertAndSend(Object)
|
|
method lets you send POJO objects. </td></tr><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; " align="left">CompositeItemWriter</td><td style="border-bottom: 0.5pt solid ; " align="left">Passes an item to the write method of each
|
|
in an injected <span class="bold"><strong>List</strong></span> of <span class="bold"><strong>ItemWriter</strong></span> objects</td></tr><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; " align="left">FlatFileItemWriter</td><td style="border-bottom: 0.5pt solid ; " align="left">Writes to a flat file. Includes ItemStream and
|
|
Skippable functionality. See section on Writing to a File</td></tr><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; " align="left">GemfireItemWriter</td><td style="border-bottom: 0.5pt solid ; " align="left">Using a GemfireOperations object, items are either written
|
|
or removed from the Gemfire instance based on the configuration of the delete
|
|
flag</td></tr><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; " align="left">HibernateItemWriter</td><td style="border-bottom: 0.5pt solid ; " align="left">This item writer is Hibernate session aware
|
|
and handles some transaction-related work that a non-"hibernate
|
|
aware" item writer would not need to know about and then delegates
|
|
to another item writer to do the actual writing.</td></tr><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; " align="left">IbatisBatchItemWriter</td><td style="border-bottom: 0.5pt solid ; " align="left">Writes items in a batch using the iBATIS APIs
|
|
directly. This ItemWriter is deprecated as of Spring Batch 3.0.</td></tr><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; " align="left">ItemWriterAdapter</td><td style="border-bottom: 0.5pt solid ; " align="left">Adapts any class to the
|
|
<code class="classname">ItemWriter</code> interface.</td></tr><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; " align="left">JdbcBatchItemWriter</td><td style="border-bottom: 0.5pt solid ; " align="left">Uses batching features from a
|
|
<code class="classname">PreparedStatement</code>, if available, and can
|
|
take rudimentary steps to locate a failure during a
|
|
<code class="methodname">flush</code>.</td></tr><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; " align="left">JmsItemWriter</td><td style="border-bottom: 0.5pt solid ; " align="left">Using a JmsOperations object, items are written
|
|
to the default queue via the JmsOperations.convertAndSend() method</td></tr><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; " align="left">JpaItemWriter</td><td style="border-bottom: 0.5pt solid ; " align="left">This item writer is JPA EntityManager aware
|
|
and handles some transaction-related work that a non-"jpa aware"
|
|
<code class="classname">ItemWriter</code> would not need to know about and
|
|
then delegates to another writer to do the actual writing.</td></tr><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; " align="left">MimeMessageItemWriter</td><td style="border-bottom: 0.5pt solid ; " align="left">Using Spring's JavaMailSender, items of type <code class="classname">MimeMessage</code>
|
|
are sent as mail messages</td></tr><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; " align="left">MongoItemWriter</td><td style="border-bottom: 0.5pt solid ; " align="left">Given a MongoOperations object, items are written
|
|
via the MongoOperations.save(Object) method. The actual write is delayed
|
|
until the last possible moment before the transaction commits.</td></tr><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; " align="left">Neo4jItemWriter</td><td style="border-bottom: 0.5pt solid ; " align="left">Given a Neo4jOperations object, items are persisted via the
|
|
save(Object) method or deleted via the delete(Object) method, per the
|
|
<code class="classname">ItemWriter</code>'s configuration</td></tr><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; " align="left">PropertyExtractingDelegatingItemWriter</td><td style="border-bottom: 0.5pt solid ; " align="left">Extends AbstractMethodInvokingDelegator
|
|
creating arguments on the fly. Arguments are created by retrieving
|
|
the values from the fields in the item to be processed (via a
|
|
SpringBeanWrapper) based on an injected array of field
|
|
names</td></tr><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; " align="left">RepositoryItemWriter</td><td style="border-bottom: 0.5pt solid ; " align="left">Given a Spring Data CrudRepository implementation,
|
|
items are saved via the method specified in the configuration.</td></tr><tr><td style="border-right: 0.5pt solid ; " align="left">StaxEventItemWriter</td><td style="" align="left">Uses an <span class="bold"><strong>ObjectToXmlSerializer</strong></span> implementation to
|
|
convert each item to XML and then writes it to an XML file using
|
|
StAX.</td></tr></tbody></table></div></div><br class="table-break"></div></div>
|
|
|
|
<div class="appendix"><div class="titlepage"><div><div><h1 class="title"><a name="metaDataSchema" href="#metaDataSchema"></a>Appendix B. Meta-Data Schema</h1></div></div></div><div class="section"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="metaDataSchemaOverview" href="#metaDataSchemaOverview"></a>B.1 Overview</h2></div></div></div><p>The Spring Batch Meta-Data tables very closely match the Domain
|
|
objects that represent them in Java. For example,
|
|
<code class="classname">JobInstance</code>, <code class="classname">JobExecution</code>,
|
|
<code class="classname">JobParameters</code>, and
|
|
<code class="classname">StepExecution</code> map to BATCH_JOB_INSTANCE,
|
|
BATCH_JOB_EXECUTION, BATCH_JOB_EXECUTION_PARAMS, and BATCH_STEP_EXECUTION,
|
|
respectively. <code class="classname">ExecutionContext</code> maps to both
|
|
BATCH_JOB_EXECUTION_CONTEXT and BATCH_STEP_EXECUTION_CONTEXT. The
|
|
<code class="classname">JobRepository</code> is responsible for saving and storing
|
|
each Java object into its correct table. The following appendix describes
|
|
the meta-data tables in detail, along with many of the design decisions
|
|
that were made when creating them. When viewing the various table creation
|
|
statements below, it is important to realize that the data types used are
|
|
as generic as possible. Spring Batch provides many schemas as examples,
|
|
which all have varying data types due to variations in individual database
|
|
vendors' handling of data types. Below is an ERD model of all 6 tables and
|
|
their relationships to one another:</p><div class="mediaobject" align="center"><img src="images/meta-data-erd.png" align="middle"></div><div class="section"><div class="titlepage"><div><div><h3 class="title"><a name="exampleDDLScripts" href="#exampleDDLScripts"></a>B.1.1 Example DDL Scripts</h3></div></div></div><p>The Spring Batch Core JAR file contains example
|
|
scripts to create the relational tables for a number of database
|
|
platforms (which are in turn auto-detected by the job repository factory
|
|
bean or namespace equivalent). These scripts can be used as is, or
|
|
modified with additional indexes and constraints as desired. The file
|
|
names are in the form <code class="literal">schema-*.sql</code>, where "*" is the
|
|
short name of the target database platform. The scripts are in
|
|
the package <code class="literal">org.springframework.batch.core</code>.</p></div><div class="section"><div class="titlepage"><div><div><h3 class="title"><a name="metaDataVersion" href="#metaDataVersion"></a>B.1.2 Version</h3></div></div></div><p>Many of the database tables discussed in this appendix contain a
|
|
version column. This column is important because Spring Batch employs an
|
|
optimistic locking strategy when dealing with updates to the database.
|
|
This means that each time a record is 'touched' (updated) the value in
|
|
the version column is incremented by one. When the repository goes back
|
|
to try to save the value, if the version number has changed, it will
|
|
throw <code class="classname">OptimisticLockingFailureException</code>,
|
|
indicating there has been an error with concurrent access. This check is
|
|
necessary since, even though different batch jobs may be running in
|
|
different machines, they are all using the same database tables.</p></div><div class="section"><div class="titlepage"><div><div><h3 class="title"><a name="metaDataIdentity" href="#metaDataIdentity"></a>B.1.3 Identity</h3></div></div></div><p>BATCH_JOB_INSTANCE, BATCH_JOB_EXECUTION, and BATCH_STEP_EXECUTION
|
|
each contain columns ending in _ID. These fields act as primary keys for
|
|
their respective tables. However, they are not database generated keys,
|
|
but rather they are generated by separate sequences. This is necessary
|
|
because after inserting one of the domain objects into the database, the
|
|
key it is given needs to be set on the actual object so that it can be
|
|
uniquely identified in Java. Newer database drivers (JDBC 3.0 and up)
|
|
support this feature with database generated keys, but rather than
|
|
requiring it, sequences were used. Each variation of the schema will
|
|
contain some form of the following:</p><pre class="programlisting"><span class="hl-keyword">CREATE</span> <span class="hl-keyword">SEQUENCE</span> BATCH_STEP_EXECUTION_SEQ;
|
|
<span class="hl-keyword">CREATE</span> <span class="hl-keyword">SEQUENCE</span> BATCH_JOB_EXECUTION_SEQ;
|
|
<span class="hl-keyword">CREATE</span> <span class="hl-keyword">SEQUENCE</span> BATCH_JOB_SEQ;</pre><p>Many database vendors don't support sequences. In these cases,
|
|
workarounds are used, such as the following for MySQL:</p><pre class="programlisting"><span class="hl-keyword">CREATE</span> <span class="hl-keyword">TABLE</span> BATCH_STEP_EXECUTION_SEQ (ID <span class="hl-keyword">BIGINT</span> <span class="hl-keyword">NOT</span> <span class="hl-keyword">NULL</span>) <span class="hl-keyword">ENGINE</span>=InnoDB;
|
|
<span class="hl-keyword">INSERT</span> <span class="hl-keyword">INTO</span> BATCH_STEP_EXECUTION_SEQ <span class="hl-keyword">values</span>(<span class="hl-number">0</span>);
|
|
<span class="hl-keyword">CREATE</span> <span class="hl-keyword">TABLE</span> BATCH_JOB_EXECUTION_SEQ (ID <span class="hl-keyword">BIGINT</span> <span class="hl-keyword">NOT</span> <span class="hl-keyword">NULL</span>) <span class="hl-keyword">ENGINE</span>=InnoDB;
|
|
<span class="hl-keyword">INSERT</span> <span class="hl-keyword">INTO</span> BATCH_JOB_EXECUTION_SEQ <span class="hl-keyword">values</span>(<span class="hl-number">0</span>);
|
|
<span class="hl-keyword">CREATE</span> <span class="hl-keyword">TABLE</span> BATCH_JOB_SEQ (ID <span class="hl-keyword">BIGINT</span> <span class="hl-keyword">NOT</span> <span class="hl-keyword">NULL</span>) <span class="hl-keyword">ENGINE</span>=InnoDB;
|
|
<span class="hl-keyword">INSERT</span> <span class="hl-keyword">INTO</span> BATCH_JOB_SEQ <span class="hl-keyword">values</span>(<span class="hl-number">0</span>);</pre><p>In the above case, a table is used in place of each sequence. The
|
|
Spring core class <code class="classname">MySQLMaxValueIncrementer</code> will
|
|
then increment the one column in this sequence in order to give similar
|
|
functionality.</p></div></div><div class="section"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="metaDataBatchJobInstance" href="#metaDataBatchJobInstance"></a>B.2 BATCH_JOB_INSTANCE</h2></div></div></div><p>The BATCH_JOB_INSTANCE table holds all information relevant to a
|
|
<code class="classname">JobInstance</code>, and serves as the top of the overall
|
|
hierarchy. The following generic DDL statement is used to create
|
|
it:</p><pre class="programlisting"><span class="hl-keyword">CREATE</span> <span class="hl-keyword">TABLE</span> BATCH_JOB_INSTANCE (
|
|
JOB_INSTANCE_ID <span class="hl-keyword">BIGINT</span> <span class="hl-keyword">PRIMARY</span> <span class="hl-keyword">KEY</span> ,
|
|
VERSION <span class="hl-keyword">BIGINT</span>,
|
|
JOB_NAME <span class="hl-keyword">VARCHAR</span>(<span class="hl-number">100</span>) <span class="hl-keyword">NOT</span> <span class="hl-keyword">NULL</span> ,
|
|
JOB_KEY <span class="hl-keyword">VARCHAR</span>(<span class="hl-number">2500</span>)
|
|
);</pre><p>Below are descriptions of each column in the table:</p><div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; "><li class="listitem"><p>JOB_INSTANCE_ID: The unique id that will identify the instance,
|
|
which is also the primary key. The value of this column should be
|
|
obtainable by calling the <code class="methodname">getId</code> method on
|
|
<code class="classname">JobInstance</code>.</p></li><li class="listitem"><p>VERSION: See above section.</p></li><li class="listitem"><p>JOB_NAME: Name of the job obtained from the
|
|
<code class="classname">Job</code> object. Because it is required to identify
|
|
the instance, it must not be null.</p></li><li class="listitem"><p>JOB_KEY: A serialization of the
|
|
<code class="classname">JobParameters</code> that uniquely identifies separate
|
|
instances of the same job from one another.
|
|
(<code class="classname">JobInstances</code> with the same job name must have
|
|
different <code class="classname">JobParameters</code>, and thus, different
|
|
JOB_KEY values).</p></li></ul></div></div><div class="section"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="metaDataBatchJobParams" href="#metaDataBatchJobParams"></a>B.3 BATCH_JOB_EXECUTION_PARAMS</h2></div></div></div><p>The BATCH_JOB_EXECUTION_PARAMS table holds all information relevant to the
|
|
<code class="classname">JobParameters</code> object. It contains 0 or more
|
|
key/value pairs passed to a <code class="classname">Job</code> and serves as a record of the parameters
|
|
a job was run with. For each parameter that contributes to the generation of a job's identity,
|
|
the IDENTIFYING flag is set to true. It should be noted that the table has been
|
|
denormalized. Rather than creating a separate table for each type, there
|
|
is one table with a column indicating the type:</p><pre class="programlisting"><span class="hl-keyword">CREATE</span> <span class="hl-keyword">TABLE</span> BATCH_JOB_EXECUTION_PARAMS (
|
|
JOB_EXECUTION_ID <span class="hl-keyword">BIGINT</span> <span class="hl-keyword">NOT</span> <span class="hl-keyword">NULL</span> ,
|
|
TYPE_CD <span class="hl-keyword">VARCHAR</span>(<span class="hl-number">6</span>) <span class="hl-keyword">NOT</span> <span class="hl-keyword">NULL</span> ,
|
|
KEY_NAME <span class="hl-keyword">VARCHAR</span>(<span class="hl-number">100</span>) <span class="hl-keyword">NOT</span> <span class="hl-keyword">NULL</span> ,
|
|
STRING_VAL <span class="hl-keyword">VARCHAR</span>(<span class="hl-number">250</span>) ,
|
|
DATE_VAL DATETIME <span class="hl-keyword">DEFAULT</span> <span class="hl-keyword">NULL</span> ,
|
|
LONG_VAL <span class="hl-keyword">BIGINT</span> ,
|
|
DOUBLE_VAL <span class="hl-keyword">DOUBLE</span> <span class="hl-keyword">PRECISION</span> ,
|
|
IDENTIFYING <span class="hl-keyword">CHAR</span>(<span class="hl-number">1</span>) <span class="hl-keyword">NOT</span> <span class="hl-keyword">NULL</span> ,
|
|
<span class="hl-keyword">constraint</span> JOB_EXEC_PARAMS_FK <span class="hl-keyword">foreign</span> <span class="hl-keyword">key</span> (JOB_EXECUTION_ID)
|
|
<span class="hl-keyword">references</span> BATCH_JOB_EXECUTION(JOB_EXECUTION_ID)
|
|
);</pre><p>Below are descriptions for each column:</p><div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; "><li class="listitem"><p>JOB_EXECUTION_ID: Foreign Key from the BATCH_JOB_EXECUTION table
|
|
that indicates the job execution the parameter entry belongs to. It
|
|
should be noted that multiple rows (i.e., key/value pairs) may exist for
|
|
each execution.</p></li><li class="listitem"><p>TYPE_CD: String representation of the type of value stored,
|
|
which can be either a string, date, long, or double. Because the type
|
|
must be known, it cannot be null.</p></li><li class="listitem"><p>KEY_NAME: The parameter key.</p></li><li class="listitem"><p>STRING_VAL: Parameter value, if the type is string.</p></li><li class="listitem"><p>DATE_VAL: Parameter value, if the type is date.</p></li><li class="listitem"><p>LONG_VAL: Parameter value, if the type is a long.</p></li><li class="listitem"><p>DOUBLE_VAL: Parameter value, if the type is double.</p></li><li class="listitem"><p>IDENTIFYING: Flag indicating if the parameter contributed to the identity of the related <code class="classname">JobInstance</code>.</p></li></ul></div><p>It is worth noting that there is no primary key for this table. This
|
|
is simply because the framework has no use for one, and thus doesn't
|
|
require it. If a user so chooses, one may be added with a database
|
|
generated key, without causing any issues to the framework itself.</p></div><div class="section"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="metaDataBatchJobExecution" href="#metaDataBatchJobExecution"></a>B.4 BATCH_JOB_EXECUTION</h2></div></div></div><p>The BATCH_JOB_EXECUTION table holds all information relevant to the
|
|
<code class="classname">JobExecution</code> object. Every time a
|
|
<code class="classname">Job</code> is run there will always be a new
|
|
<code class="classname">JobExecution</code>, and a new row in this table:</p><pre class="programlisting"><span class="hl-keyword">CREATE</span> <span class="hl-keyword">TABLE</span> BATCH_JOB_EXECUTION (
|
|
JOB_EXECUTION_ID <span class="hl-keyword">BIGINT</span> <span class="hl-keyword">PRIMARY</span> <span class="hl-keyword">KEY</span> ,
|
|
VERSION <span class="hl-keyword">BIGINT</span>,
|
|
JOB_INSTANCE_ID <span class="hl-keyword">BIGINT</span> <span class="hl-keyword">NOT</span> <span class="hl-keyword">NULL</span>,
|
|
CREATE_TIME <span class="hl-keyword">TIMESTAMP</span> <span class="hl-keyword">NOT</span> <span class="hl-keyword">NULL</span>,
|
|
START_TIME <span class="hl-keyword">TIMESTAMP</span> <span class="hl-keyword">DEFAULT</span> <span class="hl-keyword">NULL</span>,
|
|
END_TIME <span class="hl-keyword">TIMESTAMP</span> <span class="hl-keyword">DEFAULT</span> <span class="hl-keyword">NULL</span>,
|
|
STATUS <span class="hl-keyword">VARCHAR</span>(<span class="hl-number">10</span>),
|
|
EXIT_CODE <span class="hl-keyword">VARCHAR</span>(<span class="hl-number">20</span>),
|
|
EXIT_MESSAGE <span class="hl-keyword">VARCHAR</span>(<span class="hl-number">2500</span>),
|
|
LAST_UPDATED <span class="hl-keyword">TIMESTAMP</span>,
|
|
JOB_CONFIGURATION_LOCATION <span class="hl-keyword">VARCHAR</span>(<span class="hl-number">2500</span>) <span class="hl-keyword">NULL</span>,
|
|
<span class="hl-keyword">constraint</span> JOB_INSTANCE_EXECUTION_FK <span class="hl-keyword">foreign</span> <span class="hl-keyword">key</span> (JOB_INSTANCE_ID)
|
|
<span class="hl-keyword">references</span> BATCH_JOB_INSTANCE(JOB_INSTANCE_ID)
|
|
) ;</pre><p>Below are descriptions for each column:</p><div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; "><li class="listitem"><p>JOB_EXECUTION_ID: Primary key that uniquely identifies this
|
|
execution. The value of this column is obtainable by calling the
|
|
<code class="methodname">getId</code> method of the
|
|
<code class="classname">JobExecution</code> object.</p></li><li class="listitem"><p>VERSION: See above section.</p></li><li class="listitem"><p>JOB_INSTANCE_ID: Foreign key from the BATCH_JOB_INSTANCE table
|
|
indicating the instance to which this execution belongs. There may be
|
|
more than one execution per instance.</p></li><li class="listitem"><p>CREATE_TIME: Timestamp representing the time that the execution
|
|
was created.</p></li><li class="listitem"><p>START_TIME: Timestamp representing the time the execution was
|
|
started.</p></li><li class="listitem"><p>END_TIME: Timestamp representing the time the execution was
|
|
finished, regardless of success or failure. An empty value in this
|
|
column even though the job is not currently running indicates that
|
|
there has been some type of error and the framework was unable to
|
|
perform a last save before failing.</p></li><li class="listitem"><p>STATUS: Character string representing the status of the
|
|
execution. This may be COMPLETED, STARTED, etc. The object
|
|
representation of this column is the
|
|
<code class="classname">BatchStatus</code> enumeration.</p></li><li class="listitem"><p>EXIT_CODE: Character string representing the exit code of the
|
|
execution. In the case of a command line job, this may be converted
|
|
into a number.</p></li><li class="listitem"><p>EXIT_MESSAGE: Character string representing a more detailed
|
|
description of how the job exited. In the case of failure, this might
|
|
include as much of the stack trace as is possible.</p></li><li class="listitem"><p>LAST_UPDATED: Timestamp representing the last time this
|
|
execution was persisted.</p></li></ul></div></div><div class="section"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="metaDataBatchStepExecution" href="#metaDataBatchStepExecution"></a>B.5 BATCH_STEP_EXECUTION</h2></div></div></div><p>The BATCH_STEP_EXECUTION table holds all information relevant to the
|
|
<code class="classname">StepExecution</code> object. This table is very similar in
|
|
many ways to the BATCH_JOB_EXECUTION table and there will always be at
|
|
least one entry per <code class="classname">Step</code> for each
|
|
<code class="classname">JobExecution</code> created:</p><pre class="programlisting"><span class="hl-keyword">CREATE</span> <span class="hl-keyword">TABLE</span> BATCH_STEP_EXECUTION (
  STEP_EXECUTION_ID <span class="hl-keyword">BIGINT</span> <span class="hl-keyword">PRIMARY</span> <span class="hl-keyword">KEY</span> ,
  VERSION <span class="hl-keyword">BIGINT</span> <span class="hl-keyword">NOT</span> <span class="hl-keyword">NULL</span>,
  STEP_NAME <span class="hl-keyword">VARCHAR</span>(<span class="hl-number">100</span>) <span class="hl-keyword">NOT</span> <span class="hl-keyword">NULL</span>,
  JOB_EXECUTION_ID <span class="hl-keyword">BIGINT</span> <span class="hl-keyword">NOT</span> <span class="hl-keyword">NULL</span>,
  START_TIME <span class="hl-keyword">TIMESTAMP</span> <span class="hl-keyword">NOT</span> <span class="hl-keyword">NULL</span> ,
  END_TIME <span class="hl-keyword">TIMESTAMP</span> <span class="hl-keyword">DEFAULT</span> <span class="hl-keyword">NULL</span>,
  STATUS <span class="hl-keyword">VARCHAR</span>(<span class="hl-number">10</span>),
  COMMIT_COUNT <span class="hl-keyword">BIGINT</span> ,
  READ_COUNT <span class="hl-keyword">BIGINT</span> ,
  FILTER_COUNT <span class="hl-keyword">BIGINT</span> ,
  WRITE_COUNT <span class="hl-keyword">BIGINT</span> ,
  READ_SKIP_COUNT <span class="hl-keyword">BIGINT</span> ,
  WRITE_SKIP_COUNT <span class="hl-keyword">BIGINT</span> ,
  PROCESS_SKIP_COUNT <span class="hl-keyword">BIGINT</span> ,
  ROLLBACK_COUNT <span class="hl-keyword">BIGINT</span> ,
  EXIT_CODE <span class="hl-keyword">VARCHAR</span>(<span class="hl-number">20</span>) ,
  EXIT_MESSAGE <span class="hl-keyword">VARCHAR</span>(<span class="hl-number">2500</span>) ,
  LAST_UPDATED <span class="hl-keyword">TIMESTAMP</span>,
  <span class="hl-keyword">constraint</span> JOB_EXECUTION_STEP_FK <span class="hl-keyword">foreign</span> <span class="hl-keyword">key</span> (JOB_EXECUTION_ID)
  <span class="hl-keyword">references</span> BATCH_JOB_EXECUTION(JOB_EXECUTION_ID)
) ;</pre><p>Below are descriptions for each column:</p><div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; "><li class="listitem"><p>STEP_EXECUTION_ID: Primary key that uniquely identifies this
|
|
execution. The value of this column should be obtainable by calling
|
|
the <code class="methodname">getId</code> method of the
|
|
<code class="classname">StepExecution</code> object.</p></li><li class="listitem"><p>VERSION: See above section.</p></li><li class="listitem"><p>STEP_NAME: The name of the step to which this execution
|
|
belongs.</p></li><li class="listitem"><p>JOB_EXECUTION_ID: Foreign key from the BATCH_JOB_EXECUTION table
|
|
indicating the JobExecution to which this StepExecution belongs. There
|
|
may be only one <code class="classname">StepExecution</code> for a given
|
|
<code class="classname">JobExecution</code> for a given
|
|
<code class="classname">Step</code> name.</p></li><li class="listitem"><p>START_TIME: Timestamp representing the time the execution was
|
|
started.</p></li><li class="listitem"><p>END_TIME: Timestamp representing the time the execution was
|
|
finished, regardless of success or failure. An empty value in this
|
|
column even though the job is not currently running indicates that
|
|
there has been some type of error and the framework was unable to
|
|
perform a last save before failing.</p></li><li class="listitem"><p>STATUS: Character string representing the status of the
|
|
execution. This may be COMPLETED, STARTED, etc. The object
|
|
representation of this column is the
|
|
<code class="classname">BatchStatus</code> enumeration.</p></li><li class="listitem"><p>COMMIT_COUNT: The number of times in which the step has
|
|
committed a transaction during this execution.</p></li><li class="listitem"><p>READ_COUNT: The number of items read during this
|
|
execution.</p></li><li class="listitem"><p>FILTER_COUNT: The number of items filtered out of this
|
|
execution.</p></li><li class="listitem"><p>WRITE_COUNT: The number of items written and committed during
|
|
this execution.</p></li><li class="listitem"><p>READ_SKIP_COUNT: The number of items skipped on read during this
|
|
execution.</p></li><li class="listitem"><p>WRITE_SKIP_COUNT: The number of items skipped on write during
|
|
this execution.</p></li><li class="listitem"><p>PROCESS_SKIP_COUNT: The number of items skipped during
|
|
processing during this execution.</p></li><li class="listitem"><p>ROLLBACK_COUNT: The number of rollbacks during this execution.
|
|
Note that this count includes each time rollback occurs, including
|
|
rollbacks for retry and those in the skip recovery procedure.</p></li><li class="listitem"><p>EXIT_CODE: Character string representing the exit code of the
|
|
execution. In the case of a command line job, this may be converted
|
|
into a number.</p></li><li class="listitem"><p>EXIT_MESSAGE: Character string representing a more detailed
|
|
description of how the job exited. In the case of failure, this might
|
|
include as much of the stack trace as is possible.</p></li><li class="listitem"><p>LAST_UPDATED: Timestamp representing the last time this
|
|
execution was persisted.</p></li></ul></div></div><div class="section"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="metaDataBatchJobExecutionContext" href="#metaDataBatchJobExecutionContext"></a>B.6 BATCH_JOB_EXECUTION_CONTEXT</h2></div></div></div><p>The BATCH_JOB_EXECUTION_CONTEXT table holds all information relevant
|
|
to a <code class="classname">Job</code>'s
|
|
<code class="classname">ExecutionContext</code>. There is exactly one
|
|
<code class="classname">Job</code> <code class="classname">ExecutionContext</code> per
|
|
<code class="classname">JobExecution</code>, and it contains all of the job-level
|
|
data that is needed for a particular job execution. This data typically
|
|
represents the state that must be retrieved after a failure so that a
|
|
<code class="classname">JobInstance</code> can 'start from where it left
|
|
off'.</p><pre class="programlisting"><span class="hl-keyword">CREATE</span> <span class="hl-keyword">TABLE</span> BATCH_JOB_EXECUTION_CONTEXT (
  JOB_EXECUTION_ID <span class="hl-keyword">BIGINT</span> <span class="hl-keyword">PRIMARY</span> <span class="hl-keyword">KEY</span>,
  SHORT_CONTEXT <span class="hl-keyword">VARCHAR</span>(<span class="hl-number">2500</span>) <span class="hl-keyword">NOT</span> <span class="hl-keyword">NULL</span>,
  SERIALIZED_CONTEXT <span class="hl-keyword">CLOB</span>,
  <span class="hl-keyword">constraint</span> JOB_EXEC_CTX_FK <span class="hl-keyword">foreign</span> <span class="hl-keyword">key</span> (JOB_EXECUTION_ID)
  <span class="hl-keyword">references</span> BATCH_JOB_EXECUTION(JOB_EXECUTION_ID)
) ;</pre><p>Below are descriptions for each column:</p><div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; "><li class="listitem"><p>JOB_EXECUTION_ID: Foreign key representing the
|
|
<code class="classname">JobExecution</code> to which the context belongs.
|
|
Because this column is also the primary key, there is at most one row for a given execution.</p></li><li class="listitem"><p>SHORT_CONTEXT: A string version of the
|
|
SERIALIZED_CONTEXT.</p></li><li class="listitem"><p>SERIALIZED_CONTEXT: The entire context, serialized.</p></li></ul></div></div><div class="section"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="metaDataBatchStepExecutionContext" href="#metaDataBatchStepExecutionContext"></a>B.7 BATCH_STEP_EXECUTION_CONTEXT</h2></div></div></div><p>The BATCH_STEP_EXECUTION_CONTEXT table holds all information
|
|
relevant to a <code class="classname">Step</code>'s
|
|
<code class="classname">ExecutionContext</code>. There is exactly one
|
|
<code class="classname">ExecutionContext</code> per
|
|
<code class="classname">StepExecution</code>, and it contains all of the data that
|
|
needs to be persisted for a particular step execution. This data typically
|
|
represents the state that must be retrieved after a failure so that a
|
|
<code class="classname">JobInstance</code> can 'start from where it left
|
|
off'.</p><pre class="programlisting"><span class="hl-keyword">CREATE</span> <span class="hl-keyword">TABLE</span> BATCH_STEP_EXECUTION_CONTEXT (
  STEP_EXECUTION_ID <span class="hl-keyword">BIGINT</span> <span class="hl-keyword">PRIMARY</span> <span class="hl-keyword">KEY</span>,
  SHORT_CONTEXT <span class="hl-keyword">VARCHAR</span>(<span class="hl-number">2500</span>) <span class="hl-keyword">NOT</span> <span class="hl-keyword">NULL</span>,
  SERIALIZED_CONTEXT <span class="hl-keyword">CLOB</span>,
  <span class="hl-keyword">constraint</span> STEP_EXEC_CTX_FK <span class="hl-keyword">foreign</span> <span class="hl-keyword">key</span> (STEP_EXECUTION_ID)
  <span class="hl-keyword">references</span> BATCH_STEP_EXECUTION(STEP_EXECUTION_ID)
) ;</pre><p>Below are descriptions for each column:</p><div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; "><li class="listitem"><p>STEP_EXECUTION_ID: Foreign key representing the
|
|
<code class="classname">StepExecution</code> to which the context belongs.
|
|
Because this column is also the primary key, there is at most one row for a given execution.</p></li><li class="listitem"><p>SHORT_CONTEXT: A string version of the
|
|
SERIALIZED_CONTEXT.</p></li><li class="listitem"><p>SERIALIZED_CONTEXT: The entire context, serialized.</p></li></ul></div></div><div class="section"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="metaDataArchiving" href="#metaDataArchiving"></a>B.8 Archiving</h2></div></div></div><p>Because there are entries in multiple tables every time a batch job
|
|
is run, it is common to create an archive strategy for the meta-data
|
|
tables. The tables themselves are designed to show a record of what
|
|
happened in the past, and generally won't affect the run of any job, with
|
|
a couple of notable exceptions pertaining to restart:</p><div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; "><li class="listitem"><p>The framework will use the meta-data tables to determine if a
|
|
particular JobInstance has been run before. If it has been run, and
|
|
the job is not restartable, then an exception will be thrown.</p></li><li class="listitem"><p>If an entry for a JobInstance is removed without having
|
|
completed successfully, the framework will think that the job is new,
|
|
rather than a restart.</p></li><li class="listitem"><p>If a job is restarted, the framework will use any data that has
|
|
been persisted to the ExecutionContext to restore the Job's state.
|
|
Therefore, removing any entries from this table for jobs that haven't
|
|
completed successfully will prevent them from starting at the correct
|
|
point if run again.</p></li></ul></div></div><div class="section"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="multiByteCharacters" href="#multiByteCharacters"></a>B.9 International and Multi-byte Characters</h2></div></div></div><p>If you are using multi-byte character sets (e.g. Chinese or Cyrillic)
|
|
in your business processing, then those characters might need to be
|
|
persisted in the Spring Batch schema. Many users find that
|
|
simply changing the schema to double the length of the <code class="literal">VARCHAR</code>
|
|
columns is enough. Others prefer to configure the <a class="link" href="#configuringJobRepository" title="4.3 Configuring a JobRepository"><code class="classname">JobRepository</code></a> with <code class="literal">max-varchar-length</code> set to half the value of the <code class="literal">VARCHAR</code> column length. Some users have also reported that
|
|
they use <code class="literal">NVARCHAR</code> in place of <code class="literal">VARCHAR</code>
|
|
in their schema definitions. The best result will depend on the database
|
|
platform and the way the database server has been configured locally.</p></div><div class="section"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="recommendationsForIndexingMetaDataTables" href="#recommendationsForIndexingMetaDataTables"></a>B.10 Recommendations for Indexing Meta Data Tables</h2></div></div></div><p>Spring Batch provides DDL samples for the meta-data tables in the
|
|
Core jar file for several common database platforms. Index declarations
|
|
are not included in that DDL because there are too many variations in how
|
|
users may want to index depending on their precise platform, local
|
|
conventions and also the business requirements of how the jobs will be
|
|
operated. The table below provides some indication as to which columns are
|
|
going to be used in a WHERE clause by the Dao implementations provided by
|
|
Spring Batch, and how frequently they might be used, so that individual
|
|
projects can make up their own minds about indexing.</p><div class="table"><a name="d5e4771" href="#d5e4771"></a><p class="title"><b>Table B.1. Where clauses in SQL statements (excluding primary keys) and
|
|
their approximate frequency of use.</b></p><div class="table-contents"><table summary="Where clauses in SQL statements (excluding primary keys) and
 their approximate frequency of use." style="border-collapse: collapse;border-top: 0.5pt solid ; border-bottom: 0.5pt solid ; border-left: 0.5pt solid ; border-right: 0.5pt solid ; "><colgroup><col><col><col></colgroup><tbody><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; ">Default Table Name</td><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; ">Where Clause</td><td style="border-bottom: 0.5pt solid ; ">Frequency</td></tr><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; ">BATCH_JOB_INSTANCE</td><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; ">JOB_NAME = ? and JOB_KEY = ?</td><td style="border-bottom: 0.5pt solid ; ">Every time a job is launched</td></tr><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; ">BATCH_JOB_EXECUTION</td><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; ">JOB_INSTANCE_ID = ?</td><td style="border-bottom: 0.5pt solid ; ">Every time a job is restarted</td></tr><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; ">BATCH_EXECUTION_CONTEXT</td><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; ">EXECUTION_ID = ? and KEY_NAME = ?</td><td style="border-bottom: 0.5pt solid ; ">On commit interval, a.k.a. chunk</td></tr><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; ">BATCH_STEP_EXECUTION</td><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; ">VERSION = ?</td><td style="border-bottom: 0.5pt solid ; ">On commit interval, a.k.a. chunk (and at start and end of
|
|
step)</td></tr><tr><td style="border-right: 0.5pt solid ; ">BATCH_STEP_EXECUTION</td><td style="border-right: 0.5pt solid ; ">STEP_NAME = ? and JOB_EXECUTION_ID = ?</td><td style="">Before each step execution</td></tr></tbody></table></div></div><br class="table-break"></div></div>
|
|
|
|
<div class="appendix"><div class="titlepage"><div><div><h1 class="title"><a name="transactions" href="#transactions"></a>Appendix C. Batch Processing and Transactions</h1></div></div></div><div class="section"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="transactionsNoRetry" href="#transactionsNoRetry"></a>C.1 Simple Batching with No Retry</h2></div></div></div><p>Consider the following simple example of a nested batch with no
|
|
retries. This is a very common scenario for batch processing, where
|
|
an input source is processed until exhausted, but we commit
|
|
periodically at the end of a "chunk" of processing.</p><pre class="programlisting">
1   |  REPEAT(until=exhausted) {
    |
2   |    TX {
3   |      REPEAT(size=5) {
3.1 |        input;
3.2 |        output;
    |      }
    |    }
    |
    |  }
</pre><p>The input operation (3.1) could be a message-based receive
|
|
(e.g. JMS), or a file-based read, but to recover and continue
|
|
processing with a chance of completing the whole job, it must be
|
|
transactional. The same applies to the operation at (3.2) - it must
|
|
be either transactional or idempotent.</p><p>If the chunk at REPEAT(3) fails because of a database exception at
|
|
(3.2), then TX(2) will roll back the whole chunk.</p></div><div class="section"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="transactionStatelessRetry" href="#transactionStatelessRetry"></a>C.2 Simple Stateless Retry</h2></div></div></div><p>It is also useful to use a retry for an operation which is not
|
|
transactional, like a call to a web-service or other remote
|
|
resource. For example:</p><pre class="programlisting">
0   |  TX {
1   |    input;
1.1 |    output;
2   |    RETRY {
2.1 |      remote access;
    |    }
    |  }
</pre><p>This is actually one of the most useful applications of a retry,
|
|
since a remote call is much more likely to fail and be retryable
|
|
than a database update. As long as the remote access (2.1)
|
|
eventually succeeds, the transaction TX(0) will commit. If the
|
|
remote access (2.1) eventually fails, then the transaction TX(0) is
|
|
guaranteed to roll back.</p></div><div class="section"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="repeatRetry" href="#repeatRetry"></a>C.3 Typical Repeat-Retry Pattern</h2></div></div></div><p>The most typical batch processing pattern is to add a retry to the
|
|
inner block of the chunk in the Simple Batching example.
|
|
Consider this:</p><pre class="programlisting">
1   |  REPEAT(until=exhausted, exception=not critical) {
    |
2   |    TX {
3   |      REPEAT(size=5) {
    |
4   |        RETRY(stateful, exception=deadlock loser) {
4.1 |          input;
5   |        } PROCESS {
5.1 |          output;
6   |        } SKIP and RECOVER {
    |          notify;
    |        }
    |
    |      }
    |    }
    |
    |  }
</pre><p>The inner RETRY(4) block is marked as "stateful" - see the
|
|
typical use case for a description of a stateful
|
|
retry. This means that if the retry PROCESS(5) block fails, the
|
|
behaviour of the RETRY(4) is as follows.</p><div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; "><li class="listitem"><p>Throw an exception, rolling back the transaction TX(2) at the
|
|
chunk level, and allowing the item to be re-presented to the input
|
|
queue.</p></li><li class="listitem"><p>When the item re-appears, it might be retried depending on the
|
|
retry policy in place, executing PROCESS(5) again. The second and
|
|
subsequent attempts might fail again and rethrow the exception.</p></li><li class="listitem"><p>Eventually the item re-appears for the final time: the retry
|
|
policy disallows another attempt, so PROCESS(5) is never
|
|
executed. In this case we follow a RECOVER(6) path, effectively
|
|
"skipping" the item that was received and is being processed.</p></li></ul></div><p>Notice that the notation used for the RETRY(4) in the plan above
|
|
shows explicitly that the input step (4.1) is part of the retry.
|
|
It also makes clear that there are two alternate paths for
|
|
processing: the normal case is denoted by PROCESS(5), and the
|
|
recovery path is a separate block, RECOVER(6). The two alternate
|
|
paths are completely distinct: only one is ever taken in normal
|
|
circumstances.</p><p>In special cases (e.g. a special <code class="classname">TranscationValidException</code>
|
|
type), the retry policy might be able to determine that the
|
|
RECOVER(6) path can be taken on the last attempt after PROCESS(5)
|
|
has just failed, instead of waiting for the item to be re-presented.
|
|
This is not the default behavior because it requires detailed
|
|
knowledge of what has happened inside the PROCESS(5) block, which is
|
|
not usually available - e.g. if the output included write
|
|
access before the failure, then the exception should be rethrown to
|
|
ensure transactional integrity.</p><p>The completion policy in the outer, REPEAT(1) is crucial to the
|
|
success of the above plan. If the output(5.1) fails it may throw an
|
|
exception (it usually does, as described), in which case the
|
|
transaction TX(2) fails and the exception could propagate up through
|
|
the outer batch REPEAT(1). We do not want the whole batch to stop
|
|
because the RETRY(4) might still be successful if we try again, so
|
|
we add the exception=not critical to the outer REPEAT(1).</p><p>Note, however, that if the TX(2) fails and we <span class="emphasis"><em>do</em></span> try again, by
|
|
virtue of the outer completion policy, the item that is next
|
|
processed in the inner REPEAT(3) is not guaranteed to be the one
|
|
that just failed. It might well be, but it depends on the
|
|
implementation of the input(4.1). Thus the output(5.1) might fail
|
|
again, on a new item, or on the old one. The client of the batch
|
|
should not assume that each RETRY(4) attempt is going to process the
|
|
same items as the last one that failed. E.g. if the termination
|
|
policy for REPEAT(1) is to fail after 10 attempts, it will fail
|
|
after 10 consecutive attempts, but not necessarily at the same item.
|
|
This is consistent with the overall retry strategy: it is the inner
|
|
RETRY(4) that is aware of the history of each item, and can decide
|
|
whether or not to have another attempt at it.</p></div><div class="section"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="asyncChunkProcessing" href="#asyncChunkProcessing"></a>C.4 Asynchronous Chunk Processing</h2></div></div></div><p>The inner batches or chunks in the typical example
|
|
above can be executed concurrently by configuring the outer batch to
|
|
use an <code class="classname">AsyncTaskExecutor</code>. The outer batch waits for all the
|
|
chunks to complete before completing.</p><pre class="programlisting">
1   |  REPEAT(until=exhausted, concurrent, exception=not critical) {
    |
2   |    TX {
3   |      REPEAT(size=5) {
    |
4   |        RETRY(stateful, exception=deadlock loser) {
4.1 |          input;
5   |        } PROCESS {
    |          output;
6   |        } RECOVER {
    |          recover;
    |        }
    |
    |      }
    |    }
    |
    |  }
</pre></div><div class="section"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="asyncItemProcessing" href="#asyncItemProcessing"></a>C.5 Asynchronous Item Processing</h2></div></div></div><p>The individual items in chunks in the typical
|
|
example above can also, in principle, be processed concurrently. In this case the
|
|
transaction boundary has to move to the level of the individual
|
|
item, so that each transaction is on a single thread:
|
|
</p><pre class="programlisting">
1   |  REPEAT(until=exhausted, exception=not critical) {
    |
2   |    REPEAT(size=5, concurrent) {
    |
3   |      TX {
4   |        RETRY(stateful, exception=deadlock loser) {
4.1 |          input;
5   |        } PROCESS {
    |          output;
6   |        } RECOVER {
    |          recover;
    |        }
    |      }
    |
    |    }
    |
    |  }
</pre><p>This plan sacrifices the optimisation benefit that the simple plan had:
all the transactional resources chunked together. It
|
|
is only useful if the cost of the processing (5) is much higher than
|
|
the cost of transaction management (3).</p></div><div class="section"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="transactionPropagation" href="#transactionPropagation"></a>C.6 Interactions Between Batching and Transaction Propagation</h2></div></div></div><p>There is a tighter coupling between batch-retry and TX management
|
|
than we would ideally like. In particular a stateless retry cannot
|
|
be used to retry database operations with a transaction manager that
|
|
doesn't support NESTED propagation.
|
|
</p><p>For a simple example using retry without repeat, consider this:</p><pre class="programlisting">
1   |  TX {
    |
1.1 |    input;
1.2 |    database access;
2   |    RETRY {
3   |      TX {
3.1 |        database access;
    |      }
    |    }
    |
    |  }
</pre><p>With default propagation, the inner transaction TX(3) participates in
the outer transaction TX(1), so a rollback of TX(3) can cause TX(1) to fail,
even if the RETRY(2) is eventually successful.</p><p>Unfortunately the same effect percolates from the retry block up to
|
|
the surrounding repeat batch if there is one:</p><pre class="programlisting">
1   |  TX {
    |
2   |    REPEAT(size=5) {
2.1 |      input;
2.2 |      database access;
3   |      RETRY {
4   |        TX {
4.1 |          database access;
    |        }
    |      }
    |    }
    |
    |  }
</pre><p>Now if TX(4) rolls back it can pollute the whole batch at TX(1) and
force it to roll back at the end.</p><p>What about non-default propagation?</p><div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; "><li class="listitem"><p>In the last example PROPAGATION_REQUIRES_NEW at TX(4) will
prevent the outer TX(1) from being polluted if both transactions
are eventually successful. But if TX(4) commits and TX(1) rolls
back, then TX(4) stays committed, so we violate the transaction
contract for TX(1). If TX(4) rolls back, TX(1) does not necessarily roll back (but it probably
will in practice because the retry will throw a rollback
exception).</p></li><li class="listitem"><p>PROPAGATION_NESTED at TX(4) works as we require in the retry
case (and for a batch with skips): TX(4) can commit, but
subsequently be rolled back by the outer transaction TX(1). If
TX(4) rolls back, again TX(1) will roll back in practice. This
|
|
option is only available on some platforms, e.g. not Hibernate or
|
|
JTA, but it is the only one that works consistently.</p></li></ul></div><p>So NESTED is best if the retry block contains any database access.</p></div><div class="section"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="specialTransactionOrthonogonal" href="#specialTransactionOrthonogonal"></a>C.7 Special Case: Transactions with Orthogonal Resources</h2></div></div></div><p>Default propagation is always OK for simple cases where there are no
|
|
nested database transactions. Consider this (where the SESSION and
|
|
TX are not global XA resources, so their resources are orthogonal):
|
|
</p><pre class="programlisting">
0   |  SESSION {
1   |    input;
2   |    RETRY {
3   |      TX {
3.1 |        database access;
    |      }
    |    }
    |  }
</pre><p>Here there is a transactional message SESSION(0), but it doesn't
|
|
participate in other transactions with
|
|
<code class="classname">PlatformTransactionManager</code>, so doesn't propagate when TX(3)
|
|
starts. There is no database access outside the RETRY(2) block. If
|
|
TX(3) fails and then eventually succeeds on a retry, SESSION(0) can
|
|
commit (it can do this independent of a TX block). This is similar
|
|
to the vanilla "best-efforts-one-phase-commit" scenario - the worst
|
|
that can happen is a duplicate message when the RETRY(2) succeeds
|
|
and the SESSION(0) cannot commit, e.g. because the message system is
|
|
unavailable.</p></div><div class="section"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="statelessRetryCannotRecover" href="#statelessRetryCannotRecover"></a>C.8 Stateless Retry Cannot Recover</h2></div></div></div><p>The distinction between a stateless and a stateful retry in the
|
|
typical example above is important. It is actually
|
|
ultimately a transactional constraint that forces the distinction,
|
|
and this constraint also makes it obvious why the distinction
|
|
exists.
|
|
</p><p>We start with the observation that there is no way to skip an item
|
|
that failed and successfully commit the rest of the chunk unless we
|
|
wrap the item processing in a transaction. So we simplify the
|
|
typical batch execution plan to look like this:</p><pre class="programlisting">
0   |  REPEAT(until=exhausted) {
    |
1   |    TX {
2   |      REPEAT(size=5) {
    |
3   |        RETRY(stateless) {
4   |          TX {
4.1 |            input;
4.2 |            database access;
    |          }
5   |        } RECOVER {
5.1 |          skip;
    |        }
    |
    |      }
    |    }
    |
    |  }
</pre><p>Here we have a stateless RETRY(3) with a RECOVER(5) path that kicks
|
|
in after the final attempt fails. The "stateless" label just means
|
|
that the block will be repeated without rethrowing any exception up
|
|
to some limit. This will only work if the transaction TX(4) has
|
|
propagation NESTED.</p><p>If TX(4) has default propagation properties and rolls back,
|
|
it will pollute the outer TX(1). The inner transaction is assumed by
|
|
the transaction manager to have corrupted the transactional
|
|
resource, and so it cannot be used again.</p><p>Support for NESTED propagation is sufficiently rare that we choose
|
|
not to support recovery with stateless retries in current versions of
|
|
Spring Batch. The same effect can always be achieved (at the
|
|
expense of repeating more processing) using the
|
|
typical pattern above.</p></div></div>
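<p>The stateful retry that section C.8 recommends in place of a stateless
retry with recovery can be sketched in plain Java. This is a minimal
illustration under stated assumptions, not the Spring Batch implementation: the
<code class="classname">StatefulRetrySketch</code> class, its
<code class="classname">ItemCallback</code> interface and the integer item ids
are hypothetical names introduced only for this example. The sketch keeps a
failure count per item outside the transaction, rethrows each failure so that
the caller can roll back, and takes the recover path without calling the
processing block once the attempts are used up.</p><pre class="programlisting">
/** Hypothetical sketch of a stateful retry; not a Spring Batch class. */
class StatefulRetrySketch {

    /** Hypothetical stand-in for the PROCESS block: throws to signal a failure. */
    interface ItemCallback {
        void process(int itemId) throws Exception;
    }

    private final int maxAttempts;
    // Failure counts live outside any transaction, so they survive a rollback.
    private final int[] failures;
    int recovered = 0;

    StatefulRetrySketch(int itemCount, int maxAttempts) {
        this.maxAttempts = maxAttempts;
        this.failures = new int[itemCount];
    }

    /** Handles one presentation of an item inside the caller's transaction. */
    void handle(int itemId, ItemCallback callback) throws Exception {
        if (failures[itemId] >= maxAttempts) {
            // Attempts exhausted: take the RECOVER path and never call PROCESS.
            recovered++;
            return;
        }
        try {
            callback.process(itemId);
        } catch (Exception e) {
            failures[itemId]++;
            // Rethrow so the caller rolls back and the item is re-presented.
            throw e;
        }
    }

    public static void main(String[] args) {
        StatefulRetrySketch retry = new StatefulRetrySketch(1, 3);
        int[] calls = {0};
        ItemCallback alwaysFails = itemId -> {
            calls[0]++;
            throw new IllegalStateException("simulated deadlock loser");
        };
        // Each pass stands in for the outer REPEAT re-presenting item 0.
        for (int pass = 0; pass != 4; pass++) {
            try {
                retry.handle(0, alwaysFails);
            } catch (Exception e) {
                // The chunk transaction would roll back here.
            }
        }
        System.out.println("process calls=" + calls[0] + ", recovered=" + retry.recovered);
    }
}
</pre><p>With three attempts allowed and a callback that always fails, the first
three passes rethrow the exception and the fourth pass takes the recover path,
so the program prints <code class="literal">process calls=3, recovered=1</code>.</p>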
|
|
|
|
<div class="glossary"><div class="titlepage"><div><div><h1 class="title"><a name="glossary" href="#glossary"></a>Glossary</h1></div></div></div><div class="glossdiv"><h3 class="title">Spring Batch Glossary</h3><dl><dt><span class="glossterm">Batch</span></dt><dd class="glossdef"><p>An accumulation of business transactions over time.</p></dd><dt><span class="glossterm">Batch Application Style</span></dt><dd class="glossdef"><p>Term used to designate batch as an application style in its own
|
|
right similar to online, Web or SOA. It has standard elements of
|
|
input, validation, transformation of information to business model,
|
|
business processing and output. In addition, it requires monitoring at
|
|
a macro level.</p></dd><dt><span class="glossterm">Batch Processing</span></dt><dd class="glossdef"><p>The handling of a batch of many business transactions that have
|
|
accumulated over a period of time (e.g. an hour, day, week, month, or
|
|
year). It is the application of a process, or set of processes, to
|
|
many data entities or objects in a repetitive and predictable fashion
|
|
with either no manual element, or a separate manual element for error
|
|
processing.</p></dd><dt><span class="glossterm">Batch Window</span></dt><dd class="glossdef"><p>The time frame within which a batch job must complete. This can
|
|
be constrained by other systems coming online, other dependent jobs
|
|
needing to execute or other factors specific to the batch
|
|
environment.</p></dd><dt><span class="glossterm">Step</span></dt><dd class="glossdef"><p>It is the main batch task or unit of work controller. It
|
|
initializes the business logic, and controls the transaction
|
|
environment based on commit interval setting, etc.</p></dd><dt><span class="glossterm">Tasklet</span></dt><dd class="glossdef"><p>A component created by an application developer to process the
|
|
business logic for a Step.</p></dd><dt><span class="glossterm">Batch Job Type</span></dt><dd class="glossdef"><p>Job types describe the application of jobs to a particular type of
|
|
processing. Common areas are interface processing (typically flat
|
|
files), forms processing (either for online pdf generation or print
|
|
formats), report processing.</p></dd><dt><span class="glossterm">Driving Query</span></dt><dd class="glossdef"><p>A driving query identifies the set of work for a job to do; the
|
|
job then breaks that work into individual units of work. For instance,
|
|
identify all financial transactions that have a status of "pending
|
|
transmission" and send them to our partner system. The driving query
|
|
returns a set of record IDs to process; each record ID then becomes a
|
|
unit of work. A driving query may involve a join (if the criteria for
|
|
selection falls across two or more tables) or it may work with a
|
|
single table.</p></dd><dt><span class="glossterm">Item</span></dt><dd class="glossdef"><p>An item represents the smallest amount of complete data for
|
|
processing. In the simplest terms, this might mean a line in a file, a
|
|
row in a database table, or a particular element in an XML
|
|
file.</p></dd><dt><span class="glossterm">Logical Unit of Work (LUW)</span></dt><dd class="glossdef"><p>A batch job iterates through a driving query (or another input
|
|
source such as a file) to perform the set of work that the job must
|
|
accomplish. Each iteration of work performed is a unit of work.</p></dd><dt><span class="glossterm">Commit Interval</span></dt><dd class="glossdef"><p>A set of LUWs processed within a single transaction.</p></dd><dt><span class="glossterm">Partitioning</span></dt><dd class="glossdef"><p>Splitting a job into multiple threads where each thread is
|
|
responsible for a subset of the overall data to be processed. The
|
|
threads of execution may be within the same JVM or they may span JVMs
|
|
in a clustered environment that supports workload balancing.</p></dd><dt><span class="glossterm">Staging Table</span></dt><dd class="glossdef"><p>A table that holds temporary data while it is being
|
|
processed.</p></dd><dt><span class="glossterm">Restartable</span></dt><dd class="glossdef"><p>A job that can be executed again and will assume the same
|
|
identity as when run initially. In other words, it has the same job
|
|
instance id.</p></dd><dt><span class="glossterm">Rerunnable</span></dt><dd class="glossdef"><p>A job that is restartable and manages its own state in terms of
|
|
previous run's record processing. An example of a rerunnable step is
|
|
one based on a driving query. If the driving query can be formed so
|
|
that it will limit the processed rows when the job is restarted, then
it is rerunnable. This is managed by the application logic. Often,
a condition is added to the WHERE clause to limit the rows
|
|
returned by the driving query with something like "and processedFlag
|
|
!= true".</p></dd><dt><span class="glossterm">Repeat</span></dt><dd class="glossdef"><p>One of the most basic units of batch processing. It defines
repeatedly calling a portion of code until it is finished and
while there is no error. Typically, a batch process would be repeatable
|
|
as long as there is input.</p></dd><dt><span class="glossterm">Retry</span></dt><dd class="glossdef"><p>Simplifies the execution of operations with retry semantics most
|
|
frequently associated with handling transactional output exceptions.
|
|
Retry is slightly different from repeat: rather than continually
|
|
calling a block of code, retry is stateful, and continually calls the
|
|
same block of code with the same input, until it either succeeds, or
|
|
some type of retry limit has been exceeded. It is only generally
|
|
useful if a subsequent invocation of the operation might succeed
|
|
because something in the environment has improved.</p></dd><dt><span class="glossterm">Recover</span></dt><dd class="glossdef"><p>Recover operations handle an exception in such a way that a
|
|
repeat process is able to continue.</p></dd><dt><span class="glossterm">Skip</span></dt><dd class="glossdef"><p>Skip is a recovery strategy often used on file input sources as
|
|
the strategy for ignoring bad input records that failed
|
|
validation.</p></dd></dl></div></div>
|
|
|
|
</div></body></html> |