diff --git a/docs/src/site/docbook/reference/images/meta-data-erd.png b/docs/src/site/docbook/reference/images/meta-data-erd.png new file mode 100755 index 000000000..2a7179068 Binary files /dev/null and b/docs/src/site/docbook/reference/images/meta-data-erd.png differ diff --git a/docs/src/site/docbook/reference/index.xml b/docs/src/site/docbook/reference/index.xml index e133beb63..ed1d2fb57 100644 --- a/docs/src/site/docbook/reference/index.xml +++ b/docs/src/site/docbook/reference/index.xml @@ -48,6 +48,8 @@ + + \ No newline at end of file diff --git a/docs/src/site/docbook/reference/schema-appendix.xml b/docs/src/site/docbook/reference/schema-appendix.xml new file mode 100644 index 000000000..871a731d0 --- /dev/null +++ b/docs/src/site/docbook/reference/schema-appendix.xml @@ -0,0 +1,448 @@ + + + + Meta-Data Schema + +
+ Overview + + The Spring Batch Meta-Data tables very closely match the Domain + objects that represent them in Java. For example, JobInstance, + JobExecution, JobParameters, StepExecution, and ExecutionContext map to + BATCH_JOB_INSTANCE, BATCH_JOB_EXECUTION, BATCH_JOB_PARAMS, + BATCH_STEP_EXECUTION, and BATCH_STEP_EXECUTION_CONTEXT, respectively. The + JobRepository is responsible for saving and storing + each Java object in its correct table. This appendix + describes the meta-data tables in detail, along with many of the design + decisions that were made when creating them. When viewing the various + table creation statements below, it is important to realize that the + datatypes used are as generic as possible. Spring Batch provides many + schemas as examples, which all have varying datatypes due to quirks in + individual database vendors' handling of data types. Below is an ERD model + of all 5 tables and their relationships to one another: + + + + + + + +
+ Version + + Many of the database tables discussed in this appendix contain a + version column. This column is important because Spring Batch employs an + optimistic locking strategy when dealing with updates to the database. + This means that each time a record is 'touched' (updated) the value in + the version column is incremented by one. When the repository goes back + to try to save the value, if the version number has changed it will + throw OptimisticLockingFailureException, + indicating there has been an error with concurrent access. This check is + necessary because, even though different batch jobs may be running on + different machines, they are all using the same database tables. +
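The version check described above can be illustrated with a minimal in-memory sketch. It is not the JobRepository implementation: the class OptimisticLockSketch, its Row type, and its method names are hypothetical, and a rejected update is reported with a boolean here, whereas the framework throws OptimisticLockingFailureException.

```java
import java.util.HashMap;
import java.util.Map;

/** In-memory stand-in for a table with a VERSION column (illustration only). */
public class OptimisticLockSketch {

    /** Hypothetical row: a status value plus the version it currently has. */
    static final class Row {
        final String status;
        final long version;

        Row(String status, long version) {
            this.status = status;
            this.version = version;
        }
    }

    private final Map<Long, Row> table = new HashMap<>();

    /** New rows start at version 0 in this sketch. */
    synchronized void insert(long id, String status) {
        table.put(id, new Row(status, 0L));
    }

    synchronized Row read(long id) {
        return table.get(id);
    }

    /**
     * Mimics an update of the form
     * "UPDATE ... SET STATUS = ?, VERSION = VERSION + 1 WHERE ID = ? AND VERSION = ?".
     * Returns false when the stored version no longer matches the version
     * the caller read, which means another writer got there first.
     */
    synchronized boolean update(long id, String newStatus, long expectedVersion) {
        Row current = table.get(id);
        if (current == null || current.version != expectedVersion) {
            return false; // stale version: concurrent modification detected
        }
        table.put(id, new Row(newStatus, current.version + 1));
        return true;
    }

    public static void main(String[] args) {
        OptimisticLockSketch repository = new OptimisticLockSketch();
        repository.insert(1L, "STARTED");

        // Two callers read the same row, so both hold version 0.
        Row readByA = repository.read(1L);
        Row readByB = repository.read(1L);

        // The first writer succeeds and bumps the version.
        System.out.println(repository.update(1L, "COMPLETED", readByA.version));
        // The second writer still holds the old version and is rejected.
        System.out.println(repository.update(1L, "FAILED", readByB.version));
    }
}
```

The important detail is that the version comparison and the increment happen in one step, which in a real database is what the WHERE clause on the version column provides.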
+ +
+ Identity + + BATCH_JOB_INSTANCE, BATCH_JOB_EXECUTION, and BATCH_STEP_EXECUTION + each contain columns ending in _ID, which act as primary keys for their + respective tables. However, they are not database generated keys, but + rather are generated by separate sequences. This is necessary because + after inserting one of the domain objects into the database, the key it + is given needs to be set on the actual object, so that it can be + uniquely identified in Java. Newer database drivers (JDBC 3.0 and up) + support this feature with database generated keys, but rather than + requiring it, sequences were used. Each variation of the schema will + contain some form of the following: + + CREATE SEQUENCE BATCH_STEP_EXECUTION_SEQ; +CREATE SEQUENCE BATCH_JOB_EXECUTION_SEQ; +CREATE SEQUENCE BATCH_JOB_SEQ; + + Many database vendors don't officially support sequences. In these + cases, workarounds are used, such as the following for MySQL: + + CREATE TABLE BATCH_STEP_EXECUTION_SEQ (ID BIGINT NOT NULL) type=MYISAM; +INSERT INTO BATCH_STEP_EXECUTION_SEQ values(0); +CREATE TABLE BATCH_JOB_EXECUTION_SEQ (ID BIGINT NOT NULL) type=MYISAM; +INSERT INTO BATCH_JOB_EXECUTION_SEQ values(0); +CREATE TABLE BATCH_JOB_SEQ (ID BIGINT NOT NULL) type=MYISAM; +INSERT INTO BATCH_JOB_SEQ values(0); + + In the above case, a table is used in place of each sequence. The + Spring core class MySQLMaxValueIncrementer will + then increment the one column in this table in order to give similar + functionality. +
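As a rough illustration of the idea above (obtain the next key from a sequence-like source, then set it on the domain object), here is a small in-memory sketch. It does not use MySQLMaxValueIncrementer or any database: TableSequenceSketch, JobInstanceStub, and the job name "endOfDayJob" are hypothetical names chosen for the example.

```java
import java.util.concurrent.atomic.AtomicLong;

/** In-memory stand-in for a one-column sequence table (illustration only). */
public class TableSequenceSketch {

    /** Hypothetical stand-in for a domain object such as JobInstance. */
    static final class JobInstanceStub {
        Long id; // stays null until a key has been assigned
        final String jobName;

        JobInstanceStub(String jobName) {
            this.jobName = jobName;
        }
    }

    // Plays the role of the single ID column in a table such as
    // BATCH_JOB_SEQ, which the schema seeds with the value 0.
    private final AtomicLong idColumn = new AtomicLong(0L);

    /** Increments the stored value and returns it, like a sequence would. */
    long nextValue() {
        return idColumn.incrementAndGet();
    }

    /** Assigns the generated key to the object so Java code can identify it. */
    JobInstanceStub save(String jobName) {
        JobInstanceStub instance = new JobInstanceStub(jobName);
        instance.id = nextValue();
        return instance;
    }

    public static void main(String[] args) {
        TableSequenceSketch repository = new TableSequenceSketch();
        JobInstanceStub first = repository.save("endOfDayJob");
        JobInstanceStub second = repository.save("endOfDayJob");
        System.out.println(first.id + ", " + second.id);
    }
}
```

Because the key is produced before (or independently of) the insert, this approach does not depend on the driver being able to return database generated keys.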
+
+ +
+ BATCH_JOB_INSTANCE + + The BATCH_JOB_INSTANCE table holds all information relevant to a + JobInstance, and serves as the top of the overall + hierarchy. The following generic DDL statement is used to create + it: + + CREATE TABLE BATCH_JOB_INSTANCE ( + JOB_INSTANCE_ID BIGINT PRIMARY KEY , + VERSION BIGINT, + JOB_NAME VARCHAR(100) NOT NULL , + JOB_KEY VARCHAR(2500) +); + + Below are descriptions of each column in the table: + + + + JOB_INSTANCE_ID: The unique id that will identify the instance, + which is also the primary key. The value of this column should be + obtainable by calling the getId method on + JobInstance. + + + + VERSION: See above section. + + + + JOB_NAME: Name of the job obtained from the + Job object. Because it is required to identify + the instance, it must not be null. + + + + JOB_KEY: A serialization of the + JobParameters that uniquely identifies separate + instances of the same job from one another. + (JobInstances with the same job name + + +
+ +
+ BATCH_JOB_PARAMS + + The BATCH_JOB_PARAMS table holds all information relevant to the + JobParameters object. It contains 0 or more key/value pairs that together + uniquely identify a JobInstance and serve as a + record of the parameters a job was run with. It should be noted that the + table has been denormalized. Rather than creating a separate table for + each type, there is one table with a column indicating the type: + + CREATE TABLE BATCH_JOB_PARAMS ( + JOB_INSTANCE_ID BIGINT NOT NULL , + TYPE_CD VARCHAR(6) NOT NULL , + KEY_NAME VARCHAR(100) NOT NULL , + STRING_VAL VARCHAR(250) , + DATE_VAL TIMESTAMP DEFAULT NULL, + LONG_VAL BIGINT , + DOUBLE_VAL DOUBLE PRECISION, + constraint JOB_INSTANCE_PARAMS_FK foreign key (JOB_INSTANCE_ID) + references BATCH_JOB_INSTANCE(JOB_INSTANCE_ID) +); + + Below are descriptions for each column: + + + + JOB_INSTANCE_ID: Foreign Key from the BATCH_JOB_INSTANCE table + that indicates the job instance the parameter entry belongs to. It + should be noted that multiple rows (i.e., key/value pairs) may exist for + each instance. + + + + TYPE_CD: String representation of the type of value stored, + which can be either a character string, date, long, or double. Because + the type must be known, it cannot be null. + + + + KEY_NAME: The Parameter key. + + + + STRING_VAL: Parameter value, if the type is string. + + + + DATE_VAL: Parameter value, if the type is date. + + + + LONG_VAL: Parameter value, if the type is a long. + + + + DOUBLE_VAL: Parameter value, if the type is double. + + + + It is worth noting that there is no primary key for this table. This + is simply because the framework has no use for one, and thus doesn't + require it. If a user so chooses, one may be added with a database + generated key, without causing any issues to the framework itself. +
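The denormalized layout described above means that each parameter row fills exactly one of the value columns, selected by its type. The following sketch shows one way that selection could look. The class JobParamColumnSketch and its method are hypothetical, and the TYPE_CD strings "STRING", "DATE", "LONG", and "DOUBLE" are assumptions made for illustration (they fit VARCHAR(6), but the text above does not list the exact codes).

```java
import java.util.Date;

/** Maps a parameter value to a type code and value column (illustration only). */
public class JobParamColumnSketch {

    /**
     * Returns a two-element array: the assumed TYPE_CD string, followed by
     * the name of the BATCH_JOB_PARAMS column that would hold the value.
     */
    static String[] columnFor(Object value) {
        if (value == null) {
            throw new IllegalArgumentException("Parameter value must not be null");
        }
        if (value instanceof String) {
            return new String[] {"STRING", "STRING_VAL"};
        }
        if (value instanceof Date) {
            return new String[] {"DATE", "DATE_VAL"};
        }
        if (value instanceof Long) {
            return new String[] {"LONG", "LONG_VAL"};
        }
        if (value instanceof Double) {
            return new String[] {"DOUBLE", "DOUBLE_VAL"};
        }
        // The table only has columns for the four types listed above.
        throw new IllegalArgumentException(
                "Unsupported parameter type: " + value.getClass().getName());
    }

    public static void main(String[] args) {
        Object[] samples = {"input.csv", new Date(0L), 42L, 1.5d};
        for (Object sample : samples) {
            String[] target = columnFor(sample);
            System.out.println(target[0] + " -> " + target[1]);
        }
    }
}
```

All other value columns in the row would simply be left null, which is the trade-off of using one table with a type column instead of one table per type.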
+ +
+ BATCH_JOB_EXECUTION + + The BATCH_JOB_EXECUTION table holds all information relevant to the + JobExecution object. Every time a + Job is run there will always be a new + JobExecution, and a new row in this table: + + CREATE TABLE BATCH_JOB_EXECUTION ( + JOB_EXECUTION_ID BIGINT PRIMARY KEY , + VERSION BIGINT, + JOB_INSTANCE_ID BIGINT NOT NULL, + START_TIME TIMESTAMP DEFAULT NULL, + END_TIME TIMESTAMP DEFAULT NULL, + STATUS VARCHAR(10), + CONTINUABLE CHAR(1), + EXIT_CODE VARCHAR(20), + EXIT_MESSAGE VARCHAR(2500), + constraint JOB_INSTANCE_EXECUTION_FK foreign key (JOB_INSTANCE_ID) + references BATCH_JOB_INSTANCE(JOB_INSTANCE_ID) +) ; + + Below are descriptions for each column: + + + + JOB_EXECUTION_ID: Primary key that uniquely identifies this + execution. The value of this column should be obtainable by calling + the getId method of the + JobExecution object. + + + + VERSION: See above section. + + + + JOB_INSTANCE_ID: Foreign key from the BATCH_JOB_INSTANCE table + indicating the instance to which this execution belongs. There may be + more than one execution per instance. + + + + START_TIME: Timestamp representing the time the execution was + started. + + + + END_TIME: Timestamp representing the time the execution was + finished, regardless of success or failure. An empty value in this + column even though the job is not currently running indicates that + there has been some type of error and the framework was unable to + perform a last save before failing. + + + + STATUS: Character string representing the status of the + execution. This may be COMPLETED, STARTED, etc. The object + representation of this column is the + BatchStatus enumeration. + + + + CONTINUABLE: Character indicating whether or not the execution + is currently able to continue. 'Y' for yes and 'N' for no. + + + + EXIT_CODE: Character string representing the exit code of the + execution. In the case of a command line job, this may be converted + into a number. 
+ + + + EXIT_MESSAGE: Character string representing a more detailed + description of how the job exited. In the case of failure, this might + include as much of the stack trace as is possible. + + +
+ +
+ BATCH_STEP_EXECUTION + + The BATCH_STEP_EXECUTION table holds all information relevant to the + StepExecution object. This table is very similar in + many ways to the BATCH_JOB_EXECUTION table and there will always be at + least one entry per Step for each + JobExecution created: + + CREATE TABLE BATCH_STEP_EXECUTION ( + STEP_EXECUTION_ID BIGINT PRIMARY KEY , + VERSION BIGINT NOT NULL, + STEP_NAME VARCHAR(100) NOT NULL, + JOB_EXECUTION_ID BIGINT NOT NULL, + START_TIME TIMESTAMP NOT NULL , + END_TIME TIMESTAMP DEFAULT NULL, + STATUS VARCHAR(10), + COMMIT_COUNT BIGINT , + ITEM_COUNT BIGINT , + CONTINUABLE CHAR(1), + EXIT_CODE VARCHAR(20), + EXIT_MESSAGE VARCHAR(2500), + constraint JOB_EXECUTION_STEP_FK foreign key (JOB_EXECUTION_ID) + references BATCH_JOB_EXECUTION(JOB_EXECUTION_ID) +) ; + + Below are descriptions for each column: + + + + STEP_EXECUTION_ID: Primary key that uniquely identifies this + execution. The value of this column should be obtainable by calling + the getId method of the + StepExecution object. + + + + VERSION: See above section. + + + + STEP_NAME: The name of the step to which this execution + belongs. + + + + JOB_EXECUTION_ID: Foreign key from the BATCH_JOB_EXECUTION table + indicating the JobExecution to which this StepExecution belongs. There + may be only one StepExecution for a given + JobExecution for a given + Step name. + + + + START_TIME: Timestamp representing the time the execution was + started. + + + + END_TIME: Timestamp representing the time the execution was + finished, regardless of success or failure. An empty value in this + column even though the job is not currently running indicates that + there has been some type of error and the framework was unable to + perform a last save before failing. + + + + STATUS: Character string representing the status of the + execution. This may be COMPLETED, STARTED, etc. The object + representation of this column is the + BatchStatus enumeration. 
+ + + + COMMIT_COUNT: The number of times the step has + committed a transaction during this execution. + + + + ITEM_COUNT: The number of items that have been written out + during this execution. + + + + CONTINUABLE: Character indicating whether or not the execution + is currently able to continue. 'Y' for yes and 'N' for no. + + + + EXIT_CODE: Character string representing the exit code of the + execution. In the case of a command line job, this may be converted + into a number. + + + + EXIT_MESSAGE: Character string representing a more detailed + description of how the job exited. In the case of failure, this might + include as much of the stack trace as is possible. + + +
+ +
+ BATCH_STEP_EXECUTION_CONTEXT + + The BATCH_STEP_EXECUTION_CONTEXT table holds all information + relevant to an ExecutionContext. There is exactly + one ExecutionContext per + StepExecution, and it contains all user defined + key/value pairs that need to be persisted for a particular job run. This data + is usually state information that must be retrieved after a failure + so that a JobInstance can 'start from where it left off'. As with the + BATCH_JOB_PARAMS table, this table has been denormalized and uses a column + to determine the type: + + CREATE TABLE BATCH_STEP_EXECUTION_CONTEXT ( + STEP_EXECUTION_ID BIGINT NOT NULL , + TYPE_CD VARCHAR(6) NOT NULL , + KEY_NAME VARCHAR(1000) NOT NULL , + STRING_VAL VARCHAR(1000) , + DATE_VAL TIMESTAMP DEFAULT NULL , + LONG_VAL VARCHAR(10) , + DOUBLE_VAL DOUBLE PRECISION , + OBJECT_VAL BLOB, + constraint STEP_EXECUTION_CONTEXT_FK foreign key (STEP_EXECUTION_ID) + references BATCH_STEP_EXECUTION(STEP_EXECUTION_ID) +) ; + + Below are descriptions for each column: + + + + STEP_EXECUTION_ID: Foreign key representing the + StepExecution to which the context belongs. + There may be more than one row associated with a given + StepExecution. + + + + TYPE_CD: String representation of the type of value stored, + which can be either a character string, date, long, or double. Because + the type must be known, it cannot be null. + + + + KEY_NAME: The Parameter key. + + + + STRING_VAL: Parameter value, if the type is string. + + + + DATE_VAL: Parameter value, if the type is date. + + + + LONG_VAL: Parameter value, if the type is a long. + + + + DOUBLE_VAL: Parameter value, if the type is double. + + + + OBJECT_VAL: Parameter value, if the type is a blob. + + + + When an ExecutionContext is stored, values that are one of the well + known types above will be stored as their respective type. Any unknown + type will be serialized to a blob and stored in the OBJECT_VAL column. As + with BATCH_JOB_PARAMS, there is no primary key for this table. 
This is + simply because the framework has no use for one, and thus doesn't require + it. If a user so chooses, one may be added with a database generated key, + without causing any issues to the framework itself. +
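To make the OBJECT_VAL case more concrete, the sketch below round-trips a value through standard Java serialization, producing the kind of byte array that could be written to a BLOB column. This is an assumption about the general technique, not a description of the framework's actual code: ObjectValSketch and its method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;

/** Serializes arbitrary values to bytes and back (illustration only). */
public class ObjectValSketch {

    /** Turns a serializable value into bytes suitable for a BLOB column. */
    static byte[] toBlob(Serializable value) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // The stream is closed (and flushed) before the bytes are read.
        return bytes.toByteArray();
    }

    /** Restores the value that was previously written by toBlob. */
    static Object fromBlob(byte[] blob) {
        try (ObjectInputStream in =
                new ObjectInputStream(new ByteArrayInputStream(blob))) {
            return in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException("Class of stored value not found", e);
        }
    }

    public static void main(String[] args) {
        // A list is not one of the well known column types, so in this
        // sketch it takes the blob path.
        ArrayList<Integer> processedLines = new ArrayList<>(Arrays.asList(10, 20, 30));
        byte[] blob = toBlob(processedLines);
        Object restored = fromBlob(blob);
        System.out.println(restored.equals(processedLines));
    }
}
```

One consequence of storing serialized objects is that the stored class must still be available, and compatible, when the context is read back on restart.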
+
\ No newline at end of file