BATCH-605: Added a new appendix that describes the meta-data table schema in detail.
BIN
docs/src/site/docbook/reference/images/meta-data-erd.png
Executable file
@@ -48,6 +48,8 @@
<xi:include href="testing.xml" />

<xi:include href="appendix.xml" />

<xi:include href="schema-appendix.xml" />

<xi:include href="glossary.xml" />
</book>
448
docs/src/site/docbook/reference/schema-appendix.xml
Normal file
@@ -0,0 +1,448 @@
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE appendix PUBLIC "-//OASIS//DTD DocBook XML V4.5//EN"
"http://www.oasis-open.org/docbook/xml/4.5/docbookx.dtd">
<appendix>
<title>Meta-Data Schema</title>

<section>
<title>Overview</title>

<para>The Spring Batch meta-data tables closely match the domain
objects that represent them in Java. For example, JobInstance,
JobExecution, JobParameters, StepExecution, and ExecutionContext map to
BATCH_JOB_INSTANCE, BATCH_JOB_EXECUTION, BATCH_JOB_PARAMS,
BATCH_STEP_EXECUTION, and BATCH_STEP_EXECUTION_CONTEXT, respectively. The
<classname>JobRepository</classname> is responsible for saving each of
these Java objects to its correct table. This appendix describes the
meta-data tables in detail, along with many of the design decisions that
were made when creating them. When viewing the table creation statements
below, keep in mind that the datatypes used are as generic as possible.
Spring Batch provides many example schemas, and their datatypes vary
because each database vendor handles data types with its own quirks.
Below is an entity-relationship diagram (ERD) of all five tables and
their relationships to one another:</para>

<mediaobject>
<imageobject>
<imagedata fileref="images/meta-data-erd.png" />
</imageobject>
</mediaobject>

<section>
<title>Version</title>

<para>Many of the database tables discussed in this appendix contain a
VERSION column. This column is important because Spring Batch employs an
optimistic locking strategy when dealing with updates to the database.
This means that each time a record is 'touched' (updated), the value in
the VERSION column is incremented by one. When the repository goes back
to save the value, if the version number has changed, it throws an
<classname>OptimisticLockingFailureException</classname>, indicating
that there has been an error with concurrent access. This check is
necessary because, even though different batch jobs may be running on
different machines, they all use the same database tables.</para>
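<para>The version check can be illustrated with a small, dependency-free
Java sketch. It is not the repository's actual code: a single in-memory
row stands in for a database record, the <methodname>update</methodname>
method mirrors an <literal>UPDATE ... SET VERSION = VERSION + 1 WHERE ID = ?
AND VERSION = ?</literal> statement, and
<classname>StaleVersionException</classname> is a hypothetical stand-in for
Spring's <classname>OptimisticLockingFailureException</classname>.</para>

```java
/**
 * Dependency-free sketch of version-based optimistic locking.
 * One in-memory row stands in for a record in a table such as BATCH_JOB_EXECUTION.
 * StaleVersionException is a hypothetical stand-in for Spring's
 * OptimisticLockingFailureException.
 */
public class OptimisticLockingSketch {

    static class StaleVersionException extends RuntimeException {
        StaleVersionException(String message) {
            super(message);
        }
    }

    private String status = "STARTING";
    private long version = 0L; // VERSION column; 0 when the row is first inserted

    /**
     * Mirrors: UPDATE ... SET STATUS = ?, VERSION = VERSION + 1 WHERE ID = ? AND VERSION = ?
     * If the stored version differs, no row would match, which is treated as a conflict.
     * Returns the new version, which the caller should copy onto its domain object.
     */
    synchronized long update(String newStatus, long expectedVersion) {
        if (version != expectedVersion) {
            throw new StaleVersionException(
                    "expected version " + expectedVersion + " but found " + version);
        }
        status = newStatus;
        version++;
        return version;
    }

    synchronized String status() {
        return status;
    }

    public static void main(String[] args) {
        OptimisticLockingSketch row = new OptimisticLockingSketch();

        // Two processes both read the row while it is at version 0.
        long seenByA = 0L;
        long seenByB = 0L;

        long newVersion = row.update("STARTED", seenByA); // succeeds: version is now 1
        System.out.println("A saved, version is now " + newVersion);

        try {
            row.update("FAILED", seenByB); // B still holds version 0, so it is rejected
        } catch (StaleVersionException e) {
            System.out.println("B rejected: " + e.getMessage());
        }
        System.out.println("status = " + row.status());
    }
}
```

<para>Running <methodname>main</methodname> shows the first writer succeeding
and bumping the version to 1, while the second writer, still holding version
0, is rejected instead of silently overwriting the first update.</para>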
</section>

<section>
<title>Identity</title>

<para>BATCH_JOB_INSTANCE, BATCH_JOB_EXECUTION, and BATCH_STEP_EXECUTION
each contain columns ending in _ID, which act as primary keys for their
respective tables. However, they are not database-generated keys; rather,
they are generated by separate sequences. This is necessary because,
after one of the domain objects is inserted into the database, the key it
is given needs to be set on the actual object so that it can be uniquely
identified in Java. Newer database drivers (JDBC 3.0 and up) support this
feature with database-generated keys, but rather than requiring such a
driver, sequences were used. Each variation of the schema will contain
some form of the following:</para>

<programlisting>CREATE SEQUENCE BATCH_STEP_EXECUTION_SEQ;
CREATE SEQUENCE BATCH_JOB_EXECUTION_SEQ;
CREATE SEQUENCE BATCH_JOB_SEQ;</programlisting>

<para>Many database vendors don't officially support sequences. In these
cases, workarounds are used, such as the following for MySQL:</para>

<programlisting>CREATE TABLE BATCH_STEP_EXECUTION_SEQ (ID BIGINT NOT NULL) ENGINE=MYISAM;
INSERT INTO BATCH_STEP_EXECUTION_SEQ VALUES(0);
CREATE TABLE BATCH_JOB_EXECUTION_SEQ (ID BIGINT NOT NULL) ENGINE=MYISAM;
INSERT INTO BATCH_JOB_EXECUTION_SEQ VALUES(0);
CREATE TABLE BATCH_JOB_SEQ (ID BIGINT NOT NULL) ENGINE=MYISAM;
INSERT INTO BATCH_JOB_SEQ VALUES(0);</programlisting>

<para>In the above case, a table is used in place of each sequence. The
Spring Framework class <classname>MySQLMaxValueIncrementer</classname>
then increments the single column in this table in order to provide
similar functionality.</para>
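<para>The table-as-sequence idea can be sketched in Java without a
database. The class below is a hypothetical illustration in the spirit of
<classname>MySQLMaxValueIncrementer</classname>, not its actual source: a
field simulates the single ID column, and each simulated table update
reserves a block of <literal>cacheSize</literal> keys (an assumed
parameter) so that the table is touched only once per block.</para>

```java
/**
 * Hypothetical sketch of a one-row table emulating a database sequence.
 * NOT the actual code of Spring's MySQLMaxValueIncrementer; the table is
 * simulated by a field so the example runs without a database.
 */
public class SequenceTableSketch {

    // Simulated single ID column of a table such as BATCH_JOB_SEQ,
    // initialised to 0 by INSERT INTO BATCH_JOB_SEQ VALUES(0).
    private long storedId = 0L;

    private final int cacheSize; // how many keys each table update reserves
    private long nextId = 0L;    // last key handed out to a caller
    private long maxId = 0L;     // highest key reserved by the latest table update

    SequenceTableSketch(int cacheSize) {
        if (cacheSize < 1) {
            throw new IllegalArgumentException("cacheSize must be at least 1");
        }
        this.cacheSize = cacheSize;
    }

    /** Simulates UPDATE BATCH_JOB_SEQ SET ID = ID + cacheSize, then reads the new value back. */
    private long reserveBlockInTable() {
        storedId += cacheSize;
        return storedId;
    }

    /** Returns the next unique key, touching the table only when the reserved block is used up. */
    synchronized long nextLongValue() {
        if (nextId == maxId) {
            maxId = reserveBlockInTable();
            nextId = maxId - cacheSize;
        }
        nextId++;
        return nextId;
    }

    long storedValue() {
        return storedId;
    }

    public static void main(String[] args) {
        SequenceTableSketch jobSeq = new SequenceTableSketch(3);
        StringBuilder ids = new StringBuilder();
        for (int i = 0; i < 5; i++) {
            ids.append(jobSeq.nextLongValue()).append(' ');
        }
        System.out.println("ids: " + ids.toString().trim());                // ids: 1 2 3 4 5
        System.out.println("value stored in table: " + jobSeq.storedValue()); // 6
    }
}
```

<para>With a cache size of 3, five calls return 1 through 5 while the
simulated table is updated only twice, ending at 6. Each returned key is the
value that ends up in the row's primary key column and on the domain object,
for example as the ID returned by <methodname>getId</methodname> on
<classname>JobInstance</classname>.</para>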
</section>
</section>

<section>
<title>BATCH_JOB_INSTANCE</title>

<para>The BATCH_JOB_INSTANCE table holds all information relevant to a
<classname>JobInstance</classname> and serves as the top of the overall
hierarchy. The following generic DDL statement is used to create
it:</para>

<programlisting>CREATE TABLE BATCH_JOB_INSTANCE (
  JOB_INSTANCE_ID BIGINT PRIMARY KEY,
  VERSION BIGINT,
  JOB_NAME VARCHAR(100) NOT NULL,
  JOB_KEY VARCHAR(2500)
);</programlisting>

<para>Below are descriptions of each column in the table:</para>

<itemizedlist>
<listitem>
<para>JOB_INSTANCE_ID: The unique ID that identifies the instance,
which is also the primary key. The value of this column should be
obtainable by calling the <methodname>getId</methodname> method on
<classname>JobInstance</classname>.</para>
</listitem>

<listitem>
<para>VERSION: See the Version section above.</para>
</listitem>

<listitem>
<para>JOB_NAME: Name of the job, obtained from the
<classname>Job</classname> object. Because it is required to identify
the instance, it must not be null.</para>
</listitem>

<listitem>
<para>JOB_KEY: A serialization of the
<classname>JobParameters</classname> that distinguishes separate
instances of the same job from one another.
(<classname>JobInstances</classname> with the same job name must have
different <classname>JobParameters</classname>, and thus different
JOB_KEY values.)</para>
</listitem>
</itemizedlist>
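<para>One way to picture JOB_KEY is as a deterministic string built from
the parameters. The following Java sketch uses a hypothetical
<literal>name=value;</literal> format sorted by parameter name; the
framework's real serialization may differ. The property it demonstrates is
the one the table relies on: equal parameter sets always yield the same
key, while a different value yields a different key and therefore a
separate <classname>JobInstance</classname>.</para>

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical sketch of deriving a JOB_KEY from job parameters.
 * The "name=value;" format is an illustrative assumption, not the framework's format.
 */
public class JobKeySketch {

    /** Builds a deterministic key by iterating the parameters in name order. */
    static String toJobKey(Map<String, Object> parameters) {
        StringBuilder key = new StringBuilder();
        // TreeMap sorts by key, so the caller's insertion order does not affect the result.
        for (Map.Entry<String, Object> entry : new TreeMap<>(parameters).entrySet()) {
            key.append(entry.getKey()).append('=').append(entry.getValue()).append(';');
        }
        return key.toString();
    }

    public static void main(String[] args) {
        Map<String, Object> runA = new LinkedHashMap<>();
        runA.put("schedule.date", "2008-06-01");
        runA.put("region", "EU");

        Map<String, Object> runB = new LinkedHashMap<>();
        runB.put("region", "EU");
        runB.put("schedule.date", "2008-06-01");

        Map<String, Object> runC = new LinkedHashMap<>();
        runC.put("region", "EU");
        runC.put("schedule.date", "2008-06-02");

        System.out.println(toJobKey(runA)); // region=EU;schedule.date=2008-06-01;
        // Same parameters in a different order: same JOB_KEY, so the same JobInstance.
        System.out.println(toJobKey(runA).equals(toJobKey(runB))); // true
        // A different date: a different JOB_KEY, so a separate instance of the same job.
        System.out.println(toJobKey(runA).equals(toJobKey(runC))); // false
    }
}
```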
</section>

<section>
<title>BATCH_JOB_PARAMS</title>

<para>The BATCH_JOB_PARAMS table holds all information relevant to the
<classname>JobParameters</classname> object. It contains zero or more
key/value pairs that together uniquely identify a
<classname>JobInstance</classname> and serve as a record of the
parameters a job was run with. Note that the table has been
denormalized: rather than creating a separate table for each type, there
is one table with a column indicating the type:</para>

<programlisting>CREATE TABLE BATCH_JOB_PARAMS (
  JOB_INSTANCE_ID BIGINT NOT NULL,
  TYPE_CD VARCHAR(6) NOT NULL,
  KEY_NAME VARCHAR(100) NOT NULL,
  STRING_VAL VARCHAR(250),
  DATE_VAL TIMESTAMP DEFAULT NULL,
  LONG_VAL BIGINT,
  DOUBLE_VAL DOUBLE PRECISION,
  constraint JOB_INSTANCE_PARAMS_FK foreign key (JOB_INSTANCE_ID)
    references BATCH_JOB_INSTANCE(JOB_INSTANCE_ID)
);</programlisting>

<para>Below are descriptions for each column:</para>

<itemizedlist>
<listitem>
<para>JOB_INSTANCE_ID: Foreign key from the BATCH_JOB_INSTANCE table
that indicates the job instance the parameter entry belongs to. Note
that multiple rows (i.e., key/value pairs) may exist for each
instance.</para>
</listitem>

<listitem>
<para>TYPE_CD: String representation of the type of value stored,
which can be a string, date, long, or double. Because the type must be
known, it cannot be null.</para>
</listitem>

<listitem>
<para>KEY_NAME: The parameter key.</para>
</listitem>

<listitem>
<para>STRING_VAL: Parameter value, if the type is string.</para>
</listitem>

<listitem>
<para>DATE_VAL: Parameter value, if the type is date.</para>
</listitem>

<listitem>
<para>LONG_VAL: Parameter value, if the type is long.</para>
</listitem>

<listitem>
<para>DOUBLE_VAL: Parameter value, if the type is double.</para>
</listitem>
</itemizedlist>

<para>This table has no primary key, simply because the framework has
no use for one and therefore does not require it. Users who want one may
add a database-generated key without causing any issues for the
framework itself.</para>
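<para>The denormalized layout means each parameter becomes one row in
which TYPE_CD selects the single populated value column. The following
Java sketch shows that mapping in memory. The TYPE_CD strings STRING,
DATE, LONG, and DOUBLE are assumed for illustration (each fits the
VARCHAR(6) column), and the class is hypothetical rather than part of
Spring Batch.</para>

```java
import java.util.Date;

/**
 * Hypothetical in-memory model of one BATCH_JOB_PARAMS row.
 * TYPE_CD decides which single value column is populated; the others stay null (NULL).
 */
public class JobParamRowSketch {

    final long jobInstanceId; // JOB_INSTANCE_ID (foreign key)
    final String typeCd;      // TYPE_CD, assumed values: STRING, DATE, LONG, DOUBLE
    final String keyName;     // KEY_NAME
    final String stringVal;   // STRING_VAL
    final Date dateVal;       // DATE_VAL
    final Long longVal;       // LONG_VAL
    final Double doubleVal;   // DOUBLE_VAL

    private JobParamRowSketch(long jobInstanceId, String typeCd, String keyName,
                              String stringVal, Date dateVal, Long longVal, Double doubleVal) {
        this.jobInstanceId = jobInstanceId;
        this.typeCd = typeCd;
        this.keyName = keyName;
        this.stringVal = stringVal;
        this.dateVal = dateVal;
        this.longVal = longVal;
        this.doubleVal = doubleVal;
    }

    /** Chooses TYPE_CD from the Java type of the value and fills exactly one value column. */
    static JobParamRowSketch toRow(long jobInstanceId, String keyName, Object value) {
        if (value instanceof String) {
            return new JobParamRowSketch(jobInstanceId, "STRING", keyName, (String) value, null, null, null);
        }
        if (value instanceof Date) {
            return new JobParamRowSketch(jobInstanceId, "DATE", keyName, null, (Date) value, null, null);
        }
        if (value instanceof Long) {
            return new JobParamRowSketch(jobInstanceId, "LONG", keyName, null, null, (Long) value, null);
        }
        if (value instanceof Double) {
            return new JobParamRowSketch(jobInstanceId, "DOUBLE", keyName, null, null, null, (Double) value);
        }
        throw new IllegalArgumentException("Unsupported parameter type for key " + keyName + ": " + value);
    }

    /** Reads the value back, using TYPE_CD to pick the populated column. */
    Object value() {
        switch (typeCd) {
            case "STRING": return stringVal;
            case "DATE": return dateVal;
            case "LONG": return longVal;
            case "DOUBLE": return doubleVal;
            default: throw new IllegalStateException("Unknown TYPE_CD: " + typeCd);
        }
    }

    public static void main(String[] args) {
        JobParamRowSketch row = toRow(42L, "run.id", 7L);
        System.out.println(row.typeCd + " " + row.keyName + " = " + row.value()); // LONG run.id = 7
        System.out.println("STRING_VAL is null: " + (row.stringVal == null));   // true
    }
}
```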
</section>

<section>
<title>BATCH_JOB_EXECUTION</title>

<para>The BATCH_JOB_EXECUTION table holds all information relevant to the
<classname>JobExecution</classname> object. Every time a
<classname>Job</classname> is run, there will always be a new
<classname>JobExecution</classname> and a new row in this table:</para>

<programlisting>CREATE TABLE BATCH_JOB_EXECUTION (
  JOB_EXECUTION_ID BIGINT PRIMARY KEY,
  VERSION BIGINT,
  JOB_INSTANCE_ID BIGINT NOT NULL,
  START_TIME TIMESTAMP DEFAULT NULL,
  END_TIME TIMESTAMP DEFAULT NULL,
  STATUS VARCHAR(10),
  CONTINUABLE CHAR(1),
  EXIT_CODE VARCHAR(20),
  EXIT_MESSAGE VARCHAR(2500),
  constraint JOB_INSTANCE_EXECUTION_FK foreign key (JOB_INSTANCE_ID)
    references BATCH_JOB_INSTANCE(JOB_INSTANCE_ID)
);</programlisting>

<para>Below are descriptions for each column:</para>

<itemizedlist>
<listitem>
<para>JOB_EXECUTION_ID: Primary key that uniquely identifies this
execution. The value of this column should be obtainable by calling
the <methodname>getId</methodname> method of the
<classname>JobExecution</classname> object.</para>
</listitem>

<listitem>
<para>VERSION: See the Version section above.</para>
</listitem>

<listitem>
<para>JOB_INSTANCE_ID: Foreign key from the BATCH_JOB_INSTANCE table
indicating the instance to which this execution belongs. There may be
more than one execution per instance.</para>
</listitem>

<listitem>
<para>START_TIME: Timestamp representing the time the execution was
started.</para>
</listitem>

<listitem>
<para>END_TIME: Timestamp representing the time the execution was
finished, regardless of success or failure. An empty value in this
column while the job is not currently running indicates that there has
been some type of error and the framework was unable to perform a last
save before failing.</para>
</listitem>

<listitem>
<para>STATUS: Character string representing the status of the
execution, such as COMPLETED or STARTED. The object representation of
this column is the <classname>BatchStatus</classname>
enumeration.</para>
</listitem>

<listitem>
<para>CONTINUABLE: Character indicating whether or not the execution
is currently able to continue: 'Y' for yes and 'N' for no.</para>
</listitem>

<listitem>
<para>EXIT_CODE: Character string representing the exit code of the
execution. In the case of a command-line job, this may be converted
into a number.</para>
</listitem>

<listitem>
<para>EXIT_MESSAGE: Character string representing a more detailed
description of how the job exited. In the case of failure, this might
include as much of the stack trace as possible.</para>
</listitem>
</itemizedlist>
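<para>Several of these columns are easiest to understand by how a reader
would interpret them. The Java sketch below takes raw column values as
arguments instead of reading them through JDBC. The rule that a missing
END_TIME on a non-running execution signals an unclean shutdown comes from
the END_TIME description above; the mapping from EXIT_CODE to a process
exit status is a hypothetical example, not the framework's.</para>

```java
import java.util.Date;

/**
 * Hypothetical helpers for interpreting BATCH_JOB_EXECUTION column values.
 * Values are passed in directly; a real reader would obtain them through JDBC.
 */
public class JobExecutionRowSketch {

    /** CONTINUABLE is stored as 'Y' or 'N'; anything other than "Y" is treated as no. */
    static boolean isContinuable(String continuableColumn) {
        return "Y".equals(continuableColumn);
    }

    /**
     * A null END_TIME on an execution that is not currently running suggests the
     * framework could not perform its last save before failing.
     */
    static boolean endedWithoutFinalSave(Date endTime, boolean currentlyRunning) {
        return endTime == null && !currentlyRunning;
    }

    /** Hypothetical mapping from EXIT_CODE to a numeric status for a command-line job. */
    static int toProcessExitStatus(String exitCode) {
        if ("COMPLETED".equals(exitCode)) {
            return 0;
        }
        if ("FAILED".equals(exitCode)) {
            return 1;
        }
        return 2; // any other, or missing, exit code
    }

    public static void main(String[] args) {
        Date endTime = null; // END_TIME was never written
        System.out.println(endedWithoutFinalSave(endTime, false)); // true
        System.out.println(isContinuable("N"));                    // false
        System.out.println(toProcessExitStatus("COMPLETED"));      // 0
    }
}
```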
</section>

<section>
<title>BATCH_STEP_EXECUTION</title>

<para>The BATCH_STEP_EXECUTION table holds all information relevant to the
<classname>StepExecution</classname> object. This table is similar in
many ways to the BATCH_JOB_EXECUTION table, and there will always be one
entry per <classname>Step</classname> for each
<classname>JobExecution</classname> created:</para>

<programlisting>CREATE TABLE BATCH_STEP_EXECUTION (
  STEP_EXECUTION_ID BIGINT PRIMARY KEY,
  VERSION BIGINT NOT NULL,
  STEP_NAME VARCHAR(100) NOT NULL,
  JOB_EXECUTION_ID BIGINT NOT NULL,
  START_TIME TIMESTAMP NOT NULL,
  END_TIME TIMESTAMP DEFAULT NULL,
  STATUS VARCHAR(10),
  COMMIT_COUNT BIGINT,
  ITEM_COUNT BIGINT,
  CONTINUABLE CHAR(1),
  EXIT_CODE VARCHAR(20),
  EXIT_MESSAGE VARCHAR(2500),
  constraint JOB_EXECUTION_STEP_FK foreign key (JOB_EXECUTION_ID)
    references BATCH_JOB_EXECUTION(JOB_EXECUTION_ID)
);</programlisting>

<para>Below are descriptions for each column:</para>

<itemizedlist>
<listitem>
<para>STEP_EXECUTION_ID: Primary key that uniquely identifies this
execution. The value of this column should be obtainable by calling
the <methodname>getId</methodname> method of the
<classname>StepExecution</classname> object.</para>
</listitem>

<listitem>
<para>VERSION: See the Version section above.</para>
</listitem>

<listitem>
<para>STEP_NAME: The name of the step to which this execution
belongs.</para>
</listitem>

<listitem>
<para>JOB_EXECUTION_ID: Foreign key from the BATCH_JOB_EXECUTION table
indicating the <classname>JobExecution</classname> to which this
<classname>StepExecution</classname> belongs. There may be only one
<classname>StepExecution</classname> per <classname>Step</classname>
name for a given <classname>JobExecution</classname>.</para>
</listitem>

<listitem>
<para>START_TIME: Timestamp representing the time the execution was
started.</para>
</listitem>

<listitem>
<para>END_TIME: Timestamp representing the time the execution was
finished, regardless of success or failure. An empty value in this
column while the job is not currently running indicates that there has
been some type of error and the framework was unable to perform a last
save before failing.</para>
</listitem>

<listitem>
<para>STATUS: Character string representing the status of the
execution, such as COMPLETED or STARTED. The object representation of
this column is the <classname>BatchStatus</classname>
enumeration.</para>
</listitem>

<listitem>
<para>COMMIT_COUNT: The number of times the step has committed a
transaction during this execution.</para>
</listitem>

<listitem>
<para>ITEM_COUNT: The number of items that have been written out
during this execution.</para>
</listitem>

<listitem>
<para>CONTINUABLE: Character indicating whether or not the execution
is currently able to continue: 'Y' for yes and 'N' for no.</para>
</listitem>

<listitem>
<para>EXIT_CODE: Character string representing the exit code of the
execution. In the case of a command-line job, this may be converted
into a number.</para>
</listitem>

<listitem>
<para>EXIT_MESSAGE: Character string representing a more detailed
description of how the step exited. In the case of failure, this might
include as much of the stack trace as possible.</para>
</listitem>
</itemizedlist>
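<para>Two of the rules above can be sketched in memory: at most one step
execution per combination of JOB_EXECUTION_ID and STEP_NAME, and
COMMIT_COUNT and ITEM_COUNT growing as transactions commit. The Java class
and method names below are hypothetical, and a simple counter stands in
for BATCH_STEP_EXECUTION_SEQ.</para>

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical in-memory model of BATCH_STEP_EXECUTION rows, used only to
 * illustrate the uniqueness rule and the commit/item counters.
 */
public class StepExecutionSketch {

    static class StepRow {
        final long stepExecutionId; // STEP_EXECUTION_ID
        final long jobExecutionId;  // JOB_EXECUTION_ID (foreign key)
        final String stepName;      // STEP_NAME
        long commitCount;           // COMMIT_COUNT
        long itemCount;             // ITEM_COUNT

        StepRow(long stepExecutionId, long jobExecutionId, String stepName) {
            this.stepExecutionId = stepExecutionId;
            this.jobExecutionId = jobExecutionId;
            this.stepName = stepName;
        }

        /** Records one committed transaction that wrote the given number of items. */
        void recordCommit(int itemsWritten) {
            commitCount++;
            itemCount += itemsWritten;
        }
    }

    // Keyed by "jobExecutionId/stepName"; the id is numeric, so the first '/' separates the parts.
    private final Map<String, StepRow> rows = new HashMap<>();
    private long nextId = 1L; // stands in for BATCH_STEP_EXECUTION_SEQ

    /** Creates a row, refusing a second execution of the same step within one job execution. */
    StepRow create(long jobExecutionId, String stepName) {
        String naturalKey = jobExecutionId + "/" + stepName;
        if (rows.containsKey(naturalKey)) {
            throw new IllegalStateException(
                    "Step '" + stepName + "' already has an execution in job execution " + jobExecutionId);
        }
        StepRow row = new StepRow(nextId++, jobExecutionId, stepName);
        rows.put(naturalKey, row);
        return row;
    }

    public static void main(String[] args) {
        StepExecutionSketch table = new StepExecutionSketch();
        StepRow load = table.create(10L, "loadCustomers");
        load.recordCommit(100);
        load.recordCommit(37);
        System.out.println(load.commitCount + " commits, " + load.itemCount + " items"); // 2 commits, 137 items

        try {
            table.create(10L, "loadCustomers");
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }

        // A restart runs under a new JobExecution, so the same step name is allowed again.
        StepRow rerun = table.create(11L, "loadCustomers");
        System.out.println("rerun id = " + rerun.stepExecutionId); // rerun id = 2
    }
}
```

<para>A restart produces a new <classname>JobExecution</classname>, so the
same step name can appear again under a different JOB_EXECUTION_ID.</para>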
</section>

<section>
<title>BATCH_STEP_EXECUTION_CONTEXT</title>

<para>The BATCH_STEP_EXECUTION_CONTEXT table holds all information
relevant to an <classname>ExecutionContext</classname>. There is exactly
one <classname>ExecutionContext</classname> per
<classname>StepExecution</classname>, and it contains all user-defined
key/value pairs that need to be persisted for a particular job run. This
data is usually state information that must be retrieved after a failure
so that a <classname>JobInstance</classname> can 'start from where it left
off'. As with the BATCH_JOB_PARAMS table, this table has been denormalized
and uses a column to determine the type:</para>

<programlisting>CREATE TABLE BATCH_STEP_EXECUTION_CONTEXT (
  STEP_EXECUTION_ID BIGINT NOT NULL,
  TYPE_CD VARCHAR(6) NOT NULL,
  KEY_NAME VARCHAR(1000) NOT NULL,
  STRING_VAL VARCHAR(1000),
  DATE_VAL TIMESTAMP DEFAULT NULL,
  LONG_VAL BIGINT,
  DOUBLE_VAL DOUBLE PRECISION,
  OBJECT_VAL BLOB,
  constraint STEP_EXECUTION_CONTEXT_FK foreign key (STEP_EXECUTION_ID)
    references BATCH_STEP_EXECUTION(STEP_EXECUTION_ID)
);</programlisting>

<para>Below are descriptions for each column:</para>

<itemizedlist>
<listitem>
<para>STEP_EXECUTION_ID: Foreign key representing the
<classname>StepExecution</classname> to which the context belongs.
There may be more than one row associated with a given
<classname>StepExecution</classname>.</para>
</listitem>

<listitem>
<para>TYPE_CD: String representation of the type of value stored,
which can be a string, date, long, double, or serialized object.
Because the type must be known, it cannot be null.</para>
</listitem>

<listitem>
<para>KEY_NAME: The context key.</para>
</listitem>

<listitem>
<para>STRING_VAL: Context value, if the type is string.</para>
</listitem>

<listitem>
<para>DATE_VAL: Context value, if the type is date.</para>
</listitem>

<listitem>
<para>LONG_VAL: Context value, if the type is long.</para>
</listitem>

<listitem>
<para>DOUBLE_VAL: Context value, if the type is double.</para>
</listitem>

<listitem>
<para>OBJECT_VAL: Context value, if the type is a serialized object
stored as a blob.</para>
</listitem>
</itemizedlist>

<para>When an <classname>ExecutionContext</classname> is stored, values
of the well-known types above are stored as their respective type. Any
other type is serialized to a blob and stored in the OBJECT_VAL column.
As with BATCH_JOB_PARAMS, this table has no primary key, simply because
the framework has no use for one and therefore does not require it.
Users who want one may add a database-generated key without causing any
issues for the framework itself.</para>
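<para>The split between typed columns and OBJECT_VAL can be sketched with
standard Java serialization. In the example below, the TYPE_CD names
(including OBJECT) and the <classname>ReaderPosition</classname> class are
assumptions made for illustration; the bytes produced by
<methodname>toBlob</methodname> are what would be written to the
OBJECT_VAL blob.</para>

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.Date;

/**
 * Hypothetical sketch of splitting ExecutionContext values across the typed columns
 * of BATCH_STEP_EXECUTION_CONTEXT. TYPE_CD names, including OBJECT, are assumptions.
 */
public class ExecutionContextRowSketch {

    /** Returns the TYPE_CD that would be stored for the given value. */
    static String typeCodeFor(Object value) {
        if (value instanceof String) {
            return "STRING";
        }
        if (value instanceof Date) {
            return "DATE";
        }
        if (value instanceof Long) {
            return "LONG";
        }
        if (value instanceof Double) {
            return "DOUBLE";
        }
        return "OBJECT"; // anything else is serialized into OBJECT_VAL
    }

    /** Serializes a value of a non-well-known type into bytes for the OBJECT_VAL blob. */
    static byte[] toBlob(Serializable value) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
        }
        return bytes.toByteArray();
    }

    /** Restores a value previously written by toBlob, e.g. when a job restarts. */
    static Object fromBlob(byte[] blob) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(blob))) {
            return in.readObject();
        }
    }

    /** Hypothetical restart state that a reader might keep in its ExecutionContext. */
    static class ReaderPosition implements Serializable {
        private static final long serialVersionUID = 1L;
        final String fileName;
        final long lineNumber;

        ReaderPosition(String fileName, long lineNumber) {
            this.fileName = fileName;
            this.lineNumber = lineNumber;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(typeCodeFor(42L)); // LONG
        ReaderPosition position = new ReaderPosition("customers.csv", 1200L);
        System.out.println(typeCodeFor(position)); // OBJECT
        ReaderPosition restored = (ReaderPosition) fromBlob(toBlob(position));
        System.out.println(restored.fileName + ":" + restored.lineNumber); // customers.csv:1200
    }
}
```

<para>Under this sketch an <classname>Integer</classname> is not one of the
well-known types, so it would be serialized to OBJECT_VAL rather than
stored in LONG_VAL.</para>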
</section>
</appendix>