BATCH-605: Added a new appendix that describes the meta-data table schema in detail.

lucasward
2008-05-01 05:07:12 +00:00
parent 7be9bc6823
commit 4e59bf9b84
3 changed files with 450 additions and 0 deletions

Binary file not shown.

@@ -48,6 +48,8 @@
<xi:include href="testing.xml" />
<xi:include href="appendix.xml" />
<xi:include href="schema-appendix.xml" />
<xi:include href="glossary.xml" />
</book>

@@ -0,0 +1,448 @@
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE appendix PUBLIC "-//OASIS//DTD DocBook XML V4.5//EN"
"http://www.oasis-open.org/docbook/xml/4.5/docbookx.dtd">
<appendix>
<title>Meta-Data Schema</title>
<section>
<title>Overview</title>
<para>The Spring Batch meta-data tables closely match the domain
objects that represent them in Java. For example, JobInstance,
JobExecution, JobParameters, StepExecution, and ExecutionContext map to
BATCH_JOB_INSTANCE, BATCH_JOB_EXECUTION, BATCH_JOB_PARAMS,
BATCH_STEP_EXECUTION, and BATCH_STEP_EXECUTION_CONTEXT, respectively. The
<classname>JobRepository</classname> is responsible for saving each Java
object into its correct table. This appendix describes the meta-data
tables in detail, along with many of the design decisions that were made
when creating them. When viewing the table creation statements below,
keep in mind that the data types used are as generic as possible. Spring
Batch provides many example schemas, and their data types vary because
individual database vendors handle data types differently. Below is an
ERD of all five tables and their relationships to one another:</para>
<mediaobject>
<imageobject>
<imagedata fileref="images/meta-data-erd.png" />
</imageobject>
</mediaobject>
<section>
<title>Version</title>
<para>Many of the database tables discussed in this appendix contain a
version column. This column is important because Spring Batch employs an
optimistic locking strategy when dealing with updates to the database.
This means that each time a record is 'touched' (updated), the value in
the version column is incremented by one. When the repository goes back
to save the value, it compares the version number it originally read
with the one currently stored; if another process has changed it in the
meantime, an <classname>OptimisticLockingFailureException</classname> is
thrown, indicating there has been an error with concurrent access. This
check is necessary because, even though different batch jobs may be
running on different machines, they all use the same database
tables.</para>
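<para>The version check can be illustrated with a minimal, in-memory Java
sketch that does not use Spring Batch or JDBC.
<classname>VersionedRow</classname>, <classname>VersionedTable</classname>,
and <classname>StaleVersionException</classname> are hypothetical stand-ins
for a table row, the table, and
<classname>OptimisticLockingFailureException</classname>. The SQL in the
comments shows the general shape of an optimistic-locking update and is an
assumption, not the framework's actual statement.</para>
<programlisting><![CDATA[import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for one row of a table that has a VERSION column.
class VersionedRow {
    final long id;
    long version;
    String status;

    VersionedRow(long id, long version, String status) {
        this.id = id;
        this.version = version;
        this.status = status;
    }
}

// Hypothetical stand-in for Spring's OptimisticLockingFailureException.
class StaleVersionException extends RuntimeException {
    StaleVersionException(String message) {
        super(message);
    }
}

// In-memory "table" that applies optimistic locking on every update.
class VersionedTable {
    private final Map<Long, VersionedRow> rows = new HashMap<>();

    // New rows start at version 0.
    synchronized void insert(long id, String status) {
        rows.put(id, new VersionedRow(id, 0L, status));
    }

    // Roughly: UPDATE ... SET STATUS = ?, VERSION = VERSION + 1
    //          WHERE ID = ? AND VERSION = ?
    // If no row matches, another writer changed the row after it was read.
    synchronized long update(long id, long versionReadEarlier, String newStatus) {
        VersionedRow stored = rows.get(id);
        if (stored == null || stored.version != versionReadEarlier) {
            throw new StaleVersionException("Row " + id
                    + " was modified concurrently (expected version "
                    + versionReadEarlier + ")");
        }
        stored.status = newStatus;
        stored.version++;
        return stored.version;
    }

    synchronized long versionOf(long id) {
        return rows.get(id).version;
    }
}]]></programlisting>
<para>If two processes both read version 0, only the first update
succeeds. The second is rejected instead of silently overwriting the first
writer's changes.</para>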
</section>
<section>
<title>Identity</title>
<para>BATCH_JOB_INSTANCE, BATCH_JOB_EXECUTION, and BATCH_STEP_EXECUTION
each contain columns ending in _ID, which act as primary keys for their
respective tables. However, they are not database-generated keys; rather,
they are generated by separate sequences. This is necessary because,
after one of the domain objects is inserted into the database, the key it
is given needs to be set on the actual object so that it can be uniquely
identified in Java. Newer database drivers (JDBC 3.0 and up) support this
feature with database-generated keys, but rather than requiring it,
sequences were used. Each variation of the schema will contain some form
of the following:</para>
<programlisting>CREATE SEQUENCE BATCH_STEP_EXECUTION_SEQ;
CREATE SEQUENCE BATCH_JOB_EXECUTION_SEQ;
CREATE SEQUENCE BATCH_JOB_SEQ;</programlisting>
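<para>The following minimal Java sketch shows why a sequence is convenient
here: the key is obtained before the insert, so it can be written to the
row and set on the domain object at the same time.
<classname>IdSource</classname>, <classname>InMemorySequence</classname>,
<classname>JobInstanceRecord</classname>, and
<classname>JobInstanceDao</classname> are hypothetical names, and an
<classname>AtomicLong</classname> stands in for a real database
sequence.</para>
<programlisting><![CDATA[import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical abstraction over a database sequence such as BATCH_JOB_SEQ.
interface IdSource {
    long nextId();
}

// In-memory stand-in for a database sequence; the first value returned is 1.
class InMemorySequence implements IdSource {
    private final AtomicLong current = new AtomicLong();

    @Override
    public long nextId() {
        return current.incrementAndGet();
    }
}

// Hypothetical domain object; its ID stays null until it has been saved.
class JobInstanceRecord {
    private final String jobName;
    private Long id;

    JobInstanceRecord(String jobName) {
        this.jobName = jobName;
    }

    String getJobName() { return jobName; }
    Long getId() { return id; }
    void setId(Long id) { this.id = id; }
}

// Hypothetical DAO: take the next key, set it on the object, then insert the row.
class JobInstanceDao {
    private final IdSource sequence;
    private final List<String> insertedRows = new ArrayList<>();

    JobInstanceDao(IdSource sequence) {
        this.sequence = sequence;
    }

    void save(JobInstanceRecord instance) {
        long id = sequence.nextId();   // query the sequence (vendor-specific SQL)
        instance.setId(id);            // the Java object now knows its primary key
        // Stands in for INSERT INTO BATCH_JOB_INSTANCE (JOB_INSTANCE_ID, JOB_NAME, ...)
        insertedRows.add(id + "," + instance.getJobName());
    }

    List<String> insertedRows() {
        return Collections.unmodifiableList(insertedRows);
    }
}]]></programlisting>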
<para>Many database vendors don't officially support sequences. In these
cases, workarounds are used, such as the following for MySQL:</para>
<programlisting>CREATE TABLE BATCH_STEP_EXECUTION_SEQ (ID BIGINT NOT NULL) ENGINE=MYISAM;
INSERT INTO BATCH_STEP_EXECUTION_SEQ values(0);
CREATE TABLE BATCH_JOB_EXECUTION_SEQ (ID BIGINT NOT NULL) ENGINE=MYISAM;
INSERT INTO BATCH_JOB_EXECUTION_SEQ values(0);
CREATE TABLE BATCH_JOB_SEQ (ID BIGINT NOT NULL) ENGINE=MYISAM;
INSERT INTO BATCH_JOB_SEQ values(0);</programlisting>
<para>In the above case, a table is used in place of each sequence. The
Spring Framework class <classname>MySQLMaxValueIncrementer</classname>
then increments the single column in this table in order to provide
sequence-like functionality.</para>
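<para>The MySQL workaround can be modeled in Java as a single-row,
single-column table whose value is incremented and then read back.
<classname>TableSequence</classname> and its block size are hypothetical.
The SQL in the comments, which relies on MySQL's
<literal>LAST_INSERT_ID(expr)</literal> function, shows one common way to
perform the increment and is an assumption rather than a description of
the internals of <classname>MySQLMaxValueIncrementer</classname>.</para>
<programlisting><![CDATA[// Hypothetical in-memory model of a MySQL "sequence table" such as
// BATCH_JOB_SEQ (ID BIGINT NOT NULL), seeded with a single row.
class TableSequence {
    private long storedId;         // value of the ID column in the single row
    private final int blockSize;   // values reserved per table update
    private long next;             // next value to hand out from the reserved block
    private long lastReserved;     // highest value reserved so far

    TableSequence(long initialValue, int blockSize) {
        if (blockSize < 1) {
            throw new IllegalArgumentException("blockSize must be at least 1");
        }
        this.storedId = initialValue;   // INSERT INTO BATCH_JOB_SEQ values(0)
        this.blockSize = blockSize;
        this.next = initialValue + 1;
        this.lastReserved = initialValue;
    }

    synchronized long nextValue() {
        if (next > lastReserved) {
            // UPDATE BATCH_JOB_SEQ SET ID = LAST_INSERT_ID(ID + blockSize);
            // SELECT LAST_INSERT_ID();
            storedId += blockSize;
            lastReserved = storedId;
            next = storedId - blockSize + 1;
        }
        return next++;
    }

    synchronized long storedValue() {
        return storedId;
    }
}]]></programlisting>
<para>With a block size of 1, every value costs one update of the table. A
larger block reserves several values per update, at the cost of gaps in
the sequence if a process stops before handing out all of its reserved
values.</para>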
</section>
</section>
<section>
<title>BATCH_JOB_INSTANCE</title>
<para>The BATCH_JOB_INSTANCE table holds all information relevant to a
<classname>JobInstance</classname>, and serves as the top of the overall
hierarchy. The following generic DDL statement is used to create
it:</para>
<programlisting>CREATE TABLE BATCH_JOB_INSTANCE (
JOB_INSTANCE_ID BIGINT PRIMARY KEY ,
VERSION BIGINT,
JOB_NAME VARCHAR(100) NOT NULL ,
JOB_KEY VARCHAR(2500)
);</programlisting>
<para>Below are descriptions of each column in the table:</para>
<itemizedlist>
<listitem>
<para>JOB_INSTANCE_ID: The unique ID that identifies the instance,
which is also the primary key. The value of this column should be
obtainable by calling the <methodname>getId</methodname> method on
<classname>JobInstance</classname>.</para>
</listitem>
<listitem>
<para>VERSION: See the Version section above.</para>
</listitem>
<listitem>
<para>JOB_NAME: Name of the job obtained from the
<classname>Job</classname> object. Because it is required to identify
the instance, it must not be null.</para>
</listitem>
<listitem>
<para>JOB_KEY: A serialization of the
<classname>JobParameters</classname> that uniquely identifies separate
instances of the same job from one another.
(<classname>JobInstances</classname> with the same job name must have
different <classname>JobParameters</classname> and, thus, different
JOB_KEY values.)</para>
</listitem>
</itemizedlist>
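<para>As an illustration of how a JOB_KEY can separate instances of the
same job, the Java sketch below serializes parameters into a deterministic
string and treats the pair (JOB_NAME, JOB_KEY) as the identity of an
instance. The <literal>name=value;</literal> format,
<classname>JobKeys</classname>, and
<classname>JobInstanceRegistry</classname> are hypothetical; the
framework's actual serialization format may differ.</para>
<programlisting><![CDATA[import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical JOB_KEY builder; not the framework's actual format.
final class JobKeys {
    private JobKeys() {
    }

    // Sorting by key makes the result independent of map iteration order,
    // so equal parameter sets always produce equal keys. Values containing
    // '=' or ';' would need escaping in a real implementation.
    static String serialize(Map<String, ?> parameters) {
        TreeMap<String, Object> sorted = new TreeMap<>(parameters);
        StringBuilder key = new StringBuilder();
        for (Map.Entry<String, Object> entry : sorted.entrySet()) {
            key.append(entry.getKey()).append('=').append(entry.getValue()).append(';');
        }
        return key.toString();
    }
}

// Hypothetical registry mirroring the rule that (JOB_NAME, JOB_KEY)
// identifies exactly one JobInstance.
final class JobInstanceRegistry {
    private final Map<String, Set<String>> keysByJobName = new HashMap<>();

    // Returns true for a new instance, false if one already exists.
    boolean register(String jobName, Map<String, ?> parameters) {
        return keysByJobName
                .computeIfAbsent(jobName, name -> new HashSet<>())
                .add(JobKeys.serialize(parameters));
    }
}]]></programlisting>
<para>Sorting the entries matters: without it, two maps holding the same
parameters could serialize differently and be mistaken for separate
instances.</para>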
</section>
<section>
<title>BATCH_JOB_PARAMS</title>
<para>The BATCH_JOB_PARAMS table holds all information relevant to the
<classname>JobParameters</classname> object. It contains 0 or more
key/value pairs that together uniquely identify a
<classname>JobInstance</classname> and serve as a record of the
parameters a job was run with. Note that the table has been
denormalized: rather than creating a separate table for each type, there
is one table with a column indicating the type:</para>
<programlisting>CREATE TABLE BATCH_JOB_PARAMS (
JOB_INSTANCE_ID BIGINT NOT NULL ,
TYPE_CD VARCHAR(6) NOT NULL ,
KEY_NAME VARCHAR(100) NOT NULL ,
STRING_VAL VARCHAR(250) ,
DATE_VAL TIMESTAMP DEFAULT NULL,
LONG_VAL BIGINT ,
DOUBLE_VAL DOUBLE PRECISION,
constraint JOB_INSTANCE_PARAMS_FK foreign key (JOB_INSTANCE_ID)
references BATCH_JOB_INSTANCE(JOB_INSTANCE_ID)
);</programlisting>
<para>Below are descriptions for each column:</para>
<itemizedlist>
<listitem>
<para>JOB_INSTANCE_ID: Foreign key from the BATCH_JOB_INSTANCE table
that indicates the job instance the parameter entry belongs to. Note
that multiple rows (i.e., key/value pairs) may exist for each
instance.</para>
</listitem>
<listitem>
<para>TYPE_CD: String representation of the type of value stored,
which can be either a character string, date, long, or double. Because
the type must be known, it cannot be null.</para>
</listitem>
<listitem>
<para>KEY_NAME: The parameter key.</para>
</listitem>
<listitem>
<para>STRING_VAL: Parameter value, if the type is string.</para>
</listitem>
<listitem>
<para>DATE_VAL: Parameter value, if the type is date.</para>
</listitem>
<listitem>
<para>LONG_VAL: Parameter value, if the type is a long.</para>
</listitem>
<listitem>
<para>DOUBLE_VAL: Parameter value, if the type is double.</para>
</listitem>
</itemizedlist>
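<para>The denormalized layout can be sketched as a conversion between a
typed Java value and a row in which TYPE_CD says which one of the four
value columns is populated. <classname>ParamRow</classname> and
<classname>ParamRows</classname> are hypothetical, and the TYPE_CD strings
(STRING, DATE, LONG, DOUBLE) are assumptions chosen to fit the VARCHAR(6)
column.</para>
<programlisting><![CDATA[import java.sql.Timestamp;
import java.util.Date;

// Hypothetical model of one BATCH_JOB_PARAMS row; exactly one *_VAL field is set.
final class ParamRow {
    final String typeCd;
    final String keyName;
    String stringVal;
    Timestamp dateVal;
    Long longVal;
    Double doubleVal;

    ParamRow(String typeCd, String keyName) {
        this.typeCd = typeCd;
        this.keyName = keyName;
    }
}

final class ParamRows {
    private ParamRows() {
    }

    // Chooses TYPE_CD and the matching column based on the value's Java type.
    static ParamRow toRow(String key, Object value) {
        if (value == null) {
            throw new IllegalArgumentException("Parameter values must not be null");
        }
        if (value instanceof String) {
            ParamRow row = new ParamRow("STRING", key);
            row.stringVal = (String) value;
            return row;
        }
        if (value instanceof Date) {
            ParamRow row = new ParamRow("DATE", key);
            row.dateVal = new Timestamp(((Date) value).getTime());
            return row;
        }
        if (value instanceof Long) {
            ParamRow row = new ParamRow("LONG", key);
            row.longVal = (Long) value;
            return row;
        }
        if (value instanceof Double) {
            ParamRow row = new ParamRow("DOUBLE", key);
            row.doubleVal = (Double) value;
            return row;
        }
        throw new IllegalArgumentException("Unsupported parameter type: " + value.getClass());
    }

    // Reads TYPE_CD first, because it says which column holds the value.
    static Object fromRow(ParamRow row) {
        switch (row.typeCd) {
            case "STRING": return row.stringVal;
            case "DATE":   return new Date(row.dateVal.getTime());
            case "LONG":   return row.longVal;
            case "DOUBLE": return row.doubleVal;
            default: throw new IllegalStateException("Unknown TYPE_CD: " + row.typeCd);
        }
    }
}]]></programlisting>
<para>Keeping the type in its own column is what lets a single table hold
every parameter: readers check TYPE_CD first and then read only the column
it names.</para>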
<para>This table has no primary key, simply because the framework has no
use for one and thus doesn't require it. If users so choose, one may be
added using a database-generated key without causing any issues for the
framework itself.</para>
</section>
<section>
<title>BATCH_JOB_EXECUTION</title>
<para>The BATCH_JOB_EXECUTION table holds all information relevant to the
<classname>JobExecution</classname> object. Every time a
<classname>Job</classname> is run, a new
<classname>JobExecution</classname> is created, along with a new row in
this table:</para>
<programlisting>CREATE TABLE BATCH_JOB_EXECUTION (
JOB_EXECUTION_ID BIGINT PRIMARY KEY ,
VERSION BIGINT,
JOB_INSTANCE_ID BIGINT NOT NULL,
START_TIME TIMESTAMP DEFAULT NULL,
END_TIME TIMESTAMP DEFAULT NULL,
STATUS VARCHAR(10),
CONTINUABLE CHAR(1),
EXIT_CODE VARCHAR(20),
EXIT_MESSAGE VARCHAR(2500),
constraint JOB_INSTANCE_EXECUTION_FK foreign key (JOB_INSTANCE_ID)
references BATCH_JOB_INSTANCE(JOB_INSTANCE_ID)
) ;</programlisting>
<para>Below are descriptions for each column:</para>
<itemizedlist>
<listitem>
<para>JOB_EXECUTION_ID: Primary key that uniquely identifies this
execution. The value of this column should be obtainable by calling
the <methodname>getId</methodname> method of the
<classname>JobExecution</classname> object.</para>
</listitem>
<listitem>
<para>VERSION: See the Version section above.</para>
</listitem>
<listitem>
<para>JOB_INSTANCE_ID: Foreign key from the BATCH_JOB_INSTANCE table
indicating the instance to which this execution belongs. There may be
more than one execution per instance.</para>
</listitem>
<listitem>
<para>START_TIME: Timestamp representing the time the execution was
started.</para>
</listitem>
<listitem>
<para>END_TIME: Timestamp representing the time the execution was
finished, regardless of success or failure. An empty value in this
column, even though the job is not currently running, indicates that
there has been some type of error and the framework was unable to
perform a last save before failing.</para>
</listitem>
<listitem>
<para>STATUS: Character string representing the status of the
execution. This may be COMPLETED, STARTED, etc. The object
representation of this column is the
<classname>BatchStatus</classname> enumeration.</para>
</listitem>
<listitem>
<para>CONTINUABLE: Character indicating whether or not the execution
is currently able to continue. 'Y' for yes and 'N' for no.</para>
</listitem>
<listitem>
<para>EXIT_CODE: Character string representing the exit code of the
execution. In the case of a command line job, this may be converted
into a number.</para>
</listitem>
<listitem>
<para>EXIT_MESSAGE: Character string representing a more detailed
description of how the job exited. In the case of failure, this might
include as much of the stack trace as possible.</para>
</listitem>
</itemizedlist>
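<para>Two of the column descriptions above imply simple checks that tooling
built on this table might perform. The Java sketch below flags executions
whose END_TIME is empty although they are no longer running, and converts
an EXIT_CODE string into a process exit status.
<classname>JobExecutionRow</classname>,
<classname>ExecutionAudit</classname>, and the exit-code mapping
(COMPLETED to 0, anything else to 1) are hypothetical. Whether a process is
still running cannot be read from the table itself, so the caller passes
it in.</para>
<programlisting><![CDATA[import java.time.Instant;
import java.util.Map;

// Hypothetical snapshot of selected BATCH_JOB_EXECUTION columns.
final class JobExecutionRow {
    final long jobExecutionId;
    final String status;     // e.g. STARTED, COMPLETED (BatchStatus names)
    final Instant endTime;   // null when END_TIME is empty
    final String exitCode;   // EXIT_CODE, may be null

    JobExecutionRow(long jobExecutionId, String status, Instant endTime, String exitCode) {
        this.jobExecutionId = jobExecutionId;
        this.status = status;
        this.endTime = endTime;
        this.exitCode = exitCode;
    }
}

final class ExecutionAudit {
    // Hypothetical mapping for command line launches; not the framework's table.
    private static final Map<String, Integer> EXIT_STATUS = Map.of("COMPLETED", 0);
    private static final int DEFAULT_EXIT_STATUS = 1;

    private ExecutionAudit() {
    }

    // An empty END_TIME on an execution that is no longer running means the
    // framework could not perform its last save before failing.
    static boolean endedWithoutFinalSave(JobExecutionRow row, boolean stillRunning) {
        return row.endTime == null && !stillRunning;
    }

    // Converts the EXIT_CODE string into a number a shell script can test.
    static int toProcessExitStatus(JobExecutionRow row) {
        if (row.exitCode == null) {
            return DEFAULT_EXIT_STATUS;
        }
        return EXIT_STATUS.getOrDefault(row.exitCode, DEFAULT_EXIT_STATUS);
    }
}]]></programlisting>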
</section>
<section>
<title>BATCH_STEP_EXECUTION</title>
<para>The BATCH_STEP_EXECUTION table holds all information relevant to the
<classname>StepExecution</classname> object. This table is similar in
many ways to the BATCH_JOB_EXECUTION table, and there will always be at
least one entry per <classname>Step</classname> for each
<classname>JobExecution</classname> created:</para>
<programlisting>CREATE TABLE BATCH_STEP_EXECUTION (
STEP_EXECUTION_ID BIGINT PRIMARY KEY ,
VERSION BIGINT NOT NULL,
STEP_NAME VARCHAR(100) NOT NULL,
JOB_EXECUTION_ID BIGINT NOT NULL,
START_TIME TIMESTAMP NOT NULL ,
END_TIME TIMESTAMP DEFAULT NULL,
STATUS VARCHAR(10),
COMMIT_COUNT BIGINT ,
ITEM_COUNT BIGINT ,
CONTINUABLE CHAR(1),
EXIT_CODE VARCHAR(20),
EXIT_MESSAGE VARCHAR(2500),
constraint JOB_EXECUTION_STEP_FK foreign key (JOB_EXECUTION_ID)
references BATCH_JOB_EXECUTION(JOB_EXECUTION_ID)
) ;</programlisting>
<para>Below are descriptions for each column:</para>
<itemizedlist>
<listitem>
<para>STEP_EXECUTION_ID: Primary key that uniquely identifies this
execution. The value of this column should be obtainable by calling
the <methodname>getId</methodname> method of the
<classname>StepExecution</classname> object.</para>
</listitem>
<listitem>
<para>VERSION: See the Version section above.</para>
</listitem>
<listitem>
<para>STEP_NAME: The name of the step to which this execution
belongs.</para>
</listitem>
<listitem>
<para>JOB_EXECUTION_ID: Foreign key from the BATCH_JOB_EXECUTION table
indicating the <classname>JobExecution</classname> to which this
<classname>StepExecution</classname> belongs. There may be only one
<classname>StepExecution</classname> for a given
<classname>JobExecution</classname> and <classname>Step</classname>
name.</para>
</listitem>
<listitem>
<para>START_TIME: Timestamp representing the time the execution was
started. </para>
</listitem>
<listitem>
<para>END_TIME: Timestamp representing the time the execution was
finished, regardless of success or failure. An empty value in this
column, even though the job is not currently running, indicates that
there has been some type of error and the framework was unable to
perform a last save before failing.</para>
</listitem>
<listitem>
<para>STATUS: Character string representing the status of the
execution. This may be COMPLETED, STARTED, etc. The object
representation of this column is the
<classname>BatchStatus</classname> enumeration.</para>
</listitem>
<listitem>
<para>COMMIT_COUNT: The number of times the step has
committed a transaction during this execution.</para>
</listitem>
<listitem>
<para>ITEM_COUNT: The number of items that have been written out
during this execution.</para>
</listitem>
<listitem>
<para>CONTINUABLE: Character indicating whether or not the execution
is currently able to continue. 'Y' for yes and 'N' for no.</para>
</listitem>
<listitem>
<para>EXIT_CODE: Character string representing the exit code of the
execution. In the case of a command line job, this may be converted
into a number.</para>
</listitem>
<listitem>
<para>EXIT_MESSAGE: Character string representing a more detailed
description of how the step exited. In the case of failure, this might
include as much of the stack trace as possible.</para>
</listitem>
</itemizedlist>
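<para>The rule that a <classname>JobExecution</classname> can have only one
<classname>StepExecution</classname> per step name, together with the two
counters, can be modeled in memory as below.
<classname>StepExecutionRow</classname> and
<classname>StepExecutionTable</classname> are hypothetical, and treating
each committed chunk as one commit plus the items written in it is a
simplifying assumption. The DDL above does not declare a unique constraint
on (JOB_EXECUTION_ID, STEP_NAME), so in this sketch the rule is enforced in
code.</para>
<programlisting><![CDATA[import java.util.HashMap;
import java.util.Map;

// Hypothetical model of selected BATCH_STEP_EXECUTION columns.
final class StepExecutionRow {
    final long stepExecutionId;
    final long jobExecutionId;
    final String stepName;
    long commitCount;   // COMMIT_COUNT
    long itemCount;     // ITEM_COUNT

    StepExecutionRow(long stepExecutionId, long jobExecutionId, String stepName) {
        this.stepExecutionId = stepExecutionId;
        this.jobExecutionId = jobExecutionId;
        this.stepName = stepName;
    }
}

final class StepExecutionTable {
    // JOB_EXECUTION_ID -> (STEP_NAME -> row)
    private final Map<Long, Map<String, StepExecutionRow>> rows = new HashMap<>();
    private long nextId = 1;   // stands in for BATCH_STEP_EXECUTION_SEQ

    synchronized StepExecutionRow create(long jobExecutionId, String stepName) {
        Map<String, StepExecutionRow> steps =
                rows.computeIfAbsent(jobExecutionId, id -> new HashMap<>());
        if (steps.containsKey(stepName)) {
            throw new IllegalStateException("JobExecution " + jobExecutionId
                    + " already has a StepExecution for step '" + stepName + "'");
        }
        StepExecutionRow row = new StepExecutionRow(nextId++, jobExecutionId, stepName);
        steps.put(stepName, row);
        return row;
    }

    // One committed chunk: its items count as written and COMMIT_COUNT grows by one.
    static void recordCommittedChunk(StepExecutionRow row, int itemsWritten) {
        row.itemCount += itemsWritten;
        row.commitCount++;
    }
}]]></programlisting>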
</section>
<section>
<title>BATCH_STEP_EXECUTION_CONTEXT</title>
<para>The BATCH_STEP_EXECUTION_CONTEXT table holds all information
relevant to an <classname>ExecutionContext</classname>. There is exactly
one <classname>ExecutionContext</classname> per
<classname>StepExecution</classname>, and it contains all user-defined
key/value pairs that need to be persisted for a particular job run. This
data is usually state information that must be retrieved after a failure
so that a <classname>JobInstance</classname> can 'start from where it
left off'. As with the BATCH_JOB_PARAMS table, this table has been
denormalized and uses a column to determine the type:</para>
<programlisting>CREATE TABLE BATCH_STEP_EXECUTION_CONTEXT (
STEP_EXECUTION_ID BIGINT NOT NULL ,
TYPE_CD VARCHAR(6) NOT NULL ,
KEY_NAME VARCHAR(1000) NOT NULL ,
STRING_VAL VARCHAR(1000) ,
DATE_VAL TIMESTAMP DEFAULT NULL ,
LONG_VAL BIGINT ,
DOUBLE_VAL DOUBLE PRECISION ,
OBJECT_VAL BLOB,
constraint STEP_EXECUTION_CONTEXT_FK foreign key (STEP_EXECUTION_ID)
references BATCH_STEP_EXECUTION(STEP_EXECUTION_ID)
) ;</programlisting>
<para>Below are descriptions for each column:</para>
<itemizedlist>
<listitem>
<para>STEP_EXECUTION_ID: Foreign key representing the
<classname>StepExecution</classname> to which the context belongs.
There may be more than one row associated with a given
<classname>StepExecution</classname>.</para>
</listitem>
<listitem>
<para>TYPE_CD: String representation of the type of value stored,
which can be a character string, date, long, double, or serialized
object. Because the type must be known, it cannot be null.</para>
</listitem>
<listitem>
<para>KEY_NAME: The context key.</para>
</listitem>
<listitem>
<para>STRING_VAL: Context value, if the type is string.</para>
</listitem>
<listitem>
<para>DATE_VAL: Context value, if the type is date.</para>
</listitem>
<listitem>
<para>LONG_VAL: Context value, if the type is a long.</para>
</listitem>
<listitem>
<para>DOUBLE_VAL: Context value, if the type is double.</para>
</listitem>
<listitem>
<para>OBJECT_VAL: Context value, if the type is none of the above,
stored as a serialized BLOB.</para>
</listitem>
</itemizedlist>
<para>When an <classname>ExecutionContext</classname> is stored, values
of the well-known types above are stored in their respective typed
columns. Any other type is serialized to a BLOB and stored in the
OBJECT_VAL column. As with BATCH_JOB_PARAMS, there is no primary key for
this table, because the framework has no use for one and thus doesn't
require it. If users so choose, one may be added using a
database-generated key without causing any issues for the framework
itself.</para>
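<para>The storage rule can be sketched in Java as follows: well-known
types get their own TYPE_CD, and anything else is written to OBJECT_VAL as
bytes. Java serialization, the class name
<classname>ContextValueCodec</classname>, and the TYPE_CD strings are
assumptions made for this illustration.</para>
<programlisting><![CDATA[import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.Date;

final class ContextValueCodec {
    private ContextValueCodec() {
    }

    // Well-known types map to their own typed column; everything else is OBJECT.
    static String typeCodeFor(Object value) {
        if (value instanceof String) return "STRING";
        if (value instanceof Date) return "DATE";
        if (value instanceof Long) return "LONG";
        if (value instanceof Double) return "DOUBLE";
        return "OBJECT";
    }

    // Serializes an arbitrary value into the bytes stored in OBJECT_VAL.
    static byte[] toBlob(Serializable value) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();   // stream was flushed when it was closed
    }

    // Restores the value when the context is reloaded after a failure.
    static Object fromBlob(byte[] blob) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(blob))) {
            return in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException("Class of stored value is not available", e);
        }
    }
}]]></programlisting>
<para>Values restored from OBJECT_VAL depend on the same classes being
available, in a serialization-compatible form, when the job is
restarted.</para>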
</section>
</appendix>