Allow a Locale to be set for formatting

The internal DataFormatter from Apache POI allows setting a fixed Locale to
use when reading numbers, dates etc. Initially we didn't set this, so it
always used the default Locale of the Java runtime. This could lead to
unexpected results, because the same sheet could be formatted differently
depending on the machine it was read on. We now allow setting the userLocale
property and use it to configure the DataFormatter used internally.
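
As a rough illustration of why the runtime's default Locale matters, the sketch below formats the same number under two Locales. It uses only `java.text` from the JDK rather than POI's DataFormatter, and the class and method names are hypothetical, so treat it as a sketch of the effect and not as the code in this commit.

```java
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.util.Locale;

// Hypothetical sketch, not part of this commit: shows how the Locale changes number output.
public class LocaleFormattingSketch {

    // Formats a value with a fixed pattern, taking the separators from the given Locale.
    static String format(double value, Locale locale) {
        DecimalFormat format = new DecimalFormat("#,##0.00", DecimalFormatSymbols.getInstance(locale));
        return format.format(value);
    }

    public static void main(String[] args) {
        System.out.println(format(1234.56, Locale.US));      // 1,234.56
        System.out.println(format(1234.56, Locale.GERMANY)); // 1.234,56
    }
}
```

A reader that relies on the platform default would produce either of these strings depending on where the job runs, which is what a fixed userLocale is meant to avoid.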

We also took the opportunity to make this work with the streaming ItemReader
as well as the regular ItemReader. The streaming item reader now also uses
a pre-configured DataFormatter.

Closes: #98
Marten Deinum
2022-12-06 15:25:01 +01:00
parent e8c67a115d
commit 77f441b5c1
7 changed files with 143 additions and 82 deletions


@@ -1,4 +1,4 @@
# spring-batch-excel
= spring-batch-excel
Spring Batch extension containing an `ItemReader` implementation for Excel based on https://poi.apache.org[Apache POI]. It supports reading both XLS and XLSX files. For the latter, there is also (experimental) streaming support.
@@ -8,26 +8,28 @@ To reduce the memory footprint the `StreamingXlsxItemReader` can be used, this w
NOTE: The `ItemReader` classes are **not threadsafe**. Neither the API of https://poi.apache.org/help/faq.html#20[Apache POI] nor the https://docs.spring.io/spring-batch/docs/current/api/org/springframework/batch/item/support/AbstractItemCountingItemStreamItemReader.html[`AbstractItemCountingItemStreamItemReader`] used as a base class for the `ItemReader` classes is threadsafe. Reading from multiple threads is therefore not supported. Using a multi-threaded processor/writer should work as long as you use a single thread for reading.
## Configuration of `PoiItemReader`
== Configuration of `PoiItemReader`
Next to the https://docs.spring.io/spring-batch/reference/html/configureJob.html[configuration of Spring Batch] one needs to configure the `PoiItemReader`.
Configuration can be done in XML or Java Config.
### XML
=== XML
```xml
[source,xml]
----
<bean id="excelReader" class="org.springframework.batch.extensions.excel.poi.PoiItemReader" scope="step">
<property name="resource" value="file:/path/to/your/excel/file" />
<property name="rowMapper">
<bean class="org.springframework.batch.extensions.excel.mapping.PassThroughRowMapper" />
</property>
</bean>
```
----
### Java Config
=== Java Config
```java
[source,java]
----
@Bean
@StepScope
public PoiItemReader excelReader() {
@@ -41,32 +43,34 @@ public PoiItemReader excelReader() {
public RowMapper rowMapper() {
return new PassThroughRowMapper();
}
```
----
## Configuration of `StreamingXlsxItemReader`
== Configuration of `StreamingXlsxItemReader`
Configuration can be done in XML or Java Config.
### XML
=== XML
```xml
[source,xml]
----
<bean id="excelReader" class="org.springframework.batch.extensions.excel.streaming.StreamingXlsxItemReader" scope="step">
<property name="resource" value="file:/path/to/your/excel/file" />
<property name="rowMapper">
<bean class="org.springframework.batch.extensions.excel.mapping.PassThroughRowMapper" />
</property>
</bean>
```
----
### Java Config
=== Java Config
```java
[source,java]
----
@Bean
@StepScope
public StreamingXlsxItemReader excelReader() {
public StreamingXlsxItemReader excelReader(RowMapper rowMapper) {
StreamingXlsxItemReader reader = new StreamingXlsxItemReader();
reader.setResource(new FileSystemResource("/path/to/your/excel/file"));
reader.setRowMapper(rowMapper());
reader.setRowMapper(rowMapper);
return reader;
}
@@ -74,10 +78,10 @@ public StreamingXlsxItemReader excelReader() {
public RowMapper rowMapper() {
return new PassThroughRowMapper();
}
```
----
## Configuration properties
== Configuration properties
[cols="1,1,1,4"]
.Properties for item readers
|===
@@ -91,28 +95,30 @@ public RowMapper rowMapper() {
| `rowSetFactory` | no | `DefaultRowSetFactory` | For reading rows a `RowSet` abstraction is used. To construct a `RowSet` for the current `Sheet` a `RowSetFactory` is needed. The `DefaultRowSetFactory` constructs a `DefaultRowSet` and `DefaultRowSetMetaData`. For construction of the latter a `ColumnNameExtractor` is needed. At the moment there are 2 implementations
| `skippedRowsCallback` | no | `null` | When rows are skipped an optional `RowCallbackHandler` is called with the skipped row. This comes in handy when one needs to write the skipped rows to another file or create some logging.
| `strict` | no | `true` | Controls whether an exception is thrown if the file doesn't exist or isn't readable. By default an exception is thrown.
| `datesAsIso` | no | `false` | Controls if dates need to be parsed as ISO or to use the format as specified in the excel sheet. *NOTE:* Only for the `PoiItemReader` **not** the `StreamingXlsxReader`!
| `datesAsIso` | no | `false` | Controls if dates need to be parsed as ISO or to use the format as specified in the excel sheet.
| `userLocale` | no | `null` | Set the `java.util.Locale` to use when formatting dates when there is no explicit format set in the Excel document.
|===
- `StaticColumnNameExtractor` uses a preset list of column names.
- `StaticColumnNameExtractor` uses a preset list of column names.
- `RowNumberColumnNameExtractor` (**the default**) reads a given row (default 0) to determine the column names of the current sheet
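
The `userLocale` and `datesAsIso` properties from the table above could be set in Java Config roughly as follows. This is a minimal sketch: the configuration class name, the file path, the chosen `Locale` and the `String[]` item type are assumptions made for illustration, and the `RowMapper` import assumes it lives in the `org.springframework.batch.extensions.excel` package.

```java
import java.util.Locale;

import org.springframework.batch.core.configuration.annotation.StepScope;
import org.springframework.batch.extensions.excel.RowMapper;
import org.springframework.batch.extensions.excel.streaming.StreamingXlsxItemReader;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.core.io.FileSystemResource;

// Hypothetical configuration class, shown only to illustrate the two properties.
@Configuration
public class ExcelReaderConfiguration {

    @Bean
    @StepScope
    public StreamingXlsxItemReader<String[]> excelReader(RowMapper<String[]> rowMapper) {
        StreamingXlsxItemReader<String[]> reader = new StreamingXlsxItemReader<>();
        reader.setResource(new FileSystemResource("/path/to/your/excel/file"));
        reader.setRowMapper(rowMapper);
        // Fix the Locale instead of relying on the JVM default (Locale chosen as an example).
        reader.setUserLocale(Locale.GERMANY);
        // Optional: return date cells as ISO formatted strings instead of the sheet's format.
        reader.setDatesAsIso(true);
        return reader;
    }
}
```

Both setters are defined on the shared base class, so the same calls should apply to `PoiItemReader` as well. The internal `DataFormatter` is created in `afterPropertiesSet()`, so the properties need to be set before the container initializes the bean.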
## RowMappers
== RowMappers
To map a read row a `RowMapper` is needed. Out of the box there are 2 implementations: the `PassThroughRowMapper` and the `BeanWrapperRowMapper`.
### PassThroughRowMapper
=== PassThroughRowMapper
Transforms the read row from Excel into a `String[]`.
### BeanWrapperRowMapper
=== BeanWrapperRowMapper
Uses a `BeanWrapper` to convert a given row into an object. Uses the column names of the given `RowSet` to map columns to properties of the `targetType` or prototype bean.
```java
[source,xml]
----
<bean id="excelReader" class="org.springframework.batch.extensions.excel.poi.PoiItemReader" scope="step">
<property name="resource" value="file:/path/to/your/excel/file" />
<property name="rowMapper">
<bean class="org.springframework.batch.extensions.excel.mapping.BeanWrapperRowMapper">
<property name="targetType" value="com.your.package.Player" />
<bean>
</bean>
</property>
</bean>
```
----


@@ -1,5 +1,5 @@
/*
* Copyright 2006-2021 the original author or authors.
* Copyright 2006-2022 the original author or authors.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
@@ -16,8 +16,11 @@
package org.springframework.batch.extensions.excel;
import java.util.Locale;
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.apache.poi.ss.usermodel.DataFormatter;
import org.springframework.batch.extensions.excel.support.rowset.DefaultRowSetFactory;
import org.springframework.batch.extensions.excel.support.rowset.RowSet;
@@ -65,6 +68,12 @@ public abstract class AbstractExcelItemReader<T> extends AbstractItemCountingIte
private String password;
private boolean datesAsIso = false;
private Locale userLocale;
private DataFormatter dataFormatter;
public AbstractExcelItemReader() {
super();
this.setName(ClassUtils.getShortName(this.getClass()));
@@ -213,6 +222,16 @@ public abstract class AbstractExcelItemReader<T> extends AbstractItemCountingIte
public void afterPropertiesSet() throws Exception {
Assert.notNull(this.rowMapper, "RowMapper must be set");
if (this.datesAsIso) {
this.dataFormatter = (this.userLocale != null) ? new IsoFormattingDateDataFormatter(this.userLocale) : new IsoFormattingDateDataFormatter();
}
else {
this.dataFormatter = (this.userLocale != null) ? new DataFormatter(this.userLocale) : new DataFormatter();
}
}
protected DataFormatter getDataFormatter() {
return this.dataFormatter;
}
/**
@@ -295,4 +314,20 @@ public abstract class AbstractExcelItemReader<T> extends AbstractItemCountingIte
this.password = password;
}
/**
* Instead of using the format defined in the Excel sheet, read the date/time fields as an ISO formatted
* string instead. This is by default {@code false} to leave the original behavior.
* @param datesAsIso default {@code false}
*/
public void setDatesAsIso(boolean datesAsIso) {
this.datesAsIso = datesAsIso;
}
/**
* The {@code Locale} to use when reading sheets. Defaults to the platform default as set by Java.
* @param userLocale the {@code Locale} to use, default {@code null}
*/
public void setUserLocale(Locale userLocale) {
this.userLocale = userLocale;
}
}


@@ -0,0 +1,67 @@
/*
* Copyright 2011-2022 the original author or authors.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* https://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package org.springframework.batch.extensions.excel;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Locale;
import org.apache.poi.ss.formula.ConditionalFormattingEvaluator;
import org.apache.poi.ss.usermodel.Cell;
import org.apache.poi.ss.usermodel.CellType;
import org.apache.poi.ss.usermodel.DataFormatter;
import org.apache.poi.ss.usermodel.DateUtil;
import org.apache.poi.ss.usermodel.FormulaEvaluator;
/**
* Specialized subclass for additionally formatting the date into an ISO date/time.
*
* @author Marten Deinum
*
* @see DateTimeFormatter#ISO_OFFSET_DATE_TIME
*/
public class IsoFormattingDateDataFormatter extends DataFormatter {
public IsoFormattingDateDataFormatter() {
super();
}
public IsoFormattingDateDataFormatter(Locale locale) {
super(locale);
}
@Override
public String formatCellValue(Cell cell, FormulaEvaluator evaluator, ConditionalFormattingEvaluator cfEvaluator) {
if (cell == null) {
return "";
}
CellType cellType = cell.getCellType();
if (cellType == CellType.FORMULA) {
if (evaluator == null) {
return cell.getCellFormula();
}
cellType = evaluator.evaluateFormulaCell(cell);
}
if (cellType == CellType.NUMERIC && DateUtil.isCellDateFormatted(cell, cfEvaluator)) {
LocalDateTime value = cell.getLocalDateTimeCellValue();
return (value != null) ? value.format(DateTimeFormatter.ISO_OFFSET_DATE_TIME) : "";
}
return super.formatCellValue(cell, evaluator, cfEvaluator);
}
}


@@ -44,11 +44,9 @@ public class PoiItemReader<T> extends AbstractExcelItemReader<T> {
private InputStream inputStream;
private boolean datesAsIso = false;
@Override
protected Sheet getSheet(final int sheet) {
return new PoiSheet(this.workbook.getSheetAt(sheet), this.datesAsIso);
return new PoiSheet(this.workbook.getSheetAt(sheet), getDataFormatter());
}
@Override
@@ -92,12 +90,4 @@ public class PoiItemReader<T> extends AbstractExcelItemReader<T> {
this.workbook.setMissingCellPolicy(Row.MissingCellPolicy.CREATE_NULL_AS_BLANK);
}
/**
* Instead of using the format defined in the Excel sheet, read the date/time fields as an ISO formatted
* string instead. This is by default {@code false} to leave the original behavior.
* @param datesAsIso default {@code false}
*/
public void setDatesAsIso(boolean datesAsIso) {
this.datesAsIso = datesAsIso;
}
}


@@ -16,17 +16,13 @@
package org.springframework.batch.extensions.excel.poi;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Iterator;
import java.util.LinkedList;
import java.util.List;
import org.apache.poi.ss.formula.ConditionalFormattingEvaluator;
import org.apache.poi.ss.usermodel.Cell;
import org.apache.poi.ss.usermodel.CellType;
import org.apache.poi.ss.usermodel.DataFormatter;
import org.apache.poi.ss.usermodel.DateUtil;
import org.apache.poi.ss.usermodel.FormulaEvaluator;
import org.apache.poi.ss.usermodel.Row;
@@ -42,12 +38,8 @@ import org.springframework.lang.Nullable;
class PoiSheet implements Sheet {
private final DataFormatter dataFormatter;
private final org.apache.poi.ss.usermodel.Sheet delegate;
private final boolean datesAsIso;
private final int numberOfRows;
private final String name;
private FormulaEvaluator evaluator;
@@ -55,15 +47,14 @@ class PoiSheet implements Sheet {
/**
* Constructor which takes the delegate sheet.
* @param delegate the apache POI sheet
* @param datesAsIso should we format the dates as ISO or use the Excel formatting instead
* @param dataFormatter the {@code DataFormatter} to use.
*/
PoiSheet(final org.apache.poi.ss.usermodel.Sheet delegate, boolean datesAsIso) {
PoiSheet(final org.apache.poi.ss.usermodel.Sheet delegate, DataFormatter dataFormatter) {
super();
this.delegate = delegate;
this.datesAsIso = datesAsIso;
this.numberOfRows = this.delegate.getLastRowNum() + 1;
this.name = this.delegate.getSheetName();
this.dataFormatter = this.datesAsIso ? new IsoFormattingDateDataFormatter() : new DataFormatter();
this.dataFormatter = dataFormatter;
}
/**
@@ -142,33 +133,4 @@ class PoiSheet implements Sheet {
};
}
/**
* Specialized subclass for additionally formatting the date into an ISO date/time.
*
* @author Marten Deinum
* @see DateTimeFormatter#ISO_OFFSET_DATE_TIME
*/
private static class IsoFormattingDateDataFormatter extends DataFormatter {
@Override
public String formatCellValue(Cell cell, FormulaEvaluator evaluator, ConditionalFormattingEvaluator cfEvaluator) {
if (cell == null) {
return "";
}
CellType cellType = cell.getCellType();
if (cellType == CellType.FORMULA) {
if (evaluator == null) {
return cell.getCellFormula();
}
cellType = evaluator.evaluateFormulaCell(cell);
}
if (cellType == CellType.NUMERIC && DateUtil.isCellDateFormatted(cell, cfEvaluator)) {
LocalDateTime value = cell.getLocalDateTimeCellValue();
return (value != null) ? value.format(DateTimeFormatter.ISO_OFFSET_DATE_TIME) : "";
}
return super.formatCellValue(cell, evaluator, cfEvaluator);
}
}
}


@@ -28,6 +28,7 @@ import javax.xml.stream.XMLStreamReader;
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.apache.poi.ss.usermodel.DataFormatter;
import org.apache.poi.ss.util.CellRangeAddress;
import org.apache.poi.ss.util.CellReference;
import org.apache.poi.xssf.eventusermodel.XSSFSheetXMLHandler;
@@ -64,11 +65,11 @@ class StreamingSheet implements Sheet {
private int colCount;
StreamingSheet(String name, InputStream is, SharedStrings sharedStrings, Styles styles) {
StreamingSheet(String name, InputStream is, SharedStrings sharedStrings, Styles styles, DataFormatter dataFormatter) {
this.name = name;
this.is = is;
this.contentHandler = new ValueRetrievingContentsHandler();
this.sheetHandler = new XSSFSheetXMLHandler(styles, sharedStrings, this.contentHandler, false);
this.sheetHandler = new XSSFSheetXMLHandler(styles, sharedStrings, this.contentHandler, dataFormatter, false);
try {
this.reader = StaxUtils.createDefensiveInputFactory().createXMLStreamReader(is);


@@ -87,7 +87,7 @@ public class StreamingXlsxItemReader<T> extends AbstractExcelItemReader<T> {
while (iter.hasNext()) {
InputStream is = iter.next();
String name = iter.getSheetName();
this.sheets.add(new StreamingSheet(name, is, sharedStrings, styles));
this.sheets.add(new StreamingSheet(name, is, sharedStrings, styles, getDataFormatter()));
}
if (this.logger.isTraceEnabled()) {