Write millions of rows in an excel file using Postgres Function, Java, POI Stream API
This article will explain how to design and develop a program/utility which can write millions of record in a excel file efficiently.
In the legacy system there is no JPA entity and we need to call the custom Database Function/Query as a Data Source. I will show how to write large data set to an Excel file using Postgres Database Function, Apache POI Stream API (SXSSF), Java Stream API and Java 8.
The theoretical concept and knowledge have been borrowed from above attached hyperlink URL. Kindly follow these link if interested. I will moving directly to the implementation.
Lets follow the below steps to achieve the objective:
Create a Postgres Function which will return the records in table format. follow this link to get the sample SQL code.
Here I am using SQL Function as Table Sources so that I can Stream the Records instead of loading all records in memory. If we use others source like ref cursor then we have to load all record in heap memory and we will run out with OutOfMemoryError.
To call the Database Function, I had used JPA EntityManager as below:
Recommended by LinkedIn
@Transactional(readOnly = true
public void processReportCreation() {
StoredProcedureQuery query = entityManager
.createStoredProcedureQuery("fun_get_customer_table")
.setHint(QueryHints.HINT_FETCH_SIZE, "1")
.setHint(QueryHints.HINT_CACHEABLE, "false")
.setHint(QueryHints.HINT_READONLY, "true");
// Registered Function Input Parameter Type and their Position
query.registerStoredProcedureParameter(1, String.class, ParameterMode.IN);
query.registerStoredProcedureParameter(2, String.class, ParameterMode.IN);
query.registerStoredProcedureParameter(3, String.class, ParameterMode.IN);
query.registerStoredProcedureParameter(4, String.class, ParameterMode.IN);
query.registerStoredProcedureParameter(5, String.class, ParameterMode.IN);
// Set Input Parameters
query.setParameter(1, "");
query.setParameter(2, "");
query.setParameter(3, "");
query.setParameter(4, "");
query.setParameter(5, "");
try (Stream<Object[]> stream = query.getResultStream()) {
stream.forEach(o -> {
// do excel report processing
});
}
})
If you pay attention, you will see that we disable second level caching and hint Hibernate that the record will be read only.
The @Transactional annotation is necessary to make streaming work. Otherwise, Spring throws runtime exception. For this example, we pass readOnly to the annotation since we are not aiming to modify the entity. The try with resource block will auto close the stream.
Now I will use the POI stream API SXSSFWorkbook to create the excel file, The advantage of SXSSFWorkbook is it will be keeping only define number(default is 100) of row in memory. Using the size of row you will avoid the OutOfMemoryError issue.
In the below class, the method writes to a sheet with a window of 100 rows. When the row count reaches 101, the row with rownum=0 is flushed to disk and removed from memory, when rownum reaches 102 then the row with rownum=1 is flushed, etc.
public void createExcelFile(String fileName)
try (SXSSFWorkbook workbook = new SXSSFWorkbook(SXSSFWorkbook.DEFAULT_WINDOW_SIZE/*100*/)) {
Sheet sheet = workbook.createSheet("sheet1");
// add row number
// addHeaders(sheet, rowNum);
// add row number
// add value row
try (FileOutputStream out = new FileOutputStream(fileName)) {
workbook.write(out);
}
// dispose of temporary files backing this workbook on disk
workbook.dispose();
} catch (Exception e) {
e.printStackTrace();
}
}{
If we follow above steps and run for 2 millions records then you will get an Exception. Because the excel has some limitation with row number kindly write in comment what are the allowed max number of row in a excel sheet.
I can not new sxssfworkbook() inside k8s, maybe increase memory size.