Generate a Report from a Big CSV File in BIRT

Scenario:

I use BIRT RCP 4.2.1 to create and test reports. To generate a report automatically from a big CSV file, I tried increasing the memory parameters in the Java options ("-Xms512m -Xmx768m -XX:MaxPermSize=512m"), but that did not solve my issue. The CSV file is very big. Sometimes the file has fewer than 400,000 lines (~20 MB), and the report generates successfully. But now I'd like to create a report for a longer time period, one week. That CSV file has ~2,000,000 lines and is 140 MB in size. Even with only 1 million records, my computer with 8 GB of RAM can't handle it.

Solution:

Query the large CSV file with an esProc SPL script.

You can change the way the data is retrieved by using esProc. esProc SPL can read the file as a stream, so the computer needs only a small amount of memory to handle a large file.

For example, find the male employees who were born on or after January 1, 1981 in employee.csv. The SPL code looks like this:


A
1 =file("D:/employee.csv").cursor@tc()
2 =A1.select(BIRTHDAY>=date("1981-01-01")&&GENDER=="M")
3 =A1.fetch()

Explanation:
A1: Read the big data in batches with a cursor, which is the usual approach for big data computing. A file cursor basically works the same way as an ordinary database cursor.

A2: If the query condition is dynamic, A2's code becomes =A1.select(${where}). The condition is passed in through the where parameter to achieve a dynamic query.

${where} is a macro used to parse the expression dynamically, and "where" is the input parameter. esProc first computes the expression inside ${…} to get the macro string value, replaces ${…} with that value, and then interprets and executes the resulting expression. For instance, when where is passed as BIRTHDAY>=date(1981,1,1) && GENDER=="M", the code that finally runs is =A1.select(BIRTHDAY>=date(1981,1,1) && GENDER=="M").

A3: Fetch all of the data from the cursor and return the desired result set to BIRT.
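
To make the cursor idea concrete outside esProc, here is a minimal Python sketch of the same streaming filter. It is only an illustration of the technique, not esProc's implementation. The function name stream_filter is hypothetical, and the sketch assumes the file has a header row with BIRTHDAY and GENDER columns and that BIRTHDAY is stored as YYYY-MM-DD.

```python
import csv
from datetime import date
from typing import Dict, Iterator


def stream_filter(path: str, born_on_or_after: date, gender: str) -> Iterator[Dict[str, str]]:
    """Yield matching rows one at a time instead of loading the whole file."""
    with open(path, newline="", encoding="utf-8") as handle:
        # DictReader takes the first line as column names (assumed header row).
        for row in csv.DictReader(handle):
            # Assumption: BIRTHDAY is an ISO date string such as 1981-01-01.
            birthday = date.fromisoformat(row["BIRTHDAY"])
            if birthday >= born_on_or_after and row["GENDER"] == gender:
                yield row
```

Calling stream_filter("D:/employee.csv", date(1981, 1, 1), "M") returns a generator, so memory use depends on the size of a single row rather than on the size of the file. Only the rows that pass the filter need to be collected for the report.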

Alternatively, you can write a SQL query in esProc SPL to query the CSV file, like this:


A
1 $select * from ./employee.csv where BIRTHDAY>=date('1981-01-01') and GENDER='M'


The report can be designed in the same way as if you were retrieving the data from a database. For details on integrating SPL with BIRT, see How to Call an SPL Script in BIRT.

For more examples of difficult text computations, refer to Structured Text Computing.
