Dynamic Query over a Large CSV File

Question
I am trying to extract certain records from a 4.5GB csv file and am running into some problems: (i) OpenOffice will only import 1 million records, and this file has more; (ii) even when working with only 1 million records, my 4GB RAM computer can’t handle it.

 

Answer
You can retrieve data from a large text file in stream style in Java, but the code is complicated. Try handling it in SPL instead; the code is much simpler. An example: find the female employees who were born on or after January 1, 1981 in employee.csv. The SPL code is like this:

 

	A
1	=file("D:/employee.csv").cursor@tc()
2	=A1.select(BIRTHDAY>=date("1981-01-01") && GENDER=="F")
3	=A2.fetch()

If the query condition is dynamic, change A2’s code to =A1.select(${where}); the condition can then be passed in through the where parameter to achieve a dynamic query. If the query result is too large to fit in memory, change A3’s code to =file("D:/result.txt").export(A2) to export the result directly to a file.
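For comparison, here is a minimal sketch of the stream-style Java approach mentioned above: read the CSV line by line, apply the filter, and write matches straight to a result file so memory use stays flat. The file paths and column layout (BIRTHDAY in the second column, GENDER in the third) are assumptions for illustration.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.LocalDate;

public class CsvFilter {
    // Returns true when a CSV record matches the example query.
    // Assumed column positions: BIRTHDAY at index 1, GENDER at index 2.
    static boolean matches(String line) {
        String[] fields = line.split(",");
        LocalDate birthday = LocalDate.parse(fields[1].trim());
        return !birthday.isBefore(LocalDate.of(1981, 1, 1))
                && "F".equals(fields[2].trim());
    }

    public static void main(String[] args) throws IOException {
        Path in = Path.of("D:/employee.csv");   // assumed input path
        Path out = Path.of("D:/result.txt");    // assumed output path
        try (BufferedReader reader = Files.newBufferedReader(in);
             BufferedWriter writer = Files.newBufferedWriter(out)) {
            reader.readLine(); // skip the title row (cursor@tc does this in SPL)
            String line;
            while ((line = reader.readLine()) != null) {
                if (matches(line)) {
                    writer.write(line);
                    writer.newLine();
                }
            }
        }
    }
}
```

Even this simplified version needs manual parsing, date handling, and resource management, which is the complexity the SPL script hides.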

In SPL, you can also query a csv file with a SQL statement and append extra conditions to it. For example, =connect().query("select * from D:/employee.csv where " + where).

You can use parallel processing to increase performance. See Parallel Computing for more details.
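As a rough sketch of what parallel filtering can look like in plain Java (not the SPL mechanism the article refers to), a parallel stream can split the work across cores; the file path and the predicate are the same illustrative assumptions as before. String comparison works for the date here because ISO dates sort lexicographically.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class ParallelCsvFilter {
    // Filters records across multiple threads; the predicate mirrors
    // the example query (born on or after 1981-01-01, GENDER is F).
    static List<String> filter(Stream<String> lines) {
        return lines.parallel()
                .filter(l -> {
                    String[] f = l.split(",");
                    // ISO dates compare correctly as strings
                    return f[1].compareTo("1981-01-01") >= 0
                            && "F".equals(f[2]);
                })
                .collect(Collectors.toList());
    }

    public static void main(String[] args) throws IOException {
        // Assumed path; skip(1) drops the title row
        try (Stream<String> lines =
                Files.lines(Path.of("D:/employee.csv")).skip(1)) {
            filter(lines).forEach(System.out::println);
        }
    }
}
```

Note that a parallel stream only speeds up the CPU-bound filtering; the disk read itself remains sequential.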