Dynamic Query over a Large CSV File
【Question】
I am trying to extract certain records from a 4.5GB csv file and am running into some problems. (i) OpenOffice will only import 1 million records, and this file has more (ii) Even when working with only 1 million records, my 4GB RAM computer can’t handle it.
【Answer】
You can retrieve data from a large text file in a stream style in Java, but the code is complicated. Handling it in SPL is much simpler. An example: find the female employees who were born on or after January 1, 1981 in employee.csv. The SPL code is like this:
  | A
1 | =file("D:/employee.csv").cursor@tc()
2 | =A1.select(BIRTHDAY>=date("1981-01-01") && GENDER=="F")
3 | =A2.fetch()
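For comparison, the stream-style Java approach mentioned above could look roughly like the sketch below. It reads the CSV line by line so the 4.5GB file never has to fit in memory. The column positions (GENDER in column 2, BIRTHDAY in column 3) and the simple comma split are assumptions for illustration; a real file with quoted fields would need a CSV library.

```java
import java.io.*;
import java.time.LocalDate;
import java.util.*;

public class CsvFilter {
    // Assumed column layout for illustration: ...,GENDER (col 2),BIRTHDAY (col 3),...
    static final int GENDER_COL = 2, BIRTHDAY_COL = 3;
    static final LocalDate CUTOFF = LocalDate.parse("1981-01-01");

    // Stream the reader line by line; only matching rows are kept in memory.
    static List<String[]> filter(BufferedReader in) throws IOException {
        List<String[]> out = new ArrayList<>();
        in.readLine();                          // skip the title row
        for (String line; (line = in.readLine()) != null; ) {
            String[] f = line.split(",");       // naive split; no quoted-field handling
            if (f[GENDER_COL].equals("F")
                    && !LocalDate.parse(f[BIRTHDAY_COL]).isBefore(CUTOFF)) {
                out.add(f);
            }
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        try (BufferedReader in = new BufferedReader(new FileReader("D:/employee.csv"))) {
            for (String[] row : filter(in)) System.out.println(String.join(",", row));
        }
    }
}
```

Even this bare-bones version already needs manual parsing, date handling, and resource management, which is the complexity the one-line SPL cursor avoids.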
If the query condition is dynamic, change A2's code to =A1.select(${where}); the condition can then be passed in through the where parameter to achieve a dynamic query. If the query result is too large to fit in memory, change A3's code to =file("D:/result.txt").export(A2) to export the result directly to a file.
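In Java, the same two ideas (a caller-supplied condition, plus writing results straight to a file instead of collecting them) might be sketched like this. The Predicate parameter plays the role of SPL's ${where} macro; the column index in the usage note is an assumption.

```java
import java.io.*;
import java.util.function.Predicate;

public class CsvExport {
    // Stream matching rows straight to a writer, so the result never has to fit in memory.
    // The caller supplies the filter condition, analogous to passing a where parameter.
    static long export(BufferedReader in, Writer out, Predicate<String[]> where) throws IOException {
        long count = 0;
        in.readLine();                          // skip the title row
        for (String line; (line = in.readLine()) != null; ) {
            String[] f = line.split(",");       // naive split; no quoted-field handling
            if (where.test(f)) {
                out.write(line);
                out.write('\n');
                count++;
            }
        }
        return count;
    }
}
```

Calling it with, say, new FileWriter("D:/result.txt") and a predicate such as f -> f[2].equals("F") mirrors the SPL pair of ${where} and file("D:/result.txt").export(A2), just with noticeably more code.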
In SPL, you can also append conditions to a SQL statement to query a csv file. For example: =connect().query("select * from D:/employee.csv where "+where).
You can use parallel processing to increase performance. See Parallel Computing for more details.
SPL Official Website 👉 https://www.scudata.com
SPL Feedback and Help 👉 https://www.reddit.com/r/esProc_SPL
SPL Learning Material 👉 https://c.scudata.com
SPL Source Code and Package 👉 https://github.com/SPLWare/esProc
Discord 👉 https://discord.gg/cFTcUNs7
Youtube 👉 https://www.youtube.com/@esProc_SPL