Compare two CSV files
Two large CSV files, A.csv and B.csv, have the same structure. The Name and Dept fields together form the primary key. Some records differ between the two files.
A.csv:
Name,Dept,Salary
Jonathan,Administration,7
Alexis,Administration,16000
Timothy,Administration,0
Michael,Administration,0
Alexis_,Administration,0
Ashley,Finance,11000

B.csv:
Name,Dept,Salary
Jonathan,Administration,7
Alexis,Administration,16
Timothy,Finance,5000
Ashley,Finance,11000
Daniel,HR,1600
Joseph_,Finance,1600
Use Java to compare the primary keys of the two files and find the records that exist in A but not in B. The expected result is:
Name | Dept | Salary
Alexis_ | Administration | 0
Michael | Administration | 0
Timothy | Administration | 0
Write the following SPL code:
  | A | B
1 | =T@c("A.csv") | =T@c("B.csv")
2 | =A1.sortx(Name,Dept) | =B1.sortx(Name,Dept)
3 | =[A2,B2].merge@d(Name,Dept).fetch() |
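The T() function parses a CSV file; the @c option lets it retrieve data from a file that does not fit into memory. The sortx() function sorts the data in a cursor. The merge() function merges the two cursors; the @d option makes it compute the difference, that is, the records of the first cursor whose key values do not appear in the second.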
The logic of the above code can also be expressed in a single SPL statement:
=[T@c("A.csv").sortx(Name,Dept),T@c("B.csv").sortx(Name,Dept)].merge@d(Name,Dept).fetch()
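To make the sort-then-merge-difference idea concrete, here is a minimal plain-Java sketch of the same logic. It is only an illustration, not how SPL implements merge@d: the class name SortedKeyDiff and the helpers rows, compareKeys and difference are hypothetical, and the sample rows are held in in-memory lists rather than read through cursors, so it does not address files that exceed memory.

```java
import java.util.ArrayList;
import java.util.List;

class SortedKeyDiff {
    // Order rows by the composite key: Name (column 0), then Dept (column 1).
    static int compareKeys(String[] x, String[] y) {
        int c = x[0].compareTo(y[0]);
        return c != 0 ? c : x[1].compareTo(y[1]);
    }

    // Return the rows of a whose key has no match in b.
    // Both lists must already be sorted with compareKeys.
    static List<String[]> difference(List<String[]> a, List<String[]> b) {
        List<String[]> out = new ArrayList<>();
        int j = 0;
        for (String[] row : a) {
            // Skip rows of b whose key is smaller than the current key of a.
            while (j < b.size() && compareKeys(b.get(j), row) < 0) {
                j++;
            }
            // Keep the row only when b holds no row with an equal key.
            if (j == b.size() || compareKeys(b.get(j), row) != 0) {
                out.add(row);
            }
        }
        return out;
    }

    // Split comma-separated lines into rows (assumes no quoted commas).
    static List<String[]> rows(String... lines) {
        List<String[]> out = new ArrayList<>();
        for (String line : lines) {
            out.add(line.split(","));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String[]> a = rows(
            "Jonathan,Administration,7", "Alexis,Administration,16000",
            "Timothy,Administration,0", "Michael,Administration,0",
            "Alexis_,Administration,0", "Ashley,Finance,11000");
        List<String[]> b = rows(
            "Jonathan,Administration,7", "Alexis,Administration,16",
            "Timothy,Finance,5000", "Ashley,Finance,11000",
            "Daniel,HR,1600", "Joseph_,Finance,1600");
        a.sort(SortedKeyDiff::compareKeys);   // plays the role of sortx(Name,Dept)
        b.sort(SortedKeyDiff::compareKeys);
        for (String[] row : difference(a, b)) {
            System.out.println(String.join(",", row));
        }
    }
}
```

Because both inputs are sorted by the key, a single forward pass over each is enough. That is why the SPL version sorts the cursors before merging them.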
Read "How to Call a SPL Script in Java" to learn how to integrate SPL into a Java application.
Source: https://stackoverflow.com/questions/75987204/efficiently-comparing-two-large-java-lists-to-find-unique-items