CSV File Comparison
A CSV file stores data in a similar structure as a database does but it is more convenient to read and write. It is thus preferred in many cases, such as temporary result output or real-time recording of log-style data. However, we still need to input data to the database to further process it, which offsets the file’s advantages.
esProc can handle a CSV file directly, basically or in a more complex way such as comparing two CSV files.
Suppose there is a simple sales system, in which data is entered, modified and deleted at the frontend. Every day the files are backed up and archived. Now we want to find the newly-added, canceled and modified orders in a certain period. The following illustrates how esProc does it.
The files belong to March, 2015 – an earlier one named old.csv and a later one named new.csv. Both use userName and date as their logical primary keys. Below are the source files:
old.csv |
new.csv |
|
1 2 3 4 5 6 7 8 9 |
userName,date,saleValue,saleCount Rachel,2015-03-01,4500,9 Rachel,2015-03-03,8700,4 Tom,2015-03-02,3000,8 Tom,2015-03-03,5000,7 Tom,2015-03-04,6000,12 John,2015-03-02,4000,3 John,2015-03-02,4300,9 John,2015-03-04,4800,4 |
userName,date,saleValue,saleCount Rachel,2015-03-01,4500,9 Rachel,2015-03-02,5000,5 Ashley,2015-03-01,6000,5 Rachel,2015-03-03,11700,4 Tom,2015-03-03,5000,7 Tom,2015-03-04,6000,12 John,2015-03-02,4000,3 John,2015-03-02,4300,9 John,2015-03-04,4800,4 |
We can see that in new.csv line 2 and line 3 are newly-entered and line 4 is modified and that in old.csv line 3 is deleted.
To get the newly-added, modified and deleted rows, we use this esProc SPL script:
A |
B |
|
1 |
=file("d:\\old.csv").import@t(;,",") |
=file("d:\\new.csv").import@t(;,",") |
2 |
=A1.sort(userName,date) |
=B1.sort(userName,date) |
3 |
=new=[B2,A2].merge@d(userName,date) |
|
4 |
=delete=[A2,B2].merge@d(userName,date) |
|
5 |
=diff=[B2,A2].merge@d(userName,date,saleValue,saleCount) |
|
6 |
=update=[A5,new].merge@d(userName,date) |
return update |
A1,B1: Read in files with comma as the separator.
A2,B2: Sort each file by the primary key because merge() function requires that data be ordered.
A3: merge() function merges data sets together; @d option enables finding their difference. Similarly, @u option gets union and @i gets intersection. To find new records is to find the difference between the new records and the old records:
A4: To find the deleted record is to get the difference between the old records and the new records according to the primary key values:
A5: To get the modified records is to treat the primary keys as ordinary fields to get the difference between the two files:
A6: To get the updated record is to find the difference between A5’s records and A3’s new records:
B6: Return A6 to your Java application or reporting tool.
You can integrate the SPL script into your Java application via esProc JDBC using the following Java code:
//Establish esProc jdbc connection
Class.forName("com.esproc.jdbc.InternalDriver");
con= DriverManager.getConnection("jdbc:esproc:local://");
//Call esProc SPL script test which accepts a parameter value
st =(com.esproc.jdbc.InternalCStatement)con.prepareCall("call test()"); com.esproc.jdbc.InternalCStatement st = (com.esproc.jdbc.InternalCStatement)con.prepareCall("call test()");
st.execute();//execute esProc stored procedure
ResultSet set = st.getResultSet();//Get the result set
To return multiple result set to a Java application, B6 can be return new,delete,update.
SPL Official Website 👉 https://www.scudata.com
SPL Feedback and Help 👉 https://www.reddit.com/r/esProcSPL
SPL Learning Material 👉 https://c.scudata.com
SPL Source Code and Package 👉 https://github.com/SPLWare/esProc
Discord 👉 https://discord.gg/cFTcUNs7
Youtube 👉 https://www.youtube.com/@esProc_SPL