" A CSV file stores data in a similar structure as a database does but it is more convenient to .."

pjf RaqForum 22 No.
1 Reply • 1,611 View • 5 Years ago

CSV File Comparison

A CSV file stores data in a similar structure as a database does but it is more convenient to read and write. It is thus preferred in many cases, such as temporary result output or real-time recording of log-style data. However, we still need to input data to the database to further process it, which offsets the file’s advantages.

esProc can handle a CSV file directly, basically or in a more complex way such as comparing two CSV files.

Suppose there is a simple sales system, in which data is entered, modified and deleted at the frontend. Every day the files are backed up and archived. Now we want to find the newly-added, canceled and modified orders in a certain period. The following illustrates how esProc does it.

The files belong to March, 2015 – an earlier one named old.csv and a later one named new.csv. Both use userName and date as their logical primary keys. Below are the source files:

old.csv

new.csv

userName,date,saleValue,saleCount

Rachel,2015-03-01,4500,9

Rachel,2015-03-03,8700,4

Tom,2015-03-02,3000,8

Tom,2015-03-03,5000,7

Tom,2015-03-04,6000,12

John,2015-03-02,4000,3

John,2015-03-02,4300,9

John,2015-03-04,4800,4

userName,date,saleValue,saleCount

Rachel,2015-03-01,4500,9

Rachel,2015-03-02,5000,5

Ashley,2015-03-01,6000,5

Rachel,2015-03-03,11700,4

Tom,2015-03-03,5000,7

Tom,2015-03-04,6000,12

John,2015-03-02,4000,3

John,2015-03-02,4300,9

John,2015-03-04,4800,4

We can see that in new.csv line 2 and line 3 are newly-entered and line 4 is modified and that in old.csv line 3 is deleted.

To get the newly-added, modified and deleted rows, we use this esProc SPL script:

	A	B
1	=file("d:\\old.csv").import@t(;,",")	=file("d:\\new.csv").import@t(;,",")
2	=A1.sort(userName,date)	=B1.sort(userName,date)
3	=new=[B2,A2].merge@d(userName,date)
4	=delete=[A2,B2].merge@d(userName,date)
5	=diff=[B2,A2].merge@d(userName,date,saleValue,saleCount)
6	=update=[A5,new].merge@d(userName,date)	return update

A1,B1: Read in files with comma as the separator.

A2,B2: Sort each file by the primary key because merge() function requires that data be ordered.

A3: merge() function merges data sets together; @d option enables finding their difference. Similarly, @u option gets union and @i gets intersection. To find new records is to find the difference between the new records and the old records:

undefined

A4: To find the deleted record is to get the difference between the old records and the new records according to the primary key values:

A5: To get the modified records is to treat the primary keys as ordinary fields to get the difference between the two files:

undefined

A6: To get the updated record is to find the difference between A5’s records and A3’s new records:

B6: Return A6 to your Java application or reporting tool.

You can integrate the SPL script into your Java application via esProc JDBC using the following Java code:

    //Establish esProc jdbc connection
    Class.forName("com.esproc.jdbc.InternalDriver");
    con= DriverManager.getConnection("jdbc:esproc:local://");
    //Call esProc SPL script test which accepts a parameter value
    st =(com.esproc.jdbc.InternalCStatement)con.prepareCall("call test()"); com.esproc.jdbc.InternalCStatement st = (com.esproc.jdbc.InternalCStatement)con.prepareCall("call test()");
    st.execute();//execute esProc stored procedure
    ResultSet set = st.getResultSet();//Get the result set

To return multiple result set to a Java application, B6 can be return new,delete,update.

SPL Official Website 👉 https://www.scudata.com

SPL Feedback and Help 👉 https://www.reddit.com/r/esProcSPL

SPL Learning Material 👉 https://c.scudata.com

SPL Source Code and Package 👉 https://github.com/SPLWare/esProc

Discord 👉 https://discord.gg/2bkGwqTj

Youtube 👉 https://www.youtube.com/@esProc_SPL

esProc

pjf • 1,611 View • 5 Years ago