CSV File Comparison

 

  A CSV file stores data in a similar structure as a database does but it is more convenient to read and write. It is thus preferred in many cases, such as temporary result output or real-time recording of log-style data. However, we still need to input data to the database to further process it, which offsets the file’s advantages.

  esProc can handle a CSV file directly, basically or in a more complex way such as comparing two CSV files.

  Suppose there is a simple sales system, in which data is entered, modified and deleted at the frontend. Every day the files are backed up and archived. Now we want to find the newly-added, canceled and modified orders in a certain period. The following illustrates how esProc does it.

  The files belong to March, 2015 – an earlier one named old.csv and a later one named new.csv. Both use userName and date as their logical primary keys. Below are the source files:


old.csv

new.csv

 

1

2

3

4

5

6

7

8

9

userName,date,saleValue,saleCount

Rachel,2015-03-01,4500,9

Rachel,2015-03-03,8700,4

Tom,2015-03-02,3000,8

Tom,2015-03-03,5000,7

Tom,2015-03-04,6000,12

John,2015-03-02,4000,3

John,2015-03-02,4300,9

John,2015-03-04,4800,4

userName,date,saleValue,saleCount

Rachel,2015-03-01,4500,9

Rachel,2015-03-02,5000,5

Ashley,2015-03-01,6000,5

Rachel,2015-03-03,11700,4

Tom,2015-03-03,5000,7

Tom,2015-03-04,6000,12

John,2015-03-02,4000,3

John,2015-03-02,4300,9

John,2015-03-04,4800,4

  We can see that in new.csv line 2 and line 3 are newly-entered and line 4 is modified and that in old.csv line 3 is deleted.

  To get the newly-added, modified and deleted rows, we use this esProc SPL script:


A

B

1

=file("d:\\old.csv").import@t(;,",")

=file("d:\\new.csv").import@t(;,",")

2

=A1.sort(userName,date)

=B1.sort(userName,date)

3

=new=[B2,A2].merge@d(userName,date)


4

=delete=[A2,B2].merge@d(userName,date)


5

=diff=[B2,A2].merge@d(userName,date,saleValue,saleCount)

6

=update=[A5,new].merge@d(userName,date)

return update

  A1,B1: Read in files with comma as the separator.

  A2,B2: Sort each file by the primary key because merge() function requires that data be ordered.

  A3: merge() function merges data sets together; @d option enables finding their difference. Similarly, @u option gets union and @i gets intersection. To find new records is to find the difference between the new records and the old records:

  undefined

  A4: To find the deleted record is to get the difference between the old records and the new records according to the primary key values:

  undefined

  A5: To get the modified records is to treat the primary keys as ordinary fields to get the difference between the two files:

  undefined

  A6: To get the updated record is to find the difference between A5’s records and A3’s new records:

  undefined

 

  B6: Return A6 to your Java application or reporting tool.

 

  You can integrate the SPL script into your Java application via esProc JDBC using the following Java code:

    //Establish esProc jdbc connection
    Class.forName("com.esproc.jdbc.InternalDriver");
    con= DriverManager.getConnection("jdbc:esproc:local://");
    //Call esProc SPL script test which accepts a parameter value
    st =(com.esproc.jdbc.InternalCStatement)con.prepareCall("call test()"); com.esproc.jdbc.InternalCStatement st = (com.esproc.jdbc.InternalCStatement)con.prepareCall("call test()");
    st.execute();//execute esProc stored procedure
    ResultSet set = st.getResultSet();//Get the result set

  To return multiple result set to a Java application, B6 can be return new,delete,update.