Align Two CSV Files by a Specific Column to Compare and Output the Differences

 

Problem description & analysis

data1.csv AND data2.csv have same structure (same column names) and both use id column to uniquely identify records. The two tables have same id values but could have different values in the other columns.

data1.csv

id,v1,v2

1001,1001_v1,1001_v2

1002,1002_v1,1002_v2

1003,1003_v1,1003_v2

1004,1004_v1,1004_v2

1005,1005_v1,1005_v2

data2.csv

id,v1,v2

1005,1005_v2,1005_v1

1001,1001_v1,1001_v2

1002,1002_v1c,1002_v2

1003,1003_v1c,1003_v2c

1004,1004_v1,1004_v2

The computing task is to find differences of the two files based on the other columns and output the result to result.csv. Here is the specific rule: For records of same id values in the two files, if the one in data2.csv has any value in the other column is different from the counterpart in data1.csv, then output the record of data2.csv. Below is part of the result file:

result.csv

id,v1,v2

1002,1002_v1c,1002_v2

1003,1003_v1c,1003_v2c

1005,1005_v2,1005_v1

The algorithm involves order-based merge and difference operation.

Solution

We write the following script in esProc:

A

1

=file("041/data1.csv").import@ct()

2

=file("041/data2.csv").import@ct()

3

=[A2,A1].merge@od()

4

=file("result.csv").export@ct(A3)

Merge file1 and file2 using @o option, which merges unordered memory data, and @d option, which performs difference intersection. With no to-be-compared columns specified, the function compares the whole records.

The script can also be written in the command line:

d:\raqsoft64\esProc\bin>   .\esprocx  -r   file(\"result.csv\").export@ct([file(\"data2.csv\").import@ct(),file(\"data1.csv\").import@ct()].merge@d())

Q & A Collection

https://stackoverflow.com/questions/63887147/comparing-2-csv-files-in-powershell-and-output-the-differences