Compare Two CSV Files

Question

Scenario: I want to compare two CSV files in Mule and find which rows were added, deleted, or updated. I have searched, but there seems to be no such feature or component. Can a Bash script be used in Mule? The only option I know of is a Java component, but I would like better suggestions or ideas. Any pointers to get started would be appreciated.

 

Answer

Finding the differences between two CSV files is essentially a set operation over structured data. Java doesn't offer a ready-to-use function for this, so coding it by hand takes some work. One option is to use SPL (Structured Process Language) to do the comparison and return the result to the Java application. For example, the following SPL script finds newly added rows using the composite primary key (userName, date):

     A                                                          B
1    =file("D:\\old.csv").import@t(;",").sort(userName,date)    =file("D:\\new.csv").import@t(;",").sort(userName,date)
2    =[B1,A1].merge@d(userName,date)

Deleted and updated rows can be found in SPL in a similar way. An SPL script can be used like a Java class library for processing structured data, so it is easy to embed in a Java application.
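
For comparison, here is a rough sketch of the same set operations written directly in Java, in case a plain Java component is still the preferred route. It is only an illustration, not a tested Mule component: the class name CsvDiff and the sample rows are made up, the parsing is a naive split on commas (no quoted fields), both files are assumed to have a header row with the same column order, and the key columns userName and date are taken from the example above. A row counts as "updated" when its key exists in both files but the line text differs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper class, not part of Mule or SPL.
public class CsvDiff {

    /** Rows grouped by how they changed between the old and the new file. */
    public static final class Result {
        public final List<String> added = new ArrayList<>();
        public final List<String> deleted = new ArrayList<>();
        public final List<String> updated = new ArrayList<>();
    }

    // Index the data rows by their composite key. The first line is the header.
    // Assumption: fields never contain commas, so a plain split is enough.
    static Map<List<String>, String> index(List<String> lines, String... keyColumns) {
        List<String> header = Arrays.asList(lines.get(0).split(",", -1));
        Map<List<String>, String> rows = new LinkedHashMap<>();
        for (String line : lines.subList(1, lines.size())) {
            if (line.isEmpty()) {
                continue; // skip blank lines
            }
            String[] fields = line.split(",", -1);
            List<String> key = new ArrayList<>();
            for (String column : keyColumns) {
                key.add(fields[header.indexOf(column)]);
            }
            rows.put(key, line);
        }
        return rows;
    }

    public static Result diff(List<String> oldLines, List<String> newLines, String... keyColumns) {
        Map<List<String>, String> oldRows = index(oldLines, keyColumns);
        Map<List<String>, String> newRows = index(newLines, keyColumns);
        Result result = new Result();
        // Keys only in the new file are additions; shared keys with different text are updates.
        for (Map.Entry<List<String>, String> entry : newRows.entrySet()) {
            String oldLine = oldRows.get(entry.getKey());
            if (oldLine == null) {
                result.added.add(entry.getValue());
            } else if (!oldLine.equals(entry.getValue())) {
                result.updated.add(entry.getValue());
            }
        }
        // Keys only in the old file are deletions.
        for (Map.Entry<List<String>, String> entry : oldRows.entrySet()) {
            if (!newRows.containsKey(entry.getKey())) {
                result.deleted.add(entry.getValue());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up sample data standing in for old.csv and new.csv.
        List<String> oldLines = Arrays.asList(
                "userName,date,amount",
                "alice,2021-01-01,10",
                "bob,2021-01-02,20");
        List<String> newLines = Arrays.asList(
                "userName,date,amount",
                "alice,2021-01-01,15",
                "carol,2021-01-03,30");
        Result result = diff(oldLines, newLines, "userName", "date");
        System.out.println("added:   " + result.added);
        System.out.println("deleted: " + result.deleted);
        System.out.println("updated: " + result.updated);
    }
}
```

With real files, the two line lists could come from java.nio.file.Files.readAllLines. For CSV data with quoted fields or embedded commas, a proper CSV parser would be needed instead of the plain split.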