Find Difference Between Two Text Files – Case 3

Question
I have two text files, file1.txt and file2.txt:

file1.txt
---------
Syed
Sheethal
Mirko
Rathod
.

file2.txt
---------
Syed
Vijay
Akash

Both files have millions of records. I need to compute “file1.txt - file2.txt”, that is, the records in file1.txt that do not appear in file2.txt.
Can anybody suggest the best logical approach?

Thanks & Regards,
Syed

 

Answer
You are asking for the difference of two files, which is a set operation. Java has no ready-made class library for set operations on files of this size, so a hand-written solution takes a lot of code. Try esProc SPL (Structured Process Language) for the difference operation. The SPL script below is short:

     A
1    =file("e:\\f1.txt").cursor().sortx(_1)
2    =file("e:\\f2.txt").cursor().sortx(_1)
3    result [A1,A2].mergex@xd(1)

A3: Find the difference between A1 and A2 and return the result to Java.
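To illustrate the idea behind a merge-based difference, here is a minimal Java sketch. It is not esProc's implementation, and the class and method names (`SortedDifference`, `difference`) are hypothetical. It assumes both inputs are already sorted in ascending `String.compareTo` order (for example by an external sort that is not shown here) and then walks the two streams once, keeping only the lines of the first input that are absent from the second.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;

public class SortedDifference {

    // Writes every line of `left` that does not occur in `right`.
    // Assumption: both readers yield lines in ascending String.compareTo order.
    public static void difference(BufferedReader left, BufferedReader right, Writer out)
            throws IOException {
        String a = left.readLine();
        String b = right.readLine();
        while (a != null) {
            // Once `right` is exhausted, every remaining line of `left` is kept.
            int cmp = (b == null) ? -1 : a.compareTo(b);
            if (cmp < 0) {
                // `a` sorts before the current line of `right`, so it cannot be in `right`.
                out.write(a);
                out.write('\n');
                a = left.readLine();
            } else if (cmp > 0) {
                // `right` is behind; advance it.
                b = right.readLine();
            } else {
                // Same line in both inputs: drop it from the result.
                a = left.readLine();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // The sample records from the question, sorted beforehand.
        BufferedReader f1 = new BufferedReader(new StringReader("Mirko\nRathod\nSheethal\nSyed\n"));
        BufferedReader f2 = new BufferedReader(new StringReader("Akash\nSyed\nVijay\n"));
        StringWriter out = new StringWriter();
        difference(f1, f2, out);
        System.out.print(out); // prints Mirko, Rathod, Sheethal, one per line
    }
}
```

Only one line from each input is held in memory at a time, which is why the sorted form suits files that are too large to load. With real files, the `StringReader` instances would be replaced by readers over the sorted files.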

 

Here we assume that both files are too large (millions of records each) to be loaded into memory at one time. They are therefore sorted first, so that the difference can be computed by merging the two sorted files in a single pass. If the files are relatively small, you don’t need to sort them and can compute the difference in memory with a set function such as diff().
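For comparison, the small-file case can be sketched in plain Java with a hash set. This is only an illustration under the assumption that both files fit comfortably in memory; the class and method names (`SmallFileDifference`, `difference`) are hypothetical and are not part of esProc.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class SmallFileDifference {

    // Returns the lines of `first` that do not occur in `second`,
    // preserving the original order of `first`. No sorting is needed.
    public static List<String> difference(List<String> first, List<String> second) {
        Set<String> exclude = new HashSet<>(second); // constant-time lookups on average
        List<String> result = new ArrayList<>();
        for (String line : first) {
            if (!exclude.contains(line)) {
                result.add(line);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // The sample records from the question.
        List<String> file1 = Arrays.asList("Syed", "Sheethal", "Mirko", "Rathod");
        List<String> file2 = Arrays.asList("Syed", "Vijay", "Akash");
        System.out.println(difference(file1, file2)); // prints [Sheethal, Mirko, Rathod]
    }
}
```

With real files, the two lists could be loaded with `Files.readAllLines(Paths.get("file1.txt"))` and `Files.readAllLines(Paths.get("file2.txt"))`. The memory cost grows with the size of both files, so this approach stops being practical well before the sizes discussed in the question.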

esProc offers a series of set operation functions for computations like this one. An SPL script can be integrated into a Java application through esProc JDBC. See “How to Call an SPL Script in Java” to learn more.