Remove Duplicate Data from a CSV File

 

Problem description & analysis

Below is CSV file csv.csv:

a,123,value1,a@email.com

a,123,value1,a@email.com

a,123,value1,a@email.com

a,123,Value7,a@email.com

b,567,Value5,b@email.com

b,567,Value6,b@email.com

b,567,Value6,b@email.com

We are trying to delete the duplicate records from the CSV file. Below is the desired result:

a,123,Value7,a@email.com

a,123,value1,a@email.com

b,567,Value5,b@email.com

b,567,Value6,b@email.com

Solutions:

Method: Through table sequence

Write the following script p1.dfx in esProc:

A

1

=file("csv.csv").import@c()

2

=A1.group@1(#1,#2,#3,#4)

3

=file("result.csv").export@c(A2)

Explanation:

A1   Import the CSV file as a table sequence.

A2  Group records by all columns, get the first record of each group, and return all the eligible records as a record sequence. This is equivalent to a distinct operation by all columns.

A3  Export result to result.csv.

Method 2: Through strings

Write the following script p1.dfx in esProc:

A

1

=file("csv.csv").read@n()

2

=A1.id()

3

=file("result.csv").export(A2)

Explanation:

A1   Read each row of the CSV file as a string and return a sequence of strings.

A2  Perform distinct operation on the sequence of strings using id() function.

A3  Export result to result.csv.

Method 3: Through sequence of sequences

Write the following script p1.dfx in esProc:

A

1

=file("csv.csv").import@w()

2

=A1.id()

3

=file("result.csv").export@c(A2)

Explanation:

A1   Import the CSV file as a sequence of sequences.

A2  Perform distinct operation on the sequence of sequences using id() function.

A3  Export result to result.csv.

Q & A Collection

https://stackoverflow.com/questions/62313801/compare-two-columns-in-csv-and-write-only-unique-values-to-another-csv