Problem description & analysis

Below is CSV file csv.csv:

a,123,value1,a@email.com

a,123,value1,a@email.com

a,123,value1,a@email.com

a,123,Value7,a@email.com

b,567,Value5,b@email.com

b,567,Value6,b@email.com

b,567,Value6,b@email.com

We are trying to delete the duplicate records from the CSV file. Below is the desired result:

a,123,Value7,a@email.com

a,123,value1,a@email.com

b,567,Value5,b@email.com

b,567,Value6,b@email.com

Solutions:

Method: Through table sequence

Write the following script p1.dfx in esProc:

 A 1 =file("csv.csv").import@c() 2 =A1.group@1(#1,#2,#3,#4) 3 =file("result.csv").export@c(A2)

Explanation:

A1   Import the CSV file as a table sequence.

A2  Group records by all columns, get the first record of each group, and return all the eligible records as a record sequence. This is equivalent to a distinct operation by all columns.

A3  Export result to result.csv.

Method 2: Through strings

Write the following script p1.dfx in esProc:

 A 1 =file("csv.csv").read@n() 2 =A1.id() 3 =file("result.csv").export(A2)

Explanation:

A1   Read each row of the CSV file as a string and return a sequence of strings.

A2  Perform distinct operation on the sequence of strings using id() function.

A3  Export result to result.csv.

Method 3: Through sequence of sequences

Write the following script p1.dfx in esProc:

 A 1 =file("csv.csv").import@w() 2 =A1.id() 3 =file("result.csv").export@c(A2)

Explanation:

A1   Import the CSV file as a sequence of sequences.

A2  Perform distinct operation on the sequence of sequences using id() function.

A3  Export result to result.csv.