"Problem description & analysis Below is CSV file csv.csv: a,123,value1,a@email.com a,123,val .."

blackduckie RaqForum 28 No.
314 View • 2 Years ago

Remove Duplicate Data from a CSV File

csv(56) distinct(5)

Problem description & analysis

Below is CSV file csv.csv:

a,123,value1,a@email.com

a,123,Value7,a@email.com

b,567,Value5,b@email.com

b,567,Value6,b@email.com

We are trying to delete the duplicate records from the CSV file. Below is the desired result:

a,123,Value7,a@email.com

a,123,value1,a@email.com

b,567,Value5,b@email.com

b,567,Value6,b@email.com

Solutions:

Method: Through table sequence

Write the following script p1.dfx in esProc:

	A
1	=file("csv.csv").import@c()
2	=A1.group@1(#1,#2,#3,#4)
3	=file("result.csv").export@c(A2)

Explanation:

A1 Import the CSV file as a table sequence.

A2 Group records by all columns, get the first record of each group, and return all the eligible records as a record sequence. This is equivalent to a distinct operation by all columns.

A3 Export result to result.csv.

Method 2: Through strings

Write the following script p1.dfx in esProc:

	A
1	=file("csv.csv").read@n()
2	=A1.id()
3	=file("result.csv").export(A2)

Explanation:

A1 Read each row of the CSV file as a string and return a sequence of strings.

A2 Perform distinct operation on the sequence of strings using id() function.

A3 Export result to result.csv.

Method 3: Through sequence of sequences

Write the following script p1.dfx in esProc:

	A
1	=file("csv.csv").import@w()
2	=A1.id()
3	=file("result.csv").export@c(A2)

Explanation:

A1 Import the CSV file as a sequence of sequences.

A2 Perform distinct operation on the sequence of sequences using id() function.

A3 Export result to result.csv.

Read How to Call an SPL Script in Java to learn about the integration of an SPL script with a Java program.

Q & A Collection

https://stackoverflow.com/questions/62313801/compare-two-columns-in-csv-and-write-only-unique-values-to-another-csv

SPL Official Website 👉 http://www.scudata.com

SPL Feedback and Help 👉 https://www.reddit.com/r/esProc

SPL Learning Material 👉 http://c.scudata.com

SPL Source Code and Package 👉 https://github.com/SPLWare/esProc

Discord 👉 https://discord.gg/ydhVnFH9

Youtube 👉 https://www.youtube.com/@esProc_SPL

csv(56) distinct(5)

Developer

blackduckie • 314 View • 2 Years ago

Remove Duplicate Data from a CSV File

Problem description & analysis

Solutions:

Method: Through table sequence

Method 2: Through strings

Method 3: Through sequence of sequences

ToC