Remove duplicate lines and lines with missing values from a CSV file
In the CSV file below, some lines have null (empty) values, some have NaN values, and some lines are duplicates.
Sno,Country,noofDeaths
1,,32432
2,Pakistan,NaN
3,USA,3332
4,RUSSIA,
5,JAPAN,567
3,USA,3332
The task, to be done in Java: delete lines containing null or NaN values, and remove duplicate lines. Below is the expected result:
Sno,Country,noofDeaths
3,USA,3332
5,JAPAN,567
Write the SPL script:
|   | A |
|---|---|
| 1 | =T("data.csv") |
| 2 | =A1.select(~.array() ^ [null,NaN]==[]) |
| 3 | =A2.group@1u(~.array()) |
A1: Parse the csv file as a two-dimensional table.
A2: Convert each record to a sequence of its field values, intersect that sequence with [null,NaN], and select only the records whose intersection is empty, that is, the records containing neither null nor NaN.
A3: Group A2’s records by their field values, and get the first record from each group while keeping the original order.
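For comparison, the same two steps (filter, then de-duplicate in original order) can be sketched in plain Java. This is only an illustrative sketch, not the SPL integration itself: the class name `CsvCleaner` and the methods `hasMissing` and `clean` are hypothetical, and it assumes a simple CSV with no quoted fields or embedded commas, where an empty field stands for null and the literal text NaN marks a missing number.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class CsvCleaner {

    // True when any field is empty (treated as null here) or the literal "NaN".
    // Assumption: fields contain no quoted commas, so a plain split is enough.
    static boolean hasMissing(String line) {
        // A limit of -1 keeps trailing empty fields, e.g. "4,RUSSIA," yields 3 fields.
        String[] fields = line.split(",", -1);
        for (String field : fields) {
            String value = field.trim();
            if (value.isEmpty() || value.equals("NaN")) {
                return true;
            }
        }
        return false;
    }

    // Keeps the header, drops rows with missing values, and removes duplicate
    // rows while preserving the order of first appearance.
    static List<String> clean(List<String> lines) {
        List<String> result = new ArrayList<>();
        if (lines.isEmpty()) {
            return result;
        }
        result.add(lines.get(0)); // header row is kept as-is

        // LinkedHashSet ignores repeated rows and remembers insertion order.
        Set<String> kept = new LinkedHashSet<>();
        for (String line : lines.subList(1, lines.size())) {
            if (!hasMissing(line)) {
                kept.add(line);
            }
        }
        result.addAll(kept);
        return result;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList(
                "Sno,Country,noofDeaths",
                "1,,32432",
                "2,Pakistan,NaN",
                "3,USA,3332",
                "4,RUSSIA,",
                "5,JAPAN,567",
                "3,USA,3332");
        // Prints the header followed by 3,USA,3332 and 5,JAPAN,567
        clean(input).forEach(System.out::println);
    }
}
```

The `LinkedHashSet` plays the role of `group@1u` here: it keeps the first occurrence of each row and preserves the original order. Reading the lines from a file (for example with `java.nio.file.Files.readAllLines`) is left out to keep the sketch short.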
Read "How to Call a SPL Script in Java" to learn how to integrate SPL into a Java application.
Source: https://stackoverflow.com/questions/70806307/how-to-remove-row-which-contains-blank-cell-from-csv-file-in-java