Find Duplicates in SPL
【Question】
I am working on an upload feature that needs to read data from a .txt file. The file’s format is something like this:
13500000000|1
13500000001|1
13500000002|1
13500000003|1
13500000003|1
I need to check whether the data contains duplicates and, if it does, show the user a message. Any suggestions on how to do this? Thanks.
【Answer】
It’s simple to handle this in SPL (Structured Process Language): group identical records together and return the groups that contain more than one record. The records in those groups are the duplicates.
  | A
1 | =file("E:\\s.txt").import@i()
2 | =A1.group().select(~.len()>1)
A1: Import the content of s.txt. The @i option returns a single-column result as a sequence, with one member per line.
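A2: Group the members of A1 by value and keep the groups that hold more than one member. For the sample data, this returns one group containing the two 13500000003|1 records, which are the duplicates.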
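For comparison, the same group-then-filter logic can be sketched in plain Java without SPL. This is an illustrative equivalent, not part of esProc: the class `DuplicateFinder` and its `findDuplicates` method are hypothetical names. Like A1, it treats each whole line as the grouping key; skipping blank lines is an added assumption.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class DuplicateFinder {

    // Groups lines by their full text (like A1.group()) and keeps only
    // groups with more than one member (like select(~.len()>1)).
    public static List<List<String>> findDuplicates(List<String> lines) {
        // LinkedHashMap keeps the groups in first-seen order.
        Map<String, List<String>> groups = new LinkedHashMap<>();
        for (String line : lines) {
            if (line.isBlank()) {
                continue; // assumption: blank lines are not records
            }
            groups.computeIfAbsent(line, k -> new ArrayList<>()).add(line);
        }
        List<List<String>> duplicates = new ArrayList<>();
        for (List<String> group : groups.values()) {
            if (group.size() > 1) {
                duplicates.add(group);
            }
        }
        return duplicates;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical usage: pass a file path, otherwise use the sample rows.
        List<String> lines = args.length > 0
                ? Files.readAllLines(Path.of(args[0]))
                : List.of("13500000000|1", "13500000001|1", "13500000002|1",
                          "13500000003|1", "13500000003|1");
        System.out.println(findDuplicates(lines));
    }
}
```

With the sample rows, this prints `[[13500000003|1, 13500000003|1]]`. A `LinkedHashMap` is used so duplicate groups are reported in the order they first appear in the file.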
An SPL script can be embedded into a Java program for further computation. More details are explained in How to Call an SPL Script in Java.
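The question also asks for a prompt when duplicates exist. Once the duplicate groups are available in Java, turning them into a user message takes only a few lines. The sketch below assumes the groups arrive as a `List<List<String>>` of raw lines; `DuplicatePrompt`, `buildMessage`, and the message wording are hypothetical, and it returns an empty `Optional` when nothing is duplicated so the caller can skip the prompt.

```java
import java.util.List;
import java.util.Optional;
import java.util.StringJoiner;

public class DuplicatePrompt {

    // Builds a user-facing message from duplicate groups,
    // or returns Optional.empty() when there are no duplicates.
    public static Optional<String> buildMessage(List<List<String>> duplicateGroups) {
        if (duplicateGroups.isEmpty()) {
            return Optional.empty();
        }
        StringJoiner joiner = new StringJoiner(", ",
                "Duplicate records found: ",
                ". Please fix the file and upload it again.");
        for (List<String> group : duplicateGroups) {
            // Each group holds identical lines: report the line once with its count.
            joiner.add(group.get(0) + " (x" + group.size() + ")");
        }
        return Optional.of(joiner.toString());
    }

    public static void main(String[] args) {
        // Hypothetical input shaped like the result of A2 for the sample data.
        List<List<String>> groups = List.of(List.of("13500000003|1", "13500000003|1"));
        buildMessage(groups).ifPresent(System.out::println);
    }
}
```

Returning an `Optional` keeps the decision of how to show the prompt (dialog, HTTP response, log) with the caller.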
SPL Official Website 👉 https://www.scudata.com
SPL Feedback and Help 👉 https://www.reddit.com/r/esProc_SPL
SPL Learning Material 👉 https://c.scudata.com
SPL Source Code and Package 👉 https://github.com/SPLWare/esProc
Discord 👉 https://discord.gg/cFTcUNs7
Youtube 👉 https://www.youtube.com/@esProc_SPL