Remove Duplicate Lines
【Question】
Does anyone know of a Python/Bash script for removing duplicate lines?
I dumped the contents of an EPROM yesterday and now need to find a faster way to remove duplicate data.
The columns are sample #, clock (high or low), 8 bits of data, reset bit, and sample period. I have it in a text file but brought it into a spreadsheet to make it easier to look at and delete lines.
【Answer】
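Here "duplicate" means a run of adjacent rows that share the same clock value, and the goal is to keep only the first row of each run. You can hand-code this in Python or Bash, but it takes some bookkeeping. Here I use SPL (Structured Process Language) instead. SPL's group() function has an @o option that groups adjacent records with the same key without sorting, and an @1 option that returns the first record of each group. The file is imported without a header row, so the second column (the clock) gets the default name _2. A one-liner is enough: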
| | A |
|---|---|
| 1 | =file("eprom.log").import().group@1o(_2) |
Below is part of A1’s result:
540 1 5 1 1000000000ms
541 0 5 1 1000000000ms
543 1 1 1 1000000000ms
545 0 2 1 1000000000ms
546 1 3 1 1000000000ms
548 0 4 1 1000000000ms
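Since the question asked about Python, here is a minimal sketch of the same idea, not a drop-in replacement for the SPL line. It assumes whitespace-separated columns with the clock in the second column, and the helper name `first_of_each_run` and the row numbered 542 in the sample input are made up for illustration.

```python
from itertools import groupby


def first_of_each_run(lines, key_col=1):
    """Keep the first line of every run of adjacent lines whose key_col value is equal.

    key_col is a zero-based column index; 1 is the clock column under the
    assumed layout (sample #, clock, data, reset, period).
    """
    rows = (line for line in lines if line.strip())  # skip blank lines
    # groupby only merges *adjacent* equal keys, so no sorting is involved.
    return [next(group) for _, group in groupby(rows, key=lambda ln: ln.split()[key_col])]


# Hypothetical input: row 542 repeats the clock value of row 541.
sample = [
    "540 1 5 1 1000000000ms",
    "541 0 5 1 1000000000ms",
    "542 0 5 1 1000000000ms",
    "543 1 1 1 1000000000ms",
]

for line in first_of_each_run(sample):
    print(line)
```

To run it on a real dump, pass the open file object instead of the list, for example `first_of_each_run(open("eprom.log"))`, and strip the trailing newlines before printing.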