Remove Duplicate Lines

 

Question

Does anyone know of a Python/Bash script for removing duplicate lines?
I dumped the contents of an EPROM yesterday and now need to find a faster way to remove duplicate data.
The following columns are sample #, clock (hi or low), 8 bits of data, reset bit, and sample period. I have it in a text file but brought it into a spreadsheet to make it easier to look at and delete lines.


 

Answer

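You can remove the duplicate lines with a hand-written Python or Bash script, but the logic is fairly involved because the duplicates come in runs of adjacent lines. Here I use SPL (Structured Process Language) instead. SPL's group() function accepts an @1 option, which returns only the first record of each group, and an @o option, which starts a new group whenever the grouping value changes between adjacent records instead of sorting the data first. Grouping on the second column (_2, the clock) with both options keeps the first line of each run, so a one-liner is sufficient: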

     A
1    =file("eprom.log").import().group@1o(_2)

 

Below is part of A1’s result:

540    1    5    1    1000000000ms

541    0    5    1    1000000000ms

543    1    1    1    1000000000ms

545    0    2    1    1000000000ms

546    1    3    1    1000000000ms

548    0    4    1    1000000000ms
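
Since the question asked for Python, here is a minimal sketch of the same idea: keep the first line of each run of adjacent lines that share the same clock value. It assumes whitespace-separated columns with the clock in the second column, as in the result above. The function name first_of_each_run and the sample rows are made up for illustration; they are not taken from the original dump.

```python
from itertools import groupby


def first_of_each_run(lines, key_column=1):
    """Yield the first line of every run of adjacent lines whose
    key_column value (0-based, whitespace-separated) is the same."""
    # Skip blank lines and strip the trailing newline from each line.
    nonblank = (line.rstrip("\n") for line in lines if line.strip())
    # groupby only merges *adjacent* equal keys, so no sorting happens.
    for _, run in groupby(nonblank, key=lambda line: line.split()[key_column]):
        yield next(run)


# Hypothetical sample rows: sample #, clock, data, reset, sample period.
sample = [
    "540 1 5 1 1000000000ms",
    "541 0 5 1 1000000000ms",
    "542 0 5 1 1000000000ms",  # same clock as 541, so it is dropped
    "543 1 1 1 1000000000ms",
    "544 1 1 1 1000000000ms",  # same clock as 543, so it is dropped
    "545 0 2 1 1000000000ms",
]

for line in first_of_each_run(sample):
    print(line)
```

To run it on the real dump, pass an open file instead of the list, for example first_of_each_run(open("eprom.log")). Because a file is read line by line, the whole dump never has to fit in memory.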