Java Stream - Retrieving Repeated Records from CSV

Question

Source: https://stackoverflow.com/questions/68651921/java-stream-retrieving-repeated-records-from-csv

I searched the site and didn't find anything similar. I'm new to Java streams, but I understand that they can replace an explicit loop. I would like to know whether a stream can filter a CSV file, as shown below, so that only repeated records (records that match on every field except Id) are included in the result, grouped by the Center field.

Initial CSV file

Id,Name,Mother,Birth,Center
1,A,A,2000-01-01,1
2,C,A,2000-01-02,1
3,P,M,2000-01-03,2
4,D,S,2000-01-04,3
5,R,H,2000-01-05,4
6,P,M,2000-01-03,2
7,A,A,2000-01-01,1
8,P,C,2000-01-08,2
9,R,I,2000-01-07,3
10,P,M,2000-01-03,2

Final result

Id,Name,Mother,Birth,Center
1,A,A,2000-01-01,1
7,A,A,2000-01-01,1
3,P,M,2000-01-03,2
6,P,M,2000-01-03,2
10,P,M,2000-01-03,2

In addition, a duplicate pair must not appear in the final result a second time in reverse order, as shown in the table below:

This shouldn't happen

Id,Name,Mother,Birth,Center
1,A,A,2000-01-01,1
7,A,A,2000-01-01,1
7,A,A,2000-01-01,1
1,A,A,2000-01-01,1

Is there a way to do this with a stream and grouping at the same time, given that in theory two loops would be needed to perform the task?

Thanks in advance.

Answer

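The task is to find the records that are duplicated on every field except Id (Name, Mother, Birth and Center) and to return them together, group by group. Written in plain Java, this takes a grouping step, a filter on group size, and a step that flattens the surviving groups back into rows.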
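For reference, the grouping logic described above can be sketched with the stream API alone. This is a minimal sketch, not a definitive implementation: the class and method names (RepeatedRecords, findRepeated, center) are made up for illustration, and it assumes that Id is the first column, that Center is the last column and holds an integer, and that no field contains a quoted comma. It needs Java 9 or later because of List.of.

```java
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class RepeatedRecords {

    // Hypothetical helper: takes CSV rows without the header line and returns
    // the rows that share Name, Mother, Birth and Center with another row.
    static List<String> findRepeated(List<String> rows) {
        // Group by everything after the first comma, i.e. all fields except Id.
        // LinkedHashMap keeps the groups in order of first appearance.
        Map<String, List<String>> groups = rows.stream()
                .collect(Collectors.groupingBy(
                        row -> row.substring(row.indexOf(',') + 1),
                        LinkedHashMap::new,
                        Collectors.toList()));

        return groups.values().stream()
                .filter(group -> group.size() > 1)          // keep repeated records only
                .sorted(Comparator.comparingInt(
                        group -> center(group.get(0))))     // order the groups by Center
                .flatMap(List::stream)                      // back to individual rows
                .collect(Collectors.toList());
    }

    // Assumption: Center is the last field and is an integer.
    static int center(String row) {
        return Integer.parseInt(row.substring(row.lastIndexOf(',') + 1).trim());
    }

    public static void main(String[] args) {
        // Sample rows from the question. For a real file, the rows could come from
        // Files.readAllLines(Path.of("repeated.csv")) with the header line dropped.
        List<String> rows = List.of(
                "1,A,A,2000-01-01,1", "2,C,A,2000-01-02,1", "3,P,M,2000-01-03,2",
                "4,D,S,2000-01-04,3", "5,R,H,2000-01-05,4", "6,P,M,2000-01-03,2",
                "7,A,A,2000-01-01,1", "8,P,C,2000-01-08,2", "9,R,I,2000-01-07,3",
                "10,P,M,2000-01-03,2");
        findRepeated(rows).forEach(System.out::println);
    }
}
```

With the sample data this prints rows 1, 7, 3, 6 and 10, in that order. Each row belongs to exactly one group, so a duplicate pair cannot show up a second time in reverse order.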

It is simple to do in SPL, an open-source Java package, where one line of code is enough:

     A
1    =file("repeated.csv").import@ct().group(Name,Mother,Birth,Center).select(~.len()>1).conj()

SPL provides a JDBC driver that Java can call. Save the SPL script above as repeated.splx and invoke it from Java the same way you would call a stored procedure:

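// Uses java.sql.Connection, java.sql.DriverManager and java.sql.CallableStatement
Class.forName("com.esproc.jdbc.InternalDriver");
Connection con = DriverManager.getConnection("jdbc:esproc:local://");
// Calls the script saved as repeated.splx
CallableStatement st = con.prepareCall("call repeated()");
st.execute();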

Alternatively, execute the SPL expression directly from the Java program, the same way you would execute an SQL statement:

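// Uses java.sql.PreparedStatement; con is the connection opened as shown earlier
PreparedStatement st = con.prepareStatement("==file(\"repeated.csv\").import@ct().group(Name,Mother,Birth,Center).select(~.len()>1).conj()");
st.execute();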
