Java Streams Collect Excel CSV to a List Filtering Based on the Sum of a Column

Question

Source:https://stackoverflow.com/questions/66229242/java-streams-collect-excel-csv-to-a-list-filtering-based-on-the-sum-of-a-column

Suppose we have an Excel spreadsheet that looks like:

StatusCount FirstName LastName ID

1 Tod Mahones 122145

0 Tod Mahones 122145

1 Tod Mahones 122145

-1 Tod Mahones 122145

1 Ronny Jackson 149333

1 Eliza Cho 351995

-1 Eliza Cho 351995

1 James Potter 884214

1 James Potter 884214

-1 Peter Walker 900248

1 Zaid Grits 993213

How can I be able to gather to a list of only the IDs of the people whose status count is a sum greater than 0, and if it is 0 or less then discard it. So, in the Excel spreadsheet above, the list in Java should be look like:

List<Integer> = [122145, 149333, 884214, 993213]

Update (adding in what I tried so far):

List<Integer> = csvFile.stream()

.map(Arrays::asList)

.filter(column -> column.get(0).equalsIgnoreCase("1")

.map(column -> column.get(3))

.map(Integer::parseInt)

.sorted()

.collect(Collectors.toList());

I collected them just by status counts of 1 but that isn't the right process, it should be to sum up the status count for each person or ID (I guess it is good to find any dupes) and if its > 0 then collect to the list, if not then discard.

Update 2: I forgot to mention that the csv file is brought into java as a List<String[]> where the List contains the rows of the csv and the String[] is the contents of the rows, so it would be like:

[[1, Tod, Mahones, 122145],[0, Tod, Mahones, 122145], [1, Tod, Mahones, 122145], ...]

Answer

As the CSV file is ordered by ID column, you can group rows by ID, sum StatusCount values in each group, and find IDs whose sums are greater than 0. You need to be familiar with Stream Collectors, otherwise it is hard to implement the algorithm.

Yet, it is easy to express the process using SPL, the open-source Java package. Only one line of code is enough:

A

1

=file("data.csv").import@ct().groups@o(ID;sum(StatusCount)).select(#2>0).(#1)

 

SPL offers JDBC driver to be invoked by Java. Just store the above SPL script as filter.splx and invoke it in Java as you call a stored procedure:

Class.forName("com.esproc.jdbc.InternalDriver");

con= DriverManager.getConnection("jdbc:esproc:local://");

st = con.prepareCall("call filter()");
st.execute();

View SPL source code.