
blackduckie RaqForum 28 No.
1 Reply • 243 Views • 2 Years ago

Assign Unique Value to Field in Duplicate Records Group during GroupingBy

Tags: csv, intra-group continuous ranking, dense_rank

Question

Source: https://stackoverflow.com/questions/68703671/assign-unique-value-to-field-in-duplicate-records-group-during-groupingby

Following the answer provided by devReddit (linked in the original post), I grouped the CSV records (same client names) of the following test file (fake data):

CSV test file

id,name,mother,birth,center
1,Antonio Carlos da Silva,Ana da Silva,2008/03/31,1
2,Carlos Roberto de Souza,Amália Maria de Souza,2004/12/10,1
3,Pedro de Albuquerque,Maria de Albuquerque,2006/04/03,2
4,Danilo da Silva Cardoso,Sônia de Paula Cardoso,2002/08/10,3
5,Ralfo dos Santos Filho,Helena dos Santos,2012/02/21,4
6,Pedro de Albuquerque,Maria de Albuquerque,2006/04/03,2
7,Antonio Carlos da Silva,Ana da Silva,2008/03/31,1
8,Ralfo dos Santos Filho,Helena dos Santos,2012/02/21,4
9,Rosana Pereira de Campos,Ivana Maria de Campos,2002/07/16,3
10,Paula Cristina de Abreu,Cristina Pereira de Abreu,2014/10/25,2
11,Pedro de Albuquerque,Maria de Albuquerque,2006/04/03,2
12,Ralfo dos Santos Filho,Helena dos Santos,2012/02/21,4

Client Entity

package entities;

// Plain data holder for one CSV row; every field is kept as a String.
public class Client {

    private String id;
    private String name;
    private String mother;
    private String birth;
    private String center;

    public Client() {
    }

    public Client(String id, String name, String mother, String birth, String center) {
        this.id = id;
        this.name = name;
        this.mother = mother;
        this.birth = birth;
        this.center = center;
    }

    public String getId() {
        return id;
    }

    public void setId(String id) {
        this.id = id;
    }

    public String getName() {
        return name;
    }

    public void setName(String name) {
        this.name = name;
    }

    public String getMother() {
        return mother;
    }

    public void setMother(String mother) {
        this.mother = mother;
    }

    public String getBirth() {
        return birth;
    }

    public void setBirth(String birth) {
        this.birth = birth;
    }

    public String getCenter() {
        return center;
    }

    public void setCenter(String center) {
        this.center = center;
    }

    @Override
    public String toString() {
        return "Client [id=" + id + ", name=" + name + ", mother=" + mother + ", birth=" + birth + ", center=" + center
                + "]";
    }
}

Program

package application;

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.Stream;

import entities.Client;

public class Program {

    public static void main(String[] args) throws IOException {

        Pattern pattern = Pattern.compile(",");

        // Read the CSV, skip the header row and map each line to a Client.
        // try-with-resources closes the file handle opened by Files.lines.
        List<Client> file;
        try (Stream<String> lines = Files.lines(Paths.get("src/Client.csv"))) {
            file = lines
                    .skip(1)
                    .map(line -> {
                        String[] fields = pattern.split(line);
                        return new Client(fields[0], fields[1], fields[2], fields[3], fields[4]);
                    })
                    .collect(Collectors.toList());
        }

        // Keep only records that have at least one duplicate, then group them by center.
        // LinkedHashMap keeps the centers in first-seen order.
        Map<String, List<Client>> grouped = file
                .stream()
                .filter(x -> file.stream().anyMatch(y -> isDuplicate(x, y)))
                .collect(Collectors.groupingBy(Client::getCenter, LinkedHashMap::new, Collectors.toList()));

        grouped.entrySet().forEach(System.out::println);
    }

    // Two different records (different ids) are duplicates when name, mother and birth all match.
    private static boolean isDuplicate(Client x, Client y) {
        return !x.getId().equals(y.getId())
                && x.getName().equals(y.getName())
                && x.getMother().equals(y.getMother())
                && x.getBirth().equals(y.getBirth());
    }
}

Final Result (Grouped by Center)

1=[Client [id=1, name=Antonio Carlos da Silva, mother=Ana da Silva, birth=2008/03/31, center=1],
   Client [id=7, name=Antonio Carlos da Silva, mother=Ana da Silva, birth=2008/03/31, center=1]]
2=[Client [id=3, name=Pedro de Albuquerque, mother=Maria de Albuquerque, birth=2006/04/03, center=2],
   Client [id=5, name=Ralfo dos Santos Filho, mother=Helena dos Santos, birth=2012/02/21, center=2],
   Client [id=6, name=Pedro de Albuquerque, mother=Maria de Albuquerque, birth=2006/04/03, center=2],
   Client [id=8, name=Ralfo dos Santos Filho, mother=Helena dos Santos, birth=2012/02/21, center=2],
   Client [id=11, name=Pedro de Albuquerque, mother=Maria de Albuquerque, birth=2006/04/03, center=2],
   Client [id=12, name=Ralfo dos Santos Filho, mother=Helena dos Santos, birth=2012/02/21, center=2]]
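The `isDuplicate` filter compares every record with every other record, so it performs O(n²) comparisons. A possible linear-time alternative (a swapped-in technique, not the approach from devReddit's answer) counts each name/mother/birth key once and keeps the records whose key occurs more than once. `DuplicateFilterSketch`, its trimmed `Client` record (Java 16+) and `key` are hypothetical names, and the sketch assumes ids are unique, as they are in the test file, so "another record with the same key exists" is the same as "key count > 1".

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class DuplicateFilterSketch {

    // Hypothetical stand-in for the Client entity (Java 16+ record).
    record Client(String id, String name, String mother, String birth, String center) {}

    // Same identity as isDuplicate(): name + mother + birth; id is ignored.
    // List.of gives a key with proper equals/hashCode, so no separator is needed.
    static List<String> key(Client c) {
        return List.of(c.name(), c.mother(), c.birth());
    }

    // Keeps, in input order, only the clients whose key occurs more than once.
    // Assumes ids are unique, so a repeated key always means a different record.
    static List<Client> duplicatesOnly(List<Client> clients) {
        Map<List<String>, Long> counts = clients.stream()
                .collect(Collectors.groupingBy(DuplicateFilterSketch::key, Collectors.counting()));
        return clients.stream()
                .filter(c -> counts.get(key(c)) > 1)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Client> clients = List.of(
                new Client("1", "Antonio Carlos da Silva", "Ana da Silva", "2008/03/31", "1"),
                new Client("2", "Carlos Roberto de Souza", "Amália Maria de Souza", "2004/12/10", "1"),
                new Client("7", "Antonio Carlos da Silva", "Ana da Silva", "2008/03/31", "1"));
        duplicatesOnly(clients).forEach(c -> System.out.println(c.id()));
    }
}
```

With the three sample records in `main`, this prints the ids 1 and 7; the result can then be passed to the same `groupingBy` call used in the program.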

What I Need

I need to assign a unique number to each group of repeated records, restarting the numbering each time the center value changes, and also keep the repeated records together (the map does not guarantee this), as in the example below:

The numbers on the left show the grouping by center (1 and 2). Repeated names share the same inner group number, starting from "1". When the center changes, the inner group number restarts from "1", and so on.

1=[Client [group=1, id=1, name=Antonio Carlos da Silva, mother=Ana da Silva, birth=2008/03/31, center=1],
   Client [group=1, id=7, name=Antonio Carlos da Silva, mother=Ana da Silva, birth=2008/03/31, center=1]]
// CENTER CHANGED (2) - Restart the inner group number at "1" again.
2=[Client [group=1, id=3, name=Pedro de Albuquerque, mother=Maria de Albuquerque, birth=2006/04/03, center=2],
   Client [group=1, id=6, name=Pedro de Albuquerque, mother=Maria de Albuquerque, birth=2006/04/03, center=2],
   Client [group=1, id=11, name=Pedro de Albuquerque, mother=Maria de Albuquerque, birth=2006/04/03, center=2],
   // NAME CHANGED, BUT STILL THE SAME CENTER - so the group number increases by "1" (group=2)
   Client [group=2, id=5, name=Ralfo dos Santos Filho, mother=Helena dos Santos, birth=2012/02/21, center=2],
   Client [group=2, id=8, name=Ralfo dos Santos Filho, mother=Helena dos Santos, birth=2012/02/21, center=2],
   Client [group=2, id=12, name=Ralfo dos Santos Filho, mother=Helena dos Santos, birth=2012/02/21, center=2]]
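One way to produce this numbering in plain Java is a nested `groupingBy`: group by center, then by name, both into `TreeMap`s so the keys come out sorted, and number the inner groups 1, 2, 3 and so on within each center. This is a sketch under stated assumptions: `InnerGroupSketch`, `Numbered` and the trimmed `Client` record (Java 16+) are hypothetical names; repeated records are identified by name alone, as the description above does; and centers are compared as strings, so a center "10" would sort before "2".

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

public class InnerGroupSketch {

    // Hypothetical, trimmed-down client shape (Java 16+ record).
    record Client(String id, String name, String center) {}

    // A client paired with its inner group number.
    record Numbered(int group, Client client) {}

    static Map<String, List<Numbered>> numberByCenter(List<Client> clients) {
        // center -> name -> clients. Both TreeMaps sort their keys as strings;
        // each inner list keeps clients with the same name together, in input order.
        var nested = clients.stream()
                .collect(Collectors.groupingBy(Client::center, TreeMap::new,
                        Collectors.groupingBy(Client::name, TreeMap::new, Collectors.toList())));

        Map<String, List<Numbered>> result = new LinkedHashMap<>();
        nested.forEach((center, byName) -> {
            List<Numbered> rows = new ArrayList<>();
            int group = 0;                            // restarts for every center
            for (List<Client> sameName : byName.values()) {
                group++;                              // next distinct name: next number
                for (Client c : sameName) {
                    rows.add(new Numbered(group, c)); // same name: same number
                }
            }
            result.put(center, rows);
        });
        return result;
    }

    public static void main(String[] args) {
        List<Client> clients = List.of(
                new Client("1", "Antonio Carlos da Silva", "1"),
                new Client("7", "Antonio Carlos da Silva", "1"),
                new Client("5", "Ralfo dos Santos Filho", "2"),
                new Client("3", "Pedro de Albuquerque", "2"),
                new Client("6", "Pedro de Albuquerque", "2"));
        numberByCenter(clients).forEach((center, rows) -> {
            System.out.println(center + "=");
            rows.forEach(r -> System.out.println("  group=" + r.group()
                    + ", id=" + r.client().id() + ", name=" + r.client().name()));
        });
    }
}
```

For the five sample clients in `main`, center 1 gets group 1 for both Antonio records, and center 2 gets group 1 for the two Pedro records and group 2 for Ralfo. To match `isDuplicate` exactly, the inner classifier could use a composite name/mother/birth key instead of the name.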

Answer

The task is to group the CSV records by center, sort them by name within each center, and number each distinct name within its center (a dense rank that restarts for every center). Doing this in plain Java takes a fair amount of code.

SPL, an open-source Java package, makes this simple. One line of code is enough:

	A
1	=file("client.csv":"UTF-8").import@ct().sort(center,name).derive(ranki(name;center):group)
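As the script name dense_rank.splx and the post's tags suggest, the derived `group` column is meant to be a dense rank of `name` that restarts for every `center`. The sketch below shows that computation in Java on already-parsed rows: a stable sort by center, then name, followed by a single pass that resets the counter when the center changes and increments it when the name changes. It only illustrates the assumed semantics, not how SPL's `ranki` is implemented; `SortDeriveSketch`, `Row` and `Ranked` are hypothetical names (Java 16+ records), and centers are compared as strings.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class SortDeriveSketch {

    // Hypothetical row shape and its derived "group" column.
    record Row(String id, String name, String center) {}
    record Ranked(Row row, int group) {}

    // Mirrors sort(center,name) followed by a dense rank of name within each center.
    static List<Ranked> sortAndRank(List<Row> rows) {
        List<Row> sorted = new ArrayList<>(rows);
        // List.sort is stable, so rows with equal center and name keep their input order.
        sorted.sort(Comparator.comparing(Row::center).thenComparing(Row::name));

        List<Ranked> out = new ArrayList<>();
        Row prev = null;
        int group = 0;
        for (Row r : sorted) {
            if (prev == null || !prev.center().equals(r.center())) {
                group = 1;                 // new center: restart at 1
            } else if (!prev.name().equals(r.name())) {
                group++;                   // same center, new name: next rank
            }                              // same center and name: keep the rank
            out.add(new Ranked(r, group));
            prev = r;
        }
        return out;
    }

    public static void main(String[] args) {
        List<Row> rows = List.of(
                new Row("3", "Pedro de Albuquerque", "2"),
                new Row("1", "Antonio Carlos da Silva", "1"),
                new Row("5", "Ralfo dos Santos Filho", "2"),
                new Row("6", "Pedro de Albuquerque", "2"));
        for (Ranked r : sortAndRank(rows)) {
            System.out.println(r.group() + "," + r.row().id() + "," + r.row().name() + "," + r.row().center());
        }
    }
}
```

For the four sample rows, this prints group 1 for the single center-1 row, then 1, 1 and 2 for the two Pedro rows and the Ralfo row in center 2. Unlike the question's program, neither this sketch nor the SPL line filters out clients that appear only once.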

SPL provides a JDBC driver that Java can invoke. Save the SPL script above as dense_rank.splx and call it from Java the same way you would call a stored procedure:

…
Class.forName("com.esproc.jdbc.InternalDriver");
con = DriverManager.getConnection("jdbc:esproc:local://");
st = con.prepareCall("call dense_rank ()");
st.execute();
…

Alternatively, execute the SPL string directly from a Java program, as you would a SQL statement:

…

st = con.prepareStatement("==file(\"client.csv\":\"UTF-8\").import@ct().sort(center,name).derive(ranki(name;center):group)");
st.execute();

…


SPL Official Website 👉 http://www.scudata.com