Deduplicate Multiple Related Columns

Examples

Example1

There is an Excel file Book1.xlsx, and part of the data is as follows:

id

name

1001

jack

1002

bob

1002

bob

1003

jerry

1003

jerry

1004

alice

1004


1004

aaa

1005

tom

1006


The task is to remove data with duplicate id and name. If the data with the same id have a non-null name value, the data with the null name will also be deleted. The results are as follows:

id

name

1001

jack

1002

bob

1003

jerry

1004

aaa

1004

alice

1005

tom

1006


Write SPL script:


A

1

=file("Book1.xlsx").xlsimport@t()

2

=A1.group(id).(~.group@1(name)).(if(~.len()>1,~.select(name),~)).conj()

3

=file("result.xlsx").xlsexport@t(A2)

A1 Read the excel file

A2 Group the data by id, deduplicate by name in the group. After deduplicating, if there are more than two pieces of data in the group, then filter out the data with non-null name, otherwise do not filter and merge the result of each group

A3 Export results to result.xlsx

 

Example2

There is an Excel file Book1.xlsx, and part of the data is as follows:

Alex

Eddie

Ford

Alex

Bruce

Evan

Alex

Chris

Bruce

Alex

Evan

Chris

Bruce

Alex

Evan

Bruce

Alice

Chris

Bruce

Ford

Alex

Chris

Evan

Ford

Eric

Alex

Bruce

The task is to remove duplicates (the column order is irrelevant), and the results are as follows:

Alex

Chris

Bruce

Eric

Alex

Bruce

Alex

Bruce

Evan

Bruce

Ford

Alex

Alex

Evan

Chris

Alex

Eddie

Ford

Bruce

Alice

Chris

Chris

Evan

Ford

Write SPL script:


A

1

=file("Book1.xlsx").xlsimport()

2

=A1.group@1(~.sort())

3

=file("result.xlsx").xlsexport(A2)

A1 Read the excel file

A2 Sort each row of data, group them to remove duplicates, and take the first data in each group

A3 Export results to result.xlsx