Remove Duplicate Records from a CSV File

Question

I have a CSV file with the columns "SNo. StateName CityName AreaName PinCode NonServ.Area MessangerService Remark". The CityName column contains repeated values: many records share the same CityName (for example, Delhi). Is there a way in Java to read the file and get the distinct values from the CityName column?

 

Answer

Removing duplicates to get records with distinct values is a simple structured computation. Java, however, has no built-in class library for this kind of structured computation, so the equivalent code tends to be roundabout and hard to read. You can do it in SPL (Structured Process Language) instead, where the code is short and easy to follow:

|   | A |
|---|---|
| 1 | =file("E:\\yourfile.csv").import@tc() |
| 2 | =A1.group@1(CityName) |

A1: Import the file. The @c option reads it as comma-separated text; the @t option treats the first line as column names.

A2: Group the records by CityName and keep the first record of each group. The question does not say which of the duplicate records should be kept, so we assume that retaining the first record for each CityName is acceptable.
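For comparison, the "first record per CityName" logic of A2 can be approximated in plain Java. This is only a minimal sketch under stated assumptions: the file is plain comma-separated text with a header line, no field contains a quoted comma, and the class name `DistinctCityRows` and method name `firstRowPerKey` are hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of SPL or any library.
final class DistinctCityRows {

    // Returns the first data row for each distinct value of keyColumn,
    // in order of first appearance. Assumes simple comma-separated lines
    // with a header row; quoted fields would need a real CSV parser.
    static List<String[]> firstRowPerKey(List<String> lines, String keyColumn) {
        if (lines.isEmpty()) {
            return new ArrayList<>();
        }
        // Locate the key column by name in the header line.
        String[] header = lines.get(0).split(",", -1);
        int keyIndex = -1;
        for (int i = 0; i < header.length; i++) {
            if (header[i].trim().equals(keyColumn)) {
                keyIndex = i;
                break;
            }
        }
        if (keyIndex < 0) {
            throw new IllegalArgumentException("No such column: " + keyColumn);
        }
        // LinkedHashMap preserves insertion order; putIfAbsent keeps
        // only the first row seen for each key value.
        Map<String, String[]> firstByKey = new LinkedHashMap<>();
        for (String line : lines.subList(1, lines.size())) {
            if (line.trim().isEmpty()) {
                continue; // skip blank lines
            }
            String[] fields = line.split(",", -1);
            if (fields.length <= keyIndex) {
                continue; // skip rows too short to contain the key column
            }
            firstByKey.putIfAbsent(fields[keyIndex].trim(), fields);
        }
        return new ArrayList<>(firstByKey.values());
    }
}
```

To use it on a file, the lines could be loaded with `java.nio.file.Files.readAllLines(java.nio.file.Paths.get("E:\\yourfile.csv"))` and passed in with `"CityName"` as the key column. If only the distinct city names are needed rather than whole records, collecting the key column into a `LinkedHashSet` would be enough.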

To integrate the SPL script with a Java application, see How to Call an SPL Script in Java.