Delete Duplicate Records

Question

I am looking for some help. I have an application at work that generates a CSV file of user information. I want to use Java to read the data, delete duplicate information, rearrange it, and create a spreadsheet to make life easier. The CSV is generated in the following format, but is much larger:

21458952, a1234, Doe, John, technology, support staff, work phone, 555-555-5555

21458952, a1234, Doe, John, technology, support staff, work email, johndoe@whatever.net

21458952, a1234, Doe, John, technology, support staff, work pager, 555-555-5555

99946133, b9854, Paul, Jane, technology, administration, work phone, 444-444-4444

99946133, b9854, Paul, Jane, technology, administration, work email, janepaul@whatever.net

99946133, b9854, Paul, Jane, technology, administration, work pager, 444-444-4444

99946133, b9854, Paul, Jane, technology, administration, cell phone, 444-444-4444

 

I want to delete the duplicates and arrange the data in appropriate columns.

 

ID | PIN | Lname | Fname | Dept | Team | Work phone | Work email

 

I have been trying to build arrays with a BufferedReader to store the data, but I am running into difficulties dealing with duplicates and manipulating the data into a table.

This is the code I have so far:

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

public class Sort {

    public static void main(String[] args) {
        BufferedReader br = null;
        try {
            String line;
            String csvSplitBy = ",";
            String outPut;

            // location where the file is retrieved
            br = new BufferedReader(new FileReader("C:/Users/Jason/Desktop/test.txt"));

            while ((line = br.readLine()) != null) {  // loops until there are no more lines
                String[] id = line.split(csvSplitBy);
                if (id.length < 8) {
                    continue;  // skips blank or incomplete lines
                }

                // each row has 8 fields, so the valid indexes are 0 to 7
                outPut = id[0] + "," + id[1] + "," + id[2] + "," + id[3] + "," + id[4] + "," + id[5] + "," + id[6] + "," + id[7];  //incomplete...using for test...

                System.out.println(outPut);   //displays the contents of the .txt file
            } //ends while statement
        } //ends try
        catch (IOException e) {
            System.out.println("Could not read the file: " + e.getMessage());
        } //ends catch
        finally {
            try {
                if (br != null) br.close();
            }
            catch (IOException ex) {
                ex.printStackTrace();
            } //ends try
        } //ends finally
    } //ends main method
} //ends class Sort

 

Answer

Java lacks a class library for grouping text data and getting the unique values, so hand-coding this is rather complicated. You can use SPL (Structured Process Language) to do it with little effort:

  | A

1 | =file("D:\\dup.csv").import@c()

2 | =A1.group(_1,_2,_3,_4,_5,_6;~.select@1(_7=="work phone")._8,~.select@1(_7=="work email")._8)

3 | =file("D:\\result.csv").export@c(A2)

A1: Read the content of dup.csv.

A2: Group the rows by the first six columns to remove duplicates, and for each group get the work phone and work email values.

A3: Write the table in A2 to result.csv.

An SPL script can be embedded into a Java program for further computation. See How to Call an SPL Script in Java for details.
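For comparison, the grouping step in A2 can also be sketched in plain Java with a map keyed on the first six fields. This is only a minimal sketch under some assumptions: the class name ContactPivot and the method pivot are hypothetical, fields are assumed to contain no quoted commas, and the first value seen for each contact type is the one kept.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper class, not part of the original question or answer.
class ContactPivot {

    // Collapses the one-row-per-contact input into one row per person:
    // ID, PIN, Lname, Fname, Dept, team, work phone, work email.
    static List<String> pivot(List<String> lines) {
        // Key: the six identifying fields. Value: contact type -> contact value.
        // LinkedHashMap keeps people in the order they first appear.
        Map<List<String>, Map<String, String>> groups = new LinkedHashMap<>();

        for (String line : lines) {
            // Assumption: plain comma-separated fields, optional spaces around commas.
            String[] f = line.trim().split("\\s*,\\s*");
            if (f.length < 8) {
                continue; // skip blank or incomplete rows
            }
            List<String> key = Arrays.asList(Arrays.copyOfRange(f, 0, 6));
            // putIfAbsent keeps the first value seen for each contact type.
            groups.computeIfAbsent(key, k -> new LinkedHashMap<>())
                  .putIfAbsent(f[6], f[7]);
        }

        List<String> result = new ArrayList<>();
        for (Map.Entry<List<String>, Map<String, String>> e : groups.entrySet()) {
            Map<String, String> contacts = e.getValue();
            result.add(String.join(",", e.getKey())
                    + "," + contacts.getOrDefault("work phone", "")
                    + "," + contacts.getOrDefault("work email", ""));
        }
        return result;
    }

    public static void main(String[] args) {
        // Sample rows taken from the question.
        List<String> sample = Arrays.asList(
            "21458952, a1234, Doe, John, technology, support staff, work phone, 555-555-5555",
            "21458952, a1234, Doe, John, technology, support staff, work email, johndoe@whatever.net",
            "99946133, b9854, Paul, Jane, technology, administration, work phone, 444-444-4444",
            "99946133, b9854, Paul, Jane, technology, administration, work email, janepaul@whatever.net");
        for (String row : pivot(sample)) {
            System.out.println(row);
        }
    }
}
```

In a real program the lines would come from the BufferedReader loop in the question instead of a hard-coded list, and the result rows could be written out with a BufferedWriter. A missing contact type is left as an empty column.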