Comparing CSV Data with Oracle Database Table Using Java

Question

Source: https://stackoverflow.com/questions/68382754/comparing-csv-data-with-oracle-database-table-using-java

I need to compare the data in my CSV file with an Oracle database table. The data contains nearly 9000 rows. Are there any links or resources on how to do this? I am following the thread "Compare csv file with MySQL database", but it uses the equals method on a List<String>, which does not compare the data row by row between the CSV file and the database table.
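A row-by-row comparison, as opposed to one List.equals call over two whole lists, can be sketched by indexing each side by its key column and comparing the remaining fields key by key. This is a minimal sketch, assuming every row has already been split into a List<String> whose first field is a unique key such as EID; RowByRowCompare, indexByKey and compare are hypothetical names, not part of any library.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: compares two row sets key by key instead of with List.equals.
class RowByRowCompare {

    // Index rows by their first field, assumed to be a unique key such as EID.
    static Map<String, List<String>> indexByKey(List<List<String>> rows) {
        Map<String, List<String>> index = new LinkedHashMap<>(); // keeps input order
        for (List<String> row : rows) {
            index.put(row.get(0), row);
        }
        return index;
    }

    // One message per key that exists on only one side or whose fields differ.
    static List<String> compare(List<List<String>> dbRows, List<List<String>> csvRows) {
        Map<String, List<String>> db = indexByKey(dbRows);
        Map<String, List<String>> csv = indexByKey(csvRows);
        List<String> report = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : db.entrySet()) {
            List<String> other = csv.get(e.getKey());
            if (other == null) {
                report.add("only in DB: " + e.getKey());
            } else if (!other.equals(e.getValue())) {
                report.add("different: " + e.getKey());
            }
        }
        for (String key : csv.keySet()) {
            if (!db.containsKey(key)) {
                report.add("only in CSV: " + key);
            }
        }
        return report;
    }
}
```

Because each key is looked up in a hash map, the order of the rows in the file and in the table does not matter, which is exactly where a plain List.equals falls short.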

Answer

Doing this in plain Java takes a lot of code. It is much more convenient to compare a CSV file with a table in an Oracle database using SPL, an open-source Java package.

Suppose we have an EMPLOYEE table in an Oracle database:

CREATE TABLE EMPLOYEE (
  EID NUMBER(8),
  NAME VARCHAR2(255),
  SURNAME VARCHAR2(255),
  GENDER VARCHAR2(255),
  STATE VARCHAR2(255),
  BIRTHDAY DATE,
  HIREDATE DATE,
  DEPT VARCHAR2(255),
  SALARY NUMBER(8)
);

INSERT INTO EMPLOYEE VALUES (1,'Rebecca','Moore','F','California',TIMESTAMP'1974-11-20 00:00:00.0',TIMESTAMP'2005-03-11 00:00:00.0','R&D',7000);
INSERT INTO EMPLOYEE VALUES (2,'Ashley','Wilson','F','New York',TIMESTAMP'1980-07-19 00:00:00.0',TIMESTAMP'2008-03-16 00:00:00.0','Finance',11000);
INSERT INTO EMPLOYEE VALUES (3,'Rachel','Johnson','F','New Mexico',TIMESTAMP'1970-12-17 00:00:00.0',TIMESTAMP'2010-12-01 00:00:00.0','Sales',9000);
INSERT INTO EMPLOYEE VALUES (4,'Emily','Smith','F','Texas',TIMESTAMP'1985-03-07 00:00:00.0',TIMESTAMP'2006-08-15 00:00:00.0','HR',7000);
INSERT INTO EMPLOYEE VALUES (5,'Ashley','Smith','F','Texas',TIMESTAMP'1975-05-13 00:00:00.0',TIMESTAMP'2004-07-30 00:00:00.0','R&D',16000);
INSERT INTO EMPLOYEE VALUES (6,'Matthew','Johnson','M','California',TIMESTAMP'1984-07-07 00:00:00.0',TIMESTAMP'2005-07-07 00:00:00.0','Sales',11000);
INSERT INTO EMPLOYEE VALUES (7,'Alexis','Smith','F','Illinois',TIMESTAMP'1972-08-16 00:00:00.0',TIMESTAMP'2002-08-16 00:00:00.0','Sales',9000);
INSERT INTO EMPLOYEE VALUES (8,'Megan','Wilson','F','California',TIMESTAMP'1979-04-19 00:00:00.0',TIMESTAMP'1984-04-19 00:00:00.0','Marketing',11000);
INSERT INTO EMPLOYEE VALUES (9,'Victoria','Davis','F','Texas',TIMESTAMP'1983-12-07 00:00:00.0',TIMESTAMP'2009-12-07 00:00:00.0','HR',3000);
INSERT INTO EMPLOYEE VALUES (10,'Ryan','Johnson','M','Pennsylvania',TIMESTAMP'1976-03-12 00:00:00.0',TIMESTAMP'2006-03-12 00:00:00.0','R&D',13000);
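In plain Java, the table would first be read over JDBC and every value turned into exactly the same text the CSV file uses; otherwise rows that are logically equal will never match. The sketch below shows one way to do that under stated assumptions: getObject is assumed to return NUMBER columns as java.math.BigDecimal and DATE columns as java.sql.Timestamp (this depends on the Oracle driver and its settings), and no value contains a comma. TableLoader, normalize and load are hypothetical names.

```java
import java.math.BigDecimal;
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.ResultSetMetaData;
import java.sql.SQLException;
import java.sql.Statement;
import java.sql.Timestamp;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.List;

// Hypothetical loader: reads a query result via plain JDBC and renders each row as a CSV-style line.
class TableLoader {

    // The CSV file writes timestamps with whole seconds, e.g. "1974-11-20 00:00:00".
    private static final DateTimeFormatter CSV_TIME = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

    // Convert one column value to the text form used in the CSV file.
    static String normalize(Object value) {
        if (value == null) {
            return "";
        }
        if (value instanceof Timestamp) {
            // Timestamp.toString() would give "1974-11-20 00:00:00.0", which never equals the CSV text
            return ((Timestamp) value).toLocalDateTime().format(CSV_TIME);
        }
        if (value instanceof BigDecimal) {
            // toPlainString avoids exponent notation such as "7E+3"
            return ((BigDecimal) value).toPlainString();
        }
        return value.toString();
    }

    // Read every row of the query into a list of comma-joined lines (assumes no commas in values).
    static List<String> load(Connection con, String sql) throws SQLException {
        List<String> rows = new ArrayList<>();
        try (Statement st = con.createStatement(); ResultSet rs = st.executeQuery(sql)) {
            ResultSetMetaData md = rs.getMetaData();
            int columns = md.getColumnCount();
            while (rs.next()) {
                StringBuilder line = new StringBuilder();
                for (int i = 1; i <= columns; i++) {
                    if (i > 1) {
                        line.append(',');
                    }
                    line.append(normalize(rs.getObject(i)));
                }
                rows.add(line.toString());
            }
        }
        return rows;
    }
}
```

A call such as TableLoader.load(con, "SELECT * FROM EMPLOYEE") would then yield lines in the same shape as the data lines of the CSV file, assuming the column order matches the CSV header.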

And a CSV file employee.csv:

EID,NAME,SURNAME,GENDER,STATE,BIRTHDAY,HIREDATE,DEPT,SALARY
1,Rebecca,Moore,F,California,1974-11-20 00:00:00,2005-03-11 00:00:00,R&D,7000
3,Rachel,Johnson,F,New Mexico,1970-12-17 00:00:00,2010-12-01 00:00:00,Sales,9000
5,Ashley,Smith,F,Texas,1975-05-13 00:00:00,2004-07-30 00:00:00,R&D,16000
7,Alexis,Smith,F,Illinois,1972-08-16 00:00:00,2002-08-16 00:00:00,Sales,9000
9,Victoria,Davis,F,Texas,1983-12-07 00:00:00,2009-12-07 00:00:00,HR,3000
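A file as simple as this one can be read without a CSV library. This is a minimal sketch, assuming the file has one header line and no quoted fields, embedded commas or embedded line breaks (true for the sample above, not for CSV in general); SimpleCsv and readRows are hypothetical names.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical reader for simple CSV files such as employee.csv.
class SimpleCsv {

    // Returns the data lines (header skipped), trimmed, with blank lines ignored.
    static List<String> readRows(Reader source) {
        List<String> rows = new ArrayList<>();
        try (BufferedReader in = new BufferedReader(source)) {
            String header = in.readLine(); // "EID,NAME,..." is not data
            if (header == null) {
                return rows; // empty file
            }
            String line;
            while ((line = in.readLine()) != null) {
                line = line.trim();
                if (!line.isEmpty()) {
                    rows.add(line);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return rows;
    }
}
```

For the real file, something like SimpleCsv.readRows(Files.newBufferedReader(Paths.get("employee.csv"))) would do; calling line.split(",") on a row gives its individual fields when a per-column comparison is needed.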

We want the difference between the Oracle EMPLOYEE table and the CSV file, that is, the rows of the table that do not appear in the CSV file. The expected result is:

EID,NAME,SURNAME,GENDER,STATE,BIRTHDAY,HIREDATE,DEPT,SALARY
2,Ashley,Wilson,F,New York,1980-07-19 00:00:00,2008-03-16 00:00:00,Finance,11000
4,Emily,Smith,F,Texas,1985-03-07 00:00:00,2006-08-15 00:00:00,HR,7000
6,Matthew,Johnson,M,California,1984-07-07 00:00:00,2005-07-07 00:00:00,Sales,11000
8,Megan,Wilson,F,California,1979-04-19 00:00:00,1984-04-19 00:00:00,Marketing,11000
10,Ryan,Johnson,M,Pennsylvania,1976-03-12 00:00:00,2006-03-12 00:00:00,R&D,13000
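Once both sides are text rows in the same format, the difference reduces to a set lookup: build a hash set from the CSV rows and keep the table rows that are not in it. This is a minimal sketch, assuming the rows are already identically formatted strings such as "2,Ashley,Wilson,..."; RowDifference and minus are hypothetical names. With average constant-time hash lookups the work grows roughly linearly with the number of rows, so about 9000 rows should not be a problem.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper: set difference of two lists of row strings.
class RowDifference {

    // Rows of `left` that do not occur in `right`, kept in the order of `left`.
    static List<String> minus(List<String> left, List<String> right) {
        Set<String> rightSet = new HashSet<>(right); // average O(1) lookups
        List<String> result = new ArrayList<>();
        for (String row : left) {
            if (!rightSet.contains(row)) {
                result.add(row);
            }
        }
        return result;
    }
}
```

Calling minus(tableRows, csvRows) corresponds to the expected result above; swapping the arguments lists rows that exist only in the CSV file.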

We also want the intersection of the Oracle EMPLOYEE table and the CSV file, that is, the rows that appear in both:

EID,NAME,SURNAME,GENDER,STATE,BIRTHDAY,HIREDATE,DEPT,SALARY
1,Rebecca,Moore,F,California,1974-11-20 00:00:00,2005-03-11 00:00:00,R&D,7000
3,Rachel,Johnson,F,New Mexico,1970-12-17 00:00:00,2010-12-01 00:00:00,Sales,9000
5,Ashley,Smith,F,Texas,1975-05-13 00:00:00,2004-07-30 00:00:00,R&D,16000
7,Alexis,Smith,F,Illinois,1972-08-16 00:00:00,2002-08-16 00:00:00,Sales,9000
9,Victoria,Davis,F,Texas,1983-12-07 00:00:00,2009-12-07 00:00:00,HR,3000
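The intersection keeps the table rows that are found in a hash set built from the CSV rows. A stream-based sketch, assuming both inputs are row strings formatted identically; RowIntersection and intersect are hypothetical names.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical helper: set intersection of two lists of row strings.
class RowIntersection {

    // Rows of `left` that also occur in `right`, kept in the order of `left`.
    static List<String> intersect(List<String> left, List<String> right) {
        Set<String> rightSet = new HashSet<>(right);
        return left.stream()
                .filter(rightSet::contains)
                .collect(Collectors.toList());
    }
}
```

Duplicate rows in the first list are kept as they are; wrapping the result in a LinkedHashSet would remove them while preserving order.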

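Only a few lines of SPL code are needed. Cell A1 queries the table, A2 imports the CSV file with EID and SALARY typed as decimal to match the NUMBER columns, A3 computes the intersection with merge@oi, and A4 computes the difference with merge@od: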

     A
1    =ORACLE.query@x("SELECT * FROM EMPLOYEE")
2    =file("employee.csv").import@ct(EID:decimal,NAME,SURNAME,GENDER,STATE,BIRTHDAY,HIREDATE,DEPT,SALARY:decimal)
3    =INTERSECT=[A1,A2].merge@oi(EID,NAME,SURNAME,GENDER,STATE,BIRTHDAY,HIREDATE,DEPT,SALARY)
4    =MINUS=[A1,A2].merge@od(EID,NAME,SURNAME,GENDER,STATE,BIRTHDAY,HIREDATE,DEPT,SALARY)

 

SPL provides a JDBC driver that Java can call. Save the SPL script above as cmp.splx and invoke it from Java the same way you would call a stored procedure:

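import java.sql.CallableStatement;
import java.sql.Connection;
import java.sql.DriverManager;

// Load the esProc SPL JDBC driver (its jars must be on the classpath)
Class.forName("com.esproc.jdbc.InternalDriver");
// Connect to the local esProc engine
Connection con = DriverManager.getConnection("jdbc:esproc:local://");
// Run cmp.splx the same way a stored procedure is called
CallableStatement st = con.prepareCall("call cmp()");
st.execute();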
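The SPL script does the comparison with merge. For readers who want the same idea without SPL, the classic sort-merge technique computes the intersection and both differences in a single pass over two sorted lists. This is a generic Java sketch of that technique, not SPL's implementation; it assumes rows are identically formatted strings and compares them in natural String order. SortMergeCompare is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Generic sort-merge sketch (not SPL's implementation).
class SortMergeCompare {
    final List<String> common = new ArrayList<>();    // rows present in both inputs
    final List<String> onlyLeft = new ArrayList<>();  // rows only in the left input
    final List<String> onlyRight = new ArrayList<>(); // rows only in the right input

    SortMergeCompare(List<String> left, List<String> right) {
        // Sort copies so the callers' lists are left untouched.
        List<String> a = new ArrayList<>(left);
        List<String> b = new ArrayList<>(right);
        Collections.sort(a);
        Collections.sort(b);
        int i = 0;
        int j = 0;
        // Advance whichever side holds the smaller row; equal rows belong to both.
        while (i < a.size() && j < b.size()) {
            int c = a.get(i).compareTo(b.get(j));
            if (c == 0) {
                common.add(a.get(i));
                i++;
                j++;
            } else if (c < 0) {
                onlyLeft.add(a.get(i++));
            } else {
                onlyRight.add(b.get(j++));
            }
        }
        // Whatever remains on either side has no partner.
        while (i < a.size()) {
            onlyLeft.add(a.get(i++));
        }
        while (j < b.size()) {
            onlyRight.add(b.get(j++));
        }
    }
}
```

Since rows are ordered as strings, "10,..." sorts before "2,..."; that does not affect correctness because both sides use the same order, but the outputs come back in that string order rather than by numeric EID.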
