Comparing row-based Excel

Two Excel files have multiple columns of data. The first row is the column name, and the second row is the data record. Compare the key columns in the two files. To deal with this problem, you can read the file into a data set, take out the record values of the key columns to form a set, and then calculate the result through the two sets.

 

There are sales order tables order_2018.xlsx and order_2019.xlsx for 2018 and 2019. The two files have the same columns. Part of the data is as follows:

undefined

1. Find the same

Find the data with the same key column value in the two files.

 

Example: Please find out the CustomerId and ProductId of users who have purchased the same product in the past two years, and save the data in the c_p.txt file.

 

The esProc SPL script is as follows:


A

1

=T("e:/orders/order_2018.xlsx";CustomerId,ProductId)

2

=T("e:/orders/order_2019.xlsx";CustomerId,ProductId)

3

=[A1,A2].merge@io()

4

=T("e:/orders/c_p.xlsx",A3)

A1 Read order_2018.xlsx key column CustomerId, ProductId

 

A2 Read order_2019.xlsx key column CustomerId, ProductId

 

A3 Merging the two-year data, @i means to return the records that are included in common

 

A4 Write the result in A3 to c_p.xlsx

 

2. Find the difference

Example 1: Find out the order status of new customers in 2019 and save them in the file new_c.xlsx. The esProc SPL script is as follows:


A

1

=T("e:/orders/order_2018.xlsx")

2

=T("e:/orders/order_2019.xlsx")

3

=A2.id(CustomerId)\A1.id(CustomerId)

4

=A2.select(A3.contain(CustomerId))

5

=T("e:/orders/new_c.xlsx",A4)

A1 Read order_2018.xlsx

 

A2 Read order_2019.xlsx

 

A3 Subtract the 2018 customer IDs from all customer IDs in 2019 to get the new customer IDs

 

A4 Filter the orders of new customers from the 2019 order table

 

A5 Write the result in A4 to new_c.xlsx

 

Example 2: Find out the IDs of all lost customers in 2019 and save them in the file lost_c.xlsx. The esProc SPL script is as follows:


A

1

=T("e:/orders/order_2018.xlsx")

2

=T("e:/orders/order_2019.xlsx")

3

=A1.id(CustomerId)\A2.id(CustomerId)

4

=file("e:/orders/lost_c.xlsx").xlsexport(A3)

A1 Read order_2018.xlsx

 

A2 Read order_2019.xlsx

 

A3 Subtract all customer IDs in 2018 from all customer IDs in 2018 to get the customer IDs that were lost

 

A4 Write the result in A3 to lost_c.xlsx, because the result in A3 is a sequence, not a table sequence, so the T function cannot be used to save data here