3.2 Get intersection of two sets

 

The intersection of set A and set B is made up of members that exist in both set A and set B. For instance, the intersection of [1,2,3] and [1,3,5] is [1,3].

Here is an example. A business has two sales channels – online and offline stores. Their sales data is stored in SALES_ONLINE table and SALES_STORE table respectively. The task is to find out customers who purchase in both online and offline stores in the year 2014. Below is part of data in the two tables:

SALES_ONLINE:

ID CUSTOMERID ORDERDATE PRODUCTID AMOUNT
1 HANAR 2014/01/01 17 3868.6
2 ERNSH 2014/01/03 42 1005.9
3 LINOD 2014/01/06 64 400.0
4 OTTIK 2014/01/07 10 1194.0
5 OCEAN 2014/01/09 41 319.2

SALES_STORE:

ID CUSTOMERID ORDERDATE SELLERID PRODUCTID AMOUNT
1 EASTC 2014/01/01 1 27 3063.0
2 ERNSH 2014/01/02 8 70 2713.5
3 MAGAA 2014/01/03 2 74 1675.0
4 SAVEA 2014/01/07 7 45 2018.2
5 FOLIG 2014/01/08 8 30 1622.4

SPL can use character “^” to calculate intersection of sets. A^B, for instance, calculates intersection of set A and set B.

SPL script:

A
1 =T(“SalesOnline.xlsx”).select(year(ORDERDATE)==2014)
2 =T(“SalesStore.xlsx”).select(year(ORDERDATE)==2014)
3 =A1.id(CUSTOMERID)^A2.id(CUSTOMERID)

A1: Import SALES_ONLINE table from the original file and select sales records of the year 2014.
A2: Import SALES_STORE table from the original file and select sales records of the year 2014.
A3: Use A.id() function to get the set of distinct customer IDs from A1 and A2 respectively, and use operator “^” to calculate their intersection, which contains customers who purchase both online and offline.