3.1 Get concatenation of two sets

 

The concatenation of set A and set B is made up of all members of set A and all members of set B, including all duplicates. For instance, the concatenation of [1,2] and [1,3] is [1,2,1,3], where member 1 appears twice in the result set.

Here is an example. A business has two sales channels – online and offline stores. Their sales data is stored in SALES_ONLINE table and SALES_STORE table respectively. The task is to query the total sales amount of each product in the year 2014. Below is part of data in the two tables:

SALES_ONLINE:

ID CUSTOMERID ORDERDATE PRODUCTID AMOUNT
1 HANAR 2014/01/01 17 3868.6
2 ERNSH 2014/01/03 42 1005.9
3 LINOD 2014/01/06 64 400.0
4 OTTIK 2014/01/07 10 1194.0
5 OCEAN 2014/01/09 41 319.2

SALES_STORE:

ID CUSTOMERID ORDERDATE SELLERID PRODUCTID AMOUNT
1 EASTC 2014/01/01 1 27 3063.0
2 ERNSH 2014/01/02 8 70 2713.5
3 MAGAA 2014/01/03 2 74 1675.0
4 SAVEA 2014/01/07 7 45 2018.2
5 FOLIG 2014/01/08 8 30 1622.4

Structures of the two tables are not completely same. SALES_STORE has SELLERID field that stores data of salespeople, but both have ORDERDATE, PRODUCTID and AMOUNT fields. The grouping and aggregation after they are concatenated can be handled as usual.

SPL can use character “|” to calculate concatenation of sets. A|B, for instance, calculates concatenation of set A and set B.

SPL script:

A
1 =T(“SalesOnline.xlsx”).select(year(ORDERDATE)==2014)
2 =T(“SalesStore.xlsx”).select(year(ORDERDATE)==2014)
3 =A1|A2
4 =A3.groups(PRODUCTID; sum(AMOUNT):AMOUNT)

A1: Import SALES_ONLINE table from the original file and select sales records of the year 2014.
A2: Import SALES_STORE table from the original file and select sales records of the year 2014.
A3: Use the operator “|” to calculate concatenation of the two sets.
A4: Group and calculate total sales amount of each product.

Note that SPL concatenation does not require that sets have same structures. This is because SPL sequences support members of different structures. Yet we can directly access common fields of all members of a sequence, like PRODUCTID and AMOUNT here, as we access fields of a regular data table. This is really convenient.