7.22 Join query on large data table and large dimension table

 

Perform a join query on two large data tables and an ordered bin file.
Calculate the total sales quantity of each product in January of the year 2014 based on the associated Order table, Detail table and Product table. The Order table and Detail table are very large, and Product table is a bin file ordered by ID.

imagepng

SPL has cs.joinx() function to perform such a join, where the bin file must be ordered by the joining field.

SPL script:

A
1 =file(“Detail.btx”).cursor()
2 =file(“Order.btx”).cursor().select(year(Date)==2014 && month(Date)==1)
3 =file(“Product.btx”)
4 =A1.joinx@i(ID,A2:ID)
5 =A4.joinx(ProductID,A3:ID,Name:ProductName)
6 =A5.groups(ProductName; count(~):Count)

A1 Create cursor on Detail table.
A2 Create cursor for Order table records in January of the year 2014.
A3 Create a bin file object on Product table.
A4 cs.joinx() function uses @i option to perform a filtering join.
A5 Use cs.joinx() function to join Detail table and Product table on ID.
A6 Group A5’s records by product and count number of sold pieces for each product.

Execution result:

ProductName Count
Milk 32
Coffee 60