7.22 Join query on large data table and large dimension table
Perform a join query on two large data tables and an ordered bin file.
Calculate the total sales quantity of each product in January of the year 2014 based on the associated Order table, Detail table and Product table. The Order table and Detail table are very large, and Product table is a bin file ordered by ID.
SPL has cs.joinx() function to perform such a join, where the bin file must be ordered by the joining field.
SPL script:
A | |
---|---|
1 | =file(“Detail.btx”).cursor() |
2 | =file(“Order.btx”).cursor().select(year(Date)==2014 && month(Date)==1) |
3 | =file(“Product.btx”) |
4 | =A1.joinx@i(ID,A2:ID) |
5 | =A4.joinx(ProductID,A3:ID,Name:ProductName) |
6 | =A5.groups(ProductName; count(~):Count) |
A1 Create cursor on Detail table.
A2 Create cursor for Order table records in January of the year 2014.
A3 Create a bin file object on Product table.
A4 cs.joinx() function uses @i option to perform a filtering join.
A5 Use cs.joinx() function to join Detail table and Product table on ID.
A6 Group A5’s records by product and count number of sold pieces for each product.
Execution result:
ProductName | Count |
---|---|
Milk | 32 |
Coffee | 60 |
… | … |
SPL Official Website 👉 https://www.scudata.com
SPL Feedback and Help 👉 https://www.reddit.com/r/esProcSPL
SPL Learning Material 👉 https://c.scudata.com
SPL Source Code and Package 👉 https://github.com/SPLWare/esProc
Discord 👉 https://discord.gg/cFTcUNs7
Youtube 👉 https://www.youtube.com/@esProc_SPL