6.1 Basic concept
SQL
SELECT ShipVia,sum(Quantity) Quantity
FROM Orders2021
GROUP BY ShipVia
ORDER BY ShipVia
SELECT ProductID, sum(Amount) Amount
FROM Orders2021
GROUP BY ProductID
ORDER BY ProductID
SELECT EmployeeID,EmployeeName,sum(Amount) Amount
FROM Orders2021
GROUP BY EmployeeID,EmployeeName
ORDER BY EmployeeID,EmployeeName
SPL
In this example, we need to perform several groupings & aggregations on same data. If we accomplish the multiple computations during one traversal, we can reduce hard disk accesses and speed up computations.
Usually, SPL uses cursor to achieve big data grouping & aggregation. During traversal with cursor, data is retrieved, grouped and summarized by segment and result is accumulated to the final result set. For requirement of multiple groupings, we just need to perform groupings & aggregations respectively and accumulate results to their own final result sets. This way, only one traversal is enough for accomplishing multiple groupings & aggregations.
Solution 1:
A | B | |
---|---|---|
1 | =file(“Orders2021_Account.ctx”).open().cursor@x(ShipVia,ProductID,EmployeeID,EmployeeName, Quantity, Amount) | |
2 | cursor A1 | =A2.groups(ShipVia;sum(Quantity):Quantity) |
3 | cursor | =A3.groups(EmployeeID;EmployeeName,sum(Amount):Amount) |
4 | cursor | =A4.groups(ProductID; sum(Amount):Amount) |
A2 uses key word cursor to define a channel the same as A1’s cursor. B2 specifies the first grouping & aggregation in A2’s channel.
A3-A4 Define two channels.
B3 Specify the second grouping & aggregation in A3’s channel.
B4 Specify the third grouping & aggregation in A4’s channel.
Solution 2:
A | |
---|---|
1 | =file(“Orders2021_Account.ctx”).open().cursor@x(ShipVia,ProductID,EmployeeID,EmployeeName, Quantity, Amount) |
2 | >ch1=channel(A1).groups(ShipVia;sum(Quantity):Quantity) |
3 | >ch2=channel(A1).groups(EmployeeID;EmployeeName, sum(Amount):Amount) |
4 | =A1.groups(ProductID; sum(Amount):Amount) |
5 | =ch1.result() |
6 | =ch2.result() |
A2-A3 Define channels and groupings & aggregations on them.
A4 Perform groupings & aggregations on A1’s cursor.
A5-A6 Get results of channels.
Use either solution and no essential difference between them. With a large grouping result set, a cursor must be returned. As the result set cannot fit into memory, we can only use channel to handle it, putting the large result set on the cursor and other smaller result set on those channels.
SPL Official Website 👉 https://www.scudata.com
SPL Feedback and Help 👉 https://www.reddit.com/r/esProcSPL
SPL Learning Material 👉 https://c.scudata.com
SPL Source Code and Package 👉 https://github.com/SPLWare/esProc
Discord 👉 https://discord.gg/2bkGwqTj
Youtube 👉 https://www.youtube.com/@esProc_SPL