6.1 Basic concept

 

SQL

SELECT ShipVia,sum(Quantity) Quantity
FROM Orders2021
GROUP BY ShipVia
ORDER BY ShipVia

SELECT ProductID, sum(Amount) Amount
FROM Orders2021
GROUP BY ProductID
ORDER BY ProductID

SELECT EmployeeID,EmployeeName,sum(Amount) Amount
FROM Orders2021
GROUP BY EmployeeID,EmployeeName
ORDER BY EmployeeID,EmployeeName

SPL

In this example, we need to perform several groupings & aggregations on same data. If we accomplish the multiple computations during one traversal, we can reduce hard disk accesses and speed up computations.

Usually, SPL uses cursor to achieve big data grouping & aggregation. During traversal with cursor, data is retrieved, grouped and summarized by segment and result is accumulated to the final result set. For requirement of multiple groupings, we just need to perform groupings & aggregations respectively and accumulate results to their own final result sets. This way, only one traversal is enough for accomplishing multiple groupings & aggregations.

Solution 1:

A B
1 =file(“Orders2021_Account.ctx”).open().cursor(ShipVia,ProductID,EmployeeID,EmployeeName, Quantity, Amount)
2 cursor A1 =A2.groups(ShipVia;sum(Quantity):Quantity)
3 cursor =A3.groups(EmployeeID;EmployeeName,sum(Amount):Amount)
4 cursor =A4.groups(ProductID; sum(Amount):Amount)

A2 uses key word cursor to define a channel the same as A1’s cursor. B2 specifies the first grouping & aggregation in A2’s channel.
A3-A4 Define two channels.
B3 Specify the second grouping & aggregation in A3’s channel.
B4 Specify the third grouping & aggregation in A4’s channel.

Solution 2:

A
1 =file(“Orders2021_Account.ctx”).open().cursor(ShipVia,ProductID,EmployeeID,EmployeeName, Quantity, Amount)
2 >ch1=channel(A1).groups(ShipVia;sum(Quantity):Quantity)
3 >ch2=channel(A1).groups(EmployeeID;EmployeeName, sum(Amount):Amount)
4 =A1.groups(ProductID; sum(Amount):Amount)
5 =ch1.result()
6 =ch2.result()

A2-A3 Define channels and groupings & aggregations on them.
A4 Perform groupings & aggregations on A1’s cursor.
A5-A6 Get results of channels.

Use either solution and no essential difference between them. With a large grouping result set, a cursor must be returned. As the result set cannot fit into memory, we can only use channel to handle it, putting the large result set on the cursor and other smaller result set on those channels.