3.5 Order-based COUNT DISTINCT

 

SQL

SELECT count (DISTINCT CustomerID) iNum,count(1) Num 
FROM Orders
WHERE OrderDate>='2021-01-01' and OrderDate<='2021-12-31'

SPL

A
1 >st=date(“2021-01-01”), et=date(“2021-12-31”), start=days@o(st), end=days@o(et)
2 =file(“Orders_Account.ctx”).open().cursor@m(CustomerID;OrderDate>=start && OrderDate <=end;2)
3 =A2.groups(;count(1):Num, icount@o(CustomerID):iNum)

A2 With @p option used during the data preparation phase, we can just use @m option to accomplish the parallel computation.
A3 icount() works with @o option to perform order-based count distinct, during which there is no need to retain intermediate buffer in the memory and only the comparison with the previous record is enough. By default, the function takes the distinct field as unordered and should keep intermediate buffer in the memory. In that case, memory overflow occurs when buffer exceeds the memory capacity.


SQL

SELECT count(DISTINCT CustomerID),month(OrderDate) months
FROM Orders
WHERE OrderDate>='2021-01-01' and OrderDate<='2021-12-31'
GROUP BY month(OrderDate)

SPL

A
1 >st=date(“2021-01-01”), et=date(“2021-12-31”), start=days@o(st), end=days@o(et)
2 =file(“Orders_Account.ctx”).open().cursor@m(CustomerID, OrderDate;OrderDate>=start && OrderDate <=end;2)
3 =A2.groups(month(OrderDate):months; icount@o(CustomerID):iNum)

A3 groups function plus icount function is used to compute count distinct on grouped subsets, without first grouping data.