"[url]3.2.1 Bin file SQL SELECT count (DISTINCT CustomerID) iNum,count(1) Num,EmployeeID,Employee .."

mars RaqForum 25 No.
172 View • 1 Years ago

3.2 Order-based filtering

3.2.1 Bin file

SQL

SELECT count (DISTINCT CustomerID) iNum,count(1) Num,EmployeeID,EmployeeName
FROM Orders
WHERE OrderDate>='2022-03-15' and OrderDate<='2022-06-16' and Amount>1000
GROUP BY EmployeeID,EmployeeName
ORDER BY EmployeeID

SPL

	A
1	>st=date(“2022-03-15”), et=date(“2022-06-16”), start=days@o(st), end=days@o(et)
2	=file(“Orders_Time.btx”).iselect@br(start:end,OrderDate; EmployeeID, EmployeeName, CustomerID,Amount).select(Amount>1000).groups(EmployeeID; EmployeeName, count(1):Num, icount(CustomerID):iNum)

A2 iselect function retrieves records whose OrderDate values fall within the interval of [start,end] from the file ordered by OrderDate and generates a cursor with them. Here [start,end] is a closed interval. @r option means that there are duplicate OrderDate values.

The iselect function cannot work with a parallel processing option. With a scenario involving just one computing task and when the select result set is relatively large, it is not necessarily that the order-based filtering is faster than the parallelly processed full traversal. With a multi-task scenario, usually parallel processing is not feasible (because the multi-CPU must deal with concurrency). And in that case, iselect has big advantages.

As data is arranged in time order, we cannot write Amount>1000 within iselect but can only use select to perform a filtering outside it. The iselect result set is small enough so the approach has little impact on the performance.

3.2.2 Composite table

SQL

SELECT count (DISTINCT CustomerID) iNum,count(1) Num,EmployeeID,EmployeeName
FROM Orders
WHERE OrderDate>='2022-03-15' and OrderDate<='2022-06-16' and Amount>1000
GROUP BY EmployeeID,EmployeeName
ORDER BY EmployeeID

SPL

	A
1	>st=date(“2022-03-15”), et=date(“2022-06-16”), start=days@o(st), end=days@o(et)
2	=file(“Orders_Time.ctx”).open().cursor@m(OrderDate,CustomerID,EmployeeID,EmployeeName;OrderDate>=start && OrderDate<=end && Amount>1000)
3	=A2.groups(EmployeeID; EmployeeName, count(1):Num, icount(CustomerID):iNum)

A2 Import data falling within the specified range of time. @m option enables parallel processing.

Writing the condition in cursor()during filtering enables automatic block skipping through comparison with the dimension field. The ineligible blocks are directly skipped. The syntax cursor().select(…) does not perform the block skipping action but first scans all blocks.

Unlike a bin file, the composite table supports writing Amount>1000 in cursor() together with the time filtering condition. After block skipping action, eligible data is automatically identified in each block during traversal and filtering is executed.

SPL Official Website 👉 http://www.scudata.com

SPL Feedback and Help 👉 https://www.reddit.com/r/esProc

SPL Learning Material 👉 http://c.scudata.com

SPL Source Code and Package 👉 https://github.com/SPLWare/esProc

Discord 👉 https://discord.gg/ydhVnFH9

Youtube 👉 https://www.youtube.com/@esProc_SPL

performance optimization practice(38)

eBook