"[url]4.5.1 Index reuse after filtering SQL SELECT Suppliers.Region,sum(Amount) Amount FROM Order .."

mars RaqForum 25 No.
178 View • 1 Years ago

4.5 Filtering on dimension table

4.5.1 Index reuse after filtering

SQL

SELECT Suppliers.Region,sum(Amount) Amount
FROM Orders
LEFT JOIN Products ON Orders.ProductID= Products.ProductID
LEFT JOIN Suppliers ON Products.SupplierID=Suppliers.SupplierID
WHERE OrderDate>='2021-01-01' and OrderDate<='2021-12-31' and Products.CategoryID not in (1,2,3)
GROUP BY Suppliers.Region

SPL

	A
1	>st=date(“2021-01-01”), et=date(“2021-12-31”), start= days@o(st), end=days@o(et)
2	=Products.select@i([1,2,3].pos@b(CategoryID)==null)
3	=file(“Orders_Time.ctx”).open().cursor@m(ProductID, Amount; OrderDate>=start && OrderDate<=end, ProductID:A2)
4	=A3.groups(ProductID.SupplierID.Region;sum(Amount):Amount)

A2 Use @i option to reuse index during filtering on dimension table.
A3 Put association with dimension table in the cursor generation statement, and records that do not have matches won’t be generated, reducing time of generating records.

Yet index reuse isn’t always faster. As the ineligible records need to be deleted from the index table, the deletion is not fast when a lot of records are deleted (the number of eligible records is very small). When there are a few eligible records, it is probably that re-creating index is faster. Select the right way according to different scenarios.

4.5.2 Re-creating index after filtering

SQL

SELECT Suppliers.Region,sum(Amount) Amount
FROM Orders
LEFT JOIN Products ON Orders.ProductID= Products.ProductID
LEFT JOIN Suppliers ON Products.SupplierID=Suppliers.SupplierID
WHERE OrderDate>='2021-01-01' and OrderDate<='2021-12-31' and Products.CategoryID in (1,2,3)
GROUP BY Suppliers.Region

SPL

	A
1	>st=date(“2021-01-01”), et=date(“2021-12-31”), start=days@o(st), end=days@o(et)
2	=Products.select([1,2,3].pos@b(CategoryID)!=null).derive@o().keys@i(ProductID)
3	=file(“Orders_Time.ctx”).open().cursor@m(ProductID,Amount;OrderDate>=start && OrderDate<=end,ProductID:A2)
4	=A3.groups(ProductID.SupplierID.Region;sum(Amount):Amount)

A2 The dimension filtering action returns a sequence. We use derive@o() to restore it to a table sequence and re-create index.

In this example, the number of filtered records is large (the number of records left is small) and re-creating index is fast.

4.5.3 Filtering on multilevel dimension tables

SQL

SELECT Suppliers.Region,sum(Amount) Amount
FROM Orders
INNER JOIN Products ON Orders.ProductID= Products.ProductID
INNER JOIN Suppliers ON Products.SupplierID=Suppliers.SupplierID
WHERE OrderDate>='2021-01-01' and OrderDate<='2021-12-31' 
    and Suppliers.SupplierID in (1,2,3) and Products.CategoryID in (1,2,3,4,5)
GROUP BY Suppliers.Region

SPL
A filtering on multilevel dimension tables requires filtering each dimension table separately before creating association between them. The association only keeps matching records.

	A
1	=Suppliers.select@i([1,2,3].pos@b(SupplierID)!=null)
2	=Products.select@i([1,2,3,4,5].pos@b(CategoryID)!=null)
3	=A2.switch@i(SupplierID, A1:SupplierID)

A1 Filter dimension table Suppliers.
A2 Filter dimension table Products.
A3 Associate Products and Suppliers, where @i option enables deleting non-matching records.

Use the association result directly for summarization:

	A
…	/Here is the above code
4	>st=date(“2021-01-01”), et=date(“2021-12-31”), start=days@o(st), end=days@o(et)
5	=file(“Orders_Time.ctx”).open().cursor@m(ProductID, Amount; OrderDate>=start && OrderDate<=end, ProductID:A3)
6	=A5.groups(ProductID.SupplierID.Region; sum(Amount):Amount)

A5 Associate with the filtered dimension table and do not generate non-matching records.

4.5.4 Inverse filtering

SQL

SELECT sum(Amount)
FROM Orders
WHERE OrderDate>='2021-01-01' and OrderDate<='2021-12-31' and EmployeeID not in (
    SELECT EmployeeID 
    FROM Employees 
    WHERE Title='Sales Representative')

SPL

	A
1	>st=date(“2021-01-01”), et=date(“2021-12-31”), start=days@o(st), end=days@o(et)
2	>Employees=file(“Employees.btx”).import@b().select(Title==“Sales Representative”).derive().keys@i(EmployeeID)
3	=file(“Orders_Time.ctx”).open().cursor@m(Amount;OrderDate>=start && OrderDate<=end,EmployeeID:Employees:null)
4	=A3.groups(;sum(Amount))

A3 Associate with dimension table in the composite table cursor. The syntax :null means selecting non-matching records only.

When the fact table is not a composite table, we can use @d option in switch to select non-matching records.

SPL Official Website 👉 http://www.scudata.com

SPL Feedback and Help 👉 https://www.reddit.com/r/esProc

SPL Learning Material 👉 http://c.scudata.com

SPL Source Code and Package 👉 https://github.com/SPLWare/esProc

Discord 👉 https://discord.gg/ydhVnFH9

Youtube 👉 https://www.youtube.com/@esProc_SPL

performance optimization practice(38)

eBook

mars • 178 View • 1 Years ago

4.5 Filtering on dimension table

4.5.1 Index reuse after filtering

4.5.2 Re-creating index after filtering

4.5.3 Filtering on multilevel dimension tables

4.5.4 Inverse filtering

ToC