Performance Optimization - 6.6 [Foreign key association] Index reuse

 

Performance Optimization - 6.5 [Foreign key association] Inner join syntax

Sometimes the dimension table will be filtered before association. For example, we only want to find the products from a certain origin:

A
1 =file(“product.btx”).import@b().keys@i(id)
2 =file(“orders.btx”).cursor@b(p_id,quantity)
3 =A2.switch@i(p_id,A1)
4 =A3.select(p_id.state==“CA”)
5 =A4.groups(p_id.vendor;sum(p_id.price*quantity))

The computation amount of this code will be relatively large, because A4 needs to repeatedly take out p_id.state to make a judgment. If we first filter the dimension table of A1, and retain only the required records, then the switch@i() in A3 will filter out the inappropriate records from the fact table, and the performance will be better.

A
1 =file(“product.btx”).import@b().select(state==“CA”)
2 =file(“orders.btx”).cursor@b(p_id,quantity)
3 =A2.switch@i(p_id,A1)
4 =A3.groups(p_id.vendor;sum(p_id.price*quantity))

The computation amount of this code will be much smaller than the previous one.

However, the problem is that after the dimension table is filtered and becomes a record sequence, its original index cannot be used. An index will be recreated for switch() operation, yet creating index also takes time, including calculating the hash value of primary key, etc., and it will take a long time to create an index when the number of records is large. Moreover, because the filter condition is unpredictable, the index cannot be prepared in advance.

In fact, the index on the original table is still available. At least it can help us avoid recalculating the hash value, and what we need to do is to delete the filtered records from the index table. SPL provides this mechanism:

A
1 =file(“product.btx”).import@b().keys@i(id)
2 =file(“orders.btx”).cursor@b(p_id,quantity)
3 =A1.select@i(state==“CA”)
4 =A2.switch@i(p_id,A3)
5 =A4.groups(p_id.vendor;sum(p_id.price*quantity))

select@i() will reuse the index of original table on the filtered record sequence.

Index reuse is not always faster, because it requires to delete the filtered records from the index table. If the number of filtered records is large (the number of remaining records is small), the action will not be fast, while recreating index may be faster. Therefore, the specific method to be adopted should be determined based on the actual situation.


Performance Optimization - 6.7 [Foreign key association] Aligned sequence
Performance Optimization - Preface