7.3 Batch search

 

SQL

SELECT *
FROM Customers
WHERE CustomerID in (
    SELECT DISTINCT CustomerID 
    FROM Orders
    WHERE OrderDate>='2021-01-01' and OrderDate<='2021-12-31' and Amount>2000 )

SPL

For equivalent value search, batch search is often needed and sometimes multiple values fall in the same index segment. In order to avoid reading the same segment repeatedly, we use contain function to automatically sort the target search values, and before searching for the next value, compare it with the previous retrieved index segment. If the value is contained in the segment, just reuse the retrieved segment; and if the value isn’t contained in it, discard the segment and read the next segment. This helps to increase efficiency.

A
1 >st=date(“2021-01-01”), et=date(“2021-12-31”), start=days@o(st), end=days@o(et)
2 =file(“Customers.ctx”).open()
3 =A2.index(ids;CustomerID)
4 =file(“Orders_Time.ctx”).open().cursor@m(CustomerID;OrderDate>=start && OrderDate<=end && Amount>2000)
5 =A4.groups(CustomerID).(#1)
6 =A2.icursor(;A5.contain(CustomerID)).fetch()

When data volume is large, the index is large, too. In this case we need to create a multilevel index, and retrieve it level by level for each search before finally locating the target value.
Some indexes will be repeatedly used for hundreds of thousands of times. If they are loaded each time they are needed, there will be noticeable delay.

SPL offers the automatic index caching mechanism to cache the retrieved index segments so that there is no need to re-retrieve them when they are needed in a short time. This can effectively reduce search delay. Besides, SPL provides method of actively preloading certain index segments at system startup. It speeds up the subsequent searches because you don’t need to wait until those segments are accessed.

Below is SPL code of actively preloading a multilevel index:

A
1 =file(“Orders.ctx”).open()
2 =A1.index@3(idh)

A2 Preload the first three levels of composite table A1’s index idh.

How many levels of index will be loaded is determined by the memory capacity. Generally speaking, loading the first three levels of index could take up about several GBs space.

When the composite table file stores data row-wise, its index is a row-wise storage index. When the file uses column-wise storage, its index is a column-wise storage index. The code of creating an index is the same as above and we’ll just skip it.
The search performance of a column-wise storage index is not as good as that of a row-wise storage index. Column-wise storage is more suitable for traverse computations. When there are a lot of columns in a table and only a very small number of columns are going to be retrieved, the column-wise storage has big advantages.

Therefore, we should select row-wise storage or column-wise storage as needed when dumping data to a composite table file.

When a composite table file uses column-wise storage and is expected to be frequently searched, we can write the mostly used fields to index with their values attached. Now the index table stores not only index fields and ordinal numbers of records, but the most frequently retrieved fields. When a search only involves fields in them, we don’t need to access the original composite table. This amounts to searching row-wise data, and gets higher performance.

Below is code for creating an index with values attached:

A
1 =file(“Customers.ctx”).open()
2 =A1.index(ids;CustomerID;CustomerName,City,Region)

A2 Write CustomerName, City and Region fields to the index with their values attached when creating the index. If the search involves only the three, there is no need to read the original composite table.