Order-based Traversal Syntax

This is the sequel of the article Common Traversal Syntax.

5. Referencing sequence numbers

SQL is based on the mathematical concept of unordered sets. The order of members isn’t the primary consideration in its handling of traversal operation. Yet the computer can’t execute an operation at once (here we ignore the situation of parallel processing), it must traverses a set in a certain order. Making appropriate use of the order assists in expressing algorithms of various computing goals.

An example is to get a half of the members of a set and form a new one. It looks like a filtering operation, but actually the filtering criterion has nothing to do with the members of the set themselves, but is closely related to the sequence numbers of the members to be traversed.

The sign “~” alone is insufficient to express such a filtering. And another sign (identifier) is needed to represent the sequence numbers of the members being traversed.

In reality, most high-level languages support defining a loop variable to represent the sequence numbers in writing a loop statement. The loop variable plays the role of the “another sign”. Many set-oriented languages including SQL, however, don’t offer the mechanism. They must write a lengthy loop statement to deal with such a filtering, which is tedious. SQL’s way is to create sequence numbers with the subquery and then perform filtering over them.

But it’s easy to phrase the computation if we use a special sign, like“#”, to represent a sequence number.

A.select(#<=A.len()/2) Get the first half of the members

A.select(#%2==0) Get members whose sequence numbers are even number

A filtering operation returns members that meet the criterion. At times it’s the sequence numbers, instead of the members, that we really care about. For those cases, it’s necessary to define a filter function that returns sequence numbers.

A.pselect(~>5) Return sequence numbers of the members that are greater than 5

Here is another instance:

A.pmax() Return the sequence number of the max value

6. Referencing neighboring members or a set consisting of neighboring members

Having the syntax of expressing order, a language’s capability of phrasing algorithms will be increased.

Suppose there is sales data of 12 months that is arranged in a certain order. The task is to find the months whose growth rates are greater than 5%.

It’s complicated to express the inter-row computation in SQL. You need to align the data of the previous month to that of the current one using a JOIN or a window function before calculating the growth rates. In the process the subquery is inevitable.

The computing process will be easy if there is a set of syntax rules for referencing neighboring members.

For instance, we can use [i] to represent a member that is i members away from the current member. And with the introduction of the sign “#”, the above computation can be expressed this way:

A.(if(~/~[-1]>1.05,#,0)).select(~>0)

~[-1] represents the previous member, i.e., the sales amount of the last month. Then the statement marks months when growth rates are greater than 5% (which is represented by #) while setting the other months as zero, and then gets the non-zero months.

By using the filter function that returns sequence numbers, the code can be much simpler:

A.pselect(~/~[-1]>1.05)

In some other occasions, we need to reference a set made up of neighboring members. For example, over the set mentioned in the above computing scenario, we want to calculate the moving average of the sales amount spanning the previous month, the current month and the next month.

By scaling out the expression [i] into the expression [a,b], we can express a set containing a series of neighboring members. Now it’s quite easy to phrase the computation:

A.(~[-1,1].avg())

There are some more complicated occasions involving the reference of such a set. Like calculating the cumulative sales amount until the specified current month.

The absence of a in the expression [a,b] means beginning from the first member (Similarly, the absence of b means doing a traversal continuously to the last member). The computation can be written as:

A.(~[,0].sum())

Likewise, we can reference a field by its name when processing structure data. Suppose the set in the example is a table containing two fields – Month and Amount. The above computations can be expressed this way:

A.select(Amount/Amount[-1]>1.05) # isn’t needed since there is already the Month field in the result set

A.derive(Amount[-1,1].avg:MovingAverage) Add a MovingAverage field

A.derive(Amount[,0].sum():CumulativeAmount)

Compared with the common traversal syntax, the order-based traversal syntax is much more complicated. But as the computing scenarios requiring the order-based traversal operations are common in real-world practice, it’s necessary to design a set of easy-to-use syntax rules. Otherwise it will be inconvenient to phrase those computations and multiline loop statement is needed, which is tedious and difficult to read.

SQL doesn’t provide the syntax for performing order-based traversals. It uses subqueries and window functions to generate sequence numbers. And it can’t handle certain complicated order-based computations, needing the stored procedure to generate a multiline loop statement. In view of this, SQL isn’t a thoroughly set-oriented language.