Designing Lambda syntax from set operations

A programming language with set-oriented features lets us express complex set operations in very few statements. The core question is whether its Lambda syntax is convenient, because that directly determines how efficiently the language can describe such operations.

Let’s examine the possible scenarios of set operations from simple to complex, and see which requirements a good Lambda syntax needs to meet.


The first case is an operation that uses set members directly, such as calculating the sum of the members of a set.

This is the simplest case, and a regular function syntax is enough: the set to be calculated is passed in as a parameter. For example, sum(A) calculates the sum of the members of set A, and it can of course also be written in an object-oriented style as A.sum().

At this point, Lambda syntax is not yet involved; the set itself is simply passed as a parameter of the function. This counts as the 0th rule.


Next, it must be easy to reference set members inside a calculation formula. This counts as the 1st rule.

If we are calculating not the sum of the set members but the sum of their squares, how should we describe the squaring?

This is where Lambda syntax comes in. The squaring here is essentially a function that takes the current member of the set as a parameter and returns the square of that parameter. Lambda syntax allows a function to be written directly as a parameter of another function, so one statement is enough and the function does not need to be defined separately.

But here is a problem: what identifier or symbol should represent the current member in Lambda syntax?

Java and some other languages require parameter names to be defined, as in a regular function. This makes Lambda expressions more cumbersome and less concise, so it is not a good solution.

For this extremely common situation, a fixed identifier or symbol is simpler. For example, SPL uses ~ to represent the current member, so the sum of squares can be written as A.sum(~*~), which is simple and easy to understand. It can also be done in two steps: first calculate the square of each set member to form a new set, then calculate the sum of the new set, written as A.(~*~).sum().
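For comparison, here is a minimal Python sketch of the same calculation in the named-parameter style that the text contrasts with ~. The sample values in A are made up for illustration, and Python is used only as a stand-in for a language that requires the parameter to be declared.

```python
# Hypothetical sample set.
A = [1, 2, 3, 4]

# Named-parameter style: the lambda must declare a name (x) for the current
# member, where SPL writes A.sum(~*~).
sum_sq_lambda = sum(map(lambda x: x * x, A))

# Two-step form, comparable to A.(~*~).sum(): build the set of squares first,
# then sum the new set.
squares = [x * x for x in A]
sum_sq_two_step = sum(squares)

print(sum_sq_lambda, sum_sq_two_step)  # 30 30
```

The declaration "lambda x:" carries no information of its own; it exists only to give the current member a name, which is the overhead a fixed symbol removes.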


However, SQL, which is considered to have set-oriented features, does not use a symbol or identifier to represent the current member. So how does SQL solve this problem?

SQL has no general-purpose set that can be composed of arbitrary data. A set in SQL is a table, and the members of a table are records. SQL has the concept of a record, but a record cannot be referenced as a data type. To perform a set operation on single-valued members in SQL, we have to treat each single value as a record with only one field and then operate on a table of such records. All calculations are performed on specific fields, not on the entire record.

But what does this have to do with SQL not having a symbol to represent the current member?

When discussing the set-oriented features, we also mentioned that Lambda syntax for structured data needs a concise way to reference fields. SQL provides a convenient mechanism for referencing fields directly, and since SQL can only calculate on fields, it no longer needs a way to reference the current member (the record). A sum of squares in SQL is always the sum of squares of some field; when the member of the set is an entire record, a sum of squares is meaningless.

SQL simplifies its syntax at the cost of the expressive power of sets. SPL, however, supports sets composed of generic members, so the ~ notation is necessary.

At the same time, for structured data calculation, the SQL style of referencing fields directly should also be supported for convenience. When calculating sales amounts, writing “~.price*~.quantity” is obviously not as simple and intuitive as writing “price*quantity”. SPL provides this support as well. This is the 2nd rule of Lambda syntax: when dealing with structured data, fields can be referenced directly.
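The difference between the two styles can be sketched in Python. Everything here is an assumption made for illustration: the sample records, and the helper calc, which is hypothetical and only emulates direct field reference by evaluating an expression with the current record's fields as its namespace. It is not how SPL or SQL is implemented.

```python
from types import SimpleNamespace

# Hypothetical sample records; the field names follow the example in the text.
orders = [
    SimpleNamespace(price=10.0, quantity=3),
    SimpleNamespace(price=2.5, quantity=4),
]

# Explicit style: every field is reached through the declared name r,
# comparable to writing ~.price*~.quantity.
total_explicit = sum(r.price * r.quantity for r in orders)


def calc(records, expr):
    """Evaluate expr once per record, resolving bare names as that record's fields."""
    code = compile(expr, "<expr>", "eval")
    # vars(r) exposes the record's fields as the local namespace of the expression.
    return [eval(code, {"__builtins__": {}}, vars(r)) for r in records]


# Direct style: fields are referenced by name alone, comparable to price*quantity.
total_direct = sum(calc(orders, "price * quantity"))

print(total_explicit, total_direct)  # 40.0 40.0
```

Both forms compute the same total; the second simply lets the expression omit the reference to the current record.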


Thirdly, we also need to consider the rules for nested references.

A set operation is essentially a loop. Just as loop statements can be nested in multiple layers, set operations can be nested too. For example, to calculate the intersection of two sets A and B, a simple algorithm loops through the members of A and checks whether each one appears in B, which involves two layers of loops.

At this point, the ~ notation becomes ambiguous: does it refer to the current member of A or of B? The syntax rules need a clear convention for this.

The principle of proximity can generally be adopted: if the set is not specified, ~ is assumed to belong to the inner set by default, while a reference to the current member of the outer set must explicitly indicate which set it belongs to. Following this rule in SPL, the intersection can be written as A.select(B.count(~==A.~)>0), where ~ defaults to the current member of B, and the current member of A is written explicitly as A.~ to distinguish it.
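The two-layer loop behind this expression can be sketched in Python with made-up sample sets. Python sidesteps the ambiguity by requiring distinct names, so a plays the role of A.~ and b plays the role of the inner ~. The second expression deliberately reuses one name to show proximity at work: the inner binding hides the outer one, which is why the outer member needs an explicit qualifier.

```python
# Hypothetical sample sets.
A = [1, 2, 3, 4, 5]
B = [4, 5, 6, 2]

# Outer loop over A, inner loop over B, comparable to
# A.select(B.count(~==A.~)>0).
intersection = [a for a in A if sum(1 for b in B if b == a) > 0]

# Reusing the same name in both layers: inside the inner loop, x always means
# the current member of B, so x == x is always true and nothing is filtered.
shadowed = [x for x in A if sum(1 for x in B if x == x) > 0]

print(intersection)  # [2, 4, 5]
print(shadowed)      # [1, 2, 3, 4, 5]
```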

In structured data calculations, field names can be referenced directly, which can also cause ambiguity between inner and outer layers, and the principle of proximity applies here too. SQL works this way. When the inner and outer tables have fields with the same name, an unqualified name is treated as a field of the inner table by default, and a reference to the same-named field of the outer table must explicitly include the table name. If the inner and outer tables have no field names in common, fields are identified correctly without writing the table name. SPL also supports this rule.
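The SQL behavior can be checked with a small sketch using Python's built-in sqlite3 module. The tables a and b and their shared column v are hypothetical, chosen so that the inner and outer tables have a field with the same name; the sketch shows SQLite's resolution only, as one example of an SQL engine.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE a (v INTEGER);
    CREATE TABLE b (v INTEGER);
    INSERT INTO a VALUES (1), (2), (3), (4), (5);
    INSERT INTO b VALUES (4), (5), (6), (2);
""")

# Inside the subquery, the unqualified v resolves to b.v (the inner table),
# so the outer table's field has to be written explicitly as a.v.
rows = con.execute(
    "SELECT v FROM a "
    "WHERE (SELECT COUNT(*) FROM b WHERE v = a.v) > 0 "
    "ORDER BY v"
).fetchall()
intersection = [row[0] for row in rows]

print(intersection)  # [2, 4, 5]
con.close()
```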


Overall, SQL does well in the Lambda syntax design for set operations. Apart from lacking sets of generic members, it describes regular set operations conveniently and concisely. SPL extends the data organization of SQL by introducing discreteness and supporting sets of arbitrary members, and its Lambda syntax adds some content accordingly.