Grouping Data by a Non-field Condition in SPL
Writing complex SQL-style queries during database application development is hard - when you want to group data by a non-field condition, for example. It’s hard because the grouping condition isn’t an attribute in the to-be-grouped data; instead it may come from another table or is an external parameter, a list of conditions, etc. Though it’s not so difficult to handle a limited number of cases in SQL, such as when a grouping condition comes from a table and doesn’t specify a certain order, it’s really thorny to deal with more demanding conditions with the so-called universal language.
Yet there is an expressway leading to the goal. SPL (Structured Process Language) can manipulate scenarios with various non-field conditions flexibly and easily. We’ll illustrate how it works through examples.
sales table stores orders records, where Client column contains clients names and Amount column contains orders amounts, as shown below:
OrderID |
Client |
SellerId |
OrderDate |
Amount |
10248 |
VINET |
5 |
2013/7/4 |
2440 |
10249 |
TOMSP |
6 |
2013/7/5 |
1863.4 |
10250 |
HANAR |
4 |
2013/7/8 |
1813 |
10251 |
VICTE |
3 |
2013/7/8 |
670.8 |
10252 |
SUPRD |
4 |
2013/7/9 |
3730 |
10253 |
HANAR |
3 |
2013/7/10 |
1444.8 |
10254 |
CHOPS |
5 |
2013/7/11 |
625.2 |
10255 |
RICSU |
9 |
2013/7/12 |
2490.5 |
10256 |
WELLI |
3 |
2013/7/15 |
517.8 |
Task: Group the table by “potential clients list” and sum Amounts for each group.
Example 1: The “potential clients list” is potential table’s Std field, which has 4 records – ANATR, BERGS, LACOR and ZTOZ. Client ZTOZ isn’t included in sales table. The output result should be arranged according to the order of the records.
If order isn’t compulsory, SQL can handle it easily:
select potential.std as client, sum(sales.amount) as amount from potential left join client on potential.std=sales.client group by potential.std
But since this case requires order, SQL needs to invent a field to align to the specified order and remove it with a subquery when the necessary computations are completed. It’s much easier to do it in SPL:
A |
|
1 |
=sales=db.query ("select * from sales") |
2 |
=potential=db.query("select * from potential") |
3 |
=sales.align@a(potential:STD,CLIENT) |
4 |
=A3.new(potential(#).STD:CLIENT,~.sum(AMOUNT):AMOUNT) |
A1,B1: Retrieve sales data and potential data from the database.
A3: =sales.align@a(potential:STD,CLIENT)
align() function aligns sales’ Client field with potential’s Std field to divide the former table into four groups:
Group 4 is contains nothing because the 4th value in Std field doesn’t exist in sales. align() function works with @a option to get all records from each group; without the option the function gets only the first record from each group.
A4: =A3.new(potential(#).STD:CLIENT,~.sum(AMOUNT):AMOUNT)
new() function generates a new table sequence with two fields - potential.STD, which is potential table’s Std field and ~.sum(AMOUNT), which sums Amount field for A3’s each group. Below is the final result:
Example 2: The “potential clients list” contains a great number of fixed members.
If there are only several members in the potential clients list, SQL can UNION all client records into a pseudo table. When there are a lot of members, as in this case, SQL needs to create a new table to store records. SPL can handle this without creating a new table:
A |
|
1 |
=sales=db.query ("select * from sales") |
2 |
=potential=["ALFKI","BSBEV","FAMIA","GALED","HUNGC","KOENE","LACOR","NORTS","QUICK","SANTG","THEBI","VINET","WOLZA"] |
3 |
=sales.align@a(potential,CLIENT) |
4 |
=A3.new(potential(#):CLIENT,~.sum(AMOUNT):AMOUNT) |
A2 is a sequence of strings named potential. A3 and A4 accesses the sequence to reference its members for computation.
Example 3: The “potential clients list” is an external parameter in the form of "BSBEV","FAMIA","GALED".
The external parameter value is dynamic, so it’s inconvenient to create a pseudo table by UNION in SQL. We have to create a temporary table, parse the parameter and insert corresponding records into the table. There’s no need to create such a temporary table if we handle it with SPL:
First, we need to define a parameter named clients:
Here’s the SPL script:
A |
|
1 |
=sales=db.query ("select * from sales") |
2 |
=potential=clients.array() |
3 |
=sales.align@a(potential,CLIENT) |
4 |
=A3.new(potential(#):CLIENT,~.sum(AMOUNT):AMOUNT) |
Run the script to input the parameter value:
The result is different according to different input parameter (the grouping condition). Here’s the result for this case:
Example 4: The grouping condition can be intervals. For example, group orders amounts by <=1000, >1000&& <=2000, >2000 && <=4000, >4000 and sum amounts for each group.
If the intervals are static, we can embed them in the SQL statement. If they are a dynamic external parameter, we need to compose a SQL statement using a high-level language like Java, which is complicated. SPL supports dynamic expressions to handle this effortlessly:
A |
|
1 |
=sales=db.query ("select * from sales") |
2 |
=byFac=["?<=1000" ,"?>1000 && ?<=2000","?>2000 && ?<=4000","?>4000"] |
3 |
=sales.enum(byFac,AMOUNT) |
4 |
=A18.new(byFac(#):byFac,~.sum(AMOUNT):AMOUNT) |
Variable byFac is the grouping condition covering four intervals. It also can be an external parameter, or a database view or a database table. Below is the final result:
Example 5:
In real-world businesses, the intervals can be overlapping. For example:
>=1000 && <=4000: Ordinary orders (r14)
<=2000: small orders (r2)
>=3000: VIP orders (r3)
Both r2 and r3 overlap r14. But we want non-overlapping records only. To do this we get records meeting r14, and then those meeting r2 from the selected ones, and then those meeting r3 from the re-selected records.
SPL’s enum() function handles data grouping according to overlapping intervals:
A |
|
1 |
=sales=db.query ("select * from sales") |
2 |
=byFac=["?>=1000 && ?<=4000" ,"?<=2000","?>=3000"] |
3 |
=sales.enum(byFac,AMOUNT) |
4 |
=A3.new(byFac(#):byFac,~.sum(AMOUNT):AMOUNT) |
Below is A3’s grouping result:
A4’s result:
Sometimes we want records falling in all intervals. To do this we get records meeting r14 from sales, then get records meeting r2 from the table, and then get those meeting r3 from all records. SPL gets it done by using @r option with enum() function. Now A3’s statement becomes =sales.enum@r(byFac,AMOUNT). Below is the grouping result:
Here we get records falling within [1000,4000] (Surrounded by red box). Here’s the final result:
SPL supports outputting the result directly and feeding data to a Java application or a reporting tool through invocation. SPL offers JDBC interface to return the result to a Java main application in the form ResultSet. The call of an SPL script is the same as calling a database. See How to Call an SPL Script in Java to learn details.
SPL Official Website 👉 https://www.scudata.com
SPL Feedback and Help 👉 https://www.reddit.com/r/esProcSPL
SPL Learning Material 👉 https://c.scudata.com
SPL Source Code and Package 👉 https://github.com/SPLWare/esProc
Discord 👉 https://discord.gg/cFTcUNs7
Youtube 👉 https://www.youtube.com/@esProc_SPL