Grouping Data by a Non-field Condition in SPL

 

  Writing complex SQL-style queries during database application development is hard - when you want to group data by a non-field condition, for example. It’s hard because the grouping condition isn’t an attribute in the to-be-grouped data; instead it may come from another table or is an external parameter, a list of conditions, etc. Though it’s not so difficult to handle a limited number of cases in SQL, such as when a grouping condition comes from a table and doesn’t specify a certain order, it’s really thorny to deal with more demanding conditions with the so-called universal language.

  Yet there is an expressway leading to the goal. SPL (Structured Process Language) can manipulate scenarios with various non-field conditions flexibly and easily. We’ll illustrate how it works through examples.

  sales table stores orders records, where Client column contains clients names and Amount column contains orders amounts, as shown below:

OrderID

Client

SellerId

OrderDate

Amount

10248

VINET

5

2013/7/4

2440

10249

TOMSP

6

2013/7/5

1863.4

10250

HANAR

4

2013/7/8

1813

10251

VICTE

3

2013/7/8

670.8

10252

SUPRD

4

2013/7/9

3730

10253

HANAR

3

2013/7/10

1444.8

10254

CHOPS

5

2013/7/11

625.2

10255

RICSU

9

2013/7/12

2490.5

10256

WELLI

3

2013/7/15

517.8

  Task: Group the table by “potential clients list” and sum Amounts for each group.

 

  Example 1: The “potential clients list” is potential table’s Std field, which has 4 records – ANATR, BERGS, LACOR and ZTOZ. Client ZTOZ isn’t included in sales table. The output result should be arranged according to the order of the records.

  If order isn’t compulsory, SQL can handle it easily:

  select potential.std as client, sum(sales.amount) as amount from potential left join client on potential.std=sales.client group by potential.std

  But since this case requires order, SQL needs to invent a field to align to the specified order and remove it with a subquery when the necessary computations are completed. It’s much easier to do it in SPL:


A

1

=sales=db.query  ("select * from sales")

2

=potential=db.query("select    * from potential")

3

=sales.align@a(potential:STD,CLIENT)

4

=A3.new(potential(#).STD:CLIENT,~.sum(AMOUNT):AMOUNT)

  A1,B1: Retrieve sales data and potential data from the database.

  undefined

  A3: =sales.align@a(potential:STD,CLIENT)

  align() function aligns sales’ Client field with potential’s Std field to divide the former table into four groups:

  undefined

  Group 4 is contains nothing because the 4th value in Std field doesn’t exist in sales. align() function works with @a option to get all records from each group; without the option the function gets only the first record from each group.

  A4: =A3.new(potential(#).STD:CLIENT,~.sum(AMOUNT):AMOUNT)

  new() function generates a new table sequence with two fields - potential.STD, which is potential table’s Std field and ~.sum(AMOUNT), which sums Amount field for A3’s each group. Below is the final result:

  undefined

 

  Example 2: The “potential clients list” contains a great number of fixed members.

  If there are only several members in the potential clients list, SQL can UNION all client records into a pseudo table. When there are a lot of members, as in this case, SQL needs to create a new table to store records. SPL can handle this without creating a new table:


A

1

=sales=db.query  ("select * from sales")

2

=potential=["ALFKI","BSBEV","FAMIA","GALED","HUNGC","KOENE","LACOR","NORTS","QUICK","SANTG","THEBI","VINET","WOLZA"]

3

=sales.align@a(potential,CLIENT)

4

=A3.new(potential(#):CLIENT,~.sum(AMOUNT):AMOUNT)

  A2 is a sequence of strings named potential. A3 and A4 accesses the sequence to reference its members for computation.

 

  Example 3: The “potential clients list” is an external parameter in the form of "BSBEV","FAMIA","GALED".

  The external parameter value is dynamic, so it’s inconvenient to create a pseudo table by UNION in SQL. We have to create a temporary table, parse the parameter and insert corresponding records into the table. There’s no need to create such a temporary table if we handle it with SPL:

  First, we need to define a parameter named clients:

  undefined

  Here’s the SPL script:


A

1

=sales=db.query  ("select * from sales")

2

=potential=clients.array()

3

=sales.align@a(potential,CLIENT)

4

=A3.new(potential(#):CLIENT,~.sum(AMOUNT):AMOUNT)

  Run the script to input the parameter value:

  undefined

  The result is different according to different input parameter (the grouping condition). Here’s the result for this case:

  undefined

 

  Example 4: The grouping condition can be intervals. For example, group orders amounts by <=1000, >1000&& <=2000, >2000 && <=4000, >4000 and sum amounts for each group.

  If the intervals are static, we can embed them in the SQL statement. If they are a dynamic external parameter, we need to compose a SQL statement using a high-level language like Java, which is complicated. SPL supports dynamic expressions to handle this effortlessly:


A

1

=sales=db.query  ("select * from sales")

2

=byFac=["?<=1000" ,"?>1000 && ?<=2000","?>2000 && ?<=4000","?>4000"]

3

=sales.enum(byFac,AMOUNT)

4

=A18.new(byFac(#):byFac,~.sum(AMOUNT):AMOUNT)

  Variable byFac is the grouping condition covering four intervals. It also can be an external parameter, or a database view or a database table. Below is the final result:

  undefined

 

  Example 5:

  In real-world businesses, the intervals can be overlapping. For example:

  >=1000 && <=4000: Ordinary orders (r14)
  <=2000: small orders (r2)
  >=3000: VIP orders (r3)

  Both r2 and r3 overlap r14. But we want non-overlapping records only. To do this we get records meeting r14, and then those meeting r2 from the selected ones, and then those meeting r3 from the re-selected records.

  SPL’s enum() function handles data grouping according to overlapping intervals:


A

1

=sales=db.query  ("select * from sales")

2

=byFac=["?>=1000 && ?<=4000" ,"?<=2000","?>=3000"]

3

=sales.enum(byFac,AMOUNT)

4

=A3.new(byFac(#):byFac,~.sum(AMOUNT):AMOUNT)

  Below is A3’s grouping result:

  undefined

  A4’s result:

  undefined

  Sometimes we want records falling in all intervals. To do this we get records meeting r14 from sales, then get records meeting r2 from the table, and then get those meeting r3 from all records. SPL gets it done by using @r option with enum() function. Now A3’s statement becomes =sales.enum@r(byFac,AMOUNT). Below is the grouping result:

  undefined

  Here we get records falling within [1000,4000] (Surrounded by red box). Here’s the final result:

  undefined

  SPL supports outputting the result directly and feeding data to a Java application or a reporting tool through invocation. SPL offers JDBC interface to return the result to a Java main application in the form ResultSet. The call of an SPL script is the same as calling a database. See How to Call an SPL Script in Java to learn details.