Implement grouping by continuous equivalence from MySQL in one line

 

This grouping is based on comparing the field values of the previous row to produce a new group when changes occur. SQL only supports equivalent grouping. In order to realize this kind of orderly grouping, we have to transform the data into equivalent grouping after several times. Take the new MySQL that supports window functions as an example. It goes through three steps:

1.  Get the change flag field. Get the field value of the previous row through the window function lag, flag = 1 if change occurs, otherwise it is 0;

2.  Add up the flag field to get the ACC field. Add from the first row to the current row, and it also needs to be achieved by using window function: sum(flag)…BETWEEN ROWS UNBOUNDED AND CURRENT ROW;

3.  According to the field of ACC, the general equivalent grouping can be done.

For MySQL without window functions in the early days, it could be realized theoretically, which is more complex, so I won't go into details.

 

This kind of orderly grouping would be very simple if written in SPL language. With the @o option of group operation, you can finish it in one line:

=connect(”mysqlDB”).query(“select * from t”).group@o(f)

After the grouping operation, the sequence table B of two-tier structure is obtained. For the second layer of grouped subsets, it is easy to do any calculation:

=B.max(~.len())      Get the number of rows in the maximum group;

=B.(~.new(f,avg(f2))).conj()    Get the average value of f2 of each group;

=B.(~.sort@z(f2).m(to(3))).conj()  Take out the three largest rows of f2 of each group;

 

In addition to continuous equivalence grouping, there are more general ordered conditional grouping. For example, when the difference between the current row and the previous row is greater than 10, a new group will be generated:

=connect(”mysqlDB”).query(“select * from t”).group@i(f>~[-1].f+10)

There are also grouping by sequence number, nested grouping and many other grouping methods that are difficult to implement by SQL. For details, please refer to Grouped subsets.

 

When the data is not in the database, it is still convenient for SPL to perform complex calculations:

=file(“d:/t.csv”).import(;,",").group...

It's also easy to embed esProc into Java applicationsplease refer to How to Call an SPL Script in Java

For specific usage, please refer to  Getting started with esProc