How to group according to field value changes in time sequence in SQL

 

Key wordsSQL ordered grouping; Adjacent row comparison

The sorted (usually sorted by time) data is grouped and counted according to the change of a field, that is, the value of the grouping field is compared with the value of the previous row. If it is the same, it is divided into the same group as the previous row, and if it is different, a new group is created.

It's hard to do this with SQL!

The set of SQL is order-less. In early SQL, there was no method of adjacent row reference. In SQL 2003 standard, window function is added, which can reference adjacent row, but grouping is still very difficult, and you need to use subquery to create grouping sequence number artificially.

For exampleQuery the city where a person stays for a certain period of time, and list the start and end time. The existing database table footmark is as follows:

NAME

FOOTDATE

CITY

Tom

2020-01-02 08:30:00

New York

Tom

2020-01-03 08:30:00

New York

Tom

2020-01-04 13:30:05

New York

Tom

2020-01-04 16:36:00

Washington

Tom

2020-01-05 08:30:00

Washington

Tom

2020-01-06 12:30:00

Washington

Tom

2020-01-06 17:30:25

New York

Tom

2020-01-07 09:30:05

New York

Tom

2020-01-09 16:30:00

New York

 

The expected result is as follows

NAME

CITY

STARTDATE

ENDDATE

Tom

New York

2020-01-02 08:30:00

2020-01-04 13:30:05

Tom

Washington

2020-01-04 16:36:00

2020-01-06 12:30:00

Tom

New York

2020-01-06 17:30:25

2020-01-09 16:30:00

 

Take Oracle as an example. It is written in SQL as follows:

WITH A AS

       ( SELECT NAME, FOOTDATE, CITY,

              CASE WHEN CITY=LAG(CITY) OVER (PARTITION BY NAME ORDER BY FOOTDATE) THEN 0 ELSE ROWNUM END FLAG

       FROM FOOTMARK ORDER BY NAME, FOOTDATE),

B AS

       ( SELECT NAME, FOOTDATE, CITY,

              MAX(FLAG) OVER (PARTITION BY NAME ORDER BY FOOTDATE) FLAG

       FROM A),

C AS

       ( SELECT NAME, CITY, FLAG,

MIN(FOOTDATE) STARTDATE,

MAX(FOOTDATE) ENDDATE

       FROM B

GROUP BY NAME, CITY, FLAG

ORDER BY NAME, FLAG )

SELECT NAME, CITY, STARTDATE, ENDDATE FROM C;

Here, the flag is an artificial grouping sequence number. The SQL is hard to write and understand.

For grouping by order, it would have been much simpler if you had used esProc SPL. Only one line is needed:

connect("mydb").query("SELECT * FROM FOOTMARK ORDER BY NAME,FOOTDATE").groups@o(NAME,CITY;min(FOOTDATE):STARTDATE,max(FOOTDATE):ENDDATE)

SPL is based on the ordered sets and provides the option @o of grouping by order. It is very easy to solve this problem.

 SPL provides multiple grouping methods, such as equivalent grouping, ordered grouping, ordered conditional grouping, sequence number grouping, nested grouping, big data ordered grouping, and ordered conditional grouping of big data. For details, please refer to grouped subsets

esProc SPL is a professional script language to solve difficult SQL problems. It has simple syntax, conforms to natural thinking, and is a natural step-by-step, procedure oriented computing language. It uses a unified syntax independent of database, and the algorithm can be seamlessly migrated between databases. A desktop computing tool, esProc is ready to use, with simple configuration and complete debugging functions: breakpoints and single step execution can be set, and the result of each step can be viewed. Please refer to: http://www.raqsoft.com/html/sql-enhancer.html

It's also easy to embed esProc into Java applicationsplease refer to How to Call an SPL Script in Java

For specific usage, please refer to  Getting started with esProc