"In a certain view of the PostgreSQL database, the row_index field is an underscore separated str .."

jinxing RaqForum 41 No.
69 View • 1 Weeks ago

From SQL to SPL: Add records that meet the criteria before each group after grouping

In a certain view of the PostgreSQL database, the row_index field is an underscore separated string and also a grouping field. Some groups' row_index can be divided into 3 parts, while others can be divided into 2 parts.

row_index	id	seq	value_text
1007_0_0	1007	1	800
1007_0_0	1007	2	1110
1007_0_0	1007	4	road, roadwork
1007_0_0	1007	5	100
1007_0_1	1007	1	800
1007_0_1	1007	2	1115
1007_0_1	1007	4	road, roadwork
1007_0_1	1007	5	100
1007_0_2	1007	1	800
1007_0_2	1007	2	1105
1007_0_2	1007	4	road, roadwork
1007_0_2	1007	5	100
1007_0_3	1007	1	800
1007_0_3	1007	2	1120
1007_0_3	1007	4	road, roadwork
1007_0_3	1007	5	100
1007_0	1007	6	Rosedale
1007_0	1007	8	139

Now, before each group of records where row_index can be divided into 3 parts, add the group of records where row_index can be divided into 2 parts, and modify row_index to the row_index of each group.

row_index	id	seq	value_text
1007_0_0	1007	6	Rosedale
1007_0_0	1007	8	139
1007_0_0	1007	1	800
1007_0_0	1007	2	1110
1007_0_0	1007	4	road, roadwork
1007_0_0	1007	5	100
1007_0_1	1007	6	Rosedale
1007_0_1	1007	8	139
1007_0_1	1007	1	800
1007_0_1	1007	2	1115
1007_0_1	1007	4	road, roadwork
1007_0_1	1007	5	100
1007_0_2	1007	6	Rosedale
1007_0_2	1007	8	139
1007_0_2	1007	1	800
1007_0_2	1007	2	1105
1007_0_2	1007	4	road, roadwork
1007_0_2	1007	5	100
1007_0_3	1007	6	Rosedale
1007_0_3	1007	8	139
1007_0_3	1007	1	800
1007_0_3	1007	2	1120
1007_0_3	1007	4	road, roadwork
1007_0_3	1007	5	100

The records where row_index can be divided into three parts can be grouped by row_index, and then each group of records can be processed by merging the records where row_index can be divided into two parts before each group of records. But after SQL grouping, it must aggregate immediately, and subsets cannot be kept to continue processing each group of records. This requires a detour to solve, using multi-layer nested window functions to bypass this problem, which is difficult to code.

SPL supports retaining subsets after grouping, allowing for continued processing of each group of records.


1	=postgresql.query("select * from view1")
2	=A1.select(row_index.split("_").len()==2)
3	=(A1\A2).group@u(row_index)
4	=A3.conj(A2.new(A3.row_index,id,seq,value_text)\|~)

A1: Query the database through JDBC.

A2: Retrieve the records where row_index can be divided into two parts.

A3: Remove A2 from the complete data, which means getting records that row_index can be divided into three parts. Group these records while keeping the order unchanged. The symbol \ represents the difference set, and group@u indicates keeping the original order after grouping.

A4: Loop each group of data: Create a new two-dimensional table according to A2, change row_index to the row_index of this group, keep other fields unchanged, and merge the new two-dimensional table with the data of this group. Finally, merge the data of each group. The symbol | represents merging set members.

Question source：https://stackoverflow.com/questions/78346354/adding-tuple-rows-to-each-subtuple-group-in-sql

SPL Official Website 👉 https://www.scudata.com

SPL Feedback and Help 👉 https://www.reddit.com/r/esProcSPL

SPL Learning Material 👉 https://c.scudata.com

SPL Source Code and Package 👉 https://github.com/SPLWare/esProc

Discord 👉 https://discord.gg/cFTcUNs7

Youtube 👉 https://www.youtube.com/@esProc_SPL

Application