From SQL to SPL: Add records that meet the criteria before each group after grouping

In a certain view of the PostgreSQL database, the row_index field is an underscore separated string and also a grouping field. Some groups' row_index can be divided into 3 parts, while others can be divided into 2 parts.

row_index

id

seq

value_text

1007_0_0

1007

1

800

1007_0_0

1007

2

1110

1007_0_0

1007

4

road, roadwork

1007_0_0

1007

5

100

1007_0_1

1007

1

800

1007_0_1

1007

2

1115

1007_0_1

1007

4

road, roadwork

1007_0_1

1007

5

100

1007_0_2

1007

1

800

1007_0_2

1007

2

1105

1007_0_2

1007

4

road, roadwork

1007_0_2

1007

5

100

1007_0_3

1007

1

800

1007_0_3

1007

2

1120

1007_0_3

1007

4

road, roadwork

1007_0_3

1007

5

100

1007_0

1007

6

Rosedale

1007_0

1007

8

139

Now, before each group of records where row_index can be divided into 3 parts, add the group of records where row_index can be divided into 2 parts, and modify row_index to the row_index of each group.

row_index

id

seq

value_text

1007_0_0

1007

6

Rosedale

1007_0_0

1007

8

139

1007_0_0

1007

1

800

1007_0_0

1007

2

1110

1007_0_0

1007

4

road, roadwork

1007_0_0

1007

5

100

1007_0_1

1007

6

Rosedale

1007_0_1

1007

8

139

1007_0_1

1007

1

800

1007_0_1

1007

2

1115

1007_0_1

1007

4

road, roadwork

1007_0_1

1007

5

100

1007_0_2

1007

6

Rosedale

1007_0_2

1007

8

139

1007_0_2

1007

1

800

1007_0_2

1007

2

1105

1007_0_2

1007

4

road, roadwork

1007_0_2

1007

5

100

1007_0_3

1007

6

Rosedale

1007_0_3

1007

8

139

1007_0_3

1007

1

800

1007_0_3

1007

2

1120

1007_0_3

1007

4

road, roadwork

1007_0_3

1007

5

100

The records where row_index can be divided into three parts can be grouped by row_index, and then each group of records can be processed by merging the records where row_index can be divided into two parts before each group of records. But after SQL grouping, it must aggregate immediately, and subsets cannot be kept to continue processing each group of records. This requires a detour to solve, using multi-layer nested window functions to bypass this problem, which is difficult to code.

SPL supports retaining subsets after grouping, allowing for continued processing of each group of records.



1

=postgresql.query("select * from view1")

2

=A1.select(row_index.split("_").len()==2)

3

=(A1\A2).group@u(row_index)

4

=A3.conj(A2.new(A3.row_index,id,seq,value_text)|~)

A1: Query the database through JDBC.

A2: Retrieve the records where row_index can be divided into two parts.

A3: Remove A2 from the complete data, which means getting records that row_index can be divided into three parts. Group these records while keeping the order unchanged. The symbol \ represents the difference set, and group@u indicates keeping the original order after grouping.

A4: Loop each group of data: Create a new two-dimensional table according to A2, change row_index to the row_index of this group, keep other fields unchanged, and merge the new two-dimensional table with the data of this group. Finally, merge the data of each group. The symbol | represents merging set members.

Question sourcehttps://stackoverflow.com/questions/78346354/adding-tuple-rows-to-each-subtuple-group-in-sql