Create a New column That Contains a List of Values from Another Column Subsequent Rows



I have a table like below


2021-12-03 04:03:45,ID1,O

2021-12-03 04:03:46,ID1,P

2021-12-03 04:03:47,ID1,Q

2021-12-03 04:03:48,ID1,R

2021-12-03 04:03:49,ID1,

2021-12-03 04:03:50,ID1,S

2021-12-03 04:03:51,ID1,T

2021-12-04 11:09:03,ID2,A

2021-12-04 11:09:04,ID2,B

2021-12-04 11:09:05,ID2,C

2021-12-04 11:09:06,ID2,D

And want to create a new column that contains a list of values from another column subsequent rows like below:


2021-12-03 04:03:45,ID1,O,P,Q,R

2021-12-03 04:03:46,ID1,P,Q,R

2021-12-03 04:03:47,ID1,Q,R

2021-12-03 04:03:48,ID1,R,

2021-12-03 04:03:49,ID1,,

2021-12-03 04:03:50,ID1,S,T

2021-12-03 04:03:51,ID1,T,

2021-12-04 11:09:03,ID2,A,B,C,D

2021-12-04 11:09:04,ID2,B,C,D

2021-12-04 11:09:05,ID2,C,D

2021-12-04 11:09:06,ID2,D,

for copy paste: timestamp ID Value

2021-12-03 04:03:45 ID1 O

2021-12-03 04:03:46 ID1 P

2021-12-03 04:03:47 ID1 Q

2021-12-03 04:03:48 ID1 R

2021-12-03 04:03:49 ID1 NULL

2021-12-03 04:03:50 ID1 S

2021-12-03 04:03:51 ID1 T

2021-12-04 11:09:03 ID2 A

2021-12-04 11:09:04 ID2 B

2021-12-04 11:09:05 ID2 C


Data that is ordered by TMP is also ordered by ID logically. So, you can group rows first by ID; in each group, create a new group when the previous VALUE is null; in each subgroup, use the comma to join up VALUEs from the second to the last non-null VALUE to form a sequence and make it the value of the new column LIST. A SQL set is unordered, which makes computing process very complicated. You need to first create a marker column using the window function, perform a self-join by the marker column, and group rows and join up VALUE values to get the desired result. A common alternative is to fetch the original data out of the database and process it in Python or SPL. SPL, the open-source Java package, is easy to be integrated into a Java program and generates much simpler code. It can get it done with only two lines of code:






View SPL source code.