From SQL to SPL: Pair and transpose adjacent records within the group
A certain table stores records of personnel from external sources entering and leaving a building by swiping their cards in recent days. When a person enters and exits a building, there are usually N pairs of records, with one in and one out for each pair. Sometimes the data is not standardized or paired, with only one in or one out, or multiple consecutive ins or outs.
username |
building |
action |
timestamp |
user-1 |
building-1 |
IN |
2024-04-10 01:00:00.000 |
user-1 |
building-1 |
OUT |
2024-04-10 02:00:00.000 |
user-1 |
building-1 |
IN |
2024-04-10 02:30:00.000 |
user-1 |
building-1 |
OUT |
2024-04-10 04:00:00.000 |
user-1 |
building-1 |
IN |
2024-04-11 10:00:00.000 |
user-1 |
building-1 |
OUT |
2024-04-11 11:00:00.000 |
user-2 |
building-1 |
IN |
2024-04-12 10:00:00.000 |
user-2 |
building-1 |
OUT |
2024-04-12 11:00:00.000 |
user-2 |
building-2 |
IN |
2024-04-10 08:00:00.000 |
user-2 |
building-2 |
OUT |
2024-04-10 09:00:00.000 |
user-2 |
building-3 |
OUT |
2024-04-11 02:30:00.000 |
user-2 |
building-4 |
IN |
2024-04-11 04:00:00.000 |
user-3 |
building-1 |
OUT |
2024-04-10 01:00:00.000 |
user-3 |
building-1 |
IN |
2024-04-10 10:00:00.000 |
user-3 |
building-1 |
IN |
2024-04-10 11:00:00.000 |
user-3 |
building-1 |
IN |
2024-04-10 12:00:00.000 |
user-3 |
building-1 |
OUT |
2024-04-10 13:00:00.000 |
user-3 |
building-1 |
OUT |
2024-04-10 14:00:00.000 |
user-3 |
building-1 |
OUT |
2024-04-10 15:00:00.000 |
Now we need to convert each pair of records from rows to columns and turn them into a single record. Unmatched data should be converted into a separate record, and the blank part should be filled with null.
username |
building |
IN |
OUT |
user-1 |
building-1 |
2024-04-10 01:00:00.000 |
2024-04-10 02:00:00.000 |
user-1 |
building-1 |
2024-04-10 02:30:00.000 |
2024-04-10 04:00:00.000 |
user-1 |
building-1 |
2024-04-11 10:00:00.000 |
2024-04-11 11:00:00.000 |
user-2 |
building-1 |
2024-04-12 10:00:00.000 |
2024-04-12 11:00:00.000 |
user-2 |
building-2 |
2024-04-10 08:00:00.000 |
2024-04-10 09:00:00.000 |
user-2 |
building-3 |
2024-04-11 02:30:00.000 |
|
user-2 |
building-4 |
2024-04-11 04:00:00.000 |
|
user-3 |
building-1 |
2024-04-10 01:00:00.000 |
|
user-3 |
building-1 |
2024-04-10 10:00:00.000 |
|
user-3 |
building-1 |
2024-04-10 11:00:00.000 |
|
user-3 |
building-1 |
2024-04-10 12:00:00.000 |
2024-04-10 13:00:00.000 |
user-3 |
building-1 |
2024-04-10 14:00:00.000 |
We can first group by person and building, and then divide every 2 adjacent rows within the group into a small group and transpose each small group. But after SQL grouping, it must aggregate immediately, and subsets cannot be kept to group again. This requires a shift in thinking and using multi-layer nested window functions to bypass this problem, which is difficult to code. Grouping by adjacent IN/OUT states is another hassle, requiring the use of multiple layers of PARTITION to generate accumulated values for grouping. What's even more troublesome is that it must immediately aggregate after grouping, and the grouped data cannot be further transposed, which is still cumbersome.
After SPL grouping, the grouped subsets can be retained, and conditional ordered grouping is also supported. It is convenient to divide a pair of IN/OUT records into a group, and adjacent references are also used in this process. SPL can continue to keep grouped subsets, making it easier to transpose each group.
A |
|
1 |
=dbConn.query("select * from tb order by username,building,timestamp") |
2 |
=A1.group(username,building) |
3 |
=A2.conj(~.group@i(action=="IN" || action[-1]=="OUT")) |
4 |
=A3.conj(~.pivot(username,building;action,timestamp;"IN","OUT")) |
A1: Load data from the database and sort it by username, building, timestamp.
A2: Use the group function to group by person and building, but do not aggregate.
A3: For each group of data in A2, divide each group of IN and OUT (including individual IN and OUT) into a small group, and merge the large group to leave only the small group. The option @i of the group function indicates grouping by condition. ~ is the current group, and [-1] represents the previous record.
A4: Convert each group of data in A3 from rows to columns, union each group and leave only records. The pivot function is used for transposition, and the column names after transposition can be specified as IN and OUT to ensure that null is automatically filled in case of missing data.
Question source:https://stackoverflow.com/questions/78319976/postgresql-how-to-calculate-swipe-in-and-swipe-out-time
SPL Official Website 👉 https://www.scudata.com
SPL Feedback and Help 👉 https://www.reddit.com/r/esProcSPL
SPL Learning Material 👉 https://c.scudata.com
SPL Source Code and Package 👉 https://github.com/SPLWare/esProc
Discord 👉 https://discord.gg/cFTcUNs7
Youtube 👉 https://www.youtube.com/@esProc_SPL