Calculate Adjacent Rows with Continuous the-Same-Category Data

 

Example

The file book1.xlsx records the time of the user's event, where the data has been sorted by the USER ID and DATE columns. Part of the data is as follows:

..

The rule for calculating the isTrue column which records the truth or false of the event is: within the same USER ID, the first event must be true; starting from the second event, if the interval with the last true event is greater than or equal to 10 days, the event is true, and other events are all false. The result is expected to be as follows:

..

Write SPL script:


A

1

=T("e:/work/book1.xlsx")

2

=A1.derive(if('USER ID'!='USER   ID'[-1],(t=DATE,1),if(DATE-t>=10,(t=DATE,1),0)):isTrue)

3

=T("e:/work/book2.xlsx",A2)

A1 Read the data of book1.xlsx

A2 Add a new column isTrue and calculate its value: If the USER ID of this row is different from the previous row, it means that the current row is the first event in the group, and the return value is 1, and the temporary variable t = the DATE of this row. If the USER ID of this row is the same as the previous row, within the group, calculate the interval between the temporary variable and the date of this row, return the value 1 when the interval is greater than or equal to 10, and modify the temporary variable t = the DATE of the row. 'USER ID'[-1] represents the USER ID value of the previous row.

A3 Store the result of A2 to book2.xlsx