From SQL to SPL: Count distinct within intervals in order

The Teradata database table has three fields: date, account, and the name of the product purchased by the account.

Date

ACCOUNT

NAME

2024-01-01

A

XOXO

2024-01-02

A

XOXO

2024-01-02

A

OXXO

2024-01-04

A

XOOX

2024-01-05

A

OOXO

2024-01-06

A

XOXO

2024-01-01

B

B11

2024-01-02

B

B21

2024-01-02

B

B21

2024-01-02

B

B22

2024-01-02

B

B11

2024-01-03

B

B31

2024-01-01

C

C1

2024-01-02

C

C1

2024-01-03

C

C1

Now we need to calculate how many types of products each account has purchased before each day, that is, the number of different products purchased from the beginning to the previous day.

Date

ACCOUNT

COUNT_DISTINCT_NAME

2024-01-01

A

Null

2024-01-02

A

1

2024-01-04

A

2

2024-01-05

A

3

2024-01-06

A

4

2024-01-01

B

Null

2024-01-02

B

1

2024-01-03

B

3

2024-01-01

C

null

2024-01-02

C

1

2024-01-03

C

1

Group by ACCOUNT, then by date, process each group in turn, get the interval from first group from to the current group, and perform deduplication and counting. But SQL grouping must be accompanied by aggregation, and cannot retain grouped subsets for further processing; At this point, it is necessary to change the mindset, set various markers to transition, and write multi-layer nested statements to repeatedly traverse the data table in order to calculate.

SPL allows grouping to retain grouped subsets for further processing. SPL's cross row references are also much simpler than window functions:


 A

1

=teraJDBC.query("select * from tb”)

2

=A1.group(ACCOUNT)

3

=A2.conj(~.group(Date;ACCOUNT,~[:-1].conj().icount(NAME):COUNT_DISTINCT_NAME))

4

=A3.run(if(#3==0,#3=null))

A1: Load data through JDBC.

A2: Group by account.

A3: Process each group of A2: Continue grouping by date, process daily data, get data of groups from the beginning to the previous day, union these groups, and calculate the quantity of different product names. Finally, union the processing results of each upper-level group. [: -1] represents the set from the first member to the previous member, and the function icount is used for unique count, i.e. count (distinct)

A4: Change the 0 in column 3 to null.

Question sourcehttps://stackoverflow.com/questions/78344447/get-the-distinct-names-an-account-has-prior-the-last-day-in-teradata-or-oracle-w