"The Teradata database table has three fields: date, account, and the name of the product purchas .."

jinxing RaqForum 41 No.
79 View • 1 Months ago

From SQL to SPL: Count distinct within intervals in order

The Teradata database table has three fields: date, account, and the name of the product purchased by the account.

Date	ACCOUNT	NAME
2024-01-01	A	XOXO
2024-01-02	A	XOXO
2024-01-02	A	OXXO
2024-01-04	A	XOOX
2024-01-05	A	OOXO
2024-01-06	A	XOXO
2024-01-01	B	B11
2024-01-02	B	B21
2024-01-02	B	B21
2024-01-02	B	B22
2024-01-02	B	B11
2024-01-03	B	B31
2024-01-01	C	C1
2024-01-02	C	C1
2024-01-03	C	C1

Now we need to calculate how many types of products each account has purchased before each day, that is, the number of different products purchased from the beginning to the previous day.

Date	ACCOUNT	COUNT_DISTINCT_NAME
2024-01-01	A	Null
2024-01-02	A	1
2024-01-04	A	2
2024-01-05	A	3
2024-01-06	A	4
2024-01-01	B	Null
2024-01-02	B	1
2024-01-03	B	3
2024-01-01	C	null
2024-01-02	C	1
2024-01-03	C	1

Group by ACCOUNT, then by date, process each group in turn, get the interval from first group from to the current group, and perform deduplication and counting. But SQL grouping must be accompanied by aggregation, and cannot retain grouped subsets for further processing; At this point, it is necessary to change the mindset, set various markers to transition, and write multi-layer nested statements to repeatedly traverse the data table in order to calculate.

SPL allows grouping to retain grouped subsets for further processing. SPL's cross row references are also much simpler than window functions:

	A
1	=teraJDBC.query("select * from tb”)
2	=A1.group(ACCOUNT)
3	=A2.conj(~.group(Date;ACCOUNT,~[:-1].conj().icount(NAME):COUNT_DISTINCT_NAME))
4	=A3.run(if(#3==0,#3=null))

A1: Load data through JDBC.

A2: Group by account.

A3: Process each group of A2: Continue grouping by date, process daily data, get data of groups from the beginning to the previous day, union these groups, and calculate the quantity of different product names. Finally, union the processing results of each upper-level group. [: -1] represents the set from the first member to the previous member, and the function icount is used for unique count, i.e. count (distinct)

A4: Change the 0 in column 3 to null.

Question source：https://stackoverflow.com/questions/78344447/get-the-distinct-names-an-account-has-prior-the-last-day-in-teradata-or-oracle-w

SPL Official Website 👉 https://www.scudata.com

SPL Feedback and Help 👉 https://www.reddit.com/r/esProcSPL

SPL Learning Material 👉 https://c.scudata.com

SPL Source Code and Package 👉 https://github.com/SPLWare/esProc

Discord 👉 https://discord.gg/2bkGwqTj

Youtube 👉 https://www.youtube.com/@esProc_SPL

Application