"Hive uses window functions and subqueries to implement this, which is not easy, as follows: SELE .."

xingjl RaqForum 16 No.
481 View • 4 Years ago

Take the top three of each group from Hive in one line

SQL Headache(24) TopN(6) Hive(2)

Hive uses window functions and subqueries to implement this, which is not easy, as follows:

SELECT *
FROM (
        SELECT g, ROW_NUMBER() OVER(PARTITION BY g
        ORDER BY x) group_id RN FROM A
)
WHERE group_id <= 3
ORDER BY g ASC, group_id ASC

In this scenario, it would be easier to use esProc SPL, with only one line:

=connect(”HiveDB”).query(“select * from A”).group(g;~.top(-3;x)).conj(#2)

*How to retrieve rows larger than the median in a group?

*What is the largest increase in the group?

*Which groups have experienced continuous rise?

SPL is good at solving such grouped subsets and ordered calculation within groups, such as 《SPL Simplified SQL Case Details: Calculate the first N rows of each group》 ; please refer to《SPL Simplified SQL Case Details: Intra-group computing》 and 《SPL Simplified SQL Case Details: Interline Computing》

When the data is not in the database, it is still convenient for SPL to perform complex calculations:

=file(“d:/t.csv”).import(;,",").group...

It's also easy to embed esProc into Java applications，please refer to How to Call an SPL Script in Java

For specific usage, please refer to Getting started with esProc

SPL Official Website 👉 http://www.scudata.com

SPL Feedback and Help 👉 https://www.reddit.com/r/esProc

SPL Learning Material 👉 http://c.scudata.com

SPL Source Code and Package 👉 https://github.com/SPLWare/esProc

Discord 👉 https://discord.gg/ydhVnFH9

Youtube 👉 https://www.youtube.com/@esProc_SPL

SQL Headache(24) TopN(6) Hive(2)

Developer

xingjl • 481 View • 4 Years ago