Take the top three of each group from Hive in one line

 

Hive uses window functions and subqueries to implement this, which is not easy, as follows:

SELECT *  
FROM  (
        SELECT g, ROW_NUMBER() OVER(PARTITION BY g
        ORDER BY x) group_id RN FROM A
)
WHERE group_id <= 3
ORDER BY g ASC, group_id ASC

 

In this scenario, it would be easier to use esProc SPL, with only one line:

=connect(”HiveDB”).query(“select * from A”).group(g;~.top(-3;x)).conj(#2)

*How to retrieve rows larger than the median in a group?

*What is the largest increase in the group?           

*Which groups have experienced continuous rise?

SPL is good at solving such grouped subsets and ordered calculation within groups, such as  《SPL Simplified SQL Case Details: Calculate the first N rows of each group》  ; please refer to《SPL Simplified SQL Case Details: Intra-group computing》 and 《SPL Simplified SQL Case Details: Interline Computing

 

When the data is not in the database, it is still convenient for SPL to perform complex calculations:

=file(“d:/t.csv”).import(;,",").group...

It's also easy to embed esProc into Java applicationsplease refer to How to Call an SPL Script in Java

For specific usage, please refer to  Getting started with esProc