Take the top three of each group from Hive in one line


Hive uses window functions and subqueries to implement this, which is not easy, as follows:

        ORDER BY x) group_id RN FROM A
WHERE group_id <= 3
ORDER BY g ASC, group_id ASC


In this scenario, it would be easier to use esProc SPL, with only one line:

=connect(”HiveDB”).query(“select * from A”).group(g;~.top(-3;x)).conj(#2)

*How to retrieve rows larger than the median in a group?

*What is the largest increase in the group?           

*Which groups have experienced continuous rise?

SPL is good at solving such grouped subsets and ordered calculation within groups, such as  《SPL Simplified SQL Case Details: Calculate the first N rows of each group》  ; please refer to《SPL Simplified SQL Case Details: Intra-group computing》 and 《SPL Simplified SQL Case Details: Interline Computing


When the data is not in the database, it is still convenient for SPL to perform complex calculations:


It's also easy to embed esProc into Java applicationsplease refer to How to Call an SPL Script in Java

For specific usage, please refer to  Getting started with esProc