Take the top three of each group from Hive in one line
Hive uses window functions and subqueries to implement this, which is not easy, as follows:
SELECT g, ROW_NUMBER() OVER(PARTITION BY g
ORDER BY x) group_id RN FROM A
WHERE group_id <= 3
ORDER BY g ASC, group_id ASC
In this scenario, it would be easier to use esProc SPL, with only one line:
=connect(”HiveDB”).query(“select * from A”).group(g;~.top(-3;x)).conj(#2)
*How to retrieve rows larger than the median in a group?
*What is the largest increase in the group?
*Which groups have experienced continuous rise?
SPL is good at solving such grouped subsets and ordered calculation within groups, such as 《SPL Simplified SQL Case Details: Calculate the first N rows of each group》 ; please refer to《SPL Simplified SQL Case Details: Intra-group computing》 and 《SPL Simplified SQL Case Details: Interline Computing》
When the data is not in the database, it is still convenient for SPL to perform complex calculations:
It's also easy to embed esProc into Java applications，please refer to How to Call an SPL Script in Java
For specific usage, please refer to Getting started with esProc