Get Top N Rows from Each Group
【Question】
Suppose I have a table, say ‘abc’, and it has two columns: product_id and category_id. There can be more than one product_id for one category_id. Now what I need is to select the first four rows from all the available category_id in that table…I was able to do this in MySQL server database using "row_number over partition by (….." but unfortunately this isn’t supported in MySQL…Could anyone help me with writing this query?
【Answer】
As you say we need window functions to get the first N rows from each group. But unfortunately MySQL doesn’t offer window functions, which makes it rather inconvenient to handle top N problems. If volume of the source data is relatively small, you can write the query in SPL (Structured Process Language). It’s simple. Below is the SPL script:
A |
|
1 |
=db.query(“select * from abc”) |
2 |
=A1.group(category_id) |
3 |
=A2.(~.top(4)) |
4 |
=A3.union() |
A1: Retrieve data from table abc using a SQL statement.
A2: Group rows by category_id.
A3: Get the first 4 rows from each group.
A4: Union rows in A3’s each group.
An SPL script can be integrated with a Java application via esProc JDBC, which is similar to calling a database result set. For more details and examples, see How to Call an SPL Script in Java, SPL Simplified SQL Case Details: Calculate the first N rows of each group.
SPL Official Website 👉 https://www.scudata.com
SPL Feedback and Help 👉 https://www.reddit.com/r/esProc_SPL
SPL Learning Material 👉 https://c.scudata.com
SPL Source Code and Package 👉 https://github.com/SPLWare/esProc
Discord 👉 https://discord.gg/cFTcUNs7
Youtube 👉 https://www.youtube.com/@esProc_SPL