Get the Record Containing Top 1 from Each Group

Question

Hello All,

I hope you can help me on this. In my dataset I have siblings and their addresses. I am trying to group all siblings by address, so I know all the siblings who reside at an address. Now in this group I only want to pick the youngest sibling and get that information. I am trying to do this in EG, I have the dataset sorted by mailing address, but I would like to know is there a way where I can get the youngest sibling in the group. (Something like select top 1)

Here is my dataset:

Address   name   age

 

111 street1  child1  5

 

111 street1  child2  10

 

In this group I only want to output info for child1.

A solution:

proc sql noprint;

 

 create table want as

 

select *

 

from have

 

group by Address

 

having age eq min(age);

 

Answer

The above SQL solution is incorrect, actually. min function gets the smallest value, not the record containing that smallest value. Besides, your desired query also involves group operation and associated operation, which makes the coding even more complicated. An alternative is using keep/ top/row_number, or the window function. But they are not convenient, too.

SPL handles this much more easily. The Structured Process Language’s top function gets the record having the largest value. For example, salary.top(3;amount) gets the three records holding the smallest amounts, and salary.top(-1;amount) gets the record with the largest amount value. The following SPL script meets your requirements with a one-liner:

A

1

=T1.group(Address).(~.top(1;age)).union()

Brief code explanation:

T1: T1 table.

.group(Address): Group T1 by Address.

.(~.top(1;age)) Get the record holding the smallest age value from each group.

.union() Union all groups.

esProc is intended to processing structured data and integration-friendly with a Java program or a reporting tool. You can refer to How to Call an SPL Script in Java to learn more.