Get Max Value in a Specified Field from Each Group

Question

CREATE

 

TABLE DBO.TEST

 

(

 

ID INT, RECTYPE INT,SEQ INT, MAX0 INT,MAX1 INT,MAX2 INT)

 

INSERT

 

INTO dbo.TEST

 

SELECT

 

1,1,1,3,2,3

 

UNION

 

ALL

 

 

SELECT

 

1,2,2,3,2,3

 

UNION

 

ALL

 

 

SELECT

 

1,2,3,3,2,3

 

UNION

 

ALL

 

SELECT

 

1,1,2,3,2,3

 

--SELECT * FROM dbo.TEST

 

how

 

to find MAX seq for each ID and Rectype

 

My result should be

 

MAX0

 

is maximum of seq groupby ID

 

MAX1

 

is maximum of seq groupby ID where rectype=1

 

MAX2 id maximum

 

of seq groupby ID where rectype=2

 

ID Rectype SEQ MAX0 MAX1 MAX2

 

1 1  1  3  2  3

 

1 2  2  3  2  3

 

1 2  3  3  2  3

 

1 1  2  3  2  3

 

A solution:

 

CREATETABLE DBO.TEST

 

(

 

ID  INT,

 

RECTYPE INT,

 

SEQ INT,

 

MAX0 INT,

 

MAX1 INT,

 

MAX2 INT

 

)

 

INSERTINTO dbo.TEST

 

SELECT1,1,1,NULL,NULL,NULL

 

UNIONALL

 

SELECT1,2,2,NULL,NULL,NULL

 

UNIONALL

 

SELECT1,2,3,NULL,NULL,NULL

 

UNIONALL

 

SELECT1,1,2,NULL,NULL,NULL

 

--select * from test

 

;WITH mycte

 

AS(SELECT ID,

 

RECTYPE,

 

Max(seq)

 

OVER(partition BY ID)m0,

 

CASE

 

WHENrectype =1THEN Max(SEQ)OVER(PARTITION BY id,rectype)

 

ELSENULL

 

END         m1,

 

CASE

 

WHENrectype =2THEN Max(SEQ)OVER(PARTITION BY id, rectype)

 

ELSENULL

 

END         m2

 

FROM dbo.TEST)

 

SELECT ID,

 

RECTYPE,

 

M0,

 

Max(m1)OVER(partition BY ID)m1,

 

Max(m2)

 

OVER(partition BY ID)m2

 

FROM mycte

 

droptable test

 

Answer

This is a typical intra-group calculation. The algorithm is clear:

1. Group data by ID.

2. For each group get the largest SEQ value and assign it to MAX0.

3. For each group get the record where Rectype=1 and find the largest SEQ value to assign to MAX1.

4. For each group get the record where Rectype=2 and find the largest SEQ value to assign to MAX2.

Yet it’s difficult to express the algorithm in SQL. You have to write a set of nested multilevel associations using the window function. The code is unreadable and uncopiable. Here I phrase the algorithm in SPL. The Structured Process Language can express an intra-group calculation conveniently. Below is the SPL script:

A

1

=tbData.group(ID)

2

=A1.run(~.run(MAX0=A1.~.max(SEQ)))

3

=A1.run(~.run(MAX1=   A1.~.select(Rectype==1).max(SEQ)))

4

=A1. run(~.run(MAX2= A1.~.select(Rectype==2).max(SEQ)))

The symbol “~” represents a group and is similar to a loop variable. In fact we can combine A2, A3 and A4 into one step:

=A1.run(~.run(MAX0=A1.~.max(SEQ)),

 ~.run(MAX1=A1.~.select(Rectype==1).max(SEQ)),

 ~.run(MAX2=A1.~.select(Rectype==2).max(SEQ)) )

 

The script return a result in the form of ResultSet, which is integration-friendly with a Java program or a reporting tool. You can refer to SPL Simplified SQL Case Details: Intra-group Computing to learn more.