How to find the same values that appear in each group

 

Key wordsGroup by week; Intersection of groups; Intergroup calculation

After grouping data, you need to find a field value that appears in each group.

Using SQL to solve this problem is not easy!

SQL is lack of pure grouping operation, and it can only return the statistical value of each group, but can't perform more detailed intersection, difference, union and other set operations on the members of each group. To solve this problem, we need multi-layer nested subquery to realize it in an indirect way.

For exampleCheck the employees who work overtime every week for recognition. The existing overtime data table overtime_work is as follows:

WORKDATE

NAME

2019-07-01

Tom

2019-07-02

John

2019-07-03

Tom

2019-07-04

Cart

2019-07-08

Tom

2019-07-09

Jordan

2019-07-10

Kate

2019-07-10

John

2019-07-15

Leon

2019-07-16

John

2019-07-17

Tom

2019-07-18

John

2019-07-22

Jim

2019-07-23

Tom

2019-07-24

John

2019-07-25

Cart

 

Take Oracle as an example. It is written in SQL as follows:

WITH A AS

       (SELECT COUNT(*) NUM FROM

        (SELECT DISTINCT TO_CHAR(WORKDATE,'IW') FROM OVERTIME_WORK )

        ),

B AS

        (SELECT TO_CHAR(WORKDATE,'IW') W, NAME

        FROM OVERTIME_WORK

        GROUP BY TO_CHAR(WORKDATE,'IW'), NAME

        ),

C AS

       (SELECT NAME, COUNT(*) NUM FROM B GROUP BY NAME )

SELECT NAME FROM C WHERE NUM=(SELECT NUM FROM A);

Here, A finds out how many weeks in total. B finds out different employees who work overtime each week. C finds out the number of times that each employee in B appears. Finally, select the employees whose number of appearing times is equal to the total number of weeks from C. The solution is rather convoluted. This kind of SQL is hard to write and understand.

 esProc SPL supports set operations, and it would be much simpler to solve this problem. You only need to calculate the intersection of employees who worked overtime each week, and only one line is needed:

connect("mydb").query("SELECT * FROM OVERTIME_WORK").group((WORKDATE-date("2019-07-01"))\7).(~.(NAME)).isect()

[question extension]           

*Query employees who work overtime at least twice each week.
*Query the customers whose consumption amount is within top 20 for every month.

SPL is good at solving such grouped subsets and ordered calculation within groups, please refer to Calculate the first N rows of each group, Intra-group Computing, Interline Computing and SPL Set

esProc SPL is a professional script language to solve difficult SQL problems. It has simple syntax, conforms to natural thinking, and is a natural step-by-step, procedure oriented computing language. It uses a unified syntax independent of database, and the algorithm can be seamlessly migrated between databases. A desktop computing tool, esProc is ready to use, with simple configuration and complete debugging functions: breakpoints and single step execution can be set, and the result of each step can be viewed. Please refer to: http://www.raqsoft.com/html/sql-enhancer.html

It's also easy to embed esProc into Java applicationsplease refer to How to Call an SPL Script in Java

For specific usage, please refer to  Getting started with esProc