SQL GROUP BY Dilemma

Key words: SQL Group By; SQL HAVING; SQL nested query

 

Question

The other day I looked through Stack Overflow and encountered a SQL question involving grouping operation. The requirement is simple: get devices whose all statuses are Ready. For a SQL beginner, it’s significant to write correct and appropriate queries. Let’s look at the case. Below is the source data:

undefined

 

Answer

The intuitive solution: group records by device and get groups where the Ready status count is the same as the all status count.

Step-by-step analysis:

1 Group records by device;

2 Count Ready status in each group (for each device) and record the result as c1;

3 Count all statuses in each group (for each device) and record the result as c2;

4 Get groups (devices) where c1 is the same as c2.

I “draw” the algorithm in the following way:

undefined

 

I tried achieving it in SQL:

SQL query 1:

select t1.device

from

(select device, count(status) as c1

from t

where status='Ready'

group by device) t1

join

(select device, count(status) as c2

from t

group by device) t2

on t1.device=t2.device

where t1.c1=t2.c2

The actual route it takes:

1 Group records by device and count Ready status for each device (c1);

2 Group records by device and count all statuses for each device (c2);

3 Join the two grouping result sets;

4 Find device in the joining result whose c1 is the same as c2.

 

SQL query 2:

select device

from t

group by device

having sum(case when status = 'Ready' then 1 else 0 end) = count(device);

 

This is closer to the natural way of thinking. But it takes a roundabout route to calculate the number of Ready status (c1) for each device:
1. There is a hidden computed field whose value is 1 when the status is Ready and 0 when it isn’t;

2. Perform sum over the computed filed to get the Ready status count.

 

Both SQL ways can lead to the correct result. But neither conforms to our natural way of thinking. It’s good to work out a brilliant SQL trick, but shouldn’t we think over why SQL express an intuitive and simple algorithm always in a roundabout and difficult way? Probably it is the absence of some basic concept that makes the popular yet ancient language really awkward.

 

SPL solution:

A

1

=connect("mysqlDB")

2

=A1.query(“select * from t”)

3

=A2.group(device)

4

=A3.select(~.select(status=="Ready").len()==~.len())

5

=A1.close()

A2: Retrieve data from the database table.

A3: Group records by device. The following is the grouping result. The SPL group() function divides records into 5 groups and doesn’t take any other action. We can see that there are 3 eligible records in the first group.

undefined 

In A4, The outermost select() function uses the sign ~ to represent the current device group. ~.select(status=="Ready").len()gets the Ready status count (c1) for the current device. ~.len()gets all statuses count (c2) for the current device. So A4’s code becomes A3.select(c1==c2), which is the step 4 in the intuitive solution that get the device where c1=c2.

The coding is intuitive, smooth, direct, clear and concise, without any “advanced, smart” workaround skill.

 

The SQL workarounds indicate that the language lacks support of pure grouping action. It can’t explicitly express a group and thus intuitive expression of operations over such a group becomes impossible.

In a nutshell, SQL has fundamental defects. They are non-stepwise orientation, incomplete set-orientation and unnecessarily complicated clause definition (HAVING). These result in heavy workload and low efficiency in writing, maintaining, debugging SQL queries.

The Structured Process Language (SPL) esProc is based thus abandons the relational algebra theory on which SQL is created. It innovatively uses the discrete data set theory to enable an easy, intuitive and smooth algorithm description. More analyses and examples about SQL headaches can be found in SPL: Solution to SQL Dilemmas.

 

Want to have a try?

Download esProc