Group Records by a Date Sequence & Find Difference

Question

I have two tables: 1)users(id,registerdate) 2)user_answer(userid,answer,updated_date)

I want the count of zero usage per day. How many users are registering but not answering per day. Results will be like this:

Date   registedCount   notAnsweredCount

 

15-09-02   20   10

 

15-09-01   20   10

 

15-08-31   12   4

 

Data is like this for user table((1,‘15-09-01’),(2,‘15-09-01’),(3,‘15-09-01’)) and this for user answer table ((1,0,15-09-01)).. Here you can see three users are registered on the day of Sep. 01, 2015 but only one user answered one question. So, result will be (Date=>15-09-01, registedCount => 3, notAnsweredCount => 2)

 

Here’s an almost-correct solution:

SELECT date_range.aDay,

 

COUNT(DISTINCT users.id) AS registedCount,

 

SUM(IF(users.id IS NOT NULL AND user_answer.userid IS NULL, 1, 0)) AS notAnsweredCount

 

FROM

 

(

 

SELECT DATE_ADD('2015-09-01', INTERVAL units.aCnt + tens.aCnt * 10 DAY) AS aDay

 

FROM

 

(

 

SELECT 0 AS aCnt UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9

 

) units

 

CROSS JOIN

 

(

 

SELECT 0 AS aCnt UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9

 

) tens

 

) date_range

 

LEFT OUTER JOIN users

 

ON date_range.aDay = users.registerdate

 

LEFT OUTER JOIN user_answer

 

ON users.id = user_answer.userid

 

GROUP BY date_range.aDay

 

Answer

There are two key computations to get done: 1)group the source database tables by a specified date sequence, instead of one of its own fields; 2) find the difference between registered ids and userids that answer questions per day. The code is difficult to understand if we express them in SQL. Here I choose to phrase them in SPL. The Structured Process Language supports step-by-step coding, which is intuitive to understand:

Parameter settings:

undefined

SPL script:

 

A

1

$select id,registerdate from   users where registerdate>=? And registerdate<=?; argBegin,argEnd

2

$select userid,updated_date from   user_answer where updated_date>=? And updated_date<=?;   argBegin,argEnd

3

=periods(argBegin,argEnd)

4

=A1.align@a(A3,registerdate).(~.(id))

5

=A2.align@a(A3,updated_date).(~.(userid))

6

=A3.new(~:Date,A4(#).len():registedCount,(A4(#)\A5(#)).len():notAnsweredCount)

A1, A2: Retrieve data from the two source tables respectively with a SQL-like statement.

A3: periods() function generates a sequence of dates according to two parameters.

A4, A5: align() function groups A1 and A2 respectively by A3’s date sequence.

A6: Create a new table sequence consisting of the desired columns. The sign “\” means getting difference.