Get Top N Dates from Each Group

Question

Table1 records stock codes, transaction dates and prices on block trades. Table2 stores historical records of the stocks.

Task: Get records of stock prices in 3 days on and after the date when a block trade happens.

Table1:

Code  Date1  Price1 

1     2014/3/12  5 

2     2014/3/13  2 

3     2014/3/13  3 

 

Table2:

Code  Date2     Price2 

1     2014/3/10  4 

1     2014/3/11  4.09 

1     2014/3/12  4.01 

1     2014/3/13  4.05 

1     2014/3/14  4.10 

1     2014/3/15  4.50 

1     2014/3/16  4.45 

2     2014/3/10  4 

2     2014/3/11  4.09 

2     2014/3/12  4.01 

2     2014/3/13  4.05 

2     2014/3/14  4.10 

2     2014/3/15  4.50 

2     2014/3/16  4.45 

3     2014/3/10  4 

3     2014/3/11  4.09 

3     2014/3/12  4.01 

3     2014/3/13  4.05 

3     2014/3/14  4.10 

3     2014/3/15  4.50 

3     2014/3/16  4.45 

3     2014/3/17  4.50

 

Desired result:

Code  Date1  Price1  Date2  Price2 

1     2014/3/12  5   2014/3/12  4.01 

1     2014/3/12  5   2014/3/13  4.05 

1     2014/3/12  5   2014/3/14  4.10 

2     2014/3/13  2   2014/3/13  4.05 

2     2014/3/13  2   2014/3/14  4.10 

2     2014/3/13  2   2014/3/15  4.50 

3     2014/3/13  3   2014/3/13  4.05 

3     2014/3/13  3   2014/3/14  4.10 

3     2014/3/13  3   2014/3/15  4.50

I can write the basic join. But I don’t know how to locate the required dates according to the date when a block trade happens. Since not every day is a transaction date, I need to get the records through record numbers rather than by dates. Is there anyone who can help me with this? Thanks a lot!

 

Answer

The intuitive logic is like this: Join the two tables and sort records by Code, Date1 and Date2 in the meantime; group records by Code and Date1; get 3 records where Date2 is equal to and greater than Date1from every group. SQL can’t retain the grouping result, and it’s hard to get consecutive records because the language doesn’t support ordered sets. If the database you use provides window functions, the process will be easier:

select* 

from(select ROW_NUMBER() OVER(PARTITION by t1.code,t1.date1  ORDER BY t2.date2) rn,  

t1.code, 

t1.date1, 

t1.price1, 

t2.date2, 

t2.price2 

 from  t1 

 join t2  on  t1.code = t2.code 

 where  t2.date2 >= t1.date1) t 

 where rn <= 3

Actually you can achieve it with SPL (Structured Process Language). The language has set data type and offers locating records by their positions, making it convenient to handle order-based operations with intuitive and uniform code. It also applies to databases without support of window functions.

A

1

$(db1)select t1.code Code,   t1.date1 Date1, t1.price1 Price1, t2.date2 Date2, t2.price2 Price2 from   t0070_1 t1 join t0070_2 t2 on t1.code = t2.code where t2.date2 >= t1.date1   order by t1.code,t1.date1,t2.date2

2

=A1.group(Code,Date1).(~.m([1,2,3]).select(~)).conj()