8.2 Primary key association
There are seven tables: store information table, order table, order detail table, payment collection table, store online rating table, store offline rating table, and store assessment score table. Perform the following calculations:
Store information table
storeid | state | … |
---|---|---|
s101 | California | … |
s102 | New York | … |
s103 | Florida | … |
… | … | … |
Order table
storeid | orderid | client | location | … |
---|---|---|---|---|
s101 | o10001 | c104231 | New York | … |
s101 | o10002 | c107032 | New York | … |
s101 | o10003 | c108875 | New York | … |
… | … | … | … | … |
Order detail table
storeid | orderid | productid | price | … |
---|---|---|---|---|
s101 | o10001 | p1078 | 1273 | … |
s101 | o10002 | p1091 | 131 | … |
s101 | o10002 | p1008 | 780 | … |
… | … | … | … | … |
Payment collection table
storeid | orderid | term | payment | … |
---|---|---|---|---|
s101 | o10001 | 1 | 318.25 | … |
s101 | o10001 | 2 | 318.25 | … |
s101 | o10001 | 3 | 318.25 | … |
… | … | … | … | … |
Store online rating table
storeid | online_score | … |
---|---|---|
s101 | 4 | … |
s102 | 1 | … |
s103 | 5 | … |
… | … | … |
Store offline rating table
storeid | offline_score | … |
---|---|---|
s101 | 10 | … |
s102 | 2 | … |
s103 | 2 | … |
… | … | … |
Store assessment score table
storeid | test_score | … |
---|---|---|
s101 | 4 | … |
s102 | 10 | … |
s103 | 8 | … |
… | … | … |
1. Homo-dimension association - calculate the score given by client for a store (online score*0.4 + offline score*0.6)
2. Multiple homo-dimension tables - calculate the total score for a store (store assessment score*0.6+online score*0.1+offline score*0.3)
3. Primary-sub tables - query the total consumption of clients of each state in the store s150
4. Multi-layer primary-sub tables – aggregate the total sales of stores in each state
5. One primary table and multiple sub tables - find the order whose payments are not fully collected
SPL
A | B | |
---|---|---|
1 | =file(“online_score.csv”).import@tc() | |
2 | =file(“offline_score.csv”).import@tc() | |
3 | =join@f(A1:online,storeid;A2:offline,storeid) | /Homo-dimension association |
4 | =A3.new(if(online,online.storeid,offline.storeid):storeid,online.online_score:online_score,offline.offline_score:offline_score,0.4*online_score+0.6*offline_score:total) | |
5 | =file(“test_score.csv”).import@tc().keys(storeid) | |
6 | =join@f(A1:online,storeid;A2:offline,storeid;A5:test) | /Multiple homo-dimension tables |
7 | =A6.new([online.storeid,offline.storeid,test.storeid].max():storeid,online.online_score:online_score,offline.offline_score:offline_score,test.test_score:test_score,0.6*test_score+0.1*online_score+0.3*offline_score:total) | |
8 | =file(“Detail.csv”).import@tc() | |
9 | =file(“Orders.csv”).import@tc() | |
10 | =A8.select(storeid==“s150”) | |
11 | =A9.select(storeid==“s150”) | |
12 | =A10.groups(orderid;sum(price):sprice) | |
13 | =join(A11:order,orderid;A12:consume,orderid) | /Primary-sub association |
14 | =A13.groups(order.location;sum(consume.sprice):consume) | |
15 | =file(“Store.csv”).import@tc() | |
16 | =A8.groups(storeid,orderid;sum(price):sprice) | |
17 | =join(A9:order,[storeid,orderid];A16:detail_sum) | /Multi-layer primary-sub tables |
18 | =A17.groups(order.storeid;sum(detail_sum.sprice):sales) | |
19 | =join(A15:store,storeid;A18:store_amount) | |
20 | =A19.groups(store.state;sum(store_amount.sales):sales) | |
21 | =file(“Payment.csv”).import@tc() | |
22 | =A21.groups(storeid,orderid;sum(payment):payment) | |
23 | =join(A9:order,[storeid,orderid];A16:sprice;A22:payment) | /One primary table & multiple sub tables |
24 | 1e-6 | |
25 | =A23.select(sprice.sprice-payment.payment>A24).(order) |
SQL
1. Homo-dimension association
SELECT COALESCE(o.storeid, f.storeid) AS storeid,
(COALESCE(o.online_score, 0)*0.4 + COALESCE(f.offline_score, 0)*0.6) AS customer_score
FROM online_score o
FULL JOIN offline_score f ON o.storeid = f.storeid;
2. Multiple homo-dimension tables
SELECT COALESCE(t.storeid, o.storeid, f.storeid) AS storeid,
(COALESCE(t.test_score, 0)*0.6 + COALESCE(o.online_score, 0)*0.1 +
COALESCE(f.offline_score, 0)*0.3) AS total_score
FROM test_score t
FULL JOIN online_score o ON t.storeid = o.storeid
FULL JOIN offline_score f ON t.storeid = f.storeid;
3. Primary-sub tables
SELECT o.location, SUM(d.price) AS total_consumption
FROM Orders o
JOIN Detail d ON o.orderid = d.orderid AND o.storeid = d.storeid
WHERE o.storeid = 's150'
GROUP BY o.location;
4. Multi-layer primary-sub tables
SELECT s.state, SUM(d.price) AS total_sales
FROM Store s
JOIN Orders o ON s.storeid = o.storeid
JOIN Detail d ON o.orderid = d.orderid AND o.storeid = d.storeid
GROUP BY s.state;
5. One primary table and multiple sub tables
SELECT o.storeid,o.orderid, p. total_payment,d. total_price
FROM Orders o
JOIN (
SELECT p.storeid, p.orderid, SUM(p.payment) AS total_payment
FROM Payment p
GROUP BY p.storeid, p.orderid) p
ON o.storeid = p.storeid AND o.orderid = p.orderid
JOIN (
SELECT d.storeid, d.orderid, SUM(d.price) AS total_price
FROM Detail d
GROUP BY d.storeid, d.orderid) d
ON o.storeid = d.storeid AND o.orderid = d.orderid
WHERE d.total_price-p.total_payment>0.000001
ORDER BY o.storeid,o.orderid;
Python
#Homo-dimensionassociation
onscore=pd.read_csv("../online_score.csv",index_col='storeid')
ofscore=pd.read_csv("../offline_score.csv",index_col='storeid')
scores=pd.merge(onscore,ofscore,on="storeid",how="outer").fillna(0)
scores['total']=0.6*scores['offline_score']+0.4*scores['online_score']
#Multiplehomo-dimensiontables
tscore=pd.read_csv('../test_score.csv',index_col='storeid')
store_score=pd.concat([onscore,ofscore,tscore],axis=1,join='outer',sort=True).fillna(0).reset_index()
store_score['total']=0.6*store_score.test_score+0.3*store_score.offline_score+0.1*store_score.online_score
#Primary-subtables
detail=pd.read_csv("../Detail.csv")
order=pd.read_csv("../Orders.csv")
detail150=detail.query("storeid=='s150'")
order150=order.query("storeid=='s150'")
order_detail=pd.merge(order150,detail150,on="orderid",how="inner")
loc_consume=order_detail.groupby("location").price.sum()
#Multi-layerprimary-subtables
store=pd.read_csv("../Store.csv")
detail_sum=detail.groupby(['storeid','orderid']).price.sum()
order_detai=pd.merge(order,detail_sum,on=['storeid','orderid'])
store_amount=order_detai.groupby('storeid',as_index=False).price.sum()
store_inf=pd.merge(store,store_amount,on=["storeid"])
state_sale=store_inf.groupby('state',as_index=False).price.sum()
#Oneprimarytableandmultiplesubtables
pay=pd.read_csv("../Payment.csv")
detail_order=detail.groupby(['storeid','orderid']).price.sum()
pay_order=pay.groupby(['storeid','orderid']).payment.sum()
order=order.set_index(['storeid','orderid'])
order_d_p=order.join([pay_order,detail_order])
nopay_order=order_d_p.query('price-payment>1e-6')
8.3 Mixed association
Example codes for comparing SPL, SQL, and Python
SPL Official Website 👉 https://www.scudata.com
SPL Feedback and Help 👉 https://www.reddit.com/r/esProcSPL
SPL Learning Material 👉 https://c.scudata.com
SPL Source Code and Package 👉 https://github.com/SPLWare/esProc
Discord 👉 https://discord.gg/cFTcUNs7
Youtube 👉 https://www.youtube.com/@esProc_SPL