Join Two CSV Files through the Key Field
We have two CSV files:
data1.csv:
id |
date |
amount |
1 |
2019/9/8 |
3000 |
1 |
2019/10/8 |
1500 |
2 |
2019/11/20 |
1560 |
2 |
2008/5/20 |
2060 |
data2.csv:
id |
name |
age |
location |
1 |
smith |
32 |
NY |
2 |
jack |
24 |
LA |
We want to join them through id field:
id |
date |
amount |
name |
age |
location |
1 |
2020/9/1 |
3000 |
smith |
32 |
NY |
1 |
2020/2/3 |
1500 |
smith |
32 |
NY |
2 |
2020/8/8 |
1560 |
jack |
25 |
LA |
2 |
2020/12/17 |
2060 |
jack |
25 |
LA |
It’s easy to do the join using esProc. Download esProc installation package and free license file HERE.
1. Write script joincsv.dfx in esProc:
A |
B |
|
1 |
=file("data1.csv").import@ct() |
/ Import data1.csv |
2 |
=file("data2.csv").import@ct().keys(id) |
/ Import data2.csv and set id field as the primary key |
3 |
=A1.join(id,A2,name,age,location) |
/ Join A1 with A2 through A1’s foreign key, the id field |
4 |
=file("data3.csv").export@ct(A3) |
/ Export the joining result to data3.csv |
2. Execute the esProc script to get the result CSV file data3.csv.
Usually data1.csv can be too large to fit into the memory, but data2.csv can be small enough to be wholly loaded into the memory. If that’s the case, we just need to modify A1 as =file("data1.csv").cursor@ct(), which uses cursor to do the loading work.
SPL Official Website 👉 https://www.scudata.com
SPL Feedback and Help 👉 https://www.reddit.com/r/esProc_Desktop/
SPL Learning Material 👉 https://c.scudata.com
Discord 👉 https://discord.gg/2bkGwqTj
Youtube 👉 https://www.youtube.com/@esProcDesktop
Linkedin Group 👉 https://www.linkedin.com/groups/14419406/