Join Two CSV Files through the Key Field

We have two CSV files:

data1.csv:

id

date

amount

1

2019/9/8

3000

1

2019/10/8

1500

2

2019/11/20

1560

2

2008/5/20

2060

data2.csv:

id

name

age

location

1

smith

32

NY

2

jack

24

LA

We want to join them through id field:

id

date

amount

name

age

location

1

2020/9/1

3000

smith

32

NY

1

2020/2/3

1500

smith

32

NY

2

2020/8/8

1560

jack

25

LA

2

2020/12/17

2060

jack

25

LA

It’s easy to do the join using esProc. Download esProc installation package and free license file HERE.

1. Write script joincsv.dfx in esProc:

A

B

1

=file("data1.csv").import@ct()

/ Import data1.csv

2

=file("data2.csv").import@ct().keys(id)

/ Import data2.csv and set id field as the primary key

3

=A1.join(id,A2,name,age,location)

/ Join A1 with A2 through A1’s   foreign key, the id field

4

=file("data3.csv").export@ct(A3)

/ Export the joining result   to data3.csv

2. Execute the esProc script to get the result CSV file data3.csv.

Usually data1.csv can be too large to fit into the memory, but data2.csv can be small enough to be wholly loaded into the memory. If that’s the case, we just need to modify A1 as =file("data1.csv").cursor@ct(), which uses cursor to do the loading work.