Multi-CSV Association
【Question】
I have two CSV files as follows:
CSV1:
**ID Name Address Ph**
1 Mr.C dsf 142
2 Ms.N asd 251
4 Mr.V fgg 014
12 Ms.S trw 547
CSV2:
**ID Name Service Day**
1 Mr.C AAA Mon
2 Ms.N AAA Mon
2 Ms.N BBB Tue
2 Ms.N AAA Sat
As you can see very quickly CSV1 file is unique in having only 1 instance of every ID whilst CSV2 has repeats.
I am trying to match two CSV files based on ID and then wherever they match adding to CSV2 file the Address and Ph fields from CSV1. This is then saved as a new output file leaving the two original CSV files intact.
I have written a code but here’s what’s happening:
1. Either all the entries from CSV1 get added against the last row of CSV2
2. Or all the entries from CSV2 get the same address details appended against them
Here’s what I have done so far:
import csv
csv1=open('C:\csv1file.csv')
csv2=open('C:\csv2file.csv')
csv1reader=csv.reader(csv1)
csv2reader=csv.reader(csv2)
outputfile=open('C:\mapped.csv', 'wb')
csvwriter=csv.writer(outputfile)
counter=0
header1=csv1reader.next()
header2=csv2reader.next()
csvwriter.writerow(header2+header1[2:4])
for row1 in csv1reader:
for row2 in csv2reader:
if row1[0]==row2[0]:
counter=counter+1
csvwriter.writerow(row2+row1[2:4])
I am running this code in Python 2.7. As you might have guessed the two different results that I am getting are based on the indentation of the CSV writer statement in the above code. I feel I am quite close to the answer and understand the logic but somehow the loop doesn’t work very well.
Is there any idea about this? Thanks.
Here’s a solution offered by someone else, but there’s a bug in it:
csvwriter.writerow(header2+header1[2:4])
csv2copy=[]
for row2 in csv2reader: csv2copy.append(row2)
for row1 in csv1reader:
for row2 in csv2copy:
print row1,row2,counter
if row1[0]==row2[0]:
counter=counter+1
csvwriter.writerow(row2+row1[2:4])
【Answer】
This is a typical left join. It’s roundabout to achieve it by loop in Python. But it’s simple and easy to understand to do it in SPL (Structured Process Language):
A |
|
1 |
=file("D:\\csv1.csv").import@t() |
2 |
=file("D:\\csv2.csv").import@t() |
3 |
>A2.switch(ID,A1:ID) |
4 |
=A2.new(ID.ID,Name,Service,Day,ID.Address,ID.Ph) |
5 |
=file("D:\\result.txt").export@t(A4) |
A3: Replace A2’s ID field by the corresponding records in A1 by matching it with A1’s ID field.
A4: Get the desired field from the new ID values to get the final result.
SPL Official Website 👉 https://www.scudata.com
SPL Feedback and Help 👉 https://www.reddit.com/r/esProcSPL
SPL Learning Material 👉 https://c.scudata.com
SPL Source Code and Package 👉 https://github.com/SPLWare/esProc
Discord 👉 https://discord.gg/2bkGwqTj
Youtube 👉 https://www.youtube.com/@esProc_SPL