Key Associations between Multiple Files

Question

I want to match column 1 of file 1 with column 1 of file 2 and then column 2 of file 1 with column 1 of file 3 and then print the matches. The columns in the files are separated by tabs. For example:

file 1:

 

fji01dde AIDJFMGKG

 

dlp02sle VMCFIJGM

 

cmr03lsp CKEIFJ

 

file 2:

 

fji01dde 25  30

 

dlp02sle 40  50

 

cmr03lsp 60  70

 

file 3:

 

AIDJFMGKG

 

CKEIFJ

 

Output:

 

fji01dde AIDJFMGKG 25  30

 

cmr03lsp CKEIFJ 60  70

 

I only want lines that are common in all three files.

The below code works well for the first two files, but I need to incorporate the third file. Any ideas?

#!/usr/bin/env perl

 

use strict;

 

my (%file1,%file2);

 

## Open the 1st file

 

open(A,"file1");

 

while(<A>){

 

 chomp;

 

 ## Split the current line on tabs into the @F array.

 

 my  @F=split(/\t/);

 

 push @{$file1{$F[0]}},@F[1..$#F];

 

}

 

## Open the 2nd file

 

open(B,"file2");

 

while(<B>){

 

 chomp;

 

 ## Split the current line on tabs into the @F array.

 

 my  @F=split(/\t/);

 

 if (defined($file1{$F[0]})) {

 

 foreach  my $col (@{$file1{$F[0]}}) {

 

 print  "$F[0]\t$col\t@F[1..$#F]\n";

 

 }

 

 }

 

}

 

What would the output be if file3 did not have CKEIFJ

It would be just fji01dde AIDJFMGKG 25 30

 

A solution:

use strict;

 

use warnings;

 

use  Text::CSV_XS qw(csv);

 

my  @csv_files = @ARGV;

 

# Parse all the CSV files into arrays of arrays.

 

my $data1 = csv(in => $csv_files[0], sep_char => "\t" );

 

# Parse the other CSV files into hashes of rows keyed on the columns we're going to search on.

 

my $data2 = csv(in => $csv_files[1],

 

sep_char => "\t",

 

 headers => ["code", "num1", "num2"\],

 

key => "code"

 

 );

 

my $data3 = csv(in => $csv_files[2],

 

sep_char => "\t",

 

 headers => \["CODE"],

 

key => "CODE"

 

 );

 

for  my $row1 (@$data1) {

 

 my $row2 = $data2->{$row1->[0]};

 

 my $row3 = $data3->{$row1->[1]};

 

 if($row2 && $row3) {

 

 print join "\t", $row1->[0], $row1->[1], $row2->{num1}, $row2->{num2};

 

 print  "\n";

 

 }

 

}

 

Answer
This is a typical structured computation. The relationship between file1’s column1 and file2’s column1is a left join; and that between file1’s column2 and file3’s column1 is also a left join. A language that supports structured algorithm can describe the relationship clearly and perform the joins easily. Yet Perl lacks related functions. It is able to achieve this but the code is hard to understand.

Here we handle your question with SPL (Structured Process Language). The code is simple and easy, as shown below:

 

A

1

=file("D:/file1.txt").import()

2

=file("D:/file2.txt").import()

3

=file("D:/file3.txt").import()

4

=A1.switch(#1,A2:#1)

5

=A3.switch(#1,A1:#2)

6

=A5.new(#1.#1.#1, #1.#2,   #1.#1.#2, #1.#1.#3)

A4-A5: Create table associations.

A6: Retrieve data by the associated fields. #1.#1.#1 represents file3’s column1. It points to file1’s column1 and then file2’s column1.

undefined