Combine Matching Records from Two Files

 

Question
I have a file that looks like this:

>Unc14086
AGAGUUUGAU
>Unc35443
GCACGAGAAA

 

A line starting with “>” marks the beginning of a new block of information. The number of lines between two such lines (n) may vary.

I have another tab-delimited file:

Unc14086  InformationalTextExample
Unc35443  InformationalTextExampleII

 

My goal is to look up, in the second file, the ID found in each line of the first file that starts with “>”. Whenever a match occurs, I want to append the corresponding text (for example “InformationalTextExample”) to that line, possibly separated by “_”:

>Unc14086_InformationalTextExample
AGAGUUUGAU
>Unc35443_InformationalTextExampleII
GCACGAGAAA

How would that be possible?

Thank you!

 

Answer

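A Perl solution is clear but long. esProc SPL’s loop functions give a more concise solution. In the SPL script below, A1 reads the first file as a sequence of lines, A2 imports the tab-delimited second file, and A3 loops over the lines of A1, replacing each line that starts with “>” with the matching ID and text from A2 joined by “_”: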

 

     A
1    =file("one.txt").read@n()
2    =file("another.txt").import()
3    =A1.(if(left(~,1)!=">",~,A2.select@1(mid(A1.~,2)==_1).(">"+_1+"_"+_2)))

See SQL Headaches Therapies - For Loop Operations for more use cases of esProc loop functions.
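
For comparison, the same lookup can be sketched in Python. This is only a minimal illustration, not part of the original answer: the function names are hypothetical, the ID is assumed to contain no whitespace, and header lines with no match in the second file are left unchanged (a choice made for this sketch).

```python
def load_annotations(lines):
    """Build a dict mapping each ID (first column) to its text (second column)."""
    annotations = {}
    for line in lines:
        # Split on the first run of whitespace (tab or spaces); skip malformed lines.
        fields = line.strip().split(None, 1)
        if len(fields) == 2:
            annotations[fields[0]] = fields[1]
    return annotations


def annotate_headers(lines, annotations, sep="_"):
    """Append the matching text to every line that starts with '>'."""
    result = []
    for line in lines:
        line = line.rstrip("\r\n")
        if line.startswith(">"):
            key = line[1:].strip()
            if key in annotations:
                line = ">" + key + sep + annotations[key]
        # Sequence lines and unmatched headers pass through unchanged.
        result.append(line)
    return result


def annotate_file(seq_path, ann_path, out_path):
    """File-based wrapper around the two helpers above (paths are supplied by the caller)."""
    with open(ann_path, encoding="utf-8") as fh:
        annotations = load_annotations(fh)
    with open(seq_path, encoding="utf-8") as fh:
        annotated = annotate_headers(fh, annotations)
    with open(out_path, "w", encoding="utf-8") as fh:
        fh.write("\n".join(annotated) + "\n")
```

With the file names used in the SPL script, a call could look like annotate_file("one.txt", "another.txt", "annotated.txt"), where "annotated.txt" is a hypothetical output name. Loading the second file into a dict first means each header line is resolved with a single lookup instead of a scan over all annotation rows.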