Text Replacement after Multi-file Association

Question

I have a file with annotations in the following format:

XS-5236245.2_hypothetical_protein

And a tab-delimited BLAST report with only the accession ID in the second column:

transcript1 XS-5236245.2 94.3 35 0 245 356 789 896 1e-230 6.3

I want to replace the accession ID in the BLAST report with the whole line from the annotations file when there is a match. This is my attempt; as you can see, I use very basic Python. If you give me a more complex solution, I would appreciate some explanation. Thank you for your help.

Linu

#!/usr/bin/python
import sys

# input1 = sys.argv[1] --> file with annotations
# input2 = sys.argv[2] --> file with blast report
# output = sys.argv[3] --> modified blast report with annotations

f1 = open(sys.argv[1], "r")
f2 = open(sys.argv[2], "r")
f3 = open(sys.argv[3], "w")

# open and read line by line:
for line in f1:
    # break line by '_'
    splitline = line.split("_")
    # define search_id as the first element of the line
    searchid = splitline[0]
    # open blast report and read line by line
    for row in f2:
        # split columns by tab separator
        col = row.split("\t")
        # define target_id as the content of the second column
        targetid = col[1]
        # when target_id matches search_id replace content with the whole line
        if searchid == targetid:
            f3.write(targetid.replace(searchid, splitline))
        else:
            pass

f1.close()
f2.close()
f3.close()

 

Answer

A general-purpose scripting language gives only weak built-in support for structured data processing, so this kind of file association is cumbersome to code there. It is much easier to get it done in SPL (Structured Process Language):

 

     A
1    =file("annotations.txt").import().derive(left(_1,pos(_1,"_")-1):key)
2    =file("blastreport.txt").import()
3    >A2.switch(_2,A1:key)
4    =A2.new(_1,_2._1,_3,_4,_5,_6,_7,_8,_9,_10,_11)
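
Since the question asks for Python with some explanation, here is a minimal sketch of the same association in plain Python. It is not part of the SPL script above. The function names `load_annotations` and `annotate_report` are made up for illustration. It assumes that the accession ID is everything before the first underscore, and that rows without a match should be kept unchanged. The main difference from the original attempt is that the annotations are read once into a dictionary, so the report is scanned only once. In the nested-loop version the report file object is exhausted after the first annotation line, so later annotations are never compared against anything.

```python
import sys


def load_annotations(lines):
    """Map each accession ID (text before the first '_') to its full annotation line."""
    annotations = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        accession = line.split("_", 1)[0]
        annotations[accession] = line
    return annotations


def annotate_report(report_lines, annotations):
    """Yield report rows with column 2 replaced by the full annotation when it matches."""
    for row in report_lines:
        cols = row.rstrip("\n").split("\t")
        if len(cols) > 1:
            # Assumption: rows without a matching accession ID are left unchanged.
            cols[1] = annotations.get(cols[1], cols[1])
        yield "\t".join(cols)


# Usage: script.py annotations.txt blastreport.txt output.txt
if __name__ == "__main__" and len(sys.argv) == 4:
    with open(sys.argv[1]) as ann_file:
        annotations = load_annotations(ann_file)
    with open(sys.argv[2]) as report, open(sys.argv[3], "w") as out:
        for line in annotate_report(report, annotations):
            out.write(line + "\n")
```

The dictionary lookup plays roughly the role of the key-based association in rows 1 and 3 of the SPL script, and the rebuilt tab-separated line corresponds to row 4.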