A Join between a Text File & JSON Text

Question

I have a tab-delimited text file A (representing a BLAST output):

Name1BBBBBBBBBBBB 99.40 166  1 0 1 166  334  499  3e-82 302

 

Name2DDDDDDDDDDDD 98.80 167  2 0 1 167  346  512  4e-81 298

And a text file B (representing a phylogenetic dendrogram) looking like:

{

 

 "member": {

 

 "Cluster A": "BBBBBBBBBBBB This is Animal A",

 

 },

 

 "name": "Cluster A"

 

 },

 

  {

 

 "member": {

 

 "Cluster B": "DDDDDDDDDDDD This is Animal B"

 

 },

 

 "name": "cluster B"

 

 }

 

I want to take the string found in the 2nd tab of text file A (DDDDDDDDDDDD for example) and look it up in text file B. The script should then add the info found in text file B into a new tab of text file A :

Name1BBBBBBBBBBBB 99.40 166  1 0 1 166  334  499  3e-82 302  Cluster A This is Animal A

 

Name2DDDDDDDDDDDD 98.80 167  2 0 1 167  346  512  4e-81 298  Cluster B This is A

 

Answer

If we can read in the two files as two tables, then a SQL join can handle your question. Unfortunately both Perl and Shell don’t offer functions for conversion of text files into tables. It’s rather complicated to hardcode the process. In this case I choose to solve your problem with esProc SPL (Structured Process Language):

 

A

1

=json(file("json.txt").read())

2

=A1.new(#1.name:name,#1.(#1):cluster,(firstblank=pos(cluster," "),left(cluster,firstblank-1)):key,right(cluster,len(cluster)-firstblank):value)

3

=file("file.txt").import()

4

=join(A3,_2;A2,key).new(_1._1,_1._2,_1._3,_1._4,_1._5,_1._6,_1._7,_1._8,_2.name,_2.value)

A1: Read in the JSON text as a table.

undefined

A2: Split each record in A1 according to the tab and generate a new table sequence:

undefined

A3: Read in file.txt as a table:

undefined

A4: Cross join between A2 and A3.

undefined