Combine Matching Records from Two Files
【Question】
I have a file that looks like this:
>Unc14086
AGAGUUUGAU
>Unc35443
GCACGAGAAA
Every n lines (n may vary), a line starting with “>” marks the beginning of a new block of information.
I have another tab-delimited file:
Unc14086 InformationalTextExample
Unc35443 InformationalTextExampleII
My goal is to look up, in the second file, the IDs found on the lines starting with “>” in the first file. Whenever a match occurs, I want to append the matching text (e.g. “InformationalTextExample”) to that line, separated by “_”:
>Unc14086_InformationalTextExample
AGAGUUUGAU
>Unc35443_InformationalTextExampleII
GCACGAGAAA
How would that be possible?
Thank you!
【Answer】
A Perl solution would be clear but long. esProc SPL’s loop functions give a more concise solution. Here’s the SPL script:
|   | A |
|---|---|
| 1 | =file("one.txt").read@n() |
| 2 | =file("another.txt").import() |
| 3 | =A1.(if(left(~,1)!=">",~,A2.select@1(mid(A1.~,2)==_1).(">"+_1+"_"+_2))) |
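A1 reads the first file as a sequence of lines, and A2 imports the tab-delimited file, whose two columns get the default names _1 and _2. A3 loops over the lines of A1: a line that does not start with “>” is kept as is; otherwise the first row of A2 whose _1 equals the ID after “>” is selected, and the line is rebuilt as “>” + _1 + “_” + _2.
See SQL Headaches Therapies - For Loop Operations to learn about more use cases of esProc loop functions.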
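For readers who want to compare the approach outside esProc, the same lookup-and-rewrite logic can be sketched in Python. This is not part of the original answer, only a minimal illustration under stated assumptions: the function names `load_annotations` and `annotate_headers` are hypothetical, the sample data is the one from the question, and a header line whose ID has no match is left unchanged (an assumption based on the "whenever a matching pair occurs" wording).

```python
import csv

def load_annotations(tsv_lines):
    """Map each ID (column 1) to its text (column 2) from tab-delimited lines."""
    annotations = {}
    for row in csv.reader(tsv_lines, delimiter="\t"):
        if len(row) >= 2:  # skip blank or malformed rows
            annotations[row[0]] = row[1]
    return annotations

def annotate_headers(lines, annotations, sep="_"):
    """Append the matching text to every '>' header line; keep other lines as is."""
    result = []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            key = line[1:]  # the ID is everything after '>'
            if key in annotations:  # assumption: unmatched headers stay unchanged
                line = f">{key}{sep}{annotations[key]}"
        result.append(line)
    return result

# Sample data taken from the question, held in memory instead of files.
sequence_lines = [">Unc14086", "AGAGUUUGAU", ">Unc35443", "GCACGAGAAA"]
info_lines = [
    "Unc14086\tInformationalTextExample",
    "Unc35443\tInformationalTextExampleII",
]

for out_line in annotate_headers(sequence_lines, load_annotations(info_lines)):
    print(out_line)
```

To run it against real files, pass open file objects instead of the in-memory lists, for example `open("one.txt")` and `open("another.txt", newline="")`. Building a dictionary first means each header is matched with a single lookup rather than a scan of the second file.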