"【 Question 】 I need to extract all hits from one list (list.txt) which can be found in one of th .."

blackduckie RaqForum 28 No.
343 View • 4 Years ago

Get Rows Having Values That Are Included in a Column of Another Data Set

【Question】

I need to extract all hits from one list (List.txt) that can be found in one of the columns of another file (the here column in Data.txt) into a third file (output.txt).

Data.txt (tab-delimited):

some_data more_data other_data here yet_more_data etc
A B 2 Gee;Whiz;Hello 13 12
A B 2 Gee;Whizz;Hi 56 32
E 4 Btm;Lol 16 2
T 3 Whizz 13 3

List.txt:

Gee
Whiz
Lol

Ideally output.txt looks like:

some_data more_data other_data here yet_more_data etc
A B 2 Gee;Whiz;Hello 13 12
A B 2 Gee;Whizz;Hi 56 32
E 4 Btm;Lol 16 2

So I tried a shell script, with the words from List.txt typed out (cut and pasted, actually) in the loop:

for ids in Gee Whiz Lol
do
    grep "$ids" Data.txt >> output.txt
done

Unfortunately, the resulting output.txt also included the last line of Data.txt (T 3 Whizz 13 3), presumably because ‘Whizz’ contains ‘Whiz’.

I also tried cat Data.txt | egrep -F "List.txt", which resulted in "grep: conflicting matchers specified"; I suppose that was too naive of me. The actual List.txt contains a sorted list of 985 words, and Data.txt has 115576 rows with 17 columns.

【Answer】

To get this done, split the here field of each row in Data.txt into a set of values and intersect that set with the words in List.txt; a row is kept when the intersection is not empty. Set operations are somewhat awkward to handle in shell, so here is a solution in SPL (Structured Process Language):

	A
1	=file("/Data.txt").import@t()
2	=file("/List.txt").read@n()
3	=A1.select(here.array(";")^A2!=[])

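The “^” operator computes the intersection of two sets, and “[]” represents an empty set, so A3 keeps only the rows whose here values share at least one member with the list in A2.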
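For readers who prefer to stay in the shell, the same membership test can be sketched with awk. This is an illustrative alternative, not part of the SPL answer. It assumes that Data.txt is tab-delimited with a header row containing a column literally named here, and that List.txt holds one word per line. The function name filter_rows is made up for this example.

```shell
#!/bin/sh
# filter_rows LIST DATA [COLUMN]
# Hypothetical helper: prints the header of DATA plus every row whose
# COLUMN (default "here") holds at least one ;-separated value that
# appears as a whole line in LIST. Assumes DATA is tab-delimited.
filter_rows() {
    awk -F '\t' -v col="${3:-here}" '
        # First file (LIST): remember each word as an array key.
        FILENAME == ARGV[1] { want[$0]; next }
        # Header of DATA: locate the target column by name, then print it.
        FNR == 1 {
            for (i = 1; i <= NF; i++) if ($i == col) idx = i
            if (!idx) { print "column not found: " col > "/dev/stderr"; exit 1 }
            print
            next
        }
        # Data rows: split the field on ";" and compare each value exactly.
        {
            n = split($idx, parts, ";")
            for (i = 1; i <= n; i++)
                if (parts[i] in want) { print; break }
        }
    ' "$1" "$2"
}
```

Under those assumptions it could be called as filter_rows List.txt Data.txt > output.txt. Because each ;-separated value is compared as a whole string rather than searched as a substring, a row containing only Whizz is not selected by the list entry Whiz.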

