"【 Question 】 I have a file in the following format: 45 a 36 a 58 a 21 a .."

blackduckie RaqForum 28 No.
463 View • 4 Years ago

Text File Grouping & Aggregation

text(125)

【Question】

I have a file in the following format:

45 a

36 a

58 a

21 a

31 b

12 b

23 b

21 c

32 c

45 c

…

I need to first count lines containing value a, output the result to a target file and write values of the first field in these lines after that. Then I go on to handle lines containing value b and then, lines containing value c, in the same way.

Expected result:

a 4

b 3

c 3

…

What I did is to traverse the file to count lines respectively containing a, b, and c, hold the results in a temporary file, and then re-traverse the file and read the temporary file to write the desired values to target file in the specified format. I am wondering if there’s any simple method to do this, like traversing the source file once only.

【Answer】

It’s simple grouping & aggregation. High-level languages lack class libraries that handle SQL-like calculations. So it’s very inconvenient to write the code directly based on a text file. It’s easy to handle such a calculation in SPL (Structured Process Language), however.

If your file can fit into the memory, we just need to traverse it once. Below is the SPL script:

	A	B
1	=file("E:\\source.txt")	=file("E:\\result.txt")
2	=A1.import().group(_2)
3	for A2	>B1.write@a(A3._2+"\t"+string(A3.len()))
4		>B1.export@a(A3,_1)

A1,B1: Open the to-be-handled file and the target export file.

A2: Import source.txt and group and sort records by the second column.

undefined

A3-B4: Loop through each member of A2’s sequence (which is A3’s current value) to count its records, generate a new record to write it to result.txt, and export values of A3’s first column to result.txt.

If the file is too large to be loaded into memory, you need a temporary file to hold the intermediate result. SPL has the built-in mechanism to auto-generate a temporary file. The script for handling a large file is similar:

	A	B
1	=file("E:\\source.txt")	=file("E:\\result.txt")
2	=A1.cursor().sortx(_2)
3	for A2;_2	>B1.write@a(A3._2+"\t"+string(A3.len()))
4		>B1.export@a(A3,_1)

A1,B1: Open the to-be-handled file and the target export file.

A2: Import source.txt as a cursor and sort its records by the second column. sortx() function will generate a temporary file to hold the sorting result.

A3-B4: Loop through A2’s sequence to read in records having same second column value at a time, generate a new record to write it to result.txt, and export values of A3’s first column to result.txt.

SPL Official Website 👉 https://www.scudata.com

SPL Feedback and Help 👉 https://www.reddit.com/r/esProc_SPL

SPL Learning Material 👉 https://c.scudata.com

SPL Source Code and Package 👉 https://github.com/SPLWare/esProc

Discord 👉 https://discord.gg/cFTcUNs7

Youtube 👉 https://www.youtube.com/@esProc_SPL

text(125)

Application

blackduckie • 463 View • 4 Years ago

Text File Grouping & Aggregation

【Question】

ToC