
blackduckie RaqForum 28 No.
409 View • 5 Years ago

How to Parse a Large Text File Fast


【Question】

What’s the most efficient way to parse a large text file?

I’m trying to optimize some code. I have to open a large text file, match each line against a regular expression, and then process the results.

I’ve tried the simple approaches:

for line in my_file:
    match = my_regx.match(line)
    process(match.groups())

and

data = my_file.read().splitlines()
for line in data:
    # etc.

Neither is terribly speedy. Does anyone have a better method?

【Answer】

Multithreaded parallel processing can speed up the matching, but writing parallel code in Python is complicated, and splitting the file into segments by byte offset is a headache. SPL (Structured Process Language) makes all of this much easier. It divides file1.txt into multiple segments, gives each thread one segment to match against the regular expression and pick out the eligible rows, then concatenates the results and exports them to a text file. Below is the SPL script:

	A
1	=file("D:\\file1.txt")
2	=A1.cursor@m(;4).(~.array().concat())
3	=A2.regex(".*smile.*")
4	=file("D:\\result.txt").export(A3)
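
For readers who want to stay in Python, the segment-and-match idea described above can be sketched with the standard multiprocessing module. This is only a sketch under assumptions, not a drop-in equivalent of the SPL script: the function names chunk_bounds, match_chunk and parallel_grep are made up for illustration, the pattern simply looks for the word smile, and the default of 4 workers mirrors the 4 in cursor@m(;4).

```python
import os
import re
from multiprocessing import Pool

# Assumed pattern for illustration: any line containing "smile".
PATTERN = re.compile(rb"smile")


def chunk_bounds(path, n_chunks):
    """Return (start, end) byte ranges that begin and end on line boundaries."""
    size = os.path.getsize(path)
    bounds = [0]
    with open(path, "rb") as f:
        for i in range(1, n_chunks):
            f.seek(size * i // n_chunks)
            f.readline()  # move forward to the start of the next line
            pos = f.tell()
            if bounds[-1] < pos < size:
                bounds.append(pos)
    bounds.append(size)
    return list(zip(bounds[:-1], bounds[1:]))


def match_chunk(task):
    """Scan one byte range of the file and return the lines that match."""
    path, start, end = task
    hits = []
    with open(path, "rb") as f:
        f.seek(start)
        pos = start
        while pos < end:
            line = f.readline()
            pos += len(line)
            text = line.rstrip(b"\r\n")
            if PATTERN.search(text):
                hits.append(text)
    return hits


def parallel_grep(path, n_workers=4):
    """Match every segment in a separate process; results keep file order."""
    tasks = [(path, s, e) for s, e in chunk_bounds(path, n_workers)]
    with Pool(n_workers) as pool:
        per_chunk = pool.map(match_chunk, tasks)
    return [line for chunk in per_chunk for line in chunk]
```

Call parallel_grep("D:\\file1.txt") from under an if __name__ == "__main__": guard, which multiprocessing needs on platforms that start workers by spawning a fresh interpreter. Processes are used here rather than threads because regex matching in CPython runs under the global interpreter lock; whether the extra processes pay off depends on file size and disk speed, so it is worth timing against the plain loop.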

Regular expressions are versatile but relatively slow. If the matching rule is simple, the like function can be used instead to speed up the process. For example:

	A
1	=file("D:\\file1.txt")
2	=A1.cursor@m(;4)
3	=A2.select(like(#1,"*smile*"))
4	=file("D:\\result.txt").export(A3)
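
The same trade-off exists in Python: when the rule is just "the line contains this text", a plain substring test avoids the regex engine altogether. The sketch below is an assumption-laden illustration, not part of the original answer; filter_lines is a hypothetical helper, and the needle "smile" and the UTF-8 encoding are placeholders.

```python
def filter_lines(in_path, out_path, needle="smile"):
    """Copy lines containing `needle` from in_path to out_path; return the count."""
    count = 0
    with open(in_path, encoding="utf-8") as src, \
            open(out_path, "w", encoding="utf-8") as dst:
        for line in src:  # streams the file one line at a time
            if needle in line:  # plain substring test, no regex involved
                dst.write(line)
                count += 1
    return count
```

For example, filter_lines("D:\\file1.txt", "D:\\result.txt") would write the matching lines to the result file. A literal substring test is often cheaper than a regex search, but the gain is worth measuring on the actual data.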

esProc SPL comes with a rich function library for implementing a wide range of algorithms, including grouping and aggregation, ranking and sorting, association operations, multi-file queries, and merge queries.

SPL Official Website 👉 https://www.scudata.com

SPL Feedback and Help 👉 https://www.reddit.com/r/esProc_SPL

SPL Learning Material 👉 https://c.scudata.com

SPL Source Code and Package 👉 https://github.com/SPLWare/esProc

Discord 👉 https://discord.gg/cFTcUNs7

Youtube 👉 https://www.youtube.com/@esProc_SPL
