Get Specified Data from a Large File

Question

How do I write a PHP script to scrape data from a large file?

 

Answer

To retrieve and process a large file in PHP, set the file pointer with the fseek function, then use fread to load one segment of data into memory at a time and process it. The algorithm itself is not complicated, but queries involving multiple conditions, grouping and aggregation, or dynamic conditions are hard to code by hand. Large files also often call for multithreading to speed up the computation, which makes the code even harder to write.
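The segment-by-segment approach can be sketched as follows. This is a minimal illustration in Python rather than PHP (seek and read play the roles of fseek and fread), and it rests on assumptions not stated in the question: the file is tab-separated UTF-8 text with a header row, it has columns named BIRTHDAY and GENDER, and dates are written as YYYY-MM-DD. The function names are hypothetical.

```python
from datetime import date


def iter_lines_chunked(path, chunk_size=1 << 20, start=0):
    """Yield text lines from `path`, reading `chunk_size` bytes at a time."""
    with open(path, "rb") as f:
        f.seek(start)                    # position the file pointer (PHP: fseek)
        tail = b""                       # partial line carried over from the previous chunk
        while True:
            chunk = f.read(chunk_size)   # load one segment into memory (PHP: fread)
            if not chunk:
                break
            pieces = (tail + chunk).split(b"\n")
            tail = pieces.pop()          # the last piece may be an incomplete line
            for piece in pieces:
                yield piece.decode("utf-8").rstrip("\r")
        if tail:                         # final line without a trailing newline
            yield tail.decode("utf-8").rstrip("\r")


def select_employees(path, chunk_size=1 << 20):
    """Yield rows (as dicts) where GENDER is "F" and BIRTHDAY >= 1981-01-01."""
    lines = iter_lines_chunked(path, chunk_size)
    header_line = next(lines, None)
    if header_line is None:              # empty file: nothing to select
        return
    header = header_line.split("\t")
    b, g = header.index("BIRTHDAY"), header.index("GENDER")  # assumed column names
    cutoff = date(1981, 1, 1)
    for line in lines:
        fields = line.split("\t")
        if len(fields) != len(header):   # skip blank or malformed lines
            continue
        # Assumes ISO dates (YYYY-MM-DD); adjust the parsing for other formats.
        if fields[g] == "F" and date.fromisoformat(fields[b]) >= cutoff:
            yield dict(zip(header, fields))
```

Because both functions are generators, only one segment and one matching row are in memory at a time; the caller can write each row out as it arrives instead of collecting the whole result.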

In this case, you can try SPL (Structured Process Language). An SPL script is more concise and executes fast. For example, to find the female employees born on or after January 1, 1981 in a large file employee.txt, you can use the following SPL script:

     A
1    =file("D:/employee.txt").cursor@t()
2    =A1.select(BIRTHDAY>=date(1981,1,1) && GENDER=="F")
3    =A2.fetch()

If the query condition is not known in advance, A2's code can be A1.select(${where}), where the condition is passed in through the parameter where.

If the query result is expected to be too large to hold in memory, A3 can be file("D:/result.txt").export(A2), which writes the result set directly to a file.

You can use multiple threads to increase performance. See Parallel Computing to learn more about parallel processing.
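To give a rough idea of what a hand-written multithreaded scan involves, here is a minimal Python sketch. It is not SPL's parallel mechanism. It assumes a tab-separated UTF-8 file with a header row, columns named BIRTHDAY and GENDER, and YYYY-MM-DD dates; all function names are hypothetical. The file is cut into byte ranges that begin on line boundaries, and each thread scans one range.

```python
import os
from concurrent.futures import ThreadPoolExecutor
from datetime import date


def split_ranges(path, parts):
    """Return (header, ranges); each range is (start, end) on line boundaries."""
    size = os.path.getsize(path)
    with open(path, "rb") as f:
        header = f.readline().decode("utf-8").rstrip("\r\n").split("\t")
        data_start = f.tell()
        cuts = [data_start]
        step = max(1, (size - data_start) // parts)
        for i in range(1, parts):
            pos = data_start + i * step
            if pos >= size:
                break
            f.seek(pos)
            f.readline()                 # move forward to the start of the next line
            if cuts[-1] < f.tell() < size:
                cuts.append(f.tell())
        cuts.append(size)
    return header, list(zip(cuts[:-1], cuts[1:]))


def scan_range(path, start, end, header):
    """Collect matching rows from the lines that start inside [start, end)."""
    b, g = header.index("BIRTHDAY"), header.index("GENDER")  # assumed column names
    cutoff = date(1981, 1, 1)
    hits = []
    with open(path, "rb") as f:
        f.seek(start)
        while f.tell() < end:
            raw = f.readline()
            if not raw:
                break
            fields = raw.decode("utf-8").rstrip("\r\n").split("\t")
            if (len(fields) == len(header) and fields[g] == "F"
                    and date.fromisoformat(fields[b]) >= cutoff):
                hits.append(dict(zip(header, fields)))
    return hits


def parallel_select(path, workers=4):
    """Scan the file with `workers` threads; results keep the file's row order."""
    header, ranges = split_ranges(path, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = pool.map(lambda r: scan_range(path, r[0], r[1], header), ranges)
        return [row for part in parts for row in part]
```

This version collects every match in memory, so it only suits result sets that fit there. In CPython, threads mainly overlap file I/O; for CPU-bound parsing, a process pool is the usual substitute.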