Large Text File Processing
【Question】
I am new to Revolution R and am trying to open a large CSV file of 13 GB. It is a dataset from a Kaggle competition. R is not able to open it, so I turned to Revolution R Enterprise. How can I read a CSV file on my system, convert it to XDF format, and load it into Revolution R Enterprise for further analysis?
My file path is "C:\Users\admin\Desktop\Kaggle\dog_1_both_marked.csv".
I tried something like this but got an error.
sampleDataDir <- rxGetOption("Kaggle")
inputFile <- file.path("C:\\Users\\admin\\Desktop\\Kaggle\\dog_1_both_marked.csv", "dog_1_both_marked.csv")
outputFile <- file.path(tempdir(), "basicClaims.xdf")
rxTextToXdf(inFile = inputFile, outFile = outputFile, overwrite = TRUE)
rxGetInfo(data = outputFile, getVarInfo = TRUE, numRows = 100000)
file.remove(outputFile)
【Answer】
R can read a large text file segment by segment and process the segments in parallel, but the code is complicated and performs poorly, because R is designed for mathematical and statistical computing rather than structured data processing. A better tool for this is SPL (Structured Process Language). The examples below show common structured computations on the file in SPL. The field names they use (BIRTHDAY, GENDER, DEPT, SALARY) are placeholders; substitute the columns of your own file.
1. Open a large text file with cursor
|   | A |
|---|---|
| 1 | =file("C:\Users\admin\Desktop\Kaggle\dog_1_both_marked.csv").cursor@t() |
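For readers who do not know SPL, the idea behind a cursor can be sketched in Python: rows are pulled from the file in batches and the whole file is never loaded into memory. This is only an illustration of the technique, not SPL's implementation. The helper names `open_cursor` and `fetch` and the sample columns are hypothetical, and a small in-memory string stands in for the 13 GB file.

```python
import csv
import io
from itertools import islice

def open_cursor(lines):
    """Return a lazy row iterator; the first line supplies the field names."""
    return csv.DictReader(lines)

def fetch(cursor, n):
    """Pull at most n rows from the cursor, i.e. one batch."""
    return list(islice(cursor, n))

# Hypothetical sample data standing in for the large file.
# For a real file, pass open(path, newline="") instead of the StringIO object.
sample = io.StringIO("ID,NAME\n1,Ann\n2,Bob\n3,Cy\n")

cursor = open_cursor(sample)
first_batch = fetch(cursor, 2)   # rows 1 and 2
second_batch = fetch(cursor, 2)  # only row 3 is left
```

Only one batch is held in memory at a time, so the batch size, not the file size, bounds memory use.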
2. Data query
|   | A |
|---|---|
| 1 | =file("C:\Users\admin\Desktop\Kaggle\dog_1_both_marked.csv").cursor@t() |
| 2 | =A1.select(BIRTHDAY>=date(1981,1,1) && GENDER=="F") |
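The same filter can be sketched in Python as a generator that tests one row at a time, so rows that do not match are discarded as soon as they are read. This is an illustration under assumptions, not the SPL mechanism: the function name `select_rows` is hypothetical, the sample rows are made up, and BIRTHDAY is assumed to be stored in ISO format (YYYY-MM-DD).

```python
import csv
import io
from datetime import date

def select_rows(rows, cutoff):
    """Lazily yield rows with GENDER == "F" and BIRTHDAY on or after cutoff."""
    for row in rows:
        birthday = date.fromisoformat(row["BIRTHDAY"])  # assumes ISO dates
        if birthday >= cutoff and row["GENDER"] == "F":
            yield row

# Hypothetical sample data standing in for the large file
sample = io.StringIO(
    "NAME,BIRTHDAY,GENDER\n"
    "Ann,1985-03-02,F\n"
    "Bob,1990-07-11,M\n"
    "Cat,1979-12-31,F\n"
    "Dee,1981-01-01,F\n"
)

rows = csv.DictReader(sample)
matches = [row["NAME"] for row in select_rows(rows, date(1981, 1, 1))]
```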
3. Grouping & aggregation
|   | A |
|---|---|
| 1 | =file("C:\Users\admin\Desktop\Kaggle\dog_1_both_marked.csv").cursor@t() |
| 2 | =A1.groups(DEPT:dept;count(~):count,sum(SALARY):salary) |
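Grouping and aggregation over a stream works because only one running total per group has to be kept, not the rows themselves. A minimal Python sketch of that idea follows. The function name `group_totals` and the sample data are hypothetical, and it assumes the number of distinct DEPT values is small enough to fit in memory.

```python
import csv
import io
from collections import defaultdict

def group_totals(rows):
    """Count rows and sum SALARY per DEPT in a single pass over the rows."""
    totals = defaultdict(lambda: {"count": 0, "salary": 0.0})
    for row in rows:
        acc = totals[row["DEPT"]]
        acc["count"] += 1
        acc["salary"] += float(row["SALARY"])
    # Return a plain dict ordered by department name
    return dict(sorted(totals.items()))

# Hypothetical sample data standing in for the large file
sample = io.StringIO(
    "DEPT,SALARY\n"
    "Sales,1000\n"
    "HR,2000\n"
    "Sales,1500\n"
)

result = group_totals(csv.DictReader(sample))
```

Memory use grows with the number of groups rather than with the number of rows, which is why this pattern suits files that are too large to load.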
4. Sorting
|   | A |
|---|---|
| 1 | =file("C:\Users\admin\Desktop\Kaggle\dog_1_both_marked.csv").cursor@t() |
| 2 | =A1.sortx(BIRTHDAY) |
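Sorting is the one operation here that cannot be done in a single streaming pass, since the smallest row might be the last one read. A common approach for data larger than memory is an external merge sort: sort one chunk at a time, spill each sorted chunk to a temporary file, then merge the sorted files. The Python sketch below illustrates that general technique and is not a description of how `sortx` works internally. The function name `external_sort`, the chunk size, and the sample data are hypothetical, and BIRTHDAY is assumed to be in ISO format so that string order matches date order.

```python
import csv
import heapq
import io
import tempfile
from itertools import islice

def external_sort(rows, key, chunk_size):
    """Yield rows in sorted order while holding only chunk_size rows at once."""
    runs = []
    rows = iter(rows)
    while True:
        chunk = sorted(islice(rows, chunk_size), key=key)
        if not chunk:
            break
        # Spill the sorted chunk to a temporary file (a "run")
        run = tempfile.TemporaryFile("w+", newline="")
        csv.writer(run).writerows(chunk)
        run.seek(0)
        runs.append(run)
    # heapq.merge reads the sorted runs lazily and yields rows in order
    yield from heapq.merge(*(csv.reader(run) for run in runs), key=key)
    for run in runs:
        run.close()

# Hypothetical sample data standing in for the large file
sample = io.StringIO(
    "NAME,BIRTHDAY\n"
    "Ann,1985-03-02\n"
    "Bob,1990-07-11\n"
    "Cat,1979-12-31\n"
    "Dee,1981-01-01\n"
    "Eve,1983-06-15\n"
)

reader = csv.reader(sample)
header = next(reader)  # keep the header out of the sort
ordered = [row[0] for row in external_sort(reader, key=lambda row: row[1], chunk_size=2)]
```

The chunk size is set to 2 only to force several runs on the tiny sample; for a real file it would be as large as memory comfortably allows.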
··· ···
For more examples, please refer to Text Files.