Extract Data from a Specifically-patterned TXT File to Export to Excel
Problem description
Below is a text file (item.txt) storing quotation data:
Above the horizontal line there is the complex table header. Below it are the quotation records separated by empty rows. Each record occupies a row. The picture above shows only the table header and the quotation records. There are consecutive same areas in the text file. The areas enclosed in red box are Unit Price field and Exp. Date field with Quotation Number, Customer Code and Customer Name fields in between. Columns are separated by spaces.
We want to extract the quotation information from the file and store it to an Excel file in the following way:
Directions:
1. Examine the text file to find its data characteristics:
(1) Rows containing less than 136 characters don’t have valid information and thus can be skipped;
(2) The valid data settles in columns of 59~139 for every row;
(3) Split the valid values in each row by spaces. If the first split value is numeric type, then this is a quotation record; otherwise it can be skipped. The first split value is under Unit Price column, the second one under Quotation Number column, the third one under Customer Code, and the last under Contract Expiry Date column. Customer Name column is generated by joining the split values from the fourth to the second to last.
Each text file has its own specific data pattern and characteristics. But they must have some pattern to be parsed and analyzed.
2. Start esProc
Download esProc DSK edition and free license HERE. You will be prompted to load the license file when you run esProc for the first time
Write a script in esProc:
You see the code clearer in this way:
A |
B |
C |
|
1 |
=create(Customer_Code,Customer_Name,Quotation_No,Unit_Price,Contract_Expiry_Date) |
||
2 |
=file("E:/txt2excel/item.txt").read@n() |
||
3 |
for A2 |
if len(A3)<136 |
next |
4 |
=right(left(A3,136),-58) |
=B4.split@tp() |
|
5 |
if !ifnumber(C4(1)) |
next |
|
6 |
=C4.m(4:C4.len()-1).concat(" ") |
||
7 |
>A1.insert(0,C4(3),B6,C4(2),C4(1),C4(C4.len())) |
||
8 |
=file("E:/txt2excel/item.xlsx").xlsexport@t(A1) |
A1 Create the target data set.
A2 Open the quotation text file item.txt to read its content in; @n option enables reading each row as one string.
A3 Process each row circularly according to the above specific patterns.
B3C3 If the current row is shorter than 136 characters, just skip it;
B4 Extract values from column 59 to column 136 from the current row.
C4 Split B4’s extracted values by spaces; @t option enables trimming spaces on both sides, and @p option enables parsing each string after split into right data type.
B5C5 If the first split value in C4 isn’t numeric type, then skip the current row.
B6 Use spaces to join values from the fourth column to the second to last split by C4 into a string.
B7 Insert the 3rd value split by C4, the 2nd , the 2st and the last values split in B6and C4 into A1’s empty records in order.
A8 Export the extracted data table to Excel file item.xlsx.
3. Press F9 to execute the above program. Then you can view the newly-generated file in E:/txt2excel/item.xlsx.
【Reference】txt2excel.zip
SPL Official Website 👉 https://www.scudata.com
SPL Feedback and Help 👉 https://www.reddit.com/r/esProc_Desktop/
SPL Learning Material 👉 https://c.scudata.com
Discord 👉 https://discord.gg/2bkGwqTj
Youtube 👉 https://www.youtube.com/@esProcDesktop
Linkedin Group 👉 https://www.linkedin.com/groups/14419406/