1.1 Text file

 

1.1.1 Structured text file

The format of structured text is relatively regular, that is, there is one piece of data in each line, and the columns are separated by separators. SPL can use the function import/export to read and write the structured text.

For example: the file ordersNT.txt stores the order information, and the columns are separated by the tab. The business meanings of every column sequentially are: order ID, customer NO, sales ID, order amount and order date. Part of the data is as follows:

26 TAS 1 2142.4 2009-08-05  
33 DSGC 1 613.2 2009-08-14  
84 GC 1 88.5 2009-10-16  
133 HU 1 1419.8 2010-12-12  
32 JFS 3 468.0 2009-08-13  
39 NR 3 3016.0 2010-08-21  
43 KT 3 2169.0 2009-08-27  
…

The following is the partial result after the table is processed in a way that first sort the orders in ascending order alphabetically by customer NO, and then sort the orders with same customer NO in descending order by order amount, and finally save it to a new file in the original format:

136 ARO 25 899.0 2009-12-16  
16 BDR 27 2464.8 2009-07-23  
81 BDR 29 1168.0 2010-10-14  
108 BDR 12 480.0 2010-11-15  
139 BDR 30 166.0 2010-12-18  
93 BON 6 2564.4 2010-10-29  
106 BSF 27 10741.6 2009-11-13  
…

SPL script:

A
1 =file(“D:/data/ordersNT.txt”).import()
2 =A1.sort(_2,-_4)
3 =file(“D:/data/ordersNT_sort.txt”).export(A2)

A1, A3: Read in and write out structured text file.

A2: The function sort is to sort; _2 and _4 represent the 2nd and 4th columns respectively; By default, they are sorted in ascending direction, and the negative sign represents the descending direction.

SPL can also process the text file with column name (title). For example, the first line of orders.txt is the column name, and part of the data are as follows:

OrderID Client SellerId Amount OrderDate
26 TAS 1 2142.4 2009-08-05
33 DSGC 1 613.2 2009-08-14
84 GC 1 88.5 2009-10-16
133 HU 1 1419.8 2010-12-12
32 JFS 3 468.0 2009-08-13
39 NR 3 3016.0 2010-08-21
43 KT 3 2169.0 2009-08-27

Likewise, sort this file and write the result to a new file together with column name:

A
1 =file(“D:/data/orders.txt”).import@t()
2 =A1.sort(Client,-Amount)
3 =file(“D:/data/orders_sort.txt”).export@t(A2)

A1, A3: The option @t means the text file is read and written together with column name.

A2: Sort by column names rather than sequence number.

The default separator of the function import/export is tab, and the option @c means that the comma is used as the separator (usually used in csv files). If other special separators are encountered, SPL can also handle.

For example, the orders_semi.txt uses || as the separator:

A
1 =file(“D:/data/orders_semi.txt”).import@t(;,“”)
2 =A1.select(Amount>=1000 && Amount<2010)
3 =file(“D:/data/orders_semi_select.txt”).export@t(A2;“”)

The default function of the function export is to write the data to a new file, or to overwrite the file with the same name, but sometimes we need to append new data with the same structure to the original file, in this case, we can use the option @a:

=file("D:/data/orders\_semi\_select.txt").export@at(A2;"||")

1.1.2 String sequence

The format of some text files is not regular, and it cannot directly perform the structured calculation. Such files, however, can be read as a sequence of strings. The formats of such semi-structured data are numerous, let’s take multi-line data as an example to illustrate the general method of reading and writing the string sequence in SPL.

The first 2 lines of every 3 lines in file 3lines.txt correspond to one piece of data, and the 3rd line is useless. Part of the data is as follows:

26 TAS 1 2142.4  
2009-08-05  
some comment  
33 DSGC 1 613.2  
2009-08-14  
some comment  
27 TAS 1 2142.4  
2009-08-05  
some comment

Remove the useless line from the file and write the result to a new file:

A
1 =file(“D:/data/3lines.txt”).read@n()
2 =A1.step(3,1,2)
3 =file(“D:/3lines_reuslt.txt”).write(A5)

A1: Read the text file. @n means reading as a sequence by line, and each member of the sequence corresponds to a line.

A2: Take the first member and the second member for every three members of sequence A1.

A3: Write the sequence to a text file, and each member of the sequence corresponds to a line.