SPL read and write files

This article introduces the basic methods of SPL to read and write text files and Excel files, including simple file directory operations.

Text file

Read and write a structured text

The format of the structured text is relatively standardized, that is, one record per line and the columns are separated by separators. SPL can read and write a structured text with import/export functions.

For example, ordersNT.txt stores order records, and the columns are separated by tabs. The business meanings are order ID, customer number, sales ID, order amount, and order date. Part of the data is as follows:

26    TAS  1       2142.4     2009-08-05

33    DSGC        1       613.2       2009-08-14

84    GC    1       88.5 2009-10-16

133  HU   1       1419.8     2010-12-12

32    JFS   3       468.0       2009-08-13

39    NR   3       3016.0     2010-08-21

43    KT    3       2169.0     2009-08-27

Sort the table from small to large in alphabetical order of the customer number, and then sort the same customer number from large to small according to the order amount, and finally save the original format into a new file. Part of the results should be as follows:

136  ARO 25    899.0       2009-12-16

16    BDR 27    2464.8     2009-07-23

81    BDR 29    1168.0     2010-10-14

108  BDR 12    480.0       2010-11-15

139  BDR 30    166.0       2010-12-18

93    BON 6       2564.4     2010-10-29

106  BSF  27    10741.6   2009-11-13

To calculate the above results, the following SPL script can be used:


A

1

=file("D:/data/ordersNT.txt").import()

2

=A1.sort(_2,-_4)

3

=file("D:/data/ordersNT_sort.txt").export(A2)

A1, A3: Read in and write out structured text files.

A2: The function sort performs sorting. _2 and _4 represent the second and fourth columns, respectively. The default sorting is in order, and the negative sign means reverse order. 

 

SPL can also handle text files with column names (titles). For example, in the first row of orders.txt, the column names, some of the data are as follows:

OrderID   Client       SellerId    Amount   OrderDate

26    TAS  1       2142.4     2009-08-05

33    DSGC        1       613.2       2009-08-14

84    GC    1       88.5 2009-10-16

133  HU   1       1419.8     2010-12-12

32    JFS   3       468.0       2009-08-13

39    NR   3       3016.0     2010-08-21

43    KT    3       2169.0     2009-08-27

Sort the file in the same way, and write a new file with column names as a result. The following SPL script can be used:


A

1

=file("D:/data/orders.txt").import@t()

2

=A1.sort(Client,-Amount)

3

=file("D:/data/orders_sort.txt").export@t(A2)

 

A1, A3: Option @t means to read in and write out text files with column names.

A2: Sort by column name instead of the sequence number.

 

The default separator of the import/export function is tab, and the option @c means to use a comma as the separator (usually used in csv files). If you encounter other special separators, SPL can also handle them. For example, orders_semi.txt uses || as the separator, and the following SPL script can be used:


A

1

=file("D:/data/orders_semi.txt").import@t(;,"||")

2

=A1.select(Amount>=1000 &&   Amount<2010)

3

=file(""D:/data/orders_semi_select.txt").export@t(A2;"||")

 

The export function's default function is to write data to a new file or overwrite a file with the same name, but sometimes we need to append new data with the same structure to the original file. In this case, use the @a option:

=file(""D:/data/orders_semi_select.txt").export@at(A2;"||")

 

Read and write text sequence

Some texts are not standardized and cannot be directly structured, but they can be read as a sequence of texts. There are many such semi-structured data formats. The following uses a multi-line record to illustrate the general method of SPL to read and write text sequences.

 

The first two lines of every three lines in the file 3lines.txt correspond to a record, and the third line is useless. Part of the data is as follows:

26    TAS  1       2142.4

2009-08-05

some comment

33    DSGC        1       613.2

2009-08-14

some comment

27    TAS  1       2142.4

2009-08-05

some comment

Remove the useless lines in the file, and write the result into a new file. The following SPL script can be used:


A

1

=file("D:/data/3lines.txt").read@n()

2

=A1.step(3,1,2)

3

=file("D:/3lines_reuslt.txt").write(A5)

 

A1: Read in a text file. @n means to read a sequence by line, and each member of the sequence corresponds to a line.

A2: Take the first member and the second member every three members of the sequence A1.

A3: Write the sequence into a text file. Each member of the sequence corresponds to one line.

 

Excel file

Read and write structured tables

The structured Excel table is more standardized, and the SPL can be read and written with the xlsimport/xlsexport function.

For example, the business meaning of each column of the first sheet in ordersNT.xlsx is order ID, customer number, sales ID, order amount, and order date. Part of the data is as follows:

 


A

B

C

D

E

1

26

TAS

1

2142.4

2009/8/5

2

33

DSGC

1

613.2

2009/8/14

3

84

GC

1

88.5

2009/10/16

4

133

HU

1

1419.8

2010/12/12

5

32

JFS

3

468

2009/8/13

6

39

NR

3

3016

2010/8/21

7

43

KT

3

2169

2009/8/27

 

Sort the table from small to large in alphabetical order of the customer number, and then sort the same customer number from large to small according to the order amount, and finally save the original format to the new Excel.

 

To calculate the above results, the following SPL script can be used:


A

1

=file("D:/data/ordersNT.xlsx").xlsimport()

2

=A1.sort(_2,-_4)

3

=file("D:/data/ordersNT_sort.xlsx").xlsexport(A2)

 

A1, A3: Read and write the first sheet of excel. If you want to read the specified sheet, you can use xlsimport (;sheet number or sheet name). If you want to write into the specified sheet, you can use xlsexport (A2; sheet number or sheet name)

 

SPL can also handle structured tables with column names (titles) in a similar way to text files. For example, some data of orders.xlsx is as follows:


A

B

C

D

E

1

OrderID

Client

SellerId

Amount

OrderDate

2

26

TAS

1

2142.4

2009/8/5

3

33

DSGC

1

613.2

2009/8/14

4

84

GC

1

88.5

2009/10/16

5

133

HU

1

1419.8

2010/12/12

6

32

JFS

3

468

2009/8/13

7

39

NR

3

3016

2010/8/21

8

43

KT

3

2169

2009/8/27

Sort the file, and write the result into a new file with column names. You can use the following SPL script:


A

1

=file("D:/data/orders.xlsx").xlsimport@t()

2

=A1.sort(Client,-Amount)

3

=file("D:/data/orders_sort.xlsx").xlsexport@t(A2)

Sometimes the first few rows of the table are useless and need to be skipped, such as the following data:


A

B

C

D

E

1

orders from 2009 to 2010

2

3






4

OrderID

Client

SellerId

Amount

OrderDate

5

26

TAS

1

2142.4

2009/8/5

6

33

DSGC

1

613.2

2009/8/14

7

84

GC

1

88.5

2009/10/16

8

133

HU

1

1419.8

2010/12/12

9

32

JFS

3

468

2009/8/13

 

The following SPL script can be used to read from line 4:

=file("D:/data/orders.xlsx").xlsimport@t(;,4)

 

Sometimes we need to append new data with the same structure to the original table. In this case, we only need to use the @a option, like:

=file(""D:/data/orders_sort.xlsx").xlsexport@at(A2)

 

If the appearance attribute is set for the last row of the original table with content, the additional data will inherit the style of the row. For example, the style of column D of the original table is #,##0.00, and the style of column E is mmm-dd-yyyy. The following table:


A

B

C

D

E

1

OrderID

Client

SellerId

Amount

OrderDate

2

26

TAS

1

2,142.40

Aug-05-2009

3

33

DSGC

1

613.20

Aug-14-2009

4

84

GC

1

88.50

Oct-16-2009

After adding data, the results are as follows:


A

B

C

D

E

1

OrderID

Client

SellerId

Amount

OrderDate

2

26

TAS

1

2,142.40

Aug-05-2009

3

33

DSGC

1

613.20

Aug-14-2009

4

84

GC

1

88.50

Oct-16-2009

5

133

HU

1

1,419.80

Dec-12-2010

6

32

JFS

3

468.00

Aug-13-2009

If the style attribute is set in the first blank row after the last row with content in the original table, the additional data will inherit the style attribute of the row. Using this feature, we can output data from scratch in a specified format. For example, create a blank Excel first, set the style of column D in row 2 to #,##0.00, and the style of column E to mmm-dd-yyyy.

 


A

B

C

D

E

1

OrderID

Client

SellerId

Amount

OrderDate

2






Then append data to the blank table, and the results are as follows:


A

B

C

D

E

1

OrderID

Client

SellerId

Amount

OrderDate

2

133

HU

1

1,419.80

Dec-12-2010

3

32

JFS

3

468.00

Aug-13-2009

 

Read and write two-dimensional text sequence

Some Excel formats are not standardized enough and have no clear column attributes. In this case, they can be treated as a two-dimensional string sequence.

For example, the following key-value data:


A

B

C

D

E

1



A=123

B=456

C=789

2

A=678

B=783

A=900

U=89


3

A=330

Y=67

B=890

C=311

F=19

Now we need to split the above data into two columns by key and value, sort them by key and value, and finally write them into a new Excel. The results are as follows:


A

B

1

A

123

2

A

330

3

A

678

4

A

900

5

B

456

6

B

783

7

B

890

8

C

311

9

C

789

10

F

19

11

U

89

12

Y

67

To calculate the above results, the following SPL script can be used:


A

1

=file("D:/data/keyvalue.xlsx").xlsimport@w()

2

=A1.conj().select(~)

3

=A2.(~.split("="))

4

=A3.sort(~(1),~(2))

5

=file("D:/data/keyvalue_result.xlsx").xlsexport@w(A4)

 

A1: Read into Excel, @w means read as a two-dimensional text sequence, the whole is a large sequence, each row is a member of the large sequence, but also a small sequence; each cell in the row is a member of the small sequence.

A2: Combine the two-dimensional sequence into a one-dimensional sequence, and remove possible blank cells, such as A1, B1, and E2.

A3: Split the text sequence into key and value.

A4: Sort by key and value.

A5: Write the result into a new Excel. @w represents the sequence of the written sequence.

 

Read and write cells

The previous content is to read and write Excel in units of tables or sequences, and sometimes we also read and write Excel in units of fine cells.

For example, the following table has the editor and edit date in line 1.


A

B

C

D

E

1

editor:emily




date:Dec-30-2011

2

OrderID

Client

SellerId

Amount

OrderDate

3

26

TAS

1

2,142.40

Aug-05-2009

4

33

DSGC

1

613.20

Aug-14-2009

5

84

GC

1

88.50

Oct-16-2009

6






7






Copy the editor and edit the date to the corresponding position on line 7. The result should be as follows.


A

B

C

D

E

1

editor:emily




date:Dec-30-2011

2

OrderID

Client

SellerId

Amount

OrderDate

3

26

TAS

1

2,142.40

Aug-05-2009

4

33

DSGC

1

613.20

Aug-14-2009

5

84

GC

1

88.50

Oct-16-2009

6






7

editor:emily




date:Dec-30-2011

To calculate the above results, the following SPL script can be used:


A

B

1

=file("D:/data/cell.xlsx")


2

=A1.xlsopen()


3

=str=A2.xlscell("A1")

=A2.xlscell("A7";str)

4

=str=A2.xlscell("E1")

=A2.xlscell("E7";str)

5

=A1.xlswrite(A2)


 

A2: Open the Excel file as an object.

A3: Read the A1 cell and assign the variable str. The default is to read from the first sheet. If you want to read the A1 cell in the specified sheet, you can use A2.xlscell("A1", sheet serial number or sheet name)

B3: Write cell A1 into cell A7. Similarly, if you want to write cell A7 in the specified sheet, you can use A2.xlscell("A7", sheet serial number or sheet name; str)

A4-B4: Read E1 and write to E7

A5: Write the result into the Excel file.

In the above example, the A1-E1 that needs to be read are continuous cells, and the A7-E7 that needs to be written is also continuous cells. For this continuous cell reading and writing, SPL can be implemented with a more simplified code. For example, the code in the above example can be changed to:


A

1

=file("cell.xlsx")

2

=A1.xlsopen()

3

=arry=A2.xlscell@w("A1":"E1")

4

=A2.xlscell("A7":"E7";arry)

5

=A1.xlswrite(A2)

A3: Read consecutive cells in sequence format

A4: Write a sequence to consecutive cells. Each member of the sequence corresponds to a cell. You can either use a sequence to write data to consecutive cells or use a string separated by \t or \r, where \t means horizontal(columnar) separation, and \r means vertical (cross-line) separation.

 

Multi-sheet processing

Using Excel objects, not only can read and write cells but also can process multiple sheets, as illustrated below.

An Excel uses multiple sheets to store order. Each sheet has the same format, but the number and name are variable. Now we need to sort these orders into the new Excel, and each sheet will store one year's data.

To calculate the above results, the following SPL script can be used:


A

1

=file("orders_sheet.xlsx").xlsopen()

2

=A1.(stname).(A1.xlsimport@t(;~)).conj()

3

=A2.group(string(year(OrderDate)):name;~:content)

4

=file("orders_result.xlsx").xlsopen@w()

5

=A3.(A4.xlsexport@t(~.content;string(~.name)))

6

=A4.xlsclose()

 

A1: Open the source Excel file as an object.

A2: Traverse each sheet, read the order of each sheet, and then merge all the orders. A1. (stname) means to take out all sheet names from the Excel object A1.

A3: Group the orders by year.

A4: Open the target Excel file as an object. @w means write mode, and a new file will be created if the file does not exist.

A5: Traverse each group (yearly) orders of A3 and write them into the new sheet of A4 in turn.

A6: The Excel object opened in @w mode must be closed with the function xlsclose.

 

Files and directories

Parse file name

The function filename can parse out the different parts of the file name:


A

B

1

=filename("D://file/test.dfx")

Output file name with extension: test.dfx

2

=filename@e("D://file/test.dfx")

Output extension: dfx

3

=filename@n("D://file/test.dfx")

Output file name without extension: test

4

=filename@d("D://file/test.dfx")

Output path: D://file

For each part of a known file name, you can use the concat function to spell out the full path. such as:


A

B

1

=concat("D://file/","test",".dfx")

Output full path: D://file/test.dfx

 

Iterate through the files

There are many Excel files in a certain directory. The first sheet of these files stores order data and has the same structure. Now you want to merge these orders into a new Excel.

To calculate the above results, the following SPL script can be used:


A

1

=direcotory@p("d:/data/*.xlsx")

2

=A1.conj(file(~).xlsimport@t())

3

=file("d:/result.xlsx").xlsexport@t(A2)

 

A1: Search for all file names with the extension xlsx in the directory. @p means to return the full path.

A2: Read in Excel circularly according to the file name, and then merge the data.

The function directory has more functions, such as using the @s option to search subdirectories recursively, using the @d option to list subdirectories, using @r to delete directories, and @m to create directories.

 

System directory

In the previous example, we used the full path to access the data file. If the main path in the system directory is configured, the main path can be used as the root directory, and the relative path is used to access the data file. The specific configuration interface is shown in the figure below:

undefined

For example, when the main path is not configured, the script is: file("d:/data/p/orders.xlsx").xlsimport()

 

Configure main path=d:\data, the script can be written as:file("p/orders.xlsx").xlsimport()

 

If the main path is not configured and the relative path is used directly, the actual main path is the directory where esProc is started. When esProc is started directly (or via shortcut), the directory is [esProc installation directory\bin]. When double-clicking the dfx file to start esProc indirectly, the directory is the directory where the dfx file is located. Use the following script to verify the main path: filename@p("")

 

In addition to the main path, esProc has other important system directories.

temp: The directory where the calculation engine stores temporary files. If it is not set, the operating system temporary directory will be used by default.

searching path: The root directory of the script file, including the main path, multiple directories can be set, and the directories are separated by semicolons.