SPL: Text Handling

 

Text handling is an integral and indispensable part of any programming language. SPL (Structured Process Language) provides abundant functions for handling text. There are four categories – character handling, simple string handling, sequence-related handling, and complex analysis. They cover extracting a substring from a string, type judgment and conversion, character set handling, data type parsing, search and replacement, format handling, string generation/splitting/combination, the use of regular expression matching, extracting words and numbers, SQL statement analysis, a HTML analysis, etc.


1.  character handling

1.1   Type judgment

Task

Code

Return value

Check if all characters are numbers

isdigit(“12345”)

true

Check if all characters are letters

isalpha(“abcde”)

true

Check if all characters are lower-case letters

islower(“aBcd”)

false

Check if all characters are upper-case letters

isupper(“ABCD”)

true

 

1.2   Conversion

Task

Code

Return value

Convert all characters to lower-case

lower(“aB’cD’e”)

ab’cd’e

Skip a string put in quotes when converting to lower-case

lower@q(“aB’cD’e”)

ab’cD’e

Convert all characters to upper-case

upper(“aB’cD’e”)

AB’CD’E

Skip a string put in quotes when converting to upper-case

upper@q(“aB’cD’e”)

AB’cD’E

Convert row number and column number to an Excel-style cell name

cellname(8,32)

AF8

 

1.3   Character set handling

Task

Code

Return value

Get the Unicode value for the 2nd character

asc(“abc”,2)

98

Get the Unicode value for the 1st character

asc(“中国”)

20013

Get the corresponding character for the   Unicode value

char(98)

B

Get the corresponding character for the   Unicode value

char(22269)

Encode a URL string

urlencode("a=b+c>0?1:-1","UTF-8")

a%3Db%2Bc%3E0%3F1%3A-1

Decode an encoded URL string

urlencode@r(“a%3Db%2Bc”,”UTF-8”)

a=b+c

Base64-encode a byte array

base64(bytes)

ixb7x1dxfbxefxff

Convert to an MD5-encrypted string

md5(“abcde”)

AB56B4D92B40713ACC5AF89985D4B786

 

1.4   Data type parsing

Parse a string value into one of the corresponding data type.

Task

Code

Return value

Parse a numeric string into a number

number("1234.56")

1234.56

Parse a currency string into a number

number("$10,234.55","$#,###.#")

10234.55

Parse a time format string into a time object

parse("10:20:30")

A time object

Parse a sequence type string into a sequence

parse("[1,2,3]")

Sequence [1,2,3]

Remove quotes and escape characters at parsing

parse@e("\"a\tb\"")

a     b

Only parse the numeric characters beginning from the left-most one

parse@n("123adsdx234")

123

Parse until the second half the   quotation marks appears when a given string begins with the quotation marks

parse@q("\'sadsdxc\'+23")

sadsdxc

 

1.5   Format handling

Task

Code

Return value

Convert a numeric value to a corresponding format string

string(12345.678,”#,##0.00”)

12,345.68

Convert a date value to a corresponding   format string

string(date(“2020-03-15”),”MM/dd/yyyy”)

03/15/2020

Convert a time value to a corresponding   format string

string(time(“16:18:54”),”h:mm   a”)

4:18 PM

Convert a datetime value to a   corresponding format string

string(now(),”MM/dd/yyyy   HH:mm”)

2020-05-28 14:06

Enclose a string with quotation marks

string@q(“ab cd”)

“ab  cd”

Escape a character

string@e(“ab\”c   d”)

ab\"c\td

Escape a character, and convert characters in a large character set to Unicode values

string@u(“ab\”c   中国”)

ab\"c\t\u4E2D\u56FD

Parse a format specifier substring of a string according to the second parameter

format("The price is $%.2f",8.5)

The price is $8.50

 

2.   Simple string handling

2.1   Getting a substring

Get a substring from a source string. Related functions are mid(), left() and right(). Below are some examples:

Task

Code

Return value

Get a substring from the 3rd character

mid(”abcdef”,3)

cdef

Get a substring of two characters from the 3rd character

mid(”abcdef”,3,2)

cd

Get a substring of the leftmost three characters

left(”abcdef”,3)

abc

Get a substring by removing the last two characters

left(”abcdef”,-2)

abcd

Get a substring of the rightmost three characters

right(”abcdef”,3)

def

Get a substring by removing the first two characters

right(”abcdef”,-2)

cdef

substr(s1,s2)

The function searches source string s1 for the target substring s2, and returns the substring before or after s2 or null if s2 cannot be found.

Task

Code

Return value

Get the substring after a given   substring

substr(“abCDcdef”,”cd”)

ef

Get the substring before a given   substring

substr@l(“abCDcdef”,”cd”)

abCD

Get the case-insensitive substring after a given substring

substr@c(“abCDcdef”,”cd”)

cdef

Get a substring with a string put in quotes skipped

substr@q(“ab\”acd\”cdef”,”cd”)

ef

 

2.2   Search and matching

pos(s1,s2{,begin})

The function finds the position of the target substring s2 in source string s1, and returns the original position of s2 in s1 or null if s2 cannot be found.

Task

Code

Return value

The ordinary search

pos(”abcdef”,”aa”)

null

The ordinary search

pos(”abcDedefgh”,”de”)

6

Perform the search from the 5th character

pos(”abcdedefgh”,”de”,5)

6

Perform the search backwards

pos@z(”abcdedefgh”,”de”)

6

Perform a case-insensitive search

pos@c(”abcDedefgh”,”de”)

4

Search for the target string at the  beginning

pos@h(”abcdefgh”,”ab”)

1

Search for the target string at the tail end

pos@hz(”abcdefgh”,”fgh”)

6

Perform the search with a string put in quotes skipped

pos@q(”ab\”cde\”fcd”,”cd”)

9

Check if the source string matches a given format string

like("abc123", "abc*")

true

Case-insensitive for format string   matching

like@c("abc123", "ABC*")

true

 

2.3   Replacement

replace(s,a,b)

The function replaces substring a in source string s with string b and returns the new s.

Task

Code

Return value

The ordinary replacement

replace("abc’ab’deA","a","ss")

ssbc'ssb’deA

Replace the first-found target substring only

replace@1("abc’ab’deA","a","ss")

ssbc'ab’deA

The case-insensitive replacement

replace@c("abc’ab’deA","a","ss")

ssbc'ssb’dess

Perform the replacement with a string put in quotes skipped

replace@q("abc’ab’deA","a","ss")

ssbc'ab’deA

 

2.4   Others

Task

Code

Return value

Repeat a given string n times to   generate a new string

fill(“abc”,4)

abcabcabcabc

Delete white spaces at both ends of a given string

trim(" a     bc ")

"a   bc"

Delete white spaces on the left of a given string

trim@l(" a     bc ")

"a   bc "

Delete white spaces on the right of a given string

trim@r(" a     bc ")

" a   bc"

Delete unwanted white spaces from a given string

trim@a(" a     bc ")

"a bc"

Patch a string before an existing one to   extend the latter to the desired length

pad("Soth","Miss",10)

MissMiSoth

Patch a string after an existing one to extend the latter to the desired length

pad@r("Soth","er",8)

Sotherer

Generate a numeric string with the length of 1 randomly

rands(“0123456789”,11)

44238061662

 

3.   Sequence-related handling

3.1   Splitting

s.split(d)

The function splits string s into a sequence using delimiter d, or into a sequence of single characters when d is absent.

Task

Code

Return value

Split a given string using the vertical line

“aa|bb|cc”.split(“|”)

[aa,bb,cc]

Split a given string using the comma

"1,[a,b],(2,c),'5,6'".split@c()

[“1”,”[a,b]”,”(2,c)”,’5,6’]

Split a given string into two parts by the first-found delimiter

"1,[a,b],(2,c),'5,6'".split@c1()

[“1”,”[a,b],(2,c),’5,6’”]

Do not handle parentheses/brackets and quotation marks handling at splitting

"1,[a,b],(2,c),'5,6'".split@cb()

[“1”,”[a”,”b]”,”(2”,”c)”,”’5”,”6’”]

Parse each member in the result sequence as an object

"1,[a,b],(2,c),'5,6'".split@cp()

[1,[a,b],”(2,c)”,”5,6”]

Delete white spaces at both ends of each member in the result sequence

“192.168.0.3”.split@t(“.”)

[“192”,”168”,”0”,”3”]

Treat parameter d as a regular expression, and split the string, for instance, by digits

"a1b2c57d".split@r("\\d")

[“a”,”b”,”c”,””,”d”]

Split a given string by line break, and then perform the ordinary or the option-enabled splitting on each line

"s,a,y\ngood,morning".split@nc()

[[“s”,”a”,”y”],[“good”,”morning”]]

 

3.2   Concatenation

A.concat(d)

The function concatenates members of sequence A into a string through delimiter d, during which the sub-sequences are handled in the same way, or joins them up directly when d is absent.

In the following example functions, A1 is [1, ["a","b"],[2,"c"]].

Task

Code

Return value

Concatenate members without a delimiter

A1.concat()

1[ab][2c]

Concatenate members using the colon

A1.concat(“:”)

1:[a:b]:[2:c]

Concatenate members using the comma

A1.concat@c()

1,[a,b],[2,c]

Enclose each member with double quotation marks at concatenation

A1.concat@q()

1[“a””b”][2”c”]

Enclose each member with single quotation marks at concatenation

A1.concat@i()

1[‘a’’b’][2’c’]

With a two-level sequence, concatenate the outer level by line break and then perform concatenation on each sub-sequence

[[1,2,3],["a","b"],[2,"c"]].concat@n(“-“)

1-2-3
  a-b
  2-c

Concatenate parameters into a string and do not enclose each result members with quotes

concat(2,["a","b"],"cd")

2abcd

 

4.  Complex analysis

4.1   Regular expression matching

Task

Code

Return value

Match a given string with the regular expression

“4,23,a,test”.regex("(\\d),([0-9]*),([a-z]),([a-z]*)")

[“4”,”23”,”a”,”test”]

Match a given string with the regular expression

“4,23,a,test”.regex("(\\d),([a-z]),([0-9]*),([a-z]*)")

null

Case-insensitive for the matching

"a:Test:B".regex@c("([a-z]):([a-z])")

[“a”,”T”,”t”,”B”]

Learn more about regular expressions.

4.2   Word and number extraction

In the following example functions, s is "hi-10 hello!2020-01-01A8,3.14".

Task

Code

Return value

Extract English words

s.words()

[”hi”,”hello”,”A”]

Extract digits strings

s.words@d()

[”10”,”2020”,”01”,”01”,”8”,”3”,”14”]

Extract English words and strings of digits

s.words@a()

[”hi”,”10”,”hello”,”2020”,”01”,”01”,”A”,”8”,”3”,”14”]

Extract all: words, strings of digits and other character strings

s.words@w()

[”hi”,”-”,”10”,” ”,”hello”,”!”,”2020”,”-”,”01”,”-”,”01”,”A”,”8”,”,”,”3”,”.”,”14”]

Extract all, where strings beginning with digits will be identified as numbers or datetimes as a whole

s.words@wp()

[”hi”,”-10”,”   ”,”hello”,”!”,”2020-01-01”,”A”,”8”,”,”,”3.14”]

Treat continuous English letters and digits as a word

s.words@i()

[”hi”,”hello”,”A8”]

 

4.3   SQL statement analysis

sql.sqlparse(part)

The function splits a SQL query according to the order of select clause, from clause (including the join), group by clause, having clause and order by clause, make these syntactic units parameters members, and returns a sequence of them. When parameter part is present, replace the corresponding syntactic parameter in the source SQL query with it, and return a new SQL.

The sql in the following example functions is as follows:

select emp.EID,emp.NAME,emp.BIRTHDAY,emp.SALARY,dep.DEPT,dep.MANAGER
    from EMPLOYEE emp left join DEPARTMENT dep on emp.DEPT=dep.DEPT
    where emp.EID<100 and dep.DEPT='R&D'
    order by emp.SALARY,emp.EID

sql2select * from dept

Task

Code

Return value

Get all syntactic units of a given SQL query

sql.sqlparse()

Omitted; please refer to Function   Reference

Get select clause

sql.sqlparse@s()

emp.EID,emp.NAME,emp.BIRTHDAY,emp.SALARY,dep.DEPT,dep.MANAGER

Get from clause

sql.sqlparse@f()

EMPLOYEE emp left join DEPARTMENT dep on emp.DEPT=dep.DEPT

Get where clause

sql.sqlparse@w()

emp.EID<100 and dep.DEPT='R&D'

Get group by clause

sql.sqlparse@g()

null

Get having clause

sql.sqlparse@h()

null

Get order by clause

sql.sqlparse@o()

emp.SALARY,emp.EID

Get syntactic units of a given SQL query and split each unit into a sequence of smaller ones

sql.sqlparse@a()

Omitted; please refer to Function Reference

Replace select clause with the parameter

sql2.sqlparse@s(“deptid”)

select deptid from dept

Replace from clause with the parameter

sql2.sqlparse@f(“sales”)

select * from sales

Convert a SQL query to the counterpart in a specified database

sql.sqltranslate(“MYSQL”)

Omitted

 

4.4   Node string handling

Node strings are key/value strings. In node strings in SPL, a key and its value are connected by the equals sign. Different nodes are separated by the white space.

xs.property(n,v)

The function reads value of the node named n from node string xs and returns the result. When string v is present, make v n’s value; and delete n when v is absent.

In the following example functions, xs is a set of key=value pairs: color=red size=20 price=500.

Task

Code

Return value

Get a two-column table sequence made up of all properties

xs.property()

Omitted

Get value of a given property

xs.property(“size”)

“20”

Get value of a given property and   enclose it

xs.property@q(“size”)

“\”20\””

Get value of a given property and parse it into the corresponding data type

xs.property@v(“size”)

20

Modify value of a given property

xs.property(“size”,”30”)

color=red size=30 price=500

Delete a given property

xs.property(“size”,null)

color=red price=500

Get value of a given property form node strings separated by semicolon

“color=red;size=30“.property@cv(“size”)

30

Get value of a given property form node strings where the property name and value are comma-separated

“color:red     size:30“.property@j(“size”)

“30”

 

4.5   HTML string handling

s.htmlparse(tag:i:j,…)

The function gets the jth text under the ith tag in HTML string s; or gets all text from s when all parameters are absent.

Below is the HTML string in the following example functions:

<div><div>aaa<div>bbb<span>ccc</span>ddd</div></div></div>
<table><tr><td>1</td><td>2</td><td>3</td></tr>
<tr><td>a</td><td>b</td><td>c</td></tr>
</table>

Task

Code

Return value

Get the 3rd text under the 3rd   div tag

html.htmlparse("div":2:2)

ddd

Get the 1st text under the 2nd   div tag and the 1st text under the 1st span tag

html.htmlparse("div":1:0,"span":0:0)

[aaa,ccc]

Get the content of the 1st table tag

html.htmlparse("table":0)

[[1,2,3],[a,b,c]]

 

Learn more about text handling through string() functions.