Sample Programs of String Split in Excel

 

  Though Excel functions are good at handling simple string split tasks, they become awkward for handling certain special string splits and those having complicated conditions. This article gives examples of the difficult tasks, explains the challenges and offers solutions in esProc SPL. SPL is the abbreviation of Structured Process Language used by esProc, the professional data computing engine. It’s convenient to handle structured data processing, including string split of course, with esProc.

I. Split by words

  We have a product purchase list. In the list, each item includes the brand and type we want. What we want is to split each item to respectively list the brand and type after it.

  Below is productlist.xlsx:

  undefined

  The expected result:

  undefined

  We can use Excel’s text-to-columns feature to do the splitting. During the process, both the brand string and type string will be split into multiple cells and then combined again. It seems that this is over-splitting and a waste of resource and time. We can also use flash fill in Excel to automatically fill the data according to a given pattern, to extract PANASONIC for the first row, for instance, and then get the type string using MID function. It’s still inconvenient.

  SPL can read in an Excel file directly. To do a real-time analysis, it copies the Excel data onto the clipboard, uses the clipboard function to get the data, writes the processed data back to the clipboard and then pastes the result set to Excel. The whole process is smooth, seamless, convenient and efficient.

  SPL script:


A

B

1

=clipboard().import@i()

/ Import the product list   directly form the clipboard

2

=A1.(~.split@1(""))

/ Split each item of the   product list by the first space to generate a sequence of sequences

3

=A2.concat@n("\t")

/ Concatenate A2 into a   string of values of two-dimensional table, where members in each subsequence   are delimited by tab and member sequences are separated by carriage return

4

=clipboard(A3)

/ Put the string of values   onto the clipboard

  After the SPL script is executed, we just need to paste the result set onto B1 of the Excel file to get the desired result.

 

II. Split away digits

  There are strings made up of both digits and characters. We want to separate the digitss from the characters.

  Below is numbers.xlsx:

  undefined

  The expected result:

  undefined

  To split away the digits only, we can use flash fill by defining a pattern on the first row. But to pick up the rest of the characters, there isn’t a way in Excel because some digits are mixed with the characters and MID function cannot play a role. The only way out is to write a program in VBA to loop over each character to check their type and split them one by one. The process is roundabout and complicated.

  SPL can directly split a string into individual characters and categorize them by type:


A

B

1

=clipboard().split@n()

/Get data from the   clipboard, split it into a sequence by carriage return and then split each   member character by character

2

=A1.(~.align@a([true,false],isdigit(~)).(~.concat()))

/Group each sequence of   characters according to whether a character is a digit or not and concatenate   each group into a string to split digits from the characters

3

=A2.concat@n("\t")

/Concatenate the two levels   of sequences into a two-dimensional table string by tab and carriage return   respectively

4

=clipboard(A3)

/ Paste A3’s string onto the   clipboard

 

III. Split away dates

  Here are rows of words containing dates. We want to split away all dates from each row and separate the dates by semicolon there are more than one date.

  Below is multidates.xlsx:

  undefined

  The expected result:

  undefined

  Excel hasn’t a special way to split a date away from a row, particularly when the number of dates in the rows varies. We can sue MID function or flash fill only when all dates have fixed positions or lengths in the rows. In our case, we have to turn to VBA to write a program and use a regular expression to match and analyze each row. This requires relatively high skills and produces complicated code.

  SPL’s way is to split a string into a sequence of words by spaces and convert members according to the specified date format:


A

B

1

=clipboard().split@n(“ “)

/ Get data from the   clipboard, split it into a sequence by carriage return and then split each   member into a sequence of words

2

=A1.(~.   (date(~,"dd.MM.yy")))

/ Convert members in the   sequence of words into date type data according to the specified format

3

=A2.(~.select(ifdate(~)))

/ Get date type values from   the sequence

4

=clipboard(A3.concat@n(“;”))

/ Concatenate the date   string into a two-dimensional data string and paste it onto the clipboard

  After the SPL script is executed, we just need to paste the result set onto B1 of the Excel file to get the desired result.

 

IV. Split by character

  The following table contains a column of numbers of different lengths. We want to split each digit away from each numeric value to become an individual column.

  Below is number.xlsx:

  undefined

  The expected result:

  undefined

  Text to column comes first for this case. But to split each character away requires specifying the splitting position each time a character is split away. It’s particularly tedious to do a lot of repeated specifications for a number with lots of digits. An alternative is to use functions to do the splitting, such as =MID($A1,COLUMN(A1),1) for B1, and then drag mouse to fill rows and columns. But it’s inconvenient to decide the maximum number of columns and to handle a table with lots of rows.

  SPL can do the splitting by characters directly:


A

B

1

=clipboard().split@n()

/ Get data from the   clipboard, split it into a sequence by carriage return and then, by default,   split each member into a sequence of individual characters

2

=A1.concat@n("\t")

/ Concatenate the two levels   of sequences into a two-dimensional table string

3

=clipboard(A2)

/ Paste the result set onto   the clipboard

  After the SPL script is executed, we just need to paste the result set onto B1 of the Excel file to get the desired result.

 

V. Split away attribute tables & file names

  Here is a log file of complicated structure. It contains description of nodes that are similar to an attribute table. Now we want to split away the values of PublicKeyToken and the file names from the attribute description.

  Below is log.xlsx:

  undefined

  The expected result:

  undefined

  To split such a comprehensive string to get a comma-separated description string at the first level and an attribute-node-like description string in Excel from the second item, we need a number of text-to-columns operations and multiple flash fills. There will be many steps and a given pattern is needed to do the flash fill. The whole process is extremely complicated.

  SPL has the special function to directly get values of attribute strings and get different parts of a file name:


A

B

1

=clipboard().split@nc()

/ Get data from the   clipboard, split it into a sequence by carriage return and then, by default,   split each member into a sequence by commas

2

=A1.([replace(~(2),"\"","").property("PublicKeyToken"),filename(replace(~(3),"\"",""))])

/ Remove quotes at both   ends, use property function to get PublicKeyToken value from item 2 and   filename function to get the file name from the third item; then join the two   values into a sequence

3

=clipboard(A2.concat@n("\t"))

/ Paste the result set onto   the clipboard

 

  Find more examples in SPL Cookbook.