Concatenate A Set of Indefinite Number of White-space-separated Rows into a Comma-separated String

Problem description & analysis

Below is text file sample.txt:

CHM 110 - 10

9:30 am - 10:45 am

Thursday

 

CSC 150 - 04

10:30 am - 11:45 am

Monday

11:30 am - 12:45 am

Wednesday

 

BIO 132 - 01

The above file contains data separated by white spaces. Different parts may have different number of rows. We are trying to join up the rows in each part as a string using the comma and output the result to a CSV file, as shown below:

CHM 110 - 10,9:30 am - 10:45 am,Thursday

CSC 150 - 04,10:30 am - 11:45 am,Monday,11:30 am - 12:45 am,Wednesday

BIO 132 - 01

Solution

Method 1: String

Write the following script p1.dfx in esProc:

A

1

=file("sample.txt").read()

2

=A1.split("\n").group@i(~[-1]=="").(~.select(~!="")).concat@nc()

3

=file("result.csv").write(A2)

Explanation

A1  Read in the txt file as a string.

A2  Split A1’s string into a sequence through the line break, group it by the white space and then remove white spaces, concatenate the sequence of sequences into a large string by comma and line break (When splitting a string into a sequence using the line break, usually \n is used under Unix and MacOS X and \r\n is used for Windows. But, SPL import@i function makes the selection unnecessary).

A3  Write A2’s string into result.csv.

Method 2 (Extension): Table sequence

Write the following script p1.dfx in esProc:

A

1

=file("sample.txt").import@i()

2

=A1.group@i(~[-1]==null).(~.select(~))

3

=A2.(~.len()).max()

4

=create(${to(A3).concat@c()})

5

=A2.(~.pad(null,A3)).conj()

6

=A4.record(A5)

7

=file("result.csv").export@c(A4)

Explanation

A1  Read in the txt file as a sequence.

A2  Group the sequence by white space and then remove white spaces.

A3  Get the maximum number of columns based on A2’s sequence.

A4  Create an empty table sequence according to A3’s maximum value.

A5  Pad data to A2’s sequence (because records will be populated to the empty table sequence in order using record function).

A6  Populate data in.

A7  Output data in A4’s table sequence to result.csv.

Read How to Call an SPL Script in Java to learn how to integrate the script code into a Java program.

Q & A Collection

https://stackoverflow.com/questions/60467971/how-to-format-a-large-string-with-a-pattern-that-changes-sometimes-into-a-csv-fo