Handle Commas inside Quotation Marks

Question

I am trying to parse a comma-separated string using:

val array = input.split(",")

Then I noticed that some input lines have commas inside quotation marks:

data0, "data1", data2, data3, "data4-1, data4-2, data4-3", data5

*Note that the data is not very clean, so some fields are inside quotation marks while others are not.

How do I split such a line into:

array(0) = data0

array(1) = data1

array(2) = data2

array(3) = data3

array(4) = data4-1, data4-2, data4-3

array(5) = data5

 

Answer

The key to this problem is to treat only the commas outside quotation marks as separators, not those inside them. This can be done in Java, but the code is complicated. Since the task involves no other computation, we can do it in SPL (Structured Process Language) and then embed the script in Java. A one-liner is enough:

     A
1    =file("d:\\source.txt").import@qc()

A1: The f.import@c() function reads the text file source.txt as a two-dimensional table, using commas as separators, and the @q option removes the quotation marks automatically. Here’s the result:

[Image: result table for A1]

If the string to be parsed comes from a variable (say str) rather than a file, A1’s code should be =str.import@qc(). Take the following two-line string as an example:

data0, "data1", data2, data3, "data4-1, data4-2, data4-3", data5

"data0, data0", data1, "data2", data3-1, "data4-2", data5

Then the result is:

[Image: result table for the two sample lines]

An SPL script is easily integrated into a Java application. (See How to Call an SPL Script in Java)
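
For comparison, the hand-written Java approach mentioned at the start of the answer could look roughly like the sketch below. It is not part of SPL; the class name QuoteAwareSplit and the method split are hypothetical names chosen for illustration. It assumes fields are wrapped in plain double quotes, that quotes are never escaped inside a field, and that whitespace around each field can be discarded.

```java
import java.util.ArrayList;
import java.util.List;

public class QuoteAwareSplit {

    // Splits a line on commas that lie outside double quotes.
    // The quote characters are dropped and each field is trimmed.
    // Assumption: no escaped quotes ("" or \") appear inside a field.
    public static List<String> split(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;

        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == '"') {
                // Toggle the quoted state; the quote itself is not kept.
                inQuotes = !inQuotes;
            } else if (c == ',' && !inQuotes) {
                // A comma outside quotes ends the current field.
                fields.add(current.toString().trim());
                current.setLength(0);
            } else {
                // Any other character, including a comma inside quotes, is data.
                current.append(c);
            }
        }
        // The last field has no trailing comma, so add it explicitly.
        fields.add(current.toString().trim());
        return fields;
    }

    public static void main(String[] args) {
        String input = "data0, \"data1\", data2, data3, \"data4-1, data4-2, data4-3\", data5";
        List<String> array = split(input);
        for (int i = 0; i < array.size(); i++) {
            System.out.println("array(" + i + ") = " + array.get(i));
        }
    }
}
```

Run on the sample line from the question, this yields six fields, with data4-1, data4-2, data4-3 kept together as the fifth. Real-world CSV with escaped quotes or multi-line fields would need a more careful parser than this sketch.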