Text File Parsing – Records with a Variable Number of Lines

Question

I have a 29 MB text dump from a visualization application (Wonderware Intouch) containing objects, their descriptions, actions and scripts, which I have to analyze. I'm looking for a tool (Windows or Linux) that can parse this text, let me set up some rules, and generate a visual representation of the relations (something like a call stack). Bonus points if the result lets me navigate through the base text.

Answer

It’s easy to parse a text file into structured data when each line is one record. When a record spans a variable number of lines, you first need to split the text into records at a marker and then match each record against a regular expression. Structured Process Language (SPL) is well suited to this kind of log. The following SPL script can solve your problem:

 

    A
1   =file("e:\\reportXXX.log").read()
2   =A1.split("Object Type:").delete(1)
3   =A2.regex("(.+)[\\s\\S]+left:    (.+)[\\s\\S]+top: (.+)[\\s\\S]+right:   (.+)[\\s\\S]+bottom: (.+)[\\s\\S]+Line    Color:\\t\\t(.+)[\\s\\S]+Fill  Color:\\t\\t(.+)[\\S\\s]+Link:(.+)[\\s\\S]+Type:   (.+)[\\s\\S]+Condition Type:  (.+)[\\s\\S]+Statement:\\s+(.+)[\\s\\S]+Link:(.+)[\\s\\S]+Type:   (.+)[\\s\\S]+Expression :  (.+)";ObjectType,left,top,right,bottom,lineColor,fillColor,objectLink,type,conditionType,statement,statementLink,statementType,lastExpress)
4   =file("e:\\result.txt").export@t(A3)

 

A1: Read the log file as a string.

A2: Split the log string into records at the marker "Object Type:" and delete the first piece, which holds only the text before the first marker.

A3: Match each record against a regular expression and extract the captured groups as the named fields of a structured record.

A4: Export A3’s records to result.txt.
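
For readers who want to see the same split-then-match idea outside SPL, here is a minimal Python sketch. It is not a translation of the script above: the pattern captures only three fields (object type, left, top), the sample text is invented, and the names `parse_records` and `RECORD_RE` are hypothetical.

```python
import re

MARKER = "Object Type:"

# Simplified, hypothetical pattern: object type, then the "left" and "top" values.
# [\s\S]+? skips lazily over any lines between the fields, newlines included.
RECORD_RE = re.compile(
    r"\s*(?P<object_type>.+)"
    r"[\s\S]+?left:[ \t]*(?P<left>.+)"
    r"[\s\S]+?top:[ \t]*(?P<top>.+)"
)


def parse_records(text):
    """Split text at MARKER and turn each piece into a dict of fields."""
    # The first piece is whatever precedes the first marker, so drop it.
    chunks = text.split(MARKER)[1:]
    records = []
    for chunk in chunks:
        match = RECORD_RE.match(chunk)
        if match:  # pieces that do not fit the pattern are skipped
            records.append({k: v.strip() for k, v in match.groupdict().items()})
    return records


# Invented sample: two records with a different number of lines each.
sample = (
    "header line\n"
    "Object Type: Button\n"
    "  left:    10\n"
    "  top: 20\n"
    "Object Type: Text\n"
    "  some extra line\n"
    "  left:    30\n"
    "  top: 40\n"
)

for record in parse_records(sample):
    print(record)
```

Splitting first keeps the regular expression anchored to one record at a time, so a greedy or lazy skip can never run across a record boundary.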

Sometimes the log file is too large to read into memory at once and has to be retrieved in batches. Because records span different numbers of lines, a batch boundary can fall in the middle of a record and leave it incomplete, which makes the processing logic harder to write. SPL can handle these scenarios as well.
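
One way to avoid half records when reading in batches, sketched here in Python rather than SPL: read fixed-size chunks, emit only the records known to be complete, and carry the unfinished tail over to the next chunk. The function name `iter_records`, the chunk size, and the sample text are assumptions for illustration; this does not show how SPL itself implements batch retrieval.

```python
import io

MARKER = "Object Type:"


def iter_records(stream, chunk_size=1 << 20):
    """Yield complete records from a text stream read in fixed-size chunks."""
    buffer = ""
    started = False  # True once the text before the first marker was dropped
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        buffer += chunk
        parts = buffer.split(MARKER)
        # The last piece may be an unfinished record or end in a partial
        # marker, so hold it back until more data arrives.
        buffer = parts.pop()
        for part in parts:
            if started:
                yield part
            else:
                started = True  # skip the preamble before the first marker
    if started:
        yield buffer  # the final record ends at end of file


# Invented sample; a tiny chunk size forces boundaries to fall mid-record.
sample = (
    "preamble\n"
    "Object Type: A\n x: 1\n"
    "Object Type: B\n y: 2\n z: 3\n"
    "Object Type: C\n"
)

for record in iter_records(io.StringIO(sample), chunk_size=7):
    print(repr(record))
```

Each yielded record can then be matched against the regular expression exactly as in the in-memory case. With a real file, `open(path, encoding=...)` would take the place of `io.StringIO`.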