9.24 Parse an HTML file and generate a table sequence

Parse an HTML file and generate a table sequence.
Parse the following HTML file to generate a score table, then calculate the total score of each student.

<html>
…
<table id="tfhover" class="tftable" border="1">
<tr><th>CLASS</th><th>STUDENTID</th><th>SUBJECT</th><th>SCORE</th></tr>
<tr><td>Class one</td><td>1</td><td>Math</td><td>77</td>
<tr><td>Class one</td><td>1</td><td>PE</td><td>69</td>
<tr><td>Class one</td><td>1</td><td>English</td><td>84</td>
<tr><td>Class one</td><td>2</td><td>Math</td><td>80</td>
<tr><td>Class one</td><td>2</td><td>PE</td><td>97</td>
…
</table>
…
</html>

The s.htmlparse(tag:i:j) function gets the j-th text under the i-th tag of s, a string in HTML format. The index i starts at 0, so "table":0 refers to the first table.
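For readers outside esProc, the part of htmlparse() that A2 relies on can be approximated in Python with the standard library's html.parser module. This is a minimal sketch, not esProc's implementation: the names TableTextParser and parse_table are hypothetical, the sketch covers only the way A2 uses the function (return every row of the i-th table as a list of cell texts, with i counted from 0), it ignores the j argument, and it assumes tables are not nested. Because the sample HTML omits </tr>, a new <tr> also closes the previous row.

```python
from html.parser import HTMLParser


class TableTextParser(HTMLParser):
    """Hypothetical helper: collect the cell texts of every <table>.

    Assumes tables are not nested. Each table becomes a list of rows,
    and each row is a list of stripped cell strings.
    """

    def __init__(self):
        super().__init__()
        self.tables = []   # one list of rows per <table>, in document order
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self._close_row()   # </tr> may be omitted, as in the sample HTML
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._close_cell()  # </td> may be omitted as well
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._close_cell()
        elif tag in ("tr", "table"):
            self._close_row()

    def handle_data(self, data):
        # Text outside a cell (e.g. newlines between rows) is ignored.
        if self._cell is not None:
            self._cell.append(data)

    def _close_cell(self):
        if self._cell is not None and self._row is not None:
            self._row.append("".join(self._cell).strip())
        self._cell = None

    def _close_row(self):
        self._close_cell()
        if self._row is not None and self.tables:
            self.tables[-1].append(self._row)
        self._row = None


def parse_table(html, i=0):
    """Hypothetical stand-in for htmlparse("table":i): rows of the i-th table (0-based)."""
    parser = TableTextParser()
    parser.feed(html)
    parser.close()
    return parser.tables[i]
```

Called as parse_table(html, 0), it plays the role of htmlparse("table":0) in A2: the first returned row holds the column headers and the remaining rows hold the data, all as strings.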

SPL script:

A
1 =file("table.html").read()
2 =A1.htmlparse("table":0)
3 =create(${A2(1).concat@c()})
4 =A3.record(A2.to(2,).conj())
5 =A3.groups(STUDENTID; sum(SCORE):TOTALSCORE)

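A1 Read the HTML file as a string.
A2 Parse the HTML string with htmlparse() and return the content of the first table tag (index 0) as a sequence of rows.
A3 Create a table sequence whose field names come from A2's first row (the header row): A2(1).concat@c() joins the header cells into a comma-separated list, which the ${} macro substitutes into create().
A4 Concatenate A2's rows from the second row onward into one sequence with conj(), then insert those values, in order, into A3's table sequence.
A5 Group the score table sequence by STUDENTID and sum SCORE to get each student's total score.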
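Cells A3 to A5 can be sketched the same way in Python, assuming the rows are already available as lists of strings (the shape A2 holds, with the header row first). The function name total_scores is hypothetical. Unlike the SPL script, the sketch converts STUDENTID and SCORE to integers explicitly, on the assumption that both columns hold whole numbers; sorting by student ID matches the order of the execution result.

```python
from collections import defaultdict


def total_scores(rows):
    """Hypothetical helper: rows[0] is the header, rows[1:] are data rows of strings.

    Returns a list of (STUDENTID, TOTALSCORE) pairs sorted by STUDENTID.
    """
    header, data = rows[0], rows[1:]
    # Like A3/A4: turn each data row into a record keyed by the header fields.
    records = [dict(zip(header, row)) for row in data]
    # Like A5: group by STUDENTID and sum SCORE (assumed to be integers).
    totals = defaultdict(int)
    for rec in records:
        totals[int(rec["STUDENTID"])] += int(rec["SCORE"])
    return sorted(totals.items())


if __name__ == "__main__":
    # Only the rows visible in the HTML excerpt; the elided rows are not included.
    sample_rows = [
        ["CLASS", "STUDENTID", "SUBJECT", "SCORE"],
        ["Class one", "1", "Math", "77"],
        ["Class one", "1", "PE", "69"],
        ["Class one", "1", "English", "84"],
        ["Class one", "2", "Math", "80"],
        ["Class one", "2", "PE", "97"],
    ]
    print(total_scores(sample_rows))  # prints [(1, 230), (2, 177)]
```

Student 1's total matches the execution result (77 + 69 + 84 = 230). Student 2 comes to 177 here rather than 258 only because that student's remaining rows are elided in the HTML excerpt.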

Execution result:

STUDENTID TOTALSCORE
1 230
2 258
3 228