9.23 Parse HTML file and analyze data

 

Parse an HTML file and analyze the data in it.
Find all numbers in the body of an HTML file. Below is part of the file:

<!DOCTYPE html>
<html class="html__responsive html__unpinned-leftnav">
<head>
 <title>Stack Overflow - Where Developers Learn, Share, &amp; Build Careers</title>
        <link rel="shortcut icon" href="https://cdn.sstatic.net/Sites/stackoverflow/Img/favicon.ico?v=ec617d715196">
        <link rel="apple-touch-icon" href="https://cdn.sstatic.net/Sites/stackoverflow/Img/apple-touch-icon.png?v=c78bd457575a">
        <link rel="image_src" href="https://cdn.sstatic.net/Sites/stackoverflow/Img/apple-touch-icon.png?v=c78bd457575a"> 
…
</html>

SPL provides the s.htmlparse() function, which extracts all the text from an HTML string.

SPL script:

A
1 =file("sof.html").read()
2 =A1.htmlparse()
3 =A2.(~.words@d()).conj()

A1 Read the HTML file.
A2 Use the htmlparse() function to parse the HTML string and return a sequence of text strings.
A3 Loop through the parsed text sequence, extract the numbers from each string with words@d(), and use conj() to concatenate them into a single sequence.
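
For readers who want to cross-check the result outside SPL, the same three steps can be sketched in Python using only the standard library. This is a rough analogue under stated assumptions, not a description of how htmlparse() or words@d() work internally: the names TextCollector and numbers_in_html are hypothetical, text inside script and style elements is skipped by choice, and a "number" is assumed to be a run of digits with an optional decimal part.

```python
import re
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Hypothetical helper: gathers non-empty text nodes, skipping script/style."""

    def __init__(self):
        super().__init__()
        self.texts = []
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        stripped = data.strip()
        if self._skip_depth == 0 and stripped:
            self.texts.append(stripped)


# Assumption: a number is digits with an optional fractional part, e.g. 30 or 16.5.
NUMBER = re.compile(r"\d+(?:\.\d+)?")


def numbers_in_html(html):
    """Return every number found in the text nodes, as strings, in document order."""
    collector = TextCollector()
    collector.feed(html)   # step 2: parse the HTML string into text strings
    collector.close()
    # step 3: extract the numbers from each string and flatten into one list
    return [n for text in collector.texts for n in NUMBER.findall(text)]
```

To mirror step A1, read the file first, for example with `open("sof.html", encoding="utf-8").read()`, and pass the resulting string to numbers_in_html(). Because only text nodes are scanned, digits inside attribute values such as the `?v=...` parameters in the link tags are not picked up.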

Execution result:

Members
30
3
16.5
5