Word Appearance Frequency

 

Problem

In an ordinary English document, words are separated by blanks, commas, full stops, and carriage returns, and a hyphen (“-”) at the end of a line joins the characters before and after the carriage return into a single word.
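The hyphen rule can be applied before any splitting takes place. The following is a minimal Python sketch rather than esProc code; the function name `join_hyphenated_line_breaks` is hypothetical, and it assumes that a hyphen immediately followed by a line break always marks a split word.

```python
import re


def join_hyphenated_line_breaks(text: str) -> str:
    """Rejoin words that were split across lines with a trailing '-'."""
    # Assumption: "-" directly followed by a line break (\r\n, \n or \r)
    # marks a split word, so both the hyphen and the break are dropped.
    # A hyphen in the middle of a line is left untouched.
    return re.sub(r"-(?:\r\n|\n|\r)", "", text)
```

Running this first means the later steps see one whole word instead of two fragments.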

Suppose you are given such a document. Find the total number of distinct words, count how often each word appears, and select the word with the highest appearance frequency.

Tip

Load the document, reduce its content to lower-case letters separated by single blanks, split the result into words, and group identical words together. The largest group contains the word with the highest appearance frequency. The steps are:

  1. Read the document content.

  2. Break the document content into a sequence of single characters, convert every upper-case letter in the sequence to lower case, and replace every non-letter character with a blank.

  3. Collapse each run of consecutive blanks into a single blank, join the members of the sequence into a string, and then split the string on blanks into a sequence in which each member is one word.

  4. Group identical words together. The largest group contains the word with the highest appearance frequency.
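The steps above can be sketched in Python as follows. This is an illustrative rendition, not the esProc solution: the function name `most_frequent_word` is hypothetical, reading the file (step 1) is left to the caller, and "letter" is assumed to mean an ASCII letter.

```python
from collections import Counter


def most_frequent_word(text: str) -> str:
    """Return the most frequent word in text, or "" if there are no words."""
    # Step 2: one character at a time; letters are lower-cased and
    # every other character becomes a blank (assumption: ASCII letters only).
    chars = [c.lower() if c.isascii() and c.isalpha() else " " for c in text]

    # Step 3a: drop a blank when the character before it is also a blank.
    squeezed = [
        c for i, c in enumerate(chars)
        if c != " " or i == 0 or chars[i - 1] != " "
    ]

    # Step 3b: join into a string and split on blanks; a leading or
    # trailing blank would yield an empty string, so those are discarded.
    words = [w for w in "".join(squeezed).split(" ") if w]

    # Step 4: counting occurrences plays the role of grouping equal words.
    counts = Counter(words)
    return counts.most_common(1)[0][0] if counts else ""
```

For instance, `most_frequent_word("The cat. The dog, the bird!")` yields `"the"`, because the three spellings of "the" fall into the same group once the text is lower-cased.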

Code

     A
1    E:\\esProc exercise\\word.txt
2    =file(A1).read()
3    =A2.split().(if(isalpha(~), lower(~), " "))
4    =A3.select(~!=" " || ~[-1]!=" ")
5    =A4.concat().split(" ")
6    =A5.group().maxp(~.len())(1)

A1: The path of the document.
A2: Read the document content.
A3: Break the document content into a sequence of single characters, convert every upper-case letter to lower case, and replace every non-letter character with a blank.
A4: Keep a member unless both it and the member before it are blanks, which collapses consecutive blanks into one.
A5: Join the sequence into a string, and then split the string on blanks into a sequence in which each member is one word.
A6: Group identical words, find the largest group, and take its first member, which is the word with the highest appearance frequency.
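The grid returns only the most frequent word, while the Problem also asks for the number of distinct words and the frequency of each word. A minimal Python sketch of all three results follows; it is not esProc code, the function name `word_statistics` is hypothetical, and words are assumed to be runs of ASCII letters.

```python
import re
from collections import Counter


def word_statistics(text: str):
    """Return (distinct word count, {word: frequency}, most frequent word)."""
    # Assumption: a word is a maximal run of the letters a-z after
    # lower-casing; every other character acts as a separator.
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(words)
    # most_common(1) is empty when the text contains no words.
    top = counts.most_common(1)[0][0] if counts else None
    return len(counts), dict(counts), top
```

A call such as `distinct, freq, top = word_statistics(text)` gives the three values in one pass over the word list.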

Result

[Image: result screenshot]