Group by a Sign and Then Regroup for Aggregation

 

Question

I’m new to Java. I want to read data from a txt file, process it, and write the frequency of each pair of neighboring letters to a new txt file (in the form ab=n times, ba=n times, aa=n times, etc.). The asterisk * and the pound sign # mark the start and the end of a string of letters. It would be best if the code can be executed directly and is beginner-friendly, with comments provided. Thanks.
Here’s a part of the source text file (the whole file is in this format):
*
a
b
#
*
a
b
b
#
*
a
a
b
c
#
*
a
c
c
b
#
*
d
#
*
a
d
b
a
d
d
c
#

Answer

It’s rather complicated to do this in Java. You can handle it in SPL (Structured Process Language) and then integrate it with the Java application:

     A
1    =file("E:\\s.txt").import@i()
2    =A1.select(~!="#").group@i(~=="*")
3    =A2.conj(~.([~[-1],~]).to(3,))
4    =A3.groups(~:a;count(~):b)
5    =A4.new(a.concat()+"="+string(b)+if(b>1," times"," time"))
6    =file("E:\\result.txt").export(A5)

 

A1: Read in data from s.txt.
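
For readers who want to see what this step looks like in plain Java, here is a minimal sketch. The class name `LineReader` and the method `readTokens` are made up for illustration; the sketch assumes the file holds one token per line and is UTF-8 encoded.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the SPL answer.
class LineReader {
    // Reads the file line by line, trims each line and skips empty ones,
    // so the result contains only "*", "#" and the letters.
    static List<String> readTokens(Path file) throws IOException {
        List<String> tokens = new ArrayList<>();
        for (String line : Files.readAllLines(file, StandardCharsets.UTF_8)) {
            String token = line.trim();
            if (!token.isEmpty()) {
                tokens.add(token);
            }
        }
        return tokens;
    }
}
```

A call such as `LineReader.readTokens(Paths.get("E:\\s.txt"))` would return the tokens in file order.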

A2: Remove the pound signs “#”, then split the remaining data into groups, starting a new group at each asterisk sign “*”.
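
The same grouping idea can be sketched in plain Java. This is only an illustration under assumptions: the class `GroupSplitter` is a hypothetical name, and unlike the SPL version it leaves the “*” marker out of each group instead of keeping it as the first member.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of the SPL answer.
class GroupSplitter {
    // "*" opens a new group, "#" (the end marker) is dropped,
    // and every other token is appended to the group currently open.
    static List<List<String>> split(List<String> tokens) {
        List<List<String>> groups = new ArrayList<>();
        List<String> current = null;
        for (String token : tokens) {
            if (token.equals("*")) {
                current = new ArrayList<>();
                groups.add(current);
            } else if (!token.equals("#") && current != null) {
                current.add(token);
            }
        }
        return groups;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("*", "a", "b", "#", "*", "d", "#");
        System.out.println(split(tokens)); // prints [[a, b], [d]]
    }
}
```

Tokens that appear before the first “*” are ignored here, since they do not belong to any string of letters.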

A3: For every group, pair each member with the member before it, keep the pairs from the third one onward (the first two pairs involve the missing predecessor of “*” and the “*” itself), and then concatenate the remaining pairs of all groups into one sequence.
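
As a rough Java counterpart of this pairing step, the sketch below builds the neighboring pairs of a single group. It assumes the group holds letters only (no “*” or “#”), so no leading pairs need to be skipped; `PairExtractor` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of the SPL answer.
class PairExtractor {
    // Joins each letter with the letter before it, in order of appearance.
    // A group with fewer than two letters yields no pairs.
    static List<String> pairs(List<String> group) {
        List<String> result = new ArrayList<>();
        for (int i = 1; i < group.size(); i++) {
            result.add(group.get(i - 1) + group.get(i));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(pairs(Arrays.asList("a", "d", "b", "a", "d", "d", "c")));
        // prints [ad, db, ba, ad, dd, dc]
    }
}
```

Calling `pairs` on every group and adding the results to one list corresponds to the concatenation performed by `conj`.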


A4: Group the sequence to find the frequency of each pair of letters.
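
In plain Java, the counting can be sketched with a sorted map. The class `PairCounter` is a hypothetical name; a `TreeMap` is chosen here so that the pairs come out in alphabetical order, similar to the sorted result of `groups`.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of the SPL answer.
class PairCounter {
    // Counts how many times each pair occurs; keys are kept in sorted order.
    static Map<String, Integer> count(List<String> pairs) {
        Map<String, Integer> frequency = new TreeMap<>();
        for (String pair : pairs) {
            // Insert 1 for a new pair, otherwise add 1 to the existing count.
            frequency.merge(pair, 1, Integer::sum);
        }
        return frequency;
    }

    public static void main(String[] args) {
        System.out.println(count(Arrays.asList("ab", "bb", "ab", "aa", "ab")));
        // prints {aa=1, ab=3, bb=1}
    }
}
```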

A5: Generate a new table sequence from A4 according to the required format.

 

index  a.concat()+"="+string(b)+if(b>1," times"," time")
1      aa=1 time
2      ab=3 times
3      ac=1 time
4      ad=2 times
5      ba=1 time
6      bb=1 time
7      bc=1 time
8      cb=1 time
9      cc=1 time
10     db=1 time
11     dc=1 time
12     dd=1 time
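
The output format in the table above, with “time” for a count of 1 and “times” otherwise, could be produced in Java roughly as follows. `PairFormatter` is a hypothetical name used only for this sketch.

```java
// Hypothetical helper, not part of the SPL answer.
class PairFormatter {
    // Builds one output line, e.g. "ab=3 times" or "aa=1 time".
    static String format(String pair, int count) {
        String unit = (count == 1) ? " time" : " times";
        return pair + "=" + count + unit;
    }

    public static void main(String[] args) {
        System.out.println(format("aa", 1)); // prints aa=1 time
        System.out.println(format("ab", 3)); // prints ab=3 times
    }
}
```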

 

A6: Export A5’s table sequence to a specified text file.
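
If the result lines were built in Java instead, writing them out could look like the sketch below. `ResultWriter` is a hypothetical name, and UTF-8 is an assumed encoding.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Hypothetical helper, not part of the SPL answer.
class ResultWriter {
    // Writes one result per line; the file is created, or overwritten if it exists.
    static void write(Path file, List<String> lines) throws IOException {
        Files.write(file, lines, StandardCharsets.UTF_8);
    }
}
```

For example, `ResultWriter.write(Paths.get("E:\\result.txt"), lines)` would write the formatted lines to the same path the SPL script uses.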

The SPL script can be conveniently integrated into a Java application. See “How to Call an SPL Script in Java”.