String Split – Count the Frequency of a Certain String

Question
ID    text
1    &aaa&&bbb&
2    &aaa&&bbb&&123&
3    &aaa&&ccc&&456&
4    &aaa&&ddd&&eee&&222&
5    &aaa&&ddd&&eee&&789&

I want to find how many times a certain string appears. The string will start a value and appear at least twice. Here’s what I want:
&aaa&     5 times
&aaa&&bbb&     2 times
&aaa&&ddd&     2 times  
&aaa&&ddd&&eee&    2 times

The basic format of a string is &[a-zA-Z0-9]&. There can be multiple such strings, like &[a-zA-Z0-9]&&[a-zA-Z0-9]&.

 

Answer
The key to you question is string split. A stored procedure is a little easier than a SQL query. But both are complicated. I handle it in SPL (Structure Process Language):

 

A

1

$select text from tb

2

=A1.(text).(replace(~,"&&","&,&").array())

3

=A2.conj().group(~:str;~.count():num).select(num>1)

A2: Split each text value into a sequence where each member is a &[a-zA-Z0-9]&;

A3: Loop over each letter to repeatedly join it up to the “&” sign, which is the resulting string, and then group the original data by the resulting string and find the group whose member is greater than 1. Here’s the final result:

undefined

An SPL script can be integrated into another application via esProc JDBC. Refer to How to Call an SPL Script in Java for details.