How to Do Batch Searching of Key Values on Big Data Quickly


Key words: Big data, random batch searching of key values, efficient searching

 

For data stored in a database, we can speed up searching with an index created on the table. Because an index lookup has a complexity of log N, finding one record takes only dozens of milliseconds, even when the table holds 10 billion records. Batch searching is a different matter. Looking up thousands or ten thousand key values one by one adds up to ten thousand or one hundred thousand retrievals and comparisons, and the whole job can take dozens of minutes, or even an hour or more. That degrades the user experience to an unbearable level.
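To make the cost argument concrete, here is a rough sketch that counts index node visits for one-by-one searching. It assumes a balanced tree index, and the fan-out of 1000 keys per node is an illustrative assumption, not a figure from any particular database. The function names are hypothetical.

```python
def index_depth(total_records: int, fanout: int) -> int:
    """Levels a balanced tree index needs so that fanout**depth >= total_records."""
    depth, capacity = 1, fanout
    while capacity < total_records:
        capacity *= fanout
        depth += 1
    return depth


def one_by_one_cost(key_count: int, total_records: int, fanout: int) -> int:
    """Node visits when every key is searched separately from the index root."""
    return key_count * index_depth(total_records, fanout)


if __name__ == "__main__":
    total = 10_000_000_000  # 10 billion records, as in the text
    fanout = 1000           # assumed keys per index node (illustrative only)
    print("index levels:", index_depth(total, fanout))
    print("node visits for 10,000 keys:", one_by_one_cost(10_000, total, fanout))
```

Under these assumptions each lookup touches only a handful of index nodes, yet the visits multiply with the number of keys, which is why one-by-one searching scales poorly for large batches.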

Suppose we have 6 billion records with the following structure:

| Field | Type | Note |
|-------|------|------|
| id | long | Auto-increment after 1000000000001 |
| data | string | Random strings (with the length of 180 bytes) |

The task is to retrieve ten thousand records by random ids. Oracle takes about 120 seconds to finish it.

The SQL query looks like this: select * from testdata where id in (…)

Since an IN list can include at most 1000 elements, we need to run multiple queries and concatenate their results to get the final result. This is inefficient.
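The splitting described above can be sketched as follows. This is a minimal illustration that only builds the query text and parameter lists, and does not connect to a database. The helper names `chunked` and `build_in_queries` are hypothetical, and the numbered placeholders (`:1`, `:2`, ...) are one common bind style that a real driver may or may not expect.

```python
from typing import Iterator, List, Sequence, Tuple

MAX_IN_ELEMENTS = 1000  # per-query IN-list limit stated in the text


def chunked(keys: Sequence[int], size: int = MAX_IN_ELEMENTS) -> Iterator[Sequence[int]]:
    """Yield consecutive slices of keys, each holding at most `size` elements."""
    for start in range(0, len(keys), size):
        yield keys[start:start + size]


def build_in_queries(keys: Sequence[int]) -> List[Tuple[str, List[int]]]:
    """Return one (sql, params) pair per batch of at most 1000 keys."""
    queries = []
    for batch in chunked(keys):
        # One numbered placeholder per key in this batch.
        placeholders = ", ".join(f":{i + 1}" for i in range(len(batch)))
        sql = f"select * from testdata where id in ({placeholders})"
        queries.append((sql, list(batch)))
    return queries


if __name__ == "__main__":
    keys = list(range(1000000000001, 1000000000001 + 10_000))
    queries = build_in_queries(keys)
    print(len(queries), "queries needed for", len(keys), "keys")
```

Each pair would then be executed separately and the result sets appended, which is the multi-query overhead the text refers to.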

With esProc, the program becomes both brief and efficient. Here is an example script:


|   | A | B |
|---|---|---|
| 1 | =file("testdata.ctx").create() | // Open composite table file testdata.ctx |
| 2 | =A1.index@3(id_idx) | // Load a 3-level index |
| 3 | =keys | // The sequence of random key values to be searched |
| 4 | =A1.icursor(;A3.contain(id),id_idx) | // Do the searching using the composite table’s index id_idx |

Here we use an esProc composite table and its high-performance index to handle the batch searching of key values. The search finishes in about 20 seconds, 6 times faster than the Oracle query. More explanations can be found in Performance Optimization - Search.
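For intuition about why handling the keys as a batch can beat one-by-one searching, here is a generic sketch of batch lookup against a sorted key column. It is not esProc's actual implementation, which the text does not describe. It only shows the general idea of sorting the search keys first so that each binary search can start where the previous one ended. The function name `batch_lookup` is hypothetical.

```python
from bisect import bisect_left
from typing import Iterable, List, Sequence


def batch_lookup(sorted_ids: Sequence[int], keys: Iterable[int]) -> List[int]:
    """Return the positions in sorted_ids of every search key that is present."""
    found = []
    low = 0  # lower bound for the next search; only ever moves forward
    for key in sorted(set(keys)):  # deduplicate, then visit keys in index order
        pos = bisect_left(sorted_ids, key, low)
        if pos < len(sorted_ids) and sorted_ids[pos] == key:
            found.append(pos)
        low = pos  # later keys are larger, so they cannot sit before pos
    return found


if __name__ == "__main__":
    ids = list(range(1000000000001, 1000000000001 + 100))
    wanted = [1000000000050, 1000000000003, 5, 1000000000050]
    print(batch_lookup(ids, wanted))
```

Because the keys are visited in ascending order, the search window shrinks as the loop advances, and neighboring keys tend to land in the same region of the index instead of triggering unrelated random reads.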

An esProc SPL script can be easily embedded into a Java program. Read How to Call an SPL Script in Java for details.

Read Getting Started with esProc to download and install esProc, get a license for free and find related documentation.