Is Elasticsearch Best Suited to High-Concurrency Queries?
Compared with SQL databases and data warehouses, the search engine Elasticsearch (ES) is widely regarded as better suited to high-concurrency queries, such as an account detail query that retrieves anywhere from a few to several thousand detail records out of tens of millions, or even a billion, rows of historical data. Such queries involve huge data volumes and high concurrency, and they demand sub-second response times. SQL-based technology stacks, including relational databases and Hadoop data warehouses, can hardly meet these requirements with the resources typically available. The popular practice is to export the data to Elasticsearch and rely on its search technology to achieve good performance under high concurrency. From this perspective, ES is a good choice for high-concurrency queries.
Unfortunately, ES does not support SQL-style JOIN operations, which is rather inconvenient. Take the account detail query as an example: suppose we want a result set that includes the store name, address, phone number, and so on. These fields are generally stored in the store table, so a join between the store table and the detail table is needed to get them. Because ES cannot perform such JOINs, we have to merge the store data into the detail data to form a wide table, in which every detail row carries its own copy of the store fields.
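To make the wide table concrete, here is a minimal Python sketch of the denormalization step. It is only an illustration, not ES code: the sample rows, the field names beyond those mentioned in the text, and the helper build_wide_table are all assumptions introduced here.

```python
# Hypothetical sample data; values and the helper name are illustrative only.
stores = [
    {"store_id": "S1", "store_name": "Downtown", "address": "1 Main St", "phone": "555-0100"},
    {"store_id": "S2", "store_name": "Airport", "address": "9 Sky Rd", "phone": "555-0199"},
]
details = [
    {"id": "101312", "store_id": "S1", "amt": 25.0, "detail_date": "2023-01-05"},
    {"id": "101312", "store_id": "S2", "amt": 40.0, "detail_date": "2023-01-09"},
    {"id": "101313", "store_id": "S1", "amt": 12.5, "detail_date": "2023-01-11"},
]


def build_wide_table(details, stores):
    """Copy the store fields into every detail row (denormalization)."""
    by_id = {s["store_id"]: s for s in stores}
    wide = []
    for d in details:
        s = by_id[d["store_id"]]
        wide.append({
            **d,
            "store_name": s["store_name"],
            "address": s["address"],
            "phone": s["phone"],
        })
    return wide


wide_table = build_wide_table(details, stores)

# The same store name is now stored once per detail row that refers to it.
downtown_copies = sum(1 for row in wide_table if row["store_name"] == "Downtown")
print(len(wide_table), downtown_copies)
```

With only three detail rows, the "Downtown" store fields are already stored twice; at tens of millions of rows, the duplication grows in proportion.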
Obviously, this means storing the store data repeatedly, and the redundant data takes up a lot of disk space. Worse, whenever the store data changes, for example when stores merge, or a store is split up, closed, or renamed, the wide table containing tens of millions, or even a billion, rows has to be refreshed completely. The process is extremely slow; while it runs, the query service is unavailable and user experience deteriorates. Besides, the store table is usually not the only table to be joined, and data in any of the joined tables may change, which makes the refreshes frequent.
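The difference in update cost can be sketched in a few lines of Python. This is a simplified model under stated assumptions: the data and both helper functions are hypothetical, and it only counts the rows that hold a copy of the changed value, whereas the article describes the wide table being refreshed as a whole.

```python
def rename_in_wide_table(wide_rows, store_id, new_name):
    """Rewrite every wide-table row that holds a copy of the store name."""
    touched = 0
    for row in wide_rows:
        if row["store_id"] == store_id:
            row["store_name"] = new_name
            touched += 1
    return touched


def rename_in_store_table(stores_by_id, store_id, new_name):
    """With a separate store table, only one record changes."""
    stores_by_id[store_id]["store_name"] = new_name
    return 1


# Hypothetical data: 1000 detail rows split evenly between two stores.
wide_rows = [
    {
        "id": str(i),
        "store_id": "S1" if i % 2 == 0 else "S2",
        "store_name": "Downtown" if i % 2 == 0 else "Airport",
        "amt": 1.0,
    }
    for i in range(1000)
]
stores_by_id = {"S1": {"store_name": "Downtown"}, "S2": {"store_name": "Airport"}}

wide_writes = rename_in_wide_table(wide_rows, "S1", "Central")
normalized_writes = rename_in_store_table(stores_by_id, "S1", "Central")
print(wide_writes, normalized_writes)
```

The number of writes in the wide table scales with the number of detail rows, while the separate store table needs a single write regardless of data volume.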
Apart from these problems, there is the smaller issue that an ES cluster is slow to restart. Every time the application is upgraded, a restart is needed and the query service is suspended.
In view of this, ES is not a good choice for high-concurrency queries that involve join operations.
Such queries are convenient to handle with the open-source esProc SPL. SPL offers built-in order-based row-oriented storage and layered index preloading, which allow it to match or surpass ES query performance while still supporting join operations and avoiding the wide table refresh problem that ES has. The SPL server is also lightweight, easy to administer, and fast to start.
For instance, to perform a high-concurrency query on account data and join it with the store table and any other related tables in SPL, we can write the following code:
A1=detail.icursor(id,store_id,amt,detail_date,…;id=="101312",index_detail_id).fetch()
// Perform the query using the order-based row-oriented storage and the index; the id value can be passed in as a parameter
A2=file("store.btx").import@b(id,name,...).keys(id)
// Load the store table; for better speed, it can be preloaded at the initialization stage instead
A3=A1.switch(store_id,A2)
// Perform the join by converting each store_id value into a reference to the matching store record
return A3.new(id,store_id.id:store_id,store_id.name:store_name,…,amt,detail_date,…)
// Reference the related field to get the query result
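For readers unfamiliar with SPL, the logic of the four steps above can be mirrored in a rough Python sketch. It is an analogy only, not how SPL is implemented: the sample data and the functions lookup_details and query are hypothetical, a binary search over a sorted list stands in for the index lookup, and a dictionary stands in for the keyed store table.

```python
from bisect import bisect_left, bisect_right

# Hypothetical detail rows, kept sorted by id (standing in for ordered storage).
details = [
    {"id": "101311", "store_id": "S2", "amt": 8.0, "detail_date": "2023-01-02"},
    {"id": "101312", "store_id": "S1", "amt": 25.0, "detail_date": "2023-01-05"},
    {"id": "101312", "store_id": "S2", "amt": 40.0, "detail_date": "2023-01-09"},
    {"id": "101313", "store_id": "S1", "amt": 12.5, "detail_date": "2023-01-11"},
]
detail_ids = [d["id"] for d in details]  # stands in for the index on id

# Small store table loaded once and keyed by id (analogous to step A2).
stores_by_id = {
    "S1": {"id": "S1", "name": "Downtown"},
    "S2": {"id": "S2", "name": "Airport"},
}


def lookup_details(account_id):
    """Find all rows for one id by binary search (analogous to step A1)."""
    lo = bisect_left(detail_ids, account_id)
    hi = bisect_right(detail_ids, account_id)
    return details[lo:hi]


def query(account_id):
    """Resolve store_id to a store record and project the result fields."""
    result = []
    for d in lookup_details(account_id):
        store = stores_by_id[d["store_id"]]  # analogous to the switch in A3
        result.append({
            "id": d["id"],
            "store_id": store["id"],
            "store_name": store["name"],
            "amt": d["amt"],
            "detail_date": d["detail_date"],
        })
    return result


rows = query("101312")
print(rows)
```

Because the store data lives in one small in-memory table, a renamed store is visible to the next query immediately, and no wide table has to be rebuilt.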
SPL Official Website 👉 https://www.scudata.com
SPL Feedback and Help 👉 https://www.reddit.com/r/esProcSPL
SPL Learning Material 👉 https://c.scudata.com
SPL Source Code and Package 👉 https://github.com/SPLWare/esProc
Discord 👉 https://discord.gg/cFTcUNs7
Youtube 👉 https://www.youtube.com/@esProc_SPL