Multiple desktop tools that allow querying data files directly in SQL
In our daily work, we often need to query and analyze data files (such as csv/xls file), most of which have quite regular form (each row corresponds to one record) and can be regarded as a database table, and it will be relatively convenient to calculate such files in SQL. However, we often have to import these files into database before using SQL, which is troublesome and difficult, and not friendly to non-professional programmers. Given this situation, we would want to ask the question: is there a tool that can execute SQL directly on a file? In that case, it would be much more convenient. This article will present some desktop tools for non-professional programmers, allowing them to calculate such files directly in SQL.
OpenOffice Base
OpenOffice Base, known as the open-source version of access, is specifically designed for desktop users, featuring low technical requirements and user-friendly interface, yet it only supports csv file and is weak in SQL syntax.
The advantages of OpenOffice Base are: it is small in size; it supports one-click installation; it provides a user-friendly graphical interface; it is specially designed for desktop users; it is convenient to use; its computing engine is self-developed, allowing us to calculate directly on files; it eliminates complex preparations; it has no hidden backend database; its performance is quite good; it supports large files very well.
The disadvantages of OpenOffice Base include: it doesn’t support xls file; it is relatively weak in SQL syntax, even it does not support association calculation between files; it has strict requirements on the form of file, and cannot process files with irregular form.
https://www.openoffice.org/product/base.html
Microsoft Query
Excel is an excellent desktop data computing tool with powerful functions and numerous components, one of which, called Microsoft Query, supports querying csv and xls files in SQL, and features an average user interface in operation difficulty, low technical requirements and weak SQL syntax ability.
Microsoft Query is small in size, and configuration is not required, but it needs to be installed along with Excel; it provides a graphical user interface with old style and few functions; several simple configuration steps are needed before executing SQL, which can be performed with a wizard with ease; it has no hidden backend database; its performance is quite good; it supports larger files.
SQL syntaxes that Microsoft Query supports include the from clause, fuzzy query, part of string and date functions, but it doesn’t support with clause. Although it supports association query, it is limited to query between files of the same type; it doesn’t support association query between csv and xls files because Microsoft Query provides separate driver for each of the two types of files and the two cannot be used simultaneously. In addition, it has strict requirements on the form of file and cannot process files with irregular form.
esProc Desktop
esProc Desktop is a desktop tool dedicated to file computing, featuring low technical requirements, user-friendly interface, and it supports csv/xls file and rich SQL syntaxes.
esProc Desktop is small in size, supports one-click installation, provides a relatively friendly graphical interface, and does not need to configure the file before executing SQL. esProc Desktop has a self-developed computing engine, allowing us to execute SQL without resorting to database, and achieve better performance even for large files. In terms of SQL syntax, esProc Desktop supports with and from subqueries, fuzzy query, a lot of string and date functions, as well as the association calculation between csv and xls files. In addition to csv/xls file, it also supports json/xml/http and many other data sources. Moreover, with esProc's native syntax, the files with irregular form can be processed very well.
http://www.raqsoft.com/esproc-desktop
DBeaver
DBeaver is a data management and SQL query tool, which supports multiple data sources including csv file. This tool features medium technical requirements and convenient user interface, but it is very weak in SQL syntax.
DBeaver is small in size, and supports one-click installation. By default, it does not support csv file, we need to download the components that support csv file with a wizard, which is not difficult.
The disadvantages of DBeaver are: before DBeaver executes SQL, we need to specify a directory and cannot access the files in other directories; it only supports simple SQL calculations, such as fuzzy query, filtering, sorting, grouping and aggregating, and doesn't support association query, from clause and with clause; when executing SQL, we need to load full data into memory; its performance is poor; it does not support large files.
Log Parser
Log Parser is both a command line tool and a desktop tool, and supports csv file and multiple log forms, featuring low technical requirements and above-average user interface in operation difficulty, but it is weak in SQL syntax.
The advantages of Log Parser are: it is small in size; it supports one-click installation; it can execute SQL directly without any configuration; it requires no pre-command or action; the default input and output are done at command line, and the output result is displayed in a graphical table if the option “-o:DATAGRID" is selected when inputting; it has self-developed computing engine and performs better when querying large files.
The disadvantages of Log Parser are: it only supports simple SQL syntaxes, such as fuzzy query, sorting, grouping and aggregating, and doesn't support common association, from clause and with clause; it has strict requirements on the form of file and cannot process files with irregular form.
https://www.microsoft.com/en-us/download/details.aspx?id=24659
Flatbase
It is a website that can query file in SQL, and only supports csv file, featuring low technical requirements, relatively friendly graphical user interface. It supports relatively rich SQL syntax.
The advantage of Flatbase is that we don't need to download and install, just open the website and register an account. When it comes to SQL syntax ability, it supports fuzzy query, association, from clause, with clause, and a large number of string and date functions.
The disadvantages are: we need to drag the file to the work area before executing SQL, the backend action is actually to upload the file and store it into database, resulting in a very poor performance even though the operation is not difficult; it is only suitable for querying small files; this site lacks security measures and is not suitable for querying data-sensitive files; it has strict requirements on the form of file and cannot process files with irregular form.
H2 database
This is a tiny database, and only supports csv file, featuring medium technical requirements and average user interface in operation difficulty. It supports relatively rich SQL syntax.
H2 is very small in size, just a jar package. However, we need to download and install Java environment additionally, and certain technical skills are required in configuration. The web manager that comes with H2 is rather rough, it will be much easier to use a third-party manager. H2 has a self-developed computing engine, which supports with and from subqueries, fuzzy query, association query, and a lot of string and date functions. Before executing SQL, there is no need to create a table, and no pre-operation is required, and instead, we can use ”select … from csvread(file path/name)“ to query the file directly.
Although H2 hides the actions of creating a table and storing it into database, the actions do exist, so the file should not be too large, otherwise the performance will be seriously affected. In addition, H2 has strict requirements on the form of file and doesn’t support the file with irregular form.
http://www.h2database.com/html/functions.html#csvread
csvsql
csvsql is a small Python-based tool, and we can operate at command line as usual. It only supports csv file, and is weak in SQL syntax, and troublesome in installation and configuration.
csvsql is small in size and does not take up much space, and no extra operations or commands are required before executing SQL.
csvsql encapsulates SQLite, but does not support all SQLite syntaxes yet, including with and from clauses, as well as most string and date functions. Currently it only supports few syntaxes such as sorting, filtering, grouping and aggregating, and association. Since csvsql is based on Python, it is troublesome to configure, and requires certain technical skills. When executing SQL, since the process of storing the file into database occurs in backend, it is not suitable for querying large files, otherwise it will take a lot of time. In addition, csvsql has strict requirements on the form of file, only the file with very regular form can be queried.
https://csvkit.readthedocs.io/en/latest/scripts/csvsql.html
q
Apart from the fact that q can be regarded as an improved version of csvsql, there is no significant difference between them.
q provides the “cache mode”. To be specific, when we query a file for the first time, the file will be stored into database, the performance is thus poor, while when we query the same file again, q will directly use the database table in SQLite, the performance will be significantly improved. Note that if the file changes, we need to replace the buffer file manually, otherwise what we query is the old data in the database table.
https://github.com/harelba/q/releases/download/2.0.19/q-AMD64-Windows-installer.exe
csvq
csvq is a small tool developed in GO language. This tool supports csv file, but doesn't support xls file, featuring low technical requirements and common command line interface, and it supports rich SQL syntax.
csvq provides only one binary executable file, no installation and configuration are required and we can execute SQL directly without any preparation. In terms of SQL syntax, csvq supports fuzzy query, association query, from clause, with clause, window function, and a lot of string and date functions. In addition to csv file, it also supports json file.
csvq encapsulates sqlite. Since storing the file into database occurs in backend when executing SQL, and it needs to load full data into memory, the file should not be too large, or the performance will be poor. In addition, csvq has strict requirements on the form of file and cannot calculate the file with irregular form.
https://mithrandie.github.io/csvq/
trdsql
trdsql is a command line tool developed in GO language. This tool supports csv/json file, but doesn't support xls file, features low technical requirements and supports relatively rich SQL syntax.
trdsql provides only one binary executable file, no installation and configuration are required and we can execute SQL directly without any preparation. In terms of SQL syntax, trdsql supports fuzzy query, association query, from clause, with clause, window function, and a lot of string and date functions. trdsql encapsulates multiple databases, which can be switched with option according to our habits.
Since the process of storing the file into database hides in backend when executing SQL, the file should not be too large, or the performance will be poor. In addition, trdsql has strict requirements on the form of file and cannot calculate the file with irregular form.
https://github.com/noborus/trdsql/releases/download/v0.10.1/trdsql_v0.10.1_windows_amd64.zip
rows
rows is a command-line tool written in Python, supports many kinds of files such as csv/xls/json, and features high technical requirements. As for the SQL syntax ability, rows is in a below-average position.
rows supports both csv and xls files, and supports the association calculation of two types of files. It is convenient that no complex pre-commands are required before executing SQL.
The disadvantages of rows are: although it is small in size, the installation and configuration are complex, and we need to install and configurate it at command line. Specifically, we need to install and configure Python environment before installing rows, and we also need to install other plug-ins, including those that support xls file, after installing rows; In terms of SQL syntax, rows supports association query, from subquery, but it doesn’t support fuzzy query, with clause, and many string and date functions; since rows can execute SQL only with the help of SQLite, and the process of storing the file into database exists in backend, the performance is poor when the file is large; rows has strict requirements on the form of file.
http://turicas.info/rows/installation/
SPL Official Website 👉 https://www.scudata.com
SPL Feedback and Help 👉 https://www.reddit.com/r/esProcSPL
SPL Learning Material 👉 https://c.scudata.com
SPL Source Code and Package 👉 https://github.com/SPLWare/esProc
Discord 👉 https://discord.gg/cFTcUNs7
Youtube 👉 https://www.youtube.com/@esProc_SPL