Which Intermediate File Type You Are Using for Data Migration between Databases in a Program?

To migrate data between databases, we need to first export the source data to an intermediate file, modify data in the file as necessary, and then load it to the target database. Text files (txt/csv) are the most popular intermediate file type because it is general and does not set limits for data size. There are cons. Data types may be missing and type conversion may be needed at data reading. At times we use Excel. Though data types are attached to Excel cell information, the file type has a limit on number of rows and columns, leading to slow migration.

No more problems when we use esProc SPL’s bin file (with the extension btx) to perform the migration. The btx file is a special binary file that can automatically record data types, sets no limits on data size, supports appending data to an existing bin file, and can use segment-based parallel processing to increase efficiency when file is big.

SPL is a programming language intended for processing structured (semi-structured) data. Besides being able to handle the special binary format file, it is equipped with a full set of text file and Excel file handling capabilities. To read data from different intermediate files, SPL has the following statements:

btx file: =file("Department.btx").import@b(DEPT,MANAGER,...)

txt file: =file("Department.txt").import@t(DEPT,MANAGER:int,...)

csv file: =file("Department.csv").import@tc(DEPT,MANAGER:int,...)

Excel file: =file("Department.xlsx").xlsimport@t(DEPT,MANAGER,...; "sheet1")

To export data to different intermediate files, SPL has the following statements:

btx file: =file("Department.btx").export@b(data)

txt file: =file("Department.txt").export@t(data)

csv file: =file("Department.csv").export@tc(data)

Excel file: =file("Department.xlsx").xlsexport@t(data; "sheet1")

When the file is large, we can create a cursor for it and perform computations on the cursor; or retrieve data from the cursor and write it to a file. One example. info.txt is a large file. We are trying to filter it according to the condition type==3 && sdate>=date("2021-01-01") and load the result set to the database:


A

B

1

=file("info.txt").cursor@t(id:int,...,type:int,sdate:date)

/ info is a text file storing data exported from the database   using database tool; create a cursor for it, fetch data from the cursor, and   specify field data types

2

=A1.select(type==3 && sdate>=date("2021-01-01"))

/ Filter A1’s text file according to a condition, which can also be   passed in as parameter

3

=connect("mysql")

/Connect to the database

4

=A3.update@i(A2,INFO)

/ Insert A2’s data to INFO table having the same fields

5

>A3.close()

/Close database connection

esProc SPL offers JDBC driver for being invoked by Java. To save the above script as info_ldr.dfx, for instance, we can call it in Java code the same way as we call a stored procedure:

…
Class.forName("com.esproc.jdbc.InternalDriver");
con = DriverManager.getConnection("jdbc:esproc:local://");
st = con.prepareCall("call info_ldr()");
st.execute();
…