What programming language should business people learn?

 

【Abstract】

Through programming, the business people can overcome the difficulties of manual operation of Excel, and greatly improve work efficiency. This paper carefully selected four programming languages, from the installation, debugging, tabular data calculation and other aspects of in-depth comparison, and focused on the learning difficulty, esProc SPL performs well in these tools, and is most suitable for business people to learn.

The most common data in business work is tabular data, such as order records, personnel information, sales contracts, etc., which is called structured data in professional terms. Excel is the most commonly used tool when dealing with tabular data. This is because excel provides easy-to-use formulas and user-friendly operations, and can complete most tasks. But sometimes it will be very tedious to do manual operations with excel when encountering some special and difficult operations, as well as repeated actions or the processing of large quantities of files. In this case, if you can program to solve the problems, you can greatly improve the work efficiency. This is also the reason why many business people are learning programming languages.

There are many programming languages in the market that business people can learn and use to process these tabular data. We choose four typical technologies to discuss: 1. Small database represented by MySQL; 2. VBA with Excel; 3. Python, which is very popular in recent years and highly praised by major training organizations; 4. The rising star esProc SPL. Now, let’s have a deep understanding of the differences in the installation, debugging and tabular data calculation of these programming languages to see which one is most suitable for business people to learn and use.

MySQL

Mysql, HSQLDB and SQLite are all small databases. These databases have low requirements on software and hardware environment, and can run on the desktop, which business people can learn and use theoretically. This paper takes MySQL as an example to discuss the characteristics of this programming language.

In terms of installation and configuration, although MySQL is much simpler than large databases, it is still too difficult for business people. MySQL’s own client is too simple and crude, so it is usually necessary to install additional third-party clients (such as Navicat and toad), which makes the installation of MySQL more difficult. What’s more difficult is that both the server and the client need to specify the database name, port, character set and other key configuration items. It is difficult for none professionals to understand these new concepts, which will also increase the difficulty of installation.

In terms of debugging, SQL does not advocate step-by-step and lacks the foundation of debugging. MySQL also does not provide special debugging functions. Simple statements are OK, but if they are slightly more complex, such as nested statements, debugging is very difficult. You can only manually split them into multiple statements and execute them separately, and then gradually narrow down the scope of the problem. The whole operation process is not only very cumbersome, but also requires a high level of skill, which is really too difficult for business people.

MySQL is also inconvenient for the calculation of tabular data. MySQL does not support the direct calculation of Excel files. Instead, it is necessary to export Excel to CSV, and then import the CSV into MySQL database table before calculation. The loading of the file is divided into four steps: switch database, create table, load data and create index. For example, to import emp.csv to MySQL, you need to enter the following codes on the command line:

mysql>use testdb;
mysql>create   table emp (
->   empid int(10) not null,
->   name varchar(50),
    -> deptid int(10),
    -> salary float,
    -> sex varchar(1),
    -> birthday date,
->   hireday date)CHARSET = utf8;
mysql>LOAD DATA   INFILE 'd:\data\emp.csv' INTO TABLE emp
   ->CHARACTER SET utf8
   ->FIELDS TERMINATED BY ','
   ->LINES TERMINATED BY '\r\n'
   ->IGNORE 1 LINES;
mysql>ALTER TABLE   emp ADD PRIMARY KEY (empid);

The process of importing files is not only cumbersome, but also to learn many new concepts. Most business people are blocked out of the door, so it is difficult to enter the subsequent calculation stage. If you can break through the barrier of previous importing files, the subsequent data calculation will be relatively easy, because SQL has a very long history and has accumulated a large number of teaching materials. For almost all common operations you can find corresponding examples. However, if the operations encountered are very complex, SQL will be difficult to use. For example, the common ordered operations, such as same period ratio and ranking, are not convenient to implement; SQL does not advocate step-by-step, which often leads to the writing of multi-layer nested statements, and this is a too high requirement for none professionals.

VBA

VBA is an official Excel development language provided by Microsoft. It can not only customize the freest operation interface, but also realize any kind of operation, including repeated and batch operations. In other words, VBA can do anything in theory.

In terms of installation and configuration, VBA is built into Excel, so there is no need for additional installation, which is the most friendly.

In terms of debugging, VBA has rich functions and is friendly to business people. VBA is a modern development language, and it advocates step-by-step calculation. The big goal can be divided into several small steps, which has a good debugging foundation. VBA also provides debugging functions such as breakpoint, run to cursor, step in, step over, etc. it can observe the calculation results of each step and the real-time status of each variable, which is helpful for developers to adjust their thinking and quickly find error codes.

However, in the calculation of tabular data, VBA performance is very poor, far inferior to MySQL. VBA can do anything means nothing is refined. VBA lacks basic operations designed specifically for tabular data. Common operations such as filtering, sorting and aggregation that can be completed in one statement in SQL require many lines of code to be written in VBA. It is not only too cumbersome, but also quite difficult for none professionals, which often becomes insurmountable obstacles.

Python

The biggest drawback of VBA is that there is no tabular data function library, and Python provides a wealth of such computing capabilities, which can just make up for the defects of VBA. For basic tabular data operations, such as filtering, sorting, aggregation, etc., Python usually only needs one or two lines of code, such as the following grouping aggregation:
data.groupby("deptid")['salary'].agg([len,   np.sum, np.mean])     

For accessing Excel files, Python only needs two lines compared with a dozen lines of code in SQL:

import pandas as pd
data =   pd.read_excel('d:/emp.xlsx')

Python installation and deployment is a bit difficult. When installing Python, users need to have the quality of professional programmers to make correct choices from a dozen options, otherwise they will easily lose functions, such as debugging functions will not be installed by default. Some options even need to be solved by downloading different packages, such as 32-bit / 64-bit. After installing Python itself, the above tabular data calculation cannot be performed. Pandas and Excel support library also need to be installed. This is done by the installation tool of the third-party function library. This installation tool runs in the command line, which is very unfriendly to business people, and the installation tool itself needs to be updated, and another installation command is required during the update. There are also some packaged third-party installation packages in the market, some of which can be easily installed with Python and pandas, but there are too many kinds out there, and it is difficult for none professionals to choose. From this perspective, Python is actually a tool for professionals.

In terms of debugging, Python, as a modern development language advocating step-by-step calculation, also has perfect debugging function. It should be noted that Python’s tabular data type dataFrame is provided by pandas, not Python’s native data type. During debugging, Python will present it as a messy multi-layer structure rather than a friendly table form. While debugging, printing variables manually in the “execute now” window can alleviate this problem to a certain extent, but it is far less convenient than the native data type.

Python code looks simple, but in fact it is quite difficult to learn, especially for business people. Python pandas is not designed for structured data. The dataFrame used to process tabular data is not a set composed of records as we normally understand, but a common concept of mathematicians, a matrix defined by multiple rows and columns. In fact, pandas does not have record data type, which leads to many calculations that need detours to think clearly. Although it seems that the code is not long, it is actually very difficult to think through it. For example, “get the previous row” should be interpreted as “shift the column down one row”. Secondly, Python has many data types related to tabular data. In addition to the dataFrame mentioned above, there are DataFrameGroupBy type after grouping and aggregation, as well as sequences and matrices. These are set data types in essence, but the operation mode is different. You can’t draw inferences from one instance. You have to look for examples every time you encounter them. If you don’t have a deep understanding of its internal structure (which is too hard for business people), getting it right or wrong often depends on luck. Although there are many training classes, the outcome is that you can only copy the code and change the examples. When there is no example to copy, many people will faint.

As mentioned earlier, Python is good at basic tabular data operations, which actually has another meaning: Python is more difficult to implement complex tabular data calculation, even if it is written, the code is no longer so concise. For a less complicated example, to calculate the longest consecutive rising days of a stock, the Python code is like this:

import pandas as pd
aapl =   pd.read_excel('d:/AAPL.xlsx')
continue_inc_days=0 ;   max_continue_inc_days=0
for i in   aapl['price'].shift(0)>aapl['price'].shift(1):
      continue_inc_days =0 if i==False else continue_inc_days +1
      max_continue_inc_days = continue_inc_days if max_continue_inc_days   < continue_inc_days else max_continue_inc_days
print(max_continue_inc_days)

esProc SPL

Similar to pandas, esProc SPL also has a set of tabular data function library. Unlike pandas, SPL is much less difficult to learn.

The basic data type(table sequence) of SPL is a set composed of records row by row, as we normally understand it. When using this data type to calculate tabular data, there is no need to detour, only need to follow the natural way of thinking, and the code is very short. SPL has few important data types, only record and set. Tabular data, arrays, and grouped data (sets of sets), these are essentially set, and the operation methods in SPL are completely consistent. As long as you learn one of them, you can draw inferences from it and master others.

In terms of basic operation, SPL and pandas are both easy to operate, and it seems that there is little difference between them. But when the problems are complex, it needs to combine several functions to solve, and some of them need to do calculations in the loop judgment. In this case, SPL is easier to achieve the goal than Python, and the code is shorter. For example, for the same calculation of the longest consecutive rising days of a stock, SPL implements the same idea with only two lines:


A

B

1

=file("d:/AAPL.xlsx").xlsimpor@t()

Read Excel file,the first row is title

2

=a=0,A1.max(a=if(price>price[-1],a+1,0))

Get the longest consecutive rising days


In terms of installation and deployment, although esProc is not like VBA that does not need to be installed, it is much simpler than MySQL and Python. In order to facilitate user operation, esProc specially simplifies the steps and supports one click installation. Even if the installation is customized, there are only simple choices such as deployment path and 32-bit / 64-bit, and the main functions are not missing.

In terms of debugging, esProc is similar to Python and has perfect debugging functions. More convenient than python, esProc supports debugging table data type (table sequence), which can easily observe the changes of tabular data, and can significantly improve the debugging efficiency. In addition, esProc uses cells to represent calculation steps or variables, which can be referenced by cell names without defining variables. This kind of development style similar to excel can significantly reduce the development / debugging threshold of business people.

Summary

Business people learn programming to improve their work efficiency. It's good to use tools, but if the tools themselves are more difficult, it's unnecessary. Therefore, these programming languages need to be not only easy to use, but also easy to learn, easy to install, easy to debug and so on.

After the above analysis and comparison, we can make a comparison table for these tools:


Installation

Debugging

Tabular data calculation

Learning

MySql

★★

★★★★

VBA

★★★★★

★★

Python

★★

★★★

★★★

★★★

esProc SPL

★★★★

★★★★★

★★★★★

★★★★


Note: the highest is five stars. The more stars, the more convenient.

It can be seen that, in contrast, esProc SPL is the most suitable programming language for business people. It is not only easy to learn, but also easy to implement complex tabular data calculation. It also provides one click installation and debugging functions for tabular data.