What kind of OLAP do we need?
The word OLAP literally means online analysis, which means that personnel perform various interactive analysis operations on data.
However, the current concept of OLAP has been severely narrowed down by BI software. When it comes to OLAP on business analysis, it often only has the function of multidimensional analysis in technology, which is to summarize a pre-built data cube according to the specified dimension level and present it as a table or graph, supplemented by operations such as drilling, aggregation, rotation, slicing, etc. to change the dimension level and summary range. These are familiar to everyone, so we won’t go into detail anymore.
Is multidimensional analysis the entirety of online analysis?
Let’s examine this data analysis process.
Practitioners with years of work experience in any industry generally have some speculations about the business they are engaged in, such as:
- Stock analysts speculate that stocks that meet certain conditions are likely to rise;
- The company manager has a clear idea of which salespeople are good at dealing with difficult customers;
- The teacher in charge also has a general understanding of the characteristics of the grades of students who are biased towards certain subjects;
- …
These speculations are the foundation of the prediction. After running the business system for a period of time, a large amount of data will accumulate, and these speculations are likely to be verified by these accumulated data. If confirmed, they can be used as a regular conclusion to guide the next step of action. If falsified, they can be guessed again.
This is what online analysis should do! The basic action is to guess and verify, with the aim of finding patterns or supporting arguments from historical data to support certain conclusions. And what online analysis software needs to do is to help users verify guesses based on data.
It should be noted that these speculations were made by people with business experience, not software systems! The reason why it is necessary to be online is because many speculations are made by users after seeing an intermediate result. It is impossible and unnecessary to design a complete end-to-end path in advance, which means modeling is impossible.
Technically, it is necessary to enable users to have the ability to flexibly and interactively query and calculate data. For example, based on the examples given above, the calculation that the user needs to implement may be as follows:
- What is the ratio of stocks that have risen continuously for three days this month and continue to rise on the fourth day?
- Which customers who do not issue orders for six months will issue orders six months after changing sales personnel?
- What is the ranking of science scores for students who rank in the top 10 in both English and mathematics?
- …
Obviously, the above questions can be answered by calculating historical data, but can multidimensional analysis techniques do that?
I’m afraid not!
Multidimensional analysis has two technical shortcomings: firstly, the cube needs to be prepared in advance, and users usually do not have the ability to instantly design and modify the cube. Once there are new analysis requirements, the cube must be rebuilt; The second is that the analysis actions that can be conducted on the cube are monotonous, with only a few types such as drilling, aggregation, slicing, and rotation, making it difficult to implement complex computational behaviors with multiple steps. In recent years, popular agile BI products have greatly improved the smoothness of operation and the attractiveness of the interface compared to early OLAP products. However, the essential computing functions have not increased much, and they are still conducting multidimensional analysis. What cannot be calculated earlier still cannot be calculated.
Multidimensional analysis can indeed provide some useful information, such as the frequently cited examples, which can accurately pinpoint which department and business caused it when the cost is too high. However, multidimensional analysis cannot obtain the regular conclusions we hope to obtain from the data in the aforementioned examples, as only with regular conclusions can we predict and guide work. In this sense, understanding online analysis solely as multidimensional analysis is incomplete.
Then, what kind of OLAP/BI software should be used for pattern discovery (more precisely, pattern validation)?
As mentioned earlier, from a technical perspective, pattern verification can be seen as a process of querying and computing data, and its key point is that this process can be freely defined by analysts, which means that OLAP/BI software should have the function of enabling business personnel to independently implement interactive calculations.
After obtaining the data, it is the calculation. The characteristic of this calculation is that the next action needs to be instantly determined based on the results of the previous step, and the process cannot be designed in advance, so it must be interactive, similar to the mode of a calculator. In addition, the data to be calculated here is batch structured data, not simple numerical values. In difference to ordinary numerical calculators, this function can be vividly referred to as a data table calculator.
Excel has this ability to a certain extent, and in fact, Excel has become the most widely used desktop analysis tool. However, Excel may not be able to handle complex data operations and repetitive actions. For example, the calculations in the earlier examples are not easily implemented directly in Excel.
At this point, it is necessary to leverage the power of programming. Programming languages that support steps can be used to write very complex calculations. Unfortunately, there are not many suitable programming languages available. As a built-in programming language in Excel, VBA can naturally run in Excel, but VBA is not a set-oriented syntax, with high programming complexity and limited proficiency in handling structured data. As for Python, as we have mentioned before, it only looks beautiful but is actually difficult for most people to learn, and it can only run outside of Excel, which is also very inconvenient.
esProc SPL may be the only programming language suitable for Excel analysts in the industry. SPL has powerful structured data processing capabilities, and in particular, SPL also provides Excel plugins, allowing users to directly use SPL code in Excel to implement complex calculations that are difficult to implement in Excel.
Programming has a certain threshold, and some business analysts may not be able to master programming, so these problems can be solved through the cooperation of technical personnel. In this case, the task of OLAP software is not to enable business personnel to perform process calculations themselves, but to improve the efficiency of business personnel in obtaining technical resources and the development efficiency of technical personnel in meeting requirements.
Specifically, there are two aspects: one is to establish a historical problem library, where some previously solved problems can be directly executed by business personnel by calling algorithms to change parameters; Even for new requirements, similar issues can be found to assist technical personnel in accurately understanding. Inconsistent understanding between technical and business personnel is one of the main factors causing task delays; The second is to provide efficient and manageable development techniques, allowing technicians to quickly write and modify computational code, and store this code in a historical algorithm library for safekeeping and re execution.
However, for this matter, there are also not many suitable technologies in the industry, and SQL has good manageability, but it is cumbersome to write and difficult to handle procedural calculations; The stored procedure needs to be recompiled and is not convenient to execute again; Java code also needs to be recompiled and is basically unmanageable; Script languages like Python have poor integration and version consistency, making it difficult to manage and execute on a large scale.
For this scenario, esProc SPL is also a better choice. SPL has powerful functions, high development efficiency, and is also suitable for big data. Scripted code is also easy to store, manage, and reuse.
SPL Official Website 👉 https://www.scudata.com
SPL Feedback and Help 👉 https://www.reddit.com/r/esProc_SPL
SPL Learning Material 👉 https://c.scudata.com
SPL Source Code and Package 👉 https://github.com/SPLWare/esProc
Discord 👉 https://discord.gg/cFTcUNs7
Youtube 👉 https://www.youtube.com/@esProc_SPL
Chinese version