RaqForum 23 No.
1,121 View •
Simplifying Join Syntax: The Significance
By rethinking and redefining the equi-joins, we are able to simplify the join syntax, creating easy to write and understand queries. We offer three solutions – foreign key attributization, interconnection of homo-dimension tables, and integration of the parent and child tables – to eliminate the explicit JOIN actions, making queries more conform to human thinking. The dimension alignment join syntax frees programmers from taking care of relationships between tables to let them create simple queries.
What’s more, the simplified join syntax helps avoid mistakes.
SQL allows writing a join condition using the WHERE clause (which goes to the conventional Cartesian product-based join definition), and many programmers are accustomed to using the clause. There isn’t any issue when there are only two or three tables to be joined, but it is very likely that a join condition will be missed out if there are about a dozen of tables. The omission will cause a many-to-many complete cross join while the SQL statement is executing properly, leading to a wrong result (As previously mentioned, it is most probably that the SQL statement is wrong when a many-to-many join occurs), and a crashed database because of the quadratic scale of the Cartesian product if the table over which the join condition is missed out is too big.
We won’t miss out any join condition by adopting the simplified join syntax. That’s because the syntax isn’t Cartesian product-based and denies the significance of the many-to-many relationship. With the syntax, the complete cross product is impossible.
SQL uses subqueries to handle a JOIN where the child tables need to be first grouped before aligning to the parent table. With only one child table, we can JOIN tables first and then perform GROUP without using a subquery. Habitually and in an effort to avoid the subqueries, some programmers apply this rule to a JOIN with multiple child tables by first writing JOIN and then GROUP, only to get a wrong result.
The dimension alignment join syntax will save programmers from this kind of mistake. It handles any number of child tables in the same way without using subqueries.
The greatest impact in redefining JOIN operations will be the realization of associated queries.
Now the agile BI products are popular. Vendors offer products claiming that they enable the frontline staff to perform query reporting through drag and drop. The real-life effect, however, is far below expectation. IT resources are still a necessity. The reason behind this is that most of the business queries involve procedural computations that the non-programmers can’t handle by themselves. Yet there are about 30% of the business queries that are non-procedural but still can’t be dealt with without the help of IT department.
This is because most of the agile BI products (and the once popular OLAP products many years ago) don’t support associated queries. The querying model is pre-constructed by IT professionals, and the frontline staff makes queries on the interface based on the model. The modelling is in fact the construction of a logical or physical single table, within which the querying and analysis is performed. The interface may be smooth and gorgeous, but the data retrieval is limited in the pre-constructed single table. Once a query exceeds the data range the single table covers to associate data in another table, IT pros are needed to re-construct the model. Different queries need different models, which is on-demand modelling. Most of the business queries involve data association, but IT pros can’t predict all querying needs and it is impossible to include all the possibilities in a single table even they can predict. The result is either frequent re-modelling or the inability of the BI products. Both make a BI product lose its intended agility.
Why can’t these BI products support associated queries? It isn’t easy to offer the support. The root of the difficulty lies in the SQL definition of JOINs. Simple as it is, it instead makes the association operation so complicated that the operation is completely beyond the frontline staff. Moreover, it is impossible to expose the data structure to them to perform the joins independently. Some BI products, with the improvements on interface, enable users to handle the non-circular joins and the joins where one table hasn’t same dimension fields. For a complete self-join (a circular join) or a join where one table contains multiple same dimension fields, the remodeling is still needed. We’ll explain this in detail when discussing the concept of dimension.
A different angle in looking at the JOIN operations will solve the problem of the associated queries at the root. The three join types we summed up and the corresponding solutions treat the multi-table associations as queries over one table, which, for the frontline staff, is easy to understand, though the attributes of a table (fields) become a little complicated. An attribute may have the sub-attributes (the foreign key table fields pointed by a foreign key), and a sub-attribute may also have the sub-attributes (with multi-layer foreign key tables). Sometimes the field values are sets instead of single values (when a child table is regarded as the field of the parent table). The perspective makes a self-join easy to understand (the last example in the previous article involves a self-join), and a join where one table having same dimension fields not an issue any more (because each attribute has its own sub-attributes). With this association mechanism, IT pros only need to define the data structure (metadata) once. The frontline staff can then perform joins independently without turning to the IT department. Re-modelling is needed only when a change to the data structure happens, rather than whenever a new associated query need appears. That constitutes the non-on-demand modelling.