RaqForum 23 No.
1,123 View •
Simplifying Join Syntax by Eliminating JOINs
In the previous article, we divide equi-joins into three types. The division emphasizes their distinctions, enabling us to make most use of the features of each type.
1. The attributization of the foreign key
We’ll look at it through an example. Here are two tables:
The employee table, with the following fields:
The department table, with the following fields:
Both tables use the id field as the primary key. The department field of the employee table is the foreign key pointing to the department table. The manager field of the department table is the foreign key pointing to the employee table. This is the conventional table structure.
We might want to find the US employees who work under a Chinese manager.
Following is the SQL way of doing it:
SELECT A.* FROM employee A JOIN department B ON A.department=B.id JOIN employee C ON B.manager=C.id WHERE A.nationality=‘US’ AND C.nationality=‘Chinese’
The statement is long. Since the employee table needs to be JOINed twice, it should be given an alias to distinguish them.
We can code the query in another way:
SELECT * FROM employee WHERE nationality=‘US’ AND department.manager.nationality=‘Chinese’
Of course this isn’t a standard SQL statement.
The code highlighted in red in the second statement means “the nationality of the manager of the department where the employee in the current record works”. We treat the foreign key field as an object, and the fields in the foreign key table as the attributes of the foreign key field. Thus department.manager is“the manager of the department where an employee works”. Since the manager field is the foreign key in the department table, the fields of the foreign key table to which it points can be also treated as the field’s attributes, which generates the code department.manager.nationality – “the nationality of the manager of the department where an employee works”.
Obviously the object-oriented thinking is more natural and intuitive than that based on Cartesian product plus filtering. The foreign-key-oriented joins don’t perform multiplication between two tables. The foreign key field is used to find the corresponding records in the foreign key table, without being involved in the Cartesian product operation having multiplication property.
As previously stipulated, the related field in a dimension table for a foreign-key-oriented join must be the primary key. So, one foreign key value matches one record in the foreign table. The department.manager.nationality is unique for each record of the employee table without any ambiguity. Without the stipulation, a many-to-many correspondence may occur. The department.manager.nationality cannot be uniquely defined.
The object-oriented syntax is common in high level languages for structured programming (like C language and Java), which stores data in objects. Though the values of department field in the employee table are numbers, they essentially represent objects. In many data tables, the primary key values haven’t substantial meanings, but are only used to identify the records. Similarly, the foreign key field is just for locating the matching records in its foreign key table. If the foreign key values are directly stored in objects, numbers will become unnecessary. But as SQL lacks discreteness, the object storage is impossible and numbers are still needed.
In a foreign-key-oriented relationship, the fact table and the dimension table are not in an equal position. We can look up a field in a dimension table based on the fact table, but not vice versa.
**2. The interconnection of homo-dimension tables **
It’s simpler to perform a join between homo-dimension tables. Considering the following two tables:
The employee table, with the following fields:
The manager table, with the following fields:
Both tables use the id field as their primary key. Managers are also employees, so the two tables share the ids. Since they have more attributes than others, their information is stored in another table.
We might want to find the total income (including the allowance) of every employee (including every manager).
A JOIN operation is necessary for SQL to do it:
SELECT employee.id, employee.name, employee.salary+manager.allowance FROM
employee LEFT JOIN manager ON employee.id=manager.id
But for two tables having a one-to-one relationship, we can treat them like one table:
SELECT id,name,salary+allowance FROM employee
According to the stipulation, homo-dimension tables are joined according to the primary keys. Records with same primary key values correspond exclusively to each other. The expression “salary+allowance” is uniquely computed over each record of the employee table, having no possibility of causing ambiguity.
Homo-dimension tables are equal. One can refer a field of another.
**3. The integration of the parent and child tables **
A typical example of the parent and child tables is the orders table and the order detail table, like the following two tables:
The Orders table, with the following fields:
The OrderDetail table, with the following fields:
The Orders table’s primary key is id field. The OrderDetail table’s primary key is id and no fields. The former is a part of the latter.
We might want to know the total amount of every order.
The following is the SQL code:
SELECT Orders.id, Orders.customer, SUM(OrderDetail.price)
FROM Orders JOIN OrderDetail ON Orders.id=OrderDetail.id
GROUP BY Orders.id, Orders.customer
SQL needs a GROUP BY to reduce the records with same ids produced by JOIN operation.
If we treat the child table OrderDetail as a field of the parent table Orders, with the field values being the records in the child table matching the primary key of the parent table, the JOIN and GROUP BY won’t be necessary:
SELECT id, customer, OrderDetail.SUM(price) FROM Orders
Unlike a conventional field, the values of OrderDetail field are sets, because the relationship between a parent table and its child table is one-to-many. Here an aggregation is performed over each set type value to get a single value.
This perspective on associating tables makes the query easy to write and understand, as well as less error-prone.
Suppose the Orders table has another child table that records the status of payments:
The OrderPayment table, with the following fields:
We might want to find the orders that haven’t yet been fully paid, or whose accumulated payments are less than the total amount.
We shouldn’t simply JOIN the three tables. A many-to-many relationship will happen between the OrderDetail table and the OrderPayment table and the result will be wrong (Try thinking about the high possibility of error occurrence by performing a many-to-many join mentioned in the previous article). The right way is to GROUP the two tables separately and JOIN each grouped result with the Orders Table. The query will be complicated.
If we treat each child table as a field of the parent table, the query will be simple and easy:
SELECT id,customer,OrderDetail.SUM(price) x, OrderPayment.SUM(amount) y FROM Orders WHERE x>y
This way, the error caused by a many-to-many relationship can be avoided.
The parent table and its child table are not equal. But a two-way reference is allowed. The above is the case of referencing records of the child table from the parent table. The reference in an opposite direction is similar to that from a foreign key table.
By treating a multiple table related operation as a relatively complex single table operation, we abandon the Cartesian product to look at the JOIN operations from a different perspective. The new approach almost eliminates associating actions from the most common equi-join operations, and even the JOIN key word from the syntax, creating simple and easy to understand queries.