" In the development of database application, we often need to face complex SQL computing, such .."

pjf RaqForum 22 No.
1 Reply • 642 View • 4 Years ago

SPL Simplified SQL Case Details - Grouping & Join

In the development of database application, we often need to face complex SQL computing, such as join computing in Multi-tier grouping. In SQL, grouping must be aggregated at the same time, and object-based associative access can not be performed, so dealing with this kind of problem will be more complex, which can only be achieved by advanced techniques such as nesting multi-layer sub-queries with window functions. The SPL introduced in this article can support real grouping and provide intuitive object-based associative access, which makes it easier to solve such problems.

Grouping and join are often used in practical business. Here is a general example based on actual business to illustrate the specific process of SPL implementation of grouping and join.
Computing objective: To query out-of-stock DVD stores, that is, the stores whose existing DVD copies are less than four categories.

Data structure:
• Branch table, which stores DVD branch information；
• DVD table, which stores the title and category information of DVD. DVD is virtual data. For example, “Transformers 4” is a DVD, but it is not a visible CD-ROM.
• DVD Copy table, which stores multiple copies of DVD. DVD copies are real CD-ROMs and are stored in various branches in physical form. Note: DVDCopy table is associated with Branch table using BranchID field, and with DVD table using DVDID field.

Following are some data examples:

Branch table：

BID	Street	City
B001	street1	New York
B002	street2	Houston
B003	street3	LA
B004	street4	Lincoln

DVD table：

DVDID	Category	Title
D001	science fiction	Transformers IV
D002	science fiction	Transformers II
D003	science fiction	Guardians of the Galaxy
D004	act	The Expendables III
D005	sport	Need for Speed
D006	feature	Grace of Monaco

DVDCopy table：

CopyID	DVDID	BID	Status	LastDateRented	LastDateReturned	MemberID
C000	D001	B001		7/10/2014	7/13/2014	M001
C001	D004	B001		7/10/2014	7/13/2014	M001
C002	D001	B001		7/10/2014		M001
C003	D005	B001		7/10/2014	7/13/2014	M003
C004	D006	B001		7/10/2014	7/13/2014	M003
C005	D005	B002		7/10/2014	7/13/2014	M003
C006	D002	B002		7/10/2014	7/13/2014	M006
C007	D002	B002		7/10/2014	7/13/2014	M007
C008	D001	B002		7/10/2014	7/13/2014	M008
C009	D004	B002		7/10/2014	7/13/2014	M009
C010	D005	B002		7/10/2014	7/13/2014	M010
C011	D006	B002	Miss	7/10/2014	7/13/2014	M010
C000	D001	B003		7/10/2014	7/13/2014	M001
C001	D004	B003		7/10/2014	7/13/2014	M001
C002	D001	B003	Miss	7/10/2014		M001
C003	D005	B003		7/10/2014	7/13/2014	M003

Illustration:
1.   The calculation results should be some records in the Branch table.
2.   If the Status field in DVDCopy table is “Miss”, then the disc is lost. If the LastDateReturned field is empty, the disc loan has not been returned. Obviously, the lost or unreturned discs are not within the calculation range and should be filtered out.
3.   Consideration should be given to the fact that some branches may not have records in DVDCopy table, although this is rare.

Resolving ideas:
1.   Filter out the existing DVD copies from the DVDCopy table (no loss or loan)
2.   Grouping DVDCopy table according to BID, each group is all DVD copies of a store.
3.   Find out the DVDs that correspond to the DVD copies in each store, and then calculate the number of categories of DVDs.
4.   Search for stores with DVD categories less than 4, and such stores meet the requirements.
5.   Find out stores that do not appear in the DVDCopy table. Such stores also meet the requirements.。
6.   Merge the two types of stores that meet the requirements

SPL codes：

	A
1	=Branch=db.query("select * from Branch")
2	=DVD=db.query("select * from DVD")
3	=DVDCopy=db.query("select * from DVDCopy")
4	=DVDCopy.switch(DVDID,DVD:DVDID; BID,Branch:BID)
5	=DVDCopy.select(STATUS!="Miss" && LASTDATERETURNED!=null)
6	=A5.group(BID)
7	=A6.new(~.BID:BonList, ~.(DVDID).id(CATEGORY).count():CatCount)
8	=A7.select(CatCount<4)
9	=A8.(BonList) \| (Branch \ A7.(BonList))
10	>file("shortage.xlsx").xlsexport@t(A9)

A1-A3: Retrieve data from the database, named as variables Branch, DVD, DVDCopy respectively. The results are as follows:

undefined

A4：=DVDCopy.switch(DVDID,DVD:DVDID; BID,Branch:BID)

Use function switch, switch the DVDID field in DVDCopy table to the corresponding records in DVD table, and switch the BID field to the corresponding records in Branch table. This step is the basis for the object-based associative access, and the results of DVDCopy after calculation are as follows:

undefined

The light blue font indicates that the field corresponds to a record, which can be viewed after clicking, as follows:
undefined

At this point, only the operator "." need be used for the object-based associative access. For example, DVDCopy.(DVDID).(CATEGORY) represents the corresponding DVD category for each DVD copy. DVDCopy.(BID) represents the detailed information of a branch(complete record) corresponding to a DVD copy.

A5：=DVDCopy.select(STATUS!="Miss" && LASTDATERETURNED!=null)

This code is used to filter data, that is, lost and unreturned DVD copies are not within the scope of calculation, the A5 value after filtering is as follows:

undefined

A6：=A5.group(BID)

The above code is used to group data in A5 according to BID, each line representing all DVD copies of a store, as follows:
undefined

Click on the light blue font to see the members of the group:

undefined

As you can see, the function group only groups the data, and doesn’t perform the aggregate calculation at the same time, which is different from the grouping function in SQL. When we need to further process the grouped data rather than simply aggregate, it is more convenient to use the group function of SPL, such as the code in A7.

A7：=A6.new(~.BID:BonList, ~.(DVDID).id(CATEGORY).count():CatCount)

The above code is used to calculate the number of DVD categories that corresponds to each branch. Function new can generate new object A7 based on the data in A6. A7 has two columns: BonList and CatCount. BonList comes directly from the BID column of data in group A6, and CatCount comes from the DVDID column of data in group A6. CatCount's algorithm is divided into three parts: ~.(DVDID) find the DVD records that all DVD copies in each store correspond to; id(CATEGORY) remove duplicate Category from these DVD records; count() calculate the number of Categories. The result is as follows:

undefined

That is: Branch B002 has 3 categories of DVD copies, branch B003 has 3 categories of DVD copies, and branch B001 has 4 categories of DVD copies.

A8：A7.select(CatCount<4)

The above code executes the query and finds stores with CatCount less than 4. The results are as follows:

The stores that are out of stock are calculated on the basis of DVDCopy tables. But some stores that are seriously out of stock may not appear on DVDCopy table. For example, all the DVD copies of the store have been lent out, or the store has no DVD copies at all, so it is necessary to merge these stores. The code is as follows:

A9：=A8.(BonList) | (Branch \ A7.(BonList))

In the above code, the operator "|" denotes the union of two datasets (which can be replaced by a union function), and the operator "\" denotes the difference set calculation (which can be replaced by a diff function). A8.(BonList), Branch, A7.(BonList) represents respectively: stores out of stock in DVDCopy table, all the stores, stores that appear in DVDCopy table. The values are respectively:

undefined

A9 is the final result of this case, and its value is:

undefined

A10：>file("shortage.xlsx").xlsexport@t(A9)

Finally, the result is exported to the excel file shortage.xlsx. Open the file to see the results as follows:

From this example, we can see that SQL lacks an explicit set and cannot represent data sets with variables such as A8 or Branch, so the short SPL code mentioned above must be implemented with several lengthy SQL.

In addition, SPL can be invoked by reporting tools or Java programs. The method of invoking SPL is similar to that of ordinary database. The JDBC interface provided by SPL can return the result of calculation in ResultSet form to the main program of java. Specific methods can refer to relevant documents. 【How to call an SPL script in Java】

SPL Official Website 👉 http://www.scudata.com

SPL Feedback and Help 👉 https://www.reddit.com/r/esProc

SPL Learning Material 👉 http://c.scudata.com

SPL Source Code and Package 👉 https://github.com/SPLWare/esProc

Discord 👉 https://discord.gg/ydhVnFH9

Youtube 👉 https://www.youtube.com/@esProc_SPL

esProc