SPL Simplified SQL Case Details - Grouping & Join

 

  In the development of database application, we often need to face complex SQL computing, such as join computing in Multi-tier grouping. In SQL, grouping must be aggregated at the same time, and object-based associative access can not be performed, so dealing with this kind of problem will be more complex, which can only be achieved by advanced techniques such as nesting multi-layer sub-queries with window functions. The SPL introduced in this article can support real grouping and provide intuitive object-based associative access, which makes it easier to solve such problems.

  Grouping and join are often used in practical business. Here is a general example based on actual business to illustrate the specific process of SPL implementation of grouping and join.
  Computing objective: To query out-of-stock DVD stores, that is, the stores whose existing DVD copies are less than four categories.

  Data structure:
      Branch table, which stores DVD branch information
      DVD table, which stores the title and category information of DVD. DVD is virtual data. For example, “Transformers 4” is a DVD, but it is not a visible CD-ROM.
      DVD Copy table, which stores multiple copies of DVD. DVD copies are real CD-ROMs and are stored in various branches in physical form. Note: DVDCopy table is associated with Branch table using BranchID field, and with DVD table using DVDID field.

  Following are some data examples:

  Branch table

BID

Street

City

B001

street1

New York

B002

street2

Houston

B003

street3

LA

B004

street4

Lincoln

  DVD table

DVDID

Category

Title

D001

science fiction

Transformers IV

D002

science fiction

Transformers II

D003

science fiction

Guardians of the Galaxy

D004

act

The Expendables III

D005

sport

Need for Speed

D006

feature

Grace of Monaco

  DVDCopy table

CopyID

DVDID

BID

Status

LastDateRented

LastDateReturned

MemberID

C000

D001

B001


7/10/2014

7/13/2014

M001

C001

D004

B001


7/10/2014

7/13/2014

M001

C002

D001

B001


7/10/2014


M001

C003

D005

B001


7/10/2014

7/13/2014

M003

C004

D006

B001


7/10/2014

7/13/2014

M003

C005

D005

B002


7/10/2014

7/13/2014

M003

C006

D002

B002


7/10/2014

7/13/2014

M006

C007

D002

B002


7/10/2014

7/13/2014

M007

C008

D001

B002


7/10/2014

7/13/2014

M008

C009

D004

B002


7/10/2014

7/13/2014

M009

C010

D005

B002


7/10/2014

7/13/2014

M010

C011

D006

B002

Miss

7/10/2014

7/13/2014

M010

C000

D001

B003


7/10/2014

7/13/2014

M001

C001

D004

B003


7/10/2014

7/13/2014

M001

C002

D001

B003

Miss

7/10/2014


M001

C003

D005

B003


7/10/2014

7/13/2014

M003

  Illustration:
    1.   The calculation results should be some records in the Branch table.
    2.   If the Status field in DVDCopy table is “Miss”, then the disc is lost. If the LastDateReturned field is empty, the disc loan has not been returned. Obviously, the lost or unreturned discs are not within the calculation range and should be filtered out.
    3.   Consideration should be given to the fact that some branches may not have records in DVDCopy table, although this is rare.

  Resolving ideas:
    1.   Filter out the existing DVD copies from the DVDCopy table (no loss or loan)
    2.   Grouping DVDCopy table according to BID, each group is all DVD copies of a store.
    3.   Find out the DVDs that correspond to the DVD copies in each store, and then calculate the number of categories of DVDs.
    4.   Search for stores with DVD categories less than 4, and such stores meet the requirements.
    5.   Find out stores that do not appear in the DVDCopy table. Such stores also meet the requirements.
    6.   Merge the two types of stores that meet the requirements
 
  SPL codes


A

1

=Branch=db.query("select * from   Branch")

2

=DVD=db.query("select * from   DVD")

3

=DVDCopy=db.query("select * from   DVDCopy")

4

=DVDCopy.switch(DVDID,DVD:DVDID;   BID,Branch:BID)

5

=DVDCopy.select(STATUS!="Miss"   && LASTDATERETURNED!=null)

6

=A5.group(BID)

7

=A6.new(~.BID:BonList,   ~.(DVDID).id(CATEGORY).count():CatCount)

8

=A7.select(CatCount<4)

9

=A8.(BonList) | (Branch \ A7.(BonList))

10

>file("shortage.xlsx").xlsexport@t(A9)

  A1-A3: Retrieve data from the database, named as variables Branch, DVD, DVDCopy respectively. The results are as follows:

  undefined
  
  


  A4=DVDCopy.switch(DVDID,DVD:DVDID; BID,Branch:BID)

  Use function switch, switch the DVDID field in DVDCopy table to the corresponding records in DVD table, and switch the BID field to the corresponding records in Branch table. This step is the basis for the object-based associative access, and the results of DVDCopy after calculation are as follows:

  undefined

  The light blue font indicates that the field corresponds to a record, which can be viewed after clicking, as follows:
  undefined

  At this point, only the operator "." need be used for the object-based associative access. For example, DVDCopy.(DVDID).(CATEGORY) represents the corresponding DVD category for each DVD copy. DVDCopy.(BID) represents the detailed information of a branch(complete record) corresponding to a DVD copy.

 

  A5=DVDCopy.select(STATUS!="Miss" && LASTDATERETURNED!=null)

  This code is used to filter data, that is, lost and unreturned DVD copies are not within the scope of calculation, the A5 value after filtering is as follows:

  undefined

  A6=A5.group(BID)

  The above code is used to group data in A5 according to BID, each line representing all DVD copies of a store, as follows:
  undefined

  Click on the light blue font to see the members of the group:

  undefined

  As you can see, the function group only groups the data, and doesn’t perform the aggregate calculation at the same time, which is different from the grouping function in SQL. When we need to further process the grouped data rather than simply aggregate, it is more convenient to use the group function of SPL, such as the code in A7.

 

  A7=A6.new(~.BID:BonList, ~.(DVDID).id(CATEGORY).count():CatCount)

  The above code is used to calculate the number of DVD categories that corresponds to each branch. Function new can generate new object A7 based on the data in A6.  A7 has two columns: BonList and CatCount. BonList comes directly from the BID column of data in group A6, and CatCount comes from the DVDID column of data in group A6. CatCount's algorithm is divided into three parts: ~.(DVDID) find the DVD records that all DVD copies in each store correspond to; id(CATEGORY) remove duplicate Category from these DVD records; count() calculate the number of Categories. The result is as follows:

  undefined

  That is: Branch B002 has 3 categories of DVD copies, branch B003 has 3 categories of DVD copies, and branch B001 has 4 categories of DVD copies.

 

  A8A7.select(CatCount<4)

  The above code executes the query and finds stores with CatCount less than 4. The results are as follows:undefined

  The stores that are out of stock are calculated on the basis of DVDCopy tables. But some stores that are seriously out of stock may not appear on DVDCopy table. For example, all the DVD copies of the store have been lent out, or the store has no DVD copies at all, so it is necessary to merge these stores. The code is as follows:

  A9=A8.(BonList) | (Branch \ A7.(BonList))

  In the above code, the operator "|" denotes the union of two datasets (which can be replaced by a union function), and the operator "\" denotes the difference set calculation (which can be replaced by a diff function). A8.(BonList), Branch, A7.(BonList) represents respectively: stores out of stock in DVDCopy table, all the stores, stores that appear in DVDCopy table. The values are respectively:

  undefined

  A9 is the final result of this case, and its value is:

  undefined

  A10>file("shortage.xlsx").xlsexport@t(A9)

  Finally, the result is exported to the excel file shortage.xlsx. Open the file to see the results as follows:

  

  From this example, we can see that SQL lacks an explicit set and cannot represent data sets with variables such as A8 or Branch, so the short SPL code mentioned above must be implemented with several lengthy SQL.

  In addition, SPL can be invoked by reporting tools or Java programs. The method of invoking SPL is similar to that of ordinary database. The JDBC interface provided by SPL can return the result of calculation in ResultSet form to the main program of java. Specific methods can refer to relevant documents. 【How to call an SPL script in Java