SPL CookBook

 

Preface

SPL (Structured Process Language) is the programming language of esProc – the professional data computing engine – uses. The language encapsulates a wealth of class libraries for structured data processing, and captures the essences of a variety of computing scenarios, including order-based, set-oriented computations, data structure conversion, special grouping operations, table association & comparison and big data computing. Its stepwise computation and intermediate result reuse mechanisms are particularly useful in simplifying the conventionally complicated code and making the language easy to learn. In short, SPL proves a handy and easy to use development tool.

It is much easier and more efficient to code data computing and processing in SPL than in SQL or Python. SPL offers ordered set type storage object whose members can reference an ordered set or a single record, which helps to implement complex computations simply; naturally supports temporary storage and the reuse of intermediate results, which simplifies complicated SQL statements; provides a rich set of functions for handling order-based merge and order-based association, saving programmers much effort in dealing with those complicated scenarios; and supplies access mechanisms including database cursor and file cursor and functions for multithreaded processing, letting programmers manipulate big data as easily as small amounts of data.

This book presents hundreds of commonly seen data processing tasks and their SPL code. The examples cover most of the scenarios data analysis may encounter. Mastering solutions of these tasks and learning to mix them well will enable you to achieve general data analysis tasks effortlessly.

Table of contents

Chapter 1 Search, Location and Filtering

1.1 Getting positions of members based on a specified condition

1.2 Getting members based on a specified condition

1.3 Multi-condition filtering

1.4 Getting the first record meeting the specified condition

1.5 Getting the record containing the minimum value of a specified field

1.6 Getting the record containing the maximum value of a specified field

1.7 Getting positions of members according to primary key values

1.8 Getting a member according to primary key value

1.9 Getting members according to primary key values

1.10 Top/Bottom N: getting values

1.11 Top/Bottom N: getting ordinal numbers of records

1.12 Top/Bottom N: getting corresponding records

1.13 Getting Top/Bottom N directly

1.14 Grouping: get top/bottom N from each group

1.15 Grouping: get top/bottom N without keeping the grouped subsets

1.16 Grouping: get groups according to the specified aggregate value

1.17 Grouping: filter records according to the specified aggregate value

1.18 Order-based computation: find ordinal number of the record containing the maximum value and perform cross-row computations

1.19 Order-based computation: filter by comparing with the next row

1.20 Order-based computation: filter in neighboring interval

1.21 Order-based computation: find the continuous interval having the specified feature

Chapter 2 Aggregation

2.1 Sum

2.2 Calculate the maximum/minimum value

2.3 Calculate average

2.4 Count

2.5 Logical AND

2.6 Logical OR

2.7 Distinct count

2.8 Calculate median

2.9 Grouping: sum

2.10 Grouping: get the maximum/minimum value

2.11 Grouping: calculate average

2.12 Grouping: Count

2.13 Grouping: logical AND

2.14 Grouping: logical OR

2.15 Grouping: distinct count

2.16 Grouping: calculate median

2.17 Iterative aggregation

2.18 Iterative aggregation: User-defined iteration termination criterion

Chapter 3 Set-oriented Computations

3.1 Get concatenation of two sets

3.2 Get intersection of two sets

3.3 Get union of two sets

3.4 Get difference of two sets

3.5 Get XOR of two sets

3.6 Use concatenation and difference together

3.7 Concatenate all set members in a sequence

3.8 Union all set members in a sequence

3.9 Get intersection of all set members of a sequence

3.10 Belong to & Contain: check whether an object is a member of a set

3.11 Belong to & Contain: check whether a set contains one or more certain objects

3.12 Belong to & Contain: check whether a set contains another set based on order

3.13 Belong to & Contain: find if a record exists by primary key values

3.14 Inter-sequence computations: compare sequences

3.15 Inter-sequence computation: perform alignment operations on members of multiple sequences

3.16 Inter-sequence computation: check whether two sequences are equal

3.17 Perform computation between a sequence and a single value

3.18 Merge-concatenate same-order sequences in the original order

3.19 Merge-union same-order sequences in the original order

3.20 Merge same-order sequences in the original order and calculate intersection

3.21 Merge same-order sequences in the original order and calculate difference

3.22 Merge same-order sequences in the original order and calculate XOR

3.23 Merge-concatenate multiple table sequences by primary key in order

3.24 Find difference between table sequences through MERGE

3.25 Union unordered table sequences through MERGE

3.26 Aggregation on sequences: union and difference

3.27 Aggregation on sequences: intersection

3.28 Perform set-oriented computations on two small data files

3.29 Perform complex set-oriented computations on multiple small data files

3.30 Concatenate two large tables through MERGE

3.31 Union two large tables through MERGE

Chapter 4 Ranking and Sorting

4.1 Sort simple members

4.2 Sort row-wise data by single field

4.3 Sort row-wise data by multiple fields

4.4 Sort row-wise data by expression

4.5 Sort records and get desired ones according to the original ordinal numbers

4.6 Sort according to specified order

4.7 Sort records with duplicate values by the specified order

4.8 Sort some data by the specified order and the rest by the original order

4.9 Shuffle the data

4.10 Sort data in each group

4.11 Sort groups

4.12 Get ranking of members (simple)

4.13 Rank row-wise data by a single field

4.14 Rank row-wise data by multiple fields

4.15 Rank row-wise data by expression

4.16 Concatenate members with tied ranks

4.17 Rank members in each group

4.18 Rank groups

Chapter 5 Order-based computations

5.1 Reference the previous/next row

5.2 Reference a neighboring interval

5.3 When data is unordered, find difference between the specified column and the column of the previous date in original order

5.4 Reference the previous/next row in each group

5.5 Locate a specified member and compare it with the neighboring row

5.6 Locate multiple specified members and compare each with the neighboring row

5.7 Iteration: count

5.8 Iteration: sum

5.9 Iteration: The early termination cumulation

5.10 Application: find the maximum number of days for continuously increasing duration

5.11 Application: Complex inter-row computations

Chapter 6 Grouping

6.1 Ordinary grouping: by field value

6.2 Ordinary grouping: by expression

6.3 Ordinary grouping: by ordinal numbers

6.4 Alignment grouping: keep one record at most for each group

6.5 Alignment grouping: keep all matching members for each group

6.6 Alignment grouping: put non-matcheding records to a separate group

6.7 Alignment grouping by ordinal number: keep one record at most for each group

6.8 Alignment grouping by ordinal number: keep all matching members for each group

6.9 Alignment grouping by ordinal number: overlapping groups

6.10 Enumerated grouping: put each member to the first matching group

6.11 Enumerated grouping: put non-matching members to a separate group

6.12 Enumerated grouping: overlapping groups

6.13 Group by segment: segment by field value

6.14 Group by segment: segment by expression

6.15 Order-based grouping: by continuous same value

6.16 Order-based grouping: by continuous same value – big data

6.17 Order-based grouping: by the neighboring condition

6.18 Order-based grouping: by the neighboring condition – big data

6.19 Order-based grouping: by ordinal numbers of groups

6.20 Grouped subsets: get subsets instead of aggregate values

6.21 Grouped subsets: filter grouped subsets by aggregate value

6.22 Grouped subsets: filter grouped subsets and group the selected subsets

6.23 Grouped subsets: perform inter-row computations in each grouped subset

Chapter 7 Association

7.1 Association query: join two tables on single fields while filtering joined records

7.2 Association query: join two tables on multiple fields while filtering joined records

7.3 Association query: multi-table join

7.4 Association query: left join on single fields

7.5 Association query: left join on multiple fields

7.6 Association query: full join

7.7 Association query: Align-join two tables on ordinal numbers

7.8 Association query: Align-join multiple tables on ordinal numbers

7.9 Association query: align-join on ordinal numbers to shuffle field values

7.10 Add records for missing values in the grouped and summarized result according to table association

7.11 Association on interval: write correspondence in a formula

7.12 Association on interval: using the table join

7.13 Cascade association

7.14 Convert foreign key values to matching records

7.15 Judge data existence through foreign key mapping

7.16 Judge data nonexistence through foreign key mapping

7.17 Merge-join tables of same order

7.18 Merge-join tables of same order – big data

7.19 Get Cartesian product with filtering condition

7.20 Multiply matrices using Cartesian product

7.21 Calculate Cartesian product using left join

7.22 Join query on large data table and large dimension table

7.23 Quick join query on small data table and large dimension table

7.24 Quick join query on same-order data table and large dimension table

7.25 Cross Apply operation

7.26 Outer Apply operation

7.27 Degenerate Apply operation to Cartesian product

7.28 Complex applications of Apply operation

Chapter 8 Data Structure Transformation

8.1 Row-to-column transposition

8.2 Column-to-row transposition

8.3 Bidirectional transposition

8.4 Dynamic row-to-column transposition

8.5 Row-to-column transposition through insertion, during which column names are dynamically generated

8.6 Multi-row to multi-row transposition

8.7 Row-to-column transposition with location-based value assignment

8.8 Transpose rows to columns while performing cross-column computations

8.9 Dynamic row-to-column transposition related to primary table – sub table join

8.10 Dynamic row-to-column transposition related to multi-table join

8.11 Place data in column groups

8.12 Perform transposition while handling data type inconsistency

8.13 Split and expand similar attributes in same row to multiple rows

Chapter 9 String

9.1 Concatenate strings in two columns

9.2 Concatenate string and values of other types

9.3 Concatenate members of a sequence

9.4 Enclose each string member with quotation marks during concatenation

9.5 Convert table sequence into CSV format

9.6 Split string into a sequence of characters

9.7 Split string into a sequence of words

9.8 Split string using tab as the delimiter

9.9 Split string using comma as the delimiter

9.10 Split a string into two parts with specified delimiter

9.11 Extract a string according to regular expression

9.12 Parse a string as a numerical value

9.13 Parse a percentage string as a numerical value

9.14 Automatically parse one string as appropriate data type

9.15 Parse while splitting a string

9.16 Parse string as table

9.17 Parse key values pairs from Base64 encoded string

9.18 Parse table field values according to a regular expression

9.19 Parse text where line count in a record varies according to a regular expression

9.20 Parse table field values using code

9.21 Change SQL filtering condition

9.22 Translate standard SQL statement into syntax of the specified database

9.23 Parse HTML file and analyze data

9.24 Parse HTML file and generate a table sequence

Chapter 10 Date/time/datetime-related computations

10.1 Get the date N days after the date specified

10.2 Calculate number of days between two dates

10.3 Calculate the number of seconds/minutes between two datetime values

10.4 Get the first date and last date of a week

10.5 Calculate average daily sale amount in a quarter

10.6 Calculate age

10.7 Get the date N months before

10.8 Get the date after n workdays

10.9 Get a sequence of workdays

10.10 Get a sequence of dates

10.11 Get equal time intervals

Chapter 11 Recursion

11.1 Recursively find single references

11.2 Recursively find all references in loop

11.3 Recursively find references until reach the specified value

11.4 Find leaf-level records

11.5 Recursively get field values

11.6 Recursively concatenate field values

11.7 Traverse all files in a directory

11.8 Traverse directory and summarize files

11.9 Search for the upper-level reference

11.10 Search for records containing the specified value on the reference chain and list their upper-level reference

11.11 Search for all upper-level references

11.12 Recursion: Tower of Hanoi problem

11.13 Recursion: The pirate game problem

Chapter 12 Complex query

12.1 Judge belongingness of sets

12.2 Judge belongingness of a large set

12.3 Judge existence of foreign key mapping

12.4 Judge existence of non-foreign key matching

12.5 Speed up computations of judging existence of non-foreign key matching

12.6 Judge existence of composite foreign key mapping

12.7 An example of simplified self-join

12.8 Judge nonexistence of foreign key mapping

12.9 Judge nonexistence of data matching

12.10 An example of simplified SQL double negation

12.11 Judge existence of data matching

12.12 Compare with all values in the result of subquery