Performance optimization skill: Joining Small Fact Data and Big Dimension Data

 

I Problem introduction & Solution

Sometimes when we handle a join between a table and its subtable we find that while the filtered fact data can be wholly loaded into the memory or just slightly exceeds the memory capability, the dimension table being joined contains a large volume of data, far larger than the available memory space. If the latter stores data by the key and since the number of its records corresponding to the fact records is relatively small, we can locate them all at once using binary search  instead of traversing all records in the dimension table like what HASH algorithm does. This is more efficient.

Thats the algorithm Structured Process Language (SPL) uses in dealing with the situation. It gets all join key values from the fact tables, sorts them and reads as many of the key values as possible at a time to match the dimension table and find the eligible records. It’s the searching that takes up the lions share of the time spent in doing the join whereas the real joining actions are not time-consuming. Well test it using both single thread and multithreads (as multithreaded processing is not sensitive to the algorithm in performance increasing) and compare it with the HASH JOIN algorithm (Take Oracle as an example) through an instance in the following part.

 

II Test environment & computing scenario

The test computer has two Intel2670 16-core processors of 2.6 GHz, a 64GB RAM and an SSD, where a 16-core virtual machine with a 64GB RAM is running.

We have created on the virtual machine a dimension table named account, which is made up of 3 fields (accountid, name and state) and has a total of 5 billion records, and three same-structure fact tables (trade1, trade2 and trade3) that contain respectively 0.3 million, 3 million and 15 million records with 4 fields  tradedate, outid (account from which money is transferred), receiveid (account to which money is transferred) and amount (transferred amount). accountid is the foreign key for the fact tables outid field and received field. The relationship between both pairs is a one-to-many relationship.

Our test goal is to calculate the total transferred amount between states according to the three fact tables respectively. Then we test the multithreads performance using the fact table trade2.

 

III Tests

1. With Oracle

SQL query:

select /*+ parallel(n) */

a1.state,

a2.state,

sum(amount) as amount

from

account a1,

account a2,

trade1

where

outid = a1.accountid

and receiveid = a2.accountid

group by

a1.state,

a2.state

order by

a1.state,

a2.state;

/*+ parallel(n) */ is used for testing performance of multithreaded processing. n is the number of parallel tasks.

2. with SPL

SPL script:

 


A

1

=now()

2

=file(path+"account.ctx").create()

3

=file(path+"trade1.ctx").create().cursor@m(outid,receiveid,amount;;1)

4

=A3.joinx@q(outid,A2:accountid,state:out_state;receiveid,A2:accountid,state:receive_state;5000000)

5

=A4.groups(out_state,receive_state;sum(amount):amount)

6

=interval@s(A1,now())

 

joinx() function works with @q option to increase join performance when the fact data is small and the dimension data is big. Here the functions last parameter defines the number of records retrieved from the fact table cursor at one time for a join comparison. When there is enough available memory space, the greater this value the higher the performance.

 

3. Test results

Below are results of tests over different volumes of fact data (Unit: Sec):

 

Records count after filtering

15 million

3 million

0.3 million

Oracle

17277

2747

421

SPL

517

106

72

 

Result of multithreaded processing test using trade2 (which contains 3 million records) (Unit: Sec):

 

Number of threads

1

2

4

8

16

Oracle

2747

1589

1318

1375

1631

SPL

106

109

117

115

109

 

IV Summary

With single-threaded processing, the SPL algorithm is many times faster than Oracle thanks to the absence of full-traversal.

But the SPL algorithm has little effect on multithreaded processing performance. Oracle does enhance performance by using the parallel processing but it is still much slower due to the time-consuming full traversal.