Database Column Shuffle & Update

Question

What is a best way to shuffle a database column using PL/SQL? E.g. First name with around 70000 of rows.

Here’s a solution:

 

DECLARE 

SUBTYPE Name IS VARCHAR2(200); 

TYPE Names_tab IS TABLE OF Name; 

-- 

tabOrigin Names_tab; 

tabShuffled Names_tab; 

vSwap Name; 

idx BINARY_INTEGER; 

BEGIN 

\-\- Do simple play WITH DUAL instead of real data 

WITH rs AS ( 

SELECT DBMS_RANDOM.STRING('A', 1) name FROM dual 

UNION ALL 

SELECT DBMS_RANDOM.STRING('A', 1) name FROM dual 

UNION ALL 

SELECT DBMS_RANDOM.STRING('A', 1) name FROM dual 

UNION ALL 

SELECT DBMS_RANDOM.STRING('A', 1) name FROM dual 

UNION ALL 

SELECT DBMS_RANDOM.STRING('A', 1) name FROM dual 

UNION ALL 

SELECT DBMS_RANDOM.STRING('A', 1) name FROM dual 

SELECT name origin, name shuffle 

BULK COLLECT INTO tabOrigin, tabShuffled 

FROM rs 

-- 

\-\- As mentioned in 'Operational Notes' for package DBMS_RANDOM

 

\-\- invocation of DBMS_RANDOM.SEED is necessary only for obtaining 

\-\- repeatable sequence of values, therefore omit this call

 

-- 

FOR i IN 1 .. tabOrigin.COUNT LOOP 

idx := DBMS_RANDOM.VALUE(1, tabOrigin.COUNT); 

vSwap := tabShuffled(i); 

tabShuffled(i) := tabShuffled(idx); 

tabShuffled(idx) := vSwap; 

END LOOP; 

\-\- Lookup result after shuffling 

FOR i IN 1 .. tabOrigin.COUNT LOOP 

dbms\_output.put\_line( 

'Origin:' || tabOrigin(i) || 'Shuffled to:' || tabShuffled(i) 

); 

END LOOP; 

END; 

 

Output for this script is:

Origin: D Shuffled to: N
Origin: U Shuffled to: n
Origin: j Shuffled to: K
Origin: N Shuffled to: D
Origin: K Shuffled to: j
Origin: n Shuffled to: U

 

Answer

With the above solution, you can generate shuffled column in SQL. But you still need further computation to write shuffled values back to SHUFFLED_VALUE field. The whole process is very complicated. In this case, I think you can try SPL. The Structured Process Language (SPL) generates concise and easy-to-understand code:

 

A

1

=connect("myDB1").query("select    id,ORIGINAL_VALUE from   REF_VALUES")

2

=A1.sort(rand())

3

=join@p(A1.(ID);A2.(ORIGINAL_VALUE))

4

=connect("myDB1").update@u(A3,REF_VALUES,ID:_1,SHUFFLED_VALUE:_2    ;ID)

A1: Retrieve data in SQL;

A2: Perform a random sort over the sequence of column values;

A3: Join A1’s ID field with A2’s ORIGINAL_VALUE field by sequence numbers and return a new sequence;

A4: REF_VALUES Update A3’s column values into the database table according to key ID.

For detailed explanations about functions used in the SPL script, see Function Reference