"【Question】 What is a best way to shuffle a database column using PL/SQL? E.g. First name with ar .."

Home HowTo Blog TalkingData eBook Q&A Download Documentation Official Website

Login Register

blackduckie RaqForum 28 No.
477 View • 5 Years ago

...

Database Column Shuffle & Update

SQL(179)

【Question】

What is a best way to shuffle a database column using PL/SQL? E.g. First name with around 70000 of rows.

Here’s a solution:

DECLARE

SUBTYPE Name IS VARCHAR2(200);

TYPE Names_tab IS TABLE OF Name;

--

tabOrigin Names_tab;

tabShuffled Names_tab;

vSwap Name;

idx BINARY_INTEGER;

BEGIN

\-\- Do simple play WITH DUAL instead of real data

WITH rs AS (

SELECT DBMS_RANDOM.STRING('A', 1) name FROM dual

UNION ALL

SELECT DBMS_RANDOM.STRING('A', 1) name FROM dual

UNION ALL

SELECT DBMS_RANDOM.STRING('A', 1) name FROM dual

UNION ALL

SELECT DBMS_RANDOM.STRING('A', 1) name FROM dual

UNION ALL

SELECT DBMS_RANDOM.STRING('A', 1) name FROM dual

UNION ALL

SELECT DBMS_RANDOM.STRING('A', 1) name FROM dual

)

SELECT name origin, name shuffle

BULK COLLECT INTO tabOrigin, tabShuffled

FROM rs

;

--

\-\- As mentioned in 'Operational Notes' for package DBMS_RANDOM

\-\- invocation of DBMS_RANDOM.SEED is necessary only for obtaining

\-\- repeatable sequence of values, therefore omit this call

--

FOR i IN 1 .. tabOrigin.COUNT LOOP

idx := DBMS_RANDOM.VALUE(1, tabOrigin.COUNT);

vSwap := tabShuffled(i);

tabShuffled(i) := tabShuffled(idx);

tabShuffled(idx) := vSwap;

END LOOP;

\-\- Lookup result after shuffling

FOR i IN 1 .. tabOrigin.COUNT LOOP

dbms\_output.put\_line(

'Origin:' || tabOrigin(i) || 'Shuffled to:' || tabShuffled(i)

);

END LOOP;

END;

/

Output for this script is:

Origin: D Shuffled to: N
Origin: U Shuffled to: n
Origin: j Shuffled to: K
Origin: N Shuffled to: D
Origin: K Shuffled to: j
Origin: n Shuffled to: U

【Answer】

With the above solution, you can generate shuffled column in SQL. But you still need further computation to write shuffled values back to SHUFFLED_VALUE field. The whole process is very complicated. In this case, I think you can try SPL. The Structured Process Language (SPL) generates concise and easy-to-understand code:

	A
1	=connect("myDB1").query("select id,ORIGINAL_VALUE from REF_VALUES")
2	=A1.sort(rand())
3	=join@p(A1.(ID);A2.(ORIGINAL_VALUE))
4	=connect("myDB1").update@u(A3,REF_VALUES,ID:_1,SHUFFLED_VALUE:_2 ;ID)

A1: Retrieve data in SQL;

A2: Perform a random sort over the sequence of column values;

A3: Join A1’s ID field with A2’s ORIGINAL_VALUE field by sequence numbers and return a new sequence;

A4: REF_VALUES 中 Update A3’s column values into the database table according to key ID.

For detailed explanations about functions used in the SPL script, see Function Reference.

SPL Official Website 👉 https://www.scudata.com

SPL Feedback and Help 👉 https://www.reddit.com/r/esProcSPL

SPL Learning Material 👉 https://c.scudata.com

SPL Source Code and Package 👉 https://github.com/SPLWare/esProc

Discord 👉 https://discord.gg/2bkGwqTj

Youtube 👉 https://www.youtube.com/@esProc_SPL

SQL(179)

Application

blackduckie • 477 View • 5 Years ago

Reply

Please input Comment content ...

0 Collect • 0 Follow • 0 Up • 0 Down