Split values of a field in a database table and transform it to new fields
Below is a table queried from BigQuery database. Its 4th field has comma-separated strings.
user_id |
fst_name |
lst_name |
team_list |
1111 |
DANNY |
CRUISE |
TEAM34,TEAM12,TEAM4 |
2222 |
CARLOS |
SMITH |
TEAM34,TEAM44,TEAM12 |
33333 |
SASHA |
CONOR |
TEAM5,TEAM34,TEAM44 |
Use Java to do this: Retain the first 3 fields, split values the 4th field, make the distinct values names of the fields from 4th to Nth, write 1 or 0 under each new field to record whether they are originally included in the current row or not, and save result as a new file. Below is the expected result:
user_id,fst_name,lst_name,TEAM34,TEAM12,TEAM4,TEAM44,TEAM5 1111,DANNY,CRUISE,1,1,1,0,0 2222,CARLOS,SMITH,1,1,0,1,0 33333,SASHA,CONOR,1,0,0,1,1 |
Write SPL code as follows:
A |
|
1 |
=bigqry1.query("select user_id,fst_name,lst_name,team_list from data where p=1") |
2 |
=A1.news(team_list.split@c();user_id,fst_name,lst_name,~) |
3 |
=A2.pivot@s(#1,#2,#3; #4,if(#4,1,0)) |
4 |
=T("d:/result.csv":A3) |
A1: Query the database through JDBC.
A2: Split up each value of the 4th field and generate a record for each distinct value; ~ is the current item.
A3: Perform row-to-column transposition – retain the first 3 columns, transform values of the 4th column to the new field names, and write 1 under each new field if they are included in the current row in the original table and 0 if they are not.
A4: Write result to a csv file.
Read How to Call a SPL Script in Java to find how to integrate SPL into a Java application.
Source:https://stackoverflow.com/questions/74645043/how-do-i-write-csv-file-with-dynamic-headers-in-apache-beam-java
SPL Official Website 👉 https://www.scudata.com
SPL Feedback and Help 👉 https://www.reddit.com/r/esProc_SPL
SPL Learning Material 👉 https://c.scudata.com
SPL Source Code and Package 👉 https://github.com/SPLWare/esProc
Discord 👉 https://discord.gg/cFTcUNs7
Youtube 👉 https://www.youtube.com/@esProc_SPL