Other transformations
In addition to the mathematical transformations above, we can derive variables that reflect how each category relates to the target variable, such as the target positive-sample proportion, odds encoding, log-odds encoding, and the numerical target mean.
In the Titanic data, the target variable "Survived" is categorical (1 = survived, 0 = did not survive). The "Sex" variable is transformed with the target positive-sample proportion, odds encoding, and log-odds encoding:
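For reference, the usual definitions of these encodings are summarized below. For a categorical variable with category $c$, let $n_c$ be the number of samples in category $c$, $n_{c,1}$ the number of those with a positive target, and $y_i$ the target value of sample $i$. Defining odds as $p_c/(1-p_c)$ is the standard convention and is an assumption about what "odds encoding" means here; the base of the logarithm is a free choice (SPL's `lg` is base 10). Note that the odds are infinite when $p_c = 1$ and the log-odds are undefined when $p_c = 0$.

```latex
\begin{aligned}
p_c &= \frac{n_{c,1}}{n_c} && \text{(target positive-sample proportion)}\\
\operatorname{odds}_c &= \frac{p_c}{1 - p_c} && \text{(odds encoding)}\\
\operatorname{logodds}_c &= \log \operatorname{odds}_c = \log \frac{p_c}{1 - p_c} && \text{(log-odds encoding)}\\
\bar{y}_c &= \frac{1}{n_c} \sum_{i:\, x_i = c} y_i && \text{(numerical target mean)}
\end{aligned}
```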
  | A
1 | =file("D://titanic.csv").import@qtc()
2 | =A1.groups(Sex;count(Survived==1)/count(~):tar_P)
3 | =A1.derive(if(Sex=="female",A2(1).tar_P,A2(2).tar_P):tar_P_Sex,tar_P_Sex/(1-tar_P_Sex):odds,lg(odds):lg_odds)
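A2 Groups the samples by "Sex" and computes the proportion of survivors (Survived==1) in each group
A3 Derives the target positive-sample proportion (tar_P_Sex), the odds encoding and the log-odds encoding for every sample. Because groups sorts by the grouping key, A2(1) is the "female" group and A2(2) is the "male" group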
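To make the arithmetic concrete outside esProc, here is a minimal Python sketch (not SPL) of the same per-category computation: group by category, compute the positive proportion p, then odds p/(1-p) and base-10 log-odds to mirror `lg`. The function name `binary_target_encodings` and the toy data are hypothetical; the data is made up and is not the real Titanic file. Returning None where odds or log-odds are infinite or undefined is a choice of this sketch only.

```python
import math
from collections import defaultdict


def binary_target_encodings(categories, targets):
    """Map each category to (positive proportion, odds, log10-odds).

    odds = p / (1 - p). Odds are infinite at p = 1 and log-odds are
    undefined at p = 0; this sketch reports both as None in those cases
    (a hypothetical choice, not taken from the SPL code).
    """
    total = defaultdict(int)     # samples per category
    positive = defaultdict(int)  # positive samples per category
    for c, y in zip(categories, targets):
        total[c] += 1
        if y == 1:
            positive[c] += 1

    encodings = {}
    for c in total:
        p = positive[c] / total[c]
        if 0 < p < 1:
            odds = p / (1 - p)
            encodings[c] = (p, odds, math.log10(odds))
        else:
            encodings[c] = (p, None, None)
    return encodings


# Made-up toy data standing in for the "Sex" and "Survived" columns.
sex = ["female", "female", "female", "male", "male", "male", "male"]
survived = [1, 1, 0, 1, 0, 0, 0]

enc = binary_target_encodings(sex, survived)
# Similar in spirit to SPL's derive: attach the group-level values to every row.
rows = [(s, y) + enc[s] for s, y in zip(sex, survived)]
for row in rows:
    print(row)
```

With this toy data the "female" group has p = 2/3 and odds of about 2, while the "male" group has p = 0.25 and odds of about 1/3.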
In the house price data, the target variable "SalePrice" is numerical. The categorical variable "MSZoning" is transformed with the numerical target mean:
  | A
1 | =T("D://house_prices_train.csv")
2 | =A1.groups(MSZoning;avg(SalePrice):tar_mean)
3 | =A1.derive(A2(A2.(MSZoning).pos(MSZoning)).tar_mean:MSZoning_tar_mean)
A2 Groups the records by "MSZoning" and computes the mean of the target variable in each group
A3 For each record, finds the position of its "MSZoning" value in A2 and adds the corresponding target mean as a new field
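The same lookup can be sketched in plain Python (not SPL): first build a category-to-mean table, as A2 does, then replace each record's category by its group mean, as A3 does. The function name `target_mean_encoding` is hypothetical, and the prices below are made-up toy values; RL, RM and FV are only used as example "MSZoning" codes.

```python
from collections import defaultdict


def target_mean_encoding(categories, targets):
    """Replace each category by the mean target value of its group."""
    sums = defaultdict(float)  # running sum of the target per category
    counts = defaultdict(int)  # number of records per category
    for c, y in zip(categories, targets):
        sums[c] += y
        counts[c] += 1
    group_means = {c: sums[c] / counts[c] for c in counts}  # role of A2
    encoded = [group_means[c] for c in categories]          # role of A3
    return encoded, group_means


# Made-up toy values, not taken from the real house price file.
zoning = ["RL", "RM", "RL", "FV", "RM"]
sale_price = [200000, 150000, 180000, 220000, 130000]

encoded, means = target_mean_encoding(zoning, sale_price)
print(means)
print(encoded)
```

A dictionary lookup plays the same role as `A2.(MSZoning).pos(MSZoning)` in the SPL code: both map a category value to the row of its precomputed group statistic.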