
mars RaqForum 25 No.
671 View • 3 Years ago

Use Cases of Modeling Tests – Comparison between YModel and Manual Model Building

Case 1

Task: Using a bank's data on defaulted installment loans, predict the probability of default (PD) for its individual customers.

Data set description: 2.9 million rows, 37 columns, a size of 477MB.

Target variable: IsDefaulted.

Test content:

1. Model performance metrics on the test data set: AUC, lift in the top 10%, and model attenuation (training AUC minus test AUC).

2. Model building duration.

3. Skill requirements.
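The three performance metrics can be computed roughly as in the minimal Python sketch below. It assumes binary labels (1 = defaulted) and one model score per row. AUC uses the rank-based (Mann-Whitney) formula, lift in the top 10% is the default rate among the highest-scored 10% of rows divided by the overall default rate, and attenuation is training AUC minus test AUC, which is consistent with the values in the Case 1 table. The function names are hypothetical and are not part of YModel.

```python
def auc(y_true, scores):
    """Rank-based AUC (Mann-Whitney U): the probability that a randomly chosen
    positive is scored above a randomly chosen negative; ties count as 0.5.
    Requires at least one positive and one negative label."""
    pairs = sorted(zip(scores, y_true))
    ranks = [0.0] * len(pairs)
    i = 0
    while i < len(pairs):
        # Find the run of tied scores and give each the average 1-based rank.
        j = i
        while j + 1 < len(pairs) and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1
    n_pos = sum(y for _, y in pairs)
    n_neg = len(pairs) - n_pos
    rank_sum_pos = sum(r for r, (_, y) in zip(ranks, pairs) if y == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def lift_at(y_true, scores, fraction=0.10):
    """Default rate among the top `fraction` highest-scored rows divided by
    the overall default rate."""
    n_top = max(1, int(len(scores) * fraction))
    ranked = sorted(zip(scores, y_true), key=lambda p: p[0], reverse=True)
    top_rate = sum(y for _, y in ranked[:n_top]) / n_top
    base_rate = sum(y_true) / len(y_true)
    return top_rate / base_rate


def attenuation(train_auc, test_auc):
    """Drop in AUC from the training data set to the test data set."""
    return train_auc - test_auc


# Toy data: 10 rows, 3 defaults.
y = [1, 0, 1, 0, 0, 1, 0, 0, 0, 0]
s = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
print(round(auc(y, s), 4))            # → 0.8095
print(round(lift_at(y, s, 0.10), 2))  # → 3.33
```

On millions of rows a vectorized library routine would be preferable; the plain loops here are only meant to make each definition explicit.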

Test result:

1. Model performance

Modeling type	Model	AUC (training set)	AUC (test set)	Model attenuation	Lift in top 10% (test set)
Manual model building	Model 1	1	0.973	0.027	9.22
	Model 2	0.999	0.971	0.028	9.18
	Model 3	0.999	0.968	0.031	9.09
	Model 4	0.998	0.922	0.076	7.9
	Model 5	0.996	0.965	0.031	8.63
	Model 6	0.995	0.959	0.036	8.77
	Model 7	0.993	0.927	0.066	7.99
	Model 8	0.988	0.956	0.032	8.63
	Model 9	0.982	0.928	0.054	7.99
	Model 10	0.976	0.914	0.062	7.76
	Model 11	0.969	0.919	0.05	7.85
	Model 12	0.961	0.924	0.037	7.95
YModel		0.918	0.911	0.007	8.0

Note: Manual model building produces a series of intermediate models (Models 1-12) during tuning, whereas YModel generates the final model directly.
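As a quick arithmetic check, the sketch below recomputes attenuation for every row of the table above (training AUC minus test AUC) and picks the model with the smallest drop. The row data is copied from the table; the helper names are illustrative only.

```python
# (model, training AUC, test AUC, reported attenuation, test lift in top 10%)
CASE1_ROWS = [
    ("Model 1", 1.000, 0.973, 0.027, 9.22),
    ("Model 2", 0.999, 0.971, 0.028, 9.18),
    ("Model 3", 0.999, 0.968, 0.031, 9.09),
    ("Model 4", 0.998, 0.922, 0.076, 7.90),
    ("Model 5", 0.996, 0.965, 0.031, 8.63),
    ("Model 6", 0.995, 0.959, 0.036, 8.77),
    ("Model 7", 0.993, 0.927, 0.066, 7.99),
    ("Model 8", 0.988, 0.956, 0.032, 8.63),
    ("Model 9", 0.982, 0.928, 0.054, 7.99),
    ("Model 10", 0.976, 0.914, 0.062, 7.76),
    ("Model 11", 0.969, 0.919, 0.050, 7.85),
    ("Model 12", 0.961, 0.924, 0.037, 7.95),
    ("YModel", 0.918, 0.911, 0.007, 8.00),
]


def mismatched_attenuation(rows, tol=1e-9):
    """Return the names of rows whose reported attenuation differs from
    training AUC minus test AUC by more than `tol`."""
    return [name for name, train, test, reported, _ in rows
            if abs((train - test) - reported) > tol]


def least_attenuated(rows):
    """Name of the row with the smallest training-to-test AUC drop."""
    return min(rows, key=lambda r: r[1] - r[2])[0]


print(mismatched_attenuation(CASE1_ROWS))  # → []
print(least_attenuated(CASE1_ROWS))        # → YModel
```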

Result explanation:

1) The first several manually built models have very high AUC on the training data set (close to 1), which indicates overfitting. A more suitable model (Model 12) was obtained after multiple rounds of tuning.

2) Compared with YModel, Model 12 has a higher test AUC (0.924 vs. 0.911) but a much higher attenuation (0.037 vs. 0.007), so it is still overfitting. YModel's very small attenuation suggests it will perform better when scoring unseen data.

3) YModel's lift in the top 10% on the test data set (8.0) is slightly higher than Model 12's (7.95).

Summary: The two approaches are close on these metrics, but YModel generalizes better.

2. Model building duration

Manual model building: About three weeks for manual preprocessing and model tuning.

YModel: 13 minutes for automatic preprocessing and model building.

3. Skill requirements

Manual model building: Professional statistical knowledge.

YModel: General knowledge.

Case 2

Task: Using a bank's data on defaulted corporate loans, predict the probability of default (PD) for its micro and small business customers.

Data set description: 36,000 rows, 5,500 columns, a size of 453MB; high-dimensional and sparse.

Target variable: IsDefaulted.

Test content:

1. Model performance metrics on the test data set: AUC, lift in the top 10%, and model attenuation (training AUC minus test AUC).

2. Model building duration.

Test result:

	YModel	Manual model building
Model building duration	17 minutes (data preprocessing & model building)	2 weeks
Number of models	1	1
AUC for training data set	0.996	0.998
AUC for test data set	0.987	0.972
Lift in top 10% for test data set	9.8	9.6

1) On the test data set, YModel has higher AUC (0.987 vs. 0.972) and lift (9.8 vs. 9.6), and lower attenuation (0.009 vs. 0.026).

2) YModel is fast and efficient even with high-dimensional data, whereas manual model building is slow and becomes especially cumbersome when the data is high-dimensional.

Case 3

Task: Predict claim settlement risk for an insurance company.

Data set description: 1.38 million rows, dozens of columns, a size of 4GB; a high proportion of missing data and high-cardinality categorical variables.

Target variable: ClaimOccured

Test content:

1. Gini index on the test data set.

2. Model building duration.

Test result:

	YModel	Manual model building
Model building duration	60 minutes (data preprocessing & model building)	1 month
Model performance (Gini)	0.683	0.608
Key derived variables	3	-
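The post does not define the Gini index it reports. Assuming it is the normalized Gini coefficient commonly used in credit and insurance scoring, Gini = 2 × AUC − 1, the small sketch below converts between the two so that the Case 3 results can be read on the same scale as the AUC figures in Cases 1 and 2. The function names are hypothetical.

```python
def gini_from_auc(auc):
    """Normalized Gini coefficient (assumed definition): 2 * AUC - 1."""
    return 2 * auc - 1


def auc_from_gini(gini):
    """Inverse of gini_from_auc: AUC = (Gini + 1) / 2."""
    return (gini + 1) / 2


# Case 3 test-set Gini values, converted to AUC under the assumption above.
print(round(auc_from_gini(0.683), 4))  # YModel → 0.8415
print(round(auc_from_gini(0.608), 4))  # manual model → 0.804
```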

1) YModel has a higher Gini index on the test data set (0.683 vs. 0.608).

2) YModel automatically handles missing data and high-cardinality categorical variables and generates derived variables, which makes it much faster and more efficient.
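The post does not say how YModel handles these cases internally. One widely used technique for a high-cardinality categorical variable is smoothed target (mean) encoding, with missing values treated as a category of their own; the sketch below illustrates that technique only. The `MISSING` sentinel, the function names, and the default `smoothing` value are hypothetical choices for illustration.

```python
from collections import defaultdict

MISSING = "__missing__"  # hypothetical sentinel: missing values become their own category


def fit_target_encoding(categories, targets, smoothing=20.0):
    """Smoothed mean target per category:
    (sum of targets + smoothing * prior) / (count + smoothing).
    Rare categories are pulled toward the overall mean (the prior)."""
    prior = sum(targets) / len(targets)
    sums = defaultdict(float)
    counts = defaultdict(int)
    for cat, y in zip(categories, targets):
        key = MISSING if cat is None else cat
        sums[key] += y
        counts[key] += 1
    encoding = {k: (sums[k] + smoothing * prior) / (counts[k] + smoothing)
                for k in counts}
    return encoding, prior


def transform(categories, encoding, prior):
    """Map each value to its encoding; categories unseen during fitting get the prior."""
    return [encoding.get(MISSING if c is None else c, prior) for c in categories]


cats = ["A", "A", "B", None, "C", "A", None, "B"]
ys = [1, 0, 0, 1, 0, 1, 1, 0]
enc, prior = fit_target_encoding(cats, ys, smoothing=2.0)
print(transform(["A", "Z", None], enc, prior))  # → [0.6, 0.5, 0.75]
```

In practice the encoding should be fitted on training folds only (out-of-fold), otherwise the target leaks into the feature and inflates training AUC, which is exactly the kind of attenuation discussed in Case 1.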

SPL Official Website 👉 http://www.scudata.com

SPL Feedback and Help 👉 https://www.reddit.com/r/esProc

SPL Learning Material 👉 http://c.scudata.com

SPL Source Code and Package 👉 https://github.com/SPLWare/esProc

Discord 👉 https://discord.gg/ydhVnFH9

Youtube 👉 https://www.youtube.com/@esProc_SPL
