Comparison of the Automatic Modeling Performance of Ymodel, Weka, and Rapidminer

 

Objective: To compare the automatic modeling performance of Weka, Rapidminer, and Ymodel.

Data used: 5 datasets in total, 3 for classification and 2 for regression.

Two are classic Kaggle cases and three are real business datasets.

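| Dataset | Task | Source |
|---|---|---|
| Titanic Data | Classification | Kaggle |
| House Price Prediction | Regression | Kaggle |
| Credit Company User Overdue Prediction | Classification | Real business data |
| Claims prediction of insurance company policies | Classification | Real business data |
| Second-hand car transaction price prediction | Regression | Real business data |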

Because the free version of Rapidminer limits data size to 10,000 items, the three real business datasets were sampled, with sample sizes kept to a few thousand items each. As a result, testing on large data volumes was not possible.
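The down-sampling step can be sketched as follows. This is only an illustration: the source does not say how the samples were drawn, so uniform random sampling without replacement, the fixed seed, and the `sample_rows` helper are all assumptions.

```python
import random

def sample_rows(rows, max_rows, seed=0):
    """Return at most max_rows rows, drawn uniformly at random without replacement."""
    if len(rows) <= max_rows:
        return list(rows)
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    return rng.sample(rows, max_rows)

# Hypothetical data: 50,000 row ids standing in for a business table.
rows = list(range(50_000))
subset = sample_rows(rows, max_rows=5_000)
print(len(subset))  # 5000
```

For heavily imbalanced data, sampling each class separately would keep the positive-to-negative ratio closer to that of the full dataset.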

Product introduction: Weka is open source, and its automatic modeling function is an extension module that is free to use. Rapidminer is commercial software. It has a free version, but the auto model function requires payment.

Overall user experience: Ymodel builds models the fastest. Rapidminer is also relatively fast, but its modeling time increases significantly when there are many variables. Weka requires the modeling time to be set beforehand, and it is comparatively slow. In Weka it is also sometimes necessary to handle certain variable types manually before automatic modeling can recognize them. In terms of automatic modeling, Weka's user experience is relatively poor.

Testing method: Each dataset is divided into a training set and a prediction set. Each tool's predictions are exported and then scored with the same procedure.
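A minimal sketch of this procedure for the classification datasets is shown below. The split ratio, the seed, and the helper names `split_rows` and `classification_scores` are assumptions made for illustration; the source does not state how the split was done or which script was used for scoring.

```python
import random

def split_rows(rows, test_fraction=0.3, seed=0):
    """Shuffle rows and split them into (training set, prediction set)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

def classification_scores(y_true, y_pred):
    """Score exported binary predictions (1 = positive, 0 = negative)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    def ratio(num, den):
        return num / den if den else None  # None marks an undefined metric

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)
    if precision is None or recall is None or precision + recall == 0:
        f1 = None
    else:
        f1 = 2 * precision * recall / (precision + recall)
    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "precision": precision,
        "recall": recall,
        "specificity": ratio(tn, tn + fp),
        "f1": f1,
    }

# Hypothetical labels and predictions, not taken from the test data.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 1]
scores = classification_scores(y_true, y_pred)
train, test = split_rows(range(10))
```

Scoring every tool's exported predictions with one function keeps the metric definitions identical across tools, which is the point of uniform scoring.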

Test results:

1. Titanic Survival Prediction - Classification

Training data: 802 items, 12 variables

The ratio of positive and negative samples is approximately 3:5


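| Metric | Weka | Rapidminer | Ymodel |
|---|---|---|---|
| Accuracy | 0.722 | 0.787 | 0.775 |
| Precision | 0.862 | 0.809 | 0.857 |
| Recall | 0.556 | 0.756 | 0.667 |
| Specificity | 0.909 | 0.818 | 0.886 |
| F1 | 0.676 | 0.782 | 0.75 |
| AUC | - | 0.793 | 0.847 |
| Ranking | 3 | 2 | 1 |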

Weka could not output probability values (or we could not find out how to make it do so), so AUC could not be calculated for it.
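The dependence of AUC on probability values can be illustrated with a small sketch. It computes AUC as the fraction of positive/negative pairs in which the positive sample receives the higher score, with ties counted as half. The `auc` helper and the example data are hypothetical and are not taken from the test.

```python
def auc(y_true, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(pos) * len(neg))

y_true = [1, 0, 1, 0, 0]
probs = [0.9, 0.6, 0.4, 0.3, 0.2]         # predicted probabilities
hard = [1 if p >= 0.5 else 0 for p in probs]  # labels only, threshold 0.5

print(round(auc(y_true, probs), 3))  # 0.833
print(round(auc(y_true, hard), 3))   # 0.583
```

With only hard 0/1 labels, most pairs tie and the ranking information is lost, so the figure is not comparable with an AUC computed from probabilities.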

2. House Price Prediction - Regression


| Metric | Weka | Rapidminer | Ymodel |
|---|---|---|---|
| MSE | 4.17E8 | 1.41E9 | 9.85E8 |
| RMSE | 20430 | 37539 | 31385 |
| MAE | 14164 | 19459 | 16378 |
| MAPE | 9.108 | 11.317 | 9.921 |
| R2 | 0.889 | 0.755 | 0.829 |
| Ranking | 1 | 3 | 2 |
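For reference, the regression metrics reported above can be computed from exported predictions as in the sketch below. The `regression_scores` helper and the sample values are hypothetical. MAPE is expressed here as a percentage, which is an assumption about how the figures above are reported.

```python
import math

def regression_scores(y_true, y_pred):
    """Compute MSE, RMSE, MAE, MAPE (in percent) and R2."""
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]
    sse = sum(e * e for e in errors)
    mse = sse / n
    mae = sum(abs(e) for e in errors) / n
    # MAPE is undefined when a true value is zero; prices are assumed positive.
    mape = 100 * sum(abs(e) / abs(t) for e, t in zip(errors, y_true)) / n
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1 - sse / ss_tot
    return {"mse": mse, "rmse": math.sqrt(mse), "mae": mae, "mape": mape, "r2": r2}

# Hypothetical true and predicted prices.
y_true = [100, 200, 300, 400]
y_pred = [110, 190, 320, 380]
result = regression_scores(y_true, y_pred)
print(result["mse"], result["mae"], round(result["r2"], 2))  # 250.0 15.0 0.98
```

RMSE is the square root of MSE, so the two rows always rank the tools identically. MAE and MAPE can rank them differently because they weight large errors less heavily.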

3. Credit Company User Overdue Prediction - Classification

Training data: 8938 items, 56 variables

The ratio of positive and negative samples is approximately 1:8


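| Metric | Weka | Rapidminer | Ymodel |
|---|---|---|---|
| Accuracy | 0.878 | 0.880 | 0.804 |
| Precision | - | 0.471 | 0.281 |
| Recall | 0 | 0.063 | 0.409 |
| Specificity | 1 | 0.99 | 0.858 |
| F1 | - | 0.111 | 0.333 |
| AUC | - | 0.729 | 0.742 |
| Ranking | 3 | 2 | 1 |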

On this dataset the Weka model failed: it did not identify a single positive sample.
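A short sketch shows why such a model can still report high accuracy on imbalanced data. It scores a model that predicts "negative" for every sample, using the approximate 1:8 training ratio as an assumed class mix (the ratio in the prediction set is not stated). The `all_negative_scores` helper is hypothetical.

```python
def all_negative_scores(n_pos, n_neg):
    """Metrics of a model that predicts 'negative' for every sample."""
    tp, fp = 0, 0          # nothing is ever predicted positive
    fn, tn = n_pos, n_neg  # every positive is missed, every negative is right
    return {
        "accuracy": (tp + tn) / (n_pos + n_neg),
        "recall": tp / (tp + fn) if (tp + fn) else None,
        "specificity": tn / (tn + fp) if (tn + fp) else None,
        "precision": tp / (tp + fp) if (tp + fp) else None,  # 0/0 is undefined
    }

baseline = all_negative_scores(n_pos=1, n_neg=8)
print(round(baseline["accuracy"], 3))  # 0.889
print(baseline["recall"], baseline["specificity"], baseline["precision"])  # 0.0 1.0 None
```

This matches the pattern in the Weka column: recall 0, specificity 1, precision and F1 undefined, and an accuracy close to the share of negative samples.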

4. Claims prediction of insurance company policies - Classification

Training data: 3470 items, 29 variables

The ratio of positive and negative samples is approximately 1:7


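| Metric | Weka | Rapidminer | Ymodel |
|---|---|---|---|
| Accuracy | 0.905 | 0.949 | 0.882 |
| Precision | 0.051 | 0.033 | 0.022 |
| Recall | 0.264 | 0.069 | 0.139 |
| Specificity | 0.916 | 0.965 | 0.895 |
| F1 | 0.086 | 0.045 | 0.038 |
| AUC | - | 0.642 | 0.638 |
| Ranking | 1 | 2 | 3 |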

5. Second-hand car transaction price prediction - Regression


| Metric | Weka | Rapidminer | Ymodel |
|---|---|---|---|
| MSE | 2779927 | 8466716 | 9429967 |
| RMSE | 1667 | 2910 | 3070 |
| MAE | 835 | 1580 | 1537 |
| MAPE | 27 | 75 | 54 |
| R2 | 0.941 | 0.821 | 0.801 |
| Ranking | 1 | 2 | 3 |

Overall evaluation: Across the 5 datasets used in this test, the rankings vary from dataset to dataset, but the differences in the metrics are not large, and Ymodel's overall performance is quite good. By comparison, Weka performs well on the regression models, Ymodel performs well on the classification models, and Rapidminer falls in between.
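As a quick arithmetic check, the per-dataset rankings reported in the test results can be averaged by task type. The rankings below are copied from the results; averaging them is only an illustrative summary, not a method used in the original test.

```python
# (task type, {tool: ranking}) for each of the 5 datasets, in the order tested.
rankings = [
    ("classification", {"Weka": 3, "Rapidminer": 2, "Ymodel": 1}),  # Titanic
    ("regression",     {"Weka": 1, "Rapidminer": 3, "Ymodel": 2}),  # House price
    ("classification", {"Weka": 3, "Rapidminer": 2, "Ymodel": 1}),  # Credit overdue
    ("classification", {"Weka": 1, "Rapidminer": 2, "Ymodel": 3}),  # Insurance claims
    ("regression",     {"Weka": 1, "Rapidminer": 2, "Ymodel": 3}),  # Second-hand car
]

def mean_rank_by_task(rankings):
    """Average each tool's ranking separately for each task type."""
    collected = {}
    for task, ranks in rankings:
        for tool, rank in ranks.items():
            collected.setdefault(task, {}).setdefault(tool, []).append(rank)
    return {
        task: {tool: sum(r) / len(r) for tool, r in tools.items()}
        for task, tools in collected.items()
    }

mean_rank = mean_rank_by_task(rankings)
for task, tools in mean_rank.items():
    print(task, {tool: round(value, 2) for tool, value in tools.items()})
```

On these figures Weka has the best mean rank on regression (1.0) and Ymodel the best on classification (about 1.67). With only five datasets, these averages are indicative at most.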