Stratified split

 

A stratified split keeps the proportion of classes 0 and 1 of the target variable approximately the same in the training set and the test set.
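One way to check that a split is stratified is to compare the class proportions of the two sets. The following is a minimal Python sketch of such a check, not part of the SPL script; the helper name class_proportions and the label lists are hypothetical and made up for illustration.

```python
from collections import Counter


def class_proportions(labels):
    """Return the share of each class value in a list of labels."""
    total = len(labels)
    if total == 0:
        return {}
    counts = Counter(labels)
    return {cls: n / total for cls, n in counts.items()}


# Made-up labels: both sets contain classes 0 and 1 in a 3:2 ratio.
train_labels = [0] * 42 + [1] * 28
test_labels = [0] * 18 + [1] * 12

train_props = class_proportions(train_labels)
test_props = class_proportions(test_labels)
print(train_props)
print(test_props)
```

If the two dictionaries are close to each other, the split preserved the class balance.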


     A
1    =file("D://titanic.csv").import@qtc()
2    =A1.group@p(Survived)
3    =A2(1).group(rand()<=0.3)
4    =A2(2).group(rand()<=0.3)
5    =(A3(1)|A4(1)).sort()
6    =(A3(2)|A4(2)).sort()
7    =train=A1(A5)
8    =test=A1(A6)

A2 Group the sample positions into two groups according to the Survived value (0/1)

A3 Randomly split the first group into two parts in a ratio of about 7:3

A4 Randomly split the second group into two parts in a ratio of about 7:3

A5 Combine the roughly 70% part of each class and sort the positions; these form the training set

A6 Combine the roughly 30% part of each class and sort the positions; these form the test set

A7, A8 Retrieve the rows of A1 at the positions in A5 and A6 and assign them to the variables train and test
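For readers who want to compare the steps with another language, here is a rough Python sketch of the same procedure using only the standard library. It is an illustration under assumptions, not a translation verified against esProc: the function name stratified_split, its parameters, and the toy rows that stand in for titanic.csv are all hypothetical.

```python
import random
from collections import defaultdict


def stratified_split(rows, label_key, test_fraction=0.3, seed=None):
    """Split rows so each class sends about test_fraction of its rows to test."""
    rng = random.Random(seed)

    # Like A2: collect the row positions of each class value.
    positions_by_class = defaultdict(list)
    for pos, row in enumerate(rows):
        positions_by_class[row[label_key]].append(pos)

    # Like A3/A4: each position goes to the test part with
    # probability test_fraction, otherwise to the training part.
    train_pos, test_pos = [], []
    for positions in positions_by_class.values():
        for pos in positions:
            if rng.random() <= test_fraction:
                test_pos.append(pos)
            else:
                train_pos.append(pos)

    # Like A5/A6: merge the per-class parts and restore the row order.
    train_pos.sort()
    test_pos.sort()

    # Like A7/A8: fetch the rows at those positions.
    return [rows[p] for p in train_pos], [rows[p] for p in test_pos]


# Made-up toy data: 600 rows of class 0 and 400 rows of class 1.
rows = [{"PassengerId": i, "Survived": 0 if i < 600 else 1} for i in range(1000)]
train, test = stratified_split(rows, "Survived", test_fraction=0.3, seed=42)
print(len(train), len(test))
```

Because every row is assigned by an independent random draw, the 7:3 ratio holds only approximately within each class, as in the script above.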