Undersampling

 

Undersampling balances the classes by reducing the number of samples in the majority class. The simplest and most direct method is to randomly remove some of the majority-class samples.

For example, the target variable Survived in the Titanic data is binary, taking the values 0 and 1. The following script undersamples the majority class to reach a 1:1 class balance:


     A
1    =file("D://titanic.csv").import@qtc()
2    1
3    =A1.group@p(Survived)
4    =A3.sort(~.len())
5    =ceil(min(A4(2).len(),A4(1).len()*A2))
6    =to(A4(2).len()).sort(rand())
7    =A6(to(A5)).sort()
8    =(A4(2)(A7)|A4(1)).sort()
9    =A1(A8)

A2 Set the sampling balance ratio, i.e. the number of majority-class samples to keep per minority-class sample (1 gives a 1:1 balance)

A3 Group by the target variable and get the row positions of the members of each group

A4 Sort the groups by sample count, so that the first group is the minority class and the second group is the majority class

A5 Calculate the number of majority-class samples to keep according to the balance ratio, capped at the size of the majority class

A6 Randomly shuffle the member sequence numbers of the majority-class group

A7 Take the first A5 entries of A6 and sort them, which gives a random sample without replacement

A8 Combine the positions of the sampled majority-class rows with the positions of all minority-class rows

A9 Fetch the records at those positions from A1 to complete the sampling
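
For readers who want to trace the same steps outside SPL, the procedure in A2 to A9 can be sketched in plain Python. This is only an illustration under stated assumptions, not the SPL implementation: the function name `undersample_positions`, its `seed` parameter, and the toy label list are all hypothetical, and positions here are 0-based whereas SPL positions are 1-based. It also draws the sample directly with `random.sample` instead of shuffling and taking the first A5 entries, which should be equivalent for this purpose.

```python
import math
import random
from collections import defaultdict


def undersample_positions(labels, ratio=1.0, seed=None):
    """Return sorted 0-based positions kept after randomly undersampling
    the majority class of a two-class label list.

    `ratio` plays the role of A2: majority samples kept per minority sample.
    """
    rng = random.Random(seed)

    # A3: group row positions by the value of the target variable.
    groups = defaultdict(list)
    for pos, label in enumerate(labels):
        groups[label].append(pos)
    if len(groups) != 2:
        raise ValueError("this sketch assumes exactly two classes")

    # A4: order the groups by size, so the minority class comes first.
    minority, majority = sorted(groups.values(), key=len)

    # A5: number of majority rows to keep, capped at the majority size.
    keep = math.ceil(min(len(majority), len(minority) * ratio))

    # A6-A7: draw `keep` majority positions at random, without replacement.
    sampled = rng.sample(majority, keep)

    # A8: merge with all minority positions and restore the original order.
    return sorted(sampled + minority)


# Hypothetical toy labels: six 0s (majority) and three 1s (minority).
labels = [0, 0, 1, 0, 1, 0, 0, 1, 0]
positions = undersample_positions(labels, ratio=1, seed=42)
balanced = [labels[p] for p in positions]  # A9: fetch rows by position
print(positions, balanced)
```

With `ratio=1` the toy data keeps all three minority rows and three randomly chosen majority rows. A larger ratio keeps more majority rows, up to the full majority class, which matches the `min(...)` cap in A5.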