Oversampling

 

Oversampling balances the classes by increasing the number of samples in the minority class. The simplest approach is to duplicate minority-class samples directly until the class sizes reach the desired ratio.

The Titanic sample data is oversampled as follows:


     A
1    =file("D://titanic.csv").import@qtc()
2    1
3    =A1.group@p(Survived)
4    =A3.sort(~.len())
5    =A4(2).len()/A2-A4(1).len()
6    =if(A5>0,A5,0)
7    =A6.(A4(1)(rand(A4(1).len())+1))
8    =(to(A1.len()))|A7.sort()
9    =A1(A8)

A5 Calculate how many minority-class samples need to be duplicated, based on the balance ratio in A2

A7 Randomly select, with replacement, the positions of the minority-class samples to duplicate

A8 Concatenate the positions of the original samples with the sorted positions of the duplicated samples

A9 Retrieve the records at those positions to complete the oversampling
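
For readers more familiar with Python, the same steps can be sketched with the standard library alone. This is an illustrative translation, not part of the SPL script: the function name `oversample` and the `seed` parameter are hypothetical, the labels are a made-up toy list rather than the Titanic file, and two assumptions are added, namely that there are exactly two classes and that a fractional copy count is truncated to an integer.

```python
import random
from collections import defaultdict


def oversample(labels, ratio=1.0, seed=None):
    """Return 1-based row positions: all originals, then sorted duplicates.

    labels: class label of each row (the role of the Survived column)
    ratio:  target majority/minority size ratio (the role of A2)
    seed:   hypothetical parameter, only for reproducible draws
    """
    rng = random.Random(seed)

    # A3: group row positions by class label
    groups = defaultdict(list)
    for pos, label in enumerate(labels, start=1):
        groups[label].append(pos)

    # A4: order the groups by size, smallest first (two classes assumed)
    by_size = sorted(groups.values(), key=len)
    minority, majority = by_size[0], by_size[-1]

    # A5/A6: number of minority rows to duplicate, never negative
    # (truncating to an integer is an assumption of this sketch)
    n_copies = max(0, int(len(majority) / ratio) - len(minority))

    # A7: draw minority positions at random, with replacement
    copies = [rng.choice(minority) for _ in range(n_copies)]

    # A8: original positions followed by the sorted duplicates
    return list(range(1, len(labels) + 1)) + sorted(copies)


# Toy labels: five rows of class 0 and two rows of class 1
labels = [0, 0, 0, 0, 0, 1, 1]
positions = oversample(labels, ratio=1.0, seed=0)

# A9: fetch the rows at those positions (here just the labels)
balanced = [labels[p - 1] for p in positions]
print(balanced.count(0), balanced.count(1))  # 5 5
```

With a ratio of 1, three duplicates of the two minority rows are appended, so both classes end up with five rows. Which of the two rows gets duplicated depends on the random draw.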