Equi-frequency binning

 

The values of the variable are sorted in ascending order and then split into k parts, each holding roughly the same number of samples. Each part is treated as a bin. For example, with 10 bins, each bin contains about 10% of the samples.
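
To make the idea concrete, here is a minimal sketch in plain Python (not SPL). The function name `equal_frequency_bins` and the sample fare values are made up for illustration. It sorts the positions of the values and maps each sorted position to one of k bins.

```python
def equal_frequency_bins(values, k):
    """Return a bin index (0..k-1) for each value, with about len(values)/k items per bin."""
    n = len(values)
    # Indices of the values in ascending order of value (stable for ties).
    order = sorted(range(n), key=lambda i: values[i])
    bins = [0] * n
    for position, i in enumerate(order):
        # position runs 0..n-1; scale it to a bin number 0..k-1.
        bins[i] = position * k // n
    return bins

# Illustrative values only, not taken from the real dataset.
fares = [7.25, 71.28, 7.92, 53.1, 8.05, 8.46, 51.86, 21.08, 11.13, 30.07]
bins = equal_frequency_bins(fares, 5)
print(bins)
```

With 10 values and 5 bins, each bin receives exactly 2 values. Note that this sketch bins by sorted position, so tied values can land in different bins; a rank-based variant keeps ties together at the cost of less even bin sizes.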

Equi-frequency binning of the “Fare” variable:


	A
1	=file("D://titanic.csv").import@qtc()
2	=A1.ranks(Fare)
3	3
4	=ceil(A1.len()/A3)
5	=A3.(~*A4)
6	=A1.derive(if(A2(#)<A5(1),"low",if(A2(#)>=A5(2),"high","middle")):Fare_equifre_binning)

A2: Rank the “Fare” values and return the rank of each record.

A3: Set the number of bins.

A4: Calculate the number of samples per bin.

A5: Calculate the rank boundary of each bin.

A6: Assign each record to a “Fare” bin according to its rank.
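
For readers who want to trace the steps outside esProc, the following is a rough Python sketch of cells A2 to A6. It is not SPL. It assumes that `ranks` ranks in ascending order and gives tied values the same rank; the helper `competition_ranks` and the sample fares are hypothetical and only serve the illustration.

```python
import math

def competition_ranks(values):
    """Rank 1 = smallest value; tied values share the lowest rank of their group (assumed behavior)."""
    first_pos = {}
    for pos, v in enumerate(sorted(values), start=1):
        first_pos.setdefault(v, pos)  # keep the first (lowest) position for ties
    return [first_pos[v] for v in values]

# Illustrative values only, not taken from the real dataset.
fares = [7.25, 71.28, 7.92, 53.1, 8.05, 8.46, 51.86, 21.08, 8.05]

ranks = competition_ranks(fares)                       # A2: rank of each record
n_bins = 3                                             # A3: number of bins
per_bin = math.ceil(len(fares) / n_bins)               # A4: samples per bin
bounds = [i * per_bin for i in range(1, n_bins + 1)]   # A5: rank boundaries
labels = [                                             # A6: label by rank
    "low" if r < bounds[0] else "high" if r >= bounds[1] else "middle"
    for r in ranks
]
print(list(zip(fares, ranks, labels)))
```

Because the comparison against the first boundary is strict (`<`) and ties share a rank, the three bins are only approximately equal in size: in this sample they hold 2, 3, and 4 values.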