Statistic filling

 

Mean filling

Fill the missing values of the variable "Age" in the Titanic data with the mean.


     A
1    =file("D://titanic.csv").import@qtc()
2    =A1.avg(Age)
3    =A1.run(Age=if(!Age,A2,Age))

[Figure: the "Age" column before filling and after filling]
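For readers who want to check the logic outside SPL, the same mean fill can be sketched in plain Python. This is only an illustration: the function name `fill_with_mean` and the sample ages are hypothetical, missing values are represented as `None`, and the mean is assumed to be taken over the non-missing values only.

```python
from statistics import mean

def fill_with_mean(values):
    """Return a copy of values with None entries replaced by the mean of the rest."""
    observed = [v for v in values if v is not None]
    if not observed:
        # No observed value to average, so leave the column unchanged.
        return list(values)
    avg = mean(observed)
    return [avg if v is None else v for v in values]

# Made-up sample, not taken from titanic.csv.
ages = [22, None, 26, 35, None, 54]
filled = fill_with_mean(ages)
print(filled)
```

The mean is computed once before filling, in the same order as the grid, where A2 is evaluated before A3 replaces the missing values.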

Automatic filling according to the data type of the variable

Mean filling requires a quantitative variable; variables of other data types need to be filled with a different statistic. For example, an integer variable can be filled with either the mean or the median, a floating-point variable with the mean only, and a character variable generally with the mode. Conveniently, SPL provides the A.impute() and P.impute() functions, which automatically select the statistic used to fill the missing values according to the data type.


     A
1    =file("D://titanic.csv").import@qtc()
2    =A1.impute@N("Age")
3    =A1.fname()
4    =A3.(A1.impute@c(~))

A2 Fill the variable "Age" and return the fill result together with the fill record Rec. @N indicates that the variable is numeric.

Depending on the variable type specified, the impute() function fills the missing values with the mode, fills them with the mean, or assigns them to a new class. When the variable type is not specified, impute() detects the variable type automatically before filling.

A3 Get the field names of A1

A4 Automatically fill all the fields. @c indicates that the original data is replaced by the filled data, so table A1 no longer contains missing values.
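The idea of choosing the statistic from the data type can be sketched in plain Python. This is a simplified illustration and not how impute() is implemented: it covers only the mean (numeric columns) and the mode (all other columns), and the names `choose_fill_value`, `impute_table`, the returned `record` dictionary, and the sample table are all hypothetical.

```python
from statistics import mean, mode

def choose_fill_value(values):
    """Pick a fill value from the non-missing entries: mean for numbers, mode otherwise."""
    observed = [v for v in values if v is not None]
    if not observed:
        return None  # nothing observed, so nothing to fill with
    is_numeric = all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in observed
    )
    return mean(observed) if is_numeric else mode(observed)

def impute_table(table):
    """Fill every column of a dict-of-lists table and record the value used per column."""
    filled, record = {}, {}
    for name, column in table.items():
        fill = choose_fill_value(column)
        record[name] = fill
        filled[name] = [fill if v is None else v for v in column]
    return filled, record

# Made-up sample, not taken from titanic.csv.
table = {
    "Age": [22, None, 26, 36],
    "Embarked": ["S", "C", None, "S"],
}
filled, record = impute_table(table)
print(record)
print(filled)
```

Keeping a record of the value used for each column makes it possible to apply the same fill to other data later, which is a plausible reason for a fill function to return such a record alongside the result.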