Select variables using correlation coefficients

 

A correlation coefficient is a statistic that measures the strength of the relationship between two variables. The Pearson and Spearman correlation coefficients are the most commonly used, and both take values in [-1, 1]. A value of 0 means the two variables are uncorrelated (no linear relationship for Pearson, no monotonic relationship for Spearman); a value of 1 or -1 means they are perfectly positively or perfectly negatively correlated. The greater the absolute value, the stronger the correlation between the two variables.
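To make the two coefficients concrete, here is a minimal Python sketch that computes both from scratch. It is an illustration only and makes no claim about how the SPL functions are implemented. The helper names `pearson`, `ranks`, and `spearman` are chosen for this example, and Spearman is computed as the Pearson coefficient of the ranks, with tied values given their average rank (a common convention, assumed here).

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences.

    Raises ZeroDivisionError if either sequence is constant.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


x = [1, 2, 3, 4, 5]
y = [1, 8, 27, 64, 125]   # y = x**3: monotonic but not linear
print(pearson(x, y))      # below 1, because the relationship is not linear
print(spearman(x, y))     # 1.0, because the ordering is preserved exactly
```

The toy data shows why both coefficients are worth checking: a strictly increasing but nonlinear relationship gives a Spearman coefficient of exactly 1, while the Pearson coefficient stays below 1.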

For example, the correlation coefficient method can be used to select variables from credit card data: a variable is kept if the absolute value of its Pearson or Spearman correlation coefficient with the target variable Class is greater than 0.5.


    A                                                 B                                    C
1   =file("D://test//creditcard_b.csv").import@tc()
2   =A1.fname()
3   =A2.delete(A2.pos("Class"))
4   for A2                                            =pearson(A1.(${A4}),A1.(Class))
5                                                     =spearman(A1.(${A4}),A1.(Class))
6                                                     >B1=B1|[A4|B4|B5]
7                                                     =if(abs(B4)>0.5 || abs(B5)>0.5,A4)
8                                                     >C1=C1|B7


A2-A3: Get the field names, excluding the target variable Class.

A4-B8: Loop over all independent variables. For each one, calculate its Pearson and Spearman correlation coefficients with the target variable and store them in B1. Variable names whose Pearson or Spearman coefficient has an absolute value greater than 0.5 are selected and stored in C1.
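The screening loop described above can also be sketched in plain Python, for readers who want to check the logic outside esProc. This is a hedged illustration rather than a translation of the SPL script: the function `select_by_correlation`, the toy table, and the column names `f1`, `f2`, `f3` are all made up for the example, and the data is not taken from the credit card file.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x)
    sy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(sx * sy)


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def select_by_correlation(table, target, threshold=0.5):
    """Return (coefficients, selected) for every column except the target."""
    coefficients = []   # plays the role of B1: (name, pearson, spearman)
    selected = []       # plays the role of C1: names passing the threshold
    for name in table:
        if name == target:
            continue
        p = pearson(table[name], table[target])
        s = spearman(table[name], table[target])
        coefficients.append((name, p, s))
        # Compare absolute values so strong negative correlations are kept too.
        if abs(p) > threshold or abs(s) > threshold:
            selected.append(name)
    return coefficients, selected


# Made-up toy data, not the credit card file; column names are hypothetical.
table = {
    "f1": [1, 2, 3, 4, 5, 6],      # rises with Class
    "f2": [3, 1, 2, 3, 1, 2],      # unrelated to Class
    "f3": [9, 7, 8, 3, 2, 1],      # falls as Class rises
    "Class": [0, 0, 0, 1, 1, 1],
}
coefficients, selected = select_by_correlation(table, "Class")
print(selected)   # ['f1', 'f3']
```

Here `f3` is selected even though its coefficients are negative, which is why the comparison uses the absolute value rather than the raw coefficient.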
