Correlation between multiple variables of a data frame

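I have a data.frame in R with several variables. Let's call them var1, var2 ... var10.

I want to find the correlation of var1 with each of var2, var3 ... var10.

How can we do that?

When given two vectors, the cor function finds the correlation between only 2 variables at a time. Used that way, I would have to write a separate cor call for each analysis.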

Source

2016-07-24 Milind Kumar

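You can use an apply statement: 'apply(iris[, 2:4], 2, function(x) cor(x, iris$Sepal.Length))' –

You can use 'cor(data.frame)', which gives you the correlation matrix between all variables. Extract the relevant row/column from this matrix. – Sumedh

'cor(dat$var1, dat[c("var2", "var3", "var4")])'. So using Philip's example, 'cor(iris$Sepal.Length, iris[2:4])' – user20650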
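The three suggestions in the comments above can be compared side by side. What follows is only a minimal sketch in base R on the built-in iris data; the object names r_apply, r_matrix and r_direct are made up for illustration and do not come from the comments.

```r
# Correlate Sepal.Length with the other three numeric columns of iris,
# once with each approach suggested in the comments.

# 1. apply() over the columns of interest
r_apply <- apply(iris[, 2:4], 2, function(x) cor(x, iris$Sepal.Length))

# 2. full correlation matrix, then pull out one row (dropping the diagonal entry)
r_matrix <- cor(iris[, 1:4])["Sepal.Length", -1]

# 3. pass the remaining columns as the second argument to cor()
r_direct <- cor(iris$Sepal.Length, iris[2:4])  # a 1 x 3 matrix

print(r_apply)
```

All three should give the same numbers. The third form returns a one-row matrix rather than a named vector, so as.vector() or drop() may be needed if a plain vector is wanted.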

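My package corrr, which helps to explore correlations, has a simple solution for this. I'll use the mtcars data set as an example, and say we want to focus on the correlation of mpg with all other variables.

install.packages("corrr") # though keep an eye out for a new version coming soon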
library(corrr) 
mtcars %>% correlate() %>% focus(mpg) 


#>    rowname        mpg
#>      <chr>      <dbl>
#> 1      cyl -0.8521620
#> 2     disp -0.8475514
#> 3       hp -0.7761684
#> 4     drat  0.6811719
#> 5       wt -0.8676594
#> 6     qsec  0.4186840
#> 7       vs  0.6640389
#> 8       am  0.5998324
#> 9     gear  0.4802848
#> 10    carb -0.5509251

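Here, correlate() produces a correlation data frame, and focus() lets you focus on the correlations of certain variables with all the others.

FYI, focus() works similarly to select() from the dplyr package, except that it alters rows as well as columns. So if you're familiar with select(), you should find it easy to use focus(). For example: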

mtcars %>% correlate() %>% focus(mpg:drat) 

#>   rowname        mpg        cyl       disp         hp        drat
#>     <chr>      <dbl>      <dbl>      <dbl>      <dbl>       <dbl>
#> 1      wt -0.8676594  0.7824958  0.8879799  0.6587479 -0.71244065
#> 2    qsec  0.4186840 -0.5912421 -0.4336979 -0.7082234  0.09120476
#> 3      vs  0.6640389 -0.8108118 -0.7104159 -0.7230967  0.44027846
#> 4      am  0.5998324 -0.5226070 -0.5912270 -0.2432043  0.71271113
#> 5    gear  0.4802848 -0.4926866 -0.5555692 -0.1257043  0.69961013
#> 6    carb -0.5509251  0.5269883  0.3949769  0.7498125 -0.09078980
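To make the "alters rows as well as columns" point concrete, here is a rough base R sketch of the same selection. It is not how corrr implements focus(); it only mimics the shape of the result on a plain correlation matrix, and the names cols, cm, keep_rows and focused are made up for illustration.

```r
# Columns to focus on (the same range as mpg:drat in mtcars)
cols <- c("mpg", "cyl", "disp", "hp", "drat")

cm <- cor(mtcars)                       # full 11 x 11 correlation matrix
keep_rows <- !(rownames(cm) %in% cols)  # drop the focused variables from the rows
focused <- cm[keep_rows, cols]          # 6 rows (wt ... carb) by 5 columns

print(round(focused, 3))
```

The focused variables become the columns and are removed from the rows, so no variable is shown correlated with itself.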

Source

2016-07-25 02:33:28

Another way is to use the libraries Hmisc and corrplot to get the correlations among all pairs, their significance, and a pretty plot, like so:

# Your data frame (5 variables instead of 10)
df <- data.frame(a = 1:100, b = rpois(100, .2), c = rpois(100, .4), d = rpois(100, .8), e = 2 * (1:100))

# Setup
library(Hmisc)
library(corrplot)

# Normalize the data frame. This also converts df to a matrix, which rcorr() expects.
df <- scale(df)

# Compute Pearson's (or Spearman's) correlations with rcorr() from the Hmisc package.
# I like rcorr() as it gives separate access to the correlations, the number of
# observations and the p-values. ?rcorr is worth a read.
corr <- rcorr(df)
corr_r <- corr$r  # access the correlation matrix
corr_r[, 1]       # subset the correlations of "a" (= var1) with the rest, if you want
pval <- corr$P    # get the p-values

# Plot all pairs
corrplot(corr_r, method = "circle", type = "lower", diag = FALSE, tl.col = "black", tl.cex = 1, tl.offset = 0.1, tl.srt = 45)

# Plot pairs with the significance cutoff applied to the p-values passed as "p.mat"
corrplot(corr_r, p.mat = pval, sig.level = 0.05, insig = "blank", method = "circle", type = "lower", diag = FALSE, tl.col = "black", tl.cex = 1, tl.offset = 0.1, tl.srt = 45)
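If only one variable's correlations with the rest and their p-values are needed, base R's cor.test() may be enough without any extra packages. The following is just a sketch under that assumption, using the built-in mtcars data so that it is reproducible; the names others and res are made up for illustration.

```r
# Correlation and p-value of mpg with every other column of mtcars,
# one cor.test() call per column.
others <- setdiff(names(mtcars), "mpg")

res <- t(sapply(others, function(v) {
  ct <- cor.test(mtcars$mpg, mtcars[[v]])    # Pearson correlation test by default
  c(r = unname(ct$estimate), p = ct$p.value)
}))

print(res)
```

The result is a matrix with one row per variable and the columns r and p, which can be filtered on p in the same spirit as the sig.level cutoff used for the plot.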

Source

2016-07-25 15:58:53 thisisrg
