0
我有一個200個變量的列表,我想總結那些高度相關的變量。總和相關變量
假設這是我的數據
mydata <- structure(list(APPLE= c(1L, 2L, 5L, 4L, 366L, 65L, 43L, 456L, 876L, 78L, 687L, 378L, 378L, 34L, 53L, 43L),
PEAR= c(2L, 2L, 5L, 4L, 366L, 65L, 43L, 456L, 876L, 78L, 687L, 378L, 378L, 34L, 53L, 41L),
PLUM = c(10L, 20L, 10L, 20L, 10L, 20L, 1L, 0L, 1L, 2010L,20L, 10L, 10L, 10L, 10L, 10L),
BANANA= c(2L, 10L, 31L, 2L, 2L, 5L, 2L, 5L, 1L, 52L, 1L, 2L, 52L, 6L, 2L, 1L),
LEMON = c(4L, 10L, 31L, 2L, 2L, 5L, 2L, 5L, 1L, 52L, 1L, 2L, 52L, 6L, 2L, 3L)),
.Names = c("APPLE", "PEAR", "PLUM", "BANANA", "LEMON"),
class = "data.frame", row.names = c(NA,-16L))
我發現這個代碼,我不知道如何才能利用它爲我的目的 https://stackoverflow.com/a/39484353/4797853
var.corelation <- cor(as.matrix(mydata), method="pearson")
library(igraph)
# prevent duplicated pairs
var.corelation <- var.corelation*lower.tri(var.corelation)
check.corelation <- which(var.corelation>0.62, arr.ind=TRUE)
graph.cor <- graph.data.frame(check.corelation, directed = FALSE)
groups.cor <- split(unique(as.vector(check.corelation)), clusters(graph.cor)$membership)
lapply(groups.cor,FUN=function(list.cor){rownames(var.corelation)[list.cor]})
輸出調整,我我正在尋找的是2數據幀如下:
DF1
GROUP1 GROUP2
3 16
4 40
ETC..
的值是一組中的值的總和
DF2
ORIGINAL_VAR GROUP
APPLE 1
PEAR 1
PLUM 2
BANANA 2
LEMON 2
謝謝,這正是我期待的對於。 – user4797853