How to summarize data by cluster

Suppose I have the following data:

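library(data.table)
set.seed(200)
# 20 rows: two continuous columns, two categorical columns, and a cluster label
data <- data.table(income = runif(20, 1000, 8000), gender = sample(0:1, 20, TRUE), asset = runif(20, 10000, 80000), education = sample(1:4, 20, TRUE), cluster = sample(1:4, 20, TRUE))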

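My data contains both continuous and categorical variables. I want to summarize it by the cluster variable as follows:

Continuous variables (income and asset): I want the mean per cluster, so I use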

data[,lapply(.SD, mean), by = cluster, .SDcols = c(1,3)]

Categorical variables (gender and education): I want the proportion of each level within each cluster, so I use

table(data[,gender, by = cluster])/rowSums(table(data[,gender, by = cluster])) 

table(data[,education, by = cluster])/rowSums(table(data[,education, by = cluster]))

I don't think my code is efficient.

Could you suggest a better way to handle this?

Source

2014-11-24 newbie

I would do it this way:

data[, .N, by=.(gender, cluster)][, .(gender, ratio = N/sum(N)), by=cluster] 
data[, .N, by=.(education, cluster)][, .(education, ratio = N/sum(N)), by=cluster]
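
If there are many categorical columns, the same count-then-divide idea can be applied to all of them at once by reshaping to long format first. This is only a sketch, not part of the answer above; the column names `variable` and `level` and the object names `long` and `props` are my own choices, and it assumes the categorical columns share a type (here both are integer).

```r
library(data.table)

set.seed(200)
data <- data.table(income = runif(20, 1000, 8000), gender = sample(0:1, 20, TRUE),
                   asset = runif(20, 10000, 80000), education = sample(1:4, 20, TRUE),
                   cluster = sample(1:4, 20, TRUE))

# Stack the categorical columns: one row per (cluster, variable, level) observation
long <- melt(data, id.vars = "cluster",
             measure.vars = c("gender", "education"),
             variable.name = "variable", value.name = "level")

# Count each level, then divide by the total within each cluster and variable
props <- long[, .N, by = .(cluster, variable, level)]
props[, ratio := N / sum(N), by = .(cluster, variable)]
setorder(props, cluster, variable, level)

props
```

Within each cluster and variable the `ratio` values should sum to 1, which is a quick sanity check on the result.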

Source

2014-11-24 13:19:54 Arun

You can use a for loop over the categorical variables:

res <- list() 
for(i in c('gender', 'education')){ 
    res[[i]] <- prop.table(table(cbind(data[,'cluster', with=FALSE], 
          data[,i, with=FALSE])), margin=1) 
} 

res

Or

lapply(data[,c('gender','education'), with=FALSE], function(x) 
     prop.table(table(cbind(data[,'cluster', with=FALSE],x)), margin=1))
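
For completeness, here is a rough sketch that returns both summaries from the question in one call: per-cluster means for the continuous columns and row-wise proportion tables for the categorical ones. The helper name `summarise_by_cluster` and its arguments are hypothetical, made up for this illustration; it is not a data.table function.

```r
library(data.table)

set.seed(200)
data <- data.table(income = runif(20, 1000, 8000), gender = sample(0:1, 20, TRUE),
                   asset = runif(20, 10000, 80000), education = sample(1:4, 20, TRUE),
                   cluster = sample(1:4, 20, TRUE))

# Hypothetical helper: num_cols and cat_cols are character vectors of column names
summarise_by_cluster <- function(dt, num_cols, cat_cols, group_col = "cluster") {
  # Mean of every continuous column within each group
  means <- dt[, lapply(.SD, mean), by = group_col, .SDcols = num_cols]
  # One cross-table per categorical column; margin = 1 makes each row sum to 1
  props <- lapply(setNames(cat_cols, cat_cols), function(v) {
    prop.table(table(dt[[group_col]], dt[[v]], dnn = c(group_col, v)), margin = 1)
  })
  list(means = means, props = props)
}

out <- summarise_by_cluster(data, c("income", "asset"), c("gender", "education"))
out$means
out$props$gender
```

Using `dt[[v]]` instead of `cbind` on one-column subsets avoids building intermediate tables, though on 20 rows the difference will not be noticeable.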

Source

2014-11-24 07:54:52 akrun
