Aggregate ordinal and binary data by cluster in R

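I ran a k-medoid cluster analysis in R using the CRAN cluster package. The data is in a data.frame named df4 with 13111 obs. of binary and ordinal values. After clustering, I added the cluster results to the original data.frame, so that each user ID shows its corresponding cluster number.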

How can I aggregate the binary and ordinal options by cluster?

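For example, the Gender variable takes the values Male/Female, and the Age variable takes the ranges "18-20", "21-24", "25-34", "35-44", "45-54", "55-64" and "65+". For each cluster, I would like the count of Male and Female values in Gender and the count of each category in Age.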

Here is the head of my data.frame with the cluster label column:

#12 variables because I added the clustering object to the data.frame 
#I only included two variables from the R output 
> str(df4) 
'data.frame': 13111 obs. of 12 variables: 
$ Age     : Factor w/ 7 levels "18-20","21-24",..: 6 6 6 6 7 6 5 7 6 3 ... 
$ Gender   : Factor w/ 2 levels "Female","Male": 1 1 2 2 2 1 2 1 2 2 ...

#I only included three variables from the R output 
> head(df4) 
    Age Gender 
1 55-64 Female   
2 55-64 Female   
3 55-64 Male   
4 55-64 Male   
5  65+ Male   
6 55-64 Female

Here is a reproducible example similar to my dataset:

Desired (hypothetical) output:

  cluster female male 18-20 21-24 25-34 35-44 45-54 55-64 65+
1       1      1    1     1     2     1     0     3     1   0
2       2      2    1     1     1     0     1     2     0   0
3       3      0    1     1     1     1     1     0     2   3

Let me know if I can provide more information.

Source

2014-11-06 Scott Davis

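'head = TRUE' is meaningless, and you have a lot of "smart quotes" that will make the parser choke. You should also post what you think the "correct answer" is, especially if it is more than just 'with(df4, table(gender, cluster))' – 2014-11-06 20:58:58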

I have removed the smart quotes and [...] from the reproducible code. – 2014-11-06 21:35:55

@BondedDust I have included a hypothetical answer for the aggregation and made a reproducible example of the clustering. – 2014-11-06 22:29:41

It looks like you want two tabulations, one of cluster by gender and one of cluster by age, shown side by side in a single matrix:

with(smalldf, cbind(table(cluster, gender), table(cluster, age)))
#----------------
  Female Male 18-20 21-24 25-34 35-44 45-54 55-64 65+
1      2    0     1     1     0     0     0     0   0
2      0    4     0     0     1     1     1     1   0
3      1    0     0     0     0     0     0     0   1
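
For anyone who wants to run this end to end, here is a minimal sketch. The asker's example data is not shown in this thread, so the smalldf below is a hypothetical stand-in whose rows are chosen only to be consistent with the counts printed above. The last step, which moves the row names into a cluster column to resemble the layout requested in the question, is an optional extra rather than part of the one-liner.

```r
# Hypothetical stand-in for `smalldf` (the original example data is not shown).
# Rows are chosen so the per-cluster counts agree with the table above.
age_levels <- c("18-20", "21-24", "25-34", "35-44", "45-54", "55-64", "65+")
smalldf <- data.frame(
  age     = factor(c("18-20", "21-24", "25-34", "35-44", "45-54", "55-64", "65+"),
                   levels = age_levels),
  gender  = factor(c("Female", "Female", "Male", "Male", "Male", "Male", "Female"),
                   levels = c("Female", "Male")),
  cluster = c(1, 1, 2, 2, 2, 2, 3)
)

# Cross-tabulate cluster against each factor, then bind the two tables column-wise.
counts <- with(smalldf, cbind(table(cluster, gender), table(cluster, age)))

# Optional: expose the cluster id as a regular column instead of row names.
# check.names = FALSE keeps column names such as "18-20" unchanged.
result <- data.frame(cluster = rownames(counts), counts, check.names = FALSE)
print(result)
```

The same call should carry over to df4 once its own column names (Age, Gender and whatever the cluster label column is called) are substituted. Because table() counts every factor level, levels that never occur in a cluster show up as 0 rather than being dropped.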

Source

2014-11-06 22:37:53

Perfect, it works! – 2014-11-06 22:47:18
