Using R to perform a tapply-like aggregation over a matrix


I have run into a problem with a matrix computation and would appreciate your help. Thanks in advance!

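I have a data frame genderLocation and a matrix test. They correspond to each other by index: row i and column j of one matches row i and column j of the other.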

genderLocation[1:6]

           scanner_gender cmall_gender wechat_gender scanner_location cmall_location wechat_location
    156043              3            2             2        Guangzhou       Shenzhen        Shenzhen
    156044              2           NA            NA         Shenzhen           <NA>
    156045              2           NA             2         Shenzhen           <NA>        Hongkong
    156046              2           NA             2         Shenzhen           <NA>        Shenzhen

test

         [,1] [,2] [,3] [,4] [,5] [,6]
    [1,]  0.8  0.7  0.6  0.6  0.7  0.7
    [2,]  0.8  1.0  1.0  0.6  0.7  0.7
    [3,]  0.8  1.0  0.6  0.6  0.7  0.7
    [4,]  0.8  1.0  0.6  0.6  0.7  0.7

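Now I want to aggregate by the values in genderLocation, row by row, computing the mean of the entries at the corresponding positions in the matrix test. Taking row 156043 as an example, the result should be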

       2    3 Guangzhou Shenzhen
    0.65 0.80      0.60     0.70

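I don't know how to do this with the apply family (I would like to avoid for loops, since they are discouraged in R). Something like this seems close: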

> apply(test,1,function(tst,genderLoc) print(tapply(tst,as.character(genderLoc),mean)),genderLocation)

However, I don't understand the result. Restricted to the first two rows, the values look plausible:

> apply(test[1:2,],1,function(tst,genderLoc) print(tapply(tst,as.character(genderLoc),mean)),genderLocation[1:2,])
      c("2", NA)  c("3", "2") c("Guangzhou", "Shenzhen")  c("Shenzhen", "")  c("Shenzhen", NA)
            0.65         0.80                       0.60               0.70               0.70
      c("2", NA)  c("3", "2") c("Guangzhou", "Shenzhen")  c("Shenzhen", "")  c("Shenzhen", NA)
             1.0          0.8                        0.6                0.7                0.7
                               [,1] [,2]
    c("2", NA)                 0.65  1.0
    c("3", "2")                0.80  0.8
    c("Guangzhou", "Shenzhen") 0.60  0.6
    c("Shenzhen", "")          0.70  0.7
    c("Shenzhen", NA)          0.70  0.7
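The labels in that output, such as c("2", NA), hint at what is happening: as.character() applied to a data frame produces one string per column, so tapply() ends up grouping by whole columns rather than by the values in a single row. The sketch below is only an illustration using the data from the question (stringsAsFactors = FALSE is an assumption added here so that the locations stay character); it checks this and then groups row 1 by its own labels.

```r
# Data copied from the question; stringsAsFactors = FALSE is an assumption
# added here so that location columns are character, not factor.
test <- matrix(c(0.8, 0.8, 0.8, 0.8,  0.7, 1, 1, 1,  0.6, 1, 0.6, 0.6,
                 0.6, 0.6, 0.6, 0.6,  0.7, 0.7, 0.7, 0.7,  0.7, 0.7, 0.7, 0.7),
               nrow = 4, ncol = 6)
genderLocation <- data.frame(
  scanner_gender   = c(3, 2, 2, 2),
  cmall_gender     = c(2, NA, NA, NA),
  wechat_gender    = c(2, NA, 2, 2),
  scanner_location = c("Guangzhou", "Shenzhen", "Shenzhen", "Shenzhen"),
  cmall_location   = c("Shenzhen", NA, NA, NA),
  wechat_location  = c("Shenzhen", "", "Hongkong", "Shenzhen"),
  stringsAsFactors = FALSE
)

# as.character() on a data frame gives one string per COLUMN, not per cell.
col_labels <- as.character(genderLocation[1:2, ])
length(col_labels)  # 6, one label for each of the six columns

# Per-cell labels for a single row, then the mean of test[1, ] by label.
row1_labels <- unlist(genderLocation[1, ])
row1_means  <- tapply(test[1, ], row1_labels, mean)
print(row1_means)
```

For row 1 this reproduces the expected values 0.65, 0.80, 0.60 and 0.70 for the groups 2, 3, Guangzhou and Shenzhen.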

##### FYI

    test <- matrix(c(0.8,0.8,0.8,0.8, 0.7,1,1,1, 0.6,1,0.6,0.6, 0.6,0.6,0.6,0.6, 0.7,0.7,0.7,0.7, 0.7,0.7,0.7,0.7), nrow = 4, ncol = 6, byrow = FALSE)
    genderLocation <- data.frame(scanner_gender = c(3,2,2,2), cmall_gender = c(2,NA,NA,NA), wechat_gender = c(2,NA,2,2),
                                 scanner_location = c("Guangzhou","Shenzhen","Shenzhen","Shenzhen"),
                                 cmall_location = c("Shenzhen",NA,NA,NA),
                                 wechat_location = c("Shenzhen","","Hongkong","Shenzhen"))
    genderLocation1 <- cbind(genderLocation, test)  # bound together because some apply functions accept only one input

Source

2017-07-26 Bylon

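The following works for your example data, but I don't know how robust it will be on your full data set. Problems could arise if some rows in df share no common values with the other rows. If you are happy to keep the output as a list, there should be no problem (i.e. skip the Reduce... step). With that in mind...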

--Your data--

test <- matrix(c(0.8,0.8,0.8,0.8,0.7,1,1,1,0.6,1,0.6,0.6,0.6,0.6,0.6,0.6,rep(0.7,8)), nrow=4)

# The third column is wechat_gender, as in the question's data.
df <- data.frame(scanner_gender=c(3,2,2,2),
                 cmall_gender=c(2,NA,NA,NA),
                 wechat_gender=c(2,NA,2,2),
                 scanner_location=c("Guanzhou","Shenzhen","Shenzhen","Shenzhen"),
                 cmall_location=c("Shenzhen",NA,NA,NA),
                 wechat_location=c("Shenzhen",NA,"Hongkong","Shenzhen"),
                 stringsAsFactors=FALSE)
rownames(df) <- c(156043,156044,156045,156046)

--Operation--

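I combine map from purrr with other tidyverse verbs to 1) create a two-column data frame for each row, where column A holds the entries of that row of df and column B holds the entries of the same row of test, 2) then filter out the rows where is.na(A), 3) then summarise the mean of B by group, and 4) then spread the result into a wide, one-row data frame using A as the key for the columns.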

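library(purrr)  # map()
library(dplyr)  # filter(), group_by(), summarise(), %>%
library(tidyr)  # spread()

# One one-row data frame per row of df: mean of test values by label.
L <- map(seq_len(nrow(df)), ~ data.frame(A = unlist(df[.x, ]), B = test[.x, ]) %>%
           filter(!is.na(A)) %>%
           group_by(A) %>%
           summarise(B = mean(B)) %>%
           spread(A, B))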

I then reduce that list to a single data frame using Reduce and full_join

newdf <- Reduce("full_join", L)

--Output--

   `2`   `3` Guanzhou Shenzhen Hongkong
1 0.65   0.8      0.6     0.70       NA
2 0.80    NA       NA     0.60       NA
3 0.70    NA       NA     0.60      0.7
4 0.70    NA       NA     0.65       NA
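On the caveat about rows that share no values: full_join() matches on the columns two data frames have in common, so it has nothing to match on when two one-row results share no column. One possible alternative, sketched here under the assumption that dplyr is installed, is bind_rows(), which stacks data frames and fills the columns a row lacks with NA. The list L_demo below is hypothetical and exists only for illustration.

```r
library(dplyr)

# Hypothetical one-row results that have no column in common.
L_demo <- list(
  data.frame(`2` = 0.65, Shenzhen = 0.70, check.names = FALSE),
  data.frame(Hongkong = 0.70)
)

# bind_rows() stacks the rows; columns missing from a row become NA.
stacked <- bind_rows(L_demo)
print(stacked)
```

Applied to the list built by map(), this would replace the Reduce step with bind_rows(L).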

Source

2017-07-26 13:49:49 CPak

Thanks Chi Pak! It works for me. By the way, I tested it with this package and it took 30 minutes for 175999 rows, about the same as using for loops. – Bylon
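Since the row-by-row approach took about as long as a for loop, one direction that may be worth trying is a single tapply() call over all cells at once, grouped jointly by row index and label. This is only a base R sketch on the question's example data, not a measured speed-up. Treating empty strings as missing is an assumption made here, and `keys` and `res` are illustrative names.

```r
test <- matrix(c(0.8, 0.8, 0.8, 0.8,  0.7, 1, 1, 1,  0.6, 1, 0.6, 0.6,
                 0.6, 0.6, 0.6, 0.6,  rep(0.7, 8)), nrow = 4)
df <- data.frame(
  scanner_gender   = c(3, 2, 2, 2),
  cmall_gender     = c(2, NA, NA, NA),
  wechat_gender    = c(2, NA, 2, 2),
  scanner_location = c("Guangzhou", "Shenzhen", "Shenzhen", "Shenzhen"),
  cmall_location   = c("Shenzhen", NA, NA, NA),
  wechat_location  = c("Shenzhen", "", "Hongkong", "Shenzhen"),
  stringsAsFactors = FALSE
)

# Character matrix of labels with the same shape as `test`.
# vapply() + as.character() avoids the padding that as.matrix() can add
# when it formats numeric columns of a mixed data frame.
keys <- vapply(df, as.character, character(nrow(df)))
keys[!is.na(keys) & keys == ""] <- NA  # assumption: "" means missing

# One tapply() over every cell, grouped by (row index, label).
# Cells with an NA label are dropped; empty combinations come back as NA.
res <- tapply(as.vector(test),
              list(row = as.vector(row(test)), label = as.vector(keys)),
              mean)
print(res)
```

The result is a matrix with one row per input row and one column per distinct label, which corresponds to the wide table in the answer.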

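You can close this question by accepting the answer (the check mark on the left), or leave it open if you are looking for other answers. You can also change your accepted answer later. – CPak

Thanks Chi Pak :) – Bylon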
