2016-12-12

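Using the aggregate function to calculate unique totals

I am having trouble calculating the sums of planted and death counts for each unique breeding code. I am trying to use the aggregate function to do this.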

Goal: 

##  rowBrCrCode rowPlanted rowPsaDeath rowSurvival 
## 1:  GL_287   63   24   0 
## 2:  GL_287   13   7   0 
## 3:  GL_287   67   26   0 
## 4:  aCK_227   17   5   0 
## 5:  aCK_406   20   1   0 

into 

##  rowBrCrCode rowPlanted rowPsaDeath rowSurvival 
## 1:  GL_287  143   57   0 
## 2:  aCK_227   17   5   0 
## 3:  aCK_406   20   1   0 

rowSurvival will be calculated after this step.

My current code is below; the line that causes the problem is commented out (I am writing in R):

library(stringr) 
library(data.table) 
library(plyr) 
test <- fread(file.choose(), header = TRUE, data.table = TRUE) 
testRowLength <- nrow(test) 
rowBrCrCode <- "" 
rowPlanted <- 0 
rowPsaDeath <- 0 
rowSurvival <- 0 
for(i in 1:testRowLength){
    if(test$BrCrCode[i] == ""){
        test$BrCrCode[i] <- paste("PH_", i, sep = "")
        print(paste("hit found, turned nothing into ", test$BrCrCode[i], sep = ""))
    }
    slashCount <- str_count(test$BrCrCode[i], '/')
    if(slashCount == 1){
        print(paste("hit found, turn ", test$BrCrCode[i], " into ", unlist(strsplit(test$BrCrCode[i], split = '/', fixed=TRUE))[1], sep = ""))
        test$BrCrCode[i] <- unlist(strsplit(test$BrCrCode[i], split = '/', fixed=TRUE))[1]
    } else if(slashCount > 1){
        print(paste("control found, value ", test$BrCrCode[i], " into ", paste("control_", unlist(strsplit(test$BrCrCode[i], split = '/', fixed=TRUE))[1], sep = ""), sep = ""))
        test$BrCrCode[i] <- paste("control_", unlist(strsplit(test$BrCrCode[i], split = '/', fixed=TRUE))[1], sep = "")
    }
    rowBrCrCode[i] <- test$BrCrCode[i]
    rowPlanted[i] <- test$Planted[i]
    rowPsaDeath[i] <- test$PsaDeath[i]
}
firstDT <- data.table(rowBrCrCode, rowPlanted, rowPsaDeath, rowSurvival) 
print(firstDT) 
#firstDT_agg <- aggregate(x = firstDT, by = list(rowPlanted, rowPsaDeath), FUN = "sum") 
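
The loop above can also be written without indexing row by row. The following is a minimal sketch of the same three rules (an empty code becomes PH_ plus the row number, one slash keeps the part before it, more than one slash adds a control_ prefix), using base R instead of stringr. The sample `codes` vector is made up for illustration and is not taken from the real file.

```r
# Hypothetical sample codes; the real values would come from test$BrCrCode
codes <- c("GL_287/1", "", "aCK_227", "A/B/C")

# Replace empty codes with a placeholder based on the row number
codes <- ifelse(codes == "", paste0("PH_", seq_along(codes)), codes)

# Count slashes by comparing string lengths with and without them
slash_count <- nchar(codes) - nchar(gsub("/", "", codes, fixed = TRUE))

# Keep only the text before the first slash (unchanged if there is no slash)
first_part <- sub("/.*$", "", codes)

# More than one slash marks a control; otherwise use the first part as is
cleaned <- ifelse(slash_count > 1, paste0("control_", first_part), first_part)
print(cleaned)
```

This version drops the progress messages that the loop prints; they could be added back with `message()` if needed.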

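Confirmed that both approaches work, but only after certain variables in the data.table are converted to numeric on input. For whatever reason, when you add the numbers to the vector they change to character. – ImmortalMewtwo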
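
A likely explanation, assuming fread read one of the count columns as text (for example because of a stray non-numeric entry): an R atomic vector holds a single type, so assigning one character value into a numeric vector converts the whole vector to character. A small illustration with made-up values:

```r
# A numeric vector, like rowPlanted after `rowPlanted <- 0`
planted <- 0
class_before <- class(planted)

# Assigning a character value coerces the whole vector to character
planted[2] <- "13"
class_after <- class(planted)
print(c(before = class_before, after = class_after))

# Converting back restores numbers that sum() can work with
planted <- as.numeric(planted)
total <- sum(planted)
print(total)
```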

Answers

0

If all you want is the total per code, it should just be this:

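# Sum the three count columns within each breeding code.
# Uses firstDT and its rowBrCrCode column so that every name in the formula
# is looked up in the same table.
test2 <- aggregate(cbind(rowPlanted, rowPsaDeath, rowSurvival) ~ rowBrCrCode,
        data = firstDT, FUN = sum)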
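
For comparison, the commented-out call in the question passes `by = list(rowPlanted, rowPsaDeath)`, which groups by the count values themselves, and it hands the whole table (including the character code column) to `sum`. Here is a self-contained sketch of the formula form on the sample rows from the question; the `firstDF` data frame is a hypothetical stand-in built by hand, with the count columns already numeric.

```r
# Hypothetical stand-in for the table in the question, typed in by hand
firstDF <- data.frame(
  rowBrCrCode = c("GL_287", "GL_287", "GL_287", "aCK_227", "aCK_406"),
  rowPlanted  = c(63, 13, 67, 17, 20),
  rowPsaDeath = c(24, 7, 26, 5, 1),
  rowSurvival = c(0, 0, 0, 0, 0)
)

# Left side: columns to sum; right side: the grouping column
totals <- aggregate(cbind(rowPlanted, rowPsaDeath, rowSurvival) ~ rowBrCrCode,
                    data = firstDF, FUN = sum)
print(totals)
```

`aggregate` returns the groups in sorted order of the code, so the row order can differ from the order in the input.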

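If you are importing the firstDT data from a file, predefine the column classes. If you are creating it in the program, make sure the columns are numeric. For now, you can use firstDT[,2:4] <- as.numeric(firstDT[,2:4]) and then run the code above. –

Defined it at this line: 'firstDT <- data.table(rowBrCrCode, rowPlanted, rowPsaDeath, rowSurvival)'. Just realised I like this one better; it sorted the cross codes alphabetically. – ImmortalMewtwo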
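
As far as I can tell, `as.numeric()` does not accept a multi-column table directly, so the conversion suggested in the comment may need to go column by column. A minimal sketch, assuming the counts arrived as character; the table below is a hypothetical stand-in for firstDT and `count_cols` is a helper name introduced here.

```r
library(data.table)

# Hypothetical stand-in for firstDT, with counts stored as character
firstDT <- data.table(
  rowBrCrCode = c("GL_287", "GL_287", "GL_287", "aCK_227", "aCK_406"),
  rowPlanted  = c("63", "13", "67", "17", "20"),
  rowPsaDeath = c("24", "7", "26", "5", "1"),
  rowSurvival = c("0", "0", "0", "0", "0")
)

# Inspect the class of every column before converting
print(sapply(firstDT, class))

# Convert each count column to numeric, updating the table by reference
count_cols <- c("rowPlanted", "rowPsaDeath", "rowSurvival")
firstDT[, (count_cols) := lapply(.SD, as.numeric), .SDcols = count_cols]

print(sapply(firstDT, class))
```

When the data comes from a file, the `colClasses` argument of `fread` is the place to declare the types up front instead.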

1

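The dataset appears to be a data.table, so we can use data.table methods. Grouping by 'rowBrCrCode', we loop over the columns of the Subset of Data.table (.SD) and take the sum of each: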

library(data.table) 
dt[, lapply(.SD, sum), by = rowBrCrCode] 
#  rowBrCrCode rowPlanted rowPsaDeath rowSurvival 
#1:  GL_287  143   57   0 
#2:  aCK_227   17   5   0 
#3:  aCK_406   20   1   0
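
To try this without the original file, here is a self-contained sketch that rebuilds the sample rows by hand (the `firstDT` table and the `count_cols` name are stand-ins for illustration). It also passes `.SDcols`, which restricts `.SD` to the listed columns; that is optional here but keeps the sum away from any non-numeric column that might be added later.

```r
library(data.table)

# Hypothetical stand-in for the table in the question, typed in by hand
firstDT <- data.table(
  rowBrCrCode = c("GL_287", "GL_287", "GL_287", "aCK_227", "aCK_406"),
  rowPlanted  = c(63, 13, 67, 17, 20),
  rowPsaDeath = c(24, 7, 26, 5, 1),
  rowSurvival = c(0, 0, 0, 0, 0)
)

# Sum only the count columns within each breeding code
count_cols <- c("rowPlanted", "rowPsaDeath", "rowSurvival")
totals <- firstDT[, lapply(.SD, sum), by = rowBrCrCode, .SDcols = count_cols]
print(totals)
```

With `by`, the groups come back in order of first appearance, which matches the output shown above.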