How do I create a new column in R using a variable lookup?

I have a data table that looks like this:

                    Cause of Death                Ethnicity Count
1: ACCIDENTS EXCEPT DRUG POISONING ASIAN & PACIFIC ISLANDER  1368
2: ACCIDENTS EXCEPT DRUG POISONING                 HISPANIC  3387
3: ACCIDENTS EXCEPT DRUG POISONING       NON-HISPANIC BLACK  3240
4: ACCIDENTS EXCEPT DRUG POISONING       NON-HISPANIC WHITE  6825
5:              ALZHEIMERS DISEASE ASIAN & PACIFIC ISLANDER   285
---

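I want to add a new column that gives each ethnicity's share of the deaths from a specific cause, i.e. Count divided by the total Count for that cause. Like this: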

                    Cause of Death                Ethnicity Count PercentofDeath
1: ACCIDENTS EXCEPT DRUG POISONING ASIAN & PACIFIC ISLANDER  1368     0.09230769
2: ACCIDENTS EXCEPT DRUG POISONING                 HISPANIC  3387     0.22854251
3: ACCIDENTS EXCEPT DRUG POISONING       NON-HISPANIC BLACK  3240     0.21862348
4: ACCIDENTS EXCEPT DRUG POISONING       NON-HISPANIC WHITE  6825     0.46052632
5:              ALZHEIMERS DISEASE ASIAN & PACIFIC ISLANDER   285     0.04049446
---

Here is my code for doing this, which is pretty ugly:

library(data.table)
# load library, change to data table
COD.dt <- as.data.table(COD)

# function for adding the percent column
lala <- function(x) {
  if (exists("started")) {
    # the data.table I'm appending to has already been initialized
    p <- COD.dt[x == `Cause of Death`]
    blah <- COD.dt[x == `Cause of Death`]$Count / sum(COD.dt[x == `Cause of Death`]$Count)
    p$PercentofDeath <- blah
    started <<- rbind(started, p)
  } else {
    # initialize data table
    l <- COD.dt[x == `Cause of Death`]
    blah <- COD.dt[x == `Cause of Death`]$Count / sum(COD.dt[x == `Cause of Death`]$Count)
    l$PercentofDeath <- blah
    started <<- l
  }

  # if finished, return
  if (x == unique(COD.dt$`Cause of Death`)[length(unique(COD.dt$`Cause of Death`))]) {
    return(started)
  }
}

# run function
h <- sapply(unique(COD.dt$`Cause of Death`), lala)
# remove from environment
rm(started)
# h actually ends up being a list; the last element happens to be the one I want, so I take that one
finalTable <- h$`VIRAL HEPATITIS`

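So, as you can see, this code is pretty ugly and not adaptable. I'm hoping for some guidance on how to make it better. Maybe using dplyr or some other function?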

Best,

2015-09-19 njBernstein

A pure data.table solution would be easy as well, but here's dplyr:

library(dplyr) 

COD.dt %>% group_by(`Cause of Death`) %>% 
    mutate(PercentofDeath = Count/sum(Count))

You could turn this into a function, but it's such a small, basic operation that most people wouldn't bother.
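For anyone who does want the function form, here is a minimal sketch. It assumes a toy `COD` built from the five rows shown in the question (the real table has more rows), and `add_share` is a hypothetical helper name, not something dplyr provides.

```r
library(dplyr)

# Toy data copied from the five rows printed in the question (assumption:
# the real COD table has the same three columns, just more rows).
COD <- data.frame(
  `Cause of Death` = c(rep("ACCIDENTS EXCEPT DRUG POISONING", 4),
                       "ALZHEIMERS DISEASE"),
  Ethnicity = c("ASIAN & PACIFIC ISLANDER", "HISPANIC", "NON-HISPANIC BLACK",
                "NON-HISPANIC WHITE", "ASIAN & PACIFIC ISLANDER"),
  Count = c(1368, 3387, 3240, 6825, 285),
  check.names = FALSE  # keep the space in "Cause of Death"
)

# add_share() is a hypothetical wrapper around the grouped mutate above.
add_share <- function(df) {
  df %>%
    group_by(`Cause of Death`) %>%
    mutate(PercentofDeath = Count / sum(Count)) %>%
    ungroup()  # drop the grouping so later steps are not grouped by accident
}

result <- add_share(COD)
```

On this toy subset the four accident rows come out as in the question (1368/14820 and so on), while the single Alzheimer's row gets 1 because it is the only row for its cause here. The values are proportions; multiply by 100 if actual percentages are wanted.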

2015-09-19 01:40:02 Gregor

Whoa. That's nice. I've been meaning to use the %>% operator. Thanks a lot. – njBernstein

Not sure it improves readability, but with magrittr, 'PercentofDeath = Count %>% {./sum(.)}' works inside 'mutate'. – Frank

@Frank I'd say that greatly reduces readability. – Gregor

I just figured out a much better way:

library(data.table) 
#load library, change to data table 
COD.dt <- as.data.table(COD) 

#make column of disease total counts 
COD.dt[,disease:=sum(Count), by = list(`Cause of Death`)] 

#use that column to make percents 
COD.dt[,percent:=Count/disease, by = list(`Cause of Death`)]

2015-09-19 01:42:43 njBernstein

This doesn't need to be two separate steps. Unless you want the 'disease' column for some other reason, you can just do 'percent := Count/sum(Count)'. – Gregor

Also, you can modify 'COD' in place instead of making a separate object, with 'setDT(COD)'. – Frank

Oh, good to know. Sweet. Thanks! @Gregor – njBernstein
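Putting the two comments above together, the data.table version might look like the sketch below. The toy `COD` is an assumption built from the five rows shown in the question; only `setDT()` and the `:=` call come from the comments.

```r
library(data.table)

# Toy data copied from the five rows printed in the question (assumption:
# the real COD table has the same three columns, just more rows).
COD <- data.frame(
  `Cause of Death` = c(rep("ACCIDENTS EXCEPT DRUG POISONING", 4),
                       "ALZHEIMERS DISEASE"),
  Ethnicity = c("ASIAN & PACIFIC ISLANDER", "HISPANIC", "NON-HISPANIC BLACK",
                "NON-HISPANIC WHITE", "ASIAN & PACIFIC ISLANDER"),
  Count = c(1368, 3387, 3240, 6825, 285),
  check.names = FALSE  # keep the space in "Cause of Death"
)

# Convert COD to a data.table by reference; no separate COD.dt object is made.
setDT(COD)

# One grouped step: each Count divided by the total Count for its cause.
# No intermediate 'disease' column is needed.
COD[, percent := Count / sum(Count), by = list(`Cause of Death`)]
```

Because `:=` adds the column by reference, `COD` itself now carries `percent`; nothing has to be reassigned.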
