2015-09-19 108 views
0

How can I create a new column in R using a variable lookup?

I have a data table that looks like this:

                    Cause of Death                Ethnicity Count
1: ACCIDENTS EXCEPT DRUG POISONING ASIAN & PACIFIC ISLANDER  1368
2: ACCIDENTS EXCEPT DRUG POISONING                 HISPANIC  3387
3: ACCIDENTS EXCEPT DRUG POISONING       NON-HISPANIC BLACK  3240
4: ACCIDENTS EXCEPT DRUG POISONING       NON-HISPANIC WHITE  6825
5:              ALZHEIMERS DISEASE ASIAN & PACIFIC ISLANDER   285
---

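I want to create a new column that gives, for each specific cause of death, the share of those deaths accounted for by each ethnicity (as a fraction of the total for that cause). Like this: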

                    Cause of Death                Ethnicity Count PercentofDeath
1: ACCIDENTS EXCEPT DRUG POISONING ASIAN & PACIFIC ISLANDER  1368     0.09230769
2: ACCIDENTS EXCEPT DRUG POISONING                 HISPANIC  3387     0.22854251
3: ACCIDENTS EXCEPT DRUG POISONING       NON-HISPANIC BLACK  3240     0.21862348
4: ACCIDENTS EXCEPT DRUG POISONING       NON-HISPANIC WHITE  6825     0.46052632
5:              ALZHEIMERS DISEASE ASIAN & PACIFIC ISLANDER   285     0.04049446
---

Here is my code to do this, which is pretty ugly:

library(data.table)
# load library, change to data table
COD.dt <- as.data.table(COD)

# function for adding the percent column
lala <- function(x) {
  if (exists("started")) {
    # the data.table I'm appending to has already been initialized
    p <- COD.dt[x == `Cause of Death`]
    blah <- COD.dt[x == `Cause of Death`]$Count / sum(COD.dt[x == `Cause of Death`]$Count)
    p$PercentofDeath <- blah
    started <<- rbind(started, p)
  } else {
    # initialize data table
    l <- COD.dt[x == `Cause of Death`]
    blah <- COD.dt[x == `Cause of Death`]$Count / sum(COD.dt[x == `Cause of Death`]$Count)
    l$PercentofDeath <- blah
    started <<- l
  }

  # if finished (x is the last unique cause), return
  if (x == unique(COD.dt$`Cause of Death`)[length(unique(COD.dt$`Cause of Death`))]) {
    return(started)
  }
}

# run function
h <- sapply(unique(COD.dt$`Cause of Death`), lala)
# remove from environment
rm(started)
# h actually ends up being a list; the last element happens to be the one I want, so I take that one
finalTable <- h$`VIRAL HEPATITIS`

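So, as you can see, this code is very ugly and not reusable. I'm hoping for some guidance on how to make it better. Maybe using dplyr or some other function?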

Best answer

1

A pure data.table solution would be easy as well, but here's dplyr:

library(dplyr) 

COD.dt %>% group_by(`Cause of Death`) %>% 
    mutate(PercentofDeath = Count/sum(Count)) 

You could turn this into a function, but it's such a small, basic operation that most people wouldn't bother.
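If you did want the function form, a minimal sketch might look like the following. The name add_percent_of_death() is a hypothetical helper, not part of dplyr, and the toy data is an assumption: the four ACCIDENTS rows are copied from the question, while the "HYPOTHETICAL CAUSE" rows are made up so that there is a second group. The trailing ungroup() is an addition not in the answer above; it just hands back an ungrouped result.

```r
library(dplyr)

# Toy stand-in for COD: first four rows from the question,
# last two rows invented for illustration only.
COD <- data.frame(
  `Cause of Death` = c(rep("ACCIDENTS EXCEPT DRUG POISONING", 4),
                       rep("HYPOTHETICAL CAUSE", 2)),
  Ethnicity = c("ASIAN & PACIFIC ISLANDER", "HISPANIC",
                "NON-HISPANIC BLACK", "NON-HISPANIC WHITE",
                "HISPANIC", "NON-HISPANIC WHITE"),
  Count = c(1368, 3387, 3240, 6825, 30, 70),
  check.names = FALSE  # keep the column name with spaces as-is
)

# Hypothetical helper: adds each row's share of its cause's total count.
add_percent_of_death <- function(df) {
  df %>%
    group_by(`Cause of Death`) %>%
    mutate(PercentofDeath = Count / sum(Count)) %>%
    ungroup()
}

result <- add_percent_of_death(COD)
```

Within each cause the new column sums to 1, which is a quick sanity check on the grouping.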

+0

Whoa. That's great. I've been meaning to start using the %>% operator. Thanks a lot. – njBernstein

+0

Not sure whether it improves readability, but with magrittr, 'PercentofDeath = Count %>% {./sum(.)}' works inside 'mutate'. – Frank

+1

@Frank I'd say that greatly reduces readability. – Gregor

0

I just found a much better way:

library(data.table) 
#load library, change to data table 
COD.dt <- as.data.table(COD) 

#make column of disease total counts 
COD.dt[,disease:=sum(Count), by = list(`Cause of Death`)] 

#use that column to make percents 
COD.dt[,percent:=Count/disease, by = list(`Cause of Death`)] 
+2

This doesn't need to be two separate steps. Unless you want the 'disease' column for some other reason, you can just do 'percent := Count/sum(Count)'. – Gregor
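As a rough sketch of that one-step form, checked against the two-step version from the answer: the toy table below is an assumption (four ACCIDENTS rows taken from the question, two "HYPOTHETICAL CAUSE" rows invented for illustration), and the object names one_step and two_step are hypothetical.

```r
library(data.table)

# Toy stand-in for COD.dt: first four rows from the question,
# last two rows invented for illustration only.
COD.dt <- data.table(
  `Cause of Death` = c(rep("ACCIDENTS EXCEPT DRUG POISONING", 4),
                       rep("HYPOTHETICAL CAUSE", 2)),
  Count = c(1368, 3387, 3240, 6825, 30, 70)
)

# Two-step version from the answer: store the group total, then divide.
two_step <- copy(COD.dt)
two_step[, disease := sum(Count), by = list(`Cause of Death`)]
two_step[, percent := Count / disease]

# One-step version: compute the group total inside the same := call.
one_step <- copy(COD.dt)
one_step[, percent := Count / sum(Count), by = list(`Cause of Death`)]

# Both approaches should give the same percent column.
same <- isTRUE(all.equal(one_step$percent, two_step$percent))
```

The only difference in the result is that the one-step table has no intermediate 'disease' column.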

+0

Also, you can modify 'COD' in place rather than making a separate object, e.g. with 'setDT(COD)'. – Frank
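A small sketch of the in-place variant, assuming COD starts out as a plain data.frame. The toy rows are an assumption (four ACCIDENTS rows from the question, two invented "HYPOTHETICAL CAUSE" rows), and was_df is a hypothetical name used only to record the class before conversion.

```r
library(data.table)

# Toy stand-in for COD as a plain data.frame: first four rows from
# the question, last two rows invented for illustration only.
COD <- data.frame(
  `Cause of Death` = c(rep("ACCIDENTS EXCEPT DRUG POISONING", 4),
                       rep("HYPOTHETICAL CAUSE", 2)),
  Count = c(1368, 3387, 3240, 6825, 30, 70),
  check.names = FALSE  # keep the column name with spaces as-is
)
was_df <- !is.data.table(COD)

# Convert COD to a data.table by reference; no separate COD.dt is created.
setDT(COD)

# Add the share column by reference as well.
COD[, percent := Count / sum(Count), by = list(`Cause of Death`)]
```

After this, COD itself is a data.table carrying the new column, so there is no second copy of the data to keep in sync.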

+0

Oh. Good to know. Sweet. Thanks! @Gregor – njBernstein