2016-12-15

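R: count and subtract events from a data frame

I am trying to compute family size from a data frame that also records two kinds of events: family members who died and family members who left the family. I want to take both into account to compute the actual family size. Here is a reproducible example of my problem, with only 3 families: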

family <- factor(rep(c("001","002","003"), c(10,8,15)), levels=c("001","002","003"), labels=c("001","002","003"), ordered=TRUE) 
dead <- c(0,0,0,0,1,0,1,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,1,0,1,0,0,0,0,0,0,0) 
left <- c(0,0,0,0,0,1,0,0,0,1,1,0,0,0,1,1,0,0,0,1,1,1,0,0,0,0,0,0,1,1,1,0,0) 
DF <- data.frame(family, dead, left) ; DF 

I can count N, the total number of members in each family, in a second data frame DF2 simply by using table():

DF2 <- with(DF, data.frame(table(family))) 
colnames(DF2)[2] <- "N" ; DF2 
family N 
1 001 10 
2 002 8 
3 003 15 

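But I can't find a suitable way to get the actual number of members (for example, as a new variable N2 in DF2) by subtracting from N the number of members who died or left the family. I think I somehow have to link the two data frames DF and DF2. I have searched this site for related questions but could not find the right answer... If anyone has a good idea, that would be great! Thanks in advance. Deni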
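The linking step described in the question can be sketched in base R: count the members who died or left in each family with `aggregate()`, then join those counts onto DF2 by `family` with `merge()` and subtract. This is a minimal sketch that assumes `dead` and `left` are 0/1 codes, as in the example. The names `events` and `removed` are hypothetical and are used only for illustration.

```r
# Rebuild the example data from the question
family <- factor(rep(c("001", "002", "003"), c(10, 8, 15)))
dead <- c(0,0,0,0,1,0,1,0,0,0, 0,0,0,0,0,0,0,1, 0,0,0,0,0,1,0,1,0,0,0,0,0,0,0)
left <- c(0,0,0,0,0,1,0,0,0,1, 1,0,0,0,1,1,0,0, 0,1,1,1,0,0,0,0,0,0,1,1,1,0,0)
DF <- data.frame(family, dead, left)

# DF2: total members per family, built as in the question
DF2 <- with(DF, data.frame(table(family)))
colnames(DF2)[2] <- "N"

# Per-family count of members who died or left (assumes 0/1 coding);
# "events" and "removed" are hypothetical names
events <- aggregate(removed ~ family,
                    data = transform(DF, removed = dead + left),
                    FUN = sum)

# Link DF2 and the counts by family, then subtract
DF2 <- merge(DF2, events, by = "family")
DF2$N2 <- DF2$N - DF2$removed
DF2
```

Because `merge()` matches rows on `family` rather than relying on row order, the approach still works if DF2 and the counts are sorted differently.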


'library(dplyr); DF %>% group_by(family) %>% summarize(n() - sum(dead) - sum(left))'

Answers


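Logic: first group_by(family), then compute two numbers per group: 1) the total number of observations, and 2) that total minus (sum(dead) + sum(left)).

dplyr: n() gives the total number of observations in each group.

data.table: .N does the same job as n() above.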

library(dplyr) 
DF %>% group_by(family) %>% summarise(total = n(), current = n()-sum(dead,left, na.rm = TRUE)) 
# family total current 
# (fctr) (int) (dbl) 
#1 001 10  6 
#2 002  8  4 
#3 003 15  7 


library(data.table) 
# setDT() converts DF from a data.frame to a data.table by reference; if DF is already a data.table, use DF[...] directly.
setDT(DF)[, .(total = .N, current = .N - sum(dead, left, na.rm = TRUE)), by = family] 
# family total current 
#1: 001 10  6 
#2: 002  8  4 
#3: 003 15  7 

Thanks, Joel, for both of your solutions. This is a big step forward for me, thanks! – den


Please don't post [code-only answers](http://meta.stackexchange.com/questions/148272/is-there-any-benefit-to-allowing-code-only-answers-while-blocking-code-only-ques); they are not helpful to anyone other than the OP with his/her specific problem.


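This works fine on the example above, but not on my real database, where I have to count particular values of some variables (not only 0 or 1): 'DF %>% group_by(family) %>% summarize(total = n(), current = n() - sum(dead == 1) - sum(left == 1))'. I get the following error message: Error in mutate_impl(.data, dots): wrong result size (3853), expected 33 or 1 ... Any idea how to fix this? Thanks – den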
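For event columns that hold codes other than 0/1, the counting idea in the comment (compare with `== 1`, then sum) can also be sketched in base R with `tapply()`. Each column is compared to the code of interest, and the logical result is summed per family. `na.rm = TRUE` skips rows whose code is missing. The toy data and the extra codes `2` and `NA` below are hypothetical and chosen only to illustrate non-binary values. This sketch does not diagnose the dplyr error message itself.

```r
# Hypothetical toy data: 1 = event happened, 0 or 2 = other statuses, NA = unknown
family <- factor(rep(c("001", "002"), c(4, 3)))
dead <- c(1, 2, 0, NA, 0, 1, 2)
left <- c(0, 1, 2, 0, 1, NA, 0)
DF <- data.frame(family, dead, left)

# Total members per family
total <- table(DF$family)

# Count only rows coded exactly 1; comparisons with NA give NA, which na.rm drops
n_dead <- tapply(DF$dead == 1, DF$family, sum, na.rm = TRUE)
n_left <- tapply(DF$left == 1, DF$family, sum, na.rm = TRUE)

# Current family size = total minus members who died or left
current <- as.vector(total) - n_dead - n_left
current
```

Here family "001" has 4 members, one coded as dead and one as left, which leaves 2. Family "002" has 3 members, one dead and one left, which leaves 1.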


Here is a base R option:

do.call(data.frame, aggregate(dl~family, transform(DF, dl = dead + left), 
     FUN = function(x) c(total=length(x), current=length(x) - sum(x)))) 

Or, a modified version:

transform(aggregate(. ~ family, transform(DF, total = 1, 
    current = dead + left)[c(1,4:5)], FUN = sum), current = total - current) 
#  family total current 
#1 001 10  6 
#2 002  8  4 
#3 003 15  7 

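I finally found another solution (from another post) that works well and lets me compute everything directly in the original DF table. It uses the ddply function from the plyr package: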

library(plyr)
DF <- ddply(DF, .(family), transform, total = length(family))
DF <- ddply(DF, .(family), transform, actual = length(family) - sum(dead == "1") - sum(left == "1"))
DF
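The same per-row result, with every row carrying its family's total and actual size, can be sketched without plyr by using base R's `ave()`. `ave()` applies a function within each group and repeats the group's result on every row of that group. This is an alternative technique, not the method from the post mentioned above, and it assumes 0/1 coding as in the example.

```r
# Rebuild the example data from the question
family <- factor(rep(c("001", "002", "003"), c(10, 8, 15)))
dead <- c(0,0,0,0,1,0,1,0,0,0, 0,0,0,0,0,0,0,1, 0,0,0,0,0,1,0,1,0,0,0,0,0,0,0)
left <- c(0,0,0,0,0,1,0,0,0,1, 1,0,0,0,1,1,0,0, 0,1,1,1,0,0,0,0,0,0,1,1,1,0,0)
DF <- data.frame(family, dead, left)

# Family size, repeated on every row of that family
DF$total <- ave(DF$dead, DF$family, FUN = length)

# Family size minus members who died or left (assumes 0/1 coding)
DF$actual <- ave(DF$dead + DF$left, DF$family,
                 FUN = function(x) length(x) - sum(x))
DF
```

`FUN` must be passed by name here because `ave()` takes the grouping variables through `...`.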

Many thanks to everyone who helped! Deni