2013-07-17

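Create rows in a data frame for variables that were observed but not explicitly recorded, by factor

I created a variable that classifies species groups as domestic, wild, or exotic in a data frame where each row represents a species found at a unique site (siteID). For each siteID, I want to insert rows into my data frame that report a 0 for every group that was not observed at that site. In other words, this is the data frame I have: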

df.start <- data.frame(species = c("dog","deer","toucan","dog","deer","toucan"), 
    siteID = c("a","b","b","c","c","c"), 
    group = c("domestic", "wild", "exotic", "domestic", "wild", "exotic"), 
    value = c(2:7)) 

df.start 
#   species siteID    group value
# 1     dog      a domestic     2
# 2    deer      b     wild     3
# 3  toucan      b   exotic     4
# 4     dog      c domestic     5
# 5    deer      c     wild     6
# 6  toucan      c   exotic     7

This is the data frame I want:

df.end <-data.frame(species=c("dog","NA","NA","NA","deer", 
           "toucan","dog","deer","toucan"), 
    siteID = c("a","a","a","b","b","b","c","c","c"), 
    group = rep(c("domestic", "wild", "exotic"),3), 
    value = c(2,0,0,0,3,4,5,6,7)) 

df.end 
#   species siteID    group value
# 1     dog      a domestic     2
# 2      NA      a     wild     0
# 3      NA      a   exotic     0
# 4      NA      b domestic     0
# 5    deer      b     wild     3
# 6  toucan      b   exotic     4
# 7     dog      c domestic     5
# 8    deer      c     wild     6
# 9  toucan      c   exotic     7

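I ran into this because I wanted to use a plyr function to summarize the mean value by group, and I realized that the zeros for some group-site combinations were missing, which inflated my estimates. Maybe I am missing a more obvious workaround?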

Answers

1

Using base R functions:

result <- merge( 
    with(df.start, expand.grid(siteID=unique(siteID),group=unique(group))), 
    df.start, 
    by=c("siteID","group"), 
    all.x=TRUE 
) 
result$value[is.na(result$value)] <- 0 

> result
  siteID    group species value
1      a domestic     dog     2
2      a   exotic    <NA>     0
3      a     wild    <NA>     0
4      b domestic    <NA>     0
5      b   exotic  toucan     4
6      b     wild    deer     3
7      c domestic     dog     5
8      c   exotic  toucan     7
9      c     wild    deer     6
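
To see what the merge is doing, and why the missing zeros matter for the per-group means mentioned in the question, here is a small sketch. It rebuilds `df.start` so that it runs on its own. The names `grid`, `filled`, `means.before`, and `means.after` are illustrative only, and `tapply` stands in for the plyr summary, which the question does not show.

```r
# Rebuild the example data so the sketch is self-contained
df.start <- data.frame(
  species = c("dog", "deer", "toucan", "dog", "deer", "toucan"),
  siteID  = c("a", "b", "b", "c", "c", "c"),
  group   = c("domestic", "wild", "exotic", "domestic", "wild", "exotic"),
  value   = 2:7
)

# Every siteID x group combination: 3 sites * 3 groups = 9 rows
grid <- expand.grid(siteID = unique(df.start$siteID),
                    group  = unique(df.start$group),
                    stringsAsFactors = FALSE)

# Left join onto the grid; unobserved combinations come back as NA
filled <- merge(grid, df.start, by = c("siteID", "group"), all.x = TRUE)
filled$value[is.na(filled$value)] <- 0

# Group means without and with the explicit zeros
means.before <- tapply(df.start$value, df.start$group, mean)
means.after  <- tapply(filled$value, filled$group, mean)
means.before[["domestic"]]  # 3.5, from (2 + 5) / 2
means.after[["domestic"]]   # about 2.33, from (2 + 0 + 5) / 3
```

The domestic mean drops once site b contributes its 0, which is the inflation the question describes.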

1

df.sg <- data.frame(xtabs(value~siteID+group, data=df.start)) 
merge(df.start[-4], df.sg, by=c("siteID", "group"), all.y=TRUE) 
#------------- 
  siteID    group species Freq
1      a domestic     dog    2
2      a   exotic    <NA>    0
3      a     wild    <NA>    0
4      b domestic    <NA>    0
5      b   exotic  toucan    4
6      b     wild    deer    3
7      c domestic     dog    5
8      c   exotic  toucan    7
9      c     wild    deer    6

The xtabs function returns a table, which means the as.data.frame.table method can be applied to it. Very handy.
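
A minimal sketch of that conversion, written with an explicit `as.data.frame` call and its `responseName` argument so the summed column is named `value` rather than the default `Freq`. It assumes `df.start` as defined in the question (rebuilt here so the block runs on its own); `tab`, `long`, and `filled` are illustrative names.

```r
# Rebuild the example data so the sketch is self-contained
df.start <- data.frame(
  species = c("dog", "deer", "toucan", "dog", "deer", "toucan"),
  siteID  = c("a", "b", "b", "c", "c", "c"),
  group   = c("domestic", "wild", "exotic", "domestic", "wild", "exotic"),
  value   = 2:7
)

# xtabs builds a siteID x group table, summing value within each cell;
# combinations with no rows get 0
tab <- xtabs(value ~ siteID + group, data = df.start)

# as.data.frame() dispatches to as.data.frame.table: one row per cell
long <- as.data.frame(tab, responseName = "value", stringsAsFactors = FALSE)

# Attach species; all.y = TRUE keeps cells that have no matching observation
filled <- merge(df.start[c("siteID", "group", "species")], long,
                by = c("siteID", "group"), all.y = TRUE)
filled
```

One caveat: because `xtabs` sums within each cell, a siteID/group pair that appears on several rows would be collapsed to a single total, so this approach fits data like the example where each pair occurs at most once.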

+0

Nice answer. I felt there had to be an `aggregate` solution, but it isn't doing what I want. – thelatemail

+0

I don't think so. `aggregate` drops (or is never told about) the missing cross-combinations. `aggregate(df.start$value, df.start[, c("siteID", "group")], FUN = I)` –
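
The point about `aggregate` can be checked directly. A minimal sketch, assuming `df.start` from the question (rebuilt here); `agg` and `tab` are illustrative names, and the formula interface with `FUN = sum` is used instead of the call quoted in the comment:

```r
# Rebuild the example data so the sketch is self-contained
df.start <- data.frame(
  species = c("dog", "deer", "toucan", "dog", "deer", "toucan"),
  siteID  = c("a", "b", "b", "c", "c", "c"),
  group   = c("domestic", "wild", "exotic", "domestic", "wild", "exotic"),
  value   = 2:7
)

# aggregate only returns the combinations that occur in the data
agg <- aggregate(value ~ siteID + group, data = df.start, FUN = sum)
nrow(agg)      # 6 of the 9 possible siteID x group cells

# xtabs keeps the full cross-tabulation, with 0 in the unobserved cells
tab <- xtabs(value ~ siteID + group, data = df.start)
sum(tab == 0)  # 3 unobserved cells
```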

+0

I didn't know about xtabs. Very handy, thanks! – sho