Fake data

K <- 5 # number of rows set to NaN 

df <- data.frame(state = c(rep(1, 10), rep(2, 10)), 
       county = rep(1:4, 5), yield = 100) 

df[sample(1:20, K), 3] <- NaN

Current code:

df1 <- read.csv("gly2.csv",header=TRUE) 

df <- data.frame(df1) 


# Drop every row whose (v1, v2) combination has a missing value in column v3
droprows_1 <- function(df, v1, v2, v3){
    idx <- is.na(df[, v3]) # is.na() is TRUE for both NA and NaN; `== NaN` never is
    todrop <- unique(df[idx, c(v1, v2)]) # K rows are missing, but unique pairs could be fewer

    for(i in seq_len(nrow(todrop))){
        # compare only the key columns v1 and v2 against the i-th pair to drop
        idx <- df[, v1] == todrop[i, 1] & df[, v2] == todrop[i, 2]
        df <- df[!idx, ]
    }
    return(df)
}

qq <- droprows_1(df, 1, 2, 3)

Thanks

Source

2015-05-14 Peter Alexander

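This is not at all clear, to be honest. What result would you expect from your fake data? – thelatemail

I want counties that contain missing years to be removed from the dataset entirely. –

To drop counties with even a single missing value, use: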

library(dplyr) 
df %>% group_by(county) %>% filter(!any(is.nan(yield)))
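
In the fake data the county codes 1 to 4 repeat in both states, so grouping by county alone pools counties from different states. Here is a minimal base R sketch of the same idea that groups on state and county together. It assumes a "county" is identified by the state/county pair; `has_gap` and `complete` are names made up for this illustration, and `set.seed(1)` is only there to make the sample reproducible.

```r
set.seed(1)  # assumption: fixed seed so the example is reproducible
K <- 5       # number of rows set to NaN

df <- data.frame(state = c(rep(1, 10), rep(2, 10)),
                 county = rep(1:4, 5), yield = 100)
df[sample(1:20, K), 3] <- NaN

# For every row, flag whether its state/county group contains any NaN yield
has_gap <- ave(is.nan(df$yield), df$state, df$county, FUN = any)

# Keep only the rows whose group has no missing yield at all
complete <- df[!has_gap, ]
```

With dplyr the equivalent change would be to write `group_by(state, county)` in place of `group_by(county)`.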

Source

2015-05-14 07:32:39 Jthorpe

Excellent, works perfectly. Thanks –

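This is easy in data.table. I don't completely follow your example, but I think this sample data gets at what you're looking for:

library(data.table)

dt<-data.table(state=letters[sample(26,size=20000,replace=TRUE)],
       county=sample(20,size=20000,replace=TRUE),
       year=rep(1981:2000,length.out=20000),
       var=rnorm(20000),
       key=c("state","county","year"))

# The sampling duplicates a bunch of state/county/year combinations;
# keep one row per combination (recent data.table versions compare
# all columns by default, so name the key columns explicitly)
dt<-unique(dt,by=key(dt))

Now, to your question. If you're new to data.table, I'll go through it step by step; the last line is all you really need.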

# This will count the number of years for each state/county combination: 
dt[,.N,by=.(state,county)] 

# To focus on only those combinations which appear for every year 
# (in my example there are 20 years) 
# (also simultaneously drop the N column since we know every N is 20) 
dt[,.N,by=.(state,county)][N==20,!"N",with=F] 

# The grande finale: reduce your data set to 
# ONLY those combinations with full support: 
full_data<-dt[.(dt[,.N,by=.(state,county)][N==20,!"N",with=F])]

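Note that the last step requires the key of dt to be set to state and county, in that order, which can be done with setkey(dt,state,county). If you are unfamiliar with data.table notation, I recommend this page, and in particular this vignette.

Edit: I just saw that you may be storing NA values for year, in which case you should adjust the code to exclude the NAs from the count: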

full_data<-dt[.(dt[!is.na(year),.N,by=.(state,county)][N==20,!"N",with=F])]
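
For anyone who wants to check the counting logic without data.table, the same steps can be sketched in base R on a tiny table. Everything here is hypothetical illustration data: `toy`, `n_years`, `counts`, `full_keys` and `full_toy` are names invented for the example, and it assumes each state/county/year combination appears at most once.

```r
# Toy data: county (a, 1) covers all three years, (a, 2) lacks 1983,
# and (b, 1) has an NA where 1982 should be
toy <- data.frame(
  state  = c("a", "a", "a", "a", "a", "b", "b", "b"),
  county = c(1, 1, 1, 2, 2, 1, 1, 1),
  year   = c(1981, 1982, 1983, 1981, 1982, 1981, NA, 1983),
  var    = c(0.5, 1.2, 0.9, 2.0, 1.1, 0.3, 0.7, 1.5)
)
n_years <- 3  # number of years a complete county must cover

# Count the non-missing years per state/county combination;
# the formula method of aggregate() omits rows with an NA year by default
counts <- aggregate(year ~ state + county, data = toy, FUN = length)
names(counts)[3] <- "N"

# Keep only the combinations with full support, then join back to the data
full_keys <- counts[counts$N == n_years, c("state", "county")]
full_toy  <- merge(toy, full_keys, by = c("state", "county"))
```

Only the rows for state "a", county 1 survive the join, which is the behaviour the data.table one-liner aims for on the full data.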

Source

2015-05-15 18:59:22 MichaelChirico

Very useful, thanks –
