!is.na creates NAs in other columns

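While merging several datasets, I am trying to remove all rows of a data frame that are missing a value for one particular variable (I want to keep the NAs in some of the other columns for now). I used the following line: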

data.frame <- data.frame[!is.na(data.frame$year),]

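This successfully removed all rows with NAs in year (and no others), but the other columns, which previously had data, are now entirely NA. In other words, non-missing values are being converted to NA. Any ideas about what is going on here? I have tried these alternatives and got the same result:

data.frame <- subset(data.frame, !is.na(year)) 

data.frame$x <- ifelse(is.na(data.frame$year) == T, 1, 0); 
data.frame <- subset(data.frame, x == 0)

Am I using is.na improperly? Is there any alternative to is.na in this situation? Any help would be greatly appreciated!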
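As a minimal sketch (the data frame `df`, the name `kept`, and all values are hypothetical), the row filter on its own only drops rows. It does not rewrite values in the rows it keeps:

```r
# Hypothetical toy data: `year` has one gap, and `procedur` has values
# in rows where `year` is present.
df <- data.frame(year = c(2001, NA, 2003, 2004), procedur = c(1, 2, NA, 4))

# Keep only the rows whose year is not missing.
kept <- df[!is.na(df$year), ]

# The surviving rows keep their procedur values unchanged (1, NA, 4);
# only the value in the dropped row (2) is gone.
kept$procedur
```

If that holds, a column that is entirely NA after filtering must have had all of its non-missing values in the rows where year was NA.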

EDIT: Below is code that should reproduce the problem:

#data 
tc <- read.csv("http://dl.dropbox.com/u/4115584/tc2008.csv") 
frame <- read.csv("http://dl.dropbox.com/u/4115584/frame.csv") 

#standardize NA codes 
tc[tc == "."] <- NA 
tc[tc == -9] <- NA 

#standardize spatial units 
colnames(frame)[1] <- "loser" 
colnames(frame)[2] <- "gainer" 
frame$dyad <- paste(frame$loser,frame$gainer,sep="") 
tc$dyad <- paste(tc$loser,tc$gainer,sep="") 
drops <- c("loser","gainer") 
tc <- tc[,!names(tc) %in% drops] 
frame <- frame[,!names(frame) %in% drops] 
rm(drops) 

#merge tc into frame 
data <- merge(tc, frame, by.x = "year", by.y = "dyad", all.x=T, all.y=T) #year column is duplicated in this process. I haven't had this problem with nearly identical code using other data. 

rm(tc,frame) 

#the first column in the new data frame is the duplicate year, which does not actually contain years. I'll rename it. 
colnames(data)[1] <- "double" 

summary(data$year) #shows 833 NA's 

summary(data$procedur) #note that at this point there are non-NA values 

#later, I want to create 20 year windows following the events in the tc data. For simplicity, I want to remove cases with NA in the year column. 

new.data <- data[!is.na(data$year),] 

#now let's see what the above operation did 
summary(new.data$year) #missing years were successfully removed 
summary(new.data$procedur) #this variable is now entirely NA's


2013-02-19 davy
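One quick check that may help narrow this down is to look, before merging, at which columns the two tables share and at how many values of the by.x key actually occur in the by.y key. The sketch below uses tiny hypothetical stand-ins for tc and frame. The assumption that both end up with year and dyad columns after the preprocessing above is inferred from the duplicated year column, and all values are made up:

```r
# Hypothetical stand-ins shaped like tc and frame after preprocessing
# (column names assumed, values made up).
tc    <- data.frame(year = c(2001, 2002), dyad = c("AB", "CD"), procedur = c(3, 5))
frame <- data.frame(year = c(2001, 2002), dyad = c("AB", "CD"))

# Which columns do the two tables share? merge() joins on these by default.
shared <- intersect(names(tc), names(frame))
shared  # "year" "dyad"

# How many tc$year values appear in frame$dyad, i.e. the keys passed
# as by.x = "year" and by.y = "dyad"? Zero means nothing can match.
sum(tc$year %in% frame$dyad)  # 0
```

A count of zero on the keys given to by.x and by.y would mean that merge() has nothing to match and can only stack the unmatched rows of both tables.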

Please give us reproducible data. Also, please don't name your data frame 'data.frame', since there is already a function called 'data.frame'. – Arun 2013-02-19 21:20:27

@Arun But could he name his 'data.frame' 'function', or is there already a 'function' called 'data.frame'? :) – juba 2013-02-19 21:23:27

:) My head is spinning. lol. – Arun 2013-02-19 21:38:39

I think the actual problem is your merge.

After merging, with the result stored in data, if you do:

> table(data$procedur, useNA="always") 

# 1  2  3  4  5  6 <NA> 
# 122 112 356  59  39  19 192258

You can see there are quite a few values in data$procedur (122 + 112 + ... + 19 = 707). However, all of them correspond to data$year = NA:

> all(is.na(data$year[!is.na(data$procedur)])) 
# [1] TRUE # every value of procedur occurs where year = NA

So, basically, all the values of procedur are removed as well, because you are removing exactly the rows where year is NA.
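The mechanism described in this answer can be reproduced on a tiny example. This is a simplified, hypothetical sketch: `key` stands in for the tc column that was passed as by.x, and its values (years stored as text) never equal any frame$dyad code, so the full outer join matches nothing:

```r
# Simplified, hypothetical stand-ins; `key` plays the role of the tc
# column passed as by.x, and no key value equals any frame$dyad value.
tc    <- data.frame(key = c("1990", "1995"), procedur = c(3, 5))
frame <- data.frame(dyad = c("AB", "CD"), year = c(2001, 2002))

# Full outer join on keys that never match: rows are only stacked,
# with the other table's columns filled with NA.
data <- merge(tc, frame, by.x = "key", by.y = "dyad", all.x = TRUE, all.y = TRUE)
print(data)

# Every non-missing procedur sits in a row where year is NA ...
all(is.na(data$year[!is.na(data$procedur)]))  # TRUE

# ... so dropping the NA-year rows leaves procedur entirely NA.
new.data <- data[!is.na(data$year), ]
all(is.na(new.data$procedur))  # TRUE
```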

To fix this, I think you should call merge as:

merge(tc, frame, all=T) # it'll automatically join on the common columns 
# also, this will not result in a duplicated year column.

Check whether this merge gives you the result you want.


2013-02-19 22:29:31 Arun
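For comparison, here is a sketch of the suggested merge on similar hypothetical stand-ins. It assumes that tc and frame share both a year and a dyad column after the preprocessing in the question, and all values are made up:

```r
# Hypothetical stand-ins, assuming tc and frame share `year` and `dyad`.
tc    <- data.frame(year = c(2001, 2002), dyad = c("AB", "CD"), procedur = c(3, 5))
frame <- data.frame(year = c(2001, 2002, 2003), dyad = c("AB", "CD", "AB"))

# With no `by` argument, merge() joins on intersect(names(tc), names(frame)).
data <- merge(tc, frame, all = TRUE)
names(data)  # "year" "dyad" "procedur": no duplicated year column

# tc rows are now matched to frame rows, so dropping NA years keeps
# the procedur values.
new.data <- data[!is.na(data$year), ]
new.data$procedur  # 3 5 NA
```

Passing the key explicitly, for example by = c("year", "dyad"), gives the same join here. It also makes the join columns visible in the code instead of depending on which names happen to be shared.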

Try complete.cases:

data.frame.clean <- data.frame[complete.cases(data.frame$year),]

...although, as mentioned above, you may want to choose a more descriptive name.


2013-02-19 21:52:13

The usage of 'is.na' is correct, so I doubt this will make any difference. – Arun 2013-02-19 21:54:07

Thanks for the suggestion. However, the result is exactly the same. – davy 2013-02-19 22:07:13
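This fits how the two functions behave in R. On a single column, complete.cases() and !is.na() flag exactly the same rows, so switching from one to the other cannot change the outcome of this filter. Below is a small sketch with a hypothetical data frame `df` (values made up). The second call shows where complete.cases() does differ, namely when checking several columns at once:

```r
# Hypothetical data frame with gaps in two columns (values made up).
df <- data.frame(year = c(2001, NA, 2003), dyad = c("AB", "CD", NA))

# On one column, complete.cases() flags the same rows as !is.na().
identical(complete.cases(df$year), !is.na(df$year))  # TRUE

# Across several columns, it keeps only rows complete in all of them.
df[complete.cases(df[, c("year", "dyad")]), ]  # only the first row remains
```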
