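!is.na creating NAs in other columns
In the process of merging multiple data sets, I am trying to remove all rows of a data frame that are missing a value for one particular variable (I want to keep the NAs in some of the other columns for the time being). I used the following line: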
data.frame <- data.frame[!is.na(data.frame$year),]
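This successfully removes all rows with NAs for year (and no others), but the other columns, which previously had data, are now entirely NA. In other words, non-missing values are being converted to NA. Any ideas about what is going on here? Am I using is.na incorrectly? I have tried these alternatives and got the same result: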
data.frame <- subset(data.frame, !is.na(year))
data.frame$x <- ifelse(is.na(data.frame$year) == T, 1, 0);
data.frame <- subset(data.frame, x == 0)
Are there any alternatives to is.na in this scenario? Any help would be greatly appreciated!
Edit: Here is code that should reproduce the problem:
#data
tc <- read.csv("http://dl.dropbox.com/u/4115584/tc2008.csv")
frame <- read.csv("http://dl.dropbox.com/u/4115584/frame.csv")
#standardize NA codes
tc[tc == "."] <- NA
tc[tc == -9] <- NA
#standardize spatial units
colnames(frame)[1] <- "loser"
colnames(frame)[2] <- "gainer"
frame$dyad <- paste(frame$loser,frame$gainer,sep="")
tc$dyad <- paste(tc$loser,tc$gainer,sep="")
drops <- c("loser","gainer")
tc <- tc[,!names(tc) %in% drops]
frame <- frame[,!names(frame) %in% drops]
rm(drops)
#merge tc into frame
data <- merge(tc, frame, by.x = "year", by.y = "dyad", all.x=T, all.y=T) #year column is duplicated in this process. I haven't had this problem with nearly identical code using other data.
rm(tc,frame)
#the first column in the new data frame is the duplicate year, which does not actually contain years. I'll rename it.
colnames(data)[1] <- "double"
summary(data$year) #shows 833 NA's
summary(data$procedur) #note that at this point there are non-NA values
#later, I want to create 20 year windows following the events in the tc data. For simplicity, I want to remove cases with NA in the year column.
new.data <- data[!is.na(data$year),]
#now let's see what the above operation did
summary(new.data$year) #missing years were successfully removed
summary(new.data$procedur) #this variable is now entirely NA's
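As a sanity check, here is a minimal sketch on made-up toy data (the names `toy` and `kept` and all values are hypothetical, not taken from the files above). It suggests that this kind of subsetting does not by itself blank out values in other columns, so the cause may lie earlier, possibly in how the merge lines rows up:

```r
# Toy data, made up for illustration (not the tc/frame data above)
toy <- data.frame(
  year     = c(1990, NA, 2001, NA),
  procedur = c(1, 2, NA, 4)
)

# Same logical-index subsetting as in the question
kept <- toy[!is.na(toy$year), ]

# Inspect the rows that survive the filter
print(kept)

# Count non-missing values in the other column, before and after
sum(!is.na(toy$procedur))
sum(!is.na(kept$procedur))
```

If the second count is lower, it is only because rows were dropped, not because surviving values were changed; comparing `data$procedur[!is.na(data$year)]` before subsetting the merged data might show whether those rows were already NA.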
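Please give us a reproducible example. And please don't name your 'data.frame' 'data.frame', since there is already a function named 'data.frame'. – Arun 2013-02-19 21:20:27
@Arun But can he name his 'data.frame' 'function', or is there already a 'data.frame' called 'function'? :) – juba 2013-02-19 21:23:27
:) My head is spinning. lol. – Arun 2013-02-19 21:38:39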