Find the index position of the first non-NA value in an R vector?

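I have a problem where a vector has a bunch of NAs at the beginning and data thereafter. However, the peculiarity of my data is that the first n non-NA values are probably unreliable, so I would like to remove them and replace them with NA.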

For example, suppose I have a vector of length 20 in which the non-NA values start at index position 4:

> z 
[1]   NA   NA   NA -1.64801942 -0.57209233 0.65137286 0.13324344 -2.28339326 
[9] 1.29968050 0.10420776 0.54140323 0.64418164 -1.00949072 -1.16504423 1.33588892 1.63253646 
[17] 2.41181291 0.38499825 -0.04869589 0.04798073

I want to remove the first 3 non-NA values, which I believe to be unreliable, to give this:

> z 
[1]   NA   NA   NA   NA   NA   NA 0.13324344 -2.28339326 
[9] 1.29968050 0.10420776 0.54140323 0.64418164 -1.00949072 -1.16504423 1.33588892 1.63253646 
[17] 2.41181291 0.38499825 -0.04869589 0.04798073

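Of course, I need a general solution, because I never know where the first non-NA value starts. How would I go about doing this? That is, how do I find the index position of the first non-NA value?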

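For completeness, my data is actually arranged in a data frame with many of these vectors as columns, and each vector can have a different non-NA starting position. Also, once the data starts, there may be further sporadic NAs further down, which prevents me from simply counting the NAs as a solution.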

Source

2011-07-24 Thomas Browne

Is there an efficient way to do this that stops searching as soon as it finds the first one?

Use a combination of is.na and which to find the non-NA index positions.

NonNAindex <- which(!is.na(z)) 
firstNonNA <- min(NonNAindex) 

# set the next 3 observations to NA 
is.na(z) <- seq(firstNonNA, length.out=3)
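Since the question mentions a data frame whose columns each start at a different position, the same idea could be wrapped in a function and applied column by column. This is only a sketch: the helper name `blank_first_non_na`, the guard for all-NA columns, the clamping at the end of the vector, and the example data are assumptions added here, not part of the answer.

```r
# Hypothetical helper (not from the answer): set n positions to NA,
# starting at the first non-NA value of v.
blank_first_non_na <- function(v, n = 3) {
  non_na <- which(!is.na(v))
  if (length(non_na) == 0) return(v)       # all-NA column: nothing to do
  first <- non_na[1]
  last <- min(first + n - 1, length(v))    # do not run past the end of v
  v[first:last] <- NA
  v
}

# Made-up data: two columns with different starting positions
df <- data.frame(
  a = c(NA, NA, 1, 2, 3, 4, 5),
  b = c(NA, 10, 20, NA, 30, 40, 50)
)

# Apply to every column while keeping the data frame structure
df[] <- lapply(df, blank_first_non_na, n = 3)
```

Like the answer's code, this blanks n consecutive positions from the first non-NA value onward, so a sporadic NA inside that window counts as one of the n positions.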

Source

2011-07-24 18:25:43

Dang, that was my second guess. I wanted to be fancy with 'rle()', but I like this solution better.

Perfect, thank you. After some thought I came up with min((1:length(z))[!is.na(z)]), but of course this idea is much better.

Would 'firstNonNA <- NonNAindex[1]' be faster? Would I run into any problems using '[1]' rather than 'min()'?
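On the '[1]' versus 'min()' question, a small sketch may help. `which()` returns indices in increasing order, so the two agree whenever at least one non-NA value exists; they differ only when the vector is entirely NA. The vectors below are made up for illustration.

```r
z <- c(NA, NA, 5, NA, 7)
idx <- which(!is.na(z))          # indices come back in increasing order
first_by_index <- idx[1]         # first element of the index vector
first_by_min <- min(idx)         # smallest index; the same value here

# Edge case: a vector with no non-NA values at all
all_na <- c(NA, NA, NA)
empty <- which(!is.na(all_na))   # integer(0)
edge_by_index <- empty[1]        # NA
# min() on an empty vector returns Inf and raises a warning,
# which is suppressed here only to keep the example quiet
edge_by_min <- suppressWarnings(min(empty))
```

An NA result from `[1]` is arguably easier to test for downstream than `Inf` plus a warning.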

I would do something along the lines of

# generate some data 
tb <- runif(10) 
tb[1:3] <- NA 

# I convert vector to TRUE/FALSE based on whether it's NA or not 
# rle function will tell you when something "changes" in the vector 
# (in our case from TRUE to FALSE) 
tb.rle <- rle(is.na(tb)) 

# this is where vector goes from all TRUE to (at least one) FALSE 
# your first true number is one position ahead, so +1 
tb.rle$lengths[1] 

# you can now subset your vector with the first non-NA value 
# and do with it whatever you want. I assign it a fantastic 
# non-believable number 
tb[tb.rle$lengths[1] + 1] <- 42
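The `rle()` approach above assumes the vector starts with at least one NA. A guarded variant, sketched here under that observation, could look like the following; the function name `first_non_na_rle` and the handling of the edge cases are assumptions, not part of the answer.

```r
# Hypothetical helper: index of the first non-NA value via rle()
first_non_na_rle <- function(v) {
  if (length(v) == 0) return(NA_integer_)           # empty input
  r <- rle(is.na(v))
  if (!r$values[1]) return(1L)                      # vector starts with data
  if (length(r$lengths) == 1) return(NA_integer_)   # a single run of NAs: no data
  r$lengths[1] + 1L                                 # one past the leading run of NAs
}

first_non_na_rle(c(NA, NA, NA, 1.5, 2.5))   # leading NAs
first_non_na_rle(c(1.5, NA, 2.5))           # no leading NAs
first_non_na_rle(c(NA, NA))                 # nothing but NAs
```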

Source

2011-07-24 18:26:26

Something along the lines of @Joshua's idea, but using which.min()

## dummy data 
set.seed(1) 
dat <- runif(10) 
dat[seq_len(sample(10, 1))] <- NA 

## start of data 
start <- which.min(is.na(dat))

This gives:

> (start <- which.min(is.na(dat))) 
[1] 4

Use this to set start:(start+2) to NA

is.na(dat) <- seq(start, length.out = 3)

Resulting in:

> dat 
[1]   NA   NA   NA   NA   NA 
[6]   NA 0.94467527 0.66079779 0.62911404 0.06178627

Source

2011-07-24 18:43:00

Cleaner. Thanks, and also for the follow-on answer.

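+1, but I'm not sure it is cleaner. It's shorter, but potentially less clear to someone who doesn't realize that 'which.min' coerces 'TRUE' and 'FALSE' to '1' and '0', respectively.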

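@Joshua Agreed; it also relies on the behaviour of which.min returning the first of any tied minimum values. I'm not sure being shorter is worth the accept.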
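The two behaviours mentioned in these comments can be seen in a small sketch. The vectors are made up, and the guarded expression at the end is an assumption added for illustration rather than something proposed in the thread.

```r
flags <- is.na(c(NA, NA, 3, NA, 5))   # TRUE TRUE FALSE TRUE FALSE
as_numbers <- as.integer(flags)       # TRUE becomes 1, FALSE becomes 0
start <- which.min(flags)             # position of the first FALSE, i.e. the first tied minimum

# Pitfall: if every value is NA, all flags are TRUE, the minimum is TRUE itself,
# and which.min() reports position 1 even though there is no data there
misleading <- which.min(is.na(c(NA, NA, NA)))

# One possible guard (hypothetical): return NA when there is no non-NA value
v <- c(NA, NA, NA)
guarded <- if (all(is.na(v))) NA_integer_ else which.min(is.na(v))
```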

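If you are dealing with large data, Position is considerably faster than which, because it only evaluates until it finds a match rather than evaluating the whole vector.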

x=c(rep(NA,3),1:1e8) 
Position(function(x)!is.na(x), x) 
# 4

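We can then set the N values starting at that position (or up to the end of the vector, whichever comes first) to NA with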

N <- 3  # number of values to blank out (3 in the question)
pos <- Position(function(x) !is.na(x), x)
x[pos:min(pos + N - 1, length(x))] <- NA

Source

2016-08-06 06:36:33 dww

This performs well on large data.

There is no need to define a new function; you can use 'complete.cases' – Renu
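One reading of this comment, offered as an assumption since the comment gives no code, is that an existing function can be passed to `Position` in place of the anonymous one. `Negate(is.na)` is shown as a second option that is not from the thread.

```r
x <- c(NA, NA, NA, 10, 20, NA, 30)   # made-up data

# complete.cases() is TRUE for a value that is not NA
pos_cc <- Position(complete.cases, x)

# Negate() builds the logical complement of is.na
pos_neg <- Position(Negate(is.na), x)

# With no match, Position() returns NA by default
pos_none <- Position(complete.cases, c(NA_real_, NA_real_))
```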

na.trim() from the zoo package can help with setting the following N values to NA.

library(zoo) 
dummy.data <- c(rep(NA, 5), 1:7, NA)
# number of leading NAs = original length minus the length after trimming the left side
x <- length(dummy.data) - length(na.trim(dummy.data, sides = "left"))
dummy.data[(x+1):(x+3)] <- NA 
dummy.data 
[1] NA NA NA NA NA NA NA NA 4 5 6 7 NA

Source

2017-05-19 22:10:19 InColorado
