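In my experience, there are three reasons to avoid a for loop. The first is that loops can be hard for others to read (if you share your code), and the apply family of functions can improve on that (and is more explicit about what is returned). The second is the speed advantage that is possible in some circumstances, particularly if you want to run the code in parallel (e.g., most apply functions parallelize very naturally, while for loops take more work to break apart).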
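As a rough illustration of the readability point, here is the same element-wise calculation written once as a for loop and once with vapply(). The vector x is hypothetical toy data, not taken from the question, and this is only a sketch of the idea:

```r
# Hypothetical toy input, not from the original question
x <- c(2, 4, 8, 16)

# for loop: the result vector and its type are managed by hand
out_loop <- numeric(length(x) - 1)
for (i in seq_len(length(x) - 1)) {
  out_loop[i] <- 100 * (x[i + 1] - x[i]) / x[i + 1]
}

# vapply: the third argument declares that every call returns one numeric,
# so the shape of the result is stated up front
out_apply <- vapply(
  seq_len(length(x) - 1),
  function(i) 100 * (x[i + 1] - x[i]) / x[i + 1],
  numeric(1)
)

out_loop
out_apply
```

Both objects hold the same three values. The vapply() form also has no state carried between iterations, which is what makes apply-style code easier to hand to a parallel backend.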
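However, it is the third reason that serves you here: a vectorized solution is often better than either of the above because it avoids repeated calls (e.g., the call to c() at the end of each loop iteration, the if checks, etc.). Here, you can accomplish everything with a single vectorized call.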
First, some sample data:
set.seed(8675309)
yrdf <- data.frame(Adj.Close = rnorm(5))
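Then we multiply everything by 100, take the diff of adjacent entries in Adj.Close, and use vectorized division to divide by the following entry. Note that I only need to pad with NA if (and only if) you need the result to be the same length as the input. If you don't want or need that NA at the end of the vector, it can be even simpler.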
100 * c(diff(yrdf$Adj.Close),NA)/c(yrdf$Adj.Close[2:nrow(yrdf)], NA)
returns
[1] 238.06442 216.94975 130.41349 -90.47879 NA
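To make the "even simpler" case concrete, here is a sketch of the padded and unpadded forms side by side. It uses a hypothetical toy vector x rather than yrdf$Adj.Close so the arithmetic is easy to check by hand, and it writes the "following entry" as x[-1], which is equivalent to x[2:length(x)] whenever there are at least two values:

```r
# Hypothetical toy data, chosen so each step is easy to verify by hand
x <- c(2, 4, 8, 16)

# Same length as the input: pad both numerator and denominator with NA
padded <- 100 * c(diff(x), NA) / c(x[-1], NA)

# One element shorter: no padding needed at all
unpadded <- 100 * diff(x) / x[-1]

padded    # 50 50 50 NA
unpadded  # 50 50 50
```

The unpadded form is handy when the result is only summarized or plotted; the padded form is the one to use if the result goes back into the data frame as a new column.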
And, to be clear, here is the microbenchmark comparison:
myForLoop <- function(){
  numrows = nrow(yrdf)
  diff.vec = c() # vector of differences
  for (index in 1:nrow(yrdf)) { # yrdf is a data frame
    if (index == numrows) {
      diff = NA # because there is no entry "below" it
    } else {
      val_index = yrdf$Adj.Close[index]
      val_next = yrdf$Adj.Close[index+1]
      diff = val_index - val_next # diff between two adjacent values
      diff = diff/yrdf$Adj.Close[index+1] * 100.0
    }
    diff.vec<-c(diff.vec,diff) # append to vector of differences
  }
  return(diff.vec)
}
microbenchmark::microbenchmark(
forLoop = myForLoop()
, vector = 100 * c(diff(yrdf$Adj.Close),NA)/c(yrdf$Adj.Close[2:nrow(yrdf)], NA)
)
gives:
Unit: microseconds
expr min lq mean median uq max neval
forLoop 74.238 78.184 82.06786 81.287 84.3740 104.190 100
vector 20.193 21.718 23.91824 22.716 24.0535 80.754 100
Note that the vector approach takes about 30% of the time of the for loop. This matters more as the size of the data increases:
set.seed(8675309)
yrdf <- data.frame(Adj.Close = rnorm(10000))
microbenchmark::microbenchmark(
forLoop = myForLoop()
, vector = 100 * c(diff(yrdf$Adj.Close),NA)/c(yrdf$Adj.Close[2:nrow(yrdf)], NA)
)
gives
Unit: microseconds
expr min lq mean median uq max neval
forLoop 306883.977 315116.446 351183.7997 325211.743 361479.6835 545383.457 100
vector 176.704 194.948 326.6135 219.512 236.9685 4989.051 100
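Note how massive the difference is at this scale: the vectorized version takes less than 0.1% of the time to run. Here, that is probably because each call to c() to add a new entry requires re-reading the full vector. A slight change can speed up the for loop a bit, but it does not get all the way to vectorized speed: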
myForLoopAlt <- function(){
  numrows = nrow(yrdf)
  diff.vec = numeric(numrows) # preallocated vector of differences
  for (index in 1:nrow(yrdf)) { # yrdf is a data frame
    if (index == numrows) {
      diff = NA # because there is no entry "below" it
    } else {
      val_index = yrdf$Adj.Close[index]
      val_next = yrdf$Adj.Close[index+1]
      diff = val_index - val_next # diff between two adjacent values
      diff = diff/yrdf$Adj.Close[index+1] * 100.0
    }
    diff.vec[index] <- diff # fill the preallocated slot instead of appending
  }
  return(diff.vec)
}
microbenchmark::microbenchmark(
forLoop = myForLoop()
, newLoop = myForLoopAlt()
, vector = 100 * c(diff(yrdf$Adj.Close),NA)/c(yrdf$Adj.Close[2:nrow(yrdf)], NA)
)
gives
Unit: microseconds
expr min lq mean median uq max neval
forLoop 304751.250 315433.802 354605.5850 325944.9075 368584.2065 528732.259 100
newLoop 168014.142 179579.984 186882.7679 181843.7465 188654.5325 318431.949 100
vector 169.569 208.193 331.2579 219.9125 233.3115 2956.646 100
This cuts close to half the time off the for loop approach, but it is still far slower than the vectorized solution.
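Alongside the timings, it is worth checking that the approaches return the same values. Reading the code as posted, the loop subtracts the next value from the current one, while diff() subtracts the current value from the next, so the two results should differ in sign. The sketch below reproduces both calculations on a hypothetical toy vector x (not yrdf) to show this:

```r
# Hypothetical toy data so the arithmetic is easy to follow
x <- c(2, 4, 8, 16)
n <- length(x)

# Loop arithmetic as in myForLoop: (current - next) / next * 100
loop_res <- numeric(n)
for (i in seq_len(n)) {
  loop_res[i] <- if (i == n) NA else (x[i] - x[i + 1]) / x[i + 1] * 100
}

# Vectorized arithmetic as in the answer: diff() is next - current
vec_res <- 100 * c(diff(x), NA) / c(x[-1], NA)

loop_res                       # -50 -50 -50  NA
vec_res                        #  50  50  50  NA
all.equal(loop_res, -vec_res)  # TRUE: same magnitudes, opposite sign
```

If the loop's sign convention is the one you want, negating the vectorized result (or dividing -diff(x) instead) should give matching output without changing the speed comparison.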
There is a 'diff' function to get the difference of adjacent elements in 'R'. Also, check the 'lead' and 'lag' functions in 'dplyr'. – akrun
Who told you this is wrong? Some operations will require a loop. –
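Sometimes 'for loops' are the preferred approach. See [this](http://stackoverflow.com/a/6466415/4408538) post for a detailed explanation of when to use a for loop. Check out these posts to better understand R's looping constructs: [post1](http://stackoverflow.com/a/2276001/4408538) and [post2](http://stackoverflow.com/q/28983292/4408538). –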