How can I measure the execution time of R code without actually running it?

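Can I use `microbenchmark` to estimate roughly how long my code will take to run in R? I am running some code and I can see that it takes many hours to execute. I do not want to run it all the way through. I would like to see an approximate execution time without actually running the code in R.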

Source

2016-06-29 Ronnie Day

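How would this work conceptually? If you do not want to execute your code, then what you are asking for amounts to solving the [halting problem](http://www.biomart.org/other/biomart_0.9_0_documentation.pdf), which is to say, it is impossible.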

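Even if you are willing to run your code "a little" and extrapolate from that, there are problems. For example, your algorithm might consist of several parts, where the first takes 2 hours and the second takes 5 seconds. So even if you execute just "a little", and even if the profiler knew "how far along" it was, it would infer that the second part of the code takes as long as the first, unless the profiler already knew that this is not the case (and how would it know? We are back to square one).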
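To make the point in the comment above concrete, here is a minimal sketch that times two stages of a pipeline separately on a small input. The stage functions `stage_quadratic` and `stage_linear` and the helper `time_stages` are hypothetical names invented for this illustration; they do not come from the question.

```r
# Hypothetical two-stage pipeline; both stages are made up for illustration.
stage_quadratic <- function(n) {
  # Grows a vector one element at a time, copying it on every iteration.
  x <- integer()
  for (i in seq_len(n)) x <- c(x, i)
  x
}
stage_linear <- function(x) sum(as.numeric(x))  # a single vectorized pass

# Time each stage separately and return the elapsed seconds per stage.
time_stages <- function(n) {
  t1 <- system.time(x <- stage_quadratic(n))[["elapsed"]]
  t2 <- system.time(total <- stage_linear(x))[["elapsed"]]
  c(stage_quadratic = t1, stage_linear = t2)
}

time_stages(5000)
```

Because the stages can differ by orders of magnitude, an estimate based only on the first stage says little about the second; each stage needs its own measurement.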

Try running your code on smaller problems to see how it scales:

> library(microbenchmark)
> fun0 = function(n) { x = integer(); for (i in seq_len(n)) x = c(x, i); x }
> p = microbenchmark(fun0(1000), fun0(2000), fun0(4000), fun0(8000), fun0(16000),
+     times=20)
> p
Unit: milliseconds
        expr        min         lq       mean     median         uq        max neval
  fun0(1000)   1.627601   1.697958   1.995438   1.723522   2.289424   2.935609    20
  fun0(2000)   5.691456   6.333478   6.745057   6.928060   7.056893   8.040366    20
  fun0(4000)  23.343611  24.487355  24.987870  24.854968  25.554553  26.088183    20
  fun0(8000)  92.517691  95.827525 104.900161  97.305930 112.924961 136.434998    20
 fun0(16000) 365.495320 369.697953 380.981034 374.456565 390.829214 411.203191    20

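Each doubling of the problem size makes execution roughly four times slower, which is approximately quadratic scaling; visualize it as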

library(ggplot2) 
ggplot(p, aes(x=expr, y=log(time))) + geom_point() + 
    geom_smooth(method="lm", aes(x=as.integer(expr)))

This is terrible news for large problems!
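One way to turn these measurements into a rough estimate for a larger problem is to fit a straight line on the log-log scale and extrapolate. The sketch below reuses the median timings reported for `fun0` above; the target size `target_n` is an arbitrary assumption, and the extrapolation only holds if the scaling behaviour stays the same at larger sizes (memory limits, for instance, can break it).

```r
# Problem sizes and median timings (milliseconds) copied from the fun0 table above.
n         <- c(1000, 2000, 4000, 8000, 16000)
median_ms <- c(1.723522, 6.928060, 24.854968, 97.305930, 374.456565)

# Fit log(time) = a + b * log(n); the slope b estimates the scaling exponent.
fit      <- lm(log(median_ms) ~ log(n))
exponent <- unname(coef(fit)[2])

# Extrapolate to a hypothetical target size (an assumption; choose your own).
target_n     <- 1e6
predicted_ms <- unname(exp(predict(fit, newdata = data.frame(n = target_n))))

exponent              # close to 2, i.e. roughly quadratic
predicted_ms / 60000  # predicted run time in minutes
```

The same approach applies to your own code: time it at a handful of small sizes, check that the log-log relationship looks linear, and only then extrapolate.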

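Investigate alternative implementations that return the same answer but scale better, both as the problem grows and at a given problem size. First make sure your algorithm / implementation gets the same answer: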

> ## linear, ok 
> fun1 = function(n) { x = integer(n); for (i in seq_len(n)) x[[i]] = i; x } 
> identical(fun0(100), fun1(100)) 
[1] TRUE
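A single spot check at `n = 100` can miss edge cases, so it may be worth comparing the implementations over several sizes. The following is a minimal sketch; the particular list of sizes is an assumption chosen for illustration, and `fun0` and `fun1` are repeated from above so the block runs on its own.

```r
fun0 <- function(n) { x <- integer(); for (i in seq_len(n)) x <- c(x, i); x }
fun1 <- function(n) { x <- integer(n); for (i in seq_len(n)) x[[i]] <- i; x }

# Sizes to check; 0 and 1 are edge cases that a single spot check can miss.
sizes <- c(0, 1, 2, 10, 100, 1000)

# identical() is strict: it also compares type and attributes, not just values.
same_answer <- vapply(sizes, function(n) identical(fun0(n), fun1(n)), logical(1))
stopifnot(all(same_answer))
```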

Then see how the new algorithm scales with problem size:

> microbenchmark(fun1(100), fun1(1000), fun1(10000)) 
Unit: microseconds 
     expr  min  lq  mean median   uq  max neval 
    fun1(100) 86.260 97.558 121.5591 102.6715 107.6995 1058.321 100 
    fun1(1000) 845.160 902.221 932.7760 922.8610 945.6305 1915.264 100 
fun1(10000) 8776.673 9100.087 9699.7925 9385.8560 10310.6240 13423.718 100

Explore more algorithms, especially those that replace iteration with vectorization:

> ## linear, faster -- *nano*seconds 
> fun2 = seq_len 
> identical(fun1(100), fun2(100)) 
[1] TRUE 
> microbenchmark(fun2(100), fun2(1000), fun2(10000)) 
Unit: nanoseconds 
     expr min  lq  mean median uq max neval 
    fun2(100) 417 505.0 587.53 553 618 2247 100 
    fun2(1000) 2126 2228.5 2774.90 2894 2986 5511 100 
fun2(10000) 19426 19741.0 25390.93 27177 28209 43418 100
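The same idea carries over to loops that do a little more work per element. As a sketch, the hypothetical functions `squares_loop` and `squares_vec` below (not part of the original answer) compute the same result with a preallocated loop and with a single vectorized expression, and are timed with base R's `system.time` rather than `microbenchmark`.

```r
# Loop version: preallocate, then fill element by element.
squares_loop <- function(n) {
  x <- numeric(n)
  for (i in seq_len(n)) x[[i]] <- i^2
  x
}

# Vectorized version: one expression operating on the whole vector.
squares_vec <- function(n) seq_len(n)^2

# Confirm both versions give the same answer before comparing speed.
stopifnot(identical(squares_loop(1000), squares_vec(1000)))

# Rough timing; the call is repeated so the elapsed time is measurable.
system.time(for (k in 1:200) squares_loop(10000))[["elapsed"]]
system.time(for (k in 1:200) squares_vec(10000))[["elapsed"]]
```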

Compare the algorithms at specific problem sizes:

> n = 1000; microbenchmark(fun0(n), fun1(n), fun2(n), times=10) 
Unit: microseconds 
    expr  min  lq  mean median  uq  max neval 
fun0(n) 1625.797 1637.949 2018.6295 1657.1195 2800.272 2857.400 10 
fun1(n) 819.448 843.988 874.9445 853.9290 910.871 1006.582 10 
fun2(n) 2.158 2.386 2.5990 2.6565 2.716 3.055 10 
> n = 10000; microbenchmark(fun0(n), fun1(n), fun2(n), times=10) 
Unit: microseconds 
    expr        min         lq        mean      median         uq        max neval
 fun0(n) 157010.750 157276.699 169905.4745 159944.5715 192185.973 197389.965    10
 fun1(n)   8613.977   8630.599   9212.2207   9165.9300   9394.605  10299.821    10
 fun2(n)     19.296     19.384     20.7852     20.8595     21.868     22.435    10

This shows that a sensible implementation becomes increasingly important as the problem size grows.

Source

2016-06-29 12:37:17
