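Optimizing the performance of a for loop in R

I have a character vector and want to create a matrix containing the distance (computed with the stringdist package) between each pair of vector values. Currently, I have an implementation with nested for loops: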

library(stringdist) 

strings <- c("Hello", "Helo", "Hole", "Apple", "Ape", "New", "Old", "System", "Systemic") 
m <- matrix(nrow = length(strings), ncol = length(strings)) 
colnames(m) <- strings 
rownames(m) <- strings 

for (i in 1:nrow(m)) {
    for (j in 1:ncol(m)) {
        m[i, j] <- stringdist::stringdist(tolower(rownames(m)[i]), tolower(colnames(m)[j]), method = "lv")
    }
}

This results in the following matrix:

> m
         Hello Helo Hole Apple Ape New Old System Systemic
Hello        0    1    3     4   5   4   4      6        7
Helo         1    0    2     4   4   3   3      6        7
Hole         3    2    0     3   3   4   2      5        7
Apple        4    4    3     0   2   5   4      5        7
Ape          5    4    3     2   0   3   3      5        7
New          4    3    4     5   3   0   3      5        7
Old          4    3    2     4   3   3   0      6        8
System       6    6    5     5   5   5   6      0        2
Systemic     7    7    7     7   7   7   8      2        0

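However, if I have, for example, a vector of length 1000 with many non-unique values, this matrix is quite large (say, 800 rows by 800 columns) and the loops are very slow. I would like to optimize the performance, e.g. by using the apply functions, but I don't know how to translate the code above into apply syntax. Can anyone help?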

2014-09-03 Daniel

`apply` also loops, and it is not necessarily faster than a for loop. See http://stackoverflow.com/questions/2275896/is-rs-apply-family-more-than-syntactic-sugar – 2014-09-03 12:04:08

Code optimization questions should be asked on CodeReview rather than StackOverflow: http://codereview.stackexchange.com/ – 2016-06-26 16:08:41

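Thanks to the hint from @hrbrmstr, I found out that the stringdist package itself provides a function called stringdistmatrix, which does exactly what I asked for (see here).

The function call is simple: stringdistmatrix(strings, strings)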
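
A minimal sketch of that call, assuming a reasonably recent stringdist version. Note that stringdistmatrix() defaults to method "osa", so method = "lv" and tolower() are passed explicitly here to match the loop in the question. Computing on the unique() values first is an optional extra aimed at the "many non-unique values" case and is not part of the original answer; the names `u`, `d`, `idx` and `full` are illustrative.

```r
library(stringdist)

# Example input with some duplicates added for illustration
strings <- c("Hello", "Helo", "Hole", "Apple", "Ape", "New",
             "Old", "System", "Systemic", "hello", "Old")

# Compute distances only once per distinct (lower-cased) value
u <- unique(tolower(strings))
d <- stringdistmatrix(u, u, method = "lv")
dimnames(d) <- list(u, u)

# Optional: expand back to one row/column per original element
idx <- match(tolower(strings), u)
full <- d[idx, idx, drop = FALSE]
dimnames(full) <- list(strings, strings)
```

With 1000 input strings of which 800 are distinct, the expensive step then works on an 800 x 800 problem and the expansion is plain matrix indexing.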

2014-09-03 12:22:36 Daniel

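When you are using nested loops, it is always worth checking whether outer() would work for you. outer() is the vectorized solution for nested loops; it applies a vectorized function to every possible combination of the elements in its first two arguments. As stringdist() works on vectors, you can simply do: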

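library(stringdist)
strings <- c("Hello", "Helo", "Hole", "Apple", "Ape", "New",
             "Old", "System", "Systemic")

# method = "lv" matches the loop in the question (stringdist() defaults to "osa")
outer(strings, strings,
      function(i, j) {
          stringdist(tolower(i), tolower(j), method = "lv")
      })

This gives you the desired result.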
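
To illustrate what "vectorized" means here: outer() does not call the function once per pair. It expands both inputs to all combinations and calls the function a single time with two long vectors, which is why the function passed in must itself be vectorized. The sketch below records the argument length to show this; `calls`, `lv_dist` and `m_outer` are illustrative names, and method = "lv" is set to match the question's loop.

```r
library(stringdist)

strings <- c("Hello", "Helo", "Hole", "Apple", "Ape", "New",
             "Old", "System", "Systemic")

calls <- integer(0)  # length of the first argument on each call

lv_dist <- function(a, b) {
  calls <<- c(calls, length(a))  # record how many pairs this call handles
  stringdist(tolower(a), tolower(b), method = "lv")
}

m_outer <- outer(strings, strings, lv_dist)
dimnames(m_outer) <- list(strings, strings)

length(calls)  # number of times lv_dist was invoked
calls          # number of string pairs handled per invocation
```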

2014-09-03 12:00:10

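I didn't know the `outer` function before, but this does the trick as well! – Daniel 2014-09-03 12:10:20

Here's an easy start: the matrix is symmetric, so there is no need to compute the entries below the diagonal, since m[j, i] = m[i, j]. The diagonal elements are obviously all zero, so there is no need to compute those either.

Like this: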

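n <- nrow(m)
for (i in seq_len(n)) {
    m[i, i] <- 0                      # a string is at distance 0 from itself
    for (j in seq_len(n - i) + i) {   # columns right of the diagonal; empty when i == n
        m[i, j] <- stringdist::stringdist(tolower(rownames(m)[i]), tolower(colnames(m)[j]), method = "lv")
        m[j, i] <- m[i, j]            # mirror into the lower triangle
    }
}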
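
The same symmetry idea can be combined with the fact that stringdist() is vectorized: compute all above-diagonal pairs in one call and mirror them, instead of calling stringdist() once per pair. This is a sketch under that assumption and not part of the original answer; `s`, `idx` and `d` are illustrative names.

```r
library(stringdist)

strings <- c("Hello", "Helo", "Hole", "Apple", "Ape", "New",
             "Old", "System", "Systemic")
s <- tolower(strings)
n <- length(s)

# Start from all zeros, so the diagonal is already correct
m <- matrix(0, nrow = n, ncol = n, dimnames = list(strings, strings))

# Row/column indices of every entry above the diagonal
idx <- which(upper.tri(m), arr.ind = TRUE)

# One vectorized call for all n * (n - 1) / 2 pairs
d <- stringdist(s[idx[, "row"]], s[idx[, "col"]], method = "lv")

m[idx] <- d                      # fill the upper triangle
m[idx[, c("col", "row")]] <- d   # mirror into the lower triangle
```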

2014-09-03 12:01:50 duffymo

Bioconductor has a stringDist function that can do this for you:

# biocLite() has been replaced by BiocManager in current Bioconductor releases
install.packages("BiocManager")
BiocManager::install("Biostrings")

library(Biostrings) 

stringDist(c("Hello", "Helo", "Hole", "Apple", "Ape", "New", "Old", "System", "Systemic"), upper=TRUE) 

##   1 2 3 4 5 6 7 8 9
## 1   1 3 4 5 4 4 6 7
## 2 1   2 4 4 3 3 6 7
## 3 3 2   3 3 4 3 5 7
## 4 4 4 3   2 5 4 5 7
## 5 5 4 3 2   3 3 5 7
## 6 4 3 4 5 3   3 5 7
## 7 4 3 3 4 3 3   6 8
## 8 6 6 5 5 5 5 6   2
## 9 7 7 7 7 7 7 8 2

2014-09-03 12:02:30 hrbrmstr

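Thanks a lot! Shame on me: the `stringdist` package also has such a function, `stringdistmatrix`. – Daniel 2014-09-03 12:09:24

You can/should post that as an answer, then un-accept mine and accept that one (seriously!). I've had "bioconductor" on the brain lately (I'm building something similar for infosec), and it is overkill as an answer to this. – hrbrmstr 2014-09-03 12:15:22

OK, done, but I can only accept my own answer in two days. – Daniel 2014-09-03 12:23:10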
