R freezes, and so does my computer

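I don't know whether my question is easy to answer, but let me ask anyway. I use R for corpus linguistics, and I want to match regular expressions using "exact.matches" (see St. Th. Gries). The problem is that when I run the script, R freezes for a long time and my computer freezes with it, so I have to restart everything with the computer's power button.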

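What I am trying to analyze is a collection of 100 texts (in txt format). The whole collection is 17,254,537 tokens, but I have been trying to run the code on only 20 files. Same problem: everything freezes. The code is as follows: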

rm(list=ls(all=T)) 

setwd("C:/Users/Christophe/Documents/Doctorat_ULg/Corpora/Dutch/Gutenberg_corpus_NL") 
source("C:/_qclwr/_scripts/_scripts_code-exerciseboxes_chapters_3-5/exact_matches_new.R") 

corpus.files.1<-choose.files() # to load the first 58 text files 
corpus.files.2<-choose.files() # to load the 42 other files 
whole.corpus.file<-c(corpus.files.1, corpus.files.2) # to concatenate everything into one vector 
all.matches.verbs<-vector()  

for(i in whole.corpus.file) { 
    current.corpus.file<-scan(i, what="char", sep="\n", quiet=T) 
    current.matches.verbs<-exact.matches("aan<prep>", current.corpus.file, case.sens=F, pcre=T) 
    if(length(current.matches.verbs)==0) { next } 
    all.matches.verbs<-append(all.matches.verbs, current.matches.verbs) 
}

Is there a simple way to solve this? It looks like a memory problem. Here is some output, in case it helps:

> memory.size() 
[1] 35.02 
> memory.limit() 
[1] 3976 
> gc() 
          used (Mb) gc trigger (Mb) max used (Mb) 
Ncells  558406 29.9     818163 43.7   741108 39.6 
Vcells 1039743  8.0    1757946 13.5  1300290 10.0

Thank you in advance for your help.

Best,

CBechet


2015-10-02 CBechet

Classic mistake: you are growing an object in a loop. Read the second circle of The R Inferno (http://www.burns-stat.com/pages/Tutor/R_inferno.pdf). – Roland

Predefine the size of the object before entering the loop. –
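A minimal sketch of what the two comments above suggest, for illustration only. Since exact.matches comes from an external script, the helper find.matches below is a hypothetical base-R stand-in (it returns only the matched strings, not whatever extra output exact.matches produces), and toy.corpus is made-up data. The point is the pattern: allocate one slot per text before the loop, fill the slots, and flatten once at the end instead of calling append() on every iteration.

```r
# Hypothetical stand-in for exact.matches(): returns every match of `pattern`
# found in the character vector `lines`, using base R only.
find.matches <- function(pattern, lines) {
    hits <- regmatches(lines, gregexpr(pattern, lines, ignore.case = TRUE, perl = TRUE))
    as.character(unlist(hits))  # as.character() turns a NULL result into character(0)
}

# Toy "corpus" (assumption): three small in-memory texts instead of files on disk.
toy.corpus <- list(
    c("hij komt aan<prep> de deur", "niets hier"),
    c("geen treffer"),
    c("Aan<prep> tafel", "aan<prep> zee en aan<prep> land")
)

# Preallocate one slot per text, fill the slots in the loop, flatten once afterwards.
results <- vector("list", length(toy.corpus))
for (i in seq_along(toy.corpus)) {
    results[[i]] <- find.matches("aan<prep>", toy.corpus[[i]])
}
all.matches <- unlist(results)
length(all.matches)
```

Whether this alone is enough to stop the freezing is not certain: it removes the repeated copying caused by append(), but it does not reduce the memory needed to read or search a single large file.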

If I cheat and try to use an external hard drive (1 TB), do you think it could work, even though it doesn't solve the problem of the growing object? – CBechet

Here is an alternative to the for loop:

rm(list=ls(all=T)) 

setwd("C:/Users/Christophe/Documents/Doctorat_ULg/Corpora/Dutch/Gutenberg_corpus_NL") 
source("C:/_qclwr/_scripts/_scripts_code-exerciseboxes_chapters_3-5/exact_matches_new.R") 

corpus.files.1<-choose.files() # loads the first set of corpus files 
corpus.files.2<-choose.files() # loads the second set of corpus files 
whole.corpus.file<-c(corpus.files.1, corpus.files.2) # concatenate all the corpus files into one vector 

whole.text <-unlist(lapply(whole.corpus.file, function(x) scan(x, what="char", sep="\n", quiet=T))) # reads the content of the files in the vector

And the data is still too large (even though I am not using a for loop):

Error: cannot allocate vector of size 4.3 Mb 
In addition: Warning messages: 
1: In substr(lines, if (characters.around != 0) starts - characters.around else 1, : 
    Reached total allocation of 3976Mb: see help(memory.size) 
2: In substr(lines, if (characters.around != 0) starts - characters.around else 1, : 
    Reached total allocation of 3976Mb: see help(memory.size) 
3: In substr(lines, if (characters.around != 0) starts - characters.around else 1, : 
    Reached total allocation of 3976Mb: see help(memory.size) 
4: In substr(lines, if (characters.around != 0) starts - characters.around else 1, : 
    Reached total allocation of 3976Mb: see help(memory.size)
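The warnings point at a substr() call working on a very large vector of lines, so one thing that might be worth trying is to never hold the whole corpus (or even a whole file) in memory at once. The sketch below is only an illustration under several assumptions: matches.in.file is a hypothetical helper, it uses base-R regmatches()/gregexpr() in place of exact.matches (so it returns only the matched strings), the chunk size is arbitrary, and the two temporary files are made-up data.

```r
# Hypothetical helper: collect regex matches from one file, reading at most
# `chunk.size` lines at a time so the full text is never held in memory.
matches.in.file <- function(path, pattern, chunk.size = 10000L) {
    con <- file(path, open = "r")
    on.exit(close(con))
    found <- list()  # one element per chunk; holds matches only, not the text
    repeat {
        lines <- readLines(con, n = chunk.size, warn = FALSE)
        if (length(lines) == 0L) break  # end of file reached
        hits <- regmatches(lines, gregexpr(pattern, lines, ignore.case = TRUE, perl = TRUE))
        found[[length(found) + 1L]] <- as.character(unlist(hits))
    }
    as.character(unlist(found))
}

# Made-up demo files (assumption) standing in for the corpus files.
demo.files <- c(tempfile(fileext = ".txt"), tempfile(fileext = ".txt"))
writeLines(c("hij komt aan<prep> de deur", "niets hier", "Aan<prep> tafel"), demo.files[1])
writeLines(c("geen treffer", "aan<prep> zee en aan<prep> land"), demo.files[2])

# One call per file; only the matches are kept between files.
per.file <- lapply(demo.files, matches.in.file, pattern = "aan<prep>", chunk.size = 2L)
all.matches <- unlist(per.file)
length(all.matches)

unlink(demo.files)
```

With the real corpus, demo.files would be replaced by whole.corpus.file. Matching chunk by chunk assumes that no match spans a line break, which also holds for the original code because scan() splits the text on "\n".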


2015-10-02 20:52:52 CBechet
