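R takes forever to calculate a simple procedure

allWords is a vector of 1.3 million words, with some repetitions. What I want to do is create two vectors:

A with the words

B with the number of occurrences of each word

That way I can later join them in a matrix and associate them, like: "mom", 3; "pencil", 14; etc.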

for(word in allWords){ 

    #get a vector with indexes for all repetitions of a word 
    temp <- which(allWords==word) 
    #Make "allWords" smaller - remove duplicates 
    allWords= allWords[-which(allWords==word)] 
    #Calculate occurrence 
    occ<-length(temp) 
    #store 
    A = c(A,word) 
    B = c(B,occ) 
}

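This for loop takes forever, and I really don't know why or what I'm doing wrong. Reading the 1.3 million words from the file takes 5 seconds at most, but with these basic operations the algorithm never terminates.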

Source

2013-09-28 user2827159

You are in Circle 2 of The R Inferno (http://www.burns-stat.com/pages/Tutor/R_inferno.pdf). – GSee

Someone should give this a better title... maybe "Shortening and growing objects in a loop". – Frank
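To illustrate the comments above: the loop in the question grows A and B with c() and rebuilds allWords on every pass, and each pass also scans the whole vector with `allWords==word`. In addition, `for` evaluates its sequence once up front, so shrinking allWords inside the loop does not reduce the number of iterations. The following is a minimal sketch of a vectorized alternative, assuming the goal is one entry per distinct word in order of first appearance. The sample vector is made up for illustration.

```r
# Hypothetical sample data standing in for the 1.3 million word vector
allWords <- c("mom", "pencil", "mom", "dog", "pencil", "mom")

# Distinct words, in order of first appearance
A <- unique(allWords)

# match() maps every word to its index in A; tabulate() counts each index
B <- tabulate(match(allWords, A), nbins = length(A))

A  # "mom" "pencil" "dog"
B  # 3 2 1
```

Nothing is grown or shrunk element by element here, so each step is a single pass over the input.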

Given the size of your vector, I think data.table can be a good friend in this case.

> library(data.table) 
> x <- c("dog", "cat", "dog") # Ferdinand.kraft's example vector 
> dtx <- data.table(x)   # converting `x` vector into a data.table object 
> dtx[, .N, by="x"]   # Computing the freq for each word 
    x N 
1: dog 2 
2: cat 1

Source

2013-09-28 21:46:45

The OP also wants the positions, so... `dtx[, list(.N, occ = list(.I)), by = "x"]`, I think. – Frank

This is indeed the fastest. Check this: http://stackoverflow.com/questions/17223308/fastest-way-to-count-occurrences-of-each-unique-element –
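For readers who want the positions mentioned in the comment above without data.table, here is a rough base R sketch of the same idea. It assumes "positions" means the indices at which each word occurs. The names `positions` and `counts` are illustrative, and `lengths()` requires R 3.2.0 or later.

```r
x <- c("dog", "cat", "dog")  # same example vector as in the answer

# split() groups the indices 1..length(x) by the word found at each index
positions <- split(seq_along(x), x)

# lengths() gives the number of occurrences per word
counts <- lengths(positions)

positions$dog    # 1 3
counts[["dog"]]  # 2
```

Note that split() orders the groups alphabetically by word, not by first appearance.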

Use table():

> table(c("dog", "cat", "dog")) 

cat dog 
  1   2 

The two vectors can then be taken from the columns of the corresponding data frame:

A <- as.data.frame(table(c("dog", "cat", "dog")))[,1] 
B <- as.data.frame(table(c("dog", "cat", "dog")))[,2]

Result:

> A 
[1] cat dog 
Levels: cat dog 
> B 
[1] 1 2
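As the output above shows, going through as.data.frame() makes A a factor, and the answer's snippet calls table() twice. A possible variation, sketched under the assumption that plain character and integer vectors are wanted, computes the table once and reads the words from its names. The variable names `words` and `tab` are illustrative.

```r
words <- c("dog", "cat", "dog")

tab <- table(words)   # count every distinct word once
A <- names(tab)       # character vector of distinct words, sorted
B <- as.vector(tab)   # integer counts, in the same order as A

A  # "cat" "dog"
B  # 1 2
```

The pair can still be combined afterwards, for example with `data.frame(word = A, count = B)`.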

Source

2013-09-28 21:29:40

You can use a list to make something like hash "key: value" pairs.

data = c("joe", "tom", "sue", "joe", "jen") 

aList = list() 

for(i in data){ 
    if (length(aList[[i]]) == 0){ 
     aList[[i]] = 1 
    } else { 
     aList[[i]] = aList[[i]] + 1 
    } 
}

Result

$joe 
[1] 2 

$tom 
[1] 1 

$sue 
[1] 1 

$jen 
[1] 1
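A related sketch, offered as an assumption rather than as part of the answer: lookup by name in a list scans the names, whereas an environment created with `hash = TRUE` uses a hash table, which is closer to a real "key: value" store. The names `counts`, `keys`, and `values` are illustrative.

```r
data <- c("joe", "tom", "sue", "joe", "jen")

counts <- new.env(hash = TRUE)   # environment used as a hash map

for (name in data) {
  current <- counts[[name]]      # NULL if the key has not been seen yet
  counts[[name]] <- if (is.null(current)) 1L else current + 1L
}

keys <- ls(counts)                            # sorted key names
values <- unlist(mget(keys, envir = counts))  # named integer vector

values[["joe"]]  # 2
```

This is still an explicit R loop over every element, so for a vector of 1.3 million words a vectorized function such as table() is likely to be considerably faster.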

Source

2013-09-28 22:11:33 AGS

The problem is the loop. It is doomed to take forever. :-) –

@Ferdinand.kraft, it seems you are right. I didn't realize R would struggle with medium-sized arrays. Upvoted. – AGS
