Print occurrences/positions of words

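I have tried a few different packages in order to build an R program that takes a text file as input and produces a list of the words in that file. Each word should have a vector containing all the positions at which that word occurs in the file. As an example, if the text file contains the string: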

"this is a nice text with nice characters"

the output should look like this:

$this 
[1] 1 

$is  
[1] 2 

$a   
[1] 3 

$nice  
[1] 4 7 

$text 
[1] 5 

$with 
[1] 6 

$characters 
[1] 8

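I came across a useful post, http://r.789695.n4.nabble.com/Memory-usage-in-R-grows-considerably-while-calculating-word-frequencies-td4644053.html, but it does not include the positions of each word. I also found a similar function called "str_locate", but I want to count positions in "words" rather than in "characters".

Any guidance on which package or technique to use would be greatly appreciated.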

Source

2013-04-22 ardarel

You can do this with base R (which, oddly enough, produces exactly the output you suggested):

# data
x <- "this is a nice text with nice characters"
# split on whitespace
words <- strsplit(x, split = ' ')[[1]]
# find positions of every word; simplify = FALSE always returns a list,
# even when no word repeats
sapply(unique(words), function(w) which(words == w), simplify = FALSE)

### result ### 
$this 
[1] 1 

$is 
[1] 2 

$a 
[1] 3 

$nice 
[1] 4 7 

$text 
[1] 5 

$with 
[1] 6 

$characters 
[1] 8
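
The question asks for a text file as input, while the snippet above starts from a string. As a rough sketch of how the same idea might be applied to a file: the helper name `word_positions` and the temporary file are assumptions made for illustration, and words are taken to be whatever is separated by whitespace (no punctuation or case handling).

```r
# Hypothetical helper (not part of the original answer):
# returns, for each word in a file, the positions at which it occurs.
word_positions <- function(path) {
  # Read all lines and join them into one string
  text <- paste(readLines(path, warn = FALSE), collapse = " ")
  # Split on runs of whitespace and drop any empty tokens
  words <- strsplit(text, "[[:space:]]+")[[1]]
  words <- words[nzchar(words)]
  # Group the running word index by word, in order of first appearance
  split(seq_along(words), factor(words, levels = unique(words)))
}

# Usage, with a temporary file standing in for the real input
tmp <- tempfile(fileext = ".txt")
writeLines(c("this is a nice text", "with nice characters"), tmp)
res <- word_positions(tmp)
res$nice  # 4 7
```

Using `split()` on `seq_along(words)` avoids rescanning the whole vector once per unique word, which may matter for larger files. The explicit factor levels keep the words in order of first appearance rather than sorted alphabetically.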

Source

2013-04-22 17:41:11 EDi

Thank you very much! It works. I need to check the "sapply" documentation and read up on it (I did not know about it) – ardarel 2013-04-22 17:57:50
