Find all words that start with a given letter

I'm super rusty in both R and regular expressions. I tried reading R's help file on regular expressions, but it didn't help at all!

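I have a data frame with 3 columns:

vocabulary, a list of the 500 most common words found in a corpus,
count, the number of times each word occurs, and
probability, the count divided by the total of all word counts.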

The list is ranked from most to least common, so it is not in alphabetical order.

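I need to pull out the entire row for every word that starts with the same letter. (I don't need to loop through the whole alphabet; I only need the results for one letter.)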

I'm not just asking about the regular expression, but also how to write it in R so that I get the result as a new data frame.

2013-02-04 punstress

You can use grep:

df <- data.frame(words=c("apple","orange","coconut","apricot"),var=1:4) 
df[grep("^a", df$words),]

Which gives:

words var 
1 apple 1 
4 apricot 4

2013-02-04 11:10:57 juba

Easily done! (Once I typed it correctly.) Thanks! – punstress
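Building on the grep approach above, here is a minimal sketch of how the same filter might be applied to a data frame shaped like the one in the question. The data frame `freq` and the helper `words_starting_with` are made up for illustration and are not part of the original answer. It uses base R's `startsWith()` instead of a regular expression and lowercases both sides, on the assumption that a case-insensitive match is wanted.

```r
# Hypothetical data frame with the three columns described in the question
freq <- data.frame(
  vocabulary = c("the", "apple", "Table", "orange", "tree"),
  count = c(50, 20, 15, 10, 5),
  stringsAsFactors = FALSE
)
freq$probability <- freq$count / sum(freq$count)

# Hypothetical helper: keep the rows whose word starts with `letter`,
# ignoring case, and return them as a new data frame
words_starting_with <- function(df, letter, column = "vocabulary") {
  words <- tolower(as.character(df[[column]]))
  keep <- startsWith(words, tolower(letter))
  df[keep, , drop = FALSE]
}

t_words <- words_starting_with(freq, "t")
t_words
```

The result keeps the original row names, so the rows can still be traced back to their rank in the full list; `rownames(t_words) <- NULL` would renumber them if that is not wanted.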

Maybe this is useful for you.

# Creating some data
set.seed(001)
count <- sample(1:100, 6, TRUE)
DF <- data.frame(vocabulary=c('action', 'can', 'book', 'candy', 'any', 'bar'),
                 count=count,
                 probability=count/sum(count))

# Splitting by the first letter (a, b, c)
Split <- lapply(1:3, function(i, DF){
    DF[grep(paste0('^', letters[i]), DF$vocabulary),]
}, DF=DF)

Split
[[1]]
  vocabulary count probability
1     action    27  0.08307692
5        any    21  0.06461538

[[2]]
  vocabulary count probability
3       book    58   0.1784615
6        bar    90   0.2769231

[[3]]
  vocabulary count probability
2        can    38   0.1169231
4      candy    91   0.2800000

As you can see, the result is a list. To cover every letter of the alphabet, you may want to change 1:3 to 1:26 in the lapply call.
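As a rough sketch of that 1:26 variant: iterating over `letters` directly also makes it easy to name each list element by its letter and to drop the letters that match nothing. The counts below are typed in from the printed output above, so the example does not depend on the random number generator; the name `by_letter` is only illustrative.

```r
# Same words as above, with the counts copied from the printed output
DF <- data.frame(
  vocabulary = c("action", "can", "book", "candy", "any", "bar"),
  count = c(27, 38, 58, 91, 21, 90),
  stringsAsFactors = FALSE
)
DF$probability <- DF$count / sum(DF$count)

# One data frame per letter of the alphabet, named by that letter
by_letter <- lapply(letters, function(l) {
  DF[grep(paste0("^", l), DF$vocabulary), ]
})
names(by_letter) <- letters

# Drop the letters for which no word was found
by_letter <- Filter(function(x) nrow(x) > 0, by_letter)

names(by_letter)
by_letter$c
```

With names attached, a single letter's rows can be pulled out as `by_letter$c` rather than by position.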

Note that the result is unordered, but it can easily be sorted using the orderBy function from the doBy package:

library(doBy)
lapply(Split, function(x) orderBy(~vocabulary, data=x))
[[1]]
  vocabulary count probability
1     action    27  0.08307692
5        any    21  0.06461538

[[2]]
  vocabulary count probability
6        bar    90   0.2769231
3       book    58   0.1784615

[[3]]
  vocabulary count probability
2        can    38   0.1169231
4      candy    91   0.2800000

2013-02-04 11:13:51

Why not use 'split(DF, substr(DF$vocabulary, 1, 1))' to split? – thelatemail

@thelatemail Awesome! –
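For reference, a short sketch of what the split suggestion in the comment above looks like when run. The counts are typed in from the output printed in the answer, and the sorting step uses base R's order() as a stand-in for doBy's orderBy; the names `by_first` and `by_first_sorted` are only illustrative.

```r
# Same words as in the answer, with the counts copied from its printed output
DF <- data.frame(
  vocabulary = c("action", "can", "book", "candy", "any", "bar"),
  count = c(27, 38, 58, 91, 21, 90),
  stringsAsFactors = FALSE
)
DF$probability <- DF$count / sum(DF$count)

# split() groups the rows by the first character of each word and
# names each list element after that character
by_first <- split(DF, substr(DF$vocabulary, 1, 1))

# Sort each group alphabetically using base R only
by_first_sorted <- lapply(by_first, function(x) x[order(x$vocabulary), ])

names(by_first_sorted)
by_first_sorted$b
```

Only letters that actually occur get an element, so there is no need to loop over the alphabet or to filter out empty groups afterwards.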
