2014-03-28 40 views
1

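Finding consecutive vowels using regular expressions

I have the following code for finding consecutive vowels, but it does not give me the correct result. Is my code wrong?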

sapply(v, function(x){ gsub(".*[0-9]\\s", "", grep("[aeiou]{2}", x, value = TRUE, invert = FALSE)) }) 

where v is:

c("Joe 4311 rsfuvgcozbxwlnnfevze", "Clayton 2414 qsncnpvdfpjmvmvbdvce", 
"Addison 25 melmasilbgrurqbezgyu", "Donovan 2013 gozagvswtitjjinrzgup", 
"Sage 540 aamyvegiadwjwpvwtjko", "Zavier 133 cyomwtxftslukvmvpmcl", 
"Maria 1241 ngqjynxnpblcztnlkack", "Mercedes 2400 xcwbxxljspneilwejutw", 
"Micheal 4400 oovhyodyubhqwzdcwybf", "Brylee 2532 sarbmelbeycrnhytbout", 
"Giancarlo 3351 xmocyljxquklbchgmdcj", "Elin 5513 nbjovdtmijpfluzixebu", 
"Ray 2553 snrqrzshlzmmhumzlecl", "Jade 4030 rhibewstyrwdervgqnru", 
"Amelia 5205 lcnvnjhamhzavdfosmae", "Karissa 2030 vhvzyfckgogduqqayzku", 
"Conor 325 sbgfntjejbtwsvidvtnu", "Tripp 454 xmvuhycjnvqgnmorfdrl", 
"River 5120 zcxavkwzhwbvdqadajgh", "Tianna 251 mwoqwzyfddhuunmtiioh", 
"Conner 3543 ngyuzdbeyizfarxuxntz", "Mackenzie 3113 yvycqaquwtfjjtqsdduh", 
"Melody 4422 buagtfiaipniavdnsxhv", "Dallas 5343 blyjvtlpvpqondrdhluu") 

Each element of v has the form "NAME SCORE WORD", and we want to find how many elements have two consecutive vowels in WORD.

Answers

2

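Here is a way to do it in one pass. We can use this regular expression to skip everything before WORD and look for consecutive vowels only in the last part.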

> (zz <- do.call(rbind, lapply(v, function(x){ 
     grep("^.*[0-9]\\s.*[aeiou]{2}", x, value = TRUE) 
     }))) 
    [,1]         
[1,] "Sage 540 aamyvegiadwjwpvwtjko"  
[2,] "Mercedes 2400 xcwbxxljspneilwejutw" 
[3,] "Micheal 4400 oovhyodyubhqwzdcwybf" 
[4,] "Brylee 2532 sarbmelbeycrnhytbout" 
[5,] "Amelia 5205 lcnvnjhamhzavdfosmae" 
[6,] "Tianna 251 mwoqwzyfddhuunmtiioh" 
[7,] "Melody 4422 buagtfiaipniavdnsxhv" 
[8,] "Dallas 5343 blyjvtlpvpqondrdhluu" 
> length(zz) 
[1] 8 
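Because `grep` and `grepl` are vectorised over their input, the same pattern can also be applied to the whole vector at once and the matches counted with `sum`. The following is only a minimal sketch: the four-element `v` is a subset of the question's data chosen for illustration, and the names `pattern` and `hits` are not from the original answer.

```r
# Illustrative subset of the question's vector (assumption: four rows are enough to show the idea)
v <- c("Joe 4311 rsfuvgcozbxwlnnfevze",
       "Sage 540 aamyvegiadwjwpvwtjko",
       "Mercedes 2400 xcwbxxljspneilwejutw",
       "Ray 2553 snrqrzshlzmmhumzlecl")

# Skip NAME and SCORE (everything up to a digit followed by whitespace),
# then require two consecutive vowels somewhere in WORD
pattern <- "^.*[0-9]\\s.*[aeiou]{2}"

hits <- grepl(pattern, v)  # one logical value per element of v
v[hits]                    # the matching elements
sum(hits)                  # how many elements matched
```

Here "Joe" is not counted even though the name contains "oe", because the vowel pair must come after the score.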
+0

Comparing your solution with mine, can you tell me what is wrong with mine?

+1

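I think that with both calls inside sapply, grep is applied to the whole string, so in this case it also catches consecutive vowels in NAME. Separating the steps fixes that, because the first two parts are stripped off first. That seems safer.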

+0

Yes, as an example, it caught Joe's name. Thanks for the explanation.
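The difference discussed in these comments can be seen by testing the whole string and the stripped WORD side by side. This is a small sketch on a three-element subset of the question's data; the names `whole` and `word_only` are made up for the illustration.

```r
# Illustrative subset: "Joe" has a vowel pair in NAME only
v <- c("Joe 4311 rsfuvgcozbxwlnnfevze",
       "Sage 540 aamyvegiadwjwpvwtjko",
       "Clayton 2414 qsncnpvdfpjmvmvbdvce")

# Order used in the question: the vowel test sees NAME, SCORE and WORD
whole <- grepl("[aeiou]{2}", v)

# Strip everything up to the score first, then test only WORD
word_only <- grepl("[aeiou]{2}", sub("^.*[0-9]\\s", "", v))

data.frame(v, whole, word_only)
```

The row for "Joe" is TRUE in `whole` but FALSE in `word_only`, which is the false positive mentioned above.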

4

If you strsplit the text first, the grep becomes much easier.

v[grep("[aeiou]{2}",sapply(strsplit(v," "),"[",3))] 

#[1] "Sage 540 aamyvegiadwjwpvwtjko"  
#[2] "Mercedes 2400 xcwbxxljspneilwejutw" 
#[3] "Micheal 4400 oovhyodyubhqwzdcwybf" 
#[4] "Brylee 2532 sarbmelbeycrnhytbout" 
#[5] "Amelia 5205 lcnvnjhamhzavdfosmae" 
#[6] "Tianna 251 mwoqwzyfddhuunmtiioh" 
#[7] "Melody 4422 buagtfiaipniavdnsxhv" 
#[8] "Dallas 5343 blyjvtlpvpqondrdhluu" 
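To get the count the question asks for rather than the matching strings, the indices returned by `grep` can be counted. The sketch below is a variation on the line above, assuming single spaces between the three parts; it uses `vapply` instead of `sapply` so that a non-character result would raise an error, and the names `words` and `idx` are illustrative.

```r
# Illustrative subset of the question's vector
v <- c("Joe 4311 rsfuvgcozbxwlnnfevze",
       "Sage 540 aamyvegiadwjwpvwtjko",
       "Mercedes 2400 xcwbxxljspneilwejutw",
       "Ray 2553 snrqrzshlzmmhumzlecl")

# Split on single spaces and keep the third piece (WORD) of each element
words <- vapply(strsplit(v, " ", fixed = TRUE), `[`, character(1), 3)

idx <- grep("[aeiou]{2}", words)  # positions in v whose WORD has a vowel pair
length(idx)                       # number of matching elements
```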
+1

Very nice. It seems I learn something from you every night.

0

I think your life would be easier if you made your three variables (name, score, word) explicit:

library(stringr) 
df <- as.data.frame(str_split_fixed(v, " ", 3)) 
names(df) <- c("name", "score", "word") 

Then extracting the matches is a simple subset:

subset(df, str_detect(word, "[aeiou]{2}")) 

##  name score     word 
## 5  Sage 540 aamyvegiadwjwpvwtjko 
## 8 Mercedes 2400 xcwbxxljspneilwejutw 
## 9 Micheal 4400 oovhyodyubhqwzdcwybf 
## 10 Brylee 2532 sarbmelbeycrnhytbout 
## 15 Amelia 5205 lcnvnjhamhzavdfosmae 
## 20 Tianna 251 mwoqwzyfddhuunmtiioh 
## 23 Melody 4422 buagtfiaipniavdnsxhv 
## 24 Dallas 5343 blyjvtlpvpqondrdhluu
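If stringr is not available, the same explicit-columns idea can be sketched in base R. This swaps in `read.table(text = ...)` and `grepl` for `str_split_fixed` and `str_detect`; it is not the answer's own code, and it assumes the three parts are separated by whitespace and contain no quote or `#` characters. The four-element `v` and the name `matches` are illustrative.

```r
# Illustrative subset of the question's vector
v <- c("Joe 4311 rsfuvgcozbxwlnnfevze",
       "Sage 540 aamyvegiadwjwpvwtjko",
       "Mercedes 2400 xcwbxxljspneilwejutw",
       "Ray 2553 snrqrzshlzmmhumzlecl")

# Parse each element as one whitespace-separated row with three columns
df <- read.table(text = v,
                 col.names = c("name", "score", "word"),
                 stringsAsFactors = FALSE)

# Keep rows whose word column contains two consecutive vowels
matches <- df[grepl("[aeiou]{2}", df$word), ]
matches
nrow(matches)  # number of matching rows
```

One side effect of `read.table` is that the score column is parsed as a number rather than kept as text, which may or may not be what you want.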