2014-12-29 87 views
1

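Is it possible to find lines with multiple occurrences of the same word using grep?

Is it possible to write a (single-line) grep expression that finds lines containing three identical words? Note that we do not know the word a priori. The following snippet catches most cases: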

$ grep -E '(\w+)[[:space:]]+\1[[:space:]]+\1' test_data.txt 

However, it does not catch the following positive example:

lunch dinner dinner dinner lunch

Also note that we are only looking for whole words, not simple character repetition. So a negative example would be:

彈撥牛逼花牛逼重新

EDIT (thanks @lev-levitsky):

The positive example above is actually caught, but the following one is not:

lunch lunch dinner dinner lunch
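
To make the difference between the two examples concrete, here is a minimal sketch. It assumes GNU grep, where `\w` and back-references inside `-E` patterns are available as extensions; the function name `adjacent_triples` is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: reads lines on stdin and prints those matched by the
# question's pattern, which only allows whitespace between the repeats.
adjacent_triples() {
    grep -E '(\w+)[[:space:]]+\1[[:space:]]+\1'
}

# Three adjacent repeats are found; three repeats with other words in
# between are not, because the pattern has no room for intervening text.
printf '%s\n' \
    'lunch dinner dinner dinner lunch' \
    'lunch lunch dinner dinner lunch' | adjacent_triples
```

Only the first line is printed. Note that this pattern has no word-boundary anchors, so it can also match partial words.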

+2

But it *does* catch the dinner example. Should it also match three occurrences that are separated by other words?

+1

Yes, you are right. I changed it to 'lunch lunch dinner dinner lunch', which is not caught. – user313967

Answers

1

This should work for you:

grep -E "[[:<:]](\w+)[[:>:]].*[[:<:]]\1[[:>:]].*[[:<:]]\1[[:>:]]" testfile 

For example:

~/src/sandbox$ cat testfile
how is summer summer summer ha ha
this summer is a hot summer of summers yes it is
summer summer summer
there is only one summer in this sentence
summer appears as the first and last summer words in this summer
the summertime is always in summer, one of several summers
the summer of which we speak is summery but is a real summer summer, yes
this also works with cats, since there are three cats in these cats, ha!
~/src/sandbox$ grep -E "[[:<:]](\w+)[[:>:]].*[[:<:]]\1[[:>:]].*[[:<:]]\1[[:>:]]" testfile
how is summer summer summer ha ha
summer summer summer
summer appears as the first and last summer words in this summer
the summer of which we speak is summery but is a real summer summer, yes
this also works with cats, since there are three cats in these cats, ha!
~/src/sandbox$

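[[:<:]] and [[:>:]] match the empty string at the beginning and end of a word, respectively, so you can use them to identify word boundaries without having to assume that words are separated by whitespace rather than by punctuation and so on.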
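
The `[[:<:]]` and `[[:>:]]` spellings come from BSD-style regex libraries (such as the grep shipped with macOS); GNU grep writes the same anchors as `\<` and `\>`. A minimal sketch of the GNU form follows, assuming GNU grep (`\w`, `\<`, `\>` and back-references in `-E` patterns are extensions there); the function name `triple_words` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: prints stdin lines in which some whole word occurs
# at least three times, regardless of what separates the occurrences.
triple_words() {
    grep -E '\<(\w+)\>.*\<\1\>.*\<\1\>'
}

# Line 1: "summer" occurs three times, separated by punctuation and words.
# Line 2: "summers", "summery" and "summer" are three different words.
# Line 3: the letter "t" repeats, but only once as a whole word.
printf '%s\n' \
    'summer, summer; and summer again' \
    'summers are summery in summer' \
    'bat t top' | triple_words
```

Only the first line is printed, since the anchors force each occurrence to be a complete word.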

+0

Found a more concise notation for your solution: grep -E '\<(\w+)\>.*\<\1\>.*\<\1\>' testfile – user313967

0

This is neither grep nor a regex, but it may work:

awk -F"[,. \t]*" '{for (i=1;i<=NF;i++) {if (++a[$i]==3) {printf "%s ",$i;f=1}} if (f) print "";f=0;delete a}' file 

It counts the words on each line and, if any word is found three or more times, prints that word.
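
A slightly more defensive sketch of the same counting idea, wrapped in a shell function. The name `words_seen_thrice`, the `+` quantifier in the field separator, and the empty-field check are my own assumptions and are not part of the one-liner above.

```shell
#!/bin/sh
# Hypothetical helper: for each stdin line, prints the words that occur at
# least three times on that line (one output line per matching input line).
words_seen_thrice() {
    awk -F'[,. \t]+' '{
        out = ""
        split("", seen)                  # portable way to empty the array
        for (i = 1; i <= NF; i++) {
            if ($i == "") continue       # skip empty leading/trailing fields
            if (++seen[$i] == 3)         # report each word once, at its 3rd hit
                out = (out == "") ? $i : (out " " $i)
        }
        if (out != "") print out
    }'
}

printf '%s\n' 'dinner dinner dinner.' 'lunch dinner lunch' | words_seen_thrice
```

Unlike the grep approach, this reports which word was repeated rather than the whole line. The set of separator characters is still only commas, periods, spaces and tabs, so other punctuation would need to be added to the bracket expression.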

+0

It does not work in this case: $ echo 'dinner dinner dinner.' | awk '{for (i=1;i<=NF;i++) {if (++a[$i]==3) {printf "%s ",$i;f=1}} if (f) print "";f=0;delete a}' – user313967

+0

@user313967 Fixed this by adding the field separator '-F"[,. \t]*"'. PS: you have to consider 'regex'. – Jotne
