2016-09-28 24 views
-1

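The title says it all: I need to filter a file with egrep according to a specification, but what I can't figure out is how to make sure the pattern occurs 3 times. (From the exact wording of the problem: lines that contain a word of 5 or more characters that occurs at least three times.)

UNIX - Using egrep, how to filter a pattern that occurs n times?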

+1

Sample input/output would be appreciated –

+2

Have you tried anything? –

+0

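I can't figure out how to check the number of occurrences when running grep. So far I have egrep '\b[a-zA-Z]{5}\b'.* which covers everything I need, but I need to be able to filter it down to words that occur at least 3 times – KenP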

Answers

0

With awk (untested):

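awk '
    {
        split("", count)                 # reset the per-line word counts
        for (i = 1; i <= NF; i++) {
            word = $i
            gsub(/[^a-zA-Z]/, "", word)  # drop punctuation such as commas
            # print the line once a word of 5+ letters is seen 3 times
            if (length(word) >= 5 && ++count[word] >= 3) {
                print
                next
            }
        }
    }
' file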
1
egrep '([a-zA-Z]{5}).*\1.*\1' 

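This worked in my quick test, but I don't know how robust it is.

\1 (and \2, \3, ...) are back references. I put () around the five-letter pattern, [a-zA-Z]{5}, which makes it the first capture group. \1 then means the regex expects to find a repeat of the same text that was matched by the first group.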

Finally, there is a .* between the three occurrences, to allow anything to appear between them.
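As a minimal sketch of the back reference at work (the sample lines are made up for illustration and are not from the question), only a line that repeats the same five-letter sequence three times is selected. `grep -E` is used here as the equivalent of `egrep`:

```shell
# Hypothetical sample input: only the first line repeats a
# five-letter sequence ("apple") three times.
sample='apple pie, apple tart, apple juice
apple pie, grape tart, lemon juice
apple pie and apple tart'

# \1 must match the exact text captured by ([a-zA-Z]{5}).
result=$(printf '%s\n' "$sample" | grep -E '([a-zA-Z]{5}).*\1.*\1')
printf '%s\n' "$result"
```

Only the first sample line should be printed. The second has three different five-letter words, and the third repeats "apple" just twice.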

+1

Can you explain \1? – KenP

+0

@KenP Look up "back references" in any regex tutorial. – Barmar

0
$ cat ip.txt 
abc abc abc should not match 
totally this line should totally match, isn't it? totally 
Title: word with 5 letters like title should also match, given title is present 3 or more times 
this line should not totally match, total only partly matches with totally 

To match whole words, case-sensitively:

$ grep -wE '([a-zA-Z]{5,}).*\1.*\1' ip.txt 
totally this line should totally match, isn't it? totally 

To match whole words irrespective of case:

$ grep -iwE '([a-zA-Z]{5,}).*\1.*\1' ip.txt 
totally this line should totally match, isn't it? totally 
Title: word with 5 letters like title should also match, given title is present 3 or more times 

To match any sequence of five or more letters:

$ grep -iE '([a-zA-Z]{5,}).*\1.*\1' ip.txt 
totally this line should totally match, isn't it? totally 
Title: word with 5 letters like title should also match, given title is present 3 or more times 
this line should not totally match, total only partly matches with totally 
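  • -E extended regular expressions
  • -w match whole words only
  • -i ignore case
  • [a-zA-Z]{5,} lowercase or uppercase letters, five or more times
  • () capture group, and \1 is the back reference to it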
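For a general n, one possible way to avoid writing `.*\1` out n-1 times is to put an interval on a group. This is only a sketch, assuming GNU grep (back references in -E patterns are an extension rather than part of POSIX ERE). The variable `n` and the temporary copy of the sample file are introduced here for illustration:

```shell
# Recreate the sample file from above in a temporary location.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
abc abc abc should not match
totally this line should totally match, isn't it? totally
Title: word with 5 letters like title should also match, given title is present 3 or more times
this line should not totally match, total only partly matches with totally
EOF

n=3   # required number of occurrences (assumed to be at least 2)

# The captured word, followed by (n - 1) further occurrences of it.
result=$(grep -iwE "([a-zA-Z]{5,})(.*\\1){$((n - 1))}" "$tmp")
printf '%s\n' "$result"
rm -f "$tmp"
```

With n=3 this should select the same two lines as the -iwE command above. Note that -w only constrains the start and end of the whole match, so occurrences in the middle are not individually checked for word boundaries.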

And a bit of fun, if you have PCRE regex:

$ echo 'totally title match' | grep -P '([a-zA-Z]{5,}).*(?1).*(?1)' 
totally title match 
  • (?1) refers to the regex pattern ([a-zA-Z]{5,}) itself, not to the text it matched
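The difference between \1 and (?1) can be seen side by side in this small sketch. It assumes a grep built with PCRE support (-P); the variable names are illustrative only:

```shell
line='totally title match'

# Back reference: the same text must repeat, so this line is rejected.
backref=$(printf '%s\n' "$line" | grep -P '([a-zA-Z]{5,}).*\1.*\1' || true)

# Subroutine call: any three runs of five or more letters are enough.
subroutine=$(printf '%s\n' "$line" | grep -P '([a-zA-Z]{5,}).*(?1).*(?1)' || true)

printf 'backref:    [%s]\n' "$backref"
printf 'subroutine: [%s]\n' "$subroutine"
```

So (?1) is not a substitute for \1 in the original problem: it reuses the pattern, which lets three different words satisfy the match.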
+0

Thanks a lot! I was pretty close but couldn't put it together. Cheers :) – KenP

+0

@KenP Glad it helped, see http://stackoverflow.com/help/someone-answers and http://stackoverflow.com/help in general :) – Sundeep