Create a list of match counts using Python

I have text in a file in the following format:

[NP mr. speaker ] , [NP mr. vice president ] , [NP members ] [PP of ] [NP congress ] [NP my fellow ] [VP americans ] : [NP today ]

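I want to get a list of the NP chunks that appear in it, together with the number of times each one matches, sorted in descending order. To be clear, there may be many NPs in the text: [NP mr. speaker ] might appear 5 times, [NP mr. vice president ] might appear 6 times, and so on. I want to find the frequency of all of these matches.

The output should look like this: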

6 [NP mr. vice president ] 

5 [NP mr. speaker ]

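etc.

Any ideas on how to go about this? I'm pretty sure regular expressions in Python would help, but I'm lost as to what my expression should look like and how to put the matches in a list.

Source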

2014-02-13 user2951046

Is using Python a requirement, or just something you thought might help? –

Python is not required; shell tools would work – user2951046

Python isn't needed here; basic shell tools are all you need.

grep -o '\[NP[^]]*]' input.txt | sort | uniq -c | sort -rg

If you need to match NP anywhere inside the brackets, you need to adjust the pattern slightly:

grep -o '\[[^]]*NP[^]]*]' input.txt | sort | uniq -c | sort -rg

Source
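The difference between the two patterns can be illustrated with a small Python sketch. The sample string is made up for this illustration, and its [WHNP who ] chunk is a hypothetical tag, not something taken from the question's file.

```python
import re

# Hypothetical sample; the [WHNP who ] chunk is invented to show the difference.
sample = "[NP mr. speaker ] [WHNP who ] [PP of ] [NP congress ]"

# The same two patterns as the grep commands, written as Python regexes.
np_at_start = re.compile(r"\[NP[^\]]*\]")        # tag must begin with NP
np_anywhere = re.compile(r"\[[^\]]*NP[^\]]*\]")  # NP anywhere before the closing ]

start_matches = np_at_start.findall(sample)
anywhere_matches = np_anywhere.findall(sample)

print(start_matches)     # chunks whose tag starts with NP
print(anywhere_matches)  # additionally picks up [WHNP who ]
```

Note that the second pattern matches any bracketed chunk containing the letters NP, so it would also pick up a chunk whose words happen to contain "NP".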

2014-02-13 02:45:59 Kevin

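I get this error: -bash: %: command not found – user2951046

'%' is the command prompt; you might see '$' instead. Don't include it. – Kevin

Thanks, the first one works, but it doesn't give the results in descending order of match count; from what I can see, they are sorted in ascending alphabetical order... – user2951046

You can use re and Counter in Python: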

In [150]: from collections import Counter 
    ...: import re 
    ...: s='[NP mr. speaker ] , [NP mr. vice president ] , [NP members ] [PP of ] [NP congress ] [NP my fellow ] [VP americans ] : [NP today ]' 
    ...: c = Counter(re.findall(r'\[[ .\w]*\]', s))  # raw string; matches every bracketed chunk, not only NP
    ...: 

In [152]: c['[NP mr. speaker ]'] 
Out[152]: 1

To sort the keys in descending order of count:

In [156]: sorted(c, key=c.get, reverse=True) 
Out[156]: 
['[NP members ]', 
'[NP mr. speaker ]', 
'[NP congress ]', 
'[PP of ]', 
'[VP americans ]', 
'[NP my fellow ]', 
'[NP mr. vice president ]', 
'[NP today ]']
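As a possible follow-up, here is a minimal sketch that restricts the count to NP chunks and prints the count next to each chunk, as the question asks. It relies on Counter.most_common(), which returns (item, count) pairs with the most frequent first. The helper name count_np_chunks and the sample text with repeated chunks are assumptions made for illustration.

```python
import re
from collections import Counter


def count_np_chunks(text):
    """Return (chunk, count) pairs for [NP ...] chunks, most frequent first."""
    chunks = re.findall(r"\[NP[^\]]*\]", text)
    return Counter(chunks).most_common()


# Made-up sample with repeated chunks, so the counts differ.
text = (
    "[NP mr. speaker ] , [NP mr. vice president ] , [NP mr. vice president ] "
    "[PP of ] [NP congress ] [NP mr. vice president ] [NP mr. speaker ]"
)

for chunk, count in count_np_chunks(text):
    print(count, chunk)
```

To run it on a file instead, the text could be read with something like open('input.txt').read(), assuming the file name used in the shell answer.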

Source

2014-02-13 03:02:41 zhangxaochen
