How do I search for, count, and save a word?

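I am trying to find a specific word and then count it. I need to save the count for each identifier. How do I search for, count, and save a word?

For example,

risk risk risk free interest rate

asterisk risk risk

market risk risk [risk

The document consists of the lines above, and I need to count the word "risk" but not "asterisk". I also need to count "[risk" as "risk". This is what I have so far. However, it returns "asterisk" and "[risk" as well as "risk". I don't need a count for "asterisk", only for "risk", including "[risk". I tried to use a regular expression but keep getting errors. Also, I am a beginner in Python. If anyone has any ideas, please help me! ^^ Thanks.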

from collections import defaultdict

word_dict = defaultdict(int)

for line in mylist:
    words = line.lower().split()
    for word in words:
        word_dict[word] += 1

for word in word_dict:
    if 'risk' in word:  # substring test, so 'asterisk' and '[risk' match too
        print(word, word_dict[word])

Source

2012-08-31 Jimmy

-2

So you count:

'\n' + risk + '\n' 
'\n' + risk + ' ' 
' ' + risk + '\n' 
' ' + risk + ' '
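One way to read the four cases above is "the word with whitespace or a line boundary on both sides". The following is only a minimal sketch of that reading using lookarounds from the standard `re` module; the sample lines and the name `count_whitespace_delimited` are assumptions made for illustration. Note that under this reading "[risk" is not counted, which does not match what the question asks for.

```python
import re

# Matches 'risk' only when it is not touching any non-whitespace character,
# i.e. it is delimited by spaces, newlines, or the start/end of the text.
WHITESPACE_DELIMITED_RISK = re.compile(r"(?<!\S)risk(?!\S)")

def count_whitespace_delimited(text):
    """Count occurrences of 'risk' surrounded by whitespace or text boundaries."""
    return len(WHITESPACE_DELIMITED_RISK.findall(text))

# Sample lines taken from the question.
line_counts = [
    count_whitespace_delimited("risk risk risk free interest rate"),
    count_whitespace_delimited("asterisk risk risk"),
    count_whitespace_delimited("market risk risk [risk"),  # '[risk' is skipped
]
print(line_counts)
```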

Source

2012-08-31 13:39:20 Topro

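Do it pipeline-style. By that I mean: before adding a word to the dictionary, perform whatever transformations on the text are needed to make the count come out right.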

from collections import defaultdict

word_dict = defaultdict(int)  # missing keys start at 0, so += 1 works

for line in mylist:
    words = line.strip().lower().split()  # the strip gets rid of newlines
    for word in words:
        # the strip here will strip away any surrounding punctuation.
        # add any other symbols to the string that you need
        # the key insight here is that you get rid of extra stuff BEFORE
        # inserting into the dictionary
        word_dict[word.strip('[/@#$%')] += 1

for word in word_dict:
    print(word, word_dict[word])

# to just see the count for risk:
print(word_dict['risk'])

It is fine for this to count the word "asterisk" too, as long as your word "risk" is counted.
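A variation on the same normalize-then-count idea, offered as a rough sketch rather than the answerer's exact method: instead of listing symbols by hand, it strips everything in `string.punctuation` from both ends of each token. The helper name `normalize` and the in-memory sample lines are assumptions for illustration.

```python
import string
from collections import Counter

def normalize(token):
    """Lowercase a token and strip punctuation from both ends ('[risk' -> 'risk')."""
    return token.lower().strip(string.punctuation)

# Sample lines taken from the question.
sample_lines = [
    "risk risk risk free interest rate",
    "asterisk risk risk",
    "market risk risk [risk",
]

word_counts = Counter()
for line in sample_lines:
    for token in line.split():
        word = normalize(token)
        if word:  # skip tokens that were nothing but punctuation
            word_counts[word] += 1

print(word_counts["risk"])      # 'risk' and '[risk' land under the same key
print(word_counts["asterisk"])  # counted separately, under its own key
```

Because "asterisk" ends up under its own key, looking up `word_counts["risk"]` is not affected by it.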

Source

2012-08-31 13:43:46

Give regular expressions another go. Match the string 'risk' surrounded by word boundaries:

import re 
re.findall(r'\brisk\b', 'risk risk') ## 2 matches 
re.findall(r'\brisk\b', 'risk risk riskrisk') ## 2 matches 
re.findall(r'\brisk\b', 'risk risk riskrisk [risk') ## 3 matches 
re.findall(r'\brisk\b', 'risk risk riskrisk [risk asterisk') ## 3 matches
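To also save a count per identifier, as the question asks, the same pattern can be applied to each text and the result stored in a dictionary. This is a minimal sketch: the identifiers `doc1` to `doc3`, the `documents` mapping, and the function name `count_word_per_id` are made up for illustration, since the question does not say what its identifiers look like.

```python
import re

# Hypothetical data: identifier -> text. The identifiers are invented here;
# the texts are the sample lines from the question.
documents = {
    "doc1": "risk risk risk free interest rate",
    "doc2": "asterisk risk risk",
    "doc3": "market risk risk [risk",
}

# \b matches between a word character and a non-word character, so
# '[risk' matches while 'asterisk' does not.
RISK = re.compile(r"\brisk\b", re.IGNORECASE)

def count_word_per_id(docs, pattern=RISK):
    """Return {identifier: number of whole-word matches in that text}."""
    return {doc_id: len(pattern.findall(text)) for doc_id, text in docs.items()}

counts = count_word_per_id(documents)
print(counts)
```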

Source

2012-08-31 13:47:59

You can try this snippet:

import shlex 

words = shlex.split("risk risk risk free interest rate") 
word_count = len([word for word in words if word == "risk" or word =="[risk"]) 
print(word_count)

Source

2012-08-31 13:48:26 Bob

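I think you need a more rigorous standard for what counts as "risk" and what doesn't; it is unclear at this point. However, I would use a Counter:

from collections import Counter
c = Counter()
with open(yourfile) as f:
    for line in f:
        c += Counter(line.split())

Now you need to create a function that will figure out whether a word should be counted as "risk" or not: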

def is_risk(word): 
    w = word.lower() 
    return 'risk' in w and w!='asterisk'

Now just sum the elements corresponding to those keys:

sum(c[k] for k in c if is_risk(k))
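Put together, the pieces above can be tried end to end. The sketch below is an assumption-laden illustration: it swaps the file for an in-memory `io.StringIO` holding the question's sample lines, and uses `Counter.update` in place of `+=`. It also shows why the standard may need tightening, since the substring test accepts words such as "risky".

```python
import io
from collections import Counter

def is_risk(word):
    """Predicate from the answer: contains 'risk' but is not exactly 'asterisk'."""
    w = word.lower()
    return 'risk' in w and w != 'asterisk'

# Stand-in for the real file, using the sample lines from the question.
sample_file = io.StringIO(
    "risk risk risk free interest rate\n"
    "asterisk risk risk\n"
    "market risk risk [risk\n"
)

c = Counter()
for line in sample_file:
    c.update(line.split())  # adds this line's word counts to the running total

total = sum(c[k] for k in c if is_risk(k))
print(total)             # 'risk' and '[risk' are both included
print(is_risk("risky"))  # the substring test also accepts this word
```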

Source

2012-08-31 13:53:31 mgilson
