Python - Counting repeated strings

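I'm trying to write a function that counts the repeated words in a string and then returns a word if it is repeated more than a certain number (n) of times. This is what I have so far: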

from collections import defaultdict

def repeat_word_count(text, n):
    words = text.split()
    tally = defaultdict(int)
    answer = []

    for i in words:
        if i in tally:
            tally[i] += 1
        else:
            tally[i] = 1

I don't know where to go from here to compare the dictionary values with n.

How it should work: repeat_word_count("one one was a racehorse two two was one too", 3) should return ['one']

2015-09-12 Saltharion

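Do you want the 'key' of the output 'dictionary' to be the 'count', with the word as the 'value'? Is that what you are trying to get? So if a word is not repeated, the 'key' would be '1', and if it is repeated, the 'key' would be the number of repetitions?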

Here is one way to do it:

from collections import defaultdict

tally = defaultdict(int)  # missing keys start at 0
text = "one two two three three three"
for i in text.split():
    tally[i] += 1
print(tally)  # defaultdict(<class 'int'>, {'one': 1, 'two': 2, 'three': 3})

Putting it in a function:

def repeat_word_count(text, n):
    output = []
    tally = defaultdict(int)
    for i in text.split():
        tally[i] += 1
    for k in tally:
        if tally[k] > n:
            output.append(k)
    return output

text = "one two two three three three four four four four"
repeat_word_count(text, 2)
Out[141]: ['three', 'four']  # order shown for Python 3.7+; older versions may order the words differently
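One thing worth checking against the question's example: with n = 3, the word "one" appears exactly 3 times, so a strict `> n` test returns nothing, while the asker expects ['one']. A minimal sketch of both readings follows; the `inclusive` parameter is a hypothetical addition for illustration and is not part of the answer above.

```python
from collections import defaultdict

def repeat_word_count(text, n, inclusive=False):
    """Return the words whose count is above n (or at least n if inclusive)."""
    tally = defaultdict(int)
    for word in text.split():
        tally[word] += 1
    if inclusive:
        # hypothetical option: "at least n" reading of the question
        return [word for word, count in tally.items() if count >= n]
    # "more than n" reading, as in the answer above
    return [word for word, count in tally.items() if count > n]

sentence = "one one was a racehorse two two was one too"
print(repeat_word_count(sentence, 3))                  # []
print(repeat_word_count(sentence, 3, inclusive=True))  # ['one']
```

Which comparison is right depends on whether "more than n" or "at least n" is really wanted.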

2015-09-12 02:23:02

Try

for i in words: 
    tally[i] = tally.get(i, 0) + 1

instead of

for i in words:
    if i in tally:
        tally[words] += 1  # bug: this uses the list `words` as the key; use the item `i` instead
    else:
        tally[words] = 1

If you just want to count words, collections.Counter works well.

>>> import collections 
>>> a = collections.Counter("one one was a racehorse two two was one too".split()) 
>>> a 
Counter({'one': 3, 'two': 2, 'was': 2, 'a': 1, 'racehorse': 1, 'too': 1}) 
>>> a['one'] 
3
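For the remaining step of comparing the counts with n, one possible sketch builds on the Counter above. The function name `words_over` is made up for this example, and the strict `> n` comparison is an assumption taken from the question's wording.

```python
from collections import Counter

def words_over(text, n):
    # words_over is a hypothetical name used only for this sketch
    counts = Counter(text.split())
    # most_common() with no argument lists every (word, count) pair,
    # from the highest count to the lowest
    return [word for word, count in counts.most_common() if count > n]

print(words_over("one one was a racehorse two two was one too", 1))
# ['one', 'was', 'two'] on Python 3.7+
```

Using most_common() here only affects the order of the result; iterating over counts.items() would select the same words.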

2015-09-12 01:57:24 luoluo

This works for the tally problem - thanks! Do you have any suggestions for how I should solve the rest of the problem? – Saltharion

If what you want is a dictionary of the word counts in a string, you can try this:

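string = 'hello world hello again now hi there hi world'.split()
d = {}
for word in string:
    d[word] = d.get(word, 0) + 1  # get() returns 0 for a word not seen yet
print(d)

Output (Python 3.7+, where dictionaries keep insertion order):

{'hello': 2, 'world': 2, 'again': 1, 'now': 1, 'hi': 2, 'there': 1}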

2015-09-12 02:17:40

As luoluo said, use collections.Counter.

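To get the item with the highest count, call the Counter.most_common method with the argument 1. It returns a one-element list holding the (word, tally) pair whose second coordinate is the largest. If the "sentence" contains at least one word, this list is not empty either. So the following function returns some word that occurs more than n times, if there is one, and otherwise returns None: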

from collections import Counter

def repeat_word_count(text, n):
    if not text:
        return None  # guard against '' and None!
    words = text.split()
    if not words:
        return None  # whitespace-only text has no words
    counter = Counter(words)
    max_pair = counter.most_common(1)[0]  # (word, count) with the highest count
    return max_pair[0] if max_pair[1] > n else None
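Note that most_common(1) reports only one word even when several words share the highest count. A small sketch of what it returns, and of one way to collect every tied word, is below; the sample sentence and variable names are invented for illustration.

```python
from collections import Counter

counter = Counter("a b b c c".split())

# most_common(1) is a list with a single (word, count) pair
top_word, top_count = counter.most_common(1)[0]
print(top_count)  # 2

# gather every word that shares the highest count
tied = [word for word, count in counter.items() if count == top_count]
print(sorted(tied))  # ['b', 'c']
```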

2015-09-12 02:25:54 BrianO

Why not use the Counter class for this case:

from collections import Counter 
cnt = Counter(text.split())

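It stores the elements as dictionary keys and their counts as dictionary values. Then it is easy to keep the words that occur more than n times by looping over iterkeys() in a for loop, like

result = []
for k in cnt.iterkeys():  # Python 2 only; on Python 3 iterate over cnt directly
    if cnt[k] > n:
        result.append(k)

In result you will have your list of words.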

Edit: Sorry, this is for when you need many words; BrianO has the right answer for your case.

2015-09-12 03:30:16 Rulolp

I think yours is the best way to get all the words with a frequency over n. But you can just say 'for c in', there is no need for '.iterkeys()'. – BrianO

Thanks, and it could also be done in a list comprehension, returning '[k for k in cnt if cnt[k] > n]', although that is less clear. – Rulolp

Yes, that's what I would do. I think it's actually a bit *more* clear :) but that's me. – BrianO
