Creating a dictionary of the words in a text

I want to create a dictionary of all the unique words in a text. The key is the word, and the value is the frequency of that word.

dtt = ['you want home at our peace', 'we went our home', 'our home is nice', 'we want peace at home'] 
word_listT = str(' '.join(dtt)).split() 
wordsT = {v:k for (k, v) in enumerate(word_listT)} 
print(wordsT)

I want something like this:

{'we': 2, 'is': 1, 'peace': 2, 'at': 2, 'want': 2, 'our': 3, 'home': 4, 'you': 1, 'went': 1, 'nice': 1}

However, I get this instead:

{'we': 14, 'is': 12, 'peace': 16, 'at': 17, 'want': 15, 'our': 10, 'home': 18, 'you': 0, 'went': 7, 'nice': 13}

Obviously, I am misusing a function or doing something wrong.

Please help.

Source

2015-11-05 Toly

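The problem with what you are doing is that you are storing the list index of where each word appears, not the count of that word.

To get the counts, you can just use collections.Counter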

from collections import Counter 

dtt = ['you want home at our peace', 'we went our home', 'our home is nice', 'we want peace at home'] 
counted_words = Counter(' '.join(dtt).split()) 
# if you want to see what the counted words are you can print it 
print(counted_words)

>>> Counter({'home': 4, 'our': 3, 'we': 2, 'peace': 2, 'at': 2, 'want': 2, 'is': 1, 'you': 1, 'went': 1, 'nice': 1})

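Some cleanup, as mentioned in the comments:

str() is unnecessary around your ' '.join(dtt).split()

You can also drop the list assignment and build your Counter on the same line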

Counter(' '.join(dtt).split())
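If a plain dictionary is what you are after, here is a minimal sketch of how the Counter could be used. The names counts and word_freq are just illustrative. Since Counter is a dict subclass, dict() turns it into an ordinary dictionary, and looking up a word that never occurred gives 0 rather than raising a KeyError.

```python
from collections import Counter

dtt = ['you want home at our peace', 'we went our home',
       'our home is nice', 'we want peace at home']

# Count every whitespace-separated word across all sentences.
counts = Counter(' '.join(dtt).split())

# Counter is a dict subclass; dict() gives a plain word -> frequency mapping.
word_freq = dict(counts)

print(word_freq['home'])   # 4
print(counts['missing'])   # 0 (a Counter returns 0 for unseen keys)
```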

More detail on your list indexes: first you have to understand what your code is doing.

dtt = [ 
    'you want home at our peace', 
    'we went our home', 
    'our home is nice', 
    'we want peace at home' 
]

Note that there are 19 words here; print(len(word_listT)) returns 19. On the next line, word_listT = str(' '.join(dtt)).split(), you build a list of all the words, which looks like this

word_listT = [ 
    'you', 
    'want', 
    'home', 
    'at', 
    'our', 
    'peace', 
    'we', 
    'went', 
    'our', 
    'home', 
    'our', 
    'home', 
    'is', 
    'nice', 
    'we', 
    'want', 
    'peace', 
    'at', 
    'home' 
]

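Count them again: 19 words. The last word is 'home'. List indexes start at 0, so 0 to 18 = 19 elements. word_listT[18] is 'home'. This has nothing to do with string position or anything else, it is just the index in the new list. :)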
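To see this for yourself, here is a small sketch that rebuilds the dictionary from the question. The name last_index is only illustrative. Each later occurrence of a word overwrites the earlier entry, so every word ends up mapped to the index of its last occurrence.

```python
dtt = ['you want home at our peace', 'we went our home',
       'our home is nice', 'we want peace at home']
word_listT = ' '.join(dtt).split()

# Same idea as {v: k for (k, v) in enumerate(word_listT)}:
# enumerate yields (index, word) pairs, and repeated keys are overwritten.
last_index = {word: i for i, word in enumerate(word_listT)}

print(len(word_listT))      # 19
print(last_index['home'])   # 18, the last position where 'home' appears
print(last_index['you'])    # 0, 'you' only appears at the start
```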

Source

2015-11-05 19:18:13

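Works great! Thanks! – Toly

Of course! Glad I could help! You should look around collections, there are a lot of useful tools in there. 'Counter' is one, and I also use 'defaultdict' all the time. If you have any questions feel free to ask, and I will do my best to help if I can :) –

@JohnRuddell join() returns a string, so why are you converting it to a string? Counter(' '.join(dtt).split()) will do – helloV

Try this: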

from collections import defaultdict 

dtt = ['you want home at our peace', 'we went our home', 'our home is nice', 'we want peace at home'] 
word_list = str(' '.join(dtt)).split() 
d = defaultdict(int) 
for word in word_list: 
    d[word] += 1
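As a sketch of how that loop could be packaged, the helper below wraps it in a function. The name count_words is hypothetical, not something from the standard library. It splits each sentence directly instead of joining first, and returns a plain dict at the end.

```python
from collections import defaultdict

def count_words(sentences):
    """Return a plain dict mapping each word to how often it occurs."""
    counts = defaultdict(int)        # missing keys start at int() == 0
    for sentence in sentences:
        for word in sentence.split():
            counts[word] += 1
    return dict(counts)

dtt = ['you want home at our peace', 'we went our home',
       'our home is nice', 'we want peace at home']
freq = count_words(dtt)
print(freq['our'])   # 3
```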

Source

2015-11-05 19:18:39 levi

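enumerate returns the words paired with their indexes, not their frequencies. That is, when you create the wordsT dictionary, each value is actually the index in word_listT of the last occurrence of its key. To do what you want, a for loop is probably the most direct approach.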

wordsT = {}
for word in word_listT:
    try:
        wordsT[word] += 1
    except KeyError:
        wordsT[word] = 1
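A variant of the same loop, if you would rather avoid try/except: dict.get takes a default value, so a word seen for the first time starts from 0. This is a swapped-in alternative, not the code from the answer above.

```python
dtt = ['you want home at our peace', 'we went our home',
       'our home is nice', 'we want peace at home']
word_listT = ' '.join(dtt).split()

wordsT = {}
for word in word_listT:
    # get() returns 0 when the word has not been seen yet.
    wordsT[word] = wordsT.get(word, 0) + 1

print(wordsT['home'])   # 4
```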

Source

2015-11-05 19:24:26
