Regular expressions: creating a dictionary from a list of strings

Each distinct word should be a key, and its value should be the number of times that word appears across the entire list of strings.

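I'm new to Python and still a bit lost. I believe what I need is a loop in which I would have to:

check whether the next word is a duplicate
maintain a dictionary of counts as I iterate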
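A minimal sketch of the loop described above, assuming the data is a list of token lists like the one in this question (the tweets variable and its trimmed sample data are hypothetical; written for Python 3). dict.get(word, 0) returns 0 for a word that has not been seen yet, so the duplicate check and the count update happen in a single line:

```python
# Hypothetical, trimmed sample in the same shape as the question's data:
# a list of tokenized strings (one inner list per tweet)
tweets = [
    ['retw', 'chr1sa', ':', 'rt', 'uber'],
    ['uber', 'is', 'taking', 'millions', 'of', 'manhattan', 'rides'],
]

counts = {}
for tokens in tweets:      # walk each tokenized string
    for word in tokens:    # walk each word in it
        # get() returns 0 for an unseen word, so new and repeated
        # words are handled by the same line
        counts[word] = counts.get(word, 0) + 1

print(counts['uber'])  # 2
```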

What if I first use set() to get all the unique words, and then loop over them and compute their frequencies?
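The set() idea does work. A rough sketch, assuming the same list-of-lists shape (the sample data is hypothetical; Python 3): flatten the lists, take the set of unique words, and count each one with list.count(). The catch is that every count() call rescans the whole flattened list, so the cost grows with the number of unique words times the total number of words, whereas a single pass that updates a dictionary looks at each word only once.

```python
# Hypothetical sample in the same shape as the question's data
tweets = [
    ['manhattan', 'is', 'the', 'best', 'tv', 'show'],
    ['uber', 'is', 'taking', 'manhattan', 'rides'],
]

# Flatten the list of lists into one list of words
words = [word for tokens in tweets for word in tokens]

# One entry per unique word; each words.count() call rescans the
# whole list, which is fine for small inputs but slow for large ones
freq = {word: words.count(word) for word in set(words)}

print(freq['manhattan'], freq['is'])  # 2 2
```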

Any input would be appreciated.

[u'retw', u'folivi_jochan', u':', u'rt', u'newsycombinator', u':', u'uber', u'is', u'taking', u'millions', u'of', u'manhattan', u'rides', u'away', u'from', u'taxis', u'http', u':', u'//t.co/zluyq3f6cc']
[u'retw', u'chr1sa', u':', u'rt', u'newsycombinator', u':', u'uber', u'is', u'taking', u'millions', u'of', u'manhattan', u'rides', u'away', u'from', u'taxis', u'http', u':', u'//t.co/zluyq3f6cc']
[u'retw', u'olutosinfashusi', u':', u'rt', u'newsycombinator', u':', u'uber', u'is', u'taking', u'millions', u'of', u'manhattan', u'rides', u'away', u'from', u'taxis', u'http', u':', u'//t.co/zluyq3f6cc']
[u'retw', u'shakycode', u':', u'rt', u'newsycombinator', u':', u'uber', u'is', u'taking', u'millions', u'of', u'manhattan', u'rides', u'away', u'from', u'taxis', u'http', u':', u'//t.co/zluyq3f6cc']
[u'an', u'interesting', u'read', u'manhattan', u'is', u'the', u'best', u'tv', u'show', u'that', u'hardly', u'anybody', u'is', u'watching', u'http', u':', u'//t.co/psfmauuwfg']
[u'tmr', u'am', u':', u'lunch', u'at', u'the', u'arts', u'!', u'from', u'11-2pm', u'at', u'1935', u'manhattan', u'beach', u'blvd', u'in', u'redondo', u'beach', u'!', u'map', u':', u'http', u':', u'//t.co/x6x2eeijbh']
[u's1', u'was', u'superb', u'.', u'``', u'manhattan', u'is', u'the', u'best', u'tv', u'show', u'that', u'hardly', u'anybody', u'is', u'watching', u"''", u'http', u':', u'//t.co/q6iazmtaam']
[u'taylor', u'swift', u'seen', u'leaving', u'msr', u'studios', u'in', u'manhattan', u'on', u'october', u'07', u',', u'2015', u'in', u'new', u'york', u',', u'new', u'york', u'.', u'http', u':', u'//t.co/3cwxrapr38']
[u'viva', u'a1054665', u'manhattan', u'acc', u'estimated', u'to', u'be', u'7', u'yrs', u'old', u'american', u'staff', u'mix', u',', u'white', u'/', u'brown', u',', u'spayed', u'female', u'...', u'http', u':', u'//t.co/sloopljyxq']
[u'#', u'3d', u'taevision', u"'showroom", u'in', u'the', u'night', u'#', u'porsche', u'996', u"'", u'#', u'automotive', u'#', u'fashion', u'#', u'makeup', u'#', u'ny', u'#', u'nyc', u'#', u'manhattan', u'http', u':', u'//t.co/eftvytqedk']

Thanks

2015-10-14 Toly

Take a look at https://docs.python.org/2/library/collections.html#collections.Counter –

You could put each list on its own line so it's more readable. You could use [Counter](https://docs.python.org/2/library/collections.html#collections.Counter) – kmad1729

For Python 2.7 and above, use Counter from the collections module:

from collections import Counter 
mylist = [u'retw', u'folivi_jochan', u':', u'rt', u'newsycombinator', u':', u'uber', u'is', u'taking', u'millions', u'of', u'manhattan', u'rides', u'away', u'from', u'taxis', u'http', u':', u'//t.co/zluyq3f6cc', u'retw', u'chr1sa', u':', u'rt', u'newsycombinator', u':', u'uber', u'is', u'taking', u'millions', u'of', u'manhattan', u'rides', u'away', u'from', u'taxis', u'http', u':', u'//t.co/zluyq3f6cc', u'retw', u'olutosinfashusi', u':', u'rt', u'newsycombinator', u':', u'uber', u'is', u'taking', u'millions', u'of'] 
c = Counter(mylist) 
print(dict(c))  # parenthesized print works in both Python 2.7 and Python 3
Resulting counts, most frequent first (these are the (word, count) pairs that c.most_common() returns):
[(u':', 8), 
(u'rt', 3), 
(u'uber', 3), 
(u'newsycombinator', 3), 
(u'of', 3), 
(u'is', 3), 
(u'retw', 3), 
(u'taking', 3), 
(u'millions', 3), 
(u'from', 2), 
(u'//t.co/zluyq3f6cc', 2), 
(u'manhattan', 2), 
(u'away', 2), 
(u'http', 2), 
(u'taxis', 2), 
(u'rides', 2), 
(u'olutosinfashusi', 1), 
(u'chr1sa', 1), 
(u'folivi_jochan', 1)]

If you have three separate lists, try using chain from itertools:

one,two,three = [u'retw', u'folivi_jochan', u':', u'rt', u'newsycombinator', u':', u'uber', u'is', u'taking', u'millions', u'of', u'manhattan', u'rides', u'away', u'from', u'taxis', u'http', u':', u'//t.co/zluyq3f6cc'],[u'retw', u'chr1sa', u':', u'rt', u'newsycombinator', u':', u'uber', u'is', u'taking', u'millions', u'of', u'manhattan', u'rides', u'away', u'from', u'taxis', u'http', u':', u'//t.co/zluyq3f6cc'], [u'retw', u'olutosinfashusi', u':', u'rt', u'newsycombinator', u':', u'uber', u'is', u'taking', u'millions', u'of'] 
from itertools import chain 
from collections import Counter 
c=Counter(chain(one,two,three))
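With ten lists, as in the question, naming each one gets tedious. A sketch, assuming the tokenized strings are collected into one outer list (the tweets name and its sample data are hypothetical; Python 3): itertools.chain.from_iterable walks every inner list in turn, so Counter sees one flat stream of words however many lists there are.

```python
from collections import Counter
from itertools import chain

# Hypothetical: every tokenized string kept in one outer list,
# instead of separate variables one, two, three, ...
tweets = [
    ['retw', 'chr1sa', ':', 'rt'],
    ['retw', 'shakycode', ':', 'rt'],
    ['http', ':', '//t.co/eftvytqedk'],
]

# from_iterable lazily yields the items of each inner list in order,
# so the number of lists does not need to be known in advance
c = Counter(chain.from_iterable(tweets))

# A Counter returns 0 for a word it has never seen
print(c[':'], c['retw'], c['missing'])  # 3 2 0
```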

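Counter is a high-performance class for counting occurrences of the elements of an iterable. Its most_common() method returns a list of (element, count) tuples, and that list of tuples can be used to build a dict.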
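A short illustration of the points above on a made-up word list (Python 3): most_common() orders the pairs by count, most_common(n) keeps only the top n, and because Counter is a dict subclass, dict(c) gives the same mapping as building a dict from most_common().

```python
from collections import Counter

words = ['rt', ':', 'uber', ':', 'rt', ':']  # made-up sample
c = Counter(words)

# (element, count) tuples, highest count first; n=2 keeps the top two
print(c.most_common(2))  # [(':', 3), ('rt', 2)]

# Counter already subclasses dict, so no detour through most_common() is needed
d = dict(c)
print(isinstance(c, dict), d == dict(c.most_common()))  # True True
```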

2015-10-14 23:20:15

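This gives me the most common elements. I need a complete dictionary where Key = each unique word in the list of strings and Value = that word's frequency in the list of strings – Toly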

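Awesome! For some reason, on the same data set I get: {'!': 2, '': 209, '#': 8, ''': 6, ''"': 418, '-': 1. I can't see the error; I'm sure I'm doing something wrong somewhere. This is a great solution! Thanks! – Toly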
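The punctuation and empty-looking keys in those counts appear to come from the tokenizer, which emits tokens such as ':' and '#' as separate words. If only word-like tokens should be counted (an assumption about what is wanted here), one option in keeping with the question's title is to filter with a regular expression before counting. The sample tokens are made up, and the \w rule (at least one letter, digit or underscore) is just one possible choice:

```python
import re
from collections import Counter

# Made-up tokens mixing words with punctuation-only tokens
tokens = ['!', '', '#', 'manhattan', "''", '11-2pm', '#', 'manhattan', '...']

# Assumed rule: keep a token only if it contains at least one word character
has_word_char = re.compile(r'\w')
c = Counter(t for t in tokens if has_word_char.search(t))

print(dict(c))  # {'manhattan': 2, '11-2pm': 1}
```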

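Good answer, but what's the point of 'most_common()'? 'Counter' is already a subclass of 'dict'. There's no need to convert to a 'dict' at all; if you want to anyway, just do it directly: 'd = dict(c)'. – FMc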

An alternative approach, using your for loop:

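counts = {}  # start empty; avoid naming it dict, which shadows the built-in
for word in strings: 
    if word not in counts:  # first time this word is seen
        counts[word] = 1 
    else:                   # seen before: increment its count
        counts[word] += 1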

The above assumes strings is the list of words you want to iterate over.
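For the loop version, collections.defaultdict(int) is a standard-library alternative that drops the membership check entirely: a missing key is created with the value 0 on first access. A sketch with a hypothetical strings list (Python 3):

```python
from collections import defaultdict

strings = ['uber', 'is', 'taking', 'uber', 'rides']  # hypothetical word list

# int() returns 0, so each new word starts at 0 before being incremented
counts = defaultdict(int)
for word in strings:
    counts[word] += 1

print(dict(counts))  # {'uber': 2, 'is': 1, 'taking': 1, 'rides': 1}
```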

2015-10-14 23:25:24

No need for '.keys()'. Just check the word against the dict directly. – FMc