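In your original code, the line
dic[country] = dic[country] + 1
should raise a KeyError, because the key is not yet in the dictionary the first time a country is encountered. Instead, you should check whether the key is present and, if it is not, initialize its value to 1.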
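A minimal sketch of that check, assuming a plain dict. The helper name count_words and the sample list are hypothetical and do not come from the question:

```python
def count_words(words):
    """Count occurrences, initializing each key the first time it is seen."""
    counts = {}
    for word in words:
        if word in counts:
            counts[word] += 1
        else:
            # First occurrence: the key does not exist yet, so start at 1
            counts[word] = 1
    return counts


words = ["Finland", "Japan", "Finland"]  # hypothetical sample input
print(count_words(words))
```

The if/else can also be collapsed into counts[word] = counts.get(word, 0) + 1, which does the same thing in one line.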
On the other hand, it never gets that far, because the check
if country in country_codes['English short name lower case']:
yields False for all values: a Series object's __contains__ works with the index labels instead of the values. You could, for example, check
if country in country_codes['English short name lower case'].values:
if your list of values is short.
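To see the difference, here is a small sketch that assumes pandas is installed. The Series named names is a hypothetical stand-in for the country name column, not the real CSV data:

```python
import pandas as pd

# Hypothetical stand-in for the country name column
names = pd.Series(["Finland", "Japan", "Sweden"])

# `in` on a Series tests the index labels (0, 1, 2 here), not the values
label_hit = 0 in names            # 0 is an index label
value_miss = "Finland" in names   # "Finland" is a value, not a label

# Testing against the underlying array checks the values instead
value_hit = "Finland" in names.values

print(label_hit, value_miss, value_hit)
```

Checking against .values scans the whole array on every lookup, which is why it is only reasonable for short lists; converting the column to a set gives constant-time membership tests on average.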
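For general counting tasks, Python provides collections.Counter, which behaves a bit like defaultdict(int) but comes with additional benefits. It removes the need to check for keys manually, among other things.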
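As a rough illustration of those benefits, the sketch below counts a hypothetical word list with Counter and then uses most_common, which defaultdict(int) does not offer:

```python
from collections import Counter

words = ["Finland", "Japan", "Finland"]  # hypothetical sample input

counts = Counter()
for word in words:
    # No membership check needed: a missing key counts as 0
    counts[word] += 1

# Looking up an unseen key returns 0 and does not insert it
missing = counts["Cairo"]

# most_common is one of the extras over defaultdict(int)
top = counts.most_common(1)

print(counts, missing, top)
```

Unlike defaultdict(int), reading a missing key from a Counter does not add that key to the mapping.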
Since you already have DataFrame objects, you could use the tools pandas provides:
In [12]: country_codes = pd.read_csv('wikipedia-iso-country-codes.csv')
In [13]: text = pd.DataFrame({'SomeText': """Finland , Finland , Finland
...: The country where I want to be
...: Pony trekking or camping or just watch T.V.
...: Finland , Finland , Finland
...: It's the country for me
...:
...: You're so near to Russia
...: so far away from Japan
...: Quite a long way from Cairo
...: lots of miles from Vietnam
...:
...: Finland , Finland , Finland
...: The country where I want to be
...: Eating breakfast or dinner
...: or snack lunch in the hall
...: Finland , Finland , Finland
...: Finland has it all
...:
...: Read more: Monty Python - Finland Lyrics | MetroLyrics
...: """.split()})
In [14]: text[text['SomeText'].isin(
...: country_codes['English short name lower case']
...:)]['SomeText'].value_counts().to_dict()
...:
Out[14]: {'Finland': 14, 'Japan': 1}
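This finds the rows of text where the value of the SomeText column is in the 'English short name lower case' column of country_codes, counts the unique values of SomeText, and converts the result to a dictionary. The same steps with descriptive intermediate variables: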
In [49]: where_sometext_isin_country_codes = text['SomeText'].isin(
...: country_codes['English short name lower case'])
In [50]: filtered_text = text[where_sometext_isin_country_codes]
In [51]: value_counts = filtered_text['SomeText'].value_counts()
In [52]: value_counts.to_dict()
Out[52]: {'Finland': 14, 'Japan': 1}
The same with Counter:
In [23]: from collections import Counter
In [24]: dic = Counter()
...: ccs = set(country_codes['English short name lower case'])
...: for country in text['SomeText']:
...:     if country in ccs:
...:         dic[country] += 1
...:
In [25]: dic
Out[25]: Counter({'Finland': 14, 'Japan': 1})
Or simply:
In [30]: ccs = set(country_codes['English short name lower case'])
In [31]: Counter(country for country in text['SomeText'] if country in ccs)
Out[31]: Counter({'Finland': 14, 'Japan': 1})
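Is 'country_codes' a 'dictionary'?
Your current code has an indentation error; you should look at that first.
No, the indentation is just a result of my cutting and pasting here. – JayDoe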