How to count the instances of various strings in a dictionary

I have a large dictionary (10,000+ entries) of ReviewIDs. Each entry maps a ReviewID# (the key) to the language of the review (the value).

My task is to count the total number of reviews in each language and then display the counts in a bar chart.

import pandas as pd 
import csv 
import matplotlib.pyplot as plt 
import sys 
RevDict = {} 
with open('ReviewID.txt','r') as f: 
    for line in f:  # each line looks like "ReviewID:Language"
        a,b = line.split(":") 
        RevDict[a] = str(b)
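A minimal sketch of the same loading step, assuming each line of ReviewID.txt has the form ReviewID:Language. It splits on the first colon only and strips surrounding whitespace, including the trailing newline that str(b) keeps. io.StringIO stands in for the real file, and load_reviews is a hypothetical helper name.

```python
import io


def load_reviews(f):
    """Build a {review_id: language} dict from 'ReviewID:Language' lines."""
    rev_dict = {}
    for line in f:
        line = line.strip()
        if not line:
            continue  # skip blank lines, e.g. an empty last line
        review_id, lang = line.split(":", 1)  # split on the first colon only
        rev_dict[review_id.strip()] = lang.strip()
    return rev_dict


# Stand-in for open('ReviewID.txt') with made-up sample lines
sample = io.StringIO("A:English\nB:German\nC:English\n\n")
rev_dict = load_reviews(sample)
print(rev_dict)  # {'A': 'English', 'B': 'German', 'C': 'English'}
```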

This results in a dictionary that looks like this:

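My idea is to convert the dictionary into a DataFrame with the review ID as one column and the language as the second column. Then I could iterate through the rows with a counter and end up with the final count for each language, which could easily be turned into a bar chart.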
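The row-by-row plan above can be sketched in plain Python without building a DataFrame: treat each (review_id, language) pair as one row and increment a per-language count while iterating. The sample data is made up for illustration.

```python
data = {'A': 'English', 'B': 'German', 'C': 'English'}

# Each dict item plays the role of one row: (review_id, language)
rows = list(data.items())

counts = {}
for review_id, lang in rows:
    counts[lang] = counts.get(lang, 0) + 1  # start at 0 for unseen languages

print(counts)  # {'English': 2, 'German': 1}
```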

Unfortunately, I can't figure out how to do this.

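I also suspect that a more Pythonic approach would be simpler: count the instances of each string directly in the dictionary instead of going through a DataFrame. I tried this: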

from collections import Counter 
Counter(k['b'] for k in data if k.get('b'))

It throws the following error:

AttributeError: 'str' object has no attribute 'get'


2017-01-22 Andrew Smith

Use collections.Counter:

import collections as coll 

data = { 
    'A': 'English', 
    'B': 'German', 
    'C': 'English' 
} 

print(coll.Counter(data.values())) 

--output:-- 
Counter({'English': 2, 'German': 1})
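If the counts are needed in descending order, Counter.most_common() returns (value, count) pairs sorted from most to least common. A small sketch, reusing the same sample data, that also expresses each count as a percentage of all reviews:

```python
import collections as coll

data = {'A': 'English', 'B': 'German', 'C': 'English'}

counter = coll.Counter(data.values())
total = sum(counter.values())

# most_common() with no argument returns every pair, highest count first
for lang, count in counter.most_common():
    print(f"{lang}: {count} ({count / total:.1%})")
# English: 2 (66.7%)
# German: 1 (33.3%)
```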

Using pandas:

import pandas as pd 

data = { 
    'A': 'fr\n', 
    'B': 'de\n', 
    'C': 'fr\n', 
    'D': 'de\n', 
    'E': 'fr\n', 
    'F': 'en\n' 
} 

df = pd.DataFrame(
    { 
     'id': list(data.keys()), 
     'lang': [val.rstrip() for val in data.values()], 
    } 
) 

print(df)

Output:

  id lang
0  B   de
1  A   fr
2  F   en
3  D   de
4  E   fr
5  C   fr

grouped = df.groupby('lang') 
print(grouped.size())

Output:

lang
de    2
en    1
fr    3
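For comparison, here is a standard-library analogue of df.groupby('lang').size() (not pandas itself): sort the stripped language codes, then use itertools.groupby to count each run of equal values. itertools.groupby only merges adjacent equal items, which is why the sort comes first.

```python
from itertools import groupby

data = {
    'A': 'fr\n', 'B': 'de\n', 'C': 'fr\n',
    'D': 'de\n', 'E': 'fr\n', 'F': 'en\n',
}

# Sort so equal languages sit next to each other; groupby only merges neighbours
langs = sorted(val.rstrip() for val in data.values())

# Length of each run of equal values, like grouped.size()
sizes = {lang: sum(1 for _ in group) for lang, group in groupby(langs)}
print(sizes)  # {'de': 2, 'en': 1, 'fr': 3}
```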


Plotting:

import collections as coll 
import matplotlib.pyplot as plt 
import numpy as np 
from operator import itemgetter 

data = { 
    'A': 'fr\n', 
    'B': 'de\n', 
    'C': 'fr\n', 
    'D': 'de\n', 
    'E': 'fr\n', 
    'F': 'en\n' 
} 

# Count languages, stripping the trailing newline left over from the file
counter = coll.Counter(
    [val.rstrip() for val in data.values()] 
) 

# Sort by count (ascending) and split into two parallel tuples
langs, lang_counts = zip(
    *sorted(counter.items(), key=itemgetter(1)) 
) 
total_langs = sum(lang_counts) 

# Convert counts to percentages of all reviews
bar_heights = np.array(lang_counts, dtype=float) / total_langs * 100 
x_coord_left_side_of_bars = np.arange(len(langs)) 
bar_width = 0.8 

plt.bar(
    x_coord_left_side_of_bars, 
    bar_heights, 
    bar_width, 
    align='edge',  # x values mark the left edge of each bar
) 

plt.xticks( 
    x_coord_left_side_of_bars + (bar_width * 0.5),  # center tick marks under the bars
    langs  # labels for tick marks
) 
plt.xlabel('review language') 
plt.ylabel('% of all reviews') 

#plt.show()  # Can use show() instead of savefig() until everything works correctly
plt.savefig('lang_plot.png')
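The percentage calculation can be kept separate from the plotting so it is easy to check on its own. A minimal sketch with a hypothetical helper, lang_percentages, that returns labels and percentages in ascending-count order, the same order sorted(..., key=itemgetter(1)) produces in the plotting code:

```python
from collections import Counter
from operator import itemgetter


def lang_percentages(counter):
    """Return (labels, percentages), sorted by ascending count."""
    total = sum(counter.values())
    pairs = sorted(counter.items(), key=itemgetter(1))
    labels = [lang for lang, _ in pairs]
    percents = [100.0 * count / total for _, count in pairs]
    return labels, percents


counter = Counter(['fr', 'de', 'fr', 'de', 'fr', 'en'])
labels, percents = lang_percentages(counter)
print(labels)                             # ['en', 'de', 'fr']
print([round(p, 1) for p in percents])    # [16.7, 33.3, 50.0]
```

In recent matplotlib versions (2.1 and later), string labels can be passed directly as x values, e.g. plt.bar(labels, percents), which avoids computing tick positions by hand.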

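Plot: (image not included)

2017-01-22 16:55:47 7stud

The collections.Counter dictionary method worked. I now have output that looks like a dictionary listing the number of instances of each language in descending order. The last step is to display this in a bar chart showing the percentage of reviews each language represents. I assume this is a matplotlib function, but I'm not clear on how to extract the data from the dictionary to create the chart. –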

@AndrewSmith, a Counter is unordered, which means you can't count on any particular ordering of the keys. See the matplotlib example at the bottom of my answer. – 7stud

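In your for k in data loop, each k is a string key (a review ID). Strings have no .get() method, and the 'b' in k['b'] is just a string literal; it has nothing to do with the variable b from your file-reading loop.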
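A quick way to see the cause of the AttributeError: iterating over a dict yields its keys, which here are plain strings, and str has no get method. The sample dict is made up.

```python
data = {'A': 'English', 'B': 'German'}

for k in data:
    print(repr(k))  # prints 'A', then 'B': the keys, not the values

print(hasattr('A', 'get'))   # False: str has no .get()
print(list(data.values()))   # ['English', 'German']: what Counter should receive
```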

If you want to count the values, pass the dictionary's values straight to Counter:

Counter(data.values())

You probably want to strip the newlines first:

for line in f: 
    review_id, lang = line.split(":") 
    RevDict[review_id] = lang.strip()
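Why the stripping matters: values read from a file keep their trailing newline unless it is removed, and the last line of a file often has none, so the same language can end up counted under two different keys. A small illustration with made-up values:

```python
from collections import Counter

raw = ['English\n', 'German\n', 'English']  # last line of a file often lacks '\n'

print(Counter(raw))                         # 'English\n' and 'English' are separate keys
print(Counter(val.strip() for val in raw))  # Counter({'English': 2, 'German': 1})
```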


2017-01-22 16:51:16
