Majority element and its average in a list of tuples

Suppose the list of tuples below represents sentiment estimates from 3 different methods:

[('pos', 0.2), ('neu', 0.1), ('pos', 0.4)]

I would like to know the most efficient way to find the majority sentiment and compute its average, i.e.:

result=('pos', 0.3)

Thanks

Source

2017-07-14 Nicholas

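Can you use NumPy or pandas? –

In what way do you want it to be efficient? Efficient in CPU time, memory, or developer time? – skyking

CPU time. These sentiments come from thousands of API calls per second. Thanks – Nicholas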

import collections 

reports = [('pos', 0.2), ('neu', 0.1), ('pos', 0.4)] 

oracle = collections.defaultdict(list) 
for mood, score in reports: 
    oracle[mood].append(score) 

counts = {mood: len(scores) for mood, scores in oracle.items()} 

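mood = max(counts) # gives `'pos'` here, but note this compares the keys alphabetically, not the counts 

sum(oracle[mood])/len(oracle[mood]) # gives 0.30000000000000004, i.e. 0.3 up to floating-point rounding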

Source

2017-07-14 11:26:44

Try changing 'neu' to 'zeu'. That breaks it. –

Thanks - this is very comprehensive – Nicholas
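For reference, a minimal sketch of one way to handle the 'zeu' case raised in the comment above. `max(counts)` compares the dictionary keys as strings, so giving `max` a `key` function makes it compare group sizes instead. The helper name `majority_average` is hypothetical, and resolving ties by first-seen order is an assumption, since the question does not say what should happen on a tie.

```python
import collections


def majority_average(reports):
    """Return (most frequent label, mean score of that label)."""
    # Group the scores by label in a single pass.
    scores_by_label = collections.defaultdict(list)
    for label, score in reports:
        scores_by_label[label].append(score)

    # key=... makes max() compare group sizes rather than the label strings.
    # On a tie, max() keeps the first label encountered (dict insertion order).
    label = max(scores_by_label, key=lambda k: len(scores_by_label[k]))
    scores = scores_by_label[label]
    return label, sum(scores) / len(scores)


reports = [('pos', 0.2), ('zeu', 0.1), ('pos', 0.4)]
label, average = majority_average(reports)
print(label, round(average, 2))  # pos 0.3
```

With the label renamed to 'zeu', the result is still 'pos', because the comparison no longer depends on alphabetical order.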

import itertools 

l = [('pos', 0.2), ('neu', 0.1), ('pos', 0.4)]

You can group by sentiment first (note that the items need to be sorted first)

sentiments = [list(j[1]) for j in itertools.groupby(sorted(l), lambda i: i[0])] 
# sentiments = [[('neu', 0.1)], [('pos', 0.2), ('pos', 0.4)]]

Then find which sentiment is the most common (i.e. has the longest group)

majority = max(sentiments, key=len) 
# majority = [('pos', 0.2), ('pos', 0.4)]

Then finally compute the average

values = [i[1] for i in majority] 
average = (majority[0][0], sum(values)/len(values)) 
# average = ('pos', 0.30000000000000004)

Source

2017-07-14 11:29:41 CoryKramer

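Using 'l' as a variable name is the leading cause of cancer. –

Thank you - I was wondering whether the solution in [this answer](https://stackoverflow.com/questions/31212260/group-and-compute-the-average-in-list-of-tuples) would be overkill for this case, but apparently not. – Nicholas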

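It is better to use a dictionary. Define a nested dictionary in which the key is the sentiment name and the value is a dictionary with two entries: 'numbers', a list of that sentiment's values, and 'count', the number of times the sentiment occurs. For example: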

sentiment = {'pos': {'numbers': [0.2, 0.4], 'count': 2}, 'neu': {'numbers': [0.1], 'count': 1}} 
# so that, for example: 
# sentiment['pos']['numbers'] == [0.2, 0.4] 
# sentiment['pos']['count'] == 2
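As a rough sketch, the nested dictionary described above could be filled from the question's list and then queried for the majority. The function names `build_sentiment_table` and `majority_from_table` are hypothetical, and the separate 'count' entry simply follows the layout in this answer (it always equals the length of 'numbers').

```python
def build_sentiment_table(reports):
    """Map each label to {'numbers': [scores...], 'count': n}."""
    table = {}
    for label, score in reports:
        # Create the entry on first sight of a label, then update it.
        entry = table.setdefault(label, {'numbers': [], 'count': 0})
        entry['numbers'].append(score)
        entry['count'] += 1
    return table


def majority_from_table(table):
    """Return (label with the highest count, mean of its scores)."""
    label = max(table, key=lambda k: table[k]['count'])
    entry = table[label]
    return label, sum(entry['numbers']) / entry['count']


reports = [('pos', 0.2), ('neu', 0.1), ('pos', 0.4)]
sentiment = build_sentiment_table(reports)
print(sentiment['pos']['count'])  # 2
print(majority_from_table(sentiment)[0])  # pos
```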

Source

2017-07-14 11:30:29 pooya

Using the collections and statistics modules, you can do this:

from collections import Counter 
from statistics import mean 

lst = [('pos', 0.2), ('neu', 0.1), ('pos', 0.4)] 
count = Counter(item[0] for item in lst) # Counter({'pos': 2, 'neu': 1}) 
maj = count.most_common(1)[0][0]   # pos 
mn = mean(item[1] for item in lst if item[0] == maj) 
result = (maj, mn) 

print(result) # ('pos', 0.30000000000000004)

That said, given the efficiency you are looking for, I prefer CoryKramer's answer.

Source

2017-07-14 11:31:01

Thanks for your answer and the pointers – Nicholas

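my_tuple_list = [('pos', 0.2), ('neu', 0.1), ('pos', 0.4)]  # sample input from the question 

# Sort by score, highest first. 
sorted_tuples = sorted(my_tuple_list, key=lambda x: x[-1], reverse=True) 

# Note: this takes the label of the highest-scoring tuple, which is not 
# necessarily the most frequent label. 
majority_sentiment = sorted_tuples[0][0] 
majority_sentiment_score = 0 
num_items = 0 

# Sum and count the scores that carry that label. 
for sentiment_tup in sorted_tuples: 
    if sentiment_tup[0] == majority_sentiment: 
        majority_sentiment_score += sentiment_tup[1] 
        num_items += 1 

avg_sentiment_score = majority_sentiment_score / num_items 

result = (majority_sentiment, avg_sentiment_score)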

That should do it.

Source

2017-07-14 11:52:13

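This only finds the majority item and does not compute its average. Also, 'sorted(my_tuple_list, key = lambda x : x[-1], reverse = True)[1]' returns another 'pos' element – Nicholas

I misunderstood the question. Will edit. –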
