Finding the mean of a variable across objects in Python

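How can I iterate over a collection of objects to find their means in the most efficient way? This uses only one loop (apart from any loops inside NumPy), but I would like to know whether there is a better way. At the moment I am doing this: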

import numpy

scores = []
ratings = []
negative_scores = []
positive_scores = []

# Single pass over the collection; zero scores go into neither sign list.
for t in text_collection:
    scores.append(t.score)
    ratings.append(t.rating)
    if t.score < 0:
        negative_scores.append(t.score)
    elif t.score > 0:
        positive_scores.append(t.score)

print("average score:", numpy.mean(scores))
print("average rating:", numpy.mean(ratings))
print("average negative score:", numpy.mean(negative_scores))
print("average positive score:", numpy.mean(positive_scores))

Is there a better way to do this?

Source

2012-06-03 Zach

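Do you need these lists of values, or do you only want the means? – tokland

If you want to reduce the number of passes over 'text_collection', you already have the best solution. If you also want to minimize memory requirements, you do not have the best solution. – moooeeeep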

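import numpy as np
# Build an (n, 2) array in one pass, then transpose to split the columns.
scores, ratings = np.array([(t.score, t.rating) for t in text_collection]).T

print('average score: ', np.mean(scores))
print('average rating: ', np.mean(ratings))
# Boolean masks select the positive and negative scores without extra loops.
print('average positive score: ', np.mean(scores[scores > 0]))
print('average negative score: ', np.mean(scores[scores < 0]))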

Edit:

To check whether there actually are any negative scores, you can do something like this:

if np.count_nonzero(scores < 0):
    print('average negative score: ', np.mean(scores[scores < 0]))
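
A minimal, self-contained sketch of the masking approach, for anyone who wants to try it without real data. The `Text` namedtuple, the sample values, and the helper name `masked_mean` are assumptions made for illustration and are not part of the original answer. The helper returns None when the mask selects nothing, which sidesteps the nan result you get from taking the mean of an empty array.

```python
from collections import namedtuple

import numpy as np

# Hypothetical stand-in for the objects in text_collection.
Text = namedtuple("Text", ["score", "rating"])

text_collection = [Text(2.0, 4), Text(0.0, 3), Text(4.0, 5)]

# One pass over the objects, then split the two columns into two arrays.
scores, ratings = np.array(
    [(t.score, t.rating) for t in text_collection], dtype=float
).T


def masked_mean(values, mask):
    """Return the mean of values[mask], or None if the mask selects nothing."""
    selected = values[mask]
    return selected.mean() if selected.size else None


avg_score = scores.mean()
avg_rating = ratings.mean()
avg_pos = masked_mean(scores, scores > 0)
avg_neg = masked_mean(scores, scores < 0)  # this sample has no negative scores

print("average score:", avg_score)
print("average rating:", avg_rating)
print("average positive score:", avg_pos)
print("average negative score:", avg_neg)
```

Returning None is only one possible convention; raising an exception or returning `float("nan")` explicitly would work just as well, depending on what the caller expects.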

Source

2012-06-03 21:21:43 user545424

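Thanks. Why do I get this error on the last line: 'average negative score: C:\Python27\lib\site-packages\numpy\core\fromnumeric.py:2374: RuntimeWarning: invalid value encountered in double_scalars return mean(axis, dtype, out) nan' – Zach

Yes, this means you have no negative scores, so the index selects nothing and you are taking the mean of an empty array (look up numpy's fancy indexing to see what is happening in the last two lines). I'm surprised it doesn't throw a helpful exception instead of returning nan. – user545424

Great, that fixed it. Thanks for the pointer. Reading up on fancy indexing now. – Zach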

Do you mind looping once for each value you want from the collection? It is slightly less efficient, but clearer:

import numpy

avg_score = numpy.mean([t.score for t in text_collection])
avg_rating = numpy.mean([t.rating for t in text_collection])
# The sign-specific means average t.score, filtered by its sign.
avg_neg_score = numpy.mean([t.score for t in text_collection if t.score < 0])
avg_pos_score = numpy.mean([t.score for t in text_collection if t.score > 0])

Source

2012-06-03 20:53:02 tokland

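You don't need the '[' and ']'; when it is the only argument, it becomes a generator expression with implied parentheses "(...)". – ninjagecko

@ninjagecko: numpy.mean needs a list. – tokland

Ah, apologies; apparently it does require "array-like" input. Looks like poor programming on numpy's part, though. – ninjagecko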
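
As a side note to this exchange: a generator expression can still be fed to NumPy without an intermediate Python list by going through `np.fromiter`, which consumes an iterable into a 1-D array. A small sketch follows; the `Text` namedtuple and the sample data are illustrative assumptions, not something from the thread.

```python
from collections import namedtuple

import numpy as np

# Hypothetical stand-in for the objects in text_collection.
Text = namedtuple("Text", ["score", "rating"])
text_collection = [Text(-1.0, 2), Text(3.0, 4), Text(5.0, 3)]

# np.fromiter needs an explicit dtype because it cannot inspect the
# elements before it starts filling the array.
scores = np.fromiter((t.score for t in text_collection), dtype=float)

avg_score = scores.mean()
avg_pos_score = scores[scores > 0].mean()

print("average score:", avg_score)
print("average positive score:", avg_pos_score)
```

Once the values are in an array, the boolean-mask filtering works the same way as with an array built from a list.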

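If you have NumPy available, I think it is your best option. It does exactly what you want, and it has a name that documents what you are doing.

If you want a pure-Python solution: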

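def mean(seq):
    # Count the items ourselves so that generators (which have no len()) work.
    count = 0
    total = 0.0  # float accumulator; also avoids shadowing the built-in sum()
    for x in seq:
        total += x
        count += 1
    if count == 0:
        raise ValueError("cannot take mean of zero-length sequence")
    return total / count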

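I wrote it to work with any sequence, including things like generator expressions that compute values on the fly. That is why it runs through the sequence only once and keeps its own counter, so that it knows how many items there were. If you are sure you only want to take the mean of a list: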

def list_mean(lst):
    if len(lst) == 0:
        raise ValueError("cannot take mean of zero-length list")
    return float(sum(lst)) / len(lst)

If you call this on an iterator or a generator expression, len() will not work and you will get a TypeError exception.
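
To make the difference concrete, here is a short sketch that repeats both functions in Python 3 syntax and calls them on a list and on a generator expression. The sample values and the variable names `from_generator`, `from_list`, and `generator_rejected` are made up for the demonstration.

```python
def mean(seq):
    """Mean of any iterable, consumed in a single pass."""
    count = 0
    total = 0.0
    for x in seq:
        total += x
        count += 1
    if count == 0:
        raise ValueError("cannot take mean of zero-length sequence")
    return total / count


def list_mean(lst):
    """Mean of a sized sequence such as a list."""
    if len(lst) == 0:
        raise ValueError("cannot take mean of zero-length list")
    return float(sum(lst)) / len(lst)


values = [1, 2, 3, 4]

# mean() accepts a generator expression because it never calls len().
from_generator = mean(x * 2 for x in values)
from_list = list_mean(values)

# list_mean() calls len(), which generators do not support.
try:
    list_mean(x * 2 for x in values)
except TypeError:
    generator_rejected = True
else:
    generator_rejected = False

print(from_generator, from_list, generator_rejected)
```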

Source

2012-06-03 21:03:52 steveha

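You can get avg_score from avg_neg_score and avg_pos_score with a simple operation (this holds as long as no score is exactly zero, since zero scores are in neither list):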

nneg = len(negative_scores) 
npos = len(positive_scores) 
avg_score = (avg_neg_score * nneg + avg_pos_score * npos)/(nneg + npos)
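
A quick numeric check of this identity, using made-up scores. The second half illustrates, under the same made-up data, why a score of exactly zero breaks it: the zero belongs to neither list, so the denominator nneg + npos no longer equals the total count.

```python
scores = [-4.0, -2.0, 1.0, 3.0, 7.0]  # illustrative values, no zeros

negative_scores = [s for s in scores if s < 0]
positive_scores = [s for s in scores if s > 0]

nneg = len(negative_scores)
npos = len(positive_scores)
avg_neg_score = sum(negative_scores) / nneg
avg_pos_score = sum(positive_scores) / npos

# Weighted combination of the two partial means.
combined = (avg_neg_score * nneg + avg_pos_score * npos) / (nneg + npos)
# Mean computed directly over all scores.
direct = sum(scores) / len(scores)

# Adding a zero changes the direct mean but not the combined value,
# because the zero is counted in neither nneg nor npos.
scores_with_zero = scores + [0.0]
direct_with_zero = sum(scores_with_zero) / len(scores_with_zero)

print(combined, direct, direct_with_zero)
```

If zero scores can occur, dividing by the total number of items instead of nneg + npos keeps the result correct.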

Edit: if you are building the lists by iterating over text_collection anyway, this will be more efficient (assuming you only want the means):

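n = len(text_collection)
# Float accumulators so the divisions below are never integer divisions.
(npos, sumpos) = (0, 0.0)
(nneg, sumneg) = (0, 0.0)
sumrating = 0.0
for t in text_collection:
    sumrating += t.rating
    if t.score < 0:
        sumneg += t.score
        nneg += 1
    elif t.score > 0:  # zero scores count toward n only
        sumpos += t.score
        npos += 1
avg_score = (sumneg + sumpos) / n
avg_neg_score = sumneg / nneg  # ZeroDivisionError if there are no negative scores
avg_pos_score = sumpos / npos  # ZeroDivisionError if there are no positive scores
avg_rating = sumrating / n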

EDIT2: fixed avg_neg_rating to avg_neg_score ...
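
For reference, one way to package the running-sums idea as a function that also copes with empty groups. This is only a sketch: the function name `summarize`, the inner helper `safe_div`, the `Text` namedtuple, and the choice to return None for an empty group are all assumptions added here, not part of the answer above.

```python
from collections import namedtuple

# Hypothetical stand-in for the objects in text_collection.
Text = namedtuple("Text", ["score", "rating"])


def summarize(text_collection):
    """Compute all four means in a single pass, keeping only running sums."""
    n = 0
    npos = nneg = 0
    sumpos = sumneg = sumrating = 0.0
    for t in text_collection:
        n += 1  # counted here so that generators work as input too
        sumrating += t.rating
        if t.score < 0:
            sumneg += t.score
            nneg += 1
        elif t.score > 0:
            sumpos += t.score
            npos += 1

    def safe_div(total, count):
        # None signals "no items in this group" instead of dividing by zero.
        return total / count if count else None

    return {
        "avg_score": safe_div(sumneg + sumpos, n),
        "avg_rating": safe_div(sumrating, n),
        "avg_neg_score": safe_div(sumneg, nneg),
        "avg_pos_score": safe_div(sumpos, npos),
    }


result = summarize([Text(2.0, 4), Text(0.0, 3), Text(4.0, 5)])
print(result)
```

Memory use stays constant regardless of the size of the collection, which is the point raised in the comments on the question.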

Source

2012-06-03 21:15:06 AlbertFerras
