How do I count the occurrences of sets contained in a list in Python?

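I am trying to implement the Apriori algorithm and have reached the point where I can extract the subsets that occur together across all transactions. How can I count the occurrences of the sets contained in the resulting list?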

This is what I have:

subsets = [set(['Breakfast & Brunch', 'Restaurants']), set(['American (Traditional)', 'Breakfast & Brunch']), set(['American (Traditional)', 'Restaurants']), set(['American (Traditional)', 'Breakfast & Brunch']), set(['Breakfast & Brunch', 'Restaurants']), set(['American (Traditional)', 'Restaurants'])]

For example, set(['Breakfast & Brunch', 'Restaurants']) occurs twice, and I need to keep track of each set together with its number of occurrences.

I tried using:

from collections import Counter 

support_set = Counter() 
# some code that generated the list above 

support_set.update(subsets)

But it raises this error:

    supported = itemsets_support(transactions, candidates)
  File "apriori.py", line 77, in itemsets_support
    support_set.update(subsets)
  File "/usr/local/Cellar/python/2.7.12/Frameworks/Python.framework/Versions/2.7/lib/python2.7/collections.py", line 567, in update
    self[elem] = self_get(elem, 0) + 1
TypeError: unhashable type: 'set'

Any ideas?

2017-02-13 flamenco

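What you are implementing is probably not Apriori, but a naive and inefficient approximation of the "frequent itemsets" idea. Benchmark it against ELKI or R's 'arules' package on some larger data sets. Putting everything into a `Counter` will not scale. Try a supermarket data set. –

It is part of Apriori. Whether it scales or not is beside the point of this question; it is not being built for production! – flamenco

No, it is not. Apriori is about doing this efficiently, not inefficiently. If you ignore the efficiency aspect, it is no longer Apriori. –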
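
To make the efficiency point in these comments concrete: Apriori avoids counting every possible subset by pruning candidates whose smaller subsets are not already frequent. The following is only a minimal sketch of that pruning step, assuming itemsets are stored as `frozenset` objects; `generate_candidates` is a hypothetical helper, not code from the question.

```python
from itertools import combinations


def generate_candidates(frequent_itemsets, k):
    """Build k-item candidates from frequent (k-1)-itemsets, then prune.

    Hypothetical helper for illustration; assumes every element of
    frequent_itemsets is a frozenset of size k - 1.
    """
    frequent = set(frequent_itemsets)
    candidates = set()
    for a in frequent:
        for b in frequent:
            union = a | b
            if len(union) != k:
                continue
            # Apriori property: a k-itemset can only be frequent if
            # every one of its (k-1)-subsets is frequent.
            if all(frozenset(sub) in frequent
                   for sub in combinations(union, k - 1)):
                candidates.add(union)
    return candidates


frequent_pairs = [
    frozenset(['A', 'B']),
    frozenset(['A', 'C']),
    frozenset(['B', 'C']),
    frozenset(['A', 'D']),
]
triples = generate_candidates(frequent_pairs, 3)
```

Here only the triple A, B, C survives, because the triples containing D each have a pair that is not in the frequent list, so they never need to be counted.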

You can convert the sets to frozenset instances, which are hashable:

>>> from collections import Counter 
>>> subsets = [set(['Breakfast & Brunch', 'Restaurants']), set(['American (Traditional)', 'Breakfast & Brunch']), set(['American (Traditional)', 'Restaurants']), set(['American (Traditional)', 'Breakfast & Brunch']), set(['Breakfast & Brunch', 'Restaurants']), set(['American (Traditional)', 'Restaurants'])] 
>>> c = Counter(frozenset(s) for s in subsets) 
>>> c 
Counter({frozenset(['American (Traditional)', 'Restaurants']): 2, frozenset(['Breakfast & Brunch', 'Restaurants']): 2, frozenset(['American (Traditional)', 'Breakfast & Brunch']): 2})
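
The session above is from Python 2.7. As a rough Python 3 equivalent, the sketch below uses set literals and shows how the counts might be read back out; the `min_support` threshold and the `frequent` list are assumptions added for illustration and are not part of the question.

```python
from collections import Counter

subsets = [
    {'Breakfast & Brunch', 'Restaurants'},
    {'American (Traditional)', 'Breakfast & Brunch'},
    {'American (Traditional)', 'Restaurants'},
    {'American (Traditional)', 'Breakfast & Brunch'},
    {'Breakfast & Brunch', 'Restaurants'},
    {'American (Traditional)', 'Restaurants'},
]

# A set is mutable and therefore unhashable; a frozenset is immutable,
# so it can be used as a dictionary (and Counter) key.
support = Counter(frozenset(s) for s in subsets)

# Look up the count for one itemset; element order does not matter.
pair = frozenset(['Restaurants', 'Breakfast & Brunch'])
pair_count = support[pair]

# Hypothetical threshold: keep itemsets seen at least min_support times,
# converted back to ordinary sets.
min_support = 2
frequent = [set(s) for s, n in support.items() if n >= min_support]
```

Because a `frozenset` compares equal to a `set` with the same elements, the keys can be converted back with `set(...)` wherever a mutable set is needed.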

2017-02-13 03:44:09 niemmi

This solved my problem. Cheers! – flamenco
