2017-10-14

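Counting the frequency of combinations in a DataFrame column - Apriori algorithm

I have a problem and need a correct way to count how often each combination occurs.

This is my code: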

import pandas as pd 
import itertools 

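items = [1, 20, 1, 50]  # named `items` to avoid shadowing the built-in `list`

# Build every 2-element combination of the items
combinations = list(itertools.combinations(items, 2))

data = pd.DataFrame({'products': combinations})

# Count how many times each exact tuple occurs
data['frequency'] = data.groupby('products')['products'].transform('count')

print(data)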

The output is:

   products  frequency
0   (1, 20)          1
1    (1, 1)          1
2   (1, 50)          2
3   (20, 1)          1
4  (20, 50)          1
5   (1, 50)          2

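The problem is that (1, 20) and (20, 1) each get a frequency of 1, but they are the same combination and should have a frequency of 2. Is there a correct way to do this?

Answer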


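You can do this with apply and a lambda. Sorting each tuple before grouping makes (1, 20) and (20, 1) map to the same key: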

import pandas as pd 
import itertools 

items = [1, 20, 1, 50]  # named `items` to avoid shadowing the built-in `list`

combinations = list(itertools.combinations(items, 2))

data = pd.DataFrame({'products': combinations})

# Group by the sorted tuple so that (1, 20) and (20, 1) share one key
data['frequency'] = data.groupby(
    data['products'].apply(lambda i: tuple(sorted(i)))
)['products'].transform('count')

print(data)

By grouping on the transformed column, the output will be:

   products  frequency
0   (1, 20)          2
1    (1, 1)          1
2   (1, 50)          2
3   (20, 1)          2
4  (20, 50)          1
5   (1, 50)          2
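
If you only need one count per distinct combination rather than a frequency on every row, the same sort-then-count idea can be sketched without pandas. This is a minimal sketch, not part of the answer above; the names `items` and `pair_counts` are illustrative, and it assumes order within a pair does not matter.

```python
from collections import Counter
import itertools

items = [1, 20, 1, 50]

# Sort each pair so that (20, 1) and (1, 20) become the same key,
# then let Counter tally how often each normalised pair appears.
pair_counts = Counter(
    tuple(sorted(pair)) for pair in itertools.combinations(items, 2)
)

for pair, count in pair_counts.items():
    print(pair, count)
```

Here `pair_counts[(1, 20)]` is 2, which matches the frequency that the groupby version assigns to both the (1, 20) and the (20, 1) rows.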