2016-06-08 54 views

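Top 3 recommended items in a column (pandas, Python)

I am new to pandas and have been given a task: for each product, find the three other products most often viewed together with it in the same session.

The DataFrame viewed.products looks like this: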

session    products 
00b3a43caf4209d2/10 1536 
00b3a43caf4209d2/10 42 
00b3a43caf4209d2/10 395 
00b3a43caf4209d2/10 590 
00b3a43caf4209d2/10 2031 
00b3a43caf4209d2/11 1309 
00b3a43caf4209d2/11 1879 
00b3a43caf4209d2/11 1309 
00b3a43caf4209d2/11 1879 
00b3a43caf4209d2/5 73 
00b3a43caf4209d2/5 147 
00b3a43caf4209d2/5 585 
00b3a43caf4209d2/5 774 
00b3a43caf4209d2/5 781 
00b3a43caf4209d2/5 1384 
00b3a43caf4209d2/5 1463 
00b3a43caf4209d2/6 73 
00b3a43caf4209d2/6 156 
00b3a43caf4209d2/6 1669 
00b3a43caf4209d2/6 52 
00b3a43caf4209d2/6 73 
00b3a43caf4209d2/6 156 

The desired output looks like this (for example):

product recommended_products 
1536 42 
     73 
     2031 
42  73 
     1309 
     156 
395  781 
     585 
     1536 
590  147 
     42 
     781 

I assume there must be an aggregate function that groups them, but I cannot figure out which one.


I don't see the pattern by which 'recommended_products' is selected. – Lafexlos

Answer


I think you can first merge products with itself on session, and then use groupby with nlargest if you need the top 3 values:

import pandas as pd

print (df)
        products 
session      
00b3a43caf4209d2/10  1536 
00b3a43caf4209d2/10  42 
00b3a43caf4209d2/10  395 
00b3a43caf4209d2/10  590 
00b3a43caf4209d2/10  2031 
00b3a43caf4209d2/11  1309 
00b3a43caf4209d2/11  1879 
00b3a43caf4209d2/11  1309 
00b3a43caf4209d2/11  1879 
00b3a43caf4209d2/5   73 
00b3a43caf4209d2/5  147 
00b3a43caf4209d2/5  585 
00b3a43caf4209d2/5  774 
00b3a43caf4209d2/5  781 
00b3a43caf4209d2/5  1384 
00b3a43caf4209d2/5  1463 
00b3a43caf4209d2/6   73 
00b3a43caf4209d2/6  156 
00b3a43caf4209d2/6  1669 
00b3a43caf4209d2/6   52 
00b3a43caf4209d2/6   73 
00b3a43caf4209d2/6  156 
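# If 'session' is the index, turn it back into a regular column
df.reset_index(inplace=True)

# Self-merge on session: pairs every product with every product viewed in the same session
df = pd.merge(df[['products', 'session']],
              df[['products', 'session']],
              on='session',
              suffixes=('', '_recommended'))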

print (df) 
    products    session products_recommended 
0  1536 00b3a43caf4209d2/10     1536 
1  1536 00b3a43caf4209d2/10     42 
2  1536 00b3a43caf4209d2/10     395 
3  1536 00b3a43caf4209d2/10     590 
4  1536 00b3a43caf4209d2/10     2031 
5   42 00b3a43caf4209d2/10     1536 
6   42 00b3a43caf4209d2/10     42 
7   42 00b3a43caf4209d2/10     395 
8   42 00b3a43caf4209d2/10     590 
9   42 00b3a43caf4209d2/10     2031 
10  395 00b3a43caf4209d2/10     1536 
11  395 00b3a43caf4209d2/10     42 
12  395 00b3a43caf4209d2/10     395 
13  395 00b3a43caf4209d2/10     590 
14  395 00b3a43caf4209d2/10     2031 
... 
... 
print (df.groupby(['session','products'])['products_recommended'] 
     .nlargest(3) 
     .reset_index() 
     .drop('level_2', axis=1)) 

       session products products_recommended 
0 00b3a43caf4209d2/10  42     2031 
1 00b3a43caf4209d2/10  42     1536 
2 00b3a43caf4209d2/10  42     590 
3 00b3a43caf4209d2/10  395     2031 
4 00b3a43caf4209d2/10  395     1536 
5 00b3a43caf4209d2/10  395     590 
6 00b3a43caf4209d2/10  590     2031 
7 00b3a43caf4209d2/10  590     1536 
8 00b3a43caf4209d2/10  590     590 
9 00b3a43caf4209d2/10  1536     2031 
10 00b3a43caf4209d2/10  1536     1536 
11 00b3a43caf4209d2/10  1536     590 
12 00b3a43caf4209d2/10  2031     2031 
13 00b3a43caf4209d2/10  2031     1536 
... 
...
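
Note that nlargest(3) on products_recommended returns the three largest product IDs in each group, and the self-merge also pairs every product with itself (see the 590 / 590 row above). If "top 3" is meant as the three products that most often share a session, one possible variation is to count the pairs after the merge and rank by that count. The following is only a sketch under that assumption: the helper name top_coviewed, the column name n_sessions, and the toy data are made up for illustration and are not part of the original answer.

```python
import pandas as pd


def top_coviewed(views: pd.DataFrame, n: int = 3) -> pd.DataFrame:
    """For each product, return up to n other products sharing the most sessions with it."""
    # Count a product at most once per session
    unique_views = views[["session", "products"]].drop_duplicates()

    # Self-merge on session: every pair of products seen in the same session
    pairs = unique_views.merge(unique_views, on="session",
                               suffixes=("", "_recommended"))

    # Drop the pairs of a product with itself
    pairs = pairs[pairs["products"] != pairs["products_recommended"]]

    # Number of sessions in which each pair occurs
    counts = (pairs.groupby(["products", "products_recommended"])
                   .size()
                   .reset_index(name="n_sessions"))

    # Highest counts first; ties are broken by the smaller product ID
    counts = counts.sort_values(
        ["products", "n_sessions", "products_recommended"],
        ascending=[True, False, True],
    )

    # Keep the first n rows of each product
    return counts.groupby("products").head(n).reset_index(drop=True)


# Toy data (made up), in the same layout as the question's DataFrame
views = pd.DataFrame({
    "session": ["a", "a", "a", "b", "b", "c", "c", "c", "d", "d"],
    "products": [1, 2, 3, 1, 2, 1, 2, 4, 1, 3],
})

result = top_coviewed(views)
print(result)
```

With many tied counts, as in the sample from the question, the choice among tied products is arbitrary; the sketch simply prefers the smaller product ID.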