2017-08-25 122 views

Converting a DataFrame to tuples

I have a DataFrame that looks like this:

user        item \ 
0 b80344d063b5ccb3212f76538f3d9e43d87dca9e   The Cove - Jack Johnson 
1 b80344d063b5ccb3212f76538f3d9e43d87dca9e Entre Dos Aguas - Paco De Lucia 
2 b80344d063b5ccb3212f76538f3d9e43d87dca9e   Stronger - Kanye West 
3 b80344d063b5ccb3212f76538f3d9e43d87dca9e Constellations - Jack Johnson 
4 b80344d063b5ccb3212f76538f3d9e43d87dca9e  Learn To Fly - Foo Fighters 

rating 
0  1 
1  2 
2  1 
3  1 
4  1 

I would like to turn it into a dictionary of lists with the following structure:

dict-> list of tuples 
user-> (item, rating) 

b80344d063b5ccb3212f76538f3d9e43d87dca9e -> list((The Cove - Jack Johnson, 1), ...)

I can do this:

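# Map each user to the set of items they rated.
# No backslash is needed: the line break sits inside the parentheses.
item_set = dict((user, set(items)) for user, items in
                data.groupby('user')['item'])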

But this only gets me halfway. How do I get the corresponding `rating` values out of the groupby?

Answer


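Set `user` as the index, convert each row to a tuple using df.apply, group by the index using df.groupby(level=0), collect each group into a list using dfGroupBy.agg, and convert the result to a dictionary using df.to_dict: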

In [1417]: df 
Out[1417]: 
             user        item \ 
0 b80344d063b5ccb3212f76538f3d9e43d87dca9e   The Cove - Jack Johnson 
1 b80344d063b5ccb3212f76538f3d9e43d87dca9e Entre Dos Aguas - Paco De Lucia 
2 b80344d063b5ccb3212f76538f3d9e43d87dca9e   Stronger - Kanye West 
3 b80344d063b5ccb3212f76538f3d9e43d87dca9e Constellations - Jack Johnson 
4 b80344d063b5ccb3212f76538f3d9e43d87dca9e  Learn To Fly - Foo Fighters 

    rating 
0  1 
1  2 
2  2 
3  2 
4  2 

In [1418]: df.set_index('user').apply(tuple, 1)\ 
      .groupby(level=0).agg(lambda x: list(x.values))\ 
      .to_dict() 
Out[1418]: 
{'b80344d063b5ccb3212f76538f3d9e43d87dca9e': [('The Cove - Jack Johnson', 1), 
    ('Entre Dos Aguas - Paco De Lucia', 2), 
    ('Stronger - Kanye West', 2), 
    ('Constellations - Jack Johnson', 2), 
    ('Learn To Fly - Foo Fighters', 2)]} 
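
For comparison, here is a minimal self-contained sketch of a different route to the same `user -> [(item, rating), ...]` structure: iterate over the groups and zip the two columns together. This is not the method used in the answer above, just an alternative. It assumes the column names `user`, `item` and `rating` from the question; the names `USER` and `user_items` are made up for illustration, and the sample rows are copied from the answer's frame.

```python
import pandas as pd

# Sample data copied from the answer's DataFrame.
# USER is only a shorthand for the long user hash.
USER = "b80344d063b5ccb3212f76538f3d9e43d87dca9e"
df = pd.DataFrame({
    "user": [USER] * 5,
    "item": [
        "The Cove - Jack Johnson",
        "Entre Dos Aguas - Paco De Lucia",
        "Stronger - Kanye West",
        "Constellations - Jack Johnson",
        "Learn To Fly - Foo Fighters",
    ],
    "rating": [1, 2, 2, 2, 2],
})

# For each user, pair every item with the rating on the same row.
user_items = {
    user: list(zip(group["item"], group["rating"]))
    for user, group in df.groupby("user")
}

print(user_items)
```

Because each group is a small DataFrame, both columns are available at once, which is the piece missing from selecting only `['item']` after the groupby.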

Exactly what I was after. Thanks.


@OktayGardener No problem. In a few minutes you can [accept my answer](https://stackoverflow.com/help/someone-answers) if you like.