2015-09-15 158 views
2

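Flatten a pandas DataFrame

I have a pandas DataFrame that looks like this (it currently has no index other than the built-in row index, but if it is easier to add an index on 'Person' and 'Car', that is fine too):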

import pandas as pd

before = pd.DataFrame({
    'Email': ['[email protected]','[email protected]','[email protected]','[email protected]','[email protected]'],
    'Person': ['John','Mary','Jane','John','Mary'],
    'Car': ['Ford','Toyota','Nissan','Nissan','Ford']
})

I would like to reshape it to look like this:

after = pd.DataFrame({ 
    'Person': ['John','Mary','Jane'], 
    'Email': ['[email protected]','[email protected]','[email protected]'], 
    'Ford': [True,True,False], 
    'Nissan': [True,False,True], 
    'Toyota': [False,True,False] 
}) 

Note that John has owned both a Ford and a Nissan, Mary has owned a Ford and a Toyota, and Jane has stuck with her trusty Nissan.

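I have tried various permutations of stacking a multi-index DataFrame, grouping, and pivoting. I can't seem to figure out how to take the values from the 'Car' column, transpose them into new columns holding the value True, and merge the rows for each person by name.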

Answers

1

Not sure whether this is the best way to do it, but one approach is:

In [26]: before.pivot_table(index=['Email','Person'],columns=['Car'], aggfunc=lambda x: True).fillna(False).reset_index() 
Out[26]: 
Car             Email Person   Ford Nissan Toyota
0    [email protected]   Jane  False   True  False
1    [email protected]   John   True   True  False
2    [email protected]   Mary   True  False   True
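
For comparison, here is a minimal alternative sketch (not part of the answer above) that reaches the same shape with `pd.crosstab`: count each (Email, Person) and Car combination, then turn the counts into booleans. The email addresses are obscured in the post, so the `example.com` addresses below are placeholders I made up for illustration.

```python
import pandas as pd

# Placeholder addresses: the real ones are obscured in the post.
before = pd.DataFrame({
    'Email': ['john@example.com', 'mary@example.com', 'jane@example.com',
              'john@example.com', 'mary@example.com'],
    'Person': ['John', 'Mary', 'Jane', 'John', 'Mary'],
    'Car': ['Ford', 'Toyota', 'Nissan', 'Nissan', 'Ford'],
})

# Count how often each (Email, Person) pair appears with each car.
counts = pd.crosstab([before['Email'], before['Person']], before['Car'])

# A count above zero means the person has owned that car.
flat = counts.gt(0).reset_index()
flat.columns.name = None  # drop the leftover 'Car' label on the column axis

print(flat)
```

Like the one-liner, this needs no helper column. Rows come out sorted by Email, so the order may differ from the `after` frame in the question.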

Accepted, because I'm a sucker for elegant one-liners and it avoids a throwaway column. Thanks for the prompt reply. :) – Dustin

1
before['has_car'] = True 

Out[93]: 
Car              Email Person has_car
Ford    [email protected]   John    True
Toyota  [email protected]   Mary    True
Nissan  [email protected]   Jane    True
Nissan  [email protected]   John    True
Ford    [email protected]   Mary    True

df = before.pivot_table(index=['Person', 'Email'], columns='Car', values='has_car')


Out[89]: 
                       Ford Nissan Toyota
Person Email
Jane   [email protected]   NaN   True    NaN
John   [email protected]  True   True    NaN
Mary   [email protected]  True    NaN   True

df.fillna(False).reset_index() 

Out[102]: 
Car Person           Email   Ford Nissan Toyota
0     Jane  [email protected]  False   True  False
1     John  [email protected]   True   True  False
2     Mary  [email protected]   True  False   True
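
The three steps above can be collected into one runnable sketch. It is a variation rather than the exact code of this answer: it assumes `aggfunc='any'` and `fill_value=False` to keep the cells boolean instead of relying on the default aggregation followed by `fillna`, and it uses `assign` so the original frame is not modified. The `example.com` addresses are placeholders, since the real ones are obscured in the post.

```python
import pandas as pd

# Placeholder addresses: the real ones are obscured in the post.
before = pd.DataFrame({
    'Email': ['john@example.com', 'mary@example.com', 'jane@example.com',
              'john@example.com', 'mary@example.com'],
    'Person': ['John', 'Mary', 'Jane', 'John', 'Mary'],
    'Car': ['Ford', 'Toyota', 'Nissan', 'Nissan', 'Ford'],
})

# Step 1: mark every existing (person, car) row, on a copy of the frame.
marked = before.assign(has_car=True)

# Step 2: pivot the cars into columns; missing pairs are filled with False.
wide = marked.pivot_table(index=['Person', 'Email'], columns='Car',
                          values='has_car', aggfunc='any', fill_value=False)

# Step 3: make the boolean dtype explicit and move the index back to columns.
flat = wide.astype(bool).reset_index()
flat.columns.name = None  # drop the leftover 'Car' label on the column axis

print(flat)
```

Because `assign` returns a new frame, `before` keeps its original three columns and no `has_car` column is left behind.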

The step-by-step walkthrough really helps with understanding, thanks! – Dustin