2014-02-22 161 views

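Python: merging/joining two DataFrames

I want to merge/join two DataFrames, each of which has three keys (Age, Gender, and Signed_In). Both DataFrames come from the same parent and were created with groupby, but each has its own value columns.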

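Since the two DataFrames share the same unique key combinations, the merge/join should be painless. With that in mind I tried both 'merge' and 'join', but I can't for the life of me get either to work.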

import pandas as pd

times = pd.read_csv('nytimes.csv')

# Produces times_mean table consisting of two value columns, avg_impressions and avg_clicks
times_mean = times.groupby(['Age', 'Gender', 'Signed_In']).mean()
times_mean.columns = ['avg_impressions', 'avg_clicks']

# Produces times_max table consisting of two value columns, max_impressions and max_clicks
times_max = times.groupby(['Age', 'Gender', 'Signed_In']).max()
times_max.columns = ['max_impressions', 'max_clicks']

# The two attempts below are intended to produce a combined table with four value columns
times_join = times_mean.join(times_max, on=['Age', 'Gender', 'Signed_In'])
times_join2 = pd.merge(times_mean, times_max, on=['Age', 'Gender', 'Signed_In'])
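
We can't test this without 'nytimes.csv'. My guess is that, since 'Age', 'Gender', and 'Signed_In' are the index, you don't need the 'on' argument in the call to 'join'.

You should also say what error you get.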

Appreciate the notes. This is my first post, and I definitely should have included the original file. – jamesbev

Answer

Joining on equivalently structured MultiIndexes works as is.

Here is an example showing that you don't need the on kwarg in this case:

import numpy as np
import pandas

# Two value arrays that share the same two-level index
a = np.random.normal(size=10)
b = a + 10
index = pandas.MultiIndex.from_product([['A', 'B'], list('abcde')])

df_a = pandas.DataFrame(a, index=index, columns=['colA'])
df_b = pandas.DataFrame(b, index=index, columns=['colB'])

# No on kwarg: join aligns the two frames on their index
df_a.join(df_b)

This gives me:

         colA       colB
A a -1.525376   8.474624
  b  0.778333  10.778333
  c  1.153172  11.153172
  d  0.966560  10.966560
  e  0.089765  10.089765
B a  0.717717  10.717717
  b  0.305545  10.305545
  c  0.123548  10.123548
  d -1.018660   8.981340
  e -0.635103   9.364897
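
Applied to the question's case, the same idea might look like the sketch below. Since 'nytimes.csv' isn't available, the small `times` frame here is made-up data, and the 'Impressions' and 'Clicks' column names are assumptions about what the file contains. The point is only that groupby turns the three keys into index levels, so `join` with no `on` argument (or `pd.merge` with `left_index=True, right_index=True`) lines the two tables up.

```python
import pandas as pd

# Hypothetical stand-in for nytimes.csv; the real file is not shown in the
# question, so the values and the Impressions/Clicks column names are assumed.
times = pd.DataFrame({
    'Age': [25, 25, 40, 40, 25, 40],
    'Gender': [0, 0, 1, 1, 1, 0],
    'Signed_In': [1, 1, 1, 1, 0, 0],
    'Impressions': [3, 5, 2, 8, 4, 6],
    'Clicks': [0, 1, 0, 2, 1, 0],
})

keys = ['Age', 'Gender', 'Signed_In']

# groupby moves the key columns into a three-level MultiIndex
times_mean = times.groupby(keys).mean()
times_mean.columns = ['avg_impressions', 'avg_clicks']

times_max = times.groupby(keys).max()
times_max.columns = ['max_impressions', 'max_clicks']

# Both frames share the same index, so join needs no on argument
times_join = times_mean.join(times_max)

# The merge equivalent: tell merge to use the index on both sides
times_merge = pd.merge(times_mean, times_max, left_index=True, right_index=True)

print(times_join)
```

If you would rather have Age, Gender, and Signed_In back as ordinary columns, calling `reset_index()` on the result does that.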

Thanks, that solved it. Also, I hadn't come across MultiIndex before. Cheers. – jamesbev