pandas - merging two DataFrames with different numbers of rows

I have the following two DataFrames:

df:

   value 
period 
2000-01-01 100 
2000-04-01 200 
2000-07-01 300 
2000-10-01 400 
2001-01-01 500

df1:

   value 
period 
2000-07-01 350 
2000-10-01 450 
2001-01-01 550 
2001-04-01 600 
2001-07-01 700

This is the desired output:

df:

   value 
period 
2000-01-01 100 
2000-04-01 200 
2000-07-01 350 
2000-10-01 450 
2001-01-01 550 
2001-04-01 600 
2001-07-01 700

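I have called set_index(['period']) on both df1 and df2. I also tried several things after creating a new column, including concat and where statements, but none of them worked as expected. My first DataFrame is the primary one. The second one is the update. It should replace the corresponding values in the first one and, at the same time, add new records if there are any.

How can I do this?

Source

2017-05-08 sretko

It looks like a simple concatenation. Can you elaborate on "didn't work as expected"? –

This doesn't work: 'pd.concat([df, df1], axis=0)' – sretko

@AlIvon feel free to upvote the accepted answer and any other answers you find useful. – piRSquared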

You can use combine_first. In addition, if the dtype of either index is object, convert it with to_datetime. This works nicely if df1.index is always in df.index:

print (df.index.dtype) 
object 

print (df1.index.dtype) 
object 

df.index = pd.to_datetime(df.index) 
df1.index = pd.to_datetime(df1.index) 

df = df1.combine_first(df) 
# combine_first can upcast the values to float; cast back to int if necessary
# df = df1.combine_first(df).astype(int)
print (df) 
      value 
period   
2000-01-01 100.0 
2000-04-01 200.0 
2000-07-01 350.0 
2000-10-01 450.0 
2001-01-01 550.0 
2001-04-01 600.0 
2001-07-01 700.0
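
For anyone who wants to run this end to end, here is a minimal, self-contained sketch. The two frames are rebuilt by hand from the question, and the name merged is only illustrative, not something from the original post.

```python
import pandas as pd

# Rebuild the two frames from the question; the index starts out as plain strings.
df = pd.DataFrame(
    {"value": [100, 200, 300, 400, 500]},
    index=pd.Index(
        ["2000-01-01", "2000-04-01", "2000-07-01", "2000-10-01", "2001-01-01"],
        name="period",
    ),
)
df1 = pd.DataFrame(
    {"value": [350, 450, 550, 600, 700]},
    index=pd.Index(
        ["2000-07-01", "2000-10-01", "2001-01-01", "2001-04-01", "2001-07-01"],
        name="period",
    ),
)

# Give both indexes the same datetime dtype before combining.
df.index = pd.to_datetime(df.index)
df1.index = pd.to_datetime(df1.index)

# Values from df1 take priority; rows that exist only in df fill the gaps.
merged = df1.combine_first(df).astype(int)
print(merged)
```

Calling combine_first on df1 (the update) rather than on df is what makes the updated values win wherever both frames share a date.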

If not, it is necessary to filter by intersection first:

df = df1.loc[df1.index.intersection(df.index)].combine_first(df)
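
It may help to see what the intersection filter does on the question's data. This is a rough sketch with the frames rebuilt by hand, and the name updated_only is my own. As far as I can tell, the filter keeps only the df1 rows whose dates already exist in df, so existing rows are updated but rows found only in df1 are not added.

```python
import pandas as pd

# Frames rebuilt from the question, with a datetime index on both.
df = pd.DataFrame(
    {"value": [100, 200, 300, 400, 500]},
    index=pd.DatetimeIndex(
        ["2000-01-01", "2000-04-01", "2000-07-01", "2000-10-01", "2001-01-01"],
        name="period",
    ),
)
df1 = pd.DataFrame(
    {"value": [350, 450, 550, 600, 700]},
    index=pd.DatetimeIndex(
        ["2000-07-01", "2000-10-01", "2001-01-01", "2001-04-01", "2001-07-01"],
        name="period",
    ),
)

# Keep only the df1 rows whose dates are already present in df ...
common = df1.index.intersection(df.index)

# ... then let those rows override df. Dates found only in df1 are left out.
updated_only = df1.loc[common].combine_first(df).astype(int)
print(updated_only)
```

So if new records from df1 should also appear, as in the desired output of the question, the unfiltered df1.combine_first(df) looks like the variant to use.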

Another solution with numpy.setdiff1d and concat:

import numpy as np

df = pd.concat([df.loc[np.setdiff1d(df.index, df1.index)], df1])
print (df) 
      value 
period   
2000-01-01 100 
2000-04-01 200 
2000-07-01 350 
2000-10-01 450 
2001-01-01 550 
2001-04-01 600 
2001-07-01 700

Source

2017-05-08 20:56:34 jezrael

'combine_first' did the job. Thank you. – sretko

Glad I could help! Have a nice day! – jezrael

Is this what you want?

In [151]: pd.concat([df1, df.loc[df.index.difference(df1.index)]]).sort_index() 
Out[151]: 
      value 
period 
2000-01-01 100 
2000-04-01 200 
2000-07-01 350 
2000-10-01 450 
2001-01-01 550 
2001-04-01 600 
2001-07-01 700

PS: make sure that both indexes have the same dtype. It is better to convert them to the datetime dtype using the pd.to_datetime() method.
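
A small sketch of that check, under the assumption that one index holds datetime.date objects and the other holds strings (which is what the TypeError in the comments suggests). The frames left and right are made up for illustration.

```python
import datetime

import pandas as pd

# Hypothetical setup: one index holds datetime.date objects, the other strings.
left = pd.DataFrame(
    {"value": [100, 200]},
    index=[datetime.date(2000, 1, 1), datetime.date(2000, 4, 1)],
)
right = pd.DataFrame({"value": [350]}, index=["2000-07-01"])

# Inspect the dtypes first; neither of them is a datetime dtype yet.
print(left.index.dtype, right.index.dtype)

# Convert both so that the labels can be compared and sorted together.
left.index = pd.to_datetime(left.index)
right.index = pd.to_datetime(right.index)

combined = pd.concat([left, right]).sort_index()
print(combined)
```

Once both indexes are DatetimeIndex objects, sort_index() no longer has to compare a date with a string.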

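Source

2017-05-08 20:49:34 MaxU

'TypeError: unorderable types: datetime.date() > str()'. When I remove '.sort_index()', the final result is incomplete: 2001-07-01 is missing. – sretko

@AlIvon, one of your indexes has 'object' dtype, hence this error – MaxU

That is correct. Let me try to fix it. Thank you. – sretko

Another option, using append and dropping the duplicated index labels: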

d1 = df1.append(df) 
d1[~d1.index.duplicated()] 

      value 
period   
2000-07-01 350 
2000-10-01 450 
2001-01-01 550 
2001-04-01 600 
2001-07-01 700 
2000-01-01 100 
2000-04-01 200

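Source

2017-05-08 21:43:13 piRSquared

I used pd.concat() to concatenate the two DataFrames and then dropped the duplicates to get the result. Note that this assumes period is a regular column rather than the index.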

df_con = pd.concat([df, df1]) 
df_con.drop_duplicates(subset="period",keep="last",inplace=True) 
print(df_con) 

     period value 
0 2000-01-01 100 
1 2000-04-01 200 
0 2000-07-01 350 
1 2000-10-01 450 
2 2001-01-01 550 
3 2001-04-01 600 
4 2001-07-01 700

To make "period" the index again, just call set_index:

print(df_con.set_index("period")) 

      value 
period   
2000-01-01 100 
2000-04-01 200 
2000-07-01 350 
2000-10-01 450 
2001-01-01 550 
2001-04-01 600 
2001-07-01 700
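
If period is already the index, as in the question, drop_duplicates(subset="period") has no column to look at. One possible way around that, sketched here on frames rebuilt from the question (the names combined and result are mine), is to move the index into a column with reset_index first:

```python
import pandas as pd

# Frames rebuilt from the question, with "period" as the index.
df = pd.DataFrame(
    {"value": [100, 200, 300, 400, 500]},
    index=pd.DatetimeIndex(
        ["2000-01-01", "2000-04-01", "2000-07-01", "2000-10-01", "2001-01-01"],
        name="period",
    ),
)
df1 = pd.DataFrame(
    {"value": [350, 450, 550, 600, 700]},
    index=pd.DatetimeIndex(
        ["2000-07-01", "2000-10-01", "2001-01-01", "2001-04-01", "2001-07-01"],
        name="period",
    ),
)

# Turn "period" into a regular column so drop_duplicates can use it.
combined = pd.concat([df.reset_index(), df1.reset_index()], ignore_index=True)

# keep="last" keeps the df1 row whenever a date appears in both frames.
combined = combined.drop_duplicates(subset="period", keep="last")

# Restore "period" as the index and put the dates in order.
result = combined.set_index("period").sort_index()
print(result)
```

Because df1 comes second in the concat, keep="last" is what makes the update win.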

Source

2017-05-08 22:22:13
