Pandas DataFrame time series: dropping duplicate rows

I am trying to update a temperature time series by combining two CSV files, which may sometimes contain duplicate rows.

I tried to use drop_duplicates, but it is not working for me.

Here is an example of what I am trying to do:

import pandas as pd 
import numpy as np 

from pandas import DataFrame, Series 


dfA = DataFrame({'date' : Series(['1/1/10','1/2/10','1/3/10','1/4/10'], index=[0,1,2,3]), 
    'a' : Series([60,57,56,50], index=[0,1,2,3]), 
    'b' : Series([80,73,76,56], index=[0,1,2,3])}) 

print("dfA")  
print(dfA) 

dfB = DataFrame({'date' : Series(['1/3/10','1/4/10','1/5/10','1/6/10'], index=[0,1,2,3]), 
    'a' : Series([56,50,59,75], index=[0,1,2,3]), 
    'b' : Series([76,56,73,89], index=[0,1,2,3])}) 

print("dfB") 
print(dfB) 

dfC = dfA.append(dfB) 

print(dfC.duplicated()) 

dfC.drop_duplicates() 
print("dfC") 
print(dfC)

This is the output:

dfA 
    a b date 
0 60 80 1/1/10 
1 57 73 1/2/10 
2 56 76 1/3/10 
3 50 56 1/4/10 
dfB 
    a b date 
0 56 76 1/3/10 
1 50 56 1/4/10 
2 59 73 1/5/10 
3 75 89 1/6/10 
0 False 
1 False 
2 False 
3 False 
0  True 
1  True 
2 False 
3 False 
dtype: bool 
dfC 
    a b date 
0 60 80 1/1/10 
1 57 73 1/2/10 
2 56 76 1/3/10 
3 50 56 1/4/10 
0 56 76 1/3/10 
1 50 56 1/4/10 
2 59 73 1/5/10 
3 75 89 1/6/10

How can I update the time series with overlapping data without ending up with duplicates?

2014-09-18 Bill G.

Hey Bill: check this out: http://stackoverflow.com/questions/13035764/remove-rows-with-duplicate-indices-pandas-dataframe-and-timeseries – 2014-09-18 18:36:30

Rather than saying "it's not working for me", it would help to describe *why* it isn't working. Do you get an exception, bad results, or no response? – skrrgwasme 2014-09-18 18:39:33

The line dfC.drop_duplicates() does not actually change the DataFrame bound to dfC (it just returns a copy of it without the duplicate rows).
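A minimal sketch of that behaviour, assuming a recent pandas version. The small frame and the names df and result are made up for illustration; the values are borrowed from the overlapping rows in the question.

```python
import pandas as pd

# Illustrative frame containing one exact duplicate row.
df = pd.DataFrame({"date": ["1/3/10", "1/3/10", "1/4/10"],
                   "a": [56, 56, 50],
                   "b": [76, 76, 56]})

# drop_duplicates() builds and returns a new DataFrame.
result = df.drop_duplicates()

print(len(df))      # the original frame still holds every row
print(len(result))  # the returned frame has the duplicate removed
```

If the return value is discarded, as in the question, the de-duplicated frame is simply thrown away.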

You can have the DataFrame bound to dfC modified in place by passing the inplace keyword argument,

dfC.drop_duplicates(inplace=True)

or rebind the name dfC to the de-duplicated DataFrame that the call returns, like this

dfC = dfC.drop_duplicates()
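Putting this together for the frames in the question, one possible end-to-end sketch follows. It makes a few assumptions that are not in the original post: pd.concat is used because DataFrame.append was removed in pandas 2.0, the date strings are read as month/day/two-digit-year, and a repeated date is treated as a duplicate reading even if the temperature columns differ (drop subset="date" to compare whole rows instead).

```python
import pandas as pd

dfA = pd.DataFrame({"date": ["1/1/10", "1/2/10", "1/3/10", "1/4/10"],
                    "a": [60, 57, 56, 50],
                    "b": [80, 73, 76, 56]})
dfB = pd.DataFrame({"date": ["1/3/10", "1/4/10", "1/5/10", "1/6/10"],
                    "a": [56, 50, 59, 75],
                    "b": [76, 56, 73, 89]})

# Stack the two frames; ignore_index=True avoids repeated 0..3 labels.
dfC = pd.concat([dfA, dfB], ignore_index=True)

# Assumption: dates are month/day/two-digit-year.
dfC["date"] = pd.to_datetime(dfC["date"], format="%m/%d/%y")

# Rebind dfC to the result: keep the first row seen for each date,
# then index by date so the frame behaves as a time series.
dfC = (dfC.drop_duplicates(subset="date", keep="first")
          .set_index("date")
          .sort_index())

print(dfC)
```

Using keep="last" instead would let the newer file override overlapping rows from the older one, which may suit an update workflow better.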

2014-09-18 18:34:14

Of course. So simple. This now removes the duplicate rows from the merged CSV files. Thank you very much. Bill – 2014-09-23 21:21:59

@BillG. Glad it was helpful! By the way, if the answer solved the problem, you can let the community know by [accepting the answer](http://meta.stackexchange.com/questions/5234/how-does-accepting-an-answer-work/5235#5235). – 2014-10-04 11:53:37
