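How do I subtract the rows of one pandas DataFrame from another?

The operation I want is similar to a merge. For example, with an inner merge we get a DataFrame containing the rows that are present in both the first AND the second DataFrame. With an outer merge we get a DataFrame containing the rows that are present in the first OR the second DataFrame.

What I need is a DataFrame containing the rows that are present in the first DataFrame and NOT present in the second one. Is there a fast and elegant way to do this?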


2014-04-25 Roman

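how='left'? Surely that is not what you want (given your SO score, it must be more complicated than that).

A left or right merge gives me a DataFrame containing the rows that are present in one of the DataFrames. But I need a DataFrame containing the rows that are present in one DataFrame and not present in the other. – Roman

If there is only a single merge key, you can do it with 'isin' and '~'.

How about the following?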

print(df1)

    Team Year foo 
0 Hawks 2001 5 
1 Hawks 2004 4 
2 Nets 1987 3 
3 Nets 1988 6 
4 Nets 2001 8 
5 Nets 2000 10 
6 Heat 2004 6 
7 Pacers 2003 12 

print(df2)

    Team Year foo 
0 Pacers 2003 12 
1 Heat 2004 6 
2 Nets 1988 6

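As long as there is a non-key column with the same name in both frames, you can let the added suffixes do the work (if there is no common non-key column, you can create a temporary one, e.g. df1['common'] = 1 and df2['common'] = 1):

new = df1.merge(df2, on=['Team', 'Year'], how='left')
print(new[new.foo_y.isnull()])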

    Team Year foo_x foo_y 
0 Hawks 2001  5 NaN 
1 Hawks 2004  4 NaN 
2 Nets 1987  3 NaN 
4 Nets 2001  8 NaN 
5 Nets 2000  10 NaN
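
On newer pandas versions, the same filter can be written with the `indicator` option of `merge` instead of relying on the suffixed columns. The following is a minimal sketch, not part of the original answer: the frames are rebuilt from the sample data printed above, and the names `keys`, `merged`, and `only_in_df1` are illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({
    "Team": ["Hawks", "Hawks", "Nets", "Nets", "Nets", "Nets", "Heat", "Pacers"],
    "Year": [2001, 2004, 1987, 1988, 2001, 2000, 2004, 2003],
    "foo": [5, 4, 3, 6, 8, 10, 6, 12],
})
df2 = pd.DataFrame({
    "Team": ["Pacers", "Heat", "Nets"],
    "Year": [2003, 2004, 1988],
    "foo": [12, 6, 6],
})

keys = ["Team", "Year"]

# Merge against only the key columns of df2, so no foo_x/foo_y suffixes appear.
merged = df1.merge(df2[keys].drop_duplicates(), on=keys, how="left", indicator=True)

# The "_merge" column is "left_only" for rows that found no partner in df2.
only_in_df1 = merged[merged["_merge"] == "left_only"].drop(columns="_merge")
print(only_in_df1)
```

Because the match is recorded in `_merge` rather than inferred from a data column, this variant does not need a shared non-key column.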

Or you can use isin, but you would have to create a single key:

df1['key'] = df1['Team'] + df1['Year'].astype(str)
df2['key'] = df2['Team'] + df2['Year'].astype(str)
print(df1[~df1.key.isin(df2.key)])

    Team Year foo   key 
0 Hawks 2001 5 Hawks2001 
1 Hawks 2004 4 Hawks2004 
2 Nets 1987 3 Nets1987 
4 Nets 2001 8 Nets2001 
5 Nets 2000 10 Nets2000
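
A concatenated string key can collide when different column values happen to join into the same text. One way around that, sketched here as an assumption rather than as the answerer's method, is to compare the key columns as tuples through a `MultiIndex`. The names `left_keys`, `right_keys`, and `result` are illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({
    "Team": ["Hawks", "Hawks", "Nets", "Nets", "Nets", "Nets", "Heat", "Pacers"],
    "Year": [2001, 2004, 1987, 1988, 2001, 2000, 2004, 2003],
    "foo": [5, 4, 3, 6, 8, 10, 6, 12],
})
df2 = pd.DataFrame({
    "Team": ["Pacers", "Heat", "Nets"],
    "Year": [2003, 2004, 1988],
    "foo": [12, 6, 6],
})

keys = ["Team", "Year"]

# Build (Team, Year) tuples for each frame without adding a helper column.
left_keys = pd.MultiIndex.from_frame(df1[keys])
right_keys = pd.MultiIndex.from_frame(df2[keys])

# isin returns a boolean array aligned with the rows of df1.
result = df1[~left_keys.isin(right_keys)]
print(result)
```

This leaves both frames unmodified, since no 'key' column is written back to them.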


2014-04-25 05:52:44

You can run into errors if your non-index columns have cells with NaN.

print(df1)

    Team Year foo 
0 Hawks 2001 5 
1 Hawks 2004 4 
2 Nets 1987 3 
3 Nets 1988 6 
4 Nets 2001 8 
5 Nets 2000 10 
6 Heat 2004 6 
7 Pacers 2003 12 
8 Problem 2112 NaN 


print(df2)

    Team Year foo 
0 Pacers 2003 12 
1 Heat 2004 6 
2 Nets 1988 6 
3 Problem 2112 NaN 

new = df1.merge(df2, on=['Team', 'Year'], how='left')
print(new[new.foo_y.isnull()])

    Team Year foo_x foo_y 
0 Hawks 2001  5 NaN 
1 Hawks 2004  4 NaN 
2 Nets 1987  3 NaN 
4 Nets 2001  8 NaN 
5 Nets 2000  10 NaN 
6 Problem 2112 NaN NaN

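The Problem team in 2112 has no value for foo in either table. So the left join here wrongly returns that row, which is matched in both DataFrames, as if it were not present in the right DataFrame.

Solution:

What I do is add a unique column to the inner DataFrame and set a value for all of its rows. Then, when you join, you can check whether that column is NaN for the inner table in order to find the records that are unique to the outer table.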

df2['in_df2'] = 'yes'

print(df2)

    Team Year foo in_df2 
0 Pacers 2003 12  yes 
1 Heat 2004 6  yes 
2 Nets 1988 6  yes 
3 Problem 2112 NaN  yes 


new = df1.merge(df2, on=['Team', 'Year'], how='left')
print(new[new.in_df2.isnull()])

    Team Year foo_x foo_y in_df2 
0 Hawks 2001  5 NaN  NaN 
1 Hawks 2004  4 NaN  NaN 
2 Nets 1987  3 NaN  NaN 
4 Nets 2001  8 NaN  NaN 
5 Nets 2000  10 NaN  NaN

NB. The Problem row is now correctly filtered out, because it has a value for in_df2.

Problem 2112 NaN NaN  yes
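
To make the difference concrete, here is a minimal runnable sketch on a trimmed-down version of the data (three rows instead of nine, chosen only for illustration). It compares the filter on foo_y with the filter on the marker column. Using `assign` to add the marker without modifying df2 in place is a choice made for this sketch, and the names `naive` and `robust` are illustrative.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({
    "Team": ["Hawks", "Nets", "Problem"],
    "Year": [2001, 1988, 2112],
    "foo": [5.0, 6.0, np.nan],
})
df2 = pd.DataFrame({
    "Team": ["Nets", "Problem"],
    "Year": [1988, 2112],
    "foo": [6.0, np.nan],
})

# Add the marker column on a copy of df2, then left-join on the keys.
merged = df1.merge(df2.assign(in_df2="yes"), on=["Team", "Year"], how="left")

# Filtering on foo_y also keeps the Problem row, whose foo is NaN in df2.
naive = merged[merged["foo_y"].isnull()]

# Filtering on the marker keeps only rows with no match in df2.
robust = merged[merged["in_df2"].isnull()]

print(naive["Team"].tolist())
print(robust["Team"].tolist())
```

The marker column is never NaN in df2 itself, so a NaN after the join can only mean that the row had no match.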


2014-10-31 16:07:01 RockyRollinghills

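Very nice, this worked well for me. – Dirk

Consider the following:

df_one is the first DataFrame
df_two is the second DataFrame

Rows present in the first DataFrame and not in the second DataFrame

Solution, by index: df = df_one[~df_one.index.isin(df_two.index)]

index can be replaced by whichever column you want to base the exclusion on. In the example above, I used the index as the reference between the two DataFrames.

Additionally, you can build a more complex query with a boolean pandas.Series to solve the above.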
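
The three variants described above (by index, by column, and combined with a further boolean condition) might look like the following sketch. The sample frames, the index labels, and the `foo > 3` condition are made up for illustration and do not come from the original answer.

```python
import pandas as pd

# Hypothetical sample data; the index labels act as the shared reference.
df_one = pd.DataFrame(
    {"Team": ["Hawks", "Nets", "Heat"], "foo": [5, 3, 6]},
    index=[10, 11, 12],
)
df_two = pd.DataFrame({"Team": ["Heat"], "foo": [6]}, index=[12])

# Exclusion by index: keep rows whose index label is absent from df_two.
by_index = df_one[~df_one.index.isin(df_two.index)]

# Exclusion by column: the same idea, using the Team column as the reference.
by_column = df_one[~df_one["Team"].isin(df_two["Team"])]

# A more complex query: combine the exclusion mask with another boolean Series.
combined = df_one[~df_one["Team"].isin(df_two["Team"]) & (df_one["foo"] > 3)]

print(by_index)
print(by_column)
print(combined)
```

The parentheses around the comparison matter, because `&` binds more tightly than `>` in Python.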


2016-08-04 10:26:00
