pandas: get the entries of Series B that are also in Series A, and fill the entries that exist only in Series A with 0

This is an unusual kind of join/combine, and I don't know what it is called, so feel free to correct my terminology.

For example, I have a Series `profile` as follows:

In [1]: profile = pd.Series(data=[0.8,0.64,0.51,0.5,0.5], index=['google.com','facebook.com','twitter.com', 'instagram.com', 'github.com']) 

In [2]: profile 
Out[2]: 
google.com  0.80 
facebook.com  0.64 
twitter.com  0.51 
instagram.com 0.50 
github.com  0.50 
dtype: float64

And I have a Series `transaction` as follows:

In [3]: transaction = pd.Series(data=[1,1,1,1], index=['twitter.com','facebook.com','instagram.com','9gag.com']) 

In [4]: transaction 
Out[4]: 
twitter.com  1 
facebook.com  1 
instagram.com 1 
9gag.com   1 
dtype: int64

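What I want to achieve is a Series `window` built by comparing `profile` and `transaction`: if an index label in `transaction` also exists in `profile`, we take that label and its corresponding value from `transaction`. The remaining labels, the ones that exist only in `profile`, get a fill value of 0: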

In [5]: window 
Out[5]: 
google.com  0 
facebook.com  1 
twitter.com  1 
instagram.com 1 
github.com  0 
dtype: int64

Is there an existing built-in method or function that can do this?

I have tried:

window = transaction[transaction.keys().isin(profile.keys())]

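But it only returns the intersection of `transaction` and `profile`. I came across the `combine()` function on Series, but I don't know what to pass as the `func` argument (`isin()` doesn't work).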


2016-02-17 anobilisgorse

As of pandas version 0.17.0, you can reindex the Series.

>>> transaction.reindex(profile.index).fillna(0) 
google.com  0 
facebook.com  1 
twitter.com  1 
instagram.com 1 
github.com  0 
dtype: float64

This also seems to be slightly faster than using `loc`, but I haven't tested it on larger data.

%timeit transaction.reindex(profile.index).fillna(0) 
1000 loops, best of 3: 224 µs per loop 

%timeit transaction.loc[profile.index].fillna(0) 
1000 loops, best of 3: 329 µs per loop
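As a variation on the `reindex` call above, here is a minimal sketch that passes `fill_value=0` to `reindex` instead of chaining `fillna(0)`. It rebuilds the `profile` and `transaction` Series from the question; the expectation that the result keeps an integer dtype (because no NaN is ever introduced) is based on how `reindex` is understood to behave, so treat it as an assumption to verify on your pandas version.

```python
import pandas as pd

# Same data as in the question.
profile = pd.Series(
    data=[0.8, 0.64, 0.51, 0.5, 0.5],
    index=['google.com', 'facebook.com', 'twitter.com',
           'instagram.com', 'github.com'],
)
transaction = pd.Series(
    data=[1, 1, 1, 1],
    index=['twitter.com', 'facebook.com', 'instagram.com', '9gag.com'],
)

# Align transaction to profile's index. Labels that transaction lacks
# (google.com, github.com) are filled with 0 directly, and labels that
# only transaction has (9gag.com) are dropped.
window = transaction.reindex(profile.index, fill_value=0)
print(window)
```

Because the fill happens during reindexing, there is no intermediate NaN step, which is why the values are not expected to be upcast to float here.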


2016-02-17 03:59:30 Alexander

This is better. It doesn't raise a KeyError if none of the labels in `profile` are in `transaction`. – anobilisgorse
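To illustrate the point about key errors, here is a small sketch contrasting `reindex` with `.loc` on the data from the question. It assumes a recent pandas release (1.0 or later), where `.loc` with a list of labels is expected to raise `KeyError` as soon as any label is missing, not only when all of them are; the `loc_raised` flag is a hypothetical name used just for this demonstration.

```python
import pandas as pd

profile = pd.Series(
    data=[0.8, 0.64, 0.51, 0.5, 0.5],
    index=['google.com', 'facebook.com', 'twitter.com',
           'instagram.com', 'github.com'],
)
transaction = pd.Series(
    data=[1, 1, 1, 1],
    index=['twitter.com', 'facebook.com', 'instagram.com', '9gag.com'],
)

# reindex tolerates labels that transaction lacks and yields NaN for them,
# which fillna(0) then replaces.
window = transaction.reindex(profile.index).fillna(0)

# .loc with the same labels: on the assumed pandas version this raises,
# because google.com and github.com are not in transaction's index.
try:
    transaction.loc[profile.index]
    loc_raised = False
except KeyError:
    loc_raised = True

print(window)
print("loc raised KeyError:", loc_raised)
```

On older pandas versions, such as the one used in the timings above, `.loc` may instead return NaN for the missing labels, so the flag can differ there.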
