Pandas: index of the last non-equal row

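I have a pandas DataFrame F with a sorted index I. I am interested in the last change in one of its columns, say A. Specifically, I want to construct a Series with the same index as F, namely I, whose value at i is j, where j is the largest index value less than i such that F[A][j] != F[A][i]. For example, consider the following frame: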

The desired Series would be:

1 NaN 
2 NaN 
3 2 
4 3 
5 3

What is the idiomatic way to construct this Series in pandas/numpy?
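To pin down the definition, here is a minimal brute-force sketch that follows it literally: for each row i, scan backwards for the nearest earlier row j whose value differs. The function name `last_unequal_index_bruteforce` is hypothetical, and the frame is an assumption: it reuses the values [5, 5, 6, 2, 2] from one of the answers with an index of 1 to 5, which happens to reproduce the desired Series above.

```python
import numpy as np
import pandas as pd


def last_unequal_index_bruteforce(F, col="A"):
    """For each row i, return the label of the nearest earlier row j
    with F[col][j] != F[col][i], or NaN if no such row exists."""
    values = F[col].to_numpy()
    labels = F.index
    out = []
    for i in range(len(values)):
        j_label = np.nan
        # Walk backwards from the row just before i.
        for j in range(i - 1, -1, -1):
            if values[j] != values[i]:
                j_label = labels[j]
                break
        out.append(j_label)
    return pd.Series(out, index=labels)


# Assumed example frame (not the one from the original post).
F = pd.DataFrame({"A": [5, 5, 6, 2, 2]}, index=[1, 2, 3, 4, 5])
expected = last_unequal_index_bruteforce(F)
print(expected)
```

This is quadratic in the worst case, so it is only meant as a reference to check faster solutions against.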

2015-10-22 pythonic metaphor

This is really confusing. For example, what is the "current row"? – ako

Edited for clarity. –

Huh??? The description is still very confusing. – Alexander

Try this:

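import numpy as np
import pandas as pd

# Assumes df already exists and has a column 'A'.
# Positions are used for the comparison, so any index type works.
a = df['A'].to_numpy()
last = np.nan  # label of the last row that differed; NaN until a change is seen
result = []
for pos in range(len(a)):
    # When the value changes, the previous row is the last row that differs.
    if pos > 0 and a[pos] != a[pos - 1]:
        last = df.index[pos - 1]
    result.append(last)
# Assign the whole column at once instead of chained .iloc assignment.
df['B'] = result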

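This creates a new column holding the result. I believe it is not a good idea to modify rows while iterating over them; afterwards you can simply replace the column and drop the other one, if you like.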

2015-10-22 02:04:54

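I was hoping there is a way to avoid looping in Python. pandas or numpy functions are usually much faster. –

I don't think it would be more efficient. Also, when you want to loop while keeping track of the index, you may need to use a plain old 'for'. I could be wrong, though. –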

np.argmax or pd.Series.argmax applied to Boolean data can help you find the first (or, in this case, the last) True value. You still need to loop in this solution, though.

import numpy as np
import pandas as pd

# Initialize the source data
F = pd.DataFrame({'A': [5, 5, 6, 2, 2]}, index=list('fobni'))

# Initialize the resulting Series to NaN
# (object dtype, because it will hold index labels)
result = pd.Series(np.nan, index=F.index, dtype=object)

for i in range(1, len(F)):
    value_at_i = F['A'].iloc[i]
    values_before_i = F['A'].iloc[:i]
    # Get differences as a Boolean Series
    # (keeping the original index)
    diffs = (values_before_i != value_at_i)
    if not diffs.any():
        continue
    # Reverse the Series of differences,
    # then find the label of the first True value.
    # In current pandas, idxmax returns the label,
    # while argmax returns an integer position.
    j = diffs[::-1].idxmax()
    result.iloc[i] = j
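For comparison, here is a loop-free sketch of the same idea; it is not taken from either answer. It relies on one observation: if a row differs from the row just before it, that previous row is the answer; otherwise the answer is the same as for the previous row, which a forward fill provides. The function name `last_unequal_index` is hypothetical, and the sketch assumes column A contains no NaN values (NaN compares unequal to itself, so every NaN would count as a change).

```python
import numpy as np
import pandas as pd


def last_unequal_index(s: pd.Series) -> pd.Series:
    """Label of the nearest earlier row whose value differs from row i."""
    # True where the value differs from the row just before it.
    changed = s.ne(s.shift())
    # Label of the previous row, aligned on the original index
    # (NaN for the first row, which has no previous row).
    prev_label = pd.Series(s.index, index=s.index).shift()
    # Keep the previous label only where a change happened,
    # then carry it forward over runs of equal values.
    return prev_label.where(changed).ffill()


# Assumed example data: the values from the answer above, indexed 1 to 5.
F = pd.DataFrame({"A": [5, 5, 6, 2, 2]}, index=[1, 2, 3, 4, 5])
vectorized = last_unequal_index(F["A"])
print(vectorized)
```

Because shifting introduces NaN, an integer index comes back as floats; cast afterwards if integer labels are needed.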

2016-03-30 16:58:07 NTAWolf
