Identifying consecutive occurrences of a value

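I have a DataFrame with a Count column of 0s and 1s. I want a new column that returns 1 if there are two or more consecutive occurrences of 1 in Count, and 0 if there are not. So each row in the new column gets a 1 only when that criterion is met in the Count column. My desired output would then be: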

Count New_Value 
1  0 
0  0 
1  1 
1  1 
0  0 
0  0 
1  1 
1  1 
1  1 
0  0

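I think I may need to use itertools, but I have been reading about it and have not come across what I need yet. I would like to be able to use this method for any number of consecutive occurrences, not just 2. For example, sometimes I need to detect 10 consecutive occurrences; I only use 2 in the example here.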

Source

2016-06-21 Stefano Potter

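Check whether `df['Count'] == df['Count'].shift(1)`; if so, then `1`, otherwise `0`. Then `.append()` these values (0 or 1) to an `array`, and set the first element (`array[0]`) to `0` (the default). After that you have to figure out how to merge/join/insert/concat the `array` into the `dataframe`. 100% untested, but I think it might work... :)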

I may have simplified my question too much. What if I want 3 consecutive occurrences? I don't think this works then.

You could do:

df['consecutive'] = df.Count.groupby((df.Count != df.Count.shift()).cumsum()).transform('size') * df.Count

to get:

Count consecutive 
0  1   1 
1  0   0 
2  1   2 
3  1   2 
4  0   0 
5  0   0 
6  1   3 
7  1   3 
8  1   3 
9  0   0
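To see why the grouping key works, it may help to look at the intermediate Series one at a time. The following is a minimal sketch: the `Count` data is rebuilt from the desired output in the question, and the names `changed`, `run_id`, `run_size` and `steps` are hypothetical, introduced only for illustration.

```python
import pandas as pd

# Example data, rebuilt from the Count column shown in the question.
df = pd.DataFrame({"Count": [1, 0, 1, 1, 0, 0, 1, 1, 1, 0]})

# True wherever a value differs from the previous row. The first row is
# always True because shift() leaves NaN there.
changed = df["Count"] != df["Count"].shift()

# The cumulative sum of the change flags gives each run of equal values its own id.
run_id = changed.cumsum()

# Length of the run each row belongs to, broadcast back onto every row.
run_size = df["Count"].groupby(run_id).transform("size")

# Multiplying by Count zeroes out the runs of 0s, leaving only run lengths of 1s.
consecutive = run_size * df["Count"]

steps = pd.DataFrame(
    {
        "Count": df["Count"],
        "changed": changed,
        "run_id": run_id,
        "run_size": run_size,
        "consecutive": consecutive,
    }
)
print(steps)
```

The multiplication step relies on `Count` holding only 0s and 1s; with other values it would scale the run length instead of masking it.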

From here you can, for any threshold:

threshold = 2 
df['consecutive'] = (df.consecutive >= threshold).astype(int)

to get:

Count consecutive 
0  1   0 
1  0   0 
2  1   1 
3  1   1 
4  0   0 
5  0   0 
6  1   1 
7  1   1 
8  1   1 
9  0   0

or, in a single step:

(df.Count.groupby((df.Count != df.Count.shift()).cumsum()).transform('size') * df.Count >= threshold).astype(int)
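Since the question asks for an arbitrary run length, the single-step expression could be wrapped in a small helper. This is only a sketch: `flag_runs` and its `value` parameter are hypothetical names, not part of pandas, and it compares against `value` explicitly instead of multiplying by `Count`, so it does not depend on the column being 0/1.

```python
import pandas as pd


def flag_runs(counts: pd.Series, threshold: int = 2, value: int = 1) -> pd.Series:
    """Return 1 for rows inside a run of `value` that is at least `threshold` long, else 0."""
    # Each run of equal consecutive values gets its own id.
    run_id = (counts != counts.shift()).cumsum()
    # Length of the run that each row belongs to.
    run_size = counts.groupby(run_id).transform("size")
    return ((counts == value) & (run_size >= threshold)).astype(int)


df = pd.DataFrame({"Count": [1, 0, 1, 1, 0, 0, 1, 1, 1, 0]})
df["New_Value"] = flag_runs(df["Count"], threshold=2)
print(df)

# Same data, but requiring at least 3 consecutive 1s.
at_least_three = flag_runs(df["Count"], threshold=3)
print(at_least_three.tolist())
```

With `threshold=10` the same call would cover the 10-in-a-row case mentioned in the question.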

In terms of efficiency, using pandas methods provides a significant speedup as the size of the problem grows:

df = pd.concat([df for _ in range(1000)]) 

%timeit (df.Count.groupby((df.Count != df.Count.shift()).cumsum()).transform('size') * df.Count >= threshold).astype(int) 
1000 loops, best of 3: 1.47 ms per loop

compared to:

%%timeit 
l = [] 
for k, g in groupby(df.Count): 
    size = sum(1 for _ in g) 
    if k == 1 and size >= 2: 
        l = l + [1]*size 
    else: 
        l = l + [0]*size 
pd.Series(l) 

10 loops, best of 3: 76.7 ms per loop

Source

2016-06-21 02:39:32 Stefan

Here is a one-liner: `df.assign(consecutive=df.Count.groupby((df.Count != df.Count.shift()).cumsum()).transform('size')).query('consecutive > @threshold')`. This will work for any consecutive values (not only 1s and 0s) – MaxU
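As a rough illustration of the comment above, the sketch below runs the one-liner on made-up string data (the letters are an assumption, not from the question). Note that `query` filters rows rather than adding a 0/1 column, and that the strict `>` keeps only runs longer than `threshold`.

```python
import pandas as pd

# Hypothetical non-binary data to show the run detection is value-agnostic.
df = pd.DataFrame({"Count": ["a", "a", "b", "a", "a", "a", "b", "b"]})
threshold = 2

# Attach the run length to every row, then keep rows whose run is longer than threshold.
runs = df.assign(
    consecutive=df.Count.groupby((df.Count != df.Count.shift()).cumsum()).transform("size")
).query("consecutive > @threshold")
print(runs)
```

Only the three-row run of "a" survives here; switching to `>=` would also keep the runs of length 2.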

Not sure if this is optimal, but you can give it a try:

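from itertools import groupby 
import pandas as pd 

l = [] 
# groupby yields one (value, iterator) pair per run of equal consecutive values 
for k, g in groupby(df.Count): 
    size = sum(1 for _ in g)  # length of the current run 
    if k == 1 and size >= 2: 
        l = l + [1]*size 
    else: 
        l = l + [0]*size 

# assumes df has the default 0..n-1 index, since assignment aligns on index 
df['new_Value'] = pd.Series(l) 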

df 

Count new_Value 
0 1 0 
1 0 0 
2 1 1 
3 1 1 
4 0 0 
5 0 0 
6 1 1 
7 1 1 
8 1 1 
9 0 0
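The loop above could also be packaged as a function with the run length as a parameter. The sketch below is one possible variant, not the answer's exact code: `flag_runs_itertools`, `threshold` and `value` are hypothetical names, `list.extend` replaces the repeated list concatenation, and the result is given `df.index` explicitly so the assignment does not rely on a default index.

```python
from itertools import groupby

import pandas as pd


def flag_runs_itertools(counts, threshold=2, value=1):
    """Return a list with 1 for items inside a run of `value` at least `threshold` long."""
    flags = []
    for key, group in groupby(counts):
        size = sum(1 for _ in group)  # length of the current run
        flag = 1 if key == value and size >= threshold else 0
        flags.extend([flag] * size)  # append in place instead of rebuilding the list
    return flags


df = pd.DataFrame({"Count": [1, 0, 1, 1, 0, 0, 1, 1, 1, 0]})
df["new_Value"] = pd.Series(flag_runs_itertools(df["Count"], threshold=2), index=df.index)
print(df)
```

Because the function only iterates, it accepts any iterable, for example `flag_runs_itertools([1, 1, 1, 0, 1], threshold=3)`.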

Source

2016-06-21 02:32:12 Psidom
