2017-09-12

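How to find the longest consecutive string of values in a pandas DataFrame

I am looking to find the longest run of zeros in my pandas df. I have a df with 10 columns of 25,000 rows each; every cell holds a null, a zero, or a non-zero number. I want to compute: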

1. The length of the longest run of consecutive zeros 
     in each column, for all the columns. 
2. The length of the longest run of consecutive zeros 
     AND nulls in each column, for all the columns. 

For example, if the first column is:

[col1:1,2,4,5,6,2,3,0,0,0,0,1,2,... (remaining all numbers)] 

the result would be 4.

Thanks


What have you tried? – Netwave

Answers


Setup

Consider the dataframe df

import numpy as np
import pandas as pd

df = pd.DataFrame(dict(
    col0=[1, 2, 3, 0, 0, 0, 0, 1, 2, 3, 4, 0, 0, 0, 1, 2, 0, 0, 0, 0, 0, 0, 9],
    col1=[1, 2, 3, 0, 0, 4, 0, 1, 2, 3, 4, 0, 0, 0, 1, 2, 0, 0, 2, 0, 4, 8, 9]
))

Solution

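def max_zeros(c):
    # True where the value is non-zero (NaN also compares as non-zero here)
    v = c.values != 0
    # Pad both ends with True so every zero run, even a trailing one, has a start and an end
    edges = np.flatnonzero(np.diff(np.concatenate([[True], v, [True]])))
    # Edges alternate start, end, start, ...; every other gap is a zero-run length
    d = np.diff(edges)[::2]
    return d.max() if d.size else 0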

df.apply(max_zeros) 

col0    6
col1    3
dtype: int64
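
The second part of the question (zeros AND nulls counted together) is not covered by the function above. A minimal sketch of one way to handle it, assuming that a null means NaN/None as detected by `Series.isna()`: build a boolean mask that is True for zeros or nulls, then measure the longest run of True. The helper names `max_run` and `max_zeros_or_nulls` and the `example` frame are made up for illustration.

```python
import numpy as np
import pandas as pd

def max_run(mask):
    """Length of the longest run of True values in a 1-D boolean array."""
    # Pad with False on both sides so every run has a start edge and an end edge
    padded = np.concatenate([[False], np.asarray(mask, dtype=bool), [False]])
    edges = np.flatnonzero(padded[1:] != padded[:-1])
    # Edges alternate: run start, run end, run start, ...
    lengths = edges[1::2] - edges[::2]
    return int(lengths.max()) if lengths.size else 0

def max_zeros_or_nulls(c):
    # Assumption: "null" means NaN/None as reported by Series.isna()
    return max_run(c.isna() | (c == 0))

# Hypothetical example data, not taken from the question
example = pd.DataFrame({
    "a": [1, 0, np.nan, 0, 2, 0, 0],
    "b": [np.nan, np.nan, 1, 0, 3, 4, 5],
})
result = example.apply(max_zeros_or_nulls)
```

Passing `c == 0` alone to `max_run` gives the zeros-only count, so the same helper could serve both parts of the question.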

If you have a dataframe like

df = pd.DataFrame([[1, 2, 4, 5, 6, 2, 3, 0, 0, 0, 0, 1, 2], [1, 0, 0, 2, 0, 2, 0, 0, 0, 0, 0, 1, 2]])

you can use groupby from itertools

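from itertools import groupby

def get_conti(a):
    # Collect the length of every run of consecutive zeros in the row
    runs = [sum(1 for _ in group) for key, group in groupby(a) if key == 0]
    # default=0 covers rows that contain no zeros at all
    return max(runs, default=0)

# axis=1 applies the function to each row
df['max'] = df.apply(get_conti, axis=1)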

Output:

 
   0  1  2  3  4  5  6  7  8  9  10  11  12  max
0  1  2  4  5  6  2  3  0  0  0   0   1   2    4
1  1  0  0  2  0  2  0  0  0  0   0   1   2    5