Assign each value to a range on a DataFrame

I have a DataFrame df with columns l and y, and an array states like this:

[(1,3), (3,5), (5,7), (7,9)]

I need to assign to each row of df the index of the range in states that its y value belongs to, to get something like this:

   l  y  state
0  a  8      3
1  b  3      0
2  c  7      2
3  d  4      1
4  e  1      0

For each range in states, the y value must fall in (start, end], except for the first range, which also includes 1; that is, the first range is [1, 3].
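The rule above (right-closed ranges, with the first range also closed on the left) can be illustrated with pandas interval objects. This is a minimal sketch, not code from the thread; it assumes the ranges in states do not overlap once made right-closed, and the y values are taken from the expected output.

```python
import numpy as np
import pandas as pd

states = [(1, 3), (3, 5), (5, 7), (7, 9)]
y = pd.Series([8, 3, 7, 4, 1])

# Right-closed intervals: (1, 3], (3, 5], (5, 7], (7, 9]
intervals = pd.IntervalIndex.from_tuples(states, closed='right')

# Position of the interval containing each value, or -1 if none contains it
idx = intervals.get_indexer(y)

# The first range is also closed on the left, so map y == 1 to state 0
state = np.where(y == states[0][0], 0, idx)
print(state)  # [3 0 2 1 0]
```

get_indexer returns -1 for values outside every interval, so only the left edge of the first range needs patching; values below 1 or above 9 would still come back as -1.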

So far I have this:

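def asign(x):
    # Closed-interval test, checked in order: a value on a shared boundary
    # (e.g. 3) matches the earlier range, giving [1,3], (3,5], (5,7], (7,9]
    for i, (a, b) in enumerate(states):
        if a <= x <= b:
            return i

df['state'] = df.y.apply(asign)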

but I need a faster, more efficient approach for larger DataFrames. Any ideas?


2016-06-20 Rosa Alejandra

Use pandas.cut():

import numpy as np
import pandas as pd
bins = pd.Series([1, 3, 5, 7, 9, np.inf])
df['state'] = pd.cut(df.y, bins=bins, labels=bins.index[:-1], include_lowest=True)

Output:

In [113]: df 
Out[113]: 
   l  y  state
0  a  8      3
1  b  3      0
2  c  7      2
3  d  4      1
4  e  1      0
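A self-contained version of the pd.cut answer, for trying it out. The DataFrame below is a hypothetical reconstruction from the l and y columns of the expected output, and the final astype(int) is an addition, not part of the answer.

```python
import numpy as np
import pandas as pd

# Hypothetical input: the l and y columns from the question's expected output
df = pd.DataFrame({'l': list('abcde'), 'y': [8, 3, 7, 4, 1]})

# Edges from the answer; np.inf adds a catch-all bin (9, inf]
bins = pd.Series([1, 3, 5, 7, 9, np.inf])

# One label per bin (5 bins -> labels 0..4); include_lowest=True makes
# the first bin include 1 instead of starting just above it
df['state'] = pd.cut(df.y, bins=bins, labels=bins.index[:-1], include_lowest=True)

# pd.cut returns a categorical column; convert it if plain integers are needed
df['state'] = df['state'].astype(int)
print(df)
```

Because of the np.inf edge, a y value above 9 would land in a fifth bin labeled 4 rather than being left empty.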

How to convert the states list of tuples into a flat pd.Series:

In [125]: states 
Out[125]: [(1, 3), (3, 5), (5, 7), (7, 9)] 

In [126]: bins = pd.Series(np.unique(list(sum(states,())))) 

In [127]: bins 
Out[127]: 
0 1 
1 3 
2 5 
3 7 
4 9 
dtype: int32 

In [128]: bins.tolist() 
Out[128]: [1, 3, 5, 7, 9]
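The flattened edges can also be passed straight to pd.cut. This sketch assumes the ranges are contiguous (each end equals the next start), so the sorted unique edges describe exactly the ranges in states. It uses labels=False, which makes pd.cut return integer bin numbers, in place of the explicit labels from the answer; the sample DataFrame is hypothetical, rebuilt from the expected output.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'l': list('abcde'), 'y': [8, 3, 7, 4, 1]})
states = [(1, 3), (3, 5), (5, 7), (7, 9)]

# Flatten the tuples and keep the sorted unique edges: [1, 3, 5, 7, 9]
edges = np.unique(sum(states, ()))

# labels=False returns the integer bin number, so state i corresponds to
# states[i]; include_lowest=True keeps y == 1 in bin 0
df['state'] = pd.cut(df.y, bins=edges, labels=False, include_lowest=True)
print(df)
```

Values outside [1, 9] come back as NaN here, since there is no np.inf catch-all edge.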


2016-06-20 21:08:29 MaxU

Instead of looping over all rows with .apply(), assign states in a vectorized way:

df['states'] = 0 
for i, state in enumerate(states): 
    df.loc[(df.y > state[0]) & (df.y <= state[1]), 'states'] = i

Result:

   l  y  states
0  a  8       3
1  b  3       0
2  c  7       2
3  d  4       1
4  e  1       0
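A self-contained version of the .loc answer, with a hypothetical sample DataFrame rebuilt from the expected output. The Python loop runs once per range rather than once per row, and each iteration updates all matching rows at once through a boolean mask.

```python
import pandas as pd

df = pd.DataFrame({'l': list('abcde'), 'y': [8, 3, 7, 4, 1]})
states = [(1, 3), (3, 5), (5, 7), (7, 9)]

# Default 0 also covers y == 1, which no (start, end] mask matches
df['states'] = 0
for i, (start, end) in enumerate(states):
    # Vectorized comparison over the whole column for this one range
    df.loc[(df.y > start) & (df.y <= end), 'states'] = i
print(df)
```

One caveat: because the column starts at 0, any y outside every range (for example 0 or 10) also ends up as state 0 rather than being flagged.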


2016-06-20 21:11:04 Stefan
