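Add a new column and insert specific values according to defined intervals in Python

How can I add a new column to a pandas DataFrame that holds 1 for all values <= W1, 2 for all values > W1 and <= W2, and 3 for all values > W2?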

W1=3 
W2=6

Here is an example of my case (number is the column I want to create):

column1 number 
2  1 
1  1 
5  2 
6  2 
7  3 
8  3 
3  1

Source

2016-11-23 matthew

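You can use numpy.where twice, nesting the second call inside the first:

import numpy as np

W1 = 3
W2 = 6

# 1 where column1 <= W1, otherwise 2 where column1 <= W2, otherwise 3
df['d'] = np.where(df['column1'] <= W1, 1,
          np.where(df['column1'] <= W2, 2, 3))
print(df)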
    column1 number d 
0  2  1 1 
1  1  1 1 
2  5  2 2 
3  6  2 2 
4  7  3 3 
5  8  3 3 
6  3  1 1
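For readers who want to run the nested numpy.where approach end to end, here is a minimal self-contained sketch. The DataFrame is rebuilt from the question's table; the name `expected` is introduced here for illustration only and holds the question's desired number column.

```python
import numpy as np
import pandas as pd

W1, W2 = 3, 6

# Sample data rebuilt from the table in the question.
df = pd.DataFrame({"column1": [2, 1, 5, 6, 7, 8, 3]})
expected = [1, 1, 2, 2, 3, 3, 1]  # the "number" column the question asks for

# The inner where splits "<= W2" (2) from "> W2" (3);
# the outer where overrides that with 1 wherever column1 <= W1.
df["d"] = np.where(df["column1"] <= W1, 1,
                   np.where(df["column1"] <= W2, 2, 3))

print(df)
print(df["d"].tolist() == expected)
```

Because the conditions are checked from the smallest threshold upward, a value equal to W1 or W2 falls into the lower group, which matches the question's <= requirement.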

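Another solution uses cut (see the pandas docs):

# Bins are closed on the right by default: (-inf, W1], (W1, W2], (W2, inf)
bins = [-np.inf, W1, W2, np.inf]
labels = [1, 2, 3]
df['d1'] = pd.cut(df['column1'], bins=bins, labels=labels)
print(df)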

    column1 number d d1 
0  2  1 1 1 
1  1  1 1 1 
2  5  2 2 2 
3  6  2 2 2 
4  7  3 3 3 
5  8  3 3 3 
6  3  1 1 1
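One detail worth knowing about the cut approach: with labels given, pd.cut returns a categorical column rather than plain integers. The sketch below, which assumes the same sample data as the question, shows two ways to get an integer column instead. The column names `d1` and `d1_codes` are illustrative only.

```python
import numpy as np
import pandas as pd

W1, W2 = 3, 6
df = pd.DataFrame({"column1": [2, 1, 5, 6, 7, 8, 3]})

bins = [-np.inf, W1, W2, np.inf]

# With labels supplied, the result has a categorical dtype.
cat = pd.cut(df["column1"], bins=bins, labels=[1, 2, 3])
print(cat.dtype)

# Option 1: convert the categorical labels to integers.
df["d1"] = cat.astype(int)

# Option 2: labels=False returns 0-based bin positions, so add 1.
df["d1_codes"] = pd.cut(df["column1"], bins=bins, labels=False) + 1

print(df)
```

Either option should give the same values here; the categorical form is mainly useful if the groups are later treated as categories rather than numbers.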

Source

2016-11-23 16:36:30 jezrael

df['new'] = df.column1.gt(W1).add(1).add(df.column1.gt(W2)) 

df

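When column1 is greater than W1, we get True; when it is less than or equal to W1, we get False. When I add 1, these booleans are converted to the integers 1 and 0 respectively, so the results are 2 for True and 1 for False. At this point I have 1 where column1 is less than or equal to W1 and 2 where it is greater than W1. I finish by adding the boolean Series for column1 greater than W2: this adds 0 where column1 is less than or equal to W2 and adds 1 to the 2s where column1 is greater than W2.

I can write it like this to make it more obvious what it is doing: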

c = df.column1 
(c > W1) + 1 + (c > W2) 

0 1 
1 1 
2 2 
3 2 
4 3 
5 3 
6 1 
Name: column1, dtype: int64
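To see the intermediate steps of the boolean arithmetic, here is a small sketch on the question's sample values. The names `above_w1` and `above_w2` are introduced here for illustration and are not part of the answer.

```python
import pandas as pd

W1, W2 = 3, 6
c = pd.Series([2, 1, 5, 6, 7, 8, 3], name="column1")

# Each comparison yields booleans; cast to int to make the 0/1 values visible.
above_w1 = (c > W1).astype(int)  # 1 where c > W1, else 0
above_w2 = (c > W2).astype(int)  # 1 where c > W2, else 0

# Start at 1 and add one step for every threshold the value exceeds.
result = 1 + above_w1 + above_w2

print(pd.DataFrame({"c": c, "above_w1": above_w1,
                    "above_w2": above_w2, "result": result}))
```

The explicit cast is only for readability; pandas treats True and False as 1 and 0 in arithmetic, which is what the one-line version relies on.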

Source

2016-11-23 16:59:35 piRSquared

Here is an approach using np.searchsorted -

df['out'] = np.searchsorted([W1,W2],df.column1)+1
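The searchsorted approach depends on how values equal to a threshold are placed. A minimal sketch on the question's sample values, comparing the default side="left" with side="right" (the variable names are illustrative):

```python
import numpy as np

W1, W2 = 3, 6
values = np.array([2, 1, 5, 6, 7, 8, 3])

# side="left" (the default) returns the first position i with
# thresholds[i] >= value, so a value equal to W1 or W2 stays in the lower group.
left = np.searchsorted([W1, W2], values, side="left") + 1

# side="right" would push values equal to a threshold into the upper group.
right = np.searchsorted([W1, W2], values, side="right") + 1

print(left)
print(right)
```

With the default side, 3 maps to group 1 and 6 to group 2, which is the <= behaviour the question asks for. The thresholds must be given in ascending order.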

Runtime test -

In [230]: df = pd.DataFrame(np.random.randint(0,10,(10000)),columns=['column1'])

In [231]: W1,W2 = 3,6 

In [232]: %timeit np.where(df['column1'] <= W1, 1,np.where(df['column1'] <= W2, 2, 3)) 
1000 loops, best of 3: 633 µs per loop # @jezrael's soln 

In [233]: %timeit df.column1.gt(W1).add(1).add(df.column1.gt(W2)) 
1000 loops, best of 3: 1.07 ms per loop # @piRSquared's soln 

In [234]: %timeit np.searchsorted([W1,W2],df.column1)+1 
1000 loops, best of 3: 205 µs per loop # Using np.searchsorted

Pass df.column1.values so that np.searchsorted works directly on a NumPy array, for a further boost -

In [235]: %timeit np.searchsorted([W1,W2],df.column1.values)+1 
1000 loops, best of 3: 184 µs per loop
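Before comparing speed, it can be reassuring to confirm that the approaches in this thread agree with each other. The following sketch checks that on random data similar to the benchmark input. The seed, the use of np.random.default_rng, and all variable names are assumptions made for this example, not part of the original test.

```python
import numpy as np
import pandas as pd

W1, W2 = 3, 6

# Random integers in [0, 10), similar in shape to the timing setup above.
rng = np.random.default_rng(0)
df = pd.DataFrame({"column1": rng.integers(0, 10, 10_000)})
col = df["column1"]

via_where = np.where(col <= W1, 1, np.where(col <= W2, 2, 3))
via_bool = col.gt(W1).add(1).add(col.gt(W2)).to_numpy()
via_search = np.searchsorted([W1, W2], col.to_numpy()) + 1
via_cut = pd.cut(col, bins=[-np.inf, W1, W2, np.inf],
                 labels=[1, 2, 3]).astype(int).to_numpy()

# True only if every approach assigns the same group to every row.
all_equal = all(np.array_equal(via_where, other)
                for other in (via_bool, via_search, via_cut))
print(all_equal)
```

Timings will vary with the pandas and NumPy versions and the machine, so the figures quoted above are best read as relative rather than absolute.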

Source

2016-11-23 17:36:32 Divakar

Awesome! I'm going to burn this idea into my brain. – piRSquared

@piRSquared Glad to share something useful with a 'pandas' guru! – Divakar
