Filling a DataFrame with unique positive integers

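I have a DataFrame that looks like this:

   col1  col2  col3  col4  col5
0     0     1     0     1     1
1     0     1     0     0     1

I want to assign a unique positive integer greater than 1 to each 0 entry.

So I would like to end up with a DataFrame that looks like this:

   col1  col2  col3  col4  col5
0     2     1     3     1     1
1     4     1     5     6     1

The integers do not have to come from an ordered sequence; they only need to be positive and unique.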

Source

2016-03-02 mikeL

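np.arange(...).reshape(df.shape) generates an array of consecutive integers, starting at 2, with the same shape as the DataFrame.

df.where(df.astype(bool), ...) works because your DataFrame consists of binary indicators (zeros and ones). It keeps all the truthy values (i.e. the ones) and fills the zeros from the array of consecutive integers. The cast to bool is there because where expects a boolean condition.

# optional: inplace=True
>>> df.where(df.astype(bool), np.arange(2, df.size + 2).reshape(df.shape))
   col1  col2  col3  col4  col5
0     2     1     4     1     1
1     7     1     9    10     1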
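
For anyone who wants to run this end to end, here is a minimal self-contained sketch. It assumes the sample frame from the question, and the helper name fill_zeros_with_where is made up for illustration; it is not part of pandas.

```python
import numpy as np
import pandas as pd


def fill_zeros_with_where(df):
    """Keep the ones and fill every zero from a grid of consecutive integers.

    Hypothetical helper for illustration; assumes df holds only 0s and 1s.
    """
    # One candidate integer per cell: 2, 3, ..., df.size + 1, laid out like df.
    candidates = np.arange(2, df.size + 2).reshape(df.shape)
    # Cast to bool so that 1 -> True (keep) and 0 -> False (replace).
    return df.where(df.astype(bool), candidates)


df = pd.DataFrame(
    [[0, 1, 0, 1, 1],
     [0, 1, 0, 0, 1]],
    columns=["col1", "col2", "col3", "col4", "col5"],
)
filled = fill_zeros_with_where(df)
print(filled)
```

Because each cell has its own candidate integer, the values that land on the zeros cannot collide, although they will have gaps wherever a 1 was kept.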

Source

2016-03-02 16:33:30 Alexander

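I think you can use numpy.arange together with the DataFrame's shape to generate unique integers, and the boolean mask df == 0 to replace all the 0 values: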

print(df)
   col1  col2  col3  col4  col5
0     0     1     0     1     1
1     0     1     0     0     1

print(df == 0)
    col1   col2   col3   col4   col5
0   True  False   True  False  False
1   True  False   True   True  False

print(df.shape)
(2, 5)

# number of cells in df
min_count = df.shape[0] * df.shape[1]
print(min_count)
10

# add 2 to the stop value because the sequence skips 0 and 1
print(np.arange(start=2, stop=min_count + 2).reshape(df.shape))
[[ 2  3  4  5  6]
 [ 7  8  9 10 11]]

# fill the zeros with the integers found at the same positions in that array
df[df == 0] = np.arange(start=2, stop=min_count + 2).reshape(df.shape)
print(df)
   col1  col2  col3  col4  col5
0     2     1     4     1     1
1     7     1     9    10     1
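
The assignment above takes its values from a full grid, so the integers that end up on the zeros have gaps (2, 4, 7, 9, 10). If a compact run is preferred, one option is to number only the zero cells. This is a sketch under the assumption that the frame is the sample from the question; zero_mask and compact are illustrative names.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [[0, 1, 0, 1, 1],
     [0, 1, 0, 0, 1]],
    columns=["col1", "col2", "col3", "col4", "col5"],
)

values = df.to_numpy().copy()   # work on a copy of the underlying array
zero_mask = values == 0         # True wherever a replacement is needed
# Exactly one integer per zero: 2, 3, ..., number_of_zeros + 1,
# assigned in row-major order.
values[zero_mask] = np.arange(2, 2 + zero_mask.sum())

compact = pd.DataFrame(values, index=df.index, columns=df.columns)
print(compact)
#    col1  col2  col3  col4  col5
# 0     2     1     3     1     1
# 1     4     1     5     6     1
```

This happens to reproduce the layout shown in the question, since the zeros are numbered left to right, top to bottom.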

Or use numpy.random.choice to draw unique random integers from a larger range:

# number of cells in df
min_count = df.shape[0] * df.shape[1]
print(min_count)
10
# np.arange can cover a larger range, e.g. up to 100, but the stop value must be at least min_count + 2
df[df == 0] = np.random.choice(np.arange(2, 100), replace=False, size=df.shape)
print(df)
   col1  col2  col3  col4  col5
0    17     1    53     1     1
1    39     1    15    76     1

Source

2016-03-02 16:02:36 jezrael

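This does not guarantee uniqueness. You could pick the same random number more than once. – Alexander

@Alexander - You are right. I have edited the answer. Thanks. – jezrael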
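
Since uniqueness is the point of the exercise, it can be worth checking a result rather than eyeballing it. The following is a small sketch of such a check; the function name zeros_filled_uniquely and the toy frames are invented for illustration.

```python
import pandas as pd


def zeros_filled_uniquely(original, filled):
    """Return True if every former zero now holds a distinct integer > 1.

    Hypothetical helper; assumes both frames have the same shape.
    """
    # Pick out the replacement values at the positions that used to be zero.
    replacements = filled.to_numpy()[original.to_numpy() == 0]
    all_above_one = bool((replacements > 1).all())
    all_distinct = len(set(replacements.tolist())) == len(replacements)
    return all_above_one and all_distinct


original = pd.DataFrame([[0, 1, 0], [0, 0, 1]], columns=["a", "b", "c"])
good = pd.DataFrame([[5, 1, 9], [3, 7, 1]], columns=["a", "b", "c"])
bad = pd.DataFrame([[5, 1, 5], [3, 7, 1]], columns=["a", "b", "c"])  # 5 appears twice

print(zeros_filled_uniquely(original, good))  # True
print(zeros_filled_uniquely(original, bad))   # False
```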

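Although this is not the most performant way to use pandas, an explicit loop also works:

import random

MAX_INT = 100

# draw distinct integers from [2, MAX_INT) up front so no value repeats
zero_count = int((df == 0).sum().sum())
fillers = iter(random.sample(range(2, MAX_INT), zero_count))

for row in df.index:
    for col in df.columns:
        if df.at[row, col] == 0:
            df.at[row, col] = next(fillers)

Something like itertuples() would be faster, but for a small amount of data this is fine.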

Source

2016-03-02 16:03:09

df[df == 0] = np.random.choice(np.arange(2, df.size + 2), replace=False, size=df.shape)

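There are lots of good answers here already, but I'll throw this one out there.

replace indicates whether the sample is drawn with replacement.
np.arange runs from 2 to (size of the df + 2). It starts at 2 because you want values greater than 1.
size must be the same shape as df, so I just used df.shape.

To illustrate the kind of array np.random.choice produces:

>>> np.random.choice(np.arange(2, df.size + 2), replace=False, size=df.shape)
array([[11,  4,  6,  5,  9],
       [ 7,  8, 10,  3,  2]])

Note that the values are all greater than 1 and all unique.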

Before:

   col1  col2  col3  col4  col5
0     0     1     0     1     1
1     0     1     0     0     1

After:

   col1  col2  col3  col4  col5
0     9     1     7     1     1
1     6     1     3    11     1
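
A variation on the same idea, for cases where a repeatable result is wanted: NumPy's Generator API can be seeded, and drawing only as many values as there are zeros avoids generating numbers that are then discarded. This is a sketch, assuming the sample frame from the question; the seed value and the variable names are arbitrary choices.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [[0, 1, 0, 1, 1],
     [0, 1, 0, 0, 1]],
    columns=["col1", "col2", "col3", "col4", "col5"],
)

rng = np.random.default_rng(0)  # seed 0 is arbitrary, chosen only for repeatability
zero_mask = (df == 0).to_numpy()

# Sample without replacement: one distinct integer from [2, df.size + 2)
# for each zero cell.
draws = rng.choice(
    np.arange(2, df.size + 2),
    size=int(zero_mask.sum()),
    replace=False,
)

values = df.to_numpy().copy()
values[zero_mask] = draws
result = pd.DataFrame(values, index=df.index, columns=df.columns)
print(result)
```

The ones stay where they were, and the five zeros receive five distinct integers between 2 and 11.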

Source

2016-03-02 19:59:05 Jarad
