2016-10-19

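Set specific values in a mixed-value DataFrame to a fixed value?

I have a data frame with response and predictor variables in the columns and observations in the rows. Some of the response values are below a given limit of detection (LOD). Since I plan to apply a rank transformation to the responses, I would like to set all of these values equal to the LOD. Say the data frame is: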

data.head() 

   age  response1  response2  response3  risk     sex  smoking
0   33   0.272206   0.358059   0.585652    no  female      yes
1   38   0.425486   0.675391   0.721062   yes  female       no
2   20   0.910602   0.200606   0.664955   yes  female       no
3   38   0.966014   0.584317   0.923788   yes  female       no
4   27   0.756356   0.550512   0.106534    no  female      yes
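The reason for collapsing everything below the LOD onto the LOD before ranking is that the ranks then treat those observations as ties, instead of ordering them by readings that lie below the detection limit. A minimal sketch on a hypothetical Series (made-up values, not the data above); `Series.rank` uses the "average" method by default, so tied values share the mean of their ranks:

```python
import pandas as pd

LOD = 0.2
# Hypothetical measurements: the first two fall below the detection limit.
raw = pd.Series([0.05, 0.12, 0.35, 0.60])

# Replace every value at or below the LOD with the LOD itself.
censored = raw.mask(raw <= LOD, LOD)

# Default method="average": tied values get the mean of the ranks they span.
raw_ranks = raw.rank()
censored_ranks = censored.rank()

print(raw_ranks.tolist())       # [1.0, 2.0, 3.0, 4.0]
print(censored_ranks.tolist())  # [1.5, 1.5, 3.0, 4.0]
```

Without the replacement, the two below-LOD readings would receive distinct ranks 1 and 2; after it, they tie at 1.5.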

I would like to do something like:

responses = ['response1', 'response2', 'response3'] 
LOD = 0.2 

data[responses][data[responses] <= LOD] = LOD 

which doesn't work, for several reasons (e.g. pandas doesn't know whether it should produce a view of the data or not; I guess it doesn't).

How can I set all values for which

data[responses] <= LOD 

holds equal to LOD?
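The guess about views is close: selecting a list of columns with `data[responses]` returns a new DataFrame, so the boolean assignment lands on that temporary object and `data` itself is never modified. A minimal sketch on a small hypothetical frame (column values are made up), followed by one way to write the result back using the same boolean-indexing idiom on an explicit copy:

```python
import warnings

import pandas as pd

LOD = 0.2
responses = ["response1", "response2"]
# Hypothetical mixed-type frame: two numeric responses and one text column.
data = pd.DataFrame({
    "response1": [0.10, 0.50, 0.15],
    "response2": [0.30, 0.05, 0.90],
    "sex": ["male", "female", "male"],
})

with warnings.catch_warnings():
    # Depending on the pandas version this line may emit a chained-assignment
    # warning; it is silenced only to keep the demonstration quiet.
    warnings.simplefilter("ignore")
    # data[responses] builds a new DataFrame, so the boolean assignment
    # writes into that temporary object, which is then discarded.
    data[responses][data[responses] <= LOD] = LOD

after_chained = data["response1"].tolist()
print(after_chained)  # [0.1, 0.5, 0.15]: data itself is unchanged

# Working variant: take an explicit copy, modify it, then assign it back
# to data with a single __setitem__ call.
subset = data[responses].copy()
subset[subset <= LOD] = LOD
data[responses] = subset
print(data["response1"].tolist())  # [0.2, 0.5, 0.2]
```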


Minimal example:

import numpy as np 
import pandas as pd 

from pandas import Series, DataFrame 

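# Random 0/1 codes turned into labelled categorical predictors.
# rename_categories is used because assigning to .cat.categories
# directly is not supported in recent pandas versions.
x = Series(np.random.randint(0,2,50), dtype='category') 
x = x.cat.rename_categories(['no', 'yes']) 

y = Series(np.random.randint(0,2,50), dtype='category') 
y = y.cat.rename_categories(['no', 'yes']) 

z = Series(np.random.randint(0,2,50), dtype='category') 
z = z.cat.rename_categories(['male', 'female']) 

a = Series(np.random.randint(20,60,50), dtype='category') 

data = DataFrame({'risk':x, 'smoking':y, 'sex':z, 
    'response1': np.random.rand(50), 
    'response2': np.random.rand(50), 
    'response3': np.random.rand(50), 
    'age':a}) 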

Do 'data[data[responses] <= LOD] = 0.2' – EdChum

Answer


You can use DataFrame.mask:

import numpy as np 
import pandas as pd 

np.random.seed(123) 
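# rename_categories maps the 0/1 codes to labels; assigning to
# .cat.categories directly is not supported in recent pandas versions.
x = pd.Series(np.random.randint(0,2,10), dtype='category') 
x = x.cat.rename_categories(['no', 'yes']) 
y = pd.Series(np.random.randint(0,2,10), dtype='category') 
y = y.cat.rename_categories(['no', 'yes']) 
z = pd.Series(np.random.randint(0,2,10), dtype='category') 
z = z.cat.rename_categories(['male', 'female']) 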

a = pd.Series(np.random.randint(20,60,10), dtype='category') 

data = pd.DataFrame({ 
'risk':x, 
'smoking':y, 
'sex':z, 
'response1': np.random.rand(10), 
'response2': np.random.rand(10), 
'response3': np.random.rand(10), 
'age':a}) 
print (data) 
   age  response1  response2  response3  risk     sex  smoking
0   24   0.722443   0.425830   0.866309    no    male      yes
1   23   0.322959   0.312261   0.250455   yes    male      yes
2   22   0.361789   0.426351   0.483034    no  female       no
3   40   0.228263   0.893389   0.985560    no  female      yes
4   59   0.293714   0.944160   0.519485    no  female       no
5   22   0.630976   0.501837   0.612895    no    male      yes
6   40   0.092105   0.623953   0.120629    no  female       no
7   27   0.433701   0.115618   0.826341   yes    male      yes
8   55   0.430863   0.317285   0.603060   yes    male      yes
9   48   0.493685   0.414826   0.545068    no    male       no
responses = ['response1', 'response2', 'response3'] 
LOD = 0.2 

print (data[responses] <= LOD) 
   response1  response2  response3
0      False      False      False
1      False      False      False
2      False      False      False
3      False      False      False
4      False      False      False
5      False      False      False
6       True      False       True
7      False       True      False
8      False      False      False
9      False      False      False

data[responses] = data[responses].mask(data[responses] <= LOD, LOD) 
print (data) 
   age  response1  response2  response3  risk     sex  smoking
0   24   0.722443   0.425830   0.866309    no    male      yes
1   23   0.322959   0.312261   0.250455   yes    male      yes
2   22   0.361789   0.426351   0.483034    no  female       no
3   40   0.228263   0.893389   0.985560    no  female      yes
4   59   0.293714   0.944160   0.519485    no  female       no
5   22   0.630976   0.501837   0.612895    no    male      yes
6   40   0.200000   0.623953   0.200000    no  female       no
7   27   0.433701   0.200000   0.826341   yes    male      yes
8   55   0.430863   0.317285   0.603060   yes    male      yes
9   48   0.493685   0.414826   0.545068    no    male       no
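An alternative not mentioned in the thread: for a lower bound like this, `DataFrame.clip(lower=LOD)` should give the same result as the `mask` call, because values below LOD are raised to LOD and a value exactly equal to LOD is unchanged either way. `DataFrame.where` with the inverted condition is another equivalent spelling. A small sketch on hypothetical random data (the generator seed and the column set are assumptions, not taken from the question) that checks the three agree:

```python
import numpy as np
import pandas as pd

LOD = 0.2
responses = ["response1", "response2", "response3"]
rng = np.random.default_rng(0)
# Hypothetical data: three random responses plus one categorical predictor.
data = pd.DataFrame({
    "response1": rng.random(20),
    "response2": rng.random(20),
    "response3": rng.random(20),
    "sex": pd.Categorical(rng.choice(["male", "female"], 20)),
})

# mask: replace entries where the condition is True.
via_mask = data[responses].mask(data[responses] <= LOD, LOD)
# where: keep entries where the condition is True, replace the rest.
via_where = data[responses].where(data[responses] > LOD, LOD)
# clip: raise everything below the lower bound up to it.
via_clip = data[responses].clip(lower=LOD)

print(via_mask.equals(via_clip), via_mask.equals(via_where))  # True True

# Write the censored responses back; the "sex" column is left untouched.
data[responses] = via_clip
```

`clip` states the intent ("floor at the LOD") most directly, while `mask`/`where` generalize to conditions that are not simple bounds.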

Does it work? – jezrael


Thx, it works perfectly! Learned another pandas feature today. .mask really does look powerful. –