Python Pandas Dataframe: Normalize data between 0.01 and 0.99?

I am trying to bound every value in a dataframe between 0.01 and 0.99. I have successfully normalized the data between 0 and 1 using .apply(lambda x: (x - x.min())/(x.max() - x.min())), as follows:

import pandas as pd

df = pd.DataFrame({'one': ['AAL', 'AAL', 'AAPL', 'AAPL'], 'two': [1, 1, 5, 5], 'three': [4, 4, 2, 2]})

# Min-max scale each numeric column to the range [0, 1]
df[['two', 'three']].apply(lambda x: (x - x.min()) / (x.max() - x.min()))

df

Now I want to bound all values between 0.01 and 0.99. This is what I have tried:

def bound_x(x):
    if x == 1:
        return x - 0.01
    elif x < 0.99:
        return x + 0.01

df[['two', 'three']].apply(bound_x)

But I get the following error:

ValueError: ('The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().', u'occurred at index two')

Source

2016-03-19 jfive

There is an app, err, a clip method, for that. Clipping the normalized values to the range 0.01 to 0.99 yields

    two  three
0  0.01   0.99
1  0.01   0.99
2  0.99   0.01
3  0.99   0.01
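
The code that produced this output does not appear in the text above. A minimal sketch that reproduces it, assuming the clip method meant here is DataFrame.clip with its lower and upper arguments, applied after the min-max normalization from the question (the names cols, normalized, clipped and rescaled are illustrative, not from the original answer):

```python
import pandas as pd

df = pd.DataFrame({'one': ['AAL', 'AAL', 'AAPL', 'AAPL'],
                   'two': [1, 1, 5, 5],
                   'three': [4, 4, 2, 2]})
cols = ['two', 'three']

# Min-max scale each numeric column to [0, 1], as in the question.
normalized = df[cols].apply(lambda x: (x - x.min()) / (x.max() - x.min()))

# clip replaces anything below 0.01 with 0.01 and anything above 0.99
# with 0.99; values already inside the range are left unchanged.
clipped = normalized.clip(lower=0.01, upper=0.99)
print(clipped)

# Alternative (an assumption about intent, not part of the answer):
# a linear map of [0, 1] onto [0.01, 0.99], which also moves interior values.
rescaled = 0.01 + 0.98 * normalized
```

Note that clip only touches values outside the bounds, so a normalized 0.5 stays 0.5. The linear map is the option to reach for if the whole column should be squeezed into the narrower range.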

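The problem with

df[['two', 'three']].apply(bound_x)

is that bound_x gets passed a Series such as df['two'], and then if x == 1 requires x == 1 to be evaluated in a boolean context. x == 1 is a boolean Series like this (with df holding the normalized values):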

In [44]: df['two'] == 1
Out[44]:
0    False
1    False
2     True
3     True
Name: two, dtype: bool

Python tries to reduce this Series to a single boolean value, True or False. Pandas follows the NumPy convention of raising an error when you try to convert a Series (or array) to a bool.
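
The explanation above can be checked with a short sketch. It first triggers the same ValueError by forcing a boolean Series into a single bool, then applies the question's bound_x one element at a time through Series.map, so that x is a plain number inside the function. The final return x is a hypothetical fallback that is not in the question; without it the function returns None for values from 0.99 up to (but not including) 1.

```python
import pandas as pd

df = pd.DataFrame({'two': [1, 1, 5, 5], 'three': [4, 4, 2, 2]})
normalized = df.apply(lambda x: (x - x.min()) / (x.max() - x.min()))

# Converting a whole boolean Series to one bool is what raises the error.
try:
    bool(normalized['two'] == 1)
    raised = False
except ValueError:
    raised = True

def bound_x(x):
    # Same logic as in the question, but x is now a single number.
    if x == 1:
        return x - 0.01
    elif x < 0.99:
        return x + 0.01
    return x  # hypothetical fallback, not in the original function

# DataFrame.apply hands each column to the lambda as a Series;
# Series.map then calls bound_x once per element.
bounded = normalized.apply(lambda col: col.map(bound_x))
print(bounded)
```

This version shifts every interior value by 0.01 rather than bounding it, so it only matches the clipped result when the normalized data contains nothing but 0 and 1, as it does here.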

Source

2016-03-19 13:32:08 unutbu

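I had a similar problem where I wanted custom normalization, because the usual percentile of the datum or z-score was not adequate. Sometimes I knew what the feasible max and min of the population were and wanted to define them explicitly rather than take them from my sample, or I wanted a different midpoint, or whatever! So I built a custom function (with extra steps in the code here to make it as readable as possible):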

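from numpy import mean, median  # not shown in the original; NumPy's versions are assumed

def NormData(s, low='min', center='mid', hi='max', insideout=False, shrinkfactor=0.):
    # Resolve the lower bound: sample minimum, or symmetric about zero ('abs')
    if low == 'min':
        low = min(s)
    elif low == 'abs':
        low = max(abs(min(s)), abs(max(s))) * -1.  # sign(min(s))
    # Resolve the upper bound the same way
    if hi == 'max':
        hi = max(s)
    elif hi == 'abs':
        hi = max(abs(min(s)), abs(max(s))) * 1.  # sign(max(s))

    # Resolve the center: midrange, mean, or median of the sample
    if center == 'mid':
        center = (max(s) + min(s)) / 2.
    elif center == 'avg':
        center = mean(s)
    elif center == 'median':
        center = median(s)

    # Shift everything so that the center sits at 0
    s2 = [x - center for x in s]
    hi = hi - center
    low = low - center
    center = 0.

    r = []

    for x in s2:
        if x < low:
            r.append(0.)  # below the range: saturate at 0
        elif x > hi:
            r.append(1.)  # above the range: saturate at 1
        else:
            if x >= center:
                # upper half maps linearly onto [0.5, 1]
                r.append((x - center) / (hi - center) * 0.5 + 0.5)
            else:
                # lower half maps linearly onto [0, 0.5]
                r.append((x - low) / (center - low) * 0.5 + 0.)

    if insideout:
        # Values near the center become 1, values near either end become 0
        ir = [(1. - abs(z - 0.5) * 2.) for z in r]
        r = ir

    # Pull every value toward 0.5 by the shrink factor
    rr = [x - (x - 0.5) * shrinkfactor for x in r]
    return rr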

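This will take a pandas Series, or even just a list, and normalize it to your specified low, center, and high points. There is also a shrink factor, which lets you pull the data away from the endpoints 0 and 1 (I had to do this when combining colormaps in matplotlib: Single pcolormesh with more than one colormap using Matplotlib).

You can probably see how the code works, but basically, say you have the values [-5, 2, 10] in a sample and want to normalize based on a range of -7 to 7 (so anything above 7, such as our "10", is effectively treated as 7), with a midpoint of 1, but shrunk to fit a 256-color RGB colormap: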

#In[1] 
NormData([-5,2,10],low=-7,center=1,hi=7,shrinkfactor=2./256) 
#Out[1] 
[0.1279296875, 0.5826822916666667, 0.99609375]

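It can also turn your data inside out... this may seem odd, but I found it useful for heatmapping. Say you want a darker color for values closer to the center rather than for values near hi/low. You could build the heatmap from normalized data where insideout=True: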

#In[2] 
NormData([-5,2,10],low=-7,center=1,hi=7,insideout=True,shrinkfactor=2./256) 
#Out[2] 
[0.251953125, 0.8307291666666666, 0.00390625]

So now "2", which is closest to the center (defined as "1"), gets the highest value.
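
For larger inputs, the same piecewise mapping can be sketched with NumPy arrays instead of a Python loop. This is only an illustration, not the author's code: norm_data_np is a hypothetical name, and it assumes low, center and hi are passed as numbers (the 'min', 'mid', 'abs' and similar string options are left out).

```python
import numpy as np

def norm_data_np(s, low, center, hi, insideout=False, shrinkfactor=0.0):
    """Hypothetical vectorized variant of NormData for numeric bounds only."""
    x = np.asarray(s, dtype=float) - center  # shift so the center sits at 0
    lo, up = low - center, hi - center

    # Piecewise-linear map: [lo, 0] -> [0, 0.5] and [0, up] -> [0.5, 1].
    r = np.where(x >= 0, x / up * 0.5 + 0.5, (x - lo) / (0 - lo) * 0.5)
    # Values outside [lo, up] saturate at 0 or 1.
    r = np.where(x < lo, 0.0, np.where(x > up, 1.0, r))

    if insideout:
        # Center becomes 1, both ends become 0.
        r = 1.0 - np.abs(r - 0.5) * 2.0

    # Pull every value toward 0.5 by the shrink factor.
    return r - (r - 0.5) * shrinkfactor

plain = norm_data_np([-5, 2, 10], low=-7, center=1, hi=7, shrinkfactor=2. / 256)
flipped = norm_data_np([-5, 2, 10], low=-7, center=1, hi=7,
                       insideout=True, shrinkfactor=2. / 256)
print(plain)
print(flipped)
```

With the inputs from the two examples above, the sketch should agree with the listed outputs up to floating-point rounding.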

Anyway, I thought my problem was very similar to yours, and this function might be useful to you.

Source

2017-05-05 18:13:24 Vlox
