離散化在Python

概率陣的我有一個包含像下面的例子中一個物種的出現概率值的numpy的數組（從GIS柵格地圖實際進口）：離散化在Python

a = random.randint(1.0,20.0,1200).reshape(40,30) 
b = (a*1.0)/sum(a)

現在我希望得到一個該陣列的離散版本再次。就像我有例如位於該陣列區域的100個人（1200個細胞）他們如何分發？當然，它們應該根據它們的概率分佈，意味着較低的值表示較低的發生概率。但是，由於所有數據都是統計數據，因此個人的機會仍然很小，可能性不大。它應該是可能的，多個人可以對細胞佔據...

它就像再次轉化連續分佈曲線爲直方圖。像許多不同的直方圖可能會導致一定的分佈曲線，它也應該是相反的。因此，應用我所尋找的算法將會每次產生不同的離散值。

...是否有任何算法在python中可以做到這一點？由於我不熟悉離散化，也許有人可以提供幫助。

來源

2012-07-25 Johannes

「*還有一個單獨的位於低概率細胞的機會。*」 - 這意味着你要另一個隨機函數。它應該是多麼隨意？沒有隨機函數，對於相同的給定輸入，總是得到相同的結果。 – eumiro 2012-07-25 13:07:46

使用random.choice與bincount：

np.bincount(np.random.choice(b.size, 100, p=b.flat), 
      minlength=b.size).reshape(b.shape)

如果你沒有NumPy的1.7，你可以替換random.choice有：

np.searchsorted(np.cumsum(b), np.random.random(100))

，並提供：

np.bincount(np.searchsorted(np.cumsum(b), np.random.random(100)), 
      minlength=b.size).reshape(b.shape)

來源

2012-07-25 13:18:31 ecatmur

我不是那麼熟悉Python，但我無法得到模塊（random.choice）工作：>>> import numpy >>> numpy.version。版本 '1.6.2' >>> numpy.random.choice（）回溯（最近通話最後一個）：文件「」，1號線，在 numpy.random.choice（） AttributeError的： '模塊'對象沒有任何屬性'選擇' – Johannes 2012-07-25 13:38:46

@Johannes'random.choice'在NumPy 1.7中添加;你有哪個版本？試試我的替代解決方案，如果你有1.6（檢查'np.version.version'）。 – ecatmur 2012-07-25 13:46:57

這類似於到我本月早些時候的question。

import random 
def RandFloats(Size): 
    Scalar = 1.0 
    VectorSize = Size 
    RandomVector = [random.random() for i in range(VectorSize)] 
    RandomVectorSum = sum(RandomVector) 
    RandomVector = [Scalar*i/RandomVectorSum for i in RandomVector] 
    return RandomVector 

from numpy.random import multinomial 
import math 
def RandIntVec(ListSize, ListSumValue, Distribution='Normal'): 
    """ 
    Inputs: 
    ListSize = the size of the list to return 
    ListSumValue = The sum of list values 
    Distribution = can be 'uniform' for uniform distribution, 'normal' for a normal distribution ~ N(0,1) with +/- 5 sigma (default), or a list of size 'ListSize' or 'ListSize - 1' for an empirical (arbitrary) distribution. Probabilities of each of the p different outcomes. These should sum to 1 (however, the last element is always assumed to account for the remaining probability, as long as sum(pvals[:-1]) <= 1). 
    Output: 
    A list of random integers of length 'ListSize' whose sum is 'ListSumValue'. 
    """ 
    if type(Distribution) == list: 
     DistributionSize = len(Distribution) 
     if ListSize == DistributionSize or (ListSize-1) == DistributionSize: 
      Values = multinomial(ListSumValue,Distribution,size=1) 
      OutputValue = Values[0] 
    elif Distribution.lower() == 'uniform': #I do not recommend this!!!! I see that it is not as random (at least on my computer) as I had hoped 
     UniformDistro = [1/ListSize for i in range(ListSize)] 
     Values = multinomial(ListSumValue,UniformDistro,size=1) 
     OutputValue = Values[0] 
    elif Distribution.lower() == 'normal': 
     """ 
      Normal Distribution Construction....It's very flexible and hideous 
      Assume a +-3 sigma range. Warning, this may or may not be a suitable range for your implementation! 
      If one wishes to explore a different range, then changes the LowSigma and HighSigma values 
      """ 
      LowSigma = -3#-3 sigma 
      HighSigma = 3#+3 sigma 
      StepSize = 1/(float(ListSize) - 1) 
      ZValues  = [(LowSigma * (1-i*StepSize) +(i*StepSize)*HighSigma) for i in range(int(ListSize))] 
      #Construction parameters for N(Mean,Variance) - Default is N(0,1) 
      Mean  = 0 
      Var   = 1 
      #NormalDistro= [self.NormalDistributionFunction(Mean, Var, x) for x in ZValues] 
      NormalDistro= list() 
      for i in range(len(ZValues)): 
       if i==0: 
        ERFCVAL = 0.5 * math.erfc(-ZValues[i]/math.sqrt(2)) 
        NormalDistro.append(ERFCVAL) 
       elif i == len(ZValues) - 1: 
        ERFCVAL = NormalDistro[0] 
        NormalDistro.append(ERFCVAL) 
       else: 
        ERFCVAL1 = 0.5 * math.erfc(-ZValues[i]/math.sqrt(2)) 
        ERFCVAL2 = 0.5 * math.erfc(-ZValues[i-1]/math.sqrt(2)) 
        ERFCVAL = ERFCVAL1 - ERFCVAL2 
        NormalDistro.append(ERFCVAL) 
      #print "Normal Distribution sum = %f"%sum(NormalDistro) 
      Values = multinomial(ListSumValue,NormalDistro,size=1) 
      OutputValue = Values[0] 
     else: 
      raise ValueError ('Cannot create desired vector') 
     return OutputValue 
    else: 
     raise ValueError ('Cannot create desired vector') 
    return OutputValue 

ProbabilityDistibution = RandFloats(1200)#This is your probability distribution for your 1200 cell array 
SizeDistribution = RandIntVec(1200,100,Distribution=ProbabilityDistribution)#for a 1200 cell array, whose sum is 100 with given probability distribution

是重要的兩條主線是在代碼的最後兩行以上

來源

2012-07-25 13:29:54 torrho

到目前爲止，我認爲ecatmur的答案似乎很合理，簡單。

我只是想添加一個更「應用」的例子。考慮一個骰子與6面（6個數字）。每個數字/結果的概率爲1/6。顯示在陣列的形式中，骰子可能看起來像：

b = np.array([[1,1,1],[1,1,1]])/6.0

因此滾動的骰子100倍(n=100)結果在下面的仿真：

np.bincount(np.searchsorted(np.cumsum(b), np.random.random(n)),minlength=b.size).reshape(b.shape)

我認爲可以是這樣的一種合適的方法應用。因此，謝謝你ecatmur的幫助！

/約翰內斯

來源

2012-08-06 11:39:28 Johannes

我仍然不確定是否使用該程序來模擬例如100個骰子？ – Johannes 2012-08-08 11:01:26

離散化在Python

回答

相關問題