2014-01-21 51 views
1

我有一個從excel導出的包含數據值的數組,如圖所示。第一列x和第二列y是因變量,而第三列z是獨立變量(輸出)。使用curve_fit從scipy.optimize求解數據集的係數

from xlrd import open_workbook 

Data = open_workbook("simple.xls") 
sheet = Data.sheet_by_name('Sheet1') 

A=[] 

# Read row by row 
for rownum in range(sheet.nrows): 
    rowValues = sheet.row_values(rownum) 
    A.append(rowValues) 

A = np.array(A) 

A= 
[[ 0.00000000e+00 1.49761692e-05 0.00000000e+00] 
[ 8.85000000e+02 1.49761692e-05 6.41362500e-02] 
[ 1.48500000e+03 1.49761692e-05 1.19340000e-01] 
[ 2.09000000e+03 1.49761692e-05 1.58760000e-01] 
[ 3.36000000e+03 1.49761692e-05 2.08080000e-01] 
[ 3.87000000e+03 1.49761692e-05 2.16933750e-01] 
[ 6.48000000e+03 1.49761692e-05 2.46746250e-01] 
[ 8.22000000e+03 1.49761692e-05 2.54700000e-01] 
[ 1.05300000e+04 1.49761692e-05 2.59470000e-01] 
[ 1.58250000e+04 1.49761692e-05 2.62035000e-01] 
[ 2.37600000e+04 1.49761692e-05 2.68751250e-01] 
[ 8.18400000e+04 1.49761692e-05 2.92848750e-01] 
[ 0.00000000e+00 8.57250668e-06 0.00000000e+00] 
[ 6.75000000e+02 8.57250668e-06 4.97436412e-02] 
[ 1.27500000e+03 8.57250668e-06 1.27749375e-01] 
[ 1.88000000e+03 8.57250668e-06 1.88617039e-01] 
[ 3.15000000e+03 8.57250668e-06 2.65089780e-01] 
[ 3.66000000e+03 8.57250668e-06 2.90344849e-01] 
[ 6.27000000e+03 8.57250668e-06 3.36295316e-01] 
[ 8.01000000e+03 8.57250668e-06 3.42702439e-01] 
[ 1.03200000e+04 8.57250668e-06 3.65205982e-01] 
[ 1.56150000e+04 8.57250668e-06 3.67269626e-01] 
[ 2.35500000e+04 8.57250668e-06 3.87296798e-01] 
[ 8.16300000e+04 8.57250668e-06 4.43486869e-01] 
[ 0.00000000e+00 4.26671486e-06 0.00000000e+00] 
[ 4.65000000e+02 4.26671486e-06 2.61407250e-02] 
[ 1.06500000e+03 4.26671486e-06 1.22371762e-01] 
[ 1.67000000e+03 4.26671486e-06 2.19629475e-01] 
[ 2.94000000e+03 4.26671486e-06 3.26680087e-01] 
[ 3.45000000e+03 4.26671486e-06 3.34340662e-01] 
[ 6.06000000e+03 4.26671486e-06 4.18330575e-01] 
[ 7.80000000e+03 4.26671486e-06 4.50631350e-01] 
[ 1.01100000e+04 4.26671486e-06 4.55053950e-01] 
[ 1.54050000e+04 4.26671486e-06 4.60937587e-01] 
[ 2.33400000e+04 4.26671486e-06 5.10770813e-01] 
[ 8.14200000e+04 4.26671486e-06 6.12569587e-01] 
[ 0.00000000e+00 2.13335743e-06 0.00000000e+00] 
[ 8.55000000e+02 2.13335743e-06 1.03773150e-01] 
[ 1.46000000e+03 2.13335743e-06 2.21130000e-01] 
[ 2.73000000e+03 2.13335743e-06 3.45515625e-01] 
[ 3.24000000e+03 2.13335743e-06 3.85634925e-01] 
[ 5.85000000e+03 2.13335743e-06 4.76061300e-01] 
[ 7.59000000e+03 2.13335743e-06 4.79220300e-01] 
[ 1.51950000e+04 2.13335743e-06 5.24709900e-01] 
[ 2.31300000e+04 2.13335743e-06 5.64829200e-01] 
[ 8.12100000e+04 2.13335743e-06 6.46568325e-01] 
[ 0.00000000e+00 1.42359023e-06 0.00000000e+00] 
[ 6.45000000e+02 1.42359023e-06 8.03596500e-02] 
[ 1.25000000e+03 1.42359023e-06 2.36700000e-01] 
[ 2.52000000e+03 1.42359023e-06 4.25941650e-01] 
[ 3.03000000e+03 1.42359023e-06 4.61683350e-01] 
[ 5.64000000e+03 1.42359023e-06 5.99561100e-01] 
[ 7.38000000e+03 1.42359023e-06 6.05952000e-01] 
[ 9.69000000e+03 1.42359023e-06 6.16958550e-01] 
[ 1.49850000e+04 1.42359023e-06 6.57434250e-01] 
[ 2.29200000e+04 1.42359023e-06 6.45954300e-01] 
[ 8.10000000e+04 1.42359023e-06 7.79689800e-01] 
[ 0.00000000e+00 9.36010573e-07 0.00000000e+00] 
[ 4.35000000e+02 9.36010573e-07 3.40200000e-02] 
[ 1.04000000e+03 9.36010573e-07 1.91160000e-01] 
[ 2.31000000e+03 9.36010573e-07 3.77640000e-01] 
[ 2.82000000e+03 9.36010573e-07 4.44240000e-01] 
[ 5.43000000e+03 9.36010573e-07 5.50440000e-01] 
[ 7.17000000e+03 9.36010573e-07 5.36580000e-01] 
[ 9.48000000e+03 9.36010573e-07 5.83740000e-01] 
[ 1.47750000e+04 9.36010573e-07 5.87340000e-01] 
[ 2.27100000e+04 9.36010573e-07 6.33060000e-01] 
[ 8.07900000e+04 9.36010573e-07 7.36200000e-01]] 

x= A[:,0] 
y= A[:,1] 
z= A[:,2] 

我有將融入從陣列A中的數據,以便求解係數ab的功能。

def func(data,a,b): 
    return a/(data[:,1]*b)*np.log(1+(data[:,1]*b/a)*(1-np.exp(-a*data[:,0]))) 

的代碼的其餘部分示出了係數ab,所述scipy.optimize.curve_fit()函數的初始猜測,並matplotlib.pyplot繪製的結果。

guess = [3.0e-5, 128 ] 

print guess, 'initial guessed parameters' 

params, pcov = scipy.optimize.curve_fit(func, A[:,:2], A[:,2], guess) 

print params, 'fitted parameters' 

import matplotlib.pyplot as plt 
plt.plot(x,func(A,params[0],params[1]),'-r',x,z,'o') 
plt.title('Plot') 
plt.legend(['Fit', 'Data'], loc='lower right') 
plt.show() 

積的結果是這樣的

enter image description here

而且所得的係數是:

[3e-05, 128] initial guessed parameters 
[ 2.00773153e-04 1.22752179e+02] fitted parameters 

因爲所有的數據是內部array A,scipy認爲,點在數組中從一個點連接到另一個點,導致每條曲線結束返回到原點,這是als o後續曲線的開始。

我應該如何編碼python,以便scipy.optimize.curve_fit知道數組中的數據由多條曲線組成,而不是一條單獨的聯合數據?任何建議將不勝感激。

+0

你是如何從excel中導出數組的?它是否已經以這種格式存在,或者導出爲ex​​cel,然後導入到python導致它以這種格式?具體來說,我想知道這個數據最初是在三個單獨的列中。 – cosmosis

+0

@cosmosis數據在我的電子表格中,因此它已經是這種格式。我編輯了代碼以顯示如何從excel導入。 – user3211991

+0

謝謝你。你是否嘗試過單獨閱讀每一列,例如'x_column = sheet.col_values(0)',因爲這是對你正試圖完成的最有用的格式?它需要修改你的'func'函數定義,但一般來說應該更容易。 – cosmosis

回答

0

我已經編輯了代碼(後面會附上),您只需將它剪切並粘貼到Python中,以防其他人想要嘗試它。

雖然我不確定我是否理解你的問題。它看起來是xy是您的獨立變量(而不是依賴變量)和z您的因變量(即從每對(x,y)中計算出來的東西)。在這種情況下,我想你會想要一個三維圖 - 目前,如果我正在閱讀這個權利,你正在繪製z vs x而不是顯示y

假設你確實希望這樣做,我同意你最好的評論,如果你分開分開的曲線 - 我認爲迴歸零對你的適應性有負面影響。您可以使用np.where(A[:,0]==0)[0]來查找x==0並在循環中使用它來拆分A的索引 - 儘管我認爲np.split(A,np.where(A[:,0]==0)[0])可以在一行中爲您完成。

from scipy.optimize import curve_fit 
import numpy as np 
import matplotlib.pyplot as plt 

def func(data,a,b): 
    return a/(data[:,1]*b)*np.log(1+(data[:,1]*b/a)*(1-np.exp(-a*data[:,0]))) 

A=np.array(
(0.00000000e+00, 1.49761692e-05, 0.00000000e+00, 
8.85000000e+02, 1.49761692e-05, 6.41362500e-02, 
1.48500000e+03, 1.49761692e-05, 1.19340000e-01, 
2.09000000e+03, 1.49761692e-05, 1.58760000e-01, 
3.36000000e+03, 1.49761692e-05, 2.08080000e-01, 
3.87000000e+03, 1.49761692e-05, 2.16933750e-01, 
6.48000000e+03, 1.49761692e-05, 2.46746250e-01, 
8.22000000e+03, 1.49761692e-05, 2.54700000e-01, 
1.05300000e+04, 1.49761692e-05, 2.59470000e-01, 
1.58250000e+04, 1.49761692e-05, 2.62035000e-01, 
2.37600000e+04, 1.49761692e-05, 2.68751250e-01, 
8.18400000e+04, 1.49761692e-05, 2.92848750e-01, 
0.00000000e+00, 8.57250668e-06, 0.00000000e+00, 
6.75000000e+02, 8.57250668e-06, 4.97436412e-02, 
1.27500000e+03, 8.57250668e-06, 1.27749375e-01, 
1.88000000e+03, 8.57250668e-06, 1.88617039e-01, 
3.15000000e+03, 8.57250668e-06, 2.65089780e-01, 
3.66000000e+03, 8.57250668e-06, 2.90344849e-01, 
6.27000000e+03, 8.57250668e-06, 3.36295316e-01, 
8.01000000e+03, 8.57250668e-06, 3.42702439e-01, 
1.03200000e+04, 8.57250668e-06, 3.65205982e-01, 
1.56150000e+04, 8.57250668e-06, 3.67269626e-01, 
2.35500000e+04, 8.57250668e-06, 3.87296798e-01, 
8.16300000e+04, 8.57250668e-06, 4.43486869e-01, 
0.00000000e+00, 4.26671486e-06, 0.00000000e+00, 
4.65000000e+02, 4.26671486e-06, 2.61407250e-02, 
1.06500000e+03, 4.26671486e-06, 1.22371762e-01, 
1.67000000e+03, 4.26671486e-06, 2.19629475e-01, 
2.94000000e+03, 4.26671486e-06, 3.26680087e-01, 
3.45000000e+03, 4.26671486e-06, 3.34340662e-01, 
6.06000000e+03, 4.26671486e-06, 4.18330575e-01, 
7.80000000e+03, 4.26671486e-06, 4.50631350e-01, 
1.01100000e+04, 4.26671486e-06, 4.55053950e-01, 
1.54050000e+04, 4.26671486e-06, 4.60937587e-01, 
2.33400000e+04, 4.26671486e-06, 5.10770813e-01, 
8.14200000e+04, 4.26671486e-06, 6.12569587e-01, 
0.00000000e+00, 2.13335743e-06, 0.00000000e+00, 
8.55000000e+02, 2.13335743e-06, 1.03773150e-01, 
1.46000000e+03, 2.13335743e-06, 2.21130000e-01, 
2.73000000e+03, 2.13335743e-06, 3.45515625e-01, 
3.24000000e+03, 2.13335743e-06, 3.85634925e-01, 
5.85000000e+03, 2.13335743e-06, 4.76061300e-01, 
7.59000000e+03, 2.13335743e-06, 4.79220300e-01, 
1.51950000e+04, 2.13335743e-06, 5.24709900e-01, 
2.31300000e+04, 2.13335743e-06, 5.64829200e-01, 
8.12100000e+04, 2.13335743e-06, 6.46568325e-01, 
0.00000000e+00, 1.42359023e-06, 0.00000000e+00, 
6.45000000e+02, 1.42359023e-06, 8.03596500e-02, 
1.25000000e+03, 1.42359023e-06, 2.36700000e-01, 
2.52000000e+03, 1.42359023e-06, 4.25941650e-01, 
3.03000000e+03, 1.42359023e-06, 4.61683350e-01, 
5.64000000e+03, 1.42359023e-06, 5.99561100e-01, 
7.38000000e+03, 1.42359023e-06, 6.05952000e-01, 
9.69000000e+03, 1.42359023e-06, 6.16958550e-01, 
1.49850000e+04, 1.42359023e-06, 6.57434250e-01, 
2.29200000e+04, 1.42359023e-06, 6.45954300e-01, 
8.10000000e+04, 1.42359023e-06, 7.79689800e-01, 
0.00000000e+00, 9.36010573e-07, 0.00000000e+00, 
4.35000000e+02, 9.36010573e-07, 3.40200000e-02, 
1.04000000e+03, 9.36010573e-07, 1.91160000e-01, 
2.31000000e+03, 9.36010573e-07, 3.77640000e-01, 
2.82000000e+03, 9.36010573e-07, 4.44240000e-01, 
5.43000000e+03, 9.36010573e-07, 5.50440000e-01, 
7.17000000e+03, 9.36010573e-07, 5.36580000e-01, 
9.48000000e+03, 9.36010573e-07, 5.83740000e-01, 
1.47750000e+04, 9.36010573e-07, 5.87340000e-01, 
2.27100000e+04, 9.36010573e-07, 6.33060000e-01, 
8.07900000e+04, 9.36010573e-07, 7.36200000e-01)) 
A = A.reshape(len(A)/3, 3) 
x= A[:,0] 
y= A[:,1] 
z= A[:,2] 

guess = [3.0e-5, 128 ] 
print guess, 'initial guessed parameters' 

params, pcov = curve_fit(func, A[:,:2], A[:,2], guess) 
print params, 'fitted parameters' 

plt.plot(x,func(A,params[0],params[1]),'-r',x,z,'o') 
plt.title('Plot') 
plt.legend(['Fit', 'Data'], loc='lower right', numpoints=1) 
plt.show() 
0

看來你的數據集A包含所有這些曲線背對背。

相反,您可以每次拆分數據集A[:,0] == 0.00000000e+00。將它分成6個數據集後,可以分別適合每個數據集。

但是,如果我正確理解您的問題,您還希望參數ab對於每個數據集都是相同的,對嗎?

爲了幫助您實現這一目標,我將無恥地插入我的symfit程序包,其中包含curve_fit以使這些問題更容易解決。

symfit,你會做如下::

from symfit import Fit, variables, parameters, log, exp 

datasets = [A_1, A_2, ...] # I'm going to assume this holds the untangled datasets one through six 

xs = variables('x_1, x_2, x_3, x_4, x_5, x_6') 
ys = variables('y_1, y_2, y_3, y_4, y_5, y_6') 
zs = variables('z_1, ...') # same for z 
a, b = parameters('a, b') 

model_dict = { 
    z: a/(y * b) * log(1 + (y * b/a) * (1 - exp(- a * x))) 
     for x, y, z in zip(xs, ys, zs) 
} 

此代碼將創建矢量值模型,這將允許同時適合該方程組(隨着ab相同的實例每一個!)。爲了健康,我們現在可以簡單做到以下幾點:

fit = Fit(model_dict, 
    x_1=datasets[0][:,0], x_2=datasets[1][:,0], ..., 
    y_1=datasets[0][:,1], y_2=datasets[1][:,1], ..., 
    z_1=datasets[0][:,2], z_2=datasets[1][:,2], ... 
) 

我沒有完全寫東西展現出來,但我希望這給你如何完成這一點的想法。更多信息可以在文檔中找到:symfit docs

作爲最後一句話,請注意我使用了符號exp和log,而不是numpy。