2016-10-31 37 views

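scikit-learn SGDClassifier: object of type 'NoneType' has no len()

I am trying to train a simple logistic regression model with SGD in Python on Azure ML, but the run keeps failing with an error. What confuses me even more is that the error shows up only in Epoch 8 and in no other epoch. I would appreciate it if someone could tell me why I get this error and how to avoid it. The code and the error log are included below.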

import os

import numpy
import scipy.sparse
from sklearn.linear_model import SGDClassifier

# Import data
cadd_dir = '.\\Script Bundle\\theano\\data\\'
ClinVar_ESP_dir = '.\\Script Bundle\\theano\\data\\'
# Load data
X_tr = numpy.load(os.path.join(cadd_dir, 'training.X.npz'))
X_tr = scipy.sparse.csr_matrix((X_tr['data'], X_tr['indices'], X_tr['indptr']), shape=X_tr['shape'])
y_tr = numpy.load(os.path.join(cadd_dir, 'training.y.npy'))
# Train model
print('Train SGD Logistic Regression')
alpha = 1e-2
clf = SGDClassifier(loss="log", penalty='l2', alpha=alpha, random_state=None, shuffle=False, n_iter=10, verbose=1, n_jobs=1)
clf.fit(X_tr, y_tr)




#Error 
"[Information]   -- Epoch 7 
[Information]   Norm: 0.40, NNZs: 641, Bias: 0.000623, T: 186214000, Avg. loss: 0.670200 
[Information]   Total training time: 43.97 seconds. 

[Information]   -- Epoch 8 
[Error]   Caught exception while executing function: Traceback (most recent call last): 
[Error]   File "C:\server\invokepy.py", line 211, in batch 
[Error]    xdrutils.XDRUtils.DataFrameToRFile(outlist[i], outfiles[i], True) 
[Error]   File "C:\server\XDRReader\xdrutils.py", line 51, in DataFrameToRFile 
[Error]    attributes = XDRBridge.DataFrameToRObject(dataframe) 
[Error]   File "C:\server\XDRReader\xdrbridge.py", line 40, in DataFrameToRObject 
[Error]    if (len(dataframe) == 1 and type(dataframe[0]) is pd.DataFrame): 
[Error]   TypeError: object of type 'NoneType' has no len() 
[Information]   Norm: 0.40, NNZs: 641, Bias: 0.000623, T: 212816000, Avg. loss: 0.669797 
[Information]   Total training time: 50.21 seconds. 

[Information]   -- Epoch 9 
[Information]   Norm: 0.40, NNZs: 641, Bias: 0.000622, T: 239418000, Avg. loss: 0.669482 
[Information]   Total training time: 56.46 seconds." 

Answer


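Your program produces both regular output ("[Information]") and error messages ("[Error]"), but Azure ML Studio has only one built-in output log, so both kinds of message are written to the same file. To add to the confusion, each kind of message is delayed by a different amount before it reaches the log. That is why your error message appears in the middle of the information about the 8th training epoch, even though the two are unrelated.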
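If the interleaving makes the log hard to read, one option is to split it by tag after the fact. The sketch below is only an illustration: `split_log` is a hypothetical helper, not part of Azure ML, and it assumes every relevant line starts with a bracketed tag such as `[Information]` or `[Error]`, as in the log you posted.

```python
from collections import defaultdict


def split_log(lines):
    """Group log lines by their leading [Tag]; untagged lines go under ''."""
    groups = defaultdict(list)
    for line in lines:
        stripped = line.strip()
        if not stripped:
            continue  # skip blank lines
        tag = ""
        if stripped.startswith("[") and "]" in stripped:
            end = stripped.index("]")
            tag = stripped[1:end]
            stripped = stripped[end + 1:].strip()
        groups[tag].append(stripped)
    return dict(groups)


# Three lines copied from the log in the question.
sample = [
    "[Information]   -- Epoch 8",
    "[Error]   TypeError: object of type 'NoneType' has no len()",
    "[Information]   Norm: 0.40, NNZs: 641, Bias: 0.000623, T: 212816000, Avg. loss: 0.669797",
]

streams = split_log(sample)
for line in streams["Information"]:
    print(line)
```

Read this way, the "[Information]" lines form an uninterrupted record of Epoch 8, and the traceback stands on its own.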

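The error is thrown by the function that is supposed to export the results of your Python script once the module finishes running. To run correctly in Azure ML, your code needs to be wrapped in a function named azureml_main() (as in the template you get in a new Execute Python Script module), and azureml_main() needs to return a pandas DataFrame. I am not sure what you want to return, since your code only fits the model, but the following might help: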

import os

import numpy
import pandas as pd
import scipy.sparse
from sklearn.linear_model import SGDClassifier

cadd_dir = '.\\Script Bundle\\theano\\data\\'
ClinVar_ESP_dir = '.\\Script Bundle\\theano\\data\\'

def azureml_main(input_df1 = None, input_df2 = None):
    # Load data
    X_tr = numpy.load(os.path.join(cadd_dir, 'training.X.npz'))
    X_tr = scipy.sparse.csr_matrix((X_tr['data'], X_tr['indices'], X_tr['indptr']), shape=X_tr['shape'])
    y_tr = numpy.load(os.path.join(cadd_dir, 'training.y.npy'))
    # Train model
    print('Train SGD Logistic Regression')
    alpha = 1e-2
    clf = SGDClassifier(loss="log", penalty='l2', alpha=alpha, random_state=None, shuffle=False, n_iter=10, verbose=1, n_jobs=1)
    clf.fit(X_tr, y_tr)
    # Nothing meaningful to export yet, so return an empty DataFrame
    return pd.DataFrame([])
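
The message in your traceback is simply what Python raises when len() is applied to None, which is presumably what the exporter received because nothing was returned. A minimal sketch with no Azure dependencies shows this: `export_result` is a hypothetical stand-in for the exporter (only the len() call is taken from the traceback), and a plain list stands in for the DataFrame.

```python
def export_result(result):
    """Hypothetical stand-in for the exporter in the traceback:
    it calls len() on whatever the script returned."""
    return len(result)


def script_without_return():
    # Does some work but never returns, so Python returns None implicitly.
    total = sum(range(10))


def script_with_return():
    # Returns an (empty) container, analogous to returning an empty DataFrame.
    return []


try:
    export_result(script_without_return())
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(error_message)
print(export_result(script_with_return()))
```

The first call fails with the same TypeError text as in your log, while the second succeeds, which is why returning even an empty DataFrame is enough to get past the export step.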

P.S. Are you working with Jenn/Nicolo at MSR on variant effect prediction? If not, you should ping them.
