2016-04-30 33 views

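In scikit-learn, how do I fix a ValueError when predicting? The code below raises the following error: ValueError: Found array with 0 sample(s) (shape=(0, 3)) while a minimum of 1 is required.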

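The error is raised at the line where predict is called. I assume something is wrong with the shape of the DataFrame 'obs_to_pred'. I checked the shape, and it is (1046, 3).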

Do you have any suggestions for how I can fix this and run the prediction?

import matplotlib.pyplot as plt 
import numpy as np 
import pandas as pd 
import statsmodels.api as sm 

from patsy import dmatrices 
from sklearn.linear_model import LogisticRegression 
import scipy.stats as stats 
from sklearn import linear_model 

# Import Titanic Data 
train_loc = 'C:/Users/Young/Desktop/Kaggle/Titanic/train.csv' 
test_loc = 'C:/Users/Young/Desktop/Kaggle/Titanic/test.csv' 
train = pd.read_csv(train_loc) 
test = pd.read_csv(test_loc) 

# Predict Missing Age Values Based on Factors Pclass, SibSp, and Parch. 
# In the function, combine train and test data. 
def regressionPred (traindata,testdata): 

    allobs = pd.concat([traindata, testdata]) 
    allobs = allobs[~allobs.Age.isnull()] 
    y = allobs.Age 

    y, X = dmatrices('y ~ Pclass + SibSp + Parch', data = allobs, return_type = 'dataframe') 
    mod = sm.OLS(y,X) 
    res = mod.fit() 

    predictors = ['Pclass', 'SibSp', 'Parch'] 
    regr = linear_model.LinearRegression() 
    regr.fit(allobs.ix[:,predictors], y) 

    obs_to_pred = allobs[allobs.Age.isnull()].ix[:,predictors] 
    prediction = regr.predict(obs_to_pred) # Error Produced in This Line *** 

    return res.summary(), prediction 

regressionPred(train,test) 

In case you want to look at the dataset, it is available here: https://www.kaggle.com/c/titanic/data

Answer


In the line

allobs = allobs[~allobs.Age.isnull()]

you redefine allobs so that it contains only the rows with no NaN in the Age column.

Later, with:

obs_to_pred = allobs[allobs.Age.isnull()].ix[:,predictors]

you have no rows left to predict on: allobs.Age.isnull() now evaluates to False for every row, so obs_to_pred is empty. Hence your error:

array with 0 sample(s) (shape=(0, 3)) while a minimum of 1 is required.
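To see the effect in isolation, here is a minimal sketch on a small made-up frame (the values are hypothetical and are not taken from the Titanic files). It applies the same two filters in the same order as the question. The shape (1046, 3) mentioned in the question is probably that of the filtered rows used for fitting rather than of obs_to_pred, though that is only an assumption.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the combined train/test frame.
allobs = pd.DataFrame({
    "Age": [22.0, np.nan, 35.0, np.nan],
    "Pclass": [3, 1, 2, 3],
    "SibSp": [1, 0, 0, 2],
    "Parch": [0, 0, 1, 1],
})
predictors = ["Pclass", "SibSp", "Parch"]

# First filter: keep only rows whose Age is known (as in the question).
known = allobs[~allobs.Age.isnull()]

# Second filter: ask the already-filtered frame for rows with missing Age.
# No such rows remain, so the selection is empty.
obs_to_pred = known[known.Age.isnull()][predictors]
empty_shape = obs_to_pred.shape
print(empty_shape)  # (0, 3)
```

Passing a frame of that shape to predict is what triggers the ValueError.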

Check the logic of what you actually want to predict on.
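One possible way to restructure the logic, as a sketch rather than a definitive fix: compute the missing-Age mask once, before any filtering, and use it both to select the rows for fitting and the rows to predict. The function name predict_missing_age and the toy data are hypothetical; the sketch also uses .loc instead of .ix, since .ix has been removed from recent pandas versions.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

def predict_missing_age(allobs, predictors):
    """Fit on rows with a known Age, predict Age for rows where it is missing."""
    missing = allobs.Age.isnull()               # mask computed before filtering
    known_rows = allobs[~missing]               # rows used for fitting
    rows_to_pred = allobs.loc[missing, predictors]  # rows that need a prediction

    if rows_to_pred.empty:
        # Nothing to predict; avoid calling predict on a 0-row frame.
        return pd.Series(dtype=float)

    regr = LinearRegression()
    regr.fit(known_rows[predictors], known_rows.Age)
    return pd.Series(regr.predict(rows_to_pred), index=rows_to_pred.index)

# Hypothetical toy data, not the real Titanic files.
toy = pd.DataFrame({
    "Age": [22.0, np.nan, 35.0, 54.0, np.nan, 4.0, 29.0],
    "Pclass": [3, 1, 2, 1, 3, 3, 2],
    "SibSp": [1, 0, 0, 0, 2, 3, 1],
    "Parch": [0, 0, 1, 0, 1, 1, 0],
})
predicted = predict_missing_age(toy, ["Pclass", "SibSp", "Parch"])
print(predicted)
```

With the real data, allobs would be pd.concat([traindata, testdata]) as in the question; the original frame is left unfiltered, so the rows with missing Age are still available when predict is called.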
