Scikit-learn - Gaussian Naive Bayes Implementation

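I have started using scikit-learn and I am trying to train a Gaussian Naive Bayes classifier and predict with it. I'm not sure I'm doing it right, and I'd appreciate it if someone could help me.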

Problem: I input X items of type 1 and get them back classified as type 0.

How I did it: to generate the training data, I do this:

# this is of type 1
ganado = {
    "Hora": "16:43:35",
    "Fecha": "19/06/2015",
    "Tiempo": 10,
    "Brazos": "der",
    "Sentado": "no",
    "Puntuacion Final Pasteles": 50,
    "Nombre": "usuario1",
    "Puntuacion Final Botellas": 33
}
# this is type 0
perdido = {
    "Hora": "16:43:35",
    "Fecha": "19/06/2015",
    "Tiempo": 10,
    "Brazos": "der",
    "Sentado": "no",
    "Puntuacion Final Pasteles": 4,
    "Nombre": "usuario1",
    "Puntuacion Final Botellas": 3
}
train = []
for repeticion in range(0, 400):
    train.append(ganado)

for repeticion in range(0, 1):
    train.append(perdido)

I label the data with this weak condition:

listLabel = []
for data in train:
    condition = data["Puntuacion Final Pasteles"] + data["Puntuacion Final Botellas"]
    if condition < 20:
        listLabel.append(0)
    else:
        listLabel.append(1)
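A quick check of what this labeling produces may help. The sketch below rebuilds the training rows with only the two score columns (an assumption: the other fields are identical in every row), applies the same rule through a hypothetical helper `label`, and prints the class counts and the priors GaussianNB estimates from them.

```python
from collections import Counter

from sklearn.naive_bayes import GaussianNB

# Only the two score columns; every other field is constant in the question's data.
ganado = {"Puntuacion Final Pasteles": 50, "Puntuacion Final Botellas": 33}
perdido = {"Puntuacion Final Pasteles": 4, "Puntuacion Final Botellas": 3}
train = [ganado] * 400 + [perdido]


def label(row):
    # Hypothetical helper with the same rule as above: total below 20 -> class 0.
    total = row["Puntuacion Final Pasteles"] + row["Puntuacion Final Botellas"]
    return 0 if total < 20 else 1


labels = [label(row) for row in train]
print(Counter(labels))  # class counts

# GaussianNB estimates its class priors from these frequencies.
X = [[row["Puntuacion Final Pasteles"], row["Puntuacion Final Botellas"]] for row in train]
gnb = GaussianNB().fit(X, labels)
print(gnb.classes_, gnb.class_prior_)
```

The priors end up as 1/401 for class 0 and 400/401 for class 1, which is the "(1, 400)" prior discussed in the answer.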

I generate the test data like this:

# this should be type 1
pruebaGanado = {
    "Hora": "16:43:35",
    "Fecha": "19/06/2015",
    "Tiempo": 10,
    "Brazos": "der",
    "Sentado": "no",
    "Puntuacion Final Pasteles": 10,
    "Nombre": "usuario1",
    "Puntuacion Final Botellas": 33
}
# this should be type 0
pruebaPerdido = {
    "Hora": "16:43:35",
    "Fecha": "19/06/2015",
    "Tiempo": 10,
    "Brazos": "der",
    "Sentado": "no",
    "Puntuacion Final Pasteles": 2,
    "Nombre": "usuario1",
    "Puntuacion Final Botellas": 3
}
test = []
for repeticion in range(0, 420):
    test.append(pruebaGanado)
    test.append(pruebaPerdido)

After that, I train the classifier with train and listLabel:

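from sklearn.feature_extraction import DictVectorizer
from sklearn.naive_bayes import GaussianNB

vec = DictVectorizer()
X = vec.fit_transform(train)  # one-hot encodes string fields, keeps numeric ones
gnb = GaussianNB()
trained = gnb.fit(X.toarray(), listLabel)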

Once it is trained, I use it to classify the test data:

testX = vec.transform(test)  # reuse the columns learned from train; do not refit on test data
predicted = trained.predict(testX.toarray())

The result is always 0. Can you tell me what I am doing wrong and how to fix it?
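One thing worth checking in the classification step is `vec.fit_transform(test)`: it re-learns the column layout from the test rows instead of reusing the one learned from `train`. In this particular case the layout happens to be identical, because every row has the same keys and string values, but a minimal sketch with hypothetical rows shows how the two calls can diverge:

```python
from sklearn.feature_extraction import DictVectorizer

# Hypothetical rows: the test row lacks "b" and has a feature "c" never seen in training.
train = [{"a": 50, "b": 33}, {"a": 4, "b": 3}]
test = [{"a": 10, "c": 1}]

vec = DictVectorizer()
vec.fit(train)  # learns the columns ["a", "b"]

# fit_transform on the test set builds a new layout from the test rows alone,
# so a model trained on (a, b) would silently read "c" as if it were "b".
refit_vec = DictVectorizer()
X_refit = refit_vec.fit_transform(test).toarray()
print(refit_vec.feature_names_, X_refit)

# transform reuses the training layout: missing "b" becomes 0, unseen "c" is dropped.
X_test = vec.transform(test).toarray()
print(vec.feature_names_, X_test)
```

Fitting the vectorizer once on the training data and only calling `transform` afterwards keeps the columns aligned with what the classifier was trained on.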


2015-06-29 Euskalduna

Please accept the answer if it helped you, so that others can learn from it too... – omerbp

First, since your data has features that carry no information (they have the same value in every sample), I cleaned it up a bit:

ganado = {
    "a": 50,
    "b": 33
}
perdido = {
    "a": 4,
    "b": 3
}
pruebaGanado = {
    "a": 10,
    "b": 33
}
pruebaPerdido = {
    "a": 2,
    "b": 3
}

Nothing else matters here, and cleaning up your code will help you focus on what is important.
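With only the two cleaned columns, the whole problem can be reproduced in a few lines. This is a sketch assuming a recent scikit-learn; the labels are written out directly rather than computed, which matches the question's rule for these two rows.

```python
from sklearn.feature_extraction import DictVectorizer
from sklearn.naive_bayes import GaussianNB

# Cleaned data: "a" is Puntuacion Final Pasteles, "b" is Puntuacion Final Botellas.
ganado = {"a": 50, "b": 33}
perdido = {"a": 4, "b": 3}
pruebaGanado = {"a": 10, "b": 33}
pruebaPerdido = {"a": 2, "b": 3}

train = [ganado] * 400 + [perdido]
listLabel = [1] * 400 + [0]  # same result as the rule a + b < 20 -> class 0

vec = DictVectorizer()
X = vec.fit_transform(train).toarray()
gnb = GaussianNB().fit(X, listLabel)

# transform (not fit_transform) so the test columns match the training columns.
t = vec.transform([pruebaGanado, pruebaPerdido]).toarray()
print(gnb.predict(t))
```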

Now, Gaussian Naive Bayes is all about probabilities. As you may notice, what the classifier is telling you is:

P((a,b)=(10,33)|class=0)*P(class=0) > P((a,b)=(10,33)|class=1)*P(class=1)

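Because it assumes that both a and b are normally distributed, and the probability of this case is very low under both classes, the prior you gave it (1 vs. 400) is negligible. You can see the formula itself here. By the way, you can get the exact probabilities: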

t = [pruebaGanado, pruebaPerdido]
t = vec.transform(t)  # reuse the training columns
print(trained.predict_proba(t.toarray()))
# prints:
# [[ 1. 0.]
#  [ 1. 0.]]
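To see where these probabilities come from, the sketch below recomputes the per-class Gaussian score by hand with NumPy and compares the winner with `predict`. It assumes scikit-learn's smoothing behaviour (adding `var_smoothing`, 1e-9 by default, times the largest feature variance to every class variance). Because all training rows within a class are identical, that tiny epsilon is the entire variance, so any distance from a class mean is penalized far more heavily than the log-prior difference of log(400) ≈ 6 can compensate.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

X = np.array([[50.0, 33.0]] * 400 + [[4.0, 3.0]])  # columns: a, b
y = np.array([1] * 400 + [0])
test = np.array([[10.0, 33.0], [2.0, 3.0]])  # pruebaGanado, pruebaPerdido

gnb = GaussianNB().fit(X, y)

# Assumption: same smoothing as scikit-learn (var_smoothing * largest feature variance).
eps = 1e-9 * X.var(axis=0).max()

scores = []
for c in (0, 1):
    Xc = X[y == c]
    mean, var = Xc.mean(axis=0), Xc.var(axis=0) + eps
    log_prior = np.log(len(Xc) / len(X))
    # Sum of per-feature Gaussian log-densities (the "naive" independence assumption).
    log_lik = -0.5 * np.sum(np.log(2 * np.pi * var)) - 0.5 * np.sum((test - mean) ** 2 / var, axis=1)
    scores.append(log_prior + log_lik)
scores = np.column_stack(scores)  # one row per test point, one column per class

print(scores)
print(scores.argmax(axis=1))  # index of the winning class, which equals the label here
print(gnb.predict(test))
```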

So the classifier is certain that 0 is the correct class. Now, let's change the test data a bit:

pruebaGanado = {
    "a": 20,  # "Puntuacion Final Pasteles", raised from 10 to 20
    "b": 33
}

現在我們有：

[[ 0. 1.] 
[ 1. 0.]]
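With `b` fixed at 33, a small sweep over `a` shows where the prediction flips. This is a sketch on the cleaned two-column data, not part of the original answer.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

X = np.array([[50.0, 33.0]] * 400 + [[4.0, 3.0]])  # columns: a, b
y = np.array([1] * 400 + [0])
gnb = GaussianNB().fit(X, y)

# Keep b = 33 (as in pruebaGanado) and vary a from 10 to 30.
a_values = np.arange(10, 31)
grid = np.column_stack([a_values, np.full(a_values.shape, 33.0)])
preds = gnb.predict(grid)
for a, pred in zip(a_values, preds):
    print(int(a), int(pred))
```

Since both classes end up with the same tiny variance, the decision reduces to comparing squared distances to the class means (4, 3) and (50, 33): class 1 wins once (a - 50)^2 < (a - 4)^2 + (33 - 3)^2, which simplifies to a > 1584/92 ≈ 17.2. That is why a = 10 is classified as 0 and a = 20 as 1.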

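So you haven't done anything wrong; it's all a matter of the calculation. By the way, I challenge you to try MultinomialNB and see how the priors change everything.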
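Taking up the MultinomialNB challenge, a minimal sketch could fit both models on the cleaned data and, for MultinomialNB, compare the default learned prior with `fit_prior=False` (a uniform prior). Treating the two scores as counts is an assumption; MultinomialNB only requires them to be non-negative.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB, MultinomialNB

X = np.array([[50, 33]] * 400 + [[4, 3]])  # columns: a, b
y = np.array([1] * 400 + [0])
test = np.array([[10, 33], [2, 3]])  # pruebaGanado, pruebaPerdido

models = {
    "GaussianNB": GaussianNB(),
    "MultinomialNB (learned prior)": MultinomialNB(),
    "MultinomialNB (uniform prior)": MultinomialNB(fit_prior=False),
}
predictions = {}
for name, model in models.items():
    model.fit(X, y)
    predictions[name] = model.predict(test).tolist()
    print(name, predictions[name], model.predict_proba(test).round(3))
```

MultinomialNB only looks at the relative proportions of a and b within each class, which are fairly similar (roughly 0.60/0.40 versus 0.56/0.44), so the 400:1 prior can dominate: with the learned prior both test rows come out as class 1, and with a uniform prior both come out as class 0.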

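Also, unless you have a good reason to use GaussianNB here, I would consider some kind of tree classifier, since in my opinion it may be better suited to your problem.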
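Along the lines of the tree suggestion, here is a sketch with scikit-learn's `DecisionTreeClassifier`. The training set is hypothetical: 500 random score pairs labelled with the question's own rule, because with only two distinct training rows either feature separates the classes perfectly and the tree's choice of split would be essentially arbitrary. The seed and the score range are arbitrary choices.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Hypothetical training set: random (a, b) score pairs in [0, 60),
# labelled with the question's rule (a + b < 20 -> class 0, otherwise class 1).
rng = np.random.default_rng(0)
X = rng.integers(0, 60, size=(500, 2))
y = (X.sum(axis=1) >= 20).astype(int)

tree = DecisionTreeClassifier(random_state=0).fit(X, y)
print("training accuracy:", tree.score(X, y))
print(tree.predict([[10, 33], [2, 3]]))  # pruebaGanado, pruebaPerdido
```

A tree splits on one feature at a time, so it approximates the diagonal boundary a + b = 20 with axis-aligned steps; how closely depends on how well the training data covers the region near that line.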


2015-06-29 17:08:13 omerbp
