Classifier predictions are unreliable: is it because my GMM classifiers are not trained correctly?

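I am training two GMM classifiers, one per label, on MFCC values. I concatenate all the MFCC values of a class and fit them to that class's classifier. For each classifier, I then sum the probabilities it assigns to its label.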

def createGMMClassifiers():
    label_samples = {}
    for label, sample in training.iteritems():
        labelstack = np.empty((50,13))
        for feature in sample:
            #debugger.set_trace()
            labelstack = np.concatenate((labelstack,feature))
        label_samples[label]=labelstack
    for label in label_samples:
        #debugger.set_trace()
        classifiers[label] = mixture.GMM(n_components = n_classes)
        classifiers[label].fit(label_samples[label])
    for sample in testing['happy']:
        classify(sample)

def classify(testMFCC):
    probability = {'happy':0,'sad':0}
    for name, classifier in classifiers.iteritems():
        prediction = classifier.predict_proba(testMFCC)
        for probforlabel in prediction:
            probability[name]+=probforlabel[0]
    print 'happy ',probability['happy'],'sad ',probability['sad']

    if(probability['happy']>probability['sad']):
        print 'happy'
    else:
        print 'sad'

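But my results do not seem consistent. I find it hard to believe this is caused by the random seed being None, because the predictions are often the same label for all of the test data, yet each run often gives the exact opposite (see Output 1 and Output 2).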

So my question is: am I doing something obviously wrong when training my classifiers?

Output 1:

happy 123.559202732 sad 122.409167294 
happy 

happy 120.000879032 sad 119.883786657 
happy 

happy 124.000069307 sad 123.999928962 
happy 

happy 118.874574047 sad 118.920941127 
sad 

happy 117.441353421 sad 122.71924156 
sad 

happy 122.210579428 sad 121.997571901 
happy 

happy 120.981752603 sad 120.325940128 
happy 

happy 126.013713257 sad 125.885047394 
happy 

happy 122.776016525 sad 122.12320875 
happy 

happy 115.064172476 sad 114.999513909 
happy

Output 2:

happy 123.559202732 sad 122.409167294 
happy 

happy 120.000879032 sad 119.883786657 
happy 

happy 124.000069307 sad 123.999928962 
happy 

happy 118.874574047 sad 118.920941127 
sad 

happy 117.441353421 sad 122.71924156 
sad 

happy 122.210579428 sad 121.997571901 
happy 

happy 120.981752603 sad 120.325940128 
happy 

happy 126.013713257 sad 125.885047394 
happy 

happy 122.776016525 sad 122.12320875 
happy 

happy 115.064172476 sad 114.999513909 
happy

Earlier I asked a related question and got a correct answer. I am providing the link below.

Having different results every run with GMM Classifier

Edit: added the main function, which collects the data and splits it into training and test sets.

def main():
    happyDir = dir+'happy/'
    sadDir = dir+'sad/'
    training["sad"]=[]
    training["happy"]=[]
    testing["happy"]=[]
    #TestSet
    for wavFile in os.listdir(happyDir)[::-1][:10]:
        #print wavFile
        fullPath = happyDir+wavFile
        testing["happy"].append(sf.getFeatures(fullPath))
    #TrainSet
    for wavFile in os.listdir(happyDir)[::-1][10:]:
        #print wavFile
        fullPath = happyDir+wavFile
        training["happy"].append(sf.getFeatures(fullPath))
    for wavFile in os.listdir(sadDir)[::-1][10:]:
        fullPath = sadDir+wavFile
        training["sad"].append(sf.getFeatures(fullPath))
    #Ensure the number of files in set
    print "Test(Happy): ", len(testing['happy'])
    print "Train(Happy): ", len(training['happy'])
    createGMMClassifiers()

Edit 2: changed the code according to the answer. I still get similarly inconsistent results.


2016-06-29 Ugur

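For a classification task it is important to tune the parameters given to the classifier. Many classification algorithms are also sensitive to their parameter choices, which means that changing some of the model's parameters, even slightly, can give hugely different results. For this problem, you can try different classification algorithms to test whether your data are good, and try different parameters with different values for each classifier. Then you can determine where the problem is.

An alternative is to use grid search to explore and tune the best parameters for a particular classifier. Please read this: http://scikit-learn.org/stable/modules/grid_search.html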
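As a rough illustration of the grid-search idea, here is a minimal sketch. It assumes a current scikit-learn release, where `GaussianMixture` has replaced the older `mixture.GMM` class used in the question. The synthetic `frames` array stands in for the stacked MFCC frames of one label, and the values in `param_grid` are illustrative only, not recommendations.

```python
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for the stacked MFCC frames of one label (assumption):
# 300 frames x 13 coefficients drawn from two well-separated clusters.
rng = np.random.RandomState(0)
frames = np.vstack([
    rng.normal(loc=-3.0, scale=1.0, size=(150, 13)),
    rng.normal(loc=3.0, scale=1.0, size=(150, 13)),
])
rng.shuffle(frames)  # shuffle rows so every CV fold sees both clusters

# Candidate settings to try; the values are illustrative only.
param_grid = {
    "n_components": [1, 2, 3],
    "covariance_type": ["diag", "full"],
}

# GaussianMixture.score returns the mean per-sample log-likelihood, so
# GridSearchCV can rank settings on held-out frames without any labels.
search = GridSearchCV(GaussianMixture(random_state=0), param_grid, cv=3)
search.fit(frames)

print(search.best_params_)
```

Here the search ranks settings by held-out log-likelihood of one label's model. That is not the same as classification accuracy, so it is worth confirming the chosen settings on labelled test clips as well.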


2016-06-30 07:36:32 Masoud

Thank you very much for the suggestions! I will check the link as soon as I get meaningful results from this particular classification algorithm. – Ugur

Your code does not make much sense: it recreates the classifier for every new training sample.

The correct training scheme should look like this:

label_samples = {} 
classifiers = {} 

# First we collect all samples per label into array of samples
# (pseudocode: `samples` is an iterable of (label, feature array) pairs)
for label, sample in samples: 
    label_samples[label].concatenate(sample) 

# Then we train classifier on every label data 
for label in label_samples: 
    classifiers[label] = mixture.GMM(n_components = n_classes) 
    classifiers[label].fit(label_samples[label])

Your decoding code is fine.
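The scheme above can be sketched in runnable form as follows. This is a minimal sketch under stated assumptions, not the asker's or the answerer's exact code: it uses `GaussianMixture` (the replacement for `mixture.GMM` in current scikit-learn), the helper `fake_clip` is hypothetical and generates synthetic data in place of real MFCC features, and `n_components=2` is an arbitrary choice. The scoring step also differs from the question's: it sums the per-frame log-likelihoods from `score_samples` and picks the label whose model scores highest, instead of summing `predict_proba` values.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.RandomState(0)

def fake_clip(center, n_frames=50):
    """Hypothetical stand-in for one clip's MFCC matrix (n_frames x 13)."""
    return rng.normal(loc=center, scale=1.0, size=(n_frames, 13))

# label -> list of per-clip feature matrices, mirroring `training` in the question
training = {
    "happy": [fake_clip(2.0) for _ in range(5)],
    "sad": [fake_clip(-2.0) for _ in range(5)],
}

# First collect all frames of a label into one array, then fit one model per label.
classifiers = {}
for label, clips in training.items():
    frames = np.vstack(clips)  # shape: (total_frames, 13)
    classifiers[label] = GaussianMixture(n_components=2, random_state=0).fit(frames)

def classify(clip):
    """Return the label whose model gives the clip the highest total log-likelihood."""
    scores = {label: model.score_samples(clip).sum()
              for label, model in classifiers.items()}
    return max(scores, key=scores.get)

print(classify(fake_clip(2.0)))
```

Each label's model is fitted exactly once, on all of that label's frames, which is the point the answer makes.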


2016-06-30 13:51:35

Concatenating onto empty dictionary values would produce an error. Would using label_samples[label] = np.concatenate((sample, label_samples[label])) achieve the same result? – Ugur
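On the question raised in this comment, here is a small sketch of one way to accumulate frames per label without hitting a missing key. It assumes each sample is an array of shape (n_frames, 13); the `samples` list is hypothetical data made up for the illustration. Note that `np.concatenate` takes a sequence of arrays as its first argument, so the arrays go inside a tuple or list.

```python
import numpy as np

# Hypothetical (label, features) pairs; each feature array is (n_frames, 13).
samples = [
    ("happy", np.ones((4, 13))),
    ("sad", np.zeros((3, 13))),
    ("happy", np.full((2, 13), 2.0)),
]

label_samples = {}
for label, sample in samples:
    if label not in label_samples:
        # First array for this label: nothing to concatenate with yet.
        label_samples[label] = sample
    else:
        # Append the new frames below the ones already collected.
        label_samples[label] = np.concatenate((label_samples[label], sample))

print({label: frames.shape for label, frames in label_samples.items()})
```

Starting each label from its first real sample avoids seeding the array with `np.empty((50,13))` as the question's code does. `np.empty` returns uninitialized memory, so those 50 rows would end up in the training data as arbitrary values.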

Edited the question with the updated code; I still get similarly inconsistent results. – Ugur
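One thing that may be worth checking for the inconsistency described here, offered as a hedged sketch and not as a confirmed diagnosis. In scikit-learn, `predict_proba` on a mixture model returns, for each frame, the posterior probability of each mixture component. Every row therefore sums to 1, and column 0 refers to whichever component happened to come first in that particular fit. The sketch below uses `GaussianMixture` (the successor to `mixture.GMM`) on synthetic data, both of which are assumptions, and also shows that fixing `random_state` makes two fits on the same data agree.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Synthetic stand-in for one label's stacked MFCC frames (assumption).
rng = np.random.RandomState(0)
frames = np.vstack([
    rng.normal(loc=-2.0, scale=1.0, size=(100, 13)),
    rng.normal(loc=2.0, scale=1.0, size=(100, 13)),
])

gmm = GaussianMixture(n_components=2, random_state=0).fit(frames)

# Each row holds the component posteriors for one frame, so rows sum to 1.
resp = gmm.predict_proba(frames)
print(resp.shape, np.allclose(resp.sum(axis=1), 1.0))

# With a fixed seed, refitting on the same data reproduces the same model.
gmm_again = GaussianMixture(n_components=2, random_state=0).fit(frames)
print(np.allclose(gmm.means_, gmm_again.means_))

# Per-frame log-likelihood under the whole mixture does not depend on
# the order of the components.
total_loglik = gmm.score_samples(frames).sum()
print(total_loglik)
```

If the component order changes between runs, a sum over column 0 of `predict_proba` can change with it, whereas the total log-likelihood from `score_samples` is a property of the fitted mixture as a whole.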
