我正在訓練兩個GMM分類器,每個分類器對應一個標籤,並帶有MFCC值。 我連接了一個類的所有MFCC值,並將其裝入分類器。 對於每個分類器,我總結其標籤概率的概率。分類器預測是不可靠的,是因爲我的GMM分類器沒有正確的訓練?
def createGMMClassifiers():
label_samples = {}
for label, sample in training.iteritems():
labelstack = np.empty((50,13))
for feature in sample:
#debugger.set_trace()
labelstack = np.concatenate((labelstack,feature))
label_samples[label]=labelstack
for label in label_samples:
#debugger.set_trace()
classifiers[label] = mixture.GMM(n_components = n_classes)
classifiers[label].fit(label_samples[label])
for sample in testing['happy']:
classify(sample)
def classify(testMFCC):
probability = {'happy':0,'sad':0}
for name, classifier in classifiers.iteritems():
prediction = classifier.predict_proba(testMFCC)
for probforlabel in prediction:
probability[name]+=probforlabel[0]
print 'happy ',probability['happy'],'sad ',probability['sad']
if(probability['happy']>probability['sad']):
print 'happy'
else:
print 'sad'
但我的成績似乎並不一致,我覺得很難相信這是因爲RandomSeed =無狀態的,因爲所有的預測往往是所有測試數據相同的標籤,但每次運行它通常會給出確切的對立面(見輸出1和輸出2)。
所以我的問題是,我該做的事情顯然是錯誤的,而訓練我的分類?
輸出1:
happy 123.559202732 sad 122.409167294
happy
happy 120.000879032 sad 119.883786657
happy
happy 124.000069307 sad 123.999928962
happy
happy 118.874574047 sad 118.920941127
sad
happy 117.441353421 sad 122.71924156
sad
happy 122.210579428 sad 121.997571901
happy
happy 120.981752603 sad 120.325940128
happy
happy 126.013713257 sad 125.885047394
happy
happy 122.776016525 sad 122.12320875
happy
happy 115.064172476 sad 114.999513909
happy
輸出2:
happy 123.559202732 sad 122.409167294
happy
happy 120.000879032 sad 119.883786657
happy
happy 124.000069307 sad 123.999928962
happy
happy 118.874574047 sad 118.920941127
sad
happy 117.441353421 sad 122.71924156
sad
happy 122.210579428 sad 121.997571901
happy
happy 120.981752603 sad 120.325940128
happy
happy 126.013713257 sad 125.885047394
happy
happy 122.776016525 sad 122.12320875
happy
happy 115.064172476 sad 114.999513909
happy
早些時候,我問了一個相關的問題,並得到了正確的答案。我正在提供下面的鏈接。
Having different results every run with GMM Classifier
編輯: 增補其收集的數據,並分成訓練主要功能和測試
def main():
happyDir = dir+'happy/'
sadDir = dir+'sad/'
training["sad"]=[]
training["happy"]=[]
testing["happy"]=[]
#TestSet
for wavFile in os.listdir(happyDir)[::-1][:10]:
#print wavFile
fullPath = happyDir+wavFile
testing["happy"].append(sf.getFeatures(fullPath))
#TrainSet
for wavFile in os.listdir(happyDir)[::-1][10:]:
#print wavFile
fullPath = happyDir+wavFile
training["happy"].append(sf.getFeatures(fullPath))
for wavFile in os.listdir(sadDir)[::-1][10:]:
fullPath = sadDir+wavFile
training["sad"].append(sf.getFeatures(fullPath))
#Ensure the number of files in set
print "Test(Happy): ", len(testing['happy'])
print "Train(Happy): ", len(training['happy'])
createGMMClassifiers()
編輯2: 根據答案改變了代碼。仍然有類似的不一致的結果。
非常感謝您的建議!在我得到有意義的結果這個特定的分類算法後,我會盡快檢查鏈接。連接空字典值的 – Ugur