2016-04-28

How to merge NaiveBayesClassifier objects in NLTK

I am working on a project using the NLTK toolkit. With the hardware I have, I can only run the classifier on small data sets. So I split the data into smaller chunks, ran the classifier on each one, and stored each of the resulting classifier objects in a pickle file.

Now, for testing, I need to combine all of these objects into one to get better results. So my question is: how do I merge these objects into one?

objs = []

while True:
    try:
        f = open(picklename, "rb")
        objs.extend(pickle.load(f))
        f.close()
    except EOFError:
        break

Doing it this way does not work. It gives the error TypeError: 'NaiveBayesClassifier' object is not iterable.
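(A quick way to see where that error comes from: pickle.load returns exactly one object per call, and list.extend tries to iterate over its argument. A minimal sketch, with a plain int standing in for the pickled classifier:)

```python
import pickle

# pickle.loads() returns ONE object per call -- here an int stands in
# for the pickled classifier object.
data = pickle.dumps(42)

objs = []
objs.append(pickle.loads(data))      # works: stores the single object
print(objs)                          # [42]

try:
    objs.extend(pickle.loads(data))  # extend() iterates over its argument
except TypeError as err:
    print(err)                       # 'int' object is not iterable
```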

The NaiveBayesClassifier code:

classifier = nltk.NaiveBayesClassifier.train(training_set) 
What does the code for 'NaiveBayesClassifier' look like? – Omid

@Omid It is a toolkit. I edited my question to show the classifier. – Arkham

Answer

I don't know the exact format of your data, but you can't simply merge the classifiers. A Naive Bayes classifier stores probability distributions computed from the training data, and there is no way to merge probability distributions without access to the original counts.

If you look at the source code here: http://www.nltk.org/_modules/nltk/classify/naivebayes.html the classifier stores these instance variables:

self._label_probdist = label_probdist 
self._feature_probdist = feature_probdist 

These are computed in the train method using relative frequency counts, e.g. P(L_1) = (# of L_1 in the training set) / (# of labels in the training set). To combine two classifiers you would instead need P(L_1) = (# of L_1 in train1 + # of L_1 in train2) / (# of labels in train1 + train2).
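As a numeric sketch (with made-up counts): averaging the per-chunk probabilities gives the wrong answer whenever the chunks have different sizes; only the raw counts combine correctly.

```python
# Hypothetical label counts for label L1 in two training chunks.
c1, n1 = 30, 100   # L1 appeared 30 times among 100 labeled examples
c2, n2 = 30, 300   # L1 appeared 30 times among 300 labeled examples

p1 = c1 / n1                        # 0.3
p2 = c2 / n2                        # 0.1
averaged = (p1 + p2) / 2            # 0.2  -- wrong
combined = (c1 + c2) / (n1 + n2)    # 0.15 -- correct, needs the raw counts
print(averaged, combined)
```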

However, the naive Bayes procedure is not very hard to implement from scratch, especially if you follow the train source code in the link above. Here is an outline, using that NaiveBayes source code:

  1. Store 'FreqDist' objects for the labels and features of each subset of the data.

    from collections import defaultdict
    from nltk.probability import FreqDist

    label_freqdist = FreqDist()
    feature_freqdist = defaultdict(FreqDist)
    feature_values = defaultdict(set)
    fnames = set()

    # Count up how many times each feature value occurred, given
    # the label and featurename.
    for featureset, label in labeled_featuresets:
        label_freqdist[label] += 1
        for fname, fval in featureset.items():
            # Increment freq(fval|label, fname)
            feature_freqdist[label, fname][fval] += 1
            # Record that fname can take the value fval.
            feature_values[fname].add(fval)
            # Keep a list of all feature names.
            fnames.add(fname)

    # If a feature didn't have a value given for an instance, then
    # we assume that it gets the implicit value 'None.' This loop
    # counts up the number of 'missing' feature values for each
    # (label,fname) pair, and increments the count of the fval
    # 'None' by that amount.
    for label in label_freqdist:
        num_samples = label_freqdist[label]
        for fname in fnames:
            count = feature_freqdist[label, fname].N()
            # Only add a None key when necessary, i.e. if there are
            # any samples with feature 'fname' missing.
            if num_samples - count > 0:
                feature_freqdist[label, fname][None] += num_samples - count
                feature_values[fname].add(None)

    # Use pickle to store label_freqdist, feature_freqdist, feature_values
    
  2. Combine those using their built-in addition method. This will allow you to get the relative frequencies over all of the data.

    all_label_freqdist = FreqDist()
    all_feature_freqdist = defaultdict(FreqDist)
    all_feature_values = defaultdict(set)

    for file in train_labels:
        f = open(file, "rb")
        all_label_freqdist += pickle.load(f)
        f.close()

    # Combine the default dicts for features similarly
    
  3. Use an 'estimator' to create the probability distributions.

    # Pass the estimator class itself (it is called below with the
    # freqdist, as in NaiveBayesClassifier.train), not an instance.
    estimator = ELEProbDist

    label_probdist = estimator(all_label_freqdist)

    # Create the P(fval|label, fname) distribution
    feature_probdist = {}
    for ((label, fname), freqdist) in all_feature_freqdist.items():
        probdist = estimator(freqdist, bins=len(all_feature_values[fname]))
        feature_probdist[label, fname] = probdist

    classifier = NaiveBayesClassifier(label_probdist, feature_probdist)
    

The classifier will then combine the counts over all of your data and produce what you need.
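The "combine the default dicts for features similarly" comment in step 2 is left as an exercise above; here is a minimal sketch of that merge. It uses collections.Counter as a stand-in for nltk's FreqDist (which subclasses Counter), so the same += logic applies to real FreqDist objects; the chunk data below is made up for illustration.

```python
from collections import Counter, defaultdict

def merge_feature_counts(chunks):
    """Merge per-chunk (label, fname) -> value-count tables into one.

    Each chunk is a (feature_freqdist, feature_values) pair as built
    in step 1. Counter stands in for nltk FreqDist here.
    """
    all_feature_freqdist = defaultdict(Counter)
    all_feature_values = defaultdict(set)
    for feature_freqdist, feature_values in chunks:
        for key, freqdist in feature_freqdist.items():
            all_feature_freqdist[key] += freqdist   # add the counts
        for fname, values in feature_values.items():
            all_feature_values[fname] |= values     # union the value sets
    return all_feature_freqdist, all_feature_values

# Two hypothetical chunks counting values of feature 'word' for label 'pos'.
chunk1 = ({("pos", "word"): Counter({"good": 2})}, {"word": {"good"}})
chunk2 = ({("pos", "word"): Counter({"good": 1, "bad": 3})},
          {"word": {"good", "bad"}})
merged, values = merge_feature_counts([chunk1, chunk2])
print(merged[("pos", "word")]["good"])  # 3
```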