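Classifying text strings into multiple classes using Naive Bayes with NLTK
I am currently using Naive Bayes to classify a collection of texts into several categories. Right now I only output the predicted category and its posterior probability, but what I would like to do is rank the categories by posterior probability and use the second and third categories as "backup" categories.
Here is an example: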
import pandas

df = pandas.DataFrame({
    'text': pandas.Categorical(["I have wings", "Metal wings", "Feathers", "Airport"]),
    'true_cat': pandas.Categorical(["bird", "plane", "bird", "plane"]),
})
text          true_cat
-----------------------
I have wings  bird
Metal wings   plane
Feathers      bird
Airport       plane
What I am doing:
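new_cat = classifier.classify(features(text))
# prob_classify returns a probability distribution; .prob() extracts the posterior of one label
prob_cat = classifier.prob_classify(features(text)).prob(new_cat)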
Final output:
new_cat  prob_cat  text          true_cat
bird     0.67      I have wings  bird
bird     0.6       Feathers      bird
bird     0.51      Metal wings   plane
plane    0.8       Airport       plane
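One way to get backup categories is to rank every label by its posterior instead of keeping only the top one. The sketch below assumes the posteriors have already been copied into a plain dict; with NLTK that could be something like `{label: dist.prob(label) for label in dist.samples()}`, where `dist = classifier.prob_classify(features(text))`. `rank_labels` is a hypothetical helper name, and the numbers are the "I have wings" row from the table above, with 0.33 assumed as the complement of 0.67.

```python
def rank_labels(posteriors, k=None):
    """Return (label, probability) pairs sorted from most to least probable."""
    ranked = sorted(posteriors.items(), key=lambda item: item[1], reverse=True)
    return ranked if k is None else ranked[:k]


# Assumed posteriors for "I have wings"; a real run would read them
# from the classifier's probability distribution instead.
posteriors = {"plane": 0.33, "bird": 0.67}

ranked = rank_labels(posteriors)
best_label, best_prob = ranked[0]      # primary category
backup_label, backup_prob = ranked[1]  # first backup category
```

With more than two classes, `rank_labels(posteriors, k=3)` would keep the top three, so the second and third entries can serve as backups.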
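I have found a few examples that use classify_many and prob_classify_many, but since I am new to Python I am having trouble adapting them to my problem. I have not seen them used together with pandas anywhere.
I would like the result to look like this: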
df_new = pandas.DataFrame({
    'text': pandas.Categorical(["I have wings", "Metal wings", "Feathers", "Airport"]),
    'true_cat': pandas.Categorical(["bird", "plane", "bird", "plane"]),
    'new_cat1': pandas.Categorical(["bird", "bird", "bird", "plane"]),
    'new_cat2': pandas.Categorical(["plane", "plane", "plane", "bird"]),
    'prob_cat1': pandas.Categorical(["0.67", "0.51", "0.6", "0.8"]),
    'prob_cat2': pandas.Categorical(["0.33", "0.49", "0.4", "0.2"]),
})
new_cat1  new_cat2  prob_cat1  prob_cat2  text          true_cat
-----------------------------------------------------------------------
bird      plane     0.67       0.33       I have wings  bird
bird      plane     0.51       0.49       Metal wings   plane
bird      plane     0.6        0.4        Feathers      bird
plane     bird      0.8        0.2        Airport       plane
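A wide table like the one above could be assembled by ranking each text's posteriors and spreading the top entries across numbered columns. This is only a sketch under assumptions: `build_rows` and `posterior_fn` are hypothetical names, `posterior_fn` is any callable that returns a `{label: probability}` dict for a text (with NLTK it might wrap `classifier.prob_classify(features(text))`), and `fake_posteriors` simply replays the numbers from the example rather than calling a trained classifier.

```python
def build_rows(texts, true_cats, posterior_fn, top_n=2):
    """Build one dict per text with new_cat1..N and prob_cat1..N keys."""
    rows = []
    for text, true_cat in zip(texts, true_cats):
        posteriors = posterior_fn(text)
        # Most probable label first, backups after it.
        ranked = sorted(posteriors.items(), key=lambda kv: kv[1], reverse=True)
        row = {"text": text, "true_cat": true_cat}
        for rank, (label, prob) in enumerate(ranked[:top_n], start=1):
            row[f"new_cat{rank}"] = label
            row[f"prob_cat{rank}"] = prob
        rows.append(row)
    return rows


# Stand-in for a trained classifier: replays the example's posteriors.
EXAMPLE = {
    "I have wings": {"bird": 0.67, "plane": 0.33},
    "Metal wings": {"bird": 0.51, "plane": 0.49},
    "Feathers": {"bird": 0.6, "plane": 0.4},
    "Airport": {"bird": 0.2, "plane": 0.8},
}


def fake_posteriors(text):
    return EXAMPLE[text]


texts = ["I have wings", "Metal wings", "Feathers", "Airport"]
true_cats = ["bird", "plane", "bird", "plane"]
rows = build_rows(texts, true_cats, fake_posteriors)
```

Passing `rows` to `pandas.DataFrame(rows)` should then give one column per key. Keeping the probabilities as floats rather than strings leaves them usable for sorting or thresholding later.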
Any help would be greatly appreciated.
Perfect, thank you!