2017-06-12 99 views
0

Textblob logic help: NaiveBayesClassifier

I am building a simple classifier that determines whether a sentence is positive. This is how I train the classifier using textblob.

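from textblob.classifiers import NaiveBayesClassifier

# Each training example is a (text, label) tuple.
train = [
    ('i love your website', 'pos'),
    ('i really like your site', 'pos'),
    ('i dont like your website', 'neg'),
    ('i dislike your site', 'neg'),
]

cl = NaiveBayesClassifier(train)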

I'm classifying text from Twitter using tweepy, and it goes like this. The results are stored in the database, and I'm using Django to save me the hassle of writing the backend.

import json
from tweepy.streaming import StreamListener  # tweepy 3.x

class StdOutListener(StreamListener):
    def __init__(self):
        super(StdOutListener, self).__init__()
        self.raw_tweets = []

    def on_data(self, data):
        # Parse the incoming JSON payload and keep it with the earlier tweets.
        self.raw_tweets.append(json.loads(data))
        tweets = Htweets()  # connection to the database
        for x in self.raw_tweets:
            tweets.tweet_text = x['text']

            verdict = cl.classify(x['text'])

            if verdict == 'pos':
                tweets.verdict = 'pos'
            elif verdict == 'neg':
                tweets.verdict = 'neg'
            else:
                tweets.verdict = 'normal'

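The logic seems simple: since I have trained the classifier on which sentences are positive and which are negative, it should save the verdict to the database together with the tweet.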

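But that does not seem to be the case. I have changed the logic in many ways and still had no success. When tweets are positive or negative, the algorithm does recognize them as such.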

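However, I want it to save 'normal' when they are neither, and it does not do that. I realize the classifier only knows two things, positive and negative, but surely it should also be able to tell when a text does not fall into either category.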

How can this be achieved when using textblob? Examples of alternative logic and suggestions would be greatly appreciated.

+0

The usual way to do this is to create a third class, neutral, with training examples of its own.

+0

I don't think textblob accepts a third class; it gives a "too many values to unpack" error – johnobc
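One possible source of that error, assuming the training data was passed as a flat list of strings as in the question, is the shape of the data rather than the number of classes. The sketch below uses plain Python only (no textblob) to show that unpacking a flat list into `text, label` fails, and how the same list could be regrouped into pairs. The helper name `pair_up` is hypothetical.

```python
def pair_up(flat):
    """Turn [text, label, text, label, ...] into [(text, label), ...]."""
    return list(zip(flat[0::2], flat[1::2]))


flat_train = ['i love your website', 'pos', 'i dislike your site', 'neg']

# Unpacking each item as a (text, label) pair fails, because every item
# is a single string, not a 2-tuple.
try:
    for text, label in flat_train:
        pass
    unpack_error = None
except ValueError as exc:
    unpack_error = str(exc)

# Regrouped into pairs, the same data unpacks cleanly.
paired_train = pair_up(flat_train)
for text, label in paired_train:
    pass
```

If this is indeed the cause, a list of `(text, label)` tuples with three distinct labels should be accepted, as the answer's session with the 'neu' label suggests.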

+1

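Then you could create two binary classifiers, one for negative vs. neutral and another for positive vs. neutral. Neutral could mean "no sentiment expressed" or "balanced sentiment" (as much positive as negative sentiment). It is therefore possible for the same instance to be classified as positive and as negative by the respective classifiers (it is up to you whether to treat that as neutral or as a fourth class, balanced).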
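The two-classifier idea from the comment above could be combined roughly as follows. This is only a sketch of the merging step: the function name `combine_verdicts` and the label strings 'pos', 'neg', 'neu' and 'balanced' are assumptions for illustration, and the two input labels are assumed to come from two separately trained binary classifiers.

```python
def combine_verdicts(pos_label, neg_label):
    """Merge the outputs of two binary classifiers into one verdict.

    pos_label: 'pos' or 'neu', from a positive-vs-neutral classifier.
    neg_label: 'neg' or 'neu', from a negative-vs-neutral classifier.
    """
    is_pos = pos_label == 'pos'
    is_neg = neg_label == 'neg'
    if is_pos and is_neg:
        # Both classifiers fired: treat as a fourth class (or map it to 'neu').
        return 'balanced'
    if is_pos:
        return 'pos'
    if is_neg:
        return 'neg'
    return 'neu'
```

With textblob, the two labels might come from calling `classify(text)` on two `NaiveBayesClassifier` instances, each trained on its own pair of classes.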

Answers

1

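Classification will always give you the answer with the highest probability, so you should use the prob_classify method to get the probability distribution over your class labels. By inspecting the probability distribution and setting an appropriate confidence threshold, you will start to get "neutral" classifications, provided you have a good training set. Here is an example with a minimal training set to illustrate the concept; for real use you should use a large training set: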

>>> train 
[('I love this sandwich.', 'pos'), ('this is an amazing place!', 'pos'), ('I feel very good about these beers.', 'pos'), ('this is my best work.', 'pos'), ('what an awesome view', 'pos'), ('I do not like this restaurant', 'neg'), ('I am tired of this stuff.', 'neg'), ("I can't deal with this", 'neg'), ('he is my sworn enemy!', 'neg'), ('my boss is horrible.', 'neg')] 
>>> from pprint import pprint 
>>> pprint(train) 
[('I love this sandwich.', 'pos'), 
('this is an amazing place!', 'pos'), 
('I feel very good about these beers.', 'pos'), 
('this is my best work.', 'pos'), 
('what an awesome view', 'pos'), 
('I do not like this restaurant', 'neg'), 
('I am tired of this stuff.', 'neg'), 
("I can't deal with this", 'neg'), 
('he is my sworn enemy!', 'neg'), 
('my boss is horrible.', 'neg')] 
>>> train2 = [('science is a subject','neu'),('this is horrible food','neg'),('glass has water','neu')] 
>>> train = train+train2 
>>> from textblob.classifiers import NaiveBayesClassifier 
>>> cl = NaiveBayesClassifier(train) 
>>> prob_dist = cl.prob_classify("I had a horrible day,I am tired") 
>>> (prob_dist.prob('pos'),prob_dist.prob('neg'),prob_dist.prob('neu')) 
(0.01085221171283812, 0.9746799258978173, 0.014467862389343378) 
>>> 
>>> prob_dist = cl.prob_classify("This is a subject") 
>>> (prob_dist.prob('pos'),prob_dist.prob('neg'),prob_dist.prob('neu')) 
(0.10789848368588585, 0.14908905046805337, 0.7430124658460614)
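The confidence threshold mentioned in the answer could be applied along these lines. This is a minimal sketch, not the answerer's code: the function name, the default threshold of 0.8 and the 'normal' fallback label are assumptions, and `StubDist` is a hypothetical stand-in with made-up probabilities so the snippet runs without textblob. The function only relies on `max()` and `prob(label)`, which the object returned by `cl.prob_classify(text)` provides.

```python
def verdict_from_distribution(prob_dist, threshold=0.8, fallback='normal'):
    """Return the most likely label, or `fallback` when confidence is low."""
    best_label = prob_dist.max()
    if prob_dist.prob(best_label) >= threshold:
        return best_label
    return fallback


class StubDist(object):
    """Hypothetical stand-in for a probability distribution object."""

    def __init__(self, probs):
        self._probs = probs

    def max(self):
        # Label with the highest probability.
        return max(self._probs, key=self._probs.get)

    def prob(self, label):
        return self._probs.get(label, 0.0)


# Made-up probabilities, for illustration only.
confident = StubDist({'pos': 0.03, 'neg': 0.97})
uncertain = StubDist({'pos': 0.55, 'neg': 0.45})

confident_verdict = verdict_from_distribution(confident)  # 'neg'
uncertain_verdict = verdict_from_distribution(uncertain)  # 'normal'
```

In the listener from the question, the verdict could then be set with something like `tweets.verdict = verdict_from_distribution(cl.prob_classify(x['text']))`. The threshold value would need tuning against real tweets.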