Gensim: find the topic of each sentence

I have trained an LDA model on a corpus, and I want to get the topic that each sentence corresponds to, so that I can compare what the algorithm found with the labels I have.

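I tried the code below, but the results are terrible: topic 17 is heavily overrepresented (maybe 25% of the sentences, when it should be closer to 5%).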

Thanks for your help.

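from gensim.corpora import Dictionary
from gensim.models import LdaModel
import pandas as pd

# texts_lemmatized: list of lemmatized documents, each one a list of tokens
dico = Dictionary(texts_lemmatized)
corpus_lda = [dico.doc2bow(text) for text in texts_lemmatized]

# id2word lets the model report topics as words rather than token ids
lda_ = LdaModel(corpus_lda, num_topics=18, id2word=dico)

data = []

# theme_commentaire: human label of each document, in the same order as corpus_lda
for i in range(len(theme_commentaire)):
    # get_document_topics() returns the topic distribution of one document
    # as a list of (topic_id, probability) pairs
    # NOTE: max() on these tuples compares topic_id first (see the comments below)
    algo = max(lda_.get_document_topics(corpus_lda[i]))[0]
    human = theme_commentaire[i]
    data.append([str(algo), human])

cols = ['algo', 'human']
df_ = pd.DataFrame(data, columns=cols)
df_.head()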
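One way to quantify the skew described in the question is to count how often each topic is picked. With 18 topics, an even split would give each topic about 1/18 ≈ 5.6% of the sentences, which matches the "closer to 5%" expectation. A minimal sketch, assuming `predicted` is a hypothetical list holding the `algo` values collected by the loop above (ints or the strings produced by `str(algo)` both work):

```python
from collections import Counter


def topic_shares(predicted):
    """Return {topic_id: fraction of sentences assigned to that topic}."""
    counts = Counter(predicted)
    total = len(predicted)
    return {topic: n / total for topic, n in counts.items()}


# Hypothetical predictions in which topic 17 dominates, as in the question.
predicted = [17, 17, 3, 17, 5, 17, 8, 2]
shares = topic_shares(predicted)
print(shares[17])        # 0.5
print(round(1 / 18, 3))  # share per topic under an even split: 0.056
```

A share far above 1/18 for a single topic is a hint that the assignment step, not the model, may be at fault.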


2017-05-10 glouis

Read this related SO question: http://stackoverflow.com/q/42269313/7414759 – stovfl

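It's not really related: my question is about LDA, not TF-IDF. I did find my problem, though. It was the max() function, which operates on the keys of my list of tuples [(topic_id, probability)], so I was getting 17 most of the time because it is the largest key. – glouis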
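The behaviour glouis describes comes from how Python orders tuples: they are compared element by element, so `max()` over `(topic_id, probability)` pairs decides on `topic_id` first and only looks at the probability on a tie. A short illustration with a made-up distribution in the shape that `get_document_topics()` returns:

```python
# Made-up topic distribution for one document: (topic_id, probability) pairs.
doc_topics = [(2, 0.71), (9, 0.04), (17, 0.25)]

# Tuples compare element by element, so max() looks at topic_id first
# and returns the pair with the highest id, not the highest probability.
print(max(doc_topics))  # (17, 0.25)

# Passing key= makes max() compare the probabilities instead.
print(max(doc_topics, key=lambda t: t[1]))  # (2, 0.71)
```

With 18 topics numbered 0 to 17, the plain `max()` returns topic 17 whenever it appears in the list, which would explain its inflated share.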

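Solved in the comments:

I found my problem: it was the max() function, which operates on the keys of my list of tuples [(topic_id, probability)], so I was getting 17 most of the time because that is the largest key. - glouis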
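A corrected version of the assignment step picks the pair with the highest probability by passing `key=` to `max()`. The sketch below is a minimal illustration under stated assumptions: `dominant_topic` and `build_rows` are hypothetical helper names, and the sample distributions and labels (`"price"`, `"delivery"`) are invented. In the question's code, the inputs would be `[lda_.get_document_topics(bow) for bow in corpus_lda]` and `theme_commentaire`.

```python
from operator import itemgetter


def dominant_topic(doc_topics):
    """Return the id of the most probable topic in (topic_id, probability) pairs."""
    if not doc_topics:
        # An empty list is possible if minimum_probability is set very high.
        return None
    topic_id, _prob = max(doc_topics, key=itemgetter(1))  # compare probabilities
    return topic_id


def build_rows(doc_topic_lists, labels):
    """Pair each document's dominant topic (as a string) with its human label."""
    return [[str(dominant_topic(dt)), label]
            for dt, label in zip(doc_topic_lists, labels)]


# Invented distributions and labels, for illustration only.
doc_topic_lists = [[(2, 0.71), (17, 0.25)], [(5, 0.40), (17, 0.55)]]
labels = ["price", "delivery"]
print(build_rows(doc_topic_lists, labels))
# [['2', 'price'], ['17', 'delivery']]
```

The rows have the same `[algo, human]` shape as `data` in the question, so they can be passed to `pd.DataFrame(..., columns=['algo', 'human'])` unchanged.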


2017-05-11 07:37:11 stovfl
