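Using gensim's LSI in Python
I am using Python's gensim library to do latent semantic indexing. I followed the tutorial on the website, and it works well. Now I am trying to modify it a bit: I want to run the LSI model each time a document is added.
Here is my code: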
from gensim import corpora, models

stoplist = set('for a of the and to in'.split())
num_factors = 3
corpus = []

# urls is a list of page addresses; getwords() returns a page's text as a string
for i in range(len(urls)):
    print "Importing", urls[i]
    doc = getwords(urls[i])
    cleandoc = [word for word in doc.lower().split() if word not in stoplist]
    # create the dictionary from the first document, then extend it
    if i == 0:
        dictionary = corpora.Dictionary([cleandoc])
    else:
        dictionary.addDocuments([cleandoc])
    newVec = dictionary.doc2bow(cleandoc)
    corpus.append(newVec)
    # rebuild tf-idf and LSI on every iteration
    tfidf = models.TfidfModel(corpus)
    corpus_tfidf = tfidf[corpus]
    lsi = models.LsiModel(corpus_tfidf, numTopics=num_factors, id2word=dictionary)
    corpus_lsi = lsi[corpus_tfidf]
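getwords is a function I wrote that returns the contents of a website as a string. Again, it works if I wait until all of the documents have been processed before doing the tf-idf and LSI steps, but that is not what I want. I want to do it on each iteration. Unfortunately, I get this error: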
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "streamlsa.py", line 51, in <module>
    lsi = models.LsiModel(corpus_tfidf, numTopics=num_factors, id2word=dictionary)
  File "/Library/Python/2.6/site-packages/gensim-0.7.8-py2.6.egg/gensim/models/lsimodel.py", line 303, in __init__
    self.addDocuments(corpus)
  File "/Library/Python/2.6/site-packages/gensim-0.7.8-py2.6.egg/gensim/models/lsimodel.py", line 365, in addDocuments
    self.printTopics(5) # TODO see if printDebug works and remove one of these..
  File "/Library/Python/2.6/site-packages/gensim-0.7.8-py2.6.egg/gensim/models/lsimodel.py", line 441, in printTopics
    self.printTopic(i, topN = numWords)))
  File "/Library/Python/2.6/site-packages/gensim-0.7.8-py2.6.egg/gensim/models/lsimodel.py", line 433, in printTopic
    return ' + '.join(['%.3f*"%s"' % (1.0 * c[val]/norm, self.id2word[val]) for val in most])
  File "/Library/Python/2.6/site-packages/gensim-0.7.8-py2.6.egg/gensim/corpora/dictionary.py", line 52, in __getitem__
    return self.id2token[tokenid] # will throw for non-existent ids
KeyError: 1248
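The error usually pops up on the second document. I think I understand what it is telling me (the dictionary indices are bad); I just don't understand why. I have tried a lot of different things and nothing seems to work. Does anyone know what is going on?
Thanks!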
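One way the "bad dictionary index" suspicion could play out, purely as an assumption that the traceback does not confirm, is that the id-to-token lookup is cached when the dictionary is first built and is not refreshed when more documents are added later. The sketch below is a toy pure-Python model of that failure mode. ToyDictionary, stale_lookup, and safe_lookup are hypothetical names, and none of this is gensim's actual code.

```python
# Toy illustration only: ToyDictionary is a hypothetical class, not gensim's code.

class ToyDictionary(object):
    """Maps tokens to integer ids and caches the reverse id -> token map."""

    def __init__(self, documents):
        self.token2id = {}
        self.id2token = {}
        self.add_documents(documents)
        # The reverse map is built once here and then treated as a cache.
        self._rebuild_reverse()

    def _rebuild_reverse(self):
        self.id2token = dict((i, t) for t, i in self.token2id.items())

    def add_documents(self, documents):
        # Give the next free id to every unseen token.
        # The cached reverse map is deliberately NOT refreshed here.
        for doc in documents:
            for token in doc:
                if token not in self.token2id:
                    self.token2id[token] = len(self.token2id)

    def stale_lookup(self, token_id):
        # Reads the cache as-is; raises KeyError for ids added after caching.
        return self.id2token[token_id]

    def safe_lookup(self, token_id):
        # Rebuild the cache whenever it is out of step with token2id.
        if len(self.id2token) != len(self.token2id):
            self._rebuild_reverse()
        return self.id2token[token_id]


d = ToyDictionary([["human", "computer"]])
d.add_documents([["graph", "trees"]])  # second document brings new tokens
new_id = d.token2id["graph"]

try:
    d.stale_lookup(new_id)
    stale_failed = False
except KeyError:
    stale_failed = True

recovered = d.safe_lookup(new_id)
```

In this toy model the id of a token from the second document is missing from the cached reverse map, which resembles the KeyError raised in __getitem__ above. If something like this is the cause, building a fresh dictionary from all documents collected so far on each iteration might sidestep the stale cache, at the cost of redoing work.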
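Hi Jeff, I am using gensim to create my own LSA model. First, is an LSA model the same as an LSI model? Second, I tried to use the gensim package, but I don't understand how to proceed. I ran test_similarities.py and lsimodel.py at random, but I didn't see any output from lsimodel. – Jana 2017-01-06 13:33:27
Yes, LSA and LSI are the same thing. Sorry I can't be of much help; I haven't touched gensim in a few years. It looks like there are some tutorials on Google that I'd start with. – Jeff 2017-01-06 15:07:31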