Documents are not indexed
Per the Whoosh documentation, giving the StemmingAnalyzer an unbounded cache makes batch indexing faster:
writer = myindex.writer()
# Get the analyzer object from a text field
stem_ana = writer.schema["content"].format.analyzer
# Set the cachesize to -1 to indicate unbounded caching
stem_ana.cachesize = -1
# Reset the analyzer to pick up the changed attribute
stem_ana.clear()
# Use the writer to index documents...
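The docs snippet says `clear()` is needed "to pick up the changed attribute". Below is a minimal, stdlib-only sketch of why that step matters, assuming a filter that builds its cached stemming function once and stores it. `CachedStemmer` and `toy_stem` are hypothetical names, not Whoosh's API, and `functools.lru_cache(maxsize=None)` stands in for an unbounded cache.

```python
from functools import lru_cache


def toy_stem(word):
    """Hypothetical stand-in for a real stemmer: strips a few suffixes."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


class CachedStemmer:
    """Hypothetical model of a stemming filter with a configurable cache.

    cachesize < 0  -> unbounded cache
    cachesize > 1  -> LRU cache holding at most `cachesize` entries
    otherwise      -> no caching
    """

    def __init__(self, stemfn=toy_stem, cachesize=1000):
        self.stemfn = stemfn
        self.cachesize = cachesize
        self.clear()  # build the cached wrapper from the current settings

    def clear(self):
        # The wrapper is built here, so assigning a new `cachesize`
        # has no effect until clear() is called again.
        if self.cachesize < 0:
            self._stem = lru_cache(maxsize=None)(self.stemfn)
        elif self.cachesize > 1:
            self._stem = lru_cache(maxsize=self.cachesize)(self.stemfn)
        else:
            self._stem = self.stemfn

    def stem(self, word):
        return self._stem(word)


s = CachedStemmer(cachesize=2)
s.cachesize = -1                     # attribute changed...
print(s._stem.cache_info().maxsize)  # ...but the old 2-entry cache is still in use
s.clear()
print(s._stem.cache_info().maxsize)  # None, i.e. unbounded
```

In this model, forgetting `clear()` silently keeps the old cache; it does not by itself explain documents going missing, which is what the question is about.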
The only problem is that after doing this, the documents are not indexed. Here is my schema:
from whoosh.analysis import StemmingAnalyzer
from whoosh.fields import Schema, TEXT, NUMERIC, ID, DATETIME

schema = Schema(
    title=TEXT(stored=True, analyzer=StemmingAnalyzer(), field_boost=2.0),
    content=TEXT(stored=True, analyzer=StemmingAnalyzer()),
    owner=NUMERIC(stored=True),
    id=ID(stored=True, unique=True),
    date=DATETIME(stored=True, sortable=True),
    author=TEXT(stored=True),
    system=TEXT(stored=True),
    url=TEXT(stored=True),
    type=TEXT(stored=True),
    service=TEXT(stored=True),
    last_updated=DATETIME())
How I index (from XML):
docs = xmlObj.findall('document')
for d in docs:
    ...
    writer.update_document(...)
writer.commit()
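The loop relies on `xmlObj.findall('document')`, which is ElementTree's `Element.findall`. A minimal stdlib sketch, assuming a hypothetical XML layout (the real file is not shown) and a hypothetical `extract_fields` helper, of turning each `<document>` element into a dict of field values. It also shows that `findall('document')` only matches direct children, so an empty result there would leave the index empty too.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML layout; the real file's structure is not shown.
SAMPLE = """
<root>
  <document><id>1</id><title>First</title></document>
  <document><id>2</id><title>Second</title></document>
  <section><document><id>3</id><title>Nested</title></document></section>
</root>
"""


def extract_fields(doc_elem):
    """Map each child tag to its text, e.g. {'id': '1', 'title': 'First'}."""
    return {child.tag: (child.text or "") for child in doc_elem}


root = ET.fromstring(SAMPLE)

# 'document' matches direct children of root only (2 here).
direct = [extract_fields(d) for d in root.findall("document")]

# './/document' searches at any depth (3 here).
everywhere = [extract_fields(d) for d in root.findall(".//document")]

for doc_fields in direct:
    # In the real loop each dict would be passed on as
    # writer.update_document(**doc_fields) (not executed here); values for
    # NUMERIC and DATETIME fields would first need converting to int/datetime.
    print(doc_fields)
```

Printing `len(docs)` before the loop is a cheap way to rule this out.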
After I changed the stemmer cache, nothing is printed when I do this:
for docnum, stored_fields in ix.reader().iter_docs():
    # iter_docs() yields (docnum, stored_fields) tuples
    print("docnum: {}".format(docnum))
Please elaborate: how are you indexing? Is an error shown? Zero documents? Can't you find them with a query?
I edited the question; I get 0 documents. – Hakim