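Lucene search with wildcard is slow
I have a list of 1000 files (the number grows twofold each year). They contain only text, and each file is about 8 MB. I'm trying to find the filename(s) whose content matches a given (wildcard) expression.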
For example, all files contain data like this:
COD1004129641208240002709991455671866 4IT /福林4400QQQUF 3300QQQUF
and my search could be "*9991455671866", which matches the line above.
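To make the expected match concrete, here is a minimal plain-Java sketch (no Lucene involved) of what I mean by that pattern. It assumes the line is split on whitespace and that the leading `*` stands for "any prefix"; the class `WildcardSketch` and the helper `matchLeadingWildcard` are hypothetical names used only for illustration, and the sample line is an abbreviated copy of the one above. Unlike StandardAnalyzer, it does no lowercasing or other analysis.

```java
import java.util.ArrayList;
import java.util.List;

public class WildcardSketch {

    // Hypothetical helper: returns the whitespace-separated tokens of `line`
    // that match a pattern of the form "*suffix".
    static List<String> matchLeadingWildcard(String line, String pattern) {
        if (!pattern.startsWith("*")) {
            throw new IllegalArgumentException("expected a pattern of the form *suffix");
        }
        String suffix = pattern.substring(1);
        List<String> hits = new ArrayList<String>();
        for (String token : line.trim().split("\\s+")) {
            // A leading wildcard means every token has to be checked against the suffix.
            if (token.endsWith(suffix)) {
                hits.add(token);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Abbreviated copy of the sample line (assumption: tokens are whitespace-separated).
        String line = "COD1004129641208240002709991455671866 4IT 4400QQQUF 3300QQQUF";
        System.out.println(matchLeadingWildcard(line, "*9991455671866"));
    }
}
```

Only the first token of the sample line ends with the searched digits, so that is the one token I expect the wildcard to hit.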
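The problem is (and maybe my expectations are too high) that it takes more than a minute to return results.
My documents are indexed like this: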
private Document getDocument(File file) throws IOException
{
    FileReader reader = new FileReader(file);
    Document doc = new Document();
    doc.add(new Field(IndexProperties.FIELD_FILENAME, file.getName(), Field.Store.YES, Field.Index.NOT_ANALYZED));
    doc.add(new Field(IndexProperties.FIELD_CONTENT, reader));
    return doc;
}
The analyzer:
Directory fsDir = FSDirectory.open(new File(indexFolder));
Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_36);
// build the writer
IndexWriterConfig indexWriter = new IndexWriterConfig(Version.LUCENE_36, analyzer);
IndexWriter writer = new IndexWriter(fsDir, indexWriter);
And the wildcard search is:
public List<String> findFilenameByContent(String wildCardContent, String INDEX_FOLDER, String TICKETS_FOLDER) throws Exception
{
    long start = System.currentTimeMillis();
    Term term = new Term(IndexProperties.FIELD_CONTENT, wildCardContent); //eg *9991455671866
    Query query = new WildcardQuery(term);
    //loop through docs
    Directory fsDir = FSDirectory.open(new File(INDEX_FOLDER));
    IndexSearcher searcher = new IndexSearcher(IndexReader.open(fsDir));
    ScoreDoc[] queryResults = searcher.search(query, 10).scoreDocs;
    List<String> strs = new ArrayList<String>();
    for (ScoreDoc scoreDoc : queryResults)
    {
        Document doc = searcher.doc(scoreDoc.doc);
        strs.add(doc.get(IndexProperties.FIELD_FILENAME));
    }
    searcher.close();
    long end = System.currentTimeMillis();
    System.out.println("TOTAL SEARCH TIME: "+(end-start)/1000.0+ "secs");
    return strs;
}
Thanks @fer13488; your suggestion has been deprecated since 3.6. – adhg