Indexing a txt file in Lucene

I want to create a small search engine for tweets. I have a txt file containing 20000 tweets. The file format looks like this:
TommyFrench1
851
85170333395811123
Lurgan, Moira, Armagh. Derry
This week we are double delight on first goalscorers on the four Champions League matches in shop. ChampionsLeague
Im_Aarkay
175
851703414300037122
Paris
@ChampionsLeague @AS_Monaco @AS_Monaco_EN Nopes, it's when City knocked outta Champions league. .
.
etc
The first line is the username, then the number of followers, then the id, then the location, and the last one is the text (the tweet).
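Since every record is five consecutive lines, reading the file comes down to calling readLine() five times per tweet. Below is a minimal plain-Java sketch of just that step, independent of Lucene. The names TweetFileParser and Tweet are hypothetical, and skipping blank lines between records is an assumption, because the sample does not show whether the real file has separators.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

public class TweetFileParser {

    // One tweet = five consecutive lines of the input file, in this order.
    static final class Tweet {
        final String username;
        final String followers;
        final String id;
        final String location;
        final String text;

        Tweet(String username, String followers, String id, String location, String text) {
            this.username = username;
            this.followers = followers;
            this.id = id;
            this.location = location;
            this.text = text;
        }
    }

    // Reads 5-line records until end of input. Blank lines between records are
    // skipped (assumption: the sample does not show whether separators exist).
    static List<Tweet> parse(BufferedReader br) throws IOException {
        List<Tweet> tweets = new ArrayList<>();
        String username;
        while ((username = br.readLine()) != null) {
            if (username.trim().isEmpty()) {
                continue;
            }
            String followers = br.readLine();
            String id = br.readLine();
            String location = br.readLine();
            String text = br.readLine();
            if (text == null) {
                break; // the file ended in the middle of a record
            }
            tweets.add(new Tweet(username, followers, id, location, text));
        }
        return tweets;
    }

    public static void main(String[] args) throws IOException {
        // Two records from the question, tweet texts shortened
        String sample = "TommyFrench1\n851\n85170333395811123\nLurgan, Moira, Armagh. Derry\n"
                + "This week we are double delight on first goalscorers ...\n"
                + "Im_Aarkay\n175\n851703414300037122\nParis\n"
                + "@ChampionsLeague @AS_Monaco Nopes, it's when City knocked outta Champions league.\n";
        for (Tweet t : parse(new BufferedReader(new StringReader(sample)))) {
            System.out.println(t.id + " " + t.username + ": " + t.text);
        }
    }
}
```

Each Tweet object then maps one-to-one onto a Lucene document with five fields.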
I think each tweet should be one document. So I need 20000 documents, and each document must have 5 fields (username, followers, id, etc.).
How can I index this?
I have looked at some tutorials, but I haven't found anything similar.
Edit: this is my code.
import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.nio.file.Paths;
import java.text.ParseException;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopScoreDocCollector;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.Version;
public class MyProgram {
    public static void main(String[] args) throws IOException, ParseException {
        FileReader fileReader = new FileReader(new File("myfile.txt"));
        BufferedReader br = new BufferedReader(fileReader);
        String line = null;
        String indexPath = "C:\\Desktop\\myfolder";
        Directory dir = FSDirectory.open(Paths.get(indexPath));
        Analyzer analyzer = new StandardAnalyzer();
        IndexWriterConfig iwc = new IndexWriterConfig(analyzer);
        IndexWriter writer = new IndexWriter(dir, iwc);
        while ((line = br.readLine()) != null) {
            // reading lines until the end of the file
            Document doc = new Document();
            String username = br.readLine();
            doc.add(new Field("username", username, Field.Store.YES, Field.Index.ANALYZED)); // adding username field
            String followers = br.readLine();
            doc.add(new Field("followers", followers, Field.Store.YES, Field.Index.ANALYZED));
            String id = br.readLine();
            doc.add(new Field("id", id, Field.Store.YES, Field.Index.ANALYZED));
            String location = br.readLine();
            doc.add(new Field("location", location, Field.Store.YES, Field.Index.ANALYZED));
            String text = br.readLine();
            doc.add(new Field("text", text, Field.Store.YES, Field.Index.ANALYZED));
            writer.addDocument(doc); // writing new document to the index
            br.readLine();
        }
    }
}
I get the following error: Index cannot be resolved or is not a field.
How can I fix this?
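The compiler error comes from Field.Index: that enum was deprecated in Lucene 4 and does not exist in the Lucene 5+ API this code already uses (new IndexWriterConfig(analyzer), FSDirectory.open(Paths.get(...))). In newer versions, how a value is indexed is decided by the field class instead (StringField, TextField, ...). Below is a minimal sketch, assuming Lucene 5 or later on the classpath and a UTF-8 input file. The class name TweetIndexer, the helper toDocument, and the choice of StringField for username, followers and id are illustrative assumptions, not the only option. The sketch also fixes two other problems in the loop: the line read in the while condition (the username) was thrown away, and the IndexWriter was never closed, so nothing was committed. Unless the file has a blank line between records, the extra br.readLine() at the end of the loop also swallowed the next username.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

public class TweetIndexer {

    // Builds one Lucene document per tweet. The field class replaces Field.Index:
    // StringField = indexed as a single untokenized term (exact match),
    // TextField = run through the analyzer (full-text search).
    static Document toDocument(String username, String followers, String id,
                               String location, String text) {
        Document doc = new Document();
        doc.add(new StringField("username", username, Field.Store.YES));
        doc.add(new StringField("followers", followers, Field.Store.YES));
        doc.add(new StringField("id", id, Field.Store.YES));
        doc.add(new TextField("location", location, Field.Store.YES));
        doc.add(new TextField("text", text, Field.Store.YES));
        return doc;
    }

    public static void main(String[] args) throws IOException {
        String indexPath = "C:\\Desktop\\myfolder"; // paths as in the question
        IndexWriterConfig iwc = new IndexWriterConfig(new StandardAnalyzer());
        // try-with-resources closes the writer (which commits the index) and the directory
        try (BufferedReader br = Files.newBufferedReader(Paths.get("myfile.txt"), StandardCharsets.UTF_8);
             Directory dir = FSDirectory.open(Paths.get(indexPath));
             IndexWriter writer = new IndexWriter(dir, iwc)) {
            String line;
            while ((line = br.readLine()) != null) {
                if (line.trim().isEmpty()) {
                    continue; // tolerate blank lines between records
                }
                // The line read by the loop condition IS the username.
                String followers = br.readLine();
                String id = br.readLine();
                String location = br.readLine();
                String text = br.readLine();
                if (text == null) {
                    break; // incomplete trailing record
                }
                writer.addDocument(toDocument(line, followers, id, location, text));
            }
        }
    }
}
```

With StringField, a query on id has to match the whole value exactly, which is usually what you want for IDs, while text and location are tokenized by the StandardAnalyzer so single words become searchable. If you need numeric range queries on followers, a numeric point field (for example LongPoint in Lucene 6+) would be needed instead of a string.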
What do you mean by "indexing"? What are you trying to achieve with it? –
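I have a project to build a small search engine for 20000 tweets. Indexing is one of the core features Lucene provides. I have to read the txt file, and each tweet must be a document. Then each document must have the fields username, ID, location, etc. I have an idea of how it works, but I'm a beginner in Lucene and I can't find anything similar to this. –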
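To illustrate what "indexing" means here: Lucene builds an inverted index, a map from each term to the documents that contain it, so a search looks terms up instead of scanning all 20000 tweets. The toy class below (hypothetical name ToyInvertedIndex, plain Java) shows only that idea; it is not Lucene's API or storage format, and Lucene adds analysis, scoring and on-disk persistence on top.

```java
import java.util.Collections;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

public class ToyInvertedIndex {

    // term -> sorted set of ids of the documents containing that term
    private final Map<String, Set<Integer>> postings = new TreeMap<>();
    private int nextDocId = 0;

    // "Indexing": split the text into lowercase terms and record, for each
    // term, which document contains it. Returns the new document's id.
    int add(String text) {
        int docId = nextDocId++;
        for (String term : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}_]+")) {
            if (!term.isEmpty()) {
                postings.computeIfAbsent(term, t -> new TreeSet<>()).add(docId);
            }
        }
        return docId;
    }

    // "Searching": one map lookup instead of scanning every document.
    Set<Integer> search(String term) {
        return Collections.unmodifiableSet(
                postings.getOrDefault(term.toLowerCase(Locale.ROOT), Collections.emptySet()));
    }

    public static void main(String[] args) {
        ToyInvertedIndex index = new ToyInvertedIndex();
        index.add("This week we are double delight on first goalscorers on the four Champions League matches in shop.");
        index.add("@ChampionsLeague @AS_Monaco @AS_Monaco_EN Nopes, it's when City knocked outta Champions league.");
        System.out.println(index.search("champions")); // prints [0, 1]
        System.out.println(index.search("city"));      // prints [1]
    }
}
```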
Have you looked at this question: http://stackoverflow.com/questions/4091441/how-do-i-index-and-search-text-files-in-lucene-3-0-2?rq=1 –