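Updating Apache Lucene index files

2014-04-22

I am using the Apache Lucene library to build the search feature for my website. The site gets all of its content from SharePoint RSS feeds, so every search would otherwise have to go through all the RSS feed URLs and read their content. To make searching faster, I created a scheduled task that rebuilds the index every hour: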

<bean id="rssIndexerService" class="com.lloydsbanking.webmi.service.RSSIndexerService" />
<task:scheduled-tasks>
    <task:scheduled ref="rssIndexerService" method="indexUrls" cron="0 0 * * * MON-FRI" />
</task:scheduled-tasks>

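The problem is that if I create new content, the search does not show it while the server is running, even after the scheduled task has been called. Likewise, if I delete an entry, it is not removed from the index files when the task runs. Here is the indexing code: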

@Service
public class RSSIndexerService extends RSSReader {

    @Autowired
    private RSSFeedUrl rssFeedUrl;

    private IndexWriter indexWriter = null;

    private String indexPath = "C:\\MI\\index";

    Logger log = Logger.getLogger(RSSIndexerService.class.getName());

    public void indexUrls() throws IOException {
        Date start = new Date();
        IndexWriter writer = getIndexWriter();
        log.info("Reading all the Urls in the Sharepoint");
        Iterator<Entry<String, String>> entries = rssFeedUrl.getUrlMap().entrySet().iterator();
        try {
            while (entries.hasNext()) {
                Entry<String, String> mapEntry = entries.next();
                String url = mapEntry.getValue();
                SyndFeed feed = rssReader(url);
                for (Object entry : feed.getEntries()) {
                    SyndEntry syndEntry = (SyndEntry) entry;
                    SyndContent desc = syndEntry.getDescription();
                    if (desc != null) {
                        String text = desc.getValue();
                        if ("text/html".equals(desc.getType())) {
                            Document doc = new Document();
                            text = extractText(text);
                            Field fieldTitle = new StringField("title", syndEntry.getTitle(), Field.Store.YES);
                            doc.add(fieldTitle);
                            Field pathField = new StringField("path", url, Field.Store.YES);
                            doc.add(pathField);
                            doc.add(new TextField("contents", text, Field.Store.YES));

                            // New index, so we just add the document (no old document can be there):
                            writer.addDocument(doc);
                        }
                    }
                }
            }
        } finally {
            // closeIndexWriter();
        }
        Date end = new Date();
        log.info(end.getTime() - start.getTime() + " total milliseconds");
    }

    public IndexWriter getIndexWriter() throws IOException {
        if (indexWriter == null) {
            Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_47);

            log.info("Indexing to directory '" + indexPath + "'...");
            Directory dir = FSDirectory.open(new File(indexPath));
            IndexWriterConfig config = new IndexWriterConfig(Version.LUCENE_47, analyzer);

            config.setOpenMode(OpenMode.CREATE_OR_APPEND);
            indexWriter = new IndexWriter(dir, config);
        }
        return indexWriter;
    }

    @PreDestroy
    public void closeIndexWriter() throws IOException {
        if (indexWriter != null) {
            System.out.println("Done with indexing ...");
            indexWriter.close();
        }
    }

}

I know this problem may be caused by config.setOpenMode(OpenMode.CREATE_OR_APPEND), but I don't know how to fix it.

Answer


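OK, I came up with the idea of checking whether the index directory already has content. If it does, I delete the previous index, and then I build the index with OpenMode.CREATE every time: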

File path = new File(System.getProperty("java.io.tmpdir") + "\\index");
Directory dir = FSDirectory.open(path);

Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_47);
IndexWriterConfig config = new IndexWriterConfig(Version.LUCENE_47, analyzer);

// path.list() returns null when the path is not an existing directory
if (path.list() != null) {
    log.info("Delete previous indexes ...");
    FileUtils.cleanDirectory(path);
}
// Always start from a fresh index
config.setOpenMode(OpenMode.CREATE);

Then I simply use addDocument():

if ("text/html".equals(desc.getType())) {
    ...
    // New index, so we just add the document (no old document can be there):
    indexWriter.addDocument(doc);
}
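
The directory-clearing step relies on FileUtils.cleanDirectory from Apache Commons IO. If that dependency is not available, roughly the same effect can be sketched with only java.nio.file. This is a minimal sketch, assuming Java 8 or later; the class name IndexDirCleaner and the method clearDirectory are hypothetical names used for illustration, and the demo works on a throwaway temporary directory rather than the real index path.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper, not part of Lucene or Commons IO.
final class IndexDirCleaner {

    /**
     * Deletes every file and subdirectory inside dir but keeps dir itself.
     * Returns the number of entries removed (0 if dir is not a directory).
     */
    static int clearDirectory(Path dir) throws IOException {
        if (!Files.isDirectory(dir)) {
            return 0; // nothing to clean
        }
        List<Path> toDelete;
        try (Stream<Path> walk = Files.walk(dir)) {
            toDelete = walk
                    .filter(p -> !p.equals(dir))        // keep the root directory
                    .sorted(Comparator.reverseOrder())  // children before their parents
                    .collect(Collectors.toList());
        }
        for (Path p : toDelete) {
            Files.delete(p);
        }
        return toDelete.size();
    }

    public static void main(String[] args) throws IOException {
        // Demo on a temporary directory standing in for the index directory.
        Path indexDir = Files.createTempDirectory("lucene-index-demo");
        Files.createFile(indexDir.resolve("old-segment-file"));
        int removed = clearDirectory(indexDir);
        System.out.println("Removed " + removed + " entries from " + indexDir);
    }
}
```

In the snippet from the answer, a call such as IndexDirCleaner.clearDirectory(path.toPath()) would take the place of FileUtils.cleanDirectory(path), before the IndexWriter is created.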