Effect of filters on search results

When I query Solr for "elegant", I also get results for "elegance".

I use these filters for index-time analysis:

WhitespaceTokenizerFactory 
StopFilterFactory 
WordDelimiterFilterFactory 
LowerCaseFilterFactory 
SynonymFilterFactory 
EnglishPorterFilterFactory 
RemoveDuplicatesTokenFilterFactory 
ReversedWildcardFilterFactory

and these for query-time analysis:

WhitespaceTokenizerFactory 
SynonymFilterFactory 
StopFilterFactory 
WordDelimiterFilterFactory 
LowerCaseFilterFactory 
EnglishPorterFilterFactory 
RemoveDuplicatesTokenFilterFactory

I want to know which of these filters is affecting my search results.


2011-06-29 Romi

EnglishPorterFilterFactory

That's the short answer ;)

A bit more information:

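"English Porter" refers to the English Porter stemming algorithm. According to the stemmer (a heuristic that derives word roots), "elegant" and "elegance" both have the same stem.

You can verify this with an online stemmer. Essentially you will see "eleg ant" and "eleg ance", which both reduce to the same stem: eleg.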
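To make the idea concrete, here is a minimal Java sketch of suffix stripping. It is a toy and not the Snowball English stemmer: the class name `ToySuffixStemmer`, the suffix list, and the minimum stem length are all assumptions chosen for illustration. It only shows why two different surface forms can end up as the same indexed term.

```java
import java.util.Locale;

/** Toy suffix stripper for illustration only; NOT the Porter/Snowball algorithm. */
public class ToySuffixStemmer {
    // Hypothetical suffix list (assumption), far smaller than a real stemmer's rule set.
    private static final String[] SUFFIXES = {"ance", "ence", "ant", "ent"};

    // Assumed threshold so that very short words are left alone.
    private static final int MIN_STEM_LENGTH = 3;

    /** Lower-cases the word and strips the first matching suffix if the remaining stem is long enough. */
    public static String stem(String word) {
        String lower = word.toLowerCase(Locale.ROOT);
        for (String suffix : SUFFIXES) {
            int stemLength = lower.length() - suffix.length();
            if (lower.endsWith(suffix) && stemLength >= MIN_STEM_LENGTH) {
                return lower.substring(0, stemLength);
            }
        }
        return lower;
    }

    public static void main(String[] args) {
        // Both forms reduce to the same term, so a query for one matches the other.
        System.out.println(stem("elegant"));
        System.out.println(stem("elegance"));
    }
}
```

Both calls print `eleg`. Because the stemming filter sits in both the index and the query analyzer chains, the indexed term and the query term coincide.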

From the Solr source:

        public void inform(ResourceLoader loader) {
            String wordFiles = args.get(PROTECTED_TOKENS);
            if (wordFiles != null) {
                try {

This is exactly where the protwords file comes into play:

                    File protectedWordFiles = new File(wordFiles);
                    if (protectedWordFiles.exists()) {
                        List<String> wlist = loader.getLines(wordFiles);
                        // This cast is safe in Lucene
                        // No need to go through StopFilter as before, since it just uses a List internally
                        protectedWords = new CharArraySet(wlist, false);
                    } else {
                        List<String> files = StrUtils.splitFileNames(wordFiles);
                        for (String file : files) {
                            List<String> wlist = loader.getLines(file.trim());
                            if (protectedWords == null)
                                protectedWords = new CharArraySet(wlist, false);
                            else
                                protectedWords.addAll(wlist);
                        }
                    }
                } catch (IOException e) {
                    throw new RuntimeException(e);
                }
            }
        }

This in turn affects the create step, where you can see the Snowball library being used:

        public EnglishPorterFilter create(TokenStream input) {
            return new EnglishPorterFilter(input, protectedWords);
        }
    }

    /**
     * English Porter2 filter that doesn't use reflection to
     * adapt lucene to the snowball stemmer code.
     */
    @Deprecated
    class EnglishPorterFilter extends SnowballPorterFilter {
        public EnglishPorterFilter(TokenStream source, CharArraySet protWords) {
            super(source, new org.tartarus.snowball.ext.EnglishStemmer(), protWords);
        }
    }
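
The effect of the protected-word set can be sketched without Lucene. The following is a self-contained toy, not Solr's implementation: `ProtectedWordsDemo`, its `analyze` method, and the stand-in `toyStem` rule are hypothetical names, and the real filter operates on a `TokenStream` rather than on plain strings. It illustrates only the behavior that the factory above sets up: a token found in the protected set passes through unstemmed, and every other token is stemmed.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

/** Toy model of a stemming filter with protected words; not the Lucene/Solr code. */
public class ProtectedWordsDemo {
    private final Set<String> protectedWords;

    public ProtectedWordsDemo(Set<String> protectedWords) {
        this.protectedWords = protectedWords;
    }

    /** Stand-in stemmer (assumption): strips a trailing "ance" or "ant". */
    static String toyStem(String token) {
        if (token.endsWith("ance")) return token.substring(0, token.length() - 4);
        if (token.endsWith("ant")) return token.substring(0, token.length() - 3);
        return token;
    }

    /** Returns the lower-cased token as-is if it is protected, otherwise its toy stem. */
    public String analyze(String token) {
        String lower = token.toLowerCase(Locale.ROOT);
        return protectedWords.contains(lower) ? lower : toyStem(lower);
    }

    public static void main(String[] args) {
        // Empty set, like an empty protwords.txt: both words collapse to one term.
        ProtectedWordsDemo unprotectedDemo = new ProtectedWordsDemo(new HashSet<String>());
        System.out.println(unprotectedDemo.analyze("elegant"));
        System.out.println(unprotectedDemo.analyze("elegance"));

        // With "elegance" protected, the two words remain distinct terms.
        ProtectedWordsDemo protectedDemo =
                new ProtectedWordsDemo(new HashSet<String>(Arrays.asList("elegance")));
        System.out.println(protectedDemo.analyze("elegant"));
        System.out.println(protectedDemo.analyze("elegance"));
    }
}
```

With an empty protected set, every token is stemmed, so "elegant" and "elegance" still map to the same term. In this sketch, only adding a word to the set exempts it.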


2011-06-29 10:04:09 fyr

@fyr: Yes, I used the Solr admin page to see the effect :) But EnglishPorterFilter uses protwords.txt, and my protwords.txt contains nothing. So how does it do this? – Romi

What is protwords.txt used for? – Romi

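No, it only uses protwords to correct the stemming: words listed there are protected from being stemmed. The stemmer is heuristic, so it will make mistakes. The English Porter algorithm uses the Snowball library. – fyr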
