2012-08-22 162 views
4

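Solr - case-insensitive search is not working

I want to apply case-insensitive search to the field myfield in Solr.

I googled a bit and found that I need to apply LowerCaseFilterFactory to the field type, and that the field should be of type solr.TextField.

I applied this in my schema.xml and re-indexed the data, but my search still seems to be case-sensitive.

Below is the search I am executing.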

http://localhost:8080/solr/select?q=myfield:"cloud university"&hl=on&hl.snippets=99&hl.fl=myfield 

Below is the field type definition.

<fieldType name="text_en_splitting" class="solr.TextField" positionIncrementGap="100" autoGeneratePhraseQueries="true"> 
     <analyzer type="index"> 
     <tokenizer class="solr.WhitespaceTokenizerFactory"/> 
     <!-- in this example, we will only use synonyms at query time 
     <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/> 
     --> 
     <!-- Case insensitive stop word removal. 
      add enablePositionIncrements=true in both the index and query 
      analyzers to leave a 'gap' for more accurate phrase queries. 
     --> 
     <filter class="solr.StopFilterFactory" 
       ignoreCase="true" 
       words="stopwords_en.txt" 
       enablePositionIncrements="true" 
       /> 
     <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/> 
     <filter class="solr.LowerCaseFilterFactory"/> 
     <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/> 
     <filter class="solr.PorterStemFilterFactory"/> 
     </analyzer> 
     <analyzer type="query"> 
     <tokenizer class="solr.WhitespaceTokenizerFactory"/> 
     <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/> 
     <filter class="solr.StopFilterFactory" 
       ignoreCase="true" 
       words="stopwords_en.txt" 
       enablePositionIncrements="true" 
       /> 
     <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/> 
     <filter class="solr.LowerCaseFilterFactory"/> 
     <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/> 
     <filter class="solr.PorterStemFilterFactory"/> 
     </analyzer> 
    </fieldType> 

Below is my field definition.

<field name="myfield" type="text_en_splitting" indexed="true" stored="true" /> 

I don't know what is wrong. Please help me resolve this.

Thanks

Edit

Debug query

<lst name="debug"> 
    <str name="rawquerystring"> 
     "cloud university" AND guid:268406b6-db65-49da-848a-c59248f170db 
    </str> 
    <str name="querystring"> 
     "cloud university" AND guid:268406b6-db65-49da-848a-c59248f170db 
    </str> 
    <str name="parsedquery"> 
     +PhraseQuery(CC:"cloud univers") +guid:268406b6-db65-49da-848a-c59248f170db 
    </str> 
    <str name="parsedquery_toString"> 
     +CC:"cloud univers" +guid:268406b6-db65-49da-848a-c59248f170db 
    </str> 
    <lst name="explain"> 
     <str name="KSYS_20120805_1100"> 
      12.572915 = (MATCH) sum of:
        0.03595598 = weight(CC:"cloud univers" in 1560524), product of:
          0.51819557 = queryWeight(CC:"cloud univers"), product of:
            8.881522 = idf(CC: cloud=4798 univers=625207)
            0.05834536 = queryNorm
          0.06938689 = fieldWeight(CC:"cloud univers" in 1560524), product of:
            1.0 = tf(phraseFreq=1.0)
            8.881522 = idf(CC: cloud=4798 univers=625207)
            0.0078125 = fieldNorm(field=CC, doc=1560524)
        12.536959 = (MATCH) weight(guid:268406b6-db65-49da-848a-c59248f170db in 1560524), product of:
          0.85526216 = queryWeight(guid:268406b6-db65-49da-848a-c59248f170db), product of:
            14.658615 = idf(docFreq=1, maxDocs=1709587)
            0.05834536 = queryNorm
          14.658615 = (MATCH) fieldWeight(guid:268406b6-db65-49da-848a-c59248f170db in 1560524), product of:
            1.0 = tf(termFreq(guid:268406b6-db65-49da-848a-c59248f170db)=1)
            14.658615 = idf(docFreq=1, maxDocs=1709587)
            1.0 = fieldNorm(field=guid, doc=1560524)
     </str> 
    </lst> 
    <str name="QParser">LuceneQParser</str> 
    <lst name="timing"> 
     <double name="time">60.0</double> 
     <lst name="prepare"> 
      <double name="time">1.0</double> 
      <lst name="org.apache.solr.handler.component.QueryComponent"> 
       <double name="time">0.0</double> 
      </lst> 
      <lst name="org.apache.solr.handler.component.FacetComponent"> 
       <double name="time">0.0</double> 
      </lst> 
      <lst name="org.apache.solr.handler.component.MoreLikeThisComponent"> 
       <double name="time">0.0</double> 
      </lst> 
      <lst name="org.apache.solr.handler.component.HighlightComponent"> 
       <double name="time">0.0</double> 
      </lst> 
      <lst name="org.apache.solr.handler.component.StatsComponent"> 
       <double name="time">0.0</double> 
      </lst> 
      <lst name="org.apache.solr.handler.component.DebugComponent"> 
       <double name="time">0.0</double> 
      </lst> 
     </lst> 
     <lst name="process"> 
      <double name="time">59.0</double> 
      <lst name="org.apache.solr.handler.component.QueryComponent"> 
       <double name="time">0.0</double> 
      </lst> 
      <lst name="org.apache.solr.handler.component.FacetComponent"> 
       <double name="time">0.0</double> 
      </lst> 
      <lst name="org.apache.solr.handler.component.MoreLikeThisComponent"> 
       <double name="time">0.0</double> 
      </lst> 
      <lst name="org.apache.solr.handler.component.HighlightComponent"> 
       <double name="time">57.0</double> 
      </lst> 
      <lst name="org.apache.solr.handler.component.StatsComponent"> 
       <double name="time">0.0</double> 
      </lst> 
      <lst name="org.apache.solr.handler.component.DebugComponent"> 
       <double name="time">2.0</double> 
      </lst> 
     </lst> 
    </lst> 
</lst> 
+0

The configuration looks correct. Did you reload the core after making the changes to the schema XML? – Jayendra

+0

Yes @Jayendra, I reloaded it after the change. – meghana

+0

You can add debugQuery=on to the URL and check the debug info to see what the query looks like. – Jayendra

Answers

6

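You should put solr.LowerCaseFilterFactory before the word delimiter filter, because a change from lower case to upper case (or vice versa) in the middle of a word triggers the delimiter.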
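
As a sketch of that reordering, the index analyzer from the question could look like the fragment below. Every filter and attribute is copied from the question; the only change is that LowerCaseFilterFactory now sits ahead of WordDelimiterFilterFactory, and the commented-out synonym filter is left out for brevity. The query analyzer would presumably need the same reordering, followed by a core reload and a re-index. Once tokens are lowercased first, splitOnCaseChange="1" has no case transitions left to act on, so check whether any other field using this type depends on that splitting.

```xml
<analyzer type="index">
  <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  <filter class="solr.StopFilterFactory"
          ignoreCase="true"
          words="stopwords_en.txt"
          enablePositionIncrements="true"/>
  <!-- Moved up: lowercase first, so a token such as "ClOUD" reaches the word delimiter as "cloud" -->
  <filter class="solr.LowerCaseFilterFactory"/>
  <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
  <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
  <filter class="solr.PorterStemFilterFactory"/>
</analyzer>
```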

+0

Hmm..... thanks @Bob Yoplait, I'll try this and let you know. :) – meghana

+0

Thanks @Bob, the problem was solved by your suggestion :) – meghana

+0

It worked, thank you. – user

1

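I suggest you use the analysis tool to see how the expression is indexed and how it is searched. http://localhost:8983/solr/admin/analysis.jsp?highlight=on

I think it might be a problem with WordDelimiterFilterFactory (it is configured differently for query and index), but this is just a guess.

In the tool, select the field type text_en_splitting, then enter ClOUD UNIVERSITY as the field value (index) and cloud university as the field value (query). Also select verbose output and see what you get.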

+0

Thanks @Dorin, I'll try this and let you know :) – meghana