LowerCaseFilterFactory does not work when docValues=true

I am trying to implement case-insensitive sorting with Solr and ran into this issue:

[Copied]

....But When I get search result its not sorted case insensitive. It gives all camel case result first and then all lower case 

If I m having short names 

Banu 

Ajay 

anil 

sudhir 

Nilesh 

It sorts like Ajay, Banu, Nilesh, anil, sudhir 
...................
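The ordering in the quoted report is what a plain code-point comparison produces, because uppercase ASCII letters sort before lowercase ones. A minimal Python sketch (an illustration only, not Solr code) that compares this ordering with a sort on a lowercased key:

```python
# Names taken from the quoted example above.
names = ["Banu", "Ajay", "anil", "sudhir", "Nilesh"]

# Plain string comparison orders by code point, so "A".."Z" come before "a".."z".
case_sensitive = sorted(names)

# Sorting on a lowercased key ignores case, which is the order being asked for.
case_insensitive = sorted(names, key=str.lower)

print(case_sensitive)    # ['Ajay', 'Banu', 'Nilesh', 'anil', 'sudhir']
print(case_insensitive)  # ['Ajay', 'anil', 'Banu', 'Nilesh', 'sudhir']
```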

I followed the solution and made the following changes to my Solr schema.xml file (only the relevant fields and field types are shown):

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<schema name="autoSolrSchema" version="1.5">
  <types>
    ...............
    <fieldType class="org.apache.solr.schema.TextField" name="TextField">
      <analyzer>
        <tokenizer class="solr.KeywordTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>
    .............
  </types>
  <fields>
    .................
    <field indexed="true" multiValued="false" name="name" stored="true" type="TextField" docValues="true" />
    ................
  </fields>
  <uniqueKey>id</uniqueKey>
</schema>

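But this did not solve the sorting problem. So I removed docValues="true" from the field definition and tried again. This time sorting worked correctly, but I had to specify useFieldCache=true in the query.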

Why does solr.LowerCaseFilterFactory not work with docValues="true"?

Is there any other way to make case-insensitive sorting work without removing docValues="true" and specifying useFieldCache=true?

Update:

I also followed EricLavault's suggestion and implemented an update request processor. But now I am facing the following problems:

1) We are using DSE Search, so we followed the approach described in this article.

Our current table schema:

CREATE TABLE IF NOT EXISTS test_data (
    id  UUID,
    nm  TEXT,
    PRIMARY KEY (id)
);

Solr schema:
 

 
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<schema name="autoSolrSchema" version="1.5">
  <types>
    <fieldType class="org.apache.solr.schema.UUIDField" name="UUIDField"/>
    <fieldType class="org.apache.solr.schema.StrField" name="StrField"/>
  </types>
  <fields>
    <field indexed="true" multiValued="false" name="nm" stored="true" type="StrField" docValues="true"/>
    <field indexed="true" multiValued="false" name="id" stored="true" type="UUIDField"/>
    <field indexed="true" multiValued="false" name="nm_s" stored="true" type="StrField" docValues="true"/>
  </fields>
  <uniqueKey>id</uniqueKey>
</schema>

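As suggested, I convert nm to lowercase and insert it as nm_s using an update request processor. I then reloaded the schema and reindexed. But when I run this query:

select nm from test_data where solr_query='{"q": "(-nm:(sssss))" ,"paging":"driver","sort":"nm_s asc"}';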

I get the following error:

...enable docvalues true n reindex or place useFieldCache=true...

2) How can I make sure that the value of nm_s is updated correctly? Is there any way to see the value of nm_s?

3) Why does the above error occur even though docValues is enabled?

Source

2016-09-26 Sharun

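Did you fully reindex your content after changing the field definition (adding or removing docValues)? – EricLavault

@n0tting Yes... – Sharun

OK, sorry for asking, but you never know! :) I think I have some clues about this, writing an answer – EricLavault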

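This issue probably comes from the fact that DocValues were originally designed to support unanalyzed types. They are not supported for TextField: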

DocValues are only available for specific field types. The types chosen determine the underlying Lucene docValue type that will be used. The available Solr field types are:

StrField and UUIDField :

If the field is single-valued (i.e., multi-valued is false), Lucene will use the SORTED type.

If the field is multi-valued, Lucene will use the SORTED_SET type.

Any Trie* numeric fields, date fields and EnumField.

If the field is single-valued (i.e., multi-valued is false), Lucene will use the NUMERIC type.

If the field is multi-valued, Lucene will use the SORTED_SET type.

(quoted from https://cwiki.apache.org/confluence/display/solr/DocValues)
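The rules in the quoted passage can be summarised as a small lookup. This is only a sketch of what the quote says, not Solr's actual implementation: the function name lucene_doc_values_type is hypothetical, and treating every type whose name starts with "Trie" as a numeric or date field is a simplifying assumption.

```python
def lucene_doc_values_type(field_type, multi_valued):
    """Return the Lucene docValues type named in the quoted documentation,
    or None when the field type is not listed there (e.g. TextField)."""
    is_string = field_type in {"StrField", "UUIDField"}
    # Assumption: Trie* covers the numeric and date field types the quote mentions.
    is_numeric = field_type.startswith("Trie") or field_type == "EnumField"
    if not (is_string or is_numeric):
        return None
    if multi_valued:
        return "SORTED_SET"
    return "SORTED" if is_string else "NUMERIC"

print(lucene_doc_values_type("StrField", False))   # SORTED
print(lucene_doc_values_type("TextField", False))  # None
```

Under these rules, the analyzed TextField from the question has no docValues type, while the single-valued StrField fields nm and nm_s map to SORTED.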

There is an issue on Solr's Jira to add docValues support for text fields (SOLR-8362), but it is still open and unassigned.

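To make case-insensitive sorting work without removing docValues="true", you will have to use a string field type (solr.StrField). Because you cannot define any <analyzer> on a string type, you will need an Update Request Processor to lowercase the input stream (or, similarly, preprocess the field content before sending the data to Solr).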
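As a rough sketch of the client-side preprocessing option, assuming documents are plain dictionaries before they are sent to Solr: the helper name with_sort_key is hypothetical, and the field names nm and nm_s are borrowed from the schema in the question. It leaves the original value untouched and adds a lowercased copy to sort on.

```python
def with_sort_key(doc, source="nm", target="nm_s"):
    """Return a copy of doc with a lowercased copy of `source` stored in `target`."""
    out = dict(doc)
    value = out.get(source)
    if value is not None:
        out[target] = value.lower()
    return out

docs = [{"nm": "Banu"}, {"nm": "anil"}, {"nm": "Ajay"}]
prepared = [with_sort_key(d) for d in docs]
# Each prepared document now carries nm_s, e.g. {"nm": "Banu", "nm_s": "banu"}.
```

Sorting would then use nm_s asc, while nm is still returned with its original casing.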

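If you want your field to be tokenized for searching and also sorted using DocValues, you can use a copyField from your actual text field (without DocValues) to a string field used for sorting (lowercased and with DocValues enabled).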

Source

2016-09-26 09:07:30 EricLavault

Just to check first: by "...sort on a field defined as a string type..." do you mean "StrField"? – Sharun

@Sharun Yes, or any field that uses solr.StrField (by default there is one in schema.xml) – EricLavault

But we can't specify an analyzer for StrField, can we? – Sharun
