Solr langid UpdateRequestProcessor only maps the first field
I want to use Solr's langid UpdateRequestProcessor. Here is the configuration:
<updateRequestProcessorChain name="languages">
  <processor class="solr.LangDetectLanguageIdentifierUpdateProcessorFactory">
    <lst name="invariants">
      <str name="langid.fl">focus, expertise, platforms, partners, participation, additional</str>
      <str name="langid.whitelist">en,fr</str>
      <str name="langid.fallback">en</str>
      <str name="langid.langField">detectedlang</str>
      <bool name="langid.map">true</bool>
      <bool name="langid.map.keepOrig">false</bool>
    </lst>
  </processor>
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>
My fields look like this:
<fields>
  <field name="_root_" type="string" indexed="true" stored="false"/>
  <field name="_version_" type="long" indexed="true" stored="true" multiValued="false"/>
  <field name="id" type="string" indexed="true" stored="true" required="true" />
  <!-- raw fields from sql db -->
  <field name="expertise_id" type="int" indexed="true" stored="true" />
  <field name="person_id" type="int" indexed="true" stored="true" />
  <field name="mod_date" type="date" indexed="true" stored="true" />
  <field name="lang" type="string" indexed="true" stored="true" />
  <field name="focus" type="text_general" indexed="true" stored="true" />
  <field name="expertise" type="text_general" indexed="true" stored="true" />
  <field name="platforms" type="text_general" indexed="true" stored="true" />
  <field name="partners" type="text_general" indexed="true" stored="true" />
  <field name="participation" type="text_general" indexed="true" stored="true" />
  <field name="additional" type="text_general" indexed="true" stored="true" />
  <field name="tag" type="text_general" termVectors="true" multiValued="true" />
  <field name="facet_tag" type="string" stored="false" indexed="false" docValues="true" multiValued="true" default=""/>
  <!-- language detected by solr -->
  <field name="detectedlang" type="string" indexed="true" stored="true" />
  <!-- defined locale fields -->
  <dynamicField name="*_en" type="text_en" indexed="true" stored="true" />
  <dynamicField name="*_fr" type="text_fr" indexed="true" stored="true" />
  <copyField source="tag" target="facet_tag"/>
</fields>
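When I run an update or a dataimport, I know the "languages" update chain is being used, because focus is mapped to focus_en and detectedlang is set. However, none of the other fields listed in langid.fl are mapped. Why?
An example update query: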
{
"additional": "here is some other information about me.",
"expertise_id": "10000",
"id": "foo_10000",
"focus": "this is my new focus. It is very exciting. When I am done I expect to be super experienced."
}
Here is the result of a query for expertise_id=10000. Note that additional has not been moved to additional_en:
"response":{"numFound":1,"start":0,"docs":[
{
"additional":"here is some other information about me.",
"expertise_id":10000,
"id":"foo_10000",
"detectedlang":"en",
"focus_en":"this is my new focus. It is very exciting. When I am done I expect to be super experienced.",
"_version_":1447088846110982144}]
}
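See https://wiki.apache.org/solr/LanguageDetection#Caveats. "Since these implementations use an n-gram based approach to detection, they are prone to failing on particularly short inputs." Have you tried with longer text? – arun
@arun: To test the idea that length might be the problem, I just added a document in which all of the mapped fields contain the same 200-word English text. 'focus' was mapped to 'focus_en'. None of the others were mapped. – dnagirl
@dnagirl, was a solution ever provided? – forguta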
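One thing that may be worth ruling out, offered as an untested guess rather than a confirmed fix: the langid.fl value above has a space after each comma, and only the first entry (focus), which has no leading space, gets mapped. If this Solr version splits the list on commas without trimming whitespace (an assumption I have not verified against the source), the remaining names would be read as " expertise", " platforms" and so on, and would not match any field. A minimal sketch of the same chain with the spaces removed, everything else unchanged:

```xml
<updateRequestProcessorChain name="languages">
  <processor class="solr.LangDetectLanguageIdentifierUpdateProcessorFactory">
    <lst name="invariants">
      <!-- Assumption: no whitespace after the commas, in case the list is split without trimming -->
      <str name="langid.fl">focus,expertise,platforms,partners,participation,additional</str>
      <str name="langid.whitelist">en,fr</str>
      <str name="langid.fallback">en</str>
      <str name="langid.langField">detectedlang</str>
      <bool name="langid.map">true</bool>
      <bool name="langid.map.keepOrig">false</bool>
    </lst>
  </processor>
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>
```

If additional then shows up as additional_en after re-indexing the example document, the whitespace was the cause; if not, the problem lies elsewhere and this sketch can be discarded.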