因此,我正在運行一個俚語字典類型網站,並且以前使用mysql LIKE進行網站搜索。它工作正常。無論如何,現在我正在更新網站,並正在考慮與SOLR一起使用django-haystack(似乎是最好的搜索選項之一?)停止SOLR多次索引相同單詞(或如何增加一個字段)
我知道它運行了,但搜索結果不是很好。例如,搜索單詞「LOL」會讓「Flood」成爲第一個結果,因爲它也有「LOL LOL LOL LOL LOL LOL」氾濫的例子,而不是首先顯示LOL這個詞。所以只能從Flood示例中標記出一個LOL(我是SOLR的新手,所以這可能會出錯,我是怎麼想的)。或者我可以只提高單詞標題的值(因此,searchterm與標題匹配的詞語首先出現,而searchterm匹配示例的詞語則出現在第二個詞語中)?我已經嘗試過django-haystack場的提升,但它似乎並沒有做太多的工作。
在此先感謝!
編輯:這裏是SOLR方案(這是一個有點大,主要是由Django的草堆自動生成):
<?xml version="1.0" ?>
<schema name="default" version="1.1">
<types>
<fieldtype name="string" class="solr.StrField" sortMissingLast="true" omitNorms="true"/>
<fieldType name="boolean" class="solr.BoolField" sortMissingLast="true" omitNorms="true"/>
<!-- Numeric field types that manipulate the value into
a string value that isn't human-readable in its internal form,
but with a lexicographic ordering the same as the numeric ordering,
so that range queries work correctly. -->
<fieldType name="sint" class="solr.SortableIntField" sortMissingLast="true" omitNorms="true"/>
<fieldType name="slong" class="solr.SortableLongField" sortMissingLast="true" omitNorms="true"/>
<fieldType name="sfloat" class="solr.SortableFloatField" sortMissingLast="true" omitNorms="true"/>
<fieldType name="sdouble" class="solr.SortableDoubleField" sortMissingLast="true" omitNorms="true"/>
<fieldType name="date" class="solr.DateField" sortMissingLast="true" omitNorms="true"/>
<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
<analyzer type="index">
<tokenizer class="solr.WhitespaceTokenizerFactory"/>
<!-- in this example, we will only use synonyms at query time
<filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
-->
<!-- find finnish ones <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/> -->
<filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
<filter class="solr.SnowballPorterFilterFactory" language="Finnish" />
<filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
<filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="15" side="front" />
</analyzer>
<analyzer type="query">
<tokenizer class="solr.WhitespaceTokenizerFactory"/>
<filter class="solr.SnowballPorterFilterFactory" language="Finnish" />
<!-- <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/> -->
<!-- find finnish ones <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/> -->
<filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
</analyzer>
</fieldType>
<fieldType name="text_ws" class="solr.TextField" positionIncrementGap="100">
<analyzer>
<tokenizer class="solr.WhitespaceTokenizerFactory"/>
</analyzer>
</fieldType>
<fieldType name="ngram" class="solr.TextField" >
<analyzer type="index">
<tokenizer class="solr.KeywordTokenizerFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.NGramFilterFactory" minGramSize="3" maxGramSize="15" />
</analyzer>
<analyzer type="query">
<tokenizer class="solr.KeywordTokenizerFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
</fieldType>
<fieldType name="edge_ngram" class="solr.TextField" positionIncrementGap="1">
<analyzer type="index">
<tokenizer class="solr.WhitespaceTokenizerFactory" />
<filter class="solr.LowerCaseFilterFactory" />
<filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
<filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="15" side="front" />
</analyzer>
<analyzer type="query">
<tokenizer class="solr.WhitespaceTokenizerFactory" />
<filter class="solr.LowerCaseFilterFactory" />
<filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
</analyzer>
</fieldType>
</types>
<fields>
<!-- general -->
<field name="id" type="string" indexed="true" stored="true" multiValued="false" required="true"/>
<field name="django_ct" type="string" indexed="true" stored="true" multiValued="false" />
<field name="django_id" type="string" indexed="true" stored="true" multiValued="false" />
<dynamicField name="*_i" type="sint" indexed="true" stored="true"/>
<dynamicField name="*_s" type="string" indexed="true" stored="true"/>
<dynamicField name="*_l" type="slong" indexed="true" stored="true"/>
<dynamicField name="*_t" type="text" indexed="true" stored="true"/>
<dynamicField name="*_b" type="boolean" indexed="true" stored="true"/>
<dynamicField name="*_f" type="sfloat" indexed="true" stored="true"/>
<dynamicField name="*_d" type="sdouble" indexed="true" stored="true"/>
<dynamicField name="*_dt" type="date" indexed="true" stored="true"/>
<field name="rendered" type="string" indexed="false" stored="true" multiValued="false" />
<field name="word" type="text" indexed="true" stored="true" multiValued="false" />
<field name="author" type="text" indexed="true" stored="true" multiValued="false" />
<field name="text" type="text" indexed="true" stored="true" multiValued="false" />
<field name="explanation" type="text" indexed="true" stored="true" multiValued="false" />
<field name="example" type="text" indexed="true" stored="true" multiValued="false" />
</fields>
<!-- field to use to determine and enforce document uniqueness. -->
<uniqueKey>id</uniqueKey>
<!-- field for the QueryParser to use when an explicit fieldname is absent -->
<defaultSearchField>text</defaultSearchField>
<!-- SolrQueryParser configuration: defaultOperator="AND|OR" -->
<solrQueryParser defaultOperator="AND" />
</schema>
你能給你的架構?特別是你的自定義字段類型很有趣。由於「洪水」不包含「大聲笑」,我感興趣如何發生這種情況... – cheffe
架構添加。但是洪水氾濫的原因是因爲被索引的是:標題+解釋+例子。所以這是有道理的。這是用於俚語詞典,所以每個單詞都有一個單詞/標題,解釋和一個例子。所有的都被編入索引,但是當你搜索LOL時,你會想要一個單詞/標題LOL而不是一個包含LOL的例子(但是在某些情況下,你可能也想要這樣做,但是在結果列表中更低) – Nixarn