2017-02-01

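Solr exact search: ignore repeated phrases

I am using a Solr query to search documents for keywords. I want exact phrase matches to appear at the top, but if the same phrase is repeated several times in a document, it should be counted only once. At the moment, documents that repeat the phrase many times get the highest score and rise to the top.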

Please see the results below. I searched for "php developer" and got two results, but they have different scores.

For our purposes, both should have the same score. I want to ignore repeated phrases within a document.

Please also see the schema below. The searched field "job_search" is a combination of "job_title, key_skills, key_skills_admin, job_detail".

     <copyField source="job_title" dest="job_search"/>
     <copyField source="key_skills" dest="job_search"/> 
     <copyField source="key_skills_admin" dest="job_search"/> 
     <copyField source="job_detail" dest="job_search"/> 

     { 
     "responseHeader":{ 
     "status":0, 
     "QTime":7, 
     "params":{ 
      "lowercaseOperators":"true", 
      "mm":"2", 
      "debugQuery":"true", 
      "fl":"job_slno,job_title,job_detail,key_skills,key_skills_admin,display_date,score", 
      "indent":"true", 
      "q":"\"php developer\"", 
      "stopwords":"true", 
      "wt":"json", 
      "defType":"edismax"}}, 
     "response":{"numFound":110,"start":0,"maxScore":2.518858,"docs":[ 
      { 
      "job_slno":"243681", 
      "job_title":"php developer", 
      "job_detail":"sdf sdfs df", 
      "key_skills":"php developer", 
      "key_skills_admin":"php developer", 
      "display_date":"2016-11-11T00:00:00Z", 
      "score":2.518858}, 
      { 
      "job_slno":"243340", 
      "job_title":"sfsdfs", 
      "job_detail":"dfsdfsdfsd", 
      "key_skills":"PHP Developer", 
      "key_skills_admin":"PHP Developer", 
      "display_date":"2016-11-13T00:00:00Z", 
      "score":2.399412}, 
      ] 
     } 

Answer

You can create your own custom similarity class that extends DefaultSimilarity, and override the tf method to suit your use case.

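     import org.apache.lucene.search.similarities.DefaultSimilarity;

     // In newer Lucene versions this base class is named ClassicSimilarity.
     public class CustomSimilarity extends DefaultSimilarity {

         // Return a constant term frequency so that repeated occurrences
         // of a term or phrase do not raise the document's score.
         @Override
         public float tf(float freq) {
             return 1.0f;
         }
     }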
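
To see why overriding tf helps, here is a minimal standalone sketch that does not depend on Lucene. It compares the square-root term-frequency factor used by the classic TF-IDF similarity with the constant factor from the override above. The class name TfComparison and the frequencies 1 to 3 are made up for illustration and are not taken from the question's index. Other scoring factors, such as field-length normalization, are not modeled here and may still cause small score differences between documents.

```java
// Standalone illustration only; TfComparison is a hypothetical name and not part of Lucene or Solr.
public class TfComparison {

    // Term-frequency factor of the classic TF-IDF similarity: sqrt(freq).
    public static float classicTf(float freq) {
        return (float) Math.sqrt(freq);
    }

    // Flattened factor, mirroring the tf override: every match counts once.
    public static float flatTf(float freq) {
        return 1.0f;
    }

    public static void main(String[] args) {
        // Hypothetical match counts for a phrase within one field.
        for (int freq = 1; freq <= 3; freq++) {
            System.out.printf("freq=%d classic=%.3f flat=%.3f%n",
                    freq, classicTf(freq), flatTf(freq));
        }
    }
}
```

With the classic factor, a document that contains the phrase three times gets a larger tf contribution than one that contains it twice. With the flat factor, the contribution is identical for any number of matches.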