2013-04-25

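Symbols in an Elasticsearch query string

I have a "document" (an ActiveRecord model) with an attribute called deviations. The attribute has values such as "Bin X", "Bin $", "Bin q", "Bin %", and so on.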

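I'm trying to search this attribute with Tire/Elasticsearch. I'm using the whitespace analyzer to index the deviation attribute. Here is the code that creates the index: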

settings :analysis => {
  :filter => {
    :ngram_filter => {
      :type => "nGram",
      :min_gram => 2,
      :max_gram => 255
    },
    :deviation_filter => {
      :type => "word_delimiter",
      :type_table => ['$ => ALPHA']
    }
  },
  :analyzer => {
    :ngram_analyzer => {
      :type => "custom",
      :tokenizer => "standard",
      :filter => ["lowercase", "ngram_filter"]
    },
    :deviation_analyzer => {
      :type => "custom",
      :tokenizer => "whitespace",
      :filter => ["lowercase"]
    }
  }
} do
  mapping do
    indexes :id, :type => 'integer'
    [:equipment, :step, :recipe, :details, :description].each do |attribute|
      indexes attribute, :type => 'string', :analyzer => 'ngram_analyzer'
    end
    indexes :deviation, :analyzer => 'whitespace'
  end
end
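The settings declare a custom deviation_analyzer (whitespace tokenizer plus lowercase), but the mapping points :deviation at the built-in whitespace analyzer, which splits on whitespace and does not lowercase; deviation_filter is declared but no analyzer uses it. The plain-Ruby sketch below roughly imitates what each chain would emit for a value like "Bin $", next to a crude approximation of the standard analyzer, whose tokenizer discards most symbols. The method names are hypothetical and the imitation ignores many details of real Lucene tokenization.

```ruby
# Rough, hypothetical approximations of Elasticsearch analysis chains,
# written in plain Ruby for illustration only (this is not Elasticsearch code).

# Built-in "whitespace" analyzer, as used by the :deviation mapping:
# split on whitespace, no lowercasing.
def whitespace_analyzer(text)
  text.split(/\s+/).reject(&:empty?)
end

# The custom deviation_analyzer from the settings: whitespace tokenizer
# followed by the "lowercase" filter.
def deviation_analyzer(text)
  whitespace_analyzer(text).map(&:downcase)
end

# Crude stand-in for the "standard" analyzer (stop words ignored):
# keep runs of letters and digits, drop symbols such as "$" and "%".
def standard_analyzer_like(text)
  text.scan(/[[:alnum:]]+/).map(&:downcase)
end

p whitespace_analyzer("Bin $")    # => ["Bin", "$"]
p deviation_analyzer("Bin $")     # => ["bin", "$"]
p standard_analyzer_like("Bin $") # => ["bin"]
```

In this approximation the whitespace-based chains keep "$" as its own token, while anything analyzed like the standard analyzer loses it entirely.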

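When the query string contains no special characters, the search seems to work correctly. For example, Bin X returns only the records that contain the words Bin and X. However, searching for something like Bin $ or Bin % returns every result containing the word Bin, as if the symbol were almost ignored (results that do contain the symbol rank higher).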

Here is the search method I created:

def self.search(params)
  tire.search(load: true) do
    query { string "#{params[:term].downcase}:#{params[:query]}", default_operator: "AND" }
    size 1000
  end
end
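The search method interpolates the selected field and the raw user input into a single Lucene query string. In Lucene's query parser syntax a field prefix applies only to the term it directly precedes, so in deviation:Bin $ only Bin is restricted to the deviation field and $ is searched in the default field. The sketch below uses hypothetical helper names and plain Ruby (no Tire). It shows the string the method builds and a variant that wraps the input in parentheses, the Lucene syntax for applying one field to a group of terms. Whether the grouped form changes the results in this setup has not been tested here.

```ruby
# Hypothetical helpers mirroring how self.search assembles its query string.

# What the original method produces: "<field>:<raw query>".
def original_query_string(term, query)
  "#{term.downcase}:#{query}"
end

# Variant that groups the whole input so the field prefix covers every term.
def grouped_query_string(term, query)
  "#{term.downcase}:(#{query})"
end

p original_query_string("Deviation", "Bin $") # => "deviation:Bin $"
p grouped_query_string("Deviation", "Bin $")  # => "deviation:(Bin $)"
```

If the grouped form is used, parentheses or other syntax characters in the user input would still need escaping.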

And here is how I build the search form:

<div> 
    <%= form_tag issues_path, :class=> "formtastic issue", method: :get do %> 
     <fieldset class="inputs"> 
     <ol> 
      <li class="string input medium search query optional stringish inline"> 
       <% opts = ["Description", "Detail","Deviation","Equipment","Recipe", "Step"] %> 
       <%= select_tag :term, options_for_select(opts, params[:term]) %> 
       <%= text_field_tag :query, params[:query] %> 
       <%= submit_tag "Search", name: nil, class: "btn" %> 
      </li> 
     </ol> 
     </fieldset> 
    <% end %> 
</div> 
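The form sends the selected label as params[:term], and the search method only downcases it. That turns "Detail" into detail, whereas the mapping indexes :details, and it also lets any submitted value become a field name. A lookup table keeps the labels and the indexed fields in sync. SEARCH_FIELDS and search_field_for are hypothetical names, not part of the original code:

```ruby
# Hypothetical mapping from the form's option labels to the field names
# indexed in the mapping block (description, details, deviation,
# equipment, recipe, step).
SEARCH_FIELDS = {
  "Description" => "description",
  "Detail"      => "details",
  "Deviation"   => "deviation",
  "Equipment"   => "equipment",
  "Recipe"      => "recipe",
  "Step"        => "step"
}.freeze

# Returns the indexed field for a form label, or nil for anything unexpected.
def search_field_for(label)
  SEARCH_FIELDS[label]
end

p "Detail".downcase          # => "detail" (not an indexed field)
p search_field_for("Detail") # => "details"
p search_field_for("_all")   # => nil
```

The search method could then reject requests where the lookup returns nil instead of building a query from an arbitrary field name.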

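Don't you just escape the characters that have meaning to Lucene with a backslash? Of course, in a Ruby string you need a double backslash \\ to escape the character before it reaches the Elasticsearch API. I haven't tried Tire, so I don't know whether this applies in your setup. For reference, here is a quick list of the affected characters: http://docs.lucidworks.com/display/lweug/Escaping+Special+Syntax+Characters – Phil 2013-04-26 13:39:26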
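The point about double backslashes concerns Ruby string literals. In a double-quoted string, \\ is a single backslash character, so "Bin \\$" is the six-character string Bin \$. This sketch only checks the Ruby side. It does not show what Tire actually sends to Elasticsearch.

```ruby
# In a double-quoted Ruby string, "\\" is ONE backslash character.
escaped = "Bin \\$"

puts escaped      # prints: Bin \$
p escaped.length  # => 6
p escaped.chars   # => ["B", "i", "n", " ", "\\", "$"]

# A single-quoted literal keeps "\$" as backslash + dollar, so both are equal.
p escaped == 'Bin \$' # => true
```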


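I don't think that's the problem, because the queries affected are Bin $ and Bin %, and neither $ nor % is listed as a special character at the link above. – Arnob 2013-04-26 17:48:15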


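I know from full-text search in my own databases (Oracle does this, and MySQL uses it for LIKE tests on varchar or text fields) that % is the 'match all' character. Maybe the link above is incomplete, or maybe it isn't relevant to your problem. Have you tried escaping the characters to see whether that fixes it? – Phil 2013-04-27 18:34:36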

Answer

You can sanitize your query string. Here is a sanitizer that has worked for everything I have thrown at it:

def sanitize_string_for_elasticsearch_string_query(str)
  # Escape special characters
  # http://lucene.apache.org/core/old_versioned_docs/versions/2_9_1/queryparsersyntax.html#Escaping Special Characters
  escaped_characters = Regexp.escape('\\/+-&|!(){}[]^~*?:')
  str = str.gsub(/([#{escaped_characters}])/, '\\\\\1')

  # AND, OR and NOT are used by Lucene as logical operators. We need
  # to escape them.
  ['AND', 'OR', 'NOT'].each do |word|
    escaped_word = word.split('').map { |char| "\\#{char}" }.join('')
    str = str.gsub(/\s*\b(#{word.upcase})\b\s*/, " #{escaped_word} ")
  end

  # Escape odd quotes: if the number of double quotes is odd, escape the
  # last one. The pattern has two capture groups, so keep \1 and \2.
  quote_count = str.count '"'
  str = str.gsub(/(.*)"(.*)/, '\1\"\2') if quote_count % 2 == 1

  str
end
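The odd-quote step is easy to misread, so here it is on its own. With an odd number of double quotes, the last quote is unbalanced and gets a backslash in front of it. Balanced quotes, which form phrase queries, are left alone. The replacement must reference \2 because the pattern has only two capture groups. escape_unbalanced_quote is a hypothetical name for this isolated step.

```ruby
# Standalone sketch of the sanitizer's "escape odd quotes" step.
def escape_unbalanced_quote(str)
  # An even count means every quote is paired (e.g. a phrase query).
  return str if str.count('"').even?

  # The greedy (.*) makes this match the LAST double quote on the line;
  # \1 and \2 put back the text before and after it.
  str.sub(/(.*)"(.*)/, '\1\"\2')
end

puts escape_unbalanced_quote('Bin "X')  # prints: Bin \"X
puts escape_unbalanced_quote('"Bin X"') # prints: "Bin X"
```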

params[:query] = sanitize_string_for_elasticsearch_string_query(params[:query]) 

I needed to add the forward slash to the 'escaped_characters' array as well: escaped_characters = Regexp.escape('\\+-&|!(){}[]^~*?:\/'), because strings containing a forward slash were breaking. – rubyprince 2013-06-27 12:19:14
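Regexp.escape leaves / untouched because / is not a regex metacharacter. That does no harm: inside the interpolated character class, / needs no escaping, it only has to be in the list. The sketch below uses the same gsub replacement as the answer with a character list that includes /. ESCAPED_CHARACTERS and escape_lucene_chars are hypothetical names.

```ruby
# Character list with "/" included, as the comment suggests.
# Regexp.escape does not escape "/", which is fine: inside a character
# class it only needs to be listed.
ESCAPED_CHARACTERS = Regexp.escape('\\+-&|!(){}[]^~*?:/')

# Prefix each listed character with a backslash, using the same
# gsub replacement string as the answer's sanitizer.
def escape_lucene_chars(str)
  str.gsub(/([#{ESCAPED_CHARACTERS}])/, '\\\\\1')
end

puts escape_lucene_chars('123/345') # prints: 123\/345
puts escape_lucene_chars('a+b')     # prints: a\+b
```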


That's strange, because '/' is not a special character in Lucene: http://lucene.apache.org/core/old_versioned_docs/versions/2_9_1/queryparsersyntax.html#Escaping%20Special%20Characters – 2013-06-27 13:19:56


Hi, see http://50.16.250.253:9200/locations/location/_search?q=123%2F345 .. That returns an error, which I think is caused by the '/' in the string. When I escape it with '\\', the error goes away: http://50.16.250.253:9200/locations/location/_search?q=123%5C%2F345 – rubyprince 2013-07-01 11:58:30
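The two URLs above differ only by an encoded backslash (%5C) before the encoded slash (%2F). A likely reason / needs escaping at all: starting with Lucene 4.0, the classic query parser treats /.../ as a regular-expression query, so / became a reserved character even though the 2.9.1 syntax page linked above does not list it. The sketch below reproduces both q values with URI.encode_www_form_component from Ruby's standard library. It does not contact the server.

```ruby
require 'uri'

# Raw and backslash-escaped versions of the same query text.
raw     = '123/345'
escaped = '123\/345' # single-quoted: backslash followed by slash

p URI.encode_www_form_component(raw)     # => "123%2F345"
p URI.encode_www_form_component(escaped) # => "123%5C%2F345"
```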