Elasticsearch bool query with a regexp filter

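I have been trying to work out the best way to use a regular expression pattern in an Elasticsearch 5.4 query. After reading up on the standard analyzer and how each string field is tokenized, I started using the non-analyzed fields in my mapping (the .raw property). I have tried two variants of the same query, both without success.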

Query string filter:

GET /test-*/_search 
{ 
"query": { 
    "bool": { 
    "must": [ 
     { 
      "query_string":{ 
      "query": "URL.raw:/^(http|https)\\:\/\/.+(wp-content|wp-admin)/" 
      } 
     } 
    ] 
    } 
}, 
"sort": { 
    "@timestamp": { 
    "order": "desc" 
    } 
} 
}

Regexp filter:

GET /test-*/_search 
{ 
"query": { 
    "bool": { 
    "must": [ 
     { 
     "regexp": { 
      "URL.raw":{ 
      "value": "/^(http|https)\\:\/\/.+(wp-content|wp-admin)/" 
      } 
     } 
     } 
    ] 
    } 
}, 
"sort": { 
    "@timestamp": { 
    "order": "desc" 
    } 
} 
}

Both seem to return either no results or a parse exception:

{ 
    "error": { 
    "root_cause": [ 
     { 
     "type": "parse_exception", 
     "reason": "parse_exception: Encountered \" \"^\" \"^ \"\" at line 1, column 8.\nWas expecting one of:\n <BAREOPER> ...\n \"(\" ...\n \"*\" ...\n <QUOTED> ...\n <TERM> ...\n <PREFIXTERM> ...\n <WILDTERM> ...\n <REGEXPTERM> ...\n \"[\" ...\n \"{\" ...\n <NUMBER> ...\n " 
     },

Does Lucene require special escaping, or are certain characters blacklisted? Any help or pointers would be greatly appreciated. Thanks!

Source

2017-06-05 djmm187

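Lucene regular expressions are anchored by default, and ^ / $ are not special there. You do not need the / regex delimiters, and you do not need to escape /. Try the regexp filter with "https?://.*wp-(content|admin).*"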

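Lucene regular expressions are anchored by default, and ^ / $ are not special there.

You do not need the / regex delimiters, so you do not need to escape / either.

Use the following pattern: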

"value": "https?://.*wp-(content|admin).*"

Note that I modified the group a bit to make the pattern more linear and efficient.
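
As a rough sketch of how the suggested pattern slots into the regexp variant of the request from the question, the Python snippet below builds the same bool/regexp body and prints it as JSON. The helper name build_search_body is made up for illustration; the field name, sort clause and index pattern are taken from the question.

```python
import json

# Pattern from the answer: no /.../ delimiters, no ^ anchor, no escaped slashes.
PATTERN = "https?://.*wp-(content|admin).*"


def build_search_body(field: str, pattern: str) -> dict:
    """Hypothetical helper: rebuild the bool/regexp body from the question."""
    return {
        "query": {
            "bool": {
                "must": [
                    {"regexp": {field: {"value": pattern}}}
                ]
            }
        },
        "sort": {"@timestamp": {"order": "desc"}},
    }


body = build_search_body("URL.raw", PATTERN)

# The printed JSON is what would be sent as the body of GET /test-*/_search.
print(json.dumps(body, indent=2))
```

Because the pattern contains no backslashes, nothing in it needs extra escaping when it is serialized to JSON.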

Details:

https?:// - the string https:// or http://
.* - then any 0+ characters
wp- - a wp- substring
(content|admin) - either a content or an admin substring
.* - then any 0+ characters.
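
The effect of anchoring can be tried out locally with a small Python sketch. It assumes that Python's re.fullmatch is a fair stand-in for Lucene's whole-string matching for this particular pattern (the two regex dialects differ in general). The sample URLs and the helper lucene_like_match are invented for illustration.

```python
import re

# Hypothetical sample values of the non-analyzed URL.raw field.
urls = [
    "https://example.com/wp-content/uploads/a.png",
    "http://example.com/wp-admin/",
    "https://example.com/about",
]


def lucene_like_match(pattern: str, value: str) -> bool:
    """Approximate Lucene regexp semantics: the pattern must cover the whole value."""
    return re.fullmatch(pattern, value) is not None


# Without a trailing .* the anchored pattern has to end exactly at
# "wp-content" or "wp-admin", so URLs that continue past it are rejected.
unpadded = r"https?://.+(wp-content|wp-admin)"

# The pattern from the answer ends in .* and can consume the rest of the URL.
padded = r"https?://.*wp-(content|admin).*"

unpadded_hits = [u for u in urls if lucene_like_match(unpadded, u)]
padded_hits = [u for u in urls if lucene_like_match(padded, u)]

print("unpadded:", unpadded_hits)
print("padded:  ", padded_hits)
```

With these sample values the unpadded pattern matches nothing, while the padded one matches the two WordPress URLs. That is in line with the question's queries returning no results even when they did parse.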

Source

2017-06-05 21:06:50

Awesome! Makes total sense! Thanks for the prompt reply :)) – djmm187
