2015-06-15

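How to implement a PatternAnalyzer in elastic4s and Elasticsearch to exclude results for a specific field

I am trying to run a query against my index and fetch all reviews written by reviewers who do not have a gravatar image. To do this, I have implemented a PatternAnalyzerDefinition with a host pattern: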

"^https?\\:\\/\\/([^\\/?#]+)(?:[\\/?#]|$)" 

This should match a URL and extract its host, so that a URL like:

https://www.gravatar.com/avatar/blablalbla?s=200&r=pg&d=mm

becomes:

www.gravatar.com 
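To check the regex on its own, outside Elasticsearch, here is a minimal sketch using only `java.util.regex`. The names `HostExtractDemo` and `extractHost` are illustrative and not part of elastic4s. It only shows what capture group 1 contains, not how an analyzer would use the pattern.

```scala
import java.util.regex.Pattern

object HostExtractDemo {
  // Same pattern as in the question; capture group 1 holds the host.
  val hostPattern: Pattern =
    Pattern.compile("^https?\\:\\/\\/([^\\/?#]+)(?:[\\/?#]|$)")

  // Returns the captured host, or None when the input is not an http(s) URL.
  def extractHost(url: String): Option[String] = {
    val m = hostPattern.matcher(url)
    if (m.find()) Some(m.group(1)) else None
  }

  def main(args: Array[String]): Unit = {
    // prints Some(www.gravatar.com)
    println(extractHost("https://www.gravatar.com/avatar/blablalbla?s=200&r=pg&d=mm"))
  }
}
```

So the regex itself captures the host as intended; whether that capture ends up as an indexed token depends on how the analyzer applies the pattern.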

Mapping:

clientProvider.getClient.execute {
  create.index(_index).analysis(
    phraseAnalyzer,
    PatternAnalyzerDefinition("host_pattern", regex = "^https?\\:\\/\\/([^\\/?#]+)(?:[\\/?#]|$)")
  ).mappings(
    "reviews" as (
      .... Cool mappings
      "review" inner (
        "grade" typed LongType,
        "text" typed StringType index "not_analyzed",
        "reviewer" inner (
          "screenName" typed StringType index "not_analyzed",
          "profilePicture" typed StringType analyzer "host_pattern",
          "thumbPicture" typed StringType index "not_analyzed",
          "points" typed LongType index "not_analyzed"
        ),
        .... Other cool mappings
      )
    ) all(false)
  )
} map { response =>
  Logger.info("Create index response: {}", response)
} recover {
  case t: Throwable => play.Logger.error("Error creating index: ", t)
}

Query:

val reviewQuery = (search in path)
  .query(
    bool(
      must(
        not(
          termQuery("review.reviewer.profilePicture", "www.gravatar.com")
        )
      )
    )
  )
  .postFilter(
    bool(
      must(
        rangeFilter("review.grade") from 3
      )
    )
  )
  .size(size)
  .sort(by field "review.created" order SortOrder.DESC)

clientProvider.getClient.execute {
  reviewQuery
}.map(_.getHits.jsonToList[ReviewData])

Inspecting the mapping in the index shows:

reviewer: { 
    properties: { 
     id: { 
      type: "long" 
     }, 
     points: { 
      type: "long" 
     }, 
     profilePicture: { 
      type: "string", 
      analyzer: "host_pattern" 
     }, 
     screenName: { 
      type: "string", 
      index: "not_analyzed" 
     }, 
     state: { 
      type: "string" 
     }, 
     thumbPicture: { 
      type: "string", 
      index: "not_analyzed" 
     } 
    } 
} 

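When I run the query, the pattern matching does not seem to work. I still receive reviews from reviewers who have a gravatar picture. What am I doing wrong? Have I perhaps misunderstood the PatternAnalyzer?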

I am using "com.sksamuel.elastic4s" %% "elastic4s" % "1.5.9".

Answer


I guess RTFM applies here once again.

The docs state:

IMPORTANT: The regular expression should match the token separators, not the tokens themselves.

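This means that in my case the matched text, www.gravatar.com, is treated as a separator and will not be part of the tokens after the field is analyzed.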
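The effect can be imitated with plain `java.util.regex`, as a rough sketch. This is not the Lucene pattern analyzer itself; it only assumes that the regex marks the separators and that the text between matches becomes the tokens. `SeparatorDemo` and `tokens` are illustrative names.

```scala
import java.util.regex.Pattern

object SeparatorDemo {
  // Same pattern as in the question.
  val hostPattern: Pattern =
    Pattern.compile("^https?\\:\\/\\/([^\\/?#]+)(?:[\\/?#]|$)")

  // Splits the input on the pattern: whatever the regex matches is
  // discarded as a separator, and the non-empty remainders are the tokens.
  def tokens(input: String): List[String] =
    hostPattern.split(input).toList.filter(_.nonEmpty)

  def main(args: Array[String]): Unit = {
    val result = tokens("https://www.gravatar.com/avatar/blablalbla?s=200&r=pg&d=mm")
    println(result)                              // List(avatar/blablalbla?s=200&r=pg&d=mm)
    println(result.contains("www.gravatar.com")) // false
  }
}
```

Under that assumption the host is thrown away with the separator, so a term query for www.gravatar.com has nothing to match.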

Use the Pattern Capture Token Filter instead.

First, declare a new CustomAnalyzerDefinition:

val hostAnalyzer = CustomAnalyzerDefinition(
    "host_analyzer", 
    StandardTokenizer, 
    PatternCaptureTokenFilter(
     name = "hostFilter", 
     patterns = List[String]("^https?\\:\\/\\/([^\\/?#]+)(?:[\\/?#]|$)"), 
     preserveOriginal = false 
    ) 
) 

Then add the analyzer to the field, referencing it by the name given in the definition:

"review" inner (    
       "reviewer" inner (
        "screenName" typed StringType index "not_analyzed", 
        "profilePicture" typed StringType analyzer "host_analyzer", 
        "thumbPicture" typed StringType index "not_analyzed", 
        "points" typed LongType index "not_analyzed" 
       ) 
) 

create.index(_index).analysis(
      someAnalyzer, 
      phraseAnalyzer, 
      hostAnalyzer 
     ).mappings(

Voilà, it works. A very handy tool for checking the tokens in the index is the term vector endpoint:

/[index]/[collection]/[id]/_termvector?fields=review.reviewer.profilePicture&pretty=true 