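ElasticSearch: indexing only the fields specified in the mapping

I have an ElasticSearch setup that receives the data to index through a CouchDB river. My problem is that most of the fields in the CouchDB documents are not relevant for search: they are fields used internally by the application (IDs and so on), and I do not want to get false positives because of them. Besides, indexing data that is not needed seems to me a waste of resources.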

To solve this, I have defined a mapping in which I specify the fields I want indexed. I am using pyes to access ElasticSearch. The process I follow is:

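1. Create the CouchDB river, associated with an index. This apparently also creates the index, and creates a "couchdb" mapping in that index which, as far as I can see, includes all fields, with dynamically assigned types.
2. Put a mapping, restricting it to the fields I really want to index.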

This is the index definition as obtained by:

curl -XGET 'http://localhost:9200/notes_index/_mapping?pretty=true'

{
    "notes_index" : {
        "default_mapping" : {
            "properties" : {
                "note_text" : {
                    "type" : "string"
                }
            }
        },
        "couchdb" : {
            "properties" : {
                "_rev" : {
                    "type" : "string"
                },
                "created_at_date" : {
                    "format" : "dateOptionalTime",
                    "type" : "date"
                },
                "note_text" : {
                    "type" : "string"
                },
                "organization_id" : {
                    "type" : "long"
                },
                "user_id" : {
                    "type" : "long"
                },
                "created_at_time" : {
                    "type" : "long"
                }
            }
        }
    }
}
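To compare what each type in the response above declares, the mapping can be reduced to a list of field names per type. The following is a minimal Python sketch using only the standard library (not pyes); `fields_by_type` is a hypothetical helper name, and the embedded response is a trimmed copy of the one shown above.

```python
import json


def fields_by_type(mapping_response, index):
    """Return {type_name: sorted field names} for one index of a _mapping response."""
    types = mapping_response[index]
    return {
        type_name: sorted(definition.get("properties", {}))
        for type_name, definition in types.items()
    }


# Trimmed copy of the _mapping response shown above (illustrative only).
response = json.loads("""
{
  "notes_index": {
    "default_mapping": {"properties": {"note_text": {"type": "string"}}},
    "couchdb": {"properties": {
      "_rev": {"type": "string"},
      "note_text": {"type": "string"},
      "user_id": {"type": "long"}
    }}
  }
}
""")

summary = fields_by_type(response, "notes_index")
for type_name, fields in sorted(summary.items()):
    print(type_name, fields)
```

Seen this way, the restricted "default_mapping" type and the dynamically populated "couchdb" type are two separate types living side by side in the same index.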

The problem I have is manifold:

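First, the default "couchdb" mapping indexes all fields. I do not want this. Is it possible to avoid the creation of that mapping? I am confused, because that mapping seems to be the one that is somehow "connected" to the CouchDB river.
Second, the mapping I create seems to have no effect: no documents are indexed through it.

Do you have any advice on this?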

EDIT

This is what I am actually doing, exactly as typed:

server="localhost"

# Create the index
curl -XPUT "$server:9200/index1"

# Create the mapping
curl -XPUT "$server:9200/index1/mapping1/_mapping" -d '
{
    "type1" : {
        "properties" : {
            "note_text" : {"type" : "string", "store" : "no"}
        }
    }
}
'

# Configure the river
curl -XPUT "$server:9200/_river/river1/_meta" -d '{
    "type" : "couchdb",
    "couchdb" : {
        "host" : "localhost",
        "port" : 5984,
        "user" : "admin",
        "password" : "admin",
        "db" : "notes"
    },
    "index" : {
        "index" : "index1",
        "type" : "type1"
    }
}'

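Documents in index1 still contain fields other than "note_text", which is the only one I specifically mentioned in the mapping definition. Why is that?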

2012-01-26 dangonfast

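The default behaviour of the CouchDB river is to use a 'dynamic' mapping, i.e. to index every field found in the incoming CouchDB documents. You are right that this can unnecessarily increase the size of the index (the false positives in your searches are a separate matter that can be addressed by excluding some fields from the query).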
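As a rough illustration of restricting a query to the relevant field, here is a minimal Python sketch that builds a `query_string` search body limited to `note_text` via `default_field`. The helper names `build_note_query` and `search`, the search term, and the `localhost:9200` address are assumptions made for the example; only the standard library is used rather than pyes.

```python
import json
import urllib.request


def build_note_query(text, field="note_text"):
    """Build a query_string search body that looks at a single field only."""
    return {"query": {"query_string": {"default_field": field, "query": text}}}


def search(base_url, index, body):
    """POST the search body to <base_url>/<index>/_search and decode the reply."""
    request = urllib.request.Request(
        "%s/%s/_search" % (base_url, index),
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as reply:
        return json.load(reply)


body = build_note_query("meeting")
print(json.dumps(body, indent=2))

# Against a live node (assumed here to listen on localhost:9200):
# hits = search("http://localhost:9200", "notes_index", body)
```

This only changes what is searched. The other fields are still indexed, so it does nothing for the index size.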

To use your own mapping instead of the 'dynamic' one, you need to configure the river plugin to use the mapping you have created (see this article):

curl -XPUT 'elasticsearch-host:9200/_river/notes_index/_meta' -d '{
    "type" : "couchdb",

    ... your CouchDB connection configuration ...

    "index" : {
        "index" : "notes_index",
        "type" : "mapping1"
    }
}'

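The type name you specify in the URL of the mapping PUT overrides the one you include in the definition, so the type you created is in fact mapping1. Try executing this command to see for yourself: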

> curl 'localhost:9200/index1/_mapping?pretty=true' 

{
    "index1" : {
        "mapping1" : {
            "properties" : {
                "note_text" : {
                    "type" : "string"
                }
            }
        }
    }
}

I think that once you get the type name right, it will start working fine.
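One way to keep the names from drifting apart is to derive the mapping URL, the mapping body and the river configuration from a single type name. The sketch below is a hedged Python illustration of that idea; `build_setup_requests` is a hypothetical helper, the CouchDB connection values are placeholders taken from the question, and nothing is sent over the network.

```python
import json


def build_setup_requests(index, type_name, river_name, couchdb_db, fields):
    """Return (method, path, body) tuples that all refer to the same type name."""
    mapping_body = {
        type_name: {
            "properties": {name: {"type": kind} for name, kind in fields.items()}
        }
    }
    river_body = {
        "type": "couchdb",
        # Placeholder connection settings, copied from the question.
        "couchdb": {"host": "localhost", "port": 5984, "db": couchdb_db},
        "index": {"index": index, "type": type_name},
    }
    return [
        ("PUT", "/%s" % index, None),
        ("PUT", "/%s/%s/_mapping" % (index, type_name), mapping_body),
        ("PUT", "/_river/%s/_meta" % river_name, river_body),
    ]


setup_requests = build_setup_requests(
    "index1", "mapping1", "river1", "notes", {"note_text": "string"}
)
for method, path, body in setup_requests:
    print(method, path, json.dumps(body) if body is not None else "")
```

Each tuple corresponds to one of the curl commands in the question, with "mapping1" used consistently in the mapping URL, the mapping body and the river's "type" key.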

2012-01-26 23:09:46

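Thanks for your comment, but something is still not clear. Where in that PUT request am I using my mapping (which I have called 'default_mapping')? – dangonfast 2012-01-27 00:09:45

You have one mapping per index, but you can declare multiple 'types' in each mapping. I was not sure which mapping type you intended to use; you have two: 'couchdb' and 'default_mapping'. Just change the value of the 'type' key in the river configuration. – 2012-01-27 00:24:01

I have edited the original question, which now shows the actual POST requests I am making to configure ES. This still does not work: all fields are still being indexed. – dangonfast 2012-01-27 00:58:28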
