2014-02-27

Logstash indexing JSON arrays

Logstash is awesome. I can send it JSON like this (multi-lined for readability):

{
    "a": "one",
    "b": {
        "alpha": "awesome"
    }
}

and then query for that line in Kibana using the search term b.alpha:awesome. Nice.

However, I now have a JSON log line like this:

{ 
    "different":[ 
    { 
     "this": "one", 
     "that": "uno" 
    }, 
    { 
     "this": "two" 
    } 
    ] 
} 

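And I'd like to be able to find this line with a search like different.this:two (or different.this:one, or different.that:uno).

If I were using Lucene directly, I'd iterate through the different array and generate a new search index for each hash within it, but Logstash currently seems to ingest that line like this:

different: {this: one, that: uno}, {this: two}

That isn't going to help me search for log lines using different.this or different.that.

Any thoughts on a codec, filter or code change I can make to enable this?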
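To make the goal concrete, here is a minimal standalone Ruby sketch (plain Ruby, not Logstash code) of the per-hash iteration described above. It walks the different array and collects every inner value under a dotted field name. The helper name collect_array_fields is made up for this illustration.

```ruby
require 'json'

# Hypothetical helper: for every top-level key that holds an array of hashes,
# collect each inner value under a "<key>.<inner key>" field name.
def collect_array_fields(doc)
  fields = Hash.new { |hash, key| hash[key] = [] }
  doc.each do |key, value|
    next unless value.is_a?(Array)
    value.each do |item|
      next unless item.is_a?(Hash)
      item.each do |inner_key, inner_value|
        fields["#{key}.#{inner_key}"] << inner_value
      end
    end
  end
  fields
end

line = '{"different":[{"this":"one","that":"uno"},{"this":"two"}]}'
fields = collect_array_fields(JSON.parse(line))
fields.each { |name, values| puts "#{name}: #{values.join(', ')}" }
```

With the log line from the question, different.this collects one and two, and different.that collects uno, which are exactly the terms the desired searches would need to match.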

+0

What JSON format do you want after the array has been indexed? – vzamanillo

Answers

3

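You can write your own filter (copy and paste, rename the class, change the config_name and rewrite the filter(event) method) or modify the current JSON filter (source on GitHub).

You can find the source code of the JSON filter (a Ruby class) under logstash-1.x.x\lib\logstash\filters, in the file json.rb. The JSON filter parses the content as JSON as follows: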

begin
    # TODO(sissel): Note, this will not successfully handle json lists
    # like your text is '[ 1,2,3 ]' JSON.parse gives you an array (correctly)
    # which won't merge into a hash. If someone needs this, we can fix it
    # later.
    dest.merge!(JSON.parse(source))

    # If no target, we target the root of the event object. This can allow
    # you to overwrite @timestamp. If so, let's parse it as a timestamp!
    if !@target && event[TIMESTAMP].is_a?(String)
        # This is a hack to help folks who are mucking with @timestamp during
        # their json filter. You aren't supposed to do anything with
        # "@timestamp" outside of the date filter, but nobody listens... ;)
        event[TIMESTAMP] = Time.parse(event[TIMESTAMP]).utc
    end

    filter_matched(event)
rescue => e
    event.tag("_jsonparsefailure")
    @logger.warn("Trouble parsing json", :source => @source,
                 :raw => event[@source], :exception => e)
    return
end

You can change the parsing step so that it modifies the original JSON before merging it:

json = JSON.parse(source)
if json.is_a?(Hash)
    json.each do |key, value|
        if value.is_a?(Array)
            value.each_with_index do |object, index|
                # modify as you need; only hashes can take an extra key
                object["index"] = index if object.is_a?(Hash)
            end
        end
    end
end
# save modified json
......
dest.merge!(json)
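To see what that loop does to the log line from the question, here is a small standalone sketch that runs outside Logstash. The method name tag_array_items is hypothetical, and the guard that skips non-hash array elements is an assumption added for safety.

```ruby
require 'json'

# Hypothetical standalone version of the loop above: add an "index" key to
# every hash found inside a top-level array.
def tag_array_items(json)
  return json unless json.is_a?(Hash)
  json.each do |_key, value|
    next unless value.is_a?(Array)
    value.each_with_index do |object, index|
      # Scalars inside an array cannot take an extra key, so skip them.
      object["index"] = index if object.is_a?(Hash)
    end
  end
  json
end

source = '{"different":[{"this":"one","that":"uno"},{"this":"two"}]}'
tagged = tag_array_items(JSON.parse(source))
puts JSON.generate(tagged)
# prints {"different":[{"this":"one","that":"uno","index":0},{"this":"two","index":1}]}
```

Each element keeps its original keys and gains its position in the array, which is the kind of extra information you could then use when deciding how to index it.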

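Then you can modify your config file to use your new or modified JSON filter, and place it in \logstash-1.x.x\lib\logstash\config.

This is my elastic_with_json.conf, which uses a modified json.rb filter: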

input {
    stdin {
    }
}
filter {
    json {
        source => "message"
    }
}
output {
    elasticsearch {
        host => "localhost"
    }
    stdout {
    }
}

If you want to use your new filter instead, give it its own config_name:

class LogStash::Filters::Json_index < LogStash::Filters::Base 

    config_name "json_index" 
    milestone 2 
    .... 
end 

and then reference it in the config file:

input {
    stdin {
    }
}
filter {
    json_index {
        source => "message"
    }
}
output {
    elasticsearch {
        host => "localhost"
    }
    stdout {
    }
}

Hope this helps.

2

As a quick and dirty hack, I used the ruby filter with the code below, so there is no need to use the out-of-the-box json filter any more:

input { 
    stdin{} 
} 

filter {
    grok {
        match => ["message", "(?<json_raw>.*)"]
    }
    ruby {
        init => "
            def parse_json obj, pname=nil, event
                obj = JSON.parse(obj) unless obj.is_a? Hash
                obj = obj.to_hash unless obj.is_a? Hash

                obj.each {|k,v|
                    p = pname.nil?? k : pname
                    if v.is_a? Array
                        v.each_with_index {|oo,ii|
                            parse_json_array(oo,ii,p,event)
                        }
                    elsif v.is_a? Hash
                        parse_json(v,p,event)
                    else
                        p = pname.nil?? k : [pname,k].join('.')
                        event[p] = v
                    end
                }
            end

            def parse_json_array obj, i, pname, event
                obj = JSON.parse(obj) unless obj.is_a? Hash
                pname_ = pname
                if obj.is_a? Hash
                    obj.each {|k,v|
                        p = [pname_,i,k].join('.')
                        if v.is_a? Array
                            v.each_with_index {|oo,ii|
                                parse_json_array(oo,ii,p,event)
                            }
                        elsif v.is_a? Hash
                            parse_json(v,p,event)
                        else
                            event[p] = v
                        end
                    }
                else
                    n = [pname_,i].join('.')
                    event[n] = obj
                end
            end
        "
        code => "parse_json(event['json_raw'].to_s,nil,event) if event['json_raw'].to_s.include? ':'"
    }
}

output { 
    stdout{codec => rubydebug} 
} 

Test JSON structure:

{"id":123, "members":[{"i":1, "arr":[{"ii":11},{"ii":22}]},{"i":2}], "im_json":{"id":234, "members":[{"i":3},{"i":4}]}} 

This is what it outputs:

{
               "message" => "{\"id\":123, \"members\":[{\"i\":1, \"arr\":[{\"ii\":11},{\"ii\":22}]},{\"i\":2}], \"im_json\":{\"id\":234, \"members\":[{\"i\":3},{\"i\":4}]}}",
              "@version" => "1",
            "@timestamp" => "2014-07-25T00:06:00.814Z",
                  "host" => "Leis-MacBook-Pro.local",
              "json_raw" => "{\"id\":123, \"members\":[{\"i\":1, \"arr\":[{\"ii\":11},{\"ii\":22}]},{\"i\":2}], \"im_json\":{\"id\":234, \"members\":[{\"i\":3},{\"i\":4}]}}",
                    "id" => 123,
           "members.0.i" => 1,
    "members.0.arr.0.ii" => 11,
    "members.0.arr.1.ii" => 22,
           "members.1.i" => 2,
               "im_json" => 234,
           "im_json.0.i" => 3,
           "im_json.1.i" => 4
}
+0

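Over time this solution should keep working, but I would call it a clumsy approach. For predictable JSON structures it is better to use predefined mappings. For unpredictable JSON with arrays inside, you can still do something like this, but in your own custom filter rather than in the ruby filter.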

+0

Periods in field names are not allowed in recent Elasticsearch versions. If you adopt this solution, use a different separator character.
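One way to act on that comment is to make the separator a parameter. The following is a standalone sketch in plain Ruby, not the answer's code: the name flatten_json and the default separator "_" are assumptions, and unlike the answer's filter it keeps every key in the path (so im_json_members_0_i rather than im_json.0.i).

```ruby
require 'json'

# Hypothetical helper: flatten nested hashes and arrays into a single-level
# hash, joining the path segments with a configurable separator.
def flatten_json(value, separator = "_", prefix = nil, out = {})
  case value
  when Hash
    value.each do |key, child|
      flatten_json(child, separator, [prefix, key].compact.join(separator), out)
    end
  when Array
    value.each_with_index do |child, index|
      flatten_json(child, separator, [prefix, index].compact.join(separator), out)
    end
  else
    # Leaf value: store it under the accumulated path.
    out[prefix] = value
  end
  out
end

doc = '{"id":123, "members":[{"i":1, "arr":[{"ii":11},{"ii":22}]},{"i":2}], ' \
      '"im_json":{"id":234, "members":[{"i":3},{"i":4}]}}'
flat = flatten_json(JSON.parse(doc))
flat.each { |name, value| puts "#{name} = #{value}" }
```

Note that a separator which also appears inside key names (as "_" does in im_json) can produce ambiguous or colliding field names, so pick a character your keys never contain.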

0

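I liked the solution that uses the ruby filter, because it does not require us to write another filter. However, that solution creates its fields at the "root" of the JSON, which makes it hard to keep track of what the original document looked like.

I came up with something similar that is easier to follow. It is a recursive solution, so it is cleaner.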

ruby {
    init => "
        def arrays_to_hash(h)
            h.each do |k,v|
                # If v is nil, an array is being iterated and the value is k.
                # If v is not nil, a hash is being iterated and the value is v.
                value = v || k
                if value.is_a?(Array)
                    # 'value' is replaced with 'value_hash' later.
                    value_hash = {}
                    value.each_with_index do |item, i|
                        value_hash[i.to_s] = item
                    end
                    h[k] = value_hash
                end

                if value.is_a?(Hash) || value.is_a?(Array)
                    arrays_to_hash(value)
                end
            end
        end
    "
    code => "arrays_to_hash(event.to_hash)"
}

It converts each array into a hash whose keys are the index numbers. More details: http://blog.abhijeetr.com/2016/11/logstashelasticsearch-best-way-to.html
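To check the resulting shape without running Logstash, here is a standalone sketch of the same idea. It is a non-mutating variant written for illustration, not the filter code above: the name arrays_to_index_hash is made up, and it returns a converted copy instead of modifying the hash in place.

```ruby
require 'json'

# Hypothetical non-mutating variant: return a copy in which every array is
# replaced by a hash keyed by each element's index (as a string).
def arrays_to_index_hash(value)
  case value
  when Hash
    value.each_with_object({}) do |(key, child), out|
      out[key] = arrays_to_index_hash(child)
    end
  when Array
    value.each_with_index.each_with_object({}) do |(child, index), out|
      out[index.to_s] = arrays_to_index_hash(child)
    end
  else
    value
  end
end

doc = JSON.parse('{"different":[{"this":"one","that":"uno"},{"this":"two"}]}')
converted = arrays_to_index_hash(doc)
puts JSON.generate(converted)
# prints {"different":{"0":{"this":"one","that":"uno"},"1":{"this":"two"}}}
```

The document keeps its original nesting, and each array element becomes reachable through a path that includes its position, such as different.0.this.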