2016-07-01

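Parsing XML data from Filebeat with Logstash

I am using Filebeat to read XML files on Windows and ship them to Logstash, which filters them and sends them on to Elasticsearch.

Filebeat works perfectly and the XML blocks arrive in Logstash, but it looks like I have misconfigured the Logstash filter that should parse each XML block into separate fields and store those fields under an Elasticsearch type.

Here is my sample XML data: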

<H_Ticket> 
<IDH_Ticket>26</IDH_Ticket> 
<CodeBus>186</CodeBus> 
<CodeCh>5531</CodeCh> 
<CodeConv>5531</CodeConv> 
<Codeligne>12</Codeligne> 
<Date>20150915</Date> 
<Heur>1110</Heur> 
<NomFR1>SOUK AHAD</NomFR1> 
<NomFR2>KANTAOUI </NomFR2> 
<Prix>0.66</Prix> 
<IDTicket>26</IDTicket> 
<CodeRoute>107</CodeRoute> 
<origine>01</origine> 
<Distination>06</Distination> 
<Num>6</Num> 
<Ligne>107</Ligne> 
<requisition> </requisition> 
<voyage>0</voyage> 
<faveur> </faveur> 
</H_Ticket> 
<H_Ticket> 
<IDH_Ticket>26</IDH_Ticket> 
<CodeBus>186</CodeBus> 
<CodeCh>5531</CodeCh> 
<CodeConv>5531</CodeConv> 
<Codeligne>12</Codeligne> 
<Date>20150915</Date> 
<Heur>1110</Heur> 
<NomFR1>SOUK AHAD</NomFR1> 
<NomFR2>KANTAOUI </NomFR2> 
<Prix>0.66</Prix> 
<IDTicket>26</IDTicket> 
<CodeRoute>107</CodeRoute> 
<origine>01</origine> 
<Distination>06</Distination> 
<Num>6</Num> 
<Ligne>107</Ligne> 
<requisition> </requisition> 
<voyage>0</voyage> 
<faveur> </faveur> 
</H_Ticket> 
<H_Ticket> 
<IDH_Ticket>26</IDH_Ticket> 
<CodeBus>186</CodeBus> 
<CodeCh>5531</CodeCh> 
<CodeConv>5531</CodeConv> 
<Codeligne>12</Codeligne> 
<Date>20150915</Date> 
<Heur>1110</Heur> 
<NomFR1>SOUK AHAD</NomFR1> 
<NomFR2>KANTAOUI </NomFR2> 
<Prix>0.66</Prix> 
<IDTicket>26</IDTicket> 
<CodeRoute>107</CodeRoute> 
<origine>01</origine> 
<Distination>06</Distination> 
<Num>6</Num> 
<Ligne>107</Ligne> 
<requisition> </requisition> 
<voyage>0</voyage> 
<faveur> </faveur> 
</H_Ticket> 

Here is my Logstash configuration file:

input { 
    beats { 
    port => 5044 
    } 
} 
filter 
{ 
    xml 
    { 
     source => "ticket" 
     xpath => 
     [ 
      "/ticket/IDH_Ticket/text()", "ticketId", 
      "/ticket/CodeBus/text()", "codeBus", 
      "/ticket/CodeCh/text()", "codeCh", 
      "/ticket/CodeConv/text()", "codeConv", 
      "/ticket/Codeligne/text()", "codeLigne", 
      "/ticket/Date/text()", "date", 
      "/ticket/Heur/text()", "heure", 
      "/ticket/NomFR1/text()", "nomFR1", 
      "/ticket/NomAR1/text()", "nomAR1", 
      "/ticket/NomFR2/text()", "nomFR2", 
      "/ticket/NomAR2/text()", "nomAR2", 
      "/ticket/Prix/text()", "prix", 
      "/ticket/IDTicket/text()", "idTicket", 
      "/ticket/CodeRoute/text()", "codeRoute", 
      "/ticket/origine/text()", "origine", 
      "/ticket/Distination/text()", "destination", 
      "/ticket/Num/text()", "num", 
      "/ticket/Ligne/text()", "ligne", 
      "/ticket/requisition/text()", "requisition", 
      "/ticket/voyage/text()", "voyage", 
      "/ticket/faveur/text()", "faveur" 
     ] 
     store_xml => true 
     target => "doc" 
    } 
} 

output 
{ 
    elasticsearch 
    { 
     hosts => "localhost" 
     index => "buses" 
     document_type => "ticket" 
    } 
    file { 
    path => "C:\busesdata\logstash.log" 
} 
stdout { codec =>rubydebug} 
} 

Filebeat configuration:

filebeat:
  # List of prospectors to fetch data.
  prospectors:
    -
      paths:
        - C:\busesdata\*.xml
      input_type: log
      document_type: ticket
      scan_frequency: 10s
      multiline:
        pattern: '<H_Ticket'
        negate: true
        match: after
output:
  ### Logstash as output
  logstash:
    hosts: ["localhost:5044"]
    index: filebeat

Here is part of both the stdout and the file output:

PS C:\logstash-2.3.3\bin> .\logstash -f .\logstash_temp.conf 
io/console not supported; tty will not be manipulated 
Settings: Default pipeline workers: 4 
Pipeline main started 

{ 
     "message" => "<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"yes\"?>\r\n<?xml-stylesheet href=\"ticket.xsl\" type=\"text/xsl\"?>\n<HF_DOCUMENT>", 
     "@version" => "1", 
    "@timestamp" => "2016-07-03T12:13:28.892Z", 
     "source" => "C:\\busesdata\\ticket2.xml", 
      "type" => "ticket", 
    "input_type" => "log", 
     "fields" => nil, 
      "beat" => { 
     "hostname" => "hp-pavillion-g6", 
      "name" => "hp-pavillion-g6" 
    }, 
     "offset" => 0, 
     "count" => 1, 
      "host" => "hp-pavillion-g6", 
      "tags" => [ 
     [0] "beats_input_codec_plain_applied" 
    ] 
} 
{ 
     "message" => "\t<H_Ticket>\r\n\t\t<IDH_Ticket>1</IDH_Ticket>\r\n\t\t<CodeBus>186</CodeBus>\r\n\t\t<CodeCh>5531</CodeCh>\r\n\t\t<CodeConv>5531</CodeConv>\r\n\t\t<Codeligne>12</Codeligne>\r\n\t\t<Date>20150903</Date>\r\n\t\t<Heur>1101</Heur>\r\n\t\t<NomFR1>SOUK AHAD</NomFR1>\r\n\t\t<NomAR1>??? ?????</NomAR1>\r\n\t\t<NomFR2>SOVIVA </NomFR2>\r\n\t\t<NomAR2>??????</NomAR2>\r\n\t\t<Prix>0.66</Prix>\r\n\t\t<IDTicket>1</IDTicket>\r\n\t\t<CodeRoute>107</CodeRoute>\r\n\t\t<origine>01</origine>\r\n\t\t<Distination>07</Distination>\r\n\t\t<Num>3</Num>\r\n\t\t<Ligne>107</Ligne>\r\n\t\t<requisition> </requisition>\r\n\t\t<voyage>0</voyage>\r\n\t\t<faveur> </faveur>\r\n\t</H_Ticket>", 
     "@version" => "1", 
    "@timestamp" => "2016-07-03T12:13:28.892Z", 
    "input_type" => "log", 
     "source" => "C:\\busesdata\\ticket2.xml", 
     "offset" => 125, 
      "type" => "ticket", 
     "count" => 1, 
     "fields" => nil, 
      "beat" => { 
     "hostname" => "hp-pavillion-g6", 
      "name" => "hp-pavillion-g6" 
    }, 
      "host" => "hp-pavillion-g6", 
      "tags" => [ 
     [0] "beats_input_codec_plain_applied" 
    ] 
} 
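
Can you paste the Logstash output produced with `stdout { codec => rubydebug }`? – Arpit

I thought it was a mapping problem, but after setting the type mapping manually in ES and trying again, Logstash did not send any data to ES... I'm pretty sure it is a filter problem :/ –

Answers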

0

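The XML filter does not work because the field that the source setting points to does not exist.
There is no field named ticket in the event: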

{ 
    "message" => "\t<H_Ticket>\r\n\t\t<IDH_Ticket>1</IDH_Ticket>\r\n\t\t<CodeBus>186</CodeBus>\r\n\t\t<CodeCh>5531</CodeCh>\r\n\t\t<CodeConv>5531</CodeConv>\r\n\t\t<Codeligne>12</Codeligne>\r\n\t\t<Date>20150903</Date>\r\n\t\t<Heur>1101</Heur>\r\n\t\t<NomFR1>SOUK AHAD</NomFR1>\r\n\t\t<NomAR1>??? ?????</NomAR1>\r\n\t\t<NomFR2>SOVIVA </NomFR2>\r\n\t\t<NomAR2>??????</NomAR2>\r\n\t\t<Prix>0.66</Prix>\r\n\t\t<IDTicket>1</IDTicket>\r\n\t\t<CodeRoute>107</CodeRoute>\r\n\t\t<origine>01</origine>\r\n\t\t<Distination>07</Distination>\r\n\t\t<Num>3</Num>\r\n\t\t<Ligne>107</Ligne>\r\n\t\t<requisition> </requisition>\r\n\t\t<voyage>0</voyage>\r\n\t\t<faveur> </faveur>\r\n\t</H_Ticket>", 
    "@version" => "1", 
    "@timestamp" => "2016-07-03T12:13:28.892Z", 
    "input_type" => "log", 
    "source" => "C:\\busesdata\\ticket2.xml", 
    "offset" => 125, 
    "type" => "ticket", 
    "count" => 1, 
    "fields" => nil, 
    "beat" => { 
     "hostname" => "hp-pavillion-g6", 
     "name" => "hp-pavillion-g6" 
    }, 
    "host" => "hp-pavillion-g6", 
    "tags" => [ 
     [0] "beats_input_codec_plain_applied" 
    ] 
} 

You should change the XML filter to:

xml { 
     source => "message" 
     ... 
} 
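
As a rough sketch of what the whole filter could look like after this change: the block below reads from message and, as an assumption that the thread does not confirm, roots the XPath expressions at /H_Ticket, because that is the top-level element of each event in the output above. Only a few of the fields are shown, the xpath setting is written in its hash form, and store_xml => false is an illustrative choice that skips storing the full parsed document.

```
filter {
  xml {
    # Filebeat delivers each multiline XML block in the "message" field
    source => "message"
    # Assumption: every event holds exactly one <H_Ticket> element,
    # so each path starts at /H_Ticket
    xpath => {
      "/H_Ticket/IDH_Ticket/text()" => "ticketId"
      "/H_Ticket/CodeBus/text()" => "codeBus"
      "/H_Ticket/Date/text()" => "date"
      "/H_Ticket/Prix/text()" => "prix"
    }
    # Illustrative choice: keep only the xpath fields, not the parsed tree
    store_xml => false
  }
}
```

Each xpath result is stored as an array, for example "ticketId" => ["1"] for the event shown above.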
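
Thanks a lot, it works! But I still have two problems. First, the field content is an array, like this: "ticketId": ["28"]. Second, how can I apply a mapping that specifies each field's type and format, as I did in ES: PUT /transtu { "mappings": { "ticket": { "dynamic": "strict", "properties": { "ticketId": { "type": "integer" }, "codeBus": { "type": "integer" }, "date": { "type": "date", "format": "basic_date" } etc... –

For the array problem, you will have to ask another question. As for the mapping: Elasticsearch generates the mapping automatically from the data types when data is inserted into the index, so if the data types are correct in Logstash, the mapping will be correct. You have to do the conversion in the Logstash filter (see http://stackoverflow.com/questions/38006656/logstash-converting-string-to-an-integer for converting a string to an int). You can also use an index template in Elasticsearch, but the data still has to be of the correct type. – baudsp

I fixed it using the mutate filter –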
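
The thread does not show the mutate configuration that was actually used. One possible shape, as a sketch under assumptions: the xml filter has already produced ticketId, codeBus, prix and date as one-element arrays, the first mutate block replaces each array with its first element, and the second converts the numeric fields. The field selection and the target types are illustrative, following the mapping quoted in the comment above.

```
filter {
  # Assumption: the xml filter ran earlier and set these fields
  # to one-element arrays such as ["28"]
  mutate {
    # "update" only touches fields that already exist, so events
    # without these fields are left unchanged
    update => {
      "ticketId" => "%{[ticketId][0]}"
      "codeBus" => "%{[codeBus][0]}"
      "prix" => "%{[prix][0]}"
      "date" => "%{[date][0]}"
    }
  }
  mutate {
    # Turn the flattened strings into numbers before they reach Elasticsearch
    convert => {
      "ticketId" => "integer"
      "codeBus" => "integer"
      "prix" => "float"
    }
  }
}
```

The date field is left as a yyyyMMdd string here, which is the form that a mapping with "format": "basic_date" expects.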

2

You can try editing the xpath configuration in the filter as follows:

filter 
{ 
    xml 
    { 
     source => "ticket" 
     xpath => 
     [ 
      "/IDH_Ticket/text()", "ticketId", 
      "/CodeBus/text()", "codeBus", 
      "/CodeCh/text()", "codeCh", 
      "/CodeConv/text()", "codeConv", 
      "/Codeligne/text()", "codeLigne", 
      "/Date/text()", "date", 
      "/Heur/text()", "heure", 
      "/NomFR1/text()", "nomFR1", 
      "/NomAR1/text()", "nomAR1", 
      "/NomFR2/text()", "nomFR2", 
      "/NomAR2/text()", "nomAR2", 
      "/Prix/text()", "prix", 
      "/IDTicket/text()", "idTicket", 
      "/CodeRoute/text()", "codeRoute", 
      "/origine/text()", "origine", 
      "/Distination/text()", "destination", 
      "/Num/text()", "num", 
      "/Ligne/text()", "ligne", 
      "/requisition/text()", "requisition", 
      "/voyage/text()", "voyage", 
      "/faveur/text()", "faveur" 
     ] 
     store_xml => true 
     target => "doc" 
    } 
} 

This does not work, I get the same output. I think I could use a Grok pattern to achieve this; do you have any idea how a Grok pattern could be used in this case? –