Parsing XML data in Apache Spark

I need to know how to parse XML in Spark. I am receiving streaming data from Kafka and then need to parse that streaming data.

Here is my Spark code that receives the data:

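directKafkaStream.foreachRDD(rdd -> {
    // rdd.foreach runs on the executors, so this output goes to the executor logs
    rdd.foreach(s -> {
        // s._2() is the Kafka message value (the XML payload)
        System.out.println("&&&&&&&&&&&&&&&&&" + s._2());
    });
});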

And the output:

<root> 
<student> 
<name>john</name> 
<marks>90</marks> 
</student> 
</root>

How can I parse these XML elements?
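If each Kafka record value holds a complete document like the one above, one option is the JDK's built-in DOM API (javax.xml.parsers). This is a minimal sketch, not the poster's code: StudentXmlExample and parseStudents are hypothetical names, and it assumes every <student> has exactly one <name> and one <marks> child. Inside the stream you would call parseStudents on the record value.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class StudentXmlExample {

    // Returns "name=marks" for each <student> in a complete XML document.
    // Hypothetical helper; assumes each <student> has one <name> and one <marks>.
    static List<String> parseStudents(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList students = doc.getElementsByTagName("student");
            List<String> result = new ArrayList<>();
            for (int i = 0; i < students.getLength(); i++) {
                Element student = (Element) students.item(i);
                String name = student.getElementsByTagName("name").item(0).getTextContent().trim();
                String marks = student.getElementsByTagName("marks").item(0).getTextContent().trim();
                result.add(name + "=" + marks);
            }
            return result;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Not a well-formed XML document", e);
        }
    }

    public static void main(String[] args) {
        // Same document as the output shown above, one element per line.
        String xml = "<root>\n<student>\n<name>john</name>\n<marks>90</marks>\n</student>\n</root>";
        System.out.println(parseStudents(xml)); // prints [john=90]
    }
}
```

This only works when the whole document is available in one string; if the document is split across records, each piece on its own is not well-formed.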

2016-09-26 user6325753

Have you searched existing questions first? For example: http://stackoverflow.com/questions/33078221/xml-processing-in-spark –

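@Binary Nerd, thanks for your response. My Spark application reads the data line by line, so I need to parse each line on its own, without relying on the start and/or end elements. – user6325753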
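For this specific layout (one simple element per line, no attributes), a lightweight alternative to a full XML parser is a regular expression applied to each line. This is a swapped-in technique, not the poster's solution: XmlLineParser and parseLine are hypothetical names, and the sketch does not decode entities such as &amp;, and ignores self-closing tags, attributes, and nested markup.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class XmlLineParser {

    // Matches a line holding exactly one simple element, e.g. "<name>john</name>".
    // The backreference \1 forces the closing tag to match the opening tag.
    private static final Pattern SIMPLE_ELEMENT =
            Pattern.compile("^\\s*<([A-Za-z_][\\w.-]*)>([^<]*)</\\1>\\s*$");

    // Returns {tag, text} for a simple element line, or empty for lines such as
    // "<root>" or "</student>" (hypothetical helper, not a general XML parser).
    static Optional<String[]> parseLine(String line) {
        Matcher m = SIMPLE_ELEMENT.matcher(line);
        if (!m.matches()) {
            return Optional.empty();
        }
        return Optional.of(new String[] {m.group(1), m.group(2)});
    }

    public static void main(String[] args) {
        String[] lines = {"<root>", "<student>", "<name>john</name>", "<marks>90</marks>",
                          "</student>", "</root>"};
        for (String line : lines) {
            // Prints "name -> john" and "marks -> 90"; structural lines are skipped.
            parseLine(line).ifPresent(kv -> System.out.println(kv[0] + " -> " + kv[1]));
        }
    }
}
```

The trade-off is that a regex accepts only the exact shape it was written for; anything richer is safer to hand to a real parser.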

Thanks, everyone. The problem is solved; here is the solution.

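import java.io.IOException;
import java.io.StringReader;
import org.apache.xerces.parsers.DOMParser; // Apache Xerces DOM parser
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

String xml = "<name>xyz</name>";
DOMParser parser = new DOMParser();
try {
    // Parse one line (one complete element) into a DOM tree
    parser.parse(new InputSource(new StringReader(xml)));
    Document doc = parser.getDocument();
    // Text content of the root element, here "xyz"
    String message = doc.getDocumentElement().getTextContent();
    System.out.println(message);
} catch (SAXException | IOException e) {
    e.printStackTrace(); // malformed XML or read failure
}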
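The same idea can be written with the JDK's built-in javax.xml.parsers.DocumentBuilder instead of depending on Xerces' DOMParser class directly. A minimal sketch, assuming each line is tried as a standalone document: lines such as <root> or </student> are not well-formed on their own, so they are skipped. LineDomParser and textOf are hypothetical names.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Optional;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Element;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

public class LineDomParser {

    // Parses one line as a standalone XML document and returns the root element's
    // text, or empty if the line is not well-formed on its own (e.g. "<root>").
    static Optional<String> textOf(String line) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // Throw instead of printing "[Fatal Error] ..." to stderr for every skipped line.
            builder.setErrorHandler(new ErrorHandler() {
                @Override public void warning(SAXParseException e) { }
                @Override public void error(SAXParseException e) throws SAXException { throw e; }
                @Override public void fatalError(SAXParseException e) throws SAXException { throw e; }
            });
            Element root = builder.parse(new InputSource(new StringReader(line))).getDocumentElement();
            return Optional.of(root.getTextContent());
        } catch (SAXException | IOException e) {
            return Optional.empty();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(textOf("<name>xyz</name>").orElse("(not an element)")); // prints xyz
        System.out.println(textOf("<root>").orElse("(not an element)"));
    }
}
```

Creating a DocumentBuilder per line keeps the sketch short but is relatively costly. Because a DocumentBuilder is not guaranteed to be thread-safe, one option in Spark is to create one per partition (for example inside foreachPartition) and reuse it for every line in that partition.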

2016-09-26 13:13:17 user6325753

Does this work for big data in Spark? –

@MasudRahman, please see the following link: https://stackoverflow.com/questions/33078221/xml-processing-in-spark/40653300#40653300 – user6325753

When processing streaming data, Databricks' spark-xml library can be helpful for XML processing.

Reference: https://github.com/databricks/spark-xml

2016-09-26 08:18:00

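Thanks for your reply. My Spark application reads the data line by line, so I need to parse each line on its own, without relying on the start and/or end elements. – user6325753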

I spent hours on this, and then found that it does not read self-closing rows. –