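I am looking for an API that gives access to the Spark Streaming statistics shown in the "Streaming" tab of the history server.
I am mainly interested in the batch processing time, but it is not directly available through the REST API, at least according to the documentation: https://spark.apache.org/docs/latest/monitoring.html#rest-api
Any ideas on how to get the information shown in the "Streaming" tab, either for a running job or from the history server?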
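There is a metrics endpoint available on the same port as the Spark UI on the driver node: http://<host>:<sparkUI-port>/metrics/json/
Streaming-related metrics have .StreamingMetrics in their name.
Sample from a local test job: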
local-1498040220092.driver.printWriter.snb.StreamingMetrics.streaming.lastCompletedBatch_processingDelay: {
value: 30
},
local-1498040220092.driver.printWriter.snb.StreamingMetrics.streaming.lastCompletedBatch_processingEndTime: {
value: 1498124090031
},
local-1498040220092.driver.printWriter.snb.StreamingMetrics.streaming.lastCompletedBatch_processingStartTime: {
value: 1498124090001
},
local-1498040220092.driver.printWriter.snb.StreamingMetrics.streaming.lastCompletedBatch_schedulingDelay: {
value: 1
},
local-1498040220092.driver.printWriter.snb.StreamingMetrics.streaming.lastCompletedBatch_submissionTime: {
value: 1498124090000
},
local-1498040220092.driver.printWriter.snb.StreamingMetrics.streaming.lastCompletedBatch_totalDelay: {
value: 31
},
local-1498040220092.driver.printWriter.snb.StreamingMetrics.streaming.lastReceivedBatch_processingEndTime: {
value: 1498124090031
},
local-1498040220092.driver.printWriter.snb.StreamingMetrics.streaming.lastReceivedBatch_processingStartTime: {
value: 1498124090001
}
To get the processing time, we need to take the difference of the two timestamps: StreamingMetrics.streaming.lastCompletedBatch_processingEndTime - StreamingMetrics.streaming.lastCompletedBatch_processingStartTime
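The subtraction above can be sketched in Python as follows. This is only a minimal illustration: the helper name last_batch_processing_ms is made up, it assumes the gauges have already been parsed into a dict shaped like the sample above (metric name mapped to an object with a "value" field), and it treats negative values as "no completed batch yet", which is an assumption rather than something the sample shows.

```python
# Suffixes of the two gauges from the sample output above.
START_SUFFIX = "StreamingMetrics.streaming.lastCompletedBatch_processingStartTime"
END_SUFFIX = "StreamingMetrics.streaming.lastCompletedBatch_processingEndTime"


def last_batch_processing_ms(gauges):
    """Return end - start (ms) for the last completed batch, or None if unavailable.

    `gauges` is assumed to map metric names to {"value": ...} objects,
    as in the sample above. The helper name is hypothetical.
    """
    start = end = None
    for name, gauge in gauges.items():
        # The metric name is prefixed with the app id and app name, so match on the suffix.
        if name.endswith(START_SUFFIX):
            start = gauge["value"]
        elif name.endswith(END_SUFFIX):
            end = gauge["value"]
    # ASSUMPTION: a missing or negative timestamp means no batch has completed yet.
    if start is None or end is None or start < 0 or end < 0:
        return None
    return end - start


# Values copied from the sample above.
sample = {
    "local-1498040220092.driver.printWriter.snb.StreamingMetrics.streaming.lastCompletedBatch_processingEndTime": {"value": 1498124090031},
    "local-1498040220092.driver.printWriter.snb.StreamingMetrics.streaming.lastCompletedBatch_processingStartTime": {"value": 1498124090001},
}
processing_ms = last_batch_processing_ms(sample)
print(processing_ms)
```

With the sample values this gives 30, which matches the lastCompletedBatch_processingDelay gauge in the same output, so reading that gauge directly may be enough.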
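Since Spark 2.2.0 was released in July, a month after your post, I guess your link pointed to Spark 2.1.0. Apparently the REST API has since been extended to cover Spark Streaming; see the Spark 2.2.0 documentation.
So if you still have the option of upgrading your Spark version, I would recommend doing so. You can then retrieve data for all batches from: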
/applications/[app-id]/streaming/batches
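A rough Python sketch of how that endpoint could be queried. The /api/v1 prefix comes from the Spark monitoring documentation; the helper names, the host and port in the example, and the batchId / processingTime field names are assumptions to check against the actual response of your Spark version.

```python
import json
from urllib.request import urlopen

API_PREFIX = "/api/v1"  # REST API root, per the Spark monitoring documentation


def batches_url(base_url, app_id):
    """Build the URL of the streaming batches endpoint for one application."""
    return f"{base_url.rstrip('/')}{API_PREFIX}/applications/{app_id}/streaming/batches"


def fetch_batches(base_url, app_id):
    """Fetch and parse the batch list (needs a reachable Spark UI, Spark 2.2.0+)."""
    with urlopen(batches_url(base_url, app_id)) as resp:
        return json.load(resp)


def processing_times(batches):
    """Map batch id to processing time, skipping batches that have none yet.

    ASSUMPTION: each entry has "batchId" and "processingTime" fields;
    verify the names against the real response.
    """
    return {
        b["batchId"]: b["processingTime"]
        for b in batches
        if b.get("processingTime") is not None
    }


# Hypothetical host and port; the app id is the one from the metrics sample.
url = batches_url("http://localhost:4040/", "local-1498040220092")
print(url)
```

Calling processing_times(fetch_batches(...)) against a running driver would then give per-batch values instead of only the last completed batch.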