One option is to use the Hadoop HDFS Java API. Assuming you are using Maven, you would include hadoop-common in your pom.xml:
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-common</artifactId>
    <version>2.6.0.2.2.0.0-2041</version>
</dependency>
Then, in your spout implementation, you would use an HDFS FileSystem object. For example, here is some pseudocode for emitting each line of the file as a string:
@Override
public void nextTuple() {
    // Requires org.apache.hadoop.conf.Configuration, org.apache.hadoop.fs.FileSystem,
    // org.apache.hadoop.fs.Path, java.io.BufferedReader and java.io.InputStreamReader
    try {
        Path pt = new Path("hdfs://servername:8020/user/hdfs/file.txt");
        FileSystem fs = FileSystem.get(new Configuration());
        BufferedReader br = new BufferedReader(new InputStreamReader(fs.open(pt)));
        String line = br.readLine();
        while (line != null) {
            // emit the line which was read from the HDFS file;
            // _collector is a private member variable of type SpoutOutputCollector set in the open method
            _collector.emit(new Values(line));
            line = br.readLine();
        }
        br.close();
    } catch (Exception e) {
        _collector.reportError(e);
        LOG.error("HDFS spout error", e);
    }
}
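Note that, as written, every call to nextTuple() reopens the file and reads it from the start, so Storm would emit the same lines repeatedly; a real spout would open the reader once in open() and keep its position across calls. For completeness, here is a minimal sketch of the spout class this method would live in (a sketch only, assuming the pre-1.0 backtype.storm package names that match a Hadoop release of this vintage; the class name is illustrative):

import java.util.Map;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import backtype.storm.spout.SpoutOutputCollector;
import backtype.storm.task.TopologyContext;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.base.BaseRichSpout;
import backtype.storm.tuple.Fields;

// Illustrative class name; only the Storm lifecycle wiring is shown here.
public class HdfsLineSpout extends BaseRichSpout {
    private static final Logger LOG = LoggerFactory.getLogger(HdfsLineSpout.class);
    private SpoutOutputCollector _collector;

    @Override
    public void open(Map conf, TopologyContext context, SpoutOutputCollector collector) {
        // Storm calls open() once per task; keep the collector for nextTuple()
        _collector = collector;
    }

    @Override
    public void nextTuple() {
        // the nextTuple() body from the answer above goes here
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        // one output field per emitted line
        declarer.declare(new Fields("line"));
    }
}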
Thanks Kit! This is indeed the solution for streaming tuples one at a time from a single file. What about a spout for batched tuples (i.e. Storm Trident)? – florins
@florins I haven't tried Trident myself, but it looks like you would implement [IBatchSpout](https://nathanmarz.github.io/storm/doc/storm/trident/spout/IBatchSpout.html), and your code would then go in emitBatch instead of nextTuple. –
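To make that suggestion concrete, here is a rough, untested sketch of an IBatchSpout over the same HDFS file (pre-1.0 storm.trident package names; the class name and batch size are assumptions). For proper replay semantics a real implementation would remember each batch until it is acked and re-emit it when emitBatch is called again with the same batchId; this sketch skips that:

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.util.Map;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

import backtype.storm.task.TopologyContext;
import backtype.storm.tuple.Fields;
import backtype.storm.tuple.Values;
import storm.trident.operation.TridentCollector;
import storm.trident.spout.IBatchSpout;

// Illustrative: emits the HDFS file in fixed-size batches of lines.
public class HdfsBatchSpout implements IBatchSpout {
    private static final int BATCH_SIZE = 100; // assumed batch size
    private transient BufferedReader br;

    @Override
    public void open(Map conf, TopologyContext context) {
        try {
            // open the reader once per task, not once per batch
            Path pt = new Path("hdfs://servername:8020/user/hdfs/file.txt");
            FileSystem fs = FileSystem.get(new Configuration());
            br = new BufferedReader(new InputStreamReader(fs.open(pt)));
        } catch (Exception e) {
            throw new RuntimeException("failed to open HDFS file", e);
        }
    }

    @Override
    public void emitBatch(long batchId, TridentCollector collector) {
        // emit up to BATCH_SIZE lines per batch; an empty batch means end of file
        try {
            String line;
            for (int i = 0; i < BATCH_SIZE && (line = br.readLine()) != null; i++) {
                collector.emit(new Values(line));
            }
        } catch (Exception e) {
            collector.reportError(e);
        }
    }

    @Override
    public void ack(long batchId) {
        // nothing buffered for replay in this sketch
    }

    @Override
    public void close() {
        try {
            if (br != null) br.close();
        } catch (Exception ignored) {
        }
    }

    @Override
    public Map getComponentConfiguration() {
        return null;
    }

    @Override
    public Fields getOutputFields() {
        return new Fields("line");
    }
}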