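Spark Streaming fileStream

I am programming with Spark Streaming but am having some trouble with Scala. I want to use the function StreamingContext.fileStream.

The function is defined like this: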

def fileStream[K, V, F <: InputFormat[K, V]](directory: String)(implicit arg0: ClassManifest[K], arg1: ClassManifest[V], arg2: ClassManifest[F]): DStream[(K, V)]

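Create an input stream that monitors a Hadoop-compatible filesystem for new files and reads them using the given key-value types and input format. File names starting with . are ignored.

K: key type for reading the HDFS file
V: value type for reading the HDFS file
F: input format for reading the HDFS file
directory: HDFS directory to monitor for new files

I don't know how to pass the key and value types. My Spark Streaming code: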

val ssc = new StreamingContext(args(0), "StreamingReceiver", Seconds(1), 
    System.getenv("SPARK_HOME"), Seq("/home/mesos/StreamingReceiver.jar")) 

// Create a NetworkInputDStream on target ip:port and count the 
val lines = ssc.fileStream("/home/sequenceFile")

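Java code that writes the Hadoop file:

import java.io.IOException; 
import java.net.URI; 

import org.apache.hadoop.conf.Configuration; 
import org.apache.hadoop.fs.FileSystem; 
import org.apache.hadoop.fs.Path; 
import org.apache.hadoop.io.IOUtils; 
import org.apache.hadoop.io.IntWritable; 
import org.apache.hadoop.io.SequenceFile; 
import org.apache.hadoop.io.Text; 

public class MyDriver { 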

private static final String[] DATA = { "One, two, buckle my shoe", 
     "Three, four, shut the door", "Five, six, pick up sticks", 
     "Seven, eight, lay them straight", "Nine, ten, a big fat hen" }; 

public static void main(String[] args) throws IOException { 
    String uri = args[0]; 
    Configuration conf = new Configuration(); 
    FileSystem fs = FileSystem.get(URI.create(uri), conf); 
    Path path = new Path(uri); 
    IntWritable key = new IntWritable(); 
    Text value = new Text(); 
    SequenceFile.Writer writer = null; 
    try { 
     writer = SequenceFile.createWriter(fs, conf, path, key.getClass(), 
       value.getClass()); 
     for (int i = 0; i < 100; i++) { 
      key.set(100 - i); 
      value.set(DATA[i % DATA.length]); 
      System.out.printf("[%s]\t%s\t%s\n", writer.getLength(), key, 
        value); 
      writer.append(key, value); 
     } 
    } finally { 
     IOUtils.closeStream(writer); 
    } 
}

}

2013-05-15 user2384993

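What problems are you seeing? Are you getting compile errors? If so, what are they? When you run your code, do you get errors or unexpected behavior? If you provide more background on the errors or unexpected behavior you are seeing, you are more likely to get a useful answer. – cmbaxter

If you want to use fileStream, you have to supply all 3 type parameters when calling it. You need to know your Key, Value and InputFormat types before calling it. If your types were LongWritable, Text and TextInputFormat, you would call fileStream like this: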

val lines = ssc.fileStream[LongWritable, Text, TextInputFormat]("/home/sequenceFile")

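If those 3 types do happen to be your types, you may want to use textFileStream instead, since it does not require any type parameters and delegates to fileStream using the 3 types I mentioned. Using it would look like this: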

val lines = ssc.textFileStream("/home/sequenceFile")
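
For the SequenceFile written by the Java driver in the question, the three types would presumably be IntWritable, Text and SequenceFileInputFormat rather than the text types above. The following is only a minimal sketch under some assumptions: a Spark version that uses the org.apache.spark package names, a fileStream that takes a new-API (org.apache.hadoop.mapreduce) input format, and finished files being moved into the monitored directory. The object name, the helper name and the final map step are hypothetical additions for illustration, not part of the original answer.

```scala
import org.apache.hadoop.io.{IntWritable, Text}
import org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat
import org.apache.spark.streaming.StreamingContext
import org.apache.spark.streaming.dstream.DStream

object SequenceFileStream {
  // Hypothetical helper: `ssc` and `dir` are supplied by the caller.
  def pairs(ssc: StreamingContext, dir: String): DStream[(Int, String)] =
    ssc
      // Key, value and input format must match what the writer used.
      .fileStream[IntWritable, Text, SequenceFileInputFormat[IntWritable, Text]](dir)
      // Copy the values out, because Hadoop record readers reuse Writable instances.
      .map { case (k, v) => (k.get, v.toString) }
}
```

With the question's setup this might be called as SequenceFileStream.pairs(ssc, "/home/sequenceFile"), followed by an output operation and ssc.start().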

2013-05-15 12:23:23 cmbaxter

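Hey, I am trying to do the same thing but with binary files. I followed the instructions here, but unfortunately it does not work. Could you please offer some suggestions? https://stackoverflow.com/questions/45778016/reading-binaryfile-with-spark-streaming – MaatDeamon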

val filterF = new Function[Path, Boolean] { 
    // Accept a file only when the timestamp after the last "_" in its name is in the past 
    def apply(x: Path): Boolean = 
     x.toString.split("/").last.split("_").last.toLong < System.currentTimeMillis 
} 

val streamed_rdd = ssc.fileStream[LongWritable, Text, TextInputFormat]("/user/hdpprod/temp/spark_streaming_input",filterF,false).map(_._2.toString).map(u => u.split('\t'))
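
The filter in this answer calls .toLong on whatever follows the last underscore, so a file whose name does not end in a number would throw a NumberFormatException. Below is a minimal sketch of a more defensive predicate, assuming file names end in _<epoch millis>. The object name, the function name and the explicit now parameter are hypothetical, introduced so the logic can be exercised without a cluster. Unlike the original, it also rejects names that contain no underscore at all.

```scala
import scala.util.Try

object ReadyFilter {
  // True only when the file name ends in "_<millis>" and that instant is before `now`.
  def isReady(path: String, now: Long): Boolean = {
    val name = path.split("/").last          // last path segment, i.e. the file name
    val idx = name.lastIndexOf('_')          // position of the suffix separator
    idx >= 0 && Try(name.substring(idx + 1).toLong).toOption.exists(_ < now)
  }
}
```

It could then be handed to fileStream as a plain function, for example (p: Path) => ReadyFilter.isReady(p.toString, System.currentTimeMillis).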

2016-10-31 19:00:44
