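Spark Streaming word count using a file stream does not print results
I am using files as a Spark stream and want to count the words in the stream, but the application prints nothing. My code is below. I am using Scala in a Cloudera environment.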
import org.apache.spark.SparkConf
import org.apache.spark.streaming._
import org.apache.spark.streaming.StreamingContext

object TwitterHashtagStreaming {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("TwitterHashtagStreaming")
      .setMaster("local[2]")
      .set("spark.executor.memory", "1g")
    val streamingC = new StreamingContext(conf, Seconds(5))
    val streamLines = streamingC.textFileStream("file:///home/cloudera/Desktop/wordstream")
    val words = streamLines.flatMap(_.split(" "))
    val counts = words.map(word => (word, 1)).reduceByKey(_ + _)
    counts.print()
    streamingC.start()
    streamingC.awaitTermination()
  }
}
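What does it print? Any errors?
No errors, only the time is printed, as if counts were empty: ------------------------------------------- Time: 1506415275000 ms -------------------------------------------
First try printing streamLines before doing the word count, to check whether any data is actually being read.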
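A minimal sketch of that check, assuming the same directory and batch interval as in the question. The object name StreamLinesCheck is only illustrative, and the code needs the Spark Streaming libraries on the classpath. It prints the raw lines and the line count of each batch, so you can tell whether the input is empty or the word count itself is at fault.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

// Hypothetical diagnostic app: the name is illustrative, not from the question.
object StreamLinesCheck {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("StreamLinesCheck")
      .setMaster("local[2]")
    val ssc = new StreamingContext(conf, Seconds(5))

    // Same directory as in the question.
    val streamLines = ssc.textFileStream("file:///home/cloudera/Desktop/wordstream")

    // Print the first few raw lines of each batch.
    streamLines.print()

    // Print the number of lines in each batch; a count of 0 suggests
    // that no new file was picked up during that interval.
    streamLines.count().print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```

If the count stays at 0, the likely cause is how the files reach the directory rather than the word count logic. As far as I know, textFileStream only processes files that appear in the monitored directory after the stream has started, and they should be moved or renamed into it rather than edited in place. Files that were already there at startup are ignored.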