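What are the empty files after RDD.saveAsTextFile?
I'm learning Spark by working through some of the examples in Learning Spark: Lightning Fast Data Analysis and then adding my own developments.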
I created this class to look at basic transformations and actions.
/**
 * Find error and warning lines in a log file
 */
package com.oreilly.learningsparkexamples.mini.java;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.Function;

public class FindErrors {
    public static void main(String[] args) {
        String inputFile = args[0];
        String outputFile = args[1];
        // Create a Spark context
        SparkConf conf = new SparkConf().setAppName("findErrors");
        JavaSparkContext sc = new JavaSparkContext(conf);
        // Load input data
        JavaRDD<String> input = sc.textFile(inputFile);
        // Keep only the lines that contain "error"
        JavaRDD<String> errorsRDD = input.filter(
            new Function<String, Boolean>() {
                public Boolean call(String x) {
                    return x.contains("error");
                }
            });
        // (disabled) save only the error lines
        // errorsRDD.saveAsTextFile(outputFile);
        // Keep only the lines that contain "warning"
        JavaRDD<String> warningsRDD = input.filter(
            new Function<String, Boolean>() {
                public Boolean call(String x) {
                    return x.contains("warning");
                }
            });
        // Combine both sets of lines (union does not remove duplicates)
        JavaRDD<String> badLinesRDD = errorsRDD.union(warningsRDD);
        // Save the combined lines under outputFile (a directory, not a single file)
        badLinesRDD.saveAsTextFile(outputFile);
        System.out.println("I had " + badLinesRDD.count() + " concerning lines.");
        System.out.println("Here are 10 examples:");
        for (String line : badLinesRDD.take(10)) {
            System.out.println(line);
        }
        // Release the Spark context's resources
        sc.stop();
    }
}
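A minimal plain-Java model of how partitions flow through this pipeline may help here. It has no Spark dependency: `PartitionModel`, `filter` and `union` are hypothetical stand-ins, not Spark's classes. It represents an RDD as a list of partitions. Filtering works inside each partition and keeps the partition count, and union concatenates the parents' partition lists, so two 2-partition inputs give a 4-partition result. The sketch assumes the input was read as two partitions; by default `sc.textFile` requests at least `min(defaultParallelism, 2)` partitions, so two is a likely count for a small file like this one.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

/** Hypothetical plain-Java model of RDD partitions (not Spark code). */
public class PartitionModel {

    // An "RDD" here is just a list of partitions, each a list of lines.
    public static List<List<String>> filter(List<List<String>> rdd, Predicate<String> keep) {
        List<List<String>> out = new ArrayList<>();
        for (List<String> partition : rdd) {
            List<String> kept = new ArrayList<>();
            for (String line : partition) {
                if (keep.test(line)) {
                    kept.add(line);
                }
            }
            // One output partition per input partition, even when nothing matched.
            out.add(kept);
        }
        return out;
    }

    // Union without a shared partitioner: the parents' partitions are concatenated.
    public static List<List<String>> union(List<List<String>> a, List<List<String>> b) {
        List<List<String>> out = new ArrayList<>(a);
        out.addAll(b);
        return out;
    }

    public static void main(String[] args) {
        // Two hypothetical input partitions, as textFile might produce for a small file.
        List<List<String>> input = Arrays.asList(
                Arrays.asList("GET /error", "GET /index"),
                Arrays.asList("GET /error", "GET /warning"));
        List<List<String>> errors = filter(input, s -> s.contains("error"));
        List<List<String>> warnings = filter(input, s -> s.contains("warning"));
        List<List<String>> bad = union(errors, warnings);
        for (int i = 0; i < bad.size(); i++) {
            System.out.println("partition " + i + ": " + bad.get(i));
        }
    }
}
```

Running `main` prints four partitions, and partition 2 is empty because the sample's first input partition contains no warning line.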
This is the command I use to run it:
$SPARK_HOME/bin/spark-submit --class com.oreilly.learningsparkexamples.mini.java.FindErrors ./target/learning-spark-mini-example-0.0.1.jar ../files/fake_logs/log1.log ./errorLog
Here are the contents of the log file:
66.249.69.97 - - [24/Sep/2014:22:25:44 +0000] "GET /071300/242153 HTTP/1.1" 404 514 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
71.19.157.174 - - [24/Sep/2014:22:26:12 +0000] "GET /error HTTP/1.1" 404 505 "-" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/37.0.2062.94 Safari/537.36"
71.19.157.174 - - [24/Sep/2014:22:26:12 +0000] "GET /favicon.ico HTTP/1.1" 200 1713 "-" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/37.0.2062.94 Safari/537.36"
71.19.157.174 - - [24/Sep/2014:22:26:37 +0000] "GET/HTTP/1.1" 200 18785 "-" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/37.0.2062.94 Safari/537.36"
71.19.157.174 - - [24/Sep/2014:22:26:37 +0000] "GET /jobmineimg.php?q=m HTTP/1.1" 200 222 "http://www.holdenkarau.com/" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/37.0.2062.94 Safari/537.36"
71.19.157.175 - - [24/Sep/2014:22:26:12 +0000] "GET /error HTTP/1.1" 404 505 "-" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/37.0.2062.94 Safari/537.36"
71.19.157.175 - - [24/Sep/2014:22:26:12 +0000] "GET /error HTTP/1.1" 404 505 "-" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/37.0.2062.94 Safari/537.36"
71.19.157.174 - - [24/Sep/2014:22:26:37 +0000] "GET /jobmineimg.php?q=m HTTP/1.1" 200 222 "http://www.holdenkarau.com/" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/37.0.2062.94 Safari/537.36"
71.19.157.175 - - [24/Sep/2014:22:26:12 +0000] "GET /warning HTTP/1.1" 404 505 "-" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/37.0.2062.94 Safari/537.36"
71.19.157.175 - - [24/Sep/2014:22:26:12 +0000] "GET /warning HTTP/1.1" 404 505 "-" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/37.0.2062.94 Safari/537.36"
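In this log, three lines request /error and two request /warning, and no line contains both words, so `badLinesRDD.count()` should be 5, matching the five lines spread across the part files. Two details of the filters are worth keeping in mind: `String.contains` is case-sensitive, and `union` keeps duplicates, so a line containing both words would be counted twice. The hypothetical `BadLines.badLines` helper below reproduces that logic on an in-memory `List<String>`; it is a sketch, not the Spark job. On the RDD itself, `distinct()` could drop such duplicates if that matters.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical single-list version of errorsRDD.union(warningsRDD). */
public class BadLines {

    public static List<String> badLines(List<String> lines) {
        List<String> result = new ArrayList<>();
        // Same predicate as the first Spark filter: case-sensitive substring match.
        for (String line : lines) {
            if (line.contains("error")) {
                result.add(line);
            }
        }
        // Like RDD union, append the warning matches without de-duplicating.
        for (String line : lines) {
            if (line.contains("warning")) {
                result.add(line);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList(
                "GET /error", "GET /warning", "GET /error-warning", "GET /ERROR");
        List<String> bad = badLines(sample);
        // "GET /error-warning" is counted twice; "GET /ERROR" is not matched at all.
        System.out.println("I had " + bad.size() + " concerning lines.");
    }
}
```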
One thing I noticed is that the output consists of several files rather than the single file I expected.
The files are:
_SUCCESS
part-00000:
    71.19.157.174 - - [24/Sep/2014:22:26:12 +0000] "GET /error HTTP/1.1" 404 505 "-" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/37.0.2062.94 Safari/537.36"
    71.19.157.175 - - [24/Sep/2014:22:26:12 +0000] "GET /error HTTP/1.1" 404 505 "-" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/37.0.2062.94 Safari/537.36"
part-00001:
    71.19.157.175 - - [24/Sep/2014:22:26:12 +0000] "GET /error HTTP/1.1" 404 505 "-" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/37.0.2062.94 Safari/537.36"
part-00002: (empty)
part-00003:
    71.19.157.175 - - [24/Sep/2014:22:26:12 +0000] "GET /warning HTTP/1.1" 404 505 "-" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/37.0.2062.94 Safari/537.36"
    71.19.157.175 - - [24/Sep/2014:22:26:12 +0000] "GET /warning HTTP/1.1" 404 505 "-" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/37.0.2062.94 Safari/537.36"
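Each of these files corresponds to one partition of `badLinesRDD`: `saveAsTextFile` writes one `part-NNNNN` file per partition, including partitions that hold no records, and the empty `_SUCCESS` marker appears once the job commits successfully. Because `union` lists the error partitions first and the warning partitions after them, `part-00002` holds the warnings found in the first input partition; both warning lines sit near the end of the file, so it is empty. The sketch below imitates that naming scheme with `java.nio.file`; `PartFileWriter` and `writeParts` are hypothetical names, and this is not Spark's output code. In Spark itself, calling `badLinesRDD.coalesce(1)` before `saveAsTextFile` would produce a single part file, at the cost of writing everything from one task.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Stream;

/** Hypothetical imitation of the one-file-per-partition output layout. */
public class PartFileWriter {

    public static void writeParts(List<List<String>> partitions, Path dir) throws IOException {
        Files.createDirectories(dir);
        for (int i = 0; i < partitions.size(); i++) {
            // part-00000, part-00001, ...: one file per partition, empty or not.
            Path part = dir.resolve(String.format("part-%05d", i));
            Files.write(part, partitions.get(i), StandardCharsets.UTF_8);
        }
        // Empty marker file created after every partition has been written.
        Files.createFile(dir.resolve("_SUCCESS"));
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("errorLog");
        // Partition contents shaped like the output above (shortened placeholder lines).
        List<List<String>> partitions = Arrays.asList(
                Arrays.asList("GET /error", "GET /error"),
                Arrays.asList("GET /error"),
                Arrays.<String>asList(),
                Arrays.asList("GET /warning", "GET /warning"));
        writeParts(partitions, dir);
        try (Stream<Path> files = Files.list(dir)) {
            files.sorted().forEach(p -> System.out.println(p.getFileName()));
        }
    }
}
```

Running `main` lists the `_SUCCESS` marker and four part files, with `part-00002` as a zero-byte file.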
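It looks as though a file is created for each "grouping" of warnings/errors. But what is the empty file?
Also, could this be something in my code that I haven't spotted, or is it a feature of Spark?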
Cheers, user6910411. – runnerpaul