
Hadoop WordCount example - NullPointerException

I am a Hadoop beginner. My setup: RHEL7, hadoop-2.7.3.

I am trying to run Example: WordCount v2.0. I simply copied the source code into a new Eclipse project and exported it as a wc.jar file.

Now, I have configured Hadoop for Pseudo-Distributed Operation as described in the linked guide, and then I proceeded as follows:

Creating the input files in the input directory:

echo "Hello World, Bye World!" > input/file01 
echo "Hello Hadoop, Goodbye to hadoop." > input/file02 

Starting the environment:

sbin/start-dfs.sh 
bin/hdfs dfs -mkdir /user 
bin/hdfs dfs -mkdir /user/<username> 
bin/hdfs dfs -put input input 
bin/hadoop jar ws.jar WordCount2 input output 

and this is what I got:

16/09/02 13:15:01 INFO Configuration.deprecation: session.id is deprecated. Instead, use dfs.metrics.session-id 
16/09/02 13:15:01 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId= 
16/09/02 13:15:01 INFO input.FileInputFormat: Total input paths to process : 2 
16/09/02 13:15:01 INFO mapreduce.JobSubmitter: number of splits:2 
16/09/02 13:15:01 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_local455553963_0001 
16/09/02 13:15:01 INFO mapreduce.Job: The url to track the job: http://localhost:8080/ 
16/09/02 13:15:01 INFO mapreduce.Job: Running job: job_local455553963_0001 
16/09/02 13:15:01 INFO mapred.LocalJobRunner: OutputCommitter set in config null 
16/09/02 13:15:01 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1 
16/09/02 13:15:01 INFO mapred.LocalJobRunner: OutputCommitter is org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter 
16/09/02 13:15:02 INFO mapred.LocalJobRunner: Waiting for map tasks 
16/09/02 13:15:02 INFO mapred.LocalJobRunner: Starting task: attempt_local455553963_0001_m_000000_0 
16/09/02 13:15:02 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1 
16/09/02 13:15:02 INFO mapred.Task: Using ResourceCalculatorProcessTree : [ ] 
16/09/02 13:15:02 INFO mapred.MapTask: Processing split: hdfs://localhost:9000/user/aii/input/file02:0+33 
16/09/02 13:15:02 INFO mapred.MapTask: (EQUATOR) 0 kvi 26214396(104857584) 
16/09/02 13:15:02 INFO mapred.MapTask: mapreduce.task.io.sort.mb: 100 
16/09/02 13:15:02 INFO mapred.MapTask: soft limit at 83886080 
16/09/02 13:15:02 INFO mapred.MapTask: bufstart = 0; bufvoid = 104857600 
16/09/02 13:15:02 INFO mapred.MapTask: kvstart = 26214396; length = 6553600 
16/09/02 13:15:02 INFO mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer 
16/09/02 13:15:02 INFO mapred.MapTask: Starting flush of map output 
16/09/02 13:15:02 INFO mapred.LocalJobRunner: Starting task: attempt_local455553963_0001_m_000001_0 
16/09/02 13:15:02 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1 
16/09/02 13:15:02 INFO mapred.Task: Using ResourceCalculatorProcessTree : [ ] 
16/09/02 13:15:02 INFO mapred.MapTask: Processing split: hdfs://localhost:9000/user/aii/input/file01:0+24 
16/09/02 13:15:02 INFO mapred.MapTask: (EQUATOR) 0 kvi 26214396(104857584) 
16/09/02 13:15:02 INFO mapred.MapTask: mapreduce.task.io.sort.mb: 100 
16/09/02 13:15:02 INFO mapred.MapTask: soft limit at 83886080 
16/09/02 13:15:02 INFO mapred.MapTask: bufstart = 0; bufvoid = 104857600 
16/09/02 13:15:02 INFO mapred.MapTask: kvstart = 26214396; length = 6553600 
16/09/02 13:15:02 INFO mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer 
16/09/02 13:15:02 INFO mapred.MapTask: Starting flush of map output 
16/09/02 13:15:02 INFO mapred.LocalJobRunner: map task executor complete. 
16/09/02 13:15:02 WARN mapred.LocalJobRunner: job_local455553963_0001 
java.lang.Exception: java.lang.NullPointerException 
    at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462) 
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:522) 
Caused by: java.lang.NullPointerException 
    at WordCount2$TokenizerMapper.setup(WordCount2.java:47) 
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143) 
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787) 
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341) 
    at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243) 
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) 
    at java.util.concurrent.FutureTask.run(FutureTask.java:266) 
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) 
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) 
    at java.lang.Thread.run(Thread.java:745) 
16/09/02 13:15:02 INFO mapreduce.Job: Job job_local455553963_0001 running in uber mode : false 
16/09/02 13:15:02 INFO mapreduce.Job: map 0% reduce 0% 
16/09/02 13:15:02 INFO mapreduce.Job: Job job_local455553963_0001 failed with state FAILED due to: NA 
16/09/02 13:15:02 INFO mapreduce.Job: Counters: 0 

No result (output) is produced. Why am I getting this exception?

Thanks

EDIT:

Thanks to the suggested solutions I realized that there is a second way to run the example (the WordCount v2.0 example), with a patterns file:

echo "\." > patterns.txt 
echo "\," >> patterns.txt 
echo "\!" >> patterns.txt 
echo "to" >> patterns.txt 

and then running:

bin/hadoop jar ws.jar WordCount2 -Dwordcount.case.sensitive=true input output -skip patterns.txt 

and everything works great!

Answers


The problem happens in the mapper's setup() method. This WordCount example is a bit more advanced than the usual one and allows you to specify a file of patterns that the mapper will filter out. That file is added to the distributed cache in the main() method so that it is available on every node for the mapper to open.

You can see the file being added to the cache in main():

for (int i = 0; i < remainingArgs.length; ++i) {
    if ("-skip".equals(remainingArgs[i])) {
        // consume the next argument as the patterns file and cache it
        job.addCacheFile(new Path(remainingArgs[++i]).toUri());
        job.getConfiguration().setBoolean("wordcount.skip.patterns", true);
    } else {
        otherArgs.add(remainingArgs[i]);
    }
}
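For context, remainingArgs comes from GenericOptionsParser, which is also what makes generic options such as -Dwordcount.case.sensitive=true work: the parser applies them to the job Configuration and returns only the leftover arguments. A sketch of how the driver in the tutorial starts (paraphrased, not your exact code):

Configuration conf = new Configuration();
// Generic options (-D key=value, -files, ...) are consumed here and applied
// to conf; everything else is handed back for the application to parse.
GenericOptionsParser optionParser = new GenericOptionsParser(conf, args);
String[] remainingArgs = optionParser.getRemainingArgs();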

You didn't specify the -skip option, so it never tries to add anything. If you do add a file, you can see that it sets wordcount.skip.patterns to true.

In the mapper's setup() you have this code:

@Override 
public void setup(Context context) throws IOException, InterruptedException { 
    conf = context.getConfiguration(); 
    caseSensitive = conf.getBoolean("wordcount.case.sensitive", true); 
    if (conf.getBoolean("wordcount.skip.patterns", true)) { 
     URI[] patternsURIs = Job.getInstance(conf).getCacheFiles(); 
     for (URI patternsURI : patternsURIs) { 
      Path patternsPath = new Path(patternsURI.getPath()); 
      String patternsFileName = patternsPath.getName().toString(); 
      parseSkipFile(patternsFileName); 
     } 
    } 
} 
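For completeness, parseSkipFile (called above but not shown here) just reads the cached file line by line into the mapper's set of patterns to skip. Roughly, based on the official example (your copy may differ slightly):

private void parseSkipFile(String fileName) {
    try {
        // The cached file is localized into the task's working directory,
        // so it can be opened by its bare file name.
        BufferedReader fis = new BufferedReader(new FileReader(fileName));
        String pattern;
        while ((pattern = fis.readLine()) != null) {
            // patternsToSkip is the mapper's Set<String> field
            patternsToSkip.add(pattern);
        }
    } catch (IOException ioe) {
        System.err.println("Caught exception while parsing the cached file: "
            + StringUtils.stringifyException(ioe));
    }
}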

The problem is that the check conf.getBoolean("wordcount.skip.patterns", true) defaults to true when the property is not set, and in your case it won't be set. As a result patternsURIs (or something near that line; I don't have the line numbers) will be null.

So you can fix it by changing the default of wordcount.skip.patterns to false, by setting it to false in the driver (the main method), or by providing a skip file.
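A minimal sketch of the first two options (using the property name from the code above):

// Option 1: in the mapper's setup(), default the flag to false so the
// skip-patterns branch only runs when -skip was actually supplied:
if (conf.getBoolean("wordcount.skip.patterns", false)) {
    // ... load the cached pattern files as before ...
}

// Option 2: in the driver (main method), set the flag explicitly before
// submitting the job:
job.getConfiguration().setBoolean("wordcount.skip.patterns", false);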


Thanks a lot! You can see the edit in my question - I worked it out from your answer :-) – ItayB


The problem is probably in this part of your code:

caseSensitive = conf.getBoolean("wordcount.case.sensitive", true); 
if (conf.getBoolean("wordcount.skip.patterns", true)) { 
    URI[] patternsURIs = Job.getInstance(conf).getCacheFiles(); 
    for (URI patternsURI : patternsURIs) { 
     Path patternsPath = new Path(patternsURI.getPath()); 
     String patternsFileName = patternsPath.getName().toString(); 
     parseSkipFile(patternsFileName); 
    } 
} 

Here getCacheFiles() is, for whatever reason, returning null. That's why you get the exception when you try to iterate over patternsURIs (which is just null).
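Note that the enhanced for loop has to read the array's length before iterating, so looping over a null array throws immediately. A standalone illustration (hypothetical names, not from your code):

import java.net.URI;

public class NullIterationDemo {
    public static void main(String[] args) {
        URI[] patternsURIs = null;       // what getCacheFiles() returned here
        for (URI u : patternsURIs) {     // NullPointerException: the loop
            System.out.println(u);       // dereferences the array to read
        }                                // its length before iterating
    }
}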

To fix this, check whether patternsURIs is null before starting the loop:

if(patternsURIs != null) { 
    for (URI patternsURI : patternsURIs) { 
     Path patternsPath = new Path(patternsURI.getPath()); 
     String patternsFileName = patternsPath.getName().toString(); 
     parseSkipFile(patternsFileName); 
    } 
} 

You should also check why you are getting null in the first place, if you don't expect it to be null.


Thanks a lot! I'll try this change, but I wanted to get the example running before digging into the code. I'll keep you posted. You can see the edit in my question - I've got it :-) – ItayB