2013-07-19 19 views

So I wrote two Perl scripts to practice MapReduce. The program is supposed to count all the words in a bunch of text files I put into a directory. Why is my reducer failing? (Hadoop)

Here is my mapper.pl:

#!/usr/bin/perl 

use 5.010; 
use strict; 
use warnings; 

while(my $line = <>) { 
    my @words = split(' ', $line); 

    foreach my $word(@words) { 
     print "$word \t 1\n"; 
    } 
} 

And here is my reducer.pl:

#!/bin/usr/perl 

use 5.010; 
use warnings; 

my $currentWord = ""; 
my $currentCount = 0; 

##Use this block for testing the reduce script with some test data. 
#Open the test file 
#open(my $fh, "<", "testdata.txt"); 
#while(!eof $fh) {} 

while(my $line = <>) { 
    #Remove the \n 
    chomp $line; 

    #Index 0 is the word, index 1 is the count value 
    my @lineData = split('\t', $line); 
    my $word = $lineData[0]; 
    my $count = $lineData[1]; 

    if($currentWord eq $word) { 
     $currentCount = $currentCount + $count; 
    } else { 
     if($currentWord ne "") { 
      #Output the key we're finished working with 
      print "$currentWord \t $currentCount \n"; 
     } 
     #Switch the current variables over to the next key 
     $currentCount = $count; 
     $currentWord = $word; 
    } 
} 

#deal with the last loop 
print "$currentWord \t $currentCount \n"; 

So when I run these using the Hadoop streaming command:

bin/hadoop jar contrib/streaming/hadoop-streaming-1.1.2.jar \
    -file /home/hduser/countWords/mapper.pl -mapper /home/hduser/countWords/mapper.pl \
    -file /home/hduser/countWords/reducer.pl -reducer /home/hduser/countWords/reducer.pl \
    -input /user/hduser/testData/* -output /user/hduser/testData/output/*

I get the following error:

13/07/19 11:36:33 INFO streaming.StreamJob: map 0% reduce 0% 
13/07/19 11:36:39 INFO streaming.StreamJob: map 9% reduce 0% 
13/07/19 11:36:40 INFO streaming.StreamJob: map 64% reduce 0% 
13/07/19 11:36:41 INFO streaming.StreamJob: map 73% reduce 0% 
13/07/19 11:36:44 INFO streaming.StreamJob: map 82% reduce 0% 
13/07/19 11:36:45 INFO streaming.StreamJob: map 100% reduce 0% 
13/07/19 11:36:49 INFO streaming.StreamJob: map 100% reduce 11% 
13/07/19 11:36:53 INFO streaming.StreamJob: map 100% reduce 0% 
13/07/19 11:37:02 INFO streaming.StreamJob: map 100% reduce 17% 
13/07/19 11:37:03 INFO streaming.StreamJob: map 100% reduce 33% 
13/07/19 11:37:06 INFO streaming.StreamJob: map 100% reduce 17% 
13/07/19 11:37:08 INFO streaming.StreamJob: map 100% reduce 0% 
13/07/19 11:37:16 INFO streaming.StreamJob: map 100% reduce 33% 
13/07/19 11:37:21 INFO streaming.StreamJob: map 100% reduce 0% 
13/07/19 11:37:31 INFO streaming.StreamJob: map 100% reduce 33% 
13/07/19 11:37:35 INFO streaming.StreamJob: map 100% reduce 17% 
13/07/19 11:37:38 INFO streaming.StreamJob: map 100% reduce 100% 
13/07/19 11:37:38 INFO streaming.StreamJob: To kill this job, run: 
13/07/19 11:37:38 INFO streaming.StreamJob: /usr/local/hadoop/libexec/../bin/hadoop job -Dmapred.job.tracker=shiv0:54311 -kill job_201307031312_0065 
13/07/19 11:37:38 INFO streaming.StreamJob: Tracking URL: http://shiv0:50030/jobdetails.jsp?jobid=job_201307031312_0065 
13/07/19 11:37:38 ERROR streaming.StreamJob: Job not successful. Error: # of failed Reduce Tasks exceeded allowed limit. FailedCount: 1. LastFailedTask: task_201307031312_0065_r_000001 
13/07/19 11:37:38 INFO streaming.StreamJob: killJob... Streaming Command Failed! 

I've been trying to figure out what I'm doing wrong for a while now, and I keep scratching my head. Does anyone have any suggestions on how I can diagnose this problem?


Your command line shows 'mapper.py' and 'reducer.py'. – innaM


Sorry, I just fixed that. I was trying to run a Python version of the same program that I found online. – asaji

Answers


A very silly mistake on my part... the shebang line in reducer.pl was incorrect. I had

#!/bin/usr/perl 

instead of

#!/usr/bin/perl 

bin/hadoop jar contrib/streaming/hadoop-streaming-1.1.2.jar -file /home/hduser/countWords/mapper.py -mapper /home/hduser/countWords/mapper.py -file/home/hduser/countWords/reducer.py -reducer /home/hduser/countWords/reducer.py -input/user/hduser/testData/* -output/user/hduser/testData/output/*

Why are you calling .py files? Shouldn't you be calling the Perl files, i.e. reducer.pl instead of reducer.py?

+0

Yes, that was my fault. I was trying to run a word-count application I found online in Python to see if it would run. I fixed it above. – asaji