
MapReduce Hadoop runtime String exception

I am trying out a MapReduce program on Hadoop 2.6 using Java. I have tried referring to other posts on Stack Overflow but have not been able to debug my code.

First, let me describe the record format (one record per line):

Subid=00001111911128052627towerid=11232w34532543456345623453456984756894756bytes=122112212212212218.4621702216543667E17
Subid=00001111911128052639towerid=11232w34532543456345623453456984756894756bytes=122112212212212219.6726312167218586E17
Subid=00001111911128052615towerid=11232w34532543456345623453456984756894756bytes=122112212212212216.9431647633139046E17
Subid=00001111911128052615towerid=11232w34532543456345623453456984756894756bytes=122112212212212214.7836041833447418E17

Now the Mapper class, AircelMapper.class:

import java.io.IOException; 
import java.lang.String; 
import java.lang.Long; 
import org.apache.hadoop.mapreduce.*; 
import org.apache.hadoop.io.*; 
public class AircelMapper extends Mapper<LongWritable,Text,Text, LongWritable> 
{ 

    public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException 
    { 

     String acquire=value.toString(); 
     String st=acquire.substring(81, 84); 

     LongWritable bytes=new LongWritable(Long.parseLong(st)); 
     context.write(new Text(acquire.substring(6, 26)), bytes); 
    } 
} 

Now the driver class, AircelDriver.class:

import java.io.IOException; 
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat; 
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat; 
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat; 
import org.apache.hadoop.mapreduce.Job; 
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat; 
import org.apache.hadoop.fs.Path; 
import org.apache.hadoop.io.LongWritable; 
import org.apache.hadoop.io.Text; 
import org.apache.hadoop.mapreduce.Mapper; 
import org.apache.hadoop.mapreduce.Reducer; 

public class AircelDriver 
{ 
    public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException 
    { 
     if(args.length<2) 
     { System.out.println(" type ip and op file correctly"); 
      System.exit(-1); 
     } 


     Job job = Job.getInstance(); 

     job.setJobName(" @@@@@@@@@@@@@@@  MY FIRST PROGRAM  @@@@@@@@@@@@@@@"); 


     job.setJarByClass(AircelDriver.class); 
     job.setOutputKeyClass(Text.class); 
     job.setOutputValueClass(LongWritable.class); 
     FileInputFormat.setInputPaths(job, new Path(args[0])); 
     FileOutputFormat.setOutputPath(job, new Path(args[1])); 

     job.setInputFormatClass(TextInputFormat.class); 
     job.setOutputFormatClass(TextOutputFormat.class); 
     job.setMapperClass(AircelMapper.class); 
     job.setReducerClass(AircelReducer.class); 
     job.submit(); 
     job.waitForCompletion(true); 

    } 
} 

I have not posted the reducer class, because the problem occurs at runtime in the mapper code. The Hadoop runtime output is as follows (basically an indication that the job failed):

16/12/18 04:11:00 INFO mapred.LocalJobRunner: Starting task: attempt_local1618565735_0001_m_000000_0 
16/12/18 04:11:01 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1 
16/12/18 04:11:01 INFO mapred.Task: Using ResourceCalculatorProcessTree : [ ] 
16/12/18 04:11:01 INFO mapred.MapTask: Processing split: hdfs://quickstart.cloudera:8020/practice/Data_File.txt:0+1198702 
16/12/18 04:11:01 INFO mapred.MapTask: (EQUATOR) 0 kvi 26214396(104857584) 
16/12/18 04:11:01 INFO mapred.MapTask: mapreduce.task.io.sort.mb: 100 
16/12/18 04:11:01 INFO mapred.MapTask: soft limit at 83886080 
16/12/18 04:11:01 INFO mapred.MapTask: bufstart = 0; bufvoid = 104857600 
16/12/18 04:11:01 INFO mapred.MapTask: kvstart = 26214396; length = 6553600 
16/12/18 04:11:01 INFO mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer 
16/12/18 04:11:01 INFO mapreduce.Job: Job job_local1618565735_0001 running in uber mode : false 
16/12/18 04:11:01 INFO mapreduce.Job: map 0% reduce 0% 
16/12/18 04:11:02 INFO mapred.MapTask: Starting flush of map output 
16/12/18 04:11:02 INFO mapred.MapTask: Spilling map output 
16/12/18 04:11:02 INFO mapred.MapTask: bufstart = 0; bufend = 290000; bufvoid = 104857600 
16/12/18 04:11:02 INFO mapred.MapTask: kvstart = 26214396(104857584); kvend = 26174400(104697600); length = 39997/6553600 
16/12/18 04:11:03 INFO mapred.MapTask: Finished spill 0 
16/12/18 04:11:03 INFO mapred.LocalJobRunner: map task executor complete. 
16/12/18 04:11:03 WARN mapred.LocalJobRunner: job_local1618565735_0001 
java.lang.Exception: java.lang.StringIndexOutOfBoundsException: String index out of range: 84 
    at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462) 
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:522) 
Caused by: java.lang.StringIndexOutOfBoundsException: String index out of range: 84 
    at java.lang.String.substring(String.java:1907) 
    at AircelMapper.map(AircelMapper.java:13) 
    at AircelMapper.map(AircelMapper.java:1) 
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145) 
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787) 
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341) 
    at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243) 
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) 
    at java.util.concurrent.FutureTask.run(Fut 

Why is it giving a String index out of bounds exception? Does the String class internally place a limit on the size of a string? I do not understand the problem with lines 13-15 of the Mapper class.


Try 'acquire.substring(81, 84-1);'. The index starts at 0 and goes up to 'string.length() - 1'. –


But if you look at it, every record is more than 110 characters long, so if the program takes the whole line as a string, trying to access index 84 should not give an error, right? Why is it limiting the string size? –


Just try adding code that prints your string length, because the error says 'String index out of range: 84', which means the string is not long enough. –
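For illustration, a minimal sketch of the check that comment suggests (the field layout and offsets are taken from the question, not verified); these lines would go at the top of the map() method, before the substring calls:

     String acquire = value.toString();
     // Debug aid: print each record's length so that a short or blank line
     // (for example a trailing empty line in the input file) becomes visible.
     System.out.println("record length = " + acquire.length());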

Answer


IndexOutOfBoundsException - thrown if beginIndex is negative, or endIndex is larger than the length of this String object, or beginIndex is larger than endIndex.

public StringIndexOutOfBoundsException(int index) - constructs a new StringIndexOutOfBoundsException with an argument indicating the illegal index - 84 (in your case)

public StringIndexOutOfBoundsException(String s) - constructs a StringIndexOutOfBoundsException with the specified detail message - index out of range (in your case)

Check your input at index 84.
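As a minimal sketch of one way to act on that advice (an assumption, not the poster's verified fix), the mapper can skip any record shorter than the 84 characters the substring calls require, so a blank or truncated line in the input no longer kills the task:

import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class AircelMapper extends Mapper<LongWritable, Text, Text, LongWritable>
{
    @Override
    public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException
    {
        String acquire = value.toString();

        // Guard: substring(81, 84) needs at least 84 characters, so skip any
        // record that is too short (for example an empty or truncated line).
        if (acquire.length() < 84) {
            return;
        }

        String st = acquire.substring(81, 84);
        LongWritable bytes = new LongWritable(Long.parseLong(st));
        context.write(new Text(acquire.substring(6, 26)), bytes);
    }
}

A counter or log statement inside the guard would also show how many records are being skipped, instead of dropping them silently.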


Sir, I know what the exception means, but as I have already commented, when I try it at runtime the map phase prints the lengths of all the input records and only then starts processing, which is actually strange, because the MapReduce framework makes the Mapper class process each input record one at a time, and that is not happening in my case. I mean, what is this, some weird bug? Could you please help me out here. –