2014-11-24 60 views
0

String split fails in a Hadoop map

The code below is in a Hadoop map function; the failure occurs at s[1]:

String[] s = value.toString().split("\\s+"); 
String date = s[1]; 

Error: ArrayIndexOutOfBoundsException

Do regular expressions not work in Hadoop?

+1

What if `value.toString().split("\\s+")` yields only one string? Or none? What is value? – Lucas 2014-11-24 12:44:00

+0

I tried it without Hadoop, and split("\\s+") produced 28 elements. For example, with the input "63891 20130101 5.102 -86.61 32.85 12.8 9.6 11.2 11.6 19.4 -9999.00 U -9999.0 -9999.0 -9999.0 -9999.0 -9999.0 -9999.0 -99.000 -99.000 -99.000 -99.000 -99.000 -9999.0 -9999.0 -9999.0 -9999.0 -9999.0" – ming 2014-11-24 12:45:08

+0

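Check your input file in HDFS; there are probably some lines that contain only a single string, with no spaces or tabs. You can also catch the exception in the mapper and print the offending line, for debugging purposes only. – SMA 2014-11-24 12:45:19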

Answers

0

This happens because the input contains blank or whitespace-only lines; you have to filter them out.

String[] s = value.toString().split("\\s+");
// Only read s[1] when the line actually has a second field.
if (s.length > 1) {
    String date = s[1];
}
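
To see which lines trip the guard, here is a minimal standalone sketch that runs without Hadoop. The class `SplitGuard` and the method `secondField` are hypothetical names introduced for illustration; they are not part of the original mapper. It assumes that a line with fewer than two whitespace-separated fields should simply be skipped and reported.

```java
// Hypothetical helper for illustration; not part of the original mapper.
class SplitGuard {

    /**
     * Returns the second whitespace-separated field of the line,
     * or null when the line has fewer than two fields.
     */
    static String secondField(String line) {
        // trim() removes leading whitespace, which would otherwise
        // produce an empty first element and shift the fields by one.
        String[] parts = line.trim().split("\\s+");
        return parts.length > 1 ? parts[1] : null;
    }

    public static void main(String[] args) {
        String[] lines = {"63891 20130101 5.102", "", "   ", "63891"};
        for (String line : lines) {
            String date = secondField(line);
            if (date == null) {
                // In a mapper, this is where the bad line could be logged.
                System.err.println("Skipping malformed line: [" + line + "]");
            } else {
                System.out.println("date = " + date);
            }
        }
    }
}
```

Empty lines, whitespace-only lines, and single-token lines all come back as null here, so the regular expression itself is not the problem; the input lines are.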

A solution to your problem:

// Map function:

// Requires java.io.IOException, java.util.StringTokenizer, org.apache.hadoop.io.* and org.apache.hadoop.mapred.*
public static class Map extends MapReduceBase implements Mapper<LongWritable, Text, Text, DoubleWritable> {

    public void map(LongWritable key, Text value, OutputCollector<Text, DoubleWritable> output, Reporter reporter) throws IOException {
        // Drop the "U" flag column so that every remaining token is numeric.
        String line = value.toString().replaceAll("U", "");
        int spec = line.indexOf(' ');   // first space, i.e. the end of the first field
        // Skip blank or truncated lines instead of failing on them.
        if (spec < 0 || line.length() <= spec + 10) {
            return;
        }
        String s = line.substring(spec, spec + 9).trim();   // 8-character date, e.g. 20130101
        String b = line.substring(spec + 10);                // the remaining fields

        StringTokenizer tokenizer = new StringTokenizer(b);
        while (tokenizer.hasMoreTokens()) {
            // Emit one (date, value) pair per numeric field.
            double temp = Double.parseDouble(tokenizer.nextToken());
            output.collect(new Text(s), new DoubleWritable(temp));
        }
    }
}

// Reduce function:

// Requires java.io.IOException, java.util.Iterator, org.apache.hadoop.io.* and org.apache.hadoop.mapred.*
public static class Reduce extends MapReduceBase implements Reducer<Text, DoubleWritable, Text, DoubleWritable> {
    public void reduce(Text key, Iterator<DoubleWritable> values, OutputCollector<Text, DoubleWritable> output, Reporter reporter) throws IOException {
        // Double.MIN_VALUE is the smallest positive double, so start from the infinities instead.
        double maxValue = Double.NEGATIVE_INFINITY;
        double minValue = Double.POSITIVE_INFINITY;
        while (values.hasNext()) {
            double a = values.next().get();
            maxValue = Math.max(maxValue, a);
            minValue = Math.min(minValue, a);
            // Emit the running maximum each time it exceeds 40.
            if (maxValue > 40) {
                output.collect(key, new DoubleWritable(maxValue));
            }
            /* if (minValue < 10) {
                output.collect(key, new DoubleWritable(a));
            } */
        }
        output.collect(new Text(key + "Max"), new DoubleWritable(maxValue));
        output.collect(new Text(key + "Min"), new DoubleWritable(minValue));
    }
}
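
One general point when tracking a running minimum and maximum in Java: `Double.MIN_VALUE` is the smallest positive double, not the most negative one. A maximum seeded with it stays positive when every input is negative, as with the -9999.0 placeholders in the sample data. The sketch below shows the seeding with infinities outside Hadoop; `MinMaxSketch` and `minMax` are hypothetical names used only for this illustration.

```java
// Hypothetical standalone illustration; not part of the original reducer.
class MinMaxSketch {

    /**
     * Returns {min, max} of the given values.
     * For an empty array the result is {+Infinity, -Infinity}.
     */
    static double[] minMax(double[] values) {
        double max = Double.NEGATIVE_INFINITY; // below every finite double
        double min = Double.POSITIVE_INFINITY; // above every finite double
        for (double v : values) {
            max = Math.max(max, v);
            min = Math.min(min, v);
        }
        return new double[] {min, max};
    }

    public static void main(String[] args) {
        double[] result = minMax(new double[] {-86.61, -9999.0, -99.0});
        System.out.println("min = " + result[0] + ", max = " + result[1]);
    }
}
```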
+0

Thank you very much for your answer. – ming 2014-11-24 14:34:07