Map fails in Hadoop when using String split

The following code is in my Hadoop map function; the error occurs at s[1]:

String[] s = value.toString().split("\\s+"); 
String date = s[1];

The error is ArrayIndexOutOfBoundsException.

Do regular expressions not work in Hadoop?

Asked 2014-11-24 by ming

What if `value.toString().split("\\s+")` produces only one string? Or none? What is value? – Lucas 2014-11-24 12:44:00

I tried it without Hadoop, and split("\\s+") produced 28 elements. For example, with the input "63891 20130101 5.102 -86.61 32.85 12.8 9.6 11.2 11.6 19.4 -9999.00 U -9999.0 -9999.0 -9999.0 -9999.0 -9999.0 -9999.0 -99.000 -99.000 -99.000 -99.000 -99.000 -9999.0 -9999.0 -9999.0 -9999.0 -9999.0" – ming 2014-11-24 12:45:08

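Check your input file in HDFS; it probably contains some lines that hold a single string with no spaces or tabs. You can also catch the exception in the mapper and print the offending line, for debugging purposes only. – SMA 2014-11-24 12:45:19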
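A minimal sketch of that debugging step, written as plain Java so it runs without a cluster. The class `SplitDebug`, the method `secondField`, and the `badLines` list are hypothetical names used only for illustration; in a real mapper the same try/catch would wrap the body of `map`, and the message would end up in the task's stderr log.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not part of Hadoop.
class SplitDebug {
    // Lines that could not be parsed, kept so they can be inspected afterwards.
    static final List<String> badLines = new ArrayList<>();

    // Returns the second whitespace-separated field, or null if the line has none.
    static String secondField(String line) {
        String[] s = line.split("\\s+");
        try {
            return s[1];
        } catch (ArrayIndexOutOfBoundsException e) {
            // Debugging only: record the line, bracketed so blank input is visible.
            badLines.add(line);
            System.err.println("Bad line (" + s.length + " fields): [" + line + "]");
            return null;
        }
    }

    public static void main(String[] args) {
        System.out.println(secondField("63891 20130101 5.102")); // a normal record
        System.out.println(secondField(""));                      // an empty line
        System.out.println(secondField("63891"));                 // a single token
    }
}
```

Once the bad lines are identified, the try/catch can be replaced by an explicit length check so that exceptions are not used for normal control flow.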

This happens because the input contains blank or whitespace-only lines; you have to filter them out.

// Split first, then check the length before indexing.
String[] s = value.toString().split("\\s+");
if (s.length > 1) {
    String date = s[1];
}
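To see why the length check matters, the sketch below runs the same split on a few kinds of line outside Hadoop. `SplitGuard` and `extractDate` are hypothetical names. The `trim()` call is an assumption that goes beyond the answer: without it, a line with leading whitespace yields an empty first element, so `s[1]` would be the first token instead of the date.

```java
import java.util.Optional;

// Hypothetical helper for illustration; not part of Hadoop.
class SplitGuard {
    // Returns the second whitespace-separated field, or empty if the line has fewer than two.
    static Optional<String> extractDate(String line) {
        // trim() is an added assumption: it drops leading and trailing whitespace first.
        String[] s = line.trim().split("\\s+");
        if (s.length > 1) {
            return Optional.of(s[1]);
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // An empty string splits into one empty element; a whitespace-only
        // string splits into zero elements. Either way there is no s[1].
        System.out.println("".split("\\s+").length);
        System.out.println("   ".split("\\s+").length);

        System.out.println(extractDate("63891 20130101 5.102"));
        System.out.println(extractDate(""));
        System.out.println(extractDate("   "));
    }
}
```

In a mapper, an empty result would simply mean returning from `map` without emitting anything for that line.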

A solution to your problem

// Map function:

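// Imports needed at the top of the enclosing file:
import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapred.*;

public static class Map extends MapReduceBase implements Mapper<LongWritable, Text, Text, DoubleWritable> {

    public void map(LongWritable key, Text value, OutputCollector<Text, DoubleWritable> output, Reporter reporter) throws IOException {
        // Drop the "U" flag so that every token after the date is numeric.
        String line = value.toString().replaceAll("U", "");
        int length = line.length();
        int spec = line.indexOf(' ');
        // Skip blank lines and lines too short to hold a date after the first space.
        if (spec >= 0 && length >= spec + 10) {
            // The 9 characters starting at the first space hold the 8-digit date; trim the leading space.
            String date = line.substring(spec, spec + 9).trim();
            String rest = line.substring(spec + 10, length);

            // Emit every remaining numeric field, keyed by the date.
            StringTokenizer tokenizer = new StringTokenizer(rest);
            while (tokenizer.hasMoreTokens()) {
                double temp = Double.parseDouble(tokenizer.nextToken());
                output.collect(new Text(date), new DoubleWritable(temp));
            }
        }
    }
}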

// Reduce function:

// Also needs java.util.Iterator imported at the top of the enclosing file.
public static class Reduce extends MapReduceBase implements Reducer<Text, DoubleWritable, Text, DoubleWritable> {
    public void reduce(Text key, Iterator<DoubleWritable> values, OutputCollector<Text, DoubleWritable> output, Reporter reporter) throws IOException {
        // Double.MIN_VALUE is the smallest positive double, so seed with the infinities instead.
        double maxValue = Double.NEGATIVE_INFINITY;
        double minValue = Double.POSITIVE_INFINITY;
        while (values.hasNext()) {
            double a = values.next().get();
            maxValue = Math.max(maxValue, a);
            minValue = Math.min(minValue, a);
            // Emit the running maximum each time it is above 40.
            if (maxValue > 40) {
                output.collect(key, new DoubleWritable(maxValue));
            }
            /* if (minValue < 10) {
                output.collect(key, new DoubleWritable(a));
            } */
        }
        // Emit the overall maximum and minimum for this key.
        output.collect(new Text(key + "Max"), new DoubleWritable(maxValue));
        output.collect(new Text(key + "Min"), new DoubleWritable(minValue));
    }
}
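One detail worth checking in a reducer like this is the seed of the running maximum. In Java, `Double.MIN_VALUE` is the smallest positive double, not the most negative one, so when every reading is negative (the sample line contains values such as -9999.0) a maximum seeded with it never drops below zero. A small plain-Java sketch of the difference, with `MinMaxSeed` and its methods as hypothetical names:

```java
// Hypothetical helper for illustration; not part of Hadoop.
class MinMaxSeed {
    // Running maximum seeded so that any real value replaces the seed.
    static double max(double[] values) {
        double max = Double.NEGATIVE_INFINITY;
        for (double v : values) {
            max = Math.max(max, v);
        }
        return max;
    }

    // Running minimum seeded so that any real value replaces the seed.
    static double min(double[] values) {
        double min = Double.POSITIVE_INFINITY;
        for (double v : values) {
            min = Math.min(min, v);
        }
        return min;
    }

    // The same loop seeded with Double.MIN_VALUE, shown only for comparison.
    static double maxSeededWithMinValue(double[] values) {
        double max = Double.MIN_VALUE;
        for (double v : values) {
            max = Math.max(max, v);
        }
        return max;
    }

    public static void main(String[] args) {
        double[] readings = {-9999.0, -86.61, -99.0};
        System.out.println(max(readings));                   // largest reading
        System.out.println(min(readings));                   // smallest reading
        System.out.println(maxSeededWithMinValue(readings)); // the seed itself, not a reading
    }
}
```

`-Double.MAX_VALUE` would work equally well as a seed; the infinities are used here only because they make the intent obvious.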

Answered 2014-11-24 13:33:33

Thank you very much for your answer. – ming 2014-11-24 14:34:07
