
Hadoop error: FAILED java.lang.NumberFormatException: empty String when the job is nearly complete

After spending two days on this, I'm at a loss. Everything seems fine: the job runs correctly on a smaller dataset, but it always fails with the error below when I run it on the main 40 GB dataset. The dataset itself looks fine, with no illegal characters.

15/05/02 21:12:42 INFO mapred.JobClient: map 100% reduce 96% 
15/05/02 21:12:43 INFO mapred.JobClient: map 100% reduce 97% 
15/05/02 21:12:45 INFO mapred.JobClient: map 100% reduce 98% 
15/05/02 21:12:47 INFO mapred.JobClient: map 100% reduce 99% 
15/05/02 21:12:52 INFO mapred.JobClient: map 100% reduce 100% 
15/05/02 21:12:52 INFO mapred.JobClient: Job complete: job_201505011756_0013 
15/05/02 21:12:52 INFO mapred.JobClient: Counters: 30 
15/05/02 21:12:52 INFO mapred.JobClient: Map-Reduce Framework 
15/05/02 21:12:52 INFO mapred.JobClient:  Spilled Records=295830048 
15/05/02 21:12:52 INFO mapred.JobClient:  Map output materialized bytes=4511435075 
15/05/02 21:12:52 INFO mapred.JobClient:  Reduce input records=147915024 
15/05/02 21:12:52 INFO mapred.JobClient:  Virtual memory (bytes) snapshot=1973084037120 
15/05/02 21:12:52 INFO mapred.JobClient:  Map input records=1479169548 
15/05/02 21:12:52 INFO mapred.JobClient:  SPLIT_RAW_BYTES=109140 
15/05/02 21:12:52 INFO mapred.JobClient:  Map output bytes=4215470387 
15/05/02 21:12:52 INFO mapred.JobClient:  Reduce shuffle bytes=4511435075 
15/05/02 21:12:52 INFO mapred.JobClient:  Physical memory (bytes) snapshot=268727762944 
15/05/02 21:12:52 INFO mapred.JobClient:  Map input bytes=68433542634 
15/05/02 21:12:52 INFO mapred.JobClient:  Reduce input groups=1020 
15/05/02 21:12:52 INFO mapred.JobClient:  Combine output records=0 
15/05/02 21:12:52 INFO mapred.JobClient:  Reduce output records=147915024 
15/05/02 21:12:52 INFO mapred.JobClient:  Map output records=147915024 
15/05/02 21:12:52 INFO mapred.JobClient:  Combine input records=0 
15/05/02 21:12:52 INFO mapred.JobClient:  CPU time spent (ms)=1611510 
15/05/02 21:12:52 INFO mapred.JobClient:  Total committed heap usage (bytes)=209235476480 
15/05/02 21:12:52 INFO mapred.JobClient: File Input Format Counters 
15/05/02 21:12:52 INFO mapred.JobClient:  Bytes Read=68500323818 
15/05/02 21:12:52 INFO mapred.JobClient: FileSystemCounters 
15/05/02 21:12:52 INFO mapred.JobClient:  HDFS_BYTES_READ=68500432958 
15/05/02 21:12:52 INFO mapred.JobClient:  FILE_BYTES_WRITTEN=9105249650 
15/05/02 21:12:52 INFO mapred.JobClient:  FILE_BYTES_READ=4511300789 
15/05/02 21:12:52 INFO mapred.JobClient:  HDFS_BYTES_WRITTEN=3623810291 
15/05/02 21:12:52 INFO mapred.JobClient: File Output Format Counters 
15/05/02 21:12:52 INFO mapred.JobClient:  Bytes Written=3623810291 
15/05/02 21:12:52 INFO mapred.JobClient: Job Counters 
15/05/02 21:12:52 INFO mapred.JobClient:  Launched map tasks=1033 
15/05/02 21:12:52 INFO mapred.JobClient:  Launched reduce tasks=24 
15/05/02 21:12:52 INFO mapred.JobClient:  SLOTS_MILLIS_REDUCES=2505921 
15/05/02 21:12:52 INFO mapred.JobClient:  Total time spent by all reduces waiting after reserving slots (ms)=0 
15/05/02 21:12:52 INFO mapred.JobClient:  SLOTS_MILLIS_MAPS=2009059 
15/05/02 21:12:52 INFO mapred.JobClient:  Total time spent by all maps waiting after reserving slots (ms)=0 
15/05/02 21:12:52 INFO mapred.JobClient:  Data-local map tasks=1033 
15/05/02 21:12:52 INFO operations.Sampler: resultSize: 4215470387 
15/05/02 21:12:52 INFO operations.Sampler: resultCount: 147915024 
15/05/02 21:12:52 INFO operations.Sampler: MapReduce return 0.02487447197431825 of 147915024 records 
15/05/02 21:12:52 INFO mapred.FileInputFormat: No block filter specified 
15/05/02 21:12:52 INFO mapred.FileInputFormat: Total input paths to process : 22 
15/05/02 21:12:52 INFO net.NetworkTopology: Adding a new node: /default-rack/192.168.1.10:50010 
15/05/02 21:12:52 INFO net.NetworkTopology: Adding a new node: /default-rack/192.168.1.5:50010 
15/05/02 21:12:52 INFO net.NetworkTopology: Adding a new node: /default-rack/192.168.1.7:50010 
15/05/02 21:12:52 INFO net.NetworkTopology: Adding a new node: /default-rack/192.168.1.4:50010 
15/05/02 21:12:52 INFO net.NetworkTopology: Adding a new node: /default-rack/192.168.1.13:50010 
15/05/02 21:12:52 INFO net.NetworkTopology: Adding a new node: /default-rack/192.168.1.14:50010 
15/05/02 21:12:52 INFO net.NetworkTopology: Adding a new node: /default-rack/192.168.1.11:50010 
15/05/02 21:12:52 INFO net.NetworkTopology: Adding a new node: /default-rack/192.168.1.9:50010 
15/05/02 21:12:52 INFO net.NetworkTopology: Adding a new node: /default-rack/192.168.1.8:50010 
15/05/02 21:12:52 INFO net.NetworkTopology: Adding a new node: /default-rack/192.168.1.6:50010 
15/05/02 21:12:52 INFO net.NetworkTopology: Adding a new node: /default-rack/192.168.1.2:50010 
15/05/02 21:12:52 INFO net.NetworkTopology: Adding a new node: /default-rack/192.168.1.3:50010 
15/05/02 21:12:53 INFO mapred.JobClient: Running job: job_201505011756_0014 
15/05/02 21:12:54 INFO mapred.JobClient: map 0% reduce 0% 
15/05/02 21:13:02 INFO mapred.JobClient: Task Id : attempt_201505011756_0014_m_000000_0, Status : FAILED 
java.lang.NumberFormatException: empty String 
at sun.misc.FloatingDecimal.readJavaFormatString(FloatingDecimal.java:1842) 
at sun.misc.FloatingDecimal.parseDouble(FloatingDecimal.java:110) 
at java.lang.Double.parseDouble(Double.java:538) 
at edu.umn.cs.spatialHadoop.io.TextSerializerHelper.consumeDouble(TextSerializerHelper.java:182) 
at edu.umn.cs.spatialHadoop.core.Rectangle.fromText(Rectangle.java:276) 
at edu.umn.cs.spatialHadoop.core.STPRect.fromText(STPRect.java:41) 
at edu.umn.cs.spatialHadoop.operations.Sampler$Map.map(Sampler.java:122) 
at edu.umn.cs.spatialHadoop.operations.Sampler$Map.map(Sampler.java:69) 
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50) 
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:430) 
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:366) 
at org.apache.hadoop.mapred.Child$4.run(Child.java:255) 
at java.security.AccessController.doPrivileged(Native Method) 
at javax.security.auth.Subject.doAs(Subject.java:422) 
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190) 
at org.apache.hadoop.mapred.Child.main(Child.java:249) 

15/05/02 21:13:02 INFO mapred.JobClient: Task Id : attempt_201505011756_0014_m_000002_0, Status : FAILED 
java.lang.NumberFormatException: empty String 
at sun.misc.FloatingDecimal.readJavaFormatString(FloatingDecimal.java:1842) 
at sun.misc.FloatingDecimal.parseDouble(FloatingDecimal.java:110) 
at java.lang.Double.parseDouble(Double.java:538) 
at edu.umn.cs.spatialHadoop.io.TextSerializerHelper.consumeDouble(TextSerializerHelper.java:182) 
at edu.umn.cs.spatialHadoop.core.Rectangle.fromText(Rectangle.java:276) 
at edu.umn.cs.spatialHadoop.core.STPRect.from

And the dataset looks like this:

32714,13271400,132704,13271400,132704 
132715,13271500,132716,13271500,132716 
132716,13271600,132717,13271600,132717 
132717,13271700,132718,13271700,132718 
132718,13271800,132719,13271800,132719 
132719,13271900,132709,13271900,132709 
132720,13272000,132721,13272000,132721 
132721,13272100,132722,13272100,132722 
132722,13272200,132723,13272200,132723 
132723,13272300,132724,13272300,132724 
132724,13272400,132725,13272400,132725 
132725,13272500,132726,13272500,132726 
132726,13272600,132727,13272600,132727 
132727,13272700,132728,13272700,132728 
132728,13272800,132729,13272800,132729 
132729,13272900,132730,13272900,132730 

Any ideas? Please help. Thanks.
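Since the stack trace shows `TextSerializerHelper.consumeDouble` receiving an empty string, the 40 GB input very likely contains a blank line or a record with an empty field that the smaller test set did not. As a sanity check before re-running the job, a minimal sketch that flags such records (the class name and the idea of pre-scanning a local copy of one input file are my own, not part of SpatialHadoop):

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

public class FindEmptyFields {

    // Returns true if the line is blank or any comma-separated field is empty.
    // Splitting with limit -1 keeps trailing empty fields (e.g. "1,2," -> ["1","2",""]).
    static boolean hasEmptyField(String line) {
        if (line.trim().isEmpty()) {
            return true;
        }
        for (String field : line.split(",", -1)) {
            if (field.trim().isEmpty()) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) throws IOException {
        String path = args[0]; // path to a local copy of one input file
        int lineNo = 0;
        try (BufferedReader in = new BufferedReader(new FileReader(path))) {
            String line;
            while ((line = in.readLine()) != null) {
                lineNo++;
                if (hasEmptyField(line)) {
                    System.out.println("line " + lineNo + ": '" + line + "'");
                }
            }
        }
    }
}
```

If this prints anything, one of those records is what `Double.parseDouble` chokes on.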

Answer

My solution is to check whether the current input is a valid double. If it is not, log it and move on to the next input.

//assuming the current input is called input and your doubles are > 1
if (input.matches("\\d+\\.?\\d*"))
{ 
    //process normally 
} 
else 
{ 
    //log and continue 
} 

Alternatively, you can catch the NumberFormatException:

try 
{ 
    double d = Double.parseDouble(input);
    //process normally 
} 
catch (NumberFormatException e) 
{ 
    //log and continue to the next input 
} 
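The try/catch approach can be wrapped into a small helper that parses every field of a record and simply skips the unparseable ones, so a single empty field no longer fails the whole map task. A sketch of that idea (the class and method names are mine, not from the code above):

```java
import java.util.ArrayList;
import java.util.List;

public class SafeDoubleParser {

    // Parses the comma-separated fields of a record, skipping any field
    // that is empty or otherwise not a valid double.
    static List<Double> parseFields(String line) {
        List<Double> values = new ArrayList<>();
        // limit -1 keeps trailing empty fields so they are logged too
        for (String field : line.split(",", -1)) {
            try {
                values.add(Double.parseDouble(field.trim()));
            } catch (NumberFormatException e) {
                // log and continue to the next field
                System.err.println("skipping unparseable field: '" + field + "'");
            }
        }
        return values;
    }

    public static void main(String[] args) {
        // A well-formed record from the dataset parses into five values;
        // a record with empty fields yields only the parseable ones.
        System.out.println(parseFields("132715,13271500,132716,13271500,132716").size()); // 5
        System.out.println(parseFields("132715,,132716,").size()); // 2
    }
}
```

Note that `Double.parseDouble("")` is exactly what produces the "empty String" message in the stack trace, so this catch covers the failing case directly.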