
Very strange ArrayIndexOutOfBounds in a Scalding-driven job running on Hadoop 2.7.1. The mapper log dump is below. It looks like the equator somehow gets set to a negative number on the second spill. Is this normal?

2015-08-12 23:39:19,649 INFO [main] org.apache.hadoop.mapred.MapTask: numReduceTasks: 1 
2015-08-12 23:39:20,174 INFO [main] org.apache.hadoop.mapred.MapTask: (EQUATOR) 0 kvi 469762044(1879048176) 
2015-08-12 23:39:20,175 INFO [main] org.apache.hadoop.mapred.MapTask: mapreduce.task.io.sort.mb: 1792 
2015-08-12 23:39:20,175 INFO [main] org.apache.hadoop.mapred.MapTask: soft limit at 187904816 
2015-08-12 23:39:20,175 INFO [main] org.apache.hadoop.mapred.MapTask: bufstart = 0; bufvoid = 1879048192 
2015-08-12 23:39:20,175 INFO [main] org.apache.hadoop.mapred.MapTask: kvstart = 469762044; length = 117440512 
2015-08-12 23:39:20,214 INFO [main] org.apache.hadoop.mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer 
2015-08-12 23:39:20,216 INFO [main] cascading.flow.hadoop.FlowMapper: cascading version: 2.6.1 
2015-08-12 23:39:20,216 INFO [main] cascading.flow.hadoop.FlowMapper: child jvm opts: -Xmx1024m -Djava.io.tmpdir=./tmp 
2015-08-12 23:39:20,516 INFO [main] org.apache.hadoop.conf.Configuration.deprecation: mapred.task.partition is deprecated. Instead, use mapreduce.task.partition 
2015-08-12 23:39:20,552 INFO [main] cascading.flow.hadoop.FlowMapper: sourcing from: TempHfs["SequenceFile[['docId', 'otherDocId', 'score']]"][9909013673/_pipe_11__pipe_12/] 
2015-08-12 23:39:20,552 INFO [main] cascading.flow.hadoop.FlowMapper: sinking to: GroupBy(_pipe_11+_pipe_12)[by:[ 
{1} 
:'docId']] 
2015-08-12 23:39:29,424 INFO [main] org.apache.hadoop.mapred.MapTask: Spilling map output 
2015-08-12 23:39:29,424 INFO [main] org.apache.hadoop.mapred.MapTask: bufstart = 0; bufend = 108647886; bufvoid = 1879048192 
2015-08-12 23:39:29,424 INFO [main] org.apache.hadoop.mapred.MapTask: kvstart = 469762044(1879048176); kvend = 449947816(1799791264); length = 19814229/117440512 
2015-08-12 23:39:29,425 INFO [main] org.apache.hadoop.mapred.MapTask: (EQUATOR) 839953118 kvi 209988272(839953088) 
2015-08-12 23:39:43,985 INFO [SpillThread] org.apache.hadoop.io.compress.CodecPool: Got brand-new compressor [.gz] 
2015-08-12 23:39:46,767 INFO [SpillThread] org.apache.hadoop.mapred.MapTask: Finished spill 0 
2015-08-12 23:39:46,767 INFO [main] org.apache.hadoop.mapred.MapTask: (RESET) equator 839953118 kv 209988272(839953088) kvi 178264648(713058592) 
2015-08-12 23:39:46,767 INFO [main] org.apache.hadoop.mapred.MapTask: Spilling map output 
2015-08-12 23:39:46,767 INFO [main] org.apache.hadoop.mapred.MapTask: bufstart = 839953118; bufend = 1014433072; bufvoid = 1879048192 
2015-08-12 23:39:46,767 INFO [main] org.apache.hadoop.mapred.MapTask: kvstart = 209988272(839953088); kvend = 178264648(713058592); length = 31723625/117440512 
2015-08-12 23:39:46,767 INFO [main] org.apache.hadoop.mapred.MapTask: (EQUATOR) 1696670336 kvi 424167580(1696670320) 
2015-08-12 23:40:22,641 INFO [SpillThread] org.apache.hadoop.mapred.MapTask: Finished spill 1 
2015-08-12 23:40:22,641 INFO [main] org.apache.hadoop.mapred.MapTask: (RESET) equator 1696670336 kv 424167580(1696670320) kvi 392768808(1571075232) 
2015-08-12 23:40:22,641 INFO [main] org.apache.hadoop.mapred.MapTask: Spilling map output 
2015-08-12 23:40:22,641 INFO [main] org.apache.hadoop.mapred.MapTask: bufstart = 1696670336; bufend = 1869363604; bufvoid = 1879048192 
2015-08-12 23:40:22,641 INFO [main] org.apache.hadoop.mapred.MapTask: kvstart = 424167580(1696670320); kvend = 392768808(1571075232); length = 31398773/117440512 
2015-08-12 23:40:22,642 INFO [main] org.apache.hadoop.mapred.MapTask: (EQUATOR) -1742031900 kvi 34254072(137016288) 
2015-08-12 23:40:47,329 INFO [SpillThread] org.apache.hadoop.mapred.MapTask: Finished spill 2 
2015-08-12 23:40:47,330 INFO [main] org.apache.hadoop.mapred.MapTask: (RESET) equator -1742031900 kv 34254072(137016288) kvi 34254072(137016288) 
2015-08-12 23:40:47,331 ERROR [main] cascading.flow.stream.TrapHandler: caught Throwable, no trap available, rethrowing 
cascading.flow.stream.DuctException: internal error: ['7541904654925238223', '2.812180059539485'] 
at cascading.flow.hadoop.stream.HadoopGroupByGate.receive(HadoopGroupByGate.java:81) 
at cascading.flow.hadoop.stream.HadoopGroupByGate.receive(HadoopGroupByGate.java:37) 
at cascading.flow.stream.FunctionEachStage$1.collect(FunctionEachStage.java:80) 
at cascading.tuple.TupleEntryCollector.safeCollect(TupleEntryCollector.java:145) 
at cascading.tuple.TupleEntryCollector.add(TupleEntryCollector.java:133) 
at cascading.operation.Identity$2.operate(Identity.java:137) 
at cascading.operation.Identity.operate(Identity.java:150) 
at cascading.flow.stream.FunctionEachStage.receive(FunctionEachStage.java:99) 
at cascading.flow.stream.FunctionEachStage.receive(FunctionEachStage.java:39) 
at cascading.flow.stream.SourceStage.map(SourceStage.java:102) 
at cascading.flow.stream.SourceStage.run(SourceStage.java:58) 
at cascading.flow.hadoop.FlowMapper.run(FlowMapper.java:130) 
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453) 
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343) 
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164) 
at java.security.AccessController.doPrivileged(Native Method) 
at javax.security.auth.Subject.doAs(Subject.java:415) 
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657) 
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158) 
Caused by: java.lang.ArrayIndexOutOfBoundsException 
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer$Buffer.write(MapTask.java:1453) 
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer$Buffer.write(MapTask.java:1349) 
at java.io.DataOutputStream.write(DataOutputStream.java:88) 
at java.io.DataOutputStream.writeByte(DataOutputStream.java:153) 
at org.apache.hadoop.io.WritableUtils.writeVLong(WritableUtils.java:273) 
at org.apache.hadoop.io.WritableUtils.writeVInt(WritableUtils.java:253) 
at cascading.tuple.hadoop.io.HadoopTupleOutputStream.writeIntInternal(HadoopTupleOutputStream.java:155) 
at cascading.tuple.io.TupleOutputStream.write(TupleOutputStream.java:86) 
at cascading.tuple.io.TupleOutputStream.writeTuple(TupleOutputStream.java:64) 
at cascading.tuple.hadoop.io.TupleSerializer.serialize(TupleSerializer.java:37) 
at cascading.tuple.hadoop.io.TupleSerializer.serialize(TupleSerializer.java:28) 
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:1149) 
at org.apache.hadoop.mapred.MapTask$OldOutputCollector.collect(MapTask.java:610) 
at cascading.tap.hadoop.util.MeasuredOutputCollector.collect(MeasuredOutputCollector.java:69) 
at cascading.flow.hadoop.stream.HadoopGroupByGate.receive(HadoopGroupByGate.java:68) 
... 18 more 
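A quick sanity check on that negative (EQUATOR) value, assuming the buffer offsets are plain signed 32-bit ints (I have not traced exactly where inside MapOutputBuffer the arithmetic happens, so this is only a sketch):

// Rough arithmetic: does -1742031900 look like 32-bit wraparound?
val bufvoid = 1879048192L          // io.sort.mb = 1792 MB, from the log above
val loggedEquator = -1742031900L   // the suspicious value from the log above
val unwrapped = loggedEquator + (1L << 32)
println(unwrapped)                 // 2552935396, which is past Int.MaxValue (2147483647)
println(unwrapped % bufvoid)       // 673887204, a plausible offset inside the buffer

If that reading is right, the new equator was computed past the signed-int limit before being wrapped back into the 1792 MB buffer, which would also explain the ArrayIndexOutOfBoundsException on the next write.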

For future readers: try shrinking "mapreduce.task.io.sort.mb" to a smaller value. – Minutis
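For anyone who wants to try that without touching cluster-wide config: a minimal sketch of overriding the property per job from Scalding, assuming a Scalding version whose Job class exposes the config override hook (the class name MyJob and the value 512 are only illustrative):

import com.twitter.scalding._

class MyJob(args: Args) extends Job(args) {
  // Hand a smaller sort buffer to Hadoop for this job only.
  override def config: Map[AnyRef, AnyRef] =
    super.config ++ Map("mapreduce.task.io.sort.mb" -> "512")

  // ... pipes as before ...
}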


I'm hitting the same problem in Hadoop, but my job is launched from Pig. Very strange. I'll try reducing the sort memory as you suggest, @Minutis – WattsInABox


Actually, my stack trace doesn't quite look like this one; mine happens after the sort is done, while merging the two sorted segments: 'org.apache.hadoop.mapred.Merger: Merging 2 sorted segments' 'org.apache.hadoop.mapred.Merger: Down to the last merge-pass, with 2 segments left of total size: 116274 bytes' 'mapred.YarnChild: Exception running child: java.lang.ArrayIndexOutOfBoundsException' – WattsInABox

Answer


I suspected a threading issue, so I tried the settings below and the job ran. Not sure whether the cure will stick.

<property> 
<name>mapreduce.map.sort.spill.percent</name> 
<value>0.8</value> 
</property> 

<property> 
<name>mapreduce.task.io.sort.factor</name> 
<value>10</value> 
</property> 

<property> 
<name>mapreduce.task.io.sort.mb</name> 
<value>100</value> 
</property> 

<property> 
<name>mapred.map.multithreadedrunner.threads</name> 
<value>1</value> 
</property> 

<property> 
<name>mapreduce.mapper.multithreadedmapper.threads</name> 
<value>1</value> 
</property> 
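Those properties would normally go in mapred-site.xml or the job's submitted configuration. If it is easier to experiment per job, here is a sketch of the same overrides applied through a plain Hadoop Configuration (property names copied from the XML above, everything else illustrative):

import org.apache.hadoop.conf.Configuration

// Per-job equivalent of the XML snippets above.
val conf = new Configuration()
conf.set("mapreduce.map.sort.spill.percent", "0.8")
conf.setInt("mapreduce.task.io.sort.factor", 10)
conf.setInt("mapreduce.task.io.sort.mb", 100)
conf.setInt("mapred.map.multithreadedrunner.threads", 1)
conf.setInt("mapreduce.mapper.multithreadedmapper.threads", 1)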

Could you explain the values you set in these properties and the effect you suspect they had on the result? I've read through them all and know what the values mean, but which changes mattered, and why did they get you this result? –
