2015-02-08 · 33 views

I am running the Wikipedia job from Mahout in Action, Chapter 06, and it fails with java.lang.ArrayIndexOutOfBoundsException. The Hadoop version I am using:

$ hadoop version 
Hadoop 2.5.0-cdh5.2.0 
Subversion http://github.com/cloudera/hadoop -r e1f20a08bde76a33b79df026d00a0c91b2298387 
Compiled by jenkins on 2014-10-11T21:00Z 
Compiled with protoc 2.5.0 
From source with checksum 309bccd135b199bdfdd6df5f3f4153d 
This command was run using /DCNFS/applications/cdh/5.2/app/hadoop-2.5.0-cdh5.2.0/share/hadoop/common/hadoop-common-2.5.0-cdh5.2.0.jar 

My input.txt looks like

$ hadoop dfs -cat input/input.txt | head -5 
DEPRECATED: Use of this script to execute hdfs command is deprecated. 
Instead use the hdfs command for it. 

1: 1664968 
2: 3 747213 1664968 1691047 4095634 5535664 
3: 9 77935 79583 84707 564578 594898 681805 681886 835470 880698 1109091 1125108 1279972 1463445 1497566 1783284 1997564 2006526 2070954 2250217 2268713 2276203 2374802 2571397 2640902 2647217 2732378 2821237 3088028 3092827 3211549 3283735 3491412 3492254 3498305 3505664 3547201 3603437 3617913 3793767 3907547 4021634 4025897 4086017 4183126 4184025 4189168 4192731 4395141 4899940 4987592 4999120 5017477 5149173 5149311 5158741 5223097 5302153 5474252 5535280 
4: 145 
5: 8 57544 58089 60048 65880 284186 313376 564578 717529 729993 1097284 1204280 1204407 1255317 1670218 1720928 1850305 2269887 2333350 2359764 2640693 2743982 3303009 3322952 3492254 3573013 3721693 3797343 3797349 3797359 3849461 4033556 4173124 4189215 4207986 4669945 4817900 4901416 5010479 5062062 5072938 5098953 5292042 5429924 5599862 5599863 5689049 

and my users.txt looks like

$ hadoop dfs -cat input/users.txt 
DEPRECATED: Use of this script to execute hdfs command is deprecated. 
Instead use the hdfs command for it. 

3: 9 77935 79583 84707 564578 594898 681805 681886 835470 880698 1109091 
1125108 1279972 1463445 1497566 1783284 1997564 2006526 2070954 2250217 
2268713 2276203 2374802 2571397 2640902 2647217 2732378 2821237 3088028 
3092827 3211549 3283735 3491412 3492254 3498305 3505664 3547201 3603437 
3617913 3793767 3907547 4021634 4025897 4086017 4183126 4184025 4189168 
4192731 4395141 4899940 4987592 4999120 5017477 5149173 5149311 5158741 
5223097 5302153 5474252 5535280 

I run my job as

$ hadoop jar mahout-core-0.9-cdh5.2.0-job.jar org.apache.mahout.cf.taste.hadoop.item.RecommenderJob -Dmapred.input.dir=input/input.txt -Dmapred.output.dir=output --usersFile input/users.txt --booleanData -s SIMILARITY_COOCCURRENCE 

and it fails with the following trace:

15/02/07 16:48:44 INFO common.AbstractJob: Command line arguments: {--booleanData=[false], --endPhase=[2147483647], --maxPrefsInItemSimilarity=[500], --maxPrefsPerUser=[10], --maxSimilaritiesPerItem=[100], --minPrefsPerUser=[1], --numRecommendations=[10], --similarityClassname=[SIMILARITY_COOCCURRENCE], --startPhase=[0], --tempDir=[temp], --usersFile=[input/users.txt]} 
15/02/07 16:48:44 INFO common.AbstractJob: Command line arguments: {--booleanData=[false], --endPhase=[2147483647], --input=[input/input.txt], --minPrefsPerUser=[1], --output=[temp/preparePreferenceMatrix], --ratingShift=[0.0], --startPhase=[0], --tempDir=[temp]} 
15/02/07 16:48:44 INFO Configuration.deprecation: mapred.input.dir is deprecated. Instead, use mapreduce.input.fileinputformat.inputdir 
15/02/07 16:48:44 INFO Configuration.deprecation: mapred.compress.map.output is deprecated. Instead, use mapreduce.map.output.compress 
15/02/07 16:48:44 INFO Configuration.deprecation: mapred.output.dir is deprecated. Instead, use mapreduce.output.fileoutputformat.outputdir 
15/02/07 16:48:44 INFO client.RMProxy: Connecting to ResourceManager at name1.hadoop.dc.engr.scu.edu/10.128.0.201:8032 
15/02/07 16:48:45 INFO input.FileInputFormat: Total input paths to process : 1 
15/02/07 16:48:45 INFO mapreduce.JobSubmitter: number of splits:8 
15/02/07 16:48:46 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1422500076160_0023 
15/02/07 16:48:46 INFO impl.YarnClientImpl: Submitted application application_1422500076160_0023 
15/02/07 16:48:46 INFO mapreduce.Job: The url to track the job: http://name1.hadoop.dc.engr.scu.edu:8088/proxy/application_1422500076160_0023/ 
15/02/07 16:48:46 INFO mapreduce.Job: Running job: job_1422500076160_0023 
15/02/07 16:48:56 INFO mapreduce.Job: Job job_1422500076160_0023 running in uber mode : false 
15/02/07 16:48:56 INFO mapreduce.Job: map 0% reduce 0% 
15/02/07 16:49:02 INFO mapreduce.Job: Task Id : attempt_1422500076160_0023_m_000006_0, Status : FAILED 
Error: java.lang.ArrayIndexOutOfBoundsException: 1 
    at org.apache.mahout.cf.taste.hadoop.item.ItemIDIndexMapper.map(ItemIDIndexMapper.java:50) 
    at org.apache.mahout.cf.taste.hadoop.item.ItemIDIndexMapper.map(ItemIDIndexMapper.java:31) 
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145) 
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:784) 
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341) 
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168) 
    at java.security.AccessController.doPrivileged(Native Method) 
    at javax.security.auth.Subject.doAs(Subject.java:415) 
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614) 
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163) 

15/02/07 16:49:02 INFO mapreduce.Job: Task Id : attempt_1422500076160_0023_m_000001_0, Status : FAILED 
Error: java.lang.ArrayIndexOutOfBoundsException: 1 
    at org.apache.mahout.cf.taste.hadoop.item.ItemIDIndexMapper.map(ItemIDIndexMapper.java:50) 
    at org.apache.mahout.cf.taste.hadoop.item.ItemIDIndexMapper.map(ItemIDIndexMapper.java:31) 
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145) 
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:784) 
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341) 
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168) 
    at java.security.AccessController.doPrivileged(Native Method) 
    at javax.security.auth.Subject.doAs(Subject.java:415) 
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614) 
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163) 
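For context, an `ArrayIndexOutOfBoundsException: 1` in `ItemIDIndexMapper.map` is consistent with the mapper splitting each input line into tokens and reading token index 1 as the item ID. This is a minimal sketch of that failure mode, assuming (without having checked the exact Mahout source) that the split is on commas/tabs; the Wikipedia-format lines contain neither, so only one token comes back:

```java
// Sketch of why a non-CSV line would trigger "ArrayIndexOutOfBoundsException: 1".
// Assumption: the mapper splits on comma/tab delimiters and indexes tokens[1];
// the real ItemIDIndexMapper implementation may differ in detail.
public class SplitDemo {
    public static void main(String[] args) {
        String wikiLine = "1: 1664968";            // a line from the Wikipedia-format input
        String[] tokens = wikiLine.split("[,\t]"); // no comma or tab present
        System.out.println(tokens.length);         // 1 -- so tokens[1] would throw
    }
}
```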

I believe the data format is incorrect. Could someone please help me fix this? I am new to MapReduce and Hadoop.

Thanks a lot.

[screenshot of the figure from the book attached]


The stack trace mentions an array, but without a code snippet it's hard to say why the error occurs. – 2015-02-08 04:59:58

Answer


I don't work on this project any more, and the book is not supported at this stage. However, it looks like you are running this job on the raw input, rather than after parsing that format into the standard format with the custom mapper you see in the book.
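To illustrate the pre-processing step described above: this is a minimal sketch (not the book's actual `WikipediaToItemPrefsMapper`) of turning the raw Wikipedia link lines (`userID: itemID itemID ...`) into the comma-separated `userID,itemID` preference lines that `RecommenderJob` expects as input. The class name and parsing details are illustrative assumptions:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical pre-processing sketch: convert one raw Wikipedia-format line
// into zero or more "user,item" CSV preference lines.
public class WikipediaToPrefs {

    static List<String> toPrefLines(String rawLine) {
        List<String> prefs = new ArrayList<>();
        String[] halves = rawLine.split(":");
        if (halves.length < 2) {
            return prefs; // a user with no outgoing links
        }
        String user = halves[0].trim();
        for (String item : halves[1].trim().split("\\s+")) {
            if (!item.isEmpty()) {
                prefs.add(user + "," + item); // boolean preference: no rating column
            }
        }
        return prefs;
    }

    public static void main(String[] args) {
        for (String pref : toPrefLines("1: 1664968 1691047")) {
            System.out.println(pref);
        }
    }
}
```

In a real run this logic would live in a Mapper over the whole links file, with the converted output written back to HDFS and passed to `RecommenderJob` as `-Dmapred.input.dir`.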


I thought 'RecommenderJob' was doing that. – daydreamer 2015-02-08 20:09:41


No, it expects the input to be in user,item,rating format. That's not the Wikipedia data. The code in 6.3.2 is that initial conversion. – 2015-02-08 20:33:36


I'm confused. According to the diagram in the book (figure attached), 'RecommenderJob' appears to have all the required mappers and reducers. Since that's not the case, do I need to run 'WikipediaToItemPrefsMapper' and 'WikipediaToUserVectorReducer' and feed their output to 'RecommenderJob'? Please help. – daydreamer 2015-02-09 21:57:10