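Mahout item similarity job fails randomly with ArrayIndexOutOfBoundsException on CSV data?

The CSV files contain boolean user preference data as (userid, itemid) pairs. A preprocessor checks the files for inconsistencies. I have also checked them by hand, and the data appears to be consistent and correctly formatted.

Two things to note:
- The Hadoop job never fails when there is a single input file, that is, when all preferences are exported to one CSV with no duplicate (userid, itemid) entries.
- The job fails randomly when the Hadoop directory holds multiple CSV files: the initial dump of the user preferences plus the daily incremental files.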
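If the CSV data is consistent and correct, the job should not fail at all with an ArrayIndexOutOfBoundsException. The job might fail if there are duplicate (userid, itemid) entries across the incremental files. Because the preferences are boolean, many of these entries are repeated in several incremental files.
The logs do not seem to show which piece of data causes the error. Here is the log: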
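One way to test the duplicate hypothesis is to count how often each (userid, itemid) pair occurs across all input files. The following is a minimal Python sketch, not part of Mahout; the function name `count_duplicate_pairs` is hypothetical, and it assumes fields are separated by a comma or a tab.

```python
import re
from collections import Counter

# Assumption: fields are separated by a comma or a tab.
DELIMITER = re.compile(r"[\t,]")


def count_duplicate_pairs(paths):
    """Return {(userid, itemid): count} for pairs seen more than once across all files."""
    counts = Counter()
    for path in paths:
        with open(path, encoding="utf-8") as handle:
            for line in handle:
                fields = DELIMITER.split(line.strip())
                # Lines with fewer than two fields are skipped here;
                # they are a separate problem from duplicates.
                if len(fields) >= 2:
                    counts[(fields[0], fields[1])] += 1
    return {pair: n for pair, n in counts.items() if n > 1}
```

If the files that make the job fail and the files that succeed both contain duplicates, the duplicates are probably not the cause.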
2012-08-09 15:03:22,652 INFO org.apache.hadoop.mapred.JobInProgress: job_201208021510_0221: nMaps=2 nReduces=1 max=-1
2012-08-09 15:03:22,652 INFO org.apache.hadoop.mapred.JobTracker: Job job_201208021510_0221 added successfully for user 'deploy' to queue 'default'
2012-08-09 15:03:22,652 INFO org.apache.hadoop.mapred.AuditLogger: USER=deploy IP=127.0.0.1 OPERATION=SUBMIT_JOB TARGET=job_201208021510_0221 RESULT=SUCCESS
2012-08-09 15:03:22,652 INFO org.apache.hadoop.mapred.JobTracker: Initializing job_201208021510_0221
2012-08-09 15:03:22,653 INFO org.apache.hadoop.mapred.JobInProgress: Initializing job_201208021510_0221
2012-08-09 15:03:23,023 INFO org.apache.hadoop.mapred.JobInProgress: jobToken generated and stored with users keys in /zenius/hadoop/tmp/mapred/system/job_201208021510_0221/jobToken
2012-08-09 15:03:23,027 INFO org.apache.hadoop.mapred.JobInProgress: Input size for job job_201208021510_0221 = 56518256. Number of splits = 2
2012-08-09 15:03:23,027 INFO org.apache.hadoop.mapred.JobInProgress: tip:task_201208021510_0221_m_000000 has split on node:/default-rack/localhost
2012-08-09 15:03:23,028 INFO org.apache.hadoop.mapred.JobInProgress: tip:task_201208021510_0221_m_000001 has split on node:/default-rack/localhost
2012-08-09 15:03:23,028 INFO org.apache.hadoop.mapred.JobInProgress: job_201208021510_0221 LOCALITY_WAIT_FACTOR=1.0
2012-08-09 15:03:23,028 INFO org.apache.hadoop.mapred.JobInProgress: Job job_201208021510_0221 initialized successfully with 2 map tasks and 1 reduce tasks.
2012-08-09 15:03:25,787 INFO org.apache.hadoop.mapred.JobTracker: Adding task (JOB_SETUP) 'attempt_201208021510_0221_m_000003_0' to tip task_201208021510_0221_m_000003, for tracker 'tracker_localhost:localhost/127.0.0.1:50158'
2012-08-09 15:03:31,794 INFO org.apache.hadoop.mapred.JobInProgress: Task 'attempt_201208021510_0221_m_000003_0' has completed task_201208021510_0221_m_000003 successfully.
2012-08-09 15:03:31,795 INFO org.apache.hadoop.mapred.JobTracker: Adding task (MAP) 'attempt_201208021510_0221_m_000000_0' to tip task_201208021510_0221_m_000000, for tracker 'tracker_localhost:localhost/127.0.0.1:50158'
2012-08-09 15:03:31,796 INFO org.apache.hadoop.mapred.JobInProgress: Choosing data-local task task_201208021510_0221_m_000000
2012-08-09 15:03:31,796 INFO org.apache.hadoop.mapred.JobTracker: Adding task (MAP) 'attempt_201208021510_0221_m_000001_0' to tip task_201208021510_0221_m_000001, for tracker 'tracker_localhost:localhost/127.0.0.1:50158'
2012-08-09 15:03:31,796 INFO org.apache.hadoop.mapred.JobInProgress: Choosing data-local task task_201208021510_0221_m_000001
2012-08-09 15:03:37,800 INFO org.apache.hadoop.mapred.JobInProgress: Task 'attempt_201208021510_0221_m_000001_0' has completed task_201208021510_0221_m_000001 successfully.
2012-08-09 15:03:37,801 INFO org.apache.hadoop.mapred.JobTracker: Adding task (REDUCE) 'attempt_201208021510_0221_r_000000_0' to tip task_201208021510_0221_r_000000, for tracker 'tracker_localhost:localhost/127.0.0.1:50158'
2012-08-09 15:03:49,807 INFO org.apache.hadoop.mapred.TaskInProgress: Error from attempt_201208021510_0221_m_000000_0: java.lang.ArrayIndexOutOfBoundsException: 1
at org.apache.mahout.cf.taste.hadoop.item.ItemIDIndexMapper.map(ItemIDIndexMapper.java:47)
at org.apache.mahout.cf.taste.hadoop.item.ItemIDIndexMapper.map(ItemIDIndexMapper.java:31)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1093)
at org.apache.hadoop.mapred.Child.main(Child.java:249)
2012-08-09 15:03:52,810 INFO org.apache.hadoop.mapred.JobInProgress: Choosing a failed task task_201208021510_0221_m_000000
2012-08-09 15:03:52,810 INFO org.apache.hadoop.mapred.JobTracker: Adding task (MAP) 'attempt_201208021510_0221_m_000000_1' to tip task_201208021510_0221_m_000000, for tracker 'tracker_localhost:localhost/127.0.0.1:50158'
2012-08-09 15:03:52,810 INFO org.apache.hadoop.mapred.JobInProgress: Choosing data-local task task_201208021510_0221_m_000000
2012-08-09 15:03:52,810 INFO org.apache.hadoop.mapred.JobTracker: Removing task 'attempt_201208021510_0221_m_000000_0'
2012-08-09 15:04:14,603 INFO org.apache.hadoop.mapred.TaskInProgress: Error from attempt_201208021510_0221_m_000000_1: java.lang.ArrayIndexOutOfBoundsException: 1
at org.apache.mahout.cf.taste.hadoop.item.ItemIDIndexMapper.map(ItemIDIndexMapper.java:47)
at org.apache.mahout.cf.taste.hadoop.item.ItemIDIndexMapper.map(ItemIDIndexMapper.java:31)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1093)
at org.apache.hadoop.mapred.Child.main(Child.java:249)
2012-08-09 15:04:17,606 INFO org.apache.hadoop.mapred.JobInProgress: Choosing a failed task task_201208021510_0221_m_000000
2012-08-09 15:04:17,607 INFO org.apache.hadoop.mapred.JobTracker: Adding task (MAP) 'attempt_201208021510_0221_m_000000_2' to tip task_201208021510_0221_m_000000, for tracker 'tracker_localhost:localhost/127.0.0.1:50158'
2012-08-09 15:04:17,607 INFO org.apache.hadoop.mapred.JobInProgress: Choosing data-local task task_201208021510_0221_m_000000
2012-08-09 15:04:17,607 INFO org.apache.hadoop.mapred.JobTracker: Removing task 'attempt_201208021510_0221_m_000000_1'
2012-08-09 15:04:35,618 INFO org.apache.hadoop.mapred.TaskInProgress: Error from attempt_201208021510_0221_m_000000_2: java.lang.ArrayIndexOutOfBoundsException: 1
at org.apache.mahout.cf.taste.hadoop.item.ItemIDIndexMapper.map(ItemIDIndexMapper.java:47)
at org.apache.mahout.cf.taste.hadoop.item.ItemIDIndexMapper.map(ItemIDIndexMapper.java:31)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1093)
at org.apache.hadoop.mapred.Child.main(Child.java:249)
2012-08-09 15:04:38,621 INFO org.apache.hadoop.mapred.JobInProgress: Choosing a failed task task_201208021510_0221_m_000000
2012-08-09 15:04:38,621 INFO org.apache.hadoop.mapred.JobTracker: Adding task (MAP) 'attempt_201208021510_0221_m_000000_3' to tip task_201208021510_0221_m_000000, for tracker 'tracker_localhost:localhost/127.0.0.1:50158'
2012-08-09 15:04:38,621 INFO org.apache.hadoop.mapred.JobInProgress: Choosing data-local task task_201208021510_0221_m_000000
2012-08-09 15:04:38,621 INFO org.apache.hadoop.mapred.JobTracker: Removing task 'attempt_201208021510_0221_m_000000_2'
2012-08-09 15:04:56,632 INFO org.apache.hadoop.mapred.TaskInProgress: Error from attempt_201208021510_0221_m_000000_3: java.lang.ArrayIndexOutOfBoundsException: 1
at org.apache.mahout.cf.taste.hadoop.item.ItemIDIndexMapper.map(ItemIDIndexMapper.java:47)
at org.apache.mahout.cf.taste.hadoop.item.ItemIDIndexMapper.map(ItemIDIndexMapper.java:31)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1093)
at org.apache.hadoop.mapred.Child.main(Child.java:249)
2012-08-09 15:04:59,635 INFO org.apache.hadoop.mapred.TaskInProgress: TaskInProgress task_201208021510_0221_m_000000 has failed 4 times.
2012-08-09 15:04:59,635 INFO org.apache.hadoop.mapred.JobInProgress: TaskTracker at 'localhost' turned 'flaky'
2012-08-09 15:04:59,635 INFO org.apache.hadoop.mapred.JobInProgress: Aborting job job_201208021510_0221
2012-08-09 15:04:59,635 INFO org.apache.hadoop.mapred.JobInProgress: Killing job 'job_201208021510_0221'
2012-08-09 15:04:59,635 INFO org.apache.hadoop.mapred.JobTracker: Adding task (JOB_CLEANUP) 'attempt_201208021510_0221_m_000002_0' to tip...
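The `1` in `ArrayIndexOutOfBoundsException: 1` is the index that was out of range, so the array being read had at most one element. Assuming the mapper splits each input line on a comma or tab and reads the second field as the item ID (an assumption about `ItemIDIndexMapper`, not confirmed by the log), any line with fewer than two fields would trigger this, for example a blank line at the end of an incremental file. Since the log does not print the offending record, a minimal Python sketch like the following could locate such lines; the function name `find_short_lines` is hypothetical.

```python
import re

# Assumption: fields are separated by a comma or a tab.
DELIMITER = re.compile(r"[\t,]")


def find_short_lines(paths, min_fields=2):
    """Yield (path, line_number, text) for every line with fewer than min_fields fields."""
    for path in paths:
        # newline="" keeps the original line endings so they can be stripped explicitly.
        with open(path, encoding="utf-8", newline="") as handle:
            for line_number, line in enumerate(handle, start=1):
                text = line.rstrip("\r\n")
                fields = DELIMITER.split(text)
                # Java's split drops trailing empty strings, so "42," counts
                # as one field there; mirror that behaviour here.
                while fields and fields[-1] == "":
                    fields.pop()
                if len(fields) < min_fields:
                    yield str(path), line_number, text
```

Running it over every file in the input directory and printing the results shows the file and line number of each suspect record, including blank lines and header rows that a manual check can easily miss.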