Asked 2012-04-05 · 58 views · score 2

ORDER BY job fails in a Pig script when run via Java EmbeddedPig

I have the Pig script below, which works perfectly from the Grunt shell (it stores its results to HDFS without any problem); however, if I run the same script through Java embedded Pig, the last job (the ORDER BY) fails. If I replace the ORDER BY job with something else, e.g. a GROUP or a FOREACH ... GENERATE, then the whole script succeeds under embedded Pig. So I believe it is the ORDER BY that causes the problem. Does anyone have experience with this? Any help would be appreciated!

The Pig script:

REGISTER pig-udf-0.0.1-SNAPSHOT.jar;
user_similarity = LOAD '/tmp/sample-sim-score-results-31/part-r-00000' USING PigStorage('\t') AS (user_id: chararray, sim_user_id: chararray, basic_sim_score: float, alt_sim_score: float);
simplified_user_similarity = FOREACH user_similarity GENERATE $0 AS user_id, $1 AS sim_user_id, $2 AS sim_score;
grouped_user_similarity = GROUP simplified_user_similarity BY user_id;
ordered_user_similarity = FOREACH grouped_user_similarity {
    sorted = ORDER simplified_user_similarity BY sim_score DESC;
    top = LIMIT sorted 10;
    GENERATE group, top;
};
top_influencers = FOREACH ordered_user_similarity GENERATE com.aol.grapevine.similarity.pig.udf.AssignPointsToTopInfluencer($1, 10);
all_influence_scores = FOREACH top_influencers GENERATE FLATTEN($0);
grouped_influence_scores = GROUP all_influence_scores BY bag_of_topSimUserTuples::user_id;
influence_scores = FOREACH grouped_influence_scores GENERATE group AS user_id, SUM(all_influence_scores.bag_of_topSimUserTuples::points) AS influence_score;
ordered_influence_scores = ORDER influence_scores BY influence_score DESC;
STORE ordered_influence_scores INTO '/tmp/cc-test-results-1' USING PigStorage();
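For context, the embedded run presumably looks something like the sketch below; the class name, script filename, and jar path are assumptions, since the original driver code was not posted. Note that `PigServer` in MAPREDUCE mode reads the cluster settings from the Hadoop configuration on the classpath, and without them Hadoop silently falls back to the `LocalJobRunner` that appears in the log further down:

```java
import java.io.IOException;

import org.apache.pig.ExecType;
import org.apache.pig.PigServer;

public class RunEmbeddedPig {
    public static void main(String[] args) throws IOException {
        // MAPREDUCE mode picks up the cluster settings from the Hadoop
        // configuration files on the classpath; if they are missing, jobs
        // run through the local job runner instead.
        PigServer pig = new PigServer(ExecType.MAPREDUCE);

        // Same registration the script does with REGISTER.
        pig.registerJar("pig-udf-0.0.1-SNAPSHOT.jar");

        // "similarity.pig" is a hypothetical filename for the script above;
        // registerScript runs it, and the STORE statement triggers execution.
        pig.registerScript("similarity.pig");
    }
}
```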

The error log from Pig:

12/04/05 10:00:56 INFO pigstats.ScriptState: Pig script settings are added to the job 
12/04/05 10:00:56 INFO mapReduceLayer.JobControlCompiler: mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3 
12/04/05 10:00:58 INFO mapReduceLayer.JobControlCompiler: Setting up single store job 
12/04/05 10:00:58 INFO jvm.JvmMetrics: Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized 
12/04/05 10:00:58 INFO mapReduceLayer.MapReduceLauncher: 1 map-reduce job(s) waiting for submission. 
12/04/05 10:00:58 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same. 
12/04/05 10:00:58 INFO input.FileInputFormat: Total input paths to process : 1 
12/04/05 10:00:58 INFO util.MapRedUtil: Total input paths to process : 1 
12/04/05 10:00:58 INFO util.MapRedUtil: Total input paths (combined) to process : 1 
12/04/05 10:00:58 INFO filecache.TrackerDistributedCacheManager: Creating tmp-1546565755 in /var/lib/hadoop-0.20/cache/cchuang/mapred/local/archive/4334795313006396107_361978491_57907159/localhost/tmp/temp1725960134-work-6955502337234509704 with rwxr-xr-x 
12/04/05 10:00:58 INFO filecache.TrackerDistributedCacheManager: Cached hdfs://localhost/tmp/temp1725960134/tmp-1546565755#pigsample_854728855_1333645258470 as /var/lib/hadoop-0.20/cache/cchuang/mapred/local/archive/4334795313006396107_361978491_57907159/localhost/tmp/temp1725960134/tmp-1546565755 
12/04/05 10:00:58 INFO filecache.TrackerDistributedCacheManager: Cached hdfs://localhost/tmp/temp1725960134/tmp-1546565755#pigsample_854728855_1333645258470 as /var/lib/hadoop-0.20/cache/cchuang/mapred/local/archive/4334795313006396107_361978491_57907159/localhost/tmp/temp1725960134/tmp-1546565755 
12/04/05 10:00:58 WARN mapred.LocalJobRunner: LocalJobRunner does not support symlinking into current working dir. 
12/04/05 10:00:58 INFO mapred.TaskRunner: Creating symlink: /var/lib/hadoop-0.20/cache/cchuang/mapred/local/archive/4334795313006396107_361978491_57907159/localhost/tmp/temp1725960134/tmp-1546565755 <- /var/lib/hadoop-0.20/cache/cchuang/mapred/local/localRunner/pigsample_854728855_1333645258470 
12/04/05 10:00:58 INFO filecache.TrackerDistributedCacheManager: Creating symlink: /var/lib/hadoop-0.20/cache/cchuang/mapred/staging/cchuang402164468/.staging/job_local_0004/.job.jar.crc <- /var/lib/hadoop-0.20/cache/cchuang/mapred/local/localRunner/.job.jar.crc 
12/04/05 10:00:58 INFO filecache.TrackerDistributedCacheManager: Creating symlink: /var/lib/hadoop-0.20/cache/cchuang/mapred/staging/cchuang402164468/.staging/job_local_0004/.job.split.crc <- /var/lib/hadoop-0.20/cache/cchuang/mapred/local/localRunner/.job.split.crc 
12/04/05 10:00:59 INFO filecache.TrackerDistributedCacheManager: Creating symlink: /var/lib/hadoop-0.20/cache/cchuang/mapred/staging/cchuang402164468/.staging/job_local_0004/.job.splitmetainfo.crc <- /var/lib/hadoop-0.20/cache/cchuang/mapred/local/localRunner/.job.splitmetainfo.crc 
12/04/05 10:00:59 INFO filecache.TrackerDistributedCacheManager: Creating symlink: /var/lib/hadoop-0.20/cache/cchuang/mapred/staging/cchuang402164468/.staging/job_local_0004/.job.xml.crc <- /var/lib/hadoop-0.20/cache/cchuang/mapred/local/localRunner/.job.xml.crc 
12/04/05 10:00:59 INFO filecache.TrackerDistributedCacheManager: Creating symlink: /var/lib/hadoop-0.20/cache/cchuang/mapred/staging/cchuang402164468/.staging/job_local_0004/job.jar <- /var/lib/hadoop-0.20/cache/cchuang/mapred/local/localRunner/job.jar 
12/04/05 10:00:59 INFO filecache.TrackerDistributedCacheManager: Creating symlink: /var/lib/hadoop-0.20/cache/cchuang/mapred/staging/cchuang402164468/.staging/job_local_0004/job.split <- /var/lib/hadoop-0.20/cache/cchuang/mapred/local/localRunner/job.split 
12/04/05 10:00:59 INFO filecache.TrackerDistributedCacheManager: Creating symlink: /var/lib/hadoop-0.20/cache/cchuang/mapred/staging/cchuang402164468/.staging/job_local_0004/job.splitmetainfo <- /var/lib/hadoop-0.20/cache/cchuang/mapred/local/localRunner/job.splitmetainfo 
12/04/05 10:00:59 INFO filecache.TrackerDistributedCacheManager: Creating symlink: /var/lib/hadoop-0.20/cache/cchuang/mapred/staging/cchuang402164468/.staging/job_local_0004/job.xml <- /var/lib/hadoop-0.20/cache/cchuang/mapred/local/localRunner/job.xml 
12/04/05 10:00:59 INFO mapred.Task: Using ResourceCalculatorPlugin : null 
12/04/05 10:00:59 INFO mapred.MapTask: io.sort.mb = 100 
12/04/05 10:00:59 INFO mapred.MapTask: data buffer = 79691776/99614720 
12/04/05 10:00:59 INFO mapred.MapTask: record buffer = 262144/327680 
12/04/05 10:00:59 WARN mapred.LocalJobRunner: job_local_0004 
java.lang.RuntimeException: org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: file:/Users/cchuang/workspace/grapevine-rec/pigsample_854728855_1333645258470 
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.partitioners.WeightedRangePartitioner.setConf(WeightedRangePartitioner.java:139) 
    at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:62) 
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117) 
    at org.apache.hadoop.mapred.MapTask$NewOutputCollector.<init>(MapTask.java:560) 
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:639) 
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:323) 
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:210) 
Caused by: org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: file:/Users/cchuang/workspace/grapevine-rec/pigsample_854728855_1333645258470 
    at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:231) 
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigFileInputFormat.listStatus(PigFileInputFormat.java:37) 
    at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:248) 
    at org.apache.pig.impl.io.ReadToEndLoader.init(ReadToEndLoader.java:153) 
    at org.apache.pig.impl.io.ReadToEndLoader.<init>(ReadToEndLoader.java:115) 
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.partitioners.WeightedRangePartitioner.setConf(WeightedRangePartitioner.java:112) 
    ... 6 more 
12/04/05 10:00:59 INFO filecache.TrackerDistributedCacheManager: Deleted path /var/lib/hadoop-0.20/cache/cchuang/mapred/local/archive/4334795313006396107_361978491_57907159/localhost/tmp/temp1725960134/tmp-1546565755 
12/04/05 10:00:59 INFO mapReduceLayer.MapReduceLauncher: HadoopJobId: job_local_0004 
12/04/05 10:01:04 INFO mapReduceLayer.MapReduceLauncher: job job_local_0004 has failed! Stop running all dependent jobs 
12/04/05 10:01:04 INFO mapReduceLayer.MapReduceLauncher: 100% complete 
12/04/05 10:01:04 ERROR pigstats.PigStatsUtil: 1 map reduce job(s) failed! 
12/04/05 10:01:04 INFO pigstats.PigStats: Script Statistics: 

HadoopVersion PigVersion UserId StartedAt FinishedAt Features 
0.20.2-cdh3u3 0.8.1-cdh3u3 cchuang 2012-04-05 10:00:34 2012-04-05 10:01:04 GROUP_BY,ORDER_BY 

Some jobs have failed! Stop running all dependent jobs 

Job Stats (time in seconds): 
JobId Maps Reduces MaxMapTime MinMapTIme AvgMapTime MaxReduceTime MinReduceTime AvgReduceTime Alias Feature Outputs 
job_local_0001 0 0 0 0 0 0 0 0 all_influence_scores,grouped_user_similarity,simplified_user_similarity,user_similarity GROUP_BY 
job_local_0002 0 0 0 0 0 0 0 0 grouped_influence_scores,influence_scores GROUP_BY,COMBINER 
job_local_0003 0 0 0 0 0 0 0 0 ordered_influence_scores SAMPLER 

Failed Jobs: 
JobId Alias Feature Message Outputs 
job_local_0004 ordered_influence_scores ORDER_BY Message: Job failed! Error - NA /tmp/cc-test-results-1, 

Input(s): 
Successfully read 0 records from: "/tmp/sample-sim-score-results-31/part-r-00000" 

Output(s): 
Failed to produce result in "/tmp/cc-test-results-1" 

Counters: 
Total records written : 0 
Total bytes written : 0 
Spillable Memory Manager spill count : 0 
Total bags proactively spilled: 0 
Total records proactively spilled: 0 

Job DAG: 
job_local_0001 -> job_local_0002, 
job_local_0002 -> job_local_0003, 
job_local_0003 -> job_local_0004, 
job_local_0004 


12/04/05 10:01:04 INFO mapReduceLayer.MapReduceLauncher: Some jobs have failed! Stop running all dependent jobs 
+0

Hi Huang, I ran into the same problem with my query, which is even simpler. Did you ever find a solution or the root cause? Thanks. – AOvejero 2012-05-01 21:07:15

+0

I ran into the same problem. These two lines must be related to the error, but I don't know how to fix it :-( --- '12/04/05 10:00:58 WARN mapred.LocalJobRunner: LocalJobRunner does not support symlinking into current working dir' --- 'Input path does not exist: file:/Users/cchuang/workspace/grapevine-rec/pigsample_854728855_1333645258470' – 2013-02-20 16:54:23

+0

See http://stackoverflow.com/questions/15983956/pig-order-command-fails/18035754#18035754 – user2649134 2013-08-03 18:11:15

Answer

0

Make sure the PIG_HOME environment variable is set to your Pig installation.
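One way to apply this when launching the embedded program is sketched below. The paths, jar name, and driver class are assumptions for illustration; the point is that with PIG_HOME set and the Hadoop configuration directory on the classpath, the embedded PigServer connects to the real cluster instead of falling back to the LocalJobRunner seen in the log above:

```shell
# Hypothetical paths -- adjust to your installation.
export PIG_HOME=/usr/lib/pig
export HADOOP_CONF_DIR=/etc/hadoop/conf

# Put the Hadoop config ahead of the Pig jar so the embedded PigServer
# reads the cluster settings rather than running jobs locally.
java -cp "$HADOOP_CONF_DIR:$PIG_HOME/pig-core.jar:." your.driver.MainClass
```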
