
Loading data from HDFS does not work with Elephant Bird

I am trying to process data in Pig with Elephant Bird, but I cannot get the data to load. Here is my Pig script:

register 'lib/elephant-bird-core-3.0.9.jar'; 
register 'lib/elephant-bird-pig-3.0.9.jar'; 
register 'lib/google-collections-1.0.jar'; 
register 'lib/json-simple-1.1.jar'; 

twitter = LOAD 'statuses.log.2013-04-01-00' 
      USING com.twitter.elephantbird.pig.load.JsonLoader('-nestedLoad'); 

DUMP twitter; 
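
(For context, the '-nestedLoad' option makes JsonLoader return each status as a single nested map, typically exposed as a field named json, so individual fields would be read with Pig's map dereference operator. The following is only a sketch; the field names text and user/screen_name are assumed from the Twitter status format:)

-- Sketch only: assumes the loader exposes one map field named 'json' and
-- that each status has 'text' and a nested 'user' object with 'screen_name'.
texts = FOREACH twitter GENERATE 
            (chararray) json#'text'               AS text, 
            (chararray) json#'user'#'screen_name' AS screen_name; 
DUMP texts; 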

The output I get is:

[main] INFO org.apache.pig.Main - Apache Pig version 0.11.0-cdh4.3.0 (rexported) compiled May 27 2013, 20:48:21 
[main] INFO org.apache.pig.Main - Logging error messages to: /home/hadoop1/twitter_test/pig_1374834826168.log 
[main] INFO org.apache.pig.impl.util.Utils - Default bootup file /home/hadoop1/.pigbootup not found 
[main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: hdfs://master.hadoop:8020 
[main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to map-reduce job tracker at: master.hadoop:8021 
[main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig features used in the script: UNKNOWN 
[main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false 
[main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1 
[main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1 
[main] WARN org.apache.pig.backend.hadoop23.PigJobControl - falling back to default JobControl (not using hadoop 0.23 ?) 
java.lang.NoSuchFieldException: jobsInProgress 
    at java.lang.Class.getDeclaredField(Class.java:1938) 
    at org.apache.pig.backend.hadoop23.PigJobControl.<clinit>(PigJobControl.java:58) 
    at org.apache.pig.backend.hadoop.executionengine.shims.HadoopShims.newJobControl(HadoopShims.java:102) 
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.compile(JobControlCompiler.java:285) 
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:177) 
    at org.apache.pig.PigServer.launchPlan(PigServer.java:1266) 
    at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1251) 
    at org.apache.pig.PigServer.storeEx(PigServer.java:933) 
    at org.apache.pig.PigServer.store(PigServer.java:900) 
    at org.apache.pig.PigServer.openIterator(PigServer.java:813) 
    at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:696) 
    at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:320) 
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:194) 
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:170) 
    at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:84) 
    at org.apache.pig.Main.run(Main.java:604) 
    at org.apache.pig.Main.main(Main.java:157) 
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) 
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) 
    at java.lang.reflect.Method.invoke(Method.java:606) 
    at org.apache.hadoop.util.RunJar.main(RunJar.java:208) 
[main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig script settings are added to the job 
[main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3 
[main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Using reducer estimator: org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.InputSizeReducerEstimator 
[main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.InputSizeReducerEstimator - BytesPerReducer=1000000000 maxReducers=999 totalInputFileSize=656085089 
[main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting Parallelism to 1 
[main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - creating jar file Job6015425922938886053.jar 
[main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - jar file Job6015425922938886053.jar created 
[main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job 
[main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission. 
[JobControl] WARN org.apache.hadoop.mapred.JobClient - Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same. 
[JobControl] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1 
[JobControl] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths (combined) to process : 5 
[main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete 
[main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_201307261031_0050 
[main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Processing aliases twitter 
[main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - detailed locations: M: twitter[10,10] C: R: 
[main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - More information at: http://master.hadoop:50030/jobdetails.jsp?jobid=job_201307261031_0050 
[main] WARN org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Ooops! Some job has failed! Specify -stop_on_failure if you want Pig to stop immediately on failure. 
[main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - job job_201307261031_0050 has failed! Stop running all dependent jobs 
[main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete 
[main] ERROR org.apache.pig.tools.pigstats.SimplePigStats - ERROR 2997: Unable to recreate exception from backed error: Error: Found interface org.apache.hadoop.mapreduce.Counter, but class was expected 
[main] ERROR org.apache.pig.tools.pigstats.PigStatsUtil - 1 map reduce job(s) failed! 
[main] INFO org.apache.pig.tools.pigstats.SimplePigStats - Script Statistics: 

HadoopVersion PigVersion UserId StartedAt FinishedAt Features 
2.0.0-cdh4.3.0 0.11.0-cdh4.3.0 hadoop1 2013-07-26 12:33:48 2013-07-26 12:34:23 UNKNOWN 

Failed! 

Failed Jobs: 
JobId Alias Feature Message Outputs 
job_201307261031_0050 twitter MAP_ONLY Message: Job failed! hdfs://master.hadoop:8020/tmp/temp971280905/tmp1376631504, 

Input(s): 
Failed to read data from "hdfs://master.hadoop:8020/user/hadoop1/statuses.log.2013-04-01-00" 

Output(s): 
Failed to produce result in "hdfs://master.hadoop:8020/tmp/temp971280905/tmp1376631504" 

Counters: 
Total records written : 0 
Total bytes written : 0 
Spillable Memory Manager spill count : 0 
Total bags proactively spilled: 0 
Total records proactively spilled: 0 

Job DAG: 
job_201307261031_0050 


[main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Failed! 
[main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2997: Unable to recreate exception from backed error: Error: Found interface org.apache.hadoop.mapreduce.Counter, but class was expected 
Details at logfile: /home/hadoop1/twitter_test/pig_1374834826168.log 

The file exists and is accessible:

$ hdfs dfs -ls /user/hadoop1/statuses.log.2013-04-01-00 
Found 1 items 
-rw-r--r-- 3 hadoop1 supergroup 656085089 2013-07-26 11:53 /user/hadoop1/statuses.log.2013-04-01-00 

This seems to be a general problem with the Pig version that ships with Cloudera 4.6.0. The problem appears to be:

[main] ERROR org.apache.pig.tools.pigstats.SimplePigStats - ERROR 2997: Unable to recreate exception from backed error: Error: Found interface org.apache.hadoop.mapreduce.Counter, but class was expected 

I get a similar error when I run another user-defined function that loads data:

[main] ERROR org.apache.pig.tools.pigstats.SimplePigStats - ERROR 2997: Unable to recreate exception from backed error: Error: Found interface org.apache.hadoop.mapreduce.TaskAttemptContext, but class was expected 

When I force Pig into local mode ("-x local") I get a more telling error:

Caused by: java.lang.IncompatibleClassChangeError: Found interface org.apache.hadoop.mapreduce.TaskAttemptContext, but class was expected 

So I suspect that the Hadoop version Pig was built against is incompatible with the version that ships with Cloudera.

Answer


This is in fact a versioning problem: some of the libraries are not yet compatible with the new MapReduce API; see for example issues #56, #247, and #308. For Elephant Bird the problem was solved in a recent version. Using Elephant Bird 4.1 in the script above and adding the Hadoop compatibility module

register 'lib/elephant-bird-core-4.1.jar'; 
register 'lib/elephant-bird-pig-4.1.jar'; 
register 'lib/elephant-bird-hadoop-compat-4.1.jar'; 
register 'lib/google-collections-1.0.jar'; 
register 'lib/json-simple-1.1.jar'; 

solved the problem! :-)
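
For completeness, a minimal sketch of the full working script after the change (identical to the original apart from the registered jars):

register 'lib/elephant-bird-core-4.1.jar'; 
register 'lib/elephant-bird-pig-4.1.jar'; 
register 'lib/elephant-bird-hadoop-compat-4.1.jar'; 
register 'lib/google-collections-1.0.jar'; 
register 'lib/json-simple-1.1.jar'; 

-- Same LOAD as before; only the Elephant Bird jars changed.
twitter = LOAD 'statuses.log.2013-04-01-00' 
          USING com.twitter.elephantbird.pig.load.JsonLoader('-nestedLoad'); 

DUMP twitter; 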
