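I'm trying to get mongo-hadoop map-reduce working with Python. Hadoop itself is working, Hadoop streaming works with Python, and the mongo-hadoop adapter works. However, the mongo-hadoop streaming examples with Python do not work. When I try to run the example in streaming/examples/treasury, I get the following error: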
~/git/mongo-hadoop/streaming$ hadoop jar target/mongo-hadoop-streaming-assembly-1.0.1.jar -mapper examples/treasury/mapper.py -reducer examples/treasury/reducer.py -inputformat com.mongodb.hadoop.mapred.MongoInputFormat -outputformat com.mongodb.hadoop.mapred.MongoOutputFormat -inputURI mongodb://127.0.0.1/mongo_hadoop.yield_historical.in -outputURI mongodb://127.0.0.1/mongo_hadoop.yield_historical.streaming.out
13/04/09 11:54:34 INFO streaming.MongoStreamJob: Running
13/04/09 11:54:34 INFO streaming.MongoStreamJob: Init
13/04/09 11:54:34 INFO streaming.MongoStreamJob: Process Args
13/04/09 11:54:34 INFO streaming.StreamJobPatch: Setup Options'
13/04/09 11:54:34 INFO streaming.StreamJobPatch: PreProcess Args
13/04/09 11:54:34 INFO streaming.StreamJobPatch: Parse Options
13/04/09 11:54:34 INFO streaming.StreamJobPatch: Arg: '-mapper'
13/04/09 11:54:34 INFO streaming.StreamJobPatch: Arg: 'examples/treasury/mapper.py'
13/04/09 11:54:34 INFO streaming.StreamJobPatch: Arg: '-reducer'
13/04/09 11:54:34 INFO streaming.StreamJobPatch: Arg: 'examples/treasury/reducer.py'
13/04/09 11:54:34 INFO streaming.StreamJobPatch: Arg: '-inputformat'
13/04/09 11:54:34 INFO streaming.StreamJobPatch: Arg: 'com.mongodb.hadoop.mapred.MongoInputFormat'
13/04/09 11:54:34 INFO streaming.StreamJobPatch: Arg: '-outputformat'
13/04/09 11:54:34 INFO streaming.StreamJobPatch: Arg: 'com.mongodb.hadoop.mapred.MongoOutputFormat'
13/04/09 11:54:34 INFO streaming.StreamJobPatch: Arg: '-inputURI'
13/04/09 11:54:34 INFO streaming.StreamJobPatch: Arg: 'mongodb://127.0.0.1/mongo_hadoop.yield_historical.in'
13/04/09 11:54:34 INFO streaming.StreamJobPatch: Arg: '-outputURI'
13/04/09 11:54:34 INFO streaming.StreamJobPatch: Arg: 'mongodb://127.0.0.1/mongo_hadoop.yield_historical.streaming.out'
13/04/09 11:54:34 INFO streaming.StreamJobPatch: Add InputSpecs
13/04/09 11:54:34 INFO streaming.StreamJobPatch: Setup output_
13/04/09 11:54:34 INFO streaming.StreamJobPatch: Post Process Args
13/04/09 11:54:34 INFO streaming.MongoStreamJob: Args processed.
13/04/09 11:54:36 INFO io.MongoIdentifierResolver: Resolving: bson
13/04/09 11:54:36 INFO io.MongoIdentifierResolver: Resolving: bson
13/04/09 11:54:36 INFO io.MongoIdentifierResolver: Resolving: bson
13/04/09 11:54:36 INFO io.MongoIdentifierResolver: Resolving: bson
**Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/mapreduce/filecache/DistributedCache**
at org.apache.hadoop.streaming.StreamJob.setJobConf(StreamJob.java:959)
at com.mongodb.hadoop.streaming.MongoStreamJob.run(MongoStreamJob.java:36)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
at com.mongodb.hadoop.streaming.MongoStreamJob.main(MongoStreamJob.java:63)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.RunJar.main(RunJar.java:208)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.mapreduce.filecache.DistributedCache
at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
... 10 more
If anyone could shed some light on this, it would be a great help.
Full info:
As far as I can tell, I need to get the following four things working:
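A `NoClassDefFoundError` like the one above means the named class was not on the classpath when the job was launched. One way to narrow that down is to check which jars of the local Hadoop install, if any, actually contain `org.apache.hadoop.mapreduce.filecache.DistributedCache`. The following is only a minimal sketch: the helper name `find_class_in_jars` is made up for illustration, and the directory to scan is an assumption that depends on how Hadoop was installed.

```python
import os
import zipfile


def find_class_in_jars(root_dir, class_name):
    """Return the sorted paths of all jars under root_dir that contain class_name.

    class_name is a fully qualified Java class name in dotted form.
    """
    # A jar is a zip archive; classes are stored as path/to/Class.class entries.
    entry = class_name.replace(".", "/") + ".class"
    matches = []
    for dirpath, _dirnames, filenames in os.walk(root_dir):
        for filename in filenames:
            if not filename.endswith(".jar"):
                continue
            jar_path = os.path.join(dirpath, filename)
            try:
                with zipfile.ZipFile(jar_path) as jar:
                    if entry in jar.namelist():
                        matches.append(jar_path)
            except (zipfile.BadZipFile, OSError):
                # Skip unreadable files, e.g. corrupt jars or broken symlinks.
                continue
    return sorted(matches)
```

For example, `find_class_in_jars("/usr/lib/hadoop", "org.apache.hadoop.mapreduce.filecache.DistributedCache")` would list the matching jars, where `/usr/lib/hadoop` is a hypothetical install path. An empty result would suggest that the streaming jar was built against a different Hadoop version than the one installed.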
- Install and test Hadoop
- Install and test Hadoop streaming with Python
- Install and test mongo-hadoop
- Install and test mongo-hadoop streaming with Python
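So the short of it is that I have everything working up to the fourth step. I installed Cloudera 4 using the Chef recipe at (https://github.com/danielpoe/cloudera):
- Using the Chef recipe, Cloudera 4 has been installed, is up and running, and has been tested
- Using Michael Noll's blog tutorial, I tested Hadoop streaming with Python successfully
- Using the docs at mongodb.org, I was able to run both the treasury and UFO examples (building with CDH4 set in build.sbt)
- Following the readme in streaming/examples, I downloaded 1.5 hours' worth of Twitter data for the twitter example, and I also tried the treasury example.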
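Solved: to get it working, we needed to install Cloudera 4, then build using version CDH4, then use version CDH3 at the point of building the mongo-hadoop streaming driver, and install the mongo-hadoop adapter. Instead of following the instructions and installing pymongo-hadoop from the repository, the best solution is 'sudo pip install pymongo_hadoop' – Conor 2013-04-17 08:50:05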
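For reference, once `pymongo_hadoop` is installed, a streaming mapper and reducer for the treasury data could look roughly like the sketch below. It is not the code shipped in examples/treasury. It assumes that each input document has a date in `_id` and a numeric `bc10Year` field, and that `pymongo_hadoop` provides the `BSONMapper` and `BSONReducer` entry points. The `run_mapper` and `run_reducer` wrappers are hypothetical names added here so the two functions can be tried locally without Hadoop.

```python
def mapper(documents):
    """Emit one {year, bc10Year} record per input document."""
    for doc in documents:
        # Assumption: doc["_id"] is a date/datetime, doc["bc10Year"] is a number.
        yield {"_id": doc["_id"].year, "bc10Year": doc["bc10Year"]}


def reducer(key, values):
    """Average the bc10Year values that share the same year key."""
    count = 0
    total = 0.0
    for value in values:
        count += 1
        total += value["bc10Year"]
    return {"_id": key, "avg": total / count, "count": count, "sum": total}


def run_mapper():
    """Hypothetical entry point for a mapper script run by Hadoop streaming."""
    from pymongo_hadoop import BSONMapper  # assumes: pip install pymongo_hadoop
    BSONMapper(mapper)


def run_reducer():
    """Hypothetical entry point for a reducer script run by Hadoop streaming."""
    from pymongo_hadoop import BSONReducer  # assumes: pip install pymongo_hadoop
    BSONReducer(reducer)
```

In a real job the mapper and reducer would live in two separate executable scripts, each ending with a call to its own entry point. Keeping `mapper` and `reducer` as plain functions means they can be checked with ordinary Python dictionaries before the job is submitted.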