2016-09-09

I am using PySpark to create a DataFrame, but I am getting an error.

I use the following code to create a DataFrame from the data in the examples folder:

df = spark.read.load("c:/spark/examples/src/main/resources/users.parquet") 

This produces the following lengthy error message:

SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder". 
SLF4J: Defaulting to no-operation (NOP) logger implementation 
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details. 
16/09/09 15:41:51 WARN ObjectStore: Failed to get database default, returning NoSuchObjectException 
16/09/09 15:41:51 WARN Hive: Failed to access metastore. This class should not accessed in runtime. 
org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient 
     at org.apache.hadoop.hive.ql.metadata.Hive.getAllDatabases(Hive.java:1236) 
     at org.apache.hadoop.hive.ql.metadata.Hive.reloadFunctions(Hive.java:174) 
     at org.apache.hadoop.hive.ql.metadata.Hive.<clinit>(Hive.java:166) 
     at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:503) 
     at org.apache.spark.sql.hive.client.HiveClientImpl.<init>(HiveClientImpl.scala:171) 
     at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) 
     at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) 
     at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) 
     at java.lang.reflect.Constructor.newInstance(Constructor.java:423) 
     at org.apache.spark.sql.hive.client.IsolatedClientLoader.createClient(IsolatedClientLoader.scala:258) 
     at org.apache.spark.sql.hive.HiveUtils$.newClientForMetadata(HiveUtils.scala:359) 
     at org.apache.spark.sql.hive.HiveUtils$.newClientForMetadata(HiveUtils.scala:263) 
     at org.apache.spark.sql.hive.HiveSharedState.metadataHive$lzycompute(HiveSharedState.scala:39) 
     at org.apache.spark.sql.hive.HiveSharedState.metadataHive(HiveSharedState.scala:38) 
     at org.apache.spark.sql.hive.HiveSharedState.externalCatalog$lzycompute(HiveSharedState.scala:46) 
     at org.apache.spark.sql.hive.HiveSharedState.externalCatalog(HiveSharedState.scala:45) 
     at org.apache.spark.sql.hive.HiveSessionState.catalog$lzycompute(HiveSessionState.scala:50) 
     at org.apache.spark.sql.hive.HiveSessionState.catalog(HiveSessionState.scala:48) 
     at org.apache.spark.sql.hive.HiveSessionState$$anon$1.<init>(HiveSessionState.scala:63) 
     at org.apache.spark.sql.hive.HiveSessionState.analyzer$lzycompute(HiveSessionState.scala:63) 
     at org.apache.spark.sql.hive.HiveSessionState.analyzer(HiveSessionState.scala:62) 
     at org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:49) 
     at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:64) 
     at org.apache.spark.sql.SparkSession.baseRelationToDataFrame(SparkSession.scala:382) 
     at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:143) 
     at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:132) 
     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) 
     at java.lang.reflect.Method.invoke(Method.java:498) 
     at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:237) 
     at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357) 
     at py4j.Gateway.invoke(Gateway.java:280) 
     at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:128) 
     at py4j.commands.CallCommand.execute(CallCommand.java:79) 
     at py4j.GatewayConnection.run(GatewayConnection.java:211) 
     at java.lang.Thread.run(Thread.java:745) 
Caused by: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient 
     at org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1523) 
     at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.<init>(RetryingMetaStoreClient.java:86) 
     at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:132) 
     at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:104) 
     at org.apache.hadoop.hive.ql.metadata.Hive.createMetaStoreClient(Hive.java:3005) 
     at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:3024) 
     at org.apache.hadoop.hive.ql.metadata.Hive.getAllDatabases(Hive.java:1234) 
     ... 36 more 
Caused by: java.lang.reflect.InvocationTargetException 
     at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) 
     at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) 
     at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) 
     at java.lang.reflect.Constructor.newInstance(Constructor.java:423) 
     at org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1521) 
     ... 42 more 
Caused by: java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative path in absolute URI: file:c:/Spark/bin/spark-warehouse 
     at org.apache.hadoop.fs.Path.initialize(Path.java:205) 
     at org.apache.hadoop.fs.Path.<init>(Path.java:171) 
     at org.apache.hadoop.hive.metastore.Warehouse.getWhRoot(Warehouse.java:159) 
     at org.apache.hadoop.hive.metastore.Warehouse.getDefaultDatabasePath(Warehouse.java:177) 
     at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.createDefaultDB_core(HiveMetaStore.java:600) 
     at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.createDefaultDB(HiveMetaStore.java:620) 
     at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.init(HiveMetaStore.java:461) 
     at org.apache.hadoop.hive.metastore.RetryingHMSHandler.<init>(RetryingHMSHandler.java:66) 
     at org.apache.hadoop.hive.metastore.RetryingHMSHandler.getProxy(RetryingHMSHandler.java:72) 
     at org.apache.hadoop.hive.metastore.HiveMetaStore.newRetryingHMSHandler(HiveMetaStore.java:5762) 
     at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:199) 
     at org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient.<init>(SessionHiveMetaStoreClient.java:74) 
     ... 47 more 
Caused by: java.net.URISyntaxException: Relative path in absolute URI: file:c:/Spark/bin/spark-warehouse 
     at java.net.URI.checkPath(URI.java:1823) 
     at java.net.URI.<init>(URI.java:745) 
     at org.apache.hadoop.fs.Path.initialize(Path.java:202) 
     ... 58 more 
16/09/09 15:41:51 WARN ObjectStore: Failed to get database default, returning NoSuchObjectException 
Traceback (most recent call last): 
    File "<stdin>", line 1, in <module> 
    File "c:\Spark\python\pyspark\sql\readwriter.py", line 147, in load 
    return self._df(self._jreader.load(path)) 
    File "c:\Spark\python\lib\py4j-0.10.1-src.zip\py4j\java_gateway.py", line 933, in __call__ 
    File "c:\Spark\python\pyspark\sql\utils.py", line 63, in deco 
    return f(*a, **kw) 
    File "c:\Spark\python\lib\py4j-0.10.1-src.zip\py4j\protocol.py", line 312, in get_return_value 
py4j.protocol.Py4JJavaError: An error occurred while calling o27.load. 
: java.lang.RuntimeException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient 
     at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:522) 
     at org.apache.spark.sql.hive.client.HiveClientImpl.<init>(HiveClientImpl.scala:171) 
     at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) 
     at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) 
     at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) 
     at java.lang.reflect.Constructor.newInstance(Constructor.java:423) 
     at org.apache.spark.sql.hive.client.IsolatedClientLoader.createClient(IsolatedClientLoader.scala:258) 
     at org.apache.spark.sql.hive.HiveUtils$.newClientForMetadata(HiveUtils.scala:359) 
     at org.apache.spark.sql.hive.HiveUtils$.newClientForMetadata(HiveUtils.scala:263) 
     at org.apache.spark.sql.hive.HiveSharedState.metadataHive$lzycompute(HiveSharedState.scala:39) 
     at org.apache.spark.sql.hive.HiveSharedState.metadataHive(HiveSharedState.scala:38) 
     at org.apache.spark.sql.hive.HiveSharedState.externalCatalog$lzycompute(HiveSharedState.scala:46) 
     at org.apache.spark.sql.hive.HiveSharedState.externalCatalog(HiveSharedState.scala:45) 
     at org.apache.spark.sql.hive.HiveSessionState.catalog$lzycompute(HiveSessionState.scala:50) 
     at org.apache.spark.sql.hive.HiveSessionState.catalog(HiveSessionState.scala:48) 
     at org.apache.spark.sql.hive.HiveSessionState$$anon$1.<init>(HiveSessionState.scala:63) 
     at org.apache.spark.sql.hive.HiveSessionState.analyzer$lzycompute(HiveSessionState.scala:63) 
     at org.apache.spark.sql.hive.HiveSessionState.analyzer(HiveSessionState.scala:62) 
     at org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:49) 
     at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:64) 
     at org.apache.spark.sql.SparkSession.baseRelationToDataFrame(SparkSession.scala:382) 
     at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:143) 
     at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:132) 
     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) 
     at java.lang.reflect.Method.invoke(Method.java:498) 
     at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:237) 
     at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357) 
     at py4j.Gateway.invoke(Gateway.java:280) 
     at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:128) 
     at py4j.commands.CallCommand.execute(CallCommand.java:79) 
     at py4j.GatewayConnection.run(GatewayConnection.java:211) 
     at java.lang.Thread.run(Thread.java:745) 
Caused by: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient 
     at org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1523) 
     at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.<init>(RetryingMetaStoreClient.java:86) 
     at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:132) 
     at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:104) 
     at org.apache.hadoop.hive.ql.metadata.Hive.createMetaStoreClient(Hive.java:3005) 
     at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:3024) 
     at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:503) 
     ... 33 more 
Caused by: java.lang.reflect.InvocationTargetException 
     at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) 
     at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) 
     at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) 
     at java.lang.reflect.Constructor.newInstance(Constructor.java:423) 
     at org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1521) 
     ... 39 more 
Caused by: java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative path in absolute URI: file:c:/Spark/bin/spark-warehouse 
     at org.apache.hadoop.fs.Path.initialize(Path.java:205) 
     at org.apache.hadoop.fs.Path.<init>(Path.java:171) 
     at org.apache.hadoop.hive.metastore.Warehouse.getWhRoot(Warehouse.java:159) 
     at org.apache.hadoop.hive.metastore.Warehouse.getDefaultDatabasePath(Warehouse.java:177) 
     at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.createDefaultDB_core(HiveMetaStore.java:600) 
     at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.createDefaultDB(HiveMetaStore.java:620) 
     at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.init(HiveMetaStore.java:461) 
     at org.apache.hadoop.hive.metastore.RetryingHMSHandler.<init>(RetryingHMSHandler.java:66) 
     at org.apache.hadoop.hive.metastore.RetryingHMSHandler.getProxy(RetryingHMSHandler.java:72) 
     at org.apache.hadoop.hive.metastore.HiveMetaStore.newRetryingHMSHandler(HiveMetaStore.java:5762) 
     at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:199) 
     at org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient.<init>(SessionHiveMetaStoreClient.java:74) 
     ... 44 more 
Caused by: java.net.URISyntaxException: Relative path in absolute URI: file:c:/Spark/bin/spark-warehouse 
     at java.net.URI.checkPath(URI.java:1823) 
     at java.net.URI.<init>(URI.java:745) 
     at org.apache.hadoop.fs.Path.initialize(Path.java:202) 
     ... 55 more 

I think one cause may be this line:

java.net.URISyntaxException: Relative path in absolute URI: file:c:/Spark/bin/spark-warehouse 

I am not sure how to fix this, so any assistance would be greatly appreciated.

Answer

This was a problem with the Spark installation. I had installed it locally. I created RDDs and everything was fine until I tried to create a Spark DataFrame from the RDDs... big error.

The problem was with the pre-built Spark version: spark-2.0.0-bin-hadoop2.7.

I deleted the spark-2.0.0-bin-hadoop2.7 download and installed Spark 1.6 instead, using pip to install py4j rather than unzipping and using the one bundled with the pre-built Spark.

I can now create DataFrames.
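For reference, the underlying `URISyntaxException` on Windows is also commonly worked around in Spark 2.0 without downgrading, by pointing `spark.sql.warehouse.dir` (a real Spark 2 setting) at a proper `file:///` URI when launching the shell; the directory below is illustrative:

```shell
# Launch pyspark with the warehouse directory given as an absolute file:/// URI,
# so Hadoop's Path class does not see a "relative path in absolute URI".
pyspark --conf spark.sql.warehouse.dir=file:///C:/tmp/spark-warehouse
```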

The catch with spark-1.6.2-bin-hadoop2.6 turned out to be twofold: 1. if you are installing on Windows 7 and want to use Spark DataFrames, use spark-1.6.2-bin-hadoop2.6; 2. SparkSession is not available there, since it only arrived in Spark 2, so you have to use SQLContext... oh well!
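To illustrate the second point, here is a minimal sketch of reading the same parquet file under Spark 1.6, where SQLContext rather than SparkSession is the entry point (in the pyspark shell a SparkContext named `sc` already exists; the app name below is arbitrary):

```python
from pyspark import SparkContext
from pyspark.sql import SQLContext

# In the pyspark shell `sc` is created for you; a standalone script makes its own.
sc = SparkContext(appName="users-parquet")
sqlContext = SQLContext(sc)

# Spark 1.6 style: read through SQLContext instead of spark.read
df = sqlContext.read.parquet("c:/spark/examples/src/main/resources/users.parquet")
df.show()
```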
