
Accessing Bluemix Object Storage from local Spark

I am unable to access a file on Object Storage from my local standalone Spark cluster. Here is the code —

from pyspark.sql import SQLContext

# 'creds' is the Object Storage service credentials dict
# (name, auth_url, project_id, user_id, password, region)
sqlCxt = SQLContext(sc)
prefix = "fs.swift.service." + creds['name']
hconf = sc._jsc.hadoopConfiguration()
hconf.set(prefix + ".auth.url", creds['auth_url'] + '/v2.0/tokens')
hconf.set(prefix + ".auth.endpoint.prefix", "endpoints")
hconf.set(prefix + ".tenant", creds['project_id'])
hconf.set(prefix + ".username", creds['user_id'])
hconf.set(prefix + ".password", creds['password'])
hconf.setInt(prefix + ".http.port", 8080)
hconf.set(prefix + ".region", creds['region'])
hconf.setBoolean(prefix + ".public", True)

weather = sqlCxt.read.json("swift://notebooks." + creds['name'] + "/repo_local.json")
weather.show()

This is the exception I get:

16/04/21 17:31:11 INFO JSONRelation: Listing swift://notebooks.pac/repo_local.json on driver 
16/04/21 17:31:11 WARN HttpMethodDirector: Unable to respond to any of these challenges: {keystone=Keystone uri="https://identity.open.softlayer.com"} 
16/04/21 17:31:33 INFO SparkContext: Created broadcast 0 from json at NativeMethodAccessorImpl.java:-2 
Traceback (most recent call last): 
    File "C:\Users\MY_PC\Desktop\PAC\src\unittest\python\PAC\ObjectStorage_tests.py", line 18, in <module> 
    weather = sqlCxt.read.json("swift://notebooks.pac/config-repo_local.json") 
    File "C:\Python27\lib\pyspark\sql\readwriter.py", line 176, in json 
    return self._df(self._jreader.json(path)) 
    File "C:\Python27\lib\site-packages\py4j\java_gateway.py", line 813, in __call__ 
    answer, self.gateway_client, self.target_id, self.name) 
    File "C:\Python27\lib\pyspark\sql\utils.py", line 45, in deco 
    return f(*a, **kw) 
    File "C:\Python27\lib\site-packages\py4j\protocol.py", line 308, in get_return_value 
    format(target_id, ".", name), value) 
py4j.protocol.Py4JJavaError: An error occurred while calling o22.json. 
: java.io.IOException: No input paths specified in job 
    at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:201) 
    at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:313) 
    at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:199) 
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239) 
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237) 
    at scala.Option.getOrElse(Option.scala:120) 
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:237) 
    at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35) 
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239) 
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237) 
    at scala.Option.getOrElse(Option.scala:120) 
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:237) 
    at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35) 
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239) 
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237) 
    at scala.Option.getOrElse(Option.scala:120) 
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:237) 
    at org.apache.spark.rdd.RDD$$anonfun$treeAggregate$1.apply(RDD.scala:1115) 
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150) 
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:111) 
    at org.apache.spark.rdd.RDD.withScope(RDD.scala:316) 
    at org.apache.spark.rdd.RDD.treeAggregate(RDD.scala:1113) 
    at org.apache.spark.sql.execution.datasources.json.InferSchema$.infer(InferSchema.scala:65) 
    at org.apache.spark.sql.execution.datasources.json.JSONRelation$$anonfun$4.apply(JSONRelation.scala:114) 
    at org.apache.spark.sql.execution.datasources.json.JSONRelation$$anonfun$4.apply(JSONRelation.scala:109) 
    at scala.Option.getOrElse(Option.scala:120) 
    at org.apache.spark.sql.execution.datasources.json.JSONRelation.dataSchema$lzycompute(JSONRelation.scala:109) 
    at org.apache.spark.sql.execution.datasources.json.JSONRelation.dataSchema(JSONRelation.scala:108) 
    at org.apache.spark.sql.sources.HadoopFsRelation.schema$lzycompute(interfaces.scala:636) 
    at org.apache.spark.sql.sources.HadoopFsRelation.schema(interfaces.scala:635) 
    at org.apache.spark.sql.execution.datasources.LogicalRelation.<init>(LogicalRelation.scala:37) 
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:125) 
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:109) 
    at org.apache.spark.sql.DataFrameReader.json(DataFrameReader.scala:244) 
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:95) 
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:56) 
    at java.lang.reflect.Method.invoke(Method.java:620) 
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231) 
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:381) 
    at py4j.Gateway.invoke(Gateway.java:259) 
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133) 
    at py4j.commands.CallCommand.execute(CallCommand.java:79) 
    at py4j.GatewayConnection.run(GatewayConnection.java:209) 
    at java.lang.Thread.run(Thread.java:801) 

Please note that I can access the file when I run the code from a notebook or via spark-submit inside Bluemix.

I can also access the file through the Swift CLI.

Answer


Swift needs an authentication token, obtained through Keystone authentication, to connect to Object Storage from a local environment.
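If you want to confirm that the credentials and endpoint work outside of Spark, you can request a token from Keystone directly. Below is a minimal sketch, assuming the requests package is installed and that the account's identity endpoint speaks the Keystone v3 API (as https://identity.open.softlayer.com does); it uses the same creds dict as the question and only verifies that a token can be obtained:

import requests

def get_keystone_v3_token(creds):
    # Keystone v3 password authentication, scoped to the project
    body = {
        "auth": {
            "identity": {
                "methods": ["password"],
                "password": {
                    "user": {"id": creds['user_id'], "password": creds['password']}
                }
            },
            "scope": {"project": {"id": creds['project_id']}}
        }
    }
    resp = requests.post(creds['auth_url'] + '/v3/auth/tokens', json=body)
    resp.raise_for_status()
    # Keystone v3 returns the token in the X-Subject-Token response header
    return resp.headers['X-Subject-Token']

If this call fails, the problem is with the credentials or endpoint rather than with Spark itself.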

I would suggest trying the Stocator connector to access Bluemix Object Storage; it has worked very consistently for me (see the configuration sketch below the link):

https://github.com/SparkTC/stocator
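For reference, here is a rough sketch of how Stocator might be wired up from PySpark. The property names and the swift2d:// scheme follow the Stocator README linked above; treat the exact keys as assumptions and check the README for the version you install. The Stocator JAR also has to be on the driver and executor classpath (for example via --jars):

# Hedged sketch: configure Stocator's ObjectStoreFileSystem for a service
# named after creds['name']; property names follow the Stocator README and
# may differ between Stocator versions.
prefix = "fs.swift2d.service." + creds['name']
hconf = sc._jsc.hadoopConfiguration()
hconf.set("fs.swift2d.impl", "com.ibm.stocator.fs.ObjectStoreFileSystem")
hconf.set(prefix + ".auth.url", creds['auth_url'] + '/v3/auth/tokens')
hconf.set(prefix + ".auth.method", "keystoneV3")
hconf.set(prefix + ".tenant", creds['project_id'])
hconf.set(prefix + ".username", creds['user_id'])
hconf.set(prefix + ".password", creds['password'])
hconf.set(prefix + ".region", creds['region'])
hconf.setBoolean(prefix + ".public", True)

# Note the swift2d:// scheme instead of swift://
weather = sqlCxt.read.json("swift2d://notebooks." + creds['name'] + "/repo_local.json")
weather.show()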

Thanks, Charles.
