Bluemix Apache Spark service - Scala - reading a file
2016-05-14 · 97 views · 3 votes

This is a basic question, but I am trying to read the contents of a file with Scala code in a Bluemix notebook on the Analytics for Apache Spark service, and I keep getting authentication errors. Does anyone have a Scala authentication example for accessing a file? Thanks in advance!

I tried the following simple script:

val file = sc.textFile("swift://notebooks.keystone/kdd99.data") 
file.take(1) 

I also tried:

def setConfig(name:String) : Unit = { 
    val pfx = "fs.swift.service." + name 
    val conf = sc.getConf 
    conf.set(pfx + "auth.url", "hardcoded") 
    conf.set(pfx + "tenant", "hardcoded") 
    conf.set(pfx + "username", "hardcoded") 
    conf.set(pfx + "password", "hardcoded") 
    conf.set(pfx + "apikey", "hardcoded") 
    conf.set(pfx + "auth.endpoint.prefix", "endpoints") 
} 
setConfig("keystone") 

I also tried this script, taken from a previous question:

import scala.collection.breakOut 
val name= "keystone" 
val YOUR_DATASOURCE = """auth_url:https://identity.open.softlayer.com 
project: hardcoded 
project_id: hardcoded 
region: hardcoded 
user_id: hardcoded 
domain_id: hardcoded 
domain_name: hardcoded 
username: hardcoded 
password: hardcoded 
filename: hardcoded 
container: hardcoded 
tenantId: hardcoded 
""" 

val settings:Map[String,String] = YOUR_DATASOURCE.split("\\n"). 
    map(l=>(l.split(":",2)(0).trim(), l.split(":",2)(1).trim()))(breakOut) 
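The parsing above splits each line on the first colon only (`split(":", 2)`), which matters because the `auth_url` value itself contains a colon. A minimal Python sketch of the same logic, with purely illustrative placeholder values:

```python
# Illustrative sketch (the question's code is Scala): parse "key: value"
# lines into a dict, splitting on the FIRST colon only, so values such as
# "auth_url:https://identity.open.softlayer.com" keep the colon in the URL.

YOUR_DATASOURCE = """auth_url:https://identity.open.softlayer.com
project: my_project
region: dallas
"""

def parse_datasource(text):
    """Turn 'key: value' lines into a dict, splitting on the first colon."""
    settings = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        key, _, value = line.partition(":")  # split at first colon only
        settings[key.strip()] = value.strip()
    return settings

settings = parse_datasource(YOUR_DATASOURCE)
print(settings["auth_url"])  # the URL survives intact despite its own colons
```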

val conf = sc.getConf 
conf.set("fs.swift.service.keystone.auth.url", settings.getOrElse("auth_url", "")) 
conf.set("fs.swift.service.keystone.tenant", settings.getOrElse("tenantId", "")) 
conf.set("fs.swift.service.keystone.username", settings.getOrElse("username", "")) 
conf.set("fs.swift.service.keystone.password", settings.getOrElse("password", "")) 
conf.set("fs.swift.service.keystone.apikey", settings.getOrElse("password", "")) 
conf.set("fs.swift.service.keystone.auth.endpoint.prefix", "endpoints") 
println("sett: "+ settings.getOrElse("auth_url","")) 
val file = sc.textFile("swift://notebooks.keystone/kdd99.data") 

/* The following line gives errors */ 
file.take(1) 

The error is below:

Name: org.apache.hadoop.fs.swift.exceptions.SwiftConfigurationException
Message: Missing mandatory configuration option: fs.swift.service.keystone.auth.url

Edit

A Python option would also be welcome. I tried the following, with "spark" as the configuration name, for two different files:

def set_hadoop_config(credentials): 
    prefix = "fs.swift.service." + credentials['name'] 
    hconf = sc._jsc.hadoopConfiguration() 
    hconf.set(prefix + ".auth.url", credentials['auth_url']+'/v3/auth/tokens') 
    hconf.set(prefix + ".auth.endpoint.prefix", "endpoints") 
    hconf.set(prefix + ".tenant", credentials['project_id']) 
    hconf.set(prefix + ".username", credentials['user_id']) 
    hconf.set(prefix + ".password", credentials['password']) 
    hconf.setInt(prefix + ".http.port", 8080) 
    hconf.set(prefix + ".region", credentials['region']) 
    hconf.setBoolean(prefix + ".public", True) 
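To see exactly which `fs.swift.service.<name>.*` keys a function like this writes without needing a live Spark context, here is a sketch using a stand-in for the Hadoop configuration object (the class and credential values are hypothetical, for illustration only):

```python
# Stand-in "Hadoop configuration": a thin dict wrapper exposing the same
# set/setInt/setBoolean methods the real object offers via sc._jsc.
class FakeHadoopConf:
    def __init__(self):
        self.props = {}
    def set(self, key, value):
        self.props[key] = value
    def setInt(self, key, value):
        self.props[key] = value
    def setBoolean(self, key, value):
        self.props[key] = value

def set_hadoop_config(hconf, credentials):
    """Mirror of the question's function, but taking hconf as a parameter."""
    prefix = "fs.swift.service." + credentials["name"]
    hconf.set(prefix + ".auth.url", credentials["auth_url"] + "/v3/auth/tokens")
    hconf.set(prefix + ".auth.endpoint.prefix", "endpoints")
    hconf.set(prefix + ".tenant", credentials["project_id"])
    hconf.set(prefix + ".username", credentials["user_id"])
    hconf.set(prefix + ".password", credentials["password"])
    hconf.setInt(prefix + ".http.port", 8080)
    hconf.set(prefix + ".region", credentials["region"])
    hconf.setBoolean(prefix + ".public", True)

hconf = FakeHadoopConf()
set_hadoop_config(hconf, {
    "name": "spark",                                   # the config name
    "auth_url": "https://identity.open.softlayer.com", # v3 path is appended
    "project_id": "tenant-id",
    "user_id": "user-id",
    "password": "secret",
    "region": "dallas",
})
print(sorted(hconf.props))
```

Note that whatever you pass as `credentials["name"]` must match the config name in the `swift://container.configname/` URL you read from.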

Answers

2 votes

To access a file from Object Storage in Scala, the following sequence of commands works in a Scala notebook (the credentials are filled into a notebook cell when you click the "Insert to code" link for the file shown under Data Sources):

IN [1]:

var credentials = scala.collection.mutable.HashMap[String, String](
    "auth_url"->"https://identity.open.softlayer.com", 
    "project"->"object_storage_b3c0834b_0936_4bbe_9f29_ef45e018cec9", 
    "project_id"->"68d053dff02e42b1a947457c6e2e3290", 
    "region"->"dallas", 
    "user_id"->"e7639268215e4830a3662f708e8c4a5c", 
    "domain_id"->"2df6373c549e49f8973fb6d22ab18c1a", 
    "domain_name"->"639347", 
    "username"->"Admin_XXXXXXXXXXXX", 
    "password"->"""XXXXXXXXXX""", 
    "filename"->"2015_small.csv", 
    "container"->"notebooks", 
    "tenantId"->"sefe-f831d4ccd6da1f-42a9cf195d79" 
) 

IN [2]:

credentials("name")="keystone" 

IN [3]:

def setHadoopConfig(name: String, tenant: String, url: String, username: String, password: String, region: String) = { 
    sc.hadoopConfiguration.set(f"fs.swift.service.$name.auth.url",url+"/v3/auth/tokens") 
    sc.hadoopConfiguration.set(f"fs.swift.service.$name.auth.endpoint.prefix","endpoints") 
    sc.hadoopConfiguration.set(f"fs.swift.service.$name.tenant",tenant) 
    sc.hadoopConfiguration.set(f"fs.swift.service.$name.username",username) 
    sc.hadoopConfiguration.set(f"fs.swift.service.$name.password",password) 
    sc.hadoopConfiguration.setInt(f"fs.swift.service.$name.http.port",8080) 
    sc.hadoopConfiguration.set(f"fs.swift.service.$name.region",region) 
    sc.hadoopConfiguration.setBoolean(f"fs.swift.service.$name.public",true) 
} 

IN [4]:

setHadoopConfig(credentials("name"), credentials("project_id"), credentials("auth_url"), credentials("user_id"), credentials("password"), credentials("region")) 

IN [5]:

var testcount = sc.textFile("swift://notebooks.keystone/2015_small.csv") 
testcount.count() 

IN [6]:

testcount.take(1) 
Thanks NSHUKLA – tbuda

I have edited the question with a Python version. Could you take a look? – tbuda

For Python, the code looks correct (you can refer to the sample "Analytics Notebooks and Apache Spark", which has the Python code for def set_hadoop_config(credentials)). I have tried .csv and .txt files with the keystone config name. Are you hitting the configuration issue only with the .data file, while it works with the .txt file as you said? – NSHUKLA

3 votes

I think you need to use "spark" as the configuration name instead of "keystone", since you are trying to access Object Storage from the IBM Bluemix notebook UI:

sc.textFile("swift://notebooks.spark/2015_small.csv")

Here is a link to a working example:

https://console.ng.bluemix.net/data/notebooks/4dda9ee7-bf26-4ebc-bccf-dcb1b7ef63c8/view?access_token=37bff7ab682ee255b753fca485d49de50fed69d2a25217a7c748dd1463222c3b

Note: remember to change the container name; the format is containername.configname.

Also, replace your own credentials in the YOUR_DATASOURCE variable in the example above.

notebooks is the default container.
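The pieces above fit together as `swift://<container>.<configname>/<filename>`, where the config name must match the one used in the `fs.swift.service.<configname>.*` properties. A tiny hypothetical helper, for illustration:

```python
# Hypothetical helper: assemble the swift:// URL from its three parts.
# <container> is the Object Storage container (default: "notebooks"),
# <configname> is the name used in fs.swift.service.<configname>.* keys.
def swift_url(container, configname, filename):
    return "swift://{}.{}/{}".format(container, configname, filename)

print(swift_url("notebooks", "spark", "2015_small.csv"))
```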

Thanks, Charles.

That's it! Thank you very much. So "keystone" just isn't a good configuration name. Why does "spark" work now? Is this a new rule? Previously "keystone" worked fine too. – tbuda

keystone might still work, I think... I believe the IBM Bluemix Object Storage API has been upgraded to v3, which may require the v3 API URL /v3/auth/tokens. I haven't tested it, but as @NSHUKLA mentioned below, you may need to update the URL for keystone to work... –

Thanks Charles – tbuda
