Bluemix Apache Spark service - Scala - reading a file
2016-05-14 · 97 views · 3 votes

This is a basic question, but I am trying to read the contents of a file with Scala code in a Bluemix notebook on the Analytics for Apache Spark service, and I keep getting authentication errors. Does anyone have a Scala authentication example for accessing a file? Thanks in advance!

I tried the following simple script:

val file = sc.textFile("swift://notebooks.keystone/kdd99.data") 
file.take(1) 

I also tried:

def setConfig(name:String) : Unit = { 
    val pfx = "fs.swift.service." + name 
    val conf = sc.getConf 
    conf.set(pfx + "auth.url", "hardcoded") 
    conf.set(pfx + "tenant", "hardcoded") 
    conf.set(pfx + "username", "hardcoded") 
    conf.set(pfx + "password", "hardcoded") 
    conf.set(pfx + "apikey", "hardcoded") 
    conf.set(pfx + "auth.endpoint.prefix", "endpoints") 
} 
setConfig("keystone") 

I also tried this script, taken from a previous question:

import scala.collection.breakOut 
val name= "keystone" 
val YOUR_DATASOURCE = """auth_url:https://identity.open.softlayer.com 
project: hardcoded 
project_id: hardcoded 
region: hardcoded 
user_id: hardcoded 
domain_id: hardcoded 
domain_name: hardcoded 
username: hardcoded 
password: hardcoded 
filename: hardcoded 
container: hardcoded 
tenantId: hardcoded 
""" 

val settings:Map[String,String] = YOUR_DATASOURCE.split("\\n"). 
    map(l=>(l.split(":",2)(0).trim(), l.split(":",2)(1).trim()))(breakOut) 
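The parsing above splits each line on the first colon only (`split(":", 2)`), which matters because the `auth_url` value itself contains a colon. A minimal Python sketch of the same logic, with purely illustrative placeholder values:

```python
# Illustrative sketch (the question's code is Scala): parse "key: value"
# lines into a dict, splitting on the FIRST colon only, so values such as
# "auth_url:https://identity.open.softlayer.com" keep the colon in the URL.

YOUR_DATASOURCE = """auth_url:https://identity.open.softlayer.com
project: my_project
region: dallas
"""

def parse_datasource(text):
    """Turn 'key: value' lines into a dict, splitting on the first colon."""
    settings = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        key, _, value = line.partition(":")  # split at first colon only
        settings[key.strip()] = value.strip()
    return settings

settings = parse_datasource(YOUR_DATASOURCE)
print(settings["auth_url"])  # the URL survives intact despite its own colons
```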

val conf = sc.getConf 
conf.set("fs.swift.service.keystone.auth.url", settings.getOrElse("auth_url", "")) 
conf.set("fs.swift.service.keystone.tenant", settings.getOrElse("tenantId", "")) 
conf.set("fs.swift.service.keystone.username", settings.getOrElse("username", "")) 
conf.set("fs.swift.service.keystone.password", settings.getOrElse("password", "")) 
conf.set("fs.swift.service.keystone.apikey", settings.getOrElse("password", "")) 
conf.set("fs.swift.service.keystone.auth.endpoint.prefix", "endpoints") 
println("sett: "+ settings.getOrElse("auth_url","")) 
val file = sc.textFile("swift://notebooks.keystone/kdd99.data") 

/* The following line gives errors */ 
file.take(1) 

The error is below:

Name: org.apache.hadoop.fs.swift.exceptions.SwiftConfigurationException
Message: Missing mandatory configuration option: fs.swift.service.keystone.auth.url

Edit

A Python option would also be welcome. I tried the following, with "spark" as the configuration name, for two different files:

def set_hadoop_config(credentials): 
    prefix = "fs.swift.service." + credentials['name'] 
    hconf = sc._jsc.hadoopConfiguration() 
    hconf.set(prefix + ".auth.url", credentials['auth_url']+'/v3/auth/tokens') 
    hconf.set(prefix + ".auth.endpoint.prefix", "endpoints") 
    hconf.set(prefix + ".tenant", credentials['project_id']) 
    hconf.set(prefix + ".username", credentials['user_id']) 
    hconf.set(prefix + ".password", credentials['password']) 
    hconf.setInt(prefix + ".http.port", 8080) 
    hconf.set(prefix + ".region", credentials['region']) 
    hconf.setBoolean(prefix + ".public", True) 
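To see exactly which `fs.swift.service.<name>.*` keys a function like this writes without needing a live Spark context, here is a sketch using a stand-in for the Hadoop configuration object (the class and credential values are hypothetical, for illustration only):

```python
# Stand-in "Hadoop configuration": a thin dict wrapper exposing the same
# set/setInt/setBoolean methods the real object offers via sc._jsc.
class FakeHadoopConf:
    def __init__(self):
        self.props = {}
    def set(self, key, value):
        self.props[key] = value
    def setInt(self, key, value):
        self.props[key] = value
    def setBoolean(self, key, value):
        self.props[key] = value

def set_hadoop_config(hconf, credentials):
    """Mirror of the question's function, but taking hconf as a parameter."""
    prefix = "fs.swift.service." + credentials["name"]
    hconf.set(prefix + ".auth.url", credentials["auth_url"] + "/v3/auth/tokens")
    hconf.set(prefix + ".auth.endpoint.prefix", "endpoints")
    hconf.set(prefix + ".tenant", credentials["project_id"])
    hconf.set(prefix + ".username", credentials["user_id"])
    hconf.set(prefix + ".password", credentials["password"])
    hconf.setInt(prefix + ".http.port", 8080)
    hconf.set(prefix + ".region", credentials["region"])
    hconf.setBoolean(prefix + ".public", True)

hconf = FakeHadoopConf()
set_hadoop_config(hconf, {
    "name": "spark",                                   # the config name
    "auth_url": "https://identity.open.softlayer.com", # v3 path is appended
    "project_id": "tenant-id",
    "user_id": "user-id",
    "password": "secret",
    "region": "dallas",
})
print(sorted(hconf.props))
```

Note that whatever you pass as `credentials["name"]` must match the config name in the `swift://container.configname/` URL you read from.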

Answers

2 votes

To access a file from Object Storage in Scala, the following sequence of commands works in a Scala notebook (the credentials are filled into a notebook cell when you click the "Insert to code" link for the file shown under Data Sources):

IN [1]:

var credentials = scala.collection.mutable.HashMap[String, String](
    "auth_url"->"https://identity.open.softlayer.com", 
    "project"->"object_storage_b3c0834b_0936_4bbe_9f29_ef45e018cec9", 
    "project_id"->"68d053dff02e42b1a947457c6e2e3290", 
    "region"->"dallas", 
    "user_id"->"e7639268215e4830a3662f708e8c4a5c", 
    "domain_id"->"2df6373c549e49f8973fb6d22ab18c1a", 
    "domain_name"->"639347", 
    "username"->"Admin_XXXXXXXXXXXX", 
    "password"->"""XXXXXXXXXX""", 
    "filename"->"2015_small.csv", 
    "container"->"notebooks", 
    "tenantId"->"sefe-f831d4ccd6da1f-42a9cf195d79" 
) 

IN [2]:

credentials("name")="keystone" 

IN [3]:

def setHadoopConfig(name: String, tenant: String, url: String, username: String, password: String, region: String) = { 
    sc.hadoopConfiguration.set(f"fs.swift.service.$name.auth.url",url+"/v3/auth/tokens") 
    sc.hadoopConfiguration.set(f"fs.swift.service.$name.auth.endpoint.prefix","endpoints") 
    sc.hadoopConfiguration.set(f"fs.swift.service.$name.tenant",tenant) 
    sc.hadoopConfiguration.set(f"fs.swift.service.$name.username",username) 
    sc.hadoopConfiguration.set(f"fs.swift.service.$name.password",password) 
    sc.hadoopConfiguration.setInt(f"fs.swift.service.$name.http.port",8080) 
    sc.hadoopConfiguration.set(f"fs.swift.service.$name.region",region) 
    sc.hadoopConfiguration.setBoolean(f"fs.swift.service.$name.public",true) 
} 

IN [4]:

setHadoopConfig(credentials("name"), credentials("project_id"), credentials("auth_url"), credentials("user_id"), credentials("password"), credentials("region")) 

IN [5]:

var testcount = sc.textFile("swift://notebooks.keystone/2015_small.csv") 
testcount.count() 

IN [6]:

testcount.take(1) 
Thanks NSHUKLA – tbuda

I have edited the question with a Python version. Could you take a look? – tbuda

For Python, the code looks correct (you can refer to the sample "Analytics Notebooks and Apache Spark", which has the Python code for def set_hadoop_config(credentials)). I have tried .csv and .txt files with the keystone config name. Are you hitting the configuration issue only with the .data file, while it works with the .txt file as you said? – NSHUKLA

3 votes

I think you need to use "spark" as the configuration name instead of "keystone", since you are trying to access Object Storage from the IBM Bluemix notebook UI:

sc.textFile("swift://notebooks.spark/2015_small.csv")

Here is a link to a working example:

https://console.ng.bluemix.net/data/notebooks/4dda9ee7-bf26-4ebc-bccf-dcb1b7ef63c8/view?access_token=37bff7ab682ee255b753fca485d49de50fed69d2a25217a7c748dd1463222c3b

Note: remember to change the container name; the format is containername.configname.

Also, replace your own credentials in the YOUR_DATASOURCE variable in the example above.

notebooks is the default container.
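The pieces above fit together as `swift://<container>.<configname>/<filename>`, where the config name must match the one used in the `fs.swift.service.<configname>.*` properties. A tiny hypothetical helper, for illustration:

```python
# Hypothetical helper: assemble the swift:// URL from its three parts.
# <container> is the Object Storage container (default: "notebooks"),
# <configname> is the name used in fs.swift.service.<configname>.* keys.
def swift_url(container, configname, filename):
    return "swift://{}.{}/{}".format(container, configname, filename)

print(swift_url("notebooks", "spark", "2015_small.csv"))
```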

Thanks, Charles.

That's it! Thank you very much. So "keystone" just isn't a good configuration name. Why does "spark" work now? Is this a new rule? Previously "keystone" worked fine too. – tbuda

keystone might still work, I think... I believe the IBM Bluemix Object Storage API has been upgraded to v3, which may require the v3 API URL /v3/auth/tokens. I haven't tested it, but as @NSHUKLA mentioned below, you may need to update the URL for keystone to work... –

Thanks Charles – tbuda
