I am learning Hadoop, machine learning, and Spark. I have downloaded the Cloudera 5.7 QuickStart VM. I also downloaded the examples from https://github.com/apache/spark as a zip file and copied them to the Cloudera VM. I am having trouble running any of the machine learning examples from https://github.com/apache/spark. I tried to run the simple Word2Vec example, but it failed. Below are my steps and the error I get when running the Spark examples on the Cloudera VM 5.7:
[[email protected]] cd spark-master/examples/src/main/python/ml
[[email protected]] spark-submit word2vec_example.py
Every example I try to run fails with the following error:
Traceback (most recent call last):
  File "/home/cloudera/training/spark-master/examples/src/main/python/ml/word2vec_example.py", line 23, in <module>
    from pyspark.sql import SparkSession
I searched for the file pyspark.sql, but I could only find the file below:

cd spark-master
find . -name pyspark.sql
./python/docs/pyspark.sql.rst
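For what it's worth, `pyspark.sql` is a Python package (a directory containing an `__init__.py`), not a single file, which is why a name match on `pyspark.sql` only turns up the documentation stub. A minimal sketch of a search that looks for the package directory instead (the helper name `find_package_dirs` is mine, not part of Spark):

```python
import os

def find_package_dirs(root, pkg):
    """Walk `root` and return directories named `pkg` that are
    Python packages, i.e. that contain an __init__.py file."""
    hits = []
    for dirpath, dirnames, _ in os.walk(root):
        for d in dirnames:
            candidate = os.path.join(dirpath, d)
            if d == pkg and os.path.exists(os.path.join(candidate, "__init__.py")):
                hits.append(candidate)
    return hits

# In the unpacked zip the package should live under python/pyspark/sql, e.g.:
# find_package_dirs("spark-master/python", "sql")
```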
Please advise how I can resolve these errors so that I can run this example and get on with my machine learning and big data studies.
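One thing worth checking first (my suggestion, not part of the original steps): `SparkSession` was only introduced in Spark 2.0, while the CDH 5.7 QuickStart VM ships an older Spark (1.6), so examples from the master branch may target a newer API than the one installed. Running `spark-submit --version` on the VM shows the installed version; a small helper to interpret the version string might look like this:

```python
def supports_spark_session(version_string):
    """Return True if the given Spark version string (e.g. "1.6.0",
    "2.0.1") is 2.0 or newer, i.e. new enough to provide
    pyspark.sql.SparkSession."""
    major = int(version_string.split(".")[0])
    return major >= 2

# "1.6.0": too old for SparkSession; use SparkContext/SQLContext instead
# "2.0.1": new enough for SparkSession
```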
The code for the Word2Vec example is below:

cat word2vec_example.py
#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
from __future__ import print_function
# $example on$
from pyspark.ml.feature import Word2Vec
# $example off$
from pyspark.sql import SparkSession
if __name__ == "__main__":
    spark = SparkSession\
        .builder\
        .appName("Word2VecExample")\
        .getOrCreate()

    # $example on$
    # Input data: Each row is a bag of words from a sentence or document.
    documentDF = spark.createDataFrame([
        ("Hi I heard about Spark".split(" "), ),
        ("I wish Java could use case classes".split(" "), ),
        ("Logistic regression models are neat".split(" "), )
    ], ["text"])

    # Learn a mapping from words to Vectors.
    word2Vec = Word2Vec(vectorSize=3, minCount=0, inputCol="text", outputCol="result")
    model = word2Vec.fit(documentDF)
    result = model.transform(documentDF)
    for feature in result.select("result").take(3):
        print(feature)
    # $example off$

    spark.stop()
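For reference, here is a sketch of the same example rewritten against the pre-2.0 PySpark API, where the entry points are `SparkContext` and `SQLContext` rather than `SparkSession`. This is an untested adaptation, and it assumes a Spark 1.4+ installation (`pyspark.ml.feature.Word2Vec` first appeared in 1.4):

```python
from __future__ import print_function

# Spark 1.x entry points: SparkContext plus SQLContext,
# instead of the Spark 2.0 SparkSession.
from pyspark import SparkContext
from pyspark.sql import SQLContext
from pyspark.ml.feature import Word2Vec

if __name__ == "__main__":
    sc = SparkContext(appName="Word2VecExample")
    sqlContext = SQLContext(sc)

    # Input data: each row is a bag of words from a sentence or document.
    documentDF = sqlContext.createDataFrame([
        ("Hi I heard about Spark".split(" "), ),
        ("I wish Java could use case classes".split(" "), ),
        ("Logistic regression models are neat".split(" "), )
    ], ["text"])

    # Learn a mapping from words to 3-dimensional vectors.
    word2Vec = Word2Vec(vectorSize=3, minCount=0, inputCol="text", outputCol="result")
    model = word2Vec.fit(documentDF)
    for feature in model.transform(documentDF).select("result").take(3):
        print(feature)

    sc.stop()
```

Submitted the same way as before (`spark-submit word2vec_example.py`), this version avoids the `SparkSession` import entirely.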