Spark 1.6.1, Python 3.5.1: building a naive Bayes classifier

My questions follow on from an earlier post.

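Could someone comment on or explain in more detail the line that starts with tf = HashingTF().transform(training_raw.map(lambda doc: doc["text"], preservesPartitioning=True))?
How can I print the confusion matrix?
What does the error below mean, and how can I fix it? The model still gets built and I do get predictions.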

>>> # Train and check
... model = NaiveBayes.train(training)
[Stage 2:=============================> (2 + 2)/4]16/04/05 18:18:28 WARN BLAS: Failed to load implementation from: com.github.fommil.netlib.NativeSystemBLAS
16/04/05 18:18:28 WARN BLAS: Failed to load implementation from: com.github.fommil.netlib.NativeRefBLAS
How can I print the prediction for a new observation? I tried and failed:

>>> model.predict("love")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "c:\spark-1.6.1-bin-hadoop2.6\spark-1.6.1-bin-hadoop2.6\python\pyspark\mllib\classification.py", line 594, in predict
    x = _convert_to_vector(x)
  File "c:\spark-1.6.1-bin-hadoop2.6\spark-1.6.1-bin-hadoop2.6\python\pyspark\mllib\linalg\__init__.py", line 77, in _convert_to_vector
    raise TypeError("Cannot convert type %s into Vector" % type(l))
TypeError: Cannot convert type <class 'str'> into Vector

2016-04-06 user2543622

Can you add a sample from 'training_raw'? –

The data is at http://stackoverflow.com/questions/32231049/how-to-use-spark-naive-bayes-classifier-for-text-classification-with-idf. Regarding 2, I understand what BLAS stands for. – user2543622

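1. HashingTF in Spark is similar to scikit-learn's HashingVectorizer. training_raw is an RDD of documents; the map pulls out each document's "text" field, and transform turns it into a term-frequency vector. For a detailed description of the vectorizers available in PySpark, see Vectorizers. For a complete example, see this post.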
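To make point 1 a bit more concrete, here is a minimal pure-Python sketch of the hashing trick that HashingTF is built on: each term is hashed to an index in a fixed-size feature space and the counts per index become the term-frequency vector. The function name hashing_tf and the use of zlib.crc32 are illustrative assumptions, not Spark's implementation; the default of 1 << 20 mirrors HashingTF's default numFeatures.

```python
import zlib
from collections import Counter


def hashing_tf(terms, num_features=1 << 20):
    """Return a sparse {index: count} term-frequency mapping for one document."""
    counts = Counter()
    for term in terms:
        # Stand-in hash (an assumption): crc32 is deterministic across runs,
        # unlike Python 3's built-in hash() for strings by default.
        index = zlib.crc32(term.encode("utf-8")) % num_features
        counts[index] += 1
    return dict(counts)


# A tiny feature space makes the result easy to inspect.
doc = "i love spark and i love python".split()
vec = hashing_tf(doc, num_features=16)
print(vec)
```

Two different terms can land on the same index (a collision), which is the price paid for never having to build a vocabulary. In the line from the question, preservesPartitioning=True only tells Spark that the map keeps the existing partitioning; it does not change the vectors.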

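2. BLAS stands for Basic Linear Algebra Subprograms. The messages are warnings that the native implementations could not be loaded, not errors, which is why the model still gets built. You can check this page on GitHub for a potential solution.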

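3. You are trying to call model.predict on a string ("love"). You have to convert the string to a vector first. A simple example that takes a line of dense feature values and returns a LabeledPoint holding a dense vector is: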

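from pyspark.mllib.linalg import Vectors
from pyspark.mllib.regression import LabeledPoint

def parseLine(line):
    # Expected input: "label,f1 f2 f3 ..."
    parts = line.split(',')
    label = float(parts[0])
    features = Vectors.dense([float(x) for x in parts[1].split(' ')])
    return LabeledPoint(label, features)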

You are probably looking for a sparse vector, though, so try Vectors.sparse.
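As a rough sparse counterpart to the dense parser, the sketch below parses a line into the pieces that Vectors.sparse(size, indices, values) takes. The "label,idx:val idx:val" input format and the name parse_sparse_line are assumptions made for illustration; the code is plain Python so it can be tried without a Spark installation.

```python
def parse_sparse_line(line, size):
    """Parse 'label,idx:val idx:val ...' into (label, size, indices, values)."""
    label_part, feature_part = line.split(",", 1)
    pairs = [item.split(":") for item in feature_part.split()]
    # Sort by index so the indices come out in increasing order.
    entries = sorted((int(i), float(v)) for i, v in pairs)
    indices = [i for i, _ in entries]
    values = [v for _, v in entries]
    if indices and not 0 <= indices[0] <= indices[-1] < size:
        raise ValueError("feature index out of range for size %d" % size)
    return float(label_part), size, indices, values


label, size, indices, values = parse_sparse_line("1.0,7:2.0 3:1.0", size=10)
print(label, size, indices, values)
# With PySpark available, these pieces would typically be wrapped as:
#   LabeledPoint(label, Vectors.sparse(size, indices, values))
```

For text hashed into a very large feature space, almost every entry is zero, so storing only the non-zero indices and values is far cheaper than a dense vector. Whatever vector is passed to model.predict should be produced the same way (same feature size, same transformations) as the training vectors.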


2016-04-06 01:58:02 goCards

Got it now. However, could you give a hint on how to get rid of the error? Also, let me know how to print the confusion matrix... Thanks – user2543622
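On the confusion matrix: a minimal pure-Python sketch, assuming the (prediction, label) pairs have already been collected from a test set. The function name confusion_matrix and the sample pairs are hypothetical. In PySpark itself, pyspark.mllib.evaluation.MulticlassMetrics with its confusionMatrix() method is, as far as I know, the usual route; the sketch follows the same layout of true labels in rows and predicted labels in columns.

```python
from collections import Counter


def confusion_matrix(prediction_and_labels):
    """Build a confusion matrix from (prediction, label) pairs.

    Returns (labels, matrix) where matrix[i][j] counts observations whose
    true label is labels[i] and whose predicted label is labels[j].
    """
    pairs = list(prediction_and_labels)
    counts = Counter((label, pred) for pred, label in pairs)
    labels = sorted({x for pair in pairs for x in pair})
    matrix = [[counts[(actual, pred)] for pred in labels] for actual in labels]
    return labels, matrix


# Hypothetical (prediction, label) pairs, e.g. collected from a held-out set.
pairs = [(1.0, 1.0), (0.0, 1.0), (0.0, 0.0), (1.0, 1.0), (1.0, 0.0), (0.0, 0.0)]
labels, matrix = confusion_matrix(pairs)
for actual, row in zip(labels, matrix):
    print(actual, row)
```

The diagonal holds the correctly classified observations; everything off the diagonal is a misclassification.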
