2017-04-19

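Writing arbitrary files to HDFS - PySpark

I haven't seen any examples of how to do this. I'm using PySpark 2.0 in a Python 3 environment. I have arbitrary data: binary data, .jpg data, random strings. I just need to write the data back to the underlying storage.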

For example:

import os 
with open(os.path.join(base_dir, 'RF_model.txt'), "w") as file1: 
    toFile = raw_input(RF_model.toDebugString()) 
    file1.write(toFile) 

(The above does not work.)

Thanks!

Edit: what RF_model.toDebugString() outputs:

Tree 0:
  If (feature 0 <= 64.0)
    If (feature 2 <= 212.0)
      If (feature 3 <= 0.0)
        If (feature 2 <= 154.0)
          Predict: 1.0
        Else (feature 2 > 154.0)
          Predict: 1.0
      Else (feature 3 > 0.0)
        If (feature 2 <= 147.0)
          Predict: 0.0
        Else (feature 2 > 147.0)
          Predict: 0.0
    Else (feature 2 > 212.0)
      If (feature 2 <= 375.0)
        If (feature 3 <= 0.0)
          Predict: 0.0
        Else (feature 3 > 0.0)
          Predict: 0.0
      Else (feature 2 > 375.0)
        If (feature 0 <= 22.0)
          Predict: 0.0
        Else (feature 0 > 22.0)
          Predict: 0.0
  Else (feature 0 > 64.0)
    If (feature 2 <= 239.0)
      If (feature 3 <= 0.0)
        If (feature 2 <= 200.0)
          Predict: 0.0
        Else (feature 2 > 200.0)
          Predict: 0.0
      Else (feature 3 > 0.0)
        If (feature 2 <= 124.0)
          Predict: 0.0
        Else (feature 2 > 124.0)
          Predict: 0.0
    Else (feature 2 > 239.0)
      If (feature 2 <= 375.0)
        If (feature 1 <= 67.0)
          Predict: 0.0
        Else (feature 1 > 67.0)
          Predict: 0.0
      Else (feature 2 > 375.0)
        If (feature 1 <= 63.0)
          Predict: 0.0
        Else (feature 1 > 63.0)
          Predict: 0.0
Tree 1:
  If (feature 0 <= 64.0)
    If (feature 2 <= 224.0)
      If (feature 3 <= 0.0)
        If (feature 2 <= 170.0)
          Predict: 1.0
        Else (feature 2 > 170.0)
          Predict: 1.0
      Else (feature 3 > 0.0)
        If (feature 2 <= 158.0)
          Predict: 0.0
        Else (feature 2 > 158.0)
          Predict: 0.0
    Else (feature 2 > 224.0)
      If (feature 2 <= 375.0)
        If (feature 3 <= 0.0)
          Predict: 0.0
        Else (feature 3 > 0.0)
          Predict: 0.0

What is `toFile = raw_input(RF_model.toDebugString())` supposed to save? On an RDD, `.toDebugString()` returns a description of that RDD (RF_model) and its recursive dependencies, for debugging. – Pushkr


It's just a string; I've added it above.

Answer


I hope I'm right in assuming that you want to write the output of .toDebugString() to a text file.

In PySpark you can use .saveAsTextFile to save any parallelized data as a text file:

# Important: first parallelize the data you need to save
rdd = sc.parallelize([str(RF_model.toDebugString())])

# Then save it as a text file; use this form if the underlying storage is HDFS.
# Spark creates a directory at this path containing part-* files.
rdd.saveAsTextFile('hdfs://'+base_dir+"/RF_model.txt")

Or, if you just want to save it on the local file system:

rdd.saveAsTextFile("file:///"+base_dir+"/RF_model.txt")
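
If one ordinary file on the driver's local disk is enough (rather than a directory of part files), plain Python file I/O may be all that is needed. The snippet in the question fails in Python 3 partly because raw_input no longer exists there, and it is not required anyway since toDebugString() already returns a string. Below is a minimal sketch under these assumptions: save_locally is a hypothetical helper name, and base_dir and debug_string are stand-ins for the real directory and for RF_model.toDebugString(). Bytes such as .jpg data go through binary mode.

```python
import os
import tempfile


def save_locally(path, payload):
    """Write str or bytes to a single file on the driver's local disk.

    Hypothetical helper for illustration; it is not part of PySpark.
    """
    if isinstance(payload, str):
        # Text goes through a UTF-8 text handle.
        with open(path, "w", encoding="utf-8") as fh:
            fh.write(payload)
    else:
        # Raw bytes (e.g. image data) must be written in binary mode.
        with open(path, "wb") as fh:
            fh.write(payload)
    return path


# Stand-in values: in the question these would be the real base_dir
# and the string returned by RF_model.toDebugString().
base_dir = tempfile.mkdtemp()
debug_string = "Tree 0:\n  If (feature 0 <= 64.0)\n    Predict: 1.0\n"

text_path = save_locally(os.path.join(base_dir, "RF_model.txt"), debug_string)
blob_path = save_locally(os.path.join(base_dir, "blob.bin"), b"\xff\xd8\xff\xe0")
```

Note that this writes to the machine where the driver runs, not to HDFS, so it only fits the local file system case.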