Writing arbitrary files to HDFS - PySpark

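I haven't seen any examples of how to do this. I'm using PySpark 2.0 in a Python 3 environment. I have arbitrary data: binary data, .jpg data, random strings. I just need to write the data back to the underlying storage.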

For example:

import os 
with open(os.path.join(base_dir, 'RF_model.txt'), "w") as file1: 
    toFile = raw_input(RF_model.toDebugString()) 
    file1.write(toFile)

(The above does not work.)

Thanks!

Edit: what RF_model.toDebugString() outputs:

Tree 0: 
    If (feature 0 <= 64.0) 
    If (feature 2 <= 212.0) 
     If (feature 3 <= 0.0) 
     If (feature 2 <= 154.0) 
     Predict: 1.0 
     Else (feature 2 > 154.0) 
     Predict: 1.0 
     Else (feature 3 > 0.0) 
     If (feature 2 <= 147.0) 
     Predict: 0.0 
     Else (feature 2 > 147.0) 
     Predict: 0.0 
    Else (feature 2 > 212.0) 
     If (feature 2 <= 375.0) 
     If (feature 3 <= 0.0) 
     Predict: 0.0 
     Else (feature 3 > 0.0) 
     Predict: 0.0 
     Else (feature 2 > 375.0) 
     If (feature 0 <= 22.0) 
     Predict: 0.0 
     Else (feature 0 > 22.0) 
     Predict: 0.0 
    Else (feature 0 > 64.0) 
    If (feature 2 <= 239.0) 
     If (feature 3 <= 0.0) 
     If (feature 2 <= 200.0) 
     Predict: 0.0 
     Else (feature 2 > 200.0) 
     Predict: 0.0 
     Else (feature 3 > 0.0) 
     If (feature 2 <= 124.0) 
     Predict: 0.0 
     Else (feature 2 > 124.0) 
     Predict: 0.0 
    Else (feature 2 > 239.0) 
     If (feature 2 <= 375.0) 
     If (feature 1 <= 67.0) 
     Predict: 0.0 
     Else (feature 1 > 67.0) 
     Predict: 0.0 
     Else (feature 2 > 375.0) 
     If (feature 1 <= 63.0) 
     Predict: 0.0 
     Else (feature 1 > 63.0) 
     Predict: 0.0 
    Tree 1: 
    If (feature 0 <= 64.0) 
    If (feature 2 <= 224.0) 
     If (feature 3 <= 0.0) 
     If (feature 2 <= 170.0) 
     Predict: 1.0 
     Else (feature 2 > 170.0) 
     Predict: 1.0 
     Else (feature 3 > 0.0) 
     If (feature 2 <= 158.0) 
     Predict: 0.0 
     Else (feature 2 > 158.0) 
     Predict: 0.0 
    Else (feature 2 > 224.0) 
     If (feature 2 <= 375.0) 
     If (feature 3 <= 0.0) 
     Predict: 0.0 
     Else (feature 3 > 0.0) 
     Predict: 0.0

Asked 2017-04-19 by David Crook

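What is `toFile = raw_input(RF_model.toDebugString())` supposed to store? `.toDebugString()` on an RDD returns a description of that RDD (RF_model) and its recursive dependencies, for debugging. – Pushkr

It's just a string; I've added it above. –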
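On the `raw_input` point raised in the comment above: `raw_input` exists only in Python 2 (Python 3 renamed it `input`), and it prompts on stdin rather than converting anything, so it is probably not needed here. Below is a minimal sketch of the plain write, assuming `toDebugString()` returns an ordinary string. The helper `write_text` and the stand-in `debug_string` are hypothetical names used only for illustration.

```python
import os
import tempfile


def write_text(base_dir, name, text):
    """Write `text` to base_dir/name on the local filesystem; return the path."""
    path = os.path.join(base_dir, name)
    with open(path, "w", encoding="utf-8") as fh:
        fh.write(text)
    return path


# Stand-in for RF_model.toDebugString() (assumed to return a str).
debug_string = "Tree 0:\n  If (feature 0 <= 64.0)\n   Predict: 1.0\n"

# A temporary directory plays the role of base_dir in this sketch.
out_dir = tempfile.mkdtemp()
saved_path = write_text(out_dir, "RF_model.txt", debug_string)
```

Note that this only writes to the local filesystem of the machine running the driver. It does not put anything on HDFS.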

I hope I'm right in assuming that you want to write the output of .toDebugString() to a text file.

In PySpark you can use .saveAsTextFile to save any parallelized data as a text file:

# Important step: first parallelize the data that you need to save
rdd = sc.parallelize([str(RF_model.toDebugString())])

# Then save it as a text file; use this form if the underlying storage is HDFS
rdd.saveAsTextFile('hdfs://' + base_dir + '/RF_model.txt')

Or, if you just want to save it on the local file system:

rdd.saveAsTextFile("file:///"+base_dir+"/RF_model.txt")
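The two calls above differ only in the URI scheme, so they can be wrapped as in the following sketch. It assumes `base_dir` is an absolute path and that `sc` is an already running SparkContext. The helper names `build_uri` and `save_string` are hypothetical and not part of PySpark.

```python
def build_uri(scheme, base_dir, name):
    """Join a scheme, an absolute directory and a file name into one URI."""
    # Normalise slashes so the result always has the form scheme:///dir/name.
    path = "/" + base_dir.strip("/") + "/" + name
    return scheme + "://" + path


def save_string(sc, text, uri):
    """Save one string through Spark; `sc` is an existing SparkContext."""
    # A single partition keeps the output in one part file.
    sc.parallelize([text], 1).saveAsTextFile(uri)
    return uri


hdfs_uri = build_uri("hdfs", "/user/me/models", "RF_model.txt")
local_uri = build_uri("file", "/tmp/models/", "RF_model.txt")
```

With a live context, usage would look like `save_string(sc, RF_model.toDebugString(), hdfs_uri)`. Keep in mind that `saveAsTextFile` creates a directory at the given path (here one named `RF_model.txt`) containing part files rather than a single file, and it raises an error if that path already exists.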

Answered 2017-04-19 20:31:40 by Pushkr
