我是zeppelin的新手。我有一個用例,其中我有一個熊貓數據框。我需要使用內置的zeppelin圖表可視化集合,但我沒有明確的方法。我對zeppelin的理解是,如果數據是RDD格式,我們可以將數據可視化。所以,我想將pandas數據框轉換爲spark數據框,然後執行一些查詢(使用sql),我將可視化。 首先,我試圖轉換大熊貓數據幀引發的,但我失敗了將熊貓數據框轉換爲zeppelin中的火花數據框
%pyspark
import pandas as pd
from pyspark.sql import SQLContext
print sc
df = pd.DataFrame([("foo", 1), ("bar", 2)], columns=("k", "v"))
print type(df)
print df
sqlCtx = SQLContext(sc)
sqlCtx.createDataFrame(df).show()
而且我得到了下面的錯誤
Traceback (most recent call last): File "/tmp/zeppelin_pyspark.py",
line 162, in <module> eval(compiledCode) File "<string>",
line 8, in <module> File "/home/bala/Software/spark-1.5.0-bin-hadoop2.6/python/pyspark/sql/context.py",
line 406, in createDataFrame rdd, schema = self._createFromLocal(data, schema) File "/home/bala/Software/spark-1.5.0-bin-hadoop2.6/python/pyspark/sql/context.py",
line 322, in _createFromLocal struct = self._inferSchemaFromList(data) File "/home/bala/Software/spark-1.5.0-bin-hadoop2.6/python/pyspark/sql/context.py",
line 211, in _inferSchemaFromList schema = _infer_schema(first) File "/home/bala/Software/spark-1.5.0-bin-hadoop2.6/python/pyspark/sql/types.py",
line 829, in _infer_schema raise TypeError("Can not infer schema for type: %s" % type(row))
TypeError: Can not infer schema for type: <type 'str'>
是否有人可以幫助我在這裏?另外,糾正我,如果我錯了任何地方。