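PySpark groupBy count fails with show method
I have a problem with my df, running Spark 2.1.0. It has several string columns created from a SQL query against a Hive DB, and .summary() gives:
DataFrame[summary: string, visitorid: string, eventtype: string, ..., target: string]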
If I only run df.groupBy("eventtype").count(), it works, and I get DataFrame[eventtype: string, count: bigint].
When I add show and run df.groupBy('eventtype').count().show(), I keep getting:
Traceback (most recent call last):
File "/tmp/zeppelin_pyspark-9040214714346906648.py", line 267, in <module>
raise Exception(traceback.format_exc())
Exception: Traceback (most recent call last):
File "/tmp/zeppelin_pyspark-9040214714346906648.py", line 265, in <module>
exec(code)
File "<stdin>", line 1, in <module>
File "/usr/lib/spark/python/pyspark/sql/dataframe.py", line 318, in show
print(self._jdf.showString(n, 20))
File "/usr/lib/spark/python/lib/py4j-0.10.4-src.zip/py4j/java_gateway.py", line 1133, in __call__
answer, self.gateway_client, self.target_id, self.name)
File "/usr/lib/spark/python/pyspark/sql/utils.py", line 63, in deco
return f(*a, **kw)
File "/usr/lib/spark/python/lib/py4j-0.10.4-src.zip/py4j/protocol.py", line 319, in get_return_value
format(target_id, ".", name), value)
Py4JJavaError: An error occurred while calling o4636.showString.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 633.0 failed 4 times, most recent failure: Lost task 0.3 in stage 633.0 (TID 19944, ip-172-31-28-173.eu-west-1.compute.internal, executor 440): java.lang.NullPointerException
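I have no idea what is wrong with the show method. It fails for the other columns as well, not only for the target column that I created. The cluster admin could not help me either.
Many thanks for any pointers.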
I assume you are using Zeppelin. Does `z.show(df.groupBy('eventtype').count())` work?
Yes, I am using Zeppelin - interesting idea! It raises a slightly different error: `Py4JJavaError: An error occurred while calling z:org.apache.zeppelin.spark.ZeppelinContext.showDF. : org.apache.zeppelin.interpreter.InterpreterException: java.lang.reflect.InvocationTargetException`. Should I edit my question and add the whole error message?