2016-12-14

Error calling plotly's init_notebook_mode in Jupyter (Apache Toree PySpark)

I am running Jupyter (v4.2.1) with the Apache Toree PySpark kernel. When I try to call plotly's init_notebook_mode function, I get the following error:

import numpy as np 
import pandas as pd 

import plotly.plotly as py 
import plotly.graph_objs as go 
from plotly import tools 
from plotly.offline import iplot, init_notebook_mode 
init_notebook_mode() 

Error:

Name: org.apache.toree.interpreter.broker.BrokerException 
Message: Traceback (most recent call last): 
    File "/tmp/kernel-PySpark-6415c581-01c4-4c90-b4d9-81773c2bc03f/pyspark_runner.py", line 134, in <module> 
    eval(compiled_code) 
    File "<string>", line 7, in <module> 
    File "/usr/local/lib/python3.4/dist-packages/plotly/offline/offline.py", line 151, in init_notebook_mode 
    display(HTML(script_inject)) 
    File "/usr/local/lib/python3.4/dist-packages/IPython/core/display.py", line 158, in display 
    format = InteractiveShell.instance().display_formatter.format 
    File "/usr/local/lib/python3.4/dist-packages/traitlets/config/configurable.py", line 412, in instance 
    inst = cls(*args, **kwargs) 
    File "/usr/local/lib/python3.4/dist-packages/IPython/core/interactiveshell.py", line 499, in __init__ 
    self.init_io() 
    File "/usr/local/lib/python3.4/dist-packages/IPython/core/interactiveshell.py", line 658, in init_io 
    io.stdout = io.IOStream(sys.stdout) 
    File "/usr/local/lib/python3.4/dist-packages/IPython/utils/io.py", line 34, in __init__ 
    raise ValueError("fallback required, but not specified") 
ValueError: fallback required, but not specified 

StackTrace: org.apache.toree.interpreter.broker.BrokerState$$anonfun$markFailure$1.apply(BrokerState.scala:140) 
org.apache.toree.interpreter.broker.BrokerState$$anonfun$markFailure$1.apply(BrokerState.scala:140) 
scala.Option.foreach(Option.scala:236) 
org.apache.toree.interpreter.broker.BrokerState.markFailure(BrokerState.scala:139) 
sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) 
java.lang.reflect.Method.invoke(Method.java:498) 
py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231) 
py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:381) 
py4j.Gateway.invoke(Gateway.java:259) 
py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133) 
py4j.commands.CallCommand.execute(CallCommand.java:79) 
py4j.GatewayConnection.run(GatewayConnection.java:209) 
java.lang.Thread.run(Thread.java:745) 

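I could not find any information about this on the web. When I traced the failure into the code, in io.py under IPython utils, I found that the stream being passed in must have both a write and a flush attribute. For some reason, the stream passed in this case, sys.stdout, has only the write attribute and not the flush attribute.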
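To make the check described above concrete, here is a minimal sketch of it. This is a simplified stand-in and not the actual IPython source: `WriteOnlyStream`, `missing_stream_methods`, and `check_stream` are hypothetical names introduced only for illustration. Only the required attributes (write and flush) and the error message are taken from the traceback and the description above.

```python
import sys


class WriteOnlyStream:
    """Hypothetical stand-in for a replaced sys.stdout that has no flush()."""

    def __init__(self):
        self.chunks = []

    def write(self, text):
        self.chunks.append(text)


def missing_stream_methods(stream, required=("write", "flush")):
    """Return the names of the required attributes that the stream lacks."""
    return [name for name in required if not hasattr(stream, name)]


def check_stream(stream, fallback=None):
    """Simplified imitation of the check: use the fallback, or fail without one."""
    if missing_stream_methods(stream):
        if fallback is None:
            raise ValueError("fallback required, but not specified")
        return fallback
    return stream


# Diagnostic: run this in the failing kernel to see what sys.stdout lacks there.
print(missing_stream_methods(sys.stdout))
```

Running the last line in a cell of the affected kernel should show which attributes, if any, the kernel's sys.stdout is missing.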


Does [this](https://github.com/ipython/ipython/issues/9300) link help? It describes an error where the 'IOStream' object has no 'flush' attribute, which seems to be the root cause here as well.

Answer


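I believe this happens because plotly's notebook mode assumes it is running inside an IPython Jupyter kernel, which handles the communication with the notebook; you can see in the stack trace that it tries to call into the IPython package.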

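Toree, however, is a different Jupyter kernel, with its own protocol handling for communicating with the notebook server. Even when you use Toree to run a PySpark interpreter, you get a "plain" PySpark (just as when you start it from a shell), and Toree drives that interpreter's input and output.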

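As a result, the IPython machinery is never set up, and calling init_notebook_mode() in that environment fails, just as it would in a PySpark started directly from a shell, which knows nothing about notebooks.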
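If the same notebook code has to run under both kinds of kernel, one possible guard is to check for an initialized IPython shell before touching plotly's notebook mode. This is only a sketch: `running_under_ipython` and `safe_init_notebook_mode` are hypothetical helper names, and it assumes that `IPython.get_ipython()` returns None when no IPython shell has been set up, which is what I would expect in a Toree-driven PySpark interpreter.

```python
def running_under_ipython():
    """Return True only if an IPython shell has already been initialized."""
    try:
        from IPython import get_ipython
    except ImportError:
        # IPython is not installed at all, so there is no shell to talk to.
        return False
    return get_ipython() is not None


def safe_init_notebook_mode():
    """Call plotly's init_notebook_mode() only when an IPython shell exists.

    Returns True if notebook mode was initialized, False if it was skipped.
    """
    if not running_under_ipython():
        return False
    # Imported lazily so that the guard itself works without plotly installed.
    from plotly.offline import init_notebook_mode
    init_notebook_mode()
    return True
```

This does not make plots appear under Toree; it only turns the BrokerException into a skipped call that the code can react to.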

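As far as I know, there is currently no way to get plotly output from a PySpark session that runs through Toree; we recently faced the same problem. Instead of running Python through Toree, you need to run an IPython kernel, import the PySpark libraries into it, and connect to the Spark cluster. See https://github.com/jupyter/docker-stacks/tree/master/pyspark-notebook for a Dockerized example.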
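A rough sketch of that setup, to be run in a regular IPython (Python 3) kernel with pyspark and plotly installed, might look like the following. The helper names, the `local[*]` master URL, the application name, and the word-count example are all assumptions for illustration; replace the master URL with the one for your cluster.

```python
def make_spark_context(master="local[*]", app_name="plotly-notebook"):
    """Create or reuse a SparkContext from inside an IPython kernel.

    The master URL and application name are placeholder values.
    """
    from pyspark import SparkConf, SparkContext
    conf = SparkConf().setMaster(master).setAppName(app_name)
    return SparkContext.getOrCreate(conf)


def split_pairs(pairs):
    """Split collected (label, value) pairs into x and y lists for plotting."""
    labels = [label for label, _ in pairs]
    values = [value for _, value in pairs]
    return labels, values


def plot_word_counts(sc, words):
    """Count words on Spark, collect the result to the driver, plot with plotly."""
    import plotly.graph_objs as go
    from plotly.offline import init_notebook_mode, iplot

    counts = (
        sc.parallelize(words)
        .map(lambda word: (word, 1))
        .reduceByKey(lambda a, b: a + b)
        .collect()
    )
    labels, values = split_pairs(sorted(counts))
    init_notebook_mode()  # works here because an IPython kernel is running
    iplot([go.Bar(x=labels, y=values)])
```

In a notebook cell you would then call something like `plot_word_counts(make_spark_context(), ["a", "b", "a"])`. The point is that the aggregation runs on Spark, while the plotting happens on the driver, inside a kernel that plotly knows how to talk to.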