python
  • python-3.x
  • hadoop
  • hive
  • cloudera
  • 2017-08-27 66 views 11 likes 

    Unable to connect to Hive2 using Python

    I am trying to connect to Hive2 using Python with the following code:

    import pyhs2

    with pyhs2.connect(host='localhost',
                       port=10000,
                       authMechanism="PLAIN",
                       user='root',
                       password='test',
                       database='default') as conn:
        with conn.cursor() as cur:
            # Show databases
            print(cur.getDatabases())

            # Execute query
            cur.execute("select * from table")

            # Return column info from query
            print(cur.getSchema())

            # Fetch table results
            for i in cur.fetch():
                print(i)
    

    I get the following error:

    File "C:\Users\vinbhask\AppData\Roaming\Python\Python36\site-packages\pyhs2-0.6.0-py3.6.egg\pyhs2\connections.py", line 7, in <module>
        from cloudera.thrift_sasl import TSaslClientTransport
    ModuleNotFoundError: No module named 'cloudera'
    

    I tried the suggestions here and here, but the problem was not solved.

    Here are the packages installed so far:

    bitarray 0.8.1, certifi 2017.7.27.1, chardet 3.0.4, cm-api 16.0.0, cx-Oracle 6.0.1, future 0.16.0, idna 2.6, impyla 0.14.0, JayDeBeApi 1.1.1, JPype1 0.6.2, ply 3.10, pure-sasl 0.4.0, PyHive 0.4.0, pyhs2 0.6.0, pyodbc 4.0.17, requests 2.18.4, sasl 0.2.1, six 1.10.0, teradata 15.10.0.21, thrift 0.10.0, thrift-sasl 0.2.1, thriftpy 0.3.9, urllib3 1.22


    Error when using Impyla:

    Traceback (most recent call last):
      File "C:\Users\xxxxx\AppData\Local\Programs\Python\Python36-32\Scripts\HiveConnTester4.py", line 1, in <module>
        from impala.dbapi import connect
      File "C:\Users\xxxxx\AppData\Local\Programs\Python\Python36-32\lib\site-packages\impala\dbapi.py", line 28, in <module>
        import impala.hiveserver2 as hs2
      File "C:\Users\xxxxx\AppData\Local\Programs\Python\Python36-32\lib\site-packages\impala\hiveserver2.py", line 33, in <module>
        from impala._thrift_api import (
      File "C:\Users\xxxxx\AppData\Local\Programs\Python\Python36-32\lib\site-packages\impala\_thrift_api.py", line 74, in <module>
        include_dirs=[thrift_dir])
      File "C:\Users\xxxxx\AppData\Local\Programs\Python\Python36-32\lib\site-packages\thriftpy\parser\__init__.py", line 30, in load
        include_dir=include_dir)
      File "C:\Users\xxxxx\AppData\Local\Programs\Python\Python36-32\lib\site-packages\thriftpy\parser\parser.py", line 496, in parse
        url_scheme))
    thriftpy.parser.exc.ThriftParserError: ThriftPy does not support generating module with path in protocol 'c'
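
The 'c' in that last message looks like the drive letter of the Windows install path. A plausible explanation, assuming thriftpy classifies the path of the .thrift file with a URL parser (its internals are not shown in this thread), is that a path such as `C:\...` gets read as a URL whose scheme is `c`. The sketch below only demonstrates that standard-library behaviour; both paths are made-up examples.

```python
from urllib.parse import urlparse

# Hypothetical paths, similar in shape to the ones in the traceback.
windows_path = r"C:\Users\someone\site-packages\impala\thrift\ImpalaService.thrift"
posix_path = "/usr/lib/python3/site-packages/impala/thrift/ImpalaService.thrift"

# Everything before the first colon is taken as the URL scheme,
# so the drive letter ends up there (lower-cased).
windows_scheme = urlparse(windows_path).scheme
posix_scheme = urlparse(posix_path).scheme

print(repr(windows_scheme))
print(repr(posix_scheme))
```

If that is indeed the cause, the error comes from running on a Windows drive path rather than from the Hive connection settings.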
    
    +0

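    I'm surprised that so many people are suddenly complaining about PyHive (which is currently broken *[Aug 2017]*) and PyHS2 (which clearly doesn't work for you). Please try ImPyla. It is maintained by Cloudera. It works. –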

    +0

    @SamsonScharfrichter: I have also tried Impyla; I've updated the question with the error log above. – Vinod

    +0

    How about PySpark? –

    Answers

    1

    thrift_sasl.py tries to import cStringIO, which is no longer available in Python 3. Try with Python 2?
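
For context, the usual way to make such an import work on both Python 2 and Python 3 is a fallback import. This is a minimal sketch of that pattern, not the actual thrift_sasl code; the alias `BufferIO` is a name chosen here for illustration.

```python
# cStringIO existed only in Python 2; io.BytesIO is the Python 3 counterpart
# for in-memory byte buffers.
try:
    from cStringIO import StringIO as BufferIO  # Python 2
except ImportError:
    from io import BytesIO as BufferIO  # Python 3

# Both classes expose the same write()/getvalue() interface for bytes.
buf = BufferIO()
buf.write(b"hello ")
buf.write(b"hive")
payload = buf.getvalue()
print(payload)
```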

    +0

    We have a requirement to use Python 3+. – Vinod

    +0

    pyhs2 is no longer maintained. Have you tried PyHive? – Xire
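
For anyone who wants to try the PyHive route, here is a rough sketch of the equivalent of the question's code. It assumes PyHive and its SASL dependencies import cleanly in your environment, which this thread suggests is not guaranteed; `hive_params` and `run_query` are hypothetical helper names, and the host, port, user and database simply mirror the question.

```python
def hive_params(host="localhost", port=10000, username="root", database="default"):
    """Collect connection settings (defaults mirror the question's example)."""
    return {"host": host, "port": port, "username": username, "database": database}


def run_query(query, params):
    """Run one query through PyHive and return all rows."""
    # Imported lazily so this file can be loaded without PyHive installed.
    from pyhive import hive

    conn = hive.Connection(**params)
    try:
        cursor = conn.cursor()
        cursor.execute(query)
        return cursor.fetchall()
    finally:
        conn.close()

# Example (needs PyHive and a reachable HiveServer2):
# for row in run_query("SHOW DATABASES", hive_params()):
#     print(row)
```

No password is passed here; whether one is needed depends on how authentication is configured on the server.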

    1

    You may need to install an unreleased version of thrift_sasl. Try:

    pip install git+https://github.com/cloudera/thrift_sasl 
    
    +0

    @Vinod did that help? – Tagar

    +0

    No, I got this error: "Failed to connect to github.com port 443: Timed out" – Vinod

    +0

    That last error suggests you are behind a firewall, which is why your access to port 443 times out. Change 'https:' to 'http:' and retry; port 80 may be open. – Tagar
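
To check which ports are actually reachable from your machine before retrying pip, a small TCP probe can help. This is a minimal sketch using only the standard library; `can_connect` is a hypothetical helper name and the 5-second timeout is an arbitrary choice.

```python
import socket


def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and DNS failures.
        return False

# Example (needs network access):
# for port in (443, 80):
#     print(port, can_connect("github.com", port))
```

A blocked port typically shows up as False after the full timeout, while a closed but reachable port fails almost immediately.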

    0

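    If you are comfortable learning PySpark, then you just need to set the hive.metastore.uris property to point to the Hive Metastore address, and you are ready to go.

    The easiest way is to export hive-site.xml from the cluster and then pass --files hive-site.xml when submitting the job.

    (I haven't tried running standalone PySpark, so YMMV.)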
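
As an alternative to shipping hive-site.xml, the property can also be set in code when the session is built. The following is a sketch under assumptions: `metastore_uri` and `hive_session` are hypothetical helper names, the host is a placeholder, and 9083 is only the usual default metastore port, so check your cluster's actual value.

```python
def metastore_uri(host, port=9083):
    """Build a Thrift URI for a Hive Metastore (9083 is the usual default port)."""
    return "thrift://{}:{}".format(host, port)


def hive_session(uri, app_name="hive-from-pyspark"):
    """Create a SparkSession with Hive support pointed at the given metastore."""
    # Imported lazily so the helper above works without PySpark installed.
    from pyspark.sql import SparkSession

    return (SparkSession.builder
            .appName(app_name)
            .config("hive.metastore.uris", uri)
            .enableHiveSupport()
            .getOrCreate())

# Example (needs PySpark and a reachable metastore; the host is a placeholder):
# spark = hive_session(metastore_uri("metastore.example.com"))
# spark.sql("SHOW DATABASES").show()
```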
