
Python 2/Python 3: Collected Problems Connecting to Hive/Impala

Published: 2023-12-09 22:36:27

At present PyHive and impyla are incompatible: a single Python environment cannot use both libraries.

Recommendation: use impyla.


Connecting to Impala

Source code of the connect function: https://github.com/cloudera/impyla/blob/master/impala/dbapi.py

Example:

from impala.dbapi import connect
# Two alternatives, use one: username/password (PLAIN) or Kerberos (GSSAPI)
mcon = connect(host='bd-slave01-pe2.f-pro.cn', port=21050, user='username', password='password', auth_mechanism='PLAIN')
mcon = connect(host='bd-slave01-pe2.f-pro.cn', port=21050, user='username', password='password', auth_mechanism='GSSAPI')

If the VM has Kerberos credentials, you can use either auth_mechanism='GSSAPI' or auth_mechanism='PLAIN'.

If it has no Kerberos credentials, use auth_mechanism='PLAIN'.
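The rule above can be written as a small helper that builds the keyword arguments for impyla's `connect`. This is a sketch under assumptions: the helper names `has_kerberos_ticket` and `impala_connect_kwargs` are hypothetical, and the ticket check relies on MIT Kerberos `klist -s`, which exits with status 0 when a valid ticket is cached. The password is omitted for GSSAPI because Kerberos does not use it.

```python
import shutil
import subprocess


def has_kerberos_ticket():
    """Return True if `klist -s` reports a valid cached ticket (MIT Kerberos).

    Hypothetical helper: if klist is not installed, assume there is no ticket.
    """
    if shutil.which("klist") is None:
        return False
    return subprocess.run(["klist", "-s"]).returncode == 0


def impala_connect_kwargs(host, port, user, password, kerberos):
    """Build keyword arguments for impala.dbapi.connect (hypothetical helper).

    With a Kerberos ticket use GSSAPI; otherwise fall back to PLAIN
    (username/password), as recommended in the text.
    """
    kwargs = {"host": host, "port": port, "user": user}
    if kerberos:
        kwargs["auth_mechanism"] = "GSSAPI"
    else:
        kwargs["auth_mechanism"] = "PLAIN"
        kwargs["password"] = password
    return kwargs
```

Usage: `mcon = connect(**impala_connect_kwargs('bd-slave01-pe2.f-pro.cn', 21050, 'username', 'password', has_kerberos_ticket()))`.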

Also, the one-line shell command for connecting to impalad:

# run kinit first to obtain a Kerberos ticket
impala-shell -u username -k
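To script the command above from Python, one option is to assemble the impala-shell arguments and run them with `subprocess`. A sketch, assuming `kinit` has already been run; the helper names are hypothetical, and only the standard impala-shell flags `-k` (Kerberos), `-u` (user), `-i` (impalad host:port) and `-q` (run one query and exit) are used.

```python
import subprocess


def impala_shell_cmd(user, impalad=None, query=None):
    """Build an impala-shell command line that authenticates with Kerberos (-k).

    Hypothetical helper; assumes a ticket is already in the credentials cache.
    """
    cmd = ["impala-shell", "-k", "-u", user]
    if impalad:  # host:port of the impalad to connect to
        cmd += ["-i", impalad]
    if query:    # run a single query and exit
        cmd += ["-q", query]
    return cmd


def run_impala_shell(user, impalad=None, query=None):
    """Run impala-shell and return its stdout as text (raises on non-zero exit)."""
    result = subprocess.run(
        impala_shell_cmd(user, impalad, query),
        stdout=subprocess.PIPE,
        universal_newlines=True,
        check=True,
    )
    return result.stdout
```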

thrift_sasl releases: https://github.com/cloudera/thrift_sasl/releases

Connecting to Impala from Python 3 (the approach that works)

# Installation for Python 3.5.1 and 3.7
# Python 3.9 is not supported (the installation fails with an error).
sudo pip3 install impyla
sudo pip3 install thrift_sasl

pip3 install pure-sasl==0.5.1
pip3 install thrift-sasl==0.2.1 --no-deps
pip3 install thrift==0.9.3
pip3 install impyla==0.14.1
pip3 install bitarray==0.8.3
pip3 install thriftpy==0.3.9
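Because these pins only work as a set, it helps to verify what actually got installed. A minimal sketch, assuming Python 3.8+ (`importlib.metadata`), or the `importlib_metadata` backport package on older interpreters; `check_pins` and `PINS` are hypothetical names, and `PINS` simply copies the versions from the commands above.

```python
try:
    from importlib.metadata import version, PackageNotFoundError  # Python 3.8+
except ImportError:  # older interpreters: the importlib_metadata backport
    from importlib_metadata import version, PackageNotFoundError

# The pinned versions from the install commands above.
PINS = {
    "pure-sasl": "0.5.1",
    "thrift-sasl": "0.2.1",
    "thrift": "0.9.3",
    "impyla": "0.14.1",
    "bitarray": "0.8.3",
    "thriftpy": "0.3.9",
}


def check_pins(pins):
    """Return {package: (expected, installed)} for every pin that does not match.

    `installed` is None when the package is not installed at all.
    """
    mismatches = {}
    for name, expected in pins.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None
        if installed != expected:
            mismatches[name] = (expected, installed)
    return mismatches


if __name__ == "__main__":
    for name, (want, have) in sorted(check_pins(PINS).items()):
        print("{}: expected {}, found {}".format(name, want, have))
```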

# Fix for: TypeError: can't concat str to bytes

vi /opt/python3.5/lib/python3.5/site-packages/thrift_sasl/__init__.py

# Go to the last entry in the traceback: line 94 of __init__.py (mind the indentation)
header = struct.pack(">BI", status, len(body))
self._trans.write(header + body)

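Change it to:
if isinstance(body, str):  # Python 3: body may be a str; encode it to bytes first
    body = body.encode()
header = struct.pack(">BI", status, len(body))  # length now counts bytes, not characters
self._trans.write(header + body)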
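The framing that this patch touches can be reproduced in isolation: a 1-byte status and a 4-byte big-endian length (`struct.pack(">BI", ...)`) followed by the payload. Under Python 3, adding a `str` to `bytes` raises the TypeError above. The sketch below uses a hypothetical function name, not thrift_sasl's API, and encodes a `str` payload before computing the length, so the length field counts bytes even for non-ASCII text.

```python
import struct


def frame_sasl_message(status, body):
    """Frame a message as: 1-byte status, 4-byte big-endian length, payload.

    Mirrors the patched code; `body` may be str or bytes.
    """
    if isinstance(body, str):
        body = body.encode("utf-8")  # measure the length in bytes, not characters
    header = struct.pack(">BI", status, len(body))
    return header + body


if __name__ == "__main__":
    try:
        struct.pack(">BI", 1, 5) + "PLAIN"  # what the unpatched code effectively does
    except TypeError as exc:
        print("unpatched:", exc)
    print("patched:", frame_sasl_message(1, "PLAIN"))
```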

impyla       0.16.2
thrift       0.13.0
thrift-sasl  0.4.2
thriftpy     0.3.9
thriftpy2    0.4.11
pure-sasl    0.6.2

Problem 1:

from thriftpy.transport import TTransportException, TTransportBase, readall
ImportError: cannot import name 'TTransportException'
Problem 2:

'TSocket' object has no attribute 'isOpen' (bug): https://github.com/cloudera/impyla/issues/268

'TSaslClientTransport' object has no attribute 'readAll': https://github.com/dropbox/PyHive/issues/151

Solution:

https://github.com/dropbox/PyHive/commit/5322d8f1420b033ba7446449b5cca2cbf9f6fbc4

pip3 install git+https://github.com/cloudera/thrift_sasl

If you use impyla (module impala) and PyHive together, pay attention to the import order.

Connecting to Hive

Python library versions:

thrift       0.11.0
thrift-sasl  0.3.0 (not a release build: install it from the git URL above)
thriftpy     0.3.9
PyHive       0.6.1

Kerberos + LDAP authentication

from pyhive import hive
mcon=hive.connect(host='bd-master01-pe2.f.cn',port=10000,username='someone',password='password',auth='LDAP')
cs = mcon.cursor()
cs.execute('show databases')
print(cs.fetchall())
cs.close()
mcon.close()
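In the example above, if `execute` raises, the cursor and connection are never closed. One way to guarantee cleanup is `contextlib.closing`, which works with any DB-API 2.0 connection (PyHive's and impyla's both provide `cursor()` and `close()`). A sketch with a hypothetical helper name; the sqlite3 call is only a local smoke test, not part of the Hive setup.

```python
import sqlite3
from contextlib import closing


def run_query(connect_fn, sql, **connect_kwargs):
    """Open a DB-API connection, run one statement, and return all rows.

    closing() guarantees cursor.close() and conn.close() run even if
    execute() raises.
    """
    with closing(connect_fn(**connect_kwargs)) as conn:
        with closing(conn.cursor()) as cur:
            cur.execute(sql)
            return cur.fetchall()


if __name__ == "__main__":
    # Local smoke test with sqlite3; with Hive, pass hive.connect and its kwargs.
    print(run_query(sqlite3.connect, "select 1", database=":memory:"))
```

Usage with Hive: `run_query(hive.connect, 'show databases', host='bd-master01-pe2.f.cn', port=10000, username='someone', password='password', auth='LDAP')`.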

Kerberos authentication

from pyhive import hive
import pandas as pd
hcon = hive.connect(host='bd-master01-pe2.f.cn', port=10000, auth='KERBEROS', kerberos_service_name='hive')
hdata = pd.read_sql('show databases',hcon)
print(hdata)
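`pd.read_sql` works here, but recent pandas versions emit a warning when given a DB-API connection that is neither SQLAlchemy nor sqlite3. One alternative is to fetch rows through the cursor and take the column names from `cursor.description` (standard DB-API 2.0), then build the DataFrame yourself. A sketch; `fetch_with_columns` is a hypothetical name, and sqlite3 is used only as a local smoke test.

```python
import sqlite3
from contextlib import closing


def fetch_with_columns(conn, sql):
    """Run `sql` on a DB-API 2.0 connection and return (column_names, rows)."""
    with closing(conn.cursor()) as cur:
        cur.execute(sql)
        # The first field of each cursor.description entry is the column name.
        columns = [desc[0] for desc in cur.description]
        return columns, cur.fetchall()


if __name__ == "__main__":
    with closing(sqlite3.connect(":memory:")) as conn:
        print(fetch_with_columns(conn, "select 1 as a, 'x' as b"))
```

Usage with Hive: `columns, rows = fetch_with_columns(hcon, 'show databases')`, then `hdata = pd.DataFrame(rows, columns=columns)`.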

Prerequisites for installing impyla on Python 2:

sudo pip install --upgrade setuptools
sudo yum install -y gcc libffi-devel python-devel openssl-devel gcc-c++
sudo yum install python-devel openldap-devel

Installing PyHive on Python 2

Run the following command in a terminal:

pip install pyhive[hive]

Note the [hive] suffix: without it, some dependent packages are not installed and you get errors. I ran into the following error message:

ImportError: cannot import name TFrozenDict
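Before importing pyhive, a quick way to confirm the dependencies arrived is to try importing them one by one. A sketch: the module list `thrift`, `sasl`, `thrift_sasl` is an assumption about what the `[hive]` extra installs (check PyHive's setup.py for your version), and `missing_modules` is a hypothetical helper. It uses only `importlib.import_module`, so it runs on Python 2.7 as well as Python 3.

```python
import importlib


def missing_modules(names):
    """Return the subset of `names` that cannot be imported."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Assumption: modules that the pyhive[hive] extra is expected to provide.
    print(missing_modules(["thrift", "sasl", "thrift_sasl"]))
```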

impyla 0.14.2.2 requires thrift<=0.9.3, whereas PyHive 0.6.1 is not compatible with thrift 0.9.3; PyHive uses thrift 0.13.0.

impyla 0.14.2.2 has requirement thrift<=0.9.3, but you'll have thrift 0.13.0 which is incompatible.

So impyla and PyHive are incompatible: they cannot share one Python environment.
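The conflict comes down to a single version bound, thrift<=0.9.3. A minimal sketch of that check, assuming plain numeric dotted versions (real tools such as pip rely on the `packaging` library, which also handles pre-releases and other suffixes); the function names are hypothetical. It also shows why versions must be compared numerically rather than as strings.

```python
def parse_version(v):
    """Turn a dotted numeric version like '0.13.0' into a tuple of ints (0, 13, 0)."""
    return tuple(int(part) for part in v.split("."))


def satisfies_max(installed, max_allowed):
    """True if `installed` <= `max_allowed`, compared numerically part by part."""
    return parse_version(installed) <= parse_version(max_allowed)


if __name__ == "__main__":
    # impyla 0.14.2.2 requires thrift<=0.9.3; PyHive's environment has thrift 0.13.0.
    print(satisfies_max("0.13.0", "0.9.3"))  # False: the conflict pip reports
    print("0.13.0" <= "0.9.3")               # True: string comparison gets it wrong
```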
