Running a Hive query today failed with the following error:
distinct on different columns not supported with skew in data
It is caused by applying DISTINCT to more than one column in the same query, as in the following code:
select id,count(distinct col1) as cnt1,count(distinct col2) as cnt2
from table_name
group by id
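The error is tied to the Hive configuration property hive.groupby.skewindata.
In this environment the property was set to True (note that the stock Hive default is false).
While it is True, Hive does not support DISTINCT on different columns within one query.
So it is enough to put the statement set hive.groupby.skewindata=False; in front of the Hive query,
as in the following code: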
set hive.groupby.skewindata=False;
select id,count(distinct col1) as cnt1,count(distinct col2) as cnt2
from table_name
group by id
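
Turning hive.groupby.skewindata off also disables the skew handling it provides for GROUP BY. If that handling is still wanted, one possible alternative is to compute each distinct count in its own subquery and join the results on id, so that no single query applies DISTINCT to two different columns. This is only a sketch that reuses the placeholder names from the query above (table_name, id, col1, col2); the aliases a and b are made up for illustration, and it assumes id is never NULL, since rows with a NULL id would be dropped by the join.

```sql
-- Sketch, not a tested fix: leave hive.groupby.skewindata as it is and
-- split the two distinct counts into separate subqueries.
select a.id, a.cnt1, b.cnt2
from (
  -- distinct count over col1 only
  select id, count(distinct col1) as cnt1
  from table_name
  group by id
) a
join (
  -- distinct count over col2 only
  select id, count(distinct col2) as cnt2
  from table_name
  group by id
) b
on a.id = b.id;
```

The trade-off is that table_name is scanned twice and an extra join is needed, so for data without heavy skew the one-line set statement above is usually the simpler choice.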