
[Hive] distinct on different columns not supported with skew in data

Published: 2024-02-28 14:02:02

While running a Hive query today I hit the following error:

distinct on different columns not supported with skew in data

It is triggered by applying distinct to more than one column in the same query, as in the following code:

select id,count(distinct col1) as cnt1,count(distinct col2) as cnt2 
from table_name 
group by id

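The error is tied to the Hive configuration property hive.groupby.skewindata.
When hive.groupby.skewindata=true, Hive does not support distinct on different columns within one query. Note that the stock Hive default for this property is false, so if you see this error the property has been switched on somewhere, for example in hive-site.xml or earlier in the session.
The fix is simply to put set hive.groupby.skewindata=false; in front of the Hive statement, as in the following code: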

set hive.groupby.skewindata=False;
select id,count(distinct col1) as cnt1,count(distinct col2) as cnt2 
from table_name 
group by id
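
If the skew optimization needs to stay on for this job, one possible alternative is to split the query so that each query block contains a distinct on only one column, then join the results on id. The following is a minimal sketch under that assumption; it reuses the table and column names from the example above, the aliases a and b are illustrative, and it has not been verified against any particular Hive version.

```sql
-- Keep the skew optimization enabled; each subquery uses a single distinct column.
set hive.groupby.skewindata=true;

select a.id, a.cnt1, b.cnt2
from (
    -- distinct count of col1 per id
    select id, count(distinct col1) as cnt1
    from table_name
    group by id
) a
join (
    -- distinct count of col2 per id
    select id, count(distinct col2) as cnt2
    from table_name
    group by id
) b
on a.id = b.id;
```

This reads table_name twice, so it trades extra I/O for keeping the skew handling. Also note that the equality join drops rows whose id is NULL, which the original single query would have kept as one group; if that matters, Hive's null-safe comparison (a.id <=> b.id) may be an option.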