Hive Data Types, explode, and Custom UDFs


1. Primitive types
  [Figure: table of Hive primitive types; image not available]
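2. Complex types
  Hive's complex types are array, map, and struct; each is covered in its own section below.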

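3. Array type (array)
  Case 1
  Source data:
    100,200,300
    200,300,500
  Create-table statement: create external table ex (vals array<int>) row format delimited fields terminated by '\t' collection items terminated by ',' location '/ex';
  Queries:
    Number of elements in each row's array: select size(vals) from ex;
    select vals[0] from ex; returns the first element (index 0) of the array in every row, not the contents of the first row.
  Note: Hive's built-in functions cannot address a row by its position, so reading an array element from one specific row requires a custom function.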
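To make the per-row behaviour of size(vals) and vals[0] concrete, here is a minimal plain-Java sketch. It only imitates what Hive does when it reads the file; the class and method names (ArrayColumnSketch, parseRow) are made up for illustration and are not part of Hive.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java illustration only; Hive performs this parsing internally.
final class ArrayColumnSketch {

    // Split one line of the data file on ',' (the collection-item delimiter)
    // into the list of ints that the vals column would hold for that row.
    static List<Integer> parseRow(String line) {
        List<Integer> vals = new ArrayList<>();
        for (String item : line.split(",")) {
            vals.add(Integer.parseInt(item.trim()));
        }
        return vals;
    }

    public static void main(String[] args) {
        String[] lines = {"100,200,300", "200,300,500"};
        for (String line : lines) {
            List<Integer> vals = parseRow(line);
            // Both expressions are evaluated once for every row.
            System.out.println("size=" + vals.size() + ", vals[0]=" + vals.get(0));
        }
    }
}
```

Running main prints one line per input row (size=3, vals[0]=100 and then size=3, vals[0]=200), which is why vals[0] yields a value for each row rather than the first row of the table.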
  Case 2
  Source data (the two columns are separated by a tab):
    100,200,300 tom,jary
    200,300,500 rose,jack
  Create-table statement: create external table ex1 (info1 array<int>, info2 array<string>) row format delimited fields terminated by '\t' collection items terminated by ',' location '/ex';
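The statement above relies on two delimiter levels: a tab between columns and a comma between array items. The following plain-Java sketch shows that two-step split under the assumption that the file really is tab-separated; TwoArrayColumnsSketch and parseRow are hypothetical names, and the items are kept as strings for brevity (Hive would convert info1 to int).

```java
import java.util.Arrays;
import java.util.List;

// Plain-Java illustration of the two delimiter levels; not Hive code.
final class TwoArrayColumnsSketch {

    // Returns [info1, info2] for one line of the data file.
    static List<List<String>> parseRow(String line) {
        // Level 1: "fields terminated by '\t'" separates the columns.
        String[] fields = line.split("\t");
        // Level 2: "collection items terminated by ','" separates array items.
        List<String> info1 = Arrays.asList(fields[0].split(","));
        List<String> info2 = Arrays.asList(fields[1].split(","));
        return Arrays.asList(info1, info2);
    }

    public static void main(String[] args) {
        List<List<String>> row = parseRow("100,200,300\ttom,jary");
        System.out.println("info1=" + row.get(0) + ", info2=" + row.get(1));
    }
}
```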
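4. Map type (map)
  Case 1
  Source data:
    tom,23
    rose,25
    jary,28
  Create-table statement:
    create external table m1 (vals map<string,int>) row format delimited fields terminated by '\t' map keys terminated by ',' location '/map';
  Note: the column (field) delimiter must differ from the map-key delimiter. This example uses \t for columns because ',' already separates each key from its value.
  Query: select vals['tom'] from m1;
  Rows whose map has no 'tom' key return NULL.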
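As a rough picture of what the map lookup does, the sketch below parses each line into a one-entry map and looks up the key 'tom', the way vals['tom'] is evaluated per row. It is plain Java written for illustration; MapColumnSketch and parseRow are assumed names, not Hive APIs.

```java
import java.util.HashMap;
import java.util.Map;

// Plain-Java illustration of a map column with one key/value pair per row.
final class MapColumnSketch {

    // "map keys terminated by ','" splits each line into key and value.
    static Map<String, Integer> parseRow(String line) {
        Map<String, Integer> vals = new HashMap<>();
        String[] kv = line.split(",", 2);
        vals.put(kv[0], Integer.parseInt(kv[1].trim()));
        return vals;
    }

    public static void main(String[] args) {
        String[] lines = {"tom,23", "rose,25", "jary,28"};
        for (String line : lines) {
            // A missing key yields null here, matching the NULL Hive returns.
            System.out.println(parseRow(line).get("tom"));
        }
    }
}
```

Only the first row contains the key, so the lookup yields 23 for it and null for the other two rows.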
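5. Struct type (object-like type)
  Source data:
    tom 23
    rose 22
    jary 26
  Create-table statement:
    create external table s1 (vals struct<name:string,age:int>) row format delimited collection items terminated by ' ' location '/m1';
  The collection-items delimiter separates the struct's fields, so it must match the separator used in the data file (a space in the rows shown above).
  Query: select vals.age from s1 where vals.name='tom';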
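6. collect_set
  collect_set is an aggregate function: it removes duplicate values from a column (typically within each group of a GROUP BY) and returns the remaining values as an array.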
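The deduplicating behaviour can be pictured with a small plain-Java sketch. The input values are invented for illustration, CollectSetSketch and collectSet are hypothetical names, and the insertion order kept here is a choice of this sketch; do not rely on Hive returning the elements in any particular order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;

// Plain-Java illustration of collect_set: drop duplicates, return an array.
final class CollectSetSketch {

    static List<String> collectSet(List<String> columnValues) {
        // A set keeps one copy of each value; LinkedHashSet keeps first-seen order.
        return new ArrayList<>(new LinkedHashSet<>(columnValues));
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("tom", "rose", "tom", "jary");
        System.out.println(collectSet(names)); // prints [tom, rose, jary]
    }
}
```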
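7. explode
  explode takes an array and emits one output row per element, turning a single input row into multiple rows.
  Source data:
    100,200,300
    200,300,500
  Create the table: create external table ex1 (num string) location '/ex';
  Note: the table has a single column of type string. split only works on strings, and it produces the array that explode then expands.
  Split each row with explode: select explode(split(num,',')) from ex1;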
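The combined effect of split and explode can be sketched in plain Java as follows. This is only an imitation of the query's result, with ExplodeSketch and explodeSplit as made-up names; like Hive's split, Java's String.split treats the delimiter as a regular expression.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Plain-Java illustration of select explode(split(num, ',')) from ex1.
final class ExplodeSketch {

    static List<String> explodeSplit(List<String> rows, String delimiterRegex) {
        List<String> out = new ArrayList<>();
        for (String row : rows) {
            // split turns the string into an array;
            // explode emits one output row for every element of that array.
            out.addAll(Arrays.asList(row.split(delimiterRegex)));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList("100,200,300", "200,300,500");
        for (String value : explodeSplit(rows, ",")) {
            System.out.println(value);
        }
    }
}
```

The two input rows become six output rows: 100, 200, 300, 200, 300, 500.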
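8. UDF
  When Hive's built-in functions are not enough, you can write your own. Such a function is called a Hive user-defined function, or UDF.
  Steps:
    1. Create a new Java project and add the Hive libraries (the jars in Hive's lib directory) to it.
    2. Create a class that extends UDF.
    3. Write an evaluate method; its parameter and return types are up to you.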

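import org.apache.hadoop.hive.ql.exec.UDF;

// Converts the input string to upper case.
public class ToUpper extends UDF {
    public String evaluate(String str) {
        // NULL column values arrive as null; return null instead of throwing.
        if (str == null) {
            return null;
        }
        return str.toUpperCase();
    }
}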

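    4. So that MapReduce can process the values, use Hadoop's Text type (org.apache.hadoop.io.Text) in place of String.
    5. Package the class into a jar and upload it to the Linux machine.
    6. In the Hive CLI, register the jar with Hive: add jar /xxxx/xxxx.jar
    7. In the Hive CLI, give the UDF a name: create temporary function fname as '<fully qualified class name>'; A temporary function exists only for the current session.
    8. The function can then be used in HQL.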