Background
A note on getting Spark to connect to Hive and HBase a while back. The main thing is to keep the IP-to-hostname mapping consistent between the host machine and the virtual machine.
Steps
1. First, make sure the hostname mapped to the target IP is identical in the Windows hosts file, the CentOS hosts file, and the CentOS hostname file.
For example, if the IP I want to connect to is 192.168.57.141, then the relevant entry in C:\Windows\System32\drivers\etc\hosts on Windows is
192.168.57.141 scentos
The relevant content in /etc/hosts on the VM is (note: don't leave out the localhost lines below either, otherwise Windows still can't connect)
192.168.57.141 scentos
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
The relevant content in /etc/hostname on the VM is
scentos
Changes to these two files on the VM only take effect after a reboot; you can also change the hostname temporarily with the following command
[root@scentos spark-2.1.0]# hostname scentos
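Before moving on, it is worth double-checking that the files really agree. A minimal sketch of the consistency rule from step 1, with the hosts-file and hostname-file contents inlined as variables (on a real VM you would read them with `grep "$ip" /etc/hosts` and `cat /etc/hostname` instead):

```shell
# The rule from step 1 as a check: the hosts entry and /etc/hostname
# must name the same host for the same IP on every machine involved.
ip="192.168.57.141"
name="scentos"

hosts_entry="192.168.57.141 scentos"   # simulated /etc/hosts line
hostname_file="scentos"                # simulated /etc/hostname content

if [ "$hosts_entry" = "$ip $name" ] && [ "$hostname_file" = "$name" ]; then
    echo "hostname mapping is consistent"
else
    echo "mismatch: fix hosts/hostname before continuing" >&2
fi
```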
2. Copy hive-site.xml into the conf directory of the Spark installation and disable the Tez engine.
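If the copied hive-site.xml declares Tez as the execution engine, switching it back to MapReduce looks like the fragment below (the property name is standard Hive configuration; the rest of your hive-site.xml stays as-is):

```xml
<!-- In the copied hive-site.xml: make sure Hive does not request Tez.
     Spark SQL only needs the metastore settings from this file, and a
     Tez setting here can break the connection. -->
<property>
  <name>hive.execution.engine</name>
  <value>mr</value>
</property>
```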
3. Check Hive's SDS and DBS tables in MySQL. If data was previously stored in HDFS under localhost, change every localhost to the real IP:
mysql> update SDS set LOCATION=REPLACE (LOCATION,'hdfs://localhost:8020/user/hive/warehouse','hdfs://192.168.57.141:8020/user/hive/warehouse');
mysql> update DBS set DB_LOCATION_URI=REPLACE (DB_LOCATION_URI,'hdfs://localhost:8020/user/hive/warehouse','hdfs://192.168.57.141:8020/user/hive/warehouse');
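MySQL's REPLACE() in the two statements above is a plain substring substitution on each row's location URI. As a quick illustration of what one row goes through (the sample LOCATION value is hypothetical), the same transformation in Java:

```java
public class LocationRewrite {
    public static void main(String[] args) {
        // A sample SDS.LOCATION value as the metastore might have stored it
        String oldLocation = "hdfs://localhost:8020/user/hive/warehouse/profile";

        // Same effect as REPLACE(LOCATION, 'hdfs://localhost:8020/...',
        //                                  'hdfs://192.168.57.141:8020/...')
        String newLocation = oldLocation.replace(
                "hdfs://localhost:8020/user/hive/warehouse",
                "hdfs://192.168.57.141:8020/user/hive/warehouse");

        System.out.println(newLocation);
        // -> hdfs://192.168.57.141:8020/user/hive/warehouse/profile
    }
}
```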
4. Copy hive-site.xml, core-site.xml, and hdfs-site.xml into the resources directory of the IDEA project; as in step 2, disable the Tez engine in the copied hive-site.xml.
5. Add the dependencies to the pom file of the IDEA project
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-sql_2.11</artifactId>
    <version>2.1.0</version>
</dependency>
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-hive_2.11</artifactId>
    <version>2.1.0</version>
</dependency>
<dependency>
    <groupId>org.apache.hive</groupId>
    <artifactId>hive-hbase-handler</artifactId>
    <version>2.3.5</version>
</dependency>
6. After starting the Spark cluster, the Hive metastore, and the HBase cluster, write the following code in the project, replacing zookeeperIp, zookeeperPort, and hbaseMasterURL with your own ZooKeeper address, ZooKeeper port, and HBase master address
// ConsumerRecord here comes from the Kafka client library, which this
// project also depends on (registered with Kryo for serialization)
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.spark.SparkConf;
import org.apache.spark.serializer.KryoSerializer;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

SparkConf conf = new SparkConf()
        .setMaster("local[*]")
        .setAppName("ActionConsumer")
        .set("spark.serializer", KryoSerializer.class.getCanonicalName())
        .registerKryoClasses(new Class[]{ConsumerRecord.class})
        .set("spark.kryoserializer.buffer.max", "512m")
        // HBase connection settings
        .set("hbase.zookeeper.quorum", zookeeperIp)
        .set("hbase.zookeeper.property.clientPort", zookeeperPort)
        .set("hbase.master", hbaseMasterURL);

// enableHiveSupport() is what lets Spark SQL read the Hive metastore
SparkSession session = SparkSession.builder()
        .config(conf)
        .enableHiveSupport()
        .getOrCreate();

Dataset<Row> rawData = session.sql("select * from profile");
rawData.show();
Swap the argument of session.sql() for your own SQL statement, then compile and run.