
Spark SQL on Hive: installation issues explained

Published: 2016-05-05 10:55:59

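A default Spark installation ships an assembly jar without Hive support. The Spark documentation says: "Spark SQL also supports reading and writing data stored in Apache Hive. However, since Hive has a large number of dependencies, it is not included in the default Spark assembly." To run Spark SQL on Hive, you must compile the source of the same Spark version you are running, with the Hive dependencies bundled into the assembly. Then add the resulting jars to your existing Spark installation.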

1. Rebuild Spark from source, using the same version you are running

This article uses Hadoop 2.3.0-cdh5.1.2 and Spark 1.0.2.

The build here uses sbt; Maven works as well.

The build steps are as follows.

Edit spark1.0.2/project/SparkBuild.scala and set the following defaults:

```scala
val DEFAULT_HADOOP_VERSION = "2.3.0-cdh5.1.2"
val DEFAULT_YARN = true
val DEFAULT_HIVE = true
```
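If you would rather not edit SparkBuild.scala by hand, you can apply the same three changes with sed. The sketch below assumes each of the three `val` definitions sits on its own line, as in the snippet above. `set_spark_build_defaults` is a hypothetical helper name, not part of Spark. The Spark 1.0.x sbt build should also honor the `SPARK_HADOOP_VERSION`, `SPARK_YARN` and `SPARK_HIVE` environment variables, which would avoid touching the file at all. Check the comments in your copy of SparkBuild.scala before relying on that.

```shell
#!/usr/bin/env bash
# Sketch: apply the SparkBuild.scala edits with sed instead of by hand.
# Assumes each definition is a single line of the form `val NAME = value`.
# set_spark_build_defaults is a hypothetical helper, not part of Spark.

set_spark_build_defaults() {
  local build_file="$1"       # e.g. spark1.0.2/project/SparkBuild.scala
  local hadoop_version="$2"   # e.g. 2.3.0-cdh5.1.2
  # -i.bak keeps the original file as SparkBuild.scala.bak.
  # Each expression keeps the indentation and "val NAME = " prefix (\1)
  # and replaces only the value on the right-hand side.
  sed -i.bak \
    -e "s/^\([[:space:]]*val DEFAULT_HADOOP_VERSION = \).*/\1\"${hadoop_version}\"/" \
    -e 's/^\([[:space:]]*val DEFAULT_YARN = \).*/\1true/' \
    -e 's/^\([[:space:]]*val DEFAULT_HIVE = \).*/\1true/' \
    "$build_file"
}
```

For this article's setup, the call would be `set_spark_build_defaults spark1.0.2/project/SparkBuild.scala 2.3.0-cdh5.1.2`.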

Run the build:

```shell
sbt/bin/sbt spark1.0.2/assembly
```

Then wait for the build to finish; it takes quite a while.

When the build finishes, a new jar appears under spark-1.0.2/assembly/target/scala-2.10. In this article it is spark-assembly-1.0.2-hadoop2.3.0-cdh5.1.2.jar.

The dependency jars are also available in the source tree under spark-1.0.2/lib_managed/jars.

2. Configure the Spark SQL on Hive dependencies

First, start spark-shell to find out which jars are missing:

```shell
./spark-shell \
  --master yarn-client \
  --driver-class-path $(echo /opt/cloudera/parcels/CDH/lib/hadoop-yarn/*.jar |sed 's/ /:/g'):/opt/cloudera/parcels/CDH-5.1.2-1.cdh5.1.2.p0.3/lib/hadoop-hdfs/hadoop-hdfs-2.3.0-cdh5.1.2.jar
```
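The `$(echo … | sed 's/ /:/g')` idiom turns a glob of jars into a colon-separated classpath. Below is a slightly more defensive bash sketch. `jar_classpath` is a hypothetical helper name. Unlike the `echo | sed` version, it does not split paths that contain spaces. It also returns an empty string, rather than the literal pattern, when no jar matches.

```shell
#!/usr/bin/env bash
# Sketch: build a colon-separated classpath from every *.jar in a directory,
# equivalent in intent to $(echo DIR/*.jar | sed 's/ /:/g').
# jar_classpath is a hypothetical helper name, not part of Spark or CDH.

jar_classpath() {
  local dir="$1"
  local jars=()
  local jar
  # Without this check, an empty directory would yield the literal "DIR/*.jar";
  # collect only the paths that actually exist.
  for jar in "$dir"/*.jar; do
    if [ -e "$jar" ]; then
      jars+=("$jar")
    fi
  done
  # Join the array elements with ':' (IFS is local to this function).
  local IFS=':'
  echo "${jars[*]}"
}
```

With the paths from the command above, the option would become `--driver-class-path "$(jar_classpath /opt/cloudera/parcels/CDH/lib/hadoop-yarn):/opt/cloudera/parcels/CDH-5.1.2-1.cdh5.1.2.p0.3/lib/hadoop-hdfs/hadoop-hdfs-2.3.0-cdh5.1.2.jar"`.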

Then create a HiveContext in the shell:

```scala
val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc)
```

This fails with the following error:

```
val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc)
<console>:12: error: object hive is not a member of package org.apache.spark.sql
       val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc)
```

The error shows that the org.apache.spark.sql.hive package is missing, because the default assembly was built without Hive.

Put the newly built spark-assembly-1.0.2-hadoop2.3.0-cdh5.1.2.jar into the spark/assembly/lib directory.
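The article copies the jar by hand. If an older spark-assembly jar is already in that directory, it is probably safer to move it aside so the classpath does not pick up two different assemblies. That precaution is an assumption, not something the article states.

The sketch below moves any existing `spark-assembly*.jar` into a timestamped backup directory before copying the new jar in. `install_assembly` and the `backup-*` directory name are hypothetical. It also assumes the new jar lives outside the target directory, for example in assembly/target/scala-2.10 of the source tree.

```shell
#!/usr/bin/env bash
# Sketch: install a freshly built assembly jar into Spark's assembly/lib,
# first moving any existing spark-assembly*.jar into a backup directory
# so that only one assembly jar remains there.
# install_assembly and the backup-* naming are hypothetical.

install_assembly() {
  local new_jar="$1"   # e.g. .../assembly/target/scala-2.10/spark-assembly-1.0.2-hadoop2.3.0-cdh5.1.2.jar
  local lib_dir="$2"   # e.g. the spark/assembly/lib directory of the installation
  local backup_dir
  backup_dir="$lib_dir/backup-$(date +%Y%m%d%H%M%S)"
  local old

  if [ ! -f "$new_jar" ]; then
    echo "no such jar: $new_jar" >&2
    return 1
  fi

  # Move old assembly jars aside (the glob stays literal if nothing matches).
  for old in "$lib_dir"/spark-assembly*.jar; do
    [ -e "$old" ] || continue
    mkdir -p "$backup_dir"
    mv "$old" "$backup_dir"/
  done

  cp "$new_jar" "$lib_dir"/
}
```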

Run spark-shell again:

```shell
./spark-shell \
  --master yarn-client \
  --driver-class-path $(echo /opt/cloudera/parcels/CDH/lib/hadoop-yarn/*.jar |sed 's/ /:/g'):/opt/cloudera/parcels/CDH-5.1.2-1.cdh5.1.2.p0.3/lib/hadoop-hdfs/hadoop-hdfs-2.3.0-cdh5.1.2.jar
```
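Then run:

```scala
val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc)
```

This time the following error appears. Judging by the HiveContext.hql and DDLTask.showDatabases frames, it is raised when a statement such as show databases is run through sqlContext.hql: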

```
ERROR DDLTask: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.metastore.HiveMetaStoreClient
        at org.apache.hadoop.hive.ql.metadata.Hive.getAllDatabases(Hive.java:1074)
        at org.apache.hadoop.hive.ql.exec.DDLTask.showDatabases(DDLTask.java:2198)
        at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:328)
        at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:151)
        at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:65)
        at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1414)
        at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1192)
        at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1020)
        at org.apache.hadoop.hive.ql.Driver.run(Driver.java:888)
        at org.apache.spark.sql.hive.HiveContext.runHive(HiveContext.scala:189)
        at org.apache.spark.sql.hive.HiveContext.runSqlHive(HiveContext.scala:163)
        at org.apache.spark.sql.hive.execution.NativeCommand.sideEffectResult$lzycompute(NativeCommand.scala:35)
        at org.apache.spark.sql.hive.execution.NativeCommand.sideEffectResult(NativeCommand.scala:35)
        at org.apache.spark.sql.hive.execution.NativeCommand.execute(NativeCommand.scala:38)
        at org.apache.spark.sql.hive.HiveContext$QueryExecution.toRdd$lzycompute(HiveContext.scala:250)
        at org.apache.spark.sql.hive.HiveContext$QueryExecution.toRdd(HiveContext.scala:250)
        at org.apache.spark.sql.SchemaRDDLike$class.$init$(SchemaRDDLike.scala:58)
        at org.apache.spark.sql.SchemaRDD.<init>(SchemaRDD.scala:104)
        at org.apache.spark.sql.hive.HiveContext.hiveql(HiveContext.scala:75)
        at org.apache.spark.sql.hive.HiveContext.hql(HiveContext.scala:78)
        at $line9.$read$$iwC$$iwC$$iwC$$iwC.<init>(<console>:15)
        at $line9.$read$$iwC$$iwC$$iwC.<init>(<console>:20)
        at $line9.$read$$iwC$$iwC.<init>(<console>:22)
        at $line9.$read$$iwC.<init>(<console>:24)
        at $line9.$read.<init>(<console>:26)
        at $line9.$read$.<init>(<console>:30)
        at $line9.$read$.<clinit>(<console>)
        at $line9.$eval$.<init>(<console>:7)
        at $line9.$eval$.<clinit>(<console>)
        at $line9.$eval.$print(<console>)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:788)
        at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1056)
        at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:614)
        at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:645)
        at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:609)
        at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:796)
        at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:841)
        at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:753)
        at org.apache.spark.repl.SparkILoop.processLine$1(SparkILoop.scala:601)
        at org.apache.spark.repl.SparkILoop.innerLoop$1(SparkILoop.scala:608)
        at org.apache.spark.repl.SparkILoop.loop(SparkILoop.scala:611)
        at org.apache.spark.repl.SparkILoop$$anonfun$process$1.apply$mcZ$sp(SparkILoop.scala:936)
        at org.apache.spark.repl.SparkILoop$$anonfun$process$1.apply(SparkILoop.scala:884)
        at org.apache.spark.repl.SparkILoop$$anonfun$process$1.apply(SparkILoop.scala:884)
        at scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
        at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:884)
        at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:982)
        at org.apache.spark.repl.Main$.main(Main.scala:31)
        at org.apache.spark.repl.Main.main(Main.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:303)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:55)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.metastore.HiveMetaStoreClient
        at org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1212)
        at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.<init>(RetryingMetaStoreClient.java:62)
        at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:72)
        at org.apache.hadoop.hive.ql.metadata.Hive.createMetaStoreClient(Hive.java:2372)
        at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:2383)
        at org.apache.hadoop.hive.ql.metadata.Hive.getAllDatabases(Hive.java:1072)
        ... 59 more
Caused by: java.lang.reflect.InvocationTargetException
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
        at org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1210)
        ... 64 more
Caused by: javax.jdo.JDOFatalUserException: Class org.datanucleus.api.jdo.JDOPersistenceManagerFactory was not found.
NestedThrowables:
java.lang.ClassNotFoundException: org.datanucleus.api.jdo.JDOPersistenceManagerFactory
        at javax.jdo.JDOHelper.invokeGetPersistenceManagerFactoryOnImplementation(JDOHelper.java:1175)
        at javax.jdo.JDOHelper.getPersistenceManagerFactory(JDOHelper.java:808)
        at javax.jdo.JDOHelper.getPersistenceManagerFactory(JDOHelper.java:701)
        at org.apache.hadoop.hive.metastore.ObjectStore.getPMF(ObjectStore.java:275)
        at org.apache.hadoop.hive.metastore.ObjectStore.getPersistenceManager(ObjectStore.java:304)
        at org.apache.hadoop.hive.metastore.ObjectStore.initialize(ObjectStore.java:234)
        at org.apache.hadoop.hive.metastore.ObjectStore.setConf(ObjectStore.java:209)
        at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:73)
        at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
        at org.apache.hadoop.hive.metastore.RetryingRawStore.<init>(RetryingRawStore.java:64)
        at org.apache.hadoop.hive.metastore.RetryingRawStore.getProxy(RetryingRawStore.java:73)
        at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.newRawStore(HiveMetaStore.java:415)
        at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.getMS(HiveMetaStore.java:402)
        at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.createDefaultDB(HiveMetaStore.java:441)
        at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.init(HiveMetaStore.java:326)
        at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.<init>(HiveMetaStore.java:286)
        at org.apache.hadoop.hive.metastore.RetryingHMSHandler.<init>(RetryingHMSHandler.java:54)
        at org.apache.hadoop.hive.metastore.RetryingHMSHandler.getProxy(RetryingHMSHandler.java:59)
        at org.apache.hadoop.hive.metastore.HiveMetaStore.newHMSHandler(HiveMetaStore.java:4060)
        at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:121)
        ... 69 more
Caused by: java.lang.ClassNotFoundException: org.datanucleus.api.jdo.JDOPersistenceManagerFactory
        at scala.tools.nsc.interpreter.AbstractFileClassLoader.findClass(AbstractFileClassLoader.scala:83)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
        at java.lang.Class.forName0(Native Method)
        at java.lang.Class.forName(Class.java:247)
        at javax.jdo.JDOHelper$18.run(JDOHelper.java:2018)
        at javax.jdo.JDOHelper$18.run(JDOHelper.java:2016)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.jdo.JDOHelper.forName(JDOHelper.java:2015)
        at javax.jdo.JDOHelper.invokeGetPersistenceManagerFactoryOnImplementation(JDOHelper.java:1162)
```

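The root cause is a ClassNotFoundException for org.datanucleus.api.jdo.JDOPersistenceManagerFactory, so the DataNucleus jars are missing. They are in the source tree under spark-1.0.2/lib_managed/jars:

- datanucleus-api-jdo-3.2.1.jar
- datanucleus-core-3.2.2.jar
- datanucleus-rdbms-3.2.1.jar

Pass these three jars to spark-shell with the --jars option. The commands below load copies placed under /opt/cloudera/parcels/CDH/lib/spark/libs.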
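The `--jars` option takes a comma-separated list, so the three DataNucleus jars have to be joined with commas. Below is a minimal sketch that finds the api-jdo, core and rdbms jars in one directory and fails if any is missing. `datanucleus_jars` is a hypothetical helper name. It picks the first matching version of each module, so a directory holding several versions of the same jar would need a more careful selection.

```shell
#!/usr/bin/env bash
# Sketch: collect the three DataNucleus jars the Hive metastore client needs
# (api-jdo, core, rdbms) from one directory and join them with commas,
# the format expected by spark-shell's --jars option.
# datanucleus_jars is a hypothetical helper; it fails if any jar is missing.

datanucleus_jars() {
  local dir="$1"
  local result=""
  local module jar candidate
  for module in api-jdo core rdbms; do
    jar=""
    # First match for this module, e.g. datanucleus-core-3.2.2.jar.
    for candidate in "$dir"/datanucleus-"$module"-*.jar; do
      if [ -e "$candidate" ]; then
        jar="$candidate"
        break
      fi
    done
    if [ -z "$jar" ]; then
      echo "missing datanucleus-$module jar in $dir" >&2
      return 1
    fi
    # Append with a comma separator except before the first entry.
    result="${result:+$result,}$jar"
  done
  echo "$result"
}
```

With the directory used in the commands below, the option would read `--jars "$(datanucleus_jars /opt/cloudera/parcels/CDH/lib/spark/libs)"`.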

Run:

```shell
./spark-shell \
  --master yarn-client \
  --driver-class-path $(echo /opt/cloudera/parcels/CDH/lib/hadoop-yarn/*.jar |sed 's/ /:/g'):/opt/cloudera/parcels/CDH-5.1.2-1.cdh5.1.2.p0.3/lib/hadoop-hdfs/hadoop-hdfs-2.3.0-cdh5.1.2.jar \
  --jars /opt/cloudera/parcels/CDH/lib/spark/libs/datanucleus-api-jdo-3.2.1.jar,/opt/cloudera/parcels/CDH/lib/spark/libs/datanucleus-core-3.2.2.jar,/opt/cloudera/parcels/CDH/lib/spark/libs/datanucleus-rdbms-3.2.1.jar
```

Then run:

```scala
val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc)
sqlContext.hql("show databases").collect().foreach(println)
```

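The query ran but did not list the expected databases. hive-site.xml had not been picked up, so HiveContext was not talking to the cluster's Hive metastore.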

Copy hive-site.xml into the spark/conf directory.
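This step can also be scripted. The sketch below copies hive-site.xml from the Hive client configuration directory into Spark's conf directory and fails if the file does not exist. The default source directory /etc/hive/conf is an assumption: it is the usual Hive client config location on CDH, but check your cluster. `install_hive_site` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: make the Hive client configuration visible to Spark SQL by copying
# hive-site.xml into Spark's conf directory.
# Assumption: the Hive client config lives in /etc/hive/conf unless a second
# argument says otherwise. install_hive_site is a hypothetical helper name.

install_hive_site() {
  local spark_conf_dir="$1"                 # e.g. the spark/conf directory
  local hive_conf_dir="${2:-/etc/hive/conf}"
  local src="$hive_conf_dir/hive-site.xml"

  if [ ! -f "$src" ]; then
    echo "hive-site.xml not found in $hive_conf_dir" >&2
    return 1
  fi

  mkdir -p "$spark_conf_dir"
  cp "$src" "$spark_conf_dir/hive-site.xml"
}
```

For example, `install_hive_site "$SPARK_HOME/conf"` would do the copy, assuming SPARK_HOME points at the Spark directory used above.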

Then rerun spark-shell with the same options:

```shell
./spark-shell \
  --master yarn-client \
  --driver-class-path $(echo /opt/cloudera/parcels/CDH/lib/hadoop-yarn/*.jar |sed 's/ /:/g'):/opt/cloudera/parcels/CDH-5.1.2-1.cdh5.1.2.p0.3/lib/hadoop-hdfs/hadoop-hdfs-2.3.0-cdh5.1.2.jar \
  --jars /opt/cloudera/parcels/CDH/lib/spark/libs/datanucleus-api-jdo-3.2.1.jar,/opt/cloudera/parcels/CDH/lib/spark/libs/datanucleus-core-3.2.2.jar,/opt/cloudera/parcels/CDH/lib/spark/libs/datanucleus-rdbms-3.2.1.jar
```

Then run:

```scala
val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc)
sqlContext.hql("show databases").collect().foreach(println)
```

This time the databases are listed correctly, and the configuration is complete.