
Installing sqoop-1.4.6 and Solving Common Problems


Environment: Ubuntu 16.04, hadoop-2.7.3, hbase-1.3.0, zookeeper-3.4.10, hive-2.1.1


Download sqoop-1.4.6.bin__hadoop-2.0.4-alpha.tar.gz from the Apache Sqoop release archive.


Then extract it under the user's home directory:

ubuntuserver@Master:~$ tar -zxvf sqoop-1.4.6.bin__hadoop-2.0.4-alpha.tar.gz

Rename the extracted directory:

ubuntuserver@Master:~$ sudo mv sqoop-1.4.6.bin__hadoop-2.0.4-alpha sqoop-1.4.6

Move it to /usr/local:

ubuntuserver@Master:~$ sudo mv sqoop-1.4.6 /usr/local/

Edit the sqoop-env.sh file:

ubuntuserver@Master:/usr/local/sqoop-1.4.6/conf$ sudo cp sqoop-env-template.sh sqoop-env.sh

ubuntuserver@Master:/usr/local/sqoop-1.4.6/conf$ sudo vim sqoop-env.sh

Modify the contents as follows:

# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements.  See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License.  You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# included in all the hadoop scripts with source command
# should not be executable directly
# also should not be passed any arguments, since we need original $*

# Set Hadoop-specific environment variables here.

#Set path to where bin/hadoop is available
#export HADOOP_COMMON_HOME=
export HADOOP_COMMON_HOME=/usr/local/hadoop

#Set path to where hadoop-*-core.jar is available
#export HADOOP_MAPRED_HOME=
export HADOOP_MAPRED_HOME=/usr/local/hadoop

#set the path to where bin/hbase is available
#export HBASE_HOME=
export HBASE_HOME=/usr/local/hbase-1.3.0

#Set the path to where bin/hive is available
#export HIVE_HOME=
export HIVE_HOME=/usr/local/apache-hive-2.1.1-bin

#Set the path for where zookeper config dir is
#export ZOOCFGDIR=
export ZOOCFGDIR=/usr/local/zookeeper-3.4.10

Edit the environment variables:

ubuntuserver@Master:~$ sudo vim /etc/profile

Add the following lines:

export SQOOP_HOME=/usr/local/sqoop-1.4.6

export PATH=$SQOOP_HOME/bin:$PATH


Save the file and run source /etc/profile so the environment variables take effect.
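To confirm the new PATH entry works, you can run Sqoop's built-in version command as a quick sanity check (the warnings discussed in the next step may still be printed):

ubuntuserver@Master:~$ sqoop version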


Run the configure-sqoop command. If it reports errors like the following:

ubuntuserver@Master:~$ configure-sqoop
Warning: /usr/local/sqoop-1.4.6/../hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /usr/local/sqoop-1.4.6/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
ubuntuserver@Master:~$ 

Edit the configure-sqoop file under the $SQOOP_HOME/bin directory and comment out the following content (an alternative is shown after the snippet):


#if [ -z "${HCAT_HOME}" ]; then
#  if [ -d "/usr/lib/hive-hcatalog" ]; then
#    HCAT_HOME=/usr/lib/hive-hcatalog
#  elif [ -d "/usr/lib/hcatalog" ]; then
#    HCAT_HOME=/usr/lib/hcatalog
#  else
#    HCAT_HOME=${SQOOP_HOME}/../hive-hcatalog
#    if [ ! -d ${HCAT_HOME} ]; then
#       HCAT_HOME=${SQOOP_HOME}/../hcatalog
#    fi
#  fi
#fi
#if [ -z "${ACCUMULO_HOME}" ]; then
#  if [ -d "/usr/lib/accumulo" ]; then
#    ACCUMULO_HOME=/usr/lib/accumulo
#  else
#    ACCUMULO_HOME=${SQOOP_HOME}/../accumulo
#  fi
#fi

## Moved to be a runtime check in sqoop.
#if [ ! -d "${HCAT_HOME}" ]; then
#  echo "Warning: $HCAT_HOME does not exist! HCatalog jobs will fail."
#  echo 'Please set $HCAT_HOME to the root of your HCatalog installation.'
#fi
#if [ ! -d "${ACCUMULO_HOME}" ]; then
#  echo "Warning: $ACCUMULO_HOME does not exist! Accumulo imports will fail."
#  echo 'Please set $ACCUMULO_HOME to the root of your Accumulo installation.'
#fi
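Commenting out these checks only silences the warnings; Sqoop runs fine without HCatalog or Accumulo. Since the check shown above only tests whether the directories exist, an alternative sketch is to point the two variables at any existing directory in /etc/profile instead of editing the script:

# These are dummy values whose only purpose is to satisfy the directory check above.
export HCAT_HOME=/usr/local/sqoop-1.4.6
export ACCUMULO_HOME=/usr/local/sqoop-1.4.6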


Next, verify that Sqoop can actually be used. First run start-dfs.sh and start-yarn.sh to bring up the Hadoop cluster, then run the hive command to enter the Hive CLI.



At this point you can see the table to be exported (techbbs_2017_06_18), its structure, and its contents.
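The table can also be checked from the shell, for example (a sketch using the table name from this walkthrough; substitute your own):

hive -e 'show tables;'
hive -e 'describe techbbs_2017_06_18;'
hive -e 'select * from techbbs_2017_06_18 limit 10;'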

Then log in to MySQL and create the target database and table.
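A minimal sketch of that step (the column list below is hypothetical; the real columns and types must mirror the Hive table techbbs_2017_06_18):

mysql -u root -p -e "CREATE DATABASE IF NOT EXISTS techbbs;"
# Hypothetical columns: replace with the actual schema of the Hive table.
mysql -u root -p -e "CREATE TABLE techbbs.techbbs_logs_stat (
  logdate VARCHAR(10),
  pv INT,
  reguser INT
);"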


Verify that Sqoop can connect to MySQL by running:

sqoop list-databases -connect jdbc:mysql://localhost:3306 -username root -password 123456


Note: in my experience, the localhost in this connection string must match the Hive metastore configuration. If a local MySQL database is used as the metastore, write localhost; if a MySQL instance on a remote server serves as the Hive metastore, write that server's IP address instead.
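For example, if the Hive metastore lives in a MySQL instance on another host, the connection string would use that host's address (192.168.1.100 below is just a placeholder):

sqoop list-databases --connect jdbc:mysql://192.168.1.100:3306 --username root --password 123456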

The listed databases include techbbs, which shows the connection succeeded.


Now export the data from HDFS into MySQL by running the following command:

sqoop export --connect jdbc:mysql://localhost:3306/techbbs --username root --password 123456 --table techbbs_logs_stat --fields-terminated-by '\001' --export-dir '/user/hive/warehouse/techbbs_2017_06_18'

Note that this is a single command with no line breaks. The field terminator '\001' (Ctrl-A) is Hive's default field delimiter, which is why it is passed to --fields-terminated-by here.


The output shows that one record was successfully exported into MySQL. Now check in MySQL whether the data is consistent:
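A quick way to check is a query against the target table (a sketch; the database and table names follow the export command above):

mysql -u root -p -e "SELECT * FROM techbbs.techbbs_logs_stat;"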


You can see that the data in MySQL matches exactly what was queried from HDFS.

If the export fails with the following error:

java.util.ServiceConfigurationError: org.apache.hadoop.fs.FileSystem: Provider org.apache.hadoop.fs.s3a.S3AFileSystem could not be instantiated
    at java.util.ServiceLoader.fail(ServiceLoader.java:232)
    at java.util.ServiceLoader.access$100(ServiceLoader.java:185)
    at java.util.ServiceLoader$LazyIterator.nextService(ServiceLoader.java:384)
    at java.util.ServiceLoader$LazyIterator.next(ServiceLoader.java:404)
    at java.util.ServiceLoader$1.next(ServiceLoader.java:480)
    at org.apache.hadoop.fs.FileSystem.loadFileSystems(FileSystem.java:2631)
    at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2650)
    at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2667)
    at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:94)
    at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2703)
    at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2685)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:373)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:172)
    at org.apache.sqoop.mapreduce.ExportJobBase.getInputPath(ExportJobBase.java:223)
    at org.apache.sqoop.mapreduce.ExportJobBase.getInputFileType(ExportJobBase.java:505)
    at org.apache.sqoop.mapreduce.JdbcExportJob.configureInputFormat(JdbcExportJob.java:68)
    at org.apache.sqoop.mapreduce.ExportJobBase.runExport(ExportJobBase.java:424)
    at org.apache.sqoop.manager.SqlManager.exportTable(SqlManager.java:912)
    at org.apache.sqoop.tool.ExportTool.exportTable(ExportTool.java:81)
    at org.apache.sqoop.tool.ExportTool.run(ExportTool.java:100)
    at org.apache.sqoop.Sqoop.run(Sqoop.java:143)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
    at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:179)
    at org.apache.sqoop.Sqoop.runTool(Sqoop.java:218)
    at org.apache.sqoop.Sqoop.runTool(Sqoop.java:227)
    at org.apache.sqoop.Sqoop.main(Sqoop.java:236)
Caused by: java.lang.NoClassDefFoundError: com/amazonaws/AmazonServiceException
    at java.lang.Class.getDeclaredConstructors0(Native Method)
    at java.lang.Class.privateGetDeclaredConstructors(Class.java:2671)
    at java.lang.Class.getConstructor0(Class.java:3075)
    at java.lang.Class.newInstance(Class.java:412)
    at java.util.ServiceLoader$LazyIterator.nextService(ServiceLoader.java:380)
    ... 23 more
Caused by: java.lang.ClassNotFoundException: com.amazonaws.AmazonServiceException
    at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    ... 28 more

Solution:

Copy the aws-java-sdk*.jar files from $HADOOP_HOME/share/hadoop/tools/lib into Sqoop's lib directory; after that the error no longer appears.
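A sketch of that copy, using the install paths from earlier in this article:

cp $HADOOP_HOME/share/hadoop/tools/lib/aws-java-sdk*.jar /usr/local/sqoop-1.4.6/lib/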

