
A First Look at Hadoop 3.0: Deploying and Installing Hadoop

Views: 93 | Published: 2024-02-12 23:04:33

A First Look at Hadoop 3.0

Table of Contents

  • A First Look at Hadoop 3.0
    • Why this post?
    • Downloading the packages
    • First, set up passwordless SSH login
    • Deploying Hadoop
      • Check the local environment and install the JDK
        • Download link
        • Verifying the installation
      • Installing Hadoop
        • Download the right package
        • Unpack && edit the configuration files
      • Error 1
      • Error 2
      • Finally, the environment variable settings

Why this post?

Hadoop 3.0 has been out for quite a while now, so I figured I would build a local setup and play with it~

MacBook Pro (13-inch, 2020, Four Thunderbolt 3 ports)

2 GHz quad-core Intel Core i5

16 GB 3733 MHz LPDDR4

Storage: 1 TB

I went with open-source components to build the local environment. Why? Open source knows no limits, hahaha…

For version selection I used the CDH 6.0 reference docs as a guide: https://archive.cloudera.com/cdh6/6.3.2/docs/

Downloading the packages

I'm going to install this stack locally. Pieces like the JDK and MySQL aren't covered here; the Mac installers are straightforward, just click Next all the way through.

First, set up passwordless SSH login

Go to the ~/.ssh directory, run the command below to generate a key pair, then append the contents of id_rsa.pub to the authorized_keys file on the target host.

Note: passwordless login is per-user; edit the authorized_keys of whichever user you want to log in as.

# Generate the key pair
ssh-keygen -t rsa
# Authorize the key on the target host
cat id_rsa.pub >> authorized_keys
# Test passwordless login
ssh hostname   # or: ssh localhost

Run those three commands and you're done. If you can't find the ~/.ssh directory, run the third command first; one connection attempt creates it automatically.
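The steps above can be sketched end to end as follows. The `-N ""` flag (empty passphrase) and the chmod calls are my additions, not from the original commands: sshd refuses keys whose files are group- or world-writable, so tightening permissions up front avoids a common silent failure.

```shell
# Passwordless SSH to localhost, in one pass (a sketch, not the only way)
mkdir -p ~/.ssh && chmod 700 ~/.ssh
# -N "" sets an empty passphrase; skip generation if a key already exists
[ -f ~/.ssh/id_rsa ] || ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa
# Append the public key and lock down permissions, which sshd requires
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys
# Should print "ok" without prompting for a password
ssh localhost 'echo ok'
```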

On a Mac, even after all that, it may still fail with an error like this:

(base) zhangchenguang@cgzhang.local:/Users/zhangchenguang/.ssh $ ssh zhangchenguang
ssh: Could not resolve hostname zhangchenguang: nodename nor servname provided, or not known

How to fix it? The cause is simple: Remote Login isn't enabled on the machine.

Fix: System Preferences → Sharing → enable Remote Login.

Deploying Hadoop

Check the local environment and install the JDK

This assumes a JDK is already installed; I won't go on about how.

1. Download the package; 2. unpack it; 3. set the environment variables; 4. verify the install works.

Installing the JDK on a Mac is very easy, just Next all the way through.

Download link

https://www.oracle.com/java/technologies/javase/javase-jdk8-downloads.html

You may need to sign in with an Oracle account to download; registering with an email address is free.

Verifying the installation

(base) zhangchenguang@cgzhang.local:/Users/zhangchenguang/.ssh $ java -version
java version "1.8.0_261"
Java(TM) SE Runtime Environment (build 1.8.0_261-b12)
Java HotSpot(TM) 64-Bit Server VM (build 25.261-b12, mixed mode)

As shown above, the JDK is in place~

Installing Hadoop

Download the right package

Here's the most complete download mirror I know of; every version is there. Address: https://archive.apache.org/dist/

Unpack && edit the configuration files

Just follow the official guide:

https://hadoop.apache.org/docs/r3.3.0/hadoop-project-dist/hadoop-common/SingleCluster.html#Pseudo-Distributed_Operation

  • Unpack and check that the package is intact
(base) zhangchenguang@cgzhang.local:/Users/zhangchenguang/software $ tar -xzvf hadoop-3.3.0.tar.gz -C ~/software
(base) zhangchenguang@cgzhang.local:/Users/zhangchenguang/software $ ll
total 0
drwxr-xr-x   9 zhangchenguang  staff   288B  7 21 09:00 apache-maven-3.5.4
drwxr-xr-x   4 zhangchenguang  staff   128B  7 22 16:51 gitee_git_workspace
drwxr-xr-x  15 zhangchenguang  staff   480B  7  7 03:50 hadoop-3.3.0
(base) zhangchenguang@cgzhang.local:/Users/zhangchenguang/software/hadoop-3.3.0 $ ./bin/hadoop version
Hadoop 3.3.0
Source code repository https://gitbox.apache.org/repos/asf/hadoop.git -r aa96f1871bfd858f9bac59cf2a81ec470da649af
Compiled by brahma on 2020-07-06T18:44Z
Compiled with protoc 3.7.1
From source with checksum 5dc29b802d6ccd77b262ef9d04d19c4
This command was run using /Users/zhangchenguang/software/hadoop-3.3.0/share/hadoop/common/hadoop-common-3.3.0.jar
  • Edit the configuration files (under $HADOOP_HOME/etc/hadoop)

    • hadoop-env.sh

    Add the JDK install path at the top of the file, then save and quit.

    $ more ~/.bash_profile
    # maven
    export M2_HOME=/Users/zhangchenguang/software/apache-maven-3.5.4
    export PATH=$PATH:$M2_HOME/bin
    # java
    export JAVA_HOME=/Library/Java/JavaVirtualMachines/jdk1.8.0_261.jdk/Contents/Home
    export PATH=$PATH:$JAVA_HOME/bin
    # scala
    export SCALA_HOME=/Users/zhangchenguang/software/scala-2.12.12
    export PATH=$PATH:$SCALA_HOME/bin

    $ vi hadoop-env.sh
    $ more hadoop-env.sh
    JAVA_HOME=/Library/Java/JavaVirtualMachines/jdk1.8.0_261.jdk/Contents/Home
    ......
    
    • Edit core-site.xml
    <configuration>
        <property>
            <name>fs.defaultFS</name>
            <value>hdfs://cgzhang.local:9000</value>
        </property>
        <!-- Hadoop temporary directory -->
        <property>
            <name>hadoop.tmp.dir</name>
            <value>/Users/zhangchenguang/software/hadoop-3.3.0/tmp/${user.name}</value>
        </property>
    </configuration>

    • Edit hdfs-site.xml
    <configuration>
        <property>
            <name>dfs.replication</name>
            <value>1</value>
        </property>
    </configuration>

    • Edit mapred-site.xml
    <configuration>
        <property>
            <name>mapreduce.framework.name</name>
            <value>yarn</value>
        </property>
        <property>
            <name>mapreduce.application.classpath</name>
            <value>$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*:$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*</value>
        </property>
    </configuration>

    • Edit yarn-site.xml
    <configuration>
        <property>
            <name>yarn.nodemanager.aux-services</name>
            <value>mapreduce_shuffle</value>
        </property>
        <property>
            <name>yarn.nodemanager.env-whitelist</name>
            <value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_MAPRED_HOME</value>
        </property>
    </configuration>
    

    Following the official docs through these steps, I ran into two problems the docs don't cover. I'll walk through them below; for now, consider the install done, haha~

  • Start && test

    • Don't forget to set the Hadoop environment variables: add $HADOOP_HOME's bin and sbin directories to PATH.
    # Format the namenode
    hdfs namenode -format
    # Start the local pseudo-distributed cluster
    start-all.sh

    And then, surprisingly… no errors.
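    Once start-all.sh returns, a quick way to confirm the daemons actually came up is jps. This check is my addition, not from the original walkthrough; the process names are what a healthy pseudo-distributed 3.3.0 cluster should show.

```shell
# List JVM processes and keep only the Hadoop daemons; a healthy
# pseudo-distributed cluster shows all five: NameNode, DataNode,
# SecondaryNameNode, ResourceManager, NodeManager
jps | grep -E 'NameNode|DataNode|SecondaryNameNode|ResourceManager|NodeManager'
```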

    But do you think that's the end of it???

    No, no, no~

Error 1

  • Listing the HDFS root directory fails
(base) zhangchenguang@cgzhang.local:/Users/zhangchenguang/software/hadoop-3.3.0/etc/hadoop $ hdfs dfs -ls /
2020-08-20 16:43:15,099 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
ls: Call From cgzhang.local/127.0.0.1 to cgzhang.local:9000 failed on connection exception: java.net.ConnectException: Connection refused; For more details see:  http://wiki.apache.org/hadoop/ConnectionRefused
  • How to fix it?

It looks as if the port never came up. But why?

The namenode startup log showed no errors at all.

Looking more carefully, I spotted the problem: the hostname was resolving to the wrong address.

  • Fixed by mapping the hostname to the right IP address
sudo vi /etc/hosts
# Add the following line and the error goes away~
10.0.198.200    cgzhang.local
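To confirm the fix took effect, you can check that the entry is present and that the name resolves; the hostname and address here are the ones from the hosts line above.

```shell
# The mapping should now appear in /etc/hosts...
grep 'cgzhang.local' /etc/hosts
# ...and a single ping confirms the host answers on that address
ping -c 1 cgzhang.local
```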

Error 2

While testing YARN with one of the bundled MapReduce example jobs, it failed again~

  • The error message:
(base) zhangchenguang@cgzhang.local:/Users/zhangchenguang/software/hadoop-3.3.0 $ hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.3.0.jar pi 1 1
Number of Maps  = 1
Samples per Map = 1
2020-08-20 17:47:25,452 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Wrote input for Map #0
Starting Job
2020-08-20 17:47:26,774 INFO client.DefaultNoHARMFailoverProxyProvider: Connecting to ResourceManager at /0.0.0.0:8032
2020-08-20 17:47:27,132 INFO mapreduce.JobResourceUploader: Disabling Erasure Coding for path: /tmp/hadoop-yarn/staging/zhangchenguang/.staging/job_1597916823640_0001
2020-08-20 17:47:27,230 INFO input.FileInputFormat: Total input files to process : 1
2020-08-20 17:47:27,679 INFO mapreduce.JobSubmitter: number of splits:1
2020-08-20 17:47:27,782 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1597916823640_0001
2020-08-20 17:47:27,782 INFO mapreduce.JobSubmitter: Executing with tokens: []
2020-08-20 17:47:27,923 INFO conf.Configuration: resource-types.xml not found
2020-08-20 17:47:27,923 INFO resource.ResourceUtils: Unable to find 'resource-types.xml'.
2020-08-20 17:47:28,172 INFO impl.YarnClientImpl: Submitted application application_1597916823640_0001
2020-08-20 17:47:28,219 INFO mapreduce.Job: The url to track the job: http://cgzhang.local:8088/proxy/application_1597916823640_0001/
2020-08-20 17:47:28,220 INFO mapreduce.Job: Running job: job_1597916823640_0001
2020-08-20 17:47:31,255 INFO mapreduce.Job: Job job_1597916823640_0001 running in uber mode : false
2020-08-20 17:47:31,256 INFO mapreduce.Job:  map 0% reduce 0%
2020-08-20 17:47:31,270 INFO mapreduce.Job: Job job_1597916823640_0001 failed with state FAILED due to: Application application_1597916823640_0001 failed 2 times due to AM Container for appattempt_1597916823640_0001_000002 exited with  exitCode: 127
Failing this attempt.Diagnostics: [2020-08-20 17:47:31.026]Exception from container-launch.
Container id: container_1597916823640_0001_02_000001
Exit code: 127
[2020-08-20 17:47:31.029]Container exited with a non-zero exit code 127. Error file: prelaunch.err.
Last 4096 bytes of prelaunch.err :
Last 4096 bytes of stderr :
/bin/bash: /bin/java: No such file or directory
[2020-08-20 17:47:31.029]Container exited with a non-zero exit code 127. Error file: prelaunch.err.
Last 4096 bytes of prelaunch.err :
Last 4096 bytes of stderr :
/bin/bash: /bin/java: No such file or directory
For more detailed output, check the application tracking page: http://cgzhang.local:8088/cluster/app/application_1597916823640_0001 Then click on links to logs of each attempt.
. Failing the application.
2020-08-20 17:47:31,288 INFO mapreduce.Job: Counters: 0
Job job_1597916823640_0001 failed!
  • Analysis

So what's going on here?

I couldn't pin it down myself, but found an explanation online: add JAVA_HOME to hadoop-config.sh under $HADOOP_HOME/libexec/. That fits the error above: with JAVA_HOME missing from the container's environment, the launch script falls back to /bin/java, which doesn't exist.

(base) zhangchenguang@cgzhang.local:/Users/zhangchenguang/software/hadoop-3.3.0 $ vi libexec/hadoop-config.sh
(base) zhangchenguang@cgzhang.local:/Users/zhangchenguang/software/hadoop-3.3.0 $ more libexec/hadoop-config.sh
export JAVA_HOME=/Library/Java/JavaVirtualMachines/jdk1.8.0_261.jdk/Contents/Home
......
  • Add it and run again?

OK, that fixed it.

The result:

(base) zhangchenguang@cgzhang.local:/Users/zhangchenguang/software/hadoop-3.3.0 $ hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.3.0.jar pi 1 1
Number of Maps  = 1
Samples per Map = 1
2020-08-20 17:59:56,476 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Wrote input for Map #0
Starting Job
2020-08-20 17:59:57,887 INFO client.DefaultNoHARMFailoverProxyProvider: Connecting to ResourceManager at /0.0.0.0:8032
2020-08-20 17:59:58,242 INFO mapreduce.JobResourceUploader: Disabling Erasure Coding for path: /tmp/hadoop-yarn/staging/zhangchenguang/.staging/job_1597917552239_0001
2020-08-20 17:59:58,754 INFO input.FileInputFormat: Total input files to process : 1
2020-08-20 17:59:58,787 INFO mapreduce.JobSubmitter: number of splits:1
2020-08-20 17:59:58,902 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1597917552239_0001
2020-08-20 17:59:58,902 INFO mapreduce.JobSubmitter: Executing with tokens: []
2020-08-20 17:59:59,042 INFO conf.Configuration: resource-types.xml not found
2020-08-20 17:59:59,043 INFO resource.ResourceUtils: Unable to find 'resource-types.xml'.
2020-08-20 17:59:59,304 INFO impl.YarnClientImpl: Submitted application application_1597917552239_0001
2020-08-20 17:59:59,360 INFO mapreduce.Job: The url to track the job: http://cgzhang.local:8088/proxy/application_1597917552239_0001/
2020-08-20 17:59:59,361 INFO mapreduce.Job: Running job: job_1597917552239_0001
2020-08-20 18:00:06,462 INFO mapreduce.Job: Job job_1597917552239_0001 running in uber mode : false
2020-08-20 18:00:06,463 INFO mapreduce.Job:  map 0% reduce 0%
2020-08-20 18:00:10,517 INFO mapreduce.Job:  map 100% reduce 0%
2020-08-20 18:00:15,561 INFO mapreduce.Job:  map 100% reduce 100%
2020-08-20 18:00:15,569 INFO mapreduce.Job: Job job_1597917552239_0001 completed successfully
2020-08-20 18:00:15,654 INFO mapreduce.Job: Counters: 50
	File System Counters
		FILE: Number of bytes read=28
		FILE: Number of bytes written=528949
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
		HDFS: Number of bytes read=278
		HDFS: Number of bytes written=215
		HDFS: Number of read operations=9
		HDFS: Number of large read operations=0
		HDFS: Number of write operations=3
		HDFS: Number of bytes read erasure-coded=0
	Job Counters
		Launched map tasks=1
		Launched reduce tasks=1
		Data-local map tasks=1
		Total time spent by all maps in occupied slots (ms)=2033
		Total time spent by all reduces in occupied slots (ms)=2162
		Total time spent by all map tasks (ms)=2033
		Total time spent by all reduce tasks (ms)=2162
		Total vcore-milliseconds taken by all map tasks=2033
		Total vcore-milliseconds taken by all reduce tasks=2162
		Total megabyte-milliseconds taken by all map tasks=2081792
		Total megabyte-milliseconds taken by all reduce tasks=2213888
	Map-Reduce Framework
		Map input records=1
		Map output records=2
		Map output bytes=18
		Map output materialized bytes=28
		Input split bytes=160
		Combine input records=0
		Combine output records=0
		Reduce input groups=2
		Reduce shuffle bytes=28
		Reduce input records=2
		Reduce output records=0
		Spilled Records=4
		Shuffled Maps =1
		Failed Shuffles=0
		Merged Map outputs=1
		GC time elapsed (ms)=71
		CPU time spent (ms)=0
		Physical memory (bytes) snapshot=0
		Virtual memory (bytes) snapshot=0
		Total committed heap usage (bytes)=550502400
	Shuffle Errors
		BAD_ID=0
		CONNECTION=0
		IO_ERROR=0
		WRONG_LENGTH=0
		WRONG_MAP=0
		WRONG_REDUCE=0
	File Input Format Counters
		Bytes Read=118
	File Output Format Counters
		Bytes Written=97
Job Finished in 17.841 seconds
Estimated value of Pi is 4.00000000000000000000

Finally, the environment variable settings

I added these at first because mapred-site.xml references HADOOP_MAPRED_HOME, which I hadn't set anywhere. Later it seemed to work without them too, so it's up to you; either way is fine. I kept them.

export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_YARN_HOME=$HADOOP_HOME
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib/native"
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
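For completeness, the exports above assume HADOOP_HOME is already defined. Here is a sketch of the two lines that would precede them in ~/.bash_profile, covering the bin/sbin PATH entries mentioned in the start && test section; the install path is the one used throughout this post, so adjust it to yours.

```shell
# Point HADOOP_HOME at the unpacked distribution, then put the Hadoop
# commands (bin) and the start/stop scripts (sbin) on the PATH
export HADOOP_HOME=/Users/zhangchenguang/software/hadoop-3.3.0
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
```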

Mm… to be continued; the installation resumes tomorrow.
Hahaha…