Preface
Hadoop is a distributed computing infrastructure developed by the Apache Software Foundation. It lets users write distributed programs without knowing the low-level details of distribution, harnessing the power of a cluster for high-speed computation and storage.
HDFS (Hadoop Distributed File System): a distributed file system
HDFS is highly fault-tolerant, is designed to be deployed on low-cost hardware, and provides high-throughput access to application data, which suits applications with very large data sets.
The core of the Hadoop framework's design is HDFS plus MapReduce:
- HDFS provides storage for massive amounts of data
- MapReduce provides computation over that data (see the pipeline sketch after this list for the idea in miniature)
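As a rough analogy for what MapReduce does, a classic Unix pipeline performs a word count in the same map / shuffle / reduce shape; input.txt here is just a hypothetical sample file, not part of this guide:
# map: emit one word per line; shuffle: sort brings identical words together;
# reduce: uniq -c counts each group (input.txt is a placeholder file)
tr -s ' ' '\n' < input.txt | sort | uniq -c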
Prerequisites
1 Update the system
Refresh the package index so Ubuntu installs the latest versions of everything that follows (run sudo apt-get upgrade afterwards if you also want to upgrade already-installed packages):
sudo apt-get update
2 Install vim
sudo apt-get install vim
3 Install the JDK
sudo apt-get install default-jdk
java -version
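On Ubuntu, default-jdk resolves to whichever OpenJDK build the release ships; the JAVA_HOME exported later in this guide assumes OpenJDK 8, so it is worth confirming the actual path now:
# Resolve the real JDK directory behind the javac symlink
readlink -f "$(which javac)"
# Expected on the setup this guide assumes (adjust JAVA_HOME below if yours differs):
# /usr/lib/jvm/java-8-openjdk-amd64/bin/javac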
4 Set up passwordless SSH (Secure Shell) login
sudo apt-get install ssh
sudo apt-get install rsync
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
ll ~/.ssh
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
ssh localhost
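If ssh localhost still prompts for a password, overly open permissions on ~/.ssh are a common culprit; a minimal fix sketch:
# sshd ignores authorized_keys when its permissions are too permissive
chmod 700 ~/.ssh
chmod 600 ~/.ssh/authorized_keys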
Install Hadoop
1 Download and install Hadoop
wget http://mirror.bit.edu.cn/apache/hadoop/common/hadoop-3.0.1/hadoop-3.0.1.tar.gz
sudo tar -zxvf hadoop-3.0.1.tar.gz
sudo mv hadoop-3.0.1 /usr/local/hadoop
ll /usr/local/hadoop
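A quick way to confirm the unpacked tree is a working Hadoop distribution, before any environment variables are set, is to call the bundled binary by its full path:
# Print the Hadoop version straight from the install directory
/usr/local/hadoop/bin/hadoop version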
2 Configure the Hadoop environment variables
- Open the file (it is your own file, so sudo is not needed)
vim ~/.bashrc
- Add the following:
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
export HADOOP_HOME=/usr/local/hadoop
export PATH=$PATH:$HADOOP_HOME/bin
export PATH=$PATH:$HADOOP_HOME/sbin
export PATH=${JAVA_HOME}/bin:${PATH}
export HADOOP_CLASSPATH=${JAVA_HOME}/lib/tools.jar
- Make the edited script take effect immediately
source ~/.bashrc
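To verify that the new variables took effect in the current shell, for example:
# Both should resolve after sourcing ~/.bashrc
echo $HADOOP_HOME        # expect /usr/local/hadoop
which hadoop             # expect /usr/local/hadoop/bin/hadoop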
3 Edit the Hadoop configuration files
- Edit hadoop-env.sh and set JAVA_HOME explicitly
sudo vim /usr/local/hadoop/etc/hadoop/hadoop-env.sh
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
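If you prefer a non-interactive edit, a sed one-liner can set JAVA_HOME; this is a sketch that assumes the stock hadoop-env.sh still carries its commented-out JAVA_HOME template line:
# Replace the commented "# export JAVA_HOME=..." template line in place
sudo sed -i 's|^# export JAVA_HOME=.*|export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64|' /usr/local/hadoop/etc/hadoop/hadoop-env.sh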
- Edit core-site.xml (this and the following <property> blocks go inside each file's existing <configuration> element)
sudo vim /usr/local/hadoop/etc/hadoop/core-site.xml
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://localhost:9000</value>
</property>
- Edit yarn-site.xml
sudo vim /usr/local/hadoop/etc/hadoop/yarn-site.xml
<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle</value>
</property>
<property>
  <name>yarn.nodemanager.env-whitelist</name>
  <value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_MAPRED_HOME</value>
</property>
<property>
  <name>yarn.nodemanager.vmem-pmem-ratio</name>
  <value>3</value>
</property>
- Edit mapred-site.xml
sudo vim /usr/local/hadoop/etc/hadoop/mapred-site.xml
<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
</property>
<property>
  <name>mapreduce.application.classpath</name>
  <value>/usr/local/hadoop/etc/hadoop,/usr/local/hadoop/share/hadoop/common/lib/*,/usr/local/hadoop/share/hadoop/common/*,/usr/local/hadoop/share/hadoop/hdfs/lib/*,/usr/local/hadoop/share/hadoop/hdfs/*,/usr/local/hadoop/share/hadoop/mapreduce/*,/usr/local/hadoop/share/hadoop/yarn/lib/*,/usr/local/hadoop/share/hadoop/yarn/*</value>
</property>
- Edit hdfs-site.xml
sudo vim /usr/local/hadoop/etc/hadoop/hdfs-site.xml
<property>
  <name>dfs.replication</name>
  <value>1</value>
</property>
<property>
  <name>dfs.namenode.name.dir</name>
  <value>file:/usr/local/hadoop/hadoop_data/hdfs/namenode</value>
</property>
<property>
  <name>dfs.datanode.data.dir</name>
  <value>file:/usr/local/hadoop/hadoop_data/hdfs/datanode</value>
</property>
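Once the four files are saved, you can ask the hdfs CLI to echo settings back, which catches XML typos without starting any daemon (hdfs is on PATH thanks to step 2):
# Read values straight from the config files; no daemons required
hdfs getconf -confKey fs.defaultFS      # expect hdfs://localhost:9000
hdfs getconf -confKey dfs.replication   # expect 1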
4 Create the HDFS directories
sudo mkdir -p /usr/local/hadoop/hadoop_data/hdfs/namenode
sudo mkdir -p /usr/local/hadoop/hadoop_data/hdfs/datanode
5 Grant ownership of the Hadoop files
sudo chown -R <username>:<groupname> /usr/local/hadoop
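To confirm the ownership change took effect, for example:
# The owner column should now show your user instead of root
ls -ld /usr/local/hadoop /usr/local/hadoop/hadoop_data/hdfs/namenode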
6 Format the HDFS NameNode (the old "hadoop namenode -format" form is deprecated in Hadoop 3)
hdfs namenode -format
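On success, the format output should end with a line along these lines (exact wording varies slightly between versions):
# ... Storage directory /usr/local/hadoop/hadoop_data/hdfs/namenode has been successfully formatted.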
7 Start Hadoop
start-dfs.sh
start-yarn.sh
or, in one step:
start-all.sh
- To stop the Hadoop services
stop-all.sh
8 Check the processes (jps)
jps
15157 NameNode
15273 DataNode
16281 Jps
15450 SecondaryNameNode
15786 ResourceManager
15900 NodeManager
Note: counting the Jps entry itself, there should be six processes in total. If one is missing, something in the setup went wrong, almost always a configuration-file problem; check the logs to troubleshoot.
Log location: /usr/local/hadoop/logs
ls /usr/local/hadoop/logs
hadoop-zwg-datanode-zwg.log hadoop-zwg-resourcemanager-zwg.log
hadoop-zwg-datanode-zwg.out hadoop-zwg-resourcemanager-zwg.out
hadoop-zwg-datanode-zwg.out.1 hadoop-zwg-resourcemanager-zwg.out.1
hadoop-zwg-namenode-zwg.log hadoop-zwg-secondarynamenode-zwg.log
hadoop-zwg-namenode-zwg.out hadoop-zwg-secondarynamenode-zwg.out
hadoop-zwg-namenode-zwg.out.1 hadoop-zwg-secondarynamenode-zwg.out.1
hadoop-zwg-nodemanager-zwg.log SecurityAuth-zwg.audit
hadoop-zwg-nodemanager-zwg.out userlogs
hadoop-zwg-nodemanager-zwg.out.1
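One way to surface problems quickly across all of these daemon logs, rather than opening each file in turn:
# Show the most recent ERROR/FATAL lines from every daemon log
grep -iE 'error|fatal' /usr/local/hadoop/logs/*.log | tail -n 20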
9 Open the Hadoop web UIs
- ResourceManager address
http://localhost:8088/
- NameNode HDFS web address
http://localhost:9870/
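As an optional end-to-end check that HDFS and YARN really work together, the release ships example jobs; this sketch follows the standard single-node smoke test (the jar name matches the 3.0.1 release, and <username> is your login name):
# Upload some input, run the bundled grep example on YARN, then read the result
hdfs dfs -mkdir -p /user/<username>/input
hdfs dfs -put /usr/local/hadoop/etc/hadoop/*.xml /user/<username>/input
hadoop jar /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.0.1.jar grep input output 'dfs[a-z.]+'
hdfs dfs -cat output/*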
END!