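Docker Container Executor (DCE) is an important feature included in Hadoop 2.6.0: with it, Hadoop, the giant of big data, can finally take advantage of Docker, the current darling of the virtualization and cloud computing world.
Plenty of articles already introduce Docker concepts, so this article does not repeat them and simply quotes the Hadoop community's description: “Docker (https://www.docker.io/) combines an easy-to-use interface to Linux containers with easy-to-construct image files for those containers. In short, Docker launches very light weight virtual machines.” Hadoop uses Docker mainly through its new component, the Docker Container Executor (DCE). With DCE, the YARN NodeManager can launch YARN containers inside Docker containers.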
Following the Hadoop community documentation at http://hadoop.apache.org/docs/r2.6.0/hadoop-yarn/hadoop-yarn-site/DockerContainerExecutor.html, we can try the feature out. Before doing so, Docker must be installed and the Docker image sequenceiq/hadoop-docker:2.4.1 must be downloaded.
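The prerequisites above can be checked with a small script. This is only a sketch: the function names (have_docker, image_present, ensure_image) are made up for illustration, and it assumes the Docker client is invoked as `docker` and that the current user may talk to the Docker daemon.

```shell
#!/bin/sh
# Image named in the article.
IMAGE="sequenceiq/hadoop-docker:2.4.1"

# Succeeds if a `docker` command can be found.
have_docker() {
    command -v docker >/dev/null 2>&1
}

# Succeeds if the given image is already in the local image store.
# `docker images -q <ref>` prints the image ID, or nothing if it is absent.
image_present() {
    [ -n "$(docker images -q "$1" 2>/dev/null)" ]
}

# Pull the image only when it is missing (hypothetical helper).
ensure_image() {
    if ! have_docker; then
        echo "docker not found on PATH" >&2
        return 1
    fi
    if image_present "$1"; then
        echo "image $1 already present"
    else
        docker pull "$1"
    fi
}
```

Calling `ensure_image "$IMAGE"` on each NodeManager host would then download the image only where it is not yet available.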
In addition, the yarn-site.xml configuration has to be changed:
<property>
  <name>yarn.nodemanager.docker-container-executor.exec-name</name>
  <value>/usr/bin/docker</value>
  <description>Name or path to the Docker client. This is a required parameter. If this is empty, user must pass an image name as part of the job invocation (see below).</description>
</property>
<property>
  <name>yarn.nodemanager.container-executor.class</name>
  <value>org.apache.hadoop.yarn.server.nodemanager.DockerContainerExecutor</value>
  <description>This is the container executor setting that ensures that all jobs are started with the DockerContainerExecutor.</description>
</property>
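As a quick sanity check after editing, the sketch below tests whether a yarn-site.xml file mentions both properties. It is a plain text search, not an XML parser, so it assumes each `<name>` element sits on a single line; the function names are hypothetical.

```shell
#!/bin/sh
# Succeeds if FILE ($1) contains a <name> element for property $2.
has_property() {
    grep -qF "<name>$2</name>" "$1"
}

# Prints every DCE-related property missing from FILE ($1);
# returns non-zero if at least one is missing.
check_dce_config() {
    file=$1
    missing=0
    for prop in \
        yarn.nodemanager.docker-container-executor.exec-name \
        yarn.nodemanager.container-executor.class
    do
        if ! has_property "$file" "$prop"; then
            echo "missing: $prop"
            missing=1
        fi
    done
    return "$missing"
}
```

For example, `check_dce_config /path/to/yarn-site.xml` prints nothing and returns 0 when both entries are present; the path is a placeholder for wherever the configuration lives on your installation.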
Finally, the Hadoop MapReduce job teragen can be submitted to the ResourceManager with the command: “hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.4.1.jar teragen -Dmapreduce.map.env="yarn.nodemanager.docker-container-executor.image-name=sequenceiq/hadoop-docker:2.4.1" -Dyarn.app.mapreduce.am.env="yarn.nodemanager.docker-container-executor.image-name=sequenceiq/hadoop-docker:2.4.1" 10000 /tmp/teragen”.
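Because the image name appears twice in that command, it can help to wrap the submission in a small function. The sketch below is one way to do so; run_teragen and the HADOOP_BIN override are illustrative names, not part of Hadoop, and the jar path is the relative one used above, so the function is assumed to be called from the Hadoop installation directory.

```shell
#!/bin/sh
IMAGE="sequenceiq/hadoop-docker:2.4.1"
EXAMPLES_JAR="share/hadoop/mapreduce/hadoop-mapreduce-examples-2.4.1.jar"
# Environment entry that tells DCE which image to launch containers from.
IMAGE_ENV="yarn.nodemanager.docker-container-executor.image-name=$IMAGE"

# run_teragen ROWS OUTPUT_DIR
# HADOOP_BIN (assumed variable) lets the hadoop launcher be overridden.
run_teragen() {
    rows=$1
    out_dir=$2
    ${HADOOP_BIN:-hadoop} jar "$EXAMPLES_JAR" teragen \
        -Dmapreduce.map.env="$IMAGE_ENV" \
        -Dyarn.app.mapreduce.am.env="$IMAGE_ENV" \
        "$rows" "$out_dir"
}
```

`run_teragen 10000 /tmp/teragen` then issues the same command as above. Setting `HADOOP_BIN=echo` prints the arguments instead of submitting the job, which is a cheap way to inspect them first.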
MapReduce job execution:
While the MR job is running, we can observe that three Docker containers are running as well; the MR job is actually executing inside them:
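One way to make this observation from the command line is to count the lines of `docker ps` output that mention the image. This is a rough sketch: count_running is a made-up helper, and matching on the image name as plain text assumes no unrelated container shows that string in its listing.

```shell
#!/bin/sh
# count_running IMAGE
# Prints how many lines of `docker ps` output mention IMAGE.
count_running() {
    docker ps | grep -cF "$1"
}
```

Running `count_running sequenceiq/hadoop-docker:2.4.1` on a NodeManager host while the job is active shows how many of its containers are currently running there.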
Of course, once the job has completed successfully, the result can also be seen in the ResourceManager web console: