This cluster's three nodes are built on three virtual machines (hadoop01, hadoop02, hadoop03). The VMs run CentOS 6.5, and the Hadoop version used is 2.9.1.
Procedure
1. Building the base cluster
Download and install VMware Workstation Pro. Link: https://pan.baidu.com/s/1rA30rE9Px5tDJkWSrghlZg  Password: dydq
Download a CentOS or Ubuntu image (either works) from the official site; I use CentOS 6.5 here.
Install Linux with VMware and create three virtual machines.
2. Cluster configuration
Set the hostname:
vi /etc/sysconfig/network
Change the following line:
HOSTNAME=hadoop01
The three VMs are named hadoop01, hadoop02, and hadoop03 respectively.
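Editing /etc/sysconfig/network only takes effect after a reboot. As an optional extra step (not in the original write-up), the name can also be applied to the current session, for example on hadoop01:
hostname hadoop01
Run the matching command (hostname hadoop02 / hostname hadoop03) on the other two nodes.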
Edit the hosts file:
vi /etc/hosts
Contents:
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.216.15 www.hadoop01.com hadoop01
192.168.216.16 www.hadoop02.com hadoop02
192.168.216.17 www.hadoop03.com hadoop03
Note: do this on all three VMs.
Configure the network:
vi /etc/sysconfig/network-scripts/ifcfg-eth0
Example contents:
DEVICE=eth0
HWADDR=00:0C:29:0F:84:86
TYPE=Ethernet
UUID=70d880d5-6852-4c85-a1c9-2491c4c1ac11
ONBOOT=yes
IPADDR=192.168.216.111
PREFIX=24
GATEWAY=192.168.216.2
DNS1=8.8.8.8
DNS2=114.114.114.114
NM_CONTROLLED=yes
BOOTPROTO=static
DEFROUTE=yes
NAME="System eth0"
hadoop01:192.168.216.15
hadoop02:192.168.216.16
hadoop03:192.168.216.17
After configuring the network, you can test connectivity with ping.
Note: cloning virtual machine files can produce duplicate NIC MAC addresses. Regenerate the MAC in the VMware NIC settings, and change the internal IP on each cloned VM.
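A minimal connectivity check, assuming the hostnames above are already present in /etc/hosts on every node:
ping -c 3 hadoop02
ping -c 3 hadoop03
Each host should answer from the IP address configured above.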
Install the JDK:
Download the JDK from:
http://download.oracle.com/otn-pub/java/jdk/8u181-b13/96a7b8442fe848ef90c96a2fad6ed6d1/jdk-8u181-linux-x64.tar.gz
Extract it:
tar zxvf jdk-8u181-linux-x64.tar.gz -C /usr/local
Configure the environment variables:
vi /etc/profile
Append:
export JAVA_HOME=/usr/local/jdk1.8.0_181
export PATH=$PATH:$JAVA_HOME/bin
Apply the changes:
source /etc/profile
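As a quick check (not in the original write-up) that the JDK is on the PATH, assuming the install path above:
java -version
It should report java version "1.8.0_181".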
Set up passwordless SSH login:
Passwordless login means that ssh from hadoop01 to the other machines no longer asks for a password. (Note: if ssh is not installed, install it first.)
First, on hadoop01, run:
$ ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
$ chmod 0600 ~/.ssh/authorized_keys
Then, hadoop01 -> hadoop02:
ssh-copy-id hadoop02
Then, hadoop01 -> hadoop03:
ssh-copy-id hadoop03
How it works:
hadoop01 sends a login request to hadoop02. hadoop02 looks in its authorized_keys for an entry matching the user and key; if one is found, it encrypts a random string with that public key (if not, it falls back to password authentication) and returns the encrypted string to hadoop01. hadoop01 decrypts it with its private key and sends the result back to hadoop02, which compares it with the original string: if they match, the login is accepted, otherwise it is refused.
To enable passwordless login for any other pair of nodes, repeat the same steps.
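A quick way to confirm the setup (not in the original write-up):
ssh hadoop02 hostname
ssh hadoop03 hostname
Each command should print the remote hostname without prompting for a password.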
3. Hadoop installation and configuration
Before installing, disable the firewall on all three nodes:
service iptables stop : stop the firewall
chkconfig iptables off : do not load the firewall at boot
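To confirm the firewall is really stopped (an optional check, not in the original write-up):
service iptables status
It should report that iptables is not running.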
Installing Hadoop
First install and configure Hadoop on hadoop01, then do the same on hadoop02 and hadoop03. Instead of repeating the configuration, we can copy the hadoop directory to hadoop02 and hadoop03, e.g.: scp -r /usr/local/hadoop-2.9.1 hadoop02:/usr/local/
Download the Hadoop tarball from: https://www.apache.org/dyn/closer.cgi/hadoop/common/hadoop-2.9.1/hadoop-2.9.1.tar.gz
(1) Extract and configure environment variables
tar -zxvf /home/hadoop-2.9.1.tar.gz -C /usr/local/
vi /etc/profile
export HADOOP_HOME=/usr/local/hadoop-2.9.1
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
source /etc/profile : apply the environment variables
hadoop version : check the Hadoop version
[root@hadoop01 hadoop-2.9.1]# source /etc/profile
[root@hadoop01 hadoop-2.9.1]# hadoop version
Hadoop 2.9.1
Subversion https://github.com/apache/hadoop.git -r e30710aea4e6e55e69372929106cf119af06fd0e
Compiled by root on 2018-04-16T09:33Z
Compiled with protoc 2.5.0
From source with checksum 7d6d2b655115c6cc336d662cc2b919bd
This command was run using /usr/local/hadoop-2.9.1/share/hadoop/common/hadoop-common-2.9.1.jar
The environment variables are now set up correctly.
(2) Edit the configuration files (note: the following commands are run from the Hadoop installation directory)
[root@hadoop03 hadoop-2.9.1]# pwd
/usr/local/hadoop-2.9.1
vi ./etc/hadoop/hadoop-env.sh : set the JAVA_HOME path
Change the JAVA_HOME line to the path where the JDK was installed; with the steps above that is:
export JAVA_HOME=/usr/local/jdk1.8.0_181
vi ./etc/hadoop/core-site.xml : edit the core configuration file
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://hadoop01:9000</value>
    </property>
    <property>
        <name>io.file.buffer.size</name>
        <value>4096</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/home/bigdata/tmp</value>
    </property>
</configuration>
vi ./etc/hadoop/hdfs-site.xml
<configuration>
    <property>
        <name>dfs.replication</name>
        <value>3</value>
    </property>
    <property>
        <name>dfs.blocksize</name>
        <value>134217728</value>
    </property>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>/home/hadoopdata/dfs/name</value>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>/home/hadoopdata/dfs/data</value>
    </property>
    <property>
        <name>fs.checkpoint.dir</name>
        <value>/home/hadoopdata/dfs/checkpoint/cname</value>
    </property>
    <property>
        <name>fs.checkpoint.edits.dir</name>
        <value>/home/hadoopdata/dfs/checkpoint/edit</value>
    </property>
    <property>
        <name>dfs.http.address</name>
        <value>hadoop01:50070</value>
    </property>
    <property>
        <name>dfs.secondary.http.address</name>
        <value>hadoop02:50090</value>
    </property>
    <property>
        <name>dfs.webhdfs.enabled</name>
        <value>true</value>
    </property>
    <property>
        <name>dfs.permissions</name>
        <value>false</value>
    </property>
</configuration>
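The name, data, and checkpoint directories above live under /home/hadoopdata, and hadoop.tmp.dir under /home/bigdata/tmp. Hadoop normally creates them itself when the NameNode is formatted and the daemons start, but as an optional precaution (not part of the original write-up) they can be pre-created on each node so that permission problems surface early:
mkdir -p /home/bigdata/tmp /home/hadoopdata/dfs/name /home/hadoopdata/dfs/data /home/hadoopdata/dfs/checkpoint/cname /home/hadoopdata/dfs/checkpoint/edit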
cp ./etc/hadoop/mapred-site.xml.template ./etc/hadoop/mapred-site.xml
vi ./etc/hadoop/mapred-site.xml
<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
        <final>true</final>
    </property>
    <property>
        <name>mapreduce.jobhistory.address</name>
        <value>hadoop03:10020</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.webapp.address</name>
        <value>hadoop03:19888</value>
    </property>
</configuration>
vi ./etc/hadoop/yarn-site.xml
<configuration>
    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>hadoop02</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.resourcemanager.address</name>
        <value>hadoop02:8032</value>
    </property>
    <property>
        <name>yarn.resourcemanager.scheduler.address</name>
        <value>hadoop02:8030</value>
    </property>
    <property>
        <name>yarn.resourcemanager.resource-tracker.address</name>
        <value>hadoop02:8031</value>
    </property>
    <property>
        <name>yarn.resourcemanager.admin.address</name>
        <value>hadoop02:8033</value>
    </property>
    <property>
        <name>yarn.resourcemanager.webapp.address</name>
        <value>hadoop02:8088</value>
    </property>
</configuration>
vi ./etc/hadoop/slaves
hadoop01
hadoop02
hadoop03
Now we can use scp to copy the hadoop directory to hadoop02 and hadoop03, so these files do not have to be configured again on each node.
Commands:
Copy the hadoop directory to hadoop02 and hadoop03:
scp -r /usr/local/hadoop-2.9.1 hadoop02:/usr/local/
scp -r /usr/local/hadoop-2.9.1 hadoop03:/usr/local/
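If /etc/profile and the JDK were only set up on hadoop01, they can be distributed the same way (an extra step not spelled out in the original write-up; it assumes the same paths on every node):
scp /etc/profile hadoop02:/etc/profile
scp /etc/profile hadoop03:/etc/profile
scp -r /usr/local/jdk1.8.0_181 hadoop02:/usr/local/
scp -r /usr/local/jdk1.8.0_181 hadoop03:/usr/local/
Then run source /etc/profile on hadoop02 and hadoop03.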
4. Starting Hadoop
Format the NameNode on hadoop01:
bin/hdfs namenode -format
If the output contains a line like:
INFO common.Storage: Storage directory /tmp/hadoop-root/dfs/name has been successfully formatted.
the NameNode was initialized successfully.
Start Hadoop:
start-all.sh
[root@hadoop01 hadoop-2.9.1]# start-all.sh
This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh
18/09/13 23:29:29 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Starting namenodes on [hadoop01]
hadoop01: starting namenode, logging to /usr/local/hadoop-2.9.1/logs/hadoop-root-namenode-hadoop01.out
hadoop03: starting datanode, logging to /usr/local/hadoop-2.9.1/logs/hadoop-root-datanode-hadoop03.out
hadoop01: starting datanode, logging to /usr/local/hadoop-2.9.1/logs/hadoop-root-datanode-hadoop01.out
hadoop02: starting datanode, logging to /usr/local/hadoop-2.9.1/logs/hadoop-root-datanode-hadoop02.out
Starting secondary namenodes [hadoop02]
hadoop02: starting secondarynamenode, logging to /usr/local/hadoop-2.9.1/logs/hadoop-root-secondarynamenode-hadoop02.out
18/09/13 23:29:55 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
starting yarn daemons
starting resourcemanager, logging to /usr/local/hadoop-2.9.1/logs/yarn-root-resourcemanager-hadoop01.out
hadoop03: nodemanager running as process 61198. Stop it first.
hadoop01: starting nodemanager, logging to /usr/local/hadoop-2.9.1/logs/yarn-root-nodemanager-hadoop01.out
hadoop02: starting nodemanager, logging to /usr/local/hadoop-2.9.1/logs/yarn-root-nodemanager-hadoop02.out
The NativeCodeLoader warning here can be ignored; if you want to fix it, see my earlier post.
Check that everything started correctly:
hadoop01:
[root@hadoop01 hadoop-2.9.1]# jps
36432 Jps
35832 DataNode
36250 NodeManager
35676 NameNode
hadoop02
[root@hadoop02 hadoop-2.9.1]# jps
54083 SecondaryNameNode
53987 DataNode
57031 Jps
49338 ResourceManager
54187 NodeManager
hadoop03
[root@hadoop03 hadoop-2.9.1]# jps
63570 Jps
63448 DataNode
61198 NodeManager
Everything looks normal.
Next, start the JobHistoryServer on hadoop03:
[root@hadoop03 hadoop-2.9.1]# mr-jobhistory-daemon.sh start historyserver
starting historyserver, logging to /usr/local/hadoop-2.9.1/logs/mapred-root-historyserver-hadoop03.out
[root@hadoop03 hadoop-2.9.1]# jps
63448 DataNode
63771 Jps
63724 JobHistoryServer
61198 NodeManager
Check the web UIs in a browser:
http://hadoop01:50070 : NameNode web UI
http://hadoop02:50090 : SecondaryNameNode web UI
http://hadoop02:8088 : ResourceManager web UI
http://hadoop03:19888 : JobHistoryServer web UI
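If you browse from the host machine (e.g. Windows) rather than from inside a VM, these hostnames resolve only if they are mapped there as well. A sketch of the extra hosts entries (not in the original write-up; the file path assumes Windows):
192.168.216.15 hadoop01
192.168.216.16 hadoop02
192.168.216.17 hadoop03
Add them to C:\Windows\System32\drivers\etc\hosts, or simply replace the hostnames with IP addresses in the URLs.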
Verify each of them in the browser (screenshots of the NameNode, SecondaryNameNode, ResourceManager, and JobHistoryServer web UIs omitted here).
OK!!!
5. Testing the Hadoop cluster
Goal: verify that the Hadoop cluster is installed and configured correctly.
The test case runs the MapReduce wordcount example.
Create a file named helloworld (the commands below are run from /home/words):
echo "Hello world Hello hadoop" > ./helloworld
Upload helloworld to HDFS:
hdfs dfs -put /home/words/helloworld /
Take a look:
[root@hadoop01 words]# hdfs dfs -ls /
18/09/13 23:49:30 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 3 items
-rw-r--r--   3 root supergroup         26 2018-09-13 23:46 /helloworld
drwxr-xr-x   - root supergroup          0 2018-09-13 23:41 /input
drwxrwx---   - root supergroup          0 2018-09-13 23:36 /tmp
[root@hadoop01 words]# hdfs dfs -cat /helloworld
18/09/13 23:50:10 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Hello world Hello hadoop
Run the wordcount program:
[root@hadoop01 words]# yarn jar /usr/local/hadoop-2.9.1/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.9.1.jar wordcount /helloworld/ /out/00000
If output like the following appears, the job completed successfully:
Virtual memory (bytes) snapshot=4240179200
Total committed heap usage (bytes)=287309824
Shuffle Errors
	BAD_ID=0
	CONNECTION=0
	IO_ERROR=0
	WRONG_LENGTH=0
	WRONG_MAP=0
	WRONG_REDUCE=0
File Input Format Counters
	Bytes Read=26
File Output Format Counters
	Bytes Written=25
Check the generated output files:
From the command line:
[root@hadoop01 words]# hdfs dfs -cat /out/00000/part-r-00000
18/09/14 00:32:59 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Hello	2
hadoop	1
world	1
The result is correct.
At this point the JobHistory UI shows the execution details of this job, along with plenty of other information that I won't go through one by one. It's 00:37 now, time for bed...
For more details, see the official documentation.