1. Basic Environment Configuration
1.1 Disable the Firewall and SELinux
Disable the firewall:
systemctl stop firewalld
systemctl disable firewalld
Disable SELinux:
setenforce 0
sed -i 's/SELINUX=enforcing/SELINUX=disabled/g' /etc/selinux/config
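As a quick check (optional), SELinux should now report Permissive, and Disabled after the next reboot:
getenforce    # expect "Permissive" now, "Disabled" after a reboot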
1.2 Install JDK 8
Run the following as the root user:
mkdir /usr/local/java && tar -zxvf jdk-8u181-linux-x64.tar.gz -C /usr/local/java
Edit the environment variables:
vim /etc/profile
Append the following at the end of the file:
export JAVA_HOME=/usr/local/java/jdk1.8.0_181
export PATH=$JAVA_HOME/bin:$PATH
source /etc/profile
The last command makes the changes take effect in the current shell.
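To confirm the JDK is picked up (optional):
java -version    # should report java version "1.8.0_181"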
1.3 Add a User
groupadd -g 520 hadoop
useradd -g hadoop -u 520 hadoop
Create the /data directory and give the hadoop user ownership of it:
mkdir /data
chown hadoop:hadoop /data
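A quick sanity check (optional) that the user and directory ownership are in place:
id hadoop       # uid=520(hadoop) gid=520(hadoop)
ls -ld /data    # owner and group should both be hadoop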
1.4 Performance Tuning
Raise the default open-file (handle) limits that the system applies after a reboot:
vi /etc/security/limits.conf
Append the following at the bottom:
hadoop hard nofile 65535
hadoop soft nofile 65535
hadoop soft nproc 32000
hadoop hard nproc 32000
Configure the maximum number of processes the user may open:
vi /etc/security/limits.d/20-nproc.conf
Add the following:
hadoop soft nproc 32000
hadoop hard nproc 32000
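These limits only apply to new login sessions. To verify them (optional), open a fresh session as the hadoop user and check:
su - hadoop
ulimit -n    # expect 65535
ulimit -u    # expect 32000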
1.5 Passwordless SSH Login
Switch to the hadoop user and run the following commands:
su - hadoop
ssh-keygen -t rsa -f ~/.ssh/id_rsa -P ""
cp ~/.ssh/id_rsa.pub ~/.ssh/authorized_keys
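Most sshd configurations require the key files to be readable only by their owner, so tightening the permissions and testing the local login (all this single-node setup needs) is a worthwhile optional step:
chmod 700 ~/.ssh
chmod 600 ~/.ssh/authorized_keys
ssh localhost date    # should not ask for a password (the first connection asks to confirm the host key)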
2. Deploy Hadoop
2.1 Download hadoop-2.7.7
wget http://mirror.bit.edu.cn/apache/hadoop/common/hadoop-2.7.7/hadoop-2.7.7.tar.gz
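If that mirror is unavailable, the same release should also be downloadable from the Apache archive (URL assumed from the standard archive layout):
wget https://archive.apache.org/dist/hadoop/common/hadoop-2.7.7/hadoop-2.7.7.tar.gz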
2.2 Configure hadoop-2.7.7
2.2.1 Unpack
tar -zxvf hadoop-2.7.7.tar.gz
ln -s hadoop-2.7.7 hadoop
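Note: section 2.2.6 sets HADOOP_HOME=/data/app/hadoop, so the tarball is presumably unpacked under /data/app as the hadoop user. A sketch under that assumption:
mkdir -p /data/app                          # assumption: matches HADOOP_HOME in 2.2.6
mv hadoop-2.7.7.tar.gz /data/app && cd /data/app
tar -zxvf hadoop-2.7.7.tar.gz
ln -s hadoop-2.7.7 hadoop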
2.2.2 Edit core-site.xml
From the hadoop directory, run:
vim etc/hadoop/core-site.xml
Inside <configuration>, add the following:
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://mylocal:9000</value>
</property>
<property>
  <name>hadoop.tmp.dir</name>
  <value>/data/data/hadooptmp</value>
</property>
<property>
  <name>io.file.buffer.size</name>
  <value>65536</value>
</property>
Create the /data/data/hadooptmp directory:
mkdir -p /data/data/hadooptmp
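fs.defaultFS points at hdfs://mylocal:9000, so the hostname mylocal must resolve to this machine. If it does not already, an /etc/hosts entry along these lines is needed (192.168.1.10 is a placeholder; substitute this host's actual IP):
192.168.1.10   mylocal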
2.2.3 Edit hdfs-site.xml
vim etc/hadoop/hdfs-site.xml
Inside <configuration>, add the following:
<property>
  <name>dfs.replication</name>
  <value>1</value>
</property>
<property>
  <name>dfs.namenode.secondary.http-address</name>
  <value>mylocal:50090</value>
</property>
<property>
  <name>dfs.namenode.secondary.https-address</name>
  <value>mylocal:50091</value>
</property>
<property>
  <name>dfs.support.append</name>
  <value>true</value>
</property>
<property>
  <name>dfs.namenode.name.dir</name>
  <value>/data/data/namenode</value>
</property>
<property>
  <name>dfs.blocksize</name>
  <value>1048576</value>
</property>
<property>
  <name>dfs.datanode.data.dir</name>
  <value>/data/data/datanode</value>
</property>
<property>
  <name>dfs.webhdfs.enabled</name>
  <value>true</value>
</property>
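The NameNode and DataNode directories referenced above do not exist yet; creating them up front as the hadoop user avoids ownership surprises later:
mkdir -p /data/data/namenode /data/data/datanode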
2.2.4 Edit mapred-site.xml
cp etc/hadoop/mapred-site.xml.template etc/hadoop/mapred-site.xml
vim etc/hadoop/mapred-site.xml
Inside <configuration>, add the following:
<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
</property>
2.2.5 Edit yarn-site.xml
vim etc/hadoop/yarn-site.xml
Inside <configuration>, add the following:
<property>
  <name>yarn.resourcemanager.hostname</name>
  <value>mylocal</value>
</property>
<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle</value>
</property>
<!-- spark -->
<property>
  <name>yarn.nodemanager.pmem-check-enabled</name>
  <value>false</value>
</property>
<property>
  <name>yarn.nodemanager.vmem-check-enabled</name>
  <value>false</value>
</property>
<property>
  <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
  <value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>2048</value>
</property>
<property>
  <name>yarn.nodemanager.resource.cpu-vcores</name>
  <value>1</value>
</property>
2.2.6 Add Environment Variables
$ vim ~/.bashrc
Note: the $ prompt indicates a non-root user (in this article, the hadoop user). Add the following:
export HADOOP_HOME=/data/app/hadoop
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
export HADOOP_MAPRED_HOME=${HADOOP_HOME}
export HADOOP_COMMON_HOME=${HADOOP_HOME}
export HADOOP_HDFS_HOME=${HADOOP_HOME}
export YARN_HOME=${HADOOP_HOME}
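HADOOP_HOME here assumes the layout from 2.2.1 lives under /data/app. To pick up the new variables in the current shell and verify them (optional):
source ~/.bashrc
hadoop version    # should print Hadoop 2.7.7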
2.3 Start Hadoop
2.3.1 Format the NameNode
hdfs namenode -format
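Formatting is a one-time step that initializes (and, if re-run, wipes) the metadata directory set by dfs.namenode.name.dir. A quick check that it succeeded (optional):
ls /data/data/namenode/current    # should contain a VERSION file and an fsimage_* file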
2.3.2 Start DFS
$ start-dfs.sh
Starting namenodes on [mylocal]
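To confirm HDFS is healthy (optional), the dfsadmin report on this single-node setup should show one live DataNode:
hdfs dfsadmin -report    # expect "Live datanodes (1)"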
2.3.3 Start YARN
$ start-yarn.sh
starting yarn daemons
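The NodeManager registration can be checked the same way (optional); it should appear as one RUNNING node with the 2048 MB / 1 vcore configured in yarn-site.xml:
yarn node -list    # expect one node in RUNNING state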
2.3.4 Check the Processes
$ jps
2897 ResourceManager
3012 NodeManager
3284 Jps
2439 NameNode
1290 QuorumPeerMain
2572 DataNode
2734 SecondaryNameNode
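As a final smoke test (optional), the example job shipped with the distribution exercises HDFS and YARN end to end; the jar path below is the standard location inside the Hadoop 2.7.7 tarball:
hdfs dfs -mkdir -p /user/hadoop
hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.7.jar pi 2 10
The NameNode web UI listens on port 50070 and the ResourceManager UI on port 8088 by default in Hadoop 2.x.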