Unified software environment for the big data course
1, Document description
To keep the operating system and software environment consistent, we set up a unified base environment before class so that the same software versions are used throughout the whole course.
2, VMware and Linux versions
VMware version:
There is no strict requirement on the VMware version; VMware 10 or above is fine. To install VMware, simply run the installation package and click through the steps. The installation package includes a key that can be used for activation.
Linux version
We uniformly use CentOS as the Linux distribution, specifically CentOS 7.6 (64-bit).
Torrent download address: http://mirrors.aliyun.com/centos/7.6.1810/isos/x86_64/CentOS-7-x86_64-DVD-1810.torrent
3, Installing Linux with VMware
See the video for step-by-step instructions.
4, Environment preparation for the three Linux servers
We prepare three Linux servers with an identical environment; all teaching modules build on this same setup.
IP settings for the three machines
See the video for details.
Modify the IP address on all three machines:
    vi /etc/sysconfig/network-scripts/ifcfg-ens33

    BOOTPROTO="static"
    IPADDR=192.168.52.100
    NETMASK=255.255.255.0
    GATEWAY=192.168.52.1
    DNS1=8.8.8.8
Prepare three Linux machines with the following IP addresses:
IP address of the first machine: 192.168.52.100
IP address of the second machine: 192.168.52.110
IP address of the third machine: 192.168.52.120
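After editing ifcfg-ens33 on each machine (changing only IPADDR per the list above), the new address takes effect once the network service is restarted. A minimal sketch, assuming the interface is named ens33 as in the example above:

```shell
# Run on each machine as root after editing its ifcfg-ens33 file
systemctl restart network        # reload the static IP configuration
ip addr show ens33               # confirm the interface now has the expected address
```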
Turn off the firewall on all three machines
On all three machines, run the following commands as the root user to stop and disable the firewall:
    systemctl stop firewalld
    systemctl disable firewalld
Disable SELinux on all three machines
On all three machines, edit the following file as the root user and set SELINUX=disabled:
    vim /etc/selinux/config

    SELINUX=disabled
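Note that SELINUX=disabled in /etc/selinux/config only takes effect after a reboot. As an optional convenience, you can also switch SELinux to permissive mode immediately for the current session:

```shell
# Optional: take effect immediately without waiting for a reboot (run as root)
setenforce 0        # switch SELinux to permissive mode for the running system
getenforce          # should now print "Permissive"
```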
Change the hostname on all three machines
Change the hostnames of the three machines as follows:
The first machine's hostname: node01.kaikeba.com
The second machine's hostname: node02.kaikeba.com
The third machine's hostname: node03.kaikeba.com
The first machine executes the following command to modify the hostname
    vim /etc/hostname

    node01.kaikeba.com
The second machine executes the following command to modify the hostname
    vim /etc/hostname

    node02.kaikeba.com
The third machine executes the following command to modify the hostname
    vim /etc/hostname

    node03.kaikeba.com
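On CentOS 7 you can also set the hostname with hostnamectl, which updates /etc/hostname and applies the change immediately; a minimal sketch for the first machine:

```shell
# Equivalent one-liner for node01 (run as root); use the node02/node03 names on the other machines
hostnamectl set-hostname node01.kaikeba.com
hostname            # verify the new hostname
```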
Update the hostname-to-IP mapping on all three machines
On all three machines, run the following command to add the mapping between hostnames and IP addresses:
    vim /etc/hosts

    192.168.52.100 node01.kaikeba.com node01
    192.168.52.110 node02.kaikeba.com node02
    192.168.52.120 node03.kaikeba.com node03
Synchronize the time on all three machines
On all three machines, run the following commands to periodically synchronize the clock with an Alibaba Cloud time server:
    yum -y install ntpdate
    crontab -e

    */1 * * * * /usr/sbin/ntpdate time1.aliyun.com
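To confirm synchronization works before relying on the cron entry, you can run the same sync command once by hand (assuming the machine has network access):

```shell
# Run once as root to sync immediately and confirm the clock is correct
/usr/sbin/ntpdate time1.aliyun.com
date                 # the three machines should now show (nearly) the same time
```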
Add an ordinary user on all three machines
On all three Linux servers, add an ordinary user named hadoop and give it sudo permission; it will be used to install all big data software later.
Set the password of the hadoop user to 123456 on all machines.
    useradd hadoop
    passwd hadoop
Grant sudo permission to the hadoop user on all three machines:
    visudo

    hadoop ALL=(ALL) ALL
Define unified directories on all three machines
Define a directory for software packages (tarballs) and a directory for the unpacked installations on the three Linux servers. Run the following commands on all three machines to create the two folders and hand them over to the hadoop user:
    mkdir -p /kkb/soft                 # directory for software packages (tarballs)
    mkdir -p /kkb/install              # directory for unpacked installations
    chown -R hadoop:hadoop /kkb        # change ownership to the hadoop user
Install the JDK on all three machines
Reconnect to the three machines as the hadoop user and install the JDK as that user.
Upload the JDK package to /kkb/soft on the first server, unpack it, configure the environment variables, and then repeat the installation on the other two machines (a sketch for propagating it follows the commands below).
    cd /kkb/soft/
    tar -zxf jdk-8u141-linux-x64.tar.gz -C /kkb/install/
    sudo vim /etc/profile

    # add the following lines to configure the JDK environment variables
    export JAVA_HOME=/kkb/install/jdk1.8.0_141
    export PATH=:$JAVA_HOME/bin:$PATH
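After editing /etc/profile, reload it and verify the JDK. One possible way to propagate the same installation to the other two machines is to copy the unpacked directory and repeat the profile change there (a sketch; since passwordless ssh is configured only in the next section, scp will prompt for passwords here):

```shell
# On node01: reload the profile and check the JDK version
source /etc/profile
java -version            # should report java version "1.8.0_141"

# Copy the unpacked JDK to the other two machines, then edit /etc/profile there the same way
scp -r /kkb/install/jdk1.8.0_141 node02:/kkb/install/
scp -r /kkb/install/jdk1.8.0_141 node03:/kkb/install/
```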
Passwordless SSH login for the hadoop user
On all three machines, run the following command as the hadoop user to generate a public/private key pair:
    ssh-keygen -t rsa

On all three machines, as the hadoop user, run the following command to copy the public key to the node01 server:

    ssh-copy-id node01

On node01, as the hadoop user, run the following commands to copy authorized_keys to node02 and node03:

    cd /home/hadoop/.ssh/
    scp authorized_keys node02:$PWD
    scp authorized_keys node03:$PWD
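To confirm that passwordless login works, you can try connecting from node01 to each machine as the hadoop user; a quick check:

```shell
# Run as the hadoop user on node01; each command should print the remote hostname
# without asking for a password
ssh node01 hostname
ssh node02 hostname
ssh node03 hostname
```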
Shutdown and restart of the three machines
Run the following commands as the root user to shut down or restart the machines:
    reboot               # restart

To power off instead of restarting:

    shutdown -h now
5, Installation of the zookeeper cluster on the three machines
Note: make sure the clocks of the three machines are synchronized.
Step 1: download the zookeeper package from the following address:
http://archive.cloudera.com/cdh5/cdh/5/
From this site we download zookeeper-3.4.5-cdh5.14.2.tar.gz as the zookeeper version used in the course.
After downloading, upload it to /kkb/soft on node01 for installation.
Step 2: unzip
On node01, run the following commands to unpack the zookeeper package into /kkb/install in preparation for installation:
    cd /kkb/soft
    tar -zxvf zookeeper-3.4.5-cdh5.14.2.tar.gz -C /kkb/install/
Step 3: modify the configuration file
Modify the configuration file on the first machine:
    cd /kkb/install/zookeeper-3.4.5-cdh5.14.2/conf
    cp zoo_sample.cfg zoo.cfg
    mkdir -p /kkb/install/zookeeper-3.4.5-cdh5.14.2/zkdatas
    vim zoo.cfg

    dataDir=/kkb/install/zookeeper-3.4.5-cdh5.14.2/zkdatas
    autopurge.snapRetainCount=3
    autopurge.purgeInterval=1
    server.1=node01:2888:3888
    server.2=node02:2888:3888
    server.3=node03:2888:3888
Step 4: add myid configuration
On the first machine, create a file named myid under /kkb/install/zookeeper-3.4.5-cdh5.14.2/zkdatas/ with the content 1:
echo 1 > /kkb/install/zookeeper-3.4.5-cdh5.14.2/zkdatas/myid
Step 5: install package distribution and modify the value of myid
Distribute the installation package to the other machines.
Run the following two commands on the first machine:

    scp -r /kkb/install/zookeeper-3.4.5-cdh5.14.2/ node02:/kkb/install/
    scp -r /kkb/install/zookeeper-3.4.5-cdh5.14.2/ node03:/kkb/install/

On the second machine, set the value of myid to 2 by running the following command (from any directory):

    echo 2 > /kkb/install/zookeeper-3.4.5-cdh5.14.2/zkdatas/myid

On the third machine, set the value of myid to 3 by running the following command (from any directory):

    echo 3 > /kkb/install/zookeeper-3.4.5-cdh5.14.2/zkdatas/myid
Step 6: start the zookeeper service on the three machines
The following commands must be run on all three machines.
    /kkb/install/zookeeper-3.4.5-cdh5.14.2/bin/zkServer.sh start

View the startup status:

    /kkb/install/zookeeper-3.4.5-cdh5.14.2/bin/zkServer.sh status
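Once zkServer.sh status reports one leader and two followers across the three nodes, you can optionally connect with the bundled client to confirm the ensemble answers requests; a quick check, assuming the default client port 2181:

```shell
# Connect to the local zookeeper node and list the root znode
/kkb/install/zookeeper-3.4.5-cdh5.14.2/bin/zkCli.sh -server node01:2181
# inside the client shell:
#   ls /
#   quit
```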
6, hadoop environment installation
1. Recompile the CDH software version
1. Why compile hadoop
CDH provides prebuilt installation packages for all of its software versions, so in general there is no need to compile anything ourselves. However, the hadoop package shipped by CDH is built without the native (C) libraries, so we run into problems when we need the native library (used for compression, access from C programs, and so on). The following sections show how to compile it ourselves.
2. Preparation of compilation environment
2.1: prepare a Linux environment
Prepare a Linux machine with at least 4 GB of memory and at least 40 GB of disk. I use a CentOS 6.9 64-bit operating system (note: be sure to use a 64-bit OS).
2.2: connect the virtual machine to the network, turn off the firewall and selinux
Turn off the firewall:

    service iptables stop
    chkconfig iptables off

Disable selinux:

    vim /etc/selinux/config

    SELINUX=disabled
2.3: installing jdk1.7
Note: in my own testing, hadoop-2.6.0-cdh5.14.2 can only be compiled with jdk1.7; compiling with jdk1.8 reports errors, so do not use jdk1.8 here.
Upload our JDK installation package to /kkb/soft (I use jdk1.7.0_71 here).
Unzip our jdk package
Create the two unified directories and unpack the JDK:
    mkdir -p /kkb/soft
    mkdir -p /kkb/install
    cd /kkb/soft
    tar -zxvf jdk-7u71-linux-x64.tar.gz -C ../install/
Configure environment variables
    vim /etc/profile

    export JAVA_HOME=/kkb/install/jdk1.7.0_71
    export PATH=:$JAVA_HOME/bin:$PATH

Make the changes take effect immediately:

    source /etc/profile
2.4: installing maven
Any Maven 3.x version should work here, but versions that are too new are not recommended; version 3.0.5 is strongly recommended.
Upload the maven installation package to /kkb/soft
Then extract the maven installation package to /kkb/install
    cd /kkb/soft/
    tar -zxvf apache-maven-3.0.5-bin.tar.gz -C ../install/
Configuring environment variables for maven
    vim /etc/profile

    export MAVEN_HOME=/kkb/install/apache-maven-3.0.5
    export MAVEN_OPTS="-Xms4096m -Xmx4096m"
    export PATH=:$MAVEN_HOME/bin:$PATH

Make the changes take effect immediately:

    source /etc/profile
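A quick sanity check that Maven and the JDK are picked up correctly (the exact output depends on your environment):

```shell
# Should report Apache Maven 3.0.5 and a Java 1.7 home under /kkb/install
mvn -version
```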
2.5: installing findbugs
Download findbugs
    cd /kkb/soft
    wget --no-check-certificate https://sourceforge.net/projects/findbugs/files/findbugs/1.3.9/findbugs-1.3.9.tar.gz/download -O findbugs-1.3.9.tar.gz

Unpack findbugs:

    tar -zxvf findbugs-1.3.9.tar.gz -C ../install/

Configure environment variables for findbugs:

    vim /etc/profile

    export JAVA_HOME=/kkb/install/jdk1.7.0_71
    export PATH=:$JAVA_HOME/bin:$PATH
    export MAVEN_HOME=/kkb/install/apache-maven-3.0.5
    export PATH=:$MAVEN_HOME/bin:$PATH
    export FINDBUGS_HOME=/kkb/install/findbugs-1.3.9
    export PATH=:$FINDBUGS_HOME/bin:$PATH

Make the changes take effect immediately:

    source /etc/profile
2.6: online installation of some dependent packages
    yum install autoconf automake libtool cmake
    yum install ncurses-devel
    yum install openssl-devel
    yum install lzo-devel zlib-devel gcc gcc-c++

Dependency package required for bzip2 compression:

    yum install -y bzip2-devel
2.7: installing protobuf
protobuf download address (Baidu netdisk):
https://pan.baidu.com/s/1pJlZubT
Upload to /kkb/soft after downloading
Extract protobuf and compile it
    cd /kkb/soft
    tar -zxvf protobuf-2.5.0.tar.gz -C ../install/
    cd /kkb/install/protobuf-2.5.0
    ./configure
    make && make install
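To verify the protobuf build, check the compiler version (if the shared library is not found, running ldconfig first may be necessary):

```shell
# Should print: libprotoc 2.5.0
protoc --version
```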
2.8: installing snappy
snappy download address:
http://code.google.com/p/snappy/
    cd /kkb/soft/
    tar -zxf snappy-1.1.1.tar.gz -C ../install/
    cd ../install/snappy-1.1.1/
    ./configure
    make && make install
2.9: download the cdh source code and compile it
The source code download address is:
http://archive.cloudera.com/cdh5/cdh/5/hadoop-2.6.0-cdh5.14.2-src.tar.gz
Download the source code for compilation
    cd /kkb/soft
    wget http://archive.cloudera.com/cdh5/cdh/5/hadoop-2.6.0-cdh5.14.2-src.tar.gz
    tar -zxvf hadoop-2.6.0-cdh5.14.2-src.tar.gz -C ../install/
    cd /kkb/install/hadoop-2.6.0-cdh5.14.2

Compile without snappy compression support:

    mvn package -Pdist,native -DskipTests -Dtar

Compile with snappy compression support:

    mvn package -DskipTests -Pdist,native -Dtar -Drequire.snappy -e -X

After compilation finishes, the compressed package we need is produced in the build output directory of the source tree.
2.10: common compilation errors
If this error occurs during compilation:
An Ant BuildException has occured: exec returned: 2
This happens because the Tomcat tarball could not be downloaded. You need to download the corresponding version, apache-tomcat-6.0.53.tar.gz, and place it under the specified paths.
Put the tomcat tarball under both of these paths:
    /kkb/install/hadoop-2.6.0-cdh5.14.2/hadoop-hdfs-project/hadoop-hdfs-httpfs/downloads
    /kkb/install/hadoop-2.6.0-cdh5.14.2/hadoop-common-project/hadoop-kms/downloads
2. hadoop cluster installation
Installation environment service deployment planning
| Server IP | 192.168.52.100 | 192.168.52.110 | 192.168.52.120 |
|---|---|---|---|
| HDFS | NameNode | | |
| HDFS | SecondaryNameNode | | |
| HDFS | DataNode | DataNode | DataNode |
| YARN | ResourceManager | | |
| YARN | NodeManager | NodeManager | NodeManager |
| Historical log server | JobHistoryServer | | |
Step 1: upload the compressed package and decompress it
Upload our recompiled hadoop package that supports snappy compression to the first server and decompress it
The first machine executes the following commands
    cd /kkb/soft/
    tar -zxvf hadoop-2.6.0-cdh5.14.2_after_compile.tar.gz -C ../install/
Step 2: check the compression methods and local libraries supported by hadoop
The first machine executes the following commands
    cd /kkb/install/hadoop-2.6.0-cdh5.14.2
    bin/hadoop checknative

If openssl shows false, install openssl-devel online on all machines by running the following command (the virtual machines must have network access):
yum -y install openssl-devel
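After installing openssl-devel you can rerun the check to confirm the native libraries are picked up (snappy should also show true if you are using the recompiled package):

```shell
# Rerun on the first machine; openssl (and snappy) should now report true
cd /kkb/install/hadoop-2.6.0-cdh5.14.2
bin/hadoop checknative
```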
Step 3: modify the configuration file
Modify core-site.xml
The first machine executes the following commands
    cd /kkb/install/hadoop-2.6.0-cdh5.14.2/etc/hadoop
    vim core-site.xml

    <configuration>
        <property>
            <name>fs.defaultFS</name>
            <value>hdfs://node01:8020</value>
        </property>
        <property>
            <name>hadoop.tmp.dir</name>
            <value>/kkb/install/hadoop-2.6.0-cdh5.14.2/hadoopDatas/tempDatas</value>
        </property>
        <!-- Buffer size; in real deployments tune this according to server performance -->
        <property>
            <name>io.file.buffer.size</name>
            <value>4096</value>
        </property>
        <!-- Enable the HDFS trash: deleted data can be recovered within this interval (in minutes) -->
        <property>
            <name>fs.trash.interval</name>
            <value>10080</value>
        </property>
    </configuration>
Modify hdfs-site.xml
The first machine executes the following commands
    cd /kkb/install/hadoop-2.6.0-cdh5.14.2/etc/hadoop
    vim hdfs-site.xml

    <configuration>
        <!-- Path where the NameNode stores metadata. In real deployments, decide the disk mount directories first and then split across multiple directories -->
        <!-- Dynamically adding and removing cluster nodes (left commented out)
        <property>
            <name>dfs.hosts</name>
            <value>/kkb/install/hadoop-2.6.0-cdh5.14.2/etc/hadoop/accept_host</value>
        </property>
        <property>
            <name>dfs.hosts.exclude</name>
            <value>/kkb/install/hadoop-2.6.0-cdh5.14.2/etc/hadoop/deny_host</value>
        </property>
        -->
        <property>
            <name>dfs.namenode.secondary.http-address</name>
            <value>node01:50090</value>
        </property>
        <property>
            <name>dfs.namenode.http-address</name>
            <value>node01:50070</value>
        </property>
        <property>
            <name>dfs.namenode.name.dir</name>
            <value>file:///kkb/install/hadoop-2.6.0-cdh5.14.2/hadoopDatas/namenodeDatas</value>
        </property>
        <!-- Location where the DataNode stores data. In real deployments, decide the disk mount directories first and then split across multiple directories -->
        <property>
            <name>dfs.datanode.data.dir</name>
            <value>file:///kkb/install/hadoop-2.6.0-cdh5.14.2/hadoopDatas/datanodeDatas</value>
        </property>
        <property>
            <name>dfs.namenode.edits.dir</name>
            <value>file:///kkb/install/hadoop-2.6.0-cdh5.14.2/hadoopDatas/dfs/nn/edits</value>
        </property>
        <property>
            <name>dfs.namenode.checkpoint.dir</name>
            <value>file:///kkb/install/hadoop-2.6.0-cdh5.14.2/hadoopDatas/dfs/snn/name</value>
        </property>
        <property>
            <name>dfs.namenode.checkpoint.edits.dir</name>
            <value>file:///kkb/install/hadoop-2.6.0-cdh5.14.2/hadoopDatas/dfs/nn/snn/edits</value>
        </property>
        <property>
            <name>dfs.replication</name>
            <value>2</value>
        </property>
        <property>
            <name>dfs.permissions</name>
            <value>false</value>
        </property>
        <property>
            <name>dfs.blocksize</name>
            <value>134217728</value>
        </property>
    </configuration>
Modify hadoop-env.sh
The first machine executes the following commands
    cd /kkb/install/hadoop-2.6.0-cdh5.14.2/etc/hadoop
    vim hadoop-env.sh

    export JAVA_HOME=/kkb/install/jdk1.8.0_141
Modify mapred-site.xml
The first machine executes the following commands
    cd /kkb/install/hadoop-2.6.0-cdh5.14.2/etc/hadoop
    vim mapred-site.xml

    <configuration>
        <property>
            <name>mapreduce.framework.name</name>
            <value>yarn</value>
        </property>
        <property>
            <name>mapreduce.job.ubertask.enable</name>
            <value>true</value>
        </property>
        <property>
            <name>mapreduce.jobhistory.address</name>
            <value>node01:10020</value>
        </property>
        <property>
            <name>mapreduce.jobhistory.webapp.address</name>
            <value>node01:19888</value>
        </property>
    </configuration>
Modify yarn-site.xml
The first machine executes the following commands
    cd /kkb/install/hadoop-2.6.0-cdh5.14.2/etc/hadoop
    vim yarn-site.xml

    <configuration>
        <property>
            <name>yarn.resourcemanager.hostname</name>
            <value>node01</value>
        </property>
        <property>
            <name>yarn.nodemanager.aux-services</name>
            <value>mapreduce_shuffle</value>
        </property>
    </configuration>
Modify the slaves file
The first machine executes the following commands
    cd /kkb/install/hadoop-2.6.0-cdh5.14.2/etc/hadoop
    vim slaves

    node01
    node02
    node03
Step 4: create a file storage directory
Create the following directories on the node01 machine (the first machine):
    mkdir -p /kkb/install/hadoop-2.6.0-cdh5.14.2/hadoopDatas/tempDatas
    mkdir -p /kkb/install/hadoop-2.6.0-cdh5.14.2/hadoopDatas/namenodeDatas
    mkdir -p /kkb/install/hadoop-2.6.0-cdh5.14.2/hadoopDatas/datanodeDatas
    mkdir -p /kkb/install/hadoop-2.6.0-cdh5.14.2/hadoopDatas/dfs/nn/edits
    mkdir -p /kkb/install/hadoop-2.6.0-cdh5.14.2/hadoopDatas/dfs/snn/name
    mkdir -p /kkb/install/hadoop-2.6.0-cdh5.14.2/hadoopDatas/dfs/nn/snn/edits
Step 5: distribution of installation package
The first machine executes the following commands
    cd /kkb/install/
    scp -r hadoop-2.6.0-cdh5.14.2/ node02:$PWD
    scp -r hadoop-2.6.0-cdh5.14.2/ node03:$PWD
Step 6: configure hadoop environment variables
All three machines need to configure hadoop environment variables
Three machines execute the following commands
    vim /etc/profile

    export HADOOP_HOME=/kkb/install/hadoop-2.6.0-cdh5.14.2
    export PATH=:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH

Make the configuration take effect:

    source /etc/profile
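A quick check that the environment variables are in place on each machine:

```shell
# Should print Hadoop 2.6.0-cdh5.14.2 on all three machines
hadoop version
```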
Step 7: cluster startup
To start Hadoop clusters, you need to start HDFS and YARN clusters.
Note: the first time HDFS is started it must be formatted. This is essentially some cleanup and preparation work, because at this point HDFS does not yet exist physically.

    bin/hdfs namenode -format

or

    bin/hadoop namenode -format
Starting daemons one by one on each node

On the master node, start the HDFS NameNode:

    hadoop-daemon.sh start namenode

On each slave node, start the HDFS DataNode:

    hadoop-daemon.sh start datanode

On the master node, start the YARN ResourceManager:

    yarn-daemon.sh start resourcemanager

On each slave node, start the YARN NodeManager:

    yarn-daemon.sh start nodemanager

The scripts above live in the $HADOOP_PREFIX/sbin/ directory. To stop a role on a node, simply change start to stop.
One-click start with scripts
If etc/hadoop/slaves and passwordless ssh login are configured, you can use the provided scripts to start all processes of the Hadoop cluster; run them on the machine designated as the master node.
Start cluster
Execute the following command on the node01 node
    cd /kkb/install/hadoop-2.6.0-cdh5.14.2/
    sbin/start-dfs.sh
    sbin/start-yarn.sh
    sbin/mr-jobhistory-daemon.sh start historyserver

Stop the cluster:

    sbin/stop-dfs.sh
    sbin/stop-yarn.sh
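After start-up you can verify the running daemons with jps on each machine; based on the deployment plan above, the expected processes are roughly as follows (exact lists may vary slightly):

```shell
# On node01: NameNode, SecondaryNameNode, DataNode, ResourceManager, NodeManager,
# JobHistoryServer and QuorumPeerMain (zookeeper) should be listed
jps

# On node02 and node03: DataNode, NodeManager and QuorumPeerMain should be listed
jps
```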
Step 8: view the web UIs in a browser
hdfs cluster access address
http://192.168.52.100:50070/dfshealth.html#tab-overview
yarn cluster access address
http://192.168.52.100:8088/cluster
jobhistory access address:
http://192.168.52.100:19888/jobhistory