Unified description of big data course software

1, Document description

To keep everyone's operating system and software environment consistent, we standardize the basic software environment before class so that the same software versions are used throughout the whole learning process.

2, VmWare and linux versions

VmWare version:

There is no strict requirement for the VmWare version; VmWare 10 or above will do. To install VmWare, simply run the installation package and click through the steps. The installation package includes a key that can be used to activate the software.

linux version

linux uniformly uses CentOS

CentOS uniformly uses the CentOS 7.6 64-bit version

Download address of the torrent file: http://mirrors.aliyun.com/centos/7.6.1810/isos/x86_64/CentOS-7-x86_64-DVD-1810.torrent

3, Using VmWare to install linux Software

See video operating instructions

4, Environment preparation for three linux servers

Prepare a unified environment on three linux servers so that every part of the course uses the same setup.

IP settings for the three machines

See the video for the settings

The three machines modify their IP addresses:

vi /etc/sysconfig/network-scripts/ifcfg-ens33 

BOOTPROTO="static"
IPADDR=192.168.52.100
NETMASK=255.255.255.0
GATEWAY=192.168.52.1
DNS1=8.8.8.8
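After saving the file, the new address can be applied and checked with the commands below; a minimal verification sketch, assuming the interface is named ens33 as above:

systemctl restart network      # restart the network service so the static IP takes effect
ip addr show ens33             # confirm the new address is assigned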

Prepare three linux machines with the IP addresses set as follows

IP address of the first machine: 192.168.52.100

IP address of the second machine: 192.168.52.110

IP address of the third machine: 192.168.52.120

Three machines turn off the firewall

The three machines execute the following commands under the root user to shut down the firewall

systemctl stop firewalld
systemctl disable firewalld
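As an optional quick check, the firewall state can be verified afterwards; a minimal sketch:

systemctl status firewalld     # should report inactive (dead)
firewall-cmd --state           # should print "not running"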

Three machines shut down selinux

The three machines execute the following command under the root user to turn off selinux

vim /etc/selinux/config 

SELINUX=disabled
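The SELINUX=disabled setting only takes effect after a reboot; to also turn selinux off for the current session, the following commands can be used (a small optional sketch, not needed if you reboot right away):

setenforce 0     # switch the running system to permissive mode
getenforce       # verify the current mode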

Three machines change hostname

Three machines change host names respectively

The first host name is changed to: node01.kaikeba.com

The second host name is changed to: node02.kaikeba.com

The third host name is changed to: node03.kaikeba.com

The first machine executes the following command to modify the hostname

vim /etc/hostname
node01.kaikeba.com

The second machine executes the following command to modify the hostname

vim /etc/hostname
node02.kaikeba.com

The third machine executes the following command to modify the hostname

vim /etc/hostname
node03.kaikeba.com
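Editing /etc/hostname takes effect after a reboot; on CentOS 7 the hostnamectl command can also apply the name immediately. A small sketch for the first machine (the other two are analogous):

hostnamectl set-hostname node01.kaikeba.com
hostname        # verify the new hostname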

Three machines change host name and IP address mapping

The three machines execute the following command to add the hostname-to-IP-address mappings

vim /etc/hosts

192.168.52.100 node01.kaikeba.com node01
192.168.52.110 node02.kaikeba.com node02
192.168.52.120 node03.kaikeba.com node03
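A quick way to confirm the mapping works is to ping the other nodes by hostname from any machine; a minimal sketch:

ping -c 3 node02
ping -c 3 node03.kaikeba.com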

Time synchronization on the three machines

The three machines execute the following commands to periodically synchronize their time with the Aliyun time server

 yum -y install ntpdate
 crontab -e 
 */1 * * * * /usr/sbin/ntpdate time1.aliyun.com
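Before the cron job first fires, the clock can also be synchronized once by hand; an optional sketch:

/usr/sbin/ntpdate time1.aliyun.com    # one-off synchronization against the Aliyun time server
date                                  # confirm the current time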
 
 

Add ordinary users to three machines

All three linux servers uniformly add an ordinary user named hadoop and give it sudo permission; this user will be used to install all the big data software later.

The password of this ordinary user is uniformly set to 123456.

 useradd hadoop
 passwd hadoop

Three machines add sudo permission for ordinary users

visudo

hadoop  ALL=(ALL)       ALL
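To confirm the sudo grant works, you can switch to the hadoop user and run a harmless privileged command; a small verification sketch:

su - hadoop
sudo whoami      # should print root after entering the hadoop password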

Three machines define unified directories

Define a directory to store software packages and a directory to hold the unpacked installations on the three linux servers. The three machines execute the following commands to create the two folders: one for the software packages and one for the unpacked software.

 mkdir -p /kkb/soft     # Storage directory of software compressed package
 mkdir -p /kkb/install  # Storage directory after software decompression
 chown -R hadoop:hadoop /kkb    # Change folder permissions to hadoop users

Install jdk on three machines

Reconnect to the three machines as the hadoop user, then install the jdk as that user.

Upload the compressed package to /kkb/soft on the first server, decompress it, configure the environment variables, and then install it on the three machines in turn (see the sketch after the environment-variable block).

cd /kkb/soft/

tar -zxf jdk-8u141-linux-x64.tar.gz  -C /kkb/install/
sudo vim /etc/profile


#Add the following configuration contents to configure jdk environment variables
export JAVA_HOME=/kkb/install/jdk1.8.0_141
export PATH=:$JAVA_HOME/bin:$PATH
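A sketch of the remaining steps, assuming the same directory layout is reused on all three machines: reload the profile and verify java on node01, then copy the unpacked jdk to node02 and node03 (the /etc/profile edit still has to be repeated on each machine):

source /etc/profile
java -version                                    # should report java version "1.8.0_141"
scp -r /kkb/install/jdk1.8.0_141/ node02:/kkb/install/
scp -r /kkb/install/jdk1.8.0_141/ node03:/kkb/install/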

Passwordless login for the hadoop user

The three machines execute the following command under the hadoop user to generate the public and private keys

ssh-keygen -t rsa 

The three machines, under the hadoop user, execute the following command to copy their public key to the node01 server

ssh-copy-id  node01

On node01, under the hadoop user, execute the following commands to copy authorized_keys to the node02 and node03 servers

cd /home/hadoop/.ssh/
scp authorized_keys  node02:$PWD
scp authorized_keys  node03:$PWD
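Once authorized_keys has been distributed, any of the three machines should be able to ssh to any other as the hadoop user without a password; a quick check from node01:

ssh node02 hostname     # should print node02.kaikeba.com without asking for a password
ssh node03 hostname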

Shutdown and restart of three machines

The three machines execute the following commands under the root user to restart or shut down

reboot             # restart
shutdown -h now    # shut down

5, Installation of zookeeper cluster on three machines

Note: the three machines must ensure clock synchronization

Step 1: download the zipped package of zookeeper. The download website is as follows

http://archive.cloudera.com/cdh5/cdh/5/

From this site we download the zookeeper version used here: zookeeper-3.4.5-cdh5.14.2.tar.gz

After downloading, upload to the /kkb/soft path of node01 for installation

Step 2: unzip

Node01 executes the following command to unzip the zookeeper package to the /kkb/install path of the node01 server, in preparation for installation

cd /kkb/soft

tar -zxvf zookeeper-3.4.5-cdh5.14.2.tar.gz  -C /kkb/install/

Step 3: modify the configuration file

The first machine modifies the configuration file

cd /kkb/install/zookeeper-3.4.5-cdh5.14.2/conf

cp zoo_sample.cfg zoo.cfg

mkdir -p /kkb/install/zookeeper-3.4.5-cdh5.14.2/zkdatas

vim  zoo.cfg
dataDir=/kkb/install/zookeeper-3.4.5-cdh5.14.2/zkdatas
autopurge.snapRetainCount=3
autopurge.purgeInterval=1

server.1=node01:2888:3888
server.2=node02:2888:3888
server.3=node03:2888:3888

Step 4: add myid configuration

On the first machine, under the path /kkb/install/zookeeper-3.4.5-cdh5.14.2/zkdatas/

create a file named myid whose content is 1

echo 1 > /kkb/install/zookeeper-3.4.5-cdh5.14.2/zkdatas/myid

Step 5: install package distribution and modify the value of myid

Distribution of installation packages to other machines

Execute the following two commands on the first machine

scp -r /kkb/install/zookeeper-3.4.5-cdh5.14.2/ node02:/kkb/install/

scp -r /kkb/install/zookeeper-3.4.5-cdh5.14.2/ node03:/kkb/install/

On the second machine, change the value of myid to 2

Execute the following command directly on any path of the second machine

echo 2 > /kkb/install/zookeeper-3.4.5-cdh5.14.2/zkdatas/myid

 

On the third machine, change the value of myid to 3

Execute the following command directly on any path of the third machine

echo 3 > /kkb/install/zookeeper-3.4.5-cdh5.14.2/zkdatas/myid

Step 6: three machines start the zookeeper service

This command must be executed on all three machines

/kkb/install/zookeeper-3.4.5-cdh5.14.2/bin/zkServer.sh start

View startup status

/kkb/install/zookeeper-3.4.5-cdh5.14.2/bin/zkServer.sh status
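If the cluster is healthy, the status command reports one node as leader and the other two as followers; the output typically looks roughly like the lines below (exact wording may vary by version), and jps should additionally show a QuorumPeerMain process on each machine:

JMX enabled by default
Using config: /kkb/install/zookeeper-3.4.5-cdh5.14.2/bin/../conf/zoo.cfg
Mode: follower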

6, hadoop environment installation

1. Recompile the CDH software version

1. Why compile hadoop

The CDH distribution provides prebuilt installation packages for each software version, so in general we do not need to compile hadoop ourselves. However, the hadoop package provided by CDH is not built with the native C interface, so we run into problems when using the native libraries (needed for compression, C program access, and so on). So let's see how to compile it.

2. Preparation of compilation environment

2.1: Prepare the linux environment

Prepare a linux environment with 4G or more memory and a 40G or larger hard disk. I use the CentOS 6.9 64-bit operating system (note: be sure to use a 64-bit operating system)

2.2: virtual machine networking, turn off firewall and selinux
Turn off firewall command:

service  iptables   stop
chkconfig   iptables  off 

Turn off selinux:
vim /etc/selinux/config

SELINUX=disabled

2.3: installing jdk1.7

Note: in my testing, hadoop-2.6.0-cdh5.14.2 can only be compiled with jdk1.7; compiling with jdk1.8 reports errors, so do not use jdk1.8 here

Upload our JDK installation package to /kkb/soft (I use jdk1.7.0_71 here)

Unzip our jdk package

First unify the two directories

mkdir -p /kkb/soft
mkdir -p /kkb/install
cd /kkb/soft
tar -zxvf jdk-7u71-linux-x64.tar.gz -C ../install/

Configure environment variables

vim /etc/profile

export JAVA_HOME=/kkb/install/jdk1.7.0_71

export PATH=:$JAVA_HOME/bin:$PATH
Make changes effective immediately

source /etc/profile
2.4: installing maven

Here we use maven 3.x; any 3.x version should be OK, but versions that are too new are not recommended. Version 3.0.5 is strongly recommended

Upload the maven installation package to /kkb/soft

Then extract the maven installation package to /kkb/install

cd /kkb/soft/

tar -zxvf apache-maven-3.0.5-bin.tar.gz -C ../install/

Configuring environment variables for maven

vim /etc/profile

export MAVEN_HOME=/kkb/install/apache-maven-3.0.5

export MAVEN_OPTS="-Xms4096m -Xmx4096m"

export PATH=:$MAVEN_HOME/bin:$PATH
Make changes effective immediately

source /etc/profile
2.5: installing findbugs

Download findbugs

cd  /kkb/soft

wget --no-check-certificate https://sourceforge.net/projects/findbugs/files/findbugs/1.3.9/findbugs-1.3.9.tar.gz/download -O findbugs-1.3.9.tar.gz

 

Unzip findbugs

tar -zxvf findbugs-1.3.9.tar.gz -C ../install/

 

Configure environment variables for findbugs

vim /etc/profile

export JAVA_HOME=/kkb/install/jdk1.7.0_71

export PATH=:$JAVA_HOME/bin:$PATH

 

export MAVEN_HOME=/kkb/install/apache-maven-3.0.5

export PATH=:$MAVEN_HOME/bin:$PATH

 
export FINDBUGS_HOME=/kkb/install/findbugs-1.3.9
export PATH=:$FINDBUGS_HOME/bin:$PATH
Make changes effective immediately

 source  /etc/profile
2.6: online installation of some dependent packages
yum install autoconf automake libtool cmake

yum install ncurses-devel

yum install openssl-devel

yum install lzo-devel zlib-devel gcc gcc-c++

 

Dependent package required for bzip2 compression

yum install -y  bzip2-devel
2.7: installing protobuf

Baidu netdisk download address for protobuf:

https://pan.baidu.com/s/1pJlZubT

Upload to /kkb/soft after downloading

Extract protobuf and compile it

cd  /kkb/soft

tar -zxvf protobuf-2.5.0.tar.gz -C ../install/

cd   /kkb/install/protobuf-2.5.0

./configure

make && make install
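If the build and install succeeded, protoc should now be on the PATH; a quick check (the expected output assumes the 2.5.0 version installed above):

protoc --version     # should print: libprotoc 2.5.0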
2.8. Installing snappy

snappy download address:

http://code.google.com/p/snappy/

cd /kkb/soft/

tar -zxf snappy-1.1.1.tar.gz  -C ../install/

cd ../install/snappy-1.1.1/

./configure

make && make install
2.9: Download cdh source code for compilation

The source code download address is:

http://archive.cloudera.com/cdh5/cdh/5/hadoop-2.6.0-cdh5.14.2-src.tar.gz

Download the source code for compilation

cd  /kkb/soft

wget http://archive.cloudera.com/cdh5/cdh/5/hadoop-2.6.0-cdh5.14.2-src.tar.gz

tar -zxvf hadoop-2.6.0-cdh5.14.2-src.tar.gz -C ../install/

cd  /kkb/install/hadoop-2.6.0-cdh5.14.2

Compile without snappy compression support:

mvn package -Pdist,native -DskipTests -Dtar

 

Compile with snappy compression support:

mvn package -DskipTests -Pdist,native -Dtar -Drequire.snappy -e -X

After the compilation finishes, the compressed package we need is in the build output directory (for Hadoop builds this is normally hadoop-dist/target/ under the source tree)
2.10: common compilation errors

If this error occurs during compilation:

An Ant BuildException has occured: exec returned: 2

This is because the Tomcat compressed package has not been downloaded. You need to download the corresponding version, apache-tomcat-6.0.53.tar.gz, and put it under the specified paths

You need to put the tomcat compressed package under these two paths

/kkb/install/hadoop-2.6.0-cdh5.14.2/hadoop-hdfs-project/hadoop-hdfs-httpfs/downloads

/kkb/install/hadoop-2.6.0-cdh5.14.2/hadoop-common-project/hadoop-kms/downloads

2. hadoop cluster installation

Installation environment service deployment planning

Server IP            192.168.52.100     192.168.52.110     192.168.52.120
HDFS                 NameNode
HDFS                 SecondaryNameNode
HDFS                 DataNode           DataNode           DataNode
YARN                 ResourceManager
YARN                 NodeManager        NodeManager        NodeManager
History log server   JobHistoryServer

Step 1: upload the compressed package and decompress it

Upload our recompiled hadoop package that supports snappy compression to the first server and decompress it

The first machine executes the following commands

cd /kkb/soft/



tar -zxvf hadoop-2.6.0-cdh5.14.2_after_compile.tar.gz -C ../install/

Step 2: check the compression methods and local libraries supported by hadoop

The first machine executes the following commands

cd /kkb/install/hadoop-2.6.0-cdh5.14.2

bin/hadoop checknative

If openssl shows false, all machines can install openssl online. With the virtual machines networked, execute the following command to install it:

yum -y install openssl-devel

Step 3: modify the configuration file

Modify core-site.xml

The first machine executes the following commands

cd /kkb/install/hadoop-2.6.0-cdh5.14.2/etc/hadoop

vim core-site.xml

<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://node01:8020</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/kkb/install/hadoop-2.6.0-cdh5.14.2/hadoopDatas/tempDatas</value>
    </property>
    <!-- Buffer size; in real deployments it is tuned according to the server's performance -->
    <property>
        <name>io.file.buffer.size</name>
        <value>4096</value>
    </property>
    <!-- Enable the HDFS trash; deleted data can be recovered within this interval, in minutes -->
    <property>
        <name>fs.trash.interval</name>
        <value>10080</value>
    </property>
</configuration>
Modify hdfs-site.xml

The first machine executes the following commands

cd /kkb/install/hadoop-2.6.0-cdh5.14.2/etc/hadoop

vim hdfs-site.xml

<configuration>
    <!-- Path where the NameNode stores metadata. In production, the disk mount directories are determined first, and then the data is split across multiple directories --> 
    <!--   Dynamic decommissioning and recommissioning of cluster nodes 
    <property>
        <name>dfs.hosts</name>
        <value>/kkb/install/hadoop-2.6.0-cdh5.14.2/etc/hadoop/accept_host</value>
    </property>
    <property>
        <name>dfs.hosts.exclude</name>
        <value>/kkb/install/hadoop-2.6.0-cdh5.14.2/etc/hadoop/deny_host</value>
    </property>
     -->
     <property>
            <name>dfs.namenode.secondary.http-address</name>
            <value>node01:50090</value>
    </property>
    <property>
        <name>dfs.namenode.http-address</name>
        <value>node01:50070</value>
    </property>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>file:///kkb/install/hadoop-2.6.0-cdh5.14.2/hadoopDatas/namenodeDatas</value>
    </property>
    <!-- Path where the DataNode stores its data blocks. In production, the disk mount directories are determined first, and then the data is split across multiple directories -->
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>file:///kkb/install/hadoop-2.6.0-cdh5.14.2/hadoopDatas/datanodeDatas</value>
    </property>
    <property>
        <name>dfs.namenode.edits.dir</name>
        <value>file:///kkb/install/hadoop-2.6.0-cdh5.14.2/hadoopDatas/dfs/nn/edits</value>
    </property>
    <property>
        <name>dfs.namenode.checkpoint.dir</name>
        <value>file:///kkb/install/hadoop-2.6.0-cdh5.14.2/hadoopDatas/dfs/snn/name</value>
    </property>
    <property>
        <name>dfs.namenode.checkpoint.edits.dir</name>
        <value>file:///kkb/install/hadoop-2.6.0-cdh5.14.2/hadoopDatas/dfs/nn/snn/edits</value>
    </property>
    <property>
        <name>dfs.replication</name>
        <value>2</value>
    </property>
    <property>
        <name>dfs.permissions</name>
        <value>false</value>
    </property>
<property>
        <name>dfs.blocksize</name>
        <value>134217728</value>
    </property>
</configuration>

Modify hadoop-env.sh

The first machine executes the following commands

cd /kkb/install/hadoop-2.6.0-cdh5.14.2/etc/hadoop

vim hadoop-env.sh
export JAVA_HOME=/kkb/install/jdk1.8.0_141

Modify mapred-site.xml

The first machine executes the following commands

cd /kkb/install/hadoop-2.6.0-cdh5.14.2/etc/hadoop

vim mapred-site.xml

<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
    <property>
        <name>mapreduce.job.ubertask.enable</name>
        <value>true</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.address</name>
        <value>node01:10020</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.webapp.address</name>
        <value>node01:19888</value>
    </property>
</configuration>

Modify yarn-site.xml

The first machine executes the following commands

cd /kkb/install/hadoop-2.6.0-cdh5.14.2/etc/hadoop

vim yarn-site.xml

<configuration>
    <property>
       <name>yarn.resourcemanager.hostname</name>
        <value>node01</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
</configuration>

Modify the slaves file

The first machine executes the following commands

cd /kkb/install/hadoop-2.6.0-cdh5.14.2/etc/hadoop

vim slaves


node01
node02
node03

Step 4: create a file storage directory

The first machine executes the following commands

Create the following directories on the node01 machine

mkdir -p /kkb/install/hadoop-2.6.0-cdh5.14.2/hadoopDatas/tempDatas
mkdir -p /kkb/install/hadoop-2.6.0-cdh5.14.2/hadoopDatas/namenodeDatas
mkdir -p /kkb/install/hadoop-2.6.0-cdh5.14.2/hadoopDatas/datanodeDatas 
mkdir -p /kkb/install/hadoop-2.6.0-cdh5.14.2/hadoopDatas/dfs/nn/edits
mkdir -p /kkb/install/hadoop-2.6.0-cdh5.14.2/hadoopDatas/dfs/snn/name
mkdir -p /kkb/install/hadoop-2.6.0-cdh5.14.2/hadoopDatas/dfs/nn/snn/edits

Step 5: distribution of installation package

The first machine executes the following commands

cd /kkb/install/

scp -r hadoop-2.6.0-cdh5.14.2/ node02:$PWD
scp -r hadoop-2.6.0-cdh5.14.2/ node03:$PWD

Step 6: configure hadoop environment variables

All three machines need to configure hadoop environment variables

Three machines execute the following commands

vim  /etc/profile

export HADOOP_HOME=/kkb/install/hadoop-2.6.0-cdh5.14.2
export PATH=:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH

Effective after configuration

source /etc/profile
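After sourcing the profile, a quick sanity check on each machine (a small sketch):

which hadoop        # should resolve to /kkb/install/hadoop-2.6.0-cdh5.14.2/bin/hadoop
hadoop version      # should report Hadoop 2.6.0-cdh5.14.2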

Step 7: cluster startup

To start Hadoop clusters, you need to start HDFS and YARN clusters.

Note: when HDFS is started for the first time, it must be formatted. In essence, it is some cleaning and preparation work, because HDFS does not exist physically at this time.

bin/hdfs namenode -format    or    bin/hadoop namenode -format
Start daemons one by one on each node

On the master node, start the HDFS NameNode with the following command:
hadoop-daemon.sh start namenode 

On each slave node, start the HDFS DataNode with the following command:
hadoop-daemon.sh start datanode 

On the master node, start the YARN ResourceManager with the following command:
yarn-daemon.sh start resourcemanager 

On each slave node, start the YARN NodeManager with the following command:
yarn-daemon.sh start nodemanager 

The above scripts are located in the $HADOOP_PREFIX/sbin/ directory. To stop a role on a node, simply change start to stop.

One-click start with scripts

If etc/hadoop/slaves and passwordless ssh login are configured, the provided scripts can start all related processes of the Hadoop cluster; execute them on the machine designated as the primary node.

Start cluster

Execute the following command on the node01 node

The first machine executes the following commands

cd /kkb/install/hadoop-2.6.0-cdh5.14.2/
sbin/start-dfs.sh
sbin/start-yarn.sh
sbin/mr-jobhistory-daemon.sh start historyserver
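Once everything is up, the jps command should show the expected daemons on each node according to the deployment plan above; a rough sketch of what to expect (process lists are indicative, not exhaustive):

jps     # on node01: NameNode, SecondaryNameNode, DataNode, ResourceManager, NodeManager, JobHistoryServer, QuorumPeerMain
jps     # on node02 / node03: DataNode, NodeManager, QuorumPeerMain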

Stop cluster:

sbin/stop-dfs.sh

sbin/stop-yarn.sh

Step 8: view the startup page with the browser

hdfs cluster access address

http://192.168.52.100:50070/dfshealth.html#tab-overview

yarn cluster access address

http://192.168.52.100:8088/cluster

jobhistory access address:

http://192.168.52.100:19888/jobhistory
