Hadoop Foundation: NameNode and SecondaryNameNode

1 NN and 2NN working mechanism

 

Think: where is the metadata stored in the NameNode?

 

First, suppose the metadata were stored only on the NameNode's disk. Because the metadata is accessed randomly and must be read to respond to client requests, this would be far too slow, so the metadata has to be kept in memory. However, if it existed only in memory, the metadata would be lost the moment the node lost power and the whole cluster would stop working. This is why an FsImage file is kept on disk as a backup of the metadata.

This introduces a new problem: if the FsImage were updated every time the in-memory metadata changes, efficiency would be too low; but if it is not updated, the two copies diverge, and a power failure on the NameNode would lose data.

 

Therefore, the Edits file is introduced (it is append-only, which makes writes very fast). Whenever metadata is added or updated, the change is applied to the in-memory metadata and appended to Edits. This way, if the NameNode loses power, the metadata can be reconstructed by combining FsImage and Edits.

 

However, if changes keep accumulating in Edits for a long time, the file grows too large, efficiency drops, and recovering the metadata after a power failure takes too long. FsImage and Edits therefore need to be merged periodically. If the NameNode itself performed this merge, it would be too slow, so a dedicated node, the SecondaryNameNode, is introduced to merge FsImage and Edits.
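
To make this concrete, the FsImage and Edits files can be seen directly on disk in the NameNode's metadata directory. A minimal sketch, assuming the data directory used later in this article (/opt/module/hadoop-2.7.2/data/tmp/dfs/name):

# List the NameNode metadata directory: fsimage_* files are the on-disk
# snapshots, edits_* files are the append-only logs, and edits_inprogress_*
# is the segment currently being written.
[atguigu@hadoop102 hadoop-2.7.2]$ ls data/tmp/dfs/name/current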

The working mechanism of NN and 2NN is shown in Figure 3-14.

 

 

 

Figure 3-14 working mechanism of NN and 2NN

1. Phase I: NameNode startup

(1) When the NameNode is started for the first time after formatting, it creates the Fsimage and Edits files. If it is not the first start, it loads the edit log and image file directly into memory.

(2) The client requests to add, delete or modify metadata.

(3) The NameNode records the operation in the edit log (the rolling log).

(4) The NameNode adds, deletes, or modifies the metadata in memory.

2. Phase II: SecondaryNameNode operation

(1) The SecondaryNameNode asks the NameNode whether a CheckPoint is needed and gets the answer back directly.

(2) The SecondaryNameNode requests a CheckPoint.

(3) The NameNode rolls the Edits log currently being written.

(4) The edit logs and image file from before the roll are copied to the SecondaryNameNode.

(5) The SecondaryNameNode loads the edit logs and image file into memory and merges them.

(6) A new image file, fsimage.chkpoint, is generated.

(7) fsimage.chkpoint is copied back to the NameNode.

(8) The NameNode renames fsimage.chkpoint to fsimage.

Detailed explanation of NN and 2NN working mechanism:

Fsimage: the file produced by serializing the metadata held in NameNode memory.

Edits: records every client operation that updates the metadata (the metadata can be reconstructed by replaying Edits).

When the NameNode starts, it first rolls Edits and creates an empty edits_inprogress file, then loads Edits and Fsimage into memory; at this point the NameNode holds the latest metadata in memory. The client then sends metadata add, delete, and modify requests to the NameNode, and these operations are recorded in edits_inprogress (query operations are not recorded in Edits, because queries do not change the metadata). If the NameNode crashes at this point, the metadata can be recovered from Edits after a restart. The NameNode then applies the additions, deletions, and modifications to the metadata in memory.

Since more and more operations are recorded in Edits, the Edits files grow larger and larger, and loading them at NameNode startup becomes very slow. Edits and Fsimage therefore need to be merged (merging means loading Edits and Fsimage into memory, replaying the operations in Edits one by one, and finally producing a new Fsimage). The SecondaryNameNode is used to help the NameNode merge Edits and Fsimage.

The SecondaryNameNode first asks the NameNode whether a CheckPoint is needed (a CheckPoint is triggered by either of two conditions: the timer interval has elapsed, or the number of operations in Edits has reached the threshold) and gets the answer back directly. The SecondaryNameNode then performs the CheckPoint operation. First, the NameNode rolls Edits and creates an empty edits_inprogress; the purpose of rolling Edits is to mark a boundary, so that all new operations are written to edits_inprogress. The other, not-yet-merged Edits files and Fsimage are copied to the SecondaryNameNode's local disk, loaded into memory, and merged to produce fsimage.chkpoint. fsimage.chkpoint is then copied to the NameNode, renamed to Fsimage, and replaces the old Fsimage. On the next startup, the NameNode only needs to load the not-yet-merged Edits and Fsimage, because the metadata from the already-merged Edits has been recorded in Fsimage.
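
For experimentation, the rolling and saving described above can also be triggered by hand. A small sketch, assuming the cluster from this article; note that -saveNamespace makes the NameNode itself write a new Fsimage (it is not the 2NN checkpoint, but it produces the same kind of fsimage/edits files in name/current) and requires safe mode:

# Roll the edit log: the current segment is closed and a new
# edits_inprogress_* segment is started.
[atguigu@hadoop102 hadoop-2.7.2]$ bin/hdfs dfsadmin -rollEdits

# Force the NameNode to save a fresh fsimage (requires safe mode).
[atguigu@hadoop102 hadoop-2.7.2]$ bin/hdfs dfsadmin -safemode enter
[atguigu@hadoop102 hadoop-2.7.2]$ bin/hdfs dfsadmin -saveNamespace
[atguigu@hadoop102 hadoop-2.7.2]$ bin/hdfs dfsadmin -safemode leave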

 

2 Analysis of fsimage and Edits

 

1. Concept

 

 

 

2. Viewing the Fsimage file with oiv

(1) The oiv and oev commands

[atguigu@hadoop102 current]$ hdfs
oiv            apply the offline fsimage viewer to an fsimage
oev            apply the offline edits viewer to an edits file

(2) Basic syntax

hdfs oiv -p <file type> -i <fsimage file> -o <output path of the converted file>

(3) Example

 

[atguigu@hadoop102 current]$ pwd
/opt/module/hadoop-2.7.2/data/tmp/dfs/name/current

[atguigu@hadoop102 current]$ hdfs oiv -p XML -i fsimage_0000000000000000025 -o /opt/module/hadoop-2.7.2/fsimage.xml

[atguigu@hadoop102 current]$ cat /opt/module/hadoop-2.7.2/fsimage.xml

 

Copy the contents of the generated XML file into an XML file created in Eclipse and format it. Part of the output is shown below.

 

<inode>
    <id>16386</id>
    <type>DIRECTORY</type>
    <name>user</name>
    <mtime>1512722284477</mtime>
    <permission>atguigu:supergroup:rwxr-xr-x</permission>
    <nsquota>-1</nsquota>
    <dsquota>-1</dsquota>
</inode>
<inode>
    <id>16387</id>
    <type>DIRECTORY</type>
    <name>atguigu</name>
    <mtime>1512790549080</mtime>
    <permission>atguigu:supergroup:rwxr-xr-x</permission>
    <nsquota>-1</nsquota>
    <dsquota>-1</dsquota>
</inode>
<inode>
    <id>16389</id>
    <type>FILE</type>
    <name>wc.input</name>
    <replication>3</replication>
    <mtime>1512722322219</mtime>
    <atime>1512722321610</atime>
    <perferredBlockSize>134217728</perferredBlockSize>
    <permission>atguigu:supergroup:rw-r--r--</permission>
    <blocks>
        <block>
            <id>1073741825</id>
            <genstamp>1001</genstamp>
            <numBytes>59</numBytes>
        </block>
    </blocks>
</inode>

 

Think: notice that Fsimage does not record which DataNodes hold each block. Why?

Because after the cluster starts, the DataNodes are required to report their block information to the NameNode, and they report it again periodically.
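
The mapping the DataNodes report can be inspected with fsck once the cluster is running. A quick sketch, assuming the wc.input file from the fsimage above lives at /user/atguigu/wc.input:

# Show the blocks of the file and the DataNodes holding each replica,
# as reported by the DataNodes after startup.
[atguigu@hadoop102 hadoop-2.7.2]$ bin/hdfs fsck /user/atguigu/wc.input -files -blocks -locations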

 

 

3. Viewing the Edits file with oev

 

(1) Basic syntax

hdfs oev -p <file type> -i <edits file> -o <output path of the converted file>

(2) Example

[atguigu@hadoop102 current]$ hdfs oev -p XML -i edits_0000000000000000012-0000000000000000013 -o /opt/module/hadoop-2.7.2/edits.xml

[atguigu@hadoop102 current]$ cat /opt/module/hadoop-2.7.2/edits.xml

Copy the contents of the generated XML file into an XML file created in Eclipse and format it. The output is shown below.

 

<?xml version="1.0" encoding="UTF-8"?>
<EDITS>
    <EDITS_VERSION>-63</EDITS_VERSION>
    <RECORD>
        <OPCODE>OP_START_LOG_SEGMENT</OPCODE>
        <DATA>
            <TXID>129</TXID>
        </DATA>
    </RECORD>
    <RECORD>
        <OPCODE>OP_ADD</OPCODE>
        <DATA>
            <TXID>130</TXID>
            <LENGTH>0</LENGTH>
            <INODEID>16407</INODEID>
            <PATH>/hello7.txt</PATH>
            <REPLICATION>2</REPLICATION>
            <MTIME>1512943607866</MTIME>
            <ATIME>1512943607866</ATIME>
            <BLOCKSIZE>134217728</BLOCKSIZE>
            <CLIENT_NAME>DFSClient_NONMAPREDUCE_-1544295051_1</CLIENT_NAME>
            <CLIENT_MACHINE>192.168.1.5</CLIENT_MACHINE>
            <OVERWRITE>true</OVERWRITE>
            <PERMISSION_STATUS>
                <USERNAME>atguigu</USERNAME>
                <GROUPNAME>supergroup</GROUPNAME>
                <MODE>420</MODE>
            </PERMISSION_STATUS>
            <RPC_CLIENTID>908eafd4-9aec-4288-96f1-e8011d181561</RPC_CLIENTID>
            <RPC_CALLID>0</RPC_CALLID>
        </DATA>
    </RECORD>
    <RECORD>
        <OPCODE>OP_ALLOCATE_BLOCK_ID</OPCODE>
        <DATA>
            <TXID>131</TXID>
            <BLOCK_ID>1073741839</BLOCK_ID>
        </DATA>
    </RECORD>
    <RECORD>
        <OPCODE>OP_SET_GENSTAMP_V2</OPCODE>
        <DATA>
            <TXID>132</TXID>
            <GENSTAMPV2>1016</GENSTAMPV2>
        </DATA>
    </RECORD>
    <RECORD>
        <OPCODE>OP_ADD_BLOCK</OPCODE>
        <DATA>
            <TXID>133</TXID>
            <PATH>/hello7.txt</PATH>
            <BLOCK>
                <BLOCK_ID>1073741839</BLOCK_ID>
                <NUM_BYTES>0</NUM_BYTES>
                <GENSTAMP>1016</GENSTAMP>
            </BLOCK>
            <RPC_CLIENTID></RPC_CLIENTID>
            <RPC_CALLID>-2</RPC_CALLID>
        </DATA>
    </RECORD>
    <RECORD>
        <OPCODE>OP_CLOSE</OPCODE>
        <DATA>
            <TXID>134</TXID>
            <LENGTH>0</LENGTH>
            <INODEID>0</INODEID>
            <PATH>/hello7.txt</PATH>
            <REPLICATION>2</REPLICATION>
            <MTIME>1512943608761</MTIME>
            <ATIME>1512943607866</ATIME>
            <BLOCKSIZE>134217728</BLOCKSIZE>
            <CLIENT_NAME></CLIENT_NAME>
            <CLIENT_MACHINE></CLIENT_MACHINE>
            <OVERWRITE>false</OVERWRITE>
            <BLOCK>
                <BLOCK_ID>1073741839</BLOCK_ID>
                <NUM_BYTES>25</NUM_BYTES>
                <GENSTAMP>1016</GENSTAMP>
            </BLOCK>
            <PERMISSION_STATUS>
                <USERNAME>atguigu</USERNAME>
                <GROUPNAME>supergroup</GROUPNAME>
                <MODE>420</MODE>
            </PERMISSION_STATUS>
        </DATA>
    </RECORD>
</EDITS>

 

Think: how does the NameNode determine which Edits files to merge the next time it starts?
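
A hint can be found in name/current itself: in Hadoop 2.x the fsimage file name ends with the last transaction ID it already contains, and the seen_txid file records the most recent transaction ID, so only edits segments with higher transaction IDs need to be replayed. A quick look (paths as used earlier in this article):

[atguigu@hadoop102 current]$ pwd
/opt/module/hadoop-2.7.2/data/tmp/dfs/name/current

# fsimage_0000000000000000025 already covers transactions up to txid 25;
# edits segments with larger txids are the ones merged on the next startup.
[atguigu@hadoop102 current]$ ls fsimage_* edits_* seen_txid
[atguigu@hadoop102 current]$ cat seen_txid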

 

3 CheckPoint timing settings

 

(1) By default, the SecondaryNameNode performs a CheckPoint every hour.

 

[hdfs-default.xml]

<property>
  <name>dfs.namenode.checkpoint.period</name>
  <value>3600</value>
</property>

(2) The number of operations is checked once per minute; when the number of operations reaches 1 million, the SecondaryNameNode performs a CheckPoint.

 

<property>
  <name>dfs.namenode.checkpoint.txns</name>
  <value>1000000</value>
  <description>Number of operations that triggers a checkpoint</description>
</property>

<property>
  <name>dfs.namenode.checkpoint.check.period</name>
  <value>60</value>
  <description>Check the number of operations once per minute</description>
</property>
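
To confirm which values are actually in effect on a node, the configuration can be queried with hdfs getconf (a quick sketch):

[atguigu@hadoop102 hadoop-2.7.2]$ bin/hdfs getconf -confKey dfs.namenode.checkpoint.period
[atguigu@hadoop102 hadoop-2.7.2]$ bin/hdfs getconf -confKey dfs.namenode.checkpoint.txns
[atguigu@hadoop102 hadoop-2.7.2]$ bin/hdfs getconf -confKey dfs.namenode.checkpoint.check.period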

 

4 NameNode fault handling

After a NameNode fails, the following two methods can be used to recover data.

Method 1: copy the data from the SecondaryNameNode to the directory where the NameNode stores its data.

1. Kill the NameNode process with kill -9

2. Delete the data stored by the NameNode (/opt/module/hadoop-2.7.2/data/tmp/dfs/name)

[atguigu@hadoop102 hadoop-2.7.2]$ rm -rf /opt/module/hadoop-2.7.2/data/tmp/dfs/name/*

3. Copy the data from the SecondaryNameNode to the original NameNode data directory

 

[atguigu@hadoop102 dfs]$ scp -r atguigu@hadoop104:/opt/module/hadoop-2.7.2/data/tmp/dfs/namesecondary/* ./name/

 

4. Restart the NameNode

[atguigu@hadoop102 hadoop-2.7.2]$ sbin/hadoop-daemon.sh start namenode

Method 2: start the NameNode daemon with the -importCheckpoint option, which copies the data from the SecondaryNameNode into the NameNode directory.

1. Modify hdfs-site.xml

 

<property>
  <name>dfs.namenode.checkpoint.period</name>
  <value>120</value>
</property>

<property>
  <name>dfs.namenode.name.dir</name>
  <value>/opt/module/hadoop-2.7.2/data/tmp/dfs/name</value>
</property>

 

 

2. Kill the NameNode process with kill -9

3. Delete the data stored by the NameNode (/opt/module/hadoop-2.7.2/data/tmp/dfs/name)

[atguigu@hadoop102 hadoop-2.7.2]$ rm -rf /opt/module/hadoop-2.7.2/data/tmp/dfs/name/*

4. If the SecondaryNameNode is not on the same host as the NameNode, copy the directory where the SecondaryNameNode stores its data to the same level as the NameNode's data directory, and delete the in_use.lock file

[atguigu@hadoop102 dfs]$ scp -r atguigu@hadoop104:/opt/module/hadoop-2.7.2/data/tmp/dfs/namesecondary ./

[atguigu@hadoop102 namesecondary]$ rm -rf in_use.lock

[atguigu@hadoop102 dfs]$ pwd
/opt/module/hadoop-2.7.2/data/tmp/dfs

[atguigu@hadoop102 dfs]$ ls
data  name  namesecondary

5. Import the checkpoint data (wait a while, then press ctrl+c to stop it)

[atguigu@hadoop102 hadoop-2.7.2]$ bin/hdfs namenode -importCheckpoint

6. Start the NameNode

[atguigu@hadoop102 hadoop-2.7.2]$ sbin/hadoop-daemon.sh start namenode
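
After either recovery method, it is worth confirming that the NameNode came back with the recovered metadata. A quick sketch:

# Check that the NameNode process is running and that the namespace is visible.
[atguigu@hadoop102 hadoop-2.7.2]$ jps
[atguigu@hadoop102 hadoop-2.7.2]$ bin/hdfs dfsadmin -report
[atguigu@hadoop102 hadoop-2.7.2]$ bin/hdfs dfs -ls /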

5 Cluster safe mode

 

1. Overview

 

 

 

2. Basic syntax

While the cluster is in safe mode, it cannot perform important operations (write operations). After startup completes, the cluster automatically exits safe mode.

 

(1) bin/hdfs dfsadmin -safemode get (function description: view the safe mode status)

(2) bin/hdfs dfsadmin -safemode enter (function description: enter safe mode)

(3) bin/hdfs dfsadmin -safemode leave (function description: leave safe mode)

(4) bin/hdfs dfsadmin -safemode wait (function description: wait until safe mode is off)

3. Example

Simulate waiting for safe mode to exit

(1) View current mode

[atguigu@hadoop102 hadoop-2.7.2]$ hdfs dfsadmin -safemode get

Safe mode is OFF

(2) First, enter safe mode

[atguigu@hadoop102 hadoop-2.7.2]$ bin/hdfs dfsadmin -safemode enter

(3) Create and execute the following script

Under /opt/module/hadoop-2.7.2, create a script named safemode.sh

[atguigu@hadoop102 hadoop-2.7.2]$ touch safemode.sh

[atguigu@hadoop102 hadoop-2.7.2]$ vim safemode.sh

#!/bin/bash

hdfs dfsadmin -safemode wait

hdfs dfs -put /opt/module/hadoop-2.7.2/README.txt /

[atguigu@hadoop102 hadoop-2.7.2]$ chmod 777 safemode.sh

[atguigu@hadoop102 hadoop-2.7.2]$ ./safemode.sh

(4) Open another window and execute

[atguigu@hadoop102 hadoop-2.7.2]$ bin/hdfs dfsadmin -safemode leave

(5) Observation

(a) Look at the previous window again

Safe mode is OFF

(b) The uploaded data is now present on the HDFS cluster (it can be checked as shown below).
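
A quick way to confirm this (a small sketch):

# README.txt uploaded by safemode.sh should now be listed at the root of HDFS.
[atguigu@hadoop102 hadoop-2.7.2]$ bin/hdfs dfs -ls /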

 

6 NameNode multi-directory configuration

1. The NameNode's local storage can be configured with multiple directories; each directory holds identical content, which improves reliability.

2. The specific configuration is as follows

 

(1) Add the following to the hdfs-site.xml file

 

<property>
    <name>dfs.namenode.name.dir</name>
    <value>file:///${hadoop.tmp.dir}/dfs/name1,file:///${hadoop.tmp.dir}/dfs/name2</value>
</property>

 

(2) Stop the cluster and delete all data in data and logs.

 

[atguigu@hadoop102 hadoop-2.7.2]$ rm -rf data/ logs/
[atguigu@hadoop103 hadoop-2.7.2]$ rm -rf data/ logs/
[atguigu@hadoop104 hadoop-2.7.2]$ rm -rf data/ logs/

 

(3) Format the cluster and start it.

 

[atguigu@hadoop102 hadoop-2.7.2]$ bin/hdfs namenode -format
[atguigu@hadoop102 hadoop-2.7.2]$ sbin/start-dfs.sh

 

(4) View results

 

[atguigu@hadoop102 dfs]$ ll
total 12
drwx------. 3 atguigu atguigu 4096 Dec 11 08:03 data
drwxrwxr-x. 3 atguigu atguigu 4096 Dec 11 08:03 name1
drwxrwxr-x. 3 atguigu atguigu 4096 Dec 11 08:03 name2
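
Since both directories are written in parallel, they can also be checked for identical content (a quick sketch, assuming the directories above):

# No output from diff means the two metadata copies are identical.
[atguigu@hadoop102 dfs]$ diff -r name1/current name2/current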

 

