Data structure (7. Huffman tree)


1, Huffman tree

2, Huffman coding implementation


1, Huffman tree

Huffman tree, also known as the optimal tree, is a kind of tree with the shortest weighted path length. This paper discusses the optimal binary tree.

Here are some noun concepts.

Path: a branch from one node to another in a tree

Path length: number of branches on the path

Tree path length: the sum of the path length from the tree root to each leaf node

Weighted path length of tree: the sum of path lengths from tree root to all weighted leaf nodes

Steps to construct Huffman tree:

1. According to the given n weights {w1,w2,... wn}, N binary trees with root nodes are constructed;

2. In the forest, two trees with the smallest root node weight are selected as the left and right subtrees to construct a new binary tree, and the root node weight of the new binary tree is the sum of the root node weights of its left and right subtrees;

3. Delete the two trees in the forest and add the newly obtained binary tree to the forest;

4. Repeat the above two steps until there is only one tree, the Huffman tree.

2, Huffman coding implementation

Huffman code: after constructing a Huffman tree, set a prefix code for its leaf node. The rule is that if the left branch of the tree is' 0 'and the right branch is' 1', the character string composed of branch characters on the path from the root node to the leaf node can be used as the character code of the leaf node.

For example, the Huffman tree composed of eight nodes with weight w={5, 29, 7, 8, 14, 23, 3, 11} is as follows:

  Code of node with weight of 5: 0110         Code with weight of 8 nodes: 1111

  A Huffman tree with n leaf nodes has (2n-1) nodes, which can be stored in an array of 2n-1. Next, the implementation of Huffman coding is given (the code can be copied directly and run). See code Notes for specific operation process.

/* Huffman coding */
using namespace std;

//Storage representation of Huffman tree and Huffman coding
typedef struct
	int weight;
	int parent, lchild, rchild;
}HTNode, * HuffmanTree;

typedef char** HuffmanCode;

void select(HuffmanTree HT, int i, int& s1, int& s2)
{//Select two nodes with parent==0 and minimum weight in HT[1...i], and their serial numbers are s1 and s2 respectively
	int min1 = 99998;//Used to save the minimum weight
	int min2 = 99999;//Used to save the second small weight
	for (int k = 1; k <= i; k++)
		if(HT[k].parent == 0)
		if (HT[k].weight < min1)//Update s1, s2
			min2 = min1;
			s2 = s1;
			min1 = HT[k].weight;
			s1 = k;
		else if (HT[k].weight < min2)//Update s2
			min2 = HT[k].weight;
			s2 = k;
/*-----Huffman coding-----*/
void huffmanCoding(HuffmanTree& HT, HuffmanCode& HC, int w[], int n)
{//w stores the weights of n characters, constructs the Huffman tree HT, and obtains the Huffman code HC of n characters
	if (n <= 1) return;
	int m = 2 * n - 1;
	HT = new HTNode[m + 1];//Unit 0 is not used
	for (int i = 1; i <= m; i++)//The parent, lchild and rchild of m htnodes are initialized to 0
		HT[i].parent = HT[i].parent = HT[i].lchild = HT[i].rchild = 0;
	for (int i = 1; i <= n; i++)//Initialize the weights of n nodes
		HT[i].weight = w[i - 1];
	//Printf ("initial state of HT ================================= \ n");
	//printf("node i 	 weight 	 parent 	 lchild 	 rchild\n");
	//for (int i = 1; i <= m; i++)
	//	printf("%d	%d	%d	%d	%d\n", i, HT[i].weight, HT[i].parent, HT[i].lchild, HT[i].rchild);
	/*----- After the initialization, start to build Huffman tree-----*/
	for (int i = n + 1; i <= m; i++)
	{//Create Huffman tree through n-1 selection, deletion and merging
		//Select two nodes with parent==0 and minimum weight in HT[1...i-1], and their serial numbers are s1 and s2 respectively
		int s1 = 0, s2 = 0;
		select(HT, i - 1, s1, s2);
		//cout << s1 << "-----" << s2 << endl;
		HT[s1].parent = i; HT[s2].parent = i;//Delete s1 and s2 from the forest
		HT[i].lchild = s1; HT[i].rchild = s2;//Set the left and right children of i as s1 and s2
		HT[i].weight = HT[s1].weight + HT[s2].weight;//The weight of i is the sum of the weight of left and right children
	//Printf ("final state of HT ================================= \ n");
	//printf("node i 	 weight 	 parent 	 lchild 	 rchild\n");
	//for (int i = 1; i <= m; i++)
	//	printf("%d	%d	%d	%d	%d\n", i, HT[i].weight, HT[i].parent, HT[i].lchild, HT[i].rchild);
	/*----- Reverse the Huffman code of each character from leaf to root-----*/
	HC = new char* [n + 1];//Allocate a header pointer vector encoded with n characters. Cell 0 is not used
	char* cd = new char[n];//Allocate temporary storage working interval for coding
	cd[n - 1] = '\0';//Encoding Terminator
	for (int i = 1; i <= n; i++)
	{//Huffman coding character by character
		int start = n - 1;//Code terminator position
		int c, f;
		for ( c = i, f = HT[i].parent; f != 0; c = f, f = HT[f].parent)//Reverse encoding from leaf to root
			if (HT[f].lchild == c)
				cd[--start] = '0';
				cd[--start] = '1';
			HC[i] = new char[n - start];//Allocate storage space for the ith character encoding
			strcpy_s(HC[i], n-start, &cd[start]);

int main()
	HuffmanTree HT = NULL;
	HuffmanCode HC = NULL;
	int w[] = { 5,29,7,8,14,23,3,11 };
	int n = sizeof(w) / sizeof(int);

	cout << "Weight of each node:" << endl;
	for (int i = 0; i < n; i++)
		cout << w[i] << " ";
	cout << endl;

	huffmanCoding(HT, HC, w, n);

	cout << "\n Huffman code corresponding to each node:" << endl;
	for (int i = 1; i <= n; i++)
		cout << HC[i] << endl;

	/* Free up memory in heap */
	delete[] HT;
	for (int i = 1; i <= n; i++)
	delete[] HC;

	return 0;

Tags: C Algorithm C++ data structure

Posted by delboy1978uk on Tue, 21 Sep 2021 02:34:18 +0530