Popular science corner
The Conference on Computer Vision and Pattern Recognition (CVPR) is an annual IEEE conference whose main focus is computer vision and pattern recognition. CVPR is one of the world's top computer vision venues, alongside ICCV and ECCV. In recent years it has drawn roughly 1,500 attendees per year, and the number of accepted papers has generally been around 300. Each edition has fixed discussion topics, and companies that sponsor the conference receive the opportunity to exhibit there.
# Foreword

Hello everyone, we finally meet again. Your editor, after a spell as a recluse, has returned. It has been almost a month since the last article on September 26, and I have missed you all. Recently I have been busy writing my paper, stuck on it for months: too much invested to give up, yet pressing on meant the results stayed deadlocked for a long time before they finally held up in a comparison with the SOTA. As soon as the paper is finished I will be back with you properly. Back to the point: following the previous theory article, Theory of deep learning (15) -- a preliminary study of VGG, the mystery of depth, let's strike while the iron is hot and move on to hands-on training.
TensorFlow VGG16 in practice
For this VGG16 network we still use the flower dataset from before, and the code largely follows the earlier structure; only the input-data processing code has been slightly upgraded, the rest is similar. I will later put the code on a code-hosting platform so that everyone can download and test it.
1. Data preparation
The flower dataset has five categories and is split into a training set (train) and a validation set (val); each split contains all five flower classes. As usual, each class has far more images in the training set than in the validation set.
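As a rough sketch of this step (the directory names below, such as flower_data/train and flower_data/val, are assumptions for illustration rather than the repository's actual paths), the image paths and their integer labels can be collected per class like this:
'''
import os
import glob

def list_images_and_labels(root_dir):
    """Collect image paths and integer labels from root_dir/<class_name>/*.jpg."""
    class_names = sorted(os.listdir(root_dir))          # e.g. ['daisy', 'dandelion', ...]
    image_paths, labels = [], []
    for label, class_name in enumerate(class_names):    # labels 0..4 for the five flower classes
        for path in glob.glob(os.path.join(root_dir, class_name, '*.jpg')):
            image_paths.append(path)
            labels.append(label)
    return image_paths, labels

# Assumed layout: flower_data/train/<class>/*.jpg and flower_data/val/<class>/*.jpg
train_paths, train_labels = list_images_and_labels('flower_data/train')
val_paths, val_labels = list_images_and_labels('flower_data/val')
'''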



2. Network structure
'''
VGG16 network structure, input: 224x224x3
1. 64-channel convolution block: input: 224x224x3, 2-layer 3x3x64 convolution structure, padding, output: 64x224x224
2. maxpooling1: input: 64x224x224, output: 64x112x112
3. 128-channel convolution block: input: 64x112x112, 2-layer 3x3x128 convolution structure, padding, output: 128x112x112
4. maxpooling2: input: 128x112x112, output: 128x56x56
5. 256-channel convolution block: input: 128x56x56, 3-layer 3x3x256 convolution structure, padding, output: 256x56x56
6. maxpooling3: input: 256x56x56, output: 256x28x28
7. 512-channel convolution block: input: 256x28x28, 3-layer 3x3x512 convolution structure, padding, output: 512x28x28
8. maxpooling4: input: 512x28x28, output: 512x14x14
9. 512-channel convolution block: input: 512x14x14, 3-layer 3x3x512 convolution structure, padding, output: 512x14x14
10. maxpooling5: input: 512x14x14, output: 512x7x7
11. fully connected layer 1: input: 512*7*7, output: 4096
12. fully connected layer 2: input: 4096, output: 4096
13. fully connected layer 3: input: 4096, output: 5 (number of categories)
'''
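To see why each 3x3 convolution keeps the spatial size while each 2x2 pooling halves it, here is a small self-contained check of the first block using plain TensorFlow 1.x ops (tf.nn.conv2d and tf.nn.max_pool rather than the Conv_layer/Max_pool_lrn helpers used in the code below):
'''
import tensorflow as tf

images = tf.placeholder(tf.float32, [None, 224, 224, 3])
w1 = tf.Variable(tf.truncated_normal([3, 3, 3, 64], stddev=0.01))
w2 = tf.Variable(tf.truncated_normal([3, 3, 64, 64], stddev=0.01))

# 3x3 convolution with stride 1 and SAME padding keeps the 224x224 size
conv1 = tf.nn.relu(tf.nn.conv2d(images, w1, strides=[1, 1, 1, 1], padding='SAME'))
conv2 = tf.nn.relu(tf.nn.conv2d(conv1, w2, strides=[1, 1, 1, 1], padding='SAME'))
# 2x2 max pooling with stride 2 halves the spatial size: 224 -> 112
pool1 = tf.nn.max_pool(conv2, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')

print(conv2.shape)   # (?, 224, 224, 64)
print(pool1.shape)   # (?, 112, 112, 64)
'''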
def inference(images, batch_size, n_classes, drop_rate):
    # VGG16: five convolution blocks, each followed by max pooling,
    # then two fully connected layers and a final linear layer for classification.

    # 1. 64-channel convolution block: input: 224x224x3, 2-layer 3x3x64 convolution structure, padding, output: 64x224x224
    # 2. maxpooling1: input: 64x224x224, output: 64x112x112
    conv1 = Conv_layer(names='conv_block1', input=images, w_shape=[3, 3, 3, 64], b_shape=[64], strid=[1, 1])
    conv2 = Conv_layer(names='conv_block2', input=conv1, w_shape=[3, 3, 64, 64], b_shape=[64], strid=[1, 1])
    pool_1 = Max_pool_lrn(names='pooling1', input=conv2, ksize=[1, 2, 2, 1], is_lrn=True)

    # 3. 128-channel convolution block: input: 64x112x112, 2-layer 3x3x128 convolution structure, padding, output: 128x112x112
    # 4. maxpooling2: input: 128x112x112, output: 128x56x56
    conv3 = Conv_layer(names='conv_block3', input=pool_1, w_shape=[3, 3, 64, 128], b_shape=[128], strid=[1, 1])
    conv4 = Conv_layer(names='conv_block4', input=conv3, w_shape=[3, 3, 128, 128], b_shape=[128], strid=[1, 1])
    pool_2 = Max_pool_lrn(names='pooling2', input=conv4, ksize=[1, 2, 2, 1], is_lrn=False)

    # 5. 256-channel convolution block: input: 128x56x56, 3-layer 3x3x256 convolution structure, padding, output: 256x56x56
    # 6. maxpooling3: input: 256x56x56, output: 256x28x28
    conv5 = Conv_layer(names='conv_block5', input=pool_2, w_shape=[3, 3, 128, 256], b_shape=[256], strid=[1, 1])
    conv6 = Conv_layer(names='conv_block6', input=conv5, w_shape=[3, 3, 256, 256], b_shape=[256], strid=[1, 1])
    conv7 = Conv_layer(names='conv_block7', input=conv6, w_shape=[3, 3, 256, 256], b_shape=[256], strid=[1, 1])
    pool_3 = Max_pool_lrn(names='pooling3', input=conv7, ksize=[1, 2, 2, 1], is_lrn=False)

    # 7. 512-channel convolution block: input: 256x28x28, 3-layer 3x3x512 convolution structure, padding, output: 512x28x28
    # 8. maxpooling4: input: 512x28x28, output: 512x14x14
    conv8 = Conv_layer(names='conv_block8', input=pool_3, w_shape=[3, 3, 256, 512], b_shape=[512], strid=[1, 1])
    conv9 = Conv_layer(names='conv_block9', input=conv8, w_shape=[3, 3, 512, 512], b_shape=[512], strid=[1, 1])
    conv10 = Conv_layer(names='conv_block10', input=conv9, w_shape=[3, 3, 512, 512], b_shape=[512], strid=[1, 1])
    pool_4 = Max_pool_lrn(names='pooling4', input=conv10, ksize=[1, 2, 2, 1], is_lrn=False)
    # print(pool_4.shape)

    # 9. 512-channel convolution block: input: 512x14x14, 3-layer 3x3x512 convolution structure, padding, output: 512x14x14
    # 10. maxpooling5: input: 512x14x14, output: 512x7x7
    conv11 = Conv_layer(names='conv_block11', input=pool_4, w_shape=[3, 3, 512, 512], b_shape=[512], strid=[1, 1])
    conv12 = Conv_layer(names='conv_block12', input=conv11, w_shape=[3, 3, 512, 512], b_shape=[512], strid=[1, 1])
    conv13 = Conv_layer(names='conv_block13', input=conv12, w_shape=[3, 3, 512, 512], b_shape=[512], strid=[1, 1])
    pool_5 = Max_pool_lrn(names='pooling5', input=conv13, ksize=[1, 2, 2, 1], is_lrn=False)

    # 11. fully connected layer 1: input: 512*7*7, output: 4096
    # 12. fully connected layer 2: input: 4096, output: 4096
    # 13. fully connected layer 3: input: 4096, output: 5 (number of categories)
    reshape = tf.reshape(pool_5, shape=[batch_size, -1])
    dim = reshape.get_shape()[1].value
    local_1 = local_layer(names='local1_scope', input=reshape, w_shape=[dim, 4096], b_shape=[4096])
    local_2 = local_layer(names='local2_scope', input=local_1, w_shape=[4096, 4096], b_shape=[4096])

    with tf.variable_scope('softmax_linear') as scope:
        weights = tf.Variable(tf.truncated_normal(shape=[4096, n_classes], stddev=0.005, dtype=tf.float32),
                              name='softmax_linear', dtype=tf.float32)
        biases = tf.Variable(tf.constant(value=0.1, dtype=tf.float32, shape=[n_classes]),
                             name='biases', dtype=tf.float32)
        softmax_linear = tf.add(tf.matmul(local_2, weights), biases, name='softmax_linear')
        # print("---------softmax_linear:{}".format(softmax_linear))

    return softmax_linear
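For reference, a forward pass through this function would look roughly like the following (the batch size of 16 is an assumption for illustration; the helper layers come from the rest of the repository):
'''
import tensorflow as tf

BATCH_SIZE, N_CLASSES = 16, 5
images = tf.placeholder(tf.float32, [BATCH_SIZE, 224, 224, 3])
logits = inference(images, BATCH_SIZE, N_CLASSES, drop_rate=0.5)
print(logits.shape)   # (16, 5): one score per flower class
'''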
3. Training process
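The training follows the usual TensorFlow 1.x pattern: build the graph, define a loss and an optimizer, then feed batches in a session loop. The sketch below is only an assumption of how it could be wired up (the learning rate, the choice of Adam, and the next_batch helper are placeholders, not the repository's exact code; see the source link below for the real implementation):
'''
import tensorflow as tf

BATCH_SIZE, N_CLASSES, LEARNING_RATE, MAX_STEP = 16, 5, 1e-4, 10000

images = tf.placeholder(tf.float32, [BATCH_SIZE, 224, 224, 3])
labels = tf.placeholder(tf.int32, [BATCH_SIZE])

logits = inference(images, BATCH_SIZE, N_CLASSES, drop_rate=0.5)
loss = tf.reduce_mean(
    tf.nn.sparse_softmax_cross_entropy_with_logits(logits=logits, labels=labels))
train_op = tf.train.AdamOptimizer(LEARNING_RATE).minimize(loss)
accuracy = tf.reduce_mean(tf.cast(tf.nn.in_top_k(logits, labels, 1), tf.float32))

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for step in range(MAX_STEP):
        # next_batch is a hypothetical helper that loads and resizes a batch of images
        batch_images, batch_labels = next_batch(train_paths, train_labels, BATCH_SIZE)
        _, train_loss, train_acc = sess.run([train_op, loss, accuracy],
                                            feed_dict={images: batch_images, labels: batch_labels})
        if step % 100 == 0:
            print('step %d, loss %.4f, acc %.4f' % (step, train_loss, train_acc))
'''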

Source code acquisition: https://gitee.com/fengyuxiexie/tensor-flow-vgg16
Epilogue
That is all for this share. Although the TensorFlow coding practice is complete, please keep the following issues in mind.
1. Because we focus only on building the network and do no hyperparameter tuning, the network's performance may be poor.
2. During training, evaluation on the training and test sets should be done after each full round (epoch) of training; our code, however, evaluates the test set after every batch. This needs to be improved (a sketch of per-epoch evaluation follows this list).
3. The input-data code has been improved; note that the label for each category is assigned manually.
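On point 2, a possible shape for per-epoch evaluation is sketched below (val_batches is a hypothetical iterator over the validation set; sess, accuracy, images, and labels are the objects from the training sketch above):
'''
def evaluate(sess, accuracy, images, labels, val_batches):
    # Run the accuracy op over every validation batch once per epoch,
    # instead of evaluating after every training batch.
    accs = []
    for batch_images, batch_labels in val_batches:
        accs.append(sess.run(accuracy,
                             feed_dict={images: batch_images, labels: batch_labels}))
    return sum(accs) / len(accs)
'''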
Have a nice weekend, everyone. Good night, fellow researchers!
Editor: Yue yijushi | Reviewed by: Xiaoquan Jushi