Continuing with the content of the previous two articles:

Loss Functions in Deep Learning1

Loss Functions in Deep Learning2

As before, this is recorded as personal study notes.

1. Normalization/Standardization

By definition, normalization maps data into a small fixed interval, either an interval of length 1 such as [0, 1] or an interval around the origin such as [-1, 1], while standardization transforms data so that it has a mean of 0 and a standard deviation of 1. Both are essentially transformations of the data. Whether linear or nonlinear, the transformations discussed here are monotonic, so they do not change the numerical order of the original data, and they bring the feature values onto the same scale. Because normalization maps the data into a specific interval, its scaling is determined only by the extreme values in the data. Standardization transforms the source data into a distribution with a mean of 0 and a variance of 1, which requires computing the mean and standard deviation of the data, so every sample point influences the result.

In deep learning, using normalized/standardized data can speed up the convergence of the model and can sometimes improve its accuracy, which is especially noticeable in models that involve distance calculations. When calculating distances, if the features have inconsistent scales (units), the result is dominated by the features with the largest range. Because the magnitude of the values is reduced, normalization also helps keep the gradients of the model from becoming too large (exploding) and avoids some numerical problems caused by very large values.
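
To make the point about distances concrete, here is a small NumPy sketch. The samples (age and income) and the helper `euclidean` are made up for illustration and are not from the original notes.

```python
import numpy as np

# Hypothetical samples: column 0 is age (years), column 1 is income (currency units).
X = np.array([[25.0, 50_000.0],
              [60.0, 51_000.0],
              [26.0, 90_000.0]])


def euclidean(a, b):
    """Euclidean distance between two feature vectors."""
    return float(np.sqrt(np.sum((a - b) ** 2)))


# On the raw data the income column dominates: a 35-year age gap
# barely changes the distance, a 40,000 income gap decides everything.
d01_raw = euclidean(X[0], X[1])
d02_raw = euclidean(X[0], X[2])

# After scaling each column to [0, 1], both features contribute comparably.
X_scaled = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))
d01_scaled = euclidean(X_scaled[0], X_scaled[1])
d02_scaled = euclidean(X_scaled[0], X_scaled[2])

print("raw:   ", d01_raw, d02_raw)
print("scaled:", d01_scaled, d02_scaled)
```

On the raw data, sample 2 looks far more distant from sample 0 than sample 1 does, purely because income is measured in larger numbers. After scaling, the two distances are almost equal.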

2. Normalization methods

2.1 Min-Max Normalization
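
A minimal NumPy sketch, assuming the usual definition x' = (x - min) / (max - min), which maps the data into [0, 1]. The function name `min_max_normalize` and the sample values are illustrative only.

```python
import numpy as np


def min_max_normalize(x):
    """Map x linearly into [0, 1] using its minimum and maximum."""
    x = np.asarray(x, dtype=np.float64)
    x_min, x_max = x.min(), x.max()
    if x_max == x_min:
        # Constant input: the range is zero, so return zeros instead of dividing by zero.
        return np.zeros_like(x)
    return (x - x_min) / (x_max - x_min)


data = [2.0, 4.0, 6.0, 10.0]
print(min_max_normalize(data))  # values: 0, 0.25, 0.5, 1
```

Only the minimum and maximum enter the formula, so a single outlier changes the scaling of every other value.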

2.2 Mean normalization
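
A minimal NumPy sketch, assuming the common definition x' = (x - mean) / (max - min), which centers the data at 0 and keeps it inside [-1, 1]. The function name `mean_normalize` is illustrative only.

```python
import numpy as np


def mean_normalize(x):
    """Center x at zero and divide by its range (max - min)."""
    x = np.asarray(x, dtype=np.float64)
    value_range = x.max() - x.min()
    if value_range == 0:
        # Constant input: every value equals the mean, so the result is all zeros.
        return np.zeros_like(x)
    return (x - x.mean()) / value_range


data = [2.0, 4.0, 6.0, 10.0]
print(mean_normalize(data))  # values: -0.4375, -0.1875, 0.0625, 0.5625
```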

2.3 Logarithmic function normalization
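
A minimal NumPy sketch of one common form, x' = log10(x) / log10(max), which maps data into [0, 1]. It assumes all values are at least 1; other variants exist, and the function name `log_normalize` is illustrative only.

```python
import numpy as np


def log_normalize(x):
    """Compress x with log10 and divide by log10 of the maximum."""
    x = np.asarray(x, dtype=np.float64)
    if np.any(x < 1):
        raise ValueError("this form of log normalization assumes all values are >= 1")
    x_max = x.max()
    if x_max == 1:
        # All values are 1, so every logarithm is 0.
        return np.zeros_like(x)
    return np.log10(x) / np.log10(x_max)


data = [1.0, 10.0, 100.0, 1000.0]
print(log_normalize(data))  # values: 0, 1/3, 2/3, 1
```

Unlike min-max normalization, this is a nonlinear transformation: values spanning several orders of magnitude end up evenly spaced.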

2.4 Arctangent function normalization
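
A minimal NumPy sketch, assuming the common definition x' = arctan(x) * 2 / pi. It maps any real value into (-1, 1), and non-negative data into [0, 1). The function name `arctan_normalize` is illustrative only.

```python
import numpy as np


def arctan_normalize(x):
    """Squash x into (-1, 1) with the arctangent function."""
    x = np.asarray(x, dtype=np.float64)
    return np.arctan(x) * 2.0 / np.pi


data = [-1.0, 0.0, 1.0]
print(arctan_normalize(data))  # values: -0.5, 0, 0.5
```

No statistics of the data are needed here, so the result for one sample does not depend on the other samples.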

3. Standardization methods

3.1 Z-score standardization

Z-score standardization is often used for data preprocessing. The mean and standard deviation of all sample data are computed first, and each sample is then transformed by subtracting the mean and dividing by the standard deviation.
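
A minimal NumPy sketch of the two steps, x' = (x - mean) / std. The function names `z_score_fit` and `z_score_transform` and the small `eps` guard are my own additions for illustration.

```python
import numpy as np


def z_score_fit(x):
    """Compute the per-feature mean and (population) standard deviation."""
    x = np.asarray(x, dtype=np.float64)
    return x.mean(axis=0), x.std(axis=0)


def z_score_transform(x, mean, std, eps=1e-12):
    """Standardize x with statistics computed beforehand; eps guards against std == 0."""
    return (np.asarray(x, dtype=np.float64) - mean) / (std + eps)


data = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
mean, std = z_score_fit(data)          # mean is 5, std is 2
standardized = z_score_transform(data, mean, std)
print(standardized)                    # values: -1.5, -0.5, -0.5, -0.5, 0, 0, 1, 2
```

Keeping the fit and transform steps separate makes it possible to compute the statistics on the training data and reuse them for validation and test data.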

3.2 Batch Normalization

```python
import tensorflow.compat.v1 as tf

tf.disable_eager_execution()  # graph mode; needed when running on TensorFlow 2.x


def batch_normalization(inp, name, weight1=0.99, weight2=0.99, is_training=True):
    with tf.variable_scope(name):
        # Get the shape of the input tensor
        inp_shape = inp.get_shape().as_list()

        # Non-trainable variable hist_mean records the moving average of the mean.
        # Its shape is the same as the last dimension of the input tensor.
        hist_mean = tf.get_variable('hist_mean', shape=inp_shape[-1:],
                                    initializer=tf.zeros_initializer(), trainable=False)
        # Non-trainable variable hist_var records the moving average of the variance.
        # Its shape is the same as the last dimension of the input tensor.
        hist_var = tf.get_variable('hist_var', shape=inp_shape[-1:],
                                   initializer=tf.ones_initializer(), trainable=False)

        # Trainable variables gamma and beta, with the same shape as the
        # last dimension of the input tensor
        gamma = tf.Variable(tf.ones(inp_shape[-1:]), name='gamma')
        beta = tf.Variable(tf.zeros(inp_shape[-1:]), name='beta')

        def train_branch():
            # Mean and variance over every dimension except the last one
            batch_mean, batch_var = tf.nn.moments(
                inp, axes=list(range(len(inp_shape) - 1)), name='moments')
            # Update the moving averages of the mean and the variance
            running_mean = tf.assign(
                hist_mean, weight1 * hist_mean + (1 - weight1) * batch_mean)
            running_var = tf.assign(
                hist_var, weight2 * hist_var + (1 - weight2) * batch_var)
            # control_dependencies forces the moving averages to be updated first.
            # The updates live inside the training branch so that inference
            # never modifies hist_mean and hist_var.
            with tf.control_dependencies([running_mean, running_var]):
                return tf.nn.batch_normalization(
                    inp, mean=batch_mean, variance=batch_var, scale=gamma,
                    offset=beta, variance_epsilon=1e-5, name='bn')

        def test_branch():
            # Inference uses the stored moving averages
            return tf.nn.batch_normalization(
                inp, mean=hist_mean, variance=hist_var, scale=gamma,
                offset=beta, variance_epsilon=1e-5, name='bn')

        # is_training=True:  use batch_mean & batch_var
        # is_training=False: use hist_mean & hist_var
        output = tf.cond(tf.cast(is_training, tf.bool), train_branch, test_branch)
        return output


def _batch_normalization(inp, name, weight1=0.99, weight2=0.99, is_training=True):
    # Same idea using the built-in layer (weight1 and weight2 are not used here).
    # The layer registers its moving-average updates in tf.GraphKeys.UPDATE_OPS,
    # which have to be run together with the training op.
    with tf.variable_scope(name):
        return tf.layers.batch_normalization(
            inp,
            training=is_training
        )
```
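
For reference, the computation that the code above performs for each channel can be written as follows. This is the standard batch normalization formulation as I understand it. The symbols $m$ (number of values per channel in the batch), $\mu_{\text{run}}$, $\sigma^2_{\text{run}}$, $w_1$ and $w_2$ are my own notation, with $w_1$ and $w_2$ standing for `weight1` and `weight2`, and $\epsilon$ for `variance_epsilon`.

```latex
\begin{align}
\mu_B &= \frac{1}{m} \sum_{i=1}^{m} x_i, &
\sigma_B^2 &= \frac{1}{m} \sum_{i=1}^{m} (x_i - \mu_B)^2, \\
\hat{x}_i &= \frac{x_i - \mu_B}{\sqrt{\sigma_B^2 + \epsilon}}, &
y_i &= \gamma \hat{x}_i + \beta, \\
\mu_{\text{run}} &\leftarrow w_1 \mu_{\text{run}} + (1 - w_1)\, \mu_B, &
\sigma^2_{\text{run}} &\leftarrow w_2 \sigma^2_{\text{run}} + (1 - w_2)\, \sigma_B^2 .
\end{align}
```

At test time $\mu_B$ and $\sigma_B^2$ are replaced by $\mu_{\text{run}}$ and $\sigma^2_{\text{run}}$.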

3.3 Layer Normalization

```python
import tensorflow.compat.v1 as tf

tf.disable_eager_execution()  # graph mode; needed when running on TensorFlow 2.x


def layer_normalization(inp, name):
    with tf.variable_scope(name):
        # Get the shape of the input tensor
        inp_shape = inp.get_shape().as_list()
        # Trainable variables gamma and beta: one value per normalized feature,
        # shared across the batch so that they do not depend on the batch size
        para_shape = [1] + inp_shape[1:]
        gamma = tf.Variable(tf.ones(para_shape), name='gamma')
        beta = tf.Variable(tf.zeros(para_shape), name='beta')
        # Mean and variance over every dimension except the first (batch) one
        layer_mean, layer_var = tf.nn.moments(
            inp, axes=list(range(1, len(inp_shape))), name='moments', keepdims=True)
        output = gamma * (inp - layer_mean) / tf.sqrt(layer_var + 1e-5) + beta
        return output


if __name__ == "__main__":
    a = tf.ones([128, 10, 10, 3])
    b = layer_normalization(a, name='ln')
    print(b.shape)
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        print(sess.run(b))
```

3.4 Instance Normalization

```python
import tensorflow.compat.v1 as tf

tf.disable_eager_execution()  # graph mode; needed when running on TensorFlow 2.x


def instance_normalization(inp, name):
    with tf.variable_scope(name):
        # Get the shape of the input tensor (NHWC)
        inp_shape = inp.get_shape().as_list()
        # Trainable variables gamma and beta with shape [1,1,1,c]: one value per
        # channel, shared across the batch, broadcast in the linear transformation
        para_shape = [1, 1, 1, inp_shape[-1]]
        gamma = tf.Variable(tf.ones(para_shape), name='gamma')
        beta = tf.Variable(tf.zeros(para_shape), name='beta')
        # Mean and variance over the first (H) and second (W) dimensions,
        # computed separately for every sample and every channel
        insta_mean, insta_var = tf.nn.moments(
            inp, axes=[1, 2], name='moments', keepdims=True)
        output = gamma * (inp - insta_mean) / tf.sqrt(insta_var + 1e-5) + beta
        return output


if __name__ == "__main__":
    a = tf.ones([128, 10, 10, 3])
    b = instance_normalization(a, name='in')
    print(b.shape)
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        print(sess.run(b))
```

3.5 Group Normalization

```python
import numpy as np
import tensorflow.compat.v1 as tf

tf.disable_eager_execution()  # graph mode; needed when running on TensorFlow 2.x


def group_normalization(inp, name, G=32):
    with tf.variable_scope(name):
        # Get the shape of the input tensor (NHWC); c must be divisible by G
        insp = inp.get_shape().as_list()
        # Convert the input from NHWC to NCHW for convenient grouping
        inp = tf.transpose(inp, [0, 3, 1, 2])
        # Group the channels to get a new tensor of shape [n,G,c//G,h,w]
        inp = tf.reshape(inp, [insp[0], G, insp[-1] // G, insp[1], insp[2]])
        # Trainable variables gamma and beta with shape [1,1,1,c],
        # broadcast in the linear transformation
        para_shape = [1, 1, 1, insp[-1]]
        gamma = tf.Variable(tf.ones(para_shape), name='gamma')
        beta = tf.Variable(tf.zeros(para_shape), name='beta')
        # Mean and variance over the second, third and fourth (c//G, h, w) dimensions
        group_mean, group_var = tf.nn.moments(
            inp, axes=[2, 3, 4], name='moments', keepdims=True)
        inp = (inp - group_mean) / tf.sqrt(group_var + 1e-5)
        # Restore the original shape [n,h,w,c]:
        # first merge the groups back into [n,c,h,w]
        inp = tf.reshape(inp, [insp[0], insp[-1], insp[1], insp[2]])
        # then convert NCHW back to NHWC with a transpose
        inp = tf.transpose(inp, [0, 2, 3, 1])
        output = gamma * inp + beta
        return output


if __name__ == "__main__":
    a = tf.constant(np.random.randn(1, 2, 2, 64), dtype=tf.float32)
    b = group_normalization(a, name='gn1')
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        rb = sess.run(b)
        print(rb)
```
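
The four methods in 3.2 to 3.5 differ mainly in the axes over which the mean and variance are computed. The NumPy sketch below puts them side by side for an NHWC tensor. It leaves out gamma, beta and the moving averages, and the helper `normalize` and the tensor sizes are chosen only for illustration.

```python
import numpy as np


def normalize(x, axes, eps=1e-5):
    """Standardize x with the mean and variance taken over the given axes."""
    mean = x.mean(axis=axes, keepdims=True)
    var = x.var(axis=axes, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)


rng = np.random.default_rng(0)
x = rng.normal(loc=3.0, scale=2.0, size=(8, 4, 4, 6))  # NHWC
n, h, w, c = x.shape
G = 3  # number of groups; c must be divisible by G

bn = normalize(x, axes=(0, 1, 2))   # batch norm: one statistic per channel
ln = normalize(x, axes=(1, 2, 3))   # layer norm: one statistic per sample
inn = normalize(x, axes=(1, 2))     # instance norm: one per sample and channel

# Group norm: split the channels into G groups of c // G consecutive channels,
# then take one statistic per sample and group.
xg = x.reshape(n, h, w, G, c // G)
gn = normalize(xg, axes=(1, 2, 4)).reshape(n, h, w, c)

print(bn.shape, ln.shape, inn.shape, gn.shape)
```

With G = 1 the grouping covers all channels of a sample, which matches layer normalization, and with G = c every channel is its own group, which matches instance normalization.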

3.6 Switchable Normalization

The purpose of Switchable Normalization is to let the model automatically learn the most suitable normalization strategy.
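
A rough NumPy sketch of the idea, assuming the formulation in which the mean and variance are weighted sums of the instance, layer and batch statistics, with the weights obtained from a softmax over learnable logits. Here the logits are fixed numbers rather than trained parameters, gamma and beta are left out, and the names `switchable_norm`, `mean_logits` and `var_logits` are hypothetical.

```python
import numpy as np


def softmax(z):
    """Numerically stable softmax over a 1-D array."""
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()


def switchable_norm(x, mean_logits, var_logits, eps=1e-5):
    """x is an NHWC array; the logits are length-3 sequences ordered (IN, LN, BN)."""
    stats_axes = [(1, 2), (1, 2, 3), (0, 1, 2)]  # instance, layer, batch
    means = [x.mean(axis=a, keepdims=True) for a in stats_axes]
    variances = [x.var(axis=a, keepdims=True) for a in stats_axes]
    # Importance weights: in a real model these logits would be learned.
    w_mean = softmax(np.asarray(mean_logits, dtype=np.float64))
    w_var = softmax(np.asarray(var_logits, dtype=np.float64))
    # Weighted statistics; broadcasting gives shape [n, 1, 1, c].
    mean = sum(wk * m for wk, m in zip(w_mean, means))
    var = sum(wk * v for wk, v in zip(w_var, variances))
    return (x - mean) / np.sqrt(var + eps)


rng = np.random.default_rng(0)
x = rng.normal(size=(4, 5, 5, 6))
# Equal logits give each of the three normalizers a weight of 1/3.
out_equal = switchable_norm(x, [0.0, 0.0, 0.0], [0.0, 0.0, 0.0])
print(out_equal.shape)
```

When one logit is much larger than the others, the softmax weight of that normalizer approaches 1 and the result approaches plain instance, layer or batch normalization.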