Normalization/Standardization Methods in Deep Learning

Continuing with the content of the previous two articles:

Loss Functions in Deep Learning 1

Loss Functions in Deep Learning 2

As before, this is just a record of my personal study notes.

1. Normalization/Standardization

By definition, normalization refers to mapping data into a small interval, typically of length 1 or located near the origin, while standardization refers to transforming data so that it has a mean of 0 and a standard deviation of 1. Both are essentially data transformations; whether the transformation is linear or nonlinear, it does not change the relative order of the original values, and both bring the feature values onto the same scale. Normalization maps the data into a specific interval, so its scaling depends only on the extreme values (minimum and maximum) in the data, whereas standardization transforms the source data into a distribution with mean 0 and variance 1, which requires computing the mean and standard deviation of the data, so every sample point influences the result of standardization.

In deep learning, using normalized/standardized data speeds up model convergence and can sometimes improve model accuracy, which is especially noticeable in models that involve distance calculations: if the features are on inconsistent scales, the result is biased toward the features with the larger range. Because the magnitude of the values is reduced, computation also becomes more stable: on the one hand it helps prevent the model's gradients from becoming too large (exploding), and on the other hand it avoids numerical problems caused by excessively large values.

2. Normalization methods

2.1 Min-Max Normalization
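The formula is not written out in the original notes; the standard min-max form rescales each feature into the interval [0, 1] using only the minimum and maximum of the data:

x' = (x - x_min) / (x_max - x_min)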

2.2 Mean normalization
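Again assuming the standard form (not given in the original notes), mean normalization centers the data around 0 while keeping the range roughly within [-1, 1]:

x' = (x - mean(x)) / (x_max - x_min)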

2.3 Logarithmic function normalization
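A common form (assuming strictly positive data) divides by the logarithm of the maximum, compressing values that span several orders of magnitude into (0, 1]:

x' = log10(x) / log10(x_max)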

2.4 Arctangent function normalization
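The arctangent maps any real value into (-1, 1); non-negative data lands in [0, 1):

x' = arctan(x) * 2 / π

As a quick reference, here is a minimal NumPy sketch of the four normalizations above (not from the original article; it assumes a 1-D array of positive values):

import numpy as np

x = np.array([1.0, 10.0, 100.0, 1000.0])

min_max   = (x - x.min()) / (x.max() - x.min())    # 2.1: values in [0, 1]
mean_norm = (x - x.mean()) / (x.max() - x.min())   # 2.2: centered around 0
log_norm  = np.log10(x) / np.log10(x.max())        # 2.3: assumes x > 0
atan_norm = np.arctan(x) * 2 / np.pi               # 2.4: values in (-1, 1)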

3. Standardization methods

3.1 Z-score standardization

It is commonly used for data preprocessing: first compute the mean and standard deviation of all the sample data, and then transform each sample with them.
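The formula (not written out in the original notes) is

x' = (x - μ) / σ

where μ and σ are the mean and standard deviation computed over all the sample data; in NumPy, for example: x_std = (x - x.mean(axis=0)) / x.std(axis=0).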

3.2 Batch Normalization

import tensorflow.compat.v1 as tf

# graph mode is needed for get_variable / Session to work under TF 2.x
tf.disable_eager_execution()


def batch_normalization(inp, 
                        name, 
                        weight1=0.99, 
                        weight2=0.99, 
                        is_training=True):
    with tf.variable_scope(name):
        # Get the shape of the input tensor
        inp_shape = inp.get_shape().as_list()

        # Define the non-trainable variable hist_mean that records the moving average of the mean
        # The shape is the same as the last dimension of the input tensor
        hist_mean = tf.get_variable('hist_mean', 
                                    shape=inp_shape[-1:], 
                                    initializer=tf.zeros_initializer(), 
                                    trainable=False)

        # Define the non-trainable variable hist_var that records the moving average of the variance
        # The shape is the same as the last dimension of the input tensor
        hist_var = tf.get_variable('hist_var', 
                                   shape=inp_shape[-1:], 
                                   initializer=tf.ones_initializer(), 
                                   trainable=False)

        # Define trainable variables gamma and beta with the same shape as the last dimension of the input tensor
        gamma = tf.Variable(tf.ones(inp_shape[-1:]), name='gamma')
        beta = tf.Variable(tf.zeros(inp_shape[-1:]), name='beta')

        # Compute the mean and variance of the input tensor over all dimensions except the last
        batch_mean, batch_var = tf.nn.moments(inp, 
                                    axes=[i for i in range(len(inp_shape) - 1)], 
                                    name='moments')

        # Calculate the moving average of the mean and assign the result to hist_mean/running_mean
        running_mean = tf.assign(hist_mean, 
                                 weight1 * hist_mean + (1 - weight1) * batch_mean)

        # Calculate the moving average of the variance and assign the result to hist_var/running_var
        running_var = tf.assign(hist_var, 
                                weight2 * hist_var + (1 - weight2) * batch_var)

        # Use control_dependencies to make sure the moving averages are updated first
        with tf.control_dependencies([running_mean, running_var]):
            # Select different statistics for normalization depending on whether we are training or testing
            # is_training=True, use batch_mean & batch_var
            # is_training=False, use running_mean & running_var
            output = tf.cond(tf.cast(is_training, tf.bool),
                             lambda: tf.nn.batch_normalization(inp, 
                                                mean=batch_mean, 
                                                variance=batch_var, 
                                                scale=gamma, 
                                                offset=beta, 
                                                variance_epsilon=1e-5, 
                                                name='bn'),
                             lambda: tf.nn.batch_normalization(inp, 
                                                mean=running_mean, 
                                                variance=running_var, 
                                                scale=gamma, 
                                                offset=beta, 
                                                variance_epsilon=1e-5, 
                                                name='bn')
                             )
        return output

# Equivalent implementation using TensorFlow's built-in layer
def _batch_normalization(inp, name, weight1=0.99, weight2=0.99, is_training=True):
    with tf.variable_scope(name):
        # training=True uses batch statistics and registers moving-average update ops
        # in tf.GraphKeys.UPDATE_OPS, which must be run alongside the train op
        return tf.layers.batch_normalization(
            inp,
            training=is_training
        )
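The sections below each end with a small test under if __name__ == "__main__"; for symmetry, a minimal usage sketch for the batch_normalization function above might look like this (not part of the original article):

if __name__ == "__main__":
    import numpy as np
    a = tf.constant(np.random.randn(8, 10, 10, 3), dtype=tf.float32)
    b = batch_normalization(a, name='bn1', is_training=True)
    print(b.shape)
    with tf.Session() as sess:
        tf.global_variables_initializer().run()
        print(sess.run(b))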

3.3 Layer Normalization

import tensorflow.compat.v1 as tf

# graph mode is needed for the Session-based example below under TF 2.x
tf.disable_eager_execution()


def layer_normalization(inp, name):
    with tf.variable_scope(name):
        # Get the shape of the input tensor
        inp_shape = inp.get_shape().as_list()

        # Define the trainable variables gamma and beta; their first (batch) dimension matches the input tensor,
        # and the remaining dimensions are 1 so they broadcast over each sample
        para_shape = [inp_shape[0]] + [1] * (len(inp_shape) - 1)
        gamma = tf.Variable(tf.ones(para_shape), name='gamma')
        beta = tf.Variable(tf.zeros(para_shape), name='beta')

        # Compute the mean and variance of the input tensor over all dimensions except the first
        layer_mean, layer_var = tf.nn.moments(inp, 
                                    axes=[i for i in range(1, len(inp_shape))],
                                    name='moments', keep_dims=True)

        output = gamma * (inp - layer_mean) / tf.sqrt(layer_var + 1e-5) + beta
        return output

if __name__ == "__main__":
    a = tf.ones([128, 10, 10, 3])
    b = layer_normalization(a, name='ln')
    print(b.shape)
    with tf.Session() as sess:
        tf.global_variables_initializer().run()
        print(sess.run(b))

3.4 Instance Normalization

import tensorflow.compat.v1 as tf

# graph mode is needed for the Session-based example below under TF 2.x
tf.disable_eager_execution()

def instance_normalization(inp, name):
    with tf.variable_scope(name):
        # Get the shape of the input tensor
        inp_shape = inp.get_shape().as_list()

        # Define the trainable variables gamma and beta, the shape is [n,1,1,c] to facilitate direct linear transformation
        para_shape = [inp_shape[0], 1, 1, inp_shape[-1]]
        gamma = tf.Variable(tf.ones(para_shape), name='gamma')
        beta = tf.Variable(tf.zeros(para_shape), name='beta')

        # Compute the mean and variance over the H (axis 1) and W (axis 2) dimensions of the input tensor
        insta_mean, insta_var = tf.nn.moments(inp, 
                                        axes=[1,2],
                                        name='moments', keep_dims=True)

        output = gamma * (inp - insta_mean) / tf.sqrt(insta_var + 1e-5) + beta
        return output

if __name__ == "__main__":
    a = tf.ones([128, 10, 10, 3])
    b = instance_normalization(a, name='in')
    print(b.shape)
    with tf.Session() as sess:
        tf.global_variables_initializer().run()
        print(sess.run(b))

3.5 Group Normalization

import tensorflow.compat.v1 as tf

# graph mode is needed for the Session-based example below under TF 2.x
tf.disable_eager_execution()


def group_normalization(inp, name, G=32):
    with tf.variable_scope(name):
        # Get the shape of the input tensor
        insp = inp.get_shape().as_list()

        # Convert the input NHWC format to NCHW for convenient grouping
        inp = tf.transpose(inp, [0, 3, 1, 2])
        
        # Group the input tensors to get a new tensor of shape [n,G,c//G,h,w]
        inp = tf.reshape(inp, 
                [insp[0], G, insp[-1] // G, insp[1], insp[2]])

        # Define the trainable variables gamma and beta, the shape is [1,1,1,c] to facilitate direct linear transformation
        para_shape = [1, 1, 1, insp[-1]]
        gamma = tf.Variable(tf.ones(para_shape), name='gamma')
        beta = tf.Variable(tf.zeros(para_shape), name='beta')

        # Compute the mean and variance over the second, third and fourth (c//G, h, w) dimensions of the grouped tensor
        group_mean, group_var = tf.nn.moments(inp, 
                                        axes=[2, 3, 4],
                                        name='moments', keep_dims=True)
        
        inp = (inp - group_mean) / tf.sqrt(group_var + 1e-5)

        # Restore tensor shape to original shape [n,h,w,c]
        # First reshape the standardized, grouped result back to [n,c,h,w]
        inp = tf.reshape(inp, 
                [insp[0], insp[-1], insp[1], insp[2]])

        # Convert NCHW format to NHWC by transpose operation
        inp = tf.transpose(inp, [0, 2, 3, 1])

        output = gamma * inp + beta
        return output

if __name__ == "__main__":
    import numpy as np
    a = tf.constant(np.random.randn(1,2,2,64), dtype=tf.float32)
    b = group_normalization(a, name='gn1')

    with tf.Session() as sess:
        tf.global_variables_initializer().run()
        rb = sess.run(b)
        print(rb)

3.6 Switchable Normalization

The purpose of Switchable Normalization (SN) is to let the model automatically learn the most suitable normalization strategy: each SN layer computes the statistics used by Batch, Layer and Instance Normalization and combines them with learned softmax weights, so different layers can end up relying on different normalizers.
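The original article stops at this description. As an illustration only, here is a simplified sketch of such a layer in the same TF 1.x style as the code above; the function name and the direct softmax mixing of the three variances are my own simplifications and are not taken verbatim from the Switchable Normalization paper (which also maintains moving averages of the BN statistics for inference, omitted here):

import tensorflow.compat.v1 as tf

tf.disable_eager_execution()


def switchable_normalization(inp, name, eps=1e-5):
    with tf.variable_scope(name):
        # Get the shape of the input tensor (assumed NHWC)
        inp_shape = inp.get_shape().as_list()
        c = inp_shape[-1]

        # Statistics of IN (per sample, per channel), LN (per sample) and BN (per channel)
        in_mean, in_var = tf.nn.moments(inp, axes=[1, 2], keep_dims=True)
        ln_mean, ln_var = tf.nn.moments(inp, axes=[1, 2, 3], keep_dims=True)
        bn_mean, bn_var = tf.nn.moments(inp, axes=[0, 1, 2], keep_dims=True)

        # Learnable softmax weights that mix the three kinds of statistics
        mean_w = tf.nn.softmax(tf.Variable(tf.ones([3]), name='mean_w'))
        var_w = tf.nn.softmax(tf.Variable(tf.ones([3]), name='var_w'))

        mean = mean_w[0] * in_mean + mean_w[1] * ln_mean + mean_w[2] * bn_mean
        var = var_w[0] * in_var + var_w[1] * ln_var + var_w[2] * bn_var

        # Per-channel affine parameters, as in the other normalization layers above
        gamma = tf.Variable(tf.ones([1, 1, 1, c]), name='gamma')
        beta = tf.Variable(tf.zeros([1, 1, 1, c]), name='beta')

        return gamma * (inp - mean) / tf.sqrt(var + eps) + beta


if __name__ == "__main__":
    import numpy as np
    a = tf.constant(np.random.randn(4, 10, 10, 3), dtype=tf.float32)
    b = switchable_normalization(a, name='sn1')
    with tf.Session() as sess:
        tf.global_variables_initializer().run()
        print(sess.run(b).shape)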
