Normalization/Standardization Methods in Deep Learning

This post continues from the previous two articles.

As before, it is written mainly as a record of my personal study notes.

1. Normalization/Standardization

By definition, normalization maps data into a small interval of length 1 (such as [0, 1]) or an interval near the origin, while standardization transforms data so that it has a mean of 0 and a standard deviation of 1. Both are essentially transformations of the data. Whether the transformation is linear or nonlinear, the ones discussed here are monotonic, so they do not change the numerical order of the original data, and they bring the feature values onto the same scale. Because normalization maps the data into a specific interval, its scaling range is determined only by the extreme values in the data. Standardization, by contrast, transforms the source data into a distribution with a mean of 0 and a variance of 1; it requires computing the mean and standard deviation of the data, so every sample point influences the result.

In deep learning, using normalized/standardized data can speed up model convergence and can sometimes improve accuracy, which is especially noticeable in models that involve distance calculations. When computing distances, if the features are on inconsistent scales, the result is biased toward the features with the largest range. Because the magnitude of the values is reduced, it also helps keep the gradients of the model from becoming too large (exploding) and avoids some numerical problems caused by very large values.
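
The bias toward large-range features can be seen in a small NumPy sketch. The data below is made up for illustration, and min-max rescaling is used only as one possible way to put both features on the same scale.

```python
import numpy as np

# Hypothetical data: column 0 is in the thousands, column 1 lies in [0, 1].
X = np.array([[1000.0, 0.1],
              [1010.0, 0.9],
              [2000.0, 0.1]])


def euclidean(a, b):
    """Plain Euclidean distance between two vectors."""
    return float(np.sqrt(np.sum((a - b) ** 2)))


# On the raw data the distance is dominated by column 0.
d_raw_01 = euclidean(X[0], X[1])  # about 10.03
d_raw_02 = euclidean(X[0], X[2])  # 1000.0

# Rescale every column to [0, 1] so both features contribute comparably.
X_scaled = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))
d_scaled_01 = euclidean(X_scaled[0], X_scaled[1])
d_scaled_02 = euclidean(X_scaled[0], X_scaled[2])

print(d_raw_01, d_raw_02)
print(d_scaled_01, d_scaled_02)
```

On the raw data, sample 2 looks about a hundred times farther from sample 0 than sample 1 does, even though sample 1 differs across almost the full range of the second feature. After rescaling, the two distances are nearly equal.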

2. Normalization method

2.1 Min-Max Normalization
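
No formula is given for this method in these notes. The commonly used definition is x' = (x - min) / (max - min), which maps each feature linearly into [0, 1]. Below is a minimal NumPy sketch under that assumption; the function name `min_max_normalize` and the handling of constant columns are illustrative choices.

```python
import numpy as np


def min_max_normalize(x):
    """Rescale each column of x linearly into [0, 1]."""
    x = np.asarray(x, dtype=np.float64)
    x_min = x.min(axis=0)
    value_range = x.max(axis=0) - x_min
    # Assumption: a constant column (range 0) is mapped to 0 instead of dividing by zero.
    value_range = np.where(value_range == 0, 1.0, value_range)
    return (x - x_min) / value_range


data = [[1, 10], [2, 20], [3, 30]]
print(min_max_normalize(data))
```

Only the minimum and maximum of each column enter the formula, so a single outlier can compress all other values into a narrow part of the interval.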

2.2 Mean normalization
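
No formula is given for this method in these notes. A common definition is x' = (x - mean) / (max - min), which centers the data at 0 and keeps it within an interval of length 1. The sketch below assumes that definition; `mean_normalize` is a name chosen for illustration.

```python
import numpy as np


def mean_normalize(x):
    """Center each column at 0 and divide by its range (max - min)."""
    x = np.asarray(x, dtype=np.float64)
    value_range = x.max(axis=0) - x.min(axis=0)
    # Assumption: a constant column (range 0) is left at 0 instead of dividing by zero.
    value_range = np.where(value_range == 0, 1.0, value_range)
    return (x - x.mean(axis=0)) / value_range


print(mean_normalize([1, 2, 3, 4, 5]))
```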

2.3 Logarithmic function normalization
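
No formula is given for this method in these notes. One frequently cited form is x' = log10(x) / log10(max), which is nonlinear and maps values into [0, 1] provided every value is at least 1. The sketch below assumes that form; the function name and the input checks are illustrative.

```python
import numpy as np


def log_normalize(x):
    """Map values >= 1 into [0, 1] using log10(x) / log10(max(x))."""
    x = np.asarray(x, dtype=np.float64)
    if np.any(x < 1):
        raise ValueError("this sketch assumes all values are >= 1")
    x_max = x.max()
    if x_max <= 1:
        raise ValueError("the maximum must be greater than 1")
    return np.log10(x) / np.log10(x_max)


print(log_normalize([1, 10, 100, 1000]))
```

Because the logarithm is monotonic, the order of the values is preserved, but large values are compressed much more strongly than small ones.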

2.4 Arctangent function normalization
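
No formula is given for this method in these notes. A common form is x' = arctan(x) * 2 / π, which maps any real value into (-1, 1) and non-negative values into [0, 1). The sketch below assumes that form; `arctan_normalize` is a name chosen for illustration.

```python
import numpy as np


def arctan_normalize(x):
    """Squash real values into (-1, 1) using arctan(x) * 2 / pi."""
    x = np.asarray(x, dtype=np.float64)
    return np.arctan(x) * 2.0 / np.pi


print(arctan_normalize([0.0, 1.0, -1.0, 1e6]))
```

Unlike min-max normalization, this mapping does not depend on the extreme values in the data, so it can be applied to new samples without recomputing any statistics.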

3. Standardization method

3.1 Z-score standardization

It is often used for data preprocessing. The mean and standard deviation of all sample data must be computed first, and then each sample is transformed using them.
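
As a minimal sketch of the two steps described above, the NumPy function below first computes the mean and standard deviation and then applies x' = (x - mean) / std. The function name, the small `eps` term, and the choice of the population standard deviation (`ddof=0`) are assumptions made for illustration.

```python
import numpy as np


def z_score_standardize(x, eps=1e-12):
    """Return the standardized data together with the statistics used."""
    x = np.asarray(x, dtype=np.float64)
    mean = x.mean(axis=0)
    std = x.std(axis=0)  # population standard deviation (ddof=0)
    # eps is an added safeguard against division by zero for constant columns.
    return (x - mean) / (std + eps), mean, std


train = [2, 4, 4, 4, 5, 5, 7, 9]
z, mean, std = z_score_standardize(train)
print(mean, std)
print(z)

# New data is usually transformed with the statistics of the training data.
new_sample = (np.array([6.0]) - mean) / std
print(new_sample)
```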

3.2 Batch Normalization

```python
import tensorflow.compat.v1 as tf

# Needed under TensorFlow 2.x so that the graph-mode code below runs
tf.disable_v2_behavior()


def batch_normalization(inp,
                        name,
                        weight1=0.99,
                        weight2=0.99,
                        is_training=True):
    with tf.variable_scope(name):
        # Get the static shape of the input tensor
        inp_shape = inp.get_shape().as_list()

        # Non-trainable variable hist_mean stores the moving average of the mean
        # Its shape matches the last dimension of the input tensor
        hist_mean = tf.get_variable('hist_mean',
                                    shape=inp_shape[-1:],
                                    initializer=tf.zeros_initializer(),
                                    trainable=False)

        # Non-trainable variable hist_var stores the moving average of the variance
        # Its shape matches the last dimension of the input tensor
        hist_var = tf.get_variable('hist_var',
                                   shape=inp_shape[-1:],
                                   initializer=tf.ones_initializer(),
                                   trainable=False)

        # Trainable variables gamma and beta, shaped like the last dimension of the input
        gamma = tf.Variable(tf.ones(inp_shape[-1:]), name='gamma')
        beta = tf.Variable(tf.zeros(inp_shape[-1:]), name='beta')

        # Compute the mean and variance over every dimension except the last
        batch_mean, batch_var = tf.nn.moments(inp,
                                              axes=list(range(len(inp_shape) - 1)),
                                              name='moments')

        def train_branch():
            # Update the moving averages only while training
            running_mean = tf.assign(hist_mean,
                                     weight1 * hist_mean + (1 - weight1) * batch_mean)
            running_var = tf.assign(hist_var,
                                    weight2 * hist_var + (1 - weight2) * batch_var)
            # control_dependencies forces the updates to run before normalizing
            with tf.control_dependencies([running_mean, running_var]):
                # Training: normalize with batch_mean & batch_var
                return tf.nn.batch_normalization(inp,
                                                 mean=batch_mean,
                                                 variance=batch_var,
                                                 scale=gamma,
                                                 offset=beta,
                                                 variance_epsilon=1e-5,
                                                 name='bn')

        def infer_branch():
            # Testing: normalize with the stored moving averages, without updating them
            return tf.nn.batch_normalization(inp,
                                             mean=hist_mean,
                                             variance=hist_var,
                                             scale=gamma,
                                             offset=beta,
                                             variance_epsilon=1e-5,
                                             name='bn')

        # Select the branch according to whether the current state is training or testing
        output = tf.cond(tf.cast(is_training, tf.bool), train_branch, infer_branch)
    return output


def _batch_normalization(inp, name, weight1=0.99, weight2=0.99, is_training=True):
    # Built-in alternative. weight1 and weight2 are unused here because
    # tf.layers.batch_normalization has its own momentum argument. Its moving-average
    # updates are collected in tf.GraphKeys.UPDATE_OPS and must be run with the train op.
    with tf.variable_scope(name):
        return tf.layers.batch_normalization(
            inp,
            training=is_training
        )
```

3.3 Layer Normalization

```python
import tensorflow.compat.v1 as tf

# Needed under TensorFlow 2.x so that the graph-mode code below runs
tf.disable_v2_behavior()


def layer_normalization(inp, name):
    with tf.variable_scope(name):
        # Get the static shape of the input tensor
        inp_shape = inp.get_shape().as_list()

        # Trainable variables gamma and beta: one value per channel (last dimension),
        # broadcast over the batch and the remaining dimensions. Sizing them by the
        # batch dimension would tie the parameters to one fixed batch size.
        para_shape = [1] * (len(inp_shape) - 1) + [inp_shape[-1]]
        gamma = tf.Variable(tf.ones(para_shape), name='gamma')
        beta = tf.Variable(tf.zeros(para_shape), name='beta')

        # Compute the mean and variance over every dimension except the first (batch)
        layer_mean, layer_var = tf.nn.moments(inp,
                                              axes=list(range(1, len(inp_shape))),
                                              name='moments', keep_dims=True)

        output = gamma * (inp - layer_mean) / tf.sqrt(layer_var + 1e-5) + beta
    return output


if __name__ == "__main__":
    a = tf.ones([128, 10, 10, 3])
    b = layer_normalization(a, name='ln')
    print(b.shape)
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        print(sess.run(b))
```

3.4 Instance Normalization

```python
import tensorflow.compat.v1 as tf

# Needed under TensorFlow 2.x so that the graph-mode code below runs
tf.disable_v2_behavior()


def instance_normalization(inp, name):
    with tf.variable_scope(name):
        # Get the static shape of the input tensor (NHWC)
        inp_shape = inp.get_shape().as_list()

        # Trainable variables gamma and beta with shape [1,1,1,c]: one value per channel,
        # broadcast over the batch so the parameters do not depend on the batch size
        para_shape = [1, 1, 1, inp_shape[-1]]
        gamma = tf.Variable(tf.ones(para_shape), name='gamma')
        beta = tf.Variable(tf.zeros(para_shape), name='beta')

        # Compute the mean and variance over the height (axis 1) and width (axis 2),
        # separately for every sample and every channel
        insta_mean, insta_var = tf.nn.moments(inp,
                                              axes=[1, 2],
                                              name='moments', keep_dims=True)

        output = gamma * (inp - insta_mean) / tf.sqrt(insta_var + 1e-5) + beta
    return output


if __name__ == "__main__":
    a = tf.ones([128, 10, 10, 3])
    b = instance_normalization(a, name='in')
    print(b.shape)
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        print(sess.run(b))
```

3.5 Group Normalization

```python
import tensorflow.compat.v1 as tf

# Needed under TensorFlow 2.x so that the graph-mode code below runs
tf.disable_v2_behavior()


def group_normalization(inp, name, G=32):
    with tf.variable_scope(name):
        # Get the static shape of the input tensor (NHWC); c must be divisible by G
        insp = inp.get_shape().as_list()

        # Convert the input from NHWC to NCHW for convenient grouping
        inp = tf.transpose(inp, [0, 3, 1, 2])

        # Split the channels into groups to get a tensor of shape [n,G,c//G,h,w]
        inp = tf.reshape(inp,
                         [insp[0], G, insp[-1] // G, insp[1], insp[2]])

        # Trainable variables gamma and beta with shape [1,1,1,c] for direct broadcasting
        para_shape = [1, 1, 1, insp[-1]]
        gamma = tf.Variable(tf.ones(para_shape), name='gamma')
        beta = tf.Variable(tf.zeros(para_shape), name='beta')

        # Compute the mean and variance over axes 2, 3 and 4 (c//G, h, w) of the grouped tensor
        group_mean, group_var = tf.nn.moments(inp,
                                              axes=[2, 3, 4],
                                              name='moments', keep_dims=True)

        inp = (inp - group_mean) / tf.sqrt(group_var + 1e-5)

        # Restore the original shape [n,h,w,c]
        # First merge the groups back into [n,c,h,w]
        inp = tf.reshape(inp,
                         [insp[0], insp[-1], insp[1], insp[2]])

        # Then convert NCHW back to NHWC with a transpose
        inp = tf.transpose(inp, [0, 2, 3, 1])

        output = gamma * inp + beta
    return output


if __name__ == "__main__":
    import numpy as np
    a = tf.constant(np.random.randn(1, 2, 2, 64), dtype=tf.float32)
    b = group_normalization(a, name='gn1')

    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        rb = sess.run(b)
        print(rb)
```

3.6 Switchable Normalization

The purpose of Switchable Normalization is to let the model automatically learn the most suitable normalization strategy.
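
One way to picture this is as a learned mixture of the statistics used by Instance, Layer and Batch Normalization. The NumPy sketch below is a simplified forward pass under that assumption: the mixing weights come from a softmax over three logits for the mean and three for the variance, which would be trainable in a real model. The function names, the scalar `gamma`/`beta`, and the NHWC layout are illustrative choices, and NumPy is used here instead of TensorFlow to keep the example short.

```python
import numpy as np


def softmax(z):
    """Numerically stable softmax over a 1-D array."""
    z = np.asarray(z, dtype=np.float64)
    e = np.exp(z - np.max(z))
    return e / e.sum()


def switchable_norm(x, mean_logits, var_logits, gamma=1.0, beta=0.0, eps=1e-5):
    """Forward pass only. x has shape [n, h, w, c]."""
    # Instance Normalization statistics: per sample and per channel
    mean_in = x.mean(axis=(1, 2), keepdims=True)
    var_in = x.var(axis=(1, 2), keepdims=True)
    # Layer Normalization statistics: per sample
    mean_ln = x.mean(axis=(1, 2, 3), keepdims=True)
    var_ln = x.var(axis=(1, 2, 3), keepdims=True)
    # Batch Normalization statistics: per channel
    mean_bn = x.mean(axis=(0, 1, 2), keepdims=True)
    var_bn = x.var(axis=(0, 1, 2), keepdims=True)

    # Importance weights for (IN, LN, BN); each set sums to 1
    w_mean = softmax(mean_logits)
    w_var = softmax(var_logits)

    mean = w_mean[0] * mean_in + w_mean[1] * mean_ln + w_mean[2] * mean_bn
    var = w_var[0] * var_in + w_var[1] * var_ln + w_var[2] * var_bn
    return gamma * (x - mean) / np.sqrt(var + eps) + beta


x = np.random.default_rng(0).normal(size=(4, 5, 5, 3))
# Equal logits give each strategy a weight of 1/3.
y = switchable_norm(x, mean_logits=[0.0, 0.0, 0.0], var_logits=[0.0, 0.0, 0.0])
print(y.shape)
```

If training pushes one logit far above the others, the layer behaves almost exactly like the corresponding single normalization method, which is how the choice of strategy is learned.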

Tags: Deep Learning

Posted by ksmatthews on Mon, 21 Nov 2022 10:28:41 +0530