1. Program explanation
(1) Vanilla autoencoder
In its simplest form, the autoencoder has only three network layers, i.e., a neural network with a single hidden layer. Its input and output are the same, and it learns to reconstruct the input using the Adam optimizer and the mean squared error loss function.
Here, since the hidden layer dimension (64) is smaller than the input dimension (784), the autoencoder is said to be lossy (undercomplete). Through this constraint, the network is forced to learn a compressed representation of the data.
from keras.layers import Input, Dense
from keras.models import Model

input_size = 784
hidden_size = 64
output_size = 784

x = Input(shape=(input_size,))

# Encoder
h = Dense(hidden_size, activation='relu')(x)

# Decoder
r = Dense(output_size, activation='sigmoid')(h)

autoencoder = Model(inputs=x, outputs=r)
autoencoder.compile(optimizer='adam', loss='mse')
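Since the model simply learns to reproduce its input, training uses the same array as both input and target, and the hidden activations h can then be read out as the compressed codes. A minimal usage sketch, assuming x_train and x_test hold flattened 784-dimensional images scaled to [0, 1]:

# Train the autoencoder to reproduce its own input
autoencoder.fit(x_train, x_train, epochs=50, batch_size=256, shuffle=True)

# A standalone encoder that shares weights with the trained model
encoder = Model(inputs=x, outputs=h)
codes = encoder.predict(x_test)  # shape (n_samples, 64)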
Dense: the Keras fully connected layer, keras.layers.core.Dense(units, activation=None)
units: the output dimension of the layer
activation: the activation function; defaults to None, i.e., the linear (identity) activation
Activation: the activation layer applies an activation function to the output of a layer
model.compile(): a method of the Model class
optimizer: the optimizer, a predefined optimizer name or an optimizer object; see the optimizer documentation
loss: the loss function, a predefined loss function name or an objective function; see the loss function documentation
adam: adaptive moment estimation, an extension of the RMSProp optimizer. It dynamically adjusts the learning rate of each parameter using first-order and second-order moment estimates of the gradient. Advantage: each update step stays within a well-defined range, which keeps the parameter updates very stable. (A sketch of the update rule follows this list.)
mse: mean_squared_error, the mean squared error
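As a rough illustration of the moment estimates mentioned above, here is a minimal NumPy sketch of a single Adam update step. The names (m, v, beta1, beta2, eps) and default values follow the original Adam paper; nothing here is specific to Keras.

import numpy as np

def adam_step(param, grad, m, v, t, lr=0.001,
              beta1=0.9, beta2=0.999, eps=1e-8):
    # First-order moment estimate (moving average of gradients)
    m = beta1 * m + (1 - beta1) * grad
    # Second-order moment estimate (moving average of squared gradients)
    v = beta2 * v + (1 - beta2) * grad ** 2
    # Bias correction for the zero initialization of m and v (t starts at 1)
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    # Per-parameter adaptive update
    param = param - lr * m_hat / (np.sqrt(v_hat) + eps)
    return param, m, v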
(2) Multilayer autoencoder
If one hidden layer is not enough, the number of hidden layers in the autoencoder can obviously be increased further.
Here the implementation uses three hidden layers instead of one. Any of the hidden layers could serve as the feature representation, but to keep the network symmetrical we use the middle layer.
input_size = 784
hidden_size = 128
code_size = 64

x = Input(shape=(input_size,))

# Encoder
hidden_1 = Dense(hidden_size, activation='relu')(x)
h = Dense(code_size, activation='relu')(hidden_1)

# Decoder
hidden_2 = Dense(hidden_size, activation='relu')(h)
r = Dense(input_size, activation='sigmoid')(hidden_2)

autoencoder = Model(inputs=x, outputs=r)
autoencoder.compile(optimizer='adam', loss='mse')
(3) Convolutional autoencoder
Besides fully connected layers, an autoencoder can also be built from convolutional layers. The principle is the same, except that the input is a 3D tensor (such as an image) instead of a flattened one-dimensional vector. The input image is downsampled to a latent representation of smaller dimension, forcing the autoencoder to learn a compressed version of the data.
from keras.layers import Input, Conv2D, MaxPooling2D, UpSampling2D
from keras.models import Model

x = Input(shape=(28, 28, 1))

# Encoder: 28x28 -> 14x14 -> 7x7 -> 4x4
conv1_1 = Conv2D(16, (3, 3), activation='relu', padding='same')(x)
pool1 = MaxPooling2D((2, 2), padding='same')(conv1_1)
conv1_2 = Conv2D(8, (3, 3), activation='relu', padding='same')(pool1)
pool2 = MaxPooling2D((2, 2), padding='same')(conv1_2)
conv1_3 = Conv2D(8, (3, 3), activation='relu', padding='same')(pool2)
h = MaxPooling2D((2, 2), padding='same')(conv1_3)

# Decoder: 4x4 -> 8x8 -> 16x16 -> 14x14 -> 28x28
conv2_1 = Conv2D(8, (3, 3), activation='relu', padding='same')(h)
up1 = UpSampling2D((2, 2))(conv2_1)
conv2_2 = Conv2D(8, (3, 3), activation='relu', padding='same')(up1)
up2 = UpSampling2D((2, 2))(conv2_2)
# No padding here on purpose: 16x16 shrinks to 14x14 so that the final
# upsampling restores the original 28x28 shape
conv2_3 = Conv2D(16, (3, 3), activation='relu')(up2)
up3 = UpSampling2D((2, 2))(conv2_3)
r = Conv2D(1, (3, 3), activation='sigmoid', padding='same')(up3)

autoencoder = Model(inputs=x, outputs=r)
autoencoder.compile(optimizer='adam', loss='mse')
Conv2D: Conv2D(filters, kernel_size, strides=(1, 1), padding='valid')
filters: the number of convolution kernels (i.e., the dimension of the output).
kernel_size: the width and height of the convolution kernel, a single integer or a list/tuple of two integers. A single integer means the same length in every spatial dimension.
strides: the convolution stride, a single integer or a list/tuple of two integers. A single integer means the same stride in every spatial dimension. Any stride value other than 1 is incompatible with any dilation_rate other than 1.
padding: one of two zero-padding strategies, "valid" or "same". "valid" performs only valid convolution, i.e., the boundary data is not padded, so the output shrinks. "same" pads so that the convolution result at the boundary is retained, which usually yields an output with the same shape as the input. (A shape sketch follows this list.)
MaxPooling2D: max pooling layer for 2D inputs. MaxPooling2D(pool_size=(2, 2), strides=None, padding='valid')
pool_size: an integer tuple of length 2, the downsampling factors in the two directions (vertical, horizontal). With (2, 2) the image is halved in both dimensions.
strides: an integer tuple of length 2, or None; the stride values.
padding: string, "valid" or "same".
UpSampling2D: upsampling layer. UpSampling2D(size=(2, 2))
size: an integer tuple, the upsampling factors for rows and columns respectively.
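To make the padding, pooling, and upsampling behaviour above concrete, here is a minimal sketch that pushes a dummy 28x28 input through the three layer types and prints the shape at each stage; the filter counts are illustrative and not taken from the models above.

from keras.layers import Input, Conv2D, MaxPooling2D, UpSampling2D
from keras.models import Model

inp = Input(shape=(28, 28, 1))
c_same = Conv2D(8, (3, 3), padding='same')(inp)        # -> (28, 28, 8)
c_valid = Conv2D(8, (3, 3), padding='valid')(c_same)   # -> (26, 26, 8)
p = MaxPooling2D((2, 2), padding='same')(c_valid)      # -> (13, 13, 8)
u = UpSampling2D((2, 2))(p)                             # -> (26, 26, 8)

Model(inputs=inp, outputs=u).summary()  # prints the shape after each layer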
(4) Regularized autoencoder
Besides imposing a hidden layer smaller than the input dimension, other methods can be used to constrain an autoencoder's reconstruction, such as the regularized autoencoder.
A regularized autoencoder does not rely on a shallow encoder and decoder or a small code dimension to limit the model's capacity; instead, it uses the loss function to encourage the model to learn properties other than copying the input to the output. These properties include sparse representations, representations with small derivatives, and robustness to noise or to missing inputs.
Even if the model capacity is large enough to learn a trivial identity function, a nonlinear and overcomplete regularized autoencoder can still learn useful information about the data distribution.
In practice, two kinds of regularized autoencoder are commonly used: the sparse autoencoder and the denoising autoencoder.
(5) Sparse autoencoder
A sparse autoencoder is generally used to learn features for tasks such as classification. A sparsity-regularized autoencoder must reflect the unique statistical characteristics of the training data set rather than simply act as an identity function. Trained this way, performing the reconstruction task with a sparsity penalty yields a model that has learned useful features.
Another way to constrain an autoencoder's reconstruction is to impose constraints on its loss function. For example, a regularization term can be added to the loss so that the autoencoder learns a sparse representation of the data.
Note that on the hidden layer we add an L1 regularizer as a penalty term on the loss function during optimization. Compared with the vanilla autoencoder, the resulting data representation is sparser.
from keras.layers import Input, Dense
from keras.models import Model
from keras import regularizers

input_size = 784
hidden_size = 64
output_size = 784

x = Input(shape=(input_size,))

# Encoder, with an L1 penalty imposed on its output activations
h = Dense(hidden_size, activation='relu',
          activity_regularizer=regularizers.l1(10e-5))(x)

# Decoder
r = Dense(output_size, activation='sigmoid')(h)

autoencoder = Model(inputs=x, outputs=r)
autoencoder.compile(optimizer='adam', loss='mse')
activity_regularizer: the regularizer applied to the layer's output; an ActivityRegularizer object
l1(l=0.01): the L1 regularizer, usually used to impose constraints on the training of the model. It is an L1-norm constraint, which drives the constrained matrix/vector to be sparser.
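To show what this penalty adds to the objective, here is a minimal NumPy sketch of the total loss the sparse model above ends up minimizing; it illustrates the principle only and is not Keras's internal implementation (Keras adds the scaled sum of absolute activations of each batch to the loss).

import numpy as np

def sparse_autoencoder_loss(x, reconstruction, hidden_activations, l1=10e-5):
    # Reconstruction term: mean squared error, as in the compile() call above
    mse = np.mean((x - reconstruction) ** 2)
    # Sparsity term: L1 norm of the hidden activations, scaled by l1
    penalty = l1 * np.sum(np.abs(hidden_activations))
    return mse + penalty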
(6) Denoising autoencoder
Here we learn useful information not by imposing a penalty term on the loss function, but by changing the reconstruction error term of the loss function.
Noise is added to the training data, and the autoencoder learns to remove that noise and recover the true, uncorrupted input. This forces the encoder to extract the most important features of the input data and learn a more robust representation, which is why its generalization ability is stronger than that of an ordinary autoencoder.
This structure can be trained with a gradient descent algorithm; the corruption step itself is sketched after the model definition below.
from keras.layers import Input, Conv2D, MaxPooling2D, UpSampling2D
from keras.models import Model

x = Input(shape=(28, 28, 1))

# Encoder
conv1_1 = Conv2D(32, (3, 3), activation='relu', padding='same')(x)
pool1 = MaxPooling2D((2, 2), padding='same')(conv1_1)
conv1_2 = Conv2D(32, (3, 3), activation='relu', padding='same')(pool1)
h = MaxPooling2D((2, 2), padding='same')(conv1_2)

# Decoder
conv2_1 = Conv2D(32, (3, 3), activation='relu', padding='same')(h)
up1 = UpSampling2D((2, 2))(conv2_1)
conv2_2 = Conv2D(32, (3, 3), activation='relu', padding='same')(up1)
up2 = UpSampling2D((2, 2))(conv2_2)
r = Conv2D(1, (3, 3), activation='sigmoid', padding='same')(up2)

autoencoder = Model(inputs=x, outputs=r)
autoencoder.compile(optimizer='adam', loss='mse')
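The model definition above does not include the corruption step. As a minimal sketch (done in full in program example (4) below), Gaussian noise is added to the clean images and the model is fit on (noisy, clean) pairs; x_train is assumed to hold MNIST images already scaled to [0, 1] and reshaped to (n, 28, 28, 1):

import numpy as np

noise_factor = 0.5
x_train_noisy = x_train + noise_factor * np.random.normal(size=x_train.shape)
x_train_noisy = np.clip(x_train_noisy, 0., 1.)  # keep pixel values in [0, 1]

# Noisy images as input, clean images as the reconstruction target
autoencoder.fit(x_train_noisy, x_train, epochs=10, batch_size=256, shuffle=True)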
2. Program examples:
(1) Single-layer autoencoder
from keras.layers import Input, Dense
from keras.models import Model
from keras.datasets import mnist
import numpy as np
import matplotlib.pyplot as plt

(x_train, _), (x_test, _) = mnist.load_data()

x_train = x_train.astype('float32') / 255.
x_test = x_test.astype('float32') / 255.
x_train = x_train.reshape((len(x_train), np.prod(x_train.shape[1:])))
x_test = x_test.reshape((len(x_test), np.prod(x_test.shape[1:])))
print(x_train.shape)
print(x_test.shape)

# Single-layer autoencoder
encoding_dim = 32
input_img = Input(shape=(784,))

encoded = Dense(encoding_dim, activation='relu')(input_img)
decoded = Dense(784, activation='sigmoid')(encoded)

autoencoder = Model(inputs=input_img, outputs=decoded)

# Standalone encoder: maps an input to its 32-dimensional code
encoder = Model(inputs=input_img, outputs=encoded)

# Standalone decoder: reuses the last layer of the autoencoder
encoded_input = Input(shape=(encoding_dim,))
decoder_layer = autoencoder.layers[-1]
decoder = Model(inputs=encoded_input, outputs=decoder_layer(encoded_input))

autoencoder.compile(optimizer='adadelta', loss='binary_crossentropy')

autoencoder.fit(x_train, x_train, epochs=50, batch_size=256,
                shuffle=True, validation_data=(x_test, x_test))

encoded_imgs = encoder.predict(x_test)
decoded_imgs = decoder.predict(encoded_imgs)

# Output images: originals on top, reconstructions below
n = 10  # how many digits we will display
plt.figure(figsize=(20, 4))
for i in range(n):
    ax = plt.subplot(2, n, i + 1)
    plt.imshow(x_test[i].reshape(28, 28))
    plt.gray()
    ax.get_xaxis().set_visible(False)
    ax.get_yaxis().set_visible(False)

    ax = plt.subplot(2, n, i + 1 + n)
    plt.imshow(decoded_imgs[i].reshape(28, 28))
    plt.gray()
    ax.get_xaxis().set_visible(False)
    ax.get_yaxis().set_visible(False)
plt.show()
(2) Convolutional autoencoder
from keras.layers import Input, Conv2D, MaxPooling2D, UpSampling2D
from keras.models import Model
from keras.datasets import mnist
from keras.callbacks import TensorBoard
import numpy as np
import matplotlib.pyplot as plt

(x_train, _), (x_test, _) = mnist.load_data()

x_train = x_train.astype('float32') / 255.
x_test = x_test.astype('float32') / 255.
x_train = np.reshape(x_train, (len(x_train), 28, 28, 1))
x_test = np.reshape(x_test, (len(x_test), 28, 28, 1))
print(x_train.shape)
print(x_test.shape)

# Convolutional autoencoder
input_img = Input(shape=(28, 28, 1))

# Encoder
x = Conv2D(16, (3, 3), activation='relu', padding='same')(input_img)
x = MaxPooling2D((2, 2), padding='same')(x)
x = Conv2D(8, (3, 3), activation='relu', padding='same')(x)
x = MaxPooling2D((2, 2), padding='same')(x)
x = Conv2D(8, (3, 3), activation='relu', padding='same')(x)
encoded = MaxPooling2D((2, 2), padding='same')(x)

# Decoder
x = Conv2D(8, (3, 3), activation='relu', padding='same')(encoded)
x = UpSampling2D((2, 2))(x)
x = Conv2D(8, (3, 3), activation='relu', padding='same')(x)
x = UpSampling2D((2, 2))(x)
x = Conv2D(16, (3, 3), activation='relu')(x)
x = UpSampling2D((2, 2))(x)
decoded = Conv2D(1, (3, 3), activation='sigmoid', padding='same')(x)

autoencoder = Model(inputs=input_img, outputs=decoded)
autoencoder.compile(optimizer='adadelta', loss='binary_crossentropy')

# Open a terminal and start TensorBoard with: tensorboard --logdir=autoencoder
autoencoder.fit(x_train, x_train, epochs=50, batch_size=256,
                shuffle=True, validation_data=(x_test, x_test),
                callbacks=[TensorBoard(log_dir='autoencoder')])

decoded_imgs = autoencoder.predict(x_test)

# Output images: originals on top, reconstructions below
n = 10  # how many digits we will display
plt.figure(figsize=(20, 4))
for i in range(n):
    ax = plt.subplot(2, n, i + 1)
    plt.imshow(x_test[i].reshape(28, 28))
    plt.gray()
    ax.get_xaxis().set_visible(False)
    ax.get_yaxis().set_visible(False)

    ax = plt.subplot(2, n, i + 1 + n)
    plt.imshow(decoded_imgs[i].reshape(28, 28))
    plt.gray()
    ax.get_xaxis().set_visible(False)
    ax.get_yaxis().set_visible(False)
plt.show()
(3) Deep autoencoder
from keras.layers import Input, Dense
from keras.models import Model
from keras.datasets import mnist
import numpy as np
import matplotlib.pyplot as plt

(x_train, _), (x_test, _) = mnist.load_data()

x_train = x_train.astype('float32') / 255.
x_test = x_test.astype('float32') / 255.
x_train = x_train.reshape((len(x_train), np.prod(x_train.shape[1:])))
x_test = x_test.reshape((len(x_test), np.prod(x_test.shape[1:])))
print(x_train.shape)
print(x_test.shape)

# Deep autoencoder
input_img = Input(shape=(784,))

encoded = Dense(128, activation='relu')(input_img)
encoded = Dense(64, activation='relu')(encoded)
decoded_input = Dense(32, activation='relu')(encoded)  # the 32-dimensional code

decoded = Dense(64, activation='relu')(decoded_input)
decoded = Dense(128, activation='relu')(decoded)
decoded = Dense(784, activation='sigmoid')(decoded)

autoencoder = Model(inputs=input_img, outputs=decoded)
encoder = Model(inputs=input_img, outputs=decoded_input)
autoencoder.compile(optimizer='adadelta', loss='binary_crossentropy')

autoencoder.fit(x_train, x_train, epochs=50, batch_size=256,
                shuffle=True, validation_data=(x_test, x_test))

encoded_imgs = encoder.predict(x_test)
decoded_imgs = autoencoder.predict(x_test)

# Output images: originals on top, reconstructions below
n = 10  # how many digits we will display
plt.figure(figsize=(20, 4))
for i in range(n):
    ax = plt.subplot(2, n, i + 1)
    plt.imshow(x_test[i].reshape(28, 28))
    plt.gray()
    ax.get_xaxis().set_visible(False)
    ax.get_yaxis().set_visible(False)

    ax = plt.subplot(2, n, i + 1 + n)
    plt.imshow(decoded_imgs[i].reshape(28, 28))
    plt.gray()
    ax.get_xaxis().set_visible(False)
    ax.get_yaxis().set_visible(False)
plt.show()
(4) Denoising autoencoder
from keras.layers import Input, Conv2D, MaxPooling2D, UpSampling2D
from keras.models import Model
from keras.datasets import mnist
from keras.callbacks import TensorBoard
import numpy as np
import matplotlib.pyplot as plt

(x_train, _), (x_test, _) = mnist.load_data()

x_train = x_train.astype('float32') / 255.
x_test = x_test.astype('float32') / 255.
x_train = np.reshape(x_train, (len(x_train), 28, 28, 1))
x_test = np.reshape(x_test, (len(x_test), 28, 28, 1))

# Corrupt the images with Gaussian noise, then clip back to [0, 1]
noise_factor = 0.5
x_train_noisy = x_train + noise_factor * np.random.normal(loc=0.0, scale=1.0, size=x_train.shape)
x_test_noisy = x_test + noise_factor * np.random.normal(loc=0.0, scale=1.0, size=x_test.shape)
x_train_noisy = np.clip(x_train_noisy, 0., 1.)
x_test_noisy = np.clip(x_test_noisy, 0., 1.)
print(x_train.shape)
print(x_test.shape)

input_img = Input(shape=(28, 28, 1))

# Encoder
x = Conv2D(32, (3, 3), activation='relu', padding='same')(input_img)
x = MaxPooling2D((2, 2), padding='same')(x)
x = Conv2D(32, (3, 3), activation='relu', padding='same')(x)
encoded = MaxPooling2D((2, 2), padding='same')(x)

# Decoder
x = Conv2D(32, (3, 3), activation='relu', padding='same')(encoded)
x = UpSampling2D((2, 2))(x)
x = Conv2D(32, (3, 3), activation='relu', padding='same')(x)
x = UpSampling2D((2, 2))(x)
decoded = Conv2D(1, (3, 3), activation='sigmoid', padding='same')(x)

autoencoder = Model(inputs=input_img, outputs=decoded)
autoencoder.compile(optimizer='adadelta', loss='binary_crossentropy')

# Open a terminal and start TensorBoard with: tensorboard --logdir=autoencoder
# Noisy images are the input; clean images are the target
autoencoder.fit(x_train_noisy, x_train, epochs=10, batch_size=256,
                shuffle=True, validation_data=(x_test_noisy, x_test),
                callbacks=[TensorBoard(log_dir='autoencoder', write_graph=False)])

decoded_imgs = autoencoder.predict(x_test_noisy)

# Output images: originals, noisy versions, and reconstructions
n = 10
plt.figure(figsize=(30, 6))
for i in range(n):
    ax = plt.subplot(3, n, i + 1)
    plt.imshow(x_test[i].reshape(28, 28))
    plt.gray()
    ax.get_xaxis().set_visible(False)
    ax.get_yaxis().set_visible(False)

    ax = plt.subplot(3, n, i + 1 + n)
    plt.imshow(x_test_noisy[i].reshape(28, 28))
    plt.gray()
    ax.get_xaxis().set_visible(False)
    ax.get_yaxis().set_visible(False)

    ax = plt.subplot(3, n, i + 1 + 2 * n)
    plt.imshow(decoded_imgs[i].reshape(28, 28))
    plt.gray()
    ax.get_xaxis().set_visible(False)
    ax.get_yaxis().set_visible(False)
plt.show()