Deep Learning Express Edition 02: Convolutional Neural Networks

Limitations of linear neural networks

If every layer is linear, a neural network with any number of hidden layers is equivalent to a single-layer network: the composition of linear maps is still a linear map. The problems that a linear model can solve are limited, which is why nonlinear structures are needed.
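A minimal sketch of why stacking linear layers adds nothing, assuming no activation function and no bias. The weights `W1`, `W2` and the input `x` are made-up values used only for illustration:

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

# Hypothetical weights chosen only for illustration.
W1 = [[1.0, 2.0], [0.0, -1.0]]   # first layer: 2 inputs -> 2 hidden units
W2 = [[3.0], [1.0]]              # second layer: 2 hidden units -> 1 output
x = [[0.5, -2.0]]                # one input sample as a row vector

# Passing x through both layers ...
two_layer_output = matmul(matmul(x, W1), W2)
# ... gives the same result as one layer whose weight matrix is W1 @ W2.
single_layer_output = matmul(x, matmul(W1, W2))

print(two_layer_output, single_layer_output)
```

Both results are identical, so without a nonlinear activation the extra layer does not increase what the network can represent.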

Types of neural networks

  • Basic neural networks: linear neural network, BP (back-propagation) neural network, Hopfield neural network, etc.
  • Advanced neural networks: Boltzmann machine, restricted Boltzmann machine, recursive neural network, etc.
  • Deep neural networks: deep belief network, convolutional neural network, recurrent neural network, LSTM network, etc.

Convolutional neural network

A traditional multi-layer neural network has only an input layer, hidden layers and an output layer. The number of hidden layers depends on the task; there is no clear theoretical derivation that says how many layers are suitable.

The convolutional neural network (CNN) adds a more effective feature-learning part on top of the original multi-layer network. Specifically, partially connected convolution layers and pooling layers are placed in front of the original fully connected layers. The emergence of CNNs allowed networks to become deeper and made deep learning practical. Generally speaking, "deep learning" refers to new structures such as CNNs together with new methods (such as the ReLU activation function) that solve some difficult problems of traditional multi-layer neural networks.

Three structures of convolutional neural network

The basic composition of a neural network is an input layer, hidden layers and an output layer. What characterizes a convolutional neural network is that its hidden layers are divided into convolution layers, activation layers and pooling layers (also known as down-sampling layers). The role of each layer:

  • Convolution layer: extracts features by sliding (translating) a kernel over the original image
  • Activation layer: adds nonlinearity, so the network can learn nonlinear decision boundaries
  • Pooling layer: down-samples the extracted features (max pooling or average pooling), which reduces the number of parameters to learn in later layers and lowers network complexity

To perform classification, the network ends with a fully connected (FC) layer, that is, the final output layer, which is used for loss calculation and classification.

Convolutional layer

Each convolution layer in a convolutional neural network is composed of several convolution units (convolution kernels), and the parameters of each unit are optimized by the back-propagation algorithm.

The purpose of the convolution operation is to extract different features of the input. The first convolution layer may only extract low-level features such as edges, lines and corners. Networks with more layers can iteratively build more complex features from these low-level features.

Four elements of a convolution kernel
  • Number of convolution kernels
  • Convolution kernel size
  • Convolution kernel step size (stride)
  • Zero-padding size

Next, let's work through a calculation example. Assume the picture is black and white (a single channel), so it can be represented as one table of pixel values.

How convolution is calculated - size

A convolution kernel can be understood as an observer who looks at a small window of the picture with several weights and one bias, and computes a weighted sum of the pixels in that window.

Note: the bias is then added to the weighted sum above.

Convolution kernel size
    1*1, 3*3, 5*5

These kernel sizes are the usual choices, because researchers have found that they work well. After observing one window, the observer obtains a single result.

So what if the observer wants to see all the pixels in the picture? That is what the step size is for.

How convolution is calculated - step size

To observe the whole picture, the convolution kernel has to be moved across it. The parameter that controls this movement is the step size (stride).
Assuming the kernel moves one pixel at a time, the observer's final result is as follows:
For a 5x5 picture and a 3x3 kernel, moving with a step size of 1 gives a 3x3 observation result.

If the kernel moves with a step size of 2, this is the result:

For a 5x5 picture and a 3x3 kernel, moving with a step size of 2 gives a 2x2 observation result.
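The sliding-window calculation can be sketched in plain Python. This is an illustrative sketch rather than how a framework implements convolution; the function name `conv2d_single`, the checkerboard picture and the kernel values are assumptions made up for the example, and no zero padding is applied:

```python
def conv2d_single(image, kernel, bias=0.0, stride=1):
    """Slide one kernel over a single-channel image (no zero padding)."""
    k = len(kernel)
    out_h = (len(image) - k) // stride + 1
    out_w = (len(image[0]) - k) // stride + 1
    output = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            top, left = r * stride, c * stride
            # Weighted sum of the window under the kernel, plus the bias.
            total = sum(image[top + i][left + j] * kernel[i][j]
                        for i in range(k) for j in range(k))
            row.append(total + bias)
        output.append(row)
    return output

# Hypothetical 5x5 checkerboard picture and 3x3 kernel.
image = [[1 if (r + c) % 2 == 0 else 0 for c in range(5)] for r in range(5)]
kernel = [[1, 0, 1], [0, 1, 0], [1, 0, 1]]

out_s1 = conv2d_single(image, kernel, stride=1)
out_s2 = conv2d_single(image, kernel, stride=2)
print(len(out_s1), len(out_s1[0]))  # size of the result with step size 1
print(len(out_s2), len(out_s2[0]))  # size of the result with step size 2
```

With a step size of 1 the result is 3x3, and with a step size of 2 it is 2x2, matching the two cases described above.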

How to calculate convolution - number of convolution kernels

If more than one observer (convolution kernel) looks at the picture in a given layer, each one produces its own result, so the layer outputs multiple observation results.

The weights and biases of different convolution kernels are different, because the parameters are randomly initialized.
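A small sketch of several kernels observing the same picture, assuming a square single-channel image, a step size of 1 and no padding. The helper names `make_kernel` and `apply_kernel`, the seed and the pixel values are hypothetical:

```python
import random

random.seed(0)  # fixed seed so the sketch is reproducible

def make_kernel(size):
    """Randomly initialised weights plus one bias for a single kernel."""
    weights = [[random.uniform(-1, 1) for _ in range(size)] for _ in range(size)]
    bias = random.uniform(-1, 1)
    return weights, bias

def apply_kernel(image, weights, bias):
    """Step-size-1, no-padding convolution of one kernel over one channel."""
    k = len(weights)
    n = len(image) - k + 1
    return [[sum(image[r + i][c + j] * weights[i][j]
                 for i in range(k) for j in range(k)) + bias
             for c in range(n)]
            for r in range(n)]

image = [[float(r * 5 + c) for c in range(5)] for r in range(5)]  # 5x5 picture
kernels = [make_kernel(3) for _ in range(4)]                      # 4 observers
feature_maps = [apply_kernel(image, w, b) for w, b in kernels]

# One feature map per kernel: (number of kernels, height, width).
print(len(feature_maps), len(feature_maps[0]), len(feature_maps[0][0]))
```

The number of kernels becomes the depth of the layer's output, which is why the output depth equals the number of filters.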

So far, the size of the output depends on the kernel size and the step size. Is that all? No: the other factor is zero padding. Depending on the size and step size of the filter (the observation window), the window may run past the edge of the picture.

How to calculate convolution - zero fill size

Zero padding adds a border of pixels with value 0 around the picture.

There are two modes, SAME and VALID:

SAME: sampling may cross the edge (the border is zero-padded), so that with a step size of 1 the output has the same width and height as the input image.
VALID: sampling never crosses the edge (no padding), so the output is smaller than the input image.
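The two modes can be compared numerically. The sketch below assumes the output-size convention used by TensorFlow-style SAME/VALID padding; the function names are made up for illustration:

```python
import math

def output_size(in_size, kernel, stride, mode):
    """Output width (or height) under TensorFlow-style padding modes."""
    if mode == "SAME":
        return math.ceil(in_size / stride)
    if mode == "VALID":
        return math.ceil((in_size - kernel + 1) / stride)
    raise ValueError("mode must be 'SAME' or 'VALID'")

def same_padding_total(in_size, kernel, stride):
    """Total number of zero pixels SAME adds along one dimension."""
    out = math.ceil(in_size / stride)
    return max((out - 1) * stride + kernel - in_size, 0)

for stride in (1, 2):
    print("stride", stride,
          "SAME:", output_size(5, 3, stride, "SAME"),
          "VALID:", output_size(5, 3, stride, "VALID"))
print("zeros added by SAME (stride 1):", same_padding_total(5, 3, 1))
```

For a 5x5 picture and a 3x3 kernel, VALID reproduces the 3x3 and 2x2 results from the step-size examples, while SAME with a step size of 1 keeps the output at 5x5 by adding one zero pixel on each side.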

Output size calculation formula

We usually don't need to work out the amount of zero padding by hand. Instead, we use the known conditions to find the size of the output.

For an input of size H1*W1*D1, K filters of size F*F, step size S and zero-padding size P, the output size H2*W2*D2 is:

H2 = (H1 - F + 2P)/S + 1
W2 = (W1 - F + 2P)/S + 1
D2 = K

The following examples show how to use the formula.

Calculation case:

1. Known conditions: input image 32*32*1, 50 filters of size 5*5, step size 1, zero-padding size 1. What is the output size?

H2 = (H1 - F + 2P)/S + 1 = (32 - 5 + 2 * 1)/1 + 1 = 30

W2 = (W1 - F + 2P)/S + 1 = (32 - 5 + 2 * 1)/1 + 1 = 30

D2 = K = 50

So the output size is [30, 30, 50]

2. Known conditions: input image 32*32*1, 50 filters of size 3*3, step size 1, unknown zero padding, output size 32*32. What is the zero-padding size?

H2 = (H1 - F + 2P)/S + 1 = (32 - 3 + 2 * P)/1 + 1 = 32

W2 = (W1 - F + 2P)/S + 1 = (32 - 3 + 2 * P)/1 + 1 = 32

Solving gives P = 1, so the zero-padding size is 1.
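The two calculation cases can be checked with a few lines of Python. This is a sketch of the formula only; `conv_output_size` and `solve_padding` are hypothetical helper names, and integer division is assumed when the step size does not divide evenly:

```python
def conv_output_size(in_size, kernel, padding, stride):
    """H2 = (H1 - F + 2P) / S + 1, using integer division."""
    return (in_size - kernel + 2 * padding) // stride + 1

def solve_padding(in_size, kernel, stride, target, max_padding=10):
    """Try small padding values until the output matches the target size."""
    for p in range(max_padding + 1):
        if conv_output_size(in_size, kernel, p, stride) == target:
            return p
    return None  # no padding in the searched range gives the target size

# Case 1: 32x32 input, 5x5 filters, step size 1, zero padding 1.
print(conv_output_size(32, kernel=5, padding=1, stride=1))

# Case 2: 32x32 input, 3x3 filters, step size 1, output must stay 32x32.
print(solve_padding(32, kernel=3, stride=1, target=32))
```

The output depth is not computed here because it is simply the number of filters K.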

How to observe multi-channel pictures

If the picture is in color, there are three tables: R, G and B. Previously, each observer carried one kernel of size 3x3 (or another size). Now each observer carries three 3x3 sets of weights and one bias, 27 weights in total, and still produces a single result per position.
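A sketch of one observation on a three-channel window, assuming a 3x3 kernel per channel and one shared bias. The pixel values, weights and the function name `conv_at_position` are made up for illustration:

```python
def conv_at_position(channels, kernel, bias, top, left):
    """One output value computed from a multi-channel window."""
    size = len(kernel[0])
    total = 0.0
    for channel, channel_weights in zip(channels, kernel):
        for i in range(size):
            for j in range(size):
                total += channel[top + i][left + j] * channel_weights[i][j]
    return total + bias  # a single bias is shared by all three channels

# Hypothetical 3x3 R, G and B tables with constant values.
r = [[1.0] * 3 for _ in range(3)]
g = [[2.0] * 3 for _ in range(3)]
b = [[3.0] * 3 for _ in range(3)]

# One kernel = three 3x3 weight tables (all ones here) plus one bias.
kernel = [[[1.0] * 3 for _ in range(3)] for _ in range(3)]
num_weights = sum(len(row) for table in kernel for row in table)

value = conv_at_position([r, g, b], kernel, bias=0.5, top=0, left=0)
print(num_weights, value)
```

The three channel sums are added together, so the observer still reports one number per window position.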

Activation function

Convolutional networks have used activation functions ever since they were first developed. In deep networks the original sigmoid activation function did not give good results, so a new activation function was adopted.

  • ReLU
    f(x) = max(0, x)
    What is the effect? Negative inputs become 0 and positive inputs pass through unchanged.

  • ReLU benefits
    Effectively mitigates the vanishing gradient problem
    Very fast to compute: it only has to check whether the input is greater than 0. With SGD (stochastic gradient descent), training converges much faster than with sigmoid or tanh

  • sigmoid disadvantages
    Sigmoid and similar functions need more computation (an exponential), whereas ReLU saves a lot of computation over the whole process. In deep networks, gradients easily vanish when back-propagating through sigmoid
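A small numeric sketch of how gradients pass through the two activation functions. It assumes a chain of 10 layers and evaluates each derivative at its most favourable point, which is a simplification of a real network:

```python
import math

def relu(x):
    return max(0.0, x)

def relu_grad(x):
    return 1.0 if x > 0 else 0.0

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1.0 - s)  # never larger than 0.25

layers = 10
# Back-propagation multiplies one derivative per layer (chain rule).
sigmoid_chain = sigmoid_grad(0.0) ** layers  # best case for sigmoid: x = 0
relu_chain = relu_grad(1.0) ** layers        # any positive input

print("sigmoid gradient after", layers, "layers:", sigmoid_chain)
print("ReLU gradient after", layers, "layers:", relu_chain)
```

Even in its best case the sigmoid derivative is 0.25, so the product shrinks quickly with depth, while ReLU passes the gradient through unchanged for positive inputs.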

Pooling layer

The main function of the pooling layer is feature extraction by down-sampling: it further reduces the number of parameters by removing unimportant samples from the feature map. There are many pooling methods; max pooling is the most common.

max_pooling: take the maximum value in the pooling window
avg_pooling: take the average value in the pooling window
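Both pooling methods can be sketched in plain Python. The function name `pool2d` and the 4x4 feature map are assumptions made up for the example; a 2x2 window with a step size of 2 is used:

```python
def pool2d(feature_map, window=2, stride=2, mode="max"):
    """Max or average pooling over a single-channel feature map."""
    out_h = (len(feature_map) - window) // stride + 1
    out_w = (len(feature_map[0]) - window) // stride + 1
    result = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            values = [feature_map[r * stride + i][c * stride + j]
                      for i in range(window) for j in range(window)]
            if mode == "max":
                row.append(max(values))
            else:
                row.append(sum(values) / len(values))
        result.append(row)
    return result

feature_map = [[1, 3, 2, 4],
               [5, 6, 7, 8],
               [3, 2, 1, 0],
               [1, 2, 3, 4]]

max_pooled = pool2d(feature_map, mode="max")
avg_pooled = pool2d(feature_map, mode="avg")
print(max_pooled)
print(avg_pooled)
```

The 4x4 feature map shrinks to 2x2 in both cases; only the value kept for each window differs.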


Pooling layer calculation

The pooling layer also has a window size and a step size. How is the output size calculated? The formula is the same as for convolution.

Calculation: the input is 224x224x64, the window is 2 and the step size is 2. What is the output size?

H2 = (224 - 2 + 2*0)/2 + 1 = 112

W2 = (224 - 2 + 2*0)/2 + 1 = 112

The depth is unchanged, so the output is 112x112x64. Generally, the pooling layer uses a 2x2 window with a step size of 2.

BN layer

Objective: to improve the generalization ability of the network and reduce overfitting

BN (Batch Normalization) is also a layer of the network, known as the normalization layer. Its benefits include allowing a larger learning rate, and it has become a standard component of CNNs.
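A minimal sketch of the normalization step, assuming training-time behaviour for a single channel. In a real BN layer `gamma` and `beta` are learned and running statistics are kept for inference; the activation values below are made up:

```python
import math

def batch_norm(values, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize one channel of a batch to zero mean and unit variance."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    # Scale by gamma and shift by beta after normalizing.
    return [gamma * (v - mean) / math.sqrt(var + eps) + beta for v in values]

activations = [2.0, 4.0, 6.0, 8.0]  # hypothetical values from one batch
normalized = batch_norm(activations)

new_mean = sum(normalized) / len(normalized)
new_var = sum((v - new_mean) ** 2 for v in normalized) / len(normalized)
print(normalized)
print(new_mean, new_var)
```

After normalization the values have a mean of about 0 and a variance of about 1, which keeps the inputs to the next layer in a stable range.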


Fully connected layer

The preceding convolution and pooling layers act as feature engineering. The final fully connected layer plays the role of the "classifier" in the whole convolutional neural network.
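A sketch of the classifier role, assuming two small feature maps and two classes. The feature values, weights and helper names are hypothetical, and softmax is used here only to turn scores into probabilities:

```python
import math

def flatten(feature_maps):
    """Turn (channels, height, width) feature maps into one flat vector."""
    return [v for fmap in feature_maps for row in fmap for v in row]

def linear(x, weights, biases):
    """Fully connected layer: one weighted sum plus bias per output class."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, biases)]

def softmax(scores):
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical output of the convolution/pooling part: 2 channels of 2x2.
feature_maps = [[[1.0, 0.0], [0.0, 1.0]],
                [[0.5, 0.5], [0.5, 0.5]]]
features = flatten(feature_maps)

weights = [[0.1] * 8, [-0.1] * 8]  # made-up weights for 2 classes
biases = [0.0, 0.0]

probs = softmax(linear(features, weights, biases))
predicted = probs.index(max(probs))
print(len(features), probs, predicted)
```

Flattening is the same step that `x.view(...)` performs before the first `nn.Linear` layer in a PyTorch model.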



Optimized variants of gradient descent

The simplest optimization algorithm is SGD. Gradient descent is very effective, but it has some problems: it is difficult to choose a reasonable learning rate, and it easily gets stuck at suboptimal local extrema.

Extended content (for background understanding):

    SGD with Momentum

Gradient update rule: momentum (inertia) is added to gradient descent. Updates speed up along dimensions where the gradient direction stays the same and slow down along dimensions where it keeps changing, which accelerates convergence and reduces oscillation.

    RMSProp

Gradient update rule: RMSProp addresses the sharp decline of the learning rate in Adagrad by changing how the second-order momentum is computed: it uses an exponentially weighted moving average (a sliding window) instead of accumulating all past squared gradients.

    Adam

Gradient update rule: Adam = Adaptive + Momentum. As the name suggests, Adam combines the first-order momentum of SGD with Momentum and the second-order momentum of RMSProp.
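The three update rules can be sketched on a toy problem. The example assumes the one-dimensional function f(w) = w^2 (gradient 2w), a starting point of 5.0 and hyperparameter values picked for illustration; it follows the commonly published form of each rule, not any particular library's source code:

```python
import math

def grad(w):
    return 2.0 * w  # gradient of f(w) = w^2

def sgd_momentum(w, steps, lr=0.1, beta=0.9):
    v = 0.0
    for _ in range(steps):
        v = beta * v + grad(w)  # first-order momentum: running sum of gradients
        w -= lr * v
    return w

def rmsprop(w, steps, lr=0.1, beta=0.9, eps=1e-8):
    s = 0.0
    for _ in range(steps):
        g = grad(w)
        s = beta * s + (1 - beta) * g * g  # moving average of squared gradients
        w -= lr * g / (math.sqrt(s) + eps)
    return w

def adam(w, steps, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8):
    m = v = 0.0
    for t in range(1, steps + 1):
        g = grad(w)
        m = beta1 * m + (1 - beta1) * g      # first-order momentum
        v = beta2 * v + (1 - beta2) * g * g  # second-order momentum
        m_hat = m / (1 - beta1 ** t)         # bias correction
        v_hat = v / (1 - beta2 ** t)
        w -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return w

start = 5.0
results = {"momentum": sgd_momentum(start, 200),
           "rmsprop": rmsprop(start, 200),
           "adam": adam(start, 200)}
for name, w in results.items():
    print(name, w)
```

All three move `w` from 5.0 toward the minimum at 0; they differ in how the step is scaled by past gradients.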

Development history of convolutional neural network



Convolutional networks for other purposes

Image target detection
    YOLO: GoogLeNet + bounding boxes
    SSD: VGG + region proposals

Simple CNN construction

import torch
import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    def __init__(self):  # Define the network structure; input data is 1x32x32
        super(Net, self).__init__()
        # First layer (convolution layer)
        self.conv1 = nn.Conv2d(1, 6, 3)  # Input channels 1, output channels 6, 3x3 kernel
        # Second layer (convolution layer)
        self.conv2 = nn.Conv2d(6, 16, 3)  # Input channels 6, output channels 16, 3x3 kernel
        # Third layer (fully connected layer)
        self.fc1 = nn.Linear(16*28*28, 512)  # Input dimension 16x28x28=12544, output dimension 512
        # Fourth layer (fully connected layer)
        self.fc2 = nn.Linear(512, 64)  # Input dimension 512, output dimension 64
        # Fifth layer (fully connected layer)
        self.fc3 = nn.Linear(64, 2)  # Input dimension 64, output dimension 2
    
    def forward(self, x): #Define data flow
        x = self.conv1(x)
        x = F.relu(x)
        
        x = self.conv2(x)
        x = F.relu(x)
        
        x = x.view(-1, 16*28*28)
        x = self.fc1(x)
        x = F.relu(x)
        
        x = self.fc2(x)
        x = F.relu(x)
        
        x = self.fc3(x)
        
        return x
        
net = Net()
print(net)
Net(
  (conv1): Conv2d(1, 6, kernel_size=(3, 3), stride=(1, 1))
  (conv2): Conv2d(6, 16, kernel_size=(3, 3), stride=(1, 1))
  (fc1): Linear(in_features=12544, out_features=512, bias=True)
  (fc2): Linear(in_features=512, out_features=64, bias=True)
  (fc3): Linear(in_features=64, out_features=2, bias=True)
)
#Generate random input
input_data = torch.randn(1,1,32,32) 
print(input_data)
print(input_data.size())
tensor([[[[ 0.3055, -0.8828,  0.1044,  ..., -0.4833,  1.1879, -0.0727],
          [ 0.2718, -1.5784, -1.0362,  ..., -0.5160,  0.4685, -0.5401],
          [ 2.4876,  0.1718,  1.2377,  ..., -0.6047, -0.7236,  0.3888],
          ...,
          [-0.8249, -0.3313, -0.3513,  ...,  0.2470, -0.6509, -0.9969],
          [ 1.0528,  0.0348,  0.6416,  ..., -0.4129, -0.1997,  0.1648],
          [ 1.5184,  0.0120, -2.3959,  ..., -1.3124, -0.4289, -0.2882]]]])
torch.Size([1, 1, 32, 32])
# Running neural network
out = net(input_data)
print(out)
print(out.size())
tensor([[-0.0375, -0.0235]], grad_fn=<AddmmBackward>)
torch.Size([1, 2])
# Randomly generated target (ground-truth) value
target = torch.randn(2)
target = target.view(1,-1)
print(target)
tensor([[-2.1838, -0.4858]])
criterion = nn.L1Loss() # Define loss function
loss = criterion(out, target) # Calculate loss
print(loss)
tensor(1.3043, grad_fn=<L1LossBackward>)
# Backpropagation
net.zero_grad()  # Clear the gradients
loss.backward()  # Automatically compute the gradients and back-propagate them

import torch.optim as optim

optimizer = optim.SGD(net.parameters(), lr=0.01)
optimizer.step()
out = net(input_data)
print(out)
print(out.size())
tensor([[-0.0946, -0.0601]], grad_fn=<AddmmBackward>)
torch.Size([1, 2])
  • The second forward pass runs after the weight update, and its loss is smaller than the first one
criterion = nn.L1Loss() # Define loss function MAE
loss = criterion(out, target) # Calculate loss
print(loss)
tensor(1.2574, grad_fn=<L1LossBackward>)

Python + CNN handwritten numeral recognition

  • Import the packages and load the MNIST dataset
import torch
import torchvision.datasets as dataset
import torchvision.transforms as transforms
import torch.utils.data as data_utils
#data
train_data = dataset.MNIST(root="mnist",
                           train=True,
                           transform=transforms.ToTensor(),
                           download=True)

test_data = dataset.MNIST(root="mnist",
                           train=False,
                           transform=transforms.ToTensor(),
                           download=False)
#batchsize
train_loader = data_utils.DataLoader(dataset=train_data,
                                     batch_size=64,
                                     shuffle=True)

test_loader = data_utils.DataLoader(dataset=test_data,
                                     batch_size=64,
                                     shuffle=True)

  • Building CNN
class CNN(torch.nn.Module):
    def __init__(self):
        super(CNN, self).__init__()
        self.conv =torch.nn.Sequential(
            torch.nn.Conv2d(1, 32, kernel_size=5, padding=2),
            torch.nn.BatchNorm2d(32),
            torch.nn.ReLU(),
            torch.nn.MaxPool2d(2)
        )
        self.fc = torch.nn.Linear(14 * 14 * 32, 10)
    def forward(self, x):
        out = self.conv(x)
        out = out.view(out.size()[0], -1)
        out = self.fc(out)
        return out
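The `14 * 14 * 32` input size of the final `Linear` layer follows from the output-size formula. A quick check in plain Python, assuming 28x28 MNIST inputs and that `MaxPool2d(2)` uses a window of 2 with a step size of 2; `conv_out` is a hypothetical helper:

```python
def conv_out(size, kernel, padding=0, stride=1):
    """Output width/height: (size - kernel + 2 * padding) / stride + 1."""
    return (size - kernel + 2 * padding) // stride + 1

size = 28                                   # MNIST images are 28x28
size = conv_out(size, kernel=5, padding=2)  # Conv2d(1, 32, kernel_size=5, padding=2)
size = conv_out(size, kernel=2, stride=2)   # MaxPool2d(2)
channels = 32                               # number of convolution kernels

flat_features = channels * size * size
print(size, flat_features)
```

BatchNorm2d and ReLU do not change the shape, so only the convolution and pooling steps appear in the calculation.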
  • Train the model, save it, then load it and test
cnn = CNN()
# cnn = cnn.cuda()
#loss

loss_func = torch.nn.CrossEntropyLoss()

#optimizer

optimizer = torch.optim.Adam(cnn.parameters(), lr=0.01)

#training
for epoch in range(10):
    for i, (images, labels) in enumerate(train_loader):
#         images = images.cuda()
#         labels = labels.cuda()
        outputs = cnn(images)
        loss = loss_func(outputs, labels)

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    print("epoch is {}, ite is "
          "{}/{}, loss is {}".format(epoch+1, i,
                                     len(train_data) // 64,
                                     loss.item()))
    #eval/test
    cnn.eval()  # use BatchNorm running statistics during evaluation
    loss_test = 0
    accuracy = 0
    with torch.no_grad():  # gradients are not needed for evaluation
        for i, (images, labels) in enumerate(test_loader):
#             images = images.cuda()
#             labels = labels.cuda()
            outputs = cnn(images)
            #[batchsize]
            #outputs = batchsize * cls_num
            loss_test += loss_func(outputs, labels).item()
            _, pred = outputs.max(1)
            accuracy += (pred == labels).sum().item()
    cnn.train()  # switch back to training mode for the next epoch

    accuracy = accuracy / len(test_data)
    loss_test = loss_test / len(test_loader)  # average loss per batch

    print("epoch is {}, accuracy is {}, "
          "loss test is {}".format(epoch + 1,
                                   accuracy,
                                   loss_test))

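torch.save(cnn, "mnist_model.pkl")

# Note: this loads a fully pickled model; recent PyTorch versions may require
# torch.load("mnist_model.pkl", weights_only=False) for this to work.
cnn = torch.load("mnist_model.pkl")
cnn.eval()  # inference mode for BatchNorm
# cnn = cnn.cuda()
#eval/test
accuracy = 0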

import cv2
#pip install opencv-python -i http://mirrors.aliyun.com/pypi/simple/   --trusted-host mirrors.aliyun.com
for i, (images, labels) in enumerate(test_loader):
#     images = images.cuda()
#     labels = labels.cuda()
    outputs = cnn(images)
    _, pred = outputs.max(1)
    accuracy += (pred == labels).sum().item()

    images = images.cpu().numpy()
    labels = labels.cpu().numpy()
    pred = pred.cpu().numpy()
    #batchsize * 1 * 28 * 28

    for idx in range(images.shape[0]):
        im_data = images[idx]
        im_label = labels[idx]
        im_pred = pred[idx]
        im_data = im_data.transpose(1, 2, 0)
accuracy = accuracy / len(test_data)
print(accuracy)

0.9824

Construction of Cifar10 image classifier by CNN



import torch
import torchvision
import torchvision.transforms as transforms
from tqdm import tqdm
# (0.5, 0.5, 0.5), (0.5, 0.5, 0.5): the first tuple is the per-channel RGB mean, and the second is the per-channel RGB standard deviation. All three channels use 0.5
# Normalization: maps pixel values from [0, 1] to [-1, 1]
transform = transforms.Compose(
    [transforms.ToTensor(),
     transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
    ]
    )

#Training data set
trainset = torchvision.datasets.CIFAR10(root='./data', train=True, 
                                       download=True, transform=transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=16,
                                         shuffle=True, num_workers=2)

#Test data set
testset = torchvision.datasets.CIFAR10(root='./data', train=False,
                                      download=True, transform=transform)
testloader = torch.utils.data.DataLoader(testset, batch_size=16,
                                        shuffle=False, num_workers=2)



Files already downloaded and verified
Files already downloaded and verified
import matplotlib.pyplot as plt
import numpy as np
%matplotlib inline

def imshow(img):
    # Input data: torch.tensor [c, h, w]
    img = img / 2+0.5
    nping = img.numpy()
    nping = np.transpose(nping, (1,2,0)) # [h,w,c]
    plt.imshow(nping)
    
dataiter = iter(trainloader)  # Load one random mini-batch
images, labels = next(dataiter)

imshow(torchvision.utils.make_grid(images))
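The `Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))` transform and the `img / 2 + 0.5` line in `imshow` are inverses of each other. A small sketch on single pixel values, assuming pixels in [0, 1] as produced by `ToTensor()`; the function names are made up:

```python
def normalize(pixel, mean=0.5, std=0.5):
    """What Normalize does to one channel value: (x - mean) / std."""
    return (pixel - mean) / std

def unnormalize(value):
    """Same arithmetic as `img / 2 + 0.5` in imshow."""
    return value / 2 + 0.5

for pixel in (0.0, 0.5, 1.0):
    scaled = normalize(pixel)
    print(pixel, "->", scaled, "->", unnormalize(scaled))
```

Pixels end up in [-1, 1] for training and are mapped back to [0, 1] before plotting.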

import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    def __init__(self):  # Define the network structure; input data is 3x32x32
        super(Net, self).__init__()
        # First layer (convolution layer)
        self.conv1 = nn.Conv2d(3, 6, 3)  # Input channels 3, output channels 6, 3x3 kernel
        # Second layer (convolution layer)
        self.conv2 = nn.Conv2d(6, 16, 3)  # Input channels 6, output channels 16, 3x3 kernel
        # Third layer (fully connected layer)
        self.fc1 = nn.Linear(16*28*28, 512)  # Input dimension 16x28x28=12544, output dimension 512
        # Fourth layer (fully connected layer)
        self.fc2 = nn.Linear(512, 64)  # Input dimension 512, output dimension 64
        # Fifth layer (fully connected layer)
        self.fc3 = nn.Linear(64, 10)  # Input dimension 64, output dimension 10
    
    def forward(self, x): #Define data flow
        x = self.conv1(x)
        x = F.relu(x)
        
        x = self.conv2(x)
        x = F.relu(x)
        
        x = x.view(-1, 16*28*28)
        x = self.fc1(x)
        x = F.relu(x)
        
        x = self.fc2(x)
        x = F.relu(x)
        
        x = self.fc3(x)
        
        return x
        
net = Net()
print(net)
Net(
  (conv1): Conv2d(3, 6, kernel_size=(3, 3), stride=(1, 1))
  (conv2): Conv2d(6, 16, kernel_size=(3, 3), stride=(1, 1))
  (fc1): Linear(in_features=12544, out_features=512, bias=True)
  (fc2): Linear(in_features=512, out_features=64, bias=True)
  (fc3): Linear(in_features=64, out_features=10, bias=True)
)
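The printed structure can be used to count the learnable parameters of each layer. This is a back-of-the-envelope sketch in plain Python rather than a PyTorch call; `conv_params` and `linear_params` are hypothetical helpers that count weights plus biases:

```python
def conv_params(in_channels, out_channels, kernel):
    """Weights (in * out * k * k) plus one bias per output channel."""
    return in_channels * out_channels * kernel * kernel + out_channels

def linear_params(in_features, out_features):
    """Weights (in * out) plus one bias per output feature."""
    return in_features * out_features + out_features

layers = {
    "conv1": conv_params(3, 6, 3),
    "conv2": conv_params(6, 16, 3),
    "fc1": linear_params(16 * 28 * 28, 512),
    "fc2": linear_params(512, 64),
    "fc3": linear_params(64, 10),
}
total = sum(layers.values())

for name, count in layers.items():
    print(f"{name}: {count} parameters ({100 * count / total:.2f}% of total)")
print("total:", total)
```

Almost all parameters sit in `fc1`, because this network has no pooling layer to shrink the 28x28 feature maps before the first fully connected layer.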
import torch.optim as optim

criterion = nn.CrossEntropyLoss()  # Cross-entropy loss
optimizer = optim.SGD(net.parameters(), lr=0.0001, momentum=0.9)
# Momentum gradient descent updates the weights with an exponentially weighted average of the gradients. It usually converges faster than standard gradient descent.
train_loss_hist = []
test_loss_hist = []
for epoch in range(2):
    for i, data in enumerate(trainloader):
        images, labels = data        
        outputs = net(images)
        loss = criterion(outputs, labels) # Calculate loss
        
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        
        if i%1000==0:
            print("Epoch: {} step: {} Loss: {}".format(epoch, i, loss.item()))


Epoch: 0 step: 0 Loss: 2.327638864517212
Epoch: 0 step: 1000 Loss: 2.2910702228546143
Epoch: 0 step: 2000 Loss: 2.303840160369873
Epoch: 0 step: 3000 Loss: 2.252164363861084
Epoch: 1 step: 0 Loss: 2.2408382892608643
Epoch: 1 step: 1000 Loss: 2.0526092052459717
Epoch: 1 step: 2000 Loss: 2.0468878746032715
Epoch: 1 step: 3000 Loss: 2.1996114253997803

train_loss_hist = []
test_loss_hist = []

for epoch in tqdm(range(20)):
    #train
    net.train()
    running_loss = 0.0
    for i, data in enumerate(trainloader):
        images, labels = data        
        outputs = net(images)
        loss = criterion(outputs, labels) # Calculate loss
        
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        
        running_loss += loss.item()
        if i % 250 == 249:  # Evaluate after every 250 mini-batches
            net.eval()
            test_loss = 0.0
            with torch.no_grad():
                for test_images, test_labels in testloader:
                    test_outputs = net(test_images)
                    test_loss += criterion(test_outputs, test_labels).item()

            train_loss_hist.append(running_loss / 250)  # mean training loss of the last 250 batches
            test_loss_hist.append(test_loss / len(testloader))  # mean loss over the whole test set
            running_loss = 0.0
            net.train()  # switch back to training mode


100%|██████████| 20/20 [50:48<00:00, 148.08s/it]
plt.figure()
plt.plot(train_loss_hist)
plt.plot(test_loss_hist)
plt.legend(('train loss', 'test loss'))
plt.title('Train/Test Loss')
plt.xlabel('# mini batch *250')
plt.ylabel('Loss')
Text(0,0.5,'Loss')

Tags: AI Deep Learning Pytorch neural networks

Posted by infomamun on Fri, 24 Sep 2021 14:43:14 +0530