[Li Hongyi Machine Learning HW2]



This article records some ideas for the second assignment of Li Hongyi's 2022 machine learning course, as part of my own learning process; the main references are listed at the end of the article.

HW2 task:
This assignment comes from speech recognition: we need to predict phonemes from the given audio data. The preprocessing step, extracting MFCC features from the raw waveform, has already been done by the teaching assistants; our job is the classification itself, i.e. frame-level phoneme classification using the pre-extracted MFCC features.
Phoneme classification means predicting phonemes from speech data. A phoneme is the smallest unit of speech in a human language that distinguishes meaning, and it is a basic concept in phonological analysis; every language has its own phoneme system.
The requirements (grading baselines) were given as an image, not reproduced here.


Below is a summary of the changes and methods applied to the teaching assistant's code.

1. Passing the strong baseline


First, following the teaching assistant's hint: since a phoneme usually spans more than one frame, concatenating the neighboring frames before training gives better results. Here we increase concat_nframes and concatenate a symmetric number of frames before and after the center frame; for example, with concat_n = 19, we concatenate 9 frames before and 9 frames after.
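The idea can be sketched as follows; this is a simplified stand-in for the TA's concatenation helper, and the function name and boundary-padding choice (repeating the edge frames) are my own assumptions:

```python
import torch

def concat_nframes_feat(x, concat_n):
    """x: (T, 39) MFCC frames of one utterance; concat_n must be odd."""
    assert concat_n % 2 == 1
    half = concat_n // 2
    # pad the boundaries by repeating the first/last frame
    padded = torch.cat([x[:1].repeat(half, 1), x, x[-1:].repeat(half, 1)], dim=0)
    # for each center frame t, take frames t-half .. t+half and flatten them
    return torch.stack([padded[t:t + concat_n].flatten()
                        for t in range(x.size(0))])

feats = torch.randn(100, 39)                       # 100 frames of 39-dim MFCC
windowed = concat_nframes_feat(feats, concat_n=19)
print(windowed.shape)                              # torch.Size([100, 741])
```

Each training example then has dimension concat_n * 39 (here 19 * 39 = 741), so the model's input dimension must grow accordingly.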

2. Small details

Add Batch Normalization and dropout.
On the advantages of Batch Normalization: link
On weight decay: link
On dropout: link
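As an illustration, a hidden block with both tricks added might look like this; the block structure follows the common homework template, while the exact dimensions and dropout rate are assumptions:

```python
import torch
import torch.nn as nn

# One hidden block: Linear -> BatchNorm -> ReLU -> Dropout
class BasicBlock(nn.Module):
    def __init__(self, input_dim, output_dim, dropout=0.25):
        super().__init__()
        self.block = nn.Sequential(
            nn.Linear(input_dim, output_dim),
            nn.BatchNorm1d(output_dim),  # normalizes activations, stabilizing and speeding up training
            nn.ReLU(),
            nn.Dropout(dropout),         # randomly zeroes units to reduce overfitting
        )

    def forward(self, x):
        return self.block(x)

x = torch.randn(8, 741)              # a batch of 8 concatenated-frame inputs
y = BasicBlock(741, 1024)(x)
print(y.shape)                       # torch.Size([8, 1024])
```

Weight decay itself needs no extra layer; it is applied through the optimizer (see the AdamW line below).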

3. Cosine Annealing

Reference: link
Official documentation: link
Here we add the following code:

optimizer = torch.optim.AdamW(model.parameters(), lr=learning_rate, weight_decay=0.01)
scheduler = torch.optim.lr_scheduler.CosineAnnealingWarmRestarts(optimizer, T_0=8, T_mult=2, eta_min=learning_rate/2)

The cosine annealing (with warm restarts) learning-rate formula is

η_t = η_min + (1/2)(η_max − η_min)(1 + cos(π · T_cur / T_i))

where T_cur counts the epochs since the last restart and T_i is the length of the current period.

The function usage is as follows

torch.optim.lr_scheduler.CosineAnnealingWarmRestarts(optimizer, T_0, T_mult=1, eta_min=0, last_epoch=-1, verbose=False)

T_0 is the initial period, i.e. the number of epochs it takes for the learning rate to go from its maximum back to the next maximum (the first restart); each subsequent period is T_mult times the previous one. eta_min is the minimum learning rate. last_epoch is the index of the last epoch and defaults to -1. When verbose is True, the scheduler prints the learning rate at each epoch.
We verify with the following code

import torch
import torch.nn as nn
from torch.optim.lr_scheduler import CosineAnnealingWarmRestarts
import matplotlib.pyplot as plt

class Simple_Model(nn.Module):
    def __init__(self):
        super(Simple_Model, self).__init__()
        self.conv1 = nn.Conv2d(in_channels=3, out_channels=3, kernel_size=1)

    def forward(self, x):
        return self.conv1(x)

learning_rate = 0.0001
model = Simple_Model()
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)
scheduler = CosineAnnealingWarmRestarts(optimizer, T_0=8, T_mult=2, eta_min=learning_rate/2)

print('initial learning rate', optimizer.defaults['lr'])
lr_get = []  # save lr for plotting

for epoch in range(1, 100):
    # (training step omitted; we only track how the schedule evolves)
    optimizer.step()
    lr_get.append(optimizer.param_groups[0]['lr'])
    scheduler.step()

# CosineAnnealingWarmRestarts: plot the change in learning rate
plt.title('How CosineAnnealingWarmRestarts goes')
plt.plot(range(1, 100), lr_get)
plt.xlabel('epoch')
plt.ylabel('learning rate')
plt.show()

The resulting plot (image omitted here) shows the learning rate decaying from the maximum and restarting, with period lengths 8, 16, 32, ... epochs.

Finally, with the steps above, the model achieves the following results (score screenshot omitted).

2. Follow-up improvements

1. LSTM (Long Short-Term Memory)

Still studying this part; I will fill it in once I have learned it, expected within the next three days.
The code is as follows (example):


not yet concluded
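In the meantime, here is a hypothetical sketch of what a frame-level BiLSTM classifier for this task might look like; all layer sizes, the 41-class output, and the window-as-sequence input are my assumptions, not the finished homework code:

```python
import torch
import torch.nn as nn

class BiLSTMClassifier(nn.Module):
    def __init__(self, input_dim=39, hidden_dim=256, num_layers=3, num_classes=41):
        super().__init__()
        # keep the concatenated window as a sequence of raw 39-dim frames
        self.lstm = nn.LSTM(input_dim, hidden_dim, num_layers,
                            batch_first=True, bidirectional=True, dropout=0.25)
        self.fc = nn.Linear(hidden_dim * 2, num_classes)

    def forward(self, x):
        # x: (batch, n_frames, 39)
        out, _ = self.lstm(x)            # (batch, n_frames, 2 * hidden_dim)
        center = out[:, x.size(1) // 2]  # hidden state at the center frame
        return self.fc(center)           # (batch, num_classes)

x = torch.randn(4, 19, 39)    # 4 windows of 19 frames each
logits = BiLSTMClassifier()(x)
print(logits.shape)           # torch.Size([4, 41])
```

The intuition is that an RNN can exploit the temporal order of the frames instead of flattening them into one long vector.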

References

Link: link
[[ML2021 Li Hongyi Machine Learning] Homework 2 TIMIT phoneme classification Explanation of ideas - Bilibili]
Link: link
Link: link

Tags: AI Deep Learning Machine Learning

Posted by l008com on Sat, 08 Oct 2022 03:26:08 +0530