This article records some of my ideas for Homework 2 of Li Hongyi's 2022 Machine Learning course, as a record of my own learning process; the main references are listed at the end of the article.
HW2 task:
This assignment belongs to the speech recognition part: we need to predict phonemes from the given audio data. The data preprocessing step, extracting MFCC features from the raw waveform, has already been done by the teaching assistant, so our job is the classification itself: frame-level phoneme classification using the pre-extracted MFCC features.
Phoneme classification means predicting phonemes from speech data. A phoneme is the smallest unit of speech in a human language that distinguishes meaning; it is a basic concept of phonological analysis, and every language has its own phoneme system.
Requirements are as follows
The following is a summary of the changes and methods applied to the teaching assistant's code.
1. Passing the strong baseline
First, following the teaching assistant's hint: since a phoneme usually spans more than one frame, concatenating the frames before and after each frame for training gives better results. Here we increase concat_nframes and concatenate a symmetric number of frames on each side; for example, with concat_n = 19, 9 frames are concatenated before and 9 after the current frame.
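A minimal sketch of this kind of symmetric frame concatenation is shown below. The function name `concat_feat` and the repeat-the-edge-frame padding at sequence boundaries are my assumptions, not necessarily the teaching assistant's exact implementation:

```python
import torch

def concat_feat(x, concat_n):
    # x: (num_frames, feature_dim) MFCC features; concat_n must be odd
    assert concat_n % 2 == 1
    half = concat_n // 2
    # repeat the edge frames so boundary frames still get concat_n neighbours
    padded = torch.cat([x[:1].repeat(half, 1), x, x[-1:].repeat(half, 1)], dim=0)
    # for each frame i, stack frames i-half .. i+half into one long vector
    out = torch.cat([padded[i:i + len(x)] for i in range(concat_n)], dim=1)
    return out  # (num_frames, concat_n * feature_dim)

feats = torch.randn(100, 39)          # e.g. 100 frames of 39-dim MFCC
print(concat_feat(feats, 19).shape)   # torch.Size([100, 741]) = 19 * 39
```

Each row of the output is the original frame with its 9 neighbours on each side, so the model sees more temporal context per training example.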
2. Small details
3. Cosine Annealing
```python
optimizer = torch.optim.AdamW(model.parameters(), lr=learning_rate, weight_decay=0.01)
scheduler = torch.optim.lr_scheduler.CosineAnnealingWarmRestarts(optimizer, T_0=8, T_mult=2, eta_min=learning_rate/2)
```
The cosine annealing learning rate formula is

$$\eta_t = \eta_{min} + \frac{1}{2}(\eta_{max} - \eta_{min})\left(1 + \cos\left(\frac{T_{cur}}{T_i}\pi\right)\right)$$

where $\eta_{max}$ is the initial learning rate, $T_{cur}$ is the number of epochs since the last restart, and $T_i$ is the length of the current period.
The function usage is as follows
```python
torch.optim.lr_scheduler.CosineAnnealingWarmRestarts(optimizer, T_0, T_mult=1, eta_min=0, last_epoch=-1, verbose=False)
```
T_0 is the initial period: exactly the number of epochs the learning rate needs to go from its maximum value back up to the next maximum; each subsequent period is T_mult times the previous one, so with T_0=8 and T_mult=2 the periods are 8, 16, 32, ... epochs and restarts occur at epochs 8, 24, 56, and so on. eta_min is the minimum learning rate. last_epoch is the index of the last epoch and defaults to -1. When verbose is True, the scheduler prints the learning rate at each epoch.
We verify with the following code
```python
import torch
import torch.nn as nn
from torch.optim.lr_scheduler import CosineAnnealingWarmRestarts
import matplotlib.pyplot as plt

class Simple_Model(nn.Module):
    def __init__(self):
        super(Simple_Model, self).__init__()
        self.conv1 = nn.Conv2d(in_channels=3, out_channels=3, kernel_size=1)

    def forward(self, x):
        pass

learning_rate = 0.0001
model = Simple_Model()
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)
scheduler = CosineAnnealingWarmRestarts(optimizer, T_0=8, T_mult=2, eta_min=learning_rate/2)
print('initial learning rate', optimizer.defaults['lr'])

lr_get = []  # save lr for plotting
for epoch in range(1, 100):
    # train ...
    optimizer.zero_grad()
    optimizer.step()
    lr_get.append(optimizer.param_groups[0]['lr'])
    scheduler.step()

# plot how the learning rate changes under CosineAnnealingWarmRestarts
plt.plot(list(range(1, 100)), lr_get)
plt.xlabel('epoch')
plt.ylabel('learning_rate')
plt.title('How CosineAnnealingWarmRestarts goes')
plt.show()
```
The result is as follows:
Finally, through the above steps, our model can achieve the following results
2. Follow-up improvement
1. LSTM (Long Short-Term Memory)
I am still studying this part and will fill it in afterwards; it is expected to be completed within the next three days.
The code is as follows (example):
Not yet completed.
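Until that section is written, here is a minimal sketch of what a frame-level LSTM classifier for this task could look like. The hidden size, layer count, bidirectionality, and the 41-class output are my assumptions for illustration, not the author's final design:

```python
import torch
import torch.nn as nn

class LSTMClassifier(nn.Module):
    # input_dim=39 matches 39-dim MFCC frames; 41 phoneme classes is an assumption
    def __init__(self, input_dim=39, hidden_dim=256, num_layers=3, num_classes=41):
        super().__init__()
        self.lstm = nn.LSTM(input_dim, hidden_dim, num_layers,
                            batch_first=True, bidirectional=True, dropout=0.3)
        self.fc = nn.Linear(hidden_dim * 2, num_classes)

    def forward(self, x):
        # x: (batch, seq_len, input_dim) — a window of consecutive MFCC frames
        out, _ = self.lstm(x)
        # classify the centre frame of the window using its contextual encoding
        center = out[:, out.shape[1] // 2, :]
        return self.fc(center)

model = LSTMClassifier()
logits = model(torch.randn(4, 19, 39))  # batch of 4 windows of 19 frames
print(logits.shape)  # torch.Size([4, 41])
```

Compared with concatenating frames into one long vector for an MLP, the LSTM consumes the window as a sequence and lets the recurrence model the temporal structure directly.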