1. PyTorch modeling process
1. Prepare the data (the difficult step; my data type here: image data)
2. Define the model
3. Train the model
4. Evaluate the model
5. Use the model
6. Save the model
Image data modeling process example
1. Prepare the data
cifar2 dataset:
A subset of the cifar10 dataset, including only the first two categories, airplane and automobile
Training set:
5,000 airplane and automobile pictures each
Test set:
1,000 airplane and automobile pictures each
Task objective:
Train a model to classify images of airplanes and automobiles
File structure:
-cifar2
-train
-0_airplane
-...(5000)
-1_automobile
-...(5000)
-test
-0_airplane
-...(1000)
-1_automobile
-...(1000)
There are generally two approaches to building image data pipelines in PyTorch.
The first:
Use datasets.ImageFolder from torchvision to read the images, then use DataLoader to load them in parallel
The second (not written in the book; look it up yourself):
Implement custom reading logic by inheriting from torch.utils.data.Dataset, then use DataLoader to load in parallel
(the general way to read user-defined datasets; I should use it more often. A sketch of both approaches follows.)
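Neither pipeline is written out above, so here is a minimal sketch of both, assuming the cifar2 directory layout shown earlier. The transform choice, batch size, and num_workers are my own assumptions, not prescribed by the book; the target_transform turns the integer label into the [1]-shaped float tensor that BCELoss expects later.

import os
import torch
from torch.utils.data import Dataset, DataLoader
from torchvision import datasets, transforms
from PIL import Image

transform_img = transforms.Compose([transforms.ToTensor()])

# Approach 1: ImageFolder infers the label from the subfolder name
# (0_airplane -> 0, 1_automobile -> 1)
ds_train = datasets.ImageFolder("./data/cifar2/train/", transform=transform_img,
                                target_transform=lambda t: torch.tensor([t]).float())
ds_valid = datasets.ImageFolder("./data/cifar2/test/", transform=transform_img,
                                target_transform=lambda t: torch.tensor([t]).float())
dl_train = DataLoader(ds_train, batch_size=50, shuffle=True, num_workers=3)
dl_valid = DataLoader(ds_valid, batch_size=50, shuffle=False, num_workers=3)

# Approach 2: a user-defined Dataset; __len__ and __getitem__ are the
# only two methods DataLoader requires
class Cifar2Dataset(Dataset):
    def __init__(self, root, transform=None):
        self.paths, self.labels = [], []
        for label, cls in enumerate(["0_airplane", "1_automobile"]):
            folder = os.path.join(root, cls)
            for fname in os.listdir(folder):
                self.paths.append(os.path.join(folder, fname))
                self.labels.append(label)
        self.transform = transform

    def __len__(self):
        return len(self.paths)

    def __getitem__(self, idx):
        img = Image.open(self.paths[idx]).convert("RGB")
        if self.transform is not None:
            img = self.transform(img)
        return img, torch.tensor([self.labels[idx]]).float()

With Approach 2 the loader is built the same way, e.g. DataLoader(Cifar2Dataset("./data/cifar2/train/", transform=transform_img), batch_size=50, shuffle=True).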
2. Define the model
There are generally three ways to build models with PyTorch:
Use nn.Sequential to build a model in layer order
Inherit from the nn.Module base class to build a custom model (I should use this frequently; it is also what the book uses here)
Inherit from the nn.Module base class and use the model containers (nn.Sequential, nn.ModuleList, nn.ModuleDict) to help encapsulate the model
import torch
from torch import nn

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(in_channels=3, out_channels=32, kernel_size=3)
        self.pool = nn.MaxPool2d(kernel_size=2, stride=2)
        self.conv2 = nn.Conv2d(in_channels=32, out_channels=64, kernel_size=5)
        self.dropout = nn.Dropout2d(p=0.1)
        self.adaptive_pool = nn.AdaptiveMaxPool2d((1, 1))
        self.flatten = nn.Flatten()
        self.linear1 = nn.Linear(64, 32)
        self.relu = nn.ReLU()
        self.linear2 = nn.Linear(32, 1)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        x = self.conv1(x)
        x = self.pool(x)
        x = self.conv2(x)
        x = self.pool(x)
        x = self.dropout(x)
        x = self.adaptive_pool(x)
        x = self.flatten(x)
        x = self.linear1(x)
        x = self.relu(x)
        x = self.linear2(x)
        y = self.sigmoid(x)
        return y

net = Net()
print(net)
Net(
  (conv1): Conv2d(3, 32, kernel_size=(3, 3), stride=(1, 1))
  (pool): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (conv2): Conv2d(32, 64, kernel_size=(5, 5), stride=(1, 1))
  (dropout): Dropout2d(p=0.1, inplace=False)
  (adaptive_pool): AdaptiveMaxPool2d(output_size=(1, 1))
  (flatten): Flatten()
  (linear1): Linear(in_features=64, out_features=32, bias=True)
  (relu): ReLU()
  (linear2): Linear(in_features=32, out_features=1, bias=True)
  (sigmoid): Sigmoid()
)
Model summary with torchkeras.summary
import torchkeras
torchkeras.summary(net, input_shape=(3, 32, 32))
Print the network structure:
It shows the operation each layer performs, each layer's output shape, the model's total number of parameters, how many of them are trainable, and the memory the network occupies
----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
================================================================
            Conv2d-1           [-1, 32, 30, 30]             896
         MaxPool2d-2           [-1, 32, 15, 15]               0
            Conv2d-3           [-1, 64, 11, 11]          51,264
         MaxPool2d-4             [-1, 64, 5, 5]               0
         Dropout2d-5             [-1, 64, 5, 5]               0
 AdaptiveMaxPool2d-6             [-1, 64, 1, 1]               0
           Flatten-7                   [-1, 64]               0
            Linear-8                   [-1, 32]           2,080
              ReLU-9                   [-1, 32]               0
           Linear-10                    [-1, 1]              33
          Sigmoid-11                    [-1, 1]               0
================================================================
Total params: 54,273
Trainable params: 54,273
Non-trainable params: 0
----------------------------------------------------------------
Input size (MB): 0.011719
Forward/backward pass size (MB): 0.359634
Params size (MB): 0.207035
Estimated Total Size (MB): 0.578388
----------------------------------------------------------------
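A quick sanity check on the Param # column: conv1 has 3 × 32 × 3 × 3 = 864 weights plus 32 biases, i.e. 896 parameters; conv2 has 32 × 64 × 5 × 5 + 64 = 51,264; linear1 has 64 × 32 + 32 = 2,080; linear2 has 32 + 1 = 33; the pooling, dropout, flatten, and activation layers have no parameters. The sum is the reported 54,273.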
3. Train the model
PyTorch usually requires the user to write a custom training loop, and the coding style of training loops varies from person to person.
There are three typical training loop styles:
Scripted training loop (sketched below)
Functional training loop (more general; the style used in the book)
Class-style training loop
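To make the contrast concrete, here is a minimal sketch of the scripted style: everything is inline, nothing is wrapped in reusable functions. It assumes net and dl_train from above, and uses its own optimizer/loss names so the functional-style setup below is not clobbered.

optimizer = torch.optim.SGD(net.parameters(), lr=0.01)
loss_func = torch.nn.BCELoss()

for epoch in range(1, 3):  # two epochs, just to show the shape of the loop
    net.train()  # enable the dropout layer
    for features, labels in dl_train:
        optimizer.zero_grad()                     # clear old gradients
        loss = loss_func(net(features), labels)   # forward pass + loss
        loss.backward()                           # backpropagate
        optimizer.step()                          # update the weights
    print("epoch %d done, last batch loss = %.3f" % (epoch, loss.item()))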
import pandas as pd
from sklearn.metrics import roc_auc_score

model = net
model.optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
model.loss_func = torch.nn.BCELoss()
model.metric_func = lambda y_pred, y_true: roc_auc_score(y_true.data.numpy(), y_pred.data.numpy())
model.metric_name = "auc"
def train_step(model, features, labels):
    # Training mode: the dropout layer is active
    model.train()
    # Clear the gradients
    model.optimizer.zero_grad()
    # Forward pass and loss
    predictions = model(features)
    loss = model.loss_func(predictions, labels)
    metric = model.metric_func(predictions, labels)
    # Backward pass to compute gradients, then update the weights
    loss.backward()
    model.optimizer.step()
    return loss.item(), metric.item()

def valid_step(model, features, labels):
    # Evaluation mode: the dropout layer is inactive
    model.eval()
    predictions = model(features)
    loss = model.loss_func(predictions, labels)
    metric = model.metric_func(predictions, labels)
    return loss.item(), metric.item()

# Test the effect of train_step
features, labels = next(iter(dl_train))
train_step(model, features, labels)
(0.6922046542167664, 0.5088566827697262)
Logging: how to print the training log
import datetime

def train_model(model, epochs, dl_train, dl_valid, log_step_freq):
    metric_name = model.metric_name
    dfhistory = pd.DataFrame(columns=["epoch", "loss", metric_name, "val_loss", "val_" + metric_name])
    print("Start Training...")
    nowtime = datetime.datetime.now().strftime('%Y-%m-%d %H:%M:%S')
    print("=========="*8 + "%s" % nowtime)

    for epoch in range(1, epochs + 1):
        # 1. Training loop --------------------------------------------------
        loss_sum = 0.0
        metric_sum = 0.0
        step = 1
        for step, (features, labels) in enumerate(dl_train, 1):
            loss, metric = train_step(model, features, labels)
            # Print batch-level logs
            loss_sum += loss
            metric_sum += metric
            if step % log_step_freq == 0:
                print(("[step = %d] loss: %.3f, " + metric_name + ": %.3f") %
                      (step, loss_sum/step, metric_sum/step))

        # 2. Validation loop ------------------------------------------------
        val_loss_sum = 0.0
        val_metric_sum = 0.0
        val_step = 1
        for val_step, (features, labels) in enumerate(dl_valid, 1):
            val_loss, val_metric = valid_step(model, features, labels)
            val_loss_sum += val_loss
            val_metric_sum += val_metric

        # 3. Record the log -------------------------------------------------
        info = (epoch, loss_sum/step, metric_sum/step,
                val_loss_sum/val_step, val_metric_sum/val_step)
        dfhistory.loc[epoch - 1] = info

        # Print epoch-level logs
        print(("\nEPOCH = %d, loss = %.3f, " + metric_name +
               " = %.3f, val_loss = %.3f, val_" + metric_name + " = %.3f") % info)
        nowtime = datetime.datetime.now().strftime('%Y-%m-%d %H:%M:%S')
        print("\n" + "=========="*8 + "%s" % nowtime)

    print('Finished Training...')
    return dfhistory
epochs = 20
dfhistory = train_model(model, epochs, dl_train, dl_valid, log_step_freq=50)
Start Training...
================================================================================2020-06-28 20:47:56
[step = 50] loss: 0.691, auc: 0.627
[step = 100] loss: 0.690, auc: 0.673
[step = 150] loss: 0.688, auc: 0.699
[step = 200] loss: 0.686, auc: 0.716

EPOCH = 1, loss = 0.686, auc = 0.716, val_loss = 0.678, val_auc = 0.806
================================================================================2020-06-28 20:48:18
[step = 50] loss: 0.677, auc: 0.780
[step = 100] loss: 0.675, auc: 0.775
[step = 150] loss: 0.672, auc: 0.782
[step = 200] loss: 0.669, auc: 0.779

EPOCH = 2, loss = 0.669, auc = 0.779, val_loss = 0.651, val_auc = 0.815

......

================================================================================2020-06-28 20:54:24
[step = 50] loss: 0.386, auc: 0.914
[step = 100] loss: 0.392, auc: 0.913
[step = 150] loss: 0.395, auc: 0.911
[step = 200] loss: 0.398, auc: 0.911

EPOCH = 19, loss = 0.398, auc = 0.911, val_loss = 0.449, val_auc = 0.924
================================================================================2020-06-28 20:54:43
[step = 50] loss: 0.416, auc: 0.917
[step = 100] loss: 0.417, auc: 0.916
[step = 150] loss: 0.404, auc: 0.918
[step = 200] loss: 0.402, auc: 0.918

EPOCH = 20, loss = 0.402, auc = 0.918, val_loss = 0.535, val_auc = 0.925
================================================================================2020-06-28 20:55:03
Finished Training...
4. Evaluate the model
The dfhistory DataFrame returned by train_model records the loss and the metric for every epoch (re-creating the empty DataFrame here would wipe that history). To inspect it:

dfhistory
%matplotlib inline
%config InlineBackend.figure_format = 'svg'

import matplotlib.pyplot as plt

def plot_metric(dfhistory, metric):
    train_metrics = dfhistory[metric]
    val_metrics = dfhistory['val_' + metric]
    epochs = range(1, len(train_metrics) + 1)
    plt.plot(epochs, train_metrics, 'bo--')
    plt.plot(epochs, val_metrics, 'ro-')
    plt.title('Training and validation ' + metric)
    plt.xlabel("Epochs")
    plt.ylabel(metric)
    plt.legend(["train_" + metric, 'val_' + metric])
    plt.show()
plot_metric(dfhistory,"loss")
plot_metric(dfhistory,"auc")
5. Use the model
def predict(model, dl):
    model.eval()
    result = torch.cat([model.forward(t[0]) for t in dl])
    return result.data
# Predicted probabilities
y_pred_probs = predict(model, dl_valid)
y_pred_probs
tensor([[8.4032e-01],
        [1.0407e-02],
        [5.4146e-04],
        ...,
        [1.4471e-02],
        [1.7673e-02],
        [4.5081e-01]])
# Predicted classes
y_pred = torch.where(y_pred_probs > 0.5,
                     torch.ones_like(y_pred_probs),
                     torch.zeros_like(y_pred_probs))
y_pred
tensor([[1.],
        [0.],
        [0.],
        ...,
        [0.],
        [0.],
        [0.]])
6. Save the model
It is recommended to save a PyTorch model by saving its parameters (the state_dict)
The state_dict variable in torch.nn.Module stores the weights and biases that are learned during training
state_dict is essentially a Python dictionary object, so saving, updating, modifying, and restoring it all work naturally (a feature of Python's dict structure), which adds a lot of modularity to PyTorch models and optimizers
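For contrast, PyTorch can also save the whole model object rather than just the parameters; a minimal sketch (the file name here is a placeholder of my own):

# Save the entire model object. This pickles a reference to the Net class
# as well, so the code defining Net must be importable when loading.
torch.save(model, "./data/model_whole.pkl")
model_loaded = torch.load("./data/model_whole.pkl")

Saving only the state_dict, as below, avoids that coupling to the class definition, which is why it is the recommended approach.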
The state_dict variable:
print(model.state_dict().keys())
odict_keys(['conv1.weight', 'conv1.bias', 'conv2.weight', 'conv2.bias', 'linear1.weight', 'linear1.bias', 'linear2.weight', 'linear2.bias'])
# Save the model parameters
torch.save(model.state_dict(), "./data/model_parameter.pkl")

# Load them into a freshly constructed model
net_clone = Net()
net_clone.load_state_dict(torch.load("./data/model_parameter.pkl"))
predict(net_clone, dl_valid)
tensor([[0.0204], [0.7692], [0.4967], ..., [0.6078], [0.7182], [0.8251]])