Detailed explanation of YOLOv5 code (yolov5l.yaml section)

3. yolov5l.yaml

This part is a configuration file in the code base. The xxxx.yaml configuration file is parsed by ./models/yolo.py, which builds the network modules from it, given an input.

Unlike a network defined layer by layer in a config file, repeated modules do not need to be stacked by hand. You only need to change number in the configuration file.

Note that this walkthrough refers to the version updated on July 4, 2020.

3.1 yaml introduction

  1. YAML (YAML Ain't Markup Language) is, as the name says, not a markup language. Configuration files also come in formats such as XML and properties, but YAML is data-centric and better suited to configuration files.
  2. The syntax of YAML is similar to that of other high-level languages, and it can simply express data forms such as lists, hash tables, and scalars.
  3. It uses whitespace indentation and relies heavily on visual layout. It is especially suitable for expressing or editing data structures, configuration files, debugging dumps, and document outlines.
  4. YAML is case sensitive. Tabs are not allowed for indentation, only spaces. The exact number of indentation spaces does not matter, as long as elements on the same level are left-aligned. '#' starts a comment. Indentation expresses hierarchical relationships.

Note that spaces in a YAML file are significant! When creating a YAML object, each key-value pair is written with a colon as key: value, and a space must follow the colon.
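The two formatting rules above (no tabs in indentation, a space after the colon) can be sketched as a tiny checker. This is only an illustration: check_yaml_style is a hypothetical helper of my own, not part of YOLOv5 or of any YAML library, and it is deliberately naive (it would misjudge a '#' or ':' inside a quoted string or a URL).

```python
def check_yaml_style(text):
    """Return (line_number, message) pairs for two common YAML formatting slips."""
    problems = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        # Leading whitespace of the line
        indent = line[:len(line) - len(line.lstrip())]
        if "\t" in indent:
            problems.append((lineno, "tab in indentation"))
        # Drop a trailing comment, then look at the key: value part
        body = line.split("#", 1)[0].strip()
        if ":" in body:
            _key, _, rest = body.partition(":")
            # "key:" alone is fine; "key:value" is missing the space
            if rest and not rest.startswith(" "):
                problems.append((lineno, "missing space after colon"))
    return problems


sample = "nc: 15  # number of classes\n\tdepth_multiple: 1.0\nwidth_multiple:1.0\n"
for lineno, message in check_yaml_style(sample):
    print(lineno, message)
```

In the sample, the second line is flagged for its tab and the third for the missing space after the colon.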

3.2 parameters

# parameters
nc: 15 # number of classes
depth_multiple: 1.0  # model depth multiple
width_multiple: 1.0  # layer channel multiple
  1. nc: the number of categories. Set it to however many categories your dataset has. It is a count starting from 1 (15 here), not the range of class indices 0-14.
  2. depth_multiple: controls the depth of the model.
  3. width_multiple: controls the number of convolution kernels.

depth_multiple applies when number in the backbone is not 1, that is, it scales the repeat count of the Bottleneck layers to control the depth of the model. It is set to 0.33 in yolov5s, so where yolov5l has three Bottlenecks, yolov5s has only one.
Layers with number=1 are generally functional layers, such as the downsampling Conv, Focus, and SPP (spatial pyramid pooling).
-------------------
width_multiple mainly scales the channel count in args. For example, with 0.5 in yolov5s, Focus becomes [32, 3] and Conv becomes [64, 3, 2].
By analogy, the number of convolution kernels in every layer becomes half of the configured value.
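To make the two multiples concrete, here is a minimal sketch of the scaling arithmetic. The helper names scale_depth and scale_width are my own, and rounding the channels up to a multiple of 8 is an assumption about how the parser behaves, so check it against ./models/yolo.py in your version.

```python
import math


def scale_depth(number, depth_multiple):
    """Scale a layer's repeat count; layers with number == 1 are left alone."""
    if number > 1:
        return max(round(number * depth_multiple), 1)
    return number


def scale_width(channels, width_multiple, divisor=8):
    """Scale an output channel count, rounded up to a multiple of divisor (assumed)."""
    return math.ceil(channels * width_multiple / divisor) * divisor


# yolov5s: depth_multiple=0.33, width_multiple=0.5
print(scale_depth(3, 0.33))   # three Bottlenecks in yolov5l, one in yolov5s
print(scale_depth(1, 0.33))   # number=1 layers are not scaled
print(scale_width(64, 0.5))   # Focus [64, 3] becomes [32, 3]
print(scale_width(128, 0.5))  # Conv [128, 3, 2] becomes [64, 3, 2]
```

With both multiples at 1.0, as in yolov5l, the two helpers return the values exactly as written in the yaml file.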

yolov5 comes in four sizes: s, m, l, and x. All four yaml files share the same settings; only items 2 and 3 above (depth_multiple and width_multiple) differ. This is a neat design by the author team: changing just these two parameters adjusts the network structure of the model.

3.3 anchors

# anchors
anchors:
  - [116,90, 156,198, 373,326]  # P5/32
  - [30,61, 62,45, 59,119]  # P4/16
  - [10,13, 16,30, 33,23]  # P3/8

Add anchors to match your detection layers, one row per layer.
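As a rough illustration of how these rows can be read, the sketch below groups each row into (width, height) pairs and works out the expression na * (nc + 5) that appears in the head section. The variable names and the to_pairs helper are my own, and reading the 5 as four box coordinates plus one objectness score is my interpretation, not something stated in the yaml file.

```python
nc = 15  # number of classes, from the parameters section

anchors = [
    [116, 90, 156, 198, 373, 326],  # P5/32
    [30, 61, 62, 45, 59, 119],      # P4/16
    [10, 13, 16, 30, 33, 23],       # P3/8
]


def to_pairs(row):
    """Group a flat row into (width, height) pairs."""
    return list(zip(row[0::2], row[1::2]))


anchor_pairs = [to_pairs(row) for row in anchors]
na = len(anchors[0]) // 2         # anchors per detection layer
outputs_per_cell = na * (nc + 5)  # output channels of each detection conv

print(anchor_pairs[2])
print(na, outputs_per_cell)
```

With nc: 15 and three anchors per row, each detection layer ends up with 3 * (15 + 5) = 60 output channels.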

3.4 backbone

# YOLOv5 backbone
backbone:
  # [from, number, module, args]
  [[-1, 1, Focus, [64, 3]],  # 0-P1/2
   [-1, 1, Conv, [128, 3, 2]],  # 1-P2/4
   [-1, 3, BottleneckCSP, [128]],
   [-1, 1, Conv, [256, 3, 2]],  # 3-P3/8
   [-1, 9, BottleneckCSP, [256]],
   [-1, 1, Conv, [512, 3, 2]],  # 5-P4/16
   [-1, 9, BottleneckCSP, [512]],
   [-1, 1, Conv, [1024, 3, 2]], # 7-P5/32
   [-1, 1, SPP, [1024, [5, 9, 13]]],
  ]
  1. Bottleneck can be translated as "bottleneck layer".
  2. The from column: -1 means the input comes from the previous layer, and -2 means it comes from the layer two back (the same applies in head).
  3. The number column: 1 means a single module, 3 means three identical modules.
  4. The code for SPP, Conv, Bottleneck, and BottleneckCSP can be found in ./models/common.py.
  5. [64, 3] is parsed into [3, 32, 3] (with width_multiple 0.5, as in yolov5s): the input is 3 channels (RGB), the output is 32, and the convolution kernel size k is 3.
  6. [128, 3, 2] has a fixed layout: 128 is the number of output convolution kernels, 3 means a 3 × 3 kernel, and 2 means a stride of 2. [128, 3, 2] is parsed into [32, 64, 3, 2], where 32 is the input and 64 is the output (128*0.5=64).
  7. Through the backbone, the feature map goes from large to small spatially while getting deeper in channels.
  8. args omits the input channels, because the input of each layer is the output of the previous layer. To avoid tedious manual edits, the input is filled in by the parse_model(md, ch) function in ./models/yolo.py.
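Items 5, 6, and 8 can be sketched together: walk the backbone list, take each layer's input channels from the previous layer's output, and scale the output channels. This is a simplified stand-in of my own, not the real parse_model: resolve_backbone is a hypothetical name, module names are plain strings instead of classes, only from=-1 is handled, and rounding channels up to a multiple of 8 is an assumption.

```python
import math

backbone = [
    [-1, 1, "Focus", [64, 3]],
    [-1, 1, "Conv", [128, 3, 2]],
    [-1, 3, "BottleneckCSP", [128]],
    [-1, 1, "Conv", [256, 3, 2]],
    [-1, 9, "BottleneckCSP", [256]],
    [-1, 1, "Conv", [512, 3, 2]],
    [-1, 9, "BottleneckCSP", [512]],
    [-1, 1, "Conv", [1024, 3, 2]],
    [-1, 1, "SPP", [1024, [5, 9, 13]]],
]


def resolve_backbone(layers, depth_multiple, width_multiple, in_channels=3):
    """Fill in the omitted input channels and apply both multiples."""
    ch = [in_channels]  # ch[-1] is always the previous layer's output
    resolved = []
    for frm, number, module, args in layers:
        if number > 1:
            number = max(round(number * depth_multiple), 1)
        c1 = ch[frm]  # frm is -1 for every backbone layer
        c2 = math.ceil(args[0] * width_multiple / 8) * 8
        resolved.append((module, number, [c1, c2, *args[1:]]))
        ch.append(c2)
    return resolved


# yolov5s settings: depth_multiple=0.33, width_multiple=0.5
for module, number, args in resolve_backbone(backbone, 0.33, 0.5):
    print(module, number, args)
```

With the yolov5s multiples, the first two rows come out as Focus [3, 32, 3] and Conv [32, 64, 3, 2], matching items 5 and 6 above.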

3.5 head

Head (detection head): the backbone outputs feature maps, which are fed into the head for detection, covering both category and location.

# YOLOv5 head
head:
  [[-1, 3, BottleneckCSP, [1024, False]],  # 9

   [-1, 1, Conv, [512, 1, 1]],
   [-1, 1, nn.Upsample, [None, 2, 'nearest']],
   [[-1, 6], 1, Concat, [1]],  # cat backbone P4
   [-1, 3, BottleneckCSP, [512, False]],  # 13

   [-1, 1, Conv, [256, 1, 1]],
   [-1, 1, nn.Upsample, [None, 2, 'nearest']],
   [[-1, 4], 1, Concat, [1]],  # cat backbone P3
   [-1, 3, BottleneckCSP, [256, False]],
   [-1, 1, nn.Conv2d, [na * (nc + 5), 1, 1]],  # 18 (P3/8-small)

   [-2, 1, Conv, [256, 3, 2]],
   [[-1, 14], 1, Concat, [1]],  # cat head P4
   [-1, 3, BottleneckCSP, [512, False]],
   [-1, 1, nn.Conv2d, [na * (nc + 5), 1, 1]],  # 22 (P4/16-medium)

   [-2, 1, Conv, [512, 3, 2]],
   [[-1, 10], 1, Concat, [1]],  # cat head P5
   [-1, 3, BottleneckCSP, [1024, False]],
   [-1, 1, nn.Conv2d, [na * (nc + 5), 1, 1]],  # 26 (P5/32-large)

   [[], 1, Detect, [nc, anchors]],  # Detect(P5, P4, P3)
  ]
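The from column in the head mixes relative and absolute references, so here is a small sketch for resolving them. resolve_from is a hypothetical helper of my own; the layer indices used in the examples come from counting the backbone as layers 0-8 and the head from layer 9 onward, which agrees with the # 9, # 13, # 18, # 22, and # 26 comments above.

```python
def resolve_from(layer_index, frm):
    """Turn a from entry into a list of absolute layer indices.

    Negative values are relative to the current layer;
    non-negative values are already absolute.
    """
    refs = frm if isinstance(frm, list) else [frm]
    return [layer_index + f if f < 0 else f for f in refs]


print(resolve_from(12, [-1, 6]))   # first Concat: layer 11 plus backbone layer 6 (P4)
print(resolve_from(16, [-1, 4]))   # second Concat: layer 15 plus backbone layer 4 (P3)
print(resolve_from(19, -2))        # Conv after the P3 output: skips layer 18, reads layer 17
print(resolve_from(20, [-1, 14]))  # third Concat: layer 19 plus head layer 14
```

This shows why the downsampling Conv layers use -2: layer 18 is the nn.Conv2d detection output, so the feature path continues from the BottleneckCSP at layer 17 instead.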

Here is an analysis diagram; note that it differs from the model above.

If you found this useful, remember to give it a like~
Please point out any mistakes in the comment area. Please credit the source when reprinting, thank you!

Tags: Object Detection yolov5

Posted by adx on Tue, 31 May 2022 05:25:21 +0530