Technical Deep Dive | MindSpore Application Case: AI-Based Object Detection of Basketball Players

MindSpore is an open source, all-scenario AI framework designed for device-edge-cloud collaboration. Since it was open-sourced in March 2020, it has attracted wide attention and adoption across the industry. You are welcome to take part in open source contributions, model crowdsourcing, industry innovation and applications, and academic cooperation, and to contribute your application cases for the cloud, the device side (HiMindSpore applications), the edge, and the security field.

MindSpore grows with you in the field of AI, empowering thousands of industries with AI and unleashing powerful productivity. This article describes how MindSpore is applied to object detection of basketball players, using AI technology to analyze what happens on the basketball court.


AI Uses in Basketball Detection

Most readers are familiar with basketball. When watching a game, we can pick up useful information simply by observing the court with the naked eye. With the rapid development of AI technology, a deep learning algorithm can now learn from images of the basketball court and extract useful information about the number of players and their behavior, for example distinguishing the members of each team by jersey color, or analyzing players' movements to determine what action they are performing and what the score is. Figure 1 shows the player information displayed after inference is completed on the ModelArts platform.

Figure 1 Deep learning infers player information

So how do we use AI technology to analyze the basketball court? First, we chose an AI architecture based on MindSpore, Huawei's self-developed AI computing framework. MindSpore provides a unified all-scenario API that offers end-to-end capabilities for model development, model execution, and model deployment. MindSpore also adopts a distributed architecture with on-demand device-edge-cloud collaboration, a new paradigm of natively differentiable programming, and a new AI-native execution mode. These aim to achieve better resource efficiency, security, and trustworthiness, while lowering the threshold for AI development in the industry and unleashing the computing power of Ascend chips to make AI inclusive.

With the AI framework chosen, the network model we selected this time is the yolov3_darknet53 network model based on the MindSpore framework.

If you want to see the network code for basketball player detection, you can access the open source code for the MindSpore community at

The source code for the yolov3_darknet53 network based on the MindSpore framework, which uses the coco2014 dataset, can also be found in the MindSpore open source community. Reference link:

The following is a detailed description of the network model and its application implementation.


Introduction to Yolo

YOLO (You Only Look Once) is a classic one-stage object detection algorithm. The first version, yolov1, was proposed in 2016, and several versions followed; yolov3 is the third.

Each version of the YOLO algorithm brought improvements. Here is a general description of the differences and improvements across the first three versions:

1. yolov1 treats object detection as a regression problem and uses a single neural network to predict bounding boxes and class probabilities directly from the whole image. Because it is fast, real-time object detection can be achieved.

2. yolov2 is faster and more accurate than yolov1, with several improvements, such as: using BatchNorm to make the network easier to converge; introducing anchors and removing the fully connected layers of yolov1; and using dimension clustering and multi-scale training.

3. yolov3 is more accurate than yolov2, at the cost of slightly lower speed. The improvements of yolov3 are:

(1) The feature extraction network uses a residual structure with more layers.

(2) yolov3 detects on three scales, which handle large, medium and small targets respectively.
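To make the three-scale detection concrete, here is a minimal sketch that counts what yolov3 predicts at each scale. It assumes commonly used settings that are not stated in this article: a 416x416 input, strides of 32, 16 and 8, three anchors per scale, and the 80 classes of COCO. The function names are hypothetical.

```python
def yolo_output_shapes(input_size=416, strides=(32, 16, 8),
                       anchors_per_scale=3, num_classes=80):
    """Return (grid_h, grid_w, channels) for each detection scale."""
    # Each anchor predicts 4 box values + 1 objectness score + class scores.
    channels = anchors_per_scale * (5 + num_classes)
    return [(input_size // s, input_size // s, channels) for s in strides]


def total_candidate_boxes(shapes, anchors_per_scale=3):
    """Number of boxes predicted before confidence filtering and NMS."""
    return sum(h * w * anchors_per_scale for h, w, _ in shapes)


shapes = yolo_output_shapes()
for stride, shape in zip((32, 16, 8), shapes):
    print(f"stride {stride:2d} -> output {shape}")
print("candidate boxes:", total_candidate_boxes(shapes))
```

Under these assumptions the coarse 13x13 grid is responsible for large targets and the fine 52x52 grid for small ones, which is why the third scale helps with small-object detection.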


yolov3 network structure

The backbone network used by yolov3 is Darknet53, which is a fully convolutional structure. Figure 2 shows Darknet19 on the left and Darknet53 on the right. Darknet53 removes all max pooling layers and increases the number of convolution layers. It has 23 residual modules, and after five downsampling steps the network output is 1/32 the size of the network input. Because the network is deeper, yolov3 runs slightly slower than yolov2.
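The two numbers in this paragraph can be checked with a few lines of arithmetic. The sketch below assumes the per-stage residual counts from the published Darknet53 layout (1, 2, 8, 8, 4) and an input size of 416, which matches the training_shape used later in this article; the helper name is hypothetical.

```python
# Residual modules per stage in Darknet53 (assumed from the published layout).
RESIDUALS_PER_STAGE = (1, 2, 8, 8, 4)


def downsample_sizes(input_size, num_downsamples=5):
    """Feature-map side length after each stride-2 convolution."""
    sizes = []
    size = input_size
    for _ in range(num_downsamples):
        size //= 2  # a stride-2 convolution halves height and width
        sizes.append(size)
    return sizes


print(sum(RESIDUALS_PER_STAGE))  # total number of residual modules
print(downsample_sizes(416))     # side length after each downsampling step
```

Five halvings give a factor of 2^5 = 32, so a 416-pixel input ends at a 13-pixel feature map.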

The Conv2D block in the yolov3 network structure contains five convolution layers. The overall network structure is relatively simple, and because it is a single-stage detection method it is much simpler than Fast R-CNN.

Figure 2 Darknet network structure


Implementation of yolov3_darknet53

The network model we chose is the yolov3_darknet53 model described in the previous section, the dataset is a basketball-game-related dataset, and the AI framework is MindSpore, the deep learning framework launched by Huawei. MindSpore's API is then used to perform network model training and inference.

1. Data preparation

The data used this time is image data, which can be photos taken at a basketball game or frames extracted from video. Our data comes from basketball game videos downloaded from the Internet.

A video is essentially a sequence of frames played continuously, so conversely we can read the video frame by frame and save the frames as images to a directory. Because a video contains many frames per second and adjacent frames are often nearly identical, it is better to sample at a suitable time interval, capturing one image per interval. Images extracted this way differ noticeably from one another, contain few duplicates, and give a better-quality dataset.
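The interval sampling described here could look roughly like the sketch below. It is an illustration rather than the script used for this article: it assumes OpenCV (the opencv-python package) for video decoding, and the function names, the output file naming, and the 25 FPS fallback are all hypothetical choices.

```python
import os


def frames_to_keep(total_frames, fps, interval_seconds):
    """Indices of the frames to save when sampling one frame per interval."""
    step = max(1, round(fps * interval_seconds))
    return list(range(0, total_frames, step))


def extract_frames(video_path, out_dir, interval_seconds=1.0):
    """Save one frame per interval from a video; requires opencv-python."""
    import cv2  # imported here so frames_to_keep works without OpenCV

    os.makedirs(out_dir, exist_ok=True)
    cap = cv2.VideoCapture(video_path)
    fps = cap.get(cv2.CAP_PROP_FPS) or 25.0  # assumed fallback if FPS is unknown
    step = max(1, round(fps * interval_seconds))
    index = saved = 0
    while True:
        ok, frame = cap.read()
        if not ok:  # end of video or read error
            break
        if index % step == 0:
            cv2.imwrite(os.path.join(out_dir, f"frame_{index:06d}.jpg"), frame)
            saved += 1
        index += 1
    cap.release()
    return saved


# For a 4-second clip at 25 FPS, one frame per second keeps four frames.
print(frames_to_keep(total_frames=100, fps=25, interval_seconds=1.0))
```

A longer interval yields fewer but more varied images; a shorter one yields more images at the cost of near-duplicates.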

Once the images are generated, they also need to be labeled. This can be done on the ModelArts platform in Huawei Cloud. Finally, the prepared data is copied into the corresponding directory, and the dataset is divided into training data and inference data for subsequent model training and inference.
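One simple way to divide the images into training and inference sets is a seeded shuffle followed by a cut. The sketch below is only an illustration: the 80/20 ratio, the seed, the file names, and the function name are assumptions, not values from this article.

```python
import random


def split_dataset(image_names, train_ratio=0.8, seed=0):
    """Shuffle image names and split them into disjoint train/inference lists."""
    names = sorted(image_names)  # sort first so the split is reproducible
    random.Random(seed).shuffle(names)
    cut = int(len(names) * train_ratio)
    return names[:cut], names[cut:]


images = [f"frame_{i:06d}.jpg" for i in range(10)]
train, infer = split_dataset(images)
print(len(train), len(infer))
```

Because every image lands in exactly one of the two lists, no sample can be shared between the training set and the inference set.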

2. Model training scripts

This model mainly calls MindSpore APIs. For example, the context.set_context() interface configures the environment we are using. If our environment uses the Ascend 910 chip and training runs in graph mode, we can configure it as follows:

context.set_context(mode=context.GRAPH_MODE, device_target="Ascend")

The MindSpore-related interface used in this model training script is as follows:

import mindspore as ms
import mindspore.nn as nn
import mindspore.context as context
from mindspore import Tensor
from mindspore.nn.optim.momentum import Momentum
from mindspore.train.callback import ModelCheckpoint, RunContext
from mindspore.train.callback import _InternalCallbackParam, CheckpointConfig
from mindspore.train.serialization import load_checkpoint, load_param_into_net

The tutorials and API documentation for using these configurations or interfaces can be found on the MindSpore website, where links are available:

The main structure of the yolov3_darknet53 network model script is as follows:

Figure 3 yolov3_darknet53 Network Model Script Structure

Here are some of the main scripts in the network structure.

(1) This is the training script for the network. It mainly covers configuring environment parameters, calling the network, loading the pre-trained model, reading the training dataset, configuring the optimizer, setting up ckpt file saving, and running model training. During model training, some training parameters need to be configured. They can be set in the config.py file in the src directory, or passed to the training script when you execute the training command, as described below.

Here is part of the code from the training script, such as configuring environment parameters, calling the network, reading the dataset, and configuring the optimizer. It is for reference only.

context.set_context(mode=context.GRAPH_MODE, enable_auto_mixed_precision=True,
                    device_target="Ascend", save_graphs=False)
network = YOLOV3DarkNet53(is_training=True)
ds, data_size = create_yolo_dataset(image_dir=os.path.join(local_data_path, 'images'),
                                    anno_path=os.path.join(local_data_path, 'annotation.json'),
                                    is_training=True, batch_size=args.per_batch_size,
                                    max_epoch=args.epoch_size, device_num=args.group_size,
                                    rank=args.rank, config=config)
opt = Momentum(params=get_param_groups(network), learning_rate=Tensor(lr),
               momentum=args.momentum)  # assumed: the source excerpt is truncated after learning_rate

(2) This is the inference script for the network. Inference with the yolov3_darknet53 network also requires configuring the environment, calling the network, and reading the inference dataset, and it mainly uses the model saved during training to run inference. To make the inference process easier to follow, some log information generated during inference is printed.

Here is part of the code that implements the inference process, for reference only.

context.set_context(mode=context.GRAPH_MODE, device_target="Ascend", save_graphs=False)
network = YOLOV3DarkNet53(is_training=False)
# Load the checkpoint saved during training.
param_dict = load_checkpoint(local_ckpt_path)
param_dict_new = {}
for key, values in param_dict.items():
    if key.startswith('moments.'):
        # Skip optimizer state; it is not needed for inference.
        continue
    elif key.startswith('yolo_network.'):
        # Strip the 13-character 'yolo_network.' prefix.
        param_dict_new[key[13:]] = values
    else:
        param_dict_new[key] = values
load_param_into_net(network, param_dict_new)
detection = DetectionEngine(args)
input_shape = Tensor(tuple(config.test_img_shape), ms.float32)
# ds is the inference dataset, created beforehand.
for i, data in enumerate(ds.create_dict_iterator()):
    image = Tensor(data["image"])
    image_shape = Tensor(data["image_shape"])
    image_id = Tensor(data["img_id"])
    prediction = network(image, input_shape)
    output_big, output_me, output_small = prediction
    output_big = output_big.asnumpy()
    output_me = output_me.asnumpy()
    output_small = output_small.asnumpy()
    image_id = image_id.asnumpy()
    image_shape = image_shape.asnumpy()
    detection.detect([output_small, output_me, output_big], args.per_batch_size,
                     image_shape, image_id, config=config)
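The key-renaming loop in the inference code is easier to follow on a toy dictionary. The sketch below stands in for the result of load_checkpoint() with plain Python values; the key names are made up, and the reading that 'moments.' entries are optimizer state while 'yolo_network.' is a prefix added by the training wrapper is an assumption based on the code, not something the article states.

```python
def clean_checkpoint_keys(param_dict, prefix="yolo_network."):
    """Drop optimizer state and strip the training wrapper prefix from keys."""
    cleaned = {}
    for key, value in param_dict.items():
        if key.startswith("moments."):
            continue  # assumed to be optimizer state, not a network weight
        if key.startswith(prefix):
            key = key[len(prefix):]  # len("yolo_network.") == 13
        cleaned[key] = value
    return cleaned


# Toy stand-in for a loaded checkpoint; key names are hypothetical.
toy = {
    "moments.yolo_network.conv.weight": 0.0,
    "yolo_network.conv.weight": 1.0,
    "global_step": 2.0,
}
print(clean_checkpoint_keys(toy))
```

Using len(prefix) instead of the literal 13 keeps the slice correct if the wrapper name ever changes.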

(3) The src directory mainly contains the modules that the training and inference scripts call, such as the file holding the main yolov3_darknet53 network structure script, and a script whose file name begins with yolo_ that reads and processes data for the model.

(4) The scripts directory contains the commands used to run network model training and inference. They are collected into .sh files so they can be executed directly and conveniently. For example, the training command can look like this:

python \
    --data_dir=./dataset/coco2014 \
    --pretrained_backbone=darknet53_backbone.ckpt \
    --is_distributed=0 \
    --lr=0.001 \
    --loss_scale=1024 \
    --weight_decay=0.016 \
    --T_max=320 \
    --max_epoch=320 \
    --warmup_epochs=4 \
    --training_shape=416 \
    --lr_scheduler=cosine_annealing > log.txt 2>&1 &
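The flags --lr, --warmup_epochs, --T_max and --lr_scheduler=cosine_annealing together describe a learning-rate curve. The sketch below shows one common form of that curve, a linear warmup followed by cosine annealing, evaluated once per epoch. It is an assumption about how such a schedule typically behaves, not the exact code of the training script, which may for instance update the rate per step; the function name and the eta_min parameter are hypothetical.

```python
import math


def lr_at_epoch(epoch, base_lr=0.001, warmup_epochs=4, t_max=320, eta_min=0.0):
    """Learning rate for a 0-based epoch: linear warmup, then cosine annealing."""
    if epoch < warmup_epochs:
        # Ramp up linearly so the first epochs do not destabilize training.
        return base_lr * (epoch + 1) / warmup_epochs
    # Cosine decay from base_lr towards eta_min over t_max epochs.
    progress = min(epoch, t_max) / t_max
    return eta_min + (base_lr - eta_min) * 0.5 * (1.0 + math.cos(math.pi * progress))


for epoch in (0, 3, 4, 160, 319):
    print(epoch, f"{lr_at_epoch(epoch):.6f}")
```

With T_max equal to max_epoch, as in the command above, the rate decays smoothly over the whole run and approaches eta_min only at the very end.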

3. Model Inference

The yolov3_darknet53 network can run inference on batches of data. Note that the inference data must be kept separate from the training data, with no samples in common; otherwise the measured inference accuracy will be distorted.

During model inference, two parameters have a large impact on the final result:

(1) ignore_threshold: This parameter sets the confidence threshold for the final inference results. Each detected target has a confidence score: the higher the score, the more likely the detection is correct, and the lower the score, the less likely. By setting this threshold you can discard targets whose confidence falls below it, which makes the final displayed result cleaner.

(2) nms_threshold: This parameter sets the allowed overlap between detected target boxes. When the overlap of two boxes is greater than this value, they are considered to be the same target and only one box is shown. Conversely, when the overlap is less than this value, both boxes are retained.

Setting these two parameters takes some practical experience, and they can be adjusted according to the actual situation in the field.
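The effect of the two thresholds can be sketched in plain Python. This is a simplified illustration, not the DetectionEngine used in the inference script: overlap is measured as intersection over union (IoU), suppression is greedy, classes are ignored, and the function names, the score_threshold parameter standing in for the confidence threshold, and the sample boxes are all hypothetical.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def filter_detections(detections, score_threshold, nms_threshold):
    """detections: list of (box, score). Returns the detections that survive."""
    # Step 1: drop detections whose confidence is below the threshold.
    candidates = [d for d in detections if d[1] >= score_threshold]
    # Step 2: greedy NMS, visiting the most confident boxes first.
    candidates.sort(key=lambda d: d[1], reverse=True)
    kept = []
    for box, score in candidates:
        if all(iou(box, kept_box) <= nms_threshold for kept_box, _ in kept):
            kept.append((box, score))
    return kept


detections = [
    ((10, 10, 110, 210), 0.90),   # player A
    ((15, 12, 115, 212), 0.75),   # near-duplicate box for player A
    ((300, 40, 380, 220), 0.60),  # player B
    ((500, 50, 560, 200), 0.05),  # low-confidence noise
]
result = filter_detections(detections, score_threshold=0.1, nms_threshold=0.5)
for box, score in result:
    print(box, score)
```

Raising the confidence threshold removes more uncertain boxes, while lowering the overlap threshold merges boxes more aggressively, which can hide players who stand close together.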


Summary and Extension

Compared with yolov2, yolov3 strengthens the network, detects on three scales, and introduces more anchors. It is somewhat slower, but it is more accurate and detects small targets better.

yolov3 can be widely applied to object detection in images, such as detecting whether people are wearing masks or whether safety helmets are worn in special areas.

In conclusion, AI technology powered by deep learning algorithms has penetrated all aspects of life, reducing manual intervention, improving personnel safety, and bringing great convenience to daily life. Interested readers should give it a try as soon as possible.

Tags: AI

Posted by dormouse1976 on Tue, 23 Aug 2022 02:20:48 +0530