Beginner's Guide to Deep Learning: Use NumPy to Store Your Image Dataset and Prepare It for the Network


Use NumPy to store your image dataset as np.array objects and write them to npy files, producing a dataset that can be fed directly to the network.
This tutorial is from 515Turtledove Lab.

A large part of what convolutional neural networks are used for is image classification and recognition. For the dataset we crawled last time, in
Beginner's Guide to Deep Learning (I): taking League of Legends as an example, crawl the pictures you are interested in and build your own dataset,
I came up with a good way to apply classification. According to their background stories, the heroes come from different regions, so we can classify the heroes by region:

In this way, we have built a dataset suited to image classification. The next task would normally be to write the network structure, so why do we need an extra step that produces npy files? An expert gave the answer in his blog:
"Why package data as npy before model training? What is the difference between an npy file and a normal file format?" Here is a screenshot from Ethan's blog:

In short, processing is faster and the data format is more uniform. We will come back to larger datasets in the future.
Next, we create a txt file in the following format:

Because our classification results cannot be expressed in Chinese, we use numbers to represent the corresponding regions.
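As a rough illustration of this numeric encoding, the sketch below maps region names to integer indices and writes one `index name` pair per line. The three region names, the `region_to_label` dictionary, the `regions.txt` file name, and the line layout are all assumptions made for illustration; the actual txt file used in this tutorial may look different.

```python
import os
import tempfile

# Hypothetical region names; the tutorial itself uses 14 regions.
regions = ["Demacia", "Noxus", "Ionia"]

# Assign each region a consecutive integer label starting from 0.
region_to_label = {name: idx for idx, name in enumerate(regions)}

# Write one "index name" pair per line (assumed layout).
txt_path = os.path.join(tempfile.mkdtemp(), "regions.txt")
with open(txt_path, "w", encoding="utf-8") as f:
    for name, idx in region_to_label.items():
        f.write(f"{idx} {name}\n")

# Read the file back into a label -> region lookup.
label_to_region = {}
with open(txt_path, encoding="utf-8") as f:
    for line in f:
        idx, name = line.split(maxsplit=1)
        label_to_region[int(idx)] = name.strip()

print(label_to_region)
```

Keeping the lookup in a file makes it easy to turn a predicted number back into a region name later.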

I created a data_heros folder in the directory where the code is located to serve as the root directory for the pictures, and copied in the data folders we made last time:

Rename each folder according to the txt file:

Next, we process the hero images: rename each image as shown below, and save its region as a label in the hero_list.txt file:

The hero's name doesn't matter. During training we only need to know which region the hero comes from. The number before the underscore in the file name represents the region. The renamed files have the following format:

#Picture format: <region number>_<picture index>.jpg
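Because the region is encoded in the file name, the label can be recovered from the name alone. A minimal sketch, assuming the names follow the region-underscore-index pattern that the renaming code in this tutorial produces; `label_from_name` is a hypothetical helper and is not part of the original script.

```python
def label_from_name(file_name):
    """Return the integer region label encoded before the first underscore."""
    region_part = file_name.split("_", 1)[0]
    return int(region_part)


# Made-up file names in the assumed format.
for name in ["0_0.jpg", "3_12.jpg", "13_7.jpg"]:
    print(name, "->", label_from_name(name))
```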

The renaming is implemented as follows:

import os
import shutil

data_base = "./data_heros/"  # The folder we just created to store the pictures

# get_path returns the full path of every entry directly under pic_dir
def get_path(pic_dir):
    pic_path = []
    for path in os.listdir(pic_dir):
        real_path = os.path.join(pic_dir, path)
        pic_path.append(real_path)
    return pic_path

def rename():
    a = get_path(data_base)  # Get the paths of all region folders under data_base
    # Sort the folders numerically, because the default string order is 1, 10, 11, 12, 13, 14 instead of 1, 2, 3, 4
    a.sort(key=lambda x: int(x.split("data_heros")[1][1:]))
    for i in range(14):  # 14 folders
        count = 0
        for content in get_path(a[i]):  # Find the picture paths in each folder
            # Move each picture into the root picture directory and rename it
            shutil.move(content, data_base + str(i) + '_' + str(count) + '.jpg')
            # Append the picture name and its region to hero_list.txt, which will later serve as the dataset labels
            with open('hero_list.txt', 'a+') as f:
                f.write(str(i) + '_' + str(count) + '.jpg' + ' ' + str(i) + "\n")
            count += 1

if __name__ == "__main__":
    rename()
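The sort key in rename() is needed because folder names are strings, and strings are compared character by character. A small sketch of the difference, using made-up folder names "1" through "14":

```python
# Hypothetical folder names "1" .. "14".
folders = [str(n) for n in range(1, 15)]

lexicographic = sorted(folders)      # compares the strings character by character
numeric = sorted(folders, key=int)   # compares the integer values instead

print(lexicographic)
print(numeric)
```

With the plain string sort, "10" lands right after "1", so the loop index would no longer line up with the folder number.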


The second function in this file is not well written; it violates the scoping principle, ha ha.
Running the renaming program gives the following results:

Finally, with pictures and labels in hand, we can create the npy dataset.
The implementation is as follows. It is mainly adapted from the course code of Mr. Caojian, from which I have benefited a lot. For reading pictures I prefer opencv, so I made some small modifications:

import numpy as np
import cv2

# Raw strings keep the backslashes in the Windows paths from being read as escape sequences
x_path = r"E:\pys\loldiqu\data_heros/"  # The picture directory we just saved
y_path = r"E:\pys\loldiqu\hero_list.txt"  # The label file we just saved
x_savepath = r"E:\pys\loldiqu/x_save.npy"  # Where the npy picture set is stored
y_savepath = r"E:\pys\loldiqu/y_save.npy"  # Where the npy labels are stored

def generateds(path, txt):
    with open(txt, 'r') as f:  # Open the label txt
        contents = f.readlines()  # Read line by line into the contents list
    x, y_ = [], []
    for content in contents:
        value = content.split()  # Split on whitespace: value[0] is the file name, value[1] is the label
        img_path = path + value[0]
        img = cv2.imread(img_path, cv2.IMREAD_GRAYSCALE)  # Read the picture as grayscale with opencv
        img = cv2.resize(img, (192, 108))  # Shrink to 192x108 (width x height); the original HD image is too expensive to compute on
        img = img / 255.  # Normalize pixel values from 0-255 to 0-1
        x.append(img)  # Picture data goes into the x list
        y_.append(value[1])  # Label data goes into the y_ list
        print('loading : ' + content)

    x = np.array(x)  # Convert to np.array
    y_ = np.array(y_)  # Convert to np.array
    y_ = y_.astype(np.int64)
    return x, y_

print('-------------Generate Datasets-----------------')
x_save, y_save = generateds(x_path, y_path)

print('-------------Save Datasets-----------------')
x_save = np.reshape(x_save, (len(x_save), -1))  # Flatten each picture into one row
np.save(x_savepath, x_save)
np.save(y_savepath, y_save)
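Once the npy files are written, they can be read back with np.load and reshaped to image dimensions before being passed to a network. A minimal sketch using random data in place of the real pictures; the temporary directory, the sample count of 5, and the trailing channel axis are assumptions for illustration. The 108 x 192 shape follows from the resize step above, since cv2.resize takes its size as (width, height).

```python
import os
import tempfile

import numpy as np

# Stand-in for the real dataset: 5 fake grayscale images, 108 rows x 192 columns.
rng = np.random.default_rng(0)
x_fake = rng.random((5, 108, 192))
y_fake = np.array([0, 1, 2, 3, 4], dtype=np.int64)

# Flatten each image into one row and save, as in the flatten-then-save step above.
save_dir = tempfile.mkdtemp()
x_file = os.path.join(save_dir, "x_save.npy")
y_file = os.path.join(save_dir, "y_save.npy")
np.save(x_file, x_fake.reshape(len(x_fake), -1))
np.save(y_file, y_fake)

# Load the arrays and restore the image shape, adding a channel axis (assumed).
x_loaded = np.load(x_file)
y_loaded = np.load(y_file)
x_images = x_loaded.reshape(len(x_loaded), 108, 192, 1)

print(x_loaded.shape, x_images.shape, y_loaded.dtype)
```

The flattened rows carry no shape information of their own, so the loading side has to know the image height and width used when the dataset was built.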

Running the dataset-generation program gives the following results:

Well, everything is ready. The next step is to feed the data to the network and train your own League of Legends hero-region classification network! See you next time!

Tags: Deep Learning

Posted by markbeadle on Tue, 31 May 2022 21:39:57 +0530