Summary:
Use NumPy to store your image data set as np.array objects and write them to .npy files, producing a data set that can be fed directly to the network.
This tutorial is from 515Turtledove Lab.
In convolutional neural networks, a large share of applications is image classification and recognition. For the data set we crawled last time (see "Introduction to Deep Learning, Beginner Tutorial (I): Taking League of Legends as an Example, Crawl the Pictures You Are Interested In and Build Your Own Data Set"),
I came up with a good way to apply classification. According to their background stories, heroes come from different regions, so we can classify heroes by region.
In this way, we have a data set suited to image classification. The next task would normally be to write the network structure, so why add another step to produce npy files? An expert gave the answer in his blog:
The post is titled "Why package data as npy files before model training, and how does an npy file differ from a normal file format?" and comes from Ethan's blog. Ethan's answer:
Processing is faster and the data format is more uniform. Handling big data will be covered in a future tutorial.
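To make that claim concrete, here is a minimal sketch of a round trip through an .npy file. The array contents, the `demo.npy` file name and the temporary directory are made up for illustration; only `np.save` and `np.load` come from NumPy itself. The point is that the array comes back with the same shape and dtype, with no image decoding or text parsing in between.

```python
import os
import tempfile

import numpy as np

# Hypothetical stand-in for a small batch of grayscale images:
# 4 images, 108 rows by 192 columns, values already scaled to [0, 1].
images = np.random.rand(4, 108, 192)

with tempfile.TemporaryDirectory() as tmp_dir:
    npy_path = os.path.join(tmp_dir, "demo.npy")  # illustrative file name
    np.save(npy_path, images)      # write the array in NumPy's binary format
    restored = np.load(npy_path)   # read it back in one call

# Shape, dtype and values survive the round trip unchanged.
print(restored.shape, restored.dtype)
print(np.array_equal(images, restored))
```

Loading thousands of separate .jpg files would instead require decoding and resizing each picture on every training run.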
We create a txt file in the following format:
Because our classification labels cannot be expressed in Chinese, we use numbers to represent the corresponding regions.
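The exact layout of the txt file is not reproduced here, so the following is only a sketch under an assumed layout of one `number name` pair per line. The region names in `sample_text` and the helper name `parse_region_map` are placeholders for illustration, not the author's actual file.

```python
def parse_region_map(text):
    """Parse lines of the assumed form '<number> <region name>' into a dict."""
    region_map = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        number, name = line.split(maxsplit=1)
        region_map[int(number)] = name
    return region_map


# Placeholder content: the real numbering is whatever your own txt file says.
sample_text = """0 Demacia
1 Noxus
2 Ionia
"""

region_map = parse_region_map(sample_text)
print(region_map[1])
```

Keeping this mapping in one place makes it easy to turn a predicted number back into a region name after training.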
I created a data_heros folder in the folder where the code is located to serve as the root directory for the pictures, and copied in the data folders we made last time:
Rename each folder according to the txt file:
Next, we process the hero images: rename each image as shown below and save its region as a label in the hero_list.txt file:
The hero's name doesn't matter; during training we only need to know which region the hero comes from. The number before the underscore in the file name represents the region. The renamed files have the following format:
#Picture format: region_index.jpg
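As a small illustration of this naming convention, the sketch below recovers the region number from a file name. The helper name `region_from_filename` and the example file names are hypothetical; the only thing taken from the tutorial is that the number before the underscore is the region.

```python
import os


def region_from_filename(file_name):
    """Return the integer region label encoded before the first underscore."""
    stem = os.path.splitext(os.path.basename(file_name))[0]  # "3_12.jpg" -> "3_12"
    region, _index = stem.split("_", 1)
    return int(region)


print(region_from_filename("3_12.jpg"))
print(region_from_filename("./data_heros/13_0.jpg"))
```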
The implementation code is as follows:
import os
import shutil

data_base = "./data_heros/"  # The folder we just created to store the pictures


def get_path(pic_dir):
    """Return the paths of all entries directly under pic_dir."""
    pic_path = []
    for path in os.listdir(pic_dir):
        real_path = os.path.join(pic_dir, path)
        pic_path.append(real_path)
    return pic_path


def rename():
    a = get_path(data_base)  # Paths of all region folders under data_base
    # Sort numerically: the default string order is 1, 10, 11, 12, 13, 14
    # instead of 1, 2, 3, 4
    a.sort(key=lambda x: int(x.split("data_heros")[1][1:]))
    print(a)
    for i in range(14):  # 14 folders
        count = 0
        for content in get_path(a[i]):  # Each picture path in the folder
            # Move the picture to the picture root directory and rename it
            shutil.move(content, data_base + str(i) + '_' + str(count) + '.jpg')
            # Append the picture name and its region to hero_list.txt,
            # which later serves as the label part of the data set
            with open('hero_list.txt', 'a+') as f:
                f.write(str(i) + '_' + str(count) + '.jpg' + ' ' + str(i) + "\n")
            count += 1


rename()
The second function in this file is not well written; it violates scoping principles, ha ha.
The program's output is as follows:
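A quick way to confirm the result yourself is to check that every line of hero_list.txt has the form `<region>_<count>.jpg <region>`. The sketch below does that on a few illustrative lines; the helper name `check_label_lines` and the sample data are assumptions, and in practice you would pass in the lines read from your own hero_list.txt.

```python
def check_label_lines(lines):
    """Return the lines that do not match '<region>_<count>.jpg <region>'."""
    bad = []
    for line in lines:
        parts = line.split()
        if len(parts) != 2:
            bad.append(line)
            continue
        file_name, label = parts
        prefix = file_name.split("_", 1)[0]
        if not file_name.endswith(".jpg") or prefix != label or not label.isdigit():
            bad.append(line)
    return bad


# Illustrative lines; the last one is deliberately inconsistent.
sample = ["0_0.jpg 0", "0_1.jpg 0", "13_4.jpg 13", "2_7.jpg 3"]
print(check_label_lines(sample))
```

An empty list means every file name prefix agrees with its label.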
Finally, with pictures and labels, we can create npy datasets:
The implementation code is as follows. It is adapted mainly from the course code of Mr. Caojian, from which I have benefited a lot. For reading pictures I prefer OpenCV, so I made small modifications.
import numpy as np
import cv2

# Raw strings keep the Windows backslashes from being read as escapes
x_path = r"E:\pys\loldiqu\data_heros/"  # The picture folder we just saved
y_path = r'E:\pys\loldiqu\hero_list.txt'  # The label file we just saved
x_savepath = r"E:\pys\loldiqu/x_save.npy"  # Where the npy picture set is stored
y_savepath = r'E:\pys\loldiqu/y_save.npy'  # Where the npy label set is stored


def generateds(path, txt):
    with open(txt, 'r') as f:  # Open the label txt
        contents = f.readlines()  # Read line by line into the contents list
    x, y_ = [], []
    for content in contents:
        value = content.split()  # Split on whitespace: [file name, label]
        img_path = path + value[0]
        # Use OpenCV to read the picture in grayscale
        img = cv2.imread(img_path, cv2.IMREAD_GRAYSCALE)
        # Shrink the picture; the original HD size is too expensive to compute.
        # cv2.resize takes (width, height), so each array is 108 rows by 192 columns
        img = cv2.resize(img, (192, 108))
        img = img / 255.  # Normalize pixel values from 0-255 to 0-1
        x.append(img)  # Picture data goes into the x list
        y_.append(value[1])  # Label goes into the y_ list
        print('loading : ' + content)
    x = np.array(x)  # Convert to np.array
    y_ = np.array(y_)  # Convert to np.array
    y_ = y_.astype(np.int64)
    return x, y_


print('-------------Generate Datasets-----------------')
x_save, y_save = generateds(x_path, y_path)
print('-------------Save Datasets-----------------')
x_save = np.reshape(x_save, (len(x_save), -1))  # Flatten each picture to one row
np.save(x_savepath, x_save)
np.save(y_savepath, y_save)
The program's output is as follows:
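For the next step it helps to know what the saved files look like when loaded back. The script above flattens each 108 by 192 picture into a row of 20736 values, so a convolutional network will usually need the rows reshaped into images again. The sketch below uses random stand-in data and a temporary directory instead of the real x_save.npy and y_save.npy; the `(N, 108, 192, 1)` channel layout is an assumption about what the network expects.

```python
import os
import tempfile

import numpy as np

num_images, height, width = 5, 108, 192

# Stand-in data with the same layout the script above produces:
# one flattened grayscale image per row, and one int64 label per image.
x_fake = np.random.rand(num_images, height * width)
y_fake = np.arange(num_images, dtype=np.int64)

with tempfile.TemporaryDirectory() as tmp_dir:
    x_file = os.path.join(tmp_dir, "x_save.npy")
    y_file = os.path.join(tmp_dir, "y_save.npy")
    np.save(x_file, x_fake)
    np.save(y_file, y_fake)

    x_train = np.load(x_file)
    y_train = np.load(y_file)

# Restore the image shape and add a channel axis (assumed layout: N, H, W, 1).
x_train = x_train.reshape(len(x_train), height, width, 1)
print(x_train.shape, y_train.shape, y_train.dtype)
```

If your framework expects channels first, reshape to `(N, 1, 108, 192)` instead.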
Well, everything is ready. The next step is to feed the data into the network and train your own League of Legends hero-region classification network! See you next time!