1 Near-infrared spectroscopy
Near-infrared (NIR) radiation is electromagnetic radiation lying between visible light (Vis) and the mid-infrared (MIR). The American Society for Testing and Materials (ASTM) defines the near-infrared region as 780-2526 nm; it was the first non-visible region to be discovered in the absorption spectrum. Absorption in this region corresponds to the combination bands and the overtones (of all orders) of the vibrations of hydrogen-containing groups (O-H, N-H, C-H) in organic molecules. Scanning the near-infrared spectrum of a sample therefore yields characteristic information about the hydrogen-containing groups in its organic molecules. Near-infrared spectroscopy (NIRS) is also convenient, fast, efficient, accurate, and low-cost; it does not damage the sample, consumes no chemical reagents, and does not pollute the environment, so it is being adopted more and more widely.
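NIR spectra are often reported in wavenumbers (cm^-1) rather than wavelengths (nm). As a small illustration, the sketch below converts the ASTM limits quoted above; the function and variable names are made up for this example.

```python
def wavelength_nm_to_wavenumber_cm1(wavelength_nm: float) -> float:
    """Convert a wavelength in nanometres to a wavenumber in cm^-1."""
    # 1 cm = 1e7 nm, so wavenumber [cm^-1] = 1e7 / wavelength [nm]
    return 1e7 / wavelength_nm


# ASTM limits of the near-infrared region, in nm
nir_short_nm, nir_long_nm = 780.0, 2526.0

# The shortest wavelength gives the highest wavenumber, and vice versa
nir_high_cm1 = wavelength_nm_to_wavenumber_cm1(nir_short_nm)
nir_low_cm1 = wavelength_nm_to_wavenumber_cm1(nir_long_nm)

print(f"{nir_short_nm:.0f} nm -> {nir_high_cm1:.0f} cm^-1")
print(f"{nir_long_nm:.0f} nm -> {nir_low_cm1:.0f} cm^-1")
```

The 780-2526 nm range therefore spans roughly 12820 down to 3959 cm^-1.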
2 SVM algorithm
A support vector machine (SVM) is a binary classification model. Its basic form is the linear classifier with the maximum margin in feature space; the maximum margin is what distinguishes it from the perceptron. SVMs can also use the kernel trick, which effectively makes them non-linear classifiers. The learning strategy of an SVM is margin maximization, which can be formalized as a convex quadratic programming problem and is equivalent to minimizing a regularized hinge loss. The SVM learning algorithm is therefore an optimization algorithm for solving a convex quadratic program.
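To make the regularized hinge loss concrete, here is a minimal NumPy sketch for a linear SVM. It uses one common form of the objective (mean hinge loss plus an L2 penalty weighted by `lam`); the function name, the toy data, and the choice of `lam` are assumptions made for illustration, not part of the experiment described later.

```python
import numpy as np


def regularized_hinge_loss(w, b, X, y, lam):
    """Mean hinge loss plus an L2 penalty on w; labels y must be -1 or +1."""
    margins = y * (X @ w + b)                # functional margin of each sample
    hinge = np.maximum(0.0, 1.0 - margins)   # zero once the margin reaches 1
    return hinge.mean() + lam * np.dot(w, w)


# Toy data: three 2-D points with labels in {-1, +1}
X_toy = np.array([[2.0, 0.0], [0.5, 0.0], [-2.0, 0.0]])
y_toy = np.array([1, 1, -1])
w_toy = np.array([1.0, 0.0])
b_toy = 0.0

# Only the second point lies inside the margin (margin 0.5, hinge 0.5)
loss_toy = regularized_hinge_loss(w_toy, b_toy, X_toy, y_toy, lam=0.1)
print(loss_toy)
```

Points that are classified correctly with a margin of at least 1 contribute nothing to the loss, so only samples on or inside the margin influence the solution.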
3 Algorithm implementation
The experimental data comes from: nirpy
The milk powder dataset contains 11 classes. The milk powder content decreases in steps of 10% from one class to the next, from 100% milk powder down to 0% milk powder (i.e. 100% coconut milk powder).
    # Import packages
    import pandas as pd
    import numpy as np
    import matplotlib.pyplot as plt
    from operator import truediv
    from sklearn.model_selection import train_test_split
    from sklearn import metrics
    from sklearn.metrics import cohen_kappa_score
    # Import data
    data = pd.read_csv('/milk-powder.csv')
    y = data.iloc[:, 1].values.astype('uint8')  # class labels (second column)
    X = data.iloc[:, 2:].values  # spectral values (remaining columns)
    # SVM training
    # Hold out 30% of the data for testing; train on the remaining 70%
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=100)
    from sklearn.svm import SVC
    # RBF-kernel SVM with penalty parameter C=1000
    classifier = SVC(C=1000, kernel='rbf', random_state=0)
    classifier.fit(X_train, y_train)
    # Predict the test set
    y_pred = classifier.predict(X_test)
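Since an SVM is a binary classifier and the dataset has 11 classes, it may help to see how scikit-learn's `SVC` bridges the gap: it trains one binary SVM per pair of classes (one-vs-one). The sketch below shows this on synthetic Gaussian clusters standing in for the spectra; the data, the `_demo` names, and the cluster sizes are assumptions for illustration only.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n_classes, n_per_class, n_features = 11, 20, 5

# Synthetic stand-in for the spectra: one Gaussian cluster per class
centers = rng.normal(scale=10.0, size=(n_classes, n_features))
X_demo = np.vstack([c + rng.normal(size=(n_per_class, n_features)) for c in centers])
y_demo = np.repeat(np.arange(1, n_classes + 1), n_per_class)

# decision_function_shape='ovo' exposes the raw pairwise decision values
ovo_clf = SVC(C=1000, kernel='rbf', decision_function_shape='ovo')
ovo_clf.fit(X_demo, y_demo)

scores = ovo_clf.decision_function(X_demo[:3])
n_pairs = n_classes * (n_classes - 1) // 2
print(scores.shape, n_pairs)  # one column per pair of classes
```

With 11 classes there are 11 * 10 / 2 = 55 pairwise classifiers, and the predicted class is obtained by voting among them.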
    # Draw the confusion matrix
    from pretty_confusion_matrix import pp_matrix
    from sklearn.metrics import confusion_matrix
    label = ['1', '2', '3', '4', '5', '6', '7', '8', '9', '10', '11']
    cm = confusion_matrix(y_test, y_pred)
    # Use `label` directly: np.unique(label) would sort the strings as '1', '10', '11', '2', ...
    cm_df = pd.DataFrame(cm, index=label, columns=label)
    cm_df.index.name = 'Actual'
    cm_df.columns.name = 'Predicted'
    fig, ax = plt.subplots(figsize=(15, 10))
    plt.rcParams.update({'font.size': 12})
    cmap = "Greens_r"
    pp_matrix(cm_df, cmap=cmap)
    # Evaluation metrics, computed from the raw confusion-matrix array
    list_diag = np.diag(cm)  # correctly classified samples per class
    list_raw_sum = np.sum(cm, axis=1)  # actual samples per class
    each_acc = np.nan_to_num(truediv(list_diag, list_raw_sum))  # per-class accuracy
    average_acc = np.mean(each_acc)
    kappa = metrics.cohen_kappa_score(y_test, y_pred)
    overall_acc = metrics.accuracy_score(y_test, y_pred)
4 Results
The average accuracy (the mean of the per-class accuracies), the kappa coefficient, and the overall accuracy are used to measure the classification performance of the SVM on the test set.
average_acc = 0.9772727272727273
kappa = 0.9831675592960979
overall_acc = 0.9848484848484849
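To show how the three metrics differ, here is a small worked example on a made-up three-class problem (the `_demo` labels below are invented for illustration and are unrelated to the milk powder data). Kappa is also computed by hand from the confusion matrix to show how it corrects the overall accuracy for chance agreement.

```python
import numpy as np
from sklearn.metrics import accuracy_score, cohen_kappa_score, confusion_matrix

# Made-up labels: one sample of class 1 is misclassified as class 2
y_true_demo = np.array([1, 1, 1, 1, 2, 2, 3, 3, 3, 3])
y_pred_demo = np.array([1, 1, 1, 2, 2, 2, 3, 3, 3, 3])

cm_demo = confusion_matrix(y_true_demo, y_pred_demo)

# Average accuracy: mean of the per-class accuracies (diagonal / row sums)
per_class_acc = np.diag(cm_demo) / cm_demo.sum(axis=1)
average_acc_demo = per_class_acc.mean()

# Overall accuracy: fraction of all samples classified correctly
overall_acc_demo = accuracy_score(y_true_demo, y_pred_demo)

# Kappa: (p_o - p_e) / (1 - p_e), where p_e is the agreement expected by chance
n = cm_demo.sum()
p_o = np.trace(cm_demo) / n
p_e = (cm_demo.sum(axis=1) * cm_demo.sum(axis=0)).sum() / n**2
kappa_manual = (p_o - p_e) / (1 - p_e)
kappa_demo = cohen_kappa_score(y_true_demo, y_pred_demo)

print(average_acc_demo, overall_acc_demo, kappa_demo)
```

Here the single error falls in a four-sample class, so the average accuracy (11/12) and the overall accuracy (9/10) differ, and kappa (about 0.85) is lower than both because part of the agreement is attributed to chance.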