Classification of near-infrared spectra based on SVM

1 Near-infrared spectroscopy

Near-infrared (NIR) radiation is electromagnetic radiation lying between visible light (Vis) and the mid-infrared (MIR). The American Society for Testing and Materials (ASTM) defines the near-infrared region as 780–2526 nm; it was the first non-visible region discovered in the absorption spectrum. The NIR region coincides with the combination bands and the overtones at all orders of the vibrations of hydrogen-containing groups (O-H, N-H, C-H) in organic molecules, so scanning the near-infrared spectrum of a sample yields characteristic information about these groups. In addition, NIR spectroscopy is convenient, rapid, efficient, accurate, and low-cost; it does not damage the sample, consumes no chemical reagents, and does not pollute the environment. For these reasons, the technique has attracted growing interest.

2 SVM algorithm

A support vector machine (SVM) is a binary classification model. Its basic form is the linear classifier with the maximum margin in feature space, which distinguishes it from the perceptron; combined with the kernel trick, the SVM becomes, in effect, a non-linear classifier. Its learning strategy is margin maximization, which can be formalized as a convex quadratic programming problem, equivalent to minimizing a regularized hinge loss. The learning algorithm of the SVM is therefore an optimization algorithm for convex quadratic programming.
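The two ideas above — the maximal-margin linear classifier and the kernel trick — can be seen side by side on a toy dataset. The example below is illustrative only (concentric circles, not the milk-powder spectra): a linear SVM cannot separate the two rings, while an RBF-kernel SVM can.

```python
import numpy as np
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Two concentric rings: not linearly separable in the input space
X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

linear_svm = SVC(kernel='linear', C=1.0).fit(X, y)  # maximal-margin hyperplane
rbf_svm = SVC(kernel='rbf', C=1.0).fit(X, y)        # kernel trick -> non-linear boundary

print('linear training accuracy:', linear_svm.score(X, y))
print('rbf training accuracy:   ', rbf_svm.score(X, y))
```

The linear model stays near chance level on this data, while the RBF model separates the rings almost perfectly; the same mechanism is what lets the RBF-kernel SVM handle the spectra below.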

3 Algorithm implementation

The experimental data come from: nirpy
The milk powder dataset contains 11 classes; from one class to the next the milk powder content decreases in 10% steps, from 100% milk powder down to 0% milk powder (i.e. 100% coconut milk powder).
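Before training, it can help to eyeball the spectra grouped by class. The sketch below uses synthetic stand-in spectra so it runs without the CSV file; the layout it mimics (one label per sample, one absorbance value per wavelength) matches the indexing used in the loading code (`iloc[:, 1]` for labels, `iloc[:, 2:]` for spectral values). The Gaussian-bump spectra are purely illustrative.

```python
import numpy as np
import matplotlib
matplotlib.use('Agg')  # headless backend, safe for scripts without a display
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
wavelengths = np.linspace(780, 2526, 300)   # nm, the ASTM NIR range
labels = np.repeat(np.arange(1, 12), 4)     # 11 classes, 4 synthetic samples each
# One noisy Gaussian band per sample, shifted slightly with the class index
spectra = np.array([np.exp(-((wavelengths - 1400 - 40 * c) / 300) ** 2)
                    + 0.02 * rng.standard_normal(wavelengths.size)
                    for c in labels])

fig, ax = plt.subplots()
for spectrum, c in zip(spectra, labels):
    ax.plot(wavelengths, spectrum, color=plt.cm.viridis(c / 11), lw=0.7)
ax.set_xlabel('Wavelength (nm)')
ax.set_ylabel('Absorbance (a.u.)')
fig.savefig('spectra.png')
```

With the real data, systematic class-to-class shifts in the curves are what the SVM will exploit.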

# Import packages
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from operator import truediv
from sklearn.model_selection import train_test_split
from sklearn import metrics
# Import data: column 1 holds the class label, columns 2+ the spectral values
data = pd.read_csv('/milk-powder.csv')
y = data.iloc[:, 1].values.astype('uint8')  # labels
X = data.iloc[:, 2:].values                 # spectral values
# SVM training
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=100)  # 30% of the data for testing, 70% for training
from sklearn.svm import SVC
classifier = SVC(C=1000, kernel='rbf', random_state=0)  # RBF kernel, penalty parameter C
classifier.fit(X_train, y_train)
# Predict on the test set
y_pred = classifier.predict(X_test)
# Draw the confusion matrix
from pretty_confusion_matrix import pp_matrix
from sklearn.metrics import confusion_matrix
label = ['1', '2', '3', '4', '5', '6', '7', '8', '9', '10', '11']
cm = confusion_matrix(y_test, y_pred)
# Keep the label order as given: np.unique would sort the strings
# lexicographically ('1', '10', '11', '2', ...) and scramble the axes
cm = pd.DataFrame(cm, index=label, columns=label)
cm.index.name = 'Actual'
cm.columns.name = 'Predicted'
plt.rcParams.update({'font.size': 12})
pp_matrix(cm, cmap='Greens_r', figsize=[15, 10])
# Evaluate the predictions
list_diag = np.diag(cm)            # correctly classified samples per class
list_raw_sum = np.sum(cm, axis=1)  # total samples per class (row sums)
each_acc = np.nan_to_num(truediv(list_diag, list_raw_sum))  # per-class accuracy
average_acc = np.mean(each_acc)
kappa = metrics.cohen_kappa_score(y_test, y_pred)
overall_acc = metrics.accuracy_score(y_test, y_pred)
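The training step above fixes C=1000 and the RBF kernel. Those values are taken as given here; in practice they are usually tuned with a cross-validated grid search. The sketch below shows the pattern using scikit-learn's built-in wine dataset as a stand-in (so it runs without the milk-powder file); the grid values are illustrative assumptions, not recommendations.

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

# Stand-in classification data with the same train/test split style as above
X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=100)

# Illustrative grid over the penalty C and the RBF width gamma
param_grid = {'C': [1, 10, 100, 1000], 'gamma': ['scale', 1e-3, 1e-4]}
search = GridSearchCV(SVC(kernel='rbf'), param_grid, cv=5)
search.fit(X_train, y_train)

print('best params:', search.best_params_)
print('test accuracy:', search.score(X_test, y_test))
```

`search.best_estimator_` is then refit on the full training set and can replace the hand-set classifier.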

4 Results

The average accuracy, the kappa coefficient, and the overall accuracy are used to measure the classification performance of the SVM.
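The three metrics can be checked against their definitions on a small hand-made example: per-class accuracy is the confusion-matrix diagonal divided by the row sums, average accuracy is its mean, overall accuracy is the trace over the total count, and Cohen's kappa corrects overall accuracy for the agreement expected by chance.

```python
import numpy as np
from sklearn.metrics import accuracy_score, cohen_kappa_score, confusion_matrix

# Tiny 3-class example with two mistakes
y_true = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2])
y_pred = np.array([0, 0, 1, 1, 1, 1, 2, 2, 0])

cm = confusion_matrix(y_true, y_pred)

each_acc = np.diag(cm) / cm.sum(axis=1)       # per-class accuracy
average_acc = each_acc.mean()
overall_acc = np.trace(cm) / cm.sum()

# Chance agreement p_e from the row and column marginals
p_e = (cm.sum(axis=1) * cm.sum(axis=0)).sum() / cm.sum() ** 2
kappa = (overall_acc - p_e) / (1 - p_e)

# The hand-computed values match scikit-learn's implementations
assert np.isclose(overall_acc, accuracy_score(y_true, y_pred))
assert np.isclose(kappa, cohen_kappa_score(y_true, y_pred))
```

Because kappa subtracts chance agreement, it is typically a little below the overall accuracy, as in the results below.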

average_acc = 0.9772727272727273
kappa = 0.9831675592960979
overall_acc = 0.9848484848484849

Tags: Python Machine Learning

Posted by Marc on Thu, 02 Jun 2022 00:09:22 +0530