1. Understanding of Gradient Descent Method
1. Gradient: In calculus, take the partial derivative ∂ of a multivariate function with respect to each of its parameters and write these partial derivatives together as a vector (a directed slope); this vector is the gradient, and it points in the direction of steepest ascent. For example, for f(x, y) = x² + 3y the gradient is ∇f = (2x, 3).
2. Gradient descent: an iterative optimization algorithm for solving (unconstrained) minimization problems, such as minimizing a model's loss function.
For example, suppose we are standing somewhere on a large mountain. Since we do not know the fastest way down, we take a step in the steepest downhill direction from where we stand, then recompute the gradient at the new position and again step where the descent is steepest. Going down step by step like this, we eventually feel we have reached the foot of the mountain. Of course, proceeding this way we may not reach the true foot of the mountain at all, but only some low point partway down (a local minimum).
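To make the idea concrete, here is a minimal one-dimensional sketch (not from the original text; the function f(x) = (x − 3)², the starting point, and the learning rate are arbitrary choices for illustration):

def f(x):            # hypothetical convex function with its minimum at x = 3
    return (x - 3) ** 2

def df(x):           # its derivative (the "gradient" in one dimension)
    return 2 * (x - 3)

x = 0.0              # arbitrary starting point
eta = 0.1            # arbitrary learning rate (step size)
for _ in range(100):
    x = x - eta * df(x)   # step against the derivative, i.e. downhill
print(x)             # approaches 3, the minimum of f

Each iteration moves x in the direction that decreases f most quickly, just like stepping downhill on the mountain.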
2. The concept of gradient descent
1. Hypothesis function
In supervised learning, in order to fit the input samples, we use a hypothesis function, denoted hθ(x). For example, for m samples (x(i), y(i)) (i = 1, 2, ..., m) with a single feature, the fitting function can be taken as hθ(x) = θ0 + θ1x.
2. Loss function: measures how far the fitted values are from the true values. For the linear regression above, the mean squared error J(θ0, θ1) = (1/2m) Σi (hθ(x(i)) − y(i))² is commonly used.
3. Learning rate (step size)
If the learning rate is too large, the iteration may overshoot the minimum and fail to converge.
If it is too small, convergence may be very slow (as illustrated by the sketch below).
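A rough illustration of these two failure modes on the simple function f(x) = x² (the specific learning rates 1.1, 0.01 and 0.3 are arbitrary, chosen only to show the behaviour):

# Effect of the learning rate on gradient descent for f(x) = x**2 (sketch)
def run(x, eta, n=20):
    for _ in range(n):
        x = x - eta * 2 * x   # the derivative of x**2 is 2x
    return x

print(run(1.0, 1.1))    # too large: the iterate oscillates and its magnitude grows (diverges)
print(run(1.0, 0.01))   # too small: after 20 steps it is still far from the minimum at 0
print(run(1.0, 0.3))    # moderate: converges close to 0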
3. Algorithm of Gradient Descent
1. Algebraic method
1.1 Function
Hypothesis function: hθ(x1, x2, ..., xn) = θ0 + θ1x1 + ... + θnxn, where the θi (i = 0, 1, ..., n) are the model parameters and the xi are the features (with x0 = 1 by convention).
Loss function: J(θ0, θ1, ..., θn) = (1/2m) Σi=1..m (hθ(x1(i), x2(i), ..., xn(i)) − y(i))².
1.2. Algorithm process
1.2.1 Determine the gradient of the loss function at the current position, i.e. compute ∂J(θ0, θ1, ..., θn)/∂θi for every θi.
1.2.2 Multiply the gradient by the step size to obtain the descent distance at the current position, i.e. α·∂J(θ0, θ1, ..., θn)/∂θi.
1.2.3 Check whether the descent distance of every θi is less than a small threshold ε; if so, the algorithm terminates and the current θ0, θ1, ..., θn are the result.
(As the iteration approaches the minimum, the gradient shrinks, so the descent distance decreases automatically; at the lowest point the derivative is zero and the update no longer moves θ.)
1.2.4 Otherwise, update all θi simultaneously: θi := θi − α·∂J(θ0, θ1, ..., θn)/∂θi, then return to step 1.2.1. A sketch of the full loop is given below.
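A per-parameter (algebraic) sketch of steps 1.2.1–1.2.4 for the linear regression loss above; the function and variable names are my own, and the vectorized version used later in this article is equivalent:

import numpy as np

def gd_algebraic(X, y, alpha=0.01, eps=1e-8, max_iter=10000):
    # X is the m x (n+1) sample matrix with a leading column of ones, y the m targets
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(max_iter):
        error = X.dot(theta) - y                                      # h_theta(x(i)) - y(i)
        grad = np.array([error.dot(X[:, j]) / m for j in range(n)])   # step 1.2.1: gradient
        step = alpha * grad                                           # step 1.2.2: descent distance
        if np.all(np.abs(step) < eps):                                # step 1.2.3: all distances below epsilon
            break
        theta = theta - step                                          # step 1.2.4: update all theta together
    return theta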
2. Matrix method
2.1 Function
Hypothesis function: hθ(X) = Xθ, where X is the m×(n+1) matrix of samples (including a leading column of ones) and θ is the (n+1)×1 vector of parameters.
Loss function: J(θ) = ½(Xθ − Y)ᵀ(Xθ − Y), where Y is the m×1 vector of sample outputs.
2.2 Update expression
θ := θ − α·Xᵀ(Xθ − Y)
(The derivation uses essentially the same matrix calculus as the least squares method.)
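In NumPy the matrix update above is a single line; a minimal sketch (my own names, with the 1/m factor folded in so the step size does not depend on the number of samples):

import numpy as np

def gd_matrix(X, Y, alpha=0.01, n_iters=10000):
    theta = np.zeros(X.shape[1])
    for _ in range(n_iters):
        theta = theta - alpha * X.T.dot(X.dot(theta) - Y) / len(X)   # theta := theta - alpha * X^T (X theta - Y) / m
    return theta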
4. Application (diabetes s1)
1. Preparation
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import warnings
warnings.filterwarnings('ignore')
plt.rcParams['font.sans-serif'] = ['SimHei']
plt.rcParams['axes.unicode_minus'] = False

# Import the diabetes dataset
from sklearn.datasets import load_diabetes
data_diabetes = load_diabetes()
data = data_diabetes['data']
target = data_diabetes['target']
feature_names = data_diabetes['feature_names']
df = pd.DataFrame(data, columns=feature_names)
# Split the dataset
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(data, target, test_size=0.2)

# Select one of the features: s1
from numpy import mat
x_train_df = pd.DataFrame(x_train, columns=feature_names)
s1 = x_train_df.iloc[:, 4]
s11 = mat(s1.values).T
s12 = np.hstack([np.ones((len(s11), 1)), s11.reshape(-1, 1)])

# Draw the scatter plot
plt.scatter(s1, y_train)
plt.show()
2. Handwritten code
# Gradient descent
# Loss function
def J(X, Y, theta):
    try:
        return np.sum((Y - X.dot(theta)) ** 2) / len(X)
    except:
        return float('inf')

# Derivative (gradient) of the loss function
def dJ(X, Y, theta):
    res = np.empty(len(theta))
    res[0] = np.sum(X.dot(theta) - Y)
    for i in range(1, len(theta)):
        res[i] = (X.dot(theta) - Y).dot(X[:, i])
    return res * 2 / len(X)

def gradient_descent(X, Y, initial_theta, eta, n_iters=1e4, epsilon=1e-8):
    theta = initial_theta
    cur_iter = 0
    while cur_iter < n_iters:
        gradient = dJ(X, Y, theta)
        last_theta = theta
        theta = theta - eta * gradient
        if abs(J(X, Y, theta) - J(X, Y, last_theta)) < epsilon:
            break
        cur_iter += 1
    return theta
# Bring in the data
eta = 0.01
itheta = np.zeros(s12.shape[1])
gradient_descent(s12, y_train, itheta, eta)

Output result: array([150.49064571, 106.10190172])
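To see what the handwritten result looks like, one can overlay the fitted line on the earlier scatter plot (a sketch reusing the variables defined above):

theta = gradient_descent(s12, y_train, itheta, eta)
plt.scatter(s1, y_train)
xs = np.sort(s1.values)
plt.plot(xs, theta[0] + theta[1] * xs, color='red')   # fitted line h(x) = theta0 + theta1 * x
plt.show()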
3. Comparison with sklearn
from sklearn.linear_model import SGDRegressor
from sklearn.preprocessing import StandardScaler

standardScaler = StandardScaler()
reg = SGDRegressor()
reg.fit(s12, y_train)
print(reg.score(s12, y_train))
print(reg.coef_)

Output result:
0.001314261902390479
[75.39860165  6.40174396]
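Note that the StandardScaler instantiated above is never actually applied, and SGDRegressor is sensitive to feature scale. One variation worth trying (my own sketch, not in the original) standardizes the feature column before fitting:

s1_std = standardScaler.fit_transform(s12[:, 1:])   # scale only the feature, not the bias column
reg_std = SGDRegressor()
reg_std.fit(s1_std, y_train)
print(reg_std.score(s1_std, y_train))
print(reg_std.intercept_, reg_std.coef_)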