# 1. Understanding the Gradient Descent Method

1. Gradient: in calculus, take the partial derivative of a multivariate function with respect to each of its parameters and collect these partial derivatives into a vector. This vector points in the direction of steepest ascent of the function (a worked example follows after this list).

2. Gradient descent: an iterative optimization algorithm for minimizing a function, for example the loss function of a model.

For example, imagine we are standing somewhere on a large mountain. Since we do not know the fastest way down, we compute the gradient at our current position and take a step in the steepest downhill direction, then recompute the gradient at the new position and take another step where the descent is steepest. Proceeding step by step like this, we continue until we feel we have reached the foot of the mountain. Of course, walking this way we may never reach the true foot of the mountain, only some low point partway down (a local minimum).
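To make this concrete, here is a small worked example on a toy function assumed purely for illustration (it is not taken from the original text). For $f(x, y) = x^2 + y^2$, the gradient collects the two partial derivatives:

$$\nabla f(x, y) = \left(\frac{\partial f}{\partial x},\ \frac{\partial f}{\partial y}\right) = (2x,\ 2y)$$

One gradient descent step with learning rate $\alpha$ moves against this direction:

$$(x, y) \leftarrow (x, y) - \alpha\,\nabla f(x, y) = \big((1 - 2\alpha)x,\ (1 - 2\alpha)y\big)$$

so that, for a suitably small $\alpha$, each step brings the point closer to the minimum at the origin, just like taking a step down the steepest slope of the mountain.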

# 2. The concept of gradient descent

1. Hypothesis function

In supervised learning, the function used to fit the input samples is called the hypothesis function, denoted $h_\theta(x)$. For example, for m samples $(x^{(i)}, y^{(i)})\ (i = 1, 2, \ldots, m)$ with a single feature, a linear fitting function can be used:

$$h_\theta(x) = \theta_0 + \theta_1 x$$

2. Loss function

The loss function measures how far the hypothesis function's predictions are from the true labels; in linear regression the mean squared error is typically used:

$$J(\theta_0, \theta_1) = \frac{1}{2m}\sum_{i=1}^{m}\big(h_\theta(x^{(i)}) - y^{(i)}\big)^2$$

3. Learning rate (step size)

The learning rate controls how far each iteration moves along the negative gradient direction.

If it is too large, the iteration may overshoot the minimum and fail to converge.

If it is too small, the iteration may converge very slowly.

A small sketch after this list illustrates both effects.
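A minimal sketch of the learning-rate trade-off, using a toy quadratic assumed only for illustration (not part of the original post):

```python
# Minimize f(x) = x**2, whose derivative is 2*x, with two different learning rates.
def descend(eta, x0=10.0, n_iters=50):
    x = x0
    for _ in range(n_iters):
        x = x - eta * 2 * x      # one gradient-descent step
    return x

print(descend(eta=0.1))    # small enough: x approaches the minimum at 0
print(descend(eta=1.1))    # too large: |x| grows each step, the iteration diverges
```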

# 3. Algorithm of Gradient Descent

## 1. Algebraic way

1.1 Functions

Hypothesis function (linear model with n features):

$$h_\theta(x_1, x_2, \ldots, x_n) = \theta_0 + \theta_1 x_1 + \cdots + \theta_n x_n$$

Loss function:

$$J(\theta_0, \theta_1, \ldots, \theta_n) = \frac{1}{2m}\sum_{i=1}^{m}\big(h_\theta(x_1^{(i)}, \ldots, x_n^{(i)}) - y^{(i)}\big)^2$$

1.2 Algorithm process

1.2.1 Determine the gradient of the loss function at the current position, namely $\frac{\partial}{\partial \theta_i} J(\theta_0, \theta_1, \ldots, \theta_n)$ for each parameter $\theta_i$.

1.2.2 Multiply the step size by the gradient to obtain the descent distance at the current position, i.e. $\alpha \cdot \frac{\partial}{\partial \theta_i} J(\theta_0, \theta_1, \ldots, \theta_n)$.

1.2.3 Check whether the descent distance of every $\theta_i$ is smaller than the termination tolerance $\varepsilon$; if so, stop and output the current values of $\theta_0, \theta_1, \ldots, \theta_n$.

(As the iteration proceeds, the descent distance shrinks on its own: near the lowest point the derivative approaches zero, so the update hardly moves $\theta$ away from its previous position.)

1.2.4 Otherwise, update all $\theta_i$ simultaneously with $\theta_i \leftarrow \theta_i - \alpha \cdot \frac{\partial}{\partial \theta_i} J(\theta_0, \theta_1, \ldots, \theta_n)$ and return to step 1.2.1 (a minimal code sketch of this component-wise procedure follows this list).
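A minimal sketch of the four steps above, assuming the $\frac{1}{2m}$ loss just defined; the function and variable names are illustrative only and not from the original post:

```python
import numpy as np

# Algebraic (component-wise) gradient descent for linear regression.
def gradient_descent_algebraic(X, y, alpha=0.1, epsilon=1e-8, n_iters=10000):
    m, n = X.shape                       # m samples, n parameters (X already holds a bias column of ones)
    theta = np.zeros(n)
    for _ in range(n_iters):
        errors = X.dot(theta) - y        # h_theta(x^(i)) - y^(i) for every sample
        grad = np.array([errors.dot(X[:, i]) / m for i in range(n)])  # dJ/d theta_i (step 1.2.1)
        step = alpha * grad              # descent distance for each theta_i (step 1.2.2)
        if np.all(np.abs(step) < epsilon):
            break                        # every descent distance is below the tolerance (step 1.2.3)
        theta = theta - step             # update all theta_i simultaneously (step 1.2.4)
    return theta
```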

## 2. Matrix method

2.1 Functions

Hypothesis function: $h_\theta(X) = X\theta$, where $X$ is the $m \times (n+1)$ sample matrix whose first column is all ones and $\theta$ is the $(n+1)$-dimensional parameter vector.

Loss function:

$$J(\theta) = \frac{1}{2}(X\theta - Y)^T(X\theta - Y)$$

2.2 Update expression

$$\theta \leftarrow \theta - \alpha X^T(X\theta - Y)$$

(The derivation of this gradient is exactly the same as in the least squares method.)
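A corresponding vectorized sketch of the matrix-form update (again an illustrative implementation with assumed names, not the code used later in the post):

```python
import numpy as np

# Matrix-form gradient descent: the whole parameter vector is updated at once.
def gradient_descent_matrix(X, Y, alpha=0.01, epsilon=1e-8, n_iters=10000):
    theta = np.zeros(X.shape[1])
    for _ in range(n_iters):
        grad = X.T.dot(X.dot(theta) - Y)    # gradient of J(theta) = 1/2 (X theta - Y)^T (X theta - Y)
        new_theta = theta - alpha * grad    # theta <- theta - alpha * X^T (X theta - Y)
        if np.max(np.abs(new_theta - theta)) < epsilon:
            return new_theta
        theta = new_theta
    return theta
```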

# 4. Application (the s1 feature of the diabetes dataset)

1. Preparation

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

import warnings
warnings.filterwarnings('ignore')

plt.rcParams['font.sans-serif'] = ['SimHei']
plt.rcParams['axes.unicode_minus'] = False

# Load the diabetes dataset from sklearn
from sklearn.datasets import load_diabetes
data_diabetes = load_diabetes()

data = data_diabetes['data']
target = data_diabetes['target']
feature_names = data_diabetes['feature_names']

df = pd.DataFrame(data, columns=feature_names)
```
```python
# Split the dataset
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(data, target, test_size=0.2)

# Select a single feature, s1 (the 5th column)
x_train_df = pd.DataFrame(x_train, columns=feature_names)
s1 = x_train_df.iloc[:, 4]
s11 = s1.values.reshape(-1, 1)
# Prepend a column of ones so that theta_0 acts as the intercept
s12 = np.hstack([np.ones((len(s11), 1)), s11])

# Plot the chosen feature against the target
plt.scatter(s1, y_train)
plt.show()
```

2. Hand-written gradient descent

```python
# Gradient descent, written by hand
# Loss function (mean squared error)
def J(X, Y, theta):
    try:
        return np.sum((Y - X.dot(theta)) ** 2) / len(X)
    except:
        return float('inf')

# Derivative (gradient) of the loss function
def dJ(X, Y, theta):
    res = np.empty(len(theta))
    res[0] = np.sum(X.dot(theta) - Y)
    for i in range(1, len(theta)):
        res[i] = (X.dot(theta) - Y).dot(X[:, i])
    return res * 2 / len(X)

# Descent loop, wrapped in a function (name assumed) so the final `return theta` is valid
def gradient_descent(X, Y, initial_theta, eta, n_iters=1e4, epsilon=1e-8):
    theta = initial_theta
    cur_iter = 0

    while cur_iter < n_iters:
        gradient = dJ(X, Y, theta)
        last_theta = theta
        theta = theta - eta * gradient

        if abs(J(X, Y, theta) - J(X, Y, last_theta)) < epsilon:
            break
        cur_iter += 1

    return theta
```
```python
# Plug the s1 data into the hand-written gradient descent
eta = 0.01
itheta = np.zeros(s12.shape[1])
gradient_descent(s12, y_train, itheta, eta)

# Output result:
# array([150.49064571, 106.10190172])
```
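As an optional sanity check that is not in the original post, the closed-form least-squares solution on the same design matrix can be compared with the hand-written result; if the descent has converged, the two should be close:

```python
# Closed-form least-squares fit on the same design matrix (assumed extra step)
theta_ls, *_ = np.linalg.lstsq(s12, y_train, rcond=None)
print(theta_ls)   # expected to be near the gradient-descent theta above if it converged
```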

3. Comparison with sklearn

```python
from sklearn.linear_model import SGDRegressor
from sklearn.preprocessing import StandardScaler
standardScaler = StandardScaler()
reg = SGDRegressor()
reg.fit(s12, y_train)
print(reg.score(s12, y_train))
print(reg.coef_)

# Output result:
# 0.001314261902390479
# [75.39860165  6.40174396]
```
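Two caveats worth noting (my reading, not stated in the original post): SGDRegressor fits its own intercept, so passing s12 with its hand-added column of ones duplicates the bias term, and stochastic gradient descent is sensitive to feature scale, which is presumably why StandardScaler was imported even though it is never applied. A hedged sketch of how the scaler would typically be used on the raw s1 column:

```python
# Assumed usage of the otherwise unused StandardScaler: standardize the single
# s1 column and let SGDRegressor fit its own intercept instead of the ones column.
scaler = StandardScaler()
s1_scaled = scaler.fit_transform(s11)    # s11 is the raw s1 column from the preparation step
reg2 = SGDRegressor()
reg2.fit(s1_scaled, y_train)
print(reg2.score(s1_scaled, y_train))
print(reg2.intercept_, reg2.coef_)
```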
