# Logistic regression: principle and Python implementation (including data set)

## Basic principles

## Solution of loss function

• There are many methods to minimize the loss function of binary logistic regression (equivalently, to maximize the likelihood function). Common choices are gradient descent, coordinate descent, and Newton's method. Gradient descent, which approaches the optimal solution step by step, is the most widely used.
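As a minimal sketch of the gradient descent idea (not the article's own code), the snippet below writes the negative log-likelihood and its gradient with plain NumPy arrays, then checks the analytic gradient against a finite-difference estimate. The toy data and the names `neg_log_likelihood` and `gradient` are assumptions made for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def neg_log_likelihood(w, X, y):
    """Sum over samples of -[y*log(h) + (1-y)*log(1-h)], with h = sigmoid(X @ w)."""
    h = sigmoid(X @ w)
    return -np.sum(y * np.log(h) + (1 - y) * np.log(1 - h))

def gradient(w, X, y):
    """Analytic gradient of the loss above: X^T (h - y)."""
    return X.T @ (sigmoid(X @ w) - y)

# Toy data (hypothetical): a bias column plus one feature.
X = np.array([[1.0, 0.5], [1.0, -1.2], [1.0, 2.0], [1.0, -0.3]])
y = np.array([1.0, 0.0, 1.0, 0.0])
w = np.array([0.1, -0.2])

# Central finite differences approximate each partial derivative.
eps = 1e-6
numeric = np.array([
    (neg_log_likelihood(w + eps * e, X, y) - neg_log_likelihood(w - eps * e, X, y)) / (2 * eps)
    for e in np.eye(len(w))
])
analytic = gradient(w, X, y)

# One gradient descent step moves against the gradient.
alpha = 0.1
w_next = w - alpha * analytic
print(neg_log_likelihood(w, X, y), neg_log_likelihood(w_next, X, y))
```

Stepping against the gradient, `w - alpha * X.T @ (h - y)`, is the same update as `w + alpha * X.T @ (y - h)`, only with the sign moved inside the error term.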

Advantages of logistic regression:

(1) Training is fast, and the amount of computation depends mainly on the number of features;
(2) It is simple and easy to understand, and the model is highly interpretable: the weight of each feature shows how that feature influences the final result;
(3) It suits binary classification problems and does not strictly require scaling the input features (although scaling helps gradient descent converge);
(4) Memory consumption is small, because only one weight per feature dimension needs to be stored.
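To illustrate point (2), a weight can be read as a change in log-odds: raising a feature by one unit multiplies the odds of the positive class by `exp(w)`. The sketch below checks this numerically. The weights and the sample are made-up values for illustration, not results from the article's data set.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def odds(x, w):
    """Odds P(y=1) / P(y=0) for one sample x (bias term included)."""
    p = sigmoid(x @ w)
    return p / (1.0 - p)

# Hypothetical weights: bias first, then one weight per feature.
w = np.array([-1.0, 0.8, -0.5])

x = np.array([1.0, 2.0, 3.0])            # [bias, x1, x2]
x_plus = x + np.array([0.0, 1.0, 0.0])   # same sample with x1 raised by one unit

# Raising x1 by one unit multiplies the odds by exp(w[1]).
ratio = odds(x_plus, w) / odds(x, w)
print(ratio, np.exp(w[1]))
```

A positive weight gives a factor above 1 (the feature pushes toward class 1), and a negative weight gives a factor below 1.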

Disadvantages of logistic regression:

(1) Logistic regression cannot solve nonlinear problems directly, because its decision boundary is linear;
(2) It is sensitive to multicollinearity in the data;
(3) It handles class imbalance poorly;
(4) Accuracy is often limited: because the model form is very simple (close to a linear model), it struggles to fit the true distribution of complex data;
(5) Logistic regression cannot select features by itself. Sometimes GBDT is used to select features first, and logistic regression is applied afterwards.
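Point (1) can be seen on the classic XOR data, which no straight line separates. The sketch below is an illustration under assumptions (plain NumPy arrays, a hypothetical helper `train`, and a hand-crafted interaction feature `x1 * x2`). With linear features the predictions stay at 0.5, while adding the interaction term lets the same model classify all four points.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train(X, y, alpha=0.1, n_iter=5000):
    """Batch gradient descent on the logistic loss, starting from zero weights."""
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        w += alpha * X.T @ (y - sigmoid(X @ w))
    return w

# XOR: the label is 1 exactly when one of the two inputs is 1.
x1 = np.array([0.0, 0.0, 1.0, 1.0])
x2 = np.array([0.0, 1.0, 0.0, 1.0])
y = np.array([0.0, 1.0, 1.0, 0.0])
bias = np.ones_like(x1)

# Linear features only: [1, x1, x2].
X_linear = np.column_stack([bias, x1, x2])
p_linear = sigmoid(X_linear @ train(X_linear, y))

# Adding the interaction term x1 * x2 makes the classes separable.
X_cross = np.column_stack([bias, x1, x2, x1 * x2])
p_cross = sigmoid(X_cross @ train(X_cross, y))

print("linear features:", np.round(p_linear, 2))
print("with x1*x2 term:", np.round(p_cross, 2))
```

The model itself is unchanged in both runs. Only the feature set differs, which is why feature engineering is the usual workaround for a linear decision boundary.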

# Python implementation

## LR_Train training model

```python
# coding:UTF-8
# Author:xwj
# Date:2020-7-2
# Email:xwj770427414@126.com
# Environment:Python3.7
import numpy as np

def sig(x):
    """
    Sigmoid (logistic) function
    :param x: feature * w + b
    :return: P(y=1|x,w,b)
    """
    return 1.0 / (1 + np.exp(-x))

def lr_train_bfd(feature, label, maxCycle, alpha):
    """
    Train the LR model with batch gradient descent
    :param feature: mat, features (the first column is the bias term 1)
    :param label: mat, labels
    :param maxCycle: int, maximum number of iterations
    :param alpha: float, learning rate
    :return: w, weights
    """
    n = np.shape(feature)[1]  # Number of features (columns)
    # np.asmatrix is used instead of np.mat, which NumPy 2.0 removed
    w = np.asmatrix(np.ones((n, 1)))  # Initialize weights
    i = 0
    while i < maxCycle:  # Run exactly maxCycle iterations
        i += 1
        h = sig(feature * w)  # Predicted probabilities
        err = label - h  # Error
        if i % 100 == 0:  # Report once every 100 iterations
            print('\t--------Iterations = ' + str(i) + ', Training loss = ' + str(error_rate(h, label)))
        w = w + alpha * feature.T * err  # Weight update
    return w

def error_rate(h, label):
    """
    Calculate the loss function value (mean cross-entropy)
    :param h: mat, predicted probabilities
    :param label: mat, actual values
    :return: float, mean loss over the m samples
    """
    m = np.shape(h)[0]  # Number of predicted values
    sum_err = 0.0  # Accumulated loss
    for i in range(m):  # Iterate over the m predicted values
        if h[i, 0] > 0 and (1 - h[i, 0]) > 0:  # Skip values where log() is undefined
            sum_err -= (label[i, 0] * np.log(h[i, 0]) + (1 - label[i, 0]) * np.log(1 - h[i, 0]))  # Cross-entropy term
        else:
            sum_err -= 0
    return sum_err / m

def load_data(file_name):  # Function name assumed; the original signature is not shown
    """
    Load and clean the data
    :param file_name: data set file name
    :return: features and labels in matrix form
    """
    f = open(file_name)
    feature_data = []  # Feature data
    label_data = []  # Label data
    for line in f.readlines():
        feature_tmp = []  # Features of the current sample
        label_tmp = []  # Label of the current sample
        lines = line.strip().split('\t')  # Strip the trailing newline and split on tabs
        feature_tmp.append(1)  # The bias term b is merged into the features as a constant 1
        for i in range(len(lines) - 1):  # Every column except the last is a feature
            feature_tmp.append(float(lines[i]))
        label_tmp.append(float(lines[-1]))  # The last column is the label
        feature_data.append(feature_tmp)
        label_data.append(label_tmp)
    f.close()
    # np.asmatrix is used instead of np.mat, which NumPy 2.0 removed
    return np.asmatrix(feature_data), np.asmatrix(label_data)

def save_model(file_name, w):
    """
    Save the weights of the final model
    :param file_name: file name for the saved model
    :param w: mat, weights
    """
    m = np.shape(w)[0]  # Number of weights
    w_array = []
    for i in range(m):
        w_array.append(str(w[i, 0]))
    with open(file_name, 'w') as f_w:
        f_w.write('\t'.join(w_array))  # One tab-separated line

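if __name__ == "__main__":
    print('------1.Import data------')
    # Assumed file name and loader name; the original call is not shown
    feature, label = load_data("data.txt")
    print('------2.Training model------')
    w = lr_train_bfd(feature, label, 1000, 0.01)
    print('------3.Save model------')
    save_model("weights.txt", w)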

```
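The loop in `error_rate` can also be written in vectorized form. The following is a sketch of an alternative, not the article's implementation. The name `cross_entropy` and the clipping constant `eps` are assumptions. Clipping keeps `h` away from exactly 0 and 1, a case the original handles by skipping those samples.

```python
import numpy as np

def cross_entropy(h, label, eps=1e-12):
    """Mean cross-entropy between predicted probabilities h and 0/1 labels."""
    h = np.clip(np.asarray(h, dtype=float), eps, 1.0 - eps)  # avoid log(0)
    label = np.asarray(label, dtype=float)
    return float(np.mean(-(label * np.log(h) + (1.0 - label) * np.log(1.0 - h))))

# Column vectors shaped like the output of sig(feature * w).
h = np.array([[0.9], [0.2], [0.6]])
label = np.array([[1.0], [0.0], [1.0]])
print(cross_entropy(h, label))
```

One difference in behavior is worth noting: a saturated wrong prediction (for example `h = 0` with label 1) contributes a large finite penalty here, whereas the loop version contributes 0 for that sample.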

## Test model

```python
# coding:UTF-8
# Author:xwj
# Date:2020-7-2
# Email:xwj770427414@126.com
# Environment:Python3.7
import numpy as np

def sig(x):
    """
    Sigmoid (logistic) function
    :param x: feature * w + b
    :return: P(y=1|x,w,b)
    """
    return 1.0 / (1 + np.exp(-x))

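def load_weight(w):  # Function name assumed; the original signature is not shown
    """
    Load the trained LR model
    :param w: path of the file where the weights are stored
    :return: np.asmatrix(w), matrix of weights (one row)
    """
    f = open(w)
    w = []
    for line in f.readlines():
        lines = line.strip().split('\t')
        w_tmp = []
        for x in lines:
            w_tmp.append(float(x))
        w.append(w_tmp)
    f.close()
    # np.asmatrix is used instead of np.mat, which NumPy 2.0 removed
    return np.asmatrix(w)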

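def load_data(file_name, n):  # Function name assumed; the original signature is not shown
    """
    Load the test data
    :param file_name: path of the test data
    :param n: number of features (including the bias term)
    :return: np.asmatrix(feature_data), features of the test set
    """
    f = open(file_name)
    feature_data = []
    for line in f.readlines():
        feature_tmp = []
        lines = line.strip().split('\t')
        if len(lines) != n - 1:  # Skip rows with the wrong number of columns
            continue
        feature_tmp.append(1)  # The bias term b is merged into the features as a constant 1
        for x in lines:
            feature_tmp.append(float(x))
        feature_data.append(feature_tmp)
    f.close()
    # np.asmatrix is used instead of np.mat, which NumPy 2.0 removed
    return np.asmatrix(feature_data)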

def predict(data, w):
    """
    Predict the test data
    :param data: mat, features of the test data
    :param w: mat, parameters of the model (one row)
    :return: h, mat, final predictions (0.0 or 1.0)
    """
    h = sig(data * w.T)  # Predicted probabilities
    m = np.shape(h)[0]  # Number of samples
    for i in range(m):
        if h[i, 0] < 0.5:  # Threshold the probability at 0.5
            h[i, 0] = 0.0
        else:
            h[i, 0] = 1.0
    return h

def save_result(file_name, result):
    """
    Save the final predictions
    :param file_name: file name for the saved predictions
    :param result: mat, predicted results
    """
    m = np.shape(result)[0]  # Number of predictions
    tmp = []
    for i in range(m):
        tmp.append(str(result[i, 0]))
    f_result = open(file_name, "w")
    f_result.write("\t".join(tmp))
    f_result.close()

if __name__ == "__main__":
    print('------1.Import model------')