In the blog post " Optimization Algorithm - L-BFGS Algorithm of Quasi-Newton Method ", the principle of the L-BFGS algorithm was introduced in detail. This article reviews that principle together with its concrete implementation in the open-source library liblbfgs. liblbfgs also includes the OWL-QN algorithm for handling L1 regularization; for the details of OWL-QN, please refer to the blog post " Optimization algorithm - OWL-QN".
1. liblbfgs overview
liblbfgs is a C library implementing the L-BFGS algorithm for solving nonlinear optimization problems. An introduction to the library can be found on its home page ( http://www.chokkan.org/software/liblbfgs/ ). Its code can be downloaded through the links below:
- For Linux platform https://github.com/downloads/chokkan/liblbfgs/liblbfgs-1.10.tar.gz
- For Windows platform https://github.com/chokkan/liblbfgs
2. liblbfgs source code analysis
2.1. The main structure of the source code
In liblbfgs, the main code includes
- liblbfgs-1.10/include/lbfgs.h: header file
- liblbfgs-1.10/lib/lbfgs.c: specific implementation
- liblbfgs-1.10/lib/arithmetic_ansi.h: utility routines (the other two files, arithmetic_sse_double.h and arithmetic_sse_float.h, are equivalent SSE-optimized implementations)
- liblbfgs-1.10/sample/sample.c: test sample
2.2. Tools arithmetic_ansi.h
The arithmetic_ansi.h file mainly contains operations on vectors. The code in this part is relatively simple; the function of each routine is briefly described below:
- Allocate a block of `size` bytes and zero-initialize it

```c
void* vecalloc(size_t size)
```

- Free the space

```c
void vecfree(void *memblock)
```

- Set every element of vector x to the constant c

```c
void vecset(lbfgsfloatval_t *x, const lbfgsfloatval_t c, const int n)
```

- Copy the values of vector x into vector y

```c
void veccpy(lbfgsfloatval_t *y, const lbfgsfloatval_t *x, const int n)
```

- Negate each value of vector x and store it in vector y

```c
void vecncpy(lbfgsfloatval_t *y, const lbfgsfloatval_t *x, const int n)
```

- Add c times the corresponding element of vector x to each element of vector y

```c
void vecadd(lbfgsfloatval_t *y, const lbfgsfloatval_t *x, const lbfgsfloatval_t c, const int n)
```

- Compute the difference of vectors x and y, storing it in z (z = x - y)

```c
void vecdiff(lbfgsfloatval_t *z, const lbfgsfloatval_t *x, const lbfgsfloatval_t *y, const int n)
```

- Scale vector y in place by the constant c

```c
void vecscale(lbfgsfloatval_t *y, const lbfgsfloatval_t c, const int n)
```

- Element-wise product of vectors (y[i] *= x[i])

```c
void vecmul(lbfgsfloatval_t *y, const lbfgsfloatval_t *x, const int n)
```

- Dot product of vectors

```c
void vecdot(lbfgsfloatval_t* s, const lbfgsfloatval_t *x, const lbfgsfloatval_t *y, const int n)
```

- 2-norm of a vector (square root of its dot product with itself)

```c
void vec2norm(lbfgsfloatval_t* s, const lbfgsfloatval_t *x, const int n)
```

- Reciprocal of the 2-norm of a vector

```c
void vec2norminv(lbfgsfloatval_t* s, const lbfgsfloatval_t *x, const int n)
```
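As a rough sketch (not the library's exact source), the ANSI C versions of a few of these helpers could look like the following:

```c
#include <stdlib.h>
#include <string.h>

typedef double lbfgsfloatval_t; /* assumes LBFGS_FLOAT == 64 */

/* allocate a block of `size` bytes and zero-initialize it */
static void *vecalloc(size_t size)
{
    void *memblock = malloc(size);
    if (memblock) memset(memblock, 0, size);
    return memblock;
}

/* y[i] += c * x[i] */
static void vecadd(lbfgsfloatval_t *y, const lbfgsfloatval_t *x,
                   const lbfgsfloatval_t c, const int n)
{
    int i;
    for (i = 0; i < n; ++i) y[i] += c * x[i];
}

/* *s = sum_i x[i] * y[i] */
static void vecdot(lbfgsfloatval_t *s, const lbfgsfloatval_t *x,
                   const lbfgsfloatval_t *y, const int n)
{
    int i;
    *s = 0.;
    for (i = 0; i < n; ++i) *s += x[i] * y[i];
}
```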
2.3. Main functions of L-BFGS algorithm
In liblbfgs, much of the code is optimized with SSE instructions. These optimized code paths are not considered here; for each of them, the author also provides a basic ANSI C implementation.
2.3.1. Allocating and reclaiming memory space for variables
The function lbfgs_malloc allocates memory for the variables of the optimization problem. Its specific form is:

```c
/* allocate space for variables */
lbfgsfloatval_t* lbfgs_malloc(int n)
{
#if defined(USE_SSE) && (defined(__SSE__) || defined(__SSE2__))
    /* SSE-related rounding, not considered here */
    n = round_out_variables(n);
#endif/*defined(USE_SSE)*/
    /* allocate memory for n values */
    return (lbfgsfloatval_t*)vecalloc(sizeof(lbfgsfloatval_t) * n);
}
```
The function lbfgs_malloc thus allocates memory of length n for the variables, where lbfgsfloatval_t is the variable type, defined as either float or double:

```c
#if LBFGS_FLOAT == 32
typedef float lbfgsfloatval_t;
#elif LBFGS_FLOAT == 64
typedef double lbfgsfloatval_t;
#else
#error "libLBFGS supports single (float; LBFGS_FLOAT = 32) or double (double; LBFGS_FLOAT=64) precision only."
#endif
```
Corresponding to allocation is the function that frees this memory:

```c
/* free space for variables */
void lbfgs_free(lbfgsfloatval_t *x)
{
    vecfree(x);
}
```
2.3.2. Initialization of parameters in L-BFGS
The function lbfgs_parameter_init provides the initialization method of L-BFGS default parameters.
In fact, the default parameters are also applied inside the lbfgs routine itself, so this function is somewhat redundant.

```c
/* initialize the lbfgs parameters with defaults */
void lbfgs_parameter_init(lbfgs_parameter_t *param)
{
    memcpy(param, &_defparam, sizeof(*param)); /* copy the default parameters */
}
```
The function lbfgs_parameter_init copies the default parameters _defparam into param. lbfgs_parameter_t is the structure holding the L-BFGS parameters, defined in lbfgs.h.
The author's comments on this structure are particularly detailed; a lot of important information for parameter tuning can be learned from them.
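For reference, an abbreviated sketch of the structure (field names follow the public lbfgs.h; the comments are condensed, and the quoted defaults are from memory, worth double-checking against the header):

```c
typedef struct {
    int m;                   /* number of corrections kept for the Hessian
                                approximation (default 6) */
    lbfgsfloatval_t epsilon; /* tolerance for ||g||/max(1,||x||) (default 1e-5) */
    int past;                /* iteration distance for the delta test (default 0: off) */
    lbfgsfloatval_t delta;   /* minimum relative decrease of the objective */
    int max_iterations;      /* 0: iterate until convergence */
    int linesearch;          /* which line search algorithm to use */
    int max_linesearch;      /* maximum trials per line search */
    lbfgsfloatval_t min_step, max_step;      /* bounds on the step length */
    lbfgsfloatval_t ftol, wolfe, gtol, xtol; /* line search tolerances */
    lbfgsfloatval_t orthantwise_c;           /* L1 weight (0: plain L-BFGS) */
    int orthantwise_start, orthantwise_end;  /* index range of the L1 term */
} lbfgs_parameter_t;
```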
2.3.3. Overview of the core process of the L-BFGS algorithm
The function lbfgs is the core process of the optimization algorithm, and its parameters are:

```c
int n,                          /* number of variables */
lbfgsfloatval_t *x,             /* variables */
lbfgsfloatval_t *ptr_fx,        /* objective function value */
lbfgs_evaluate_t proc_evaluate, /* callback computing the objective value and gradient */
lbfgs_progress_t proc_progress, /* callback handling the progress of each iteration */
void *instance,                 /* user data passed to the callbacks */
lbfgs_parameter_t *_param       /* parameters of L-BFGS */
```
By calling the two callback functions proc_evaluate and proc_progress, this function evaluates the concrete objective function and handles bookkeeping during the computation.
The basic process of the function lbfgs is: declare and initialize the parameters and working vectors, evaluate the initial objective value and gradient, then loop: run a line search for the step size, test the termination conditions, and update the limited-memory fit of the Hessian matrix.
2.3.4. Declaration and initialization of parameters
The declaration of many parameters is involved in the initialization phase, and the function of each parameter will be introduced in detail next.
2.3.5. The process of loop solution
The solution process is a for loop. Inside it, the first step is to use a line search strategy to select the optimal step size. The specific process is as follows:

```c
/* Store the current position and gradient vectors. */
veccpy(xp, x, n); /* copy the current variable values into xp */
veccpy(gp, g, n); /* copy the current gradient values into gp */

/* Search for an optimal step. */
if (param.orthantwise_c == 0.) {
    /* no L1 regularization: gp is the gradient */
    ls = linesearch(n, x, &fx, g, d, &step, xp, gp, w, &cd, &param);
} else {
    /* with L1 regularization: pg is the pseudo gradient */
    ls = linesearch(n, x, &fx, g, d, &step, xp, pg, w, &cd, &param);
    /* recompute the pseudo gradient at the new point */
    owlqn_pseudo_gradient(
        pg, x, g, n,
        param.orthantwise_c, param.orthantwise_start, param.orthantwise_end
        );
}
if (ls < 0) { /* the line search failed */
    /* Revert to the previous point, since x and g were
       updated during the line search. */
    veccpy(x, xp, n);
    veccpy(g, gp, n);
    ret = ls;
    goto lbfgs_exit; /* release resources */
}
```
Since the variable x and the gradient vector g are modified during the line search, both are saved into other vectors before the line search is performed. The parameter param.orthantwise_c is the weight of the L1 regularization term: if it is 0, no L1 regularization is used, i.e., the plain L-BFGS algorithm; if it is nonzero, L1 regularization is used, i.e., the OWL-QN algorithm.
For the specific line search methods, see section 2.3.8. For the owlqn_pseudo_gradient function, refer to the blog post " Optimization algorithm - OWL-QN".
In OWL-QN, pseudo gradients are used in place of the gradients of L-BFGS, since the L1 term is not differentiable at some points.
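As a sketch of this idea (a hypothetical, simplified single-coordinate version, not the library's owlqn_pseudo_gradient), the pseudo gradient of f(x) + c|x| can be computed as:

```c
/* pseudo gradient of f(x) + c * |x| for one coordinate;
   g is the gradient of the smooth part f at x, c > 0 */
static double pseudo_gradient_1d(double x, double g, double c)
{
    if (x < 0.) return g - c;     /* left of the kink: derivative exists */
    if (x > 0.) return g + c;     /* right of the kink: derivative exists */
    /* at x == 0 the L1 term is not differentiable: pick the one-sided
       derivative that points downhill, or 0 if neither does */
    if (g + c < 0.) return g + c; /* right derivative is negative */
    if (g - c > 0.) return g - c; /* left derivative is positive */
    return 0.;
}
```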
2.3.6. The termination condition of the loop
While selecting the optimal step size, the variables are updated at the same time. The second step is to judge whether the update satisfies a termination condition. The termination conditions fall into the following three categories:
- Whether the iteration has converged

```c
vec2norm(&xnorm, x, n); /* 2-norm of the variables */
if (param.orthantwise_c == 0.) {
    vec2norm(&gnorm, g, n);  /* plain L-BFGS: norm of the gradient */
} else {
    vec2norm(&gnorm, pg, n); /* OWL-QN: norm of the pseudo gradient */
}

/* judge whether it has converged */
if (xnorm < 1.0) xnorm = 1.0;
if (gnorm / xnorm <= param.epsilon) {
    /* Convergence. */
    ret = LBFGS_SUCCESS;
    break;
}
```
The convergence criterion is:

\frac{\left \| g\left ( x \right ) \right \|}{\max\left ( 1,\left \| x \right \| \right )}\leq \varepsilon

If this inequality holds, the convergence condition has been met.
- Whether the objective function value has dropped sufficiently (for a minimization problem)

```c
if (pf != NULL) {
    /* We don't test the stopping criterion while k < past. */
    /* k is the iteration count; pf keeps the last param.past objective values */
    if (param.past <= k) {
        /* Compute the relative improvement from the past. */
        rate = (pf[k % param.past] - fx) / fx;
        /* The stopping criterion. */
        if (rate < param.delta) {
            ret = LBFGS_STOP;
            break;
        }
    }
    /* Store the current value of the objective function. */
    pf[k % param.past] = fx;
}
```
In pf, the objective values of the last param.past iterations are saved. The criterion is:

\frac{f_{old}-f_{new}}{f_{new}}<\delta

where f_{old} is the objective value param.past iterations ago and f_{new} is the current value.

- Whether the maximum number of iterations has been reached

```c
/* the maximum number of iterations has been reached */
if (param.max_iterations != 0 && param.max_iterations < k+1) {
    /* Maximum number of iterations. */
    ret = LBFGSERR_MAXIMUMITERATION;
    break;
}
```
If the conditions for termination are not met, then a new Hessian matrix needs to be fitted.
2.3.7. Fitting Hessian Matrix
In the BFGS algorithm ( Optimization algorithm - BFGS algorithm of quasi-Newton method ), the Hessian approximation is updated as:

H_{k+1}=\left ( I-\frac{s_ky_k^T}{y_k^Ts_k} \right )^TH_k\left ( I-\frac{y_ks_k^T}{y_k^Ts_k} \right )+\frac{s_ks_k^T}{y_k^Ts_k}

where

y_k=\nabla f\left ( x_{k+1} \right )-\nabla f\left ( x_{k} \right ),\quad s_k=x_{k+1}-x_{k}

In this process, the historical Hessian matrices would have to be computed and stored continuously. The L-BFGS algorithm instead fits the Hessian matrix using only the information of the most recent m iterations. It does not store the complete H_k, but the vector sequences \left \{ s_k \right \} and \left \{ y_k \right \}; whenever the matrix H_k is required, it can be computed from these sequences. Moreover, not all of \left \{ s_k \right \} and \left \{ y_k \right \} need to be saved, only the latest m pairs of vectors. The specific calculation method is:

The specific principle of L-BFGS can be found in " Optimization Algorithm - L-BFGS Algorithm of Quasi-Newton Method".
In the above process, the first loop runs backward over the latest m correction pairs and computes the coefficients \alpha_j; the second loop then runs forward and, using the quantities computed above, iteratively obtains the current descent direction.
Following this process, the code fits the Hessian matrix:

- Computing the vector sequences \left \{ s_k \right \} and \left \{ y_k \right \}

```c
/* update the s vector and the y vector */
it = &lm[end]; /* initially, end is 0 */
vecdiff(it->s, x, xp, n); /* s = x - xp: xp is the previous point, x the current one */
vecdiff(it->y, g, gp, n); /* y = g - gp: gp is the previous gradient, g the current one */
```
- The two loops

```c
/* the two-loop recursion of L-BFGS, applying the quasi-Newton
   Hessian approximation to the direction d */
j = end;
for (i = 0; i < bound; ++i) {
    j = (j + m - 1) % m;    /* if (--j == -1) j = m-1; */
    it = &lm[j];
    /* \alpha_{j} = \rho_{j} s^{t}_{j} \cdot q_{k+1}. */
    vecdot(&it->alpha, it->s, d, n); /* compute alpha */
    it->alpha /= it->ys;             /* multiply by rho = 1 / (y^T s) */
    /* q_{i} = q_{i+1} - \alpha_{i} y_{i}. */
    vecadd(d, it->y, -it->alpha, n); /* correct d */
}

vecscale(d, ys / yy, n); /* initial scaling H_0 = (y^T s / y^T y) I */

for (i = 0; i < bound; ++i) {
    it = &lm[j];
    /* \beta_{j} = \rho_{j} y^t_{j} \cdot \gamma_{i}. */
    vecdot(&beta, it->y, d, n);
    beta /= it->ys;                  /* multiply by rho */
    /* \gamma_{i+1} = \gamma_{i} + (\alpha_{j} - \beta_{j}) s_{j}. */
    vecadd(d, it->s, it->alpha - beta, n);
    j = (j + 1) % m;    /* if (++j == m) j = 0; */
}
```
Among them, ys and yy are computed as follows:

```c
vecdot(&ys, it->y, it->s, n); /* ys = y^T s */
vecdot(&yy, it->y, it->y, n); /* yy = y^T y */
it->ys = ys;
```
bound and end are updated as follows:

```c
bound = (m <= k) ? m : k; /* whether m iterations of history are available yet */
++k;
end = (end + 1) % m;
```
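Putting these pieces together, the two-loop recursion can be sketched in a self-contained form (a simplified illustration rather than the library's code: fixed dimension, no OWL-QN handling, with names mirroring the source):

```c
#define N 2 /* dimension of the toy problem */

typedef struct {
    double s[N];  /* s_k = x_{k+1} - x_k */
    double y[N];  /* y_k = g_{k+1} - g_k */
    double ys;    /* y_k^T s_k */
    double alpha; /* coefficient computed in the backward loop */
} correction_t;

static double dot(const double *a, const double *b)
{
    double r = 0.;
    for (int i = 0; i < N; ++i) r += a[i] * b[i];
    return r;
}

/* Compute d = -H_k g by the two-loop recursion over `bound` stored
   correction pairs; lm[end] is the most recent pair, lm has capacity m. */
static void two_loop(double *d, const double *g,
                     correction_t *lm, int m, int bound, int end)
{
    int j = end;
    for (int t = 0; t < N; ++t) d[t] = -g[t]; /* steepest direction */

    /* backward loop: newest pair to oldest */
    for (int i = 0; i < bound; ++i) {
        correction_t *it = &lm[j];
        it->alpha = dot(it->s, d) / it->ys;  /* alpha = rho s^T q */
        for (int t = 0; t < N; ++t) d[t] -= it->alpha * it->y[t];
        j = (j + m - 1) % m;
    }

    /* initial scaling with the newest pair: H_0 = (y^T s / y^T y) I */
    {
        correction_t *it = &lm[end];
        double scale = it->ys / dot(it->y, it->y);
        for (int t = 0; t < N; ++t) d[t] *= scale;
    }

    /* forward loop: oldest pair to newest */
    for (int i = 0; i < bound; ++i) {
        j = (j + 1) % m;
        correction_t *it = &lm[j];
        double beta = dot(it->y, d) / it->ys; /* beta = rho y^T r */
        for (int t = 0; t < N; ++t) d[t] += (it->alpha - beta) * it->s[t];
    }
}
```

With a single stored pair in which s = y (a correction consistent with an identity Hessian), the recursion reproduces d = -g exactly, which is a convenient sanity check.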
2.3.8. Line search strategy
A large number of line search strategies are involved in liblbfgs, and the main function of the line search strategy is to find the optimal step size. We will conduct a detailed analysis in the next topic.
3. supplement
3.1. Callback function
A callback function is a process of using a function pointer to call a function. The advantage of the callback function is that the specific calculation process is passed to the calling place in the form of a function pointer, so that the calling function can be expanded more conveniently.
Suppose a print_result function needs to output the sum of two int values. If the addition is written directly into it, changing it to compute a difference requires modifying the function again. If instead a function pointer is passed as a parameter of print_result and the specific computation is implemented in the pointed-to function, the behavior can be extended without changing print_result, as in the following example:
- frame.h
```c
#include <stdio.h>

void print_result(int (*get_result)(int, int), int a, int b)
{
    printf("the final result is : %d\n", get_result(a, b));
}
```
- process.cc
```c
#include <stdio.h>
#include "frame.h"

int add(int a, int b) { return a + b; }
int sub(int a, int b) { return a - b; }
int mul(int a, int b) { return a * b; }
int max(int a, int b) { return (a > b ? a : b); }

int main()
{
    int a = 1;
    int b = 2;
    print_result(add, a, b);
    print_result(sub, a, b);
    print_result(mul, a, b);
    print_result(max, a, b);
    return 0;
}
```