
tom-bleher/ALLS-Multivariable-Momentum-Gradient-Descent-


Project description

This version of the optimization algorithm uses momentum gradient descent. Instead of the usual gradient descent update:

$$W_{n+1}=W_{n}-\gamma \nabla f(W_{n})$$

By adding a 'momentum' term, we arrive at the modified expression:

$$W_{n+1}=W_{n}+\beta(W_{n}-W_{n-1})-\gamma \nabla f(W_{n})$$

where $\gamma$ is the learning rate familiar from the classic gradient descent algorithm, and $\beta$ is the new momentum constant.

In Python, this formula takes the following form:

```python
self.new_focus = self.focus_history[-1] + self.momentum*(self.focus_history[-1] - self.focus_history[-2]) - self.focus_learning_rate*self.count_focus_der[-1]
```
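
For context, here is a minimal standalone sketch of the same update rule on a toy one-dimensional function (the names `momentum_step`, `grad_f`, `gamma`, and `beta` are illustrative choices, not identifiers from this repository):

```python
# Minimal sketch of W_{n+1} = W_n + beta*(W_n - W_{n-1}) - gamma*grad_f(W_n).
# gamma (learning rate) and beta (momentum constant) are illustrative values.
def momentum_step(w_curr, w_prev, grad_f, gamma=0.01, beta=0.9):
    return w_curr + beta * (w_curr - w_prev) - gamma * grad_f(w_curr)

# Toy example: minimize f(w) = w**2, whose gradient is 2*w.
grad_f = lambda w: 2.0 * w
w_prev = w_curr = 5.0                 # start from rest (no initial velocity)
for _ in range(200):
    w_prev, w_curr = w_curr, momentum_step(w_curr, w_prev, grad_f)
print(w_curr)                          # approaches the minimum at w = 0
```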

This upgrade promises to help the optimization:

  • Converge in fewer steps
  • Avoid getting stuck in local extremum points

Faster optimization

As is evident from comparing the graphs of the optimization process, the momentum-accelerated plot (right) reaches the maximum in fewer steps.

[Figure: Momentum Gradient Descent vs. Vanilla Gradient Descent optimization plots]

Unlike classic gradient descent, momentum gradient descent takes less sharp turns. Essentially, where classic gradient descent depends only on the current gradient, momentum gradient descent incorporates an exponentially weighted moving average of past gradients, allowing it to smooth out variations in the optimization path.
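
This can be made explicit by rewriting the update in terms of the velocity $V_{n}=W_{n}-W_{n-1}$. Assuming the iteration starts from rest ($W_{-1}=W_{0}$, so $V_{0}=0$), the recursion unrolls to:

$$V_{n+1}=\beta V_{n}-\gamma \nabla f(W_{n})=-\gamma\sum_{k=0}^{n}\beta^{k}\nabla f(W_{n-k})$$

so each step is an exponentially weighted sum of all past gradients, with older gradients discounted by powers of $\beta$.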


Absolute optimization

In order to test the promise of momentum gradient descent to find the absolute maximum and minimum points, I tested it on the following function:

$$C(f,\phi_{2})=(0.1(f+\phi_{2}))^{2}\cdot \sin(0.01(f+\phi_{2}))$$

When plotted, it takes on the following form:
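
A quick sketch for reproducing this surface is shown below; the grid range and plotting choices are illustrative and not taken from the repository:

```python
import numpy as np
import matplotlib.pyplot as plt

# Test function C(f, phi2) = (0.1*(f + phi2))**2 * sin(0.01*(f + phi2)).
def C(f, phi2):
    u = f + phi2
    return (0.1 * u) ** 2 * np.sin(0.01 * u)

# Illustrative grid; u = f + phi2 spans roughly two periods of the sine term.
f_vals = np.linspace(-600.0, 600.0, 300)
phi2_vals = np.linspace(-600.0, 600.0, 300)
F, P = np.meshgrid(f_vals, phi2_vals)

plt.contourf(F, P, C(F, P), levels=50)
plt.colorbar(label="C(f, phi2)")
plt.xlabel("f")
plt.ylabel("phi2")
plt.title("Test surface for the optimization")
plt.show()
```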

Focusing on the region near zero, I get the following optimization test region, which contains both local and 'absolute' (global) minimum points.

I will initialize the algorithm at the red point seen above. Where classic gradient descent gets stuck in the local minimum, momentum gradient descent should be able to pass the local minimum and the saddle point and reach the deeper 'absolute' minimum.

Now, by running the algorithm as implemented in momentum_main.py, I will verify these assumptions.

Using vanilla gradient descent ($\beta=0$), we arrive at the local minimum point as expected:

Using momentum gradient descent, we arrive at the deeper minimum point:

As shown, the algorithm was able to reach the global minimum. Mission complete? Not exactly. The system is sensitive to the learning rate and momentum constant and will not always arrive at the optimal solution.
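
This sensitivity can be explored with a small one-dimensional sketch along the direction $u=f+\phi_{2}$ (since $C$ depends on $f$ and $\phi_{2}$ only through their sum). The starting point and hyperparameters below are illustrative, not the values used in momentum_main.py:

```python
import math

# 1-D slice of the test function along u = f + phi2, with its analytic derivative.
def C(u):
    return (0.1 * u) ** 2 * math.sin(0.01 * u)

def dC(u):
    return 0.02 * u * math.sin(0.01 * u) + 0.0001 * u ** 2 * math.cos(0.01 * u)

def run(u0, gamma, beta, steps=5000):
    """Momentum gradient descent on C; beta = 0 recovers vanilla gradient descent."""
    u_prev = u_curr = u0
    for _ in range(steps):
        u_prev, u_curr = u_curr, u_curr + beta * (u_curr - u_prev) - gamma * dC(u_curr)
    return u_curr

# Illustrative starting point to the left of the local minimum.
u0 = -350.0
for beta in (0.0, 0.5, 0.9):
    u_final = run(u0, gamma=0.5, beta=beta)
    print(f"beta={beta}: ended at u={u_final:7.1f}, C={C(u_final):9.1f}")
```

Depending on gamma, beta, and the starting point, a run may settle in the local minimum or carry past it, which is exactly the sensitivity noted above.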

About

Multivariable optimization for count using momentum gradient descent
