This version of the optimization algorithm uses momentum gradient descent. Instead of the usual gradient descent update,

$$x_{t+1} = x_t - \eta \, f'(x_t)$$

we add a 'momentum' term and arrive at the modified expression:

$$x_{t+1} = x_t + \beta \left( x_t - x_{t-1} \right) - \eta \, f'(x_t)$$

where $\beta$ is the momentum constant and $\eta$ is the learning rate.
In Python this mathematical formula takes on the following form:
self.new_focus = self.focus_history[-1] + self.momentum*(self.focus_history[-1] - self.focus_history[-2]) - self.focus_learning_rate*self.count_focus_der[-1]
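The update above can be sketched as a small, self-contained routine (a minimal sketch; the quadratic objective and the names `momentum_descent`, `history` are illustrative, not the project's own):

```python
# Position form of the momentum update: the new point is nudged by a fraction
# of the previous displacement, then corrected by the current gradient.

def momentum_descent(grad, x0, lr=0.1, momentum=0.8, steps=200):
    history = [x0, x0]  # two past points are needed for the momentum term
    for _ in range(steps):
        x_prev, x = history[-2], history[-1]
        history.append(x + momentum * (x - x_prev) - lr * grad(x))
    return history[-1]

# Minimize f(x) = (x - 3)**2, whose gradient is 2*(x - 3)
x_min = momentum_descent(lambda x: 2 * (x - 3), x0=0.0)
```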
This upgrade promises to help the optimization:
- Converge in fewer steps
- Avoid getting stuck at local extrema (maxima or minima)
As is evident from comparing the graphs of the optimization process, the momentum-accelerated run (right) reaches the maximum in fewer steps.
*Momentum gradient descent vs. vanilla gradient descent*
Unlike classic gradient descent, momentum gradient descent takes less sharp turns. Essentially, where gradient descent depends only on the current gradient, momentum gradient descent incorporates a moving average of past gradients, allowing it to smooth out variations in the optimization.
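This moving-average behavior can be made concrete with an equivalent "velocity" formulation, where the velocity accumulates an exponentially decaying sum of past gradients (a minimal sketch with illustrative names, not the project's code):

```python
# Velocity form of momentum: v blends the previous step direction with the
# new gradient, which smooths out zig-zagging between steps.

def momentum_step(x, v, grad, lr=0.1, momentum=0.8):
    v = momentum * v - lr * grad(x)
    return x + v, v

# Minimize f(x) = (x - 2)**2, gradient 2*(x - 2)
x, v = 0.0, 0.0
for _ in range(200):
    x, v = momentum_step(x, v, lambda p: 2 * (p - 2))
```

Substituting the previous displacement for the velocity recovers the position-based update used earlier.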
To test the promise of momentum gradient descent to find the absolute maximum and minimum points, I used the following function:
Focusing on the region near zero gives the following optimization test region, with both local and 'absolute' (global) minimum points.
I will initialize the algorithm at the red point seen above. Where classic gradient descent gets stuck in the local minimum, momentum gradient descent is able to push past the local minimum and saddle point and reach the deeper 'absolute' minimum.
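This claim can be checked on an illustrative landscape (an assumed stand-in, not necessarily the function used in this post): f(x) = x⁴/4 − 4x³/3 + 3x²/2 has a local minimum at x = 0, a barrier (local maximum) at x = 1, and a deeper global minimum at x = 3.

```python
# Compare vanilla and momentum gradient descent on a landscape with a local
# minimum at x = 0 and a deeper global minimum at x = 3.
#   f(x)  = x**4/4 - 4*x**3/3 + 3*x**2/2
#   f'(x) = x*(x - 1)*(x - 3)

def grad(x):
    return x * (x - 1) * (x - 3)

def descend(x0, lr, momentum=0.0, steps=500):
    x, v = x0, 0.0
    for _ in range(steps):
        v = momentum * v - lr * grad(x)
        x += v
    return x

vanilla = descend(-1.0, lr=0.05)                       # settles in the local minimum
with_momentum = descend(-1.0, lr=0.05, momentum=0.9)   # carries over the barrier
```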
Now, running the algorithm as seen in momentum_main.py, I will verify my assumptions.
Using vanilla gradient descent, the algorithm gets stuck in the local minimum:
Using momentum gradient descent, we arrive at the deeper minimum point:
As shown, the algorithm was able to reach the global minimum. Mission complete? Not exactly. The method is sensitive to the learning rate and momentum constant, and will not always arrive at the optimal solution.
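One illustrative failure mode (on an assumed quadratic, not the project's landscape): for f(x) = (x − 2)², the momentum update is stable only while lr · f″(x) < 2 · (1 + momentum). Pushing the learning rate past that bound makes the iterates diverge instead of converging.

```python
# Stability of momentum gradient descent on f(x) = (x - 2)**2, f''(x) = 2.
# With momentum = 0.8 the bound is lr * 2 < 2 * 1.8, i.e. lr < 1.8.

def run(lr, momentum=0.8, x0=0.0, steps=200):
    x, v = x0, 0.0
    for _ in range(steps):
        v = momentum * v - lr * 2 * (x - 2)   # f'(x) = 2*(x - 2)
        x += v
    return x

stable = run(lr=0.5)      # inside the bound: converges to the minimum
unstable = run(lr=2.0)    # outside the bound: the iterates blow up
```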