125 points by curious_researcher 5 months ago | 12 comments
john_doe_tech 5 months ago next
Great read! I've been playing with neural networks and optimization techniques lately, and I found that learning rate scheduling had a big impact on my models. Definitely worth looking into!
machine_learning_fanatic 5 months ago next
I totally agree. How did you schedule your learning rates? I've been using a step decay, but I'm thinking about implementing exponential decay instead.
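For concreteness, the two schedules I'm weighing look roughly like this (a quick sketch with made-up hyperparameters, not my actual config):

    import numpy as np

    def step_decay(lr0, epoch, drop=0.5, epochs_per_drop=10):
        # Piecewise constant: cut the rate by `drop` every `epochs_per_drop` epochs
        return lr0 * drop ** np.floor(epoch / epochs_per_drop)

    def exponential_decay(lr0, epoch, k=0.05):
        # Smooth continuous decay: lr0 * exp(-k * epoch)
        return lr0 * np.exp(-k * epoch)

    for epoch in range(0, 50, 10):
        print(epoch, step_decay(0.1, epoch), exponential_decay(0.1, epoch))

Step decay gives you long stretches at a constant rate, which I find easier to reason about; exponential decay shrinks the rate a little every epoch.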
alice_programmer 5 months ago prev next
I've also explored optimization techniques in depth. Have you tried second-order methods like Newton's method or BFGS? They can converge in far fewer iterations, though each step is more computationally expensive; sometimes that trade-off is worth it.
john_doe_tech 5 months ago next
I haven't tried Newton's method, but I've used BFGS for some problems. I found that I often got better performance with first-order methods due to their lower computational complexity, but YMMV.
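To make the trade-off concrete, here's the kind of toy comparison I have in mind (an ill-conditioned quadratic invented for illustration, not one of my real problems):

    import numpy as np
    from scipy.optimize import minimize

    A = np.diag([1.0, 100.0])   # ill-conditioned quadratic: f(x) = 0.5 * x^T A x

    def f(x):
        return 0.5 * x @ A @ x

    def grad_f(x):
        return A @ x

    x0 = np.array([1.0, 1.0])

    # BFGS builds an approximate inverse Hessian from gradient history
    res = minimize(f, x0, jac=grad_f, method="BFGS")
    print("BFGS:", res.x, "in", res.nit, "iterations")

    # Plain gradient descent: cheap per step, but the step size is capped by the
    # largest curvature (here 100), so it needs many more iterations
    x, lr = x0.copy(), 0.009
    for _ in range(500):
        x = x - lr * grad_f(x)
    print("GD:  ", x)

On small, smooth problems BFGS wins easily; on large noisy ones the per-step cost and memory of the Hessian approximation start to hurt, which is where first-order methods pull ahead.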
data_scientist_dude 5 months ago prev next
This reminds me of my experimental work on self-tuning/adaptive learning rates; I've seen some significant accuracy gains there (https://arxiv.org/abs/XXXX-XXX-XXX). You should try it out!
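To give a flavor of what "adaptive" means here (just a generic Adagrad-style sketch, not the method from the paper):

    import numpy as np

    def adagrad_step(params, grads, cache, lr=0.01, eps=1e-8):
        # Each parameter gets its own effective learning rate, which shrinks
        # where gradients have historically been large
        cache = cache + grads ** 2
        params = params - lr * grads / (np.sqrt(cache) + eps)
        return params, cache

    # Toy usage: shrink the weights of f(w) = ||w||^2 from a random start
    w = np.random.randn(3)
    cache = np.zeros_like(w)
    for _ in range(200):
        w, cache = adagrad_step(w, 2 * w, cache)
    print(w)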
deep_learning_nerd 5 months ago next
Interesting, I've been meaning to dabble in adaptive learning rate approaches. I'll look into that paper, thanks for the recommendation!
mathgeek_anthony 5 months ago prev next
What about momentum in your optimization methods? Any experimental results to share in that regard?
codemonk 5 months ago next
Sure, I've had positive results using momentum with SGD; it noticeably helped push through plateaus in the loss surface. I'd recommend experimenting with it!
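The update I'm using is just classical momentum, roughly like this (a sketch; the hyperparameters are made up):

    import numpy as np

    def sgd_momentum_step(params, grads, velocity, lr=0.01, beta=0.9):
        # Accumulate a velocity vector; past gradients keep the update moving
        # through flat regions (plateaus) where the current gradient is tiny
        velocity = beta * velocity - lr * grads
        return params + velocity, velocity

The beta term is what carries you across plateaus: even when the instantaneous gradient nearly vanishes, the accumulated velocity keeps pushing in the previously useful direction.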
deepmind_papa 5 months ago prev next
I'd like to add that in my work on very deep networks (>100 layers), I've seen significant improvements by combining a well-scheduled learning rate with gradient clipping. Highly recommended.
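Roughly the combination I mean, sketched in PyTorch (the tiny model, clip norm, and schedule below are placeholders, not my actual settings):

    import torch

    model = torch.nn.Linear(512, 512)   # stand-in for a much deeper network
    opt = torch.optim.SGD(model.parameters(), lr=0.1)
    sched = torch.optim.lr_scheduler.StepLR(opt, step_size=30, gamma=0.1)
    data = [(torch.randn(8, 512), torch.randn(8, 512)) for _ in range(4)]  # fake batches

    for epoch in range(90):
        for x, y in data:
            opt.zero_grad()
            loss = torch.nn.functional.mse_loss(model(x), y)
            loss.backward()
            # Clip the global gradient norm before the optimizer step so one bad
            # batch can't blow up the weights of a very deep network
            torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
            opt.step()
        sched.step()   # drop the learning rate by 10x every 30 epochs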
deepmind_fanboy 5 months ago next
I second that. I've seen first-hand cases where training with those techniques clearly surpassed the performance of earlier models. For exploring greater network depth, it's essential!
algorithms_queen 5 months ago prev next
I find the discussion on optimization methods super interesting, especially since stochastic gradient descent is a randomized algorithm that can also be analyzed from a probabilistic perspective!
optimizetheoptimizer 5 months ago next
Absolutely! Analyzing convergence properties from a stochastic-processes perspective is well worth doing.
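A quick numerical illustration of that view (toy least-squares data, invented for the example): the minibatch gradient is an unbiased estimate of the full gradient, so the SGD iterates form a stochastic process whose average drift follows the true gradient.

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(1000, 5))
    y = X @ np.array([1.0, -2.0, 0.5, 3.0, 0.0]) + 0.1 * rng.normal(size=1000)
    w = rng.normal(size=5)

    def grad(idx):
        # Gradient of the least-squares loss restricted to rows `idx`
        err = X[idx] @ w - y[idx]
        return X[idx].T @ err / len(idx)

    full = grad(np.arange(len(X)))
    # Averaging many random minibatch gradients recovers the full gradient;
    # that unbiasedness is the starting point for the probabilistic analysis
    mini = np.mean([grad(rng.choice(len(X), size=32, replace=False))
                    for _ in range(2000)], axis=0)
    print(np.linalg.norm(full - mini))   # small: the minibatch estimator is unbiased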