125 points by mathwhiz19 7 months ago | 12 comments
deeplearning_fan 7 months ago next
This is really cool! I've been looking for ways to improve the training of my neural networks. I haven't seen this approach before and I'm excited to try it out.
hnnewbie 7 months ago next
Same here. I've been struggling to optimize my network's training and this method sounds promising. Do you have any links to resources that explain the implementation?
mathgenius 7 months ago prev next
This makes sense, especially from a mathematical point of view. The use of differential equations for training neural networks is elegant and efficient. I can't wait to try it out!
code_monkey_1 7 months ago next
Ooh, I can't wait to use this to tweak my reinforcement learning agent's neural network training. Thanks for sharing this!
skepticaldev 7 months ago prev next
I've heard of training a network using ODE solvers, but how does this method compare to others? Why not just use classic gradient descent with backpropagation?
optimizationking 7 months ago next
Great question! In fact, this method is inspired by techniques like annealed importance sampling and weighted stochastic gradient MCMC, since it also follows a multi-scale process. Adapting those techniques to neural network training can improve training time.
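For intuition only, here's my own toy sketch of the annealed SG-MCMC idea, not the authors' algorithm: ramp an inverse temperature so early updates explore at a coarse scale and later ones refine. The quadratic loss, step size, and schedule below are all made up.

    import numpy as np

    rng = np.random.default_rng(0)

    def loss_grad(theta):
        # Toy quadratic loss L(theta) = 0.5 * ||theta||^2, so the gradient is theta.
        return theta

    theta = rng.normal(size=5)
    step = 1e-2
    n_steps = 2000

    for t in range(n_steps):
        # Anneal the inverse temperature from 1 up to 100 (coarse -> fine scale).
        beta = 1.0 + 99.0 * t / (n_steps - 1)
        noise = rng.normal(size=theta.shape)
        # Langevin-style update: gradient step plus temperature-scaled noise.
        theta = theta - step * loss_grad(theta) + np.sqrt(2.0 * step / beta) * noise

    print(theta)  # ends up near the minimum at 0

The large early noise lets the chain hop between basins, and the shrinking noise at high beta behaves more like plain gradient descent.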
nostalgichacker 7 months ago prev next
Tried it out, but it wasn't always faster than standard gradient descent methods. With smaller networks, the difference in performance was negligible. Perhaps with larger networks it might have more impact?
performanceguru 7 months ago next
Yes, in fact, the authors state that the method scales better for larger networks. With more neurons and layers, optimizing training time becomes crucial. Speedups should become more noticeable as network size increases.
machinelearningresearcher 7 months ago prev next
Sure, the authors presented their approach in this paper: '[ URL to the paper ]'. It's well-written and easy to follow. I've seen some really great results when putting it to the test.
roboticsresearcher 7 months ago prev next
Amazing! I wonder if applying this to continuous-time robotics models could help improve the performance as well.
anonymous 7 months ago prev next
I haven't had a chance to read the paper yet. Can anyone share the gist of the proposed method in a concise and simple manner?
cliffsnotes 7 months ago next
It basically replaces the gradient-descent update in training with the solution of an ODE, and by smoothly adjusting that ODE as it runs, the algorithm converges more quickly. Similar to some recent advances in MCMC techniques, it helps with multiscale objectives.
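If it helps, here's a bare-bones sketch of the plain gradient-flow part (my own toy least-squares example, not the paper's method): treat dw/dt = -grad L(w) as an ODE and hand it to an off-the-shelf solver instead of taking fixed gradient steps. The "smooth adjustment" of the ODE would sit on top of this, which I haven't tried to reproduce.

    import numpy as np
    from scipy.integrate import solve_ivp

    # Toy problem: recover w_true from y = X @ w_true with a least-squares loss.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 3))
    w_true = np.array([1.0, -2.0, 0.5])
    y = X @ w_true

    def grad(w):
        # Gradient of 0.5 * ||X w - y||^2 / n with respect to w.
        return X.T @ (X @ w - y) / len(y)

    def gradient_flow(t, w):
        # Gradient-flow ODE: dw/dt = -grad L(w).
        return -grad(w)

    w0 = np.zeros(3)
    sol = solve_ivp(gradient_flow, t_span=(0.0, 50.0), y0=w0, method="RK45")
    print(sol.y[:, -1])  # should be close to w_true

The adaptive step sizes of the ODE solver are doing the work that a hand-tuned learning-rate schedule would normally do.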