250 points by mlrevolution 6 months ago | 17 comments
johnsmith 6 months ago
Fascinating approach! I've been following recent ML training developments closely and this seems like a real breakthrough. Anyone know how well it scales to larger data sets?
doctorai 6 months ago
I had a chance to try it on a modest 50 GB data set and it fared really well. Half the training time, and 10% higher accuracy on average. It requires more memory, but it seems like a good trade-off.
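For anyone who wants to measure that trade-off on their own workload, a minimal PyTorch sketch (model, loader, and train_one_epoch are placeholders for your own code; nothing here is specific to the new method):

    import time
    import torch

    # Minimal harness for quantifying the speed/memory trade-off.
    # `train_one_epoch` stands in for whatever training loop you use.
    def benchmark(model, loader, train_one_epoch, epochs=3):
        torch.cuda.reset_peak_memory_stats()
        start = time.perf_counter()
        for _ in range(epochs):
            train_one_epoch(model, loader)
        torch.cuda.synchronize()  # flush pending GPU work before stopping the clock
        elapsed = time.perf_counter() - start
        peak_mb = torch.cuda.max_memory_allocated() / 2**20
        print(f"{elapsed:.1f}s for {epochs} epochs, peak GPU memory {peak_mb:.0f} MiB")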
superlearner 6 months ago
Incredible results! Definitely going to investigate this more for my research team. Has anyone experimented with applying pruning techniques on top of this approach?
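For anyone who wants to try, plain magnitude pruning is easy to bolt on in PyTorch; a rough sketch (the 30% amount is arbitrary, and whether it plays nicely with the new training method is exactly the open question):

    import torch.nn as nn
    import torch.nn.utils.prune as prune

    def prune_linear_layers(model, amount=0.3):
        # Zero out the smallest-magnitude 30% of weights in every Linear layer.
        for module in model.modules():
            if isinstance(module, nn.Linear):
                prune.l1_unstructured(module, name="weight", amount=amount)
                prune.remove(module, "weight")  # bake the mask in permanently
        return model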
anonymous 6 months ago
Would be interested in learning more about the implementation. Anyone know if the source is available?
undergrad_researcher 6 months ago
Source code is available on GitHub, but be prepared: there's a steep learning curve. Head over to their documentation if you need help getting started: https://github.com/org/project
cscodewarrior 6 months ago
The improvements in accuracy are promising; however, I'm a bit concerned about the added memory consumption and the potential impact on GPU utilization.
optifix 6 months ago
So far I've seen this approach compared mainly to SGD, which it outperforms fairly handily. Since there's no direct comparison to adaptive methods, we'd need to factor in the time required to tune those for a fair apples-to-apples comparison.
professor 6 months ago
That's not a problem, since tuning efficiency is part of the paper's conclusions. It examines not only the final test accuracy but also the time required to reach it, and that time shouldn't be overlooked.
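For anyone reproducing that analysis, time-to-target-accuracy is simple to measure; a minimal sketch (the 0.95 target and both callables are placeholders, not anything from the paper):

    import time

    def time_to_accuracy(train_one_epoch, evaluate, target=0.95, max_epochs=100):
        # Wall-clock seconds until validation accuracy first reaches `target`,
        # or None if it never does. Both callables stand in for your own
        # training and evaluation code.
        start = time.perf_counter()
        for _ in range(max_epochs):
            train_one_epoch()
            if evaluate() >= target:
                return time.perf_counter() - start
        return None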
phd_student 6 months ago
https://github.com/org/project provides several proxy environments to test limited hardware settings. Based on the results, it still demonstrates an accuracy improvement, but more extensive real-world tests are needed.
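If you'd rather improvise your own proxy environment, one generic trick is to cap PyTorch's memory pool to emulate a smaller card (this only approximates a real low-memory GPU and has nothing to do with the repo's own setup):

    import torch

    # Emulate a ~4 GB card on a bigger GPU by capping the per-process
    # memory pool; allocations past the cap raise OOM. Only a rough proxy:
    # it doesn't model the slower compute of a real budget card.
    total = torch.cuda.get_device_properties(0).total_memory
    torch.cuda.set_per_process_memory_fraction(min(1.0, 4 * 2**30 / total), device=0)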
datascientist123 6 months ago
That's really interesting. I'd be keen to read a case study once one is available, or even to participate in a survey on the hardware constraints researchers face.
researchroundup 6 months ago
We are planning to publish a series of case studies and surveys in the coming months. Stay tuned for updates on arXiv and in our monthly newsletter!
mlpioneer 6 months ago
Any chance we can see a comparison of this method against a perfectly tuned standard optimizer like Adam or RMSprop?
jupyterjock 6 months ago
In their initial experiments, the team used Adam as the baseline. The results have since been updated and show the new training approach outperforming Adam as well.
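For anyone wanting to sanity-check that baseline themselves, "well-tuned Adam" usually just means a small sweep over learning rate and weight decay; a rough sketch, with make_model and train_and_eval standing in for your own code:

    import itertools
    import torch

    def tune_adam(make_model, train_and_eval):
        # Tiny grid over Adam's most sensitive knobs. `make_model` builds a
        # fresh model; `train_and_eval` trains it with the given optimizer
        # and returns validation accuracy.
        best_cfg, best_acc = None, -1.0
        for lr, wd in itertools.product([3e-4, 1e-3, 3e-3], [0.0, 1e-2]):
            model = make_model()
            opt = torch.optim.Adam(model.parameters(), lr=lr, weight_decay=wd)
            acc = train_and_eval(model, opt)
            if acc > best_acc:
                best_cfg, best_acc = (lr, wd), acc
        return best_cfg, best_acc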
scriptkiddie 6 months ago
I'm curious about how this will perform on constrained hardware, as many researchers have limited resources. Has anyone tried it on a GTX 1650, for instance?
hackernightowl 6 months ago
There haven't been many user tests on GTX 1650s yet, but a colleague who runs an ML club at a local high school has been experimenting with one. Will post an update when they have more conclusive findings.
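In the meantime, standard PyTorch tricks go a long way on a 4 GB card; a sketch combining mixed precision with gradient accumulation (nothing specific to the new method; model, loader, loss_fn, and optimizer are your own):

    import torch

    def train_low_memory(model, loader, loss_fn, optimizer, accum_steps=4):
        # Mixed precision + gradient accumulation: the usual tricks for
        # squeezing training onto a 4 GB card like the GTX 1650.
        scaler = torch.cuda.amp.GradScaler()
        optimizer.zero_grad()
        for step, (x, y) in enumerate(loader):
            x, y = x.cuda(), y.cuda()
            with torch.cuda.amp.autocast():
                loss = loss_fn(model(x), y) / accum_steps
            scaler.scale(loss).backward()
            if (step + 1) % accum_steps == 0:
                scaler.step(optimizer)
                scaler.update()
                optimizer.zero_grad()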
pseudocode00 6 months ago
This is one of the most impressive improvements in ML training that I've seen in recent years. I can't wait to see how this community puts it to use!
accidentallinuxuser 6 months ago
It's definitely caught my attention. Keep the information coming! Just implemented this on MNIST and saw a nice boost in accuracy.