250 points by mlrevolution 1 year ago | 17 comments
johnsmith 1 year ago
Fascinating approach! I've been following recent ML training developments closely and this seems like a real breakthrough. Anyone know how well it scales to larger data sets?
doctorai 1 year ago
I had a chance to try it on a modest 50 GB data set and it fared really well. Half the training time, and 10% higher accuracy on average. It requires more memory, but it seems like a good trade-off.
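For anyone who wants to reproduce that kind of measurement, here's roughly how I timed it; a minimal sketch assuming PyTorch, where build_model, train_one_epoch, and val_accuracy are stand-ins for your own code:

    import time
    import torch

    def benchmark(optimizer_factory, epochs=10, device="cuda"):
        model = build_model().to(device)                  # your model here
        optimizer = optimizer_factory(model.parameters())
        torch.cuda.reset_peak_memory_stats(device)        # fresh memory counters
        start = time.perf_counter()
        for _ in range(epochs):
            train_one_epoch(model, optimizer)             # your training loop
        elapsed = time.perf_counter() - start
        peak_gb = torch.cuda.max_memory_allocated(device) / 1e9
        return elapsed, peak_gb, val_accuracy(model)      # your eval here

Run it once per optimizer and compare the three numbers; the memory column is where I saw the overhead.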
superlearner 1 year ago
Incredible results! Definitely going to investigate this more for my research team. Has anyone experimented with applying pruning techniques on top of this approach?
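For concreteness, by pruning I mean something like standard post-training magnitude pruning; a quick sketch with PyTorch's built-in utilities, nothing specific to the new method:

    import torch.nn as nn
    import torch.nn.utils.prune as prune

    def magnitude_prune(model, amount=0.3):
        # Zero out the smallest-magnitude 30% of weights in each layer.
        for module in model.modules():
            if isinstance(module, (nn.Linear, nn.Conv2d)):
                prune.l1_unstructured(module, name="weight", amount=amount)
                prune.remove(module, "weight")  # bake the mask into the weights
        return model

The open question is whether the accuracy gains survive once you prune.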
anonymous 1 year ago
Would be interested in learning more about the implementation. Anyone know if the source is available?
undergrad_researcher 1 year ago
Source code is available on GitHub, but be prepared: there's a steep learning curve. Head over to their documentation if you need help getting started: https://github.com/org/project
cscodewarrior 1 year ago
The accuracy improvements are promising; however, I'm a bit concerned about the added memory consumption and its potential impact on GPU utilization.
optifix 1 year ago
So far I've seen this approach compared mainly to SGD, which it outperforms fairly handily. It hasn't been compared directly to adaptive methods, though, so for a fair apples-to-apples comparison we'd need to factor in the time required to tune them.
professor 1 year ago
That's not a problem, since tuning efficiency is part of the paper's conclusions: it examines not only final test accuracy but also the time required to reach it. Time shouldn't be overlooked.
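To make that concrete, the metric is time-to-target-accuracy rather than final accuracy alone; a rough sketch, with train_one_epoch and evaluate as placeholders for your own code:

    import time

    def time_to_accuracy(model, optimizer, target=0.95, max_epochs=100):
        start = time.perf_counter()
        for _ in range(max_epochs):
            train_one_epoch(model, optimizer)          # your training loop
            if evaluate(model) >= target:              # your validation pass
                return time.perf_counter() - start     # seconds to hit target
        return None  # target never reached within the budget

Any tuning runs for the baseline should be charged to its clock as well, which is exactly the point being made upthread.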
phd_student 1 year ago
https://github.com/org/project provides several proxy environments for testing in limited-hardware settings. Based on those results, it still demonstrates an accuracy improvement, but more extensive real-world tests are needed.
datascientist123 1 year ago
That's really interesting. I'd be keen to read a case study once one is available, or even to participate in a survey on the hardware constraints researchers face.
researchroundup 1 year ago
We are planning to publish a series of case studies and surveys in the coming months. Stay tuned for updates on arXiv and in our monthly newsletter!
mlpioneer 1 year ago
Any chance we can see a comparison of this method against a perfectly tuned standard method like Adam, RMSprop, etc.?
jupyterjock 1 year ago
In their initial experiments the team used Adam as the baseline. The results have since been updated and show the new training approach outperforming Adam as well.
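If anyone wants to redo that comparison, it's just an optimizer swap under an otherwise identical setup; a sketch reusing the time_to_accuracy helper from upthread, with NewOptimizer as a hypothetical stand-in since I don't know the released class name:

    import torch

    def run(optimizer_factory):
        model = build_model()                         # same architecture every run
        opt = optimizer_factory(model.parameters())
        return time_to_accuracy(model, opt)

    adam_secs = run(lambda p: torch.optim.Adam(p, lr=1e-3))  # tune this lr too
    new_secs = run(lambda p: NewOptimizer(p))                # hypothetical name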
scriptkiddie 1 year ago
I'm curious about how this will perform on constrained hardware, as many researchers have limited resources. Has anyone tried it on a GTX 1650, for instance?
hackernightowl 1 year ago
There haven't been many user tests on GTX 1650s, but a colleague who runs an ML club at a local high school has been trying it out. Will post an update when they have more conclusive findings.
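In the meantime, if you want to try squeezing it onto a 1650 yourself, the usual memory tricks apply; a sketch with mixed precision plus gradient accumulation (plain PyTorch, nothing specific to this method; model, loader, and optimizer are your own):

    import torch

    scaler = torch.cuda.amp.GradScaler()
    accum_steps = 4  # effective batch size = loader batch size * accum_steps

    optimizer.zero_grad()
    for step, (x, y) in enumerate(loader):
        with torch.cuda.amp.autocast():
            loss = torch.nn.functional.cross_entropy(model(x.cuda()), y.cuda())
        scaler.scale(loss / accum_steps).backward()  # scaled for fp16 safety
        if (step + 1) % accum_steps == 0:
            scaler.step(optimizer)   # unscales gradients, then steps
            scaler.update()
            optimizer.zero_grad()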
pseudocode00 1 year ago
This is one of the most impressive improvements in ML training that I've seen in recent years. I can't wait to see how this community puts it to use!
accidentallinuxuser 1 year ago
It's definitely caught my attention. Keep the information coming! Just tried it on MNIST and saw a nice accuracy boost.