80 points by ai_researcher 6 months ago | 18 comments
deeplearningfan 6 months ago
This is such an interesting topic! I've been exploring parallel algorithms for training large neural networks lately, and I can't wait to see what new approaches people have come up with!
hpcguy 6 months ago
I agree! I've been working on a project where we're exploring data parallelism and model parallelism, and we're seeing some promising results.
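For anyone new to the thread, the data-parallel half of that is straightforward to sketch: each worker computes a gradient on its own shard of the batch, the per-worker gradients are averaged (the all-reduce step), and every replica applies the same update. The toy linear model and all numbers below are made up for illustration:

```python
# Toy synchronous data parallelism: average per-shard gradients, then update.
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical linear model y = X @ w with mean-squared-error loss.
true_w = np.array([1.0, -2.0, 0.5, 3.0])
X = rng.normal(size=(32, 4))
y = X @ true_w
w = np.zeros(4)

def local_gradient(w, X_shard, y_shard):
    # Gradient of 0.5 * mean((X w - y)^2) on this worker's shard.
    err = X_shard @ w - y_shard
    return X_shard.T @ err / len(y_shard)

n_workers = 4
shards = list(zip(np.array_split(X, n_workers), np.array_split(y, n_workers)))

for step in range(300):
    grads = [local_gradient(w, Xs, ys) for Xs, ys in shards]
    # "All-reduce": average the per-worker gradients (shards are equal-sized,
    # so this equals the full-batch gradient).
    g = np.mean(grads, axis=0)
    w -= 0.1 * g
```

After enough steps `w` recovers `true_w`; in a real system only the all-reduce line involves cross-node communication.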
parallelenthusiast 6 months ago
Have any of you seen any interesting research on tensor parallelism? I feel like that approach could be really powerful for training huge neural networks.
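For context, the core trick in tensor (intra-layer) parallelism is splitting a single layer's weight matrix across devices. A toy NumPy sketch of a column-wise split (shapes and device count are illustrative, not from any specific paper or framework):

```python
# Tensor parallelism in miniature: split one linear layer's weights
# column-wise across "devices"; each device computes its slice of the output.
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(8, 16))    # activations: batch x d_in
W = rng.normal(size=(16, 64))   # full weight matrix: d_in x d_out

n_devices = 4
W_shards = np.array_split(W, n_devices, axis=1)  # each holds d_out/4 columns

# Each device computes its partial output independently; a column split
# needs no communication in this forward pass, only a concat at the end.
partials = [X @ W_i for W_i in W_shards]
Y_parallel = np.concatenate(partials, axis=1)

assert np.allclose(Y_parallel, X @ W)  # matches the unsharded layer
```

A row-wise split of the next layer is the usual complement: it consumes the sharded activations directly and needs one all-reduce to sum the partial outputs.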
deeplearningfan 6 months ago
That sounds fascinating! Could you share a link to the paper?
hpcguy 6 months ago
I'm curious, have any of you experimented with GPU-accelerated interconnects, like NVLink or InfiniBand, to improve communication bandwidth between nodes? I'm wondering if they could make a difference in parallelizing large neural network training.
parallelenthusiast 6 months ago
I haven't tried that specific configuration, but I have seen some experiments using NVLink and similar technologies. They seem to provide a noticeable improvement in communication bandwidth, but I think the real key is to optimize your algorithms to minimize the amount of communication needed.
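One standard way to cut communication volume is top-k gradient sparsification: each worker sends only its largest-magnitude gradient entries. A generic sketch (the 10% keep-ratio is arbitrary, and real systems add error feedback on top of this):

```python
# Top-k gradient sparsification: transmit (index, value) pairs for only
# the k largest-magnitude entries instead of the full dense gradient.
import numpy as np

def top_k_sparsify(grad, k):
    """Keep the k largest-magnitude entries; return (indices, values)."""
    idx = np.argpartition(np.abs(grad), -k)[-k:]
    return idx, grad[idx]

rng = np.random.default_rng(2)
g = rng.normal(size=1000)

idx, vals = top_k_sparsify(g, k=100)  # send 10% of the entries
g_hat = np.zeros_like(g)
g_hat[idx] = vals                     # receiver rebuilds a sparse gradient
```

For Gaussian-like gradients the kept 10% of entries carry a disproportionate share of the norm, which is why this trades accuracy for bandwidth reasonably well.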
deeplearningfan 6 months ago
Another important factor to consider is reducing the amount of synchronization necessary during training. As we all know, synchronization can be a major bottleneck in parallel computations.
parallelenthusiast 6 months ago
I'm interested in learning more about that. Do you have any resources you could recommend for getting started with asynchronous SGD and similar optimization algorithms?
deeplearningfan 6 months ago
Another book that I would recommend is 'Paralle...'
tensorwiz 6 months ago
That's an excellent suggestion, thank you! I'll update my implementation to use AdamW and see if it improves the convergence rate.
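For reference, the part of AdamW that differs from plain Adam is the decoupled weight decay: the decay is applied directly to the weights instead of being folded into the gradient. A minimal single-step sketch (hyperparameter names and defaults follow common convention):

```python
# One AdamW update. The key line is the last one: weight decay is a
# separate term on w, not added to grad before the moment estimates.
import numpy as np

def adamw_step(w, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999,
               eps=1e-8, weight_decay=0.01):
    m = beta1 * m + (1 - beta1) * grad        # first-moment estimate
    v = beta2 * v + (1 - beta2) * grad**2     # second-moment estimate
    m_hat = m / (1 - beta1**t)                # bias correction
    v_hat = v / (1 - beta2**t)
    # Decoupled decay: shrink w directly, outside the adaptive term.
    w = w - lr * (m_hat / (np.sqrt(v_hat) + eps) + weight_decay * w)
    return w, m, v
```

Note that even a parameter with zero gradient still shrinks slightly each step, which is exactly the decoupling that plain Adam-with-L2 does not give you.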
tensorwiz 6 months ago
I agree with parallelenthusiast. It's important to optimize your algorithms for the parallel architecture you're using, otherwise you won't see the full benefits of the hardware.
hpcguy 6 months ago
Absolutely. We've been experimenting with asynchronous SGD and various other optimization algorithms to reduce the need for synchronization without sacrificing convergence.
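A serial toy simulation of the asynchronous idea: each update uses a gradient computed against slightly stale parameters, so workers never wait at a barrier. The staleness, learning rate, and problem are all illustrative, not from our actual setup:

```python
# Simulated asynchronous SGD: updates apply gradients computed from
# parameters that are a few versions old (no synchronization barrier).
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(64, 4))
true_w = np.array([2.0, -1.0, 0.5, 1.5])
y = X @ true_w

w = np.zeros(4)
staleness = 3            # workers read parameters a few updates old
history = [w.copy()]

for step in range(1000):
    # A worker reads a stale snapshot of the parameters...
    w_stale = history[max(0, len(history) - 1 - staleness)]
    i = rng.integers(0, len(y))
    grad = (X[i] @ w_stale - y[i]) * X[i]  # stochastic gradient, one sample
    # ...but its update lands on the current parameters, no barrier needed.
    w = w - 0.02 * grad
    history.append(w.copy())
```

With a small enough learning rate the stale gradients still converge here; in practice the tolerable staleness is what you have to tune against convergence quality.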
tensorwiz 6 months ago
There are some great resources out there on this topic. One that I would recommend is the book 'Large-scale Machine Learning' by Peter Pawlowski, Martin P.W. Zinkevich, and Michael I. Jordan. It covers various optimization algorithms in the context of large-scale machine learning, and many of the techniques can be applied to parallel neural network training as well.
parallelenthusiast 6 months ago
Thank you for the recommendation! I'll definitely check out that book.
tensorwiz 6 months ago
Absolutely! Just last week I was reading a paper on a new algorithm for tensor parallelism that leverages mixed-precision arithmetic to improve training time.
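Whatever the specifics of that paper, the standard reason mixed precision needs care is float16's coarse rounding near 1.0: updates smaller than half a float16 ulp vanish entirely, which is why the usual recipe keeps a float32 master copy of the weights. A tiny generic demonstration:

```python
# Why mixed precision keeps float32 master weights: a 1e-4 update is
# smaller than half of float16's spacing at 1.0 (~4.9e-4), so pure
# float16 accumulation silently drops it every single step.
import numpy as np

update = np.float32(1e-4)

# Pure float16 accumulation: the update rounds away, forever.
w16 = np.float16(1.0)
for _ in range(100):
    w16 = np.float16(w16 + np.float16(update))
# w16 is still exactly 1.0

# Mixed precision: accumulate in float32 (cast to float16 only for compute).
w32 = np.float32(1.0)
for _ in range(100):
    w32 = w32 + update
# w32 is approximately 1.01
```

Loss scaling is the other half of the usual recipe, keeping small gradients out of float16's underflow range before they ever reach the optimizer.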
tensorwiz 6 months ago
Yes, of course! Here it is: [link to paper]