1234 points by alex_c 2 years ago | 26 comments
john_doe 2 years ago
This is fascinating! The new architecture seems to achieve state-of-the-art results on multiple benchmarks. How does it compare to other popular architectures?
erica_martin 2 years ago
Congratulations to the authors! It would be interesting to see this in practice. Do you have any code or demo available?
jane_doe 2 years ago
I'd like to add to Erica's question. What kind of hardware do you recommend for running the training?
john_doe 2 years ago
For serious results, TPUs offer the best performance, though GPUs are more accessible.
chip_innovator 2 years ago
Once 3rd-generation TPUs are available to everyone, training should become more affordable for many.
technical_blogger 2 years ago
Hopefully, the broader rollout of new TPUs will ease the burden of high energy consumption in deep learning.
ai_researcher 2 years ago
It outperforms previous architectures by a significant margin. The improved modularity and the new sparse attention mechanism allow for better generalization with fewer parameters.
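The comment doesn't say which sparsity pattern the architecture uses. As one common example (not necessarily the authors'), local-window attention lets each query attend only to keys within a fixed distance, so the work grows with sequence length times window size instead of length squared. A minimal pure-Python sketch; `local_window_attention` and its `window` parameter are hypothetical names for illustration:

```python
import math
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def local_window_attention(queries: List[Vector], keys: List[Vector],
                           values: List[Vector], window: int) -> List[Vector]:
    """Scaled dot-product attention where query i only sees keys i-window..i+window.

    A generic sliding-window sparse pattern, shown purely as an illustration.
    """
    n = len(queries)
    scale = 1.0 / math.sqrt(len(queries[0]))
    outputs: List[Vector] = []
    for i, q in enumerate(queries):
        lo, hi = max(0, i - window), min(n, i + window + 1)
        visible = range(lo, hi)
        scores = [dot(q, keys[j]) * scale for j in visible]
        # Numerically stable softmax over the visible keys only.
        m = max(scores)
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        weights = [w / total for w in weights]
        # Weighted sum of the value vectors inside the window.
        out = [0.0] * len(values[0])
        for w, j in zip(weights, visible):
            for k, v in enumerate(values[j]):
                out[k] += w * v
        outputs.append(out)
    return outputs
```

With `window=0` each position simply returns its own value vector; setting `window` to the sequence length recovers ordinary dense attention. A production version would vectorize this on a GPU or TPU rather than loop in Python.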
ai_researcher 2 years ago
The code and demo will be available in our repository within the next few days.
code_monkey 2 years ago
Any idea how much the training of this network is going to cost? Just curious.
ai_researcher 2 years ago
It's difficult to give an exact estimate, but expect training on an average high-end GPU to cost around $5-10k in electricity.
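An electricity estimate like this comes down to average power draw times runtime times the local tariff. A rough sketch of that arithmetic; the function name and every number in the example (device count, wattage, duration, price per kWh) are hypothetical placeholders, not figures from the authors:

```python
def electricity_cost_usd(avg_power_watts: float, hours: float,
                         price_usd_per_kwh: float, num_devices: int = 1) -> float:
    """Electricity cost of a training run: energy in kWh times the tariff.

    Ignores cooling and other facility overhead; all inputs are assumptions.
    """
    energy_kwh = avg_power_watts * num_devices * hours / 1000.0
    return energy_kwh * price_usd_per_kwh


# Hypothetical run: 8 GPUs drawing 300 W each for two weeks at $0.15/kWh.
print(round(electricity_cost_usd(300, 24 * 14, 0.15, num_devices=8), 2))  # prints 120.96
```

The total scales linearly with each input, so doubling runtime, device count, or price doubles the bill.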
bob_green 2 years ago
This high energy consumption is unacceptable. We need better solutions ASAP.
code_wizard 2 years ago
I'm looking forward to seeing how this can be adapted to text data.
theoretical_thinker 2 years ago
While the concept is groundbreaking for images, extending it to text would require fundamentally different techniques. I'm excited to see the developments.
new_coder 2 years ago
As a beginner in neural networks, can anyone recommend resources for understanding this architecture?
simon_hack 2 years ago
Don't forget to add this to TensorFlow or PyTorch; that would make it more accessible.
the_architect 2 years ago
We have already integrated this architecture into a TensorFlow fork, and PyTorch support will be available soon as well.
code_experiment 2 years ago
Please post an update when the PyTorch implementation is ready. I'm excited to test it out.
katherine_bliss 2 years ago
I think it's important to recognize the advancements in this research, but we should also be aware of its potential implications, in particular the environmental impact of this kind of computing power.
deep_learning 2 years ago
Yes, it's crucial to consider energy consumption and find ways to optimize. There's ongoing research in this area as well.
tensor_flowy 2 years ago
Optimizing energy consumption is an ongoing process. Keep an eye out for our future updates with further optimizations.
richard_stacks 2 years ago
Any plans to tackle video data as well? I think this architecture would be awesome for videos.
ai_researcher 2 years ago
We're planning to apply this architecture to video data, but that work is still under development.
machine_vision 2 years ago
I'll follow the updates on the video data application. I also have a few techniques that might come in handy.
beth_logic 2 years ago
Will the datasets used for testing be released as well?
katherine_bliss 2 years ago
Thanks! That would be appreciated. If you could include carbon emissions figures as well, that would be very helpful.
futuristic 2 years ago
We're working on better carbon emissions tracking for our training processes.
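A training run's emissions are commonly estimated as the energy it consumes times the carbon intensity of the grid supplying that energy. A rough sketch under that assumption; the function name, the optional PUE (power usage effectiveness) multiplier, and the example values are hypothetical, not the team's actual tracking method:

```python
def training_emissions_kg(avg_power_watts: float, hours: float,
                          grid_kg_co2e_per_kwh: float, pue: float = 1.0) -> float:
    """Rough CO2-equivalent estimate (kg) for a training run.

    pue scales device energy up to whole-facility energy (cooling etc.);
    every input is an assumption supplied by the caller.
    """
    if min(avg_power_watts, hours, grid_kg_co2e_per_kwh) < 0 or pue < 1.0:
        raise ValueError("inputs must be non-negative and pue must be >= 1")
    energy_kwh = avg_power_watts * hours / 1000.0 * pue
    return energy_kwh * grid_kg_co2e_per_kwh
```

For instance, `training_emissions_kg(1000, 100, 0.4)` models 100 kWh on a grid emitting 0.4 kg CO2e per kWh and returns about 40 kg; the same run in a facility with a PUE of 1.5 comes to about 60 kg.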