123 points by datajunkie 6 months ago 40 comments
johnsmith 6 months ago
Fascinating approach! For years now, I've been wondering how we can improve file compression algorithms.
codermaster 6 months ago
Some claim that current methods are sufficient for their use cases, but hopefully this spurs innovation in the field.
anonymous 6 months ago
It might not be a groundbreaking paradigm shift, but it serves as a stepping stone. It's good to see the field's progress regardless!
compressionguru 6 months ago
Indeed, the author's chosen strategy takes advantage of some of the most recent advances in data encoding and compression theory. Looking forward to taking a closer look!
jane13 6 months ago
Absolutely! I'm excited to see how this can impact the file storage industry on a larger scale. It could mean big savings in energy costs.
alice_jones 6 months ago
Does anyone have experience comparing the new compression method with existing standards such as gzip, xz, and bzip2?
data_wizz 6 months ago
I've run some comparisons, and the new approach significantly outperforms the other common methods at lower compression ratios. At higher compression ratios, however, it can struggle. I think it would be perfect for real-time data streaming, though!
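For anyone wanting to set up a baseline for this kind of comparison, here is a minimal sketch using only the codecs in Python's standard library (zlib for gzip-style DEFLATE, bz2, and lzma for xz). The new method is not included, since its API isn't shown in this thread. The function name `compare_codecs` and the sample data are made up for illustration, and a real benchmark would need a representative corpus and repeated timing runs.

```python
import bz2
import lzma
import time
import zlib


def compare_codecs(data: bytes) -> dict:
    """Return {codec name: (compression ratio, seconds)} for stdlib codecs."""
    codecs = {
        "zlib (gzip's DEFLATE)": lambda d: zlib.compress(d, 6),
        "bz2": lambda d: bz2.compress(d, 9),
        "lzma (xz)": lambda d: lzma.compress(d),
    }
    results = {}
    for name, compress in codecs.items():
        start = time.perf_counter()
        packed = compress(data)
        elapsed = time.perf_counter() - start
        # Ratio is original size divided by compressed size (higher is better).
        results[name] = (len(data) / len(packed), elapsed)
    return results


if __name__ == "__main__":
    # Hypothetical sample: highly repetitive log-like text.
    sample = b"timestamp=1700000000 level=INFO msg=request served\n" * 20000
    for name, (ratio, seconds) in compare_codecs(sample).items():
        print(f"{name:24s} ratio={ratio:8.1f}x  time={seconds * 1000:8.2f} ms")
```

Reporting ratio and time together matters here, because a codec that wins on speed at low ratios can still lose once the target ratio goes up.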
codesnacks 6 months ago
Very impressive indeed! How easy would it be to adapt the algorithm for general use and commercial purposes?
bitcrusher 6 months ago
Talented work; I'd recommend optimizing the Python implementation. One could use CFFI, for example, to call the computationally intensive parts once they are rewritten in C.
brainbot 6 months ago
Sure, link this good citizen: <https://github.com/XKCD936/filestore-on-demand-tests>
pinguinnn 6 months ago
Added, many thanks! We'll use your repo for benchmarking. Do send us your feedback reports. Edit: <https://github.com/XKCD936/filestore#benchmarking-criteria>
the_wave 6 months ago
I'd like to know more about benchmarking against existing solutions. What test data and criteria have been used?
storagequeen 6 months ago
Agreed! Transparency is key to crediting, reproducibility, and future collaboration. I hope to see more documentation.
numbertheory 6 months ago
Software-based compression solutions aren't the only ones around. In the era of hardware accelerators and GPUs, is there a corresponding hardware-supported implementation in the works?
secretsquirrel 6 months ago
We're planning to include a hardware compression library for modern GPUs. For now, we expect hardware-accelerated software to suffice.
mathwiz 6 months ago
I suppose that could alleviate the bottleneck, but it would cause compatibility issues. Might it be more viable to explore GPGPU parallelism for a balanced solution?
wireframeric 6 months ago
On the contrary, the GPGPU approach exacerbates the problem with its increased resource consumption. Devices like mobile phones are left out in the cold.
geekgirl 6 months ago
Is the code open-sourced for others to expand upon/test? A community challenge to improve compression rates could prove very fruitful.
freaksk8r 6 months ago
Some parts of the code have been released, but they intend to hold back the key innovations to secure a patent. Thoughts?
ramalama 6 months ago
Sadly, closed-source code isn't very welcome in ICS research and development these days. Might impede progress.
victoriana 6 months ago
This work definitely shines a positive light on the potential for machine learning in beyond-human-scale optimization problems. Good job and best of luck on the project.
cychow 6 months ago
Your statement raises another important question: in which use cases will this be more favorable than variable-length codes or perfect hashing?
genymar 6 months ago
It's excellent for data streams whose decoders need access to only specific sections. In these instances, not being forced to decompress all content is a huge advantage.
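To make the partial-access idea concrete, here is a minimal sketch of one generic way to get it: compress fixed-size blocks independently and keep an index of where each compressed block lives. This is not the author's method, which isn't described in the thread. It uses zlib as a stand-in codec, and `BLOCK_SIZE`, `compress_blocks`, and `read_range` are hypothetical names.

```python
import zlib

BLOCK_SIZE = 64 * 1024  # hypothetical block size; tune per workload


def compress_blocks(data: bytes, block_size: int = BLOCK_SIZE):
    """Compress each block independently and return (payload, index).

    index[i] holds the (offset, length) of compressed block i in payload.
    """
    payload = bytearray()
    index = []
    for start in range(0, len(data), block_size):
        packed = zlib.compress(data[start:start + block_size])
        index.append((len(payload), len(packed)))
        payload += packed
    return bytes(payload), index


def read_range(payload: bytes, index, offset: int, length: int,
               block_size: int = BLOCK_SIZE) -> bytes:
    """Return original bytes [offset, offset + length), touching only the
    blocks that overlap that range."""
    if length <= 0:
        return b""
    first = offset // block_size
    last = min((offset + length - 1) // block_size, len(index) - 1)
    chunks = []
    for i in range(first, last + 1):
        pos, size = index[i]
        chunks.append(zlib.decompress(payload[pos:pos + size]))
    joined = b"".join(chunks)
    skip = offset - first * block_size
    return joined[skip:skip + length]
```

The trade-off is that independent blocks cannot share context, so smaller blocks give cheaper random reads but a worse overall ratio.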
codeooze 6 months ago
Neat! Quite an inventive use of dynamic Huffman coding. I wonder how it scales with file size and concurrent access.
mastermind 6 months ago
Scaling could be better with a simple alternative: hybridizing LZ77 with adaptive Huffman coding. Some applications may see worthwhile gains.
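A quick way to see what an LZ77 stage adds on top of Huffman coding is zlib itself. DEFLATE pairs LZ77 matching with Huffman codes chosen per block, which is not the same thing as one-pass adaptive Huffman coding, so treat this as a related stand-in rather than the hybrid proposed above. The sketch compares zlib's `Z_HUFFMAN_ONLY` strategy (no string matching) with the default strategy; the helper name `deflate` and the sample text are assumptions for illustration.

```python
import zlib


def deflate(data: bytes, strategy: int) -> bytes:
    """Compress data as a zlib stream using the given zlib strategy."""
    compressor = zlib.compressobj(
        9,                   # compression level
        zlib.DEFLATED,       # the only method zlib supports
        zlib.MAX_WBITS,      # largest LZ77 window, zlib header and trailer
        zlib.DEF_MEM_LEVEL,  # default internal memory usage
        strategy,
    )
    return compressor.compress(data) + compressor.flush()


if __name__ == "__main__":
    # Hypothetical sample with long repeated substrings.
    text = b"the quick brown fox jumps over the lazy dog\n" * 5000
    huffman_only = deflate(text, zlib.Z_HUFFMAN_ONLY)
    lz77_plus_huffman = deflate(text, zlib.Z_DEFAULT_STRATEGY)
    print("original:         ", len(text))
    print("Huffman only:     ", len(huffman_only))
    print("LZ77 plus Huffman:", len(lz77_plus_huffman))
```

On repetitive input the Huffman-only output stays close to the per-symbol entropy, while the combined mode also collapses the repeats, which is the gain a hybrid is after.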
robotron 6 months ago
Regarding skiing: https://xkcd.com/936/ Anyway, I'd like to volunteer a MacBook for compatibility and performance testing purposes.
storagekid 6 months ago
Thanks for offering! Any and all help is appreciated. Will open a GitHub issue for submission instructions. Edit: added the link: <https://github.com/XKCD936/filestore/issues/new>👍
magicfunctions 6 months ago
Could this be applied to multimedia or video compression? Endless possibilities if successful.
arabiannights 6 months ago
Has anyone considered the potential benefits of utilizing quantum computing for this application?
qubit47 6 months ago
Quantum computing could be promising, but current techniques have downsides and substantial overhead. At present, the focus on classical algorithms makes more sense.
humptydumpty 6 months ago
The use of recursive statistical models is a smart tactic! I couldn't find much on error correction, though, so how does it handle corrupted data or lost packets?
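The thread doesn't say how the method handles this. As a generic illustration only, one common option is to frame each compressed block with its length and a CRC-32 so that damage is at least detected before decoding. This detects corruption but does not correct it; recovery would need retransmission or an error-correcting code on top. The header layout and the names `HEADER`, `frame`, and `unframe` are hypothetical, and zlib stands in for the actual codec.

```python
import struct
import zlib

# Hypothetical frame header: payload length and CRC-32, both big-endian uint32.
HEADER = struct.Struct(">II")


def frame(block: bytes) -> bytes:
    """Compress a block and prefix it with its length and checksum."""
    packed = zlib.compress(block)
    return HEADER.pack(len(packed), zlib.crc32(packed)) + packed


def unframe(framed: bytes) -> bytes:
    """Verify and decompress one frame; raise ValueError if it is damaged."""
    if len(framed) < HEADER.size:
        raise ValueError("truncated header")
    length, expected_crc = HEADER.unpack_from(framed)
    packed = framed[HEADER.size:HEADER.size + length]
    if len(packed) != length:
        raise ValueError("truncated frame")
    if zlib.crc32(packed) != expected_crc:
        raise ValueError("checksum mismatch")
    return zlib.decompress(packed)
```

Keeping frames small limits how much data one damaged or lost packet takes with it, at the cost of a few bytes of header per frame.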
missluna 6 months ago
There's no mention of security. Will encrypted data work fine? Anything we need to be concerned about?
alice_jones 6 months ago
They used the GnuPG library to encrypt compressed data. No issues were noticed at this stage, but more rigorous tests should be carried out.
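The order matters in general: well-encrypted output looks like random bytes, and random bytes don't compress, so compression has to come before encryption. Here is a minimal sketch of that effect. It does not call GnuPG or any real cipher; `os.urandom` is used as a stand-in for ciphertext, and the helper name and sample data are made up.

```python
import os
import zlib


def compression_ratio(data: bytes) -> float:
    """Original size divided by zlib-compressed size (higher is better)."""
    return len(data) / len(zlib.compress(data, 9))


if __name__ == "__main__":
    plaintext = b"sensor=42 status=ok\n" * 5000
    # Stand-in for encrypted bytes of the same size; no real cipher is used.
    ciphertext_like = os.urandom(len(plaintext))
    print(f"plaintext ratio:       {compression_ratio(plaintext):.1f}x")
    print(f"ciphertext-like ratio: {compression_ratio(ciphertext_like):.3f}x")
```

The second ratio comes out at or slightly below 1, because the compressor finds nothing to exploit and still adds its own framing.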
algorithms_freak 6 months ago
Thanks for the information. Have you tried alternative libraries like NaCl, OpenSSL, or even openssl_cython?
sarcasticrob 6 months ago
Bloat gets added when creators become too focused on features and 'catching up' with the competition. They often forget essential elements -- simplicity and fast integration!
captainduckduck 6 months ago
Looks like this method has a scalability issue in some cases. Could a hierarchical model address that concern?
mustache 6 months ago
It's worth exploring LZ4 and Zstandard implementations with the algorithm. Maybe this exotic fruit is just a better blend?
cydonia 6 months ago
Some cloud providers have API-level support. As for compatibility, it should be possible to develop a standard for middleware adapters. What do you think?
live2code 6 months ago
Great progress! Just one question: how does your solution integrate with cloud-based/distributed storage services?
probability 6 months ago
Fantastic to read! Really glad to learn about a promising novel approach in this area. Good luck with the endeavor!