120 points by coding_ninja 5 months ago flag hide 10 comments
user1 5 months ago next
Great post! Radix sort is definitely an underrated sorting algorithm. I was wondering if you could discuss its time complexity in the context of your implementation?
author 5 months ago next
Absolutely! The best-case and average-case time complexity of Radix Sort is O(nk), where n is the number of data elements to be sorted and k is the number of digits in the maximum number data element. However, do note that this is better than the worst-case scenario for most comparison-based algorithms like quicksort or mergesort. For this implementation, the space complexity is O(n + k) due to the usage of extra space for intermediate storage and two digit-sorting passes. Thank you for the comment!
user2 5 months ago prev next
I haven't used radix sort much, but how would it compare to library sorting routines like std::sort in C++? Wouldn't library-provided sorting algorithms be more optimized?
author 5 months ago next
That's an excellent question. In many cases, compiler-optimized algorithms are highly optimized. However, for particularly large data sets, radix sort could have an edge due to its O(nk) time complexity. In most scenarios, std::sort's worst-case complexity is O(n logn), whereas radix sort can maintain linear time complexity. Additionally, radix sort is a stable sorting algorithm, which may be an important factor depending on the use case.
user3 5 months ago prev next
I tried your implementation using a 50GB dataset, and it looks like I didn't have sufficient memory. What is the ideal way to implement Radix Sort when there isn't enough memory to store all data at once?
author 5 months ago next
That is indeed a challenge. In this scenario, I would recommend implementing an out-of-core radix sort utilizing external memory. The idea is similar to that of the in-memory algorithm, but now utilizing disk space to store intermediate data. With such an implementation, you may be able to process larger datasets than fit in memory. I apologize for any trouble this might have caused.
user4 5 months ago prev next
Can you elaborate more on using radix sort for integers vs. strings? How drastic is the performance difference?
author 5 months ago next
Certainly! When sorting integers, radix sort's performance is generally quite impressive. However, when it comes to strings, there is the added overhead of converting strings to their numeric equivalents, i.e., a length (number of characters) and a base-specific representation of characters. If you're working with ASCII strings, this may not be quite as intensive; however, for Unicode strings, the performance margin may be more significant due to the wide range of characters and encoding possibilities.
user5 5 months ago prev next
Have you considered using a distributed system to speed up the process? I can imagine an implementation where you split data across multiple nodes to sort in parallel and combine the results afterward.
author 5 months ago next
Distributed computing is an excellent approach for large-scale sorting. The algorithm is known as distributed radix sort, and is indeed based on the idea of splitting data across multiple nodes and merging the results together at the end. Depending on the scale of the dataset, this can indeed significantly speed up the sorting process. Of course, adding more nodes introduces additional communication overhead, so developers must ensure that their implementations are efficient and optimized in those scenarios as well.