123 points by quantum_master 6 months ago | 16 comments
user1 6 months ago next
Fascinating article! I've been keeping an eye on differential privacy for a while, and this definitely seems like a major step forward in protecting user data. Looking forward to implementing this in my own projects!
user2 6 months ago next
Absolutely! I've also been looking into DS systems that support differential privacy and I think it's a game-changer. Exciting times for ML!
user9 6 months ago next
When implementing differential privacy, it's crucial to strike a balance between the privacy budget, model accuracy, and the amount of noise that's introduced. How was that handled here so the noise was calibrated well?
user1 6 months ago next
An excellent question, user9. In this solution, the noise is added after the gradients are computed for a batch, which lets it be calibrated against a known sensitivity. They also clip each per-sample gradient to a bounded norm, so no single example can skew the update too much and the noise scale has a well-defined bound to work against.
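Roughly, a single update looks like this (a toy sketch of the per-sample clipping plus Gaussian noise idea, not the authors' actual code; all names and hyperparameters are made up):

    # Toy DP-SGD-style step for logistic regression; illustrative only.
    import numpy as np

    def dp_sgd_step(weights, X_batch, y_batch, lr=0.1, clip_norm=1.0, noise_mult=1.1):
        clipped = []
        for x, y in zip(X_batch, y_batch):
            pred = 1.0 / (1.0 + np.exp(-x @ weights))            # sigmoid
            grad = (pred - y) * x                                 # per-sample gradient
            norm = np.linalg.norm(grad)
            clipped.append(grad * min(1.0, clip_norm / (norm + 1e-12)))  # bound each sample's norm
        # Noise is calibrated to the clipping norm, so the sensitivity is known.
        noisy_sum = np.sum(clipped, axis=0) + np.random.normal(
            0.0, noise_mult * clip_norm, size=weights.shape)
        return weights - lr * noisy_sum / len(X_batch)

Clipping first makes each example's contribution bounded, which is exactly what lets the noise scale be set in a principled way.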
user5 6 months ago prev next
I've been wondering whether differential privacy has already been used like this to make sure business secrets aren't exposed by models trained on multiple companies' datasets. Is the technique applicable to such multi-party training setups?
user6 6 months ago next
You bring up a great point, user5. Differential privacy is usually framed as protecting individual records, but in multi-party training it can be applied at the level of each participant's contribution, bounding how much any one organization's data can influence, and leak through, the shared model.
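One common way to realize that is a DP-FedAvg-style aggregation: each party's model update is clipped, and noise is added to the aggregate. A rough sketch (illustrative names, not the article's method):

    # Party-level clipping + Gaussian noise over model updates; illustrative only.
    import numpy as np

    def aggregate_party_updates(updates, clip_norm=1.0, noise_mult=1.0):
        clipped = []
        for u in updates:                       # one update vector per organization
            norm = np.linalg.norm(u)
            clipped.append(u * min(1.0, clip_norm / (norm + 1e-12)))
        noisy_sum = np.sum(clipped, axis=0) + np.random.normal(
            0.0, noise_mult * clip_norm, size=updates[0].shape)
        return noisy_sum / len(updates)

No single party can move the aggregate by more than clip_norm, and the noise is calibrated against exactly that bound.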
user3 6 months ago prev next
Differential privacy can be a tough concept to wrap your head around, but when you get it, it's a fundamental breakthrough for algorithms that deal with highly sensitive data. Great to see it being applied to ML model training here!
user4 6 months ago next
Totally agree. And I think it's impressive how little the carefully calibrated noise degrades the model's accuracy here. That way you can still get high-quality results while providing real privacy guarantees.
user7 6 months ago prev next
This method uses a tree-based algorithm combined with the Smooth Sensitivity framework to calibrate the noise to just what the data actually requires. It's interesting to compare this to existing techniques like DP-SGD, or what Google published last year (their calibration was quite impressive). Is there a clear winner emerging in the field?
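(For anyone unfamiliar: smooth sensitivity means scaling the noise to a smoothed version of the local sensitivity of your data rather than the worst-case global bound. Here is the textbook construction for the median from Nissim, Raskhodnikova & Smith, purely as a toy sketch and nothing to do with the article's tree algorithm:)

    # beta-smooth sensitivity of the median of values clamped to [lo, hi].
    import math

    def smooth_sensitivity_median(x, beta, lo=0.0, hi=1.0):
        xs = sorted(min(max(v, lo), hi) for v in x)
        n = len(xs)
        m = (n - 1) // 2                       # lower median, 0-indexed

        def val(i):                            # pad with domain bounds outside the data
            return lo if i < 0 else hi if i >= n else xs[i]

        best = 0.0
        for k in range(n + 1):
            # worst local sensitivity after changing up to k records
            local = max(val(m + t) - val(m + t - k - 1) for t in range(k + 2))
            best = max(best, math.exp(-beta * k) * local)
        return best

    print(smooth_sensitivity_median([12.0, 14.5, 15.0, 15.2, 18.9, 21.0, 40.0],
                                    beta=0.1, lo=0.0, hi=100.0))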
user8 6 months ago next
The choice of technique depends on the application you have in mind. Google's solution is based on Gaussian noise, but in this solution, the tree-based algorithm uses Laplace noise, which provides more scalability for large neural networks. Personally, I think the Laplace version will become more popular due to its applicability to deep learning solutions.
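For concreteness, here's how the two mechanisms are typically calibrated for a query with known sensitivity (textbook formulas, nothing specific to the article; the Gaussian bound is the classic one and assumes epsilon < 1):

    import numpy as np

    def laplace_release(value, sensitivity, epsilon):
        # Pure epsilon-DP: Laplace noise with scale sensitivity / epsilon.
        return value + np.random.laplace(0.0, sensitivity / epsilon)

    def gaussian_release(value, sensitivity, epsilon, delta):
        # (epsilon, delta)-DP: classic Gaussian mechanism calibration.
        sigma = sensitivity * np.sqrt(2 * np.log(1.25 / delta)) / epsilon
        return value + np.random.normal(0.0, sigma)

    print(laplace_release(42.0, sensitivity=1.0, epsilon=0.5))
    print(gaussian_release(42.0, sensitivity=1.0, epsilon=0.5, delta=1e-5))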
user10 6 months ago prev next
To improve the utility of ML models while protecting users' sensitive information, could we add perturbations to the dataset that are calibrated to its statistical properties? That way we wouldn't lose the essential signal in the data.
user11 6 months ago next
That's precisely what differential privacy does, user10! It adds carefully calibrated random noise to protect individual records while preserving the aggregate statistics of the dataset. It's a principled way to balance the two goals.
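A concrete toy example of what "carefully calibrated noise" means (made-up numbers, unrelated to the article): an epsilon-DP count query via the Laplace mechanism.

    import numpy as np

    ages = [23, 35, 41, 29, 52, 37, 44, 61, 33, 48]    # made-up records
    true_count = sum(1 for a in ages if a >= 40)        # a count query has sensitivity 1

    epsilon = 0.5
    noisy_count = true_count + np.random.laplace(0.0, 1.0 / epsilon)
    print(round(noisy_count))   # usually close to the truth, but any one record is masked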
user12 6 months ago prev next
This is really cool and makes me think of tensorflow_privacy, which came out in 2019. Do you think support in other popular ML libraries has ramped up and is catching up as well?
user13 6 months ago next
@user12, I'm glad you brought that up. Since 2019, many ML libraries have adopted differential privacy features. TensorFlow Privacy was a stepping stone, but teams at Apple, Microsoft, and various research groups have also contributed strong libraries and tooling such as OpenDP and PySyft, plus DP support for other frameworks like PyTorch.
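For PyTorch specifically, Opacus is the usual choice these days. A minimal usage sketch, assuming the Opacus 1.x API (double-check the current docs, the interface has changed across versions):

    import torch
    from torch.utils.data import DataLoader, TensorDataset
    from opacus import PrivacyEngine

    model = torch.nn.Linear(10, 2)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.05)
    data = TensorDataset(torch.randn(256, 10), torch.randint(0, 2, (256,)))
    loader = DataLoader(data, batch_size=32)

    privacy_engine = PrivacyEngine()
    model, optimizer, loader = privacy_engine.make_private(
        module=model,
        optimizer=optimizer,
        data_loader=loader,
        noise_multiplier=1.1,   # scale of the Gaussian noise on clipped gradients
        max_grad_norm=1.0,      # per-sample gradient clipping bound
    )

    criterion = torch.nn.CrossEntropyLoss()
    for x, y in loader:         # one epoch of DP training
        optimizer.zero_grad()
        loss = criterion(model(x), y)
        loss.backward()
        optimizer.step()

Under the hood it does the same per-sample clipping and noising discussed upthread.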
user14 6 months ago prev next
It seems like the authors used a lot of theoretical concepts. Has there been any empirical evaluation of this technique in real-world applications?
user15 6 months ago next
Indeed, user14, the technique discussed in this paper is based on solid theory. However, at the time of writing, the authors were working on empirical evaluations using standard benchmark datasets. I'm looking forward to seeing the real-world results, and I believe they could be groundbreaking in various industries looking to keep customer data secure and confidential.