234 points by ml-expert 5 months ago | 16 comments
user1 5 months ago
Great job! I would be interested in knowing more about the dataset and the evaluation metrics used.
author 5 months ago
Thanks for your interest! I used Kaggle's Titanic dataset and evaluated the model with accuracy, precision, recall, and F1-score.
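If it helps, here's roughly how those four metrics fall out of confusion-matrix counts. This is a toy sketch in pure Python with made-up labels, not my actual predictions:

```python
# Illustrative binary labels (not real Titanic predictions).
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]

pairs = list(zip(y_true, y_pred))
tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives

accuracy  = (tp + tn) / len(y_true)                 # fraction correct overall
precision = tp / (tp + fp)                          # of predicted positives, how many were right
recall    = tp / (tp + fn)                          # of actual positives, how many were found
f1        = 2 * precision * recall / (precision + recall)  # harmonic mean
print(accuracy, precision, recall, f1)
```

In practice I just called the equivalent scorers from `sklearn.metrics`, but the arithmetic is exactly this.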
user2 5 months ago
What kind of model did you use? How does it compare to other models?
author 5 months ago
I used a Random Forest Classifier. It outperforms other models like Logistic Regression, KNN, and even XGBoost. Here are the results: ...
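The comparison harness looked roughly like this: score each candidate on the same cross-validation folds. This sketch uses sklearn with synthetic stand-in data, not my exact code or the Titanic features:

```python
# Hypothetical model-comparison harness (stand-in data, illustrative settings).
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

# Swap in the preprocessed Titanic feature matrix here.
X, y = make_classification(n_samples=400, n_features=10, random_state=0)

models = {
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "log_reg": LogisticRegression(max_iter=1000),
    "knn": KNeighborsClassifier(n_neighbors=5),
}
# Mean F1 over 5 folds, same splits for every model.
results = {name: cross_val_score(m, X, y, cv=5, scoring="f1").mean()
           for name, m in models.items()}
print(results)
```

Comparing on identical folds matters; otherwise split noise can flip the ranking.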
user3 5 months ago
Impressive results! Are you planning to open-source the code or design?
author 5 months ago
Yes, I am working on documenting the codebase and will open-source it soon. Stay tuned!
user4 5 months ago
How did you deal with overfitting? Any regularization techniques used?
author 5 months ago
Yes, I used GridSearchCV to find the best hyperparameters and applied cross-validation to reduce overfitting. I also used feature-selection techniques like VarianceThreshold and SelectFromModel.
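Roughly, that setup is a pipeline tuned with GridSearchCV. The parameter grid and data below are illustrative, not my final configuration:

```python
# Sketch: feature selection + classifier in one pipeline, tuned with GridSearchCV.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel, VarianceThreshold
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

X, y = make_classification(n_samples=300, n_features=12, random_state=0)

pipe = Pipeline([
    ("variance", VarianceThreshold(threshold=0.0)),  # drop constant features
    ("select", SelectFromModel(                      # keep features a small forest rates useful
        RandomForestClassifier(n_estimators=50, random_state=0))),
    ("clf", RandomForestClassifier(random_state=0)),
])

# Illustrative grid; step-name prefixes ("clf__") route params to pipeline steps.
grid = GridSearchCV(
    pipe,
    param_grid={"clf__n_estimators": [100, 200], "clf__max_depth": [4, None]},
    cv=5,
    scoring="f1",
)
grid.fit(X, y)
print(grid.best_params_, round(grid.best_score_, 3))
```

Putting the selectors inside the pipeline is what keeps the cross-validation honest: feature selection is refit on each training fold rather than peeking at the whole dataset.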
user5 5 months ago
Nice work! Can you share some insights about the feature importances?
author 5 months ago
Age turned out to be the most important feature, followed by the counts of siblings/spouses (SibSp) and parents/children (Parch) aboard. Other important features include the fare, the cabin, and the title extracted from each passenger's name.
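For anyone curious, reading the impurity-based importances off a fitted forest is straightforward. A minimal sketch with placeholder feature names and synthetic data standing in for my engineered columns:

```python
# Sketch: rank features by a fitted forest's impurity-based importances.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Placeholder names for the engineered Titanic columns.
feature_names = ["Age", "SibSp", "Parch", "Fare", "Cabin", "Title"]
X, y = make_classification(n_samples=300, n_features=len(feature_names),
                           random_state=0)

forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

# feature_importances_ sums to 1.0 across features; higher = more used in splits.
ranked = sorted(zip(feature_names, forest.feature_importances_),
                key=lambda t: t[1], reverse=True)
for name, score in ranked:
    print(f"{name:6s} {score:.3f}")
```

One caveat: impurity-based importances can overstate high-cardinality features, so it's worth cross-checking with permutation importance.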
user6 5 months ago
How long did it take to train and fine-tune the model? I'm assuming you used cloud infrastructure?
author 5 months ago
Training and fine-tuning took around 12 hours on a Google Colab notebook. I used a Tesla T4 GPU for training the model. I also experimented with Kubernetes on GCP, but for this project, Colab sufficed.
user7 5 months ago
Thank you for sharing such detailed information! Are there any practical applications that could make use of your algorithm?
author 5 months ago
The use case I initially had in mind was improving customer churn predictions for SaaS companies, but I think the approach could also be applied in healthcare, fraud detection, or other domains that rely on predictive analytics.
user8 5 months ago
Great job! How do you ensure the fairness of your predictions, given the ethical concerns around AI and discrimination?
author 5 months ago
Excellent question. Motivated by the kinds of biased associations documented by Caliskan, Bryson, and Narayanan (2017), I applied adversarial debiasing (Zhang, Lemoine, and Mitchell, 2018), an in-processing technique that gives the model a dual task: predicting the target variable while preventing an adversary from recovering sensitive attributes from its predictions.
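In miniature, the idea looks like this. A toy NumPy sketch of the two-player training loop, with a logistic predictor and a logistic adversary on synthetic data; this is just the gradient interplay, not my actual implementation:

```python
# Toy adversarial debiasing: the predictor learns the target y while its
# gradient is pushed away from whatever helps an adversary recover the
# sensitive attribute z from the prediction.
import numpy as np

rng = np.random.default_rng(0)
n, d = 500, 5
X = rng.normal(size=(n, d))
z = (rng.random(n) < 0.5).astype(float)                      # sensitive attribute
y = ((X[:, 0] + 0.5 * z + rng.normal(scale=0.5, size=n)) > 0).astype(float)

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

w = np.zeros(d)        # predictor weights
a, b = 0.0, 0.0        # adversary weights (it sees only the prediction)
lr, lam = 0.5, 1.0     # learning rate, adversarial strength

for _ in range(200):
    p = sigmoid(X @ w)                  # predictor output
    q = sigmoid(a * p + b)              # adversary's guess of z from p
    grad_task = X.T @ (p - y) / n       # gradient of the task loss w.r.t. w
    # Gradient of the adversary's loss w.r.t. w, chained through p:
    grad_adv = X.T @ ((q - z) * a * p * (1 - p)) / n
    # Descend the task loss, ASCEND the adversary's loss (gradient reversal).
    w -= lr * (grad_task - lam * grad_adv)
    # The adversary separately minimizes its own loss on current predictions.
    a -= lr * np.mean((q - z) * p)
    b -= lr * np.mean(q - z)

task_acc = np.mean((sigmoid(X @ w) > 0.5) == y)
print(round(task_acc, 3))
```

The `lam` knob trades task accuracy against how little the predictions reveal about `z`; real implementations use the same objective with neural networks on both sides.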