A comprehensive machine learning analysis of the Titanic dataset to predict passenger survival using advanced feature engineering and ensemble methods.
This project implements a complete data science pipeline for the Titanic survival prediction challenge, achieving competitive accuracy through systematic feature engineering and model optimization.
- Advanced Feature Engineering: Title extraction, family size indicators, age grouping, fare binning, and interaction features
- Model Optimization: Hyperparameter tuning using GridSearchCV for multiple algorithms
- Ensemble Methods: Voting classifier combining Random Forest, Gradient Boosting, AdaBoost, and SVM
- Performance Analysis: Feature importance analysis and comprehensive model comparison
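
The feature engineering steps listed above could look roughly like the sketch below. It is illustrative, not the notebook's code: it assumes the standard Kaggle Titanic column names (`Name`, `SibSp`, `Parch`, `Age`, `Fare`, `Pclass`), and the helper name `add_features`, the bin edges, the group labels, and the specific interaction feature are all hypothetical choices.

```python
import pandas as pd


def add_features(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with a few engineered columns (hypothetical helper)."""
    out = df.copy()

    # Title: the token between the comma and the first period,
    # e.g. "Braund, Mr. Owen Harris" -> "Mr".
    out["Title"] = out["Name"].str.extract(r",\s*([^.]+)\.", expand=False).str.strip()

    # Family size indicators: siblings/spouses + parents/children + the passenger.
    out["FamilySize"] = out["SibSp"] + out["Parch"] + 1
    out["IsAlone"] = (out["FamilySize"] == 1).astype(int)

    # Age grouping with assumed bin edges; missing ages stay missing.
    out["AgeGroup"] = pd.cut(
        out["Age"],
        bins=[0, 12, 18, 35, 60, 120],
        labels=["child", "teen", "young_adult", "adult", "senior"],
    )

    # Fare binning into quartiles (integer codes 0..3).
    out["FareBin"] = pd.qcut(out["Fare"], q=4, labels=False, duplicates="drop")

    # One possible interaction feature: passenger class combined with travelling alone.
    out["Pclass_x_IsAlone"] = out["Pclass"] * out["IsAlone"]
    return out


# Tiny hand-made sample in the Kaggle column layout, for illustration only.
sample = pd.DataFrame(
    {
        "Name": [
            "Braund, Mr. Owen Harris",
            "Cumings, Mrs. John Bradley (Florence Briggs Thayer)",
            "Heikkinen, Miss. Laina",
            "Palsson, Master. Gosta Leonard",
        ],
        "SibSp": [1, 1, 0, 3],
        "Parch": [0, 0, 0, 1],
        "Age": [22.0, 38.0, 26.0, 2.0],
        "Fare": [7.25, 71.2833, 7.925, 21.075],
        "Pclass": [3, 1, 3, 3],
    }
)
features = add_features(sample)
print(features[["Title", "FamilySize", "IsAlone", "AgeGroup", "FareBin"]])
```
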
- Random Forest Classifier
- Gradient Boosting Classifier
- AdaBoost Classifier
- Support Vector Machine (with scaling pipeline)
- Voting Ensemble
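
As a rough sketch of how the models listed above can be combined, the example below builds a scikit-learn `VotingClassifier` from the four base models, with the SVM wrapped in a scaling pipeline. The data is synthetic (`make_classification`), the hyperparameters are placeholders, and hard (majority) voting is an assumption; the notebook may use different settings.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    AdaBoostClassifier,
    GradientBoostingClassifier,
    RandomForestClassifier,
    VotingClassifier,
)
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the engineered Titanic feature matrix.
X, y = make_classification(n_samples=300, n_features=8, random_state=0)

ensemble = VotingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
        ("gb", GradientBoostingClassifier(random_state=0)),
        ("ada", AdaBoostClassifier(random_state=0)),
        # The SVM is scaled inside its own pipeline, so the tree-based
        # models still see the unscaled features.
        ("svm", make_pipeline(StandardScaler(), SVC())),
    ],
    voting="hard",  # assumption: majority vote over predicted labels
)

# 5-fold cross-validated accuracy of the combined model.
scores = cross_val_score(ensemble, X, y, cv=5, scoring="accuracy")
print(f"mean CV accuracy: {scores.mean():.3f}")
```

Keeping the scaler inside the SVM's pipeline means it is refit on each training fold, which avoids leaking statistics from the validation fold.
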
The optimized ensemble model achieves 83.28% accuracy through:
- Enhanced feature engineering (+3.02% improvement)
- Hyperparameter optimization (+1.46% improvement)
- Feature scaling for SVM (+14.14% improvement)
- Ensemble voting methods
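
To illustrate the hyperparameter optimization and SVM scaling steps above, here is a minimal `GridSearchCV` sketch over a scaled SVM pipeline. The parameter grid, the synthetic data, and the 5-fold setting are assumptions for illustration and are not taken from the notebook.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the engineered Titanic feature matrix.
X, y = make_classification(n_samples=300, n_features=8, random_state=0)

# make_pipeline names each step after its lowercased class name,
# so the SVC parameters are addressed with the "svc__" prefix.
pipe = make_pipeline(StandardScaler(), SVC())

# Hypothetical search grid.
param_grid = {
    "svc__C": [0.1, 1, 10],
    "svc__gamma": ["scale", 0.01, 0.1],
}

search = GridSearchCV(pipe, param_grid, cv=5, scoring="accuracy")
search.fit(X, y)

print("best params:", search.best_params_)
print(f"best CV accuracy: {search.best_score_:.3f}")
```

The same pattern applies to the other classifiers by swapping the estimator and its grid.
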
- Ensure datasets are in the data/ directory
- Run all cells in Titanic_Survival_Analysis.ipynb
- Final predictions will be saved to Submission.csv
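
For reference, writing the predictions file might look like the sketch below. The helper name `write_submission` is hypothetical, and the `PassengerId` and `Survived` columns follow the usual Kaggle Titanic submission layout, which is assumed here rather than confirmed by the notebook.

```python
import pandas as pd


def write_submission(passenger_ids, predictions, path="Submission.csv"):
    """Write a two-column predictions file and return the frame (hypothetical helper)."""
    submission = pd.DataFrame(
        {
            "PassengerId": list(passenger_ids),
            # Cast to plain ints so the file contains 0/1 rather than 0.0/1.0.
            "Survived": [int(p) for p in predictions],
        }
    )
    submission.to_csv(path, index=False)
    return submission
```

A call such as `write_submission(test_ids, model.predict(X_test))` would then produce the file, where `test_ids`, `model`, and `X_test` stand for whatever the notebook defines.
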
Yassine Erradouani
yerradouani.me