2nd Place Solution - Predicting Valentine's date likelihood from demographic and social attributes using a tuned gradient boosting ensemble
The challenge was to predict whether a person has a Valentine's date, based on demographic and social attributes — a binary classification problem evaluated on ROC-AUC. The signal-to-noise ratio was intentionally low, making feature engineering and model calibration the key differentiators.
With a noisy target and a large dataset, careful preprocessing and feature construction were essential to extract signal before model training.
Survey_Date was decomposed into month, hour, and day-of-week components.
Three gradient boosting frameworks were tuned independently, then combined into a rank-based ensemble.
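The Survey_Date decomposition can be sketched with pandas as below. This is a minimal sketch, not the author's code: the sample timestamps, the string format of Survey_Date, the helper name `add_date_parts` and the output column names are all hypothetical.

```python
import pandas as pd

# Hypothetical sample rows; the real Survey_Date format is an assumption here
df = pd.DataFrame({
    "Survey_Date": ["2024-02-01 09:15:00", "2024-02-10 18:40:00", "2024-01-28 23:05:00"],
})

def add_date_parts(frame: pd.DataFrame, col: str = "Survey_Date") -> pd.DataFrame:
    """Hypothetical helper: split a timestamp column into month, hour and day of week."""
    out = frame.copy()
    # Unparseable values become NaT, so the derived features become NaN
    ts = pd.to_datetime(out[col], errors="coerce")
    out[f"{col}_month"] = ts.dt.month
    out[f"{col}_hour"] = ts.dt.hour
    out[f"{col}_dayofweek"] = ts.dt.dayofweek  # Monday=0 ... Sunday=6
    # Drop the raw timestamp so only numeric features reach the models
    return out.drop(columns=[col])

features = add_date_parts(df)
print(features)
```

Dropping the raw column is a design choice of this sketch: gradient boosting libraries generally expect numeric inputs, and the three derived integers carry the seasonal, time-of-day and weekly patterns separately.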
Each model was tuned with 50 Optuna trials, optimising cross-validated ROC-AUC. The tuned models' test-set predictions were then combined by rank averaging:
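```python
from scipy.stats import rankdata

# lgb_model, xgb_model, cat_model: the tuned LightGBM, XGBoost and CatBoost
# classifiers, already fitted; X_test: the preprocessed test features.
# Positive-class probabilities on the test set from each tuned model
lgb_preds = lgb_model.predict_proba(X_test)[:, 1]
xgb_preds = xgb_model.predict_proba(X_test)[:, 1]
cat_preds = cat_model.predict_proba(X_test)[:, 1]

# Rank-based ensemble: convert each model's predictions to ranks before averaging.
# This removes differences in probability scale and calibration between models,
# so no single model's extreme probability estimates dominate the average.
ensemble = (
    rankdata(lgb_preds)
    + rankdata(xgb_preds)
    + rankdata(cat_preds)
) / 3
```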
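Rank-based ensembling puts every model's output on the same scale (ranks 1 to n) before averaging, so a model whose probabilities are compressed into a narrow range or poorly calibrated carries the same weight as the others. Because ROC-AUC depends only on the ordering of predictions, discarding the raw probability values costs nothing for this metric.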
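The calibration argument can be checked numerically. In this minimal sketch, two hypothetical models with weak signal (synthetic scores, not competition data) are blended; one model's probabilities are then squeezed into a narrow band around 0.5 without changing their order. Plain probability averaging silently down-weights the squeezed model and its AUC shifts, while the rank average is unchanged.

```python
import numpy as np
from scipy.stats import rankdata
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(42)
n = 2000
y = rng.integers(0, 2, size=n)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Two hypothetical models with a weak signal (low signal-to-noise)
preds_a = sigmoid(0.8 * y + rng.normal(0.0, 1.0, n))
preds_b = sigmoid(0.6 * y + rng.normal(0.0, 1.0, n))

# Model B "recalibrated": same ordering, squeezed into a narrow band around 0.5
preds_b_squeezed = 0.5 + 0.05 * (preds_b - 0.5)

def prob_average(*preds):
    return np.mean(preds, axis=0)

def rank_average(*preds):
    return np.mean([rankdata(p) for p in preds], axis=0)

auc_prob = roc_auc_score(y, prob_average(preds_a, preds_b))
auc_prob_sq = roc_auc_score(y, prob_average(preds_a, preds_b_squeezed))
auc_rank = roc_auc_score(y, rank_average(preds_a, preds_b))
auc_rank_sq = roc_auc_score(y, rank_average(preds_a, preds_b_squeezed))

print(f"probability average: {auc_prob:.4f} -> {auc_prob_sq:.4f}")
print(f"rank average:        {auc_rank:.4f} -> {auc_rank_sq:.4f}")
```

The same reasoning explains why dividing the rank sum by 3 does not affect the score: any positive scaling preserves the ordering, and therefore the AUC.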