Ongoing participation in monthly Kaggle Playground Series competitions, practicing machine learning techniques on diverse datasets.
Binary classification challenge predicting loan default probability. The dataset contained 593,994 training samples with 11 features and significant class imbalance (an 80/20 split). Submissions were evaluated on the AUC-ROC metric.
Rank: #950 / 3,850 participants (Top 25%)
Final AUC: 0.92450
Approach: Single LightGBM with Optuna-optimized hyperparameters and multi-seed averaging
Phase 1: Cross-Validation Foundation (SUCCESS)
Phase 2: Feature Engineering (FAILED)
Phase 3: Model Ensembling (FAILED)
Phase 4: Multi-Seed Averaging & External Data (SUCCESS; see the sketch below)
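A minimal sketch of the Phase 4 multi-seed averaging, assuming tuned_params holds the hyperparameters selected by the Optuna search shown next, and that X, y, X_test are the competition train/test data (these names are illustrative, not the exact competition code):

import numpy as np
import lightgbm as lgb

# Assumed names: tuned_params from the Optuna search, X/y training data, X_test test set
seed_predictions = []
for seed in (0, 1, 2, 3, 4):
    model = lgb.LGBMClassifier(**tuned_params, random_state=seed, verbose=-1)
    model.fit(X, y)
    # Keep the predicted probability of the positive (default) class
    seed_predictions.append(model.predict_proba(X_test)[:, 1])

# Averaging across seeds smooths out seed-dependent variance in the trees
final_prediction = np.mean(seed_predictions, axis=0)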
import lightgbm as lgb
import optuna
from sklearn.model_selection import StratifiedKFold, cross_val_score

# X, y: training features and loan-default target
# Stratified 5-fold CV (fold count assumed) keeps the 80/20 class ratio in each split
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

def objective(trial):
    # Hyperparameter search space for LightGBM
    params = {
        'n_estimators': trial.suggest_int('n_estimators', 100, 1000),
        'max_depth': trial.suggest_int('max_depth', 3, 12),
        'learning_rate': trial.suggest_float('learning_rate', 0.01, 0.3, log=True),
        'num_leaves': trial.suggest_int('num_leaves', 20, 150),
        'min_child_samples': trial.suggest_int('min_child_samples', 5, 100),
        'subsample': trial.suggest_float('subsample', 0.6, 1.0),
        'colsample_bytree': trial.suggest_float('colsample_bytree', 0.6, 1.0),
    }
    model = lgb.LGBMClassifier(**params, random_state=42, verbose=-1)
    # Score each trial by mean cross-validated AUC-ROC
    scores = cross_val_score(model, X, y, cv=skf, scoring='roc_auc')
    return scores.mean()

study = optuna.create_study(direction='maximize')
study.optimize(objective, n_trials=30)
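After the search, the best trial's hyperparameters can be read back from the study and used to refit a final model on the full training set (a sketch, not the exact competition code):

# study.best_params is the hyperparameter dictionary of the best trial
best_params = study.best_params
final_model = lgb.LGBMClassifier(**best_params, random_state=42, verbose=-1)
final_model.fit(X, y)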
Prioritized deep understanding over leaderboard optimization. The results validated information-theoretic principles: external data contributed genuinely new information, while the engineered features were redundant transformations of signal the model had already captured through its tree splits.
December 2025 competition details will be added once the competition is complete.
For each monthly competition, I follow a systematic workflow:
Regular participation in Kaggle competitions has helped develop practical skills in: