Mallorn - Hierarchical Anomaly Detection for Astronomical Time-Series
I made a mistake in this competition by over-optimizing for the public leaderboard. My best public position was 62nd of 800, and my private rank ended up around 32. I shifted too much focus onto the LB when I should have trusted my local CV, which was giving the right signal the whole time. The methodology itself was solid. A single gradient booster on a heavily imbalanced dataset optimizes for majority-class accuracy and fails on the rare class. The fix was hierarchical: a high-recall broad filter first to eliminate obvious negatives, then specialized discriminators for the hard boundary cases (AGN vs TDE, SNe vs TDE). Hard Negative Mining, intentionally generating TDE lookalikes and forcing the model to learn those exact failure cases, made the classifier robust against its own blind spots.
Don't optimize for the public leaderboard. It's useful feedback, but it is computed on only a subset of the test data. For extreme class imbalance, architectural changes like hierarchical gating and hard negative mining are far more effective than hyperparameter tuning. Isolating a 5% rare class from lookalike noise is the same problem as fraud detection or manufacturing defect identification.
Built a 4-stage hierarchical XGBoost ensemble to detect extremely rare Tidal Disruption Events (TDEs) from noisy astronomical light curves. Engineered 512 domain-specific physical features and used Hard Negative Mining to handle lookalike noise. The pipeline matched the strict 5.14% target classification rate with an estimated Test F1 of 0.7169.
§1. The Domain & The Problem
Modern telescopes generate massive streams of time-series light curves. The goal is to identify Tidal Disruption Events (TDEs), rare instances of a black hole consuming a star.
TDEs are only ~5% of the data. Supernovae (SNe) and Active Galactic Nuclei (AGN) are extremely common and have highly similar light curves, so a standard ML model is overwhelmed by the majority classes and produces a flood of false positives on TDEs.
§2. The Mental Model & Trade-offs
Training a single gradient booster gave high overall accuracy but terrible precision on TDEs. The model couldn't distinguish a true TDE from a stochastic AGN.
Hierarchical Gating: Instead of one model, a multi-stage pipeline:
- Stage 1 (High-Recall Net): A broad model optimized to catch anything that might be a TDE, filtering out 45% of obvious noise.
- Stage 2 (Specialized Filters): Separate models trained specifically to discriminate AGN vs TDE and SNe vs TDE.
Hard Negative Mining: A low-depth, heavily biased model was intentionally trained to generate 100+ false positives (TDE lookalikes). These were extracted and forced into the final ensemble training so the model learns its own failure cases.
§3. The Architecture
Physics-Informed Features (512 features):
- Supernovae modeled using Bazin parametric fits (bazin_tau_rise, bazin_chi2)
- AGN stochasticity via Damped Random Walk states and Stetson J/K indices (drw_tau, stoch_stetson_J)
- Thermodynamic physics from multi-band color arrays (phys_Edd_ratio, phys_T_peak)
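As an example of how one feature family is extracted, here is a sketch of a Bazin parametric fit on a synthetic supernova-like light curve. The feature names mirror bazin_tau_rise and bazin_chi2 above, but the fitting setup (initial guesses, noise model) is an assumption, not the competition code.

```python
import numpy as np
from scipy.optimize import curve_fit

def bazin(t, A, B, t0, tau_rise, tau_fall):
    # Bazin (2009) parametric form: exponential decline gated by a sigmoid rise.
    return A * np.exp(-(t - t0) / tau_fall) / (1.0 + np.exp(-(t - t0) / tau_rise)) + B

# Synthetic light curve: fast rise (tau_rise=5 d), slow decline (tau_fall=25 d).
rng = np.random.default_rng(2)
t = np.linspace(-20, 80, 60)
flux = bazin(t, 50.0, 2.0, 0.0, 5.0, 25.0) + rng.normal(scale=1.0, size=t.size)

# Initial guesses anchored to the observed peak.
p0 = [flux.max(), flux.min(), t[np.argmax(flux)], 5.0, 20.0]
params, _ = curve_fit(bazin, t, flux, p0=p0, maxfev=10000)
A, B, t0, tau_rise, tau_fall = params

residuals = flux - bazin(t, *params)
features = {
    "bazin_tau_rise": float(tau_rise),
    "bazin_chi2": float(np.sum(residuals**2) / (t.size - 5)),  # reduced chi^2
}
print(features)
```

The fitted timescales separate fast-rising SNe from the slower, smoother TDE rises, which is why they earn a place in the feature set.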
KNN Confidence Prior: A Nearest Neighbors algorithm upweighted samples that clustered near 150 known high-confidence TDEs.
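The upweighting step might look like the following sketch. The distance scale and weight formula here are illustrative assumptions; only the idea (closer to a known TDE means a heavier sample weight) comes from the writeup.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(3)
X_train = rng.normal(size=(5000, 16)).astype(np.float32)
X_confident = rng.normal(size=(150, 16)).astype(np.float32)  # 150 known TDEs

# Distance from each training sample to its nearest high-confidence TDE.
nn = NearestNeighbors(n_neighbors=1).fit(X_confident)
dist, _ = nn.kneighbors(X_train)
dist = dist.ravel()

# Map proximity to a multiplicative weight in (1, 3]: samples clustered near
# confident TDEs get up to ~3x weight; distant samples stay near 1x.
weights = 1.0 + 2.0 * np.exp(-dist / np.median(dist))
# `weights` would then be passed as sample_weight to the booster's fit call.
```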
Compute: The entire pipeline runs on float32 arrays with CUDA acceleration for fast cross-validation.
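A hypothetical configuration reflecting those compute notes; the parameter values are assumptions, not the competition settings (the `device="cuda"` parameter is the XGBoost >= 2.0 way to select GPU training):

```python
import numpy as np

# Illustrative XGBoost parameter set for GPU-backed cross-validation folds.
xgb_params = {
    "tree_method": "hist",
    "device": "cuda",        # XGBoost >= 2.0: build histograms on the GPU
    "max_depth": 6,
    "eval_metric": "logloss",
}

# Keeping the 512-feature matrix in float32 halves memory vs float64:
X = np.zeros((100_000, 512), dtype=np.float32)
print(X.nbytes // 2**20, "MiB")  # 100k rows x 512 features ~= 195 MiB
```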