QuantFeat - Production Python Tooling for Quantitative Finance
Copy-pasting feature engineering functions across notebooks creates version control debt. If you find a bug in a formula in one notebook, it doesn't get fixed in the others. Treating the research environment like a production codebase and abstracting all core math into a single tested package solves this. Keeping the scope strictly to feature engineering and EDA avoids dependency bloat and keeps the package's purpose clear.
The project exercised Python packaging mechanics, dependency management, and clean documentation writing. Building reusable tools that multiply the output of the whole team is what platform engineering is about.
Authored and published a production-ready Python package (`quantfeat`) for quantitative financial research. Automates EDA, returns calculation, and advanced volatility estimation, abstracting complex financial math into a clean, reusable API distributed via PyPI.
§1. The Domain & The Problem
Quantitative modeling requires clean, stationary time-series data. Features like rolling returns and volatility are central to statistical arbitrage and algorithmic trading.
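For reference, the baseline feature that the range-based estimators later improve on is rolling close-to-close volatility built from log returns. The sketch below is not taken from `quantfeat`. The function name, the 20-period default window, and the 252-periods-per-year annualization factor are assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd


def close_to_close_volatility(close: pd.Series, window: int = 20,
                              periods_per_year: int = 252) -> pd.Series:
    """Hypothetical helper: rolling, annualized close-to-close volatility."""
    # Log returns are time-additive and much closer to stationary than raw prices.
    log_ret = np.log(close / close.shift(1))
    # Sample standard deviation (pandas default ddof=1) over the trailing window;
    # the first `window` rows are NaN because the window is not yet full.
    rolling_std = log_ret.rolling(window).std()
    # Square-root-of-time scaling converts per-period to annualized volatility.
    return rolling_std * np.sqrt(periods_per_year)
```

Only closing prices enter this estimator, so all intraperiod price movement is ignored. That information loss is what the OHLC-based estimators recover.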
Writing the same boilerplate math across multiple Jupyter notebooks to calculate advanced drift-independent volatility metrics introduces human error and slows down the research phase.
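As an example of a drift-independent metric, the Rogers-Satchell estimator averages, per bar, ln(H/C)·ln(H/O) + ln(L/C)·ln(L/O). A bar that trends steadily from its low to its high contributes zero, so a persistent drift does not inflate the estimate. The sketch below is illustrative rather than the package's implementation. The function name and the lowercase `open`/`high`/`low`/`close` column names are assumptions.

```python
import numpy as np
import pandas as pd


def rogers_satchell_variance(df: pd.DataFrame, window: int = 20) -> pd.Series:
    """Hypothetical helper: rolling Rogers-Satchell variance per period (not annualized).

    Assumes columns named 'open', 'high', 'low', 'close'.
    """
    log_ho = np.log(df["high"] / df["open"])
    log_hc = np.log(df["high"] / df["close"])
    log_lo = np.log(df["low"] / df["open"])
    log_lc = np.log(df["low"] / df["close"])
    # Each product is >= 0 because high >= open, close and low <= open, close.
    rs_term = log_hc * log_ho + log_lc * log_lo
    # Average the per-bar terms over the trailing window.
    return rs_term.rolling(window).mean()
```

A bar that opens at its low and closes at its high yields exactly zero here, while its close-to-close return would still register as variance.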
§2. The Mental Model & Trade-offs
Copy-pasting feature engineering functions from old projects into new ones led to version control issues. A bug found in a statistical formula in one notebook wasn't getting fixed in the others.
Centralized Tooling: Treated the research environment like a production system. All core math was abstracted into a single tested Python package, so any new project can simply run `pip install quantfeat`.
Scope: Deliberately excluded ML models from the package. quantfeat stays strictly focused on feature engineering and EDA, keeping it lightweight and purely mathematical.
§3. The Architecture
Four analytical modules:
quantfeat.volatility: Range-based estimators (Parkinson, Garman-Klass, Rogers-Satchell, Yang-Zhang). Using the full Open/High/Low/Close bar extracts more information per period than close-only approaches; under idealized driftless assumptions, Parkinson and Garman-Klass are reported to be roughly 5 and 7 times as efficient as the close-to-close estimator.
quantfeat.returns: Simple, logarithmic, lagged, and rolling returns with temporal shift handling.
quantfeat.eda: A single call to `perform_quantitative_eda` profiles price and volume statistics and generates correlation heatmaps.
quantfeat.convert_data: Utilities to resample raw tick data to target frequencies (e.g., strict 1H intervals).
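The range-based estimators named for the volatility module can be sketched as rolling per-period variances. This is a minimal sketch, not the package's code. The function names, the lowercase `open`/`high`/`low`/`close` columns, the 20-bar default window, and 252 periods per year are all assumptions. The formulas follow the standard published forms, with Yang-Zhang combining overnight, open-to-close, and Rogers-Satchell components.

```python
import numpy as np
import pandas as pd


def parkinson_variance(df: pd.DataFrame, window: int = 20) -> pd.Series:
    """Hypothetical helper: rolling Parkinson variance from the high-low range."""
    hl = np.log(df["high"] / df["low"])
    return (hl ** 2).rolling(window).mean() / (4.0 * np.log(2.0))


def garman_klass_variance(df: pd.DataFrame, window: int = 20) -> pd.Series:
    """Hypothetical helper: rolling Garman-Klass variance (range plus open-to-close)."""
    hl = np.log(df["high"] / df["low"])
    co = np.log(df["close"] / df["open"])
    term = 0.5 * hl ** 2 - (2.0 * np.log(2.0) - 1.0) * co ** 2
    return term.rolling(window).mean()


def yang_zhang_variance(df: pd.DataFrame, window: int = 20) -> pd.Series:
    """Hypothetical helper: rolling Yang-Zhang variance per period."""
    if window < 2:
        raise ValueError("window must be >= 2 for sample variances and the k weight")
    # Overnight jump: today's open versus yesterday's close.
    overnight = np.log(df["open"] / df["close"].shift(1))
    # Intraday open-to-close move.
    open_close = np.log(df["close"] / df["open"])
    # Per-bar Rogers-Satchell term (drift-independent).
    rs = (np.log(df["high"] / df["close"]) * np.log(df["high"] / df["open"])
          + np.log(df["low"] / df["close"]) * np.log(df["low"] / df["open"]))
    # Weight from Yang and Zhang that minimizes the estimator's variance.
    k = 0.34 / (1.34 + (window + 1) / (window - 1))
    return (overnight.rolling(window).var()          # sample variance, ddof=1
            + k * open_close.rolling(window).var()
            + (1.0 - k) * rs.rolling(window).mean())


def annualize(variance: pd.Series, periods_per_year: int = 252) -> pd.Series:
    """Convert a per-period variance series into annualized volatility."""
    return np.sqrt(variance * periods_per_year)
```

Returning variances rather than volatilities keeps the estimators composable (Yang-Zhang is a weighted sum of variances); annualization is applied once at the end.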
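The returns module's feature set (simple, logarithmic, lagged, and rolling returns) can be illustrated with plain pandas shifts. The sketch assumes that "temporal shift handling" means aligning each row with past data only, so no future information leaks into a feature. All function and column names are hypothetical, not the package's API.

```python
import numpy as np
import pandas as pd


def simple_returns(prices: pd.Series, periods: int = 1) -> pd.Series:
    """Hypothetical helper: P_t / P_{t-periods} - 1; first `periods` rows are NaN."""
    return prices / prices.shift(periods) - 1.0


def log_returns(prices: pd.Series, periods: int = 1) -> pd.Series:
    """Hypothetical helper: ln(P_t / P_{t-periods})."""
    return np.log(prices / prices.shift(periods))


def lagged_returns(returns: pd.Series, lags: list[int]) -> pd.DataFrame:
    """Hypothetical helper: one column per lag; row t holds the return from t-k."""
    # Positive shifts only look backward, so no row sees a future value.
    return pd.DataFrame({f"ret_lag_{k}": returns.shift(k) for k in lags})


def rolling_log_return(prices: pd.Series, window: int) -> pd.Series:
    """Hypothetical helper: trailing multi-period log return."""
    # Log returns are additive, so the window sum equals ln(P_t / P_{t-window}).
    return log_returns(prices).rolling(window).sum()
```

The additivity of log returns is why rolling returns are cheaper and cleaner to compute in log space than by compounding simple returns.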
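The one-call EDA idea can be approximated with pandas alone. The signature of `perform_quantitative_eda` is not given here, so the function below is a hypothetical stand-in, `quick_profile`, with assumed `close` and `volume` column names. It returns a summary dictionary and a correlation matrix; drawing the heatmap itself (for example with matplotlib or seaborn) is left out to keep the sketch dependency-light.

```python
import numpy as np
import pandas as pd


def quick_profile(df: pd.DataFrame, price_col: str = "close",
                  volume_col: str = "volume") -> tuple[dict, pd.DataFrame]:
    """Hypothetical stand-in for a one-call EDA profile of price and volume."""
    log_ret = np.log(df[price_col] / df[price_col].shift(1))
    summary = {
        "n_obs": len(df),
        "missing": df.isna().sum().to_dict(),   # NaN count per column
        "ret_mean": log_ret.mean(),
        "ret_std": log_ret.std(),
        "ret_skew": log_ret.skew(),
        "ret_kurtosis": log_ret.kurt(),          # pandas reports excess kurtosis
        "volume_mean": df[volume_col].mean(),
    }
    # Pearson correlations across numeric columns; this is the heatmap input.
    corr = df.select_dtypes(include="number").corr()
    return summary, corr
```

Profiling log returns rather than price levels keeps the summary statistics meaningful, since non-stationary price levels have no stable mean or variance.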
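Resampling ticks into fixed hourly bars, as described for the conversion utilities, maps naturally onto pandas `resample(...).ohlc()`. This is a minimal sketch under assumed inputs: a `DatetimeIndex` plus `price` and `size` columns. The helper name `ticks_to_bars` is hypothetical, and dropping empty hours is one policy choice among several.

```python
import pandas as pd


def ticks_to_bars(ticks: pd.DataFrame, freq: str = "1h") -> pd.DataFrame:
    """Hypothetical helper: aggregate ticks into OHLCV bars at a fixed frequency.

    Assumes a DatetimeIndex and columns 'price' and 'size'.
    """
    ticks = ticks.sort_index()
    # open/high/low/close of the tick price within each interval.
    bars = ticks["price"].resample(freq).ohlc()
    # Traded size summed per interval (0 for intervals with no ticks).
    bars["volume"] = ticks["size"].resample(freq).sum()
    # Intervals without trades have NaN prices; drop them here.
    # Forward-filling the close instead is a common alternative.
    return bars.dropna(subset=["close"])
```

Because `resample` labels each bar by its left edge, every bar is stamped with the start of its interval. Features built on these bars should be shifted accordingly to avoid using a bar before it has closed.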