Portofolio_maximizer
ML for Quantitative trading
Install
pip install -r
README
# Portfolio Maximizer - Autonomous Profit Engine
> End-to-end quantitative automation that ingests data, forecasts regimes, routes signals, and executes trades hands-free with profit as the north star.
**Version**: 4.2
**Status**: Phase 7.9 In Progress - Cross-session persistence, proof-mode validation, UTC normalization
**Last Updated**: 2026-02-09
---
## 🎯 Overview
Portfolio Maximizer is a self-directed trading stack that marries institutional-grade ETL with autonomous execution. It continuously extracts, validates, preprocesses, forecasts, and trades financial time series so profit-focused decisions are generated without human babysitting.
### Current Phase & Scope (Jan 2026)
**Phase 7.8 Complete** - All-Regime Weight Optimization:
- **3/6 regimes optimized** with SAMOSSA-dominant weights:
  - **CRISIS**: 60.69% RMSE improvement (17.15 → 6.74), 72% SAMOSSA
  - **MODERATE_MIXED**: 6.30% RMSE improvement (17.63 → 16.52), 73% SAMOSSA
  - **MODERATE_TRENDING**: 65.07% RMSE improvement (20.86 → 7.29), 90% SAMOSSA
- **Key Finding**: SAMOSSA dominates all three optimized regimes (72-90% weight), contradicting the initial hypothesis that GARCH would be optimal for the CRISIS regime
- **Method**: Rolling cross-validation with scipy.optimize.minimize (3+ years of AAPL data)
- **Validation**: 2/20 holdout audits complete
**Phase 7.9 In Progress** - Holdout Audit Accumulation:
- Current: 2/20 audits complete
- Target: 20 audits for production deployment decision
- 3 regimes not optimized (insufficient samples): HIGH_VOL_TRENDING, MODERATE_RANGEBOUND, LIQUID_RANGEBOUND
**System Architecture**:
- Regime-aware ensemble routing with adaptive model selection
- 4 forecasting models: SARIMAX, GARCH, SAMOSSA, MSSA-RL
- Quantile-based confidence calibration (Phase 7.4)
- Rolling cross-validation optimization framework
- Comprehensive logging with phase-organized structure
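The rolling cross-validation weight search can be illustrated with a small sketch. This is not the project's optimizer: it swaps `scipy.optimize.minimize` for a coarse grid search over the weight simplex so that it runs on the standard library alone. The function names, the fold layout, the 0.05 weight floor (inferred from the smallest published weight), and the toy forecasts are all assumptions made for illustration.

```python
import math
from itertools import product

def rmse(pred, actual):
    """Root-mean-square error between two equal-length sequences."""
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(pred, actual)) / len(actual))

def blend(forecasts, weights):
    """Weighted sum of per-model forecasts (model name -> list of floats)."""
    horizon = len(next(iter(forecasts.values())))
    return [sum(w * forecasts[m][i] for m, w in weights.items()) for i in range(horizon)]

def simplex_grid(models, step=0.05, floor=0.05):
    """Yield weight dicts that sum to 1 with every weight >= floor (assumed floor)."""
    ticks = round(1 / step)
    low = round(floor / step)
    for combo in product(range(low, ticks + 1), repeat=len(models) - 1):
        last = ticks - sum(combo)
        if last >= low:
            units = list(combo) + [last]
            yield {m: round(u * step, 10) for m, u in zip(models, units)}

def optimize_weights(folds, models):
    """Return the grid point with the lowest mean RMSE across the CV folds."""
    def mean_rmse(weights):
        errors = [rmse(blend(f["forecasts"], weights), f["actual"]) for f in folds]
        return sum(errors) / len(errors)
    return min(simplex_grid(models), key=mean_rmse)

# Toy data, fabricated for illustration only: samossa tracks the target exactly.
actual = [100.0, 101.0, 102.0]
folds = [{
    "actual": actual,
    "forecasts": {
        "sarimax": [a + 2 for a in actual],
        "samossa": list(actual),
        "mssa_rl": [a + 4 for a in actual],
    },
}]
best = optimize_weights(folds, ["sarimax", "samossa", "mssa_rl"])
print(best)
```

In a real run each fold would hold out-of-sample forecasts from one rolling window, and a continuous optimizer with a sum-to-one constraint would replace the grid.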
### Key Features
- **Intelligent Caching**: 20x speedup with cache-first strategy (24h validity)
- **Advanced Analysis**: MIT-standard time series analysis (ADF, ACF/PACF, stationarity)
- **Publication-Quality Visualizations**: 8 professional plots with 150 DPI quality
- **Robust ETL Pipeline**: 4-stage pipeline with comprehensive validation
- **Comprehensive Testing**: 141+ tests with high coverage across ETL, LLM, and integration modules
- **High Performance**: Vectorized operations, Parquet format (10x faster than CSV)
- **Modular Orchestration**: Dataclass-driven pipeline runner coordinating CV splits, neural/TS stages, and ticker discovery with auditable logging
- **Resilient Data Access**: Hardened Yahoo Finance extraction with pooling to reduce transient failures
- **Autonomous Profit Engine**: `scripts/run_auto_trader.py` keeps the signal router + trading engine firing so positions are sized and executed automatically
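The cache-first strategy with 24h validity can be sketched as below. This is a minimal illustration under assumptions, not the project's cache layer: `is_cache_fresh` and `load_or_fetch` are hypothetical names, freshness is judged by file modification time, and the reader, writer, and fetcher are passed in as plain callables.

```python
import time
from pathlib import Path

CACHE_MAX_AGE_SECONDS = 24 * 60 * 60  # 24h validity, as stated in the feature list

def is_cache_fresh(path, max_age_seconds=CACHE_MAX_AGE_SECONDS, now=None):
    """Return True if `path` exists and was modified within the validity window."""
    path = Path(path)
    if not path.is_file():
        return False
    now = time.time() if now is None else now
    return (now - path.stat().st_mtime) <= max_age_seconds

def load_or_fetch(path, fetch, read, write):
    """Cache-first read: use the cached file when fresh, otherwise fetch and store."""
    if is_cache_fresh(path):
        return read(path)
    data = fetch()
    write(path, data)
    return data
```

Passing `now` explicitly keeps the freshness check testable without sleeping or patching the clock.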
---
### Latest Enhancements (Jan 2026)
**Phase 7.8 Achievements**:
- All-regime weight optimization (3/6 regimes) with ~60-65% RMSE improvement for CRISIS/MODERATE_TRENDING and +6.30% for MODERATE_MIXED
- SAMOSSA dominance finding: 72-90% across ALL optimized regimes
- CRISIS regime optimization contradicts initial GARCH hypothesis
- Updated configuration files with data-driven weights
- Comprehensive documentation: [PHASE_7.8_RESULTS.md](Documentation/PHASE_7.8_RESULTS.md)
**Phase 7.7 Achievements**:
- Per-regime weight optimization framework established
- Organized log directory structure with phase-specific subdirectories
- Automated log organization script ([bash/organize_logs.sh](bash/organize_logs.sh))
**Infrastructure Improvements**:
- ENSEMBLE DB migration: CHECK constraint updated, busy_timeout for write resilience
- Enhanced confidence scoring with model key canonicalization
- SQLite read-only connections with immutable URI mode (WSL/DrvFS robustness)
- Position-based forecast alignment fallback for calendar vs business day handling
- Regime detection feature flag with instant enable/disable capability
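The read-only and write-resilience items above map onto standard SQLite features, sketched here with Python's `sqlite3` module. The helper names and the 5000 ms timeout are assumptions, not values taken from the project. `mode=ro` and `immutable=1` are SQLite URI parameters, and `busy_timeout` is a SQLite pragma.

```python
import sqlite3
from pathlib import Path

def connect_readonly(db_path):
    """Open a SQLite database read-only through a URI connection.

    mode=ro rejects writes; immutable=1 tells SQLite the file will not change,
    so it skips locking. Only safe while no writer is modifying the file.
    """
    uri = f"{Path(db_path).resolve().as_uri()}?mode=ro&immutable=1"
    return sqlite3.connect(uri, uri=True)

def connect_writer(db_path, busy_timeout_ms=5000):
    """Open a read-write connection that waits on locks instead of failing at once."""
    conn = sqlite3.connect(db_path)
    # Hypothetical default: retry for up to busy_timeout_ms before raising "database is locked".
    conn.execute(f"PRAGMA busy_timeout = {int(busy_timeout_ms)}")
    return conn
```

Because immutable mode bypasses locking and change detection, it suits dashboards and audits reading a database that is not being written at that moment.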
## Academic Rigor & Reproducibility (MIT-style)
- **Traceable artifacts**: Log config + commit hashes alongside experiment IDs; keep hashes for data snapshots and generated plots (`logs/artifacts_manifest.jsonl` when present).
- **Deterministic runs**: Set and record seeds (`PYTHONHASHSEED`, RNG, hyper-opt samplers, RL) for every reported experiment; prefer config overrides over ad hoc flags.
- **Executable evidence**: Each figure/table used for publication should have a runnable script/notebook (target: `reproducibility/` folder) that regenerates it from logged artifacts.
- **Transparency**: Document MTM assumptions, cost models, and cron wiring in experiment notes; link back to `Documentation/RESEARCH_PROGRESS_AND_PUBLICATION_PLAN.md` for the publication plan and replication checklist.
- **Archiving plan**: Package replication bundles (configs, logs, plots, minimal sample data) for Zenodo/Dataverse deposit before submitting any paper/thesis.
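One way to make the traceability and determinism points concrete is to append one JSON line per experiment with the seed, commit hash, and artifact checksums. The sketch below is an assumption-laden illustration: the record fields and helper names are hypothetical, and the actual layout of `logs/artifacts_manifest.jsonl` is not documented here.

```python
import hashlib
import json
import os
import random
from datetime import datetime, timezone

def sha256_of(path, chunk_size=1 << 20):
    """Stream a file through SHA-256 and return the hex digest."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def seed_everything(seed):
    """Seed the stdlib RNG and export PYTHONHASHSEED for child processes."""
    random.seed(seed)
    # Only affects interpreters started after this point, not the current one.
    os.environ["PYTHONHASHSEED"] = str(seed)
    return seed

def append_manifest(manifest_path, experiment_id, commit, seed, artifacts):
    """Append one JSONL record (hypothetical schema) describing an experiment."""
    record = {
        "experiment_id": experiment_id,
        "commit": commit,
        "seed": seed,
        "logged_at": datetime.now(timezone.utc).isoformat(),
        "artifacts": {str(p): sha256_of(p) for p in artifacts},
    }
    with open(manifest_path, "a", encoding="utf-8") as fh:
        fh.write(json.dumps(record, sort_keys=True) + "\n")
    return record
```

Other RNGs in use (NumPy, hyper-parameter samplers, RL libraries) would need their own seeding calls alongside `random.seed`.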
---
## Table of Contents
- [Architecture](#-architecture)
- [Installation](#-installation)
- [Quick Start](#-quick-start)
- [Phase 7.8 Results](#-phase-78-results-all-regime-optimization)
- [Phase 7.9 Status](#-phase-79-cross-session-persistence--proof-mode)
- [Usage](#-usage)
- [Project Structure](#-project-structure)
- [Performance](#-performance)
- [Testing](#-testing)
- [Documentation](#-documentation)
- [Research & Reproducibility](#-research--reproducibility)
- [Contributing](#-contributing)
- [License](#-license)
---
## Phase 7.8 Results: All-Regime Optimization
### Key Results
**3/6 Regimes Optimized** with SAMOSSA-dominant weights:
| Regime | Samples | Folds | RMSE Before | RMSE After | Improvement | Optimal Weights |
|--------|---------|-------|-------------|------------|-------------|-----------------|
| **CRISIS** | 25 | 5 | 17.15 | 6.74 | **+60.69%** | 72% SAMOSSA, 23% SARIMAX, 5% MSSA-RL |
| **MODERATE_MIXED** | 20 | 4 | 17.63 | 16.52 | +6.30% | 73% SAMOSSA, 22% MSSA-RL, 5% SARIMAX |
| **MODERATE_TRENDING** | 50 | 10 | 20.86 | 7.29 | **+65.07%** | 90% SAMOSSA, 5% SARIMAX, 5% MSSA-RL |
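The Improvement column is the relative RMSE reduction, `(before - after) / before`. The short check below recomputes it from the rounded RMSE values in the table. The results land within about 0.02 percentage points of the reported figures. The small gap is presumably because the reported percentages were computed from unrounded RMSEs, which is an assumption.

```python
def rmse_improvement_pct(before, after):
    """Relative RMSE reduction in percent; positive means the error went down."""
    return (before - after) / before * 100.0

# RMSE before/after, copied from the table above.
table = {
    "CRISIS": (17.15, 6.74),
    "MODERATE_MIXED": (17.63, 16.52),
    "MODERATE_TRENDING": (20.86, 7.29),
}
improvements = {regime: rmse_improvement_pct(b, a) for regime, (b, a) in table.items()}
for regime, pct in improvements.items():
    print(f"{regime}: {pct:.2f}%")
```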
### Major Finding: SAMOSSA Dominance
**SAMOSSA dominates all three optimized regimes (72-90% weight)**, contradicting the initial hypothesis that GARCH would be optimal for the CRISIS regime.
- In the optimized regimes, pattern recognition (SAMOSSA) outperforms volatility modeling; the three remaining regimes are still untested
- CRISIS regime: SAMOSSA (72%) + SARIMAX (23%) provides best defensive configuration
- MODERATE_TRENDING: Confirms Phase 7.7 results with 2x sample size validation
### Configuration Updates
```yaml
# config/forecasting_config.yml (lines 98-115)
regime_candidate_weights:
  CRISIS:
    - {sarimax: 0.23, samossa: 0.72, mssa_rl: 0.05}
  MODERATE_MIXED:
    - {sarimax: 0.05, samossa: 0.73, mssa_rl: 0.22}
  MODERATE_TRENDING:
    - {sarimax: 0.05, samossa: 0.90, mssa_rl: 0.05}
```
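A rough sketch of how such weights might be consumed at forecast time follows. It is not the project's router: the function names are hypothetical, only the first candidate per regime is used, and the equal-weight fallback for regimes without an entry is an assumption, since the default weights are not shown here.

```python
# Weights copied from the configuration above; each regime holds a list of candidates.
REGIME_CANDIDATE_WEIGHTS = {
    "CRISIS": [{"sarimax": 0.23, "samossa": 0.72, "mssa_rl": 0.05}],
    "MODERATE_MIXED": [{"sarimax": 0.05, "samossa": 0.73, "mssa_rl": 0.22}],
    "MODERATE_TRENDING": [{"sarimax": 0.05, "samossa": 0.90, "mssa_rl": 0.05}],
}

def weights_for(regime, available_models):
    """Pick the first candidate for a regime and renormalize over available models."""
    candidates = REGIME_CANDIDATE_WEIGHTS.get(regime)
    if candidates:
        weights = {m: w for m, w in candidates[0].items() if m in available_models}
    else:
        # Assumed fallback for regimes with no optimized entry: equal weights.
        weights = {m: 1.0 for m in available_models}
    total = sum(weights.values())
    if total <= 0:
        raise ValueError(f"no usable weights for regime {regime!r}")
    return {m: w / total for m, w in weights.items()}

def blend_point_forecast(regime, forecasts):
    """Combine one point forecast per model into a single ensemble value."""
    weights = weights_for(regime, forecasts.keys())
    return sum(w * forecasts[m] for m, w in weights.items())
```

Renormalizing over the models that actually produced a forecast keeps the weights summing to one when a model fails or is disabled.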
### Regimes Not Optimized (Insufficient Samples)
| Regime | Reason | Recommendation |
|--------|--------|----------------|
| **HIGH_VOL_TRENDING** | Rare in AAPL 2024-2026 data | Test with NVDA (higher volatility) |
| **MODERATE_RANGEBOUND** | Rare in trending market | Use default weights |
| **LIQUID_RANGEBOUND** | Very rare (stable markets) | Use default weights |
**Full Results**: [Documentation/PHASE_7.8_RESULTS.md](Documentation/PHASE_7.8_RESULTS.md)
---
## Phase 7.9: Cross-Session Persistence & Proof Mode
### Objective
Establish reliable round-trip trade execution with cross-session position persistence, enabling profitability validation and holdout audit accumulation.
### Current Status
- **Closed trades**: 30 validated (proof-mode TIME_EXIT)
- **Holdout audits**: 9/20 (forecast audit gate active at 25% max violation rate)
- **UTC normalization**: Complete across execution and persistence layers
- **Frequency compatibility**: Deprecated pandas aliases (`'H'` -> `'h'`) resolved
### Key Components
- **Cross-session persistence**: `portfolio_state` + `portfolio_cash_state` tables via `--resume`
- **Proof mode** (`--proof-mode`): Tight max_holding (5d/6h), ATR stops/targets, flatten-before-reverse
- **Audit sprint**: `bash/run_20_audit_sprint.sh` with gate enforcement (forecast, quant health, dashboard)
- **UTC timestamps**: `etl/timestamp_utils.py` (`ensure_utc()`, `utc_now()`, `ensure_utc_index()`)
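The idea behind UTC normalization can be shown with standard-library datetimes. The helpers below are hypothetical stand-ins and do not reproduce the signatures or behavior of `etl/timestamp_utils.py`. In particular, treating naive timestamps as already being in UTC is an assumption.

```python
from datetime import datetime, timedelta, timezone

def to_utc(ts):
    """Return a timezone-aware UTC datetime.

    Assumption: naive inputs are already expressed in UTC and only need tagging.
    """
    if ts.tzinfo is None or ts.tzinfo.utcoffset(ts) is None:
        return ts.replace(tzinfo=timezone.utc)
    return ts.astimezone(timezone.utc)

def now_utc():
    """Current time as an aware UTC datetime."""
    return datetime.now(timezone.utc)

def holding_hours(opened_at, closed_at):
    """Holding period in hours, computed after normalizing both ends to UTC."""
    return (to_utc(closed_at) - to_utc(opened_at)).total_seconds() / 3600.0
```

Normalizing both ends before subtracting avoids the `TypeError` Python raises when naive and aware datetimes are mixed, which matters for holding-period checks such as the proof-mode time exit.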
### Validation Commands
```bash
# Run proof-mode audit sprint
PROOF_MODE=1 RISK_MODE=research_production bash bash/run_20_audit_sprint.sh
# Check closed trades
python -c "
import sqlite3
conn = sqlite3.connect('data/portfolio_maximizer.db')
closed = conn.execute('SELECT COUNT(*) FROM trade_executions WHERE realized_pnl IS NOT NULL').fetchone()[0]
print(f'Closed trades with realized PnL: {closed}')
conn.close()
"
```
### Success Criteria
- [x] Cross-session position persistence working
- [x] Proof mode creates guaranteed round trips
- [x] UTC-aware timestamps across all layers
- [ ] 20/20 holdout audits accumulated
- [ ] Forecast audit gate violation rate < 25%
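The two open criteria can be expressed as a small readiness check. This is a sketch under assumptions: the function name and return shape are hypothetical, an audit is reduced to a single pass/violation flag, and the threshold is applied as a strict "less than 25%" to match the checklist wording.

```python
def gate_status(violations, required_audits=20, max_violation_rate=0.25):
    """Summarize holdout-audit progress.

    violations: one bool per completed audit, True when that audit breached
    the forecast gate (what counts as a breach is defined by the project).
    """
    completed = len(violations)
    rate = sum(violations) / completed if completed else None
    audits_done = completed >= required_audits
    return {
        "completed": completed,
        "violation_rate": rate,
        "audits_done": audits_done,
        "ready": audits_done and rate is not None and rate < max_violation_rate,
    }
```

With fewer than 20 audits the check reports progress but never returns `ready`, regardless of the violation rate so far.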
### Phase 7.10: Production Deployment (Future)
Prerequisites:
- 20/20 audits passed
- All 3 optimized regimes show consistent improvement
- Overall RMSE regression confirmed <25%
---
## 🏗️ Architecture
### System Architecture (7 Layers)
```
┌─────────────────────────────────────────────────────────┐
│                   Portfolio Maximizer                   │
│                 Production-Ready System                 │
└─────────────────────────
... (truncated)
```