
Deep Reinforcement Learning for Dynamic Portfolio Allocation
A PPO-based allocation engine integrating NLP sentiment, macro regime features, options-implied volatility, and explainability for dynamic multi-asset investing.
This project develops a dynamic portfolio allocation framework that uses deep reinforcement learning to solve a sequential asset-allocation problem in a high-dimensional, changing market environment. Rather than generating static portfolio weights from one-period estimates, the system learns a policy that adapts as market states evolve, better matching the path-dependent nature of real investment decisions.
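To make the sequential framing concrete, here is a minimal environment sketch in the usual reset/step style. It is an illustration under simplifying assumptions, not the project's code: the state is a window of recent asset returns (the actual system adds sentiment, macro, and implied-volatility features), the action is a long-only weight vector, and all names and shapes are hypothetical.

```python
import numpy as np

class PortfolioEnv:
    """Toy sequential-allocation environment (illustrative only).

    Observation: the last `lookback` rows of asset returns, flattened.
    Action: a vector of non-negative portfolio weights.
    Reward: the one-period portfolio return under those weights.
    """

    def __init__(self, returns: np.ndarray, lookback: int = 5):
        self.returns = returns  # shape (T, n_assets), simple returns per period
        self.lookback = lookback
        self.t = lookback

    def reset(self) -> np.ndarray:
        self.t = self.lookback
        return self.returns[self.t - self.lookback:self.t].ravel()

    def step(self, weights: np.ndarray):
        w = weights / weights.sum()  # assumes non-negative, long-only weights
        reward = float(w @ self.returns[self.t])  # realized one-period return
        self.t += 1
        done = self.t >= len(self.returns)
        obs = None if done else self.returns[self.t - self.lookback:self.t].ravel()
        return obs, reward, done
```

Because the reward at each step depends on the weights chosen given the evolving state, the learned policy is conditioned on the path of the market rather than a single-period estimate.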
The core model extends recent DRL portfolio literature by integrating a richer state space that combines earnings-transcript sentiment, macro-regime indicators, and options-implied volatility surfaces across equities, fixed income, commodities, and cash. These inputs allow the agent to condition its actions not only on realized returns, but also on forward-looking information about narrative tone, macro environment, and market-implied risk. The policy is trained with Proximal Policy Optimization under turnover penalties and ESG exclusions, so that the learned allocation behavior is implementable rather than purely theoretical.
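One way the turnover penalties and ESG exclusions enter is through reward shaping. The sketch below is a hedged illustration of that idea, not the project's exact reward: the penalty form, parameter names, and the hard zeroing of excluded assets are all assumptions.

```python
import numpy as np

def shaped_reward(weights_new, weights_old, asset_returns, esg_allowed,
                  turnover_cost=0.001):
    """Illustrative shaped reward: one-period portfolio return minus a
    linear turnover penalty, with non-ESG assets forced to zero weight.

    esg_allowed is a boolean mask; turnover_cost is a hypothetical
    per-unit-turnover charge, not a calibrated figure.
    """
    w = np.where(esg_allowed, weights_new, 0.0)   # hard ESG exclusion
    total = w.sum()
    w = w / total if total > 0 else np.full_like(w, 1.0 / len(w))
    turnover = np.abs(w - weights_old).sum()       # L1 change in weights
    return float(w @ asset_returns - turnover_cost * turnover)
```

Penalizing turnover inside the reward, rather than filtering trades after the fact, means the agent learns to trade off expected return against trading activity directly during optimization.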
Evaluation is carried out against constrained mean-variance benchmarks, with a focus on risk-adjusted performance rather than simple return maximization. The model is stress-tested across major dislocations, including the 2020 COVID crash and the 2022 rate-hike cycle, to examine whether adaptive policy learning provides value precisely when static optimization tends to struggle. Performance is assessed using Sharpe, Sortino, Calmar, net returns, and turnover-aware metrics, giving a more complete picture of realized investment quality.
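The risk-adjusted metrics named above have standard definitions; the helpers below sketch them for a series of simple periodic returns. Annualization via 252 trading periods and the sample-standard-deviation conventions are assumptions, not the project's exact computation.

```python
import numpy as np

def sharpe(r, periods=252):
    """Annualized Sharpe ratio (risk-free rate assumed zero)."""
    return np.sqrt(periods) * r.mean() / r.std(ddof=1)

def sortino(r, periods=252):
    """Annualized Sortino ratio: penalizes only downside volatility."""
    downside = r[r < 0]
    return np.sqrt(periods) * r.mean() / downside.std(ddof=1)

def calmar(r, periods=252):
    """Annualized return divided by maximum drawdown of the wealth curve."""
    wealth = np.cumprod(1 + r)
    peak = np.maximum.accumulate(wealth)
    max_dd = ((peak - wealth) / peak).max()
    ann_ret = wealth[-1] ** (periods / len(r)) - 1
    return ann_ret / max_dd
```

Reporting all three alongside net returns and turnover matters because a policy can look strong on Sharpe while hiding deep drawdowns that Calmar, and cost-aware metrics, would expose.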
Explainability is treated as a serious requirement rather than an afterthought. SHAP-based interpretation is used to identify which state variables matter most to the learned allocation policy, helping separate genuine economic learning from unstable pattern extraction. This makes the system more defensible as a research product and more useful for understanding when the model leans on sentiment, macro conditions, or market-implied stress.
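The project uses SHAP for attribution; as a dependency-free illustration of the same underlying question (which state features actually drive the policy's actions), here is a permutation-importance sketch. This is a stand-in technique, not the project's SHAP pipeline, and the `policy` callable is hypothetical.

```python
import numpy as np

def permutation_importance(policy, states, n_repeats=10, seed=0):
    """Score each state feature by how much shuffling it changes the
    policy's outputs. Features the policy ignores score exactly zero.

    policy: callable mapping an (n_samples, n_features) array to an
            array of actions/scores (hypothetical interface).
    """
    rng = np.random.default_rng(seed)
    base = policy(states)
    importances = np.zeros(states.shape[1])
    for j in range(states.shape[1]):
        deltas = []
        for _ in range(n_repeats):
            perturbed = states.copy()
            rng.shuffle(perturbed[:, j])  # break the feature-output link
            deltas.append(np.abs(policy(perturbed) - base).mean())
        importances[j] = np.mean(deltas)
    return importances
```

If sentiment features score near zero outside earnings seasons but spike around them, that pattern is evidence of economically sensible conditioning rather than unstable pattern extraction.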
Overall, the project demonstrates how reinforcement learning can be used to build a more adaptive, state-aware allocation engine when paired with domain-rich features and strong out-of-sample evaluation. Future extensions include transaction-cost simulation, hierarchical policy structures, and explicit regime-switching reward design.