Enhancing Financial Trading Strategies through Deep Reinforcement Learning (DRL)
This project implements a Deep Recurrent Q-Network (DRQN) to optimize stock and financial trading strategies. Building on the reinforcement learning framework, we apply recurrent deep learning to improve real-time decision-making in dynamic market environments. The core innovations are an expanded state-action space and a novel reward structure. The model was tested on multiple trading strategies and compared against several baselines, which it significantly outperformed in both returns and robustness.
Problem Statement
Challenge: Financial trading is complex due to:
- Unknown future trends in stock prices.
- High transaction costs associated with frequent trading actions.
- The need to handle high-frequency trading data while making decisions quickly.
Traditional models struggle to make accurate predictions and to keep transaction costs low while managing large portfolios. Furthermore, financial trading does not come with a clearly defined Markov Decision Process (MDP), which complicates real-time sequential decision-making.
Objectives
The project’s goal was to build a Deep Reinforcement Learning agent that can:
- Learn market behavior from historical data and optimize actions to achieve high long-term returns.
- Reduce transaction costs and balance the agent’s portfolio for more strategic trading.
- Outperform baseline methods through an improved decision-making framework.
Approach
Deep Recurrent Q-Network (DRQN) Architecture
The DRQN used in this project integrates Long Short-Term Memory (LSTM) layers to account for the sequential nature of financial data, which is crucial for capturing market trends over time. The trading task was modeled as an MDP with discrete time steps, where the agent made Buy, Sell, or Hold decisions based on market data.
State Space: Captured the market’s technical indicators, historical price movements, and the agent’s portfolio status. The input consisted of features like closing prices, log returns, and normalized market data over 96 periods.
Action Space: Three key actions (Buy, Sell, Hold) were available to the agent, with action augmentation techniques added to improve decision efficiency and reduce unnecessary trades.
Reward Function: The reward was derived from log returns at each time step, factoring in the change in account balance and the transaction costs incurred.
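For concreteness, here is a minimal sketch of how a single step of this MDP could be simulated, assuming a single-asset position (0 = all cash, 1 = fully invested), a proportional transaction cost, and a log-return reward; the project's exact reward shaping may differ:

```python
import numpy as np

ACTIONS = ("Sell", "Hold", "Buy")   # discrete action space
COST_RATE = 0.001                   # assumed proportional cost per position change

def step(position, action, price_t, price_next, cost_rate=COST_RATE):
    """One MDP step. Buy moves to a long position, Sell moves to cash,
    Hold keeps the current position."""
    if ACTIONS[action] == "Buy":
        new_position = 1
    elif ACTIONS[action] == "Sell":
        new_position = 0
    else:
        new_position = position
    trade_cost = cost_rate * abs(new_position - position)              # paid only when the position changes
    reward = new_position * np.log(price_next / price_t) - trade_cost  # log return earned minus costs
    return new_position, reward

# Example: buy at 100, price rises to 101 over the next 30-minute bar
position, reward = step(position=0, action=2, price_t=100.0, price_next=101.0)
print(position, round(reward, 5))   # 1 0.00895
```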
Action Augmentation
A major innovation was the action augmentation technique, which reduced the need for costly random exploration. By integrating it into the reward structure, the model was able to make more informed and consistent trading actions.
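One common way to realize action augmentation in trading RL, and presumably what is meant here, is to compute counterfactually the reward every action would have earned from the same observed price move, so the agent receives a learning signal for all three actions at every step. A hedged sketch under the same assumptions as the step function above:

```python
import numpy as np

COST_RATE = 0.001   # assumed proportional transaction cost

def augmented_rewards(position, log_return, cost_rate=COST_RATE):
    """Counterfactual reward for each action in (Sell, Hold, Buy), given the
    current position (0 = cash, 1 = long) and the next bar's log return."""
    rewards = np.empty(3)
    for action, new_position in enumerate((0, position, 1)):   # Sell -> 0, Hold -> keep, Buy -> 1
        trade_cost = cost_rate * abs(new_position - position)
        rewards[action] = new_position * log_return - trade_cost
    return rewards

print(augmented_rewards(position=0, log_return=0.01))   # [0.    0.    0.009]
```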
Data and Feature Engineering
Data Source: The project utilized tick-by-tick stock data spanning from 2015 to 2020. Data was processed into 30-minute intervals, and important features such as the opening and closing prices were derived from these intervals. This preprocessing ensured that the agent had access to consistent and relevant information.
Feature Engineering:
- Log Returns: Computed for the 8 most recent time intervals.
- Z-Score Normalization: Applied to price and volume features to control for extreme values.
- Technical Indicators: Incorporated indicators such as the Relative Strength Index (RSI) and Moving Average Convergence Divergence (MACD) to provide a broader perspective on market conditions.
The state representation also encoded time variables with a sinusoidal function, helping the agent recognize different timeframes in the market; a sketch of the feature pipeline follows.
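A hedged sketch of this feature pipeline using pandas; the window lengths, column names, and exact indicator set are assumptions:

```python
import numpy as np
import pandas as pd

def build_features(bars: pd.DataFrame, n_returns: int = 8, z_window: int = 96) -> pd.DataFrame:
    """`bars`: 30-minute bars with a DatetimeIndex and 'close' / 'volume' columns."""
    feats = pd.DataFrame(index=bars.index)

    # Log returns for the most recent intervals
    log_ret = np.log(bars["close"]).diff()
    for k in range(n_returns):
        feats[f"log_ret_{k}"] = log_ret.shift(k)

    # Rolling z-score normalization of price and volume to damp extreme values
    for col in ("close", "volume"):
        rolling = bars[col].rolling(z_window)
        feats[f"{col}_z"] = (bars[col] - rolling.mean()) / rolling.std()

    # Sinusoidal time-of-day encoding so the agent can pick up intraday periodicity
    minutes = bars.index.hour * 60 + bars.index.minute
    feats["tod_sin"] = np.sin(2 * np.pi * minutes / (24 * 60))
    feats["tod_cos"] = np.cos(2 * np.pi * minutes / (24 * 60))

    # Technical indicators such as RSI and MACD would be appended here in the same way
    return feats.dropna()
```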
Model Architecture
Network Structure:
The DRQN model was designed with the following layers:
- Input Layer: A vector representing the market’s state, including price data, volume, and technical indicators.
- Hidden Layers: Two dense layers of 256 units with Exponential Linear Unit (ELU) activations.
- LSTM Layer: Captured the temporal dependencies and enabled the agent to learn sequential patterns over time.
- Output Layer: Produced three outputs representing the actions (Buy, Sell, Hold); see the sketch below.
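A minimal PyTorch sketch of this network; the state dimensionality, sequence handling, and parameter defaults are assumptions rather than the project's exact configuration:

```python
import torch
import torch.nn as nn

class DRQN(nn.Module):
    """Two dense ELU layers -> LSTM -> Q-values for (Buy, Sell, Hold)."""

    def __init__(self, state_dim: int, hidden_dim: int = 256, n_actions: int = 3):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(state_dim, hidden_dim), nn.ELU(),
            nn.Linear(hidden_dim, hidden_dim), nn.ELU(),
        )
        self.lstm = nn.LSTM(hidden_dim, hidden_dim, batch_first=True)
        self.q_head = nn.Linear(hidden_dim, n_actions)

    def forward(self, states, hidden=None):
        # states: (batch, seq_len, state_dim)
        z = self.encoder(states)
        z, hidden = self.lstm(z, hidden)
        return self.q_head(z), hidden   # Q-values at every time step, plus the LSTM state

q_net = DRQN(state_dim=32)
q_values, _ = q_net(torch.randn(4, 96, 32))   # -> q_values.shape == (4, 96, 3)
```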
Loss Function: A novel action augmentation loss was introduced to update the Q-values more efficiently, ensuring better learning without overfitting to noisy or irrelevant actions.
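The exact form of this loss is not spelled out above, but a plausible hedged sketch is a temporal-difference loss applied to all actions at once, using the counterfactual per-action rewards from the action augmentation step:

```python
import torch
import torch.nn.functional as F

def augmented_td_loss(q_values, next_q_values, rewards_all_actions, gamma=0.0010):
    """q_values, next_q_values, rewards_all_actions: tensors of shape (batch, n_actions).
    A TD target is formed for every action, not only the action that was taken.
    The default gamma matches the value listed under Training Strategy."""
    with torch.no_grad():
        bootstrap = next_q_values.max(dim=1, keepdim=True).values    # max_a' Q(s', a')
        targets = rewards_all_actions + gamma * bootstrap            # broadcast across actions
    return F.mse_loss(q_values, targets)

# Example with a batch of 4 transitions and 3 actions
loss = augmented_td_loss(torch.randn(4, 3), torch.randn(4, 3), torch.randn(4, 3))
```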
Training Strategy
Replay Memory:
A replay memory buffer was implemented to store experience transitions, which the agent sampled from during training. The agent optimized its Q-network using soft updates and mini-batches drawn from this memory, improving its decision-making with every iteration.
Hyperparameters:
- Discount factor (gamma): Set to 0.0010, chosen to balance immediate rewards against future profitability.
- Learning rate: Fine-tuned during an ablation study to achieve the best training performance.
Training Modifications:
- Small Replay Memory: Prioritized recent experiences, which were more relevant to short-term trading decisions.
- Longer Sequence Sampling: Enabled the model to better recognize and react to long-term price trends (see the buffer sketch after this list).
- Batch Training: The model was trained in batches, not after every action, improving both computational efficiency and model performance.
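A minimal sketch of a replay buffer reflecting the modifications above: small capacity, contiguous sequence sampling for the LSTM, and mini-batch updates. The capacity, sequence length, and batch size are illustrative assumptions:

```python
import random
from collections import deque

class SequenceReplayBuffer:
    """Small buffer of recent transitions that samples contiguous sequences for the LSTM."""

    def __init__(self, capacity: int = 5_000):
        self.buffer = deque(maxlen=capacity)   # old experiences are evicted automatically

    def push(self, state, action, rewards_all_actions, next_state):
        self.buffer.append((state, action, rewards_all_actions, next_state))

    def sample(self, batch_size: int = 32, seq_len: int = 96):
        """Return `batch_size` contiguous transition sequences of length `seq_len`.
        Assumes the buffer already holds at least `seq_len` transitions."""
        batch = []
        for _ in range(batch_size):
            start = random.randint(0, len(self.buffer) - seq_len)
            batch.append([self.buffer[i] for i in range(start, start + seq_len)])
        return batch
```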
Experiment Results
Performance Evaluation: We compared the DRQN-based trading model with the following baseline approaches:
- Continuous Buy/Sell: A simple strategy in which the agent buys or sells continuously at regular intervals.
- Interval Trading: Divides the trading period into fixed intervals and sells a percentage of the holdings at each interval (a sketch follows this list).
- ε-Greedy DRL: A DRL model trained with standard ε-greedy exploration, without action augmentation.
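For concreteness, a hedged sketch of how the interval-trading baseline could be simulated; the initial allocation, interval length, and sell fraction are assumptions:

```python
def interval_trading(prices, cash=100_000.0, interval=96, sell_fraction=0.10):
    """Invest everything at the first price, then sell a fixed fraction of the
    remaining holdings every `interval` time steps."""
    shares = cash / prices[0]
    cash = 0.0
    for t in range(1, len(prices)):
        if t % interval == 0:
            sold = shares * sell_fraction
            shares -= sold
            cash += sold * prices[t]
    return cash + shares * prices[-1]   # final portfolio value

print(interval_trading([100 + 0.05 * t for t in range(960)]))
```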
Results:
The DRQN model significantly outperformed the baselines in terms of return on investment. For instance, in one test case (stock BII), an initial investment of $100,000 resulted in returns of over $4,000,000.
Even in cases where the ε-Greedy DRL performed comparably (stocks AAL, MSF), the DRQN with action augmentation showed more consistent results and avoided the common pitfalls of continuous buy/sell methods.
Limitations
The model primarily focused on stock trading with a fixed action space (Buy, Sell, Hold). Future iterations could explore more flexible action spaces, allowing variable amounts of stock trading per action.
Key Takeaways
Innovation in DRL for Trading
This project showcases the effectiveness of Deep Reinforcement Learning in real-time financial markets. By integrating advanced features like LSTM layers and action augmentation, the model successfully navigates complex market dynamics, outperforming traditional trading methods.
Impact
The model’s ability to optimize trading strategies in unpredictable markets has significant implications for hedge funds, individual traders, and algorithmic trading systems.
The GitHub repository is available here.
