Reinforcement Learning in Finance: Resources and Expert Advice from Paul Bilokon

Reinforcement learning (RL) is one of the most exciting areas of Machine Learning, especially when applied to trading. RL is so appealing because it allows you to optimise strategies and enhance decision-making in ways that traditional methods can’t.

One of its biggest advantages?

You don’t have to spend a lot of time manually training the model. Instead, RL learns and makes trading decisions on its own (depending on feedback once received), continuously adjusting as per the dynamism of the market. This efficiency and autonomy are why RL is becoming so popular in finance.

As per the news, “The global Reinforcement Learning market was valued at $2.8 billion in 2022 and is projected to reach $88.7 billion by 2032, growing at a CAGR of 41.5% from 2023 to 2032.⁽¹⁾ “

Please note that we have prepared the content in this article almost entirely from Dr Paul Bilokon’s QuantInsti webinar. You can watch the webinar (below) if you wish to.

About the Speaker

Dr. Paul Bilokon, CEO and Founder of Thalesians Ltd, is a prominent figure in quantitative finance, algorithmic trading, and machine learning. He leads innovation in financial technology through his role at Thalesians Ltd and serves as the Chief Scientific Advisor at Thalesians Marine Ltd. In addition to his industry work, he heads the faculty at the Machine Learning Institute and the Quantitative Developer Certificate, playing a key role in shaping the future of quantitative finance education.

In this blog, we will first explore key research papers that will help you learn Reinforcement Learning in finance along with the latest advancements in RL applied to finance.

We will then navigate through some good books in the field.

Finally, we will take a look at valuable insights covered in the FAQ session with Paul Bilokon, where he answers an assortment of questions about reinforcement learning and its impact on trading strategies.

Let’s get started on this learning journey as this blog covers the following for learning Reinforcement Learning in Finance in depth:

Key Research Papers
Useful Books
Blogs
FAQs with Paul Bilokon: Expert Insights

Key Research Papers

Below are the key research papers recommended by Paul on Reinforcement Learning in finance.

Modern Perspectives on Reinforcement Learning in Finance (Link: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3449401) by Kolm/Ritter provides an insightful look into the intersection of reinforcement learning in trading and financial markets.
QLBS and Q-Learner Black-Scholes Merton Worlds (Link: https://elsevier-ssrn-document-store-prod.s3.amazonaws.com/19/09/03/ssrn_id3446918_code1650179.pdf) by Halperin explores the integration of Q-learning with Black-Scholes models, providing insight into market environments where RL can inform pricing strategies.
RL Hedging and Dynamic Replication – Kolm/Ritter (Link: https://www.researchgate.net/publication/329435926_Dynamic_Replication_and_Hedging_A_Reinforcement_Learning_Approach) discusses the use of RL in options, hedging and dynamic replication, showcasing strategies for managing risk in derivative trading.
Deep Hedging (Link: https://elsevier-ssrn-document-store-prod.s3.amazonaws.com/20/01/02/ssrn_id3512930_code623849.pdf) by Buehler, Gonon, Teichmann, Wood, and Kochems presents a Deep Learning approach to hedging that optimally manages options in market frictions.
Deep hedging of Derivatives (Link: https://www.researchgate.net/publication/338851556_Deep_Hedging_of_Derivatives_Using_Reinforcement_Learning) by Cao/Chen/Hull/Poulos offers a different approach to Reinforcement Learning as compared to the above read.
Wealth Management (Link: https://elsevier-ssrn-document-store-prod.s3.amazonaws.com/20/02/25/ssrn_id3543852_code1452771.pdf) by Dixon and Halperin uses RL for portfolio allocation and wealth management focusing on aligning investment strategies with clients' financial goals.
Double Deep Q-Learning for Optimal Execution (Link: https://www.semanticscholar.org/reader/406a8d12a5bce3a7dd4c10958c73202f355fd491) by Ning, Lin and Jaimungal is a recognised work on applying RL in executing large financial transactions optimally.
Optimal Order Placement (Link: https://www.researchgate.net/publication/351382389_Deep_reinforcement_learning_for_the_optimal_placement_of_cryptocurrency_limit_orders) by Schnaubelt expands on RL strategies for determining the best timing and pricing in order execution.
Distributional Reinforcement Learning for optimal execution (Link: https://www.imperial.ac.uk/media/imperial-college/faculty-of-natural-sciences/department-of-mathematics/math-finance/TobyWestonSubmission.pdf) by Toby Weston explains how Deep Q learning offers a data-driven approach to optimal order execution by learning from real market interactions, potentially bypassing traditional models that oversimplify market complexity and execution costs.

Apart from the above-mentioned research papers which Paul recommends, let us also look at some other research papers below that are quite beneficial for learning Reinforcement Learning in finance.

**Note: The research papers below are not from the webinar video featuring Paul Bilokon.**

Deep Reinforcement Learning for Algorithmic Trading (Link: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3812473) by Álvaro Cartea, Sebastian Jaimungal and Leandro Sánchez-Betancourt explains how reinforcement learning techniques like double deep Q networks (DDQN) and reinforced deep Markov models (RDMMs) are used to create optimal statistical arbitrage strategies in foreign exchange (FX) triplets. The paper also demonstrates their effectiveness through simulations of exchange rate models.
Deep Reinforcement Learning for Automated Stock Trading: An Ensemble Strategy (Link: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3690996) by Hongyang Yang, Xiao-Yang Liu, Shan Zhong and Anwar Walid covers the explanation of an ensemble stock trading strategy that uses deep reinforcement learning to maximise investment returns. By combining three actor-critic algorithms (PPO, A2C, and DDPG), it creates a robust trading strategy that outperforms individual algorithms and traditional baselines in risk-adjusted returns, tested on Dow Jones stocks.
Reinforcement Learning Pair Trading: A Dynamic Scaling Approach (Link: https://arxiv.org/pdf/2407.16103) by Hongshen Yang and Avinash Malik explores the use of reinforcement learning (RL) combined with pair trading to enhance cryptocurrency trading. By testing RL techniques on BTC-GBP and BTC-EUR pairs, it demonstrates that RL-based strategies significantly outperform traditional pair trading methods, yielding annualised profits between 9.94% and 31.53%.
Deep Reinforcement Learning Framework to Automate Trading in Quantitative Finance (Link: https://ar5iv.labs.arxiv.org/html/2111.09395) by Xiao-Yang Liu, Hongyang Yang, Christina Dan Wang and Jiechao Gao introduces FinRL, the first open-source framework designed to help quantitative traders apply deep reinforcement learning (DRL) to trading strategies, overcoming the challenges of error-prone programming and debugging. FinRL offers a full pipeline with modular, customisable algorithms, simulations of various markets, and hands-on tutorials for tasks like stock trading, portfolio allocation, and cryptocurrency trading.
Deep Reinforcement Learning Approach for Trading Automation in The Stock Market (Link: https://arxiv.org/abs/2208.07165) by Taylan Kabbani and Ekrem Duman covers how Deep Reinforcement Learning (DRL) algorithms can automate profit generation in the stock market by combining price prediction and portfolio allocation into a unified process. It formulates the trading problem as a Partially Observed Markov Decision Process (POMDP) and demonstrates the effectiveness of the TD3 algorithm, achieving a 2.68 Sharpe Ratio, while highlighting DRL’s superiority over traditional machine learning approaches in financial markets.

Now let us find out about all those books that Paul recommends for learning Reinforcement Learning in finance.

Useful Books

You can see the list of books below:

Reinforcement Learning: An Introduction by Sutton and Barto is a foundational book on reinforcement learning, covering essential concepts that can be applied to various domains, including finance.

Algorithms for Reinforcement Learning by Csaba Szepesvári offers a deeper dive into the algorithms driving RL, helpful for those interested in the technical side of financial applications.

Reinforcement Learning and Optimal Control by Dimitri Bertsekas explores Reinforcement Learning, approximate dynamic programming, and other methods to bridge optimal control and Artificial Intelligence, with a focus on approximation techniques across various types of problems and solution methods.

Reinforcement Learning Theory by Agarwal, Jiang, and Sun is a newer work offering advanced insights into RL theory.

https://rltheorybook.github.io/rltheorybook_AJKS.pdf

Deep Reinforcement Learning Hands-On by Maxim Lapan how to use deep learning (DL) and Deep Reinforcement Learning (RL) to solve complex problems, covering key methods and applications, including training agents for Atari games, stock trading, and AI-driven chatbots. Ideal for those familiar with Python and basic DL concepts, it offers practical insights into the latest algorithms and industry developments.

Deep Reinforcement Learning in Action by Alexander Zai and Brandon Brown explains how to develop AI agents that learn from feedback and adapt to their environments, using techniques like deep Q-networks and policy gradients, supported by practical examples and Jupyter Notebooks. Suitable for readers with intermediate Python and deep learning skills, the book includes access to a free eBook.

Machine Learning in Finance by Matthew Dixon, Igor Halperin and Paul Bilokon offers a comprehensive guide to applying Machine Learning in finance, combining theories from econometrics and stochastic control to help readers select optimal algorithms for financial modelling and decision-making. Targeted at advanced students and professionals, it covers supervised learning for cross-sectional and time series data, as well as reinforcement learning in finance, with practical Python examples and exercises.

Machine Learning and Big Data with Kdb+ by Bilokon, Novotny, Galiotos, and Deleze, focuses on handling vast datasets for finance, which is essential for those working with real-time market data.

Essential concepts like Multi-Armed Bandits, Markov decision processes, and dynamic programming form the basis for many RL strategies in finance. These concepts enable the exploration of decision-making under uncertainty, a core element in financial modelling.

Books on Multi-Armed Bandits

Donald Berry and Bert Fristedt. Bandit problems: sequential allocation of experiments. Chapman & Hall, 1985.
(Link: https://link.springer.com/book/10.1007/978-94-015-3711-7)
Nicolò Cesa-Bianchi and Gábor Lugosi. Prediction, learning, and games. Cambridge University Press, 2006. (Link: https://www.cambridge.org/core/books/prediction-learning-and-games/A05C9F6ABC752FAB8954C885D0065C8F)
Dirk Bergemann and Juuso Välimäki. Bandit Problems. In Steven Durlauf and Larry Blume (editors). The New Palgrave Dictionary of Economics, 2nd edition. Macmillan Press, 2006. (Link: https://link.springer.com/referenceworkentry/10.1057/978-1-349-95121-5_2386-1)
Aditya Mahajan and Demosthenis Teneketzis. Multi-armed Bandit Problems. In Alfred Olivier Hero III, David A. Castañón, Douglas Cochran, Keith Kastella (editors). Foundations and Applications of Sensor Management. Springer, Boston, MA, 2008. (Link: https://epdf.tips/foundations-and-applications-of-sensor-management-signals-and-communication-tech.html)
John Gittins, Kevin Glazebrook, and Richard Weber. Multi-armed Bandit Allocation Indices. John Wiley & Sons, 2011. (Link: https://onlinelibrary.wiley.com/doi/book/10.1002/9780470980033)
Sébastien Bubeck and Nicolò Cesa-Bianchi. Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems. Foundations and Trends in Machine Learning, now publishers Inc., 2012. (Link: https://arxiv.org/abs/1204.5721)
Tor Lattimore and Csaba Szepesvári. Bandit Algorithms. Cambridge University Press, 2020. (Link: https://tor-lattimore.com/downloads/book/book.pdf)
Aleksandrs Slivkins. Introduction to Multi-Armed Bandits. Foundations and Trends in Machine Learning, now publishers Inc., 2019. (Link: https://www.nowpublishers.com/article/Details/MAL-068)

Books on Markov decision processes and dynamic programming

Lloyd Stowell Shapley. Stochastic Games. Proceedings of the National Academy of Sciences of the United States of America, October 1, 1953, 39 (10), 1095–1100 [Sha53]. (Link: https://www.pnas.org/doi/full/10.1073/pnas.39.10.1095)
Richard Bellman. Dynamic Programming. Princeton University Press, NJ 1957 [Bel57]. (Link: https://press.princeton.edu/books/paperback/9780691146683/dynamic-programming?srsltid=AfmBOorj6cH2MSa3M56QB_fdPIQEAsobpyaWvlcZ-Ro9QFWNtkL2phJM)
Ronald A. Howard. Dynamic programming and Markov processes. The Technology Press of M.I.T., Cambridge, Mass. 1960 [How60]. (Link: https://gwern.net/doc/statistics/decision/1960-howard-dynamicprogrammingmarkovprocesses.pdf)
Dimitri P. Bertsekas and Steven E. Shreve. Stochastic optimal control. Academic Press, New York, 1978 [BS78]. (Link: https://web.mit.edu/dimitrib/www/SOC_1978.pdf)
Martin L. Puterman. Markov decision processes: discrete stochastic dynamic programming. John Wiley & Sons, New York, 1994 [Put94]. (Link: https://www.wiley.com/en-us/Markov+Decision+Processes%3A+Discrete+Stochastic+Dynamic+Programming-p-9781118625873)
Onesimo Hernández-Lerma and Jean B. Lasserre. Discrete-time Markov control processes. Springer-Verlag, New York, 1996 [HLL96]. (Link: https://www.kybernetika.cz/content/1992/3/191/paper.pdf)
Dimitri P. Bertsekas. Dynamic programming and optimal control, Volume I. Athena Scientific, Belmont, MA, 2001 [Ber01]. (Link: https://www.researchgate.net/profile/Mohamed_Mourad_Lafifi/post/Dynamic-Programming-and-Optimal-Control-Volume-I-and-II-dimitri-P-Bertsekas-can-i-get-pdf-format-to-download-and-suggest-me-any-other-book/attachment/5b5632f3b53d2f89289b6539/AS%3A651645092368385%401532375705027/Dynamic+Programming+and+Optimal+Control+Volume+I.pdf)
Dimitri P. Bertsekas. Dynamic programming and optimal control, Volume II. Athena Scientific, Belmont, MA, 2005 [Ber05]. (Link: https://www.researchgate.net/profile/Mohamed_Mourad_Lafifi/post/Dynamic-Programming-and-Optimal-Control-Volume-I-and-II-dimitri-P-Bertsekas-can-i-get-pdf-format-to-download-and-suggest-me-any-other-book/attachment/5b5632f3b53d2f89289b6539/AS%3A651645092368385%401532375705027/download/Dynamic+Programming+and+Optimal+Control+Volume+I.pdf)
Eugene A. Feinberg and Adam Shwartz. Handbook of Markov decision processes. Kluwer Academic Publishers, Boston, MA, 2002 [FS02]. (Link: https://www.researchgate.net/publication/230887886_Handbook_of_Markov_Decision_Processes_Methods_and_Applications)
Warren B. Powell. Approximate dynamic programming. Wiley-Interscience, Hoboken, NJ, 2007 [Pow07]. (Link: https://www.wiley.com/en-gb/Approximate+Dynamic+Programming%3A+Solving+the+Curses+of+Dimensionality%2C+2nd+Edition-p-9780470604458)
Nicole Bäuerle and Ulrich Rieder. Markov Decision Processes with Applications to Finance. Springer, 2011 [BR11]. (Link: https://www.researchgate.net/publication/222844990_Markov_Decision_Processes_with_Applications_to_Finance)
Alekh Agarwal, Nan Jiang, Sham M. Kakade, Wen Sun. Reinforcement Learning: Theory and Algorithms. (Link: https://rltheorybook.github.io/)

These resources provide a solid foundation for understanding and applying Reinforcement Learning in finance, offering theoretical insights as well as practical applications for real-world challenges like hedging, wealth management, and optimal execution.

Let us check out some blogs next that are quite informative as they cover essential topics on Reinforcement Learning in finance.

Blogs

Below are some of the blogs you can read.

Reinforcement Learning for Finance – Insights from Yves J. Hilpisch (Link: https://opendatascience.com/reinforcement-learning-for-finance-insights-from-yves-j-hilpisch/)

This blog consists of information on how Reinforcement Learning can be applied to finance, and why it might be one of the most transformative technologies in this space. The blog is based on the podcast by Dr. Yves J. Hilpisch which he covered in his podcast. Dr. Yves J. Hilpisch is a renowned figure in the world of quantitative finance, known for championing the use of Python in financial trading and algorithmic strategies.

Deep Learning in Quantitative Finance: Multiagent Reinforcement Learning for Financial Trading (Link: https://blogs.mathworks.com/finance/2024/05/17/deep-learning-in-quantitative-finance-multiagent-reinforcement-learning-for-financial-trading/)

This blog post covers how Multiagent Reinforcement Learning can be used to develop optimal trading strategies by simulating competitive agents. It demonstrates the effectiveness of competing agents in outperforming noncompeting agents when trading in a simulated stock environment.

Reinforcement Learning as your portfolio advisor (Link: https://blogs.mathworks.com/finance/2023/08/22/reinforcement-learning-as-your-portfolio-advisor/)

This blog covers the development of a Reinforcement Learning system that provides dynamic investment recommendations to maximise returns in a stock portfolio. It explains how the system handles complex market conditions, manages risk, and uses approximation methods to optimise decision-making in scarce environments.

Lastly, you can see the questions that the webinar audience asked Paul.

FAQs with Paul Bilokon: Expert Insights

Below are a few interesting questions the audience asked and very useful answers by Paul.

Q: How can Reinforcement Learning be useful in trading with low signal-to-noise ratios?

A: Yes, reinforcement learning can indeed be useful in finance. However, it's important to consider that finance often has a very low signal-to-noise ratio and non-stationarity, meaning the statistical properties of financial data change over time. These conditions aren't unique to finance, as they also appear in fields like life sciences and physical sciences with high stochasticity. I’ve written several papers addressing how to handle non-stationarity and low signal-to-noise ratio environments; they can be found on my SSRN page.

If you type “Paul Bilokon papers” on Google, you will see a list of SSRN research papers. The ones published in 2024 have a lot of such papers that explain how to deal with non-stationarity in the presence of low signal to noise ratio.

Q: Can Supervised Learning models like Black-Scholes guide Reinforcement Learning in trading?

A: Yes, they can. For instance, you can use the Black-Scholes model or a classical PDE solver to train reinforcement learning agents initially. Afterwards, you can improve your model by using real data to fine-tune the training. This approach combines insights from classical models with the flexibility of reinforcement learning.

Q: How important is coding experience for machine learning and reinforcement learning in finance?

A: Practical experience in programming is crucial. Those working in reinforcement learning or machine learning, in general, should be able to code quickly and efficiently. Many experts in reinforcement learning, like David Silver, come from software development backgrounds, often with experience in video game development. Building proficiency in programming can significantly enhance one's ability to handle data and develop sophisticated ML solutions.

Q: Is market and signal selection in a financial model a feature selection problem?

A: Yes, it can be viewed as a feature selection problem. You face the classic bias-variance trade-off. Using all features can introduce noise, while reducing features can help manage variance, but might increase bias. An effective feature selection algorithm will help maintain a balance, reducing variance without introducing too much bias and thus improving mean squared error.

Q: What are the top three trading strategies for quant researchers to explore?

A: Basic trading strategies from textbooks, such as momentum and mean reversion, may not work directly in practice, as many have been arbitraged away due to widespread use. Instead, understanding the statistical and market principles behind these strategies can inspire more sophisticated methods. Techniques like deep learning, if properly controlled for complexity and overfitting, could also help in feature selection and decision-making.

Q: Can options trading strategies achieve high AUM like mutual funds?

A: Options trading and mutual funds represent different financial activities and they are not directly comparable. For instance, selling options can expose one to high risk, so it’s often reserved for professionals due to the potential for unlimited downside. While options trading can yield higher fees, it’s essential to understand its inherent risks, such as the volatility risk premium.

Q: What happens when multiple traders use the same reinforcement learning strategy in the market?

A: If the market has high capacity and both are trading small sizes, they may not impact each other significantly. However, if the strategy’s capacity is low, competing participants can cause alpha decay, reducing profitability. Generally, once a strategy becomes well-known, overuse can lead to diminished returns.

Q: Is there a “Hugging Face” equivalent for reinforcement learning with pre-trained models?

A: OpenAI Gym provides a variety of classical environments for reinforcement learning and offers standard models like Deep Q-Learning and Expected SARSA. OpenAI Gym allows users to apply and refine models on these environments and then extend them to more complex real-world applications.

Q: How can Machine Learning enhance fundamental analysis for value investing?

A: Large Language Models (LLMs) can now process extensive unstructured data, such as text. Using a framework like LangChain with an LLM enables the automated processing of financial documents, like PDFs, to analyse fundamentals. Combining this with ML models can help identify undervalued, high-quality stocks based on fundamental analysis.

Courses by QuantInsti

**Note: This topic is not addressed in the webinar video featuring Paul Bilokon.**

Additionally, the following courses by QuantInsti cover Reinforcement Learning in finance.

Introduction to Machine Learning for Trading (Link: https://quantra.quantinsti.com/course/introduction-to-machine-learning-for-trading)

This free course introduces you to the application of machine learning in trading, focusing on the implementation of various algorithms using financial market data. You will explore different research studies and gain a comprehensive understanding of this specialised area.

Deep Reinforcement Learning in Trading

Utilise reinforcement learning to develop, backtest, and execute a trading strategy with two deep-learning neural networks and replay memory. This hands-on Python course emphasises quantitative analysis of returns and risks, culminating in a capstone project focused on financial markets.

AI for Portfolio Management: LSTM Networks (Link: https://quantra.quantinsti.com/course/ai-portfolio-management-lstm-networks)

If you are interested in using AI to determine optimal investments in Gold or Microsoft stocks, this course is the one for you. This course leverages LSTM networks to teach fundamental portfolio management, including mean-variance optimisation, AI algorithm applications, walk-forward optimisation, hyperparameter tuning, and real-world portfolio management. Also, you will get hands-on experience through live trading templates and capstone projects.

Conclusion

This blog explored key resources, including research papers, books, and expert insights from Paul Bilokon, to help you dive deeper into the world of RL in finance. Whether you are looking to optimise trading strategies or explore cutting-edge AI-driven solutions, the resources discussed provide a comprehensive foundation. As you continue your learning journey, leveraging these resources will equip you with the necessary tools to excel in the field of quantitative finance and algorithmic trading using reinforcement learning.

You can learn Reinforcement Learning in depth with the course on Deep Reinforcement Learning in Trading. With this course, you can take your trading skills to the next level as you will learn to apply reinforcement learning to create, backtest, and trade strategies. Further, you will learn to master quantitative analysis of returns and risks, ending the course with implementable techniques and a capstone project in financial markets.

File in the download:

PPT By Paul Bilokon

Compiled by: Chainika Thakar

Disclaimer: All data and information provided in this article are for informational purposes only. QuantInsti^® makes no representations as to accuracy, completeness, currentness, suitability, or validity of any information in this article and will not be liable for any errors, omissions, or delays in this information or any losses, injuries, or damages arising from its display or use. All information is provided on an as-is basis..

Algo Trading Conference 2025