<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:media="http://search.yahoo.com/mrss/"><channel><title><![CDATA[Mathematics and Econometrics - Quantitative Finance & Algo Trading Blog by QuantInsti]]></title><description><![CDATA[Articles on skill development and individual success stories. Tutorials on trading indicators & strategies, portfolio & risk management, automated trading, Python programming and more.]]></description><link>https://blog.quantinsti.com/</link><image><url>https://blog.quantinsti.com/favicon.png</url><title>Mathematics and Econometrics - Quantitative Finance &amp; Algo Trading Blog by QuantInsti</title><link>https://blog.quantinsti.com/</link></image><generator>Ghost 3.15</generator><lastBuildDate>Sat, 02 May 2026 07:56:29 GMT</lastBuildDate><atom:link href="https://blog.quantinsti.com/tag/mathematics-econometrics/rss/" rel="self" type="application/rss+xml"/><ttl>60</ttl><item><title><![CDATA[Beyond the Hype: What "Independent Events" REALLY Mean for Your Trades]]></title><description><![CDATA[Explore how statistically independent events help cut through market noise and shape reliable trading strategies. Understand independence, correlation, and cointegration with practical examples and algorithmic use cases.]]></description><link>https://blog.quantinsti.com/independent-events/</link><guid isPermaLink="false">683ff74834b4462776ec1362</guid><category><![CDATA[Mathematics and Econometrics]]></category><dc:creator><![CDATA[Chainika Thakar]]></dc:creator><pubDate>Wed, 18 Jun 2025 12:58:59 GMT</pubDate><content:encoded><![CDATA[<p>By <a href="https://www.linkedin.com/in/aacashi-n-9a4533223/">Aacashi Nawyndder</a> and <a href="https://www.linkedin.com/in/chainika-bahl-thakar-b32971155/">Chainika Thakar</a></p><h3 id="tl-dr">TL;DR </h3><p>Understanding probability, independence, correlation, and cointegration is key to building robust trading strategies. While correlation shows short-term co-movements, cointegration captures long-term ties, and independence means no influence between variables. Visual tools and Python-based analysis help identify these relationships, supporting smarter diversification and precise hedging. Algorithms and AI further apply these ideas across strategies, but real-world shifts and human biases remind us that market relationships evolve. Mastering these concepts enables more adaptive, data-driven trading.</p><p>This blog covers:</p><!--kg-card-begin: html--><ul>
  <li><a href="#the-building-blocks">The Building Blocks</a></li>
  <li><a href="#what-is-independence-statistically">What is Independence, Statistically?</a></li>
  <li><a href="#understanding-the-concepts-independence-correlation-and-cointegration-defined">Understanding the Concepts: Independence, Correlation, and Cointegration Defined</a></li>
  <li><a href="#seeing-is-believing-visual-and-quantitative-tools">Seeing is Believing: Visual and Quantitative Tools</a></li>
  <li><a href="#from-brain-food-to-real-action-leveraging-independence-in-your-trading-arsenal">From Brain Food to Real Action: Leveraging Independence in Your Trading Arsenal</a></li>
  <li><a href="#the-human-factor-data-science-tools-and-our-own-brain-quirks">The Human Factor: Data Science Tools and Our Own Brain Quirks</a></li>
  <li><a href="#reality-check-limitations-and-caveats">Reality Check: Limitations and Caveats</a></li>
   <li><a href="#frequently-asked-questions">Frequently Asked Questions</a></li>
</ul>
<!--kg-card-end: html--><hr><p>Ever look at the stock market and feel like it’s just a blur of randomness—spikes, dips, and noise with no clear rhyme or reason? You’re not alone. But here’s the thing: beneath the chaos, there <em>are</em> patterns. And one of the most powerful tools for spotting them is a statistical gem called <strong>independent events</strong>.</p><p>Forget the dry textbook stuff for a moment. This concept isn’t just academic—it’s practical. It’s the key to recognising signals that truly stand apart from the usual market noise. It’s how you start building a portfolio where one bad day doesn’t wreck your entire plan. And it’s the secret behind smarter, sharper strategies that don’t just ride the market’s mood—they cut through it.</p><h3 id="prerequisites">Prerequisites</h3><p>To grasp the concepts of statistical independence, correlation, and cointegration in trading, it's important to start with foundational knowledge in probability and statistics. Begin with<a href="https://blog.quantinsti.com/probability-trading/"> Probability in Trading</a>, which introduces the role of probabilistic thinking in financial markets. Follow it with<a href="https://blog.quantinsti.com/statistics-probability-distribution/"> Statistics &amp; Probability Distribution</a>, where you’ll learn about key statistical measures and how they apply to market data. These concepts are critical for interpreting market relationships and designing robust trading strategies. You can further reinforce your foundation with the Statistics &amp; Probability for Trading Quantra course, which offers interactive content tailored for market practitioners.</p><p>Complement this understanding with<a href="https://blog.quantinsti.com/stock-market-data-analysis-python/"> Stock Market Data: Analysis in Python</a>, which walks through acquiring and processing real market data—a vital step before running statistical models. For coding fluency,<a href="https://blog.quantinsti.com/python-programming/"> Basics of Python Programming</a> and the<a href="https://quantra.quantinsti.com/course/python-trading-basic"> Python for Trading (Basic)</a> course offer hands-on experience with Python, ensuring you're equipped to analyze time series and build models effectively.</p><p>So, in this guide, we're going to take a journey together. Not just to define these terms, but to truly <em>internalize</em> them. We'll explore:</p><ul><li>The core idea of independence and what it means in trading</li><li>A little bit of simple math to keep us grounded (I promise, not too scary!).</li><li>Clear examples from everyday life and, of course, the financial battleground.</li><li>A good look at what independence, correlation, and cointegration actually <em>are</em>, and critically, how they’re <em>different</em>.</li><li>Actionable ways to weave this knowledge into robust trading strategies and risk management.</li><li>Expanded, real-world algorithmic trading examples, showing these concepts in action.</li><li>The essential caveats – because no concept is a magic bullet.</li></ul><p>Ready to move past just scratching the surface and get a real handle on this?</p><p>Let's dive in!</p><hr><h2 id="the-building-blocks">The Building Blocks</h2><p>Alright, before we dive deeper, let's make sure we're speaking the same language. Here are a few foundational concepts:</p><ul><li><strong>Probability:</strong> Simply put, this is the measure of how likely an event is to occur. It’s expressed on a scale from 0 (impossible) to 1 (it’s a sure thing!). 
<br>Mathematically, if A is any event, then P(A) is the probability that event A occurs.</li><li><strong>Random Variable:</strong> Think of this as a variable whose value is determined by the outcome of a random phenomenon. The daily price wiggle of a stock? A classic example.</li><li><strong>Conditional Probability:</strong> This is the chance of something happening <em>given that something else has already happened</em>. We write it as P(A|B) – "the probability of A, if B has occurred." This is super important for understanding events that <em>aren't</em> independent (dependent events). If A and B <em>are</em> dependent, then:</li></ul><p>P(A and B) = P(A) × P(B|A)</p><hr><h2 id="what-is-independence-statistically">What is Independence, Statistically?</h2><p>Two events are independent if one happens without changing the odds of the other happening. They're effectively in their own lanes.</p><p>Think: Event A is "Stock X goes up," and Event B is "It rains today." If they're independent, Stock X's rise (or fall) has zero impact on whether it rains, and the rain isn't bothered by what Stock X is doing.</p><p>Mathematically, this means knowing A happened doesn't change B's odds, so the probability of B given A (P(B|A)) is just the same as B's original probability (P(B)). Remember our conditional probability rule for any two events: P(A and B) = P(A) × P(B|A)? Well, for independent events, since P(B|A) simply equals P(B), the formula simplifies nicely to:</p><p>P(A and B) = P(A) × P(B)</p><p>Essentially, you just multiply their individual chances.</p>
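<p>If you want to see the product rule hold numerically, here is a quick simulation (our own illustrative sketch, not from the original article) that checks it on two dice rolls:</p><!--kg-card-begin: html--><pre>
# A sanity check of the product rule for independent events using two dice.
import numpy as np

rng = np.random.default_rng(42)
n = 1_000_000
die1 = rng.integers(1, 7, n)   # Event A: first die shows a six
die2 = rng.integers(1, 7, n)   # Event B: second die shows a six

p_a = (die1 == 6).mean()
p_b = (die2 == 6).mean()
p_both = ((die1 == 6) &amp; (die2 == 6)).mean()

print(f"P(A) x P(B) = {p_a * p_b:.4f}")   # ~ 1/36 = 0.0278
print(f"P(A and B)  = {p_both:.4f}")      # matches, because the rolls are independent
</pre><!--kg-card-end: html-->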
<hr><h2 id="spotting-independence-from-daily-life-to-market-dynamics">Spotting Independence: From Daily Life to Market Dynamics</h2><p>It’s always easier to grasp these ideas when you see them in action. In everyday life, independent events show up in things like flipping two coins or rolling a pair of dice—where one outcome doesn’t affect the other.</p><figure class="kg-card kg-image-card kg-width-full"><img src="https://d1rwhvwstyk9gu.cloudfront.net/2025/06/Independence-dice-image.jpg" class="kg-image" alt="Spotting Independence"></figure><p><a href="https://www.numberanalytics.com/blog/unveiling-chance-dice-cards-coins-experiments">Source</a></p><p>Extending this idea to Financial Markets and Trading:</p><ul><li><strong>Super Diversified Global Assets:</strong> Think about assets from totally different parts of the world and the economy. Say, bonds from a city in California and shares in a tech startup in Bangalore, India. They're likely operating under very different economic pressures and business drivers. Now, in our super-connected global market, are any two assets <em>perfectly</em>, 100% statistically independent? Probably not. But this kind of diversification aims to get them as close as possible, with low correlation <strong>(Markowitz, 1952)</strong>. A crisis hitting one is much less likely to wallop the other in the same way directly. 
True statistical independence is more of an ideal we shoot for.</li><li><strong>Unrelated Industry Performance (Usually):</strong> The stuff that makes cocoa bean prices jump (like weather in West Africa or crop diseases) is generally pretty separate from what drives the stock price of a big aerospace defense company (think government contracts or global political tensions).</li></ul><p><strong>A Quick Heads-Up on a Common Mix-Up:</strong><br><br>Sometimes you'll see two things react to the <em>same</em> event but in totally opposite ways.</p><p>Take the early days of the COVID-19 pandemic, for instance. E-commerce giants like Amazon saw demand skyrocket as we all started shopping online from our couches. Meanwhile, airline companies like Delta watched their revenues nosedive because no one was flying.<br>It's super tempting to look at that and think, "Aha! Independent events!" because their fortunes went in completely different directions. But hold on – this isn't actually statistical independence.<br>It’s a classic case of strong negative correlation. Both were reacting to the <em>same</em> global event (the pandemic), just in opposite ways because of how it hit their specific businesses. For example, <strong>Baker et al. (2020)</strong> reported a very strong negative correlation of around -0.82 between Amazon and Delta in mid-2020.</p><p>So, just because things move in polar opposite directions doesn't mean they're truly independent of each other. It's a subtle but important difference to keep in mind!</p><hr><h2 id="understanding-the-concepts-independence-correlation-and-cointegration-defined">Understanding the Concepts: Independence, Correlation, and Cointegration Defined</h2><p>Let's break down these crucial terms individually before we compare them.</p><p><strong>What is Statistical Independence?</strong><br>Independence, in a statistical sense, signifies a complete lack of predictive power between two events or variables. Variable X gives you no clues about Variable Y, and Y offers no hints about X. There's no hidden string connecting them, no shared underlying reason that would make them move together or apart in any predictable way.</p><p><strong>What is Correlation?</strong><br>Correlation is a number that tells us how much and in what direction the <em>returns</em> (like the daily percentage change) of two assets tend to move together. It’s a score from -1 to +1:</p><ul><li><strong>+1 (Perfect Positive Correlation):</strong> This means that the assets' returns move perfectly in the same direction. When one goes up, the other goes up by a proportional amount, and vice versa.</li><li><strong>-1 (Perfect Negative Correlation):</strong> This indicates that the assets' returns move perfectly in opposite directions. When one goes up, the other goes down by a proportional amount.</li><li><strong>0 (Zero Correlation): </strong>This shows there's no clear <em>linear</em> connection in how their returns change.</li></ul><p>Correlation is usually about how things co-move in the shorter term.<br>Craving the full scoop? <a href="https://blog.quantinsti.com/covariance-correlation/">This</a> blog’s got you covered.</p><p><strong>What is Cointegration?</strong><br>This one's a bit more nuanced and thinks long-term. It’s about when two or more time series (like the prices of assets) are individually wandering around without a clear anchor (we call this non-stationary – they have trends and don't snap back to an average).
BUT, if you combine them in a certain linear way, that <em>combination</em> is stationary – meaning it tends to hang around a stable average over time. So, even if individual prices drift, cointegration means they're tethered together by some deep, long-run economic relationship <strong>(Engle &amp; Granger, 1987)</strong>.</p><p>Classic Example: Think crude oil and gasoline prices. Both might trend up or down over long stretches due to inflation or significant economic shifts. However, the <em>spread</em> (the difference) between their prices, which is related to refinery profits, often hovers around a historical average. They can't stray too far from each other for too long.</p><h3 id="comparing-these-terms-">Comparing these terms:</h3><p>Now, let's see how these concepts stand apart – a critical distinction for any serious trader.</p><!--kg-card-begin: html--><table>
<tbody>
<tr>
<td>
<p><span style="font-weight: 400;">Feature</span></p>
</td>
<td>
<p><span style="font-weight: 400;">Independence</span></p>
</td>
<td>
<p><span style="font-weight: 400;">Correlation</span></p>
</td>
<td>
<p><span style="font-weight: 400;">Cointegration</span></p>
</td>
</tr>
<tr>
<td>
<p><span style="font-weight: 400;">Nature of Link</span></p>
</td>
<td>
<p><span style="font-weight: 400;">No statistical relationship at all (beyond luck).</span></p>
</td>
<td>
<p><span style="font-weight: 400;">Measures only linear co-movement of asset returns.</span></p>
</td>
<td>
<p><span style="font-weight: 400;">Describes a long-term equilibrium relationship between asset prices.</span></p>
</td>
</tr>
<tr>
<td>
<p><span style="font-weight: 400;">Time Horizon</span></p>
</td>
<td>
<p><span style="font-weight: 400;">Not really about time, just the lack of a link.</span></p>
</td>
<td>
<p><span style="font-weight: 400;">Usually a shorter-term thing (days, weeks, months). Can change fast!</span></p>
</td>
<td>
<p><span style="font-weight: 400;">A longer-term property. They might stray short-term but should come back.</span></p>
</td>
</tr>
<tr>
<td>
<p><span style="font-weight: 400;">What's Measured</span></p>
</td>
<td>
<p><span style="font-weight: 400;">The absence of any predictive power.</span></p>
</td>
<td>
<p><span style="font-weight: 400;">The strength &amp; direction of a linear relationship in returns.</span></p>
</td>
<td>
<p><span style="font-weight: 400;">Whether prices are tethered in the long run.</span></p>
</td>
</tr>
<tr>
<td>
<p><span style="font-weight: 400;">Data Used</span></p>
</td>
<td>
<p><span style="font-weight: 400;">Can apply to any events or variables.</span></p>
</td>
<td>
<p><span style="font-weight: 400;">Typically calculated on asset returns (e.g., % changes).</span></p>
</td>
<td>
<p><span style="font-weight: 400;">Analyzed using asset price levels.</span></p>
</td>
</tr>
<tr>
<td>
<p><span style="font-weight: 400;">Trading Angle</span></p>
</td>
<td>
<p><span style="font-weight: 400;">Awesome for true diversification (less likely to tank together).</span></p>
</td>
<td>
<p><span style="font-weight: 400;">Good for short-term hedging, seeing near-future co-moves. Low correlation is good for diversification.</span></p>
</td>
<td>
<p><span style="font-weight: 400;">Basis for "pairs trading" &ndash; betting on the spread between two cointegrated assets returning to normal.</span></p>
</td>
</tr>
</tbody>
</table>
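<p>To make the cointegration column concrete, here is a hedged sketch of an Engle-Granger cointegration test in Python. The crude oil and gasoline futures tickers and the date range are illustrative assumptions, not part of the original article:</p>
<pre>
# Engle-Granger cointegration test on two price series (illustrative sketch).
import yfinance as yf
from statsmodels.tsa.stattools import coint

# Crude oil (CL=F) and gasoline (RB=F) futures: the classic example above
prices = yf.download(["CL=F", "RB=F"], start="2020-01-01", end="2024-01-01")["Close"].dropna()

# Note: the test runs on price LEVELS, not returns
score, p_value, _ = coint(prices["CL=F"], prices["RB=F"])
print(f"Engle-Granger p-value: {p_value:.4f}")
# A small p-value (e.g., below 0.05) suggests the two series are cointegrated.
</pre>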
<p>&nbsp;</p><!--kg-card-end: html--><p><strong>Super Important Point:  Zero Correlation ≠ Independence!</strong><br>This is a classic trip-up! Two assets can have zero <em>linear</em> correlation but still be dependent. Imagine Asset A does great when Asset B is either doing <em>really</em> well or <em>really</em> badly (picture a U-shape if you plotted them). The linear correlation might be near zero, but they're clearly not independent; knowing Asset B's extreme performance tells you something about Asset A.</p><hr><p><strong>Recap:</strong> <em>Independence means no relationship; correlation is about short-term linear return patterns; cointegration points to long-term price relationships. Understanding these nuances is vital for building robust strategies.</em></p><hr><h2 id="seeing-is-believing-visual-and-quantitative-tools">Seeing is Believing: Visual and Quantitative Tools</h2><p>Visualizing data and quantifying relationships can transform abstract concepts into actionable insights.</p><h3 id="price-charts-scatter-plots-"><strong>Price Charts &amp; Scatter Plots:</strong></h3><p>As mentioned, overlaying price charts (like the AMZN vs. DAL example) or creating scatter plots of returns can offer initial clues. A scatter plot of returns for two truly independent assets would look like a random cloud with no discernible pattern.</p><figure class="kg-card kg-image-card kg-width-full"><img src="https://d1rwhvwstyk9gu.cloudfront.net/2025/06/No-correlation.png" class="kg-image" alt="No correlation"></figure><p>Left: Random scatter indicating no correlation (independent variables), Right: Pattern showing a non-linear relationship (non-linear dependent variables)<br><a href="https://www.geeksforgeeks.org/scatter-diagram-correlation-meaning-interpretation-example/">Source</a></p><p><strong>Beware!</strong> For reliable analysis, always use high-quality historical data from reputable providers like Yahoo Finance, Bloomberg, Refinitiv, or directly from the exchanges. Garbage in, garbage out!</p><h3 id="calculating-correlation-with-python-"><strong>Calculating Correlation with Python:</strong></h3><p>Don't worry if you're not a coder, but for those who are, a simple Python script can quickly show you the linear relationship</p><p><strong>Python code snippet:</strong></p><!--kg-card-begin: html--><script src="https://gist.github.com/quantra-go-algo/0572c5bf17ca37b04e4da77d918760bf.js"></script><!--kg-card-end: html--><h3 id="output-"><br><strong>Output:</strong></h3><!--kg-card-begin: html--><pre>
yf.download() has changed argument auto_adjust default to True
Ticker       CVX       XOM
Ticker
CVX     1.000000  0.837492
XOM     0.837492  1.000000
Ticker      AAPL      MSFT
Ticker
AAPL    1.000000  0.547987
MSFT    0.547987  1.000000
Ticker       GLD       SPY
Ticker
GLD     1.000000  0.004044
SPY     0.004044  1.000000 </pre><!--kg-card-end: html--><p>The correlation matrix for XOM/CVX shows a high 0.837492, meaning these oil stocks’ returns move closely together, driven by similar market factors. AAPL/MSFT (0.547987, moderate) indicates tech stocks have some co-movement, while the near-zero GLD/SPY figure (0.004044) suggests gold and the S&amp;P 500 are close to independent (though a near-zero linear correlation could still hide a non-linear dependence).</p><hr><h2 id="from-brain-food-to-real-action-leveraging-independence-in-your-trading-arsenal">From Brain Food to Real Action: Leveraging Independence in Your Trading Arsenal</h2><p>This isn't just interesting theory; it's about giving you a real strategic advantage.</p><ul><li><strong>Next-Level Diversification:</strong> True diversification isn't just about owning many different assets; it's about owning assets whose price movements are, as much as possible, driven by <em>independent factors</em>. This is your best shield against unexpected shocks in one part of your portfolio. Want to learn more? Check out <a href="https://blog.quantinsti.com/portfolio-management-strategy-python/">this</a> blog!</li><li><strong>Precision Hedging:</strong> Hedging is about taking positions to protect against potential losses. Understanding independence (or the lack of it!) helps you pick better hedges – assets that are likely to move predictably (often negatively correlated) against your primary holdings under specific conditions, or assets that offer a safe haven due to their independent nature.</li><li><strong>Building Resilient Portfolios:</strong> By thoughtfully mixing asset classes (stocks, bonds, commodities, real estate, alternative stuff) that have historically shown low correlation and are affected by different big-picture economic drivers, you can build portfolios that are designed to handle a wider variety of market storms.<br></li><li><strong>Navigating Volatility Storms:</strong> When markets freak out, correlations often spike—everyone panics and does the same thing (herd behaviour). Knowing this and which assets <em>might</em> keep some independence (or even become negatively correlated, like some "safe-haven" assets) is key for quick-thinking risk management.</li></ul><h3 id="modern-tools-that-amp-up-these-ideas-">Modern Tools That Amp Up These Ideas:</h3><ul><li><strong>Risk Parity Models:</strong> These are smart allocation strategies that try to make sure each asset class in your portfolio contributes an equal amount of <em>risk</em>, not just an equal amount of money (see the quick sketch just after this list). This relies heavily on good estimates of volatility and, you guessed it, correlations between assets.<br>Keen to learn more? <a href="https://blog.quantinsti.com/portfolio-management-strategy-python/">This</a> blog has you covered!</li><li><strong>AI and Machine Learning:</strong> Yep, AI can sift through massive piles of data to find complex, non-linear connections and fleeting moments of independence that a human might totally miss. This can lead to more dynamic and quick-to-adapt portfolio changes.</li><li><strong>The Rise of Alternative Data: </strong>We're talking info from unusual places—satellite pics of oil tankers, credit card spending data, real-time supply chain info, what people are saying on social media. This can give unique, potentially independent clues about what's happening with the economy or specific companies, giving you an edge if you know how to read it.</li></ul>
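<p>Here's that quick risk-parity sketch: a naive inverse-volatility weighting in Python. It's a simplification (full risk parity also accounts for correlations between assets), and the volatility figures are made up for illustration:</p><!--kg-card-begin: html--><pre>
# Naive "risk parity": weight each asset by the inverse of its volatility
# so each contributes a similar amount of risk (correlations ignored here).

vols = {"Stocks": 0.18, "Bonds": 0.06, "Gold": 0.15}  # assumed annualised vols

inv_vol = {asset: 1 / v for asset, v in vols.items()}
total = sum(inv_vol.values())
weights = {asset: w / total for asset, w in inv_vol.items()}

for asset, w in weights.items():
    print(f"{asset}: {w:.1%}")  # low-volatility bonds get the largest weight
</pre><!--kg-card-end: html-->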
<hr><h2 id="algorithmic-trading-in-action-selected-examples-of-independence-at-play">Algorithmic Trading in Action: Selected Examples of Independence at Play</h2><p>The ideas of independence, dependence, correlation, and cointegration are the secret sauce in many fancy trading algorithms. Here’s a peek at some key examples, especially how they relate to these concepts:</p><h3 id="cross-asset-global-diversification-algorithms-">Cross-Asset &amp; Global Diversification Algorithms:</h3><ul><li><strong>How it works:</strong> These algorithms constantly juggle portfolios across diverse asset classes (stocks, bonds, commodities, currencies, real estate) and geographies. They continuously monitor correlations and volatility, trying to keep diversification at a target level.</li><li><strong>Relevance of Independence:</strong> The whole point is to mix assets with low, or ideally zero, correlation that comes from <em>independent</em> economic drivers. For example, an algo might buy more Japanese stocks if it thinks their performance is, for the moment, independent of what's happening in the US market due to Japan's specific local policies. The dream is that a dip in one area (say, US tech stocks) is balanced out or barely felt by others (like emerging market bonds or gold).</li></ul><h3 id="factor-based-investing-algorithms-">Factor-Based Investing Algorithms:</h3><ul><li><strong>How it works:</strong> These algorithms construct portfolios by targeting specific, well-studied "factors" that have historically driven returns – things like Value (cheap stocks), Momentum (stocks on a roll), Quality (solid companies), Low Volatility (less jumpy stocks), or Size (smaller companies). These factors were popularized in foundational work like <strong>Fama and French (1993)</strong>, which identified common risk factors influencing stock and bond returns.</li><li><strong>Relevance of Independence:</strong> The idea is that these different factors produce streams of returns that are, to some degree, independent of each other and of the overall market's general movement (beta) over the long haul. An algo might lean a portfolio towards factors expected to do well in the current economic climate or that offer diversification because they don't correlate much with other factors already in the portfolio.<br>Want to dig deeper? Check out the full breakdown in <a href="https://blog.quantinsti.com/factor-investing/">this</a> blog.</li></ul><h3 id="event-driven-strategies-focusing-on-specific-news-">Event-Driven Strategies (Focusing on Specific News):</h3><ul><li><strong>How it works:</strong> Algos are built to trade around specific, known corporate or economic events – earnings calls, merger announcements, FDA drug approvals, key economic data releases (like inflation or job numbers).</li><li><strong>Relevance of Independence:</strong> The strategy often banks on the market's immediate reaction to the <em>specific news</em> being somewhat independent of the broader market noise at that precise moment.
For example, if Company A has a great earnings surprise, its stock might pop even if the overall market is blah or down, all thanks to info specific to Company A.</li></ul><h3 id="ai-driven-sentiment-analysis-alternative-data-integration-">AI-Driven Sentiment Analysis &amp; Alternative Data Integration:</h3><ul><li><strong>How it works:</strong> Machine learning models chew through tons of text from news, social media, and financial reports to gauge sentiment (positive, negative, neutral) towards specific assets or the market. Alternative data (like satellite pics of store parking lots, web scraping of job ads, geolocation data) is also used to find non-traditional trading signals.</li><li><strong>Relevance of Independence:</strong> The big idea here is that these data sources can offer insights or signals that are independent of traditional financial data (price, volume, company financials). For example, a sudden burst of negative online chatter about a product, spotted before any official sales numbers are out, could be an independent early warning sign for the company's stock.</li></ul><p>Want to dive deeper? Two more strategies that lean heavily on the principles of independence and correlation are <strong>Market-Neutral &amp; Statistical Arbitrage (StatArb)</strong> and <strong>Pairs Trading (based on Cointegration)</strong>. Check out how they work in these quick reads:<br><a href="https://blog.quantinsti.com/statistical-arbitrage/">https://blog.quantinsti.com/statistical-arbitrage/</a><br><a href="https://blog.quantinsti.com/pairs-trading-basics/">https://blog.quantinsti.com/pairs-trading-basics/</a></p><hr><p><strong>Recap:</strong> <em>Sophisticated algorithms leverage a deep understanding of independence, correlation, and cointegration to try and find that extra bit of profit (alpha), manage risk, and diversify effectively across all sorts of global markets and assets.</em></p><hr><h2 id="the-human-factor-data-science-tools-and-our-own-brain-quirks">The Human Factor: Data Science Tools and Our Own Brain Quirks</h2><p>Even though these concepts are statistical, it's humans doing the trading, and humans are, well, human – full of biases!</p><ul><li><strong>Data Science: Your Quantitative Lens:</strong> Spotting genuine independence in all the market noise is tough. 
Data scientists have a whole toolkit:</li><li><strong>Rigorous Statistical Tests:</strong> Formal tests like the Pearson correlation coefficient, Spearman rank correlation (for non-linear monotonic relationships), and specific tests for cointegration (e.g., Engle-Granger, <a href="https://blog.quantinsti.com/johansen-test-cointegration-building-stationary-portfolio/">Johansen</a>) are must-haves.</li><li><strong>Advanced Time Series Analysis:</strong> Techniques like <a href="https://blog.quantinsti.com/forecasting-stock-returns-using-arima-model/">ARIMA</a>, <a href="https://blog.quantinsti.com/value-at-risk/">VAR</a>, and <a href="https://blog.quantinsti.com/garch-gjr-garch-volatility-forecasting-python/">GARCH</a> models help to understand dependencies within and between time series data, separating real patterns from random noise.</li><li><strong>Machine Learning Power:</strong> AI algorithms can dig up subtle, non-linear patterns of dependence or conditional independence that simpler linear models would completely miss.</li><li><strong>Behavioral Finance: Mind Traps to Avoid:</strong> </li></ul><figure class="kg-card kg-image-card kg-width-full"><img src="https://d1rwhvwstyk9gu.cloudfront.net/2025/06/Behavioural-finance.jpg" class="kg-image" alt="Behavioural finance"></figure><p><a href="https://thedecisionlab.com/biases/gamblers-fallacy">Source</a><br><br>Our brains are wired to find patterns, sometimes even where none exist. Here are a few common mental traps that can mess up a trader's judgment about independence:</p><ul><li><strong>The Gambler's Fallacy:</strong> Wrongly believing that if an independent event (like a stock closing up) has happened a few times in a row, the opposite is now "due" to happen. (Nope, each day is a new roll of the dice if they're truly independent.)</li><li><strong>Representative Bias: </strong>Judging how likely something is based on how much it looks like a pattern or stereotype you already have in your head, while ignoring the actual underlying stats. For example, assuming oil stocks XOM and CVX are independent in Jan 2024 because they’re different companies, despite a high 0.84 correlation in 2023 returns showing strong dependence.</li><li><strong>Confirmation Bias:</strong> We all do this – looking for, interpreting, and remembering information that confirms what we already believe about how assets are connected, and tuning out evidence that says otherwise. For instance, a trader might focus on a brief period of near-zero correlation (e.g., 0.05 between GLD and SPY in mid-2023) to assume independence, ignoring a longer-term 0.4 correlation indicating dependence.</li></ul><p>Just knowing these biases exist is the first huge step towards making more objective, data-driven trading decisions.</p><hr><h2 id="reality-check-limitations-and-caveats">Reality Check: Limitations and Caveats</h2><p>As incredibly useful as all this is, we need to apply the idea of statistical independence with a good dose of realism:</p><ul><li><strong>The Myth of Perfect Independence:</strong> In our super-connected global financial world, finding assets that are <em>perfectly</em>, always independent is like finding a unicorn.
Big systemic shocks – a global pandemic, a major financial meltdown, a widespread geopolitical crisis – can make correlations between seemingly unrelated assets suddenly shoot towards 1 (all move together) or -1 (all move opposite) as everyone rushes for (or away from) perceived safety at the same time.</li><li><strong>Models are Guides, Not Crystal Balls:</strong> All statistical models, including those used to check for independence or correlation, are simplifications of a far more complex reality. They rely on historical data and assumptions that may not hold true in the future. Market regimes shift, and relationships evolve.</li><li><strong>Dynamic, Not Static, Relationships:</strong> How independent or correlated assets are isn't set in stone. It's a moving target that changes over time thanks to evolving economies, tech breakthroughs, new rules, and what investors are feeling. What looks independent today might be strongly correlated tomorrow.</li></ul><hr><h3 id="conclusion">Conclusion</h3><p>Understanding independent events – and how this concept relates to yet differs from correlation and cointegration – is vital for enhancing your market perspective, portfolio building, and risk management. Consider it an ongoing journey of refinement.</p><p>By truly grasping these principles, you can:</p><ul><li><strong>Forge Resilient Portfolios:</strong> Move beyond simple diversification to build portfolios designed to handle a wider array of market shocks by seeking genuinely independent return sources.</li><li><strong>Execute Precise Hedging:</strong> Gain a clearer understanding of asset relationships to hedge unwanted risks more effectively.</li><li><strong>Uncover Hidden Opportunities:</strong> Recognize that many strategies are built on exploiting temporary deviations from statistical relationships or capitalizing on true independence.</li><li><strong>Cultivate Adaptability:</strong> Acknowledge that market relationships are not static, encouraging continuous learning and strategy adjustments.</li></ul><p>Financial markets are vast, interconnected, and constantly evolving. While perfect prediction remains elusive, a solid grasp of concepts like statistical independence provides a better compass to navigate, distinguish signals from noise, and identify opportunities.</p><p>For those seeking a practical, hands-on learning experience, Quantra by QuantInsti offers excellent courses. The <a href="https://quantra.quantinsti.com/course/quantitative-portfolio-management">Quantitative Portfolio Management</a> Course covers techniques like Factor Investing and Risk Parity, while the <a href="https://www.quantinsti.com/epat">Executive Programme in Algorithmic Trading (EPAT)</a> provides a comprehensive path to mastering trading strategies.</p><p>Embracing this learning, questioning assumptions, and letting data guide you will significantly boost your ability to thrive in this ever-changing environment. The effort invested in understanding these concepts is a powerful independent variable in your journey to trading mastery.</p><hr><h3 id="references"><strong>References</strong></h3><ul><li>Baker, S. R., Bloom, N., Davis, S. J., &amp; Terry, S. J. (2020). COVID-Induced Economic Uncertainty. NBER Working Paper No. 26983. <br><a href="https://www.nber.org/papers/w26983">https://www.nber.org/papers/w26983</a></li><li>Markowitz, H. (1952). Portfolio Selection.
The Journal of Finance, 7(1), 77–91.<br><a href="https://onlinelibrary.wiley.com/doi/10.1111/j.1540-6261.1952.tb01525.x">https://onlinelibrary.wiley.com/doi/10.1111/j.1540-6261.1952.tb01525.x</a></li><li>Engle, R. F., &amp; Granger, C. W. J. (1987). Co-Integration and Error Correction: Representation, Estimation, and Testing. Econometrica, 55(2), 251–276.<br><a href="https://www.jstor.org/stable/1913236?origin=crossref">https://www.jstor.org/stable/1913236?origin=crossref</a></li><li>Fama, E. F., &amp; French, K. R. (1993). Common Risk Factors in the Returns on Stocks and Bonds. Journal of Financial Economics, 33(1), 3–56. <a href="https://doi.org/10.1016/0304-405X(93)90023-5">https://doi.org/10.1016/0304-405X(93)90023-5</a></li></ul><hr><h3 id="next-steps">Next Steps</h3><p>Once the basics are in place, the next step is to understand how statistical relationships between assets can inform strategy design. <a href="https://blog.quantinsti.com/factor-investing/">Factor Investing</a> helps you recognise systematic return drivers and portfolio construction techniques based on factor exposure. Building on this,<a href="https://blog.quantinsti.com/covariance-correlation/"> Covariance vs Correlation</a> offers a deeper dive into how asset movements relate—fundamental for diversification and hedging.</p><p>You can then progress to<a href="https://blog.quantinsti.com/johansen-test-cointegration-building-stationary-portfolio/"> Johansen Test &amp; Cointegration</a> to understand how long-term equilibrium relationships can signal profitable trading opportunities. This blog pairs well with<a href="https://blog.quantinsti.com/stationarity/"> Stationarity in Time Series</a> and<a href="https://blog.quantinsti.com/hurst-exponent/"> Hurst Exponent</a>, both essential for assessing the stability and memory of financial data.</p><p>To apply these concepts practically, explore<a href="https://blog.quantinsti.com/statistical-arbitrage/"> Statistical Arbitrage</a>, which uses cointegration and mean reversion principles to build pair-based trading strategies. The Pairs Trading with Statistical Arbitrage course teaches you how to develop and test such strategies using Python. For those interested in broader strategy implementation,<a href="https://quantra.quantinsti.com/course/backtesting-trading-strategies"> Backtesting Trading Strategies</a> provides the tools to evaluate historical performance.</p><p>Quantitative traders can also benefit from Portfolio Optimization, which builds on correlation insights to construct efficient portfolios. 
For deeper modeling and predictive techniques, the<a href="https://quantra.quantinsti.com/learning-track/machine-learning-deep-learning-trading-1"> Machine Learning &amp; Deep Learning in Trading</a> track offers extensive coverage of ML algorithms for forecasting and classification.</p><p>Finally, if you're looking to tie all of this together into a comprehensive career-ready framework, the<a href="https://www.quantinsti.com/epat"> Executive Programme in Algorithmic Trading (EPAT)</a> provides in-depth training in statistical methods, machine learning, Python coding, portfolio theory, and real-world trading systems, making it ideal for serious professionals aiming to lead in quantitative finance.</p><hr><h2 id="frequently-asked-questions">Frequently Asked Questions</h2><h2 id="what-is-the-difference-between-correlation-and-cointegration"><strong>What is the difference between correlation and cointegration?</strong></h2><p>Correlation measures short-term co-movement between two variables, while cointegration identifies a long-term equilibrium relationship despite short-term deviations between two or more non-stationary time series.</p><hr><h2 id="why-is-independence-important-in-trading"><strong>Why is independence important in trading?</strong></h2><p>Independence implies no influence between variables. Recognizing independent assets helps avoid false diversification and ensures that combined strategies aren't secretly overlapping.</p><hr><h2 id="how-does-cointegration-help-in-building-trading-strategies"><strong>How does cointegration help in building trading strategies?</strong></h2><p>Cointegration allows you to build pairs or mean-reversion strategies by identifying asset combinations that revert to a stable long-term relationship, even if each asset is volatile on its own.</p><hr><h2 id="can-correlation-be-used-for-portfolio-diversification"><strong>Can correlation be used for portfolio diversification?</strong></h2><p>Yes, but with caution. Correlation is dynamic and can break down during market stress. As a rule of thumb, the lower the correlation, the better for diversification in asset allocation.</p><hr><h2 id="how-can-python-be-used-to-identify-these-relationships"><strong>How can Python be used to identify these relationships?</strong></h2><p>Python libraries like statsmodels, scipy, and pandas provide tools to test for correlation, cointegration (e.g., Engle-Granger test), and independence, helping quants validate strategy assumptions.</p><hr><h2 id="how-do-ai-and-algorithms-leverage-these-concepts"><strong>How do AI and algorithms leverage these concepts?</strong></h2><p>AI models can automatically detect relationships like cointegration or conditional independence, improving strategy development, regime detection, and risk modeling.</p><hr><h2 id="what-are-the-risks-of-ignoring-these-concepts"><strong>What are the risks of ignoring these concepts?</strong></h2><p>Ignoring them can lead to overfitting, poor or wrong diversification, or failed hedges—ultimately resulting in unexpected drawdowns during market shifts.</p><hr><h2 id="are-these-relationships-stable-over-time"><strong>Are these relationships stable over time?</strong></h2><p>Not always. Market regimes, macro events, and structural shifts can alter statistical relationships. Continuous monitoring and model updates are essential.</p><hr><h3 id="acknowledgements">Acknowledgements</h3><p>This blog post draws heavily from the information and insights presented in the following texts:</p><p>Wasserman, L.
(2004). All of Statistics: A Concise Course in Statistical Inference. Springer. <a href="https://link.springer.com/book/10.1007/978-0-387-21736-9">https://link.springer.com/book/10.1007/978-0-387-21736-9</a></p><p>Casella, G., &amp; Berger, R. L. (2002). Statistical Inference (2nd ed.). Duxbury. <a href="https://www.cengage.com/c/statistical-inference-2e-casella-berger/9780534243128/">https://www.cengage.com/c/statistical-inference-2e-casella-berger/9780534243128/</a></p><p>Ross, S. M. (2014). A First Course in Probability (9th ed.). Pearson.<br><a href="https://www.pearson.com/en-us/subject-catalog/p/first-course-in-probability-a/P200000006334/9780134753119">https://www.pearson.com/en-us/subject-catalog/p/first-course-in-probability-a/P200000006334/9780134753119</a></p><p>Rodgers, J. L., &amp; Nicewander, W. A. (1988). Thirteen Ways to Look at the Correlation Coefficient. The American Statistician, 42(1), 59–66. <a href="https://www.tandfonline.com/doi/abs/10.1080/00031305.1988.10475524">https://www.tandfonline.com/doi/abs/10.1080/00031305.1988.10475524</a></p><hr><!--kg-card-begin: html--><p><em><small>Disclaimer: This blog post is for informational and educational purposes only. It does not constitute financial advice or a recommendation to trade any specific assets or employ any specific strategy. All trading and investment activities involve significant risk. Always conduct your own thorough research, evaluate your personal risk tolerance, and consider seeking advice from a qualified financial professional before making any investment decisions.</small></em></p><!--kg-card-end: html-->]]></content:encoded></item><item><title><![CDATA[From Logistic to Random Forests: Mastering Non-linear Regression Models]]></title><description><![CDATA[Master non-linear regression: Logistic, Quantile, Decision Trees, Random Forests, SVR for finance. Tackle complex patterns, enhance predictive modeling with these machine learning tools.]]></description><link>https://blog.quantinsti.com/types-regression-finance/</link><guid isPermaLink="false">6808eb5efc68ac288210fb44</guid><category><![CDATA[Mathematics and Econometrics]]></category><dc:creator><![CDATA[Vivek Krishnamoorthy]]></dc:creator><pubDate>Fri, 02 May 2025 11:19:30 GMT</pubDate><content:encoded><![CDATA[<p><strong>By</strong>: <a href="https://in.linkedin.com/in/vivekkrishnamoorthy">Vivek Krishnamoorthy</a>, <a href="https://www.linkedin.com/in/aacashi-n-9a4533223/">Aacashi Nawyndder</a> and <a href="https://www.linkedin.com/in/udisha-alok">Udisha Alok</a><br><br>Ever wish you had a crystal ball for the financial markets? While we can't <em>quite</em> do that, <strong>regression</strong> is a super useful tool that helps us find patterns and relationships hidden in data – it's like being a data detective!</p><p>The most common starting point is <strong>linear regression</strong>, which is basically about drawing the best straight line through data points to see how things are connected. Simple, right?</p><p>In <strong><a href="https://blog.quantinsti.com/advanced-regression-models-finance/">Part 1</a></strong> of this series, we explored ways to make those line-based models even better, tackling things like curvy relationships (Polynomial Regression) and messy data with too many variables (using Ridge and Lasso Regression). We learned how to refine those linear predictions.</p><p>But what if a line (even a curvy one) just doesn't fit?
Or what if you need to predict something different, like a "yes" or "no"?</p><p>Get ready for <strong>Part 2</strong>, my friend, where we venture beyond the linear world and explore a fascinating set of regression techniques designed for different kinds of problems:</p><ol><li><strong>Logistic Regression:</strong> For predicting probabilities and binary outcomes (Yes/No).</li><li><strong>Quantile Regression:</strong> For understanding relationships at different points in the data distribution, not just the average (great for risk analysis!).</li><li><strong>Decision Tree Regression:</strong> An intuitive flowchart approach for complex, non-linear patterns.</li><li><strong>Random Forest Regression:</strong> Harnessing the "wisdom of the crowd" by combining multiple decision trees for accuracy and stability.</li><li><strong>Support Vector Regression (SVR):</strong> A powerful method using "margins" to handle complex relationships, even in high dimensions.</li></ol><p>Let's dive into these powerful tools and see how they can unlock new insights from financial data!</p><hr><h3 id="prerequisites"><strong>Prerequisites</strong></h3><p>Hey there! Before we get into the good stuff, it helps to be familiar with a few key concepts. You can still follow along intuitively, but brushing up on these will give you a much better understanding. Here’s what to check out:</p><p><strong>1. Statistics and Probability</strong><br> Know the essentials—mean, variance, correlation, and probability distributions. New to this? <a href="https://blog.quantinsti.com/probability-trading/"><em>Probability Trading</em></a> is a great intro.</p><p><strong>2. Linear Algebra Basics</strong><br> Basics like matrices and vectors are super useful, especially for techniques like Principal Component Regression.</p><p><strong>3. Regression Fundamentals</strong><br> Get comfy with linear regression and its assumptions. <em><a href="https://blog.quantinsti.com/linear-regression/">Linear Regression in Finance</a></em> is a solid starting point.</p><p><strong>4. Financial Market Knowledge</strong><br> Terms like stock returns, volatility, and market sentiment will come up a lot. <a href="https://www.quantinsti.com/epat/statistics-financial-markets"><em>Statistics for Financial Markets</em></a> can help you brush up.</p><p><strong>5. Explore <a href="https://blog.quantinsti.com/advanced-regression-models-finance/">Part 1</a> of This Series</strong><br> Check out <a href="https://blog.quantinsti.com/advanced-regression-models-finance/">Part 1</a> for an overview of Polynomial, Ridge, Lasso, Elastic Net, and LARS. It’s not mandatory, but it provides excellent context for different regression types.</p><p>Once you're good with these, you’ll be all set to dive deeper into how regression techniques reveal insights in finance. Let’s get started!</p><hr><h2 id="what-exactly-is-regression-analysis">What Exactly is Regression Analysis?</h2><p>At its core, regression analysis models the relationship between a dependent variable (the outcome we want to predict) and one or more independent variables (predictors).</p><p>Think of it as figuring out the connection between different things – for instance, how does a company's revenue (the outcome) relate to how much they spend on advertising (the predictor)? Understanding these links helps you make educated guesses about future outcomes based on what you know.
<br><br>When that relationship looks like a straight line on a graph, we call it linear regression – nice and simple!</p><hr><h2 id="what-makes-these-models-non-linear-">What Makes These Models 'Non-Linear'?</h2><p>Good question! In <a href="https://blog.quantinsti.com/advanced-regression-models-finance/">Part 1</a>, we mentioned that 'linear' in regression refers to how the model's coefficients are combined. </p><p>Non-linear models, like the ones we're exploring here, break that rule. Their underlying equations or structures don't just add up coefficients multiplied by predictors in a simple way. Think about Logistic Regression using that S-shaped curve (sigmoid function) to squash outputs between 0 and 1, or Decision Trees making splits based on conditions rather than a smooth equation, or SVR using 'kernels' to handle complex relationships in potentially higher dimensions.</p><p>These methods fundamentally work differently from linear models, allowing them to capture patterns and tackle problems (like classification or modelling specific data segments) that linear models often can't.</p><hr><h2 id="logistic-or-logit-regression">Logistic (or Logit) regression</h2><p>You use Logistic regression when the dependent variable (here, a dichotomous variable) is binary (think of it as a "yes" or "no" outcome, like a stock going up or down). It helps predict the binary outcome of an occurrence based on the given data. </p><p>It is a non-linear model that gives a logistic curve with values limited to between 0 and 1. This probability is then compared to a threshold value of 0.5 to classify the data. So, if the probability for a class is more than 0.5, we label it as 1; otherwise, it is 0. </p><p>This model is generally used to <a href="https://ieeexplore.ieee.org/document/8328543">predict the performance of stocks</a>. <br><br><em>Note:</em> You cannot use <strong>linear regression</strong> here because it could give values outside the 0 to 1 range. Also, the dependent variable can take only two values here, so the residuals won’t be normally distributed about the predicted line.</p><p>Want to learn more? Check out this blog for more on <a href="https://blog.quantinsti.com/machine-learning-logistic-regression-python/">logistic regression</a> and how to use Python code to predict stock movement.</p><figure class="kg-card kg-image-card kg-width-full"><img src="https://d1rwhvwstyk9gu.cloudfront.net/2025/04/Linear-and-logistic.png" class="kg-image" alt="Linear and logistic"></figure><p><a href="https://www.saedsayad.com/logistic_regression.htm">Source</a></p>
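<p>Before moving on, here's a minimal, hedged sketch of logistic regression in Python with scikit-learn. The features and data are simulated purely for illustration (this isn't the linked blog's exact setup):</p><!--kg-card-begin: html--><pre>
# Logistic regression on a binary "up/down" label (synthetic data).
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2))  # e.g., yesterday's return and a volume change
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=500) &gt; 0).astype(int)

model = LogisticRegression().fit(X, y)
proba_up = model.predict_proba(X[:5])[:, 1]  # sigmoid outputs between 0 and 1
print(proba_up)  # labelled 1 ("up") wherever the probability exceeds 0.5
</pre><!--kg-card-end: html-->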
<hr><h2 id="quantile-regression-understanding-relationships-beyond-the-average">Quantile Regression: Understanding Relationships Beyond the Average</h2><p>Traditional <a href="https://blog.quantinsti.com/linear-regression-assumptions-limitations/">linear regression</a> models predict the <strong>mean</strong> of a dependent variable based on independent variables. However, financial time series data often contain <a href="https://quantra.quantinsti.com/glossary/Skewness">skewness</a> and outliers, making linear regression unsuitable.<br></p><p>To solve this problem, <strong>Koenker and Bassett (1978)</strong> introduced quantile regression. 
Instead of modeling just the mean, it helps us see the relationship between variables at different points (quantiles and percentiles) in the dependent variable's distribution, such as:</p><ul><li><strong>10th percentile (large losses)</strong></li><li><strong>50th percentile (median returns)</strong></li><li><strong>99th percentile (large gains)</strong></li></ul><p>It estimates different quantiles (like medians or quartiles) of the dependent variables for the given independent variables, instead of just the mean. We call these <em>conditional</em> quantiles.</p><figure class="kg-card kg-image-card kg-width-full"><img src="https://d1rwhvwstyk9gu.cloudfront.net/2025/04/Quantiles.png" class="kg-image" alt="Quantiles"></figure><p><a href="https://scikit-learn.org/stable/auto_examples/linear_model/plot_quantile_regression.html">Source</a></p><p>Like <a href="https://blog.quantinsti.com/linear-regression-assumptions-limitations/#:~:text=Ordinary%20Least%20Squares%20(OLS)">OLS regression coefficients</a>, which show the changes from one-unit changes of the predictor variables, <strong>quantile regression</strong> coefficients show the changes in the <em>specified quantile</em> from one-unit changes in the predictor variables.</p><p><strong>Advantages:</strong></p><ul><li><strong>Robustness to Outliers</strong>: According to <strong>Lim et al. (2020)</strong>, regular linear regression <em>assumes</em> errors in the data are normally distributed, but this isn't reliable when you have outliers or extreme values ("fat tails"). Quantile regression handles outliers better because it focuses on minimizing <em>absolute</em> errors, not the squared ones like regular regression. This way the influence of extreme values is reduced, providing more reliable estimates in datasets that aren’t really “well behaved” (with heavy tails or skewed distributions).</li><li><strong>Estimating Conditional Median:</strong> The conditional median is estimated using the median estimator, which minimizes the sum of absolute errors.</li><li><strong>Handling Heteroskedasticity</strong>: OLS assumes <strong>constant variance of errors</strong> (homoskedasticity), but this is often unrealistic. Quantile regression allows for <strong>varying error variances</strong>, making it effective when predictor variables influence different parts of the response variable’s distribution <strong>(Koenker &amp; Bassett, 1978)</strong>.</li></ul><p>Let’s look at an example to better understand how quantile regression works:</p><p>Let's say you're trying to understand how the overall "mood" of the market (measured by a sentiment index) affects the daily returns of a particular stock. Traditional regression would tell you the average impact of a change in sentiment on the average stock return.</p><p>But what if you're particularly interested in <em>extreme</em> movements? This is where quantile regression comes in:</p><ul><li><strong>Looking at the 10th percentile:</strong> You could use quantile regression to see how a negative shift in market sentiment affects the <em>worst</em> 10% of potential daily returns (the big losses). It might show that negative sentiment has a much stronger negative impact during these extreme downturns than it does on average.</li><li><strong>Looking at the 90th percentile:</strong> Similarly, you could see how positive sentiment affects the <em>best</em> 10% of daily returns (the big gains). It might reveal that positive sentiment has a different (possibly larger or smaller) impact on these significant upward swings compared to the average.</li><li><strong>Looking at the 50th percentile (median):</strong> You can also see the impact of sentiment on the typical daily return (the median), which might be different from the effect on the average if the return distribution is skewed.</li></ul><p>So, instead of just one average effect, quantile regression gives you a more complete picture of how market sentiment influences different parts of the stock's return distribution, especially the potentially risky extreme losses. Isn’t that great?</p>
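<p>To ground this, here's a hedged sketch of quantile regression with statsmodels, fitting the 10th, 50th and 90th percentiles of simulated returns against a made-up sentiment index (all numbers are illustrative):</p><!--kg-card-begin: html--><pre>
# Quantile regression: how "sentiment" affects different return quantiles.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
df = pd.DataFrame({"sentiment": rng.normal(size=1000)})
# Dispersion grows with |sentiment|, so the quantile slopes will differ
df["returns"] = 0.3 * df["sentiment"] + rng.normal(size=1000) * (1 + 0.5 * np.abs(df["sentiment"]))

for q in (0.10, 0.50, 0.90):
    fit = smf.quantreg("returns ~ sentiment", df).fit(q=q)
    print(f"q={q:.2f}: sentiment coefficient = {fit.params['sentiment']:.3f}")
</pre><!--kg-card-end: html-->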
<hr><h2 id="decision-trees-regression-the-flowchart-approach">Decision Trees Regression: The Flowchart Approach</h2><p>Imagine trying to predict a numerical value – like the price of something or a company's future revenue. A <strong>Decision Tree</strong> offers an intuitive way to do this, working like a flowchart or a game of 'yes/no' questions.</p><p>In a decision tree, the data is divided into smaller and smaller subsets based on certain conditions related to the predictor variables. Think of it like this:</p><p><a href="https://blog.quantinsti.com/use-decision-trees-machine-learning-predict-stock-movements/">Decision trees</a> start with your entire dataset and progressively split it into smaller and smaller subsets at the nodes, thereby creating a tree-like structure. Each of the nodes where the data is split based on a condition is called an <strong>internal/split node</strong>, and the final subsets are called the <strong>terminal/leaf nodes</strong>.<br><br>In finance, decision trees may be used for classification problems like <a href="https://blog.quantinsti.com/decision-tree/">predicting</a> whether the prices of a financial instrument will go up or down.</p><figure class="kg-card kg-image-card kg-width-full"><img src="https://d1rwhvwstyk9gu.cloudfront.net/2025/04/Decision-tree-regression.png" class="kg-image" alt="Decision tree regression"></figure><p><a href="https://blog.quantinsti.com/decision-tree/">Source</a></p><p><strong>Decision Tree Regression</strong> is when we use a decision tree to predict continuous values (like the price of a house or temperature) instead of categories (like predicting yes/no or up/down).</p><p>Here’s how it works in regression:</p><ul><li>The tree asks a series of questions based on the input features (like “Is square footage &gt; 1500?”).</li><li>Based on the answers, the data point moves down the tree until it reaches a <strong>leaf</strong>.</li><li>In that leaf, the prediction is the <strong>average</strong> (or sometimes the median) of the actual values from the training data that also landed there.</li></ul><p>So, the tree splits the data into groups, and each group gets a fixed number as the prediction.</p><p><strong>Things to Watch Out For:</strong></p><ul><li><strong>Overfitting:</strong> Decision trees can get too detailed and match the training data <em>too</em> perfectly, making them perform poorly on new, unseen data.</li><li><strong>Instability:</strong> Small changes in the training data can sometimes lead to significantly different tree structures.
<p>You'll find a full description of the model in this <a href="https://blog.quantinsti.com/use-decision-trees-machine-learning-predict-stock-movements/">blog</a> and its use in trading in this <a href="https://blog.quantinsti.com/decision-tree/#:~:text=Decision%20Trees%20for%20Regression&amp;text=Basically%20refer%20to%20the%20parameters,some%20limits%20to%20create%20it.">one</a>.</p><p>To learn more about decision trees in trading, check out <a href="https://quantra.quantinsti.com/course/decision-trees-analysis-trading-ernest-chan">this</a> Quantra course.</p><p>Let's see a situation where this might be a useful tool:</p><p>Imagine you're trying to predict a company's sales revenue for the next quarter. You have data on its past performance and factors like marketing spend in the current quarter, number of salespeople, the company's industry sector (e.g., Tech, Retail, Healthcare), etc.</p><p>The tree might ask: "Marketing spend &gt; $500k?" If yes, "Industry = Tech?" Based on the path taken, you land on a <strong>leaf</strong>.</p><p>The prediction for a new company following that path would be the average revenue of all past companies that fell into that same leaf (e.g., the average revenue for tech companies with high marketing spend).</p>
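<p>Here's how that sales-revenue idea might look in code. The sketch below uses scikit-learn, and the features and revenue figures are entirely made up for illustration:</p><!--kg-card-begin: html--><pre><code class="language-python">
# A minimal sketch of decision tree regression on simulated data.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
n = 500
marketing = rng.uniform(0, 1_000_000, n)   # marketing spend in dollars
salespeople = rng.integers(1, 50, n)       # sales head count
X = np.column_stack([marketing, salespeople])
# Synthetic revenue: non-linear in spend, plus noise.
y = 2.0 * np.sqrt(marketing) * salespeople ** 0.3 + rng.normal(0, 500, n)

# Limiting max_depth keeps the tree small, which helps against overfitting.
tree = DecisionTreeRegressor(max_depth=4, random_state=0).fit(X, y)
print(tree.predict([[600_000, 20]]))  # predicted revenue for one new company
</code></pre><!--kg-card-end: html-->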
<hr><h2 id="random-forest-regression-wisdom-of-the-crowd-for-predictions">Random Forest Regression: Wisdom of the Crowd for Predictions</h2><p>Remember how individual Decision Trees can sometimes be a bit unstable or might overfit the training data? What if we could harness the power of <em>many</em> decision trees instead of relying on just one?</p><p>That's the idea behind <strong>Random Forest Regression</strong>!</p><p>It's an "ensemble" method, meaning it combines multiple models (in this case, decision trees) to achieve better performance than any single one could alone. You can think of it using the "wisdom of the crowd" principle: instead of asking one expert, you ask many, slightly different experts and combine their insights. Generally, Random Forests perform significantly better than individual decision trees <strong>(Breiman, 2001)</strong>.</p><p><strong>How does the forest get "random"?</strong></p><p>The "random" part of Random Forest comes from two key techniques used when building the individual trees:</p><ol><li><strong>Random Data Subsets (<a href="https://blog.quantinsti.com/ensemble-methods-bagging-boosting/"><strong>Bootstrapping</strong></a>):</strong> Each tree in the forest is trained on a slightly different random sample of the original training data. This sample is drawn "with replacement" (meaning some data points might be selected multiple times, and some might be left out for that specific tree). This ensures each tree sees a slightly different perspective of the data.</li><li><strong>Random Feature Subsets:</strong> When deciding how to split the data at each step inside a tree, the algorithm is only allowed to consider a <em>random selection</em> of the input features, not all of them. This stops one or two powerful features from dominating all the trees and encourages diversity.</li></ol><p><strong>Making Predictions (Regression = Averaging)</strong></p><p>To predict a value for new data, you run it through <em>every</em> tree in the forest. Each tree gives its own prediction. The Random Forest's final prediction is simply the <strong>average</strong> of all those individual tree predictions. This averaging smooths things out and makes the model much more stable.</p><figure class="kg-card kg-image-card kg-width-full"><img src="https://d1rwhvwstyk9gu.cloudfront.net/2025/04/Random-forest-regressor.png" class="kg-image" alt="Random forest regressor"></figure><p>Image representation of a Random Forest regressor</p><p><strong>Why Use Random Forest Regression?</strong></p><ul><li><strong>High Accuracy:</strong> Often provides very accurate predictions.</li><li><strong>Robustness:</strong> Less prone to overfitting compared to single decision trees and handles outliers reasonably well <strong>(Breiman, 2001)</strong>.</li><li><strong>Non-linearity:</strong> Easily captures complex, non-linear relationships.</li><li><strong>Feature Importance:</strong> Can provide estimates of which predictors are most important.</li></ul><p><strong>Things to Consider:</strong></p><ul><li><strong>Interpretability:</strong> It acts more like a "black box." It's harder to understand exactly <em>why</em> it made a specific prediction compared to visualizing a single decision tree.</li><li><strong>Computation:</strong> Training many trees can be computationally intensive and require more memory.</li></ul><p>Check out this <a href="https://blog.quantinsti.com/random-forest-algorithm-in-python/">post</a> if you want to learn more about random forests and how they can be used in trading.</p><p>Think we'd leave you hanging? No way!</p><p>Here's an example to help you better understand how random forests work in practice:</p><p>You want to predict how much a stock's price will swing (its volatility) next month, using data like recent volatility, trading volume, and market fear (VIX index).</p><p>A single decision tree might latch onto a specific pattern in the past data and give a jumpy prediction. A <strong>Random Forest</strong> approach is more robust:</p><p>It builds hundreds of trees. Each tree sees slightly different historical data and considers different feature combinations at each split. Each tree estimates the volatility. The final prediction is the average of all these estimates, giving a more stable and reliable forecast of future volatility than one tree alone could provide.</p>
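<p>Here's a hedged sketch of that volatility example with scikit-learn. All three inputs are simulated stand-ins for real realised-volatility, volume, and VIX series:</p><!--kg-card-begin: html--><pre><code class="language-python">
# A minimal sketch of random forest regression on simulated data.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
n = 2000
recent_vol = rng.uniform(0.05, 0.60, n)     # last month's volatility
volume = rng.uniform(1e5, 1e7, n)           # average daily volume
vix = rng.uniform(10, 40, n)                # market-fear proxy
X = np.column_stack([recent_vol, np.log(volume), vix])
# Simulated next-month volatility: persistent, nudged by VIX, plus noise.
y = 0.7 * recent_vol + 0.004 * vix + rng.normal(0, 0.03, n)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=1)
forest = RandomForestRegressor(n_estimators=300, random_state=1).fit(X_tr, y_tr)
print("R^2 on held-out data:", round(forest.score(X_te, y_te), 3))
print("Feature importances:", forest.feature_importances_.round(3))
</code></pre><!--kg-card-end: html--><p>Notice the feature-importance readout at the end; that's the "which predictors matter most" estimate mentioned in the advantages above.</p>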
<hr><h3 id="support-vector-regression-svr-regression-within-a-margin-of-error"><strong>Support Vector Regression (SVR): Regression Within a 'Margin' of Error</strong></h3><p>You might be familiar with <a href="https://blog.quantinsti.com/support-vector-machines-introduction/">Support Vector Machines</a> (SVM) for classification. <strong>Support Vector Regression (SVR)</strong> takes the core ideas of SVM and applies them to <strong>regression tasks</strong> – that is, predicting continuous numerical values.</p><p>SVR approaches regression a bit differently than many other methods. While methods like standard linear regression try to minimize the error between the predicted and actual values for <em>all</em> data points, SVR has a different philosophy.</p><p><strong>The Epsilon (ε) Insensitive Tube:</strong></p><p>Imagine you're trying to fit a line (or curve) through your data points. SVR tries to find a "tube" or "street" around this line with a certain width, defined by a parameter called <strong>epsilon (ε)</strong>. The goal is to fit as many data points as possible <em>inside</em> this tube.</p><figure class="kg-card kg-image-card kg-width-full"><img src="https://d1rwhvwstyk9gu.cloudfront.net/2025/04/Support-vector-regression.png" class="kg-image" alt="Support vector regression"></figure><p>Image representation of Support Vector Regression: <a href="https://www.educba.com/support-vector-regression/">Source</a></p><p>Here's the key idea: For any data points that fall <em>inside</em> this ε-tube, SVR considers the prediction "good enough" and <strong>ignores their error</strong>. It only starts penalizing errors for points that fall <em>outside</em> the tube. This makes SVR less sensitive to small errors compared to methods that try to get <em>every</em> point perfect. The regression line (or hyperplane in higher dimensions) runs down the middle of this tube.</p><p><strong>Handling Curves (Non-Linearity):</strong></p><p>What if the relationship between your predictors and the target variable isn't straight? SVR uses a "<strong>kernel</strong> trick". This is like projecting the data into a higher-dimensional space where a complex, curvy relationship might look like a simpler straight line (or flat plane). By finding the best "tube" in this higher dimension, SVR can effectively model non-linear patterns. Common kernels include linear, polynomial, and RBF (Radial Basis Function). The best choice depends on the data.</p><p><strong>Pros:</strong></p><ul><li>Effective in high-dimensional spaces.</li><li>Can model non-linear relationships using kernels.</li><li>The ε-margin offers some robustness to small errors/outliers <strong>(Muthukrishnan &amp; Jamila, 2020)</strong>.</li></ul><p><strong>Cons:</strong></p><ul><li>Can be computationally slow on large datasets.</li><li>Performance is sensitive to parameter tuning (choosing ε, a cost parameter C, and the right kernel).</li><li>Interpretability can be less direct than linear regression.</li></ul><p>The explanation for the whole model can be found <a href="https://blog.quantinsti.com/support-vector-machines-introduction/">here</a>.</p><p>And if you want to learn more about how support vector machines can be used in trading, be sure to check out <a href="https://quantra.quantinsti.com/course/trading-machine-learning-classification-svm">this</a> Quantra course, my friend!</p><p>By now, you probably know how this works, so let's look at a real-life example that uses SVR:</p><p>Think about predicting the price of a stock option (like a call or put). Option prices depend on several complex, non-linear factors: the underlying stock's price, time left until expiration, expected future volatility (implied volatility), interest rates, etc.</p><p>SVR (especially with a non-linear kernel like RBF) is suitable for this. It can capture these complex relationships using the kernel trick. The ε-tube focuses on getting the prediction within an acceptable small range (e.g., predicting the price +/- 5 cents), rather than stressing about tiny deviations for every single option.</p>
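<p>To make it concrete, here's a minimal sketch with scikit-learn's <code>SVR</code>. The option-style inputs and "prices" are simulated, and the scaling step is important because SVR is sensitive to feature scales:</p><!--kg-card-begin: html--><pre><code class="language-python">
# A minimal sketch of support vector regression on simulated data.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

rng = np.random.default_rng(2)
n = 800
moneyness = rng.uniform(0.8, 1.2, n)   # spot price / strike price
ttm = rng.uniform(0.05, 1.0, n)        # years to expiration
iv = rng.uniform(0.1, 0.5, n)          # implied volatility
X = np.column_stack([moneyness, ttm, iv])
# A smooth non-linear "option price" plus noise, standing in for real quotes.
y = np.maximum(moneyness - 1, 0) + 0.4 * iv * np.sqrt(ttm) + rng.normal(0, 0.01, n)

# epsilon sets the tube width (errors inside it are ignored);
# C trades off flatness against fitting points outside the tube.
model = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=10.0, epsilon=0.01))
model.fit(X, y)
print(model.predict(X[:3]).round(4))
</code></pre><!--kg-card-end: html-->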
<hr><h3 id="summary"><strong>Summary</strong></h3><!--kg-card-begin: html--><table>
<tbody>
<tr>
<td>
<p><strong>Regression Model</strong></p>
</td>
<td>
<p><strong>One-Line Summary</strong></p>
</td>
<td>
<p><strong>One-Line Use Case</strong></p>
</td>
</tr>
<tr>
<td>
<p><span style="font-weight: 400;">Logistic Regression</span></p>
</td>
<td>
<p><span style="font-weight: 400;">Predicts the probability of a binary outcome.</span></p>
</td>
<td>
<p><span style="font-weight: 400;">Predicting whether a stock will go up or down.</span></p>
</td>
</tr>
<tr>
<td>
<p><span style="font-weight: 400;">Quantile Regression</span></p>
</td>
<td>
<p><span style="font-weight: 400;">Models relationships at different quantiles of the dependent variable's distribution.</span></p>
</td>
<td>
<p><span style="font-weight: 400;">Understanding how market sentiment affects extreme stock price movements.</span></p>
</td>
</tr>
<tr>
<td>
<p><span style="font-weight: 400;">Decision Trees Regression</span></p>
</td>
<td>
<p><span style="font-weight: 400;">Predicts continuous values by partitioning data into subsets based on predictor variables.</span></p>
</td>
<td>
<p><span style="font-weight: 400;">Predicting a company's sales revenue based on various factors.</span></p>
</td>
</tr>
<tr>
<td>
<p><span style="font-weight: 400;">Random Forest Regression</span></p>
</td>
<td>
<p><span style="font-weight: 400;">Improves prediction accuracy by averaging predictions from multiple decision trees.</span></p>
</td>
<td>
<p><span style="font-weight: 400;">Predicting the volatility of a stock.</span></p>
</td>
</tr>
<tr>
<td>
<p><span style="font-weight: 400;">Support Vector Regression (SVR)</span></p>
</td>
<td>
<p><span style="font-weight: 400;">Predicts continuous values by finding a "tube" that best fits the data.</span></p>
</td>
<td>
<p><span style="font-weight: 400;">Predicting option prices, which depend on several non-linearly related factors.</span></p>
</td>
</tr>
</tbody>
</table>
<!--kg-card-end: html--><hr><h3 id="conclusion"><strong>Conclusion</strong></h3><p>And that concludes our tour through the more diverse landscapes of regression! We've seen how <strong>Logistic Regression</strong> helps us tackle binary predictions, how <strong>Quantile Regression</strong> gives us insights beyond the average, especially for risk, and how <strong>Decision Trees</strong> and <strong>Random Forests</strong> offer intuitive yet powerful ways to model complex, non-linear relationships. Finally, <strong>Support Vector Regression</strong> provides a unique, margin-based approach that is practical even in high-dimensional spaces.</p><p>From the refined linear models in <strong><a href="https://blog.quantinsti.com/advanced-regression-models-finance/">Part 1</a></strong> to the varied techniques explored here, you now have a much broader regression toolkit at your disposal. Each model has its strengths and is suited for different financial questions and data challenges.</p><p>The key takeaway? Regression is not a one-size-fits-all solution. Understanding the nuances of different techniques allows you to choose the right tool for the job, leading to more insightful analysis and powerful predictive models.</p><p>And as you continue learning, my friend, <strong>don't just stop at theory</strong>. Keep exploring, keep practicing with real data, and keep refining your skills. Happy modeling!</p><p>Perhaps you're keen on a complete, holistic understanding of regression applied directly to trading? In that case, check out <a href="https://quantra.quantinsti.com/course/trading-with-machine-learning-regression">this</a> Quantra course.</p><p>If you're serious about taking your skills to the next level, consider <strong>QuantInsti's </strong><a href="https://www.quantinsti.com/epat"><strong>EPAT</strong></a><strong> program</strong>—a solid path to mastering financial algorithmic trading.<br><br><em>With the right training and guidance from industry experts, you can learn regression alongside Statistics &amp; Econometrics, Financial Computing &amp; Technology, and Algorithmic &amp; Quantitative Trading. These and various other aspects of algorithmic trading are covered in the EPAT algo trading course, which equips you with the skill sets needed to build a promising career in algorithmic trading. Be sure to check it out.</em></p><hr><h3 id="references"><strong>References</strong></h3><ol><li>Koenker, R., &amp; Bassett, G. (1978). Regression quantiles. <em>Econometrica, 46</em>(1), 33–50. <a href="https://doi.org/10.2307/1913643">https://doi.org/10.2307/1913643</a></li><li>Lim, D., Park, B., Nott, D., Wang, X., &amp; Choi, T. (2020). Sparse signal shrinkage and outlier detection in high-dimensional quantile regression with variational Bayes. <em>Statistica Sinica, 13</em>(2), 1. <a href="https://archive.intlpress.com/site/pub/files/_fulltext/journals/sii/2020/0013/0002/SII-2020-0013-0002-a008.pdf">https://archive.intlpress.com/site/pub/files/_fulltext/journals/sii/2020/0013/0002/SII-2020-0013-0002-a008.pdf</a></li><li>Breiman, L. (2001). Random forests. <em>Machine Learning, 45</em>(1), 5–32. <a href="https://link.springer.com/article/10.1023/A:1010933404324">https://link.springer.com/article/10.1023/A:1010933404324</a></li><li>Muthukrishnan, R., &amp; Jamila, S. M. (2020). Predictive modeling using support vector regression. <em>International Journal of Scientific &amp; Technology Research, 9</em>(2), 4863–4875.
Retrieved from <a href="https://www.ijstr.org/final-print/feb2020/Predictive-Modeling-Using-Support-Vector-Regression.pdf">https://www.ijstr.org/final-print/feb2020/Predictive-Modeling-Using-Support-Vector-Regression.pdf</a></li></ol><hr><!--kg-card-begin: html--><p><em><small>Disclaimer: All investments and trading in the stock market involve risk. Any decision to place trades in the financial markets, including trading in stock or options or other financial instruments, is a personal decision that should only be made after thorough research, including a personal risk and financial assessment and the engagement of professional assistance to the extent you believe necessary. The trading strategies or related information mentioned in this article is for informational purposes only.</small></em></p><!--kg-card-end: html-->]]></content:encoded></item><item><title><![CDATA[Beyond the Straight Line: Advanced Linear Regression Models for Financial Data]]></title><description><![CDATA[Master advanced linear regression models in finance: Polynomial, Ridge, Lasso, Elastic Net, LARS. Tackle multicollinearity, feature selection challenges for robust financial modeling. Learn key techniques now!]]></description><link>https://blog.quantinsti.com/advanced-regression-models-finance/</link><guid isPermaLink="false">680770e6fc68ac288210fa99</guid><category><![CDATA[Mathematics and Econometrics]]></category><dc:creator><![CDATA[Vivek Krishnamoorthy]]></dc:creator><pubDate>Thu, 01 May 2025 11:19:00 GMT</pubDate><content:encoded><![CDATA[<p><strong>By</strong>: <a href="https://in.linkedin.com/in/vivekkrishnamoorthy">Vivek Krishnamoorthy</a>, <a href="https://www.linkedin.com/in/aacashi-n-9a4533223/">Aacashi Nawyndder</a> and <a href="https://www.linkedin.com/in/udisha-alok">Udisha Alok</a></p><p>Ever feel like financial markets are just unpredictable noise? What if you could find hidden patterns? That's where a cool tool called <strong>regression</strong> comes in! Think of it like a detective for data, helping us spot relationships between different things.</p><p>The simplest starting point is <strong>linear regression</strong> – basically, drawing the best straight line through data points to see how things connect. (We assume you've got a handle on the basics, maybe from our intro blog linked in the prerequisites!).</p><p>But what happens when a straight line isn't enough, or the data gets messy? In <strong>Part 1</strong> of this two-part series, we'll upgrade your toolkit! We're moving beyond simple straight lines to tackle common headaches in financial modeling. We'll explore how to:</p><ol><li>Model <strong>non-linear trends</strong> using <strong>Polynomial Regression</strong>.</li><li>Deal with <strong>correlated predictors</strong> (multicollinearity) using <strong>Ridge Regression</strong>.</li><li>Automatically <strong>select the most important features</strong> from a noisy dataset using <strong>Lasso Regression</strong>.</li><li>Get the <strong>best of both worlds</strong> with <strong>Elastic Net Regression</strong>.</li><li>Efficiently find <strong>key predictors</strong> in high-dimensional data with <strong>Least Angle Regression (LARS)</strong>.</li></ol><p>Get ready to add some serious power and finesse to your linear modeling skills!</p><hr><h2 id="prerequisites"><strong>Prerequisites</strong></h2><p>Hey there! Before diving in, getting familiar with a few key concepts is a good ideawe dive in, it’s a good idea to get familiar with a few key concepts. 
You can still follow along without them, but having these basics down will make everything click much easier. Here’s what you should check out:</p><p><strong>1. Statistics and Probability</strong><br> Know the basics—mean, variance, correlation, probability distributions. New to this? <a href="https://blog.quantinsti.com/probability-trading/"><em>Probability Trading</em></a> is a solid starting point.</p><p><strong>2. Linear Algebra Basics</strong><br> Matrices and vectors come in handy, especially for advanced stuff like Principal Component Regression.</p><p><strong>3. Regression Fundamentals</strong><br> Understand how linear regression works and the assumptions behind it. <em><a href="https://blog.quantinsti.com/linear-regression/">Linear Regression in Finance</a></em> breaks it down nicely.</p><p><strong>4. Financial Market Knowledge</strong><br> Brush up on terms like stock returns, volatility, and market sentiment. <a href="https://www.quantinsti.com/epat/statistics-financial-markets"><em>Statistics for Financial Markets</em></a> is a great refresher.</p><p>Once you've got these covered, you're ready to explore how regression can unlock insights in the world of finance. Let’s jump in!</p><hr><h2 id="acknowledgements"><strong>Acknowledgements</strong></h2><p>This blog post draws heavily from the information and insights presented in the following texts:</p><ol><li>Gujarati, D. N. (2011). <em>Econometrics by example</em>. Basingstoke, UK: Palgrave Macmillan.</li><li>Fabozzi, F. J., Focardi, S. M., Rachev, S. T., &amp; Arshanapalli, B. G. (2014). <em>The basics of financial econometrics: Tools, concepts, and asset management applications</em>. Hoboken, NJ: Wiley.</li><li>Diebold, F. X. (2019). <em>Econometric data science: A predictive modeling approach</em>. University of Pennsylvania. Retrieved from <a href="http://www.ssc.upenn.edu/~fdiebold/Textbooks.html">http://www.ssc.upenn.edu/~fdiebold/Textbooks.html</a></li><li>James, G., Witten, D., Hastie, T., &amp; Tibshirani, R. (2013). <em>An introduction to statistical learning: With applications in R</em>. New York, NY: Springer.</li></ol><hr><p>Table of contents:</p><ul><li><a href="#what-exactly-is-regression-analysis">What Exactly is Regression Analysis?</a></li><li><a href="#so-why-do-we-call-these-linear-models">So, Why Do We Call These 'Linear' Models?</a></li><li><a href="#building-the-basics">Building the Basics</a></li><li><a href="#advanced-models">Advanced Models</a></li></ul><hr><h2 id="what-exactly-is-regression-analysis"><strong>What Exactly </strong><em><strong>is</strong></em><strong> Regression Analysis?</strong></h2><p>At its core, regression analysis models the relationship between a dependent variable (the outcome we want to predict) and one or more independent variables (predictors).</p><p>Think of it as figuring out the connection between different things – for instance, how does a company's revenue (the outcome) relate to how much they spend on advertising (the predictor)? Understanding these links helps you make educated guesses about future outcomes based on what you know. <br><br>When that relationship looks like a straight line on a graph, we call it linear regression—nice and simple, isn't it?</p><p>Before we dive deeper, let's quickly recap what linear regression is.</p><hr><h2 id="so-why-do-we-call-these-linear-models"><strong>So, Why Do We Call These 'Linear' Models?</strong></h2><p>Great question! 
You might look at something like Polynomial Regression, which models curves, and think, 'Wait, that doesn't look like a straight line!' And you'd be right, visually.</p><p>But here's the key: in the world of regression, when we say 'linear,' we're actually talking about the <em>coefficients</em> – those 'beta' values (β) we estimate. A model is considered linear if the equation used to predict the outcome is a simple sum (or linear combination) of these coefficients multiplied by their respective predictor terms. Even if we transform a predictor (like squaring it for a polynomial term), the <em>way the coefficient affects the outcome</em> is still direct and additive.</p><p>All the models in this post—polynomial, Ridge, Lasso, Elastic Net, and LARS—follow this rule even though they tackle complex data challenges far beyond a simple straight line.</p><hr><h2 id="building-the-basics"><strong>Building the Basics</strong></h2><h3 id="from-simple-to-multiple-regression"><strong>From Simple to Multiple Regression</strong></h3><p>In our previous blogs, we've discussed linear regression, <a href="https://blog.quantinsti.com/linear-regression/">its use in finance</a>, <a href="https://blog.quantinsti.com/linear-regression-market-data-python-r/">its application to financial data</a>, and its <a href="https://blog.quantinsti.com/linear-regression-assumptions-limitations/">assumptions and limitations</a>. So, we'll do a quick recap here before moving on to the new material. Feel free to skip this part if you're already comfortable with it.</p><h4 id="simple-linear-regression"><strong>Simple linear regression</strong></h4><p>Simple linear regression studies the relationship between two continuous variables: an independent variable and a dependent variable.</p><figure class="kg-card kg-image-card kg-width-full"><img src="https://d1rwhvwstyk9gu.cloudfront.net/2025/04/Simple-linear-regression.png" class="kg-image" alt="Simple linear regression"></figure><p><a href="https://upload.wikimedia.org/wikipedia/commons/b/be/Normdist_regression.png">Source</a></p><p>The equation for this looks like:</p><!--kg-card-begin: html--><p>
  $$ y_i = \beta_0 + \beta_1 X_i + \epsilon_i \qquad \text{-(1)} $$
</p>

<p><strong>Where:</strong></p>
<ul>
  <li><span>\(\beta_0\)</span> is the intercept</li>
  <li><span>\(\beta_1\)</span> is the slope</li>
  <li><span>\(\epsilon_i\)</span> is the error term</li>
</ul>
<!--kg-card-end: html--><p>In this equation, ‘y’ is the dependent variable, and ‘x’ is the independent variable. <br>The error term captures all the other factors that influence the dependent variable other than the independent variable.</p><h4 id="multiple-linear-regression"><strong>Multiple linear regression</strong></h4><p>Now, what happens when <em>more</em> than one independent variable influences a dependent variable?  That's where multiple linear regression comes in. </p><p>Here's the equation with three independent variables:</p><!--kg-card-begin: html--><p>
  $$ y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \beta_3 X_{i3} + \epsilon_i \qquad \text{-(2)} $$
</p>

<p><strong>Where:</strong></p>
<ul>
  <li><span>\(\beta_0, \beta_1, \beta_2, \beta_3\)</span> are the model parameters</li>
  <li><span>\(\epsilon_i\)</span> is the error term</li>
</ul>
<!--kg-card-end: html--><p>This extension allows modeling more complex relationships in finance, such as predicting stock returns based on economic indicators. You can read more about them <a href="https://blog.quantinsti.com/linear-regression/">here</a>.</p><hr><h2 id="advanced-models"><strong>Advanced Models</strong></h2><h3 id="polynomial-regression-modeling-non-linear-trends-in-financial-markets"><strong>Polynomial Regression: Modeling Non-Linear Trends in Financial Markets</strong></h3><p>Linear regression works well to model linear relationships between the dependent and independent variables. But what if the relationship is non-linear? </p><p>In such cases, we can add polynomial terms to the linear regression equation to get a better fit for the data. This is called polynomial regression. </p><figure class="kg-card kg-image-card kg-width-full"><img src="https://d1rwhvwstyk9gu.cloudfront.net/2025/04/Simple-and-polynomial.png" class="kg-image" alt="Simple and polynomial"></figure><p>So, polynomial regression uses a polynomial equation to model the relationship between the independent and dependent variables.</p><p>The equation for a kth order polynomial goes like:</p><!--kg-card-begin: html--><p>
<hr><h2 id="advanced-models"><strong>Advanced Models</strong></h2><h3 id="polynomial-regression-modeling-non-linear-trends-in-financial-markets"><strong>Polynomial Regression: Modeling Non-Linear Trends in Financial Markets</strong></h3><p>Linear regression works well to model linear relationships between the dependent and independent variables. But what if the relationship is non-linear?</p><p>In such cases, we can add polynomial terms to the linear regression equation to get a better fit for the data. This is called polynomial regression.</p><figure class="kg-card kg-image-card kg-width-full"><img src="https://d1rwhvwstyk9gu.cloudfront.net/2025/04/Simple-and-polynomial.png" class="kg-image" alt="Simple and polynomial"></figure><p>So, polynomial regression uses a polynomial equation to model the relationship between the independent and dependent variables.</p><p>The equation for a kth order polynomial goes like:</p><!--kg-card-begin: html--><p>
  $$ y_i = \beta_0 + \beta_1 X_i + \beta_2 X_i^2 + \beta_3 X_i^3 + \cdots + \beta_k X_i^k + \epsilon_i \qquad \text{-(3)} $$
</p>
<!--kg-card-end: html--><p>Choosing the right polynomial order is super important, as a higher-degree polynomial could overfit the data. So we try to keep the order of the polynomial model as low as possible.</p><p>There are two estimation approaches to choosing the order of the model:</p><ul><li><strong>Forward selection procedure:</strong><br> This method starts simple, building a model by adding terms one by one in increasing order of the polynomial.<br>Stopping condition: The process stops when adding a higher-order term doesn't significantly improve the model's fit, as determined by a t-test on the newly added term.<br></li><li><strong>Backward elimination procedure:</strong><br>This method starts with the highest-order polynomial and simplifies it by removing terms one by one.<br>Stopping condition: The process stops when removing a term significantly worsens the model's fit, as determined by a t-test.</li></ul><p><strong>Tip</strong>: The first- and second-order polynomial regression models are the most commonly used. Polynomial regression is better suited to a large number of observations, but it's equally important to note that it is sensitive to the presence of outliers.</p><p>The polynomial regression model can be used to predict non-linear patterns like what we find in stock prices. Do you want a stock trading implementation of the model? No problem, my friend! You can read all about it <a href="https://blog.quantinsti.com/polynomial-regression-adding-non-linearity-to-a-linear-model/">here</a>.</p>
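<p>Here's a quick sketch of a second-order polynomial fit with scikit-learn; the single predictor and its curved response are simulated:</p><!--kg-card-begin: html--><pre><code class="language-python">
# A minimal sketch of polynomial regression: a linear model fitted
# on polynomial features of one simulated predictor.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(4)
x = np.linspace(-3, 3, 200).reshape(-1, 1)
y = 1.0 + 2.0 * x.ravel() - 0.5 * x.ravel() ** 2 + rng.normal(0, 0.5, 200)

# Keep the order low (k = 2 here) to reduce the risk of overfitting.
model = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
model.fit(x, y)
print(model[-1].intercept_, model[-1].coef_)  # recovered coefficients
</code></pre><!--kg-card-end: html--><p>Note that the model is still <em>linear in the coefficients</em>, exactly as discussed earlier; only the features are transformed.</p>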
<h3 id="ridge-regression-explained-when-more-predictors-can-be-a-good-thing"><strong>Ridge Regression Explained: When More Predictors Can Be a Good Thing</strong></h3><p>Remember how we talked about <strong>linear regression</strong> assuming <a href="https://blog.quantinsti.com/linear-regression-assumptions-limitations/#:~:text=).-,No%20Multicollinearity,-Another%20assumption%20is">no multicollinearity</a> in the data? In real life, though, many factors move together. When multicollinearity exists, it can cause wild swings in the coefficients of your regression model, making it unstable and hard to trust.</p><p>Ridge regression is your friend here!<br>It helps reduce the <strong>standard error and prevent overfitting</strong>, stabilizing the model by adding a small "penalty" based on the size of the coefficients <strong>(Kumar, 2019)</strong>.</p><p>This penalty (called <strong>L2 regularization</strong>) discourages the coefficients from becoming too large, effectively "shrinking" them towards zero. Think of it like gently nudging down the influence of each predictor, especially the correlated ones, so the model doesn't overreact to small changes in the data.<br>Selecting the optimal penalty strength (lambda, λ) is important and often involves methods like cross-validation.</p><p><em>Caution:</em> While the <a href="https://blog.quantinsti.com/linear-regression-assumptions-limitations/#:~:text=Ordinary%20Least%20Squares%20(OLS)">OLS</a> estimator is scale-invariant, ridge regression is not. So, you need to <a href="https://www.statlect.com/fundamentals-of-statistics/ridge-regression">scale</a> the variables before applying ridge regression.</p><p>Ridge regression decreases the model complexity but doesn't reduce the number of variables (it can shrink the coefficients close to zero but doesn't make them exactly zero).<br>So, it <em>cannot</em> be used for feature selection.</p><p>Let's see an intuitive example for better understanding:</p><p>Imagine you're trying to build a model to predict the daily returns of a stock. You decide to use a whole bunch of technical indicators as your predictors – things like different moving averages, RSI, MACD, Bollinger Bands, and many more. The problem is that many of these indicators are often correlated with each other (e.g., different moving averages tend to move together).</p><p>If you used standard linear regression, these correlations could lead to unstable and unreliable coefficient estimates. But thankfully, you recall reading that QuantInsti blog on Ridge Regression – what a relief! It uses every indicator but dials back their individual influence (coefficients) towards zero. This prevents the correlations from causing wild results, leading to a more stable model that considers everything fairly.</p><p>Ridge Regression is used in various fields, one such example being <strong>credit scoring</strong>. Here, you could have many financial indicators (like income, debt levels, and credit history) that are often correlated. Ridge Regression ensures that all these relevant factors contribute to predicting credit risk without the model becoming overly sensitive to minor fluctuations in any single indicator, thus improving the reliability of the credit score.<br>Getting excited about what this model can do? We are too! That's precisely why we've prepared <a href="https://blog.quantinsti.com/linear-regression-models-scikit-learn/">this</a> blog post for you.</p>
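<p>Here's a hedged sketch of ridge regression on two deliberately correlated, simulated indicators. Note the scaling step, since ridge isn't scale-invariant:</p><!--kg-card-begin: html--><pre><code class="language-python">
# A minimal sketch of ridge regression on correlated simulated indicators.
import numpy as np
from sklearn.linear_model import RidgeCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(5)
n = 300
trend = rng.normal(size=n)
# Two "moving-average" style indicators that move together.
ma_fast = trend + rng.normal(scale=0.1, size=n)
ma_slow = trend + rng.normal(scale=0.1, size=n)
X = np.column_stack([ma_fast, ma_slow])
y = trend + rng.normal(scale=0.5, size=n)   # simulated daily returns

# RidgeCV picks the penalty strength (lambda, called alpha here)
# by cross-validation over the supplied grid.
model = make_pipeline(StandardScaler(), RidgeCV(alphas=np.logspace(-3, 3, 13)))
model.fit(X, y)
print("coefficients:", model[-1].coef_.round(3), " alpha:", model[-1].alpha_)
</code></pre><!--kg-card-end: html--><p>Both correlated indicators end up with similar, moderate coefficients instead of one exploding positive and the other negative, which is the stabilising effect described above.</p>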
<h3 id="lasso-regression-feature-selection-in-regression"><strong>Lasso Regression: Feature Selection in Regression</strong></h3><p>Now, what happens if you have <em>tons</em> of potential predictors, and you suspect many aren't actually very useful? Lasso (<strong>Least Absolute Shrinkage and Selection Operator</strong>) regression can help. Like Ridge, it adds a penalty to prevent overfitting, but it uses a different type (called <strong>L1 regularization</strong>) based on the <em>absolute value</em> of the coefficients. (Ridge Regression uses the square of the coefficients.)</p><p>This seemingly small difference in the penalty term has a significant impact. As the Lasso algorithm tries to minimize the overall cost (including this L1 penalty), it has a tendency to <strong>shrink the coefficients of less important predictors all the way to exactly zero.</strong></p><p>So, it <em>can</em> be used for feature selection, effectively identifying and removing irrelevant variables from the model.</p><p><em>Note</em>: Feature selection in Lasso regression is data-dependent <strong>(Fonti, 2017)</strong>.</p><p>Below is a really useful example of how Lasso regression shines!</p><p>Imagine you're trying to predict how a stock will perform each week. You've got tons of potential clues – interest rates, inflation, unemployment, how confident consumers are, oil and gold prices, you name it. The thing is, you probably only need to pay close attention to a few of these.</p><p>Because many indicators move together, standard linear regression struggles, potentially giving unreliable results. That's where Lasso regression steps in as a smart way to cut through the noise. While it considers all the indicators you feed it, its unique L1 penalty automatically shrinks the coefficients (influence) of less useful ones all the way to zero, essentially dropping them from the model. This leaves you with a simpler model showing just the key factors influencing the stock's performance, instead of an overwhelming list.</p><p>This kind of smart feature selection makes Lasso really handy in finance, especially for things like predicting stock prices. It can automatically pick out the most influential economic indicators from a whole bunch of possibilities. This helps build simpler, easier-to-understand models that focus on what really moves the market.</p><p>Want to dive deeper? Check out this <a href="https://ieeexplore.ieee.org/document/9417935">paper</a> on using Lasso for stock market analysis.</p>
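<p>Below is a minimal sketch of that idea, with ten simulated "macro indicators" of which only two actually drive the (simulated) returns:</p><!--kg-card-begin: html--><pre><code class="language-python">
# A minimal sketch of lasso regression used for feature selection.
import numpy as np
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(6)
n, p = 400, 10
X = rng.normal(size=(n, p))          # ten candidate indicators
# Only indicators 0 and 3 matter in this simulation.
y = 1.5 * X[:, 0] - 2.0 * X[:, 3] + rng.normal(scale=0.5, size=n)

# LassoCV chooses the L1 penalty strength by cross-validation.
model = LassoCV(cv=5).fit(X, y)
print(model.coef_.round(2))  # most coefficients shrink to exactly zero
</code></pre><!--kg-card-end: html--><p>Here's how Ridge and Lasso stack up side by side:</p>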
<!--kg-card-begin: html--><table>
<tbody>
<tr>
<td>
<p><strong>Feature</strong></p>
</td>
<td>
<p><strong>Ridge Regression</strong></p>
</td>
<td>
<p><strong>Lasso Regression</strong></p>
</td>
</tr>
<tr>
<td>
<p><strong>Regularization Type</strong></p>
</td>
<td>
<p><span style="font-weight: 400;">L2 (sum of squared coefficients)</span></p>
</td>
<td>
<p><span style="font-weight: 400;">L1 (sum of absolute coefficients)</span></p>
</td>
</tr>
<tr>
<td>
<p><strong>Effect on Coefficients</strong></p>
</td>
<td>
<p><span style="font-weight: 400;">Shrinks but retains all predictors</span></p>
</td>
<td>
<p><span style="font-weight: 400;">Shrinks some coefficients to zero (feature selection)</span></p>
</td>
</tr>
<tr>
<td>
<p><strong>Multicollinearity Handling</strong></p>
</td>
<td>
<p><span style="font-weight: 400;">Shrinks correlated coefficients to similar values</span></p>
</td>
<td>
<p><span style="font-weight: 400;">Keeps one correlated variable, others shrink to zero</span></p>
</td>
</tr>
<tr>
<td>
<p><strong>Feature Selection?</strong></p>
</td>
<td>
<p><span style="font-weight: 400;">❌ No</span></p>
</td>
<td>
<p><span style="font-weight: 400;">✅ Yes</span></p>
</td>
</tr>
<tr>
<td>
<p><strong>Best Use Case</strong></p>
</td>
<td>
<p><span style="font-weight: 400;">When all predictors are important</span></p>
</td>
<td>
<p><span style="font-weight: 400;">When many predictors are irrelevant</span></p>
</td>
</tr>
<tr>
<td>
<p><strong>Works Well When</strong></p>
</td>
<td>
<p><span style="font-weight: 400;">Large number of significant predictor variables</span></p>
</td>
<td>
<p><span style="font-weight: 400;">High-dimensional data with only a few key predictors</span></p>
</td>
</tr>
<tr>
<td>
<p><strong>Overfitting Control</strong></p>
</td>
<td>
<p><span style="font-weight: 400;">Reduces overfitting by shrinking coefficients</span></p>
</td>
<td>
<p><span style="font-weight: 400;">Reduces overfitting by both shrinking and selecting variables</span></p>
</td>
</tr>
<tr>
<td>
<p><strong>When to Choose?</strong></p>
</td>
<td>
<p><span style="font-weight: 400;">Preferable when multicollinearity exists and all predictors have some influence</span></p>
</td>
<td>
<p><span style="font-weight: 400;">Best for simplifying models by selecting the most relevant predictors</span></p>
</td>
</tr>
</tbody>
</table><!--kg-card-end: html--><h3 id="elastic-net-regression-combining-feature-selection-and-regularization"><strong>Elastic Net Regression: Combining Feature Selection and Regularization</strong></h3><p>So, we've learned about Ridge and Lasso regression. Ridge is great at shrinking coefficients and handling correlated predictors, but it doesn't zero out coefficients entirely (it keeps all features). Lasso is excellent for feature selection, but may struggle a bit when predictors are highly correlated (sometimes just picking one from a group somewhat randomly).</p><p>What if you want the best of both? Well, that's where <strong>Elastic Net regression</strong> comes in – a hybrid that combines both Ridge and Lasso regression.</p><p>Instead of choosing one or the other, it uses <em>both</em> the L1 penalty (from Lasso) and the L2 penalty (from Ridge) together in its calculations.</p><figure class="kg-card kg-image-card kg-width-full"><img src="https://d1rwhvwstyk9gu.cloudfront.net/2025/04/Rdige-Lasso-Elastic.png" class="kg-image" alt="Elastic Net regression"></figure><p><a href="https://keremkargin.medium.com/elasticnet-regression-fundamentals-and-modeling-in-python-8668f3c2e39e">Source</a></p><p><strong>How does it work?</strong></p><p>Elastic Net adds a penalty term to the standard linear regression cost function that <em>mixes</em> the Ridge and Lasso penalties. You can even control the "mix", deciding how much emphasis to put on the Ridge part versus the Lasso part. This allows it to:</p><ol><li>Perform feature selection like Lasso regression.</li><li>Provide regularization to prevent overfitting.</li><li>Handle correlated predictors: like Ridge, it can deal well with groups of predictors that are related to each other. If there's a group of useful, correlated predictors, Elastic Net tends to keep or discard them together, which is often more stable and interpretable than Lasso's tendency to pick just one.</li></ol><p>You can read <a href="https://blog.quantinsti.com/linear-regression-models-scikit-learn/">this blog</a> to learn more about ridge, lasso and elastic net regressions, along with their implementation in Python.</p><p>Here's an example to make it clearer:</p><p>Let's go back to predicting next month's stock return using many data points (past performance, market trends, economic rates, competitor prices, etc.). Some predictors might be useless noise, and others might be related (like different interest rates or competitor stocks). Elastic Net can simplify the model by zeroing out unhelpful predictors (feature selection) <em>and</em> handling the groups of related predictors (like interest rates) together, leading to a robust forecast.</p>
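<p>In code, the mix between the two penalties is a single knob. Here's a minimal sketch with simulated predictors (two correlated "rate" series plus pure noise features):</p><!--kg-card-begin: html--><pre><code class="language-python">
# A minimal sketch of elastic net regression on simulated data.
import numpy as np
from sklearn.linear_model import ElasticNetCV

rng = np.random.default_rng(7)
n = 400
driver = rng.normal(size=n)
rate_a = driver + rng.normal(scale=0.1, size=n)   # correlated pair
rate_b = driver + rng.normal(scale=0.1, size=n)
noise_feats = rng.normal(size=(n, 5))             # irrelevant predictors
X = np.column_stack([rate_a, rate_b, noise_feats])
y = driver + rng.normal(scale=0.5, size=n)

# l1_ratio controls the mix: 1.0 is pure Lasso, 0.0 is pure Ridge.
# ElasticNetCV picks both the mix and the penalty strength by cross-validation.
model = ElasticNetCV(l1_ratio=[0.2, 0.5, 0.8], cv=5).fit(X, y)
print(model.coef_.round(2), " chosen l1_ratio:", model.l1_ratio_)
</code></pre><!--kg-card-end: html--><p>Typically the noise features are zeroed out (the Lasso side) while the two correlated rate series keep similar coefficients (the Ridge side).</p>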
<h3 id="least-angle-regression-an-efficient-path-to-feature-selection"><strong>Least Angle Regression: An Efficient Path to Feature Selection</strong></h3><p>Now, imagine you're trying to build a linear regression model, but you have a <em>lot</em> of potential predictor variables – maybe even more variables than data points!</p><p>This is a common issue in fields like genetics or finance. How do you efficiently figure out which variables are most important?</p><p><strong>Least Angle Regression (<a href="https://hastie.su.domains/Papers/LARS/LeastAngle_2002.pdf">LARS</a>)</strong> offers an interesting and often computationally efficient way to do this. Think of it as a smart, automated process for adding predictors to your model one by one, or sometimes in small groups. It's a bit like <a href="https://www.statisticshowto.com/forward-selection/">forward stepwise regression</a>, but with a unique twist.</p><p><strong>How does LARS work?</strong></p><p>LARS builds the model piece by piece, focusing on the correlation between the predictors and the part of the dependent variable (the outcome) that the model hasn't explained yet (the "residual"). Here's the gist of the process:</p><ol><li><strong>Start Simple:</strong> Begin with all predictor coefficients set to zero. The initial "residual" is just the response variable itself.</li><li><strong>Find the Best Friend:</strong> Identify the predictor variable with the highest correlation with the current residual.</li><li><strong>Give it Influence:</strong> Start increasing the importance (coefficient) of this "best friend" predictor. As its importance grows, the model starts explaining things, and the leftover "residual" shrinks. Keep doing this <em>just until</em> another predictor matches the first one in how strongly it's linked to the current residual.</li><li><strong>The "Least Angle" Move:</strong> Now you have two predictors tied for being most correlated with the residual. LARS cleverly increases the importance of <em>both</em> these predictors together. It moves in a specific direction (called the "least angle" or "equiangular" direction) such that both predictors <em>maintain</em> their equal correlation with the shrinking residual.</li></ol><figure class="kg-card kg-image-card kg-width-full"><img src="https://d1rwhvwstyk9gu.cloudfront.net/2025/04/LARS-representation.png" class="kg-image" alt="Geometric representation of LARS"></figure><p>Geometric representation of LARS: <a href="https://www.researchgate.net/figure/LARS-algorithm-evolution-for-m-3-covariates_fig1_251670715">Source</a></p><ol start="5"><li><strong>Keep Going:</strong> Continue this process. As you go, a third (or fourth, etc.) predictor might eventually catch up and tie the others in its connection to the residual. When that happens, it joins the "active set" and LARS adjusts its direction again to keep all three (or more) active predictors equally correlated with the residual.</li><li><strong>Full Path:</strong> This continues until all predictors you're interested in are included in the model.</li></ol><p><strong>LARS and Lasso:</strong></p><p>Interestingly, LARS is closely related to Lasso regression. A slightly modified version of the LARS algorithm is actually a very efficient way to compute the <em>entire sequence</em> of solutions for Lasso regression across all possible penalty strengths (lambda values). So, while LARS is its own algorithm, it provides insight into how variables enter a model and gives us a powerful tool for exploring Lasso solutions.</p><p><strong>But why use LARS?</strong></p><ul><li>It's particularly efficient when you have high-dimensional data (many, many features).</li><li>It provides a clear path showing the order in which variables enter the model and how their coefficients evolve.</li></ul><p><em><strong>Caution:</strong></em> Like other forward selection methods, LARS can be sensitive to noise.</p><p><strong>Use case: identifying the key factors driving hedge fund returns</strong></p><p>Imagine you're analyzing a hedge fund's performance. You suspect that various market factors drive its returns, but there are dozens, maybe hundreds, you could consider: exposure to small-cap stocks, value stocks, momentum stocks, different industry sectors, currency fluctuations, etc. You have way more potential factors (predictors) than monthly return data points.</p><p>Running standard regression is difficult here. LARS handles this "too many factors" scenario effectively.</p><p>Its real advantage here is showing you the <em>order</em> in which different market factors become essential for explaining the fund's returns, and exactly how their <em>influence</em> builds up.</p><p>This gives you a clear view of the primary drivers behind the fund's performance and helps you build a simplified model highlighting its <em>key</em> systematic drivers, navigating the complexity of numerous potential factors efficiently.</p>
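<p>Here's a minimal sketch with scikit-learn's <code>Lars</code> estimator on simulated factor data; the interesting output is the <em>order</em> in which factors enter the model:</p><!--kg-card-begin: html--><pre><code class="language-python">
# A minimal sketch of LARS on simulated hedge-fund factor data.
import numpy as np
from sklearn.linear_model import Lars

rng = np.random.default_rng(8)
n, p = 60, 12   # only 60 monthly observations, 12 candidate factors
factors = rng.normal(size=(n, p))
# In this simulation only factors 2 and 7 drive the fund's returns.
fund_ret = 0.8 * factors[:, 2] - 0.5 * factors[:, 7] + rng.normal(scale=0.3, size=n)

model = Lars(n_nonzero_coefs=4).fit(factors, fund_ret)
# active_ lists factor indices in the order LARS added them to the model.
print("entry order:", model.active_)
print("coefficients:", model.coef_.round(2))
</code></pre><!--kg-card-end: html-->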
<hr><h2 id="summary"><strong>Summary</strong></h2><!--kg-card-begin: html--><table>
<tbody>
<tr>
<td>
<p><strong>Regression Model</strong></p>
</td>
<td>
<p><strong>One-Line Summary</strong></p>
</td>
<td>
<p><strong>One-Line Use Case</strong></p>
</td>
</tr>
<tr>
<td>
<p><span style="font-weight: 400;">Simple Linear Regression</span></p>
</td>
<td>
<p><span style="font-weight: 400;">Models the linear relationship between two variables.</span></p>
</td>
<td>
<p><span style="font-weight: 400;">Understanding how a company's revenue relates to its advertising spending.</span></p>
</td>
</tr>
<tr>
<td>
<p><span style="font-weight: 400;">Multiple Linear Regression</span></p>
</td>
<td>
<p><span style="font-weight: 400;">Models the linear relationship between one dependent variable and multiple independent variables.</span></p>
</td>
<td>
<p><span style="font-weight: 400;">Predicting stock returns based on several economic indicators.</span></p>
</td>
</tr>
<tr>
<td>
<p><span style="font-weight: 400;">Polynomial Regression</span></p>
</td>
<td>
<p><span style="font-weight: 400;">Models non-linear relationships by adding polynomial terms to a linear equation.</span></p>
</td>
<td>
<p><span style="font-weight: 400;">Predicting non-linear patterns in stock prices.</span></p>
</td>
</tr>
<tr>
<td>
<p><span style="font-weight: 400;">Ridge Regression</span></p>
</td>
<td>
<p><span style="font-weight: 400;">Reduces multicollinearity and overfitting by shrinking the magnitude of regression coefficients.</span></p>
</td>
<td>
<p><span style="font-weight: 400;">Predicting stock returns with many correlated technical indicators.</span></p>
</td>
</tr>
<tr>
<td>
<p><span style="font-weight: 400;">Lasso Regression</span></p>
</td>
<td>
<p><span style="font-weight: 400;">Performs feature selection by shrinking some coefficients to exactly zero.</span></p>
</td>
<td>
<p><span style="font-weight: 400;">Identifying which economic factors most significantly drive stock returns.</span></p>
</td>
</tr>
<tr>
<td>
<p><span style="font-weight: 400;">Elastic Net Regression</span></p>
</td>
<td>
<p><span style="font-weight: 400;">Combines Ridge and Lasso to balance feature selection and multicollinearity reduction.</span></p>
</td>
<td>
<p><span style="font-weight: 400;">Predicting stock returns using a large number of potentially correlated financial data points.</span></p>
</td>
</tr>
<tr>
<td>
<p><span style="font-weight: 400;">Least Angle Regression (LARS)</span></p>
</td>
<td>
<p><span style="font-weight: 400;">Efficiently selects important predictors in high-dimensional data.</span></p>
</td>
<td>
<p><span style="font-weight: 400;">Identifying key factors driving hedge fund returns from a large number of potential market influences.</span></p>
</td>
</tr>
</tbody>
</table><!--kg-card-end: html--><hr><h2 id="conclusion"><strong>Conclusion</strong></h2><p>Phew! We've journeyed far beyond basic straight lines!</p><p>You've now seen how <strong>Polynomial Regression</strong> can capture market curves, how <strong>Ridge Regression</strong> stabilizes models when predictors move together, and how <strong>Lasso, Elastic Net, and LARS</strong> act like smart filters, helping you select the most crucial factors driving financial outcomes.</p><p>These techniques are essential for building more robust and reliable models from potentially complex and high-dimensional financial data.</p><p>But the world of regression doesn't stop here! We've focused on refining and extending <em>linear-based</em> approaches.</p><p>What happens when the problem itself is different? What if you want to predict a "yes/no" outcome, focus on predicting extreme risks rather than just the average, or model incredibly complex, non-linear patterns?</p><p>That's precisely what we'll tackle in <strong><a href="https://blog.quantinsti.com/types-regression-finance/">Part 2</a></strong>! Join us next time as we explore a different side of regression, diving into techniques like Logistic Regression, Quantile Regression, Decision Trees, Random Forests, and Support Vector Regression. Get ready to expand your predictive modeling horizons even further!</p><p>Getting good at this stuff really comes down to rolling up your sleeves and practicing! Try playing around with these models using Python or R and some real financial data – you'll find plenty of tutorials and projects out there to get you started.</p><p>For a complete, holistic view of regression and its power in trading, you might want to check out <a href="https://quantra.quantinsti.com/course/trading-with-machine-learning-regression">this</a> Quantra course.</p><p>And if you're thinking about getting serious with algorithmic trading, checking out something like <strong>QuantInsti's </strong><a href="https://www.quantinsti.com/epat"><strong>EPAT</strong></a><strong> program</strong> could be a great next step to really boost your skills for a career in the field.</p><p>Understanding regression analysis is a <strong>must-have skill</strong> for anyone aiming to succeed in financial modeling or trading strategy development.</p><p>So, keep practicing—and soon you'll be making smart, data-driven decisions like a pro!</p><p><em>With the right training and guidance from industry experts, you can learn regression alongside Statistics &amp; Econometrics, Financial Computing &amp; Technology, and Algorithmic &amp; Quantitative Trading. These and various other aspects of algorithmic trading are covered in the EPAT algo trading course, which equips you with the skill sets needed to build a promising career in algorithmic trading. Be sure to check it out.</em></p><hr><h2 id="references"><strong>References</strong></h2><ol><li>Fonti, V. (2017). Feature selection using LASSO. <em>Research Paper in Business Analytics</em>. Retrieved from <a href="https://vu-business-analytics.github.io/internship-office/papers/paper-fonti.pdf">https://vu-business-analytics.github.io/internship-office/papers/paper-fonti.pdf</a></li><li>Kumar, D. (2019). Ridge regression and Lasso estimators for data analysis. <em>Missouri State University Theses</em>, 8–10.
Retrieved from <a href="https://bearworks.missouristate.edu/cgi/viewcontent.cgi?article=4406&amp;context=theses">https://bearworks.missouristate.edu/cgi/viewcontent.cgi?article=4406&amp;context=theses</a></li><li>Efron, B., Hastie, T., Johnstone, I., &amp; Tibshirani, R. (2003, January 9). <em>Least Angle Regression</em>. Statistics Department, Stanford University.<br><a href="https://hastie.su.domains/Papers/LARS/LeastAngle_2002.pdf">https://hastie.su.domains/Papers/LARS/LeastAngle_2002.pdf</a></li><li>Taboga, Marco (2021). "Ridge regression", Lectures on probability theory and mathematical statistics. Kindle Direct Publishing. Online appendix. <a href="https://www.statlect.com/fundamentals-of-statistics/ridge-regression">https://www.statlect.com/fundamentals-of-statistics/ridge-regression</a></li></ol><hr><!--kg-card-begin: html--><p><em><small>Disclaimer: All investments and trading in the stock market involve risk. Any decision to place trades in the financial markets, including trading in stock or options or other financial instruments, is a personal decision that should only be made after thorough research, including a personal risk and financial assessment and the engagement of professional assistance to the extent you believe necessary. The trading strategies or related information mentioned in this article is for informational purposes only.</small></em></p><!--kg-card-end: html-->]]></content:encoded></item><item><title><![CDATA[Exploring the Chain Rule with Step-by-Step Examples]]></title><description><![CDATA[In this blog on “Understanding the chain rule,” we will learn the math behind the application of chain rule with the help of an example.]]></description><link>https://blog.quantinsti.com/understanding-chain-rule/</link><guid isPermaLink="false">5cde9970bef68076bdae35ad</guid><category><![CDATA[Mathematics and Econometrics]]></category><dc:creator><![CDATA[QuantInsti]]></dc:creator><pubDate>Fri, 11 Apr 2025 12:39:00 GMT</pubDate><content:encoded><![CDATA[<!--kg-card-begin: html--><p>By <a data-cke-saved-href="https://www.linkedin.com/in/varun-divakar-b862a667/" href="https://www.linkedin.com/in/varun-divakar-b862a667/" target="_blank" rel="noopener">Varun Divakar</a></p><p>In this blog on “Understanding the chain rule,” we will learn the math behind the application of chain rule with the help of an example.</p><h4><strong>Table of Contents</strong></h4><ul><li><a data-cke-saved-href="#derivative" href="#derivative">What is a derivative?</a></li><li><a data-cke-saved-href="#chainrule" href="#chainrule">What is the Chain Rule?</a></li><li><a data-cke-saved-href="#understand" href="#understand">Understanding the Chain Rule</a></li><li><a data-cke-saved-href="#example" href="#example">Example of Chain Rule</a></li></ul><p>For those of you who are interested in <a data-cke-saved-href="https://blog.quantinsti.com/introduction-deep-learning-neural-network" href="https://blog.quantinsti.com/introduction-deep-learning-neural-network" target="_blank" rel="noopener">Neural Networks and Deep Learning</a>, the process of <a data-cke-saved-href="https://blog.quantinsti.com/backpropagation" href="https://blog.quantinsti.com/backpropagation" target="_blank" rel="noopener">backpropagation</a> is a very important concept which is extensively used while creating these advanced models. 
While performing backpropagation, we use the chain rule to propagate the prediction errors backwards and adjust the weights.</p><p>To be able to understand this unit, you should know what a derivative is.</p><h2><strong><a name="derivative"></a>What is a derivative?</strong></h2><p>Don't sweat it in case you don't know or don't remember; you can learn about it in the <a href="https://quantra.quantinsti.com/glossary" target="_blank" rel="noopener">glossary section of the Quantra website</a>.</p><h2><strong><a name="chainrule"></a>What is the Chain Rule?</strong></h2><p>The chain rule is basically a formula for computing the derivative of a composition of two or more functions.</p><h2><strong><a name="understand"></a>Understanding the Chain Rule</strong></h2><p>Let us say that <em>f</em> and <em>g</em> are functions. Then the chain rule expresses the derivative of their composition <em>f ∘ g</em> (the function which maps <em>x</em> to <em>f(g(x))</em>) as calculated below.</p><p><img class="aligncenter wp-image-15790 size-full" src="https://d1rwhvwstyk9gu.cloudfront.net/2019/02/i1.png" alt="derivative" width="171" height="24"></p><p>Here <em>f</em> is a function of <em>g</em>, and <em>g</em> is a function of the variable <em>x</em>.</p><p>Another way of writing the above rule:</p><p><img class="aligncenter wp-image-15791 size-full" src="https://d1rwhvwstyk9gu.cloudfront.net/2019/02/i2.png" alt="derivative" width="179" height="24"></p><p>Where the function <em>F</em> represents the composite function <em>f(g(x))</em>.</p><p>Let us say that we have three variables <em>x, y</em> and <em>z</em> such that the variable <em>z</em> depends on the variable <em>y</em>, which in turn depends on the variable <em>x</em>. So <em>y</em> and <em>z</em> are dependent variables, and <em>z</em>, via the intermediate variable <em>y</em>, depends on <em>x</em>.
Then the chain rule for differentiating the variable <em>z</em> may be written in the following manner.</p><p><img class="aligncenter wp-image-15792 size-full" data-cke-saved-src="https://d1rwhvwstyk9gu.cloudfront.net/2019/02/i3.png" src="https://d1rwhvwstyk9gu.cloudfront.net/2019/02/i3.png" alt="differentiate" width="123" height="47"></p><p>This is the final formula that we use in backpropagation.</p><p>Here <em>z</em> is a function of <em>y</em>,</p><p style="text-align: center;"><em><strong>z = f(y)</strong></em></p><p>and <em>y</em> is a function of <em>x</em>,</p><p style="text-align: center;"><em><strong>y = g(x)</strong></em></p><p>Using the previous formula, we can rewrite the differential equation as follows:</p><p><img class="size-full wp-image-15793 aligncenter" data-cke-saved-src="https://d1rwhvwstyk9gu.cloudfront.net/2019/02/i4.png" src="https://d1rwhvwstyk9gu.cloudfront.net/2019/02/i4.png" alt="differential equation" width="356" height="47"></p><p>Let us understand this better with the help of an example.</p><h2><strong><img class="cke_anchor" data-cke-realelement="%3Ca%20data-cke-saved-name%3D%22example%22%20name%3D%22example%22%3E%3C%2Fa%3E" data-cke-real-node-type="1" alt="Anchor" title="Anchor" align src="data:image/gif;base64,R0lGODlhAQABAPABAP///wAAACH5BAEKAAAALAAAAAABAAEAAAICRAEAOw==" data-cke-real-element-type="anchor">Example of Chain Rule</strong></h2><p>Let us understand the chain rule with the help of a well-known example from Wikipedia. Assume that you are falling from the sky; the atmospheric pressure keeps changing during the fall. Check out the graph below to understand this change.</p><p><img class="aligncenter wp-image-15789" data-cke-saved-src="https://d1rwhvwstyk9gu.cloudfront.net/2019/02/bl.jpg" src="https://d1rwhvwstyk9gu.cloudfront.net/2019/02/bl.jpg" alt="chain rule example" width="645" height="427"></p><p>You begin your fall 4000 meters above sea level with an initial velocity of zero, and the acceleration due to gravity is 9.8 meters per second squared. Now compare this situation to the previous chain rule equation. 
Let us say that the variable <em>x</em> in the equation is the variable <em>t</em>, or time.</p><p>Then the variable <em>y</em> or <em>g(t)</em>, which is the distance you have travelled since the beginning of the fall, is given by</p><p style="text-align: center;"><em><strong>g(t) = 0.5*9.8t<sup>2</sup></strong></em></p><p>So, the height from the mean sea level can be given by the variable <em>h</em>, which is</p><p style="text-align: center;"><em><strong>h = 4000 - g(t)</strong></em></p><p>Let us say that we also know, based on a model, the atmospheric pressure at a height <em>h</em> as:</p><p style="text-align: center;"><em><strong>f(h) = 101325 e<sup>−0.0001h</sup></strong></em></p><p>These two equations can be differentiated with respect to their respective variables to get the following information:</p><p style="text-align: center;"><em><strong>g′(t) = 9.8t</strong></em>,</p><p>where&nbsp;<em><strong>g′(t)&nbsp;</strong></em>is your speed at time <em>t</em>; and since <em>h = 4000 − g(t)</em>, the height changes at the rate <em><strong>h′(t) = −9.8t</strong></em></p><p style="text-align: center;"><em><strong>f′(h) = −10.1325 e<sup>−0.0001h</sup></strong></em></p><p>where&nbsp;<em><strong>f′(h)</strong></em>&nbsp;is the rate of change in atmospheric pressure with respect to the height <em>h</em></p><p>Now let us understand how we can combine these two equations to derive the rate of change in atmospheric pressure with respect to time, at <em>t</em> seconds after the skydiver's jump, using the chain rule:</p><p><img class="size-full wp-image-15794 aligncenter" data-cke-saved-src="https://d1rwhvwstyk9gu.cloudfront.net/2019/02/i5.png" src="https://d1rwhvwstyk9gu.cloudfront.net/2019/02/i5.png" alt="chain rule" width="218" height="24"></p><p><img class="size-full wp-image-15795 aligncenter" data-cke-saved-src="https://d1rwhvwstyk9gu.cloudfront.net/2019/02/i6.png" src="https://d1rwhvwstyk9gu.cloudfront.net/2019/02/i6.png" alt="chain rule calculation" width="408" height="29"></p><p>This equation gives us the rate of change of atmospheric pressure with respect to time since the start of the fall. In neural networks, we will need to calculate the change in weights at each neuron with respect to the errors in prediction. As you might have imagined by now, the chain rule helps adjust these weights accordingly.</p><h2><strong>Conclusion</strong></h2><p>If we want to apply the chain rule to backpropagate the error in neural networks, then we will be using an equation such as this.</p><p><img class="size-full wp-image-15796 aligncenter" data-cke-saved-src="https://d1rwhvwstyk9gu.cloudfront.net/2019/02/i7.png" src="https://d1rwhvwstyk9gu.cloudfront.net/2019/02/i7.png" alt="chain rule to backpropagate" width="455" height="70"></p><p><span style="font-weight: 400;">In Quantra’s course on </span><a data-cke-saved-href="https://quantra.quantinsti.com/course/neural-networks-deep-learning-trading-ernest-chan" href="https://quantra.quantinsti.com/course/neural-networks-deep-learning-trading-ernest-chan" target="_blank" rel="noopener"><span style="font-weight: 400;">Deep Learning in Trading with Dr. E. P. Chan</span></a><span style="font-weight: 400;">, we will help you not only understand advanced concepts such as deep learning, but also apply them in the context of trading. </span></p><p style="text-align: center;"><br></p><p><em>Disclaimer:&nbsp;<span style="font-weight: 400;">All investments and trading in the stock market involve risk. 
Any decision to place trades in the financial markets, including trading in stock or options or other financial instruments, is a personal decision that should only be made after thorough research, including a personal risk and financial assessment and the engagement of professional assistance to the extent you believe necessary. The trading strategies or related information mentioned in this article is for informational purposes only.</span></em></p><p><strong>Suggested Reads:</strong></p><ul><li><a data-cke-saved-href="https://blog.quantinsti.com/forward-propagation-neural-networks" href="https://blog.quantinsti.com/forward-propagation-neural-networks" target="_blank" rel="noopener">Forward Propagation In Neural Networks</a></li><li><a data-cke-saved-href="https://blog.quantinsti.com/backpropagation" href="https://blog.quantinsti.com/backpropagation" target="_blank" rel="noopener">Understanding Backpropagation</a></li></ul><pre><code>    
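# Note (added for illustration; not part of the original article): a minimal
# SymPy sketch verifying the skydiver example above, assuming SymPy is
# installed. SymPy applies the chain rule automatically when differentiating
# the composite pressure function with respect to time.
import sympy as sp

t = sp.symbols('t', positive=True)
g = sp.Rational(49, 10) * t**2                    # distance fallen: 0.5 * 9.8 * t^2
h = 4000 - g                                      # height above mean sea level
f = 101325 * sp.exp(-sp.Rational(1, 10000) * h)   # pressure model f(h)

# df/dt = f'(h) * h'(t) by the chain rule
print(sp.simplify(sp.diff(f, t)))                 # ~ 99.2985*t*exp(-0.0001*(4000 - 4.9*t**2))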
</code></pre><!--kg-card-end: html-->]]></content:encoded></item><item><title><![CDATA[Bayesian Inference Methods and Formula Explained]]></title><description><![CDATA[To help develop a deeper understanding of statistical analysis by focusing on the methodologies adopted by frequentist statistics and Bayesian statistics.]]></description><link>https://blog.quantinsti.com/bayesian-inference/</link><guid isPermaLink="false">5cde9970bef68076bdae3597</guid><category><![CDATA[Mathematics and Econometrics]]></category><dc:creator><![CDATA[Vivek Krishnamoorthy]]></dc:creator><pubDate>Thu, 10 Apr 2025 12:52:00 GMT</pubDate><content:encoded><![CDATA[<!--kg-card-begin: html--><p>By&nbsp;<a data-cke-saved-href="https://www.linkedin.com/in/vivekkrishnamoorthy/" href="https://www.linkedin.com/in/vivekkrishnamoorthy/" target="_blank" rel="noopener">Vivek Krishnamoorthy</a></p><p>This post on Bayesian inference is the second of a multi-part series on Bayesian statistics and methods used in quantitative finance.</p><p>In my previous <a data-cke-saved-href="/introduction-to-bayesian-statistics-in-finance" href="https://blog.quantinsti.com/introduction-to-bayesian-statistics-in-finance" target="_blank" rel="noopener">post</a>, I gave a leisurely introduction to Bayesian statistics and while doing so distinguished between the frequentist and the Bayesian outlook of the world. I dwelt on how each of their underlying philosophies influenced their analysis of various probabilistic phenomena. I then discussed the Bayes' Theorem along with some illustrations to help lay the building blocks of Bayesian statistics.</p><h2><strong>Intent of this Post</strong></h2><p>My objective here is to help develop a deeper understanding of statistical analysis by focusing on the methodologies adopted by frequentist statistics and Bayesian statistics. I consciously choose to tackle the programming and simulation aspects using Python in my next post.</p><p>I now instantiate the previously discussed ideas with a simple coin-tossing example adapted from "Introduction to Bayesian Econometrics (2nd Edition)".</p><h3>Example: A Repeated Coin-Tossing Experiment</h3><p>Suppose we are interested in estimating the bias of a coin whose fairness is unknown. <strong>We define θ (the Greek letter 'theta') as the probability of getting a head after a coin is tossed.</strong> θ is the unknown parameter we want to estimate. We intend to do so by inspecting the results of tossing the coin multiple times. Let us denote y as a realization of the random variable Y (representing the outcome of a coin toss). Let <strong>Y=1</strong> if a coin toss results in heads and <strong>Y=0</strong> if a coin toss results in tails. Essentially, we are assigning 1 to heads and 0 to tails.</p><p>∴ <strong>P(Y=1|θ)=θ ; P(Y=0|θ)=1−θ</strong></p><p>Based on our above setup, Y can be modelled as a <a data-cke-saved-href="https://en.wikipedia.org/wiki/Bernoulli_distribution" href="https://en.wikipedia.org/wiki/Bernoulli_distribution" target="_blank" rel="noopener">Bernoulli distribution</a> which we denote as</p><p><strong>Y ∼ Bernoulli (θ)</strong></p><p>I now briefly view our experimental setup through the lens of the frequentist and the Bayesian before proceeding with our estimation of the unknown parameter <strong>θ</strong>.</p><h2><strong>Two Perspectives on the Experiment Setup</strong></h2><p>In classical statistics (i.e. 
the frequentist approach), our parameter <strong>θ</strong> is a fixed but unknown value lying between <strong>0</strong> and <strong>1</strong>. The data we collect is one realization of a repeatable experiment (i.e. repeating this <strong>n</strong>-toss experiment, say, <strong>N</strong> times). Classical estimation techniques like the method of maximum likelihood are used to arrive at θ̂ ('theta hat'), an estimate for the unknown parameter <strong>θ</strong>. In statistics, we usually express an estimate by putting a hat over the name of the parameter. I expand on this idea in the next section. To restate what has been said previously, we observe that in the frequentist universe, the parameter is fixed but the data is varying.</p><p>Bayesian statistics is fundamentally different. Here, the parameter θ is treated as a random variable since there is uncertainty about its value. It therefore makes sense for us to regard our parameter as a random variable which will have an associated <a data-cke-saved-href="/statistics-probability-distribution/" target="_blank" href="https://blog.quantinsti.com/statistics-probability-distribution/">probability distribution</a>. In order to apply Bayesian inference, we turn our attention to one of the fundamental laws of probability theory, <a data-cke-saved-href="https://en.wikipedia.org/wiki/Bayes%27_theorem" href="https://en.wikipedia.org/wiki/Bayes%27_theorem" target="_blank" rel="noopener">Bayes' Theorem</a>, which we had seen previously.</p><p>I use the mathematical form of Bayes' Theorem as a way to establish a connection with Bayesian inference.</p><p><img class="size-full wp-image-15010 alignnone" data-cke-saved-src="https://d1rwhvwstyk9gu.cloudfront.net/2018/11/Capture1-3.png" src="https://d1rwhvwstyk9gu.cloudfront.net/2018/11/Capture1-3.png" alt="Formula for Bayes' Theorem" width="235" height="69">…….. <img class="cke_anchor" data-cke-realelement="%3Ca%20data-cke-saved-name%3D%22one%22%20name%3D%22one%22%3E%3C%2Fa%3E" data-cke-real-node-type="1" alt="Anchor" title="Anchor" align data-cke-saved-src="data:image/gif;base64,R0lGODlhAQABAPABAP///wAAACH5BAEKAAAALAAAAAABAAEAAAICRAEAOw==" src="data:image/gif;base64,R0lGODlhAQABAPABAP///wAAACH5BAEKAAAALAAAAAABAAEAAAICRAEAOw==" data-cke-real-element-type="anchor">(1)</p><p>To repeat what I said in my previous post, what makes this theorem so handy is that it allows us to <strong>invert a conditional probability</strong>. So if we observe a phenomenon and collect data or evidence about it, the theorem helps us analytically define the <em>conditional probability of different possible causes given the evidence.</em></p><p>Let's now apply this to our example by using the notations we had defined earlier. I label <strong>A = θ</strong> and <strong>B = y</strong>. In the field of Bayesian statistics, there are special names used for each of these terms, which I spell out below and use subsequently. <a data-cke-saved-href="#one" href="#one">(1)</a> can be rewritten as:</p><p><img class="size-full wp-image-15016 alignnone" data-cke-saved-src="https://d1rwhvwstyk9gu.cloudfront.net/2018/11/Capture2-4.png" src="https://d1rwhvwstyk9gu.cloudfront.net/2018/11/Capture2-4.png" alt="Bayes' Theorem revisited" width="241" height="67">…….. 
<img class="cke_anchor" data-cke-realelement="%3Ca%20data-cke-saved-name%3D%22two%22%20name%3D%22two%22%3E%3C%2Fa%3E" data-cke-real-node-type="1" alt="Anchor" title="Anchor" align data-cke-saved-src="data:image/gif;base64,R0lGODlhAQABAPABAP///wAAACH5BAEKAAAALAAAAAABAAEAAAICRAEAOw==" src="data:image/gif;base64,R0lGODlhAQABAPABAP///wAAACH5BAEKAAAALAAAAAABAAEAAAICRAEAOw==" data-cke-real-element-type="anchor">(2)</p><p>where:</p><p><strong>P(θ)</strong> is the <strong>prior probability</strong>. We express our belief about the cause <strong>θ</strong> BEFORE observing the evidence <strong>Y</strong>. In our example, the prior would be quantifying our a priori belief on the fairness of the coin (here we can start with the assumption that it is an unbiased coin, so θ = 1/2). <strong>P(Y|θ)</strong> is the <strong>likelihood</strong>. Here is where the real action happens. This is the probability of the observed sample or evidence given the hypothesized cause. Let us, without loss of generality, assume that we obtain 5 heads in 8 coin tosses. Presuming the coin to be unbiased as specified above, the likelihood would be the probability of observing 5 heads in 8 tosses given that θ = 1/2. <strong>P(θ|Y)</strong> is the <strong>posterior probability</strong>. This is the probability of the underlying cause θ AFTER observing the evidence y. Here, we compute our updated or a posteriori belief on the bias of the coin after observing 5 heads in 8 coin tosses using Bayes' theorem. <strong>P(Y)</strong> is the <strong>probability of the data or evidence</strong>. We sometimes also call this the marginal likelihood. This is obtained by taking the weighted sum (or integral) of the likelihood function of the evidence across all possible values of θ. In our example, we would compute the probability of 5 heads in 8 coin tosses for all possible beliefs about θ. This term is used to normalize the posterior probability. Since it is independent of the parameter to be estimated θ, it is mathematically more tractable to express the posterior probability as:</p><p><strong>P(θ|Y) ∝ P(Y|θ) × P(θ) </strong>…….<img class="cke_anchor" data-cke-realelement="%3Ca%20data-cke-saved-name%3D%22three%22%20name%3D%22three%22%3E%3C%2Fa%3E" data-cke-real-node-type="1" alt="Anchor" title="Anchor" align data-cke-saved-src="data:image/gif;base64,R0lGODlhAQABAPABAP///wAAACH5BAEKAAAALAAAAAABAAEAAAICRAEAOw==" src="data:image/gif;base64,R0lGODlhAQABAPABAP///wAAACH5BAEKAAAALAAAAAABAAEAAAICRAEAOw==" data-cke-real-element-type="anchor">(3)</p><p><br></p><p><a data-cke-saved-href="#three" href="#three">(3)</a> <strong>is the most important expression in Bayesian statistics</strong> and bears repeating. For clarity, I paraphrase what I said earlier. Bayesian inference allows us to turnaround conditional probabilities i.e. use the prior probabilities and the likelihood functions to provide a connecting link to the posterior probabilities i.e. <strong>P(θ|Y)</strong> granted that we only know <strong>P(Y|θ)</strong> and the prior, <strong>P(θ</strong>). I find it helpful to view <a data-cke-saved-href="#three" href="#three">(3)</a> as:</p><p><strong>Posterior Probability ∝ Likelihood × Prior Probability</strong> ………. 
<img class="cke_anchor" data-cke-realelement="%3Ca%20data-cke-saved-name%3D%22four%22%20name%3D%22four%22%3E%3C%2Fa%3E" data-cke-real-node-type="1" alt="Anchor" title="Anchor" align data-cke-saved-src="data:image/gif;base64,R0lGODlhAQABAPABAP///wAAACH5BAEKAAAALAAAAAABAAEAAAICRAEAOw==" src="data:image/gif;base64,R0lGODlhAQABAPABAP///wAAACH5BAEKAAAALAAAAAABAAEAAAICRAEAOw==" data-cke-real-element-type="anchor">(4)</p><p>The experimental objective is to get an estimate of the unknown parameter <strong>θ</strong> based on the outcome of <strong>n</strong> independent coin tosses. The coin tosses generate the sample or data y = (y1, y2, …,&nbsp;yn), where yi is 1 or 0 based on the result of the <strong>ith</strong> coin toss.</p><p>I now show the frequentist and Bayesian approaches to fulfilling this objective. Feel free to cursorily skim through the derivations I touch upon here if you are not interested in the mathematics behind it. You can still develop sufficient intuitions and learn to use Bayesian techniques in practice.</p><h2><strong>Estimating </strong>θ:<strong> The Frequentist Approach</strong></h2><p>We compute the joint probability function using the maximum likelihood estimation (MLE) approach. The probability of the outcome for a single coin toss can be elegantly expressed as:</p><p><img class="size-full wp-image-15018 alignnone" data-cke-saved-src="https://d1rwhvwstyk9gu.cloudfront.net/2018/11/Capture3-4.png" src="https://d1rwhvwstyk9gu.cloudfront.net/2018/11/Capture3-4.png" alt="The Frequentist approach equation" width="230" height="43"></p><p>For a given value of <em>θ</em>, the joint probability of the outcome for n independent coin tosses is the product of the probability of each individual outcome:</p><p><img class="wp-image-15019 size-full alignnone" data-cke-saved-src="https://d1rwhvwstyk9gu.cloudfront.net/2018/11/Capture4-3.png" src="https://d1rwhvwstyk9gu.cloudfront.net/2018/11/Capture4-3.png" alt="modification of frequentist approach question" width="481" height="161">……. <img class="cke_anchor" data-cke-realelement="%3Ca%20data-cke-saved-name%3D%22five%22%20name%3D%22five%22%3E%3C%2Fa%3E" data-cke-real-node-type="1" alt="Anchor" title="Anchor" align data-cke-saved-src="data:image/gif;base64,R0lGODlhAQABAPABAP///wAAACH5BAEKAAAALAAAAAABAAEAAAICRAEAOw==" src="data:image/gif;base64,R0lGODlhAQABAPABAP///wAAACH5BAEKAAAALAAAAAABAAEAAAICRAEAOw==" data-cke-real-element-type="anchor">(5)</p><p>As we can see in <a data-cke-saved-href="#four" href="#four">(4)</a>, the expression worked out is a function of the unknown parameter θ given the observations from our experiment. This function of θ is called the likelihood function and is usually referred to in the literature as:</p><p><img class="size-full wp-image-15020 alignnone" data-cke-saved-src="https://d1rwhvwstyk9gu.cloudfront.net/2018/11/Capture5-2.png" src="https://d1rwhvwstyk9gu.cloudfront.net/2018/11/Capture5-2.png" alt="Frequentist approach 1" width="372" height="40">……….. 
<img class="cke_anchor" data-cke-realelement="%3Ca%20data-cke-saved-name%3D%22six%22%20name%3D%22six%22%3E%3C%2Fa%3E" data-cke-real-node-type="1" alt="Anchor" title="Anchor" align data-cke-saved-src="data:image/gif;base64,R0lGODlhAQABAPABAP///wAAACH5BAEKAAAALAAAAAABAAEAAAICRAEAOw==" src="data:image/gif;base64,R0lGODlhAQABAPABAP///wAAACH5BAEKAAAALAAAAAABAAEAAAICRAEAOw==" data-cke-real-element-type="anchor">(6)</p><p>OR</p><p><img class="size-full wp-image-15022 alignnone" data-cke-saved-src="https://d1rwhvwstyk9gu.cloudfront.net/2018/11/Capture6-3.png" src="https://d1rwhvwstyk9gu.cloudfront.net/2018/11/Capture6-3.png" alt="Frequentist approach 2" width="157" height="39">…………… <img class="cke_anchor" data-cke-realelement="%3Ca%20data-cke-saved-name%3D%22seven%22%20name%3D%22seven%22%3E%3C%2Fa%3E" data-cke-real-node-type="1" alt="Anchor" title="Anchor" align data-cke-saved-src="data:image/gif;base64,R0lGODlhAQABAPABAP///wAAACH5BAEKAAAALAAAAAABAAEAAAICRAEAOw==" src="data:image/gif;base64,R0lGODlhAQABAPABAP///wAAACH5BAEKAAAALAAAAAABAAEAAAICRAEAOw==" data-cke-real-element-type="anchor">(7)</p><p>We would like to compute the value of <em>θ</em> which is most likely to have yielded the observed set of outcomes. This is called the <em>maximum likelihood estimate</em>, θ̄̂ ('theta hat'). For analytically computing it, we trivially take the first order derivative of <a data-cke-saved-href="#six" href="#six">(6)</a> with respect to the parameter and set it equal to zero. It is prudent to also take the second derivative and check the sign of its value at θ = θ̄̂&nbsp; to ensure that the estimate is indeed the maxima. We often customarily take the log of the likelihood function since it greatly simplifies the determination of the maximum likelihood estimator θ̄̂ . It should therefore not surprise you that the literature is replete with log likelihood functions and their solutions.</p><h2><strong>Estimating θ: The Bayesian Approach</strong></h2><p>I now change the notations we have used so far to make them a little more precise mathematically. I will use these notations throughout this series now. The reason for this alteration is so that we can suitably ascribe each term with symbols that remind us of their random nature. There is uncertainty over the values of θ, Y, etc., we, therefore, regard them as random variables and assign them corresponding probability distributions which I do below.</p><h2><strong>Notations for the Density and Distribution Functions</strong></h2><ul><li><strong>π(⋅)</strong> (the Greek letter 'pi') to denote the probability distribution function of the <strong>prior</strong> (this is pertaining to θ) and <strong>π(⋅|y)</strong> to denote the posterior density function of the parameter we attempt to estimate.</li><li><strong>f(⋅)</strong> to denote the probability density function (pdf) for continuous random variables and p(.) which is the probability mass function (pmf) of discrete random variables. However, for simplicity, I use <strong>f(⋅)</strong> irrespective of whether the random variable <strong>Y</strong> is continuous or discrete.</li><li>The joint density function will continue to be denoted as <strong>L(θ|⋅)</strong>. 
to denote the likelihood function which is the joint density of the sample values and is usually the product of the pdf's/pmf's of the sample values from our data.</li></ul><p>Remember that <strong><em>θ</em></strong> is the parameter we are trying to estimate.</p><p><a data-cke-saved-href="#two" href="#two">(2)</a> and <a data-cke-saved-href="#three" href="#three">(3)</a> can be rewritten as</p><p><strong>π(θ|y) = [f(y|θ)⋅π(θ)] / f(y)&nbsp; &nbsp;</strong>……<img class="cke_anchor" data-cke-realelement="%3Ca%20data-cke-saved-name%3D%22eight%22%20name%3D%22eight%22%3E%3C%2Fa%3E" data-cke-real-node-type="1" alt="Anchor" title="Anchor" align data-cke-saved-src="data:image/gif;base64,R0lGODlhAQABAPABAP///wAAACH5BAEKAAAALAAAAAABAAEAAAICRAEAOw==" src="data:image/gif;base64,R0lGODlhAQABAPABAP///wAAACH5BAEKAAAALAAAAAABAAEAAAICRAEAOw==" data-cke-real-element-type="anchor">(8)</p><p><strong>π(θ|y)∝f(y|θ)×π(θ)</strong>&nbsp; &nbsp; …………….<img class="cke_anchor" data-cke-realelement="%3Ca%20data-cke-saved-name%3D%22nine%22%20name%3D%22nine%22%3E%3C%2Fa%3E" data-cke-real-node-type="1" alt="Anchor" title="Anchor" align data-cke-saved-src="data:image/gif;base64,R0lGODlhAQABAPABAP///wAAACH5BAEKAAAALAAAAAABAAEAAAICRAEAOw==" src="data:image/gif;base64,R0lGODlhAQABAPABAP///wAAACH5BAEKAAAALAAAAAABAAEAAAICRAEAOw==" data-cke-real-element-type="anchor">(9)</p><p>Stated in words, <strong>the posterior distribution function is proportional to the likelihood function times the prior distribution function</strong>. I redraw your attention to <a data-cke-saved-href="#four" href="#four">(4)</a> and present it in congruence with our new notations.</p><p><strong>Posterior PDF ∝ Likelihood × Prior PDF</strong>&nbsp; ……….<img class="cke_anchor" data-cke-realelement="%3Ca%20data-cke-saved-name%3D%22ten%22%20name%3D%22ten%22%3E%3C%2Fa%3E" data-cke-real-node-type="1" alt="Anchor" title="Anchor" align data-cke-saved-src="data:image/gif;base64,R0lGODlhAQABAPABAP///wAAACH5BAEKAAAALAAAAAABAAEAAAICRAEAOw==" src="data:image/gif;base64,R0lGODlhAQABAPABAP///wAAACH5BAEKAAAALAAAAAABAAEAAAICRAEAOw==" data-cke-real-element-type="anchor">(10)</p><p>I now rewrite <a data-cke-saved-href="#eight" href="#eight">(8)</a> and <a data-cke-saved-href="#nine" href="#nine">(9)</a> using the likelihood function L(θ|Y) defined earlier in <a data-cke-saved-href="#seven" href="#seven">(7)</a>.</p><p><img class="size-full wp-image-15043 alignnone" data-cke-saved-src="https://d1rwhvwstyk9gu.cloudfront.net/2018/11/Capture11-1.png" src="https://d1rwhvwstyk9gu.cloudfront.net/2018/11/Capture11-1.png" alt="modification of bayesian theorem" width="236" height="60">……… <img class="cke_anchor" data-cke-realelement="%3Ca%20data-cke-saved-name%3D%22eleven%22%20name%3D%22eleven%22%3E%3C%2Fa%3E" data-cke-real-node-type="1" alt="Anchor" title="Anchor" align data-cke-saved-src="data:image/gif;base64,R0lGODlhAQABAPABAP///wAAACH5BAEKAAAALAAAAAABAAEAAAICRAEAOw==" src="data:image/gif;base64,R0lGODlhAQABAPABAP///wAAACH5BAEKAAAALAAAAAABAAEAAAICRAEAOw==" data-cke-real-element-type="anchor">(11)</p><p><img class="size-full wp-image-15044 alignnone" data-cke-saved-src="https://d1rwhvwstyk9gu.cloudfront.net/2018/11/Capture12-1.png" src="https://d1rwhvwstyk9gu.cloudfront.net/2018/11/Capture12-1.png" alt="modification of bayes theorem" width="223" height="48">………..<img class="cke_anchor" data-cke-realelement="%3Ca%20data-cke-saved-name%3D%22twelve%22%20name%3D%22twelve%22%3E%3C%2Fa%3E" data-cke-real-node-type="1" alt="Anchor" title="Anchor" align 
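<p><em>To make (12) concrete, here is a minimal, illustrative Python sketch (not part of the original post, which defers code to the sequel): it evaluates the posterior on a grid of θ values for the 5-heads-in-8-tosses example, assuming a uniform prior and that NumPy is available.</em></p><pre><code>import numpy as np

# Grid of candidate values for the parameter theta
theta = np.linspace(0.001, 0.999, 999)

heads, n = 5, 8                # evidence: 5 heads in 8 tosses
prior = np.ones_like(theta)    # uniform prior pi(theta)
prior /= prior.sum()

# Likelihood L(theta|y): product of Bernoulli pmf's, as in (5)-(7)
likelihood = theta**heads * (1 - theta)**(n - heads)

# Posterior via (12): proportional to likelihood times prior
posterior = likelihood * prior
posterior /= posterior.sum()   # normalise; f(y) is never computed explicitly

print("MLE (theta hat):", heads / n)                  # frequentist estimate
print("Posterior mean :", (theta * posterior).sum())  # Bayesian point estimate
</code></pre>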
data-cke-saved-src="data:image/gif;base64,R0lGODlhAQABAPABAP///wAAACH5BAEKAAAALAAAAAABAAEAAAICRAEAOw==" src="data:image/gif;base64,R0lGODlhAQABAPABAP///wAAACH5BAEKAAAALAAAAAABAAEAAAICRAEAOw==" data-cke-real-element-type="anchor">(12)</p><p>The denominator of <a data-cke-saved-href="#eleven" href="#eleven">(11)</a> is the probability distribution of the evidence or data. I reiterate what I have previously mentioned while inspecting <a data-cke-saved-href="#three" href="#three">(3)</a>: A useful way of considering the posterior density is using the proportionality approach as seen in <a data-cke-saved-href="#twelve" href="#twelve">(12)</a>. That way, we don't need to worry about the f(y) term on the RHS of <a data-cke-saved-href="#eleven" href="#eleven">(11)</a>.</p><p>For the mathematically curious among you, I now take you briefly down a needless rabbit hole to explain it incompletely. Perhaps, later in our journey, I may write a separate post brooding on these minutiae.</p><p>In <a data-cke-saved-href="#eleven" href="#eleven">(11)</a>, <strong>f(y)</strong> is the proportionality constant that makes the posterior distribution a proper density function integrating to 1. When we examine it more closely, we see that is, in fact, the unconditional (marginal) distribution of the random variable <strong>Y</strong>. We can determine it analytically by integrating over all possible values of the parameter <strong>θ</strong>. Since we are integrating out <strong>θ</strong>, we find that <strong>f(y)</strong> does not depend on <strong>θ</strong>.</p><p><img class="size-full wp-image-15024 alignnone" data-cke-saved-src="https://d1rwhvwstyk9gu.cloudfront.net/2018/11/Capture9-1.png" src="https://d1rwhvwstyk9gu.cloudfront.net/2018/11/Capture9-1.png" alt="Integral for estimating theta" width="252" height="72"></p><p>OR</p><p><img class="size-full wp-image-15025 alignnone" data-cke-saved-src="https://d1rwhvwstyk9gu.cloudfront.net/2018/11/Capture10-1.png" src="https://d1rwhvwstyk9gu.cloudfront.net/2018/11/Capture10-1.png" alt="Second equation for bayesian inference" width="209" height="63"></p><p><a data-cke-saved-href="#eleven" href="#eleven">(11)</a> and <a data-cke-saved-href="#twelve" href="#twelve">(12)</a> represent the <strong>continuous versions of the Bayes' Theorem</strong>.</p><p><em>The posterior distribution is central to Bayesian statistics and inference because it blends all the updated information about the parameter <strong>θ</strong> in a single expression. This includes information about <strong>θ</strong> before the observations were inspected and this is captured through the prior distribution. 
The information contained in the observations is captured through the likelihood function.</em></p><p>We can regard <a data-cke-saved-href="#eleven" href="#eleven">(11)</a> as a method of updating information, and this idea is further exemplified by the prior-posterior nomenclature.</p><ul><li>The prior distribution of <strong>θ</strong>, <strong>π(θ)</strong>, represents the information available about its possible values before recording the observations <strong>y</strong>.</li><li>The likelihood function <strong>L(θ|y)</strong> of <strong>θ</strong> is then determined based on the observations <strong>y</strong>.</li><li>The posterior distribution of <strong>θ</strong>, <strong>π(θ|y)</strong>, summarizes all the available information about the unknown parameter θ after recording and incorporating the observations <strong>y</strong>.</li></ul><p><strong>The Bayesian estimate of θ would be a weighted average of the prior estimate and the maximum likelihood estimate, θ̂</strong>. As the number of observations <strong>n</strong> increases and approaches infinity, the weight on the prior estimate approaches zero and the weight on the MLE approaches one. This implies that the Bayesian and frequentist estimates would converge as our sample size gets larger.</p><p>To clarify, in a classical or frequentist setting, the usual estimator of the parameter <strong>θ</strong> is the ML estimator, θ̂. Here, the prior is implicitly treated as a constant.</p><h2><strong>Summary</strong></h2><p>I have devoted this post to deriving the fundamental result of Bayesian statistics, viz. <a data-cke-saved-href="#ten" href="#ten">(10)</a>. The essence of this expression is to represent uncertainty by combining the knowledge obtained from two sources - observations and prior beliefs. In doing so, I introduced the concepts of prior distributions, likelihood functions and posterior distributions, as well as the comparison of the frequentist and Bayesian methodologies. In my next post, I intend to make good my promise of illustrating the above example with simulations in Python.</p><p>Bayesian statistics is an important part of <a href="https://quantra.quantinsti.com/course/quantitative-trading-strategies-models" target="_blank" rel="noopener">quantitative strategies</a> which are part of an algorithmic trader’s handbook.&nbsp;The&nbsp;<a data-cke-saved-href="https://www.quantinsti.com/epat/" href="https://www.quantinsti.com/epat/" target="_blank" rel="noopener">Executive Programme in Algorithmic Trading (EPAT™)</a>&nbsp;course by&nbsp;<a data-cke-saved-href="https://www.quantinsti.com/" href="https://www.quantinsti.com/" target="_blank" rel="noopener">QuantInsti®</a>&nbsp;covers training modules like Statistics &amp; Econometrics, Financial Computing &amp; Technology, and Algorithmic &amp; Quantitative Trading that equip you with the required skill sets for&nbsp;applying various trading instruments and platforms to be a successful trader.</p><p style="text-align: center;"><br></p><p><em>Disclaimer: All investments and trading in the stock market involve risk. Any decision to place trades in the financial markets, including trading in stock or options or other financial instruments, is a personal decision that should only be made after thorough research, including a personal risk and financial assessment and the engagement of professional assistance to the extent you believe necessary. 
The trading strategies or related information mentioned in this article is for informational purposes only.</em></p><p><br></p><!--kg-card-end: html-->]]></content:encoded></item><item><title><![CDATA[Introduction to Statistical Thinking for Smarter Choices and Analysis]]></title><description><![CDATA[Statistical thinking, a beginner's guide! Making decisions with limited information is a part of life. Get introduced to the way of making decisions using a structured approach through statistics.]]></description><link>https://blog.quantinsti.com/statistical-thinking/</link><guid isPermaLink="false">6215e82511773065706fabc8</guid><category><![CDATA[Mathematics and Econometrics]]></category><dc:creator><![CDATA[Ansh Tayal]]></dc:creator><pubDate>Thu, 10 Apr 2025 06:30:00 GMT</pubDate><content:encoded><![CDATA[<p>Statistical thinking is an approach to process information through the lens of probability and statistics so as to make informed decisions.</p><p>This series of blogs takes you through a journey where we begin with introducing statistical thinking, make a brief stopover to understand Bayesian statistics and then dwell on its applications in financial markets using Python.</p><p><em>“Statistical thinking will one day be as necessary for efficient citizenship as the ability to read and write!”</em><br>H.G. Wells (1866-1946), the father of science fiction</p><p>Making choices is a part of our daily lives, be it personal or professional. If you apply statistical thinking wherever possible, you can make better choices.</p><p>In this article, we’ll go step by step in deconstructing the decision-making process under limited information. We’ll look at some examples, the jargon and the importance of statistics in the process.</p><!--kg-card-begin: html--><pr>
   <ul>
<li style="font-weight: 400;" aria-level="1"><a href="#what-is-statistics"><span style="font-weight: 400;">What is statistics?</span></a></li>
<li style="font-weight: 400;" aria-level="1"><a href="#what-is-a-statistical-question"><span style="font-weight: 400;">What is a statistical question?</span></a></li>
<li style="font-weight: 400;" aria-level="1"><a href="#why-do-we-need-statistics"><span style="font-weight: 400;">Why do we need statistics?</span></a></li>
<li style="font-weight: 400;" aria-level="1"><a href="#descriptive-statistics-vs-inferential-statistics"><span style="font-weight: 400;">Descriptive statistics vs Inferential statistics</span></a></li>
<li style="font-weight: 400;" aria-level="1"><a href="#should-we-use-descriptive-or-inferential-statistics"><span style="font-weight: 400;">Should we use descriptive statistics or inferential statistics?</span></a></li>
<li style="font-weight: 400;" aria-level="1"><a href="#jargon-in-statistics"><span style="font-weight: 400;">Jargon in statistics</span></a></li>
<ul>
<li style="font-weight: 400;" aria-level="2"><a href="#population"><span style="font-weight: 400;">Population</span></a></li>
<li style="font-weight: 400;" aria-level="2"><a href="#sample"><span style="font-weight: 400;">Sample</span></a></li>
<li style="font-weight: 400;" aria-level="2"><a href="#observation"><span style="font-weight: 400;">Observation</span></a></li>
<li style="font-weight: 400;" aria-level="2"><a href="#statistic"><span style="font-weight: 400;">Statistic</span></a></li>
<li style="font-weight: 400;" aria-level="2"><a href="#parameter"><span style="font-weight: 400;">Parameter</span></a></li>
<li style="font-weight: 400;" aria-level="2"><a href="#hypothesis"><span style="font-weight: 400;">Hypothesis</span></a></li>
<li style="font-weight: 400;" aria-level="2"><a href="#hypothesis-testing"><span style="font-weight: 400;">Hypothesis testing</span></a></li>
<li style="font-weight: 400;" aria-level="2"><a href="#estimate"><span style="font-weight: 400;">Estimate</span></a></li>
</ul>
<li style="font-weight: 400;" aria-level="1"><a href="#why-should-we-spend-time-on-statistical-inference"><span style="font-weight: 400;">Why should we spend time on statistical inference?</span></a></li>
</ul>
    
</pr><!--kg-card-end: html--><hr><h2 id="what-is-statistics">What is statistics?</h2><p>There are two ways to define statistics. Formally statistics is defined as "<strong>The science of statistics deals with the collection, analysis, interpretation, and presentation of data.</strong>"</p><p>Intuitively, statistics is defined as "<strong>Statistics is the science of making decisions under uncertainty.</strong>"</p><p>That is, statistics is a tool that helps you make decisions when you don’t have complete information.</p><hr><h2 id="what-is-a-statistical-question">What is a statistical question?</h2><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://d1rwhvwstyk9gu.cloudfront.net/2025/03/image.png" class="kg-image"><figcaption><a href="https://www.istockphoto.com/photos/four-cats">Source</a></figcaption></figure><p>Looking at the above image, let's address some questions!</p><p>How many cats does the above picture have?<br><em>4, right?</em></p><p>Do we have all the information to answer this question?<br><em>Yes.</em></p><p>Do all healthy cats have four legs?<br><em>Yes.</em></p><p>Do we have all the information to answer this question?<br><em>No. Because this is a picture of only 4 out of all the existing cats in the world!</em></p><p>But can we still answer it with certainty?<br><em>Yes.</em></p><p>So, is it a statistical question?<br><em>No.</em></p><p>Why?<br><em><strong>Because if you have all the information to answer the question or if you can answer this question with certainty, it’s not a statistical question.</strong></em></p><p>For a question to be a statistical question,</p><ul><li>The question has to go beyond the available information, and</li><li>The question shouldn’t be answerable with certainty.</li></ul><p>This concept will be reinforced repeatedly in this article, i.e., statistics is the science of decision making under <strong>uncertainty.</strong></p><hr><h2 id="why-do-we-need-statistics">Why do we need statistics?</h2><p>We now work with a toy example through this post to answer the above question.</p><p>Suppose we decide to design a <a href="https://quantra.quantinsti.com/">Quantra</a> course on Julia programming.</p><ul><li><em><em><em>How do we decide if we should put time and effort into building this course?</em></em></em></li><li><em>What if our designed course fails and doesn’t get many interested users?</em></li></ul><p>These are important business decisions that require substantial resources. Therefore, we decide to survey if such a course would sell.</p><p>Now, that raises the following questions:</p><ul><li>Who would our potential paid users be?</li><li>Who should we approach? Programmers? Data scientists? Researchers? College graduates? Quantitative Analysts?</li><li><em>Ideally, all of them, right?</em></li></ul><p>However,</p><ul><li>Can we get access to all of these people? <em>Unlikely.</em></li><li>So, what should we do?</li><li>Should we drop the idea of designing the new course?</li></ul><p><em>That doesn’t sound right.</em></p><p>If we had access to all the people, the process would have been simple. If the majority say that they would buy such a course, you create it. If not, then drop it.</p><p>However, since we can’t do it, we do the next best thing, i.e. 
we ask the maximum number of people we can reach out to, and, based on their response, we <strong>estimate</strong> the likelihood of this course being successful.</p><p>To calculate this <strong>estimate, </strong>we need<strong> statistics.</strong></p><p>To generalize this idea, in real-world scenarios, we rarely have complete information related to the decision we want to make, whether for individuals or businesses.</p><p>Hence, we need a tool that can help us decide with limited information. Statistics is one such tool, and making these decisions within a statistical framework is called statistical thinking.</p><p><strong>Statistical thinking</strong> is not just about using formulas to calculate p-values and z-scores; it’s a way to think about the world. Once you internalize this idea, it will change how you see the world. You’ll start thinking in terms of probabilities instead of certainties, which will help you make better decisions in your professional and personal life.</p><hr><h2 id="descriptive-statistics-vs-inferential-statistics">Descriptive statistics vs Inferential statistics</h2><p><a href="https://blog.quantinsti.com/statistics-probability-distribution/">Descriptive statistics</a> is the process of taking the data and describing its features using measures of central tendency (mean, median and mode), measures of dispersion (<a href="https://blog.quantinsti.com/standard-deviation/">standard deviations</a>, interquartile range ), etc.</p><p>However, inferential statistics is about working with the limited data and using it to infer something about a larger question we pose to ourselves a priori. This question cannot be answered with certainty.</p><p>Our article focuses on the latter, i.e. inferential statistics.</p><hr><h2 id="should-we-use-descriptive-or-inferential-statistics">Should we use descriptive or inferential statistics?</h2><p>It depends on the question you’re asking and the available data. A simple question to ask yourself while deciding which one to use is:</p><ul><li>Do we want to describe the existing data? OR</li><li>Do we want to draw inferences from the existing data (sample) to extrapolate about the population?</li></ul><p>We go with descriptive statistics for the former and inferential statistics for the latter.</p><hr><h2 id="jargon-in-statistics">Jargon in statistics</h2><p>Let’s look at some of the key terms used in statistics that will help you in understanding the concepts better.</p><h3 id="population">Population</h3><p>The universe of items we’re interested in. Going back to our Quantra course example, the population would be every person in this world who would be interested in the Julia course.</p><h3 id="sample">Sample</h3><p>It is a subset of the population, i.e. the amount of information we <em>can</em> get. This could be the Quantra or EPAT user base we have. We could frame our question as: <em>How likely are you to buy a course on Julia (on a scale of 1 to 10)?</em></p><h3 id="statistic">Statistic</h3><p>A summary measure of the data available, i.e. from the sample. Here, it could be the average score of say, 7 obtained from Quantra and EPAT users for the above question.</p><h3 id="parameter">Parameter</h3><p>A <a href="https://blog.quantinsti.com/statistics-probability-distribution/">parameter</a> is a summary measure of the population. 
Here, it could be the average score of say, 6 obtained from the population (as defined above).</p><p>A <strong>statistic</strong> is a summary measure of the existing data (sample), whereas a <strong>parameter</strong> is the same for the population.</p><h3 id="hypothesis">Hypothesis</h3><p>A description of how we think the world works. We hypothesize that EPAT and Quantra users are unlikely to buy a course on Julia (a low mean rating). This is the assumption we start with, which we call the null hypothesis.</p><h3 id="null-hypothesis">Null Hypothesis</h3><p>It’s crucial to have a null hypothesis before starting any statistical analysis, and the null hypothesis is usually the status quo. The alternative hypothesis is the theory that you think could be true and are looking for evidence to verify.</p><p>So to clarify, our null hypothesis \({H_0}\) and alternative hypothesis \({H_1}\) here are:</p><p>\({H_0}\): EPAT and Quantra users are unlikely to buy a course on Julia (Mean rating &lt;= 5)</p><p>\({H_1}\): EPAT and Quantra users are likely to buy the course (Mean rating &gt; 5)</p><h3 id="hypothesis-testing">Hypothesis testing</h3><p><a href="https://blog.quantinsti.com/hypothesis-testing-trading-guide/">Hypothesis testing</a> is a method to draw conclusions about the population from the sample, i.e. to test whether a hypothesis is consistent with the data.</p><h3 id="estimate">Estimate</h3><p>An estimate can be defined as the best guess of the actual value of the parameter.</p><hr><h2 id="why-should-we-spend-time-on-statistical-inference">Why should we spend time on statistical inference?</h2><p>Let’s consider two scenarios:</p><ul><li><strong>Scenario 1</strong> - We had access to only one user, and she rated 6 for the likelihood of buying the course.</li><li><strong>Scenario 2</strong> - We had access to 10 users, and they gave an average rating of 8 for buying the course.</li></ul><p>These are our best estimates. However,</p><p><em>Which one is the better estimate?<br></em>The one with 10 users because it has more data.</p><p><em>Is the estimate of scenario 2 good enough to act on?<br>Should we create the course because 10 people have a high likelihood of buying the course?<br></em>Maybe not.</p><p><em>Why?<br></em>Because the response from 10 users is probably not enough, and so could lead to a poorly informed decision.</p><p>This is where statistical inference comes in.</p><p>As we have mentioned before, if you want the correct answer, you <em>will</em> need <em>all</em> the data. No silver bullet can give you the right answer with limited data. But remember, as we discussed, statistics is the science of making decisions under <strong>uncertainty.</strong></p><p>We’re not interested in knowing the <strong>correct</strong> answer with statistical inference because we can’t!</p><p>Using inferential statistics, the question you want to answer is:</p><p><strong>Is the best guess good enough to change our minds?</strong></p><p>This forms the basis of everything we do in statistical inference. Notice that the question mentions “changing our mind”. This means that we would need to already have something in our minds in the first place, a decision, an opinion.</p><p>We can only change our minds if we have already decided to do something by default. 
Remember we mentioned the importance of having a null hypothesis?</p><p>The hypothesis could be that people are extremely unlikely to buy the Quantra course on Julia programming, so we will <strong>not create</strong> a new course if the best guess is <strong>not good enough to change our minds</strong>.</p><p>This is where the need to have a predefined hypothesis comes in. This is another fundamental concept in inferential statistics. If we are to make statistical inferences, we <em>need</em> to have a predefined decision or an opinion because, at the cost of being repetitive, the question we’re asking using statistics is:</p><p><strong>Is the best guess good enough to change our minds?</strong></p><p><strong>The entire exercise of statistical inference makes sense if you have a default action. If you don’t have a default action, just go with your best guess from the sample data.</strong></p><p>Let’s take another example to understand this. Imagine that PepsiCo decides to change the colour of its logo to black or green. The responses of 1 million people are recorded as a sample.</p><p>Now, here’s the summary of which decision we can take based on our default action and data:</p><!--kg-card-begin: html--><pr>
   <table>
<tbody>
<tr>
<td>
<p><strong>Default action</strong></p>
</td>
<td>
<p><strong>Results from data</strong></p>
</td>
<td>
<p><strong>Decision</strong></p>
</td>
</tr>
<tr>
<td>
<p><span style="font-weight: 400;">Not decided</span></p>
</td>
<td>
<p><span style="font-weight: 400;">Data favours green.</span></p>
</td>
<td>
<p><span style="font-weight: 400;">Go with the best guess. Green.&nbsp;</span></p>
</td>
</tr>
<tr>
<td>
<p><span style="font-weight: 400;">Don&rsquo;t change</span></p>
</td>
<td>
<p><span style="font-weight: 400;">Data marginally favours black</span></p>
</td>
<td>
<p><span style="font-weight: 400;">Logo remains unchanged&nbsp;</span></p>
</td>
</tr>
<tr>
<td>
<p><span style="font-weight: 400;">Don&rsquo;t change</span></p>
</td>
<td>
<p><span style="font-weight: 400;">Data overwhelmingly favours green</span></p>
</td>
<td>
<p><span style="font-weight: 400;">Change the logo to green.&nbsp;</span></p>
</td>
</tr>
</tbody>
</table>
</pr><!--kg-card-end: html--><p>The table above consists of 3 scenarios to explain the concepts presented above.</p><ul><li>In the first scenario, there’s no default action and the data supports green. So we go ahead and change the logo to green.</li><li>In the second scenario, the default action is “don’t change the color” and the data supports black but not strongly enough. So the logo color remains unchanged.</li><li>In the third scenario, the default action is “don’t change the color” but the data strongly supports green. So the logo is changed to green.</li></ul><hr><h3 id="resources-for-learning-about-statistical-thinking">Resources for learning about statistical thinking</h3><p>Here are a few resources that you can refer to for a detailed understanding of the topic:</p><ol><li><a href="https://allendowney.github.io/ThinkBayes2/">Think Bayes</a></li><li><a href="https://cpentalk.com/drive/index.php?download=true&amp;p=Statistics+Books%2FBooks%28+CPENTalk.com+%29&amp;dl=The+cartoon+guide+to+statistics+%28+CPENTalk.com+%29.pdf">The Cartoon Guide to Statistics</a></li><li><a href="https://www.amazon.in/Bayesian-Analysis-Python-Introduction-probabilistic-ebook/dp/B07HHBCR9G">Bayesian Analysis with Python</a></li></ol><hr><h3 id="conclusion">Conclusion</h3><p>We hope this write-up has piqued your interest in applying a statistical approach when confronted with choices. Do share your thoughts and comments about the blog in the section below. Until next time!</p><p>If you're serious about building a data-driven edge in trading, understanding statistics is non-negotiable — and Module 2: <a href="https://www.quantinsti.com/epat/statistics-financial-markets">Statistics for Financial Markets Course</a> from EPAT delivers exactly that. This module focuses on applying probability, risk metrics, hypothesis testing, and trading strategy development directly to financial markets using real-world tools like Excel.</p><p>To explore the full curriculum and gain skills across machine learning, financial computing, quant trading strategies, and more, check out the complete<a href="https://www.quantinsti.com/epat"> Executive Programme in Algorithmic Trading (EPAT)</a>. Whether you're just starting out or looking to level up, EPAT gives you the structure, depth, and practical expertise to succeed in today’s markets.</p><hr><p><strong>Authors:</strong> <a href="https://www.linkedin.com/in/vivekkrishnamoorthy/">Vivek Krishnamoorthy</a> and <a href="https://www.linkedin.com/in/tayal498">Anshul Tayal</a></p><hr><!--kg-card-begin: html--><p><small><em>Disclaimer: All data and information provided in this article are for informational purposes only. QuantInsti<sup>®</sup> makes no representations as to accuracy, completeness, currentness, suitability, or validity of any information in this article and will not be liable for any errors, omissions, or delays in this information or any losses, injuries, or damages arising from its display or use. All information is provided on an as-is basis.</em></small></p><!--kg-card-end: html-->]]></content:encoded></item><item><title><![CDATA[Autoregression: Time Series, Models, Trading, Python and more]]></title><description><![CDATA[The autoregressive (AR) model is a key tool for time series forecasting in trading. 
This guide covers its formula, calculation, and step-by-step model building, including a Python implementation.]]></description><link>https://blog.quantinsti.com/autoregression/</link><guid isPermaLink="false">67a1c8929579133d13d2a630</guid><category><![CDATA[Mathematics and Econometrics]]></category><dc:creator><![CDATA[Jose Carlos Gonzales Tanaka]]></dc:creator><pubDate>Tue, 11 Feb 2025 09:19:07 GMT</pubDate><content:encoded><![CDATA[<p>By <a href="https://www.linkedin.com/in/jos%C3%A9-carlos-gonz%C3%A1les-tanaka-60859284/">José Carlos Gonzáles Tanaka</a> and <a href="https://www.linkedin.com/in/chainika-bahl-thakar-b32971155/">Chainika Thakar</a> (Originally written by <a href="https://www.linkedin.com/in/satyapriya-chaudhari-73976b16a/">Satyapriya Chaudhari</a>)</p><p>Autoregression is a powerful tool for anticipating future values in time-based data. This data, known as a time series, consists of observations collected at various timestamps, regularly or irregularly. By leveraging historical trends, patterns, and other hidden influences, autoregression models can forecast the value for the next time step.</p><p>These models (including various options beyond autoregression) predict future outcomes by analyzing and learning from past data. This article delves deeper into one particular type: the autoregression model, often abbreviated as the AR model.</p><h3 id="prerequisite-blogs"><strong>Prerequisite Blogs</strong><br></h3><p>Before delving into this blog, it’s ideal to follow a structured learning track covering foundational to advanced topics. <br><br>Start with the basics in <a href="https://blog.quantinsti.com/time-series-analysis/">Introduction to Time Series</a> and a comparative deep-learning perspective in the<a href="https://blog.quantinsti.com/time-series-lstm-stock-price-prediction-project-ashish-jain/"> Time Series Vs LSTM Models</a>.</p><p>Next, establish the essentials of<a href="https://blog.quantinsti.com/stationarity/"> Stationarity</a>, the<a href="https://blog.quantinsti.com/hurst-exponent/"> Hurst Exponent</a>, and<a href="https://blog.quantinsti.com/mean-reversion-time-series/"> Mean Reversion</a> to understand how and why time‐series data exhibit long‐term memory.</p><p>Once you’re comfortable with these, progress to advanced or multivariate methods, including<a href="https://blog.quantinsti.com/vector-autoregression/"> Vector Autoregression (VAR)</a>,<a href="https://blog.quantinsti.com/johansen-test-cointegration-building-stationary-portfolio/"> Johansen Cointegration</a>, and<a href="https://blog.quantinsti.com/tvp-var-stochastic-volatility/"> Time-Varying-Parameter VAR</a>.</p><p>This comprehensive roadmap equips you with the necessary background to fully appreciate this blog.<br><br>You are expected to know how to use these models to forecast time series. You should also have a basic understanding of R or Python for time series analysis.</p><p>This article covers:</p><!--kg-card-begin: html--><ul style="font-size: 20px;">
    <li><a href="#what-is-autoregression">What is Autoregression?</a></li>
    <li><a href="#formula-of-autoregression">Formula of Autoregression</a></li>
    <li><a href="#autoregression-calculation">Autoregression Calculation</a></li>
    <li><a href="#autoregression-model">Autoregression Model</a></li>
    <li><a href="#autoregression-models-of-order-2-and-generalise-to-order-p">Autoregression Models of Order 2 and Generalise to Order p</a></li>
    <li><a href="#autoregression-vs-autocorrelation">Autoregression vs Autocorrelation</a></li>
    <li><a href="#autoregression-vs-linear-regression">Autoregression vs Linear Regression</a></li>
    <li><a href="#autocorrelation-function-and-partial-autocorrelation-function">Autocorrelation Function and Partial Autocorrelation Function</a></li>
    <li><a href="#steps-to-build-an-autoregressive-model">Steps to Build an Autoregressive Model</a></li>
    <li><a href="#example-of-autoregressive-model-in-python-for-trading">Example of Autoregressive Model in Python for Trading</a></li>
    <li><a href="#applications-of-autoregression-model-in-trading">Applications of Autoregression Model in Trading</a></li>
    <li><a href="#common-challenges-of-autoregression-models">Common Challenges of Autoregression Models</a></li>
    <li><a href="#tips-for-optimizing-autoregressive-model-performance-algorithmically">Tips for Optimizing Autoregressive Model Performance Algorithmically</a></li>
    <li><a href="#expanding-on-the-ar-model">Expanding on the AR Model</a></li>
</ul>
<!--kg-card-end: html--><hr><h2 id="what-is-autoregression">What is Autoregression?</h2><p>Autoregression models <a href="https://blog.quantinsti.com/time-series-analysis/">time-series</a> data as a linear function of its past values. It assumes that the value of a variable today is a weighted sum of its previous values.</p><p>For example, analyzing the past month’s AAPL (APPLE) performance can help predict future performance.</p><hr><h2 id="formula-of-autoregression">Formula of Autoregression</h2><p>In simpler terms, first-order autoregression says: "Today's value depends on yesterday's value". We express this relationship mathematically using a formula:</p><!--kg-card-begin: html-->$$y_t = c + \phi_1 y_{t-1} + \epsilon_t$$<!--kg-card-end: html--><!--kg-card-begin: html--><!DOCTYPE html>
<html>
<div class="description">
  Where,
</div>

<div class="item">
  • y<sub>t</sub> is the current value in the time series.
</div>
<div class="item">
  • c is a constant or intercept term.
</div>
<div class="item">
  • &straightphi;<sub>1</sub> is the autoregressive coefficient.
</div>
<div class="item">
  • y<sub>t-1</sub> is the previous value of the time series.
</div>
<div class="item">
  • &straightepsilon;<sub>t</sub> is the error term representing the random fluctuations or unobserved factors.
</div>
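<div class="description">
  As a quick worked example (the numbers below are illustrative assumptions, not estimates from any dataset): with c = 0.5, &straightphi;<sub>1</sub> = 0.8, and a previous value y<sub>t-1</sub> = 100, the point forecast, setting the error term to its expected value of zero, is y<sub>t</sub> = 0.5 + 0.8 &times; 100 = 80.5.
</div>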


</html><!--kg-card-end: html--><hr><h2 id="autoregression-calculation">Autoregression Calculation</h2><!--kg-card-begin: html--><!DOCTYPE html>
<html>
<head>
    <p>The autoregressive coefficient, &straightphi;<sub>1</sub>, is estimated using statistical methods like maximum likelihood estimation, Yule-Walker estimation, two-step regression estimation, and conditional least squares.</p>
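    <p>As a minimal sketch of one of these estimators, Yule-Walker, applied to a simulated AR(1) series (the series and its parameters are illustrative assumptions):</p>
    <pre>
# Estimating phi_1 via Yule-Walker on a simulated AR(1) series (illustrative sketch)
import numpy as np
from statsmodels.regression.linear_model import yule_walker

rng = np.random.default_rng(0)
true_phi = 0.6
y = np.zeros(1000)
for t in range(1, len(y)):
    y[t] = true_phi * y[t - 1] + rng.standard_normal()  # AR(1) with c = 0

rho, sigma = yule_walker(y, order=1)  # rho: AR coefficient(s), sigma: residual std
print(rho)  # should be close to 0.6
    </pre>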

</head></html><!--kg-card-end: html--><p>In the context of <a href="https://blog.quantinsti.com/autoregressive-moving-average-arma-model/">autoregressive (AR) models</a>, the coefficients represent the weights assigned to the lagged values of the time series to predict the current value. These coefficients capture the relationship between the current observation and its past values.</p><p>The goal is to find the coefficients that best fit the historical data, allowing the model to capture the underlying patterns in the time series accurately. Once the coefficients are determined, they help forecast future values in the time series based on the observed values from previous time points. Hence, the autoregression calculation helps to create an autoregressive model for time series forecasting.</p><p>You can explore the video below to learn more about autoregression.</p><!--kg-card-begin: html--><iframe width="900" height="500" class="lazyload" data-src="https://www.youtube.com/embed/thNGygwnpKk?rel=0" frameborder="0" allow="accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe><!--kg-card-end: html--><hr><h2 id="autoregression-model">Autoregression Model</h2><p>Before delving into autoregression, it's beneficial to revisit the concept of a <a href="https://blog.quantinsti.com/machine-learning-trading-predict-stock-prices-regression/">regression</a> model.</p><p>A regression model is a statistical method to determine the association between a dependent variable (often denoted as y) and an independent variable (typically represented as X). Thus, in regression analysis, the focus is on understanding the relationship between these two variables.</p><p>For instance, consider having the stock prices of Bank of America (ticker: BAC) and J.P. Morgan (ticker: JPM).</p><p>If the objective is to forecast the stock price of JPM based on BAC's stock price, then JPM's stock price would be the dependent variable, y, while BAC's stock price would act as the independent variable, X. Assuming a linear association between X and y, the regression equation would be:</p><!--kg-card-begin: html-->$$y=mX + c$$<!--kg-card-end: html--><p>Here,</p><p>m represents the slope, and c denotes the intercept of the equation.</p><!--kg-card-begin: html--><p>
    However, if you possess only one set of data, such as the stock prices of JPM, 
    and wish to forecast its future values based on its past values, you can employ the autoregression model explained in the previous section.
</p><!--kg-card-end: html--><!--kg-card-begin: html--><p>
    Like linear regression, the autoregressive model presupposes a linear connection between 
    <strong>y<sub>t</sub></strong> and <strong>y<sub>t−1</sub></strong>, termed autocorrelation. 
    A deeper exploration of this concept will follow subsequently.
</p>
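<p>As a concrete (and deliberately simplified) sketch of this idea, the snippet below fits an AR(1) to a single simulated stationary series with statsmodels and produces a one-step-ahead forecast. The series is an assumption for illustration; with real prices such as JPM's, you would first need to make the series stationary, as discussed later in this article.</p>
<pre>
# One-step-ahead forecast from an AR(1) fitted to a single series (illustrative sketch)
import numpy as np
from statsmodels.tsa.ar_model import AutoReg

rng = np.random.default_rng(42)
y = np.zeros(500)
for t in range(1, len(y)):
    y[t] = 0.2 + 0.7 * y[t - 1] + rng.standard_normal()  # c = 0.2, phi_1 = 0.7

res = AutoReg(y, lags=1).fit()  # estimates the constant and phi_1
print(res.params)               # [constant, coefficient on y(t-1)]
print(res.forecast(steps=1))    # forecast of the next value
</pre>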
<!--kg-card-end: html--><hr><h2 id="autoregression-models-of-order-2-and-generalise-to-order-p">Autoregression Models of Order 2 and Generalise to Order p</h2><p>Let's delve into autoregression models, starting with order 2 and then generalising to order <em>p</em>.</p><h3 id="autoregression-model-of-order-2-ar-2-">Autoregression Model of Order 2 (AR(2))</h3><!--kg-card-begin: html--><p>In an autoregression model of order 2 (AR(2)), the current value <span>y<sub>t</sub></span> is predicted based on its two most recent lagged values, that is, <span>y<sub>t-1</sub></span> and <span>y<sub>t-2</sub></span>.</p>
<!--kg-card-end: html--><!--kg-card-begin: html-->$$y_t = c + \phi_1 y_{t-1} + \phi_2 y_{t-2} + \epsilon_t$$<!--kg-card-end: html--><!--kg-card-begin: html--><!DOCTYPE html>
<html>
  
  <div>Where,</div>

  <div class="item">
    • c is a constant.
  </div>
  <div class="item">
    • &straightphi;<sub>1</sub> and &straightphi;<sub>2</sub> are the autoregressive coefficients for the first and second lags, respectively.
  </div>
  <div class="item">
    • &straightepsilon;<sub>t</sub> represents the error term.
  </div>

</html>
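  <p>To see these two coefficients in action, here is a minimal sketch that simulates an AR(2) process and recovers &straightphi;<sub>1</sub> and &straightphi;<sub>2</sub>; the coefficient values are illustrative assumptions:</p>
  <pre>
# Simulating an AR(2) process and recovering its coefficients (illustrative sketch)
import numpy as np
from statsmodels.tsa.arima_process import ArmaProcess
from statsmodels.tsa.ar_model import AutoReg

# statsmodels specifies the AR polynomial as [1, -phi_1, -phi_2]
phi_1, phi_2 = 0.5, 0.3
y = ArmaProcess(np.array([1, -phi_1, -phi_2]), np.array([1])).generate_sample(nsample=2000)

res = AutoReg(y, lags=2).fit()
print(res.params)  # approximately [0, 0.5, 0.3]
  </pre>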
<!--kg-card-end: html--><h3 id="generalising-to-order-p-ar-p-">Generalising to order p (AR(p))</h3><p>For an autoregression model of order <em>p</em> (AR(p)), the current value <em>yt </em> is predicted based on its p most recent lagged values.</p><!--kg-card-begin: html-->$$y_t = c + \phi_1 y_{t-1} + \phi_2 y_{t-2} +...+ \phi_p y_{t-p} + \epsilon_t$$<!--kg-card-end: html--><!--kg-card-begin: html--><!DOCTYPE html>
<html>
  
  <div>Where,</div>

  <div class="item">
    • c is a constant.
  </div>
  <div class="item">
    • &straightphi;<sub>1</sub>, &straightphi;<sub>2</sub>,..., &straightphi;<sub>p</sub> are the autoregressive coefficients for the respective lagged terms y<sub>t-1</sub>,y<sub>t-2</sub>, ..., y<sub>t-p</sub>.
  </div>
  <div class="item">
    • &straightepsilon;<sub>t</sub> represents the error term.
  </div>

</html>
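  <p>A natural question is how to choose p in practice. As the next paragraph notes, information criteria are one algorithmic route; here is a minimal sketch using statsmodels' ar_select_order on a simulated AR(2) series (the series and maxlag are illustrative assumptions):</p>
  <pre>
# Choosing the lag order p with an information criterion (illustrative sketch)
import numpy as np
from statsmodels.tsa.arima_process import ArmaProcess
from statsmodels.tsa.ar_model import ar_select_order

y = ArmaProcess(np.array([1, -0.5, -0.3]), np.array([1])).generate_sample(nsample=2000)

sel = ar_select_order(y, maxlag=10, ic='bic')  # searches AR(0) through AR(10)
print(sel.ar_lags)       # lags selected by BIC, ideally [1, 2] here
res = sel.model.fit()    # fit the AR model with the selected lags
print(res.params)
  </pre>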
<!--kg-card-end: html--><p>In essence, an AR(p) model considers the influence of the p previous observations on the current value. The choice of p depends on the specific time series data and is often determined using methods like information criteria or examination of autocorrelation and partial autocorrelation plots.</p><p>The higher the order p, the more complex the model becomes, capturing more historical information but also potentially becoming more prone to overfitting. Therefore, it's essential to strike a balance and select an appropriate p based on the data characteristics and model diagnostics.</p><hr><h2 id="autoregression-vs-autocorrelation">Autoregression vs Autocorrelation</h2><p>Before determining the difference between autoregression and autocorrelation, you can watch the video below for an introduction to autocorrelation, with some interesting examples.</p><!--kg-card-begin: html--><iframe width="900" height="500" class="lazyload" data-src="https://www.youtube.com/embed/oGS7YitoZDA?rel=0" frameborder="0" allow="accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe><!--kg-card-end: html--><p>Now, let us find the difference between autoregression and <a href="https://blog.quantinsti.com/autocorrelation/">autocorrelation</a> in a simplified manner below.</p><!--kg-card-begin: html--><table>
<tbody>
<tr>
<td>
<p><strong>Aspect</strong></p>
</td>
<td>
<p><strong>Autoregression</strong></p>
</td>
<td>
<p><strong>Autocorrelation</strong></p>
</td>
</tr>
<tr>
<td>
<p>Modelling</p>
</td>
<td>
<p>Incorporates past observations to predict future values.</p>
</td>
<td>
<p>Describes the linear relationship between a variable and its lags.</p>
</td>
</tr>
<tr>
<td>
<p>Output</p>
</td>
<td>
<p>Model coefficients (lags) and forecasted values.</p>
</td>
<td>
<p>Correlation coefficients at various lags.</p>
</td>
</tr>
<tr>
<td>
<p>Diagnostics</p>
</td>
<td>
<p>ACF and PACF plots to determine model order.</p>
</td>
<td>
<p>ACF plot to visualise autocorrelation at different lags.</p>
</td>
</tr>
<tr>
<td>
<p>Applications</p>
</td>
<td>
<p>Stock price forecasting, weather prediction, etc.</p>
</td>
<td>
<p>Signal processing, econometrics, quality control, etc.</p>
</td>
</tr>
</tbody>
</table><!--kg-card-end: html--><hr><h2 id="autoregression-vs-linear-regression">Autoregression vs Linear Regression</h2><p>Now, let us see the difference between autoregression and <a href="https://blog.quantinsti.com/linear-regression-assumptions-limitations/">linear regression</a> below. Linear regression can be learned better and more elaborately with this video below.</p><!--kg-card-begin: html--><iframe width="900" height="415" class="lazyload" data-src="https://www.youtube.com/embed/3fRV8iKlLRU?rel=0" frameborder="0" allow="accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe><!--kg-card-end: html--><!--kg-card-begin: html--><table>
<tbody>
<tr>
<td>
<p><strong>Aspect</strong></p>
</td>
<td>
<p><strong>Autoregression</strong></p>
</td>
<td>
<p><strong>Linear Regression</strong></p>
</td>
</tr>
<tr>
<td>
<p>Model Type</p>
</td>
<td>
<p>Specifically for time series data where past values predict the future.</p>
</td>
<td>
<p>Generalised for any data with independent and dependent variables.</p>
</td>
</tr>
<tr>
<td>
<p>Predictors</p>
</td>
<td>
<p>Past values of the same variable (lags).</p>
</td>
<td>
<p>Independent variables can be diverse (not necessarily past values).</p>
</td>
</tr>
<tr>
<td>
<p>Purpose</p>
</td>
<td>
<p>Forecasting future values based on historical data.</p>
</td>
<td>
<p>Predicting an outcome based on one or more input variables.</p>
</td>
</tr>
<tr>
<td>
<p>Assumptions</p>
</td>
<td>
<p>Time series stationarity, no multicollinearity among lags.</p>
</td>
<td>
<p>Linearity, independence, homoscedasticity, no multicollinearity.</p>
</td>
</tr>
<tr>
<td>
<p>Diagnostics</p>
</td>
<td>
<p>ACF and PACF mainly.</p>
</td>
<td>
<p>Residual plots, Quantile-Quantile plots, etc.</p>
</td>
</tr>
<tr>
<td>
<p>Applications</p>
</td>
<td>
<p>Stock price prediction, economic forecasting, etc.</p>
</td>
<td>
<p>Marketing analytics, medical research, machine learning, etc.</p>
</td>
</tr>
</tbody>
</table><!--kg-card-end: html--><hr><h2 id="autocorrelation-function-and-partial-autocorrelation-function">Autocorrelation Function and Partial Autocorrelation Function</h2><p>Let's walk through how to create Autocorrelation Function (ACF) and Partial Autocorrelation Function (PACF) plots using Python's statsmodels library and then interpret them with examples.</p><h3 id="step-1-install-required-libraries">Step 1: Install Required Libraries</h3><p>First, ensure you have the necessary libraries installed:</p><!--kg-card-begin: html--><script src="https://gist.github.com/quantra-go-algo/976959fb5da9977ac91557c4e32b91dd.js"></script><!--kg-card-end: html--><h3 id="step-2-import-libraries">Step 2: Import Libraries</h3><!--kg-card-begin: html--><script src="https://gist.github.com/quantra-go-algo/c094d34437df63fae0627e7b9abe7b1e.js"></script><!--kg-card-end: html--><h3 id="step-3-create-sample-time-series-data">Step 3: Create Sample Time Series Data</h3><p>Let's create a simple synthetic time series for demonstration:</p><!--kg-card-begin: html--><script src="https://gist.github.com/quantra-go-algo/11d32efedf6ddae97ca9f3b777422c2e.js"></script><!--kg-card-end: html--><h3 id="step-4-plot-acf-and-pacf">Step 4: Plot ACF and PACF</h3><p>Now, plot the ACF and PACF plots for the time series:</p><!--kg-card-begin: html--><script src="https://gist.github.com/quantra-go-algo/8c7f771ad90020b624d5a17c759df8d9.js"></script><!--kg-card-end: html--><p>Output:</p><figure class="kg-card kg-image-card"><img src="https://d1rwhvwstyk9gu.cloudfront.net/2025/02/ACF.png" class="kg-image" alt="Auto Correlation Function and Partial Auto Correlation Function in autoregressive model"></figure><h3 id="interpretation">Interpretation</h3><ul><li>The ACF measures the correlation between a time series and its lagged values. A decreasing ACF value suggests that past values from the time series affect today’s time series.</li><li>The higher the significance of very long lags’ ACF on the time series, the more distant past values greatly impact today’s time series. This is what we found in this plot. The ACF slowly decreases, and even at lag 40, the ACF keeps being high.</li><li>The PACF drops off at lag 1. So, whenever we have a slowly decreasing ACF and a PACF significant only at lag 1, it is a clear sign we have a random-walk process, i.e., the time series is not <a href="https://blog.quantinsti.com/stationarity/">stationary</a>.</li><li>By examining the ACF and PACF plots and their significant lags, you can gain insights into the temporal dependencies within the time series and make informed decisions about model specification in Python.</li><li>The example given is a price series following a random-walk process, i.e., is not stationary. </li></ul><p>Let’s see below how to estimate a stationary AR model.</p><hr><h2 id="steps-to-build-an-autoregressive-model">Steps to Build an Autoregressive Model</h2><p>Building an autoregressive model involves several steps to ensure that the model is appropriately specified, validated, and optimized for forecasting. 
Here are the steps to build an autoregressive model:</p><h3 id="step-1-data-collection">Step 1: Data Collection</h3><ul><li>Gather historical time series data for the variable of interest.</li><li>Ensure the data covers a sufficiently long period and is consistent in frequency (e.g., daily, monthly).</li></ul><h3 id="step-2-data-exploration-and-visualisation">Step 2: Data Exploration and Visualisation</h3><ul><li>Plot the time series data to visualize trends, seasonality, and other patterns.</li><li>Check for outliers or missing values that may require preprocessing.</li></ul><h3 id="step-3-data-preprocessing">Step 3: Data Preprocessing</h3><ul><li>Handle missing values using appropriate methods such as interpolation or imputation.</li><li>Ensure the data is stationary; stationarity is a key assumption of autoregressive models. If it isn't, you must difference or de-trend the data.</li></ul><h3 id="step-4-model-specification">Step 4: Model Specification</h3><ul><li>Determine the appropriate lag order (p) based on the autocorrelation function (ACF) and partial autocorrelation function (PACF) plots.</li><li>Decide on including any exogenous variables or external predictors that may improve the model's forecasting ability.</li></ul><h3 id="step-5-model-estimation">Step 5: Model Estimation</h3><ul><li>As described above; in practice, almost all modern statistical packages can estimate an AR (or ARMA) model.</li></ul><h3 id="step-6-forecasting">Step 6: Forecasting</h3><ul><li>Split the data into training and test sets.</li><li>Fit the model on the training data.</li><li>Compute statistical metrics such as Mean Absolute Error (MAE) and Root Mean Squared Error (RMSE) to assess the model's predictive accuracy using the test data.</li></ul><h3 id="step-7-model-refinement">Step 7: Model Refinement</h3><ul><li>If the model performance is unsatisfactory for new data streams, consider returning to step 3.</li></ul><h3 id="step-8-documentation-and-communication-">Step 8: Documentation and Communication</h3><ul><li>Document the model's specifications, assumptions, and validation results.</li><li>Communicate the model's findings, limitations, and implications to stakeholders or end-users.</li></ul><p>By following these steps systematically and iteratively refining the model as needed, you can develop a robust autoregressive model tailored to your time series data's specific characteristics and requirements.</p><hr><h2 id="example-of-autoregressive-model-in-python-for-trading">Example of Autoregressive Model in Python for Trading</h2><p>Below is a step-by-step example demonstrating how to build an autoregressive (AR) model for time series forecasting in trading using Python. 
We'll use historical stock price data for Apple Inc. (ticker: AAPL) and the statsmodels library to construct the AR model.</p><p>Let us now see the steps in Python below.</p><h3 id="step-1-install-required-packages">Step 1: Install Required Packages</h3><p>If you haven't already, install the necessary Python packages:</p><!--kg-card-begin: html--><script src="https://gist.github.com/quantra-go-algo/18a728566f1a0de1f0838c46891287a4.js"></script><!--kg-card-end: html--><h3 id="step-2-import-libraries-1">Step 2: Import Libraries</h3><!--kg-card-begin: html--><script src="https://gist.github.com/quantra-go-algo/bc65bbf5698b2133850ba699eb41a803.js"></script><!--kg-card-end: html--><h3 id="step-3-load-historical-stock-price-data">Step 3: Load Historical Stock Price Data</h3><p>A few notes:</p><ul><li>Use the Apple stock data from 2000 to January 2025.</li><li>Save the window size used as the train span for estimating the AR model as “rolling_window”. </li></ul><!--kg-card-begin: html--><script src="https://gist.github.com/quantra-go-algo/2f943112e7d1adbda4c288add2b6f8ed.js"></script><!--kg-card-end: html--><p>Output:</p><figure class="kg-card kg-image-card kg-width-full kg-card-hascaption"><img src="https://d1rwhvwstyk9gu.cloudfront.net/2025/02/AAPL-stock-prices.png" class="kg-image" alt="AAPL Stock prices"><figcaption>AAPL Stock prices</figcaption></figure><h3 id="step-4-find-the-order-of-integration-of-the-price-series">Step 4: Find the Order of Integration of the Price Series</h3><p>You need a stationary time series to estimate an AR model. Due to that, you’ll need to find the <a href="https://blog.quantinsti.com/autoregressive-moving-average-arma-model/">order of integration</a> of the price series, i.e., find the order “d” such that differencing the data “d” times makes it stationary. To find that number “d”, you can apply an Augmented Dickey-Fuller test to the price series and its first and second differences (the second difference is enough based on stylized facts). See below:</p><!--kg-card-begin: html--><script src="https://gist.github.com/quantra-go-algo/32b22ebbbc6df2b2bf4a3fc938a80261.js"></script><!--kg-card-end: html--><p>We use the adfuller method provided in the statsmodels library and output its second result, the p-value. Whenever the p-value is less than 5%, it means the time series is stationary.</p><!--kg-card-begin: html--><pre>
Output:
(0.9987469346686696, 1.2195696223837154e-26, 0.0)
 </pre><!--kg-card-end: html--><p>As we can see, the price, its first difference, and its second difference are non-stationary, stationary, and stationary, respectively. The price series therefore needs to be differenced once to make it stationary, i.e., it has an order of integration of 1, written I(1).</p><p>So, to run an AR model, we need to estimate it based on the first difference, which, in the ARIMA class from statsmodels, means d=1. Here we estimate a stationary AR(1), i.e., an ARIMA(1,1,0), as described below.</p><h3 id="step-5-train-the-ar-model-using-arima">Step 5: Train the AR model using ARIMA</h3><p>Let us train the AR(1) model using the ARIMA class from the statsmodels library.</p><p>The ARIMA class can be imported as shown below.</p><!--kg-card-begin: html--><script src="https://gist.github.com/quantra-go-algo/dff9dc4c17a66cfc66a6492e62967448.js"></script><!--kg-card-end: html--><p>Using ARIMA, the autoregressive model can be trained as</p><p>ARIMA(data, order=(p, d, q))</p><p>where</p><ul><li>p is the AR parameter that needs to be defined.</li><li>d is the differencing parameter. It is zero when we’re sure the time series is stationary, 1 when the series is I(1), 2 when it is I(2), and so on. Since we found that our price series is I(1), we set d to 1.</li><li>q is the MA parameter. This is zero in the case of a pure AR model. You will learn about this later.</li></ul><p>Hence, the autoregressive model can be trained as</p><p>ARIMA(data, order=(p, 1, 0))</p><!--kg-card-begin: html--><script src="https://gist.github.com/quantra-go-algo/f375599702fae8b9a302a1dae213efb6.js"></script><!--kg-card-end: html--><!--kg-card-begin: html--><pre>
Output:
ar.L1     0.01
sigma2    0.05
dtype: float64
 </pre><!--kg-card-end: html--><p>From the output above, you can see that</p><!--kg-card-begin: html--><html>
  <body>
    <ul>
      <li>
        \( \phi_1 = 0.01 \)
      </li>
      <li>
        \( \text{Variance of the residuals} = \sigma^2 = sigma2 = 0.05 \)
      </li>
    </ul>
  </body>
</html><!--kg-card-end: html--><p>Therefore, the fitted AR(1) model becomes</p><!--kg-card-begin: html-->$$y_t = 0.01 \, y_{t-1} + \epsilon_t$$
<!--kg-card-end: html--><p>Here, y<sub>t</sub> is the first difference of the price, not the price itself; remember that the AR model should have a stationary time series as input.</p><p>Let’s estimate an AR model for each day and forecast the next-day price. You can do it quickly using pandas.DataFrame.rolling.apply. Let’s create a function to estimate the model and return a forecast for the next day.</p><!--kg-card-begin: html--><script src="https://gist.github.com/quantra-go-algo/770c9518e6e6797bcaf50b5edd0fed18.js"></script><!--kg-card-end: html--><p>And let’s run the model for each day, using the rolling_window variable as the train span. Thus, the first rolling_window days will be NaN values.</p><!--kg-card-begin: html--><script src="https://gist.github.com/quantra-go-algo/c5d9a7242d5a5d3453f4be6c800ce7c3.js"></script><!--kg-card-end: html--><p>The forecast for tomorrow is saved against today. Consequently, we shift the predicted_price series.</p><h3 id="step-6-evaluate-model-performance">Step 6: Evaluate model performance</h3><p>In this function, for a specific year, we:</p><ul><li>Compute the Mean Absolute Error</li><li>Compute the Mean Squared Error</li><li>Compute the Root Mean Squared Error</li><li>Compute the Mean Absolute Percentage Error</li><li>Plot the actual and forecasted prices</li><li>Plot the residuals</li><li>Plot the ACF</li><li>Plot the PACF</li></ul><!--kg-card-begin: html--><script src="https://gist.github.com/quantra-go-algo/c4509c632c9596518fd96c736d4e1474.js"></script><!--kg-card-end: html--><!--kg-card-begin: html--><pre>
Output:
The Mean Absolute Error is 2.63
The Mean Squared Error is 11.41
The Root Mean Squared Error is 3.38
The Mean Absolute Percentage Error is 1.74
 </pre><!--kg-card-end: html--><figure class="kg-card kg-image-card kg-width-full kg-card-hascaption"><img src="https://d1rwhvwstyk9gu.cloudfront.net/2025/02/Output-for-model-performance.png" class="kg-image" alt="Model performance"><figcaption>Model performance</figcaption></figure><p>The first plot above shows that the predicted values are close to the observed values. However, the forecasted prices don’t exactly follow the actual prices.</p><p>Tip: Whenever you compare actual prices against forecasted prices, do not compare them over a long data span, e.g., from 1990 to 2025. On such plots, the forecasted prices appear to track the actual prices almost exactly, which is misleading. For a fair comparison, zoom in, e.g., compare the two series over a specific month if the data frequency is daily, and so on.</p><p>From the third and fourth plots above, you can see that the model captures the price behavior almost entirely, because very few ACF and PACF values are significant across the lags. To formally choose the correct model, you can follow the Box-Jenkins methodology to do it graphically each day, or you can select the best model with an information criterion, as described below, to do it algorithmically.</p><p><strong>**Note: You can log into quantra.quantinsti.com and enroll in the course on  </strong><a href="https://quantra.quantinsti.com/course/financial-time-series-analysis-trading"><strong>Financial Time Series</strong></a><strong> to explore the autoregressive model in Python in detail.**</strong></p><p>Forecasting is a statistical process, so the forecast variance will be greater than zero, i.e., the forecasted prices can deviate from the actual prices.</p><p>Here are some reasons why your autoregressive model can have poor performance:</p><ul><li><strong>Model Misspecification:</strong> The AR model's assumptions or specifications may not align with the true data-generating process, leading to biased forecasts.</li><li><strong>Lag Selection:</strong> Incorrectly specifying the lag order in the AR model can result in misleading predictions. Including too many or too few lags may distort the model's predictive accuracy.</li><li><strong>Missed Trends or Seasonality: </strong>The AR model may not adequately capture underlying trends, seasonality, or other temporal patterns in the data, leading to inaccurate predictions.</li><li><strong>External Factors: </strong>Unaccounted external variables or events that influence the time series but are not included in the model can lead to discrepancies between predicted and actual prices.</li><li><strong>Data Anomalies: </strong>Outliers, anomalies, or sudden shocks in the data that were not accounted for in the model can distort the predictions, especially if the model is sensitive to extreme values.</li><li><strong>Stationarity Assumption:</strong> If the time series is not stationary, applying an AR model can produce unreliable forecasts. Stationarity is a key assumption for the validity of AR models.</li></ul><hr><h2 id="applications-of-autoregression-model-in-trading">Applications of Autoregression Model in Trading</h2><p>Autoregression (AR) models have been applied in various ways within trading and finance. 
Here are some applications of autoregression in trading:</p><ul><li><strong>Price prediction</strong>: As previously shown, traders often use autoregressive models to analyze historical price data and identify patterns to forecast prices or price direction. This is the most common use case for AR models.</li><li><strong>Risk Management:</strong> Autoregression can model and forecast volatility in financial markets. However, the AR model needs to be combined with the <a href="https://quantra.quantinsti.com/glossary/GARCH">GARCH</a> model to forecast variance; with both, you can do proper risk management.</li><li><strong>Market Microstructure: </strong>Autoregression can be used to <a href="https://www.sciencedirect.com/science/article/pii/S0031320323008361">model</a> the behavior of market disturbances, such as in high-frequency trading.</li></ul><hr><h2 id="common-challenges-of-autoregression-models">Common Challenges of Autoregression Models</h2><p>The following are common challenges of the autoregression model:</p><ul><li><strong>Overfitting: </strong>Autoregressive models can become too complex and fit the noise in the data rather than the underlying trend or pattern. This can lead to poor out-of-sample performance and unreliable forecasts. That’s why a parsimonious model is the best choice for estimating AR models.</li><li><strong>Stationarity:</strong> Many financial time series exhibit non-stationary behavior, meaning their statistical properties (like mean and variance) change over time. Autoregressive models assume stationarity, so failure to account for non-stationarity can result in inaccurate model estimates.</li><li><strong>Model Specification: </strong>Determining an autoregressive model's appropriate lag order (p) is challenging. Too few lags might miss important information, while too many can introduce unnecessary complexity. A parsimonious model helps with this type of issue.</li><li><strong>Seasonality and Periodicity:</strong> Autoregressive models might not capture seasonal patterns or other periodic effects in the data, leading to biased forecasts. You might need to de-seasonalize the data before you apply the AR model.</li></ul><hr><h2 id="tips-for-optimizing-autoregressive-model-performance-algorithmically">Tips for Optimizing Autoregressive Model Performance Algorithmically</h2><p>Now, let us see some tips for optimizing the autoregressive model’s performance below.</p><ul><li><strong>Data Preprocessing:</strong> Ensure the data is stationary or apply techniques like differencing or de-trending to achieve stationarity before fitting the autoregressive model.</li><li><strong>Model Selection: </strong>Usually, you apply the Box-Jenkins methodology to select the appropriate number of lags of the AR model. This methodology uses a graphical inspection of the ACF and PACF to derive the best model. In algorithmic trading, you can just estimate multiple AR models and select the best using information criteria (e.g., Akaike Information Criterion, AIC; Bayesian Information Criterion, BIC, etc.).</li><li><strong>Include Exogenous Variables:</strong> AR models are usually estimated using only the time series' own lags. However, you can also incorporate relevant external factors or predictors that might improve the model's forecasting accuracy.</li><li><strong>Continuous Monitoring and Updating:</strong> Financial markets and economic conditions evolve over time; such shifts are called regime changes. 
Regularly re-evaluate and update the model to incorporate new data and adapt to changing dynamics.</li></ul><p>By addressing these challenges and following the optimization tips, practitioners can develop more robust and reliable autoregressive models for forecasting and decision-making in trading and finance.</p><hr><h2 id="expanding-on-the-ar-model">Expanding on the AR Model</h2><p>We have now covered the essentials of autoregressive models. But what if we also lag the error term? That is, we can write something like:</p><!--kg-card-begin: html-->$$y_t = c + \phi_1y_{t-1} + \epsilon_t + \theta \epsilon_{t-1} $$<!--kg-card-end: html--><p>This model is the so-called ARMA model; specifically, it’s an ARMA(1,1) model, because we have the first lag of the time series (the AR component) and we also have the first lag of the model error (the MA component).</p><p>In case you want to:</p><ul><li>Understand thoroughly what the ARMA/ARIMA model is.</li><li>Correctly identify the number of lags using the ACF and PACF graphically.</li><li>Learn how to estimate the ARMA model.</li><li>Learn how to choose the best number of lags for the AR and MA components.</li><li>Create backtesting code using this model as a strategy.</li><li>Learn how to improve the model’s performance.</li></ul><p>I would suggest reading the following 3 blog articles, where you’ll find everything you need to know about this type of model:</p><ul><li><a href="https://blog.quantinsti.com/autoregressive-moving-average-arma-model/">AutoRegressive Moving Average (ARMA) models: A Comprehensive Guide</a></li><li><a href="https://blog.quantinsti.com/autoregressive-moving-average-arma-model-python/">AutoRegressive Moving Average (ARMA) models: Using Python</a></li><li><a href="https://blog.quantinsti.com/autoregressive-moving-average-arma-model-r/">AutoRegressive Moving Average (ARMA) models: Using R</a></li></ul><hr><h3 id="conclusion">Conclusion</h3><p>Utilizing time series modeling, specifically Autoregression (AR), offers insights into predicting future values based on historical data. We comprehensively covered the AR model, its formula, calculations, and applications in trading. </p><p>By understanding the nuances between autoregression, autocorrelation, and linear regression, traders can make informed decisions, optimize model performance, and navigate challenges in forecasting financial markets. Last but not least, continuous monitoring, model refinement, and incorporating domain knowledge are vital for enhancing predictive accuracy and adapting to dynamic market conditions.</p><p>You can learn more with our course on Financial Time Series Analysis for Trading, which covers the <a href="https://quantra.quantinsti.com/course/financial-time-series-analysis-trading">analysis of financial time series</a> in detail.</p><p>With this course, you will learn the concepts of Time Series Analysis and how to implement them in live trading markets. From basic AR and MA models to advanced models like SARIMA, ARCH, and GARCH, this course will help you learn it all. 
Also, after learning from this course, you can apply time series analysis to data exhibiting characteristics like seasonality and non-constant volatility.</p><h3 id="continue-learning">Continue Learning</h3><ol><li>Strengthen your grasp by looking into<a href="https://blog.quantinsti.com/autocorrelation-autocovariance/"> Autocorrelation &amp; Autocovariance</a> to see how data points relate over time, then deepen your knowledge with fundamental models such as<a href="https://blog.quantinsti.com/autoregression/"> Autoregression (AR)</a>,<a href="https://blog.quantinsti.com/autoregressive-moving-average-arma-model/"> ARMA</a>, <a href="https://blog.quantinsti.com/forecasting-stock-returns-using-arima-model/">ARIMA</a> and <a href="https://blog.quantinsti.com/arfima-model/">ARFIMA</a>.</li><li>If your goal is to discover alpha, you may want to experiment with a variety of techniques, such as<a href="https://blog.quantinsti.com/tag/technical-analysis/"> technical analysis, trading risk management, pairs trading basics,</a> and<a href="https://blog.quantinsti.com/market-microstructure/"> Market microstructure</a>. By combining these approaches, you can develop and refine trading strategies that better adapt to market dynamics.</li><li>For a structured approach to algo trading—and to master advanced statistics for quant strategies—consider the <a href="https://www.quantinsti.com/epat">Executive Programme in Algorithmic Trading (EPAT)</a>. This rigorous course covers time series fundamentals (stationarity, ACF, PACF), advanced modelling (ARIMA, ARCH, GARCH), and practical Python‐based strategy building, providing the in‐depth skills needed to excel in today’s financial markets.</li></ol><hr><p><strong>File in the download:</strong></p><ul><li>The Python code snippets for implementing the model are provided, including installing the libraries, downloading the data, and creating the relevant functions for model fitting and forecasting performance.</li></ul><!--kg-card-begin: html--><p style="text-align: center;"><a href="https://blog.quantinsti.com/autoregression" class="download-button button"> Visit blog to download </a></p><!--kg-card-end: html--><hr><!--kg-card-begin: html--><p><em><small>Note: The original post has been revamped on 11<sup>th</sup> Feb 2025 for recency and accuracy.</small></em></p><!--kg-card-end: html--><hr><!--kg-card-begin: html--><p><em><small>Disclaimer: All investments and trading in the stock market involve risk. Any decision to place trades in the financial markets, including trading in stock or options or other financial instruments is a personal decision that should only be made after thorough research, including a personal risk and financial assessment and the engagement of professional assistance to the extent you believe necessary. The trading strategies or related information mentioned in this article is for informational purposes only.</small></em></p>
<!--kg-card-end: html-->]]></content:encoded></item><item><title><![CDATA[Ito's Lemma Applied to Stock Trading]]></title><description><![CDATA[Delve deeper into Ito's Lemma for trading with practical examples and use cases. Learn about Ito calculus and its application to stock prices.
]]></description><link>https://blog.quantinsti.com/itos-lemma-applied-stock-trading/</link><guid isPermaLink="false">67407f00da24e139f544d5d2</guid><category><![CDATA[Mathematics and Econometrics]]></category><dc:creator><![CDATA[Mahavir A. Bhattacharya]]></dc:creator><pubDate>Fri, 06 Dec 2024 11:17:49 GMT</pubDate><content:encoded><![CDATA[<p>By <a href="https://www.linkedin.com/in/mahavir-bhattacharya-1303b011b/">Mahavir A. Bhattacharya</a><br><br>This is the second part of the two-part blog where we explore how Ito’s Lemma extends traditional calculus to model the randomness in financial markets. Using real-world examples and Python code, we’ll break down concepts like drift, volatility, and geometric Brownian motion, showing how they help us understand and model financial data, and we’ll also have a sneak peek into how to use the same for trading in the markets.</p><p>In the first part, we saw how classical calculus cannot be used for modeling stock prices, and in this part, we’ll have an intuition of Ito’s lemma and see how it can be used in the financial markets. Here’s the link to part I, in case you haven’t gone through it yet: <a href="https://blog.quantinsti.com/itos-lemma-trading-concepts-guide/">Laying the Groundwork for Ito's Lemma and Financial Stochastic Models</a></p><p>This blog covers:</p><!--kg-card-begin: html--><ul style="font-size: 20px;">
  <li><a href="#pre-requisites">Pre-requisites</a></li>
  <li><a href="#quick-recap">Quick Recap</a></li>
  <li><a href="#ito-calculus">Ito Calculus</a></li>
  <li><a href="#ito-s-lemma-applied-to-stock-prices">Ito's Lemma Applied to Stock Prices</a></li>
  <li><a href="#use-case-i-of-ito-s-lemma">Use Case - I of Ito's Lemma</a></li>
  <li><a href="#important-considerations">Important Considerations</a></li>
  <li><a href="#use-case-ii-of-ito-s-lemma">Use Case - II of Ito's Lemma</a></li>
  <li><a href="#till-next-time">Till Next Time</a></li>
</ul><!--kg-card-end: html--><hr><h3 id="pre-requisites">Pre-requisites</h3><p>You will be able to follow the article smoothly if you have elementary-level proficiency in:</p><ul><li>Calculus</li><li>Python coding</li></ul><hr><h3 id="quick-recap">Quick Recap</h3><p>In part I of this two-blog series, we learned the following topics:</p><ol><li>The chain rule</li><li>Deterministic and stochastic processes</li><li>Drift and volatility components of stock prices</li><li>Wiener processes</li></ol><p>In this part, we shall learn about Ito calculus and how it can be applied to the markets for trading.</p><hr><h3 id="ito-calculus">Ito Calculus</h3><p>Remember \( W_t \) from part I? \( W_t \) is why Ito came up with the calculus he did. In classical calculus, we work with functions. However, in finance, we frequently work with stochastic processes, where \( W_t \) represents stochasticity.</p><p>Rewriting the equations from part I:</p><p>The equation for the chain rule:</p><p>$$\frac{dy}{dx} = \frac{dy}{dz} \cdot \frac{dz}{dx}$$ <strong>--------------- 1</strong></p><p>The equation for geometric Brownian motion (GBM):</p><p>$$dS_t = \mu S_t \, dt + \sigma S_t \, dW_t$$<strong>--------------- 2</strong></p><p>Equation 2 is a differential equation. The presence of \( W_t \) makes the GBM a stochastic differential equation (SDE). What’s so special about SDEs?</p><p>Remember the chain rule discussed in part I? That’s only for deterministic variables. For SDEs, our chain rule is Ito’s lemma!</p><p>Let’s get down to business now.</p><hr><h3 id="ito-s-lemma-applied-to-stock-prices">Ito's Lemma Applied to Stock Prices</h3><p>The following equation is an expression of Ito’s lemma:</p><p>$$df(S_t) = f'(S_t) dS_t + \frac{1}{2} f''(S_t) d[S, S]_t$$<strong>--------------- 3</strong></p><p>Here,</p><p>f(x) is a function which can be differentiated twice, and</p><p>S is a continuous process, having bounded variation.</p><p>What do we mean by bounded variation?</p><p>It simply means that the difference between \( S_{t+1} \) and \( S_t \), for any value of t, would never exceed a certain value. What this ‘certain value’ is, is not of much significance. What is significant is that the difference between two consecutive values of the process is finite.</p><p>Next question: What’s \( [S, S]_t \)?</p><p>It’s a notation.</p><p>Of what?</p><p>A notation to denote a quadratic variation process.</p><p>What’s that?</p><p>In this blog, we won’t get into the intuition of the quadratic variation. It would suffice to know that the quadratic variation of \( S_t \), i.e., \( [S, S]_t \), is as follows:</p><p>$$\lim_{\Delta t \to 0} \sum_{0}^{t} \left(S_{t_{i+1}} - S_{t_i}\right)^2$$</p><p>If \( S_t \) follows a Brownian motion, the derivative of its quadratic variation is:</p><p>$$d[S, S]_t = \sigma^2 S_t^2 dt$$<strong>--------------- 4</strong></p><p>Substituting equation 4 in equation 3, we get:</p><!--kg-card-begin: html--><p>\[ df(S_t) = f'(S_t) dS_t + \frac{1}{2} f''(S_t) \sigma^2 S_t^2 dt \] </p>
<!--kg-card-end: html--><p><strong>--------------- 5</strong></p><p>How is this derived?</p><p>We can treat equation 5 as a Taylor series expansion up to the second order. If you aren’t familiar with it, don’t worry; you can continue reading.</p><p>Still, what’s the intuition? Here, f is a function of the process S, which itself is a function of time t. The change in f depends on:</p><ol><li>The first-order derivative of f with respect to S,</li><li>The second-order derivative of f with respect to S,</li><li>The square of the volatility σ, and</li><li>The square of S.</li></ol><p>The last three are multiplied and then added to the first one.</p><p>We saw earlier that stock returns follow a Brownian motion, so stock prices follow a GBM. Hence, suppose we have a process \( R_t \), which is equal to log(\( S_t \)).</p><p>If we take \( R_t \) = log(\( S_t \)) in the GBM SDE (equation 2), and if we use the expression for Ito’s lemma (equation 3), we’ll have:</p><p>$$f(S_t) = R_t = \log(S_t)$$<strong>--------------- 6</strong></p><p>and,</p><p>$$dR_t = \frac{dS_t}{S_t} - \frac{d[S_t, S_t]}{2S_t^2}$$<strong>--------------- 7</strong></p><p>Since $$dS_t = \mu S_t \, dt + \sigma S_t \, dW_t$$ and</p><p>$$d[S, S]_t = \sigma^2 S_t^2 \, dt$$ (equation 4), </p><p>we can rewrite equation 7 as:</p><p>$$dR_t = \left(\mu - \frac{\sigma^2}{2}\right)dt + \sigma dW_t$$<strong>--------------- 8</strong></p><p>Since the second term on the RHS doesn’t depend on the LHS, we can use direct integration to solve equation 8:</p><p>$$R_t = R_0 + \left(\mu - \frac{\sigma^2}{2}\right)t + \sigma W_t$$<strong>--------------- 9</strong></p><p>Since </p><!--kg-card-begin: html--><p>\[ R_t = \log(S_t) \quad \text{and} \quad S_t = \exp(R_t), \] </p>
<!--kg-card-end: html--><p>Thus, equation 9 changes to:</p><p>$$S_t = S_0 \cdot e^{\left(\mu - \frac{\sigma^2}{2}\right)t + \sigma W_t}$$<strong>--------------- 10</strong></p><p>Let’s understand what the equation means here. The stock price at time t = 0, when multiplied by this term:</p><p>$$e^{\left(\mu - \frac{\sigma^2}{2}\right)t + \sigma W_t}$$<strong>--------------- 11</strong></p><p>would give the stock price at time t.</p><!--kg-card-begin: html--><p>In equation 2, the drift component had just \( \mu \), but in equation 10, we subtract \( \frac{\sigma^2}{2} \) from \( \mu \). Why so? Remember how we obtain \( \mu \)? By taking the mean of daily log returns, right?</p>
<!--kg-card-end: html--><p>Umm, no! As mentioned in part I, μ is the average percentage drift (or returns), and NOT the logarithmic drift.</p><p>As we saw from the drift component and volatility component graphs, the close price isn’t just the drift component, but also the volatility component added to it. Hence, we need to correct the drift to consider the volatility component as well. It is towards this correction that we subtract \( \frac{\sigma^2}{2} \) from μ. The intuition here is that the arithmetic mean of a set of non-negative real numbers is greater than or equal to the geometric mean of the same set of numbers. The value of μ before the correction is the arithmetic mean, and after the correction, it is close to the geometric mean. When taken on an annual basis, the geometric mean is the CAGR.</p><p>How do we interpret equation 10? The current stock price is simply a function of the past stock price, the corrected drift, and the volatility.</p><p>How do we use this in the markets? Let’s see…</p><hr><h3 id="use-case-i-of-ito-s-lemma">Use Case - I of Ito's Lemma</h3><p><strong>Note:</strong> The codes in this part are continued from part I, and the graphs and values obtained are as of October 18, 2024.</p><!--kg-card-begin: html--><script src="https://gist.github.com/quantra-go-algo/75ac5565e66274bea9fab9291ebc0370.js"></script><!--kg-card-end: html--><p>Output:</p><!--kg-card-begin: html--><pre>The mean of the daily percent returns = 0.00109
The standard deviation of the daily percent returns = 0.01707
The variance of the daily percent returns = 0.00029</pre><!--kg-card-end: html--><!--kg-card-begin: html--><script src="https://gist.github.com/quantra-go-algo/d0be2517ff12e6824778308da48f7787.js"></script><!--kg-card-end: html--><p>Output:</p><!--kg-card-begin: html--><pre>Daily compounded returns = 0.00094878</pre><!--kg-card-end: html--><!--kg-card-begin: html--><script src="https://gist.github.com/quantra-go-algo/e6b1f7092a9f1d58a8b19817e7eaf184.js"></script><!--kg-card-end: html--><p>Output:</p><!--kg-card-begin: html--><pre>Corrected daily percent returns = 0.000949</pre><!--kg-card-end: html--><p>The arithmetic mean of the returns was initially <strong>0.00109</strong>, and the geometric mean (daily compounded returns) computes to <strong>0.00094878</strong>. After incorporating the drift correction, the arithmetic mean stood at <strong>0.000949</strong>. Quite close to the geometric mean!</p><p>How do we use this for trading?</p><p>Suppose we wanna predict the range within which the price of Microsoft is likely to lie after, say, 42 trading days (2 calendar months) from now.</p><p>Let’s seek refuge in Python again:</p><!--kg-card-begin: html--><script src="https://gist.github.com/quantra-go-algo/1ee25636dc0af2b2f4569dc14d801b87.js"></script><!--kg-card-end: html--><p>Output:</p><!--kg-card-begin: html--><pre>Corrected drift for 42 days = 0.03985788
Variance for 42 days = 0.01223456
Standard deviation for 42 days = 0.11060996</pre><!--kg-card-end: html--><!--kg-card-begin: html--><script src="https://gist.github.com/quantra-go-algo/b695de34e0eb29c1aa40673a6a91ccab.js"></script><!--kg-card-end: html--><p>Output:</p><!--kg-card-begin: html--><pre>Price below which the stock isn't likely to trade with a 95% probability after 42 days = 347.6
Price above which the stock isn't likely to trade with a 95% probability after 42 days = 541.04</pre><!--kg-card-end: html--><p>We now know, with 95% confidence, the range within which the stock is likely to lie 42 trading days from now! How do we trade this? There are many ways, but I’ll share one specific method.</p><!--kg-card-begin: html--><script src="https://gist.github.com/quantra-go-algo/50deeb9e49801b543c42917f82f44a6c.js"></script><!--kg-card-end: html--><p>Output:</p><!--kg-card-begin: html--><pre>
Put with strike 345:

Contract Symbol:   MSFT241220P00345000  
Last Trade Date:   2024-10-17 19:44:37+00:00  
Strike:            345.0  
Last Price:        <b>1.53</b>  
Bid:               0.0  
Ask:               0.0  
Change:            0.0  
Percent Change:    0.0  
Volume:            1.0  
Open Interest:     0  
Implied Volatility: 0.125009  
In The Money:      False  
Contract Size:     REGULAR  
Currency:          USD  
</pre>
<!--kg-card-end: html--><!--kg-card-begin: html--><pre>Call with strike 545:

Contract Symbol:   MSFT241220C00545000  
Last Trade Date:   2024-10-16 13:45:27+00:00  
Strike:            545.0  
Last Price:        <b>0.25</b>  
Bid:               0.0  
Ask:               0.0  
Change:            0.0  
Percent Change:    0.0  
Volume:            169  
Open Interest:     0  
Implied Volatility: 0.125009  
In The Money:      False  
Contract Size:     REGULAR  
Currency:          USD  
</pre><!--kg-card-end: html--><p>We have chosen out-of-the-money strikes near the 95% confidence price range we obtained earlier.</p><p>This way, we can pocket around $1.53 + $0.25 (emboldened in the above output) = $1.78 per pair of stock options sold, if held till expiry. If we sell one lot each of these call and put option contracts, we can pocket $178, since the lot size is 100. And what’s the assurance of us making this profit? 95%, right? Simplistically, yes, but let’s move closer to reality now.</p><hr><h3 id="important-considerations">Important Considerations</h3><p><strong>Assumption of Normality:</strong> We used mean +/- 2 standard deviations and kept talking about 95% confidence. This works in a world where the stock returns are normally distributed. But in the real world, they are not! And more often than not, this deviation from a normal distribution works against us, since people react faster to news of impending doom than to news of euphoria.</p><p><strong>Transaction Costs:</strong> We didn’t consider the transaction costs, taxes, and implementation shortfalls.</p><p><strong>Backtesting:</strong> We haven’t backtested (and forward tested) whether prices have historically stayed (and would stay in the future) within the predicted price ranges.</p><p><strong>Opportunity Costs:</strong> We also didn’t consider the margin requirements and the opportunity costs, were we to deploy some margin amount in this strategy.</p><p><strong>Volatility:</strong> Finally, we are trading volatility here, not the price. We’ll end up pocketing the whole premium only if both the options expire worthless, i.e., out-of-the-money. But for that to happen, the volatility must be low until the expiry. We must account for the implied volatilities obtained in the previous code output. Oh, and by the way, how is this implied volatility calculated?</p><p>To better understand how volatility affects options pricing, consider exploring our course on <a href="https://quantra.quantinsti.com/course/options-volatility-trading">option volatility</a>, where you'll learn key concepts like implied volatility and its impact on your trades.</p><hr><h3 id="use-case-ii-of-ito-s-lemma">Use Case - II of Ito's Lemma</h3><p>We calculate the implied volatility from the classic Black-Scholes-Merton model for option pricing. And how did Fischer Black, Myron Scholes, and Robert Merton develop this model? They stood on the shoulders of Kiyoshi Ito! 🙂</p><hr><h3 id="till-next-time">Till Next Time</h3><p>And this is where I bid au revoir! Do backtest the code and check whether it can predict the range of future prices with reasonable accuracy. You can also use mean +/- 1 standard deviation in place of 2 standard deviations. The benefit? The range would be tighter, and you could pocket more premium. The flip side? The chances of being profitable get reduced to around 68%! You can also think of other ways to capitalise on this prediction. 
Do let us know in the comments what you tried.</p><hr><h3 id="references-">References:</h3><p><strong>Main Reference:</strong></p><p><a href="https://research.tilburguniversity.edu/files/51558907/INTRODUCTION_TO_FINANCIAL_DERIVATIVES.pdf">https://research.tilburguniversity.edu/files/51558907/INTRODUCTION_TO_FINANCIAL_DERIVATIVES.pdf</a></p><p><strong>Auxiliary References:</strong></p><p>Wikipedia pages of Ito’s lemma, Brownian motion, geometric Brownian motion, quadratic variation, and AM-GM inequality</p><p>EPAT lectures on statistics and options trading</p><hr><h3 id="file-in-the-download">File in the download</h3><ul><li>Ito's_Lemma - Python notebook</li></ul><!--kg-card-begin: html--><p style="text-align: center;"><a href="https://blog.quantinsti.com/itos-lemma-applied-stock-trading" class="download-button button"> Visit blog to download </a></p><!--kg-card-end: html--><hr><!--kg-card-begin: html--><p><em><small>All investments and trading in the stock market involve risk. Any decision to place trades in the financial markets, including trading in stock or options or other financial instruments is a personal decision that should only be made after thorough research, including a personal risk and financial assessment and the engagement of professional assistance to the extent you believe necessary. The trading strategies or related information mentioned in this article is for informational purposes only.</small></em></p><!--kg-card-end: html-->]]></content:encoded></item><item><title><![CDATA[Laying the Groundwork for Ito's Lemma and Financial Stochastic Models]]></title><description><![CDATA[Explore the foundational concepts behind Ito's Lemma in trading, including the chain rule, deterministic and stochastic processes, the drift and volatility components of stock prices, and how stocks follow a Wiener process.
]]></description><link>https://blog.quantinsti.com/itos-lemma-trading-concepts-guide/</link><guid isPermaLink="false">673fe7cdda24e139f544d53a</guid><category><![CDATA[Mathematics and Econometrics]]></category><dc:creator><![CDATA[Mahavir A. Bhattacharya]]></dc:creator><pubDate>Fri, 06 Dec 2024 11:15:14 GMT</pubDate><content:encoded><![CDATA[<p>By <a href="https://www.linkedin.com/in/mahavir-bhattacharya-1303b011b/">Mahavir A. Bhattacharya</a><br><br>This is a two-part blog where we’ll explore how Ito’s Lemma extends traditional calculus to model the randomness in financial markets. Using real-world examples and Python code, we’ll break down concepts like drift, volatility, and geometric Brownian motion, showing how they help us understand and model financial data, and we’ll also have a sneak peek into how to use the same for trading in the markets.</p><p>In the first part, we’ll see how classical calculus cannot be used for modeling stock prices, and in the second part, we’ll have an intuition of Ito’s lemma and see how it can be used in the financial markets.</p><p>If you are already conversant with the chain rule in calculus, the concepts of deterministic and stochastic processes, drift and volatility components in asset prices, and Wiener processes, you can skip this blog and directly read this one: <a href="https://blog.quantinsti.com/itos-lemma-applied-stock-trading/">Ito's Lemma Applied to Stock Trading</a></p><p>It has an involved discussion on Ito’s lemma, and how it is harnessed for trading in the financial markets.</p><hr><p>This blog covers:</p><!--kg-card-begin: html--><div>
    <ul style="font-size: 20px;">
        <li><a href="#pre-requisites">Pre-requisites</a></li>
        <li><a href="#etymology-of-sorts">Etymology of Sorts</a></li>
        <li><a href="#the-chain-rule">The Chain Rule</a></li>
        <li><a href="#deterministic-and-stochastic-processes">Deterministic and Stochastic Processes</a></li>
        <li><a href="#drift-and-volatility-components-on-python">Drift and Volatility Components on Python</a></li>
        <li><a href="#weiner-weiner-stochastic-dinner">Weiner Weiner Stochastic Dinner</a></li>
    </ul>
</div><!--kg-card-end: html--><hr><h3 id="pre-requisites">Pre-requisites</h3><p>You will be able to follow the article smoothly if you have elementary-level proficiency in:</p><ul><li>Calculus</li><li>Python coding</li></ul><hr><h3 id="etymology-of-sorts">Etymology of Sorts</h3><p>You would have learned theorems in high school math. Simply put, a lemma is like a milestone in attempting to prove a theorem. So what is Ito’s lemma? Kiyoshi Ito came up with his own way of doing calculus (as if the existing ones weren’t hard to learn already 😝). Why did he do that? Were there any problems with the existing methods? Let’s understand this with an example.</p><hr><h3 id="the-chain-rule">The Chain Rule</h3><p>Suppose we have the following function:</p><p>$$ y = \sin(3x) $$</p><p>This function can also be written as:</p><p>$$y = \sin(z), \quad \text{where} \quad z = 3x$$</p><p>Here, y is a function of z, which itself is a function of x. Such functions are known as composite functions.</p><p>This means that whatever value x takes, z would take thrice its value, and whatever value z takes, y would take its corresponding sine value.</p><p>Suppose x doubles; what would happen to z? It would also double. And when x halves, z would also halve. Thus, z would always bear the same ratio with x, i.e., 3. The ratio between the change in z and the change in x would also be 3. We refer to this as the derivative of z with respect to x, also denoted by: dz/dx.</p><p>From elementary calculus, you would know that dz/dx = 3.</p><p>Similarly, dy/dz = cos(z); that is, the slope of the sinusoidal curve sin(z) at every point on the curve would be cos(z).</p><p>What about dy/dx?</p><p>We can solve this using the chain rule, shown below:</p><p>$$ \frac{dy}{dx} = \frac{dy}{dz} \cdot \frac{dz}{dx} $$					--------------- 1</p><p>Substituting the above values for dy/dz and dz/dx,</p><p>$$ \frac{dy}{dx} = \cos(z) \cdot 3 = 3 \cos(3x) $$</p><p>Straightforward, isn’t it?</p><p>Sure, but only when we deal with ‘functions’. The problem is, when it comes to finance, we deal with processes. What kind of processes? Well, we can have deterministic processes and stochastic processes.</p><hr><h3 id="deterministic-and-stochastic-processes">Deterministic and Stochastic Processes</h3><p>A deterministic process is one whose realized path and value after given intervals of time are known beforehand with certainty. Examples would be the returns on a fixed deposit or the payouts of an annuity.</p><p>What about a stochastic process then? Can you think of something whose value can never be predicted with certainty, even for the next second? The path traversed by a stock! Can you imagine a world where the stock prices follow a deterministic path? No, right? But hey, we’ll discuss this too in a while now!</p><p>Coming back, in financial literature, stock prices are assumed to follow a Geometric Brownian motion. What’s that? Keep reading!</p><p>Suppose you ignite an incense stick. What variables contribute to the path that a single particle of fumes from the stick would follow? 
<hr><h3 id="deterministic-and-stochastic-processes">Deterministic and Stochastic Processes</h3><p>A deterministic process is one whose realized path, and whose value after any given interval of time, is known beforehand with certainty. Examples would be the returns on a fixed deposit or the payouts of an annuity.</p><p>What about a stochastic process then? Can you think of something whose value can never be predicted with certainty, even for the next second? The path traversed by a stock! Can you imagine a world where stock prices follow a deterministic path? No, right? But hey, we’ll discuss this too in a while!</p><p>Coming back, in financial literature, stock prices are assumed to follow a geometric Brownian motion. What’s that? Keep reading!</p><p>Suppose you ignite an incense stick. What variables contribute to the path that a single particle of fumes from the stick would follow? The wind speed in the surroundings, the direction of the wind, the density of the surrounding air, the absolute and relative proportion of other particles already present in the air, the size of the particles of the incense stick, the gap between each particle, the molecular orientation of the particles, their inflammability, and so on.</p><p>Even if you can create an elegant model that factors in the effect of all these variables, would you be able to predict with certainty the exact path that a single fume particle would traverse? No! The same is the case with asset prices. Suppose you know the fundamentals of the underlying, the values of all technical indicators, the drift (we’ll come to this in a while), the volatility, the risk-free rate, macroeconomic metrics, market sentiment, and everything else. Can you predict the exact path the price will take tomorrow?</p><p>If yes, well, you don’t need to read any further. Keep your secrets and make a ton of money 😁. Realistically, we cannot predict it with certainty. Stock returns follow a path similar to the incense stick fumes. We call it “Brownian motion” or a “Wiener process”.</p><p>How do we characterise such a process?</p><p>Firstly, the value of the random variable at time t = 0 is 0.</p><p>Secondly, the value of the random variable at one time instant would be independent of its value at any previous time instant.</p><p>Thirdly, the random variable would have a normal distribution.</p><p>Finally, the random variable would follow a continuous path, not a discrete one.</p><p>Now, stock prices are not 0 at time t = 0 (when they get listed). Stock prices are also known to have autocorrelations; i.e., the price at any given instant depends on one or more of the prices at previous instants. Stock prices also don’t follow a normal distribution. How, then, can they follow a Brownian motion?</p><p>There’s a minor tweak that we need to make here. We shall use the daily returns of the adjusted close prices as a proxy for the increments in the stock prices. And since the price returns follow a Brownian motion, the prices themselves follow what is known as a geometric Brownian motion (GBM).</p><p>Let’s explore the GBM further using math notation. Suppose we have a stochastic process S. We say that it follows a GBM if it can be written in the following form:</p><p>$$ dS_t = \mu S_t \, dt + \sigma S_t \, dW_t \tag{2} $$</p><p>Let’s treat S as the stock price here.</p><!--kg-card-begin: html--><p><em>dS<sub>t</sub></em> simply refers to the change in the stock price over time <em>t</em>. Suppose the current price is $200, and it becomes $203 the next day. In this case, <em>dS<sub>t</sub></em> = $3, and <em>t</em> = 1 day.</p>
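<p>Before we unpack each term of equation 2, here is a minimal sketch (our own illustration, with assumed values for the drift, volatility, and horizon) of what the equation looks like when simulated step by step:</p><pre>
# Simulating one discretised GBM path: dS = mu*S*dt + sigma*S*dW
# mu, sigma, S0 and the horizon below are illustrative assumptions
import numpy as np

np.random.seed(42)
mu, sigma = 0.10, 0.20       # annualised drift and volatility (assumed)
S0, T, n = 100.0, 1.0, 252   # initial price, one year, daily steps
dt = T / n

prices = [S0]
for _ in range(n):
    dW = np.random.normal(0.0, np.sqrt(dt))  # Wiener increment ~ N(0, dt)
    dS = mu * prices[-1] * dt + sigma * prices[-1] * dW
    prices.append(prices[-1] + dS)

print(prices[-1])  # one simulated terminal price
</pre>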
<!--kg-card-end: html--><p>The Greek letter μ (written as mu, and pronounced ‘mew’) represents the drift. Let’s take the Microsoft stock to understand this.</p><hr><h3 id="drift-and-volatility-components-on-python">Drift and Volatility Components in Python</h3><p><strong>Note:</strong> The graphs and values obtained are as of October 18, 2024.</p><!--kg-card-begin: html--><script src="https://gist.github.com/quantra-go-algo/178eda47686896117b627ca754c99502.js"></script><!--kg-card-end: html--><figure class="kg-card kg-image-card kg-width-full kg-card-hascaption"><img src="https://d1rwhvwstyk9gu.cloudfront.net/2024/11/1-year-close-prices.png" class="kg-image" alt="Figure 1: Adjusted Close Price"><figcaption><em>Figure 1: Adjusted Close Price</em></figcaption></figure><p></p><!--kg-card-begin: html--><script src="https://gist.github.com/quantra-go-algo/bd9f49662150a47f788d7119a9ca87a1.js"></script><!--kg-card-end: html--><figure class="kg-card kg-image-card kg-width-full kg-card-hascaption"><img src="https://d1rwhvwstyk9gu.cloudfront.net/2024/11/Drift-component-of-close-price.png" class="kg-image" alt="Figure 2: Drift Component of the Close Price"><figcaption><em>Figure 2: Drift Component of the Close Price</em></figcaption></figure><p></p><!--kg-card-begin: html--><script src="https://gist.github.com/quantra-go-algo/4c211f159695905d4a2a836a93002887.js"></script><!--kg-card-end: html--><figure class="kg-card kg-image-card kg-width-full kg-card-hascaption"><img src="https://d1rwhvwstyk9gu.cloudfront.net/2024/11/Volatility-Component-of-the-Close-Price-.png" class="kg-image" alt="Figure 3: Volatility Component of the Close Price"><figcaption><em>Figure 3: Volatility Component of the Close Price</em></figcaption></figure><p></p><!--kg-card-begin: html--><script src="https://gist.github.com/quantra-go-algo/4c211f159695905d4a2a836a93002887.js"></script><!--kg-card-end: html--><figure class="kg-card kg-image-card kg-width-full kg-card-hascaption"><img src="https://d1rwhvwstyk9gu.cloudfront.net/2024/11/Close-Prices-and-Drift-Component-.png" class="kg-image" alt="Figure 4: Close Prices and Drift Component"><figcaption><em>Figure 4: Close Prices and Drift Component</em></figcaption></figure><p></p><p>This last plot (Figure 4) is the crux of everything we did in Python. What’s the blue line denoting? It’s the path taken by Microsoft stock's adjusted close prices over the past ten years. And what’s the orange line for? Well, it’s just a simple straight line that connects the first day's adjusted closing price and the most recent adjusted closing price.</p><p>I’m trying to show here that irrespective of which of the two paths the stock would have taken, it would have reached the same destination today. We can see from the blue line that the stock price has increased over the past ten years. That explains the positive slope of the orange line. This is known as the “drift”. We have essentially broken down the path of the adjusted close price into two components: the drift and the volatility. When we add these two, we get the adjusted close prices.</p>
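<p>If you’d like to reproduce the decomposition yourself, here is a rough, self-contained reconstruction (our own sketch; the article’s exact code lives in the embedded gists, and the yfinance data source is an assumption):</p><!--kg-card-begin: html--><pre>
# Decomposing the adjusted close price into a drift line and a volatility residual
import numpy as np
import pandas as pd
import yfinance as yf

close = yf.download('MSFT', start='2014-10-18', end='2024-10-18',
                    auto_adjust=False)['Adj Close'].squeeze().dropna()

# Drift: the straight line joining the first and the most recent adjusted close
drift = pd.Series(np.linspace(close.iloc[0], close.iloc[-1], len(close)),
                  index=close.index)

# Volatility: whatever is left after removing the drift
volatility = close - drift

# Adding the two components recovers the original price path
assert np.allclose(drift + volatility, close)
</pre><!--kg-card-end: html-->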
<p>The following plot (Figure 5) illustrates this by plotting all three together:</p><!--kg-card-begin: html--><script src="https://gist.github.com/quantra-go-algo/650794dde3d4e45b5447c75408f35d40.js"></script><!--kg-card-end: html--><figure class="kg-card kg-image-card kg-width-full kg-card-hascaption"><img src="https://d1rwhvwstyk9gu.cloudfront.net/2024/11/Close--Drift-Component--and-Volatility-Component-.png" class="kg-image" alt="Figure 5: Close, Drift Component, and Volatility Component"><figcaption><em>Figure 5: Close, Drift Component, and Volatility Component</em></figcaption></figure><p></p><p><strong>Stock Price = Drift Component + Volatility Component</strong></p><p>If you need more intuition on the drift and volatility components, imagine driving from city A to city B. As much as you would like to take the imaginary straight path that connects both cities, you can’t, since there will be buildings, trees, mountains, etc. You would need to take detours and turns to reach your destination.</p><p>Remember I asked you to imagine a world where stock prices follow a deterministic path? That’s what the drift component is, after all! Can you imagine trading in a world where stock prices follow only the drift component and don’t have any volatility component?</p><p>We have taken a long detour from our main discussion (yup, we have drifted away from our drift)! Coming back to the GBM, we understood what μ is. σ is another Greek letter (called and pronounced ‘sigma’) and denotes the volatility.</p><p>In equation 2, the first term is the deterministic component, and the second term is the stochastic (random, or noise) component. Also, μ is the percentage drift, and σ is the percentage volatility.</p><p>The equation essentially tells us that the change in the stock price at time t is an additive combination of the change due to the drift component and the change due to the volatility component.</p><p>The drift component here is the product of the drift μ, the stock price at time t, and the unit change in time dt. Let’s consider dt to be one day, as mentioned earlier, for the sake of simplicity. If the stock price S is treated as a continuous random variable, ideally, we should measure dt in milli-, micro-, nano-, or even picoseconds.</p><hr><h3 id="weiner-weiner-stochastic-dinner">Weiner Weiner Stochastic Dinner</h3><p>The volatility component is more nuanced. We know what \( \sigma \) and \( S_t \) denote in the equation. What we don’t know yet is: $$ W_t $$</p><p>Or do we?</p><p>Remember Brownian motion (the fumes of the incense stick)? That’s what \( W_t \) denotes here. The letter W is used since this motion is called a Wiener process. I’ll (hopefully) discuss Wiener processes in depth in a subsequent blog. But for now, just know that for a Wiener process, the increments follow a normal distribution with mean = 0 and variance = t.</p><p>This means if the value of \( W_t \) changes from \( W_1 \) to \( W_2 \), \( W_2 \) to \( W_3 \), and so on, the changes \( W_2 \) – \( W_1 \), \( W_3 \) – \( W_2 \), and so on follow a normal distribution. The mean or expected value of this distribution is 0. This means that if we have many samples of such changes, the average of these changes would be 0 (or very close to it). What about the variance?
The variance is equal to the time duration; hence, the standard deviation would be the square root of this time duration.</p><!--kg-card-begin: html--><p>When we say \( W_t \) follows a normal distribution with mean = 0 and variance = t, multiplying it by \( \sigma \) lets us conclude that the volatility component follows a normal distribution with mean = 0 and variance = \( \sigma^2 t \).</p>
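<p>A quick numerical sanity check of these two statements (our own illustration, with arbitrary values for t and σ):</p><pre>
# Increments of a Wiener process over a duration t have mean 0 and
# variance t, so multiplying by sigma gives variance sigma**2 * t
import numpy as np

np.random.seed(0)
t, sigma = 4.0, 0.3
increments = np.random.normal(0.0, np.sqrt(t), size=1_000_000)

print(increments.mean())           # close to 0
print(increments.var())            # close to t = 4
print((sigma * increments).var())  # close to sigma**2 * t = 0.36
</pre>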
<!--kg-card-end: html--><p>Wanna see what a Wiener process looks like?</p><p>Here you go…</p><!--kg-card-begin: html--><script src="https://gist.github.com/quantra-go-algo/fbfa498cb7c7ef9f640cb1cb35bed3a4.js"></script><!--kg-card-end: html--><figure class="kg-card kg-image-card kg-width-full kg-card-hascaption"><img src="https://d1rwhvwstyk9gu.cloudfront.net/2024/11/Weiner-Process-Plot.png" class="kg-image" alt="Figure 6: Wiener Process Plot"><figcaption><em>Figure 6: Wiener Process Plot</em></figcaption></figure><p>We simulated 15 paths that the Wiener process could have taken over 10 days. At what frequency are the values getting updated? Every second. The shaded region is the expected standard deviation of the returns. This is how the fumes from an incense stick would look if you tilted it sideways!</p><hr><h3 id="conclusion">Conclusion</h3><p>With this, we come to the end of part I. We learned about the chain rule in classical calculus, Brownian motion, geometric Brownian motion, and how stock prices follow a geometric Brownian motion. We also developed a visual intuition for Wiener processes (Brownian motion).</p><p>In part II, we’ll cover Ito calculus and show how to use it for developing a trading strategy. Here’s the link to the second part: <a href="https://blog.quantinsti.com/itos-lemma-applied-stock-trading/">Ito's Lemma Applied to Stock Trading</a>.</p><p>You can explore the following free <a href="https://quantra.quantinsti.com/courses">Quantitative Finance Course</a> offerings on Quantra to gain insights into Python for trading, data procurement, and stock market basics:</p><ol><li><a href="https://quantra.quantinsti.com/course/python-trading-basic">Python for Trading</a></li><li><a href="https://quantra.quantinsti.com/course/getting-market-data">Get Market Data</a></li><li><a href="https://quantra.quantinsti.com/course/stock-market-basics">Stock Market Beginner Course</a></li></ol><p>If you need a small primer on the math required for trading in the financial markets, you can go through this blog article: <a href="https://blog.quantinsti.com/algorithmic-trading-maths/">https://blog.quantinsti.com/algorithmic-trading-maths/</a></p><p>If you want to get started with algorithmic trading and need knowledge on how to do so, you can learn from here:</p><p><a href="https://quantra.quantinsti.com/course/getting-started-with-algorithmic-trading">Algorithmic Trading for Beginners</a></p><p>And, if you want to learn in detail the basic and advanced statistics used in algo trading, data modeling, strategy building, backtesting using Python, how to set up your <a href="https://www.quantinsti.com/articles/proprietary-trading-desk/">proprietary trading desk</a> and much more, you can check out the EPAT:</p><p><a href="https://www.quantinsti.com/epat">Algorithmic Trading Course</a></p><hr><h3 id="references-">References:</h3><p><strong>Main Reference:</strong></p><p><a href="https://research.tilburguniversity.edu/files/51558907/INTRODUCTION_TO_FINANCIAL_DERIVATIVES.pdf">https://research.tilburguniversity.edu/files/51558907/INTRODUCTION_TO_FINANCIAL_DERIVATIVES.pdf</a></p><p><strong>Auxiliary References:</strong></p><ol><li>Wikipedia pages of Ito’s lemma, Brownian motion, geometric Brownian motion, quadratic variation, and the AM-GM inequality</li><li>EPAT lectures on statistics and options trading</li></ol><hr><!--kg-card-begin: html--><p><em><small>All investments and trading in the stock market involve risk. 
Any decision to place trades in the financial markets, including trading in stock or options or other financial instruments, is a personal decision that should only be made after thorough research, including a personal risk and financial assessment and the engagement of professional assistance to the extent you believe necessary. The trading strategies or related information mentioned in this article are for informational purposes only.</small></em></p><!--kg-card-end: html-->]]></content:encoded></item><item><title><![CDATA[The Risk-Constrained Kelly Criterion: From the foundations to trading]]></title><description><![CDATA[Learn about the risk-constrained Kelly Criterion to make your trading have less drawdown and better strategy performance!]]></description><link>https://blog.quantinsti.com/risk-constrained-kelly-criterion/</link><guid isPermaLink="false">67335295c293147e1029091e</guid><category><![CDATA[Mathematics and Econometrics]]></category><dc:creator><![CDATA[Jose Carlos Gonzales Tanaka]]></dc:creator><pubDate>Mon, 25 Nov 2024 04:42:35 GMT</pubDate><content:encoded><![CDATA[<p>By <a href="https://www.linkedin.com/in/jos%C3%A9-carlos-gonz%C3%A1les-tanaka-60859284/">José Carlos Gonzáles Tanaka</a></p><p>The Kelly Criterion is good enough for long-term trading where the investor is risk-neutral and can handle big drawdowns. However, we cannot accept long-lasting and big drawdowns in real trading. To overcome the big drawdowns caused by the Kelly Criterion, Busseti et al. (2016) offered a risk-constrained Kelly Criterion that maximizes the long-term log-growth rate while treating the drawdown as a constraint. This constraint allows us to have a smoother equity curve. You will learn everything about this new type of Kelly Criterion here and apply a trading strategy to it. You can find the risk-constrained Kelly criterion code on <a href="https://github.com/quantra-go-algo/Algorithmic-Trading-Code-Examples/tree/main/blog_articles/risk-constrained-kelly-criterion">GitHub</a> as well.</p><p>This blog covers:</p><ul><li><a href="#the-kelly-criterion">The Kelly criterion</a></li><li><a href="#the-risk-constrained-kelly-criterion">The risk-constrained Kelly criterion</a></li><li><a href="#a-trading-strategy-based-on-the-risk-constrained-kelly-criterion">A trading strategy based on the risk-constrained Kelly Criterion</a></li></ul><hr><h2 id="the-kelly-criterion">The Kelly criterion</h2><p>The Kelly Criterion is a well-known formula for allocating resources into a portfolio.</p><p>You can learn more about it by using many resources on the Internet. For example, you can find a quick <a href="https://quantra.quantinsti.com/glossary/Kelly-criterion">definition of Kelly Criterion</a>, <a href="https://blog.quantinsti.com/position-sizing/">a blog with an example of position sizing</a>, and even a <a href="https://blog.quantinsti.com/risk-management-webinar-25-october-2018/">webinar on Risk Management</a>.</p><p>We won’t go deep into the explanation since the above links already do that. Here, we provide the formula and some basic explanation for using it.</p><!--kg-card-begin: html-->$$K\% = W - \frac{1 - W}{R}$$<!--kg-card-end: html--><p>where,</p><ul><li>K% = The Kelly percentage</li><li>W = Winning probability</li><li>R = Win/loss ratio</li></ul><p>Let’s understand how to apply it.</p><p>Suppose we have your strategy returns for the past 100 days. We get the hit ratio of those strategy returns and set it as “W”. Then we get the absolute value of the mean positive return divided by the mean negative return and set it as “R”. The resulting K% will be the fraction of your capital for your next trade.</p>
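<p>As a quick, self-contained sketch of that computation (our own illustration with made-up numbers; the article’s actual implementation is in the GitHub notebook):</p><!--kg-card-begin: html--><pre>
# Basic Kelly fraction: K% = W - (1 - W) / R
import numpy as np

def kelly_fraction(returns):
    returns = np.asarray(returns)
    wins = returns[returns > 0]
    losses = returns[~(returns >= 0)]          # the negative returns
    W = len(wins) / len(returns)               # hit ratio
    R = abs(wins.mean() / losses.mean())       # win/loss ratio
    return W - (1 - W) / R

# Example: 100 hypothetical daily strategy returns
rng = np.random.default_rng(7)
sample_returns = rng.normal(0.001, 0.01, 100)
print(kelly_fraction(sample_returns))          # fraction of capital for the next trade
</pre><!--kg-card-end: html-->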
<p>The Kelly Criterion ensures the maximum long-term return for your trading strategy. This is from a theoretical perspective, though. In practice, if you applied the criterion to your trading strategy, you would face many long-lasting, big drawdowns.</p><p>To solve this problem, Busseti et al. (2016) provided the “risk-constrained Kelly Criterion”, which allows us to have a smoother equity curve with less frequent and smaller drawdowns.</p><hr><h2 id="the-risk-constrained-kelly-criterion">The risk-constrained Kelly criterion</h2><p>The Kelly criterion relates to an optimization problem. For the risk-constrained version, we add, as the name says, a constraint. The basic principle of the constraint can be formulated as:</p><!--kg-card-begin: html-->$$\text{Prob}(\text{Minimum wealth} < \alpha) < \beta$$<!--kg-card-end: html--><p>The drawdown risk is defined as Prob(Minimum wealth &lt; alpha), where alpha ∈ (0, 1) is a given target (undesired) minimum wealth. This risk depends on the bet vector b in a very complicated way. The constraint limits the probability that wealth ever drops below the value alpha to be no more than beta.</p><p>The authors highlight the important issue that the optimization problem with this constraint is highly complex to solve. Consequently, to make it easier, Busseti et al. (2016) provided a simpler optimization problem for the case where we have only 2 outcomes (win and loss), which is the following:<br></p><!--kg-card-begin: html-->$$\text{maximize } \pi \log(b_1 P + (1 - b_1)) + (1 - \pi)\log(1 - b_1),\\
\text{ subject to } 0 \leq b_1 \leq 1,\\
\pi(b_1 P + (1 - b_1))^{-\frac{\log \beta}{\log \alpha}} + (1 - \pi)(1 - b_1)^{-\frac{\log \beta}{\log \alpha}} \leq 1.$$<!--kg-card-end: html--><p>Where:</p><p>Pi: The winning probability.</p><p>P: The payoff of the win case.</p><p>b1: The Kelly fraction to be found. b1 = K%. It is the control variable of the maximization problem.</p><p>Lambda: The risk aversion of the trader: lambda = log(beta)/log(alpha).</p><p>Please take into account that the win/loss ratio R defined in the basic criterion is:</p><p>R = P - 1, where P is the payoff of the win case described for the risk-constrained Kelly criterion.</p><p>You might ask now: I don’t know how to solve that optimization problem! Oh no!</p><p>I can surely help with that! The authors have proposed a solution. See below!</p><p>The solution algorithm for the risk-constrained Kelly criterion goes like this:</p><p>If b1 = (pi*P-1)/(P-1) satisfies the risk constraint, then that is the solution. Otherwise, we find b1 by finding the value for which</p><!--kg-card-begin: html-->$$\pi(b_1 P + (1 - b_1))^{-\lambda} + (1 - \pi)(1 - b_1)^{-\lambda} = 1.$$<!--kg-card-end: html--><p>As explained by the authors, the solution can be found with a bisection algorithm.</p>
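<p>To make the bisection step concrete, here is a minimal sketch (our own illustration with arbitrary inputs; the article’s customized bisection method appears in the gists below):</p><!--kg-card-begin: html--><pre>
# Bisection search for the risk-constrained Kelly fraction b1
import numpy as np

def risk_constrained_kelly(pi, P, alpha, beta, tol=1e-8):
    lam = np.log(beta) / np.log(alpha)     # risk aversion lambda
    b_kelly = (pi * P - 1) / (P - 1)       # unconstrained Kelly fraction

    def g(b):
        # risk constraint rearranged so that g(b) > 0 means "violated"
        return pi * (b * P + 1 - b) ** (-lam) + (1 - pi) * (1 - b) ** (-lam) - 1

    if g(b_kelly) > 0:                     # Kelly fraction violates the constraint
        lo, hi = 0.0, b_kelly              # the constrained solution lies below it
        while hi - lo > tol:
            mid = (lo + hi) / 2
            if g(mid) > 0:
                hi = mid
            else:
                lo = mid
        return lo
    return b_kelly                         # constraint already satisfied

# Example with made-up numbers: 55% win probability, payoff 2, and at most
# a 10% chance of wealth ever dropping below 70% of the initial capital
print(risk_constrained_kelly(pi=0.55, P=2.0, alpha=0.7, beta=0.1))
</pre><!--kg-card-end: html-->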
<hr><h2 id="a-trading-strategy-based-on-the-risk-constrained-kelly-criterion">A trading strategy based on the risk-constrained Kelly Criterion</h2><p>Let’s inspect a trading strategy based on the risk-constrained Kelly criterion!</p><p>Let’s import the libraries.</p><!--kg-card-begin: html--><script src="https://gist.github.com/quantra-go-algo/10b478e05cb03a523babf627e3c46327.js"></script><!--kg-card-end: html--><p>Let’s define our customized bisection method for later use:</p><!--kg-card-begin: html--><script src="https://gist.github.com/quantra-go-algo/cd24f69a01dcf3fac590cb82d98de1e3.js"></script><!--kg-card-end: html--><p>Let’s define the 2 functions to be used to compute the risk-constrained Kelly criterion bet size:</p><!--kg-card-begin: html--><script src="https://gist.github.com/quantra-go-algo/619e3b79de5f3ff3cbd3d1c228694053.js"></script><!--kg-card-end: html--><p>Let’s import the MSFT stock data from 1990 to October 2024 and compute the buy-and-hold returns.</p><!--kg-card-begin: html--><script src="https://gist.github.com/quantra-go-algo/dc49e50a429260d0e40c9dc19263a92c.js"></script><!--kg-card-end: html--><p>Let’s get all the available technical indicators in the “ta” library:</p><!--kg-card-begin: html--><script src="https://gist.github.com/quantra-go-algo/aced8538f5ba45993fd7e8c89af15645.js"></script><!--kg-card-end: html--><p>Let’s create the prediction feature and some relevant columns.</p><!--kg-card-begin: html--><script src="https://gist.github.com/quantra-go-algo/636cf80844649f371b78e90a22700dbf.js"></script><!--kg-card-end: html--><p>Let’s define the seed and some other relevant variables.</p><!--kg-card-begin: html--><script src="https://gist.github.com/quantra-go-algo/0643a02c61af25ba7f063d71778a0d30.js"></script><!--kg-card-end: html--><p>We will use a for loop to iterate through each date.</p><p>The algorithm goes like this, for each day:</p><ol><li>Sub-sample the data, using one year of data with the last 60 days as the test span</li><li>Split the data into X and y and their respective train and test sections</li><li>Fit a support vector machine model</li><li>Predict the signal</li><li>Obtain the strategy returns</li><li>Get the positive mean return as pos_avg</li><li>Get the negative mean return as neg_avg</li><li>Get the number of positive returns as pos_ret_num</li><li>Get the number of negative returns as neg_ret_num</li><li>Set some conditions to get the position size for the day</li><li>Get the basic-Kelly and risk-constrained-Kelly fractions</li><li>Split the data once again into train and test data</li><li>Estimate the model once again, and</li><li>Predict the next-day signal</li></ol><!--kg-card-begin: html--><script src="https://gist.github.com/quantra-go-algo/517422ff3f8034e06359b3e2a72da2fa.js"></script><!--kg-card-end: html--><p>Let’s compute the strategy returns. We compute 2 strategies: the basic Kelly strategy and the risk-constrained Kelly strategy. Apart from that, I’ve incorporated an “improved” version, which takes the same signals as the previous 2 strategies, but only when the buy-and-hold cumulative returns are higher than their 30-day moving average.</p><!--kg-card-begin: html--><script src="https://gist.github.com/quantra-go-algo/6f2a1c25470f9fe0631c907ed67a6936.js"></script><!--kg-card-end: html--><p>Let’s now see the graphs, starting with the basic Kelly position sizes.</p><!--kg-card-begin: html--><script src="https://gist.github.com/quantra-go-algo/416e5631db4fc482e0ab0efc4c8d0b8e.js"></script><!--kg-card-end: html--><p>Output:</p><figure class="kg-card kg-image-card kg-width-full"><img src="https://d1rwhvwstyk9gu.cloudfront.net/2024/11/Kelly-position-sizes.png" class="kg-image" alt="Kelly position sizes"></figure><p>It has high volatility. It ranges from 0 to 0.6.</p><p>Let’s see the risk-constrained Kelly fractions.</p><!--kg-card-begin: html--><script src="https://gist.github.com/quantra-go-algo/d07eb50604d8e957335e86d702764427.js"></script><!--kg-card-end: html--><p>Output:</p><figure class="kg-card kg-image-card kg-width-full"><img src="https://d1rwhvwstyk9gu.cloudfront.net/2024/11/Risk-constraint-kelly-sizes.png" class="kg-image" alt="Risk-constrained Kelly position sizes"></figure><p>It now ranges from 0 to 0.25. It has a lower range of volatility.</p><p>Let’s see the strategy returns from both.</p><!--kg-card-begin: html--><script src="https://gist.github.com/quantra-go-algo/9488a815f109eb5a75cd340febb27c94.js"></script><!--kg-card-end: html--><p>Output:</p><figure class="kg-card kg-image-card kg-width-full"><img src="https://d1rwhvwstyk9gu.cloudfront.net/2024/11/Buy-and-hold-kelly.png" class="kg-image" alt="Buy and hold kelly based strategies"></figure><p>The basic Kelly strategy has a higher drawdown, as informally checked. The main drawback of the risk-constrained Kelly strategy is the lower equity curve.</p><p>Let’s see the improved strategy returns.</p><!--kg-card-begin: html--><script src="https://gist.github.com/quantra-go-algo/9488a815f109eb5a75cd340febb27c94.js"></script><!--kg-card-end: html--><p>Output:</p><figure class="kg-card kg-image-card kg-width-full"><img src="https://d1rwhvwstyk9gu.cloudfront.net/2024/11/Buy-and-hold-improved-kelly.png" class="kg-image" alt="Buy and hold improved kelly and risk-constrained"></figure><p>It’s interesting to see that the basic Kelly strategy manages to reduce its drawdown, and the same holds for the risk-constrained strategy. The risk-constrained strategy still has a lower equity curve.</p><p>Some comments:</p><ul><li>Once you have a good Sharpe ratio, you can increase the leverage. So, don’t get disappointed by the low equity curve of the risk-constrained Kelly strategy. 
I leave it as an exercise to check that.</li><li>You can increase the equity returns with stop-loss and take-profit targets.</li><li>You can combine the risk-constrained Kelly criterion with meta-labelling.</li><li>The risk-constrained Kelly criterion’s limitation is the low equity curve. You can imagine solutions to improve the results!</li><li>You can use the pyfolio-reloaded library to produce the trading summary statistics and analytics to check formally the lower drawdown and volatility of the risk-constrained Kelly strategy.</li></ul><hr><h2 id="conclusion">Conclusion</h2><p>As you can see, you can implement the risk-constrained Kelly Criterion to get a smoother equity curve. The main issue might be that it gets you a lower cumulative return, but it can help you find days you don’t need to trade, saving you drawdowns!</p><p>If you want to learn more about position sizing, don’t forget to take our course on <a href="https://quantra.quantinsti.com/course/position-sizing-trading">position sizing</a>!</p><hr><h3 id="references">References</h3><p>Busseti, E., Ryu, E. K., Boyd, S. (2016), “Risk-Constrained Kelly Gambling”, Working paper. <a href="https://web.stanford.edu/~boyd/papers/pdf/kelly.pdf">https://web.stanford.edu/~boyd/papers/pdf/kelly.pdf</a></p><hr><p><strong>File in the download</strong></p><ul><li>The Kelly Criterion - Python notebook</li></ul><!--kg-card-begin: html--><p style="text-align: center;"><a href="https://blog.quantinsti.com/risk-constrained-kelly-criterion" class="download-button button"> Visit blog to download </a></p><!--kg-card-end: html--><hr><!--kg-card-begin: html--><p><em><small>Disclaimer: All data and information provided in this article are for informational purposes only. QuantInsti<sup>®</sup> makes no representations as to accuracy, completeness, currentness, suitability, or validity of any information in this article and will not be liable for any errors, omissions, or delays in this information or any losses, injuries, or damages arising from its display or use. All information is provided on an as-is basis.</small></em></p><!--kg-card-end: html-->]]></content:encoded></item><item><title><![CDATA[A time-varying-parameter vector autoregression model with stochastic volatility]]></title><description><![CDATA[Learn about the TVP-VAR model that is being heavily used in macroeconomics. It can give you better results than a basic VAR for trading. Enjoy it!]]></description><link>https://blog.quantinsti.com/tvp-var-stochastic-volatility/</link><guid isPermaLink="false">6721fbb0c293147e10290786</guid><category><![CDATA[Mathematics and Econometrics]]></category><dc:creator><![CDATA[Jose Carlos Gonzales Tanaka]]></dc:creator><pubDate>Thu, 07 Nov 2024 14:44:45 GMT</pubDate><content:encoded><![CDATA[<p><br><strong>By</strong> <a href="https://www.linkedin.com/in/jos%C3%A9-carlos-gonz%C3%A1les-tanaka-60859284/">José Carlos Gonzáles Tanaka</a></p><p>The basic Vector Autoregression (VAR) model is heavily used in macro-econometrics for explanatory purposes and in trading for forecasting. In recent years, a VAR model with time-varying parameters has been used to understand the interrelationships between macroeconomic variables. 
Since Primiceri (2005), econometricians have been applying these models to macroeconomic variables such as:</p><ul><li>Japanese time series (Nakajima, 2011)</li><li>US bond yields (Fischer et al., 2023)</li><li>Monthly stock indices from industrialized countries (Gupta et al., 2020)</li><li>The Peruvian exchange rate (Rodriguez et al., 2024)</li><li>The Indian exchange rate (Kumar, M., 2010)</li></ul><p>This article extends the model usage to something our audience greatly cares about: trading! You’ll learn the basics of the estimation procedure and how to create a trading strategy based on the model. You can find the TVP-VAR-SV model code on <a href="https://github.com/quantra-go-algo/Algorithmic-Trading-Code-Examples/tree/main/blog_articles/A-time-varying-parameter-vector-autoregression-model-with-stochastic-volatility">GitHub</a> as well.</p><p>Are you excited? I was when I started writing this article. Let me share what I’ve learned with you!</p><p>This blog covers:</p><ul><li><a href="#what-is-the-difference-between-a-basic-var-and-a-tvp-var-sv-model">What is the difference between a basic VAR and a TVP-VAR-SV model?</a></li><li><a href="#the-tvp-var-sv-model-variables">The TVP-VAR-SV model variables</a></li><li><a href="#the-priors">The priors</a></li><li><a href="#the-mixture-of-indicators">The mixture of indicators</a></li><li><a href="#the-tvp-var-sv-model-estimation-algorithm">The TVP-VAR-SV model estimation algorithm</a></li><li><a href="#a-tvp-var-sv-estimation-in-r">A TVP-VAR-SV estimation in R</a></li><li><a href="#a-trading-strategy-using-the-tvp-var-sv-model-in-r">A trading strategy using the TVP-VAR-SV model in R</a></li><li><a href="#notes-about-the-tvp-var-sv-strategy">Notes about the TVP-VAR-SV strategy</a></li></ul><hr><h2 id="what-is-the-difference-between-a-basic-var-and-a-tvp-var-sv-model">What is the difference between a basic VAR and a TVP-VAR-SV model?</h2><p>All the explanations of the basic VAR can be found in our previous article on <a href="https://blog.quantinsti.com/vector-autoregression/">VAR models</a>. Here, we’ll provide the system of equations and compare it with our new model.</p><p>Let’s remember the basic model. For example, a basic bivariate VAR(1) can be described as a system of equations:</p><!--kg-card-begin: html-->\[
Y_{1,t} = \phi_{11} Y_{1,t-1} + \phi_{12} Y_{2,t-1} + u_{1,t}
\]
\[
Y_{2,t} = \phi_{21} Y_{1,t-1} + \phi_{22} Y_{2,t-1} + u_{2,t}
\]
<!--kg-card-end: html--><p>Or,</p><!--kg-card-begin: html-->\[
Y_t = \Phi Y_{t-1} + U_t
\]<!--kg-card-end: html--><p>Where</p><!--kg-card-begin: html-->\[
Y_t = \begin{bmatrix} Y_{1,t} \\ Y_{2,t} \end{bmatrix}
\]
\[
\Phi = \begin{bmatrix} \phi_{11} & \phi_{12} \\ \phi_{21} & \phi_{22} \end{bmatrix}
\]
\[
Y_{t-1} = \begin{bmatrix} Y_{1,t-1} \\ Y_{2,t-1} \end{bmatrix}
\]
\[
U_t = \begin{bmatrix} u_{1,t} \\ u_{2,t} \end{bmatrix}
\]<!--kg-card-end: html--><p>A time-varying parameter VAR would be something like the following:</p><!--kg-card-begin: html-->\[
Y_{1,t} = \phi_{11,t} Y_{1,t-1} + \phi_{12,t} Y_{2,t-1} + \epsilon_{1,t}
\]
\[
Y_{2,t} = \phi_{21,t} Y_{1,t-1} + \phi_{22,t} Y_{2,t-1} + \epsilon_{2,t}
\]
<!--kg-card-end: html--><p>Can you see the difference between the two models? Not yet?</p><p>Let’s use matrices to see it clearly.</p><!--kg-card-begin: html-->\[
Y_t = \Phi_t Y_{t-1} + \mathcal{E}_t
\]
<!--kg-card-end: html--><p>Where:</p><!--kg-card-begin: html-->\[
Y_t = \begin{bmatrix} Y_{1,t} \\ Y_{2,t} \end{bmatrix}
\]
\[
\Phi_t = \begin{bmatrix} \phi_{11,t} & \phi_{12,t} \\ \phi_{21,t} & \phi_{22,t} \end{bmatrix}
\]
\[
Y_{t-1} = \begin{bmatrix} Y_{1,t-1} \\ Y_{2,t-1} \end{bmatrix}
\]
\[
\mathcal{E}_t = \begin{bmatrix} \epsilon_{1,t} \\ \epsilon_{2,t} \end{bmatrix}
\]
<!--kg-card-end: html--><p>Now you see it?</p><p>The only difference is that the model's parameters vary as time passes. Hence, it’s referred to as a “time-varying-parameter” model.</p><p>Even though the difference appears simple, the estimation procedure is much more complex than the basic VAR estimation.</p><p>You now say: I know we can have time-varying parameters, but where is the stochastic volatility in the previous equations?</p><p>Wait for it, my friend! We’ll see it later!</p><p>Don’t worry, we’ll keep it simple!</p><hr><h2 id="the-tvp-var-sv-model-variables">The TVP-VAR-SV model variables</h2><h3 id="the-system-of-equations-of-the-model">The system of equations of the model</h3><p>Using the notation provided by Primiceri (2005):</p><!--kg-card-begin: html-->$$Y_t = B_t Y_{t-1} + A_t^{-1} \Sigma_t \varepsilon_t$$<!--kg-card-end: html--><p>Where:</p><p>Y: The vector of time series</p><p>B: The parameters of the lagged time series of this reduced model</p><p>A: The contemporary parameters of the time series vector</p><p>Sigma: The time-varying <a href="https://blog.quantinsti.com/standard-deviation/">standard deviation</a> (volatility) of each equation in the VAR.</p><p>Epsilon: A vector of shocks for each equation in the VAR.</p><h3 id="what-is-the-reduced-model-and-what-are-contemporary-parameters">What is the reduced model and what are contemporary parameters?</h3><p>Well, in macroeconometrics, the reduced model can be understood as a simple VAR as modeled in our previous article on <a href="https://blog.quantinsti.com/vector-autoregression/">VAR models</a>. In this model, today’s time series values of the VAR vector are impacted only by their lagged versions.</p><p>However, economists also talk about the impact that today’s value of one time series has on today’s values of the other time series. This can be modeled as:</p><!--kg-card-begin: html-->$$A_t Y_t = C_t Y_{t-1} + \Sigma_t \varepsilon_t$$
<!--kg-card-end: html--><p>This can be shown in matrix form below:</p><!--kg-card-begin: html-->$$\begin{bmatrix}
a_{11,t} & a_{12,t} \\
a_{21,t} & a_{22,t}
\end{bmatrix}
\begin{bmatrix}
y_{1,t} \\
y_{2,t}
\end{bmatrix}
=
\begin{bmatrix}
c_{11,t} & c_{12,t} \\
c_{21,t} & c_{22,t}
\end{bmatrix}
\begin{bmatrix}
y_{1,t-1} \\
y_{2,t-1}
\end{bmatrix}
+ 
\begin{bmatrix}
\sigma_{1,t} & 0 \\
0 & \sigma_{2,t}
\end{bmatrix}
\begin{bmatrix}
\epsilon_{1,t} \\
\epsilon_{2,t}
\end{bmatrix}$$<!--kg-card-end: html--><p>Which can also be presented as a system of equations:</p><!--kg-card-begin: html-->$$\begin{aligned}
a_{11,t}y_{1,t} + a_{12,t}y_{2,t} &= c_{11,t}y_{1,t-1} + c_{12,t}y_{2,t-1} + \sigma_{1,t}\epsilon_{1,t} \\
a_{21,t}y_{1,t} + a_{22,t}y_{2,t} &= c_{21,t}y_{1,t-1} + c_{22,t}y_{2,t-1} + \sigma_{2,t}\epsilon_{2,t}
\end{aligned}$$<!--kg-card-end: html--><p>The above model is understood in econometrics as a structural model to comprehend the interrelationships, contemporary or not, between the time series analyzed.</p><p>So, assuming we have daily data, the first equation, which belongs to y1, has a12*y2 as today’s y2 impact on today’s y1. The same is true for the second equation, which belongs to y2, where we see a21*y1, which is today’s y1 impact on today’s y2. In a VAR, we have lagged periods impacting today’s variables; in a structural VAR, we have today’s variables impacting today’s other variables.</p><p>Due to these contemporary relationships, there is a problem called endogeneity, where the error terms (the epsilons) are correlated with the regressors. To estimate a structural VAR, we need to clearly identify the matrix A variables. As Eric (2021) explained, there are 3 ways to do so in the economic literature. But it’s not only that: as per this model, A is also time-varying. We’ll see later how these variables are estimated.</p><p>When you pre-multiply the system of equations by A^-1, you get something like:</p><!--kg-card-begin: html-->$$Y_t = A_t^{-1} C_t Y_{t-1} + A_t^{-1} \Sigma_t \varepsilon_t$$<!--kg-card-end: html--><p>Which can be further simplified as:</p><!--kg-card-begin: html-->$$Y_t = B_t Y_{t-1} + U_t$$<!--kg-card-end: html--><p>So,</p><!--kg-card-begin: html-->$$B_t = \Phi_t = A_t^{-1} C_t \\
U_t = A_t^{-1} \Sigma_t \mathcal{E}_t$$<!--kg-card-end: html--><h3 id="time-varying-volatilities">Time-varying volatilities?</h3><p>Yes! In a basic VAR, the error terms are homoskedastic, meaning they have constant variance. In this case, we have variances that change over time; they’re time-variant.</p><h3 id="the-time-varying-parameter-stochastic-behaviors">The time-varying parameter stochastic behaviors</h3><p>The basic VAR had constant parameters. In this TVP-VAR-SV, almost all of our parameters are time-variant. Due to this, we need to assign them stochastic processes. As in Primiceri (2005), we define them as:</p><!--kg-card-begin: html-->$$\begin{aligned}
B_t &= B_{t-1} + \nu_t \\
a_t &= a_{t-1} + \zeta_t \\
\log \sigma_t &= \log \sigma_{t-1} + \eta_t
\end{aligned}$$<!--kg-card-end: html--><p>We can then specify the matrix of variances of all the model’s shocks as:</p><!--kg-card-begin: html-->$$V = Var \left\{ \begin{bmatrix} \epsilon_t \\ \nu_t \\ \zeta_t \\ \eta_t \end{bmatrix} \right\} = \begin{bmatrix} I_n & 0 & 0 & 0 \\ 0 & Q & 0 & 0 \\ 0 & 0 & S & 0 \\ 0 & 0 & 0 & W \end{bmatrix}$$<!--kg-card-end: html--><p>Where I_n is the identity matrix and n is the number of time series in the VAR (in our case it’s 2). Q, S, and W are square positive-definite <a href="https://blog.quantinsti.com/calculating-covariance-matrix-portfolio-variance/">covariance matrices</a> with a number of rows (or columns) equal to the number of parameters in B, A, and Sigma, respectively.</p><p>Something else to note: sigma is stochastic-based, which can be interpreted as stochastic volatility as, e.g., the <a href="https://blog.quantinsti.com/heston-model/">Heston model</a> is.</p><hr><h2 id="the-priors">The priors</h2><p>For a Bayesian inference, you always need priors. In the Primiceri (2005) algorithm, the priors are computed using your data sample's first “T1” observations.</p><p>Using our previously defined variables, you can specify the priors (following Primiceri, 2005, and Del Negro and Primiceri, 2015):</p><!--kg-card-begin: html-->$$\begin{aligned}
B_0 &\sim N(B_{OLS}, 4V(B_{OLS})) \\
A_0 &\sim N(A_{OLS}, 4V(A_{OLS})) \\
\log \sigma_0 &\sim N(\log \sigma_{OLS}, I_n) \\
Q_0 &\sim IW(k_Q^2 \cdot 40 \cdot V(B_{OLS}), 40) \\
W_0 &\sim IW(k_W^2 \cdot 2 \cdot I_n, 2) \\
S_0 &\sim IW(k_S^2 \cdot 2 \cdot V(A_{OLS}), 2)
\end{aligned}$$<!--kg-card-end: html--><p>Where</p><ul><li>N(): Normal distribution</li><li>B_ols: The point estimate of the B parameters obtained by estimating a basic time-invariant VAR using the first T1 observations of the data sample.</li><li>V(B_ols): The point estimate of the B parameters’ variances obtained by estimating a basic time-invariant structural VAR using the first T1 observations of the data sample. In B_0, this variance is multiplied by 4. This value can be named k_B.</li><li>A_ols: The point estimate of the A parameters obtained by estimating a basic time-invariant structural VAR using the first T1 observations of the data sample.</li><li>V(A_ols): The point estimate of the A parameters’ variances obtained by estimating a basic time-invariant structural VAR using the first T1 observations of the data sample. In A_0, this variance is multiplied by 4. This value can be named k_A.</li><li>log(sigma_0): The point estimate of the standard errors obtained by estimating a basic time-invariant structural VAR using the first T1 observations of the data sample.</li><li>I_n: The identity matrix with “n x n” dimensions, where “n” is the number of time series in the VAR. Contrary to A_0 and B_0, this variance is just multiplied by 1, where this value can be named k_sig.</li><li>IW: The inverse Wishart distribution</li><li>Q_0 follows an IW distribution with a scale matrix of k_Q^2 times 40 times V(B_ols) and 40 degrees of freedom</li><li>W_0 follows an IW distribution with a scale matrix of k_W^2 times 2 times I_n and 2 degrees of freedom</li><li>S_0 follows an IW distribution with a scale matrix of k_S^2 times 2 times V(A_ols) and 2 degrees of freedom</li><li>k_Q, k_W, and k_S are 0.01, 0.01, and 0.1, respectively.</li></ul><p>Once you estimate the priors with the first T1 observations, you get the posterior distribution using the rest of the data sample.</p>
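<p>Although the article’s estimation is done in R, here is a rough Python sketch (our own illustration on simulated data; the statsmodels VAR and the variable names are assumptions) of where the priors’ ingredients come from:</p><!--kg-card-begin: html--><pre>
# Estimate a basic time-invariant VAR on the first T1 observations and keep
# the point estimates and their variances as ingredients for the priors
import numpy as np
import pandas as pd
from statsmodels.tsa.api import VAR

rng = np.random.default_rng(0)
var_data = pd.DataFrame(rng.normal(0, 0.01, size=(1000, 2)),
                        columns=['y1', 'y2'])  # stand-in for the log returns

T1 = 250                                   # training sample used for the priors
results = VAR(var_data.iloc[:T1]).fit(1)   # basic VAR(1) on the first T1 observations

B_ols = results.params                     # point estimates: mean of the prior for B_0
V_B_ols = results.stderr ** 2              # their variances, multiplied by 4 in the prior
print(B_ols)
print(4 * V_B_ols)
</pre><!--kg-card-end: html-->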
<hr><h2 id="the-mixture-of-indicators">The mixture of indicators</h2><p>Before we dive into the algorithm, let’s learn something else. Do you remember the reduced-form model?</p><!--kg-card-begin: html-->$$Y_t = B_t Y_{t-1} + A_t^{-1} \Sigma_t \varepsilon_t$$<!--kg-card-end: html--><p>To isolate the error term, we get</p><!--kg-card-begin: html-->$$A_t(Y_t - B_t Y_{t-1}) = A_t \hat{y}_t = \Sigma_t \varepsilon_t$$<!--kg-card-end: html--><p>Primiceri (2005), appendix A.2, explains that the above model has a Gaussian non-linear state space representation. The difficulty with drawing Sigma_t is that it enters the model multiplicatively.</p><p>This makes the Kalman filter estimation step inside the overall estimation algorithm hard (the Kalman filter is linear-based). To overcome this issue, Primiceri (2005) squares and takes the logarithm of every element of the previous equation. As a consequence of this transformation, the resulting state-space form becomes non-Gaussian, because log(epsilon_t^2) has a <a href="https://www.oreilly.com/library/view/bayesian-statistics-an/9781118359778/OEBPS/c1-sec1-0007.htm">log chi-squared distribution</a>. To finally get a normal distribution for the error terms, Kim et al. (1998) use a mixture of normals to approximate each element of log(epsilon_t^2). Thus, the estimation algorithm uses the mixture indicators for each error term and each date.</p><!--kg-card-begin: html-->$$S^T \equiv \{s_t\}_{t=1}^T$$<!--kg-card-end: html-->
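<p>To see why this approximation is needed at all, here is a quick numerical look (our own illustration) at how non-Gaussian log(epsilon_t^2) actually is:</p><!--kg-card-begin: html--><pre>
# log(eps**2) for Gaussian eps is skewed (log chi-squared), not normal,
# which is what motivates the Kim et al. (1998) mixture-of-normals trick
import numpy as np
from scipy import stats

np.random.seed(1)
eps = np.random.normal(0.0, 1.0, size=1_000_000)
log_eps2 = np.log(eps ** 2)

print(log_eps2.mean())       # about -1.27, not 0
print(stats.skew(log_eps2))  # clearly negative: a skewed, non-Gaussian shape
</pre><!--kg-card-end: html-->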
<hr><h2 id="the-tvp-var-sv-model-estimation-algorithm">The TVP-VAR-SV model estimation algorithm</h2><p>First of all, you need to know that the TVP-VAR model estimation explained here follows the Primiceri (2005) methodology and Del Negro and Primiceri (2015).</p><p>This methodology uses the modified Bayesian-based Gibbs sampling algorithm provided by Cogley and Sargent (2005) to estimate the parameters.</p><p>Now you say: What? Was that gibberish?</p><p>We’ve got it! Don’t worry! Let’s explain the algorithm in simple words in detail. Regarding the Bayesian estimation approach, please refer to this article on <a href="https://blog.quantinsti.com/bayesian-inference/">Bayesian Statistics In Finance</a> and this other one on <a href="https://blog.quantinsti.com/introduction-to-bayesian-statistics-in-finance/">Foundations of Bayesian Inference</a> to fully learn more about it.</p><p>Let’s explain the algorithm. Following Del Negro and Primiceri (2015), the algorithm consists of the following loop:</p><!--kg-card-begin: html--><br>\( \text{for each MCMC iteration:} \)
<br>\( \hspace{1em}\text{- Draw } \Sigma^T \text{ from } p(\Sigma^T | y^T, \theta^T, s^T) \)
<br>\( \hspace{1em}\text{- Draw } \theta \text{ from } p(\theta | y^T, \Sigma^T) \)
<br>\( \hspace{1em}\text{- Draw } s^T \text{ from } p(s^T | y^T, \Sigma^T, \theta) \)<!--kg-card-end: html--><p>Where</p><ul><li>“Draw” means:</li></ul><ol><li>Use the Kalman filter to update the state equation and compute the likelihood.</li><li>Sample the variable from its posterior distribution using a Metropolis-Hastings step.</li></ol><ul><li>MCMC is Markov Chain Monte Carlo. Please refer to our article on <a href="https://blog.quantinsti.com/introduction-monte-carlo-analysis/">Introduction To Monte Carlo Analysis</a> to learn more about this type of Monte Carlo and the Metropolis-Hastings algorithm.</li><li>Theta is [B, A, V], where these 3 variables were defined previously.</li><li>p(e|d) is the corresponding probability distribution of “e” given “d”.</li></ul><p>You iterate until the distributions converge. Even though we say the algorithm is based on MCMC and Metropolis-Hastings, Primiceri (2005) applies his own specifications for his TVP-VAR-SV model.</p><hr><h3 id="a-tvp-var-sv-estimation-in-r">A TVP-VAR-SV estimation in R</h3><p>Let’s see how we can estimate the model in R. Let’s install the corresponding libraries.</p><!--kg-card-begin: html--><script src="https://gist.github.com/quantra-go-algo/c77466ecb12f24d15a310f972e8b4aaf.js"></script><!--kg-card-end: html--><p>Then let’s import them.</p><!--kg-card-begin: html--><script src="https://gist.github.com/quantra-go-algo/133eb6185de63b7b03cf9d9fb6572dfc.js"></script><!--kg-card-end: html--><p>Let’s import the data and compute the log returns.</p><!--kg-card-begin: html--><script src="https://gist.github.com/quantra-go-algo/a6ce0b166c3c4e1038cbee6aacb74a24.js"></script><!--kg-card-end: html--><p>Let’s estimate the model with all the available data and forecast the next-day return. To get this forecast, we get draws from the converged posterior distribution and use the mean of all the draws to output a forecast point estimate. You can also use the median or any other measure of central tendency (Giannone, Lenza, and Primiceri, 2015).</p><!--kg-card-begin: html--><script src="https://gist.github.com/quantra-go-algo/46efaad9dea9854d6c70baee6549bbf2.js"></script><!--kg-card-end: html--><p>Output:</p><!--kg-card-begin: html--><pre>[0.015880450, 0.013688861, 0.014319192, 0.002445156, 0.005108312, 0.020364678, 0.015684312]</pre><!--kg-card-end: html--><p>These returns’ signs will depend on each day’s estimation.</p><p>There are 4 inputs to discuss:</p><ul><li>tau: the length of the training sample used for determining prior parameters via least squares (LS). In this case, we set it to one year: 250 observations. So, if we have “n” observations, we use the first 250 observations to get the priors and the last “n-250” for model estimation.</li><li>nf: the number of future time periods for which forecasts are computed. In this case, we’re interested in the next-day return.</li><li>nrep: the number of MCMC draws excluding the burn-in observations. We set it to 300. You can read more about it in our blog on <a href="https://blog.quantinsti.com/introduction-monte-carlo-analysis/">Introduction To Monte Carlo Analysis.</a></li><li>nburn: the number of MCMC draws used to initialize the sampler used for convergence to get the posterior distribution. We set it to 20. 
So, since we have 300 draws, we compute the posterior distributions with the last 280 draws (300 − 20).</li></ul><p>The function actually has more inputs; let’s see them together with their default values:</p><p>k_B = 4, k_A = 4, k_sig = 1, k_Q = 0.01, k_S = 0.1, k_W = 0.01,</p><p>pQ = NULL, pW = NULL, pS = NULL</p><p>You can relate k_B, k_A, and k_sig to the priors from the previous section. Regarding the other inputs, see below:</p><!--kg-card-begin: html-->$$\begin{aligned}
k_Q &= 0.01 \\
k_S &= 0.1 \\
k_W &= 0.01 \\
p_Q &= 40 \\
p_W &= 2 \\
p_S &= 2
\end{aligned}$$<!--kg-card-end: html--><hr><h2 id="a-trading-strategy-using-the-tvp-var-sv-model-in-r">A trading strategy using the TVP-VAR-SV model in R</h2><p>Now we get to the place you wanted! We will use the same imported libraries and the same dataframe called var_data, which contains the stocks’ price log returns.</p><p>Some things to note:</p><ul><li>We initialize the forecasts from 2019 onwards.</li><li>We estimate using 1500 observations as the span.</li><li>We also estimate a basic VAR to compare performance with our TVP-VAR-SV strategy.</li><li>The strategy for both models will be long only.</li><li>Since the TVP-VAR-SV model estimation takes a lot of time each trading period, the script is written so that you can stop the code at any point and resume later by simply running the whole code again.</li></ul><p>Let’s first define the function that will allow us to import the forecast-results dataframe of the basic VAR and the TVP-VAR-SV model in case you have saved it previously.</p><!--kg-card-begin: html--><script src="https://gist.github.com/quantra-go-algo/f601ee0f2db5d384374d57a8094a7c3e.js"></script><!--kg-card-end: html--><p>Then, we</p><ol><li>Set the initial date to start the forecasting process.</li><li>Import the saved df_forecasts dataframe if it exists; otherwise, create a new one without the previous function.</li><li>Set the span as 1500.</li></ol><!--kg-card-begin: html--><script src="https://gist.github.com/quantra-go-algo/753c7e7cbf21d6eb49a8c28ab0b6a6e3.js"></script><!--kg-card-end: html--><p>Next, we create the basic-VAR-based strategy signals. The code follows our previous article on the <a href="https://blog.quantinsti.com/vector-autoregression/">Vector AutoRegression model</a>.</p><!--kg-card-begin: html--><script src="https://gist.github.com/quantra-go-algo/09020380d3c7a8b40616ce397c41c72f.js"></script><!--kg-card-end: html--><p>Next, we create the TVP-VAR-SV model signals through a similar loop. This time we set tau to 40. The input can be chosen arbitrarily as long as you respect the proportions between nrep and nburn.</p><p>The estimation of the model for each day takes some minutes; thus, the whole loop will take a long time. Be careful. If you need to turn off your computer before the loop finishes, you can simply turn it on later and run the script once again: the code is written in such a way that whenever you want to continue running the for loop, you can just run the whole code.</p><!--kg-card-begin: html--><script src="https://gist.github.com/quantra-go-algo/8c385d4f88abfa70a60a4bdc3bf380ac.js"></script><!--kg-card-end: html--><p>Next, we compute the strategy returns. We have four strategies:</p><ul><li>A Buy-and-Hold strategy</li><li>A basic-VAR-based strategy</li><li>A TVP-VAR-SV-based strategy</li><li>A strategy that takes the TVP-VAR-SV model’s long signals if and only if the Buy-and-Hold cumulative returns are higher than their 15-window simple moving average.</li></ul><!--kg-card-begin: html--><script src="https://gist.github.com/quantra-go-algo/8870609e109e62ece42a9511d86acf8d.js"></script><!--kg-card-end: html--><p>Focusing only on equity returns, the basic VAR performs the worst.</p><figure class="kg-card kg-image-card kg-width-full"><img src="https://d1rwhvwstyk9gu.cloudfront.net/2024/10/Buy-and-hold.png" class="kg-image" alt="Buy and hold strategies"></figure><p>The TVP-VAR-SV and the SMA-based TVP-VAR-SV strategies perform close to the Buy-and-Hold strategy. 
However, the latter performs the best in almost all the years. Let’s see their trading summary statistics.</p><figure class="kg-card kg-image-card kg-width-full"><img src="https://d1rwhvwstyk9gu.cloudfront.net/2024/10/Trading-summary-stats.png" class="kg-image" alt="Trading summaries"></figure><p>The informal equity-curve conclusion can be further confirmed by the summary statistics.</p><ol><li>The basic VAR performs the worst with respect to not only returns but also equity volatility. This is reflected in poor results for the Sharpe, Calmar, and Sortino ratios. The maximum drawdown is also huge.</li><li>The TVP-VAR-SV performs slightly better than the Buy-and-Hold strategy.</li><li>The SMA-based TVP-VAR-SV is the best performer. Its equity-curve return is around 80% higher than the Buy-and-Hold strategy’s, and the other statistics are clearly better. The Sortino ratio is quite good, too.</li></ol><hr><h2 id="notes-about-the-tvp-var-sv-strategy">Notes about the TVP-VAR-SV strategy</h2><p>There are some things we need to take into account while developing a strategy based on this model:</p><ul><li>We chose tau equal to 40 arbitrarily, which is probably not enough. Choosing another number would likely produce different results. The seed is also arbitrarily chosen. Do hyperparameter tuning to get the best results while doing a walk-forward optimization.</li><li>We have chosen nrep equal to 300. This is quite small compared to macroeconometric standards, where nrep gets to be 50,000 in some cases. The reason econometricians use such a large number is that macroeconomic data samples are usually very small compared to financial data samples. Due to the low quantity of data samples, macroeconomic data tends to be fitted with this model very quickly even with a high nrep. Since our span is equal to 1500, if we used nrep equal to 50,000, the estimation for each day would surely take hours or even days. That’s why we use only 300 as nrep. Please feel free to change nrep at your convenience. Just make sure that, if you trade hourly, the model estimation takes less time than an hour for live trading; if you trade daily, the model estimation takes less than a day; and so on.</li><li>We haven’t incorporated stop-loss and take-profit targets. Please do so to improve your results.</li></ul><hr><h2 id="conclusion">Conclusion</h2><p>We have delved into the basic definition of a TVP-VAR-SV model. We then briefly explained the model estimation, and finally we tested the model’s performance with a trading strategy backtesting loop script.</p><p>Do you want to learn the basics of financial time series analysis? Do not hesitate to learn from our course <a href="https://quantra.quantinsti.com/course/financial-time-series-analysis-trading">Financial Time Series Analysis for Trading</a>.</p><p>Do you want more models to be tested?</p><p>Do not hesitate to follow our blog; we’re always creating more strategies for you!</p><hr><h3 id="references">References</h3><ul><li>Cogley, T. and Sargent, T. J. (2005), “Drifts and Volatilities: Monetary Policies and Outcomes in the Post WWII U.S.,” Review of Economic Dynamics, 8(2), 262-302.</li><li>Del Negro, M. and Primiceri, G., (2015), Time Varying Structural Vector Autoregressions and Monetary Policy: A Corrigendum, The Review of Economic Studies, 82, issue 4, p. 
1342-1345.</li><li>Eric (2021) “Understanding and Solving the Structural Vector Autoregressive Identification Problem” in https://www.aptech.com/blog/understanding-and-solving-the-structural-vector-autoregressive-identification-problem/, consulted on August 1st, 2024.</li><li>Fischer, M. M., Hauzenberger, N., Huber, F., and Pfarrhofer, M. (2023), General Bayesian time-varying parameter VARs for predicting government bond yields, Journal of Applied Econometrics, 38(1), 69-87.</li><li>Giannone, Domenico, Lenza, Michele and Primiceri, Giorgio, (2015), Prior Selection for Vector Autoregressions, The Review of Economics and Statistics, 97, issue 2, p. 436-451.</li><li>Gupta, R., Huber, F., and Piribauer, P. (2020), Predicting international equity returns: Evidence from time-varying parameter vector autoregressive models, International Review of Financial Analysis, Volume 68, 101456, ISSN 1057-5219.</li><li>Kim, S., Shephard, N., and Chib, S., (1998), Stochastic Volatility: Likelihood Inference and Comparison with ARCH Models, The Review of Economic Studies, 65, issue 3, p. 361-393.</li><li>Kumar, M., (2010), A time-varying parameter vector autoregression model for forecasting emerging market exchange rates, International Journal of Economic Sciences and Applied Research, ISSN 1791-3373, Kavala Institute of Technology, Kavala, Vol. 3, Iss. 2, pp. 21-39.</li><li>Nakajima, Jouchi, (2011), Time-Varying Parameter VAR Model with Stochastic Volatility: An Overview of Methodology and Empirical Applications, Monetary and Economic Studies, 29, p. 107-142.</li><li>Primiceri, Giorgio, (2005), Time Varying Structural Vector Autoregressions and Monetary Policy, The Review of Economic Studies, 72, issue 3, p. 821-852.</li><li>Rodriguez, G., Castillo, P., Calero, R., Salcedo, R., and Arellano, M. A., (2024), Evolution of the exchange rate pass-through into prices in Peru: An empirical application using TVP-VAR-SV models, Journal of International Money and Finance, Volume 142, 103023, ISSN 0261-5606.</li></ul><hr><p><strong>File in the download:</strong></p><ul><li>Trading strategy using the TVP-VAR-SV model in R - Python notebook</li></ul><!--kg-card-begin: html--><p style="text-align: center;"><a href="https://blog.quantinsti.com/tvp-var-stochastic-volatility" class="download-button button"> Visit blog to download </a></p><!--kg-card-end: html--><hr><p><em>Disclaimer: All data and information provided in this article are for informational purposes only. QuantInsti</em><sup><em>®</em></sup><em> makes no representations as to accuracy, completeness, currentness, suitability, or validity of any information in this article and will not be liable for any errors, omissions, or delays in this information or any losses, injuries, or damages arising from its display or use. All information is provided on an as-is basis.</em></p>]]></content:encoded></item><item><title><![CDATA[Fibonacci Retracement: Trading Strategy, Python implementation, and more]]></title><description><![CDATA[Get to know all about Fibonacci retracement trading strategy in Python. Master the art of identifying potential price reversal levels using Fibonacci ratios for informed trading decisions. 
This blog makes for a beginner-friendly guide, covering everything from the basics to advanced applications.]]></description><link>https://blog.quantinsti.com/fibonacci-retracement-trading-strategy-python/</link><guid isPermaLink="false">66289795c8fd9b0aacb930d2</guid><category><![CDATA[Mathematics and Econometrics]]></category><dc:creator><![CDATA[Ishan Shah]]></dc:creator><pubDate>Mon, 29 Apr 2024 10:38:00 GMT</pubDate><content:encoded><![CDATA[<p>By <a href="https://www.linkedin.com/in/ishan-shah-18393828">Ishan Shah</a></p><p>Fibonacci trading tools are used to determine support/resistance levels or to identify price targets. It is the presence of the Fibonacci series in nature that first drew technical analysts’ attention to Fibonacci as a trading tool. In some cases, Fibonacci numbers can work remarkably well at finding key levels in widely traded securities. The Fibonacci retracement strategy relies on key retracement levels to anticipate future price movements. </p><p>In this guide, we delve into Fibonacci retracement levels and their implementation using Python, enabling traders to leverage these mathematical principles for informed decision-making. </p><p>By combining technical analysis with programming capabilities, traders gain a deeper understanding of market dynamics and can define entry and exit points more precisely. So let us dive in and unlock the potential of the Fibonacci Retracement Trading Strategy in Python for navigating volatile financial markets.</p><p>Moving ahead, let us find out more with this blog that covers:</p><!--kg-card-begin: html--><ul>
  <li><a href="#what-is-the-fibonacci-sequence">What is the Fibonacci sequence?</a></li>
  <li><a href="#what-is-the-fibonacci-retracement-strategy">What is the Fibonacci retracement strategy?</a></li>
  <li><a href="#how-to-use-fibonacci-retracement-in-trading">How to use Fibonacci retracement in trading?</a></li>
  <li><a href="#calculating-fibonacci-retracement-levels-using-python">Calculating Fibonacci retracement levels using Python</a></li>
  <li><a href="#best-practices-for-optimising-fibonacci-trading-strategy-in-python">Best practices for optimising Fibonacci trading strategy in Python</a></li>
  <li><a href="#overcoming-challenges-faced-while-using-fibonacci-trading-strategy">Overcoming challenges faced while using Fibonacci trading strategy</a></li>
</ul>
<!--kg-card-end: html--><hr><h2 id="what-is-the-fibonacci-sequence">What is the Fibonacci sequence?</h2><p>The Fibonacci retracement strategy involves the use of the Fibonacci sequence. So, let us first of all learn about the Fibonacci sequence.</p><p>The Fibonacci sequence is a series of numbers where each number is the sum of the two preceding ones. The sequence runs: 0, 1, 1, 2, 3, 5, 8, 13, 21, 34, and so on.</p><p>The sum is in the following order:</p><figure class="kg-card kg-image-card"><img src="https://lh7-us.googleusercontent.com/AI3b3Rv_ck4BP_uQI3lcTZSML198yq8M6oAgEzqP4IbtvNN61rTiNoyxm6eg4en7CEnJzagR0jXN-HeC0SGrKM72zgmRqsTb6driGtT1jGzaNIO3SwqsM0909qTNZZGO8ZZf1_BG-bT42x-YoG-O1WU" class="kg-image"></figure><p>In mathematical terms, the Fibonacci sequence can be defined recursively by the formula:</p><p><strong>X(n) = X(n-1) + X(n-2)</strong></p><p>Where:</p><ul><li>X(n) is the nth number in the sequence</li><li>X(n-1) is the (n-1)th number in the sequence</li><li>X(n-2) is the (n-2)th number in the sequence</li></ul><p>In finance and trading, the Fibonacci sequence is widely used in technical analysis to identify potential support and resistance levels and is an essential part of the Fibonacci retracement strategy.</p><p>Moreover, there are some interesting properties of the Fibonacci sequence. </p><ol><li>Divide any number in the sequence (beyond the first few) by the previous number; the ratio is approximately 1.618.</li></ol><p>Xn/Xn-1 = 1.618</p><p>55/34 = 1.618</p><p>89/55 = 1.618</p><p>144/89 = 1.618</p><p>1.618 is known as the <strong>golden ratio</strong>. I suggest searching for golden ratio examples on Google Images; you will be pleasantly astonished by the relevance of the ratio in nature.</p><p>2. Similarly, divide any number in the sequence by the next number; the ratio is approximately 0.618.</p><p>Xn/Xn+1 = 0.618</p><p>34/55 = 0.618</p><p>55/89 = 0.618</p><p>89/144 = 0.618</p><p>3. 0.618 expressed as a percentage is 61.8%. The square root of 0.618 is 0.786 (78.6%).</p><p>Similar consistency is found when any number in the sequence is divided by the number two places to its right.</p><p>Xn/Xn+2 = 0.382</p><p>13/34 = 0.382</p><p>21/55 = 0.382</p><p>34/89 = 0.382</p><p>0.382 expressed as a percentage is 38.2%.</p><p>4. Also, there is consistency when any number in the sequence is divided by the number three places to its right.</p><p>Xn/Xn+3 = 0.236</p><p>21/89 = 0.236</p><p>34/144 = 0.236</p><p>55/233 = 0.236</p><p>0.236 expressed as a percentage is 23.6%.</p><p>5. The ratios 23.6%, 38.2%, 61.8%, and 78.6% are known as the Fibonacci ratios.</p>
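<p>These ratio properties are easy to verify yourself. The short, self-contained sketch below generates the sequence from the recursive formula above and prints the four ratios, which converge to 1.618, 0.618, 0.382, and 0.236:</p><!--kg-card-begin: html--><pre>
# Generate the Fibonacci sequence and verify the ratio properties
fib = [0, 1]
for _ in range(18):
    fib.append(fib[-1] + fib[-2])  # X(n) = X(n-1) + X(n-2)

n = 12  # pick a term far enough into the sequence (fib[12] = 144)
print(f"X(n)/X(n-1) = {fib[n] / fib[n - 1]:.3f}")  # ~1.618, the golden ratio
print(f"X(n)/X(n+1) = {fib[n] / fib[n + 1]:.3f}")  # ~0.618
print(f"X(n)/X(n+2) = {fib[n] / fib[n + 2]:.3f}")  # ~0.382
print(f"X(n)/X(n+3) = {fib[n] / fib[n + 3]:.3f}")  # ~0.236
</pre><!--kg-card-end: html-->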
<p>Now we can move on to learning about the Fibonacci retracement strategy.</p><hr><h2 id="what-is-the-fibonacci-retracement-strategy">What is the Fibonacci retracement strategy?</h2><p>The Fibonacci retracement strategy is a popular technical analysis tool used by traders to identify potential reversal levels in financial markets. Based on the Fibonacci sequence, this strategy involves plotting key retracement levels, typically 23.6%, 38.2%, 50%, 61.8%, and 78.6%, against a price movement.</p><p>These levels are derived from ratios found in the Fibonacci sequence and are believed to represent areas of support or resistance.</p><p>Fibonacci retracement levels help traders identify entry and exit points for trades, and hence determine stop-loss and take-profit levels. When the price of an asset retraces to one of these Fibonacci levels, it may indicate a potential reversal in the prevailing trend.</p><p>The Fibonacci ratios, 23.6%, 38.2%, and 61.8%, can be applied in <a href="https://blog.quantinsti.com/starting-time-series/">time series analysis</a> to find support levels. Whenever the price moves substantially upwards or downwards, it tends to retrace before it continues moving in the original direction.</p><p>For example, if the stock price has moved from $200 to $250, it is likely to retrace to $230 before it continues to move upward. The retracement level of $230 is forecasted using the Fibonacci ratios.</p><figure class="kg-card kg-image-card kg-width-full"><img src="https://lh7-us.googleusercontent.com/r3GByXj9mwEkGhO9sEm1hL5E5E67axSFxGZlk8mKRpzdpXpt1Uj_mvy14KfsR1zIcdZzTC4ckAoD4gASitEHJq5GMAYfY43rpys20Fnx4pqamG3-wZs-TIvGh8NjCWMqjmkWXX2WdNv-H9IjwSZaNLo" class="kg-image" alt="Fibonacci retracement example"></figure><p>We can arrive at $230 by using simple maths.</p><ul><li>Total up move = $250 - $200 = $50</li><li>38.2% of up move = 38.2% * $50 = $19.1</li><li>Retracement forecast = $250 - $19.1 = $230.9</li></ul><p>Any price below $230 provides a good opportunity for traders to enter new positions in the direction of the trend. Likewise, we can calculate the levels for 23.6%, 61.8% and the other Fibonacci ratios, as the short sketch below shows.</p>
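<p>Here is a minimal, self-contained sketch of that arithmetic, applied to the hypothetical $200-to-$250 up move used above:</p><!--kg-card-begin: html--><pre>
# Fibonacci retracement levels for an up move from low to high
low, high = 200.0, 250.0
move = high - low  # total up move: $50

for ratio in (0.236, 0.382, 0.5, 0.618):
    level = high - ratio * move  # retrace downwards from the high
    print(f"{ratio:.1%} retracement -> ${level:.1f}")
</pre><!--kg-card-end: html-->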
<p>The Fibonacci retracement strategy is commonly applied alongside other technical indicators and analysis techniques to confirm signals and enhance trading decisions. Additionally, it can be used across various financial instruments and timeframes, making it a versatile tool for traders across different markets.</p><p>Let us now find out how to use Fibonacci retracement in trading.</p><hr><h2 id="how-to-use-fibonacci-retracement-in-trading">How to use Fibonacci retracement in trading?</h2><p>The retracement levels are useful when you want to buy a particular stock but have not been able to because of a sharp run-up in its price.</p><p>In such a situation, it is recommended to wait for the price to correct to Fibonacci retracement levels such as 23.6%, 38.2%, and 61.8% and then buy the stock. The ratios 38.2% and 61.8% are the most important support levels.</p><p>This Fibonacci retracement trading strategy is more effective over a longer time interval and, like any indicator, using it with other <a href="https://blog.quantinsti.com/build-technical-indicators-in-python/">technical indicators</a> such as <a href="https://blog.quantinsti.com/rsi-indicator/">RSI</a>, <a href="https://blog.quantinsti.com/strategy-using-trend-following-indicators-macd-st-adx/">MACD</a>, and <a href="https://blog.quantinsti.com/candlestick-patterns-meaning/">candlestick patterns</a> can improve the probability of success.</p><p>Now, we will head to calculating Fibonacci retracement levels using Python.</p><hr><h2 id="calculating-fibonacci-retracement-levels-using-python">Calculating Fibonacci retracement levels using Python</h2><p>As we now know, retracements are the price movements that go against the original trend. To forecast a Fibonacci retracement level we should first identify the total up move or total down move. To mark the move, we need to pick the most recent high and low on the chart.</p><p>Let’s take an example of Exxon Mobil to understand the Fibonacci retracement construction.</p><!--kg-card-begin: html--><script src="https://gist.github.com/quantra-go-algo/26b1911d5ff97f548e82080afcaf6b4d.js"></script><!--kg-card-end: html--><p>Output:</p><figure class="kg-card kg-image-card kg-width-full"><img src="https://d1rwhvwstyk9gu.cloudfront.net/2024/04/Exxon-price.png" class="kg-image" alt="Fibonacci retracement price data"></figure><!--kg-card-begin: html--><script src="https://gist.github.com/quantra-go-algo/1e889831a7a1d484a733b27da1675691.js"></script><!--kg-card-end: html--><p>Output:</p><!--kg-card-begin: html--><pre>Minimum Price: 31.56999969482422
Maximum Price: 121.37000274658203</pre><!--kg-card-end: html--><!--kg-card-begin: html--><script src="https://gist.github.com/quantra-go-algo/44678483cee9a1431eb74f7254e56963.js"></script><!--kg-card-end: html--><p>Output:</p><!--kg-card-begin: html--><pre>Level	Price
0	 121.37000274658203
0.236	 100.17720202636718
0.382	 87.06640158081055
0.618	 65.8736008605957
1	 31.56999969482422 </pre><!--kg-card-end: html--><figure class="kg-card kg-image-card kg-width-full"><img src="https://d1rwhvwstyk9gu.cloudfront.net/2024/04/Retracement-levels.png" class="kg-image" alt="Fibonacci retracement levels"></figure><p>It is visible that the maximum price is 121.37, where the level is 0 since there is no retracement at the maximum price.</p><p>The first retracement level, at 23.6%, is $100.17; the second, at 38.2%, is $87.06; and the next, at 61.8%, is $65.87.</p><hr><h2 id="best-practices-for-optimising-fibonacci-trading-strategy-in-python">Best practices for optimising Fibonacci trading strategy in Python</h2><p>When you implement the Fibonacci trading strategy in Python, optimisation will often be required to improve the strategy’s performance.</p><figure class="kg-card kg-image-card kg-width-full"><img src="https://d1rwhvwstyk9gu.cloudfront.net/2024/04/Best-practices-for-optimising-Fibonacci-trading-strategy-in-Python.png" class="kg-image" alt="Best practices for Fibonacci retracement strategy"></figure><p>Consider the following tips and best practices for the same:</p><ul><li><strong>Define clear trading rules:</strong> Establish clear rules for identifying Fibonacci retracement levels and trade setups. This helps to remove subjectivity and emotion from the decision-making process.</li><li><strong>Backtest the strategy: </strong>Use historical market data to backtest the Fibonacci strategy across various market conditions. This helps validate the effectiveness of the strategy and identify its strengths and weaknesses.</li><li><strong>Optimise parameters:</strong> Fine-tune the parameters of the Fibonacci strategy, such as the anchor points for drawing retracement levels or the threshold for confirming trade signals. Optimisation can be done using techniques like grid search.</li><li><strong>Incorporate risk management:</strong> Implement robust risk management techniques to protect capital and minimise losses. This may include setting stop-loss orders, position sizing based on risk tolerance, and diversifying across multiple assets or instruments.</li><li><strong>Combine with other indicators: </strong>Enhance the Fibonacci strategy by integrating it with other technical indicators or chart patterns. This can help confirm trade signals and increase the probability of successful trades.</li><li><strong>Continuously monitor and adapt:</strong> Regularly monitor the performance of the Fibonacci strategy and make adjustments as needed based on evolving market conditions. This may involve refining the strategy parameters, adding new filters, or incorporating feedback from live trading experiences.</li></ul><p>By following these tips and best practices, traders can optimise their Fibonacci trading strategy in Python and improve their overall trading performance.</p><p>Moving to the next section, we will find out how to overcome the challenges faced while using the Fibonacci trading strategy.</p><hr><h2 id="overcoming-challenges-faced-while-using-fibonacci-trading-strategy">Overcoming challenges faced while using Fibonacci trading strategy</h2><!--kg-card-begin: html--><table>
<tbody>
<tr>
<td>
<p><strong>Challenges with the Fibonacci Trading Strategy</strong></p>
</td>
<td>
<p><strong>Ways to Overcome the Challenges</strong></p>
</td>
</tr>
<tr>
<td>
<p>Subjectivity: Identifying the correct swing highs and lows to anchor Fibonacci retracement levels can be subjective and may vary among traders.</p>
</td>
<td>
<p>Use Objective Criteria: Define clear criteria for identifying swing highs and lows, such as significant price peaks and troughs, or use automated tools to detect these points. Additionally, consider using multiple timeframes to confirm key levels.</p>
</td>
</tr>
<tr>
<td>
<p>Overfitting: There is a risk of overfitting the Fibonacci levels to historical data, leading to poor performance in real-time trading.</p>
</td>
<td>
<p>Validate with Backtesting: Test the Fibonacci strategy on historical data across different market conditions to ensure robustness. Avoid over-optimising the strategy based on specific past events. Incorporate risk management rules to limit potential losses.</p>
</td>
</tr>
<tr>
<td>
<p>False Signals: Fibonacci retracement levels may sometimes generate false signals, resulting in poor trade execution and losses.</p>
</td>
<td>
<p>Combine with Other Indicators: Use Fibonacci levels in conjunction with other technical indicators, such as moving averages, trendlines, or candlestick patterns, to confirm trade setups. This can help filter out false signals and improve the reliability of the strategy.</p>
</td>
</tr>
<tr>
<td>
<p>Emotional Bias: Traders may become emotionally attached to Fibonacci levels, leading to biased decision-making and reluctance to adapt to changing market conditions.</p>
</td>
<td>
<p>Stay Disciplined: Stick to predefined trading rules and objectives, regardless of emotional impulses or attachment to Fibonacci levels. Regularly review and adjust the strategy based on objective performance metrics and market feedback.</p>
</td>
</tr>
<tr>
<td>
<p>Market Noise: In choppy or volatile market conditions, Fibonacci levels may not accurately capture price movements, resulting in increased noise and false signals.</p>
</td>
<td>
<p>Adjust Parameters: Consider adjusting the sensitivity of Fibonacci levels by modifying the anchor points or using alternative Fibonacci tools, such as Fibonacci extensions or clusters, to better align with prevailing market dynamics. Additionally, apply filters to smooth out noise and focus on high-probability trade setups.</p>
</td>
</tr>
</tbody>
</table><!--kg-card-end: html--><hr><h3 id="conclusion">Conclusion</h3><p>The Fibonacci retracement trading strategy in Python offers traders a systematic approach to navigating volatile financial markets and to pursuing better risk-adjusted returns.</p><p>Mastering the Fibonacci retracement trading strategy in Python equips traders with a powerful tool for identifying potential price reversal levels and making informed trading decisions. By leveraging the Fibonacci sequence and ratios, traders can pinpoint key support and resistance levels, allowing for precise entry and exit points in the market. Through the implementation of Python programming, traders gain the ability to calculate and visualise Fibonacci retracement levels accurately, enhancing their technical analysis capabilities.</p><p>If you wish to learn more about the Fibonacci retracement strategy, check out the course on <a href="https://quantra.quantinsti.com/course/price-action-trading-strategies">price action trading strategies</a>. This course will help you learn the strategies and codes that help you to tweak, fine-tune and implement this strategy in the live markets. Learn how to spot and trade the most important trading patterns: double tops/double bottoms, triple tops/triple bottoms, head and shoulders. Get acquainted with several trading strategies, and price action tools such as pivot points and the Fibonacci Retracement levels via a practical approach. Enroll now!</p><hr><p><strong>File in the download</strong></p><ul><li>Fibonacci retracement strategy - Python notebook</li></ul><!--kg-card-begin: html--><p class="custom-download-link"><a href="https://blog.quantinsti.com/fibonacci-retracement-trading-strategy-python" class="download-button button"> Visit blog to download </a></p><!--kg-card-end: html--><hr><!--kg-card-begin: html--><p><em><small>Note: The original post has been revamped on 29<sup>th</sup> April 2024 for recency and accuracy.</small></em></p><!--kg-card-end: html--><!--kg-card-begin: html--><p><em><small>Disclaimer: All investments and trading in the stock market involve risk. Any decision to place trades in the financial markets, including trading in stock or options or other financial instruments is a personal decision that should only be made after thorough research, including a personal risk and financial assessment and the engagement of professional assistance to the extent you believe necessary. The trading strategies or related information mentioned in this article is for informational purposes only.</small></em></p><!--kg-card-end: html-->]]></content:encoded></item><item><title><![CDATA[Standard Deviation in Trading: Calculations, Use Cases, Examples and more]]></title><description><![CDATA[Learn about standard deviation in trading: its calculations, use cases, examples, and more. Enhance your trading strategies by learning to use this statistical measure for assessing risk and volatility in financial markets.]]></description><link>https://blog.quantinsti.com/standard-deviation/</link><guid isPermaLink="false">65df3c3c7d529126d0749ef9</guid><category><![CDATA[Mathematics and Econometrics]]></category><dc:creator><![CDATA[Chainika Thakar]]></dc:creator><pubDate>Thu, 07 Mar 2024 05:51:58 GMT</pubDate><content:encoded><![CDATA[<p>Whether you're a seasoned trader or just starting out in quantitative finance, grasping the concept of standard deviation is crucial. Standard deviation helps traders measure volatility in financial markets. 
With this helpful tool, traders can make sense of market fluctuations and manage risks effectively. </p><p>In this blog, we break down the idea of standard deviation, covering everything you need to know about its use in the trading domain. Let's dive into the exciting journey of understanding standard deviation’s role in trading as we cover:</p><!--kg-card-begin: html--><ul>
<li><a href="#what-is-standard-deviation">What is standard deviation?</a></li>
<li><a href="#formula-of-standard-deviation">Formula of standard deviation</a></li>
<li><a href="#how-to-calculate-standard-deviation">How to calculate standard deviation?</a></li>
<li><a href="#examples-of-applications-of-standard-deviation">Examples of applications of standard deviation</a></li>
<li><a href="#use-cases-of-standard-deviation">Use cases of standard deviation</a>
<ul>
<li><a href="#risk-assessment">Risk assessment</a></li>
<li><a href="#volatility-analysis">Volatility analysis</a></li>
<li><a href="#portfolio-management">Portfolio management</a></li>
</ul>
</li>
<li><a href="#essential-components-of-standard-deviation">Essential components of standard deviation</a>
<ul>
<li><a href="#unit-of-standard-deviation">Unit of standard deviation</a></li>
<li><a href="#standard-deviation-vs-variance">Standard deviation vs variance</a></li>
<li><a href="#standard-deviation-for-sample-data-bessel-s-correction">Standard deviation for sample data - Bessel&rsquo;s correction</a></li>
</ul>
</li>
<li><a href="#standard-deviation-in-trading-as-a-measure-of-volatility">Standard deviation in trading as a measure of volatility</a></li>
<li><a href="#computing-annualised-volatility-of-stocks-using-python">Computing annualised volatility of stocks using Python</a>
<ul>
<li><a href="#the-z-score">The z-score</a></li>
<li><a href="#value-at-risk">Value at Risk</a></li>
<li><a href="#confidence-intervals">Confidence intervals</a></li>
</ul>
</li>
<li><a href="#real-world-case-studies-of-standard-deviation">Real-world Case Studies of standard deviation</a>
<ul>
<li><a href="#case-study-standard-deviation-in-forex-trading">Case Study: Standard Deviation in Forex Trading</a></li>
<li><a href="#case-study-standard-deviation-in-stock-market-trading">Case Study: Standard Deviation in Stock Market Trading</a></li>
</ul>
</li>
<li><a href="#correlation-of-standard-deviation-with-other-indicators">Correlation of standard deviation with other indicators</a></li>
<li><a href="#limitations-of-standard-deviation-in-trading">Limitations of standard deviation in trading</a></li>
<li><a href="#common-misconceptions-about-standard-deviation-in-trading">Common misconceptions about standard deviation in trading</a></li>
<li><a href="#risk-management-tips-for-using-standard-deviation">Risk management tips for using standard deviation</a></li>
</ul><!--kg-card-end: html--><hr><h2 id="what-is-standard-deviation">What is standard deviation?</h2><p>Let us see a famous quote defining standard deviation by John Bollinger, a well-known figure in the trading world, primarily recognised for developing the widely-used technical analysis tool known as <a href="https://blog.quantinsti.com/bollinger-bands/">Bollinger Bands</a>.</p><!--kg-card-begin: html-->"Standard deviation is a key tool for traders to quantify the uncertainty and risk in the market. It allows us to better understand the potential variability of returns and make informed decisions to manage our portfolios effectively." 
- John Bollinger<!--kg-card-end: html--><p><strong>Definition of standard deviation</strong></p><!--kg-card-begin: html-->“The standard deviation (σ) is a measure that is used to quantify the amount of variation or dispersion of data from its mean.”<!--kg-card-end: html--><p>Let's simplify the concept of deviation from the mean.</p><p>In essence, deviation refers to how far a data point is from the average. Imagine we have a set of observations represented by the variable X, consisting of various values: x₁, x₂, ..., xn.</p><p>Now, let's consider two of these observations (as shown in the image below), x₁ and x₂, and their deviations from the mean of X.</p><figure class="kg-card kg-image-card kg-width-full"><img src="https://d1rwhvwstyk9gu.cloudfront.net/2024/03/Standard-deviation-1.png" class="kg-image" alt="Standard deviation meaning"></figure><p>Deviations are straightforward: they tell us if an observation is above or below the mean, shown by positive or negative values respectively.</p><p><strong>What if we add up all these deviations?</strong></p><p>Interestingly, they would balance out to zero due to the mix of positive and negative values. To overcome this, we square each difference to remove the sign and find their average. This yields the variance, indicating how spread out the data is.</p><p>Standard deviation, derived from the variance, provides a standardised measure of dispersion. It involves taking the positive square root of the variance. This process ensures we're dealing with values in the same units as the original data. In the following section, we'll delve into the formula for calculating standard deviation.⁽<a href="https://www.kotaksecurities.com/investing-guide/articles/heres-how-to-read-and-use-the-standard-deviation-indicator-in-trading/">¹</a>⁾</p><hr><h2 id="formula-of-standard-deviation">Formula of standard deviation</h2><p>The formula for calculating the standard deviation (denoted by σ) is as follows:</p><p>$$σ = \sqrt{\frac{\sum_{i=1}^{n}(x_i-μ)^2}{n}}$$</p><p>$$\text{Where,}\\<br>σ\;\text{is the standard deviation}\\<br>x_i\;\text{represents each individual data point}\\<br>μ\;\text{is the mean of the data set}\\<br>Σ\;\text{denotes the sum of all the values}\\<br>n\;\text{is the total number of data points}$$</p><p>Going forward, we will discuss the calculation of standard deviation.</p><hr><h2 id="how-to-calculate-standard-deviation">How to calculate standard deviation?</h2><p>To calculate the standard deviation using Python, you can utilise libraries such as pandas and numpy. 
Here's a step-by-step guide to calculating the standard deviation of a simple data set:</p><h3 id="step-1-install-required-libraries-and-import-packages">Step 1: Install required libraries and import packages</h3><p>If you haven't already installed numpy, you can install it using pip:</p><!--kg-card-begin: html--><script src="https://gist.github.com/quantra-go-algo/1f74fcef34acf66f7d11b2cc25197793.js"></script><!--kg-card-end: html--><h3 id="step-2-define-an-example-data">Step 2: Define example data</h3><p>Here we have taken an array of numbers to show the calculation.</p><!--kg-card-begin: html--><script src="https://gist.github.com/quantra-go-algo/35b51af2342271521c77b405085ffe4c.js"></script><!--kg-card-end: html--><h3 id="step-3-compute-standard-deviation">Step 3: Compute standard deviation</h3><!--kg-card-begin: html--><script src="https://gist.github.com/quantra-go-algo/da591c1a32cda8b5a62eab91f75fd4cd.js"></script><!--kg-card-end: html--><p>Output:</p><!--kg-card-begin: html--><pre>Standard Deviation of Data: 7.0710678118654755</pre><!--kg-card-end: html--><p>We will now discuss some examples of the applications of standard deviation in general as well as in trading.</p><hr><h2 id="examples-of-applications-of-standard-deviation">Examples of applications of standard deviation</h2><p>Let us check out some general examples of applying standard deviation before heading to the trade-oriented examples.</p><h3 id="general-examples">General examples</h3><p>The term standard deviation sounds like something you hear in a statistics class, but don’t dismiss it as an overly technical term just yet. It can be used in different aspects of our lives.</p><p>A teacher can use the standard deviation of her students' marks in an exam as a metric to assess the overall level of understanding of the subject. If the mean and standard deviation are both high, it indicates that, on average, students have a good understanding of the subject, but many students have scores well above and well below the average.</p><p>If the mean is high and the standard deviation is low, the average scores are similar to the previous case, but the low standard deviation tells her that most students have scores that are close (i.e. slightly above and slightly below) to the mean.</p><p>In weather forecasting, standard deviation can be used to compare the weather patterns in two or more regions. If we compare the standard deviation of temperatures in Jaisalmer (which has extreme weather) with Mumbai (which has moderate weather), we would find that the former has more variability in temperature around the mean.</p><h3 id="examples-of-application-in-trading">Examples of application in trading</h3><p><strong>Example: Amazon (AMZN) Stock</strong></p><ul><li><a href="https://quantra.quantinsti.com/course/volatility-trading-strategies"><strong>Volatility Analysis</strong></a><strong>:</strong> Traders analysing Amazon's stock may calculate the standard deviation of its daily returns over a specific period, such as the past year. 
A higher standard deviation indicates greater price volatility, implying larger price swings.<br>For instance, if Amazon's stock has a daily standard deviation of 2% over the past year, it suggests that, on average, the daily price movements deviate by 2% from the mean daily return.<br></li><li><a href="https://quantra.quantinsti.com/course/options-trading-strategies-python-basic"><strong>Options Trading</strong></a><strong>:</strong> Standard deviation is a crucial factor in <a href="https://quantra.quantinsti.com/course/options-trading-strategies-python-intermediate">options</a> pricing models like the Black-Scholes model. Traders estimating the implied volatility of Amazon's options contracts may use historical standard deviation as a reference point.<br>For instance, if the historical standard deviation of Amazon's stock is 20% and the implied volatility of its <a href="https://quantra.quantinsti.com/course/options-trading-strategies-python-advanced">options</a> is significantly higher, it might suggest that options are relatively expensive, potentially presenting trading opportunities.<br>To build on your understanding of implied volatility and options pricing, consider exploring <a href="https://quantra.quantinsti.com/course/options-volatility-trading">trading volatility options</a> course, where you’ll gain practical experience in strategies and techniques for analyzing and capitalizing on market fluctuations.<br></li><li><a href="https://blog.quantinsti.com/tag/portfolio-risk-management/"><strong>Risk Management</strong></a><strong>:</strong> Investors holding Amazon's stock in their portfolio may use standard deviation to assess and manage risk. By calculating the standard deviation of Amazon's daily returns, investors can estimate the potential range of price movements and set stop-loss orders or position sizes accordingly.<br>For instance, if an investor is comfortable with a certain level of risk, they may adjust their position size based on Amazon's historical standard deviation to align with their risk tolerance.</li></ul><hr><h2 id="use-cases-of-standard-deviation">Use cases of standard deviation</h2><p>Here are the use cases of standard deviation in risk assessment, volatility analysis, and portfolio management:</p><figure class="kg-card kg-image-card kg-width-full"><img src="https://d1rwhvwstyk9gu.cloudfront.net/2024/03/Use-cases.png" class="kg-image" alt="Use cases of standard deviation"></figure><h3 id="risk-assessment">Risk Assessment</h3><ul><li><strong>Credit Risk Evaluation: </strong>In financial institutions, standard deviation is used to assess the variability of returns on loans or investments. A higher standard deviation indicates higher volatility, implying greater risk. Lenders may use standard deviation to evaluate the creditworthiness of borrowers and determine appropriate interest rates.</li><li><strong>Market Risk Management: </strong>Standard deviation helps quantify market risk by measuring the variability of asset prices or portfolio returns. Traders and investors use standard deviation to assess the potential downside risk of their investments and implement risk mitigation strategies accordingly.</li></ul><h3 id="volatility-analysis">Volatility Analysis</h3><ul><li><strong>Options Pricing: </strong>Standard deviation is a key input in options pricing models like the Black-Scholes model. A higher standard deviation implies higher implied volatility, leading to higher option premiums. 
Traders use standard deviation to estimate the future volatility of underlying assets and determine the fair value of options contracts.</li><li><strong>Technical Analysis: </strong>Standard deviation is used to calculate volatility indicators such as Bollinger Bands. These bands consist of a moving average and upper and lower bands representing standard deviations from the mean. Traders use Bollinger Bands to identify potential buy or sell signals based on volatility levels.</li></ul><h3 id="portfolio-management">Portfolio Management</h3><ul><li><strong>Diversification: </strong>Standard deviation is used to measure the risk of individual assets and portfolios. By diversifying investments across assets with low or negatively correlated returns, investors can reduce portfolio risk. Standard deviation helps investors assess the effectiveness of diversification strategies and optimise asset allocation to achieve desired risk-return profiles.</li><li><strong>Risk-adjusted Performance: </strong>Standard deviation is used to calculate risk-adjusted performance measures such as the Sharpe ratio and the Sortino ratio. These ratios quantify the excess return generated per unit of risk (measured by standard deviation). Portfolio managers use these metrics to evaluate investment strategies and compare the risk-adjusted returns of different portfolios.⁽²⁾</li></ul><p>Let us now head to the essential components of standard deviation that are used in its calculation.</p><hr><h2 id="essential-components-of-standard-deviation">Essential components of standard deviation</h2><p>Let us now see the essential components that are required for calculating standard deviation in the trading domain.</p><p>These are:</p><figure class="kg-card kg-image-card kg-width-full"><img src="https://d1rwhvwstyk9gu.cloudfront.net/2024/03/Essential-components.png" class="kg-image" alt="Essential components of standard deviation"></figure><h3 id="unit-of-standard-deviation">Unit of standard deviation</h3><p>The unit of standard deviation would be the same as the unit of our data. This makes it easier to interpret compared to the variance. In the next section, we do a detailed comparison between these two measures of dispersion.</p><h3 id="standard-deviation-vs-variance">Standard deviation vs Variance</h3><p>The variance (σ<sup>2</sup>) of a random variable X is given by the formula below:</p><p>$$Variance=\frac{\sum_{i=1}^N(x_i-μ)^2}{N}$$</p><p>As we can see, by its very construction, the variance is in the square of the original unit. This means that if we are dealing with distances in kilometres, the unit of variance would be in square kilometres.</p><p>Now, square kilometres may be easy to visualise as a unit, but what about year<sup>2</sup> or IQ<sup>2</sup>, if we are working with the ages or IQs of a group? They are harder to interpret. Hence, it makes sense to use a measure that is comparable to the data on the same scale/units, like the standard deviation.</p><p>Standard deviation is calculated as the square root of variance. It has the same unit as our data and this makes it easy to use and interpret. For example, consider a scenario where we are looking at a dataset of the heights of residents of a neighbourhood. 
Assume that the heights are normally distributed with a mean of 165 cm and a standard deviation of 5 cm.</p><p>We know that for a <a href="https://quantra.quantinsti.com/glossary/Standard-Normal-Distribution">normal distribution</a>,</p><ul><li>68% of the data points fall within one standard deviation,</li><li>95% within two standard deviations, and</li><li>99.7% fall within three standard deviations from the mean.</li></ul><figure class="kg-card kg-image-card kg-width-full"><img src="https://d1rwhvwstyk9gu.cloudfront.net/2024/03/standard-nominal-distribution.png" class="kg-image" alt="Standard normal distribution"></figure><p>Thus, we can conclude that the heights of almost 68% of the residents would lie within one standard deviation of the mean, i.e., between 160 cm (mean – sd) and 170 cm (mean + sd).<a href="https://www.geeksforgeeks.org/difference-between-variance-and-standard-deviation/">⁽³⁾</a></p><hr><h3 id="standard-deviation-for-sample-data-bessel-s-correction">Standard deviation for sample data - Bessel's correction</h3><p>When calculating the standard deviation of a population, we use the formula discussed above. However, we modify it slightly when dealing with a sample instead.</p><p>This is because the sample is much smaller compared to the entire population. To account for differences between a randomly selected sample and the entire population, we ‘unbias’ the calculation by using '(n-1)' instead of 'n' in the denominator of the formula above. This is referred to as Bessel's correction.<a href="https://www.uio.no/studier/emner/matnat/math/MAT4010/data/forelesningsnotater/bessel-s-correction---wikipedia.pdf">⁽⁴⁾</a></p><p>Thus, we use the following formula to calculate the sample standard deviation (s).</p><p>$$s= \sqrt{\frac{\sum_{i=1}^n (x_i-\bar x)^2}{n-1}}$$</p><p>$$\text{Where,}\\<br>x_i = \text{value of the } i\text{th point in the sample}\\<br>\bar{x} = \text{sample mean}\\<br>n = \text{total number of data points in the sample}$$</p><p>Do note that as the sample size n gets larger, the difference between dividing by (n - 1) and dividing by n becomes smaller.</p>
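<p>A quick way to see Bessel's correction in action is to compare numpy's population and sample standard deviations on a small, made-up sample (we return to the <em>ddof</em> argument later in this article):</p><!--kg-card-begin: html--><pre>
import numpy as np

sample = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

# Population standard deviation: divides by n (numpy's default, ddof=0)
print(np.std(sample))          # 2.0

# Sample standard deviation with Bessel's correction: divides by n-1 (ddof=1)
print(np.std(sample, ddof=1))  # ~2.138
</pre><!--kg-card-end: html-->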
<p>Now, we can discuss standard deviation in trading as a measure of volatility.</p><hr><h2 id="standard-deviation-in-trading-as-a-measure-of-volatility">Standard deviation in trading as a measure of volatility</h2><p>In trading and finance, it is important to quantify the volatility of an asset. An asset’s volatility, unlike its return or price, is an unobserved variable.</p><p>Standard deviation has a special significance in risk management and performance analysis as it is often used as a proxy for the volatility of a security. For example, well-established blue-chip securities have a lower standard deviation in their returns compared to that of small-cap stocks.</p><p>On the other hand, assets like cryptocurrency have a higher standard deviation, as their returns vary widely from their mean. To explore trading approaches for such assets, consider an intermediate <a href="https://quantra.quantinsti.com/course/crypto-trading-strategies-intermediate">crypto trading course</a>.</p><p>Moving forward, let us discuss the computation of the annualised volatility of stocks using Python.</p><hr><h2 id="computing-annualised-volatility-of-stocks-using-python">Computing annualised volatility of stocks using Python</h2><p>Let us now compute and compare the annualised volatility for two Indian stocks, namely ITC and Reliance. We begin by fetching the end-of-day close price data using the yfinance library for the last 5 years:</p><!--kg-card-begin: html--><script src="https://gist.github.com/quantra-go-algo/51b44057ea9d4a8e150f4a671653447b.js"></script><!--kg-card-end: html--><p>Output:</p><!--kg-card-begin: html--><pre>Date           Adj Close
2021-10-19     245.949997
2021-10-20     246.600006
2021-10-21     244.699997
2021-10-22     236.600006
2021-10-25     234.350006</pre><!--kg-card-end: html--><!--kg-card-begin: html--><script src="https://gist.github.com/quantra-go-algo/8d46c32b06156292169503a1e3d08838.js"></script><!--kg-card-end: html--><p>Output:</p><!--kg-card-begin: html--><pre>Date          Adj Close
2021-10-19   2731.850098
2021-10-20   2700.399902
2021-10-21   2622.500000
2021-10-22   2627.399902
2021-10-25   2607.300049</pre><!--kg-card-end: html--><p>Below, we calculate the daily returns using the <em>pct_change()</em> method and the standard deviation of those returns using the <em>std()</em> method to get the daily volatilities of the two stocks:</p><!--kg-card-begin: html--><script src="https://gist.github.com/quantra-go-algo/343280687eda287260816ad36da59fc3.js"></script><!--kg-card-end: html--><p>Output:</p><!--kg-card-begin: html--><pre>Date         Adj Close   Returns
2016-10-25   511.991608       NaN
2016-10-26   508.709717 -0.006410
2016-10-27   506.127686 -0.005076
2016-10-28   509.144104  0.005960
2016-11-01   507.237701 -0.003744
...                 ...       ...
2021-10-19  2731.850098  0.008956
2021-10-20  2700.399902 -0.011512
2021-10-21  2622.500000 -0.028848
2021-10-22  2627.399902  0.001868
2021-10-25  2607.300049 -0.007650</pre><!--kg-card-end: html--><!--kg-card-begin: html--><script src="https://gist.github.com/quantra-go-algo/ab1af16fa12be5f21bc86ec4b47bb4d1.js"></script><!--kg-card-end: html--><p>Output:</p><!--kg-card-begin: html--><pre>Date            Adj Close	 Returns
2016-10-26	508.709717	-0.006410
2016-10-27	506.127686	-0.005076
2016-10-28	509.144104	 0.005960
2016-11-01	507.237701	-0.003744
2016-11-02	494.086243	-0.025928</pre><!--kg-card-end: html--><p>In general, the volatility of assets is quoted in annual terms. So below, we convert the daily volatilities to annual volatilities by multiplying by the square root of 252 (the number of trading days in a year):</p><!--kg-card-begin: html--><script src="https://gist.github.com/quantra-go-algo/4567b231e8f4a37560056501ae5e8022.js"></script><!--kg-card-end: html--><p>Output:</p><!--kg-card-begin: html--><pre>The annualized standard deviation of the ITC stock daily returns is: 27.39%
The annualized standard deviation of the Reliance stock daily returns is: 31.07%</pre><!--kg-card-end: html--><p>Now we will compute the standard deviation with Bessel's correction. To do this, we provide a ddof parameter to the Numpy std function. Here, <em>ddof</em> means 'Delta Degrees of Freedom'.</p><p>By default, Numpy uses <em>ddof=0</em> for calculating standard deviation; this is the standard deviation of the population. For calculating the standard deviation of a sample, we give <em>ddof=1</em>, so that in the formula, (n−1) is used as the divisor. Below, we do the same:</p><!--kg-card-begin: html--><script src="https://gist.github.com/quantra-go-algo/51eadcf7196f05b5b399ddc03788243b.js"></script><!--kg-card-end: html--><p>Output:</p><!--kg-card-begin: html--><pre>The annualized standard deviation of the ITC stock daily returns with Bessel's correction is: 27.39%

The annualized standard deviation of the Reliance stock daily returns with Bessel's correction is: 31.07%</pre><!--kg-card-end: html--><p>Thus, we can observe that, as the sample size is very large, Bessel's correction does not have much impact on the obtained values of standard deviation. In addition, based on the given data, we can say that the Reliance stock is more volatile compared to the ITC stock.</p><p><em><strong>Note:</strong> The purpose of this illustration is to show how standard deviation is used in the context of the financial <a href="https://quantra.quantinsti.com/course/getting-market-data">markets</a>, in a highly simplified manner. There are factors such as rolling statistics (outside the scope of this write-up) that should be explored when using these concepts in strategy implementation.</em></p><h3 id="the-z-score">The z-score</h3><p>The z-score is a metric that tells us how many standard deviations away a particular data point is from the mean. It can be negative or positive. A positive z-score, like 1, indicates that the data point lies one standard deviation above the mean and a negative z-score, like -2, implies that the data point lies two standard deviations below the mean.</p><p>In financial terms, when calculating the z-score on the returns of an asset, a higher absolute value of the z-score (whether positive or negative) means that the return of the security differs significantly from its mean value. So, the z-score tells us how well the data point conforms to the norm.</p><p>Usually, if the absolute value of the z-score of a data point is very high (say, more than 3), it indicates that the data point is quite different from the other data points. We use standard deviation to calculate the z-score using the following formula in case we have sample data:</p><p>$$z=\frac{x_i-\bar x}{s}$$</p><p>$$Where,\\ x_i = a\; single\; data\; point \\ \bar x = the\; sample\; mean\\ s = the\; sample\; standard\; deviation$$</p><p>Below we calculate and plot the z-scores for the ITC stock returns using the above formula in Python:</p><!--kg-card-begin: html--><script src="https://gist.github.com/quantra-go-algo/facddad80796e6827a31dad9dcce2f1d.js"></script><!--kg-card-end: html--><p>Output:</p><figure class="kg-card kg-image-card kg-width-full"><img src="https://d1rwhvwstyk9gu.cloudfront.net/2024/03/zscores_plot.png" class="kg-image" alt="Z-scores"></figure><p>From the above figure, we observe that around March of 2020, the ITC stock returns had a z-score reaching below -3 several times, indicating that the returns were more than 3 standard deviations below the mean for the given data sample. As we know, this was during the sell-off triggered by the COVID pandemic.</p><p>In addition, a standardised measure like the z-score is used widely to generate signals for mean-reverting trading strategies such as <a href="https://blog.quantinsti.com/pairs-trading-basics/#z-score">pairs trading</a>.</p><p>Also, one can use the <em>zscore</em> function from the <em>scipy.stats</em> module to calculate the z-scores as follows:</p><!--kg-card-begin: html--><script src="https://gist.github.com/quantra-go-algo/facddad80796e6827a31dad9dcce2f1d.js"></script><!--kg-card-end: html--><p>Output:</p><!--kg-card-begin: html--><pre>Date		 Adj Close	 Returns     Returns_zscore
2021-10-19	2731.850098	 0.008956	 0.380491
2021-10-20	2700.399902	-0.011512	-0.665617
2021-10-21	2622.500000	-0.028848	-1.551575
2021-10-22	2627.399902	 0.001868	 0.018247
2021-10-25	2607.300049	-0.007650	-0.468222</pre><!--kg-card-end: html--><h3 id="value-at-risk">Value at Risk</h3><p>Value at Risk (VaR) is an important financial risk management metric that quantifies the maximum loss that can be realized in a given time with a given level of confidence/probability for a given strategy, portfolio or trading desk.</p><p>It can be computed in three ways, one of which is the variance-covariance method. In this method, we assume that the returns are normally distributed for the lookback period. Understand how <a href="https://blog.quantinsti.com/calculating-value-at-risk-in-excel-python/">VaR calculation</a> can help enhance your skills in financial risk management.</p><p>The idea is simple. We calculate the z-score of the returns of the strategy based on the confidence level we want and then multiply it with the standard deviation to get the VaR. To get the VaR in dollar terms, we can multiply it with the investment in the strategy.</p><p>For example, if we want the 95% confidence VaR, we are essentially finding the cut-off point for the worst 5% of the losses from the returns distribution. If we assume that the stock returns are normally distributed, then their z-scores will have a standard normal distribution. So, the cut-off point for the worst 5% returns is -1.64:</p><figure class="kg-card kg-image-card kg-width-full"><img src="https://d1rwhvwstyk9gu.cloudfront.net/2024/03/VaR.png" class="kg-image" alt="VaR"></figure><p>Thus the 1-year 95% VaR of a simple strategy of investing in the ITC stock is given by:</p><p>VaR = (−1.64) ∗ (s) ∗ investment</p><p>where, s is the annualized standard deviation of the ITC stocks.</p><!--kg-card-begin: html--><script src="https://gist.github.com/quantra-go-algo/378d10e4ecc27233a78e5ebb531f56e0.js"></script><!--kg-card-end: html--><p>Output:</p><!--kg-card-begin: html--><pre>z_score_cut_off
-1.6448536269514722
VaR = z_score_cut_off * annual_standard_deviation * initial_investment
VaR
-45045.34407051503</pre><!--kg-card-end: html--><p>Thus, we can say that the maximum loss that can be realised in 1 year with 95% confidence is INR 45045. Of course, this was calculated under the assumption that ITC stock returns follow a normal distribution.</p><h3 id="confidence-intervals">Confidence intervals</h3><p>Another common use case for standard deviation is in computing confidence intervals. In general, when we work with data, we assume that the population from which the data has been generated follows a certain distribution and the population parameters for that distribution are not known. These population parameters have to be estimated using the sample.</p><p>For example, the mean daily return of the ITC stock is a population parameter, which we try to estimate using the sample mean. This gives us a point estimate. However, <a href="https://quantra.quantinsti.com/course/stock-market-basics">financial market</a> forecasts are probabilistic, and hence, it would make more sense to work with an interval estimate rather than a point estimate.</p><p>A confidence interval gives a probable estimated range within which the value of the population parameter may lie. Assuming the data to be normally distributed, we can use the empirical rule to describe the percentage of data that falls within 1, 2, and 3 standard deviations from the mean.</p><ol><li>About 68% of the values lie between -1 and +1 standard deviation from the mean.</li><li>About 95% of the values lie within two standard deviations from the mean.</li><li>About 99.7% of the values lie within three standard deviations from the mean.</li></ol><!--kg-card-begin: html--><script src="https://gist.github.com/quantra-go-algo/9950b1db8dd237d506e1f26ab97413be.js"></script><!--kg-card-end: html--><p>Output:</p><!--kg-card-begin: html--><pre>The 95% confidence interval of the ITC stock daily returns is: [-0.03,0.03]</pre><!--kg-card-end: html--><p>Thus, we can say with 95% confidence that the stock’s daily returns will lie within a range of -3% to +3% (assuming the ITC stock returns are normally distributed).</p><p>Let us now discuss some real-world case studies of standard deviation in the trading domain to make the concept clearer.</p><hr><h2 id="real-world-case-studies-of-standard-deviation">Real-world Case Studies of standard deviation</h2><p>Here are a couple of real-world case studies demonstrating the application of standard deviation in different markets and its impact on trading decisions:</p><h3 id="case-study-standard-deviation-in-forex-trading">Case Study: Standard Deviation in Forex Trading</h3><ul><li><strong>Application: </strong>Forex traders often use standard deviation to measure the volatility of currency pairs and assess the risk associated with their trading positions. For example, consider a trader who is analysing the EUR/USD currency pair. By calculating the standard deviation of the pair's daily returns over a specific period, the trader can gauge the level of price volatility.</li><li><strong>Impact on Trading Decisions: </strong>If the standard deviation of EUR/USD's daily returns is relatively high, it indicates greater price volatility and potentially higher risk. In such cases, the trader may adjust their position size or set wider stop-loss orders to account for the increased volatility. 
Conversely, if the standard deviation is low, the trader may opt for tighter risk management measures.</li></ul><h3 id="case-study-standard-deviation-in-stock-market-trading">Case Study: Standard Deviation in Stock Market Trading</h3><ul><li><strong>Application:</strong> <a href="https://quantra.quantinsti.com/course/stock-market-basics">Stock market</a> traders use standard deviation to assess the risk and volatility of individual stocks or entire portfolios. For instance, consider an investor analysing the standard deviation of Apple Inc. (AAPL) stock returns over the past year. By calculating the standard deviation, the investor can quantify the level of price variability in AAPL stock. Start building your understanding of the stock market with our free <a href="https://quantra.quantinsti.com/course/stock-market-basics">Stock Market Beginner Course</a>.</li><li><strong>Impact on Trading Decisions:</strong> If the standard deviation of AAPL's returns is high, it suggests that the stock experiences significant price fluctuations, indicating higher risk. In response, traders may adopt risk mitigation strategies such as diversification or hedging. Conversely, a low standard deviation implies lower volatility and may lead traders to adjust their trading strategies accordingly, potentially by pursuing more aggressive trading opportunities.</li></ul><p>Now that you are familiar with most of the standard-deviation-related concepts, in the next section you will see how standard deviation relates to other indicators.</p><hr><h2 id="correlation-of-standard-deviation-with-other-indicators">Correlation of standard deviation with other indicators</h2><p>The correlation of standard deviation with other indicators can provide valuable insights into market dynamics and help traders make informed decisions. Here are some common indicators with which standard deviation is often correlated (a short sketch follows this list):</p><figure class="kg-card kg-image-card kg-width-full"><img src="https://d1rwhvwstyk9gu.cloudfront.net/2024/03/Correlation-of-standard-deviation.png" class="kg-image" alt="Correlation of standard deviation"></figure><ul><li><strong>Mean (Average): </strong>The mean and standard deviation are closely related. A higher standard deviation indicates greater variability of data points around the mean, while a lower standard deviation suggests less variability.</li><li><strong>Variance: </strong>Variance is the square of the standard deviation. As such, they are directly related. Higher variance implies a higher dispersion of data points from the mean, leading to a higher standard deviation.</li><li><strong>Volatility Measures: </strong>Standard deviation is a key component of volatility measures such as historical volatility and implied volatility. These measures assess the magnitude of price fluctuations in financial instruments. High standard deviation values indicate high volatility, while low values suggest low volatility.</li><li><strong>Bollinger Bands: </strong>Bollinger Bands consist of a moving average line and upper and lower bands, which are typically set at a certain number of standard deviations above and below the moving average. Changes in standard deviation affect the width of the bands, with wider bands indicating higher volatility and narrower bands suggesting lower volatility.</li><li><strong>Sharpe Ratio: </strong>The Sharpe ratio measures the risk-adjusted return of an investment. 
It is calculated by dividing the excess return (the return above the risk-free rate) by the standard deviation of returns. A higher standard deviation leads to a lower Sharpe ratio, indicating higher risk for a given level of return.</li><li><strong>Sortino Ratio: </strong>Similar to the Sharpe ratio, the Sortino ratio measures risk-adjusted return but focuses only on downside risk, considering only the standard deviation of negative returns. A higher standard deviation of negative returns leads to a lower Sortino ratio, indicating higher downside risk.</li></ul>
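<p>To make the Bollinger Bands and Sharpe ratio relationships above concrete, here is a minimal, self-contained sketch. The random-walk price series and the 20-day, 2-standard-deviation settings are illustrative assumptions, not a recommendation:</p><!--kg-card-begin: html--><pre>
import numpy as np
import pandas as pd

# Illustrative price series: a random walk standing in for real close prices
rng = np.random.default_rng(42)
close = pd.Series(100 * np.exp(np.cumsum(rng.normal(0.0005, 0.01, 500))))

# Bollinger Bands: 20-day moving average +/- 2 rolling standard deviations
ma = close.rolling(20).mean()
sd = close.rolling(20).std()
upper, lower = ma + 2 * sd, ma - 2 * sd

# Sharpe ratio: annualised mean excess return divided by annualised volatility
returns = close.pct_change().dropna()
risk_free_daily = 0.0  # assume a zero risk-free rate for simplicity
sharpe = (returns.mean() - risk_free_daily) / returns.std() * np.sqrt(252)

print(f"Bollinger band width on the last day: {upper.iloc[-1] - lower.iloc[-1]:.2f}")
print(f"Annualised Sharpe ratio: {sharpe:.2f}")
</pre><!--kg-card-end: html-->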
<p>Let us now see some limitations of standard deviation in trading.</p><hr><h2 id="limitations-of-standard-deviation-in-trading">Limitations of standard deviation in trading</h2><p>While standard deviation is a widely used and valuable tool in trading, it does have several limitations that traders should be aware of:</p><ul><li><strong>Assumption of Normal Distribution: </strong>Standard deviation assumes that the data follows a normal distribution. However, financial markets often exhibit non-normal distributions, such as fat tails or skewness. In such cases, the standard deviation may not accurately capture the true risk and volatility of the market.</li><li><strong>Sensitivity to Outliers: </strong>Standard deviation is highly sensitive to outliers, or extreme values, in the data. A single outlier can significantly affect the standard deviation, leading to potentially misleading results. Traders should be cautious when interpreting standard deviation in the presence of outliers.</li><li><strong>Equal Weighting of Data Points:</strong> Standard deviation treats all data points equally, regardless of their significance or relevance. In financial markets, recent data points may be more informative than older ones, especially in fast-moving markets. Standard deviation may not adequately reflect changes in market conditions or sentiment.</li><li><strong>Limited Interpretation of Volatility:</strong> Standard deviation measures total volatility, including both upside and downside movements. However, traders may be more interested in downside volatility, as it represents the risk of losses. Other measures such as the downside deviation or semi-deviation may provide more relevant insights into downside risk.</li><li><strong>Lack of Context: </strong>Standard deviation provides a numerical measure of volatility but does not provide any context or explanation for the observed variability. Traders should complement standard deviation with qualitative analysis and market knowledge to fully understand the drivers of volatility and risk.</li><li><strong>Inability to Capture Non-linear Relationships: </strong>Standard deviation assumes a linear relationship between data points, which may not always hold true in financial markets. Complex interactions and non-linear relationships between variables may not be fully captured by standard deviation alone.</li></ul><p>Overall, while standard deviation is a useful tool for measuring volatility and risk in trading, traders should be mindful of its limitations and use it in conjunction with other tools and techniques for a comprehensive analysis of market conditions.</p><p>Next, let us look at some common misconceptions about standard deviation in trading.</p><hr><h2 id="common-misconceptions-about-standard-deviation-in-trading">Common misconceptions about standard deviation in trading</h2><p>Several misconceptions about standard deviation exist in trading, which can lead to misinterpretation of market data and incorrect decision-making. Here are some common misconceptions:</p><figure class="kg-card kg-image-card kg-width-full"><img src="https://d1rwhvwstyk9gu.cloudfront.net/2024/03/Common-misconceptions.png" class="kg-image" alt="Common misconceptions of standard deviation"></figure><ul><li><strong>Standard Deviation Predicts Future Returns: </strong>One common misconception is that high standard deviation implies high returns and vice versa. While volatility can indicate potential opportunities for maximising returns, it does not guarantee future returns. High volatility can also lead to significant losses if not managed properly.</li><li><strong>Standard Deviation is the Only Measure of Risk: </strong>While standard deviation is widely used to measure volatility and risk, it is not the only measure of risk. Other factors such as correlation, liquidity, and fundamental analysis should also be considered when assessing risk in trading.</li><li><strong>Standard Deviation Reflects Market Direction: </strong>Some traders mistakenly believe that changes in standard deviation indicate the direction of the market. However, standard deviation measures volatility, not market direction. It is possible for standard deviation to increase or decrease even if the market remains relatively unchanged.</li><li><strong>Standard Deviation is Static: </strong>Another misconception is that standard deviation remains constant over time. In reality, volatility can change dynamically in response to various factors such as news events, market sentiment, and economic conditions. Traders should regularly monitor and adjust their risk management strategies accordingly.</li><li><strong>Standard Deviation Measures Risk in Isolation:</strong> While standard deviation quantifies the variability of returns, it does not account for other factors that may influence risk, such as leverage, position size, and trading frequency. Traders should consider these factors holistically when assessing risk in their portfolios.</li><li><strong>Standard Deviation Provides Complete Information:</strong> Traders may mistakenly believe that standard deviation provides a comprehensive understanding of market risk. While standard deviation is a useful tool, it has limitations and should be used in conjunction with other risk measures and analysis techniques for a more accurate assessment of market conditions.</li></ul><p>By understanding and avoiding these misconceptions, traders can make more informed decisions and better manage risk in their trading activities.</p><p>Now let us discuss some risk management tips for using standard deviation.
These tips can help traders put this useful concept into practice.</p><hr><h2 id="risk-management-tips-for-using-standard-deviation">Risk management tips for using standard deviation</h2><p>Using standard deviation as part of a <a href="https://blog.quantinsti.com/trading-risk-management/">risk management trading</a> strategy can help traders better understand and mitigate risks in their trading activities.</p><p>Here are some tips for incorporating standard deviation into your risk management approach:</p><figure class="kg-card kg-image-card kg-width-full"><img src="https://d1rwhvwstyk9gu.cloudfront.net/2024/03/Risk-management-tips.png" class="kg-image" alt="Tips using standard deviation"></figure><ul><li><strong>Set Risk Tolerance Levels:</strong> Determine your risk tolerance level based on factors such as your investment objectives, time horizon, and personal risk preferences. Use standard deviation to quantify the potential volatility and downside risk of your trades and investments.</li><li><strong>Use Stop-loss Orders: </strong>Set stop-loss orders based on the standard deviation of asset prices or portfolio returns. Place stop-loss levels at a certain number of standard deviations away from the mean to limit losses and protect capital in case of adverse price movements (a sketch follows after this list).</li><li><strong>Position Sizing: </strong>Adjust position sizes based on the standard deviation of asset returns. Increase position sizes for assets with lower volatility and decrease position sizes for assets with higher volatility to maintain consistent risk exposure across your portfolio.</li><li><strong>Diversify Your Portfolio:</strong> Diversification can help reduce overall portfolio risk by spreading investments across different asset classes, sectors, and geographical regions. Use standard deviation to assess the correlation between assets and ensure that your portfolio is adequately diversified.</li><li><strong>Monitor and Rebalance Regularly: </strong>Monitor the standard deviation of asset prices and portfolio returns regularly to identify changes in market conditions and adjust your risk management strategy accordingly. Rebalance your portfolio periodically to maintain desired risk levels and adapt to evolving market trends.</li><li><strong>Consider Risk-adjusted Performance:</strong> Evaluate the risk-adjusted performance of your trades and investments using metrics such as the Sharpe ratio or Sortino ratio, which take into account both returns and volatility. Aim to achieve positive risk-adjusted returns by optimising your risk-return trade-off.</li><li><strong>Stay Informed and Adapt:</strong> Stay informed about market news, economic indicators, and geopolitical events that may impact asset prices and market volatility. Be prepared to adjust your risk management strategy in response to changing market conditions and unexpected developments.</li></ul><p>By incorporating these risk management tips into your trading approach and leveraging standard deviation as a tool for measuring and managing risk, you can improve your chances of achieving long-term success and preserving capital in the financial markets.</p>
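<p>Here is a minimal sketch of the stop-loss and position-sizing tips above (our illustration, with hypothetical numbers): the stop is placed two standard deviations below the entry price, and the position is sized so that hitting the stop loses roughly 1% of capital.</p><!--kg-card-begin: markdown-->
```python
import numpy as np

# Illustrative inputs (hypothetical values)
entry_price = 100.0       # entry price of the asset
daily_std = 0.02          # standard deviation of daily returns (2%)
capital = 100_000.0       # trading capital
risk_per_trade = 0.01     # risk at most 1% of capital per trade

# Stop-loss placed 2 standard deviations below the entry price
stop_loss = entry_price * (1 - 2 * daily_std)

# Size the position so that hitting the stop loses ~1% of capital
risk_per_share = entry_price - stop_loss
position_size = int((capital * risk_per_trade) / risk_per_share)

print(f"stop-loss: {stop_loss:.2f}, shares: {position_size}")
```
<!--kg-card-end: markdown-->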
<hr><h3 id="conclusion">Conclusion</h3><p>Standard deviation is pivotal for traders, offering insights into volatility, risk, and informed decision-making. It quantifies the uncertainty and variability of returns, aiding in options pricing, portfolio management, and volatility analysis.</p><p>Despite its usefulness, traders must acknowledge its limitations and supplement it with qualitative judgement. By integrating standard deviation into risk management practices, traders can navigate market complexities more effectively, optimise risk-return profiles, and strive for success in financial markets.</p><p>If you wish to learn more about standard deviation, you can enrol in the course on <a href="https://quantra.quantinsti.com/course/volatility-trading-strategies">Volatility Trading Strategies for Beginners</a>. With this course, you will learn how volatility can be your friend if you have the right tools and knowledge. In this course, you will learn four different ways to measure volatility, namely ATR, standard deviation, VIX and Beta. Hence, you will learn how to set dynamic stop loss and take profit levels, hedge your portfolio using VIX and select stocks in your portfolio.</p><hr><p><strong>File in the download</strong></p><ul><li>Standard deviation in trading - Python notebook</li></ul><!--kg-card-begin: markdown--><p class="custom-download-link"><a href="https://blog.quantinsti.com/standard-deviation" class="download-button button"> Visit blog to download </a></p><!--kg-card-end: markdown--><hr><p><strong>Author:</strong> <a href="https://www.linkedin.com/in/chainika-bahl-thakar-b32971155/">Chainika Thakar</a> (Originally written by <a href="https://www.linkedin.com/in/ashutosh-dave-frm-2112551a/">Ashutosh Dave</a> and <a href="https://www.linkedin.com/in/udisha-alok/">Udisha Alok</a>)</p><hr><!--kg-card-begin: html--><p><em><small>Note: The original post has been revamped on 7th March 2024 for recency and accuracy.</small></em></p><!--kg-card-end: html--><!--kg-card-begin: html--><p><em><small>Disclaimer: All investments and trading in the stock market involve risk. Any decision to place trades in the financial markets, including trading in stock or options or other financial instruments is a personal decision that should only be made after thorough research, including a personal risk and financial assessment and the engagement of professional assistance to the extent you believe necessary. The trading strategies or related information mentioned in this article is for informational purposes only.</small></em></p><!--kg-card-end: html-->]]></content:encoded></item><item><title><![CDATA[Autocorrelation and Autocovariance: Calculation, Examples, and More]]></title><description><![CDATA[Autocorrelation and Autocovariance are essential in the time series analysis topic! This tutorial will guide you on their definitions, their computations and plotting using Python and R. Read now!]]></description><link>https://blog.quantinsti.com/autocorrelation-autocovariance/</link><guid isPermaLink="false">6335c364201dfc1b4e2bb459</guid><category><![CDATA[Mathematics and Econometrics]]></category><dc:creator><![CDATA[Jose Carlos Gonzales Tanaka]]></dc:creator><pubDate>Mon, 17 Oct 2022 09:03:51 GMT</pubDate><content:encoded><![CDATA[<p>By <a href="https://www.linkedin.com/in/jos%C3%A9-carlos-gonz%C3%A1les-tanaka-60859284/">José Carlos Gonzáles Tanaka</a></p><p>Autocorrelation and autocovariance are among the most critical metrics in financial time series econometrics. Both are built on the covariance and correlation measures, and this easy-to-follow guide will give you the grounding you need to understand ARMA models. 
</p><ul><li><a href="#what-is-autocovariance">What is autocovariance?</a></li><li><a href="#what-is-autocorrelation">What is autocorrelation?</a></li><li><a href="#what-are-the-autocovariance-and-autocorrelation-at-lag-zero">What are the autocovariance and autocorrelation at lag zero?</a></li><li><a href="#calculation-of-the-autocovariance-with-an-example">Calculation of autocovariance with an example</a></li><li><a href="#calculation-of-the-autocorrelation-with-an-example">Calculation of autocorrelation with an example</a></li><li><a href="#computation-of-autocovariance-and-autocorrelation-in-python">Computation of autocovariance and autocorrelation in Python</a></li><li><a href="#plot-the-autocorrelation-function-in-python">Plot the autocorrelation function in Python</a></li><li><a href="#computation-of-autocovariance-and-autocorrelation-in-r">Computation of autocovariance and autocorrelation in R</a></li><li><a href="#plot-the-autocorrelation-function-in-r">Plot the autocorrelation functions in R</a></li><li><a href="#what-is-partial-autocorrelation">What is partial autocorrelation?</a></li><li><a href="#computation-of-partial-autocorrelation-in-python-and-r">Computation of partial autocorrelation in Python and R</a></li></ul><p>You might have encountered yourself trying to learn the Autoregressive <a href="https://blog.quantinsti.com/moving-average-trading-strategies/">Moving Average</a> (ARMA) model. You then started to see a lot of use of covariances and correlations, but strangely enough, you see those two words with the prefix "<em>auto" </em>and you get frightened! </p><p>Don’t worry, this article will help you understand their details. Just keep the focus on the article and everything will be ok!</p><hr><h2 id="what-is-autocovariance">What is autocovariance?</h2><p>First, you need to understand what <a href="https://blog.quantinsti.com/covariance-correlation/">covariance and correlation</a> are. Remember that covariance is applied to 2 assets. The autocovariance is the same as the covariance. </p><p>The only difference is that the autocovariance is applied to the same asset, i.e., you compute the covariance of the asset price return X with the same asset price return X, but from a previous period. </p><p><em>How’s that possible?</em> Simple, check it out:</p><!--kg-card-begin: html-->$$ \text{Cov(X,Y)} =  \frac{\sum_{i=1}^{n}\left(X_{i}-\overline{X}\right)\left(Y_{i}-\overline{Y}\right)}{N-1}$$<!--kg-card-end: html--><p>Where X and Y can be the returns of asset X and Y, respectively.</p><p>Now, the autocovariance function can be defined as:</p><!--kg-card-begin: html-->$$ \gamma_{s} =  \frac{\sum_{t=1}^{T}\left(r_{t}-\overline{r}\right)\left(r_{t-s}-\overline{r}\right)}{N-1}$$<!--kg-card-end: html--><p>Where:</p><!--kg-card-begin: html--><li>\( \gamma_{s} \text{: Autocovariance at lag “}s\text{”}.\) </li>
<li>\( r_{t} \text{: Asset price returns at time “}t\text{”}.\) </li>
<li>\( r_{t-s} \text{: Asset price returns at time “}t-s\text{”}.\) </li>
<li>\( T \text{: Total number of observations}.\) </li><!--kg-card-end: html--><hr><h2 id="what-is-autocorrelation">What is autocorrelation?</h2><p>In simple terms, <a href="https://blog.quantinsti.com/autoregression/">autocorrelation</a> is the correlation function applied to a single asset! To be specific, autocorrelation, like the autocovariance, is applied to the same asset. </p><p>Check the difference between correlation and autocorrelation (also called serial correlation) below:</p><!--kg-card-begin: markdown--><p>$$ \text{Corr}(X,Y) = \frac{\sum_{i=1}^{n}\left(X_{i}-\overline{X}\right)\left(Y_{i}-\overline{Y}\right)}{\sqrt{\left(\sum_{i=1}^{n}\left(X_{i}-\overline{X}\right)^2\right)\left(\sum_{i=1}^{n}\left(Y_{i}-\overline{Y}\right)^2\right)}} = \frac{\sum_{i=1}^{n}\left(X_{i}-\overline{X}\right)\left(Y_{i}-\overline{Y}\right)}{\sqrt{\sum_{i=1}^{n}\left(X_{i}-\overline{X}\right)^2}\sqrt{\sum_{i=1}^{n}\left(Y_{i}-\overline{Y}\right)^2}} $$</p>
<p>$$ \text{Corr}(X,Y) = \frac{\text{Cov}(X,Y)}{SD_{X}\,SD_{Y}} $$</p>
<!--kg-card-end: markdown--><p>Where:</p><!--kg-card-begin: html--><li>\( \text{Cov}(X,Y) \text{: Covariance between }X \text{ and }Y.\) </li>
<li>\( SD_{X} \text{: Standard Deviation of variable }\text{X}.\) </li>
<li>\( SD_{Y} \text{: Standard Deviation of variable }\text{Y}.\) </li><!--kg-card-end: html--><p>Now let’s check the autocorrelation:</p><!--kg-card-begin: html-->$$ \rho_{s} =  \frac{\text{Cov}\left(r_t,r_{t-s}\right)}{\text{Var}\left(r_t\right)}$$<!--kg-card-end: html--><p>Where:</p><!--kg-card-begin: html--><li>\( \rho_{s} \text{: Autocorrelation at lag “}s\text{”}.\) </li>
<li>\( r_{t} \text{: Asset price returns at time “}t\text{”}.\) </li>
<li>\( r_{t-s} \text{: Asset price returns at time “}t-s\text{”}.\) </li>
<li>\( \text{Var}\left(r_t\right) \text{: Variance of returns}.\) </li><!--kg-card-end: html--><p>You might ask us: </p><p><em>Why variance and not the multiplication of the <a href="https://blog.quantinsti.com/standard-deviation/">standard deviation</a> of the returns at the different lags? </em></p><p>Well, you must remember that an ARMA model is applied to stationary time series. This topic belongs to <a href="https://blog.quantinsti.com/time-series-analysis/">time series analysis</a>. Consequently, it is assumed that the price returns, if stationary, have the same variance for any lag, i.e.:</p><!--kg-card-begin: html-->$$ \text{Var}\left(r_t\right) = \text{Var}\left(r_{t-1}\right) = \text{Var}\left(r_{t-2}\right) = ... = \text{Var}\left(r_{0}\right) $$<!--kg-card-end: html--><hr><h2 id="what-are-the-autocovariance-and-autocorrelation-at-lag-zero">What are the autocovariance and autocorrelation at lag zero?</h2><p>Interesting question and simple to answer! Let’s see first for the former:</p><!--kg-card-begin: html-->$$ \gamma_{0} =  \frac{\sum_{t=1}^{T}\left(r_{t}-\overline{r}\right)\left(r_{t-0}-\overline{r}\right)}{T-1}
= \frac{\sum_{t=1}^{T}\left(r_{t}-\overline{r}\right)\left(r_{t}-\overline{r}\right)}{T-1}
= \frac{\sum_{t=1}^{T}\left(r_{t}-\overline{r}\right)^2}{T-1}
$$<!--kg-card-end: html--><p><em>Can you guess what the last part resembles?</em><br>It is the variance of the price returns!</p><p>Consequently, the autocovariance of the returns at lag 0 is the variance of the returns.</p><p><em>Can you guess now what the autocorrelation of the returns would be at lag 0?</em> Let’s use the formulas to find out:</p><!--kg-card-begin: html-->$$ \rho_{0} =  \frac{\text{Cov}\left(r_t,r_{t-0}\right)}{\text{Var}\left(r_t\right)}
= \frac{\text{Cov}\left(r_t,r_{t}\right)}{\text{Var}\left(r_t\right)}$$<!--kg-card-end: html--><p>Since we know, from the above algebraic calculation, that the covariance of the same variable is its variance, we have the following:</p><!--kg-card-begin: html-->$$ \rho_{0} = \frac{\text{Cov}\left(r_t,r_{t}\right)}{\text{Var}\left(r_t\right)} 
= \frac{
\text{Var}\left(r_t\right)}
{\text{Var}\left(r_t\right)}$$
$$
\rho_{0} = 1
$$<!--kg-card-end: html--><p>Consequently, the autocorrelation function for any asset price return at lag 0 is always 1.</p><hr><h2 id="calculation-of-the-autocovariance-with-an-example">Calculation of the autocovariance with an example</h2><p>You might have been thinking up to now:<br><em>Why are the autocovariance and autocorrelation defined with an “s” subscript?</em><br><em>Great question!</em></p><p>Let us explain: the autocovariance formula defined above is a function which allows you to calculate the autocovariance at different lags. The same goes for the autocorrelation function.</p><p><em>Confused? Don’t worry! We’ve got you covered!</em></p><p>Let’s see an example to make the concept crystal clear! We are going to work through the calculation of the autocovariance of the Microsoft price returns at lag 1, using the autocovariance function shown above.</p><p>Imagine we have the following returns for Microsoft prices:</p><!--kg-card-begin: html--><table>
<tbody>
<tr>
<td>
<p>Day 1</p>
</td>
<td>
<p>Day 2</p>
</td>
<td>
<p>Day 3</p>
</td>
<td>
<p>Day 4</p>
</td>
<td>
<p>Day 5</p>
</td>
<td>
<p>Day 6</p>
</td>
<td>
<p>Day 7</p>
</td>
<td>
<p>Day 8</p>
</td>
<td>
<p>Day 9</p>
</td>
<td>
<p>Day 10</p>
</td>
</tr>
<tr>
<td>
<p>5%</p>
</td>
<td>
<p>1%</p>
</td>
<td>
<p>-2%</p>
</td>
<td>
<p>3%</p>
</td>
<td>
<p>-4%</p>
</td>
<td>
<p>6%</p>
</td>
<td>
<p>2%</p>
</td>
<td>
<p>-1%</p>
</td>
<td>
<p>-3%</p>
</td>
<td>
<p>4%</p>
</td>
</tr>
</tbody>
</table><!--kg-card-end: html--><p>Let’s suppose we want to compute the autocovariance at lag 1. You will need the returns up to day 10, and the 1-period lagged returns up to day 9.</p><p>Thus, you have the following data structure for returns on days 10 and 9: </p><!--kg-card-begin: html--><table>
<tbody>
<tr>
<td>
<p>Variable</p>
</td>
<td>
<p>Day 1</p>
</td>
<td>
<p>Day 2</p>
</td>
<td>
<p>Day 3</p>
</td>
<td>
<p>Day 4</p>
</td>
<td>
<p>Day 5</p>
</td>
<td>
<p>Day 6</p>
</td>
<td>
<p>Day 7</p>
</td>
<td>
<p>Day 8</p>
</td>
<td>
<p>Day 9</p>
</td>
<td>
<p>Day 10</p>
</td>
</tr>
<tr>
<td>
<p>\( r_{t} \)</p>
</td>
<td>
<p>5%</p>
</td>
<td>
<p>1%</p>
</td>
<td>
<p>-2%</p>
</td>
<td>
<p>3%</p>
</td>
<td>
<p>-4%</p>
</td>
<td>
<p>6%</p>
</td>
<td>
<p>2%</p>
</td>
<td>
<p>-1%</p>
</td>
<td>
<p>-3%</p>
</td>
<td>
<p>4%</p>
</td>
</tr>
<tr>
<td>
<p>\( r_{t-1} \)</p>
</td>
<td>
</td>
<td>
<p>5%</p>
</td>
<td>
<p>1%</p>
</td>
<td>
<p>-2%</p>
</td>
<td>
<p>3%</p>
</td>
<td>
<p>-4%</p>
</td>
<td>
<p>6%</p>
</td>
<td>
<p>2%</p>
</td>
<td>
<p>-1%</p>
</td>
<td>
<p>-3%</p>
</td>
</tr>
</tbody>
</table><!--kg-card-end: html--><p><em>Do you see the difference between the 2 variables?</em><br>The second one is the first lag of r<sub>t</sub>. </p><p>Now, since the 2 variables have different dimensions (the first one has 10 observations, while the second one has 9), we are going to use data from day 2 onwards.</p><p>Consequently, our data is as follows:</p><!--kg-card-begin: html--><table>
<tbody>
<tr>
<td>
<p>Variable</p>
</td>
<td>
<p>Day 2</p>
</td>
<td>
<p>Day 3</p>
</td>
<td>
<p>Day 4</p>
</td>
<td>
<p>Day 5</p>
</td>
<td>
<p>Day 6</p>
</td>
<td>
<p>Day 7</p>
</td>
<td>
<p>Day 8</p>
</td>
<td>
<p>Day 9</p>
</td>
<td>
<p>Day 10</p>
</td>
</tr>
<tr>
<td>
<p>\( r_{t} \)</p>
</td>
<td>
<p>1%</p>
</td>
<td>
<p>-2%</p>
</td>
<td>
<p>3%</p>
</td>
<td>
<p>-4%</p>
</td>
<td>
<p>6%</p>
</td>
<td>
<p>2%</p>
</td>
<td>
<p>-1%</p>
</td>
<td>
<p>-3%</p>
</td>
<td>
<p>4%</p>
</td>
</tr>
<tr>
<td>
<p>\( r_{t-1} \)</p>
</td>
<td>
<p>5%</p>
</td>
<td>
<p>1%</p>
</td>
<td>
<p>-2%</p>
</td>
<td>
<p>3%</p>
</td>
<td>
<p>-4%</p>
</td>
<td>
<p>6%</p>
</td>
<td>
<p>2%</p>
</td>
<td>
<p>-1%</p>
</td>
<td>
<p>-3%</p>
</td>
</tr>
</tbody>
</table><!--kg-card-end: html--><p>The covariance between these 2 variables will be the autocovariance of the returns at lag 1. </p><p><em>You can do this, right?</em><br>Check the example we give in our previous article. </p><p>Before you get ready to use a pencil and a piece of paper, let us tell you something important. </p><p>Remember the autocovariance formula:</p><!--kg-card-begin: html-->$$ \gamma_{s} =  \frac{\sum_{t=s+1}^{T}\left(r_{t}-\overline{r}\right)\left(r_{t-s}-\overline{r}\right)}{T-1}$$<!--kg-card-end: html--><p>If you paid attention to the details, you could see that the average return is the same for both variables, in our case, for returns up to day 10 and up to day 9. As we explained before, autocovariance and autocorrelation functions are applied only to <a href="https://blog.quantinsti.com/stationarity/">stationary time series</a>. </p><p>Consequently, not only the variance but also the mean is a unique value for the whole span. That’s why the mean is the same for any lag of the price returns.</p><p>The mean of the Microsoft price returns is 1.1%. Let’s follow the procedure to compute the autocovariance:</p><!--kg-card-begin: html--><table>
<tbody>
<tr>
<td>
<p>Variable</p>
</td>
<td>
<p>\( r_{t} \)</p>
</td>
<td>
<p>\( r_{t-1} \)</p>
</td>
<td>
<p>\( \left(r_{t}-\overline{r}\right) \)</p>
</td>
<td>
<p>\( \left(r_{t-1}-\overline{r}\right) \)</p>
</td>
<td>
<p>\( \left(r_{t}-\overline{r}\right) \)\( \left(r_{t-1}-\overline{r}\right) \)</p>
</td>
</tr>
<tr>
<td>
<p>Day 2</p>
</td>
<td>
<p>1%</p>
</td>
<td>
<p>5%</p>
</td>
<td>
<p>-0.100%</p>
</td>
<td>
<p>3.900%</p>
</td>
<td>
<p>-0.004%</p>
</td>
</tr>
<tr>
<td>
<p>Day 3</p>
</td>
<td>
<p>-2%</p>
</td>
<td>
<p>1%</p>
</td>
<td>
<p>-3.100%</p>
</td>
<td>
<p>-0.100%</p>
</td>
<td>
<p>0.003%</p>
</td>
</tr>
<tr>
<td>
<p>Day 4</p>
</td>
<td>
<p>3%</p>
</td>
<td>
<p>-2%</p>
</td>
<td>
<p>1.900%</p>
</td>
<td>
<p>-3.100%</p>
</td>
<td>
<p>-0.059%</p>
</td>
</tr>
<tr>
<td>
<p>Day 5</p>
</td>
<td>
<p>-4%</p>
</td>
<td>
<p>3%</p>
</td>
<td>
<p>-5.100%</p>
</td>
<td>
<p>1.900%</p>
</td>
<td>
<p>-0.097%</p>
</td>
</tr>
<tr>
<td>
<p>Day 6</p>
</td>
<td>
<p>6%</p>
</td>
<td>
<p>-4%</p>
</td>
<td>
<p>4.900%</p>
</td>
<td>
<p>-5.100%</p>
</td>
<td>
<p>-0.250%</p>
</td>
</tr>
<tr>
<td>
<p>Day 7</p>
</td>
<td>
<p>2%</p>
</td>
<td>
<p>6%</p>
</td>
<td>
<p>0.900%</p>
</td>
<td>
<p>4.900%</p>
</td>
<td>
<p>0.044%</p>
</td>
</tr>
<tr>
<td>
<p>Day 8</p>
</td>
<td>
<p>-1%</p>
</td>
<td>
<p>2%</p>
</td>
<td>
<p>-2.100%</p>
</td>
<td>
<p>0.900%</p>
</td>
<td>
<p>-0.019%</p>
</td>
</tr>
<tr>
<td>
<p>Day 9</p>
</td>
<td>
<p>-3%</p>
</td>
<td>
<p>-1%</p>
</td>
<td>
<p>-4.100%</p>
</td>
<td>
<p>-2.100%</p>
</td>
<td>
<p>0.086%</p>
</td>
</tr>
<tr>
<td>
<p>Day 10</p>
</td>
<td>
<p>4%</p>
</td>
<td>
<p>-3%</p>
</td>
<td>
<p>2.900%</p>
</td>
<td>
<p>-4.100%</p>
</td>
<td>
<p>-0.119%</p>
</td>
</tr>
</tbody>
</table><!--kg-card-end: html--><p>The autocovariance is just the sum of the last column values divided by T-1 (here 10-1 = 9), which results in -0.046%. </p><hr><h2 id="calculation-of-the-autocorrelation-with-an-example">Calculation of the autocorrelation with an example</h2><p>Let’s follow the same exercise and compute the autocorrelation of the Microsoft price returns up to day 10 at lag 1. The autocorrelation is the autocovariance divided by the variance. We give you the exact hint you need: The variance of Microsoft price returns up to day 10 is 0.121%.</p><p>Let’s follow the algebraic formulas and use the numbers to compute the autocorrelation:</p><!--kg-card-begin: html-->$$ \rho_{1} =  \frac{\text{Cov}\left(r_t,r_{t-1}\right)}{\text{Var}\left(r_t\right)}=\frac{-0.046\%}{0.121\%}$$
$$ \rho_{1} = -0.38 $$<!--kg-card-end: html-->
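<p>If you want to double-check the hand calculation, here is a minimal NumPy sketch (ours, not part of the original notebook) that reproduces both numbers:</p><!--kg-card-begin: markdown-->
```python
import numpy as np

# The ten daily Microsoft returns from the example above, in decimals
r = np.array([0.05, 0.01, -0.02, 0.03, -0.04, 0.06, 0.02, -0.01, -0.03, 0.04])

r_mean = r.mean()              # 1.1%, the same mean is used for both columns
dev_t = r[1:] - r_mean         # deviations of r_t (days 2 to 10)
dev_lag = r[:-1] - r_mean      # deviations of r_(t-1) (days 1 to 9)

gamma_1 = (dev_t * dev_lag).sum() / (len(r) - 1)     # autocovariance at lag 1
variance = ((r - r_mean) ** 2).sum() / (len(r) - 1)  # variance of the returns
rho_1 = gamma_1 / variance                           # autocorrelation at lag 1

print(f"gamma_1 = {gamma_1:.5f}")   # about -0.00046, i.e. -0.046%
print(f"rho_1   = {rho_1:.2f}")     # about -0.38
```
<!--kg-card-end: markdown-->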
<hr><h2 id="computation-of-autocovariance-and-autocorrelation-in-python">Computation of autocovariance and autocorrelation in Python</h2><p>Before we do this section, let us tell you something. We have computed the autocovariance and autocorrelation of the Microsoft price returns at lag 1. We could also have computed them at lag 2, lag 3, and so on. We leave that to you as an exercise!</p><p>In <a href="https://quantra.quantinsti.com/course/python-for-trading">Python</a>, or any other programming language, when you are required to compute these two important metrics, you will need to compute them at many lags, not only at lag 1, as we did previously above. </p><p>So, keep in mind that we are going to use Python to compute the autocovariance and autocorrelation functions, i.e., the autocovariance and autocorrelation at different lags.</p><p>First, let’s import the necessary libraries to use:</p><!--kg-card-begin: html--><script src="https://gist.github.com/quantra-go-algo/39fbfa057983d0378d752d4c41ba53c8.js"></script><!--kg-card-end: html--><p>We now download the Microsoft close prices from January 2021 to August 2022:</p><!--kg-card-begin: html--><script src="https://gist.github.com/quantra-go-algo/a724513481c05a1bf23802a5916eb0f4.js"></script><!--kg-card-end: html--><p>We compute the price returns:</p><!--kg-card-begin: html--><script src="https://gist.github.com/quantra-go-algo/b44e2232a5efe85a599a865d40bbc6c1.js"></script><!--kg-card-end: html--><p>As we told you above, any programming language with a suitable library will give you functions that return the autocovariance and autocorrelation at many lags in a single call.</p><p>So, let’s compute the autocovariance function up to lag 10 for the Microsoft price returns.</p><!--kg-card-begin: html--><script src="https://gist.github.com/quantra-go-algo/a2e73bc3e153a45c3f56084d5878d1e9.js"></script><!--kg-card-end: html--><p>Check the values below:</p><!--kg-card-begin: html--><table>
<tbody>
<tr>
<td>
<p>\( \gamma_{0} \)</p>
</td>
<td>
<p>0.00028672</p>
</td>
</tr>
<tr>
<td>
<p>\( \gamma_{1} \)</p>
</td>
<td>
<p>-0.00001403</p>
</td>
</tr>
<tr>
<td>
<p>\( \gamma_{2} \)</p>
</td>
<td>
<p>-0.00000250</p>
</td>
</tr>
<tr>
<td>
<p>\( \gamma_{3} \)</p>
</td>
<td>
<p>-0.00001941</p>
</td>
</tr>
<tr>
<td>
<p>\( \gamma_{4} \)</p>
</td>
<td>
<p>0.00002185</p>
</td>
</tr>
<tr>
<td>
<p>\( \gamma_{5} \)</p>
</td>
<td>
<p>0.00000488</p>
</td>
</tr>
<tr>
<td>
<p>\( \gamma_{6} \)</p>
</td>
<td>
<p>-0.00001460</p>
</td>
</tr>
<tr>
<td>
<p>\( \gamma_{7} \)</p>
</td>
<td>
<p>0.00001332</p>
</td>
</tr>
<tr>
<td>
<p>\( \gamma_{8} \)</p>
</td>
<td>
<p>-0.00002423</p>
</td>
</tr>
<tr>
<td>
<p>\( \gamma_{9} \)</p>
</td>
<td>
<p>0.00002422</p>
</td>
</tr>
</tbody>
</table><!--kg-card-end: html--><p>Where </p><!--kg-card-begin: html-->\( \gamma_{s} \text{: Autocovariance at lag “}s\text{”}.\)<!--kg-card-end: html--><p>Let’s compute the first 10 autocorrelation values:</p><!--kg-card-begin: html--><script src="https://gist.github.com/quantra-go-algo/3d3a5809932e7aa324703497de7b57f9.js"></script><!--kg-card-end: html--><p>See the values:</p><!--kg-card-begin: html--><table>
<tbody>
<tr>
<td>
<p>\( \rho_{0} \)</p>
</td>
<td>
<p>1</p>
</td>
</tr>
<tr>
<td>
<p>\( \rho_{1} \)</p>
</td>
<td>
<p>-0.04893</p>
</td>
</tr>
<tr>
<td>
<p>\( \rho_{2} \)</p>
</td>
<td>
<p>-0.00871</p>
</td>
</tr>
<tr>
<td>
<p>\( \rho_{3} \)</p>
</td>
<td>
<p>-0.0677</p>
</td>
</tr>
<tr>
<td>
<p>\( \rho_{4} \)</p>
</td>
<td>
<p>0.076198</p>
</td>
</tr>
<tr>
<td>
<p>\( \rho_{5} \)</p>
</td>
<td>
<p>0.017024</p>
</td>
</tr>
<tr>
<td>
<p>\( \rho_{6} \)</p>
</td>
<td>
<p>-0.05092</p>
</td>
</tr>
<tr>
<td>
<p>\( \rho_{7} \)</p>
</td>
<td>
<p>0.046457</p>
</td>
</tr>
<tr>
<td>
<p>\( \rho_{8} \)</p>
</td>
<td>
<p>-0.08451</p>
</td>
</tr>
<tr>
<td>
<p>\( \rho_{9} \)</p>
</td>
<td>
<p>0.084456</p>
</td>
</tr>
</tbody>
</table><!--kg-card-end: html--><p>As you learned previously, the autocorrelation at lag 0 is 1. </p><hr><h2 id="plot-the-autocorrelation-function-in-python">Plot the autocorrelation function in Python</h2><p>What is usually done in econometrics is to plot the autocorrelation function. We, of course, are going to do that, see:</p><!--kg-card-begin: html--><script src="https://gist.github.com/quantra-go-algo/374d5d8c4107a720a1389d32ac223b2e.js"></script><!--kg-card-end: html--><figure class="kg-card kg-image-card kg-width-full kg-card-hascaption"><img src="https://d1rwhvwstyk9gu.cloudfront.net/2022/10/autocorrelation-in-python.png" class="kg-image" alt="Autocorrelation in Python"><figcaption>Autocorrelation in Python</figcaption></figure><p>You see something strange in the plot, right? What is that blue-coloured highlighted zone? Well, it’s the confidence interval. You can compute that zone with the following formula:</p><!--kg-card-begin: html-->$$ \left[-\frac{1.96}{\sqrt{T}},\frac{1.96}{\sqrt{T}}\right] $$<!--kg-card-end: html--><p>Where T is the total number of observations. Any autocorrelation bar that stays inside this band is statistically indistinguishable from zero at the 5% significance level.</p>
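<p>As a quick illustration (our sketch, assuming the statsmodels library), you can compute the autocorrelations and this confidence band directly; here, simulated returns stand in for the downloaded prices:</p><!--kg-card-begin: markdown-->
```python
import numpy as np
from statsmodels.tsa.stattools import acf

# Simulated stationary returns stand in for the downloaded price data
rng = np.random.default_rng(seed=42)
returns = rng.normal(loc=0.0, scale=0.01, size=250)

acf_values = acf(returns, nlags=10)   # autocorrelations at lags 0 to 10

# 95% confidence band under the null hypothesis of no autocorrelation
T = len(returns)
band = 1.96 / np.sqrt(T)

for lag, rho in enumerate(acf_values):
    marker = " <- outside the band" if lag > 0 and abs(rho) > band else ""
    print(f"lag {lag:2d}: {rho:+.3f}{marker}")
```
<!--kg-card-end: markdown-->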
<hr><h2 id="computation-of-autocovariance-and-autocorrelation-in-r">Computation of autocovariance and autocorrelation in R</h2><p>Let’s go through this excellent programming language. We install the necessary packages and import them:</p><!--kg-card-begin: html--><script src="https://gist.github.com/quantra-go-algo/c193371d1c330a8a7bcffb1f8f353245.js"></script><!--kg-card-end: html--><p>Next, we use the getSymbols method to download the Tesla stock data with the same span as for Microsoft.</p><!--kg-card-begin: html--><script src="https://gist.github.com/quantra-go-algo/d46c1681e9f0050995a2cc9f643b4e22.js"></script><!--kg-card-end: html--><p>We compute the returns:</p><!--kg-card-begin: html--><script src="https://gist.github.com/quantra-go-algo/0b77d0d5a42cf04a176075038d25a741.js"></script><!--kg-card-end: html--><p>And we obtain the autocovariance function up to lag 10:</p><!--kg-card-begin: html--><script src="https://gist.github.com/quantra-go-algo/ad4a4d676813b918cb10eb5c1e15e57a.js"></script><!--kg-card-end: html--><p>Let’s now compute the autocorrelation function up to lag 10:</p><!--kg-card-begin: html--><script src="https://gist.github.com/quantra-go-algo/8e2430adb997da3d34c3d18c4e370b02.js"></script><!--kg-card-end: html--><p>As you can deduce, the default value for “type” is the autocorrelation function.</p><hr><h2 id="plot-the-autocorrelation-function-in-r">Plot the Autocorrelation Function in R</h2><p>Finally, let’s plot the autocorrelation function:</p><!--kg-card-begin: html--><script src="https://gist.github.com/quantra-go-algo/6759c1dd5af2749a63a7b8df3b9c8817.js"></script><!--kg-card-end: html--><p>We first set the graph parameters: the title, the axis values and the axis labels are all enlarged relative to the plot defaults. Besides, we define the margins of the plot. </p><p>Next, we plot the autocorrelation function. Finally, we apply the graph parameters to the autocorrelation function plot.</p><figure class="kg-card kg-image-card kg-width-full kg-card-hascaption"><img src="https://d1rwhvwstyk9gu.cloudfront.net/2022/10/autocorrelation_in_R.jpg" class="kg-image" alt="Autocorrelation in R"><figcaption>Autocorrelation in R</figcaption></figure><hr><h2 id="what-is-partial-autocorrelation">What Is partial autocorrelation?</h2><p>Since our mission is to prepare you to model an ARMA process, we need to explain this function to you.</p><p>What is it? Let’s explain the intuition behind the function.</p><p>For example, the autocorrelation function at lag 5 might have correlations with its previous lags’ autocorrelations. The partial autocorrelation function gives the autocorrelation at lag 5, but without the relationship of the shorter lags’ autocorrelations.</p><p>Putting it in another way, the autocorrelation of Microsoft price returns at lag 5 is about the autocorrelation between returns at time t and at time (t-5). But this autocorrelation is also influenced by the correlations from lag 1 to lag 4. </p><p>The partial autocorrelation is a function that allows having the autocorrelation of returns t and (t-5) removing the indirect relationship that returns from lag 1 to 4 have on it. A small simulated example below makes the distinction visible.</p>
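<p>Here is a small simulated illustration (ours, assuming statsmodels): for an AR(1) series, the autocorrelation decays gradually across lags, while the partial autocorrelation cuts off after lag 1 once the indirect links are removed:</p><!--kg-card-begin: markdown-->
```python
import numpy as np
from statsmodels.tsa.stattools import acf, pacf

# Simulate an AR(1) series: each value depends directly on the previous one only
rng = np.random.default_rng(seed=0)
r = np.zeros(500)
eps = rng.normal(scale=0.01, size=500)
for t in range(1, 500):
    r[t] = 0.5 * r[t - 1] + eps[t]

# The autocorrelation decays gradually (lag 2 inherits the lag-1 link twice),
# while the partial autocorrelation cuts off after lag 1
print("acf :", np.round(acf(r, nlags=5), 3))
print("pacf:", np.round(pacf(r, nlags=5), 3))
```
<!--kg-card-end: markdown-->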
<hr><h2 id="computation-of-partial-autocorrelation-in-python-and-r">Computation of partial autocorrelation in Python and R</h2><p>Let’s compute them in Python and R. </p><p>We follow our previous order. We begin with Python. We plot the partial autocorrelation function for Microsoft:</p><!--kg-card-begin: html--><script src="https://gist.github.com/quantra-go-algo/747bee527f0c77e776581a372356c5cf.js"></script><!--kg-card-end: html--><figure class="kg-card kg-image-card kg-width-full kg-card-hascaption"><img src="https://d1rwhvwstyk9gu.cloudfront.net/2022/10/partial-autocorrelation-in-python.png" class="kg-image" alt="Partial Autocorrelation in Python"><figcaption>Partial Autocorrelation in Python</figcaption></figure><p>Let’s compute the partial autocorrelation function for the Tesla stock price returns in R:</p><!--kg-card-begin: html--><script src="https://gist.github.com/quantra-go-algo/6b866c5f9024bf980d704cdd0ae33c9c.js"></script><!--kg-card-end: html--><p>We use the same code as for the autocorrelation function, but this time we specify “type=partial” to get the desired output.</p><figure class="kg-card kg-image-card kg-width-full kg-card-hascaption"><img src="https://d1rwhvwstyk9gu.cloudfront.net/2022/10/partial_autocorrelation_in_R.jpg" class="kg-image" alt="Partial Autocorrelation in R"><figcaption>Partial Autocorrelation in R</figcaption></figure><hr><h3 id="conclusion">Conclusion</h3><p>ARMA models base their construction on inspection of the autocovariance and autocorrelation functions. Here, we have helped you understand the most important things about them and their applications in Python and R.</p><p>This will help you whenever you want to trade algorithmically, since you now have useful code at hand. Whenever you start working with ARMA models, you will definitely remember this article and be well prepared to understand those types of models. Bookmark this article in your browser, you will not regret it! </p><p>As we told you before, these two functions belong to the time series analysis topic. We guess you're already excited about the topic, aren't you? Do you want to continue learning about it? Check this course on <a href="https://quantra.quantinsti.com/course/financial-time-series-analysis-trading">time series analysis</a> to start trading algorithmically!</p><p>Are you ready to create your ARMA model? We bet you are!<br>Ready? Set?<br>Go algo!</p><hr><p><strong>File in the download</strong></p><p>Autocorrelation and autocovariance in Python</p><!--kg-card-begin: markdown--><p class="custom-download-link"><a href="https://blog.quantinsti.com/autocorrelation-autocovariance" class="download-button button"> Visit blog to download </a></p><!--kg-card-end: markdown--><!--kg-card-begin: html--><p><em><small>Disclaimer: All investments and trading in the stock market involve risk. Any decision to place trades in the financial markets, including trading in stock or options or other financial instruments is a personal decision that should only be made after thorough research, including a personal risk and financial assessment and the engagement of professional assistance to the extent you believe necessary. The trading strategies or related information mentioned in this article is for informational purposes only.</small></em></p><!--kg-card-end: html-->]]></content:encoded></item><item><title><![CDATA[Covariance and Correlation: Intro, Formula, Calculation, and More]]></title><description><![CDATA[Covariance and Correlation - what are they? Learn the various concepts, industry applications and Python computations in this tutorial.]]></description><link>https://blog.quantinsti.com/covariance-correlation/</link><guid isPermaLink="false">6246e361cffb700439467855</guid><category><![CDATA[Mathematics and Econometrics]]></category><dc:creator><![CDATA[Jose Carlos Gonzales Tanaka]]></dc:creator><pubDate>Wed, 04 May 2022 03:30:00 GMT</pubDate><content:encoded><![CDATA[<p>By <a href="https://www.linkedin.com/in/jos%C3%A9-carlos-gonz%C3%A1les-tanaka-60859284/">José Carlos Gonzáles Tanaka</a></p><p>Let us know a little bit more about you. <em>Do you think you would need to see a “Covariance vs Correlation” fight in a boxing ring so you could choose properly between them?</em></p><ul><li><em>Are you a beginner in the financial markets and want to know basic concepts in order to start trading?</em></li><li><em>Do you know how crude oil moves together with the dollar, the euro or the Japanese Yen?</em></li><li><em>Do you know how gold prices and interest rates change when the world is in a recession?</em></li></ul><p>Whatever the position you have in the industry or doubts you might have, this easy-to-read article will take you through all about these two concepts: From the definitions and formulas to trading applications, and you will also be prepared to properly analyze the relationships between important financial variables.</p><p>So, take a comfortable seat and enjoy this learning experience with a cup of coffee!</p><ul><li><a href="#what-is-covariance">What is Covariance?</a></li><li><a href="#what-is-correlation">What is Correlation?</a></li><li><a href="#what-are-the-differences-between-covariance-and-correlation">What are the differences between Covariance and Correlation?</a></li><li><a href="#how-to-calculate-covariance-and-correlation">How to calculate Covariance and Correlation?</a></li><li><a href="#importance-of-covariance-and-correlation">Importance of Covariance and Correlation</a></li><li><a href="#trading-applications-of-covariance-and-correlation">Trading applications of Covariance and Correlation</a></li><li><a href="#covariance-and-correlation-simple-exercise-examples">Covariance and Correlation Simple Exercise Examples</a></li><li><a href="#covariance-and-correlation-computation-examples-in-python">Covariance and Correlation Computation Examples in Python</a></li></ul><hr><h2 id="what-is-covariance">What is
Covariance?</h2><p>Covariance is a statistical measurement with the following definition: It measures the direction of the linear association between two variables.</p><p>With this explanation, you then start to imagine that we are your teacher and you raise your hand to say: “<em>Teacher, I don’t understand.</em>”</p><p>Don’t worry about it! Let’s make it simpler: It measures the comovement direction between two variables. Put it in another way, you have data of two variables and Covariance measures whether these two variables move in the same or opposite direction. </p><p>We say positive Covariance, if both variables move in the same direction: If a stock price goes up, then the other stock price goes up, too. We say negative Covariance, when a stock price goes up and the other stock price goes in the opposite direction, meaning, goes down.</p><p>About the words “linear association”, you’ll get the point later. So, yes, Covariance can be either positive or negative.</p><p>Imagine we have data from four stock prices:</p><ul><li>Amazon,</li><li>Apple,</li><li>Walmart, and</li><li>Microsoft.</li></ul><p>At the top, you have a graph in which you could see a positive Covariance: While Amazon stock price moves higher, Apple moves in the same positive direction. At the bottom, you can see that, when Microsoft moves down, Walmart does the opposite, moves higher, which means there is a negative Covariance between them.</p><figure class="kg-card kg-image-card kg-width-full"><img src="https://d1rwhvwstyk9gu.cloudfront.net/2022/04/positive_covariance-4.png" class="kg-image" alt="positive_covariance"></figure><figure class="kg-card kg-image-card kg-width-full"><img src="https://d1rwhvwstyk9gu.cloudfront.net/2022/04/negative_covariance-2.png" class="kg-image" alt="negative_covariance"></figure><p><strong>Question:</strong> <em>If variable A and B  have a positive Covariance, then should we interpret that whenever variable A goes higher in value, B “always” goes in a higher value too?”</em></p><p>Not always. This direct or inverse comovement could also be strong or weak. If the positive or negative Covariance value is big, in absolute terms, then we say that the comovement is strong, if it is low, we say the comovement is weak. </p><p><em>What is a strong or weak comovement?</em></p><p>Well, what we can say is that a big positive Covariance, or a strong positive comovement, means that, most of the time, the two variables will move in the same positive direction. </p><p>The same for a negative Covariance: If we have a big negative Covariance value for two variables, in absolute terms, this means that we expect to see, most of the time, the two variables move in the opposite direction.</p><p>So, now with a clearer definition, you have a deduction and two more questions to test your knowledge:</p><ol><li><em>Does Covariance measure the strong or weak relationship between two variables?</em></li><li><em>Can we have a perfect direct or inverse comovement between two variables?</em></li></ol><p>For these questions, let us take you to our other important concept called Correlation.</p><hr><h2 id="what-is-correlation">What is Correlation?</h2><p>You might have made a pause and taken a sip of your cup of coffee and might be saying to yourself: “<em>Oh, here comes another definition, it’s a lot of information!</em>”</p><p>Hey! Don’t worry at all! Correlation has, actually, a similar definition. Let’s put it formally first. 
Correlation measures the degree of the linear association between two variables.</p><p>We know, you are surprised, and ask: “<em>Do we have the words ‘linear association”, again?</em>”</p><p>Hey! Don’t hurry, we’re talking about this later. For now, we could say: While Covariance measures the direction of the comovement, Correlation not only measures that, but also the degree, or “<strong>strength</strong>”, of this relationship direction.</p><p>We’ll see later how this ‘strength’ looks like. For now, let’s express the differences between these two concepts below:</p><hr><h2 id="what-are-the-differences-between-covariance-and-correlation">What are the differences between Covariance and Correlation?</h2><!--kg-card-begin: html--><table>
<tbody>
<tr>
<td>
    <p><strong>Covariance</strong></p>
</td>
<td>
<p><strong>Correlation</strong></p>
</td>
</tr>
<tr>
<td>
<p>Measures one thing</p>
</td>
<td>
<p>Measures two things</p>
</td>
</tr>
<tr>
<td>
<p>Infinite range</p>
</td>
<td>
<p>Finite range</p>
</td>
</tr>
<tr>
<td>
<p>It has unit of measurement</p>
</td>
<td>
<p>Free of unit of measurement</p>
</td>
</tr>
<tr>
<td>
<p>Scalable value</p>
</td>
<td>
<p>Non-scalable value</p>
</td>
</tr>
</tbody>
</table><!--kg-card-end: html--><h3 id="difference-1-covariance-measures-one-thing-and-correlation-measures-two-things-">Difference #1: Covariance measures one thing and Correlation measures two things.</h3><p>Covariance, as explained above, measures only the direction of the comovement of two variables. Correlation measures not only the direction of the relationship, but also the strength of this relationship. </p><p><em>So how well do Amazon and Apple stock prices move in the same or opposite direction?</em></p><p>You now have two technical tools to answer this question:</p><ul><li>To find out whether two variables move in the same or opposite direction, you could use either the Covariance or the Correlation functions.</li><li>About “how well” this direct or inverse relationship is, you need to answer this with the latter.</li></ul><h3 id="difference-2-covariance-and-correlation-value-range-are-not-the-same-">Difference #2: Covariance and Correlation value range are not the same.</h3><p>Covariance could have an infinite positive value or an infinite negative value, the range of values takes the whole real number spectrum. However, Correlation value range is only between -1 and +1.</p><p>So your question regarding a perfect comovement could be answered here: Since Covariance could have any real value, we could not appreciate with this statistical measure a perfect degree of linear association between two variables. </p><p>The best way to approach this question is with Correlation. If the Correlation of variables A and B has the value of +1, you can say without any doubt that both variables “always” move in the same direction. The same for a Correlation with a -1 value: you can say without any doubt that both variables “always” move in the opposite direction. </p><p>Here you can understand Difference #1. When the Correlation is between -1 and 1, the comovement is not perfectly negative or positive, respectively. Last but not least about this difference, we must say that it’s almost impossible to see real world variables having a Correlation exactly equal to +1 or -1.</p><h3 id="difference-3-covariance-and-correlation-have-different-measure-units-">Difference #3: Covariance and Correlation have different measure units.</h3><p>You will get to understand it mathematically later. By now, we can say that Covariance has as a unit of measurement the multiplication of the two variables’ units of measurement. </p><p>For example, if you have two stock prices which are Amazon and Apple, which both have as unit of measurement the dollar, you will have for the Covariance a unit of measurement of: Dollar times Dollar, which is Dollar squared.</p><p>However, Correlation doesn’t have any unit of measurement at all. Don’t hurry to worry! Sip your coffee and wait a little bit more to fully understand this.</p><h3 id="difference-4-covariance-could-change-in-value-if-the-variables-are-scaled-differently-correlation-is-not-affected-by-this-">Difference #4: Covariance could change in value if the variables are scaled differently, Correlation is not affected by this.</h3><p>Let us give you an example to understand this difference. You have two stock prices, Amazon and Apple, and then you calculate their Covariance and Correlation, which result in “a” and “b” respectively.</p><p>Next, you decide to multiply the two stock prices by 1000, and you calculate, again, their Covariance and Correlation, which result in values “c” and “d”. 
Something interesting that you will find is that:</p><ul><li>Covariance “a” is different from Covariance “c” and,</li><li>Correlation “b” is equal to Correlation “d”.</li></ul><p>When you scale one or both variables, Covariance will change in value accordingly. However, Correlation is not affected by the scale change.</p><hr><h2 id="how-to-calculate-covariance-and-correlation">How to Calculate Covariance and Correlation?</h2><p>Up until now, we explained to you everything about their concepts and their properties. But from now on, you will have a better grasp with the mathematical formulas.</p><p>First, you must differentiate between what is a <a href="https://quantra.quantinsti.com/glossary/Population">population</a> and <a href="https://quantra.quantinsti.com/glossary/Sample">sample</a> data. Once you are familiar with these two concepts, let’s begin with the presentation of the formulas:</p><p>We start with Covariance.</p><p><strong>Population Covariance:</strong></p><!--kg-card-begin: html-->$$ \sigma_{X,Y}^{2} = \frac{\sum_{i=1}^{N}\Bigl(X_{i}-\overline{X}\Bigr)\Bigl(Y_{i}-\overline{Y}\Bigr)}{N} $$<!--kg-card-end: html--><!--kg-card-begin: html-->Where:
<br>\( \sigma_{X,Y}^{2} \):  Population Covariance between variables \(X\) and \(Y\).
<br>\(X_{i}\): The \(i^{th}\) observation of the \(X\) variable.
<br>\(\overline{X}\): The mean value of variable \(X\).
<br>\(Y_{i}\): The \(i^{th}\) observation of the \(Y\) variable.
<br>\(\overline{Y}\): The mean value of variable \(Y\).
<br>\(N\): Total number of observations for variable \(X\) or \(Y\).<!--kg-card-end: html--><p><strong>Sample Covariance:</strong></p><!--kg-card-begin: html-->$$ S_{X,Y}^{2} = \frac{\sum_{i=1}^{N}\Bigl(X_{i}-\overline{X}\Bigr)\Bigl(Y_{i}-\overline{Y}\Bigr)}{N-1} $$<!--kg-card-end: html--><!--kg-card-begin: html-->Where:
<br>\( S_{X,Y}^{2} \):  Sample Covariance between variables \(X\) and \(Y\). The other variables are the same as for the Population Covariance <p>
<!--kg-card-end: html--></p><p>So, in simple words, how could we explain the Covariance formula? Let’s say it in this way: Covariance is a measure in which we multiply each deviation from the mean of X and Y, given by (Xᵢ - X̄) and (Yᵢ - Ȳ) respectively, and then we sum all these products and divide them by N. </p><p>It resembles the <a href="https://quantra.quantinsti.com/glossary/Variance">formula</a> of the Variance of a variable, right? Well, both formulas are almost similar; the only difference is that in the Variance formula you see the deviations from the mean squared, while for the Covariance you don’t see a squared deviation.</p><p>But the Variance formula can also be written as: </p><!--kg-card-begin: html-->$$ \sigma_{X}^{2} = \frac{\sum_{i=1}^{N}\Bigl(X_{i}-\overline{X}\Bigr)^2}{N}=\frac{\sum_{i=1}^{N}\Bigl(X_{i}-\overline{X}\Bigr)\Bigl(X_{i}-\overline{X}\Bigr)}{N} $$<!--kg-card-end: html--><p>So, you get it? Variance and Covariance formulas are actually identical; the difference resides in the second parenthesis, where the deviations are replaced by those of a second variable. Variance refers to a single variable, Covariance refers to two variables, and that is why it’s called “Co” Variance.</p><p>The sample Covariance is the same as the population Covariance, but instead of N, we divide by N-1. The explanation of this difference can be found in the “<a href="https://blog.quantinsti.com/standard-deviation/">Standard deviation</a> for sample data - Bessel's correction” section of this blog. </p><p>As you could guess, X is in X units, and Y is in Y units. So the Covariance, since it is built from a multiplication of these two variables, will be in XY measure units. </p><p>Let’s go now for the Correlation formula:</p><!--kg-card-begin: html-->\( \rho_{X,Y} = \frac{\sum_{i=1}^{N}\Bigl(X_{i}-\overline{X}\Bigr)\Bigl(Y_{i}-\overline{Y}\Bigr)}{\sqrt{\biggl(\sum_{i=1}^{N}\Bigl(X_{i}-\overline{X}\Bigr)^2\biggr)\biggl(\sum_{i=1}^{N}\Bigl(Y_{i}-\overline{Y}\Bigr)^2\biggr)}} = \frac{\sum_{i=1}^{N}\Bigl(X_{i}-\overline{X}\Bigr)\Bigl(Y_{i}-\overline{Y}\Bigr)}{\sqrt{\biggl(\sum_{i=1}^{N}\Bigl(X_{i}-\overline{X}\Bigr)^2\biggr)}\sqrt{\biggl(\sum_{i=1}^{N}\Bigl(Y_{i}-\overline{Y}\Bigr)^2\biggr)}}\)<!--kg-card-end: html--><!--kg-card-begin: html-->$$ \rho_{X,Y} = \frac{Cov(X,Y)}{\sigma_X\:\sigma_Y} $$<!--kg-card-end: html--><!--kg-card-begin: html-->Where:
<br>\( \rho_{X,Y} \):  Correlation between variables \(X\) and \(Y\).
<br>\(Cov(X,Y)\): Covariance between \(X\) and \(Y\).
<br>\(\sigma_X\): Standard Deviation of variable \(X\).
<br>\(\sigma_Y\): Standard Deviation of variable \(Y\).<!--kg-card-end: html--><p>So, how could we explain this formula in simple words? Well, you already know the dividend, and about the divisor, if you could remember the Variance <a href="https://blog.quantinsti.com/standard-deviation/">formula</a>, you can realize that the divisor is the multiplication of the standard deviation of both variables X and Y. </p><p>So we can say that Correlation is the Covariance divided by the Standard Deviations of the two variables. You might be thinking that the Covariance and the Standard Deviation have as a divisor “N” or “N-1”, so you could think that we wrote the first formula wrongly.</p><p>Don’t worry, let us make you see what happened with the divisors:</p><!--kg-card-begin: html--><p>\( \rho_{X,Y} = \frac{\frac{\sum_{i=1}^{N}\Bigl(X_{i}-\overline{X}\Bigr)\Bigl(Y_{i}-\overline{Y}\Bigr)}{N}}{\sqrt{\Biggl(\frac{\sum_{i=1}^{N}\Bigl(X_{i}-\overline{X}\Bigr)^2}{N}\Biggr)}\sqrt{\Biggl(\frac{\sum_{i=1}^{N}\Bigl(Y_{i}-\overline{Y}\Bigr)^2}{N}\Biggr)}} = \frac{\frac{\sum_{i=1}^{N}\Bigl(X_{i}-\overline{X}\Bigr)\Bigl(Y_{i}-\overline{Y}\Bigr)}{N}}{\sqrt{\biggl(\sum_{i=1}^{N}\Bigl(X_{i}-\overline{X}\Bigr)^2\biggr)\biggl(\sum_{i=1}^{N}\Bigl(Y_{i}-\overline{Y}\Bigr)^2\biggr)\Biggl(\frac{1}{N^2}\Biggr)}} \)</p>
<p>\( \rho_{X,Y} = \frac{\sum_{i=1}^{N}\Bigl(X_{i}-\overline{X}\Bigr)\Bigl(Y_{i}-\overline{Y}\Bigr)\Biggl(\frac{1}{N}\Biggr)}{\sqrt{\biggl(\sum_{i=1}^{N}\Bigl(X_{i}-\overline{X}\Bigr)^2\biggr)\biggl(\sum_{i=1}^{N}\Bigl(Y_{i}-\overline{Y}\Bigr)^2\biggr)}\Biggl(\frac{1}{N}\Biggr)} = \frac{\sum_{i=1}^{N}\Bigl(X_{i}-\overline{X}\Bigr)\Bigl(Y_{i}-\overline{Y}\Bigr)}{\sqrt{\biggl(\sum_{i=1}^{N}\Bigl(X_{i}-\overline{X}\Bigr)^2\biggr)\biggl(\sum_{i=1}^{N}\Bigl(Y_{i}-\overline{Y}\Bigr)^2\biggr)}}\frac{\Biggl(\frac{1}{N}\Biggr)}{\Biggl(\frac{1}{N}\Biggr)} \)</p>
<p>\( \rho_{X,Y} = \frac{\sum_{i=1}^{N}\Bigl(X_{i}-\overline{X}\Bigr)\Bigl(Y_{i}-\overline{Y}\Bigr)}{\sqrt{\Biggl(\sum_{i=1}^{N}\Bigl(X_{i}-\overline{X}\Bigr)^2\Biggr)}\sqrt{\Biggl(\sum_{i=1}^{N}\Bigl(Y_{i}-\overline{Y}\Bigr)^2\Biggr)}} \)</p><!--kg-card-end: html--><!--kg-card-begin: html-->$$ \rho_{X,Y} = \frac{Cov(X,Y)}{\sigma_X\:\sigma_Y} $$<!--kg-card-end: html--><p>Now you get it? The divisors from the Covariance and the Standard Deviations canceled each other. Besides, the dividend, as explained earlier, has as a measure unit the multiplication of the measure units of X and Y. The divisor, too, has as a measure unit the multiplication of the measure units of both Standard Deviations, which are those of X and Y. </p><p>Consequently, since the dividend and the divisor have the same XY measure unit, they cancel each other and you get the Correlation value free of a measure unit. Now you can understand Difference #3 explained before, right?</p><p>Besides, you can now understand Difference #4. Why? Because the Covariance ends up with an XY measure unit, so whenever you change the scale of any of the two variables, you scale its value accordingly. </p><p>Meanwhile, since the Correlation formula does not have a measure unit, a change of scale in, again, any of the two variables won’t affect the Correlation range, which stays between -1 and 1. You can verify this numerically with the sketch below.</p>
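<p>Here is a minimal NumPy sketch (ours, with made-up prices) that verifies Differences #3 and #4 numerically: rescaling both series changes the Covariance but leaves the Correlation untouched.</p><!--kg-card-begin: markdown-->
```python
import numpy as np

# Two illustrative price series (hypothetical values)
x = np.array([150.0, 152.1, 151.3, 153.8, 155.2, 154.1])
y = np.array([310.5, 312.0, 311.1, 314.6, 316.9, 315.0])

cov_xy = np.cov(x, y)[0, 1]         # sample Covariance (in "dollar squared" units)
corr_xy = np.corrcoef(x, y)[0, 1]   # Correlation, always between -1 and +1

# Difference #4: multiply both series by 1000 and recompute
cov_scaled = np.cov(x * 1000, y * 1000)[0, 1]
corr_scaled = np.corrcoef(x * 1000, y * 1000)[0, 1]

print(f"Covariance:  {cov_xy:.3f} -> after scaling: {cov_scaled:.1f}")
print(f"Correlation: {corr_xy:.3f} -> after scaling: {corr_scaled:.3f}")  # unchanged
```
<!--kg-card-end: markdown-->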
<p>We got through all these explanations, and now you might ask about the following.</p><hr><h2 id="importance-of-covariance-and-correlation">Importance of Covariance and Correlation</h2><p>This is an important question, and the answer resides in the essence of Economics and Finance. Finance, as a branch of Economics, is about markets where economic agents buy and sell assets.</p><p>Within a market or across markets, any asset can move in the same or the opposite direction as any other asset, because the same agents transact in them every day.</p><p>Asset prices change with agents’ behavior, market conditions, and so on. Whether one person buys 100 shares of Apple or a crowd sells its bank shares during a bank run, every single transaction has consequences not only for the asset itself but also for other stocks and other markets.</p><p>Markets are interrelated, which is why Correlation and Covariance are an essential part of the study of financial markets. If markets or financial assets weren’t interconnected, we wouldn’t need to care about comovements.</p><p>You now know not only what these two concepts mean but also why they are important. So we get to our main purpose.</p><p>So, now, you want to trade, right?<br><em>You are ready to check the correlation of the Apple and Amazon stock prices, and you are prepared to press the BUY and SELL buttons in your broker’s platform to start investing.</em> Aren’t you? </p><p>No! Wait a minute! Before you make a decision, let us explain the real-world applications of Covariance and Correlation in trading and investment.</p><hr><h2 id="trading-applications-of-covariance-and-correlation">Trading Applications of Covariance and Correlation</h2><h3 id="trading-application-1-portfolio-volatility-computation">Trading Application #1: Portfolio Volatility Computation</h3><p>When you invest in more than one stock, you must keep an eye on your portfolio’s volatility. Computing this Portfolio Volatility is only possible once you fully understand how Covariance and Correlation work, as the sketch below illustrates.</p><p><em>So, please, don’t press the BUY button yet</em>. You can check our course on <a href="https://quantra.quantinsti.com/course/quantitative-portfolio-management">Portfolio Management</a> on Quantra to get to know more about it. Wait a bit longer and let us explain some more.</p>
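<p>For intuition, here is a minimal sketch (our own, with assumed weights, volatilities and correlation) of how the covariance between two assets feeds into portfolio volatility:</p><!--kg-card-begin: html--><pre><code># Two-asset portfolio volatility: a sketch with assumed inputs
import numpy as np

w = np.array([0.6, 0.4])       # assumed portfolio weights
vols = np.array([0.20, 0.30])  # assumed annualised volatilities
rho = 0.25                     # assumed correlation between the two assets

# Build the covariance matrix from the volatilities and the correlation
cov_ab = rho * vols[0] * vols[1]
cov = np.array([[vols[0] ** 2, cov_ab],
                [cov_ab, vols[1] ** 2]])

# Portfolio variance is w' * Cov * w; volatility is its square root
port_vol = np.sqrt(w @ cov @ w)
print(round(port_vol, 4))  # ~0.19, below the 0.24 weighted average of the vols
</code></pre><!--kg-card-end: html--><p>The portfolio volatility comes out below the weighted average of the individual volatilities precisely because the correlation is low: that is diversification at work.</p>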
<h3 id="trading-application-2-statistical-arbitrage">Trading Application #2: Statistical Arbitrage</h3><p>Whenever you get into Statistical Arbitrage, you have to understand the definitions of both Correlation and Cointegration. Statistical Arbitrage relies on Cointegration rather than Correlation, so it’s really important to comprehend the difference between the two concepts.</p><p>Our course on <a href="https://quantra.quantinsti.com/course/statistical-arbitrage-trading">Statistical Arbitrage Trading</a> explains all about this. <em>We can guess that you now want to start pressing the BUY and SELL buttons to do arbitrage, right?</em> We told you, wait for more!</p><h3 id="trading-application-3-correlated-variables-in-a-regression-estimation">Trading Application #3: Correlated Variables in a Regression Estimation</h3><p>One of the main conditions for a regression on two or more variables is that the independent variables be uncorrelated with each other. Since you already know what Correlation means, you can now detect and correct this problem whenever the condition is violated in your regression estimates. You can expand your knowledge of this topic with this course on <a href="https://quantra.quantinsti.com/course/trading-with-machine-learning-regression">Trading with Machine Learning: Regression</a>.</p><h3 id="trading-application-4-correlation-to-predict-asset-prices">Trading Application #4: Correlation to Predict Asset Prices</h3><p>See, financial asset prices, contrary to variables in a controlled physics experiment, tend to suffer <a href="https://blog.quantinsti.com/regime-changes-webinar-michael-harris-8-october-2020/">regime changes</a> across time. So if you calculate the Correlation of the US Treasury 2-year Bond Yield with the Fed Funds Rate using data from 2021, that value won’t necessarily hold for 2022.</p><p>Correlation can vary greatly through time, and this mostly happens during adverse economic shocks or financial crises. Be careful when using Correlation to generate your next trading signal!</p><h3 id="trading-application-5-arma-models-to-predict-asset-price-returns">Trading Application #5: ARMA Models to Predict Asset Price Returns</h3><p>Have you heard about Autoregressive Moving Average (ARMA) models? An ARMA model tries to forecast asset price returns from their own past returns through a regression in which a <a href="https://blog.quantinsti.com/moving-average-trading-strategies/">moving average</a> of the regression errors also plays a part.</p><p>Here you will get to know the <a href="https://blog.quantinsti.com/autocorrelation-autocovariance/">AutoCovariance and AutoCorrelation</a> functions, and to understand those two, you first have to understand the two concepts of this article.</p><p><em>What does an autocorrelation function look like?</em></p><p>Well, remember that Correlation is calculated between two variables, and remember also that ARMA models try to predict asset price returns from their previous returns. So if we call these returns \( r_t \), the lag-1 autocorrelation can be expressed as:</p><!--kg-card-begin: html-->$$ \rho_{r_{t},r_{t-1}} = \frac{Cov(r_{t},r_{t-1})}{\sigma_{r_{t}}\:\sigma_{r_{t-1}}} $$<!--kg-card-end: html--><p>Do you want to know more? Get access to this amazing course on <a href="https://quantra.quantinsti.com/course/financial-time-series-analysis-trading">Time Series Analysis</a> in Quantra.</p>
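<p>As a quick illustration (ours, with simulated returns rather than the course material), pandas computes this lag-1 autocorrelation directly:</p><!--kg-card-begin: html--><pre><code># Lag-1 autocorrelation, Corr(r_t, r_{t-1}), on simulated returns
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
returns = pd.Series(rng.normal(0, 0.01, size=250))  # i.i.d. daily returns

print(round(returns.autocorr(lag=1), 3))  # near zero for i.i.d. returns

# A smoothed (persistent) series shows clearly positive autocorrelation
persistent = returns.rolling(5).mean().dropna()
print(round(persistent.autocorr(lag=1), 3))  # roughly 0.8
</code></pre><!--kg-card-end: html-->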
<h3 id="trading-application-6-causality-and-correlation-are-not-the-same-thing">Trading Application #6: Causality and Correlation Are Not the Same Thing</h3><p>This is general trading knowledge, so pay attention. Causation would mean that a movement in the Apple stock price causes a movement in the Amazon stock price. But what do we mean by “causes” here? Causation is debated even at a philosophical level in Economics, and that debate hasn’t been settled yet. Let us give you an example so you can understand it better.</p><p>Imagine we have the litres of rainwater that fall on New York City each month, together with the city’s monthly GDP. You calculate the Correlation between them and get a value of 0.95.</p><p>You might say: <em>“Oh, that’s interesting! I can claim, then, that whenever it rains a lot in the city, this will cause the NY GDP to increase.”</em> No! You can’t say that a lot of rainwater ‘causes’ an increase in GDP; nor, vice versa, that an increase in GDP ‘will cause’ more rainwater in the city.</p><p>Correlation simply means that you found a pattern of comovement between these two variables; it doesn’t mean that there is an economic causation between rainwater and GDP.</p><p>So, whenever you talk about Correlation as a trader, always keep in mind that this concept is not necessarily the same as Causation. Putting it another way:</p><blockquote>Causation always implies Correlation, but Correlation doesn’t always imply Causation.</blockquote><h3 id="trading-application-7-correlation-stylized-facts">Trading Application #7: Correlation Stylized Facts</h3><p>This is also important information to incorporate into your background knowledge: some worldwide correlations matter whenever you trade certain financial assets.</p><p>Here are some stylized facts to keep in mind when trading these assets:</p><ul><li><strong>Crude Oil and Currencies:</strong> Net crude-oil exporters like Russia, Canada, Venezuela or Saudi Arabia see their currencies fall whenever the crude oil price falls. Net crude importers like Japan, however, tend to see their currency appreciate whenever the crude oil price falls.</li><li><strong>Flight-to-Quality:</strong> Whenever there is financial turmoil in an important country or region, as in the 1997 Asian financial crisis, investors tend to pull their money out of these high-yield countries and buy US Treasury bonds. This makes the affected countries’ currencies depreciate sharply against the dollar while US Treasury bond prices soar. Consequently, during the Asian crisis there was a negative correlation between the Asian countries’ currencies and US bond prices.</li><li><strong>Equity-Bond Negative Correlation:</strong> Whenever the US economy starts to boom, investors reallocate their portfolios, investing heavily in riskier assets and less in Treasury bonds. When the economy enters a recession, the opposite happens: more investment flows into fixed income and less into equity securities. This negative correlation between Equity and Bonds is a common feature of markets around the world.</li><li><strong>Gold Time-Varying Correlation:</strong> When world investors are more willing to hold risky assets, the positive correlation between gold and the US stock market increases. When they become more risk averse, say due to a global recession, gold becomes inversely correlated with the US stock market.</li><li><strong>Gold and Inflation:</strong> When world inflation goes up, investors tend to increase their portfolio allocation to gold, meaning the correlation between gold and inflation is positive. Gold behaves as a hedge against inflation: the precious metal maintains your buying power when inflation rises above expectations.</li><li><strong>Gold and US Interest Rates:</strong> When US interest rates go up, the economy is usually in good shape, which makes world investors allocate their resources to riskier assets. Gold then starts to decline in value, so gold and US interest rates show a negative correlation through time.</li><li><strong>Geopolitics and Gold:</strong> Whenever adverse geopolitical shocks hit the whole world, the gold price tends to increase, since this precious metal acts as a ‘safe haven’.</li><li><strong>Returns-Volatility Correlation:</strong> When stock prices fall, companies suffer an increase in their debt-to-equity ratios, which in turn makes their stock price return volatility increase. This negative correlation between stock price returns and their volatility is called “the leverage effect”. When you trade stocks and want to model volatility, you should account for this effect in your estimation in order to capture the negative correlation.</li></ul><hr><h2 id="covariance-and-correlation-simple-exercise-examples">Covariance and Correlation Simple Exercise Examples</h2><h3 id="how-to-calculate-covariance">How to Calculate Covariance?</h3><p>You are now ready for this, right? Ok, let’s start with a simple example. Imagine we have the stock prices of two companies, Microsoft and Tesla, with 5 days of data for each stock.</p><p>The data is presented in the table below:</p><!--kg-card-begin: html--><table>
<tbody>
<tr>
<td>
<p>Date \ Stock</p>
</td>
<td>
<p>Microsoft</p>
</td>
<td>
<p>Tesla</p>
</td>
</tr>
<tr>
<td>
<p>Day 1</p>
</td>
<td>
<p>240</p>
</td>
<td>
<p>850</p>
</td>
</tr>
<tr>
<td>
<p>Day 2</p>
</td>
<td>
<p>265</p>
</td>
<td>
<p>800</p>
</td>
</tr>
<tr>
<td>
<p>Day 3</p>
</td>
<td>
<p>255</p>
</td>
<td>
<p>820</p>
</td>
</tr>
<tr>
<td>
<p>Day 4</p>
</td>
<td>
<p>280</p>
</td>
<td>
<p>870</p>
</td>
</tr>
<tr>
<td>
<p>Day 5</p>
</td>
<td>
<p>301</p>
</td>
<td>
<p>900</p>
</td>
</tr>
</tbody>
</table><!--kg-card-end: html--><p>Since we have sample data rather than the whole population, we will use the sample Covariance and Correlation formulas. First of all, we have 5 observations, so our N is 5.</p><p>Then, we have to calculate the mean of each stock. We’ll help you with that: the means are 268.2 for Microsoft and 848 for Tesla. Next, we have to calculate each observation’s deviation from its mean, for each stock:</p><!--kg-card-begin: html--><table>
<tbody>
<tr>
<td>
<p>Date \ Stock</p>
</td>
<td>
<p>Microsoft</p>
</td>
<td>
<p>Tesla</p>
</td>
<td>
<p>Microsoft Deviations from the Mean</p>
</td>
<td>
<p>Tesla Deviations from the Mean</p>
</td>
</tr>
<tr>
<td>
<p>Day 1</p>
</td>
<td>
<p>240</p>
</td>
<td>
<p>850</p>
</td>
<td>
<p>-28.2</p>
</td>
<td>
<p>2</p>
</td>
</tr>
<tr>
<td>
<p>Day 2</p>
</td>
<td>
<p>265</p>
</td>
<td>
<p>800</p>
</td>
<td>
<p>-3.2</p>
</td>
<td>
<p>-48</p>
</td>
</tr>
<tr>
<td>
<p>Day 3</p>
</td>
<td>
<p>255</p>
</td>
<td>
<p>820</p>
</td>
<td>
<p>-13.2</p>
</td>
<td>
<p>-28</p>
</td>
</tr>
<tr>
<td>
<p>Day 4</p>
</td>
<td>
<p>280</p>
</td>
<td>
<p>870</p>
</td>
<td>
<p>11.8</p>
</td>
<td>
<p>22</p>
</td>
</tr>
<tr>
<td>
<p>Day 5</p>
</td>
<td>
<p>301</p>
</td>
<td>
<p>900</p>
</td>
<td>
<p>32.8</p>
</td>
<td>
<p>52</p>
</td>
</tr>
</tbody>
</table><!--kg-card-end: html--><p>Once you are done with that, multiply the two deviations for each date and then sum all those products to get the dividend of the Covariance formula:</p><!--kg-card-begin: html--><table>
<tbody>
<tr>
<td>
<p>Microsoft&rsquo;s Deviations from the Mean</p>
</td>
<td>
<p>Tesla&rsquo;s Deviations from the Mean</p>
</td>
<td>
<p>Multiplication of both Deviations</p>
</td>
</tr>
<tr>
<td>
<p>-28.2</p>
</td>
<td>
<p>2</p>
</td>
<td>
<p>-56.4</p>
</td>
</tr>
<tr>
<td>
<p>-3.2</p>
</td>
<td>
<p>-48</p>
</td>
<td>
<p>153.6</p>
</td>
</tr>
<tr>
<td>
<p>-13.2</p>
</td>
<td>
<p>-28</p>
</td>
<td>
<p>369.6</p>
</td>
</tr>
<tr>
<td>
<p>11.8</p>
</td>
<td>
<p>22</p>
</td>
<td>
<p>259.6</p>
</td>
</tr>
<tr>
<td>
<p>32.8</p>
</td>
<td>
<p>52</p>
</td>
<td>
<p>1705.6</p>
</td>
</tr>
<tr>
<td>&nbsp;</td>
<td>
<p>Covariance Dividend</p>
</td>
<td>
<p>2432</p>
</td>
</tr>
</tbody>
</table><!--kg-card-end: html--><p><strong>What’s missing to calculate Covariance?</strong></p><p>You’re almost done. You have the dividend; you’re missing the divisor. The divisor is just the total number of observations minus one (minus one because we’re working with a sample). So let’s divide the Covariance Dividend by (N - 1):</p><!--kg-card-begin: html-->\( Cov(Microsoft,Tesla) = \frac{2432}{5-1} = \frac{2432}{4} \)
<br>\( Cov(Microsoft,Tesla) = 608 \)<!--kg-card-end: html--><p>Remember the definition? Covariance measures the direction of the comovement, in this case between Microsoft and Tesla. So what interpretation can we draw from this value? Since 608 is a positive number, we conclude that there is a positive comovement between these two stock prices.</p>
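<p>Before moving on, you can sanity-check this arithmetic with NumPy (a quick check of ours, separate from the notebook embedded later):</p><!--kg-card-begin: html--><pre><code># Verify the worked covariance example: np.cov uses the (N - 1) divisor by default
import numpy as np

msft = np.array([240.0, 265.0, 255.0, 280.0, 301.0])
tsla = np.array([850.0, 800.0, 820.0, 870.0, 900.0])

print(np.cov(msft, tsla)[0, 1])  # 608.0, matching the calculation above
</code></pre><!--kg-card-end: html-->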
<p>Now you might ask: how “strong” is this comovement between Microsoft and Tesla? That is answered by the Correlation coefficient.</p><h3 id="how-to-calculate-the-correlation-coefficient">How to Calculate the Correlation coefficient?</h3><p>Once we have the Covariance value, as you’ll remember from above, we only need the Standard Deviations of Microsoft and Tesla. We’ll help you with those values: they are 23.42 and 39.62, respectively. Applying the formula (with the unrounded standard deviations in the divisor), we get:</p><!--kg-card-begin: html-->\( Corr(Microsoft,Tesla) = \frac{608}{23.42\times39.62} = \frac{608}{928.15} \)
<br>\( Corr(Microsoft,Tesla) \approx 0.66 \)<!--kg-card-end: html--><p>Suppose we choose 0.5 as our threshold for deciding whether a correlation value is closer to 1 than to zero (the exact threshold is up to the researcher). Since 0.66 is greater than 0.5, we can say that the comovement between Microsoft and Tesla is positive and also strong.</p>
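<p>Again, a one-line check with NumPy (ours) confirms the result:</p><!--kg-card-begin: html--><pre><code># Verify the worked correlation example
import numpy as np

msft = np.array([240.0, 265.0, 255.0, 280.0, 301.0])
tsla = np.array([850.0, 800.0, 820.0, 870.0, 900.0])

print(round(np.corrcoef(msft, tsla)[0, 1], 2))  # 0.66
</code></pre><!--kg-card-end: html-->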
<h3 id="what-did-we-mean-by-degree-of-linear-association-in-the-correlation-definition">What did we mean by “degree of linear association” in the Correlation definition?</h3><p>Earlier, we explained that Correlation measures the degree of linear association between two variables. That is the formal definition you will find in Statistics textbooks.</p><p><em>How can you grasp that definition intuitively?</em><br>Let’s do that now.</p><p>We know that Correlation ranges between -1 and +1, and it can also be zero. Let’s graph the two extreme cases and a close-to-zero case as scatter plots:</p><figure class="kg-card kg-image-card kg-width-full"><img src="https://d1rwhvwstyk9gu.cloudfront.net/2022/04/perfect_positive_correlation-2.png" class="kg-image" alt="perfect_positive_correlation"></figure><figure class="kg-card kg-image-card kg-width-full"><img src="https://d1rwhvwstyk9gu.cloudfront.net/2022/04/zero_correlation-3.png" class="kg-image" alt="zero_correlation"></figure><figure class="kg-card kg-image-card kg-width-full"><img src="https://d1rwhvwstyk9gu.cloudfront.net/2022/04/perfect_negative_correlation-1.png" class="kg-image" alt="perfect_negative_correlation"></figure><p>As you can see, when the Correlation is exactly +1 or -1, the scatter plot forms a line: variables A and B have a perfectly linear relationship, or linear association.</p><p>If the Correlation is close to +1 and you fit a line through the points, that line will have a positive slope. If the Correlation is close to -1, the fitted line will have a negative slope. As the value moves from +1 toward zero, the linear association becomes less clear, and the same happens as the value moves from -1 toward zero.</p><p>A Correlation equal to zero, or very close to zero, means there is no linear association between the two variables, and the scatter plot will look almost random.</p><p>Using our correlation definition from above, the strength of a positive correlation grows as the value moves from zero toward +1, and the strength of a negative correlation grows as the value moves from zero toward -1.</p><p>Let’s see graphically how weak and strong correlations behave:</p><figure class="kg-card kg-image-card kg-width-full"><img src="https://d1rwhvwstyk9gu.cloudfront.net/2022/04/strong_negative_correlation-1.png" class="kg-image" alt="strong_negative_correlation"></figure><figure class="kg-card kg-image-card kg-width-full"><img src="https://d1rwhvwstyk9gu.cloudfront.net/2022/04/weak_negative_correlation-2.png" class="kg-image" alt="weak_negative_correlation"></figure><p>You already saw what a perfectly negative Correlation looks like. The graphs above show what strong and weak negative correlations look like: the top graph has a Correlation of -0.83 and the bottom graph a Correlation of -0.25.</p><p>As you can see, when the value gets close to -1, the negative Correlation is strong. As the Correlation approaches zero, however, the linear association is no longer clearly visible in the graph, and we say the negative correlation is weak.</p><figure class="kg-card kg-image-card kg-width-full"><img src="https://d1rwhvwstyk9gu.cloudfront.net/2022/04/strong_positive_correlation-1.png" class="kg-image" alt="strong_positive_correlation"></figure><figure class="kg-card kg-image-card kg-width-full"><img src="https://d1rwhvwstyk9gu.cloudfront.net/2022/04/weak_positive_correlation-1.png" class="kg-image" alt="weak_positive_correlation"></figure><p>Next, the positive case. The top graph shows a positive Correlation of 0.84 and the bottom graph a Correlation of 0.29. When the value gets close to +1, the positive Correlation is strong; as it approaches zero, the linear association fades and we say the positive Correlation is weak.</p><p>We showed a zero-Correlation graph above. Now you might ask: is a random cloud the only way to get zero correlation?</p><p>Let’s see this other graph:</p><figure class="kg-card kg-image-card kg-width-full"><img src="https://d1rwhvwstyk9gu.cloudfront.net/2022/04/non_linear_zero_correlation-1.png" class="kg-image" alt="non_linear_zero_correlation"></figure><p>Here the Correlation between X and Y is zero even though Y clearly depends on X: the positive linear association on one side cancels out the negative one on the other. So there are two types of scatter plot that give a Correlation equal to zero:</p><ul><li>A random scatter plot, and</li><li>A non-linear scatter plot.</li></ul>
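<p>You can reproduce this effect in a couple of lines (a small illustration of ours): a perfectly deterministic, but non-linear, relationship with zero correlation:</p><!--kg-card-begin: html--><pre><code># A deterministic non-linear relationship with zero correlation
import numpy as np

x = np.linspace(-1, 1, 101)  # symmetric around zero
y = x ** 2                   # y is fully determined by x

print(round(np.corrcoef(x, y)[0, 1], 6))  # 0.0: no *linear* association
</code></pre><!--kg-card-end: html-->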
<hr><h3 id="what-are-good-covariance-and-correlation-values">What are good Covariance and Correlation values?</h3><p>Looking at the last graphs, or maybe even earlier, you might have asked yourself this question, so let’s answer it here. Actually, there is no “good” or “perfect” value of the Covariance and Correlation functions that fits all cases.</p><p>For portfolio management, for example, negative Covariances between stocks are good values, because they give you better diversification across assets. For the independent variables in a regression estimation, good values for their Correlations are close to zero.</p><p>For an ARMA model, you will want the autocorrelation functions to be different from zero, in order to confirm that the model is appropriate for the stock price returns.</p><p>As you can see, a good value for our Covariance or Correlation formulas depends on what you are looking for: on the trader or researcher, and on the problem at hand.</p><hr><h2 id="covariance-and-correlation-computation-examples-in-python">Covariance and Correlation Computation Examples in Python</h2><p>Now let’s use real-world data to compute these two important concepts in our main programming language, Python.</p><h3 id="how-to-calculate-covariance-and-correlation-in-python">How to Calculate Covariance and Correlation in Python</h3><p>Let’s download the Microsoft and Tesla stock prices to use in our calculations.</p><p>We first set up the environment. Don’t forget to install the ‘yfinance’ library in case you don’t have it:</p><!--kg-card-begin: html--><script src="https://gist.github.com/quantra-go-algo/94f4cacf38a7eb2635c8c3cb7297c622.js"></script><!--kg-card-end: html--><p>Then we get daily historical data through the ‘yfinance’ API. We download data from the start of 2021 up until March 3, 2022, setting auto_adjust to True so that the OHLC and Volume values are adjusted with their corresponding Adjusted Close.</p><!--kg-card-begin: html--><script src="https://gist.github.com/quantra-go-algo/e6758066a3d2e2f56cbf3d2033bc4a22.js"></script><!--kg-card-end: html--><p>Next, we calculate the Covariance and Correlation for the whole sample period:</p><!--kg-card-begin: html--><script src="https://gist.github.com/quantra-go-algo/063d81b04053b56267f0fe43119eed56.js"></script><!--kg-card-end: html--><p>Here's the output:</p><figure class="kg-card kg-image-card kg-width-full"><img src="https://d1rwhvwstyk9gu.cloudfront.net/2022/05/covariance-correlation-table-1.png" class="kg-image" alt="covariance correlation table 1"></figure><p>Here we have to explain something about this output. As you can see, there is one table for the Covariance and another for the Correlation.</p><p>These tables are actually a Covariance matrix (the upper table) and a Correlation matrix (the lower table). The diagonal of the Covariance matrix holds the Variances, which are 1312.74 for Microsoft and 26796.55 for Tesla. The lower triangle of the matrix mirrors the upper triangle, meaning it holds the same values.</p><p>The same goes for the Correlation matrix. Its diagonal is the Correlation function applied not to another variable but to the same one: element (1,1) is the Correlation of the Microsoft stock price with itself, Corr(Microsoft,Microsoft).</p><p>We leave this as an exercise you can prove for yourself: apply the Correlation function to a single stock price and you will always get +1. These <a href="https://blog.quantinsti.com/calculating-covariance-matrix-portfolio-variance/">Covariance and Correlation matrices</a> can also be built for more than two variables, and they follow the same properties explained above.</p><p>Once you understand what we just explained, we can go to our final Covariance and Correlation computations:</p><h3 id="how-to-calculate-the-rolling-covariances-and-correlations-in-python-">How to Calculate the Rolling Covariances and Correlations in Python.</h3><p>As we explained before, Correlation and Covariance can change in value as time passes. That’s why it’s essential to track them through time. One way to do this is to calculate rolling Covariances and Correlations for our Microsoft and Tesla stock prices.</p><p>By rolling we mean that, after specifying a number of days “n”, you get the Covariance and Correlation for each day estimated over a window of the “n” previous days of data.</p><p>It’s quite simple; just pay attention to the following. Since the pandas “.cov” and “.corr” methods return a matrix, a rolling Covariance or Correlation applied to the whole DataFrame also returns a matrix for each day. To avoid carrying a whole matrix per day, you unstack the result and keep only the Covariance and Correlation between MSFT and TSLA. We set a 20-day window, roughly one trading month, as the historical data used for each observation.</p>
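<p>Here is a minimal sketch of this unstacking approach (ticker names and dates are assumed to mirror the article’s setup):</p><!--kg-card-begin: html--><pre><code># Rolling covariance/correlation between two stocks: a sketch
import yfinance as yf

# Assumed tickers and date range, mirroring the setup above
data = yf.download(['MSFT', 'TSLA'], start='2021-01-01',
                   end='2022-03-03', auto_adjust=True)['Close']

window = 20  # roughly one trading month

# Rolling cov/corr on the whole frame returns one matrix per day;
# unstacking lets us keep just the MSFT-TSLA pair as a single series
rolling_cov = data.rolling(window).cov().unstack()['MSFT']['TSLA']
rolling_corr = data.rolling(window).corr().unstack()['MSFT']['TSLA']

print(rolling_cov.dropna().tail())
print(rolling_corr.dropna().tail())
</code></pre><!--kg-card-end: html--><p>An equivalent pairwise shortcut, if you prefer to skip the unstacking, is <code>data['MSFT'].rolling(window).corr(data['TSLA'])</code>.</p>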
<p>See the full version below:</p><!--kg-card-begin: html--><script src="https://gist.github.com/quantra-go-algo/55fa687b3f672782664607930c124f00.js"></script><!--kg-card-end: html--><p>Here's the output:</p><figure class="kg-card kg-image-card kg-width-full"><img src="https://d1rwhvwstyk9gu.cloudfront.net/2022/05/covariance--correlation-dataframe-2.png" class="kg-image" alt="covariance  correlation dataframe 2"></figure><p>Now let’s see graphically how the Covariance and Correlation look through time:</p><figure class="kg-card kg-image-card kg-width-full"><img src="https://d1rwhvwstyk9gu.cloudfront.net/2022/04/rolling_covariance.png" class="kg-image" alt="rolling_covariance"></figure><figure class="kg-card kg-image-card kg-width-full"><img src="https://d1rwhvwstyk9gu.cloudfront.net/2022/04/rolling_correlation.png" class="kg-image" alt="rolling_correlation"></figure><p>As we told you before, both Covariance and Correlation change through time. There are periods where Microsoft and Tesla move in the same direction, and other periods where they move in opposite directions.</p><p>Be careful and be patient when trading! <a href="https://quantra.quantinsti.com/glossary/Risk-Management-System">Risk management</a> is essential for trading well in the financial markets!</p><hr><h3 id="conclusion">Conclusion</h3><p>You now have a better understanding of what these two concepts mean, you know the intricacies of their formulas, and you understand how to apply Covariance and Correlation in trading.</p><p><em>You are almost ready to press the BUY button</em>. It’s time now to start learning strategies in which you can put these two concepts to good use.</p><p>So, do you want to start trading more than one asset using the Correlation and Covariance formulas? Why not? You can enroll in our course on <a href="https://quantra.quantinsti.com/course/quantitative-portfolio-management">Quantitative Portfolio Management</a> and start using these concepts now!</p><p>Are you ready? Go Algo!</p><hr><h3 id="file-in-the-download-">File in the download:</h3><p><strong>Covariance vs Correlation - Python Notebook</strong></p><!--kg-card-begin: html--><p style="text-align: center;"><a href="https://blog.quantinsti.com/covariance-correlation" class="download-button button"> Visit blog to download </a></p><!--kg-card-end: html--><hr><!--kg-card-begin: html--><p><em><small>Disclaimer: All investments and trading in the stock market involve risk. Any decision to place trades in the financial markets, including trading in stock or options or other financial instruments is a personal decision that should only be made after thorough research, including a personal risk and financial assessment and the engagement of professional assistance to the extent you believe necessary. The trading strategies or related information mentioned in this article is for informational purposes only.</small></em></p><!--kg-card-end: html-->]]></content:encoded></item></channel></rss>