Asset return prediction is difficult. Most traditional time series techniques work poorly on asset returns. One significant reason is that classical time series analysis (TSA) models require the data to be stationary; if it isn't, you must transform it until it is.
That presents a problem.
In practice, asset returns exhibit several phenomena that violate the assumptions of stationarity, including non-linear dynamics, volatility clustering, seasonality, and shifting autocorrelation. This renders traditional models largely ineffective for our purposes.
What are our options? There are many algorithms to choose from, but few are flexible enough to address the challenges of predicting asset returns:
- mean and volatility that change through time
- future returns that are sometimes correlated with past returns, and sometimes not
- future volatility that is sometimes correlated with past volatility, and sometimes not
- non-linear behavior
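To make these challenges concrete, here is a minimal numpy sketch that simulates a return series from two hypothetical regimes (the means and volatilities are made up for illustration). The rolling statistics shift when the regime changes, which is exactly the kind of non-stationarity described above:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical two-regime return series: a calm regime, then a turbulent one.
# The regime parameters (daily mean, daily volatility) are illustrative only.
calm = rng.normal(0.0005, 0.008, size=500)       # low volatility
turbulent = rng.normal(-0.001, 0.025, size=500)  # high volatility
returns = np.concatenate([calm, turbulent])

# Rolling mean and volatility drift as the regime changes,
# so the full series is not stationary.
window = 100
roll_mean = np.array([returns[i:i + window].mean()
                      for i in range(0, len(returns) - window, window)])
roll_vol = np.array([returns[i:i + window].std()
                     for i in range(0, len(returns) - window, window)])

print("rolling means:", np.round(roll_mean, 4))
print("rolling vols: ", np.round(roll_vol, 4))
```

A stationarity test run on this series would reject constant variance; a model that assumes a single fixed mean and volatility will misdescribe at least half the sample.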
Can mixture models offer a solution? They have potential. First, they are based on several well-established concepts.
Markov models – These are used to model sequences where the future state depends only on the current state and not on any past states (the memoryless property).
Hidden Markov models – Used to model processes where the true state is unobserved (hidden) but there are observable factors that give us useful information to guess the true state.
Expectation-Maximization (E-M) – An algorithm that alternates between computing each observation's expected class membership under the current parameters (the E-step) and updating the parameters to maximize the likelihood of the data given those memberships (the M-step).
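The E-M loop for a two-component Gaussian mixture can be sketched in a few lines of numpy. This is a bare-bones illustration on toy data (the component parameters and initial guesses are made up), not a production implementation:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy data drawn from two Gaussians; the true labels are hidden from the algorithm.
data = np.concatenate([rng.normal(-2.0, 0.5, 300), rng.normal(2.0, 1.0, 300)])

# Crude initial guesses for the means, variances, and mixing weights.
mu = np.array([-1.0, 1.0])
var = np.array([1.0, 1.0])
pi = np.array([0.5, 0.5])

def gauss_pdf(x, mu, var):
    return np.exp(-0.5 * (x - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)

for _ in range(50):
    # E-step: posterior probability that each point belongs to each component.
    dens = pi * gauss_pdf(data[:, None], mu, var)   # shape (n, 2)
    resp = dens / dens.sum(axis=1, keepdims=True)

    # M-step: re-estimate parameters to maximize the expected log-likelihood.
    nk = resp.sum(axis=0)
    mu = (resp * data[:, None]).sum(axis=0) / nk
    var = (resp * (data[:, None] - mu) ** 2).sum(axis=0) / nk
    pi = nk / len(data)

print("means:", np.round(mu, 2))
print("variances:", np.round(var, 2))
print("weights:", np.round(pi, 2))
```

After a few dozen iterations the estimated means land near the true values of -2 and 2, even though the algorithm never saw the labels.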
An easy way to think about applying mixture models to asset return prediction is to treat asset returns as a sequence of states, or regimes. Each regime is characterized by its own descriptive statistics, including mean and volatility; example regimes could include low-volatility and high-volatility. We can also assume that asset returns transition between these regimes probabilistically. Framing the problem this way lets us use mixture models, which are designed to estimate the sequence of regimes, each regime's mean and variance, and the transition probabilities between regimes.
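The "transition probabilities between regimes" part has a simple interpretation: if we knew the regime sequence, we could estimate the transition matrix just by counting. The numpy sketch below simulates a two-regime Markov chain from a hypothetical (made-up) transition matrix and then recovers it from the observed sequence:

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical true transition matrix for a two-regime Markov chain.
# Large diagonal entries mean regimes tend to persist.
P_true = np.array([[0.95, 0.05],
                   [0.10, 0.90]])

# Simulate a regime sequence from the chain.
n = 5000
states = np.zeros(n, dtype=int)
for t in range(1, n):
    states[t] = rng.choice(2, p=P_true[states[t - 1]])

# Estimate transition probabilities by counting observed transitions.
counts = np.zeros((2, 2))
for a, b in zip(states[:-1], states[1:]):
    counts[a, b] += 1
P_hat = counts / counts.sum(axis=1, keepdims=True)

print("estimated transition matrix:\n", np.round(P_hat, 3))
```

In practice the regime labels are hidden, which is why we need the E-M machinery above to estimate the sequence and the transition probabilities jointly.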
The most common mixture model is the Gaussian mixture model (GMM).
The underlying model assumption is that each regime generates returns from its own Gaussian distribution, with parameters we can estimate. Under the hood, GMM employs an expectation-maximization algorithm to estimate regime parameters and the most likely sequence of regimes.
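In practice you rarely hand-roll E-M; scikit-learn's `GaussianMixture` does the fitting for you. A minimal sketch on synthetic returns (the two regimes and their parameters are made up for illustration):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(3)

# Synthetic returns drawn from two made-up regimes (calm and turbulent).
returns = np.concatenate([rng.normal(0.001, 0.01, 400),
                          rng.normal(-0.002, 0.03, 200)]).reshape(-1, 1)

# Fit a two-component GMM; E-M runs under the hood.
gmm = GaussianMixture(n_components=2, random_state=0).fit(returns)

print("regime means:       ", gmm.means_.ravel())
print("regime volatilities:", np.sqrt(gmm.covariances_.ravel()))
print("mixing weights:     ", gmm.weights_)

# Most likely regime label for each observation.
labels = gmm.predict(returns)
```

Note that a plain GMM treats observations as independent draws from the mixture; capturing the regime *sequence* and its transition probabilities is the job of a hidden Markov model with Gaussian emissions, which combines the same ingredients.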
GMMs are flexible, generative models that have had success approximating non-linear data. Generative models are special in that they try to mimic the underlying data-generating process, so we can create new data that should look like the original data.
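That generative property is directly usable: once a GMM is fit, you can draw fresh samples from it. A short sketch, again using made-up synthetic returns, via scikit-learn's `sample` method:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(4)

# Fit a GMM to synthetic returns from two made-up regimes.
observed = np.concatenate([rng.normal(0.001, 0.01, 400),
                           rng.normal(-0.002, 0.03, 200)]).reshape(-1, 1)
gmm = GaussianMixture(n_components=2, random_state=0).fit(observed)

# Because the model is generative, we can draw new "returns" that
# should resemble the distribution of the original series.
simulated, component_labels = gmm.sample(1000)

print("observed std: ", observed.std())
print("simulated std:", simulated.std())
```

Simulated draws like these are useful for stress testing or Monte Carlo exercises, with the caveat that `sample` ignores any time ordering: it draws i.i.d. from the fitted mixture.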