Autocorrelation and Partial Autocorrelation Functions (CFA Level 1): Detecting Patterns Over Lags, the Mathematical Definition, and What Large ACF Values Imply. Key definitions, formulas, and exam tips.
Time-series analysis often feels a bit like detective work—you’re basically peeking into a financial market’s historical data to uncover structural clues. Autocorrelation and partial autocorrelation functions, commonly referred to as ACF and PACF, are among the most powerful tools you have for this detective work. These functions dig into the past values of your time series to see if they influence the present and, if so, how strongly.
I remember back when I was first learning about time-series analysis: I kept mixing up the difference between “autocorrelation” and “partial autocorrelation.” It felt a bit like I was stumbling around in the dark, not fully grasping which tool helped with AR terms and which helped with MA terms. If you’re in that boat, no worries—plenty of experienced analysts still have to pause and think for a second when choosing between the two. In this section, we’ll clarify the difference, show you how the ACF and PACF can help you fit better models, and illustrate how they come into play in a real-world finance setting.
Autocorrelation measures how strongly a time series is related to its own past values. In formal terms, the autocorrelation function at lag k, sometimes denoted ρₖ or Corr(Yₜ, Yₜ₋ₖ), tells you how closely the data at time t move with the data at time t−k.
When analyzing time series, you typically look at autocorrelation values for several different lags—1, 2, 3, and so on. You can think of it like checking if the market’s return this month is related to last month’s return, or even last quarter’s return. This can be particularly relevant if you suspect that certain cyclical or seasonal factors affect your data.
For a covariance-stationary time series {Yₜ}, the sample autocorrelation at lag k can be written as:

\[
\hat{\rho}_k = \frac{\sum_{t=k+1}^{T} \left(Y_t - \bar{Y}\right)\left(Y_{t-k} - \bar{Y}\right)}{\sum_{t=1}^{T} \left(Y_t - \bar{Y}\right)^2}
\]
where T is the total number of observations and \(\bar{Y}\) is the sample mean. This ratio normalizes the covariance at lag k by the variance of the entire series.
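To make the formula concrete, here is a minimal NumPy sketch that computes the sample autocorrelation exactly as written above; the simulated `returns` series and the lag choices are purely illustrative.

```python
import numpy as np

def sample_acf(y, k):
    """Sample autocorrelation at lag k: lag-k autocovariance divided by the full-sample variance."""
    y = np.asarray(y, dtype=float)
    y_bar = y.mean()
    numerator = np.sum((y[k:] - y_bar) * (y[:-k] - y_bar))   # pairs (Y_t, Y_{t-k}) for t = k+1, ..., T
    denominator = np.sum((y - y_bar) ** 2)                   # variance term over the whole sample
    return numerator / denominator

# Illustrative data only: 120 simulated "monthly returns"
rng = np.random.default_rng(42)
returns = rng.normal(0, 0.05, size=120)
print([round(sample_acf(returns, k), 3) for k in (1, 2, 3)])
```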
Autocorrelations that remain large for multiple lags—especially ones that slowly decay—may indicate that your data have a long memory or could fit an AR process with a particular order.
Partial autocorrelation is basically autocorrelation but after controlling (or “partialling out”) for the presence of all other intervening lags. It can help you pinpoint the specific direct effect of a lag k on the current value, independent of any indirect effects through intermediate lags.
A personal anecdote: once, I was analyzing a monthly returns series for an equity portfolio. The naive ACF showed strong correlations at lags 1 and 2, but the partial autocorrelation indicated that only lag 1 truly had a significant direct influence once you accounted for the chain of relationships. Without the PACF in my toolbox, I might have incorrectly included a second-order term that wasn’t really necessary.
The partial autocorrelation for lag k, often denoted \(\phi_{k,k}\), is the coefficient on Yₜ₋ₖ in an autoregression of order k after you’ve accounted for all the lags from 1 to k−1. In other words, you run a regression like:

\[
Y_t = c + \phi_{k,1} Y_{t-1} + \phi_{k,2} Y_{t-2} + \cdots + \phi_{k,k} Y_{t-k} + \varepsilon_t
\]
and then \(\phi_{k,k}\) is your partial autocorrelation at lag k. This coefficient stands in for the “clean” effect of lag k on Yₜ.
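One way to see that definition in action is to run the order-k regression directly and read off the last coefficient. The sketch below does this with statsmodels OLS; the simulated series and the lag choices are hypothetical.

```python
import numpy as np
import statsmodels.api as sm

def pacf_via_regression(y, k):
    """phi_{k,k}: the coefficient on Y(t-k) in an OLS autoregression of order k."""
    y = np.asarray(y, dtype=float)
    T = len(y)
    response = y[k:]                                                   # Y_t for t = k+1, ..., T
    lags = np.column_stack([y[k - j:T - j] for j in range(1, k + 1)])  # columns: Y_{t-1}, ..., Y_{t-k}
    fit = sm.OLS(response, sm.add_constant(lags)).fit()
    return fit.params[-1]                                              # last slope is phi_{k,k}

# Illustrative data only
rng = np.random.default_rng(0)
y = rng.normal(size=200)
print([round(pacf_via_regression(y, k), 3) for k in (1, 2, 3)])
```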
You’ll commonly see ACF and PACF displayed as bar charts called correlograms. The ACF plot shows correlations for lags on the x-axis and their magnitudes on the y-axis, while the PACF plot displays partial correlations for the same lags.
Most statistical software provides confidence bands—usually around ±1.96/√T if you assume the series is white noise or near-white noise. Spikes that exceed these bands indicate that the correlation is significantly different from zero at the 5% level. If an ACF bar for lag 3, for instance, shoots out wildly beyond the band, that’s a big neon sign telling you that your time series is correlated with its own value three steps in the past.
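If you prefer to flag significant lags programmatically rather than eyeball a plot, a small sketch along these lines works; the `returns` series here is simulated, so treat the output as illustrative only.

```python
import numpy as np
from statsmodels.tsa.stattools import acf

# Illustrative data: 120 simulated "monthly returns"
rng = np.random.default_rng(1)
returns = rng.normal(0, 0.04, size=120)

T = len(returns)
band = 1.96 / np.sqrt(T)              # approximate 5% band under the white-noise null
rho = acf(returns, nlags=12)          # rho[0] is lag 0 and always equals 1

significant = [k for k in range(1, 13) if abs(rho[k]) > band]
print(f"Band: ±{band:.3f}; significant lags: {significant}")
```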
Below is a simplified Mermaid diagram illustrating how financial data might show lagged relationships:
```mermaid
graph LR
    A["Y(t)"] -- "Correlation with lag 1" --> B["Y(t-1)"]
    B["Y(t-1)"] -- "Lag-1 link, forming an indirect path from Y(t) to Y(t-2)" --> C["Y(t-2)"]
    A["Y(t)"] -- "Correlation with lag 2, the direct part isolated by the PACF" --> C["Y(t-2)"]
```
Autocorrelation and partial autocorrelation can be the difference between a profitable forecast and a money-losing guess for portfolio managers, quantitative analysts, and risk managers alike. If you can recognize that lag 1 has a strong partial autocorrelation while lags 2 and 3 are negligible, you might adopt an AR(1) model. Misspecifying the order can lead to underfitting or overfitting your forecasts, which might compound the errors in your portfolio allocation and risk management decisions.
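For instance, if the correlograms point to an AR(1), a minimal statsmodels sketch (with a simulated stand-in for the return series) might look like this:

```python
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

# Hypothetical monthly return series; substitute your actual data
rng = np.random.default_rng(7)
returns = rng.normal(0, 0.04, size=120)

# order=(1, 0, 0): one AR term, no differencing, no MA terms
ar1_fit = ARIMA(returns, order=(1, 0, 0)).fit()

print(ar1_fit.params)              # constant, AR(1) coefficient, residual variance
print(ar1_fit.forecast(steps=1))   # one-step-ahead forecast
```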
Imagine you’re in charge of risk modeling for a large mutual fund. You want to forecast next month’s returns so you can stress test for possible drawdowns. If your model captures the correct autocorrelation structure, your volatility forecasts and drawdown predictions are likely more accurate—meaning fewer nasty surprises for your risk committees.
Let’s say you have a monthly equity return series for a certain market index (e.g., the S&P 500) over 10 years, giving you 120 observations. You might do the following:

1. Compute the sample ACF and PACF out to about 12 lags and plot the correlograms.
2. Compare each spike with the ±1.96/√T confidence band to see which lags are significantly different from zero.
3. Translate the pattern of significant spikes into a candidate model order (for example, an AR(1) if only the lag-1 PACF spike stands out).
4. Fit the candidate model and then re-examine the ACF and PACF of its residuals.
In practice, you’ll combine the insights from these plots with additional knowledge: AIC or BIC criteria, domain expertise about the market, and in-sample or out-of-sample backtesting results. But the ACF and PACF plots are always near the top of your diagnostic checklist.
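To make the AIC/BIC part of that checklist concrete, a rough sketch like the one below fits a handful of candidate orders and ranks them; the data and the candidate list are hypothetical.

```python
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

# Hypothetical monthly return series; replace with your own data
rng = np.random.default_rng(3)
returns = rng.normal(0, 0.04, size=120)

# Candidate (p, d, q) orders suggested by the correlograms
candidates = [(1, 0, 0), (2, 0, 0), (0, 0, 1), (1, 0, 1)]

fits = {order: ARIMA(returns, order=order).fit() for order in candidates}
for order, res in sorted(fits.items(), key=lambda kv: kv[1].aic):
    print(f"{order}: AIC = {res.aic:.1f}, BIC = {res.bic:.1f}")
```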
When diagnosing ARIMA (AutoRegressive Integrated Moving Average) models, or even more advanced ARMA-GARCH setups, the patterns in the ACF and PACF can point to the right combos of AR and MA terms:

- AR(p): the ACF tails off gradually while the PACF cuts off after lag p.
- MA(q): the ACF cuts off after lag q while the PACF tails off gradually.
- ARMA(p, q): both the ACF and the PACF tail off rather than cutting off sharply.
If the data are non-stationary, you might see patterns that only become clear after differencing the series. (See the earlier discussions on stationarity if you need a refresher.)
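When stationarity is the issue, a quick sketch like this one differences the series before re-examining the correlograms; the random-walk-style price series is simulated for illustration.

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.stattools import acf

# Illustrative random-walk-style price level series
rng = np.random.default_rng(5)
prices = pd.Series(100 + np.cumsum(rng.normal(0, 1, size=120)))

diffs = prices.diff().dropna()     # first difference; log returns are a common alternative

print("ACF of levels, lags 1-3:     ", np.round(acf(prices, nlags=3)[1:], 3))
print("ACF of differences, lags 1-3:", np.round(acf(diffs, nlags=3)[1:], 3))
```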
One of the best uses of ACF and PACF is in checking your residuals after fitting a model. Essentially, if your chosen model is adequate, the residuals (the difference between the actual data and your model’s fitted values) should look like white noise—meaning minimal autocorrelation and partial autocorrelation. If you see large, persistent correlation in the residuals, that’s a sign your model hasn’t captured something in the data.
A typical routine goes like this:

1. Fit your candidate model (say, an AR(1) or an ARMA(1,1)).
2. Compute the residuals and plot their ACF and PACF.
3. If no spikes breach the confidence bands (and a formal test such as Ljung-Box finds no leftover autocorrelation), the model has captured the main dynamics.
4. If significant spikes remain, revisit the model order and repeat.
As the sketches above suggest, the Python “statsmodels” library offers functions like acf() and pacf(), along with the plotting helpers plot_acf() and plot_pacf(). A quick snippet puts the two correlograms side by side:
```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
import matplotlib.pyplot as plt

# Example: returns = pd.Series(your_data)

fig, ax = plt.subplots(2, 1, figsize=(10, 8))
plot_acf(returns, lags=12, ax=ax[0])    # 12 lags covers one year of monthly data
plot_pacf(returns, lags=12, ax=ax[1])
plt.show()
```
This script generates the two key plots. You can eyeball the spikes to see which lags are significant before plugging that knowledge into an ARIMA model. If you test it out on real data, you’ll find that it speeds up your model-selection process, letting you zero in on plausible parameter choices quickly.
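The same tools extend naturally to the residual checks described earlier. The sketch below fits a candidate AR(1), runs a Ljung-Box test on the residuals, and plots their correlogram; the data and lag choices are again hypothetical.

```python
import numpy as np
import matplotlib.pyplot as plt
from statsmodels.tsa.arima.model import ARIMA
from statsmodels.stats.diagnostic import acorr_ljungbox
from statsmodels.graphics.tsaplots import plot_acf

# Hypothetical monthly return series; replace with your own data
rng = np.random.default_rng(11)
returns = rng.normal(0, 0.04, size=120)

fit = ARIMA(returns, order=(1, 0, 0)).fit()
resid = fit.resid

# Ljung-Box test: large p-values mean no evidence of leftover autocorrelation
print(acorr_ljungbox(resid, lags=[6, 12]))

plot_acf(resid, lags=12)           # the residual correlogram should show no big spikes
plt.show()
```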
Time-series analyses, including the use of ACF and PACF, should comply with the CFA Institute’s Code of Ethics and Standards of Professional Conduct, especially around diligence and thoroughness. Ensure that any forecasts or risk metrics you produce for clients or internal stakeholders are accompanied by a clear disclosure of their limitations. If your model’s residual checks are still showing strong autocorrelations, it’s important not to present your forecasts as more reliable than they truly are.
Autocorrelation and partial autocorrelation are fundamental to diagnosing the underlying structure of time series. Think of them like X-rays and MRIs for your data—the ACF is the big-picture scan for correlations at different lags, while the PACF is the more focused view that helps pin down direct relationships after controlling for the in-betweens. In combination, they’re incredibly powerful for identifying whether your data demand an AR, MA, or ARMA approach and for confirming that your final model’s residuals look like random noise.
Ultimately, better modeling leads to better financial decisions, be it for forecasting, portfolio optimization, or risk management. In real-world finance, every small improvement in your predictive accuracy can have magnified effects on returns—and that’s why it’s worth investing time to master these techniques.