Moving-average models

Authors: Ellen L. Hamaker (Methodology & Statistics Department, Utrecht University) and Sophie W. Berkhout (Methodology & Statistics Department, Utrecht University)

Published: 2025-08-28

This article has not been peer-reviewed yet and may be subject to change.

This article focuses on the class of moving-average (MA) models. MA models are often considered in combination with autoregressive (AR) models in the joint framework of autoregressive moving-average (ARMA) models (Box & Jenkins, 1976). This is a fundamental modeling framework within the context of N=1 time series data, and allows you to account for different kinds of temporal dependencies in your data. While AR models—particularly the AR(1) model—are well known among psychological researchers, MA models have not received the same level of attention or appreciation.

Like AR models, MA models can be used to analyze your data and obtain parameter estimates that you can consider for descriptive purposes. MA models can also be used for predictive purposes, to obtain forecasts for an individual (for an elaborate source on this, see Hyndman & Athanasopoulos, 2021). Moreover, when considering an MA model as the data generating mechanism, you may prefer to talk about it as the MA process. Understanding what kind of patterns the various MA models can generate is helpful in deciding whether you want to consider this as a way to analyze your data.

In this article, you will find: 1) a presentation of the first-order MA model, including an interactive tool that allows you to see what kinds of temporal patterns it can generate; 2) a presentation of the second-order MA model, including an interactive tool that allows you to see what kinds of temporal patterns it can generate; 3) a presentation of the general MA model, including higher-order MA processes; and 4) a discussion of how this model is related to other models for N=1 ILD.

1 First-order moving-average model: MA(1) model

Suppose you have a time series consisting of univariate measurements of the variable \(y\), obtained at equal time intervals. The observation at occasion \(t\) is denoted as \(y_t\). If this time series follows a first-order MA model, each observation \(y_t\) can be thought of as the weighted sum of two random shocks: A random shock associated with the current occasion, \(\epsilon_t\), and a random shock from the previous occasion, \(\epsilon_{t-1}\). There are various ways in which you can represent an MA(1) model.

First, you can visualize it as in Figure 1. From this visualization you can see that \(y_t\) and \(y_{t-1}\) are related: They share a common cause, which is \(\epsilon_{t-1}\). When considering \(y_t\) and \(y_{t-2}\) instead, you can see that these are not related: They do not share a common cause, nor does one have a direct or indirect effect on the other. The same holds for longer lags (e.g., \(y_{t}\) with \(y_{t-3}\)). This means that when you look at the [autocorrelation function] of an MA(1) process, you will see that \(\rho_1\) differs from zero, but autocorrelations at larger lags are all zero.

Figure 1: Path diagram of a first-order moving-average or MA(1) model.

Second, you can also represent an MA(1) model with the equation

\[y_t = \mu + \epsilon_t + \theta_1 \epsilon_{t-1},\] where: \(\mu\) is a constant that represents the stable mean of the observed variable \(y_t\) over time; \(\epsilon_t\) is the random shock associated with the current occasion \(t\); \(\epsilon_{t-1}\) is the random shock associated with the previous occasion \(t-1\); and \(\theta_1\) is the moving-average parameter by which the previous shock is weighted. The random shocks are also referred to as innovations, perturbations, residuals, or dynamic error. Characteristic of these random shocks is that they are independent of each other over time; this implies they form a [white noise series]. Typically, these random shocks are assumed to come from a normal distribution with a mean of zero and variance \(\sigma^2_\epsilon\).

Below, you find an expression of the variance of an MA(1) model, as well as an explanation of the role of the MA parameter \(\theta_1\). Additionally, there is an interactive tool that allows you to investigate the role that each parameter plays in the kind of temporal patterns that an MA(1) process can generate. There is also an explanation of the constraint that is needed to ensure the process is invertible, and why this is important.

1.1 The variance of \(y\) versus the variance of \(\epsilon\)

The variance of an MA(1) process can be expressed as a function of the variance of the innovations, through

\[ \sigma_y^2 = (1 + \theta_1^2)\sigma_{\epsilon}^2.\]

This shows that if \(\theta_1=0\), the variance of \(y\) is the same as the variance of \(\epsilon\) (i.e., \(\sigma_y^2=\sigma_{\epsilon}^2\)). But if \(\theta_1 \neq 0\), the variance of \(y\) will be larger than that of \(\epsilon\).

The expression above also implies that

\[ \sigma_{\epsilon}^2 = \frac{\sigma_y^2}{1+\theta_1^2} .\]

Hence, if you know two of the three parameters \(\theta_1\), \(\sigma_{\epsilon}^2\), and \(\sigma_{y}^2\), the third is also known.
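The relation between these three parameters can be checked with a quick simulation. Below is a minimal sketch in base R, where the values of \(\theta_1\) and \(\sigma^2_\epsilon\) are chosen purely for illustration.

```r
set.seed(1)
theta1     <- 0.7  # illustrative MA parameter
sigma2_eps <- 2    # illustrative innovation variance

# Generate a long MA(1) series with mu = 0: y_t = e_t + theta1 * e_{t-1}
e <- rnorm(1e6, sd = sqrt(sigma2_eps))
y <- e[-1] + theta1 * e[-length(e)]

var(y)                       # close to the theoretical value below
(1 + theta1^2) * sigma2_eps  # theoretical variance: 2.98
```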

1.2 Interpretation of the moving-average parameter \(\theta_1\)

The MA parameter can be thought of as a delayed effect of a random shock that occurred in the past. In Figure 2 you see two examples of an MA(1) process, where the one at the top was generated with \(\theta_1 = 0.3\) and the one at the bottom was generated with \(\theta_1=0.9\).

Figure 2: Two examples of an MA(1) process. On the left the two time series are plotted against time, showing the temporal fluctuations up and down. On the right, each score is plotted against its preceding score. Top represents an MA(1) with a moving-average parameter of 0.3; bottom represents an MA(1) with a moving-average parameter of 0.9.

When comparing the sequences on the left, you may see that the top one is a bit more random than the bottom one; yet this is not very obvious when looking at these short series. The lag 1 autocorrelation of an MA(1) can be expressed as a function of the MA parameter through

\[\rho_1 = \frac{\theta_1}{1 + \theta_1^2},\]

while all autocorrelations at lags larger than 1 will be zero for an MA(1) process.

For the two examples of an MA(1) process considered above, this implies that the top series, which was generated with \(\theta_1 = 0.3\), should have a lag 1 autocorrelation of about 0.275, whereas the bottom one, generated with \(\theta_1= 0.9\), should have one of about 0.497. The lag 1 autocorrelations for the two series in Figure 2 are estimated to be \(\hat{\rho}_1 = 0.324\) and \(\hat{\rho}_1 = 0.574\), respectively. Hence, the bottom series is indeed characterized by a stronger relation between \(y_t\) and \(y_{t-1}\) (i.e., less randomness). You may also be able to infer this from the plots on the right of Figure 2, although when series are short, sampling fluctuations can easily change the results. You can try this out with the interactive tool provided below.
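These theoretical autocorrelations can be computed directly from the formula, and verified with `ARMAacf()` from base R's stats package, which returns the implied autocorrelation function of an ARMA model; a small sketch:

```r
# Lag 1 autocorrelation of an MA(1), directly from the formula
rho1 <- function(theta1) theta1 / (1 + theta1^2)

rho1(0.3)  # 0.2752294
rho1(0.9)  # 0.4972376

# The same value via stats::ARMAacf; note that all autocorrelations
# beyond lag 1 are exactly zero for an MA(1) process
ARMAacf(ma = 0.9, lag.max = 3)  # lags 0-3: 1, 0.4972376, 0, 0
```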

Mike is studying daily alcohol consumption in a person who has been identified as a heavy drinker. He thinks that daily alcohol consumption is likely to be impacted by random events and experiences during the day, but that the random events and experiences of yesterday may also have an impact.

Specifically, Mike thinks that when there were more stress factors today, the person is likely to drink more alcohol; similarly, if there were more stress factors yesterday, this is also likely to result in more alcohol consumption today. Mike assumes that the stress factors of the day before yesterday no longer exert an effect on today’s alcohol consumption.

Mike estimates an MA(1) model, and interprets the innovations (or random shocks) as representing the total collection of stress factors on a particular day. His results show a positive moving-average parameter, indicating an additional delayed effect of yesterday’s stress factors on today’s alcohol consumption.

From the perspective of your research question, the interpretation of the moving-average parameter may differ somewhat. When your goal is description, you may simply use the moving-average parameter as a quantification of the temporal dependencies in your data. When your goal is prediction, the moving-average parameter may be used to make forecasts for the next occasion, but it is difficult to make predictions beyond that (see Hyndman & Athanasopoulos, 2021). When your goal is to study causation, the model may be viewed as the data generating mechanism, where the moving-average parameter is not just a feature that can be obtained from the data, but it forms an actual force that shapes the process (Slutzky, 1937).

1.3 Interactive tool of an MA(1) process

Below is an interactive tool that allows you to obtain an intuition for what kinds of temporal patterns an MA(1) model can generate. Specifically, it allows you to change the values of the various parameters of an MA(1) model, so that you can see how each of them plays a role in the process.

#| '!! shinylive warning !!': |
#|   shinylive does not work in self-contained HTML documents.
#|   Please set `embed-resources: false` in your metadata.
#| standalone: true
#| viewerHeight: 410
library(shiny)
library(bslib)

## file: app.R
library(shiny)
library(bslib)
library(ggplot2)
library(patchwork)

simulateData <- function(n, mu, sigma, theta) {
  
  burnin <- 100
  
  burnin_t <- n + burnin
  
  innov_var <- sigma / (1 +  theta ^ 2)

  e <- rnorm(burnin_t, sd = sqrt(innov_var))
  y <- numeric(burnin_t)
  
  for (t in 2:burnin_t) {
    y[t] <- mu + e[t] + theta * e[t - 1]
  }
  
  y <- y[(burnin + 1):burnin_t]
  
  dat <- data.frame(y = y, t = 1:n)
  
  return(dat)
  
}

plotTimeSeries <- function(dat, plot_widths = c(2, 5, 1)) {
  
  x_breaks <- pretty(1:nrow(dat))

  y_breaks <- pretty(dat$y)
  y_lim <- c(min(y_breaks), max(y_breaks))

  axis_line_width <- 0.3
  
  dat$y_lag <- c(NA, dat$y[-length(dat$y)])
  
  coryy <- round(cor(dat$y,  dat$y_lag, use = "complete.obs"), 3)
  meany <- mean(dat$y)
  
  lm1 <- lm(y ~ y_lag, dat)
  ys1 <- cbind(1, c(min(y_breaks), max(y_breaks))) %*% coef(lm1)

  p_ss <- ggplot(dat, aes(x = y_lag, y = y)) +
    geom_segment(aes(x = min(y_breaks), xend = max(y_breaks),
                     y = ys1[1], yend = ys1[2]),
                 linewidth = 0.3, colour = "gray") +
    geom_point(size = 2, shape = 21, fill = "white") +
    scale_y_continuous(breaks = y_breaks) +
    scale_x_continuous(breaks = y_breaks) +
    coord_cartesian(ylim = y_lim, xlim = y_lim) +
    theme_void() +
    labs(x = bquote(y[t-1]), y = bquote(y[t])) +
    theme(text = element_text(family = "sans", size = 16),
          axis.text = element_text(margin = margin(5, 5, 5, 5)),
          axis.title = element_text(margin = margin(5, 5, 5, 5)),
          axis.ticks = element_line(lineend = "butt",
                                    linewidth = axis_line_width),
          axis.ticks.length = unit(2.5, "pt"),
          panel.spacing = unit(5, units = "pt"),
          plot.margin = margin(0, 5, 0, 0),
          legend.position = "none") +
    geom_segment(x = -Inf, xend = -Inf, y = y_breaks[1],
                 yend = y_breaks[length(y_breaks)],
                 linewidth = axis_line_width, lineend = "square") +
    geom_segment(y = -Inf, yend = -Inf, x = y_breaks[1],
                 xend = y_breaks[length(y_breaks)],
                 linewidth = axis_line_width, lineend = "square") +
    annotate("text", x = -Inf, y = Inf, hjust = -0.2, vjust = 2,
             label = bquote(hat(rho)[1]~"="~.(coryy)), size = 5)
  
  p_ts <- ggplot(dat) + 
    geom_segment(x = x_breaks[1], xend = max(x_breaks), y = meany, yend = meany,
                 linewidth = 0.3, colour = "gray") +
    geom_line(aes(x = t, y = y)) +
    geom_point(aes(x = t, y = y), size = 2, shape = 21, fill = "white") +
    scale_y_continuous(breaks = y_breaks) +
    scale_x_continuous(breaks = x_breaks) +
    coord_cartesian(ylim = y_lim, xlim = c(min(x_breaks), max(x_breaks))) +
    theme_void() +
    labs(x = "t") + 
    theme(text = element_text(family = "sans", size = 16),
          axis.text.x = element_text(margin = margin(5, 5, 5, 5)),
          axis.title.x = element_text(margin = margin(5, 5, 5, 5)),
          axis.ticks = element_line(lineend = "butt",
                                    linewidth = axis_line_width),
          axis.ticks.length = unit(2.5, "pt"),
          panel.spacing = unit(5, units = "pt"),
          plot.margin = margin(0, 5, 0, 0),
          legend.position = "none") +
    geom_segment(x = -Inf, xend = -Inf, y = y_breaks[1], yend = max(y_breaks),
                 linewidth = axis_line_width, lineend = "square") +
    geom_segment(y = -Inf, yend = -Inf, x = x_breaks[1], xend = max(x_breaks),
                 linewidth = axis_line_width, lineend = "square")
  
  p_hist <- ggplot(dat) + 
    geom_histogram(aes(y = y, x = after_stat(density)), fill = "gray", colour = "black") +
    geom_density(aes(y = y)) +
    geom_hline(yintercept = meany, colour = "gray", linewidth = 0.5) +
    scale_y_continuous(breaks = y_breaks) +
    scale_x_continuous(breaks = x_breaks) +
    coord_cartesian(ylim = y_lim) +
    theme_void() +
    labs(x = "", y = "") +
    theme(text = element_text(family = "sans", size = 16),
          axis.title = element_text(margin = margin(5, 5, 5, 5)),
          panel.spacing = unit(5, units = "pt"),
          plot.margin = margin(0, 5, 0, 0))
  
  p <- p_ss + p_ts + p_hist + plot_layout(widths = plot_widths)
  
  return(p)
  
}

ui <- fluidPage(
  fluidRow(
    column(3,
      numericInput("n", label = h4("Number of occasions T"), value = 100,
                   min = 10, max = 1e5, step = 10)
    ),
    column(3,
           numericInput("mean", label = HTML("<h4>Mean &mu;</h4>"), value = 0, step = 1)
    ),
    
    column(3,
           numericInput("var", label = HTML("<h4>Variance &sigma;<sup>2</sup><sub style='position: relative; left: -0.4em;'>y</sub></h4>"),
                        value = 1, min = 0, max = 100, step = 1)
    ),
    
    column(3,
           sliderInput("ma", label = HTML("<h4>Moving average \U03B8<sub>1</sub></h4>"), min = -0.999, 
                       max = 0.999, value = 0.3, step = 0.05)
    )
  ),
  
  actionButton("refresh", "Refresh"),
  
  plotOutput(outputId = "main_plot", height = "250px")
  
)

server <- function(input, output) {
  
  dat <- reactive({
    input$refresh  # include to make it reactive to the button
    simulateData(input$n, input$mean, input$var, input$ma)
  })
  
  output$main_plot <- renderPlot({
    
    plotTimeSeries(dat())
    
  })
  
}

shinyApp(ui = ui, server = server)

The tool shows three visualizations of the data generated from an MA(1) model with the specified parameter values and number of occasions. The left figure is an autocorrelation scatter plot of every \(y_t\) plotted against \(y_{t-1}\), with the lag-1 autocorrelation \(\hat{\rho}_1\) of the generated data and a straight line indicating the linear relationship. The middle figure is a time series plot of \(y\) plotted against time, where the horizontal grey line indicates the observed mean of the series. Finally, the right figure shows the distribution as a histogram and a density line, again with a horizontal grey line indicating the observed mean.

The refresh button allows you to generate a new data set with the same specifications; you can use this to see how sampling fluctuations may lead to quite different patterns in the data, especially when the number of occasions is small.

1.4 Invertibility of an MA(1) process

MA processes are by definition stationary. This means that you do not need to impose any constraints on the parameter \(\theta_1\) to ensure the process fluctuates around a constant (here: \(\mu\)), and does so with a constant variance (here: \(\sigma_y^2\)). However, the parameters of an MA model have to be constrained to ensure the process meets a condition known as invertibility. In the time series literature you often see the concept of invertibility discussed in two related ways (e.g., Chatfield, 2004; Hamilton, 1994).

First, invertibility means that the MA model can be rewritten as an autoregressive (AR) model of infinite order; how this is done exactly is further explained in the article in which the mixed autoregressive moving-average (ARMA) model is discussed. If an MA model is invertible, this means that the resulting AR(\(\infty\)) model is characterized by autoregressive parameters that go to zero as the lag increases.

Second, invertibility ensures that for a given autocorrelation function, there is only one MA process possible. This is important, because in the case of an MA(1) process, there are always two versions that lead to the exact same autocorrelations: An MA(1) process with \(\theta_1\) as its MA parameter, and an MA(1) with \(\frac{1}{\theta_1}\) as its MA parameter. To see this, you can try out the value \(\theta_1=0.5\); using this in the expression for the autocorrelation presented above you get

\[ \rho_1 = \frac{0.5}{1+0.5^2} = \frac{0.5}{1+0.25} = \frac{0.5}{1.25}= 0.4.\]

If you take the inverse of \(\theta_1\), you get \(\frac{1}{0.5}=2\); using this in the expression for the autocorrelation at lag 1, you get

\[ \rho_1 = \frac{2}{1+2^2} = \frac{2}{5} = 0.4,\]

which shows that indeed \(\theta_1\) and \(\frac{1}{\theta_1}\) result in the same value for \(\rho_1\). Since for an MA(1) process, all autocorrelations at lags larger than 1 are zero, the two MA(1) processes will by definition have the same autocorrelations at larger lags.
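You can confirm this equivalence numerically with `ARMAacf()` from base R's stats package:

```r
# An MA(1) with theta1 = 0.5 and one with 1 / 0.5 = 2 imply the
# exact same autocorrelation function
ARMAacf(ma = 0.5, lag.max = 2)  # lags 0-2: 1, 0.4, 0
ARMAacf(ma = 2,   lag.max = 2)  # lags 0-2: 1, 0.4, 0
```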

Hence, invertibility of an MA process is considered important because it ensures that there is a well-defined AR version (with parameters that converge to zero), and that there is only a single process for a given autocorrelation function. For an MA(1) model to be invertible, the moving-average parameter needs to lie between -1 and 1, that is

\[ -1 < \theta_1 < 1\]

or

\[ |\theta_1| <1 \] When \(\theta_1=1\), you have an unweighted sum of random shocks. When \(\theta_1>1\), a past random shock has more effect than the current random shock. While these two models may be of interest in some scenarios, they do not fall within the class of invertible MA(1) models.

2 Second-order moving-average (MA(2)) model

You may also consider \(y_t\) as a weighted sum of the current random shock and the two most recent shocks. This would be a second-order moving-average (MA(2)) model, which is visualized in Figure 3.

Figure 3: Path diagram of a second-order moving-average or MA(2) model.

From this visualization you can see that observations \(y_t\) and \(y_{t-1}\) are related: They share two common causes, that is, \(\epsilon_{t-1}\) and \(\epsilon_{t-2}\). Moreover, \(y_t\) and \(y_{t-2}\) are also related: They have \(\epsilon_{t-2}\) as their common cause. Beyond lag 2, there are no relations for an MA(2). This implies that the autocorrelations at lag 1 and lag 2 (i.e., \(\rho_1\) and \(\rho_2\)) will differ from zero, whereas the autocorrelations at lag 3 and beyond will all be zero.

The MA(2) model can also be expressed as

\[y_t = \mu + \epsilon_t + \theta_1 \epsilon_{t-1} + \theta_2 \epsilon_{t-2},\] showing that the current score \(y_t\) is a weighted sum of current and past random shocks or innovations, where the weights are formed by the moving-average parameters \(\theta_1\) and \(\theta_2\). The intercept \(\mu\) in the equation represents the mean of the observed series \(y\) (because the random shocks all have a mean of zero).

In this section you find a more elaborate explanation of the various parameters of the MA(2) model, as well as an interactive tool that allows you to investigate the role that each parameter plays in the kind of temporal patterns that an MA(2) model can generate. There is also an explanation of the constraints that are needed to ensure an MA(2) process is invertible.

2.1 The variance of \(y\) versus the variance of \(\epsilon\)

The variance of an MA(2) process can be expressed as a function of the weights and the innovation variance, that is

\[\sigma_y^2 = (1+\theta_1^2 + \theta_2^2)\sigma_{\epsilon}^2.\] This implies that

\[\sigma_{\epsilon}^2 = \frac{\sigma_y^2}{1+\theta_1^2 + \theta_2^2}.\]

2.2 Interpretation of the moving-average parameters \(\theta_1\) and \(\theta_2\)

The parameters \(\theta_1\) and \(\theta_2\) can be interpreted as the degree to which past random shocks impact the current observation \(y_t\). When taking a causal perspective and thinking about an MA(2) as a data generating mechanism, these parameters capture delayed effects of causes that occurred in the past (Slutzky, 1937). From a predictive perspective, these parameters can be used to make forecasts for the next two occasions, but not beyond that forecast horizon (Hyndman & Athanasopoulos, 2021). From a descriptive perspective, the parameters can be seen as weights that are used to obtain a weighted average of the current and most recent random shocks, thereby smoothing the random fluctuations of the innovations.

The parameters can also be related to the autocorrelations of the process. Specifically, the autocorrelation at lag 1 (so between \(y_t\) and \(y_{t-1}\)) can be expressed as

\[\rho_1 = \frac{\theta_1 + \theta_1\theta_2}{1 + \theta_1^2 + \theta_2^2},\]

and the correlation at lag 2 (so between \(y_t\) and \(y_{t-2}\)) can be expressed as

\[\rho_2 = \frac{\theta_2}{1 + \theta_1^2 + \theta_2^2}.\]

Beyond lag 2, the autocorrelations of an MA(2) process are zero.
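As a check, the two expressions above can be compared against the implied autocorrelation function from `ARMAacf()` in base R; the parameter values below are chosen purely for illustration.

```r
theta1 <- 0.9  # illustrative MA(2) parameters
theta2 <- 0.3
denom  <- 1 + theta1^2 + theta2^2

(theta1 + theta1 * theta2) / denom  # rho_1 = 0.6157895
theta2 / denom                      # rho_2 = 0.1578947

# Matches the implied ACF, which is zero beyond lag 2
ARMAacf(ma = c(theta1, theta2), lag.max = 4)
```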

Ximena has obtained repeated measures of her sleep quality using a wearable device that combines measures of sleep duration, efficiency (i.e., how much of the time in bed was spent asleep), and latency (how long it takes to fall asleep after going to bed). This results in a single score per day; Ximena has data over a period of three months.

Ximena assumes that daily hassles and experiences affect her sleep, and that many of such factors tend to have a delayed effect and take a few days to lose their impact. However, she has not recorded self-reports on such daily factors, and thus cannot include these as predictors in her analysis.

When Ximena reads about MA models, she recognizes the potential such models have for the ideas she has about the many daily factors that may impact sleep. She decides to use a second-order MA model which allows for today’s, yesterday’s and the day-before-yesterday’s factors to impact sleep, without requiring actual measurements of these factors. She expects \(\theta_1\) and \(\theta_2\) to lie between 0 and 1, and for \(\theta_1\) to be larger than \(\theta_2\); that way, more distant experiences have less impact than more recent experiences.

2.3 Interactive tool of an MA(2) process

Below is an interactive tool that allows you to obtain an intuition for what kinds of temporal patterns an MA(2) model can generate. Specifically, it allows you to change the number of occasions and the values of the mean, variance, and moving-average parameters of an MA(2) model, so that you can see how each of them plays a role in the process.

#| '!! shinylive warning !!': |
#|   shinylive does not work in self-contained HTML documents.
#|   Please set `embed-resources: false` in your metadata.
#| standalone: true
#| viewerHeight: 410
library(shiny)
library(bslib)

## file: app.R
library(shiny)
library(bslib)
library(ggplot2)
library(patchwork)

simulateData <- function(n, mu, sigma, theta1, theta2) {
  
  burnin <- 100
  
  burnin_t <- n + burnin
  
  innov_var <- sigma / (1 + theta1 ^ 2 + theta2 ^ 2)

  e <- rnorm(burnin_t, sd = sqrt(innov_var))
  y <- numeric(burnin_t)
  
  for (t in 3:burnin_t) {
    y[t] <- mu + e[t] + theta1 * e[t - 1] + theta2 * e[t - 2]
  }
  
  y <- y[(burnin + 1):burnin_t]
  
  dat <- data.frame(y = y, t = 1:n)
  
  return(dat)
  
}

nDec <- function(x) {
  sapply(x, function(z) nchar(sub("^-?\\d*\\.?", "", format(z, scientific = FALSE))))
}

plotTimeSeries <- function(dat, plot_widths = c(3, 3, 6, 1)) {
  
  n <- nrow(dat)
  x_breaks <- pretty(1:n)

  y_breaks <- pretty(dat$y)
  y_lim <- c(min(y_breaks), max(y_breaks))

  axis_line_width <- 0.3
  
  dat$y_lag1 <- c(NA, dat$y[-n])
  dat$y_lag2 <- c(NA, NA, dat$y[-c(n, n-1)])
  
  coryy1 <- round(cor(dat$y,  dat$y_lag1, use = "complete.obs"), 3)
  coryy2 <- round(cor(dat$y,  dat$y_lag2, use = "complete.obs"), 3)
  meany <- mean(dat$y)
  
  lm1 <- lm(y ~ y_lag1, dat)
  ys1 <- cbind(1, c(min(y_breaks), max(y_breaks))) %*% coef(lm1)
  
  lm2 <- lm(y ~ y_lag2, dat)
  ys2 <- cbind(1, c(min(y_breaks), max(y_breaks))) %*% coef(lm2)
  
  angle_x <- ifelse(max(nDec(y_breaks)) > 0, 30, 0)
  
  p_ss1 <- ggplot(dat, aes(x = y_lag1, y = y)) +
    geom_segment(aes(x = min(y_breaks), xend = max(y_breaks),
                     y = ys1[1], yend = ys1[2]),
                 linewidth = 0.3, colour = "gray") +
    geom_point(size = 2, shape = 21, fill = "white") +
    scale_y_continuous(breaks = y_breaks) +
    scale_x_continuous(breaks = y_breaks) +
    coord_cartesian(ylim = y_lim, xlim = y_lim) +
    theme_void() +
    labs(x = bquote(y[t-1]), y = bquote(y[t])) +
    theme(text = element_text(family = "sans", size = 16),
          axis.text = element_text(margin = margin(5, 5, 5, 5)),
          axis.text.x = element_text(angle = angle_x),
          axis.title = element_text(margin = margin(5, 5, 5, 0)),
          axis.ticks = element_line(lineend = "butt",
                                    linewidth = axis_line_width),
          axis.ticks.length = unit(2.5, "pt"),
          panel.spacing = unit(5, units = "pt"),
          plot.margin = margin(0, 5, 0, 0),
          legend.position = "none") +
    geom_segment(x = -Inf, xend = -Inf, y = y_breaks[1],
                 yend = y_breaks[length(y_breaks)],
                 linewidth = axis_line_width, lineend = "square") +
    geom_segment(y = -Inf, yend = -Inf, x = y_breaks[1],
                 xend = y_breaks[length(y_breaks)],
                 linewidth = axis_line_width, lineend = "square") +
    annotate("text", x = -Inf, y = Inf, hjust = -0.2, vjust = 2,
             label = bquote(hat(rho)[1]~"="~.(coryy1)), size = 5)
  
  p_ss2 <- ggplot(dat, aes(x = y_lag2, y = y)) +
    geom_segment(aes(x = min(y_breaks), xend = max(y_breaks),
                     y = ys2[1], yend = ys2[2]),
                 linewidth = 0.3, colour = "gray") +
    geom_point(size = 2, shape = 21, fill = "white") +
    scale_y_continuous(breaks = y_breaks) +
    scale_x_continuous(breaks = y_breaks) +
    coord_cartesian(ylim = y_lim, xlim = y_lim) +
    theme_void() +
    labs(x = bquote(y[t-2])) +
    theme(text = element_text(family = "sans", size = 16),
          axis.text.x = element_text(margin = margin(5, 5, 5, 5), angle = angle_x),
          axis.title.x = element_text(margin = margin(5, 5, 5, 5)),
          axis.ticks = element_line(lineend = "butt",
                                    linewidth = axis_line_width),
          axis.ticks.length = unit(2.5, "pt"),
          panel.spacing = unit(5, units = "pt"),
          plot.margin = margin(0, 5, 0, 0),
          legend.position = "none") +
    geom_segment(x = -Inf, xend = -Inf, y = y_breaks[1],
                 yend = y_breaks[length(y_breaks)],
                 linewidth = axis_line_width, lineend = "square") +
    geom_segment(y = -Inf, yend = -Inf, x = y_breaks[1],
                 xend = y_breaks[length(y_breaks)],
                 linewidth = axis_line_width, lineend = "square") +
    annotate("text", x = -Inf, y = Inf, hjust = -0.2, vjust = 2,
             label = bquote(hat(rho)[2]~"="~.(coryy2)), size = 5)
  
  p_ts <- ggplot(dat) + 
    geom_segment(x = x_breaks[1], xend = max(x_breaks), y = meany, yend = meany,
                 linewidth = 0.3, colour = "gray") +
    geom_line(aes(x = t, y = y)) +
    geom_point(aes(x = t, y = y), size = 2, shape = 21, fill = "white") +
    scale_y_continuous(breaks = y_breaks) +
    scale_x_continuous(breaks = x_breaks) +
    coord_cartesian(ylim = y_lim, xlim = c(min(x_breaks), max(x_breaks))) +
    theme_void() +
    labs(x = "t") + 
    theme(text = element_text(family = "sans", size = 16),
          axis.text.x = element_text(margin = margin(5, 5, 5, 5)),
          axis.title.x = element_text(margin = margin(5, 5, 5, 5)),
          axis.ticks = element_line(lineend = "butt",
                                    linewidth = axis_line_width),
          axis.ticks.length = unit(2.5, "pt"),
          panel.spacing = unit(5, units = "pt"),
          plot.margin = margin(0, 5, 0, 0),
          legend.position = "none") +
    geom_segment(x = -Inf, xend = -Inf, y = y_breaks[1], yend = max(y_breaks),
                 linewidth = axis_line_width, lineend = "square") +
    geom_segment(y = -Inf, yend = -Inf, x = x_breaks[1], xend = max(x_breaks),
                 linewidth = axis_line_width, lineend = "square")
  
  p_hist <- ggplot(dat) + 
    geom_histogram(aes(y = y, x = after_stat(density)), fill = "gray", colour = "black") +
    geom_density(aes(y = y)) +
    geom_hline(yintercept = meany, colour = "gray", linewidth = 0.5) +
    scale_y_continuous(breaks = y_breaks) +
    scale_x_continuous(breaks = x_breaks) +
    coord_cartesian(ylim = y_lim) +
    theme_void() +
    labs(x = "", y = "") +
    theme(text = element_text(family = "sans", size = 16),
          axis.title = element_text(margin = margin(5, 5, 5, 5)),
          panel.spacing = unit(5, units = "pt"),
          plot.margin = margin(0, 5, 0, 0))
  
  p <- p_ss1 + p_ss2 + p_ts + p_hist + plot_layout(widths = plot_widths)
  
  return(p)
  
}

ui <- fluidPage(
  fluidRow(
    column(2,
      numericInput("n", label = h4("# occasions T"), value = 100,
                   min = 10, max = 1e5, step = 10)
    ),
    column(2,
           numericInput("mean", label = HTML("<h4>Mean &mu;</h4>"), value = 0, step = 1)
    ),
    
    column(2,
           numericInput("var", label = HTML("<h4>Variance &sigma;<sup>2</sup><sub style='position: relative; left: -0.4em;'>y</sub></h4>"),
                        value = 1, min = 0, max = 100, step = 1)
    ),
    
    column(3,
           sliderInput("ma1", label = HTML("<h4>Moving average \U03B8<sub>1</sub></h4>"), min = -2, 
                       max = 2, value = 0.3, step = 0.05)
    ),
    
    column(3,
           sliderInput("ma2", label = HTML("<h4>Moving average \U03B8<sub>2</sub></h4>"), min = -1, 
                       max = 1, value = 0.3, step = 0.05)
    )
  ),
  
  actionButton("refresh", "Refresh"),
  
  plotOutput(outputId = "main_plot", height = "250px")
  
)

server <- function(input, output) {
  
  dat <- reactive({
    validate(
      need(abs(input$ma2) < 1 &
             (input$ma2 + input$ma1) > -1 &
             (input$ma2 - input$ma1) > -1,
           "The specified moving average parameter values do not satisfy invertibility conditions.")
    )
    
    input$refresh  # include to make it reactive to the button
    simulateData(input$n, input$mean, input$var,
                 input$ma1, input$ma2)
  })
  
  output$main_plot <- renderPlot({
    
    plotTimeSeries(dat())
    
  })
  
}

shinyApp(ui = ui, server = server)

The tool shows four visualizations of data that are generated from an MA(2) model with the specified parameter values and number of occasions. The first figure is an autocorrelation scatter plot of \(y\) at occasion \(t\) against \(y\) at the previous occasion \(t-1\), with the lag-1 autocorrelation \(\hat{\rho}_1\) of the generated data and a straight line indicating the linear relationship. The second figure is an autocorrelation scatter plot of \(y\) at occasion \(t\) against \(y\) two occasions ago \(t-2\), with the lag-2 autocorrelation \(\hat{\rho}_2\) of the generated data and a straight line indicating the linear relationship. The third figure is a time series plot of \(y\) against time, where the horizontal grey line indicates the observed mean of the series. Finally, the right figure shows the distribution as a histogram and a density line, again with a horizontal grey line indicating the observed mean. The refresh button allows you to generate a new data set with the same specifications.

2.4 Invertibility of an MA(2) process

Like an MA(1) process, an MA(2) process is by definition stationary. However, to ensure that an MA(2) process can be expressed as an AR(\(\infty\)) process with parameters that converge to zero, and that there is a unique MA process associated with a given autocorrelation function, the process should be invertible. This requires the following inequality constraints to hold:

  • \(-1 < \theta_2 < 1\)

  • \(\theta_1 + \theta_2 > -1\)

  • \(\theta_1 - \theta_2 < 1\)

When comparing these constraints to the constraints needed on the autoregressive parameters \(\phi_1\) and \(\phi_2\) of an AR(2) process to ensure stationarity, you may notice they have a similar form. However, here the constraints are needed for invertibility, while stationarity holds by definition; for an AR(2) process the situation is reversed: invertibility is not an issue, but stationarity requires analogous constraints on the autoregressive parameters.

3 Higher order moving-average models: MA(\(q\)) model

A general expression for the moving-average model of order \(q\) is

\[y_t = \mu + \epsilon_t + \theta_1 \epsilon_{t-1} + \theta_2 \epsilon_{t-2} + \dots + \theta_q \epsilon_{t-q},\] showing that it is a weighted sum of the current and past random shocks. Alternatively, this can be expressed using summation notation as

\[y_t = \mu + \sum_{k=1}^q \theta_k \epsilon_{t-k} + \epsilon_t.\] In this section, you can read about the relation between the MA parameters and the variance of an MA(\(q\)) process, about the interpretation of the MA parameters, and about the constraints that are necessary to ensure an MA(\(q\)) model is invertible.

3.1 The variance of \(y\) versus the variance of \(\epsilon\)

If \(y\) is an MA(\(q\)) process, its variance can be expressed as a function of the MA parameters and the innovation variance through

\[\sigma_y^2 = (1 + \theta_1^2 + \theta_2^2 + \dots + \theta_q^2)\sigma_\epsilon^2.\] This also implies that

\[\sigma_\epsilon^2 = \frac{\sigma_y^2}{1 + \theta_1^2 + \theta_2^2 + \dots + \theta_q^2}.\]
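To see this relation numerically, you can simulate a long MA(2) series and compare its sample variance with the implied value. A Python sketch under assumed parameter values:

```python
import numpy as np

rng = np.random.default_rng(1)

mu = 0.0
theta = np.array([0.6, 0.3])   # assumed values for theta_1 and theta_2
sigma2_eps = 1.0               # assumed innovation variance

# Simulate a long MA(2) series: y_t = mu + eps_t + theta_1*eps_{t-1} + theta_2*eps_{t-2}
eps = rng.normal(0.0, np.sqrt(sigma2_eps), size=200_000)
y = mu + eps[2:] + theta[0] * eps[1:-1] + theta[1] * eps[:-2]

# Implied variance: (1 + theta_1^2 + theta_2^2) * sigma_eps^2 = 1.45
implied_var = (1 + np.sum(theta**2)) * sigma2_eps
print(implied_var, y.var())  # the sample variance should be close to 1.45
```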

3.2 Interpretation of the moving-average parameters

The interpretation of the parameters \(\theta_1\) to \(\theta_q\) in an MA(\(q\)) process follows the same reasoning as for the parameters \(\theta_1\) and \(\theta_2\) in the MA(2) process described above. While the parameters may become smaller at larger lags, this is not necessarily the case.

The parameters can also be used to obtain analytical expressions for the autocorrelations at lags 1 to \(q\), that is, for \(k = 1, \dots, q\),

\[\rho_k = \frac{\theta_k + \theta_1\theta_{1+k} + \dots + \theta_{q-k}\theta_{q}}{1 + \theta_1^2 + \theta_2^2 + \dots + \theta_q^2}.\] Beyond lag \(q\), all autocorrelations are zero.
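This expression is easy to evaluate numerically. A small Python sketch (the function name is ours) that returns the theoretical autocorrelations for arbitrary MA coefficients, treating \(\theta_0 = 1\) implicitly:

```python
import numpy as np

def ma_acf(theta, max_lag):
    # Theoretical autocorrelations rho_1, ..., rho_max_lag of an MA(q) process
    # with coefficients theta = [theta_1, ..., theta_q]; theta_0 = 1 implicitly.
    t = np.concatenate(([1.0], np.asarray(theta, dtype=float)))
    denom = np.sum(t**2)  # 1 + theta_1^2 + ... + theta_q^2
    rho = []
    for k in range(1, max_lag + 1):
        if k < len(t):
            # theta_k + theta_1*theta_{1+k} + ... + theta_{q-k}*theta_q
            rho.append(np.sum(t[:-k] * t[k:]) / denom)
        else:
            rho.append(0.0)  # zero beyond lag q
    return np.array(rho)

print(ma_acf([0.6, 0.3], 4))  # rho_3 and rho_4 are exactly zero
```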

3.3 Invertibility of an MA(\(q\)) process

The conditions that are needed to ensure an MA(\(q\)) process is invertible are similarly complex as the conditions needed for stationarity of an AR(\(p\)) process: The roots of the lag polynomial must lie outside the unit circle (cf. Hamilton, 1994). To understand what is meant by this, you need to know what a lag polynomial is, what its roots are, and what the unit circle is; this is beyond the scope of the current article.
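Although the general theory is beyond the scope here, checking the condition numerically is straightforward: compute the roots of the MA polynomial \(1 + \theta_1 z + \dots + \theta_q z^q\) and verify that they all lie outside the unit circle. A Python sketch (the function name is ours):

```python
import numpy as np

def is_invertible(theta):
    # theta = [theta_1, ..., theta_q] for the MA polynomial
    # 1 + theta_1*z + ... + theta_q*z^q; invertibility requires all roots
    # of this polynomial to lie strictly outside the unit circle.
    coeffs = np.concatenate((np.asarray(theta, dtype=float)[::-1], [1.0]))
    roots = np.roots(coeffs)  # np.roots expects the highest-degree coefficient first
    return bool(np.all(np.abs(roots) > 1))

print(is_invertible([0.5]))       # True: an MA(1) with |theta_1| < 1
print(is_invertible([1.5]))       # False
print(is_invertible([0.5, 0.2]))  # True
```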

When the invertibility constraints are imposed, this implies that the MA(\(q\)) model can be rewritten as an AR(\(\infty\)) model with parameters that converge to zero. Moreover, it implies that there is only one MA process for a particular autocorrelation function. Without invertibility constraints, there may be up to \(2^q\) different MA processes that can generate the exact same autocorrelation function (see slide 33 from Robert Kunst through the link at the bottom of this article). As a result, when estimating the parameters of an MA process, there may be multiple solutions; the invertibility constraints ensure there is a unique solution.
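For the MA(1) case this non-uniqueness is easy to see: \(\theta_1\) and \(1/\theta_1\) imply the exact same lag-1 autocorrelation \(\rho_1 = \theta_1/(1 + \theta_1^2)\), and for both all higher-lag autocorrelations are zero, so the two processes share one autocorrelation function. A quick numerical check in Python:

```python
def ma1_rho1(theta1):
    # Lag-1 autocorrelation of an MA(1) process y_t = mu + eps_t + theta1*eps_{t-1}
    return theta1 / (1 + theta1 ** 2)

# theta_1 = 0.5 and theta_1 = 1/0.5 = 2 imply the same autocorrelation function:
print(ma1_rho1(0.5), ma1_rho1(2.0))  # both equal 0.4; only theta_1 = 0.5 is invertible
```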

4 Relations with other models

The class of MA models is intimately related to the class of AR models: Combined they form the general class of autoregressive moving average (ARMA) models. This can be considered a fundamental modeling framework in the N=1 time series literature. You can read more about the connection between MA, AR and ARMA models in the article about the latter.

In the case of multivariate time series, you can turn to the class of [vector autoregressive moving average (VARMA) models]. Additionally, you can include an exogenous variable in your MA model. For instance, you can use time as the exogenous variable to model a linearly increasing or decreasing trend. Alternatively, you can include, for instance, dummy variables as exogenous variables to account for day of the week or time of day in measures of daily or momentary tiredness. Another exogenous variable could be weather conditions, which may affect a person's mood.

These models may also be considered as the basis for dynamic multilevel models that can be used for N>1 ILD.

5 Think more about

MA models require the time intervals between the observations to be of equal length. Hence, these models are suitable for [daily diary studies], where measurements are obtained once per day in the evening. They may also be used for [physiological measurements] or [expert ratings] of behaviors based on video recordings; such measurements tend to be regularly spaced in time, for instance, every second or every five seconds.

In contrast, [experience sampling] or [signal-contingent] measurements are based on having irregular time intervals between measurement occasions; this is to avoid participants anticipating the next measurement occasion. Also, [event-contingency] measurements, which require measurements to occur after a particular event (e.g., a social interaction or a panic attack), are likely to have wildly varying time intervals. MA models may be less suitable for these kinds of data, unless the observations can be made approximately equally spaced by adding missing values, or unless an estimation technique is used that can handle the irregular intervals.

To estimate the parameters of an MA model, Box & Jenkins (1976) developed a two-step procedure in which initial parameter estimates are first obtained from the autocorrelations, and a filtering procedure is subsequently used to obtain maximum likelihood estimates. Nowadays, estimation of MA models typically relies on a Kalman filter procedure; this requires specifying your model in state-space form.
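To illustrate the estimation problem in a self-contained way (this is a conditional-sum-of-squares sketch, not the Kalman-filter approach mentioned above), you can recover the innovations of an MA(1) recursively for a candidate \(\theta_1\) and minimize the sum of squared innovations over a grid:

```python
import numpy as np

rng = np.random.default_rng(7)

# Simulate n = 2000 observations from an MA(1) with theta_1 = 0.5
eps = rng.normal(size=2001)
y = eps[1:] + 0.5 * eps[:-1]

def css(theta, y):
    # Conditional sum of squares: with eps_0 set to 0, recover the innovations
    # recursively through eps_t = y_t - theta*eps_{t-1}, and sum their squares.
    e, total = 0.0, 0.0
    for yt in y:
        e = yt - theta * e
        total += e * e
    return total

# Minimize the criterion over a grid of candidate values within (-1, 1),
# i.e., restricted to the invertible region
grid = np.linspace(-0.95, 0.95, 381)
theta_hat = grid[np.argmin([css(th, y) for th in grid])]
print(theta_hat)  # should be close to the true value of 0.5
```

Restricting the grid to \((-1, 1)\) reflects the invertibility constraint, which ensures a unique solution.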

Finally, to determine the order of an MA model in empirical applications, you may use either theoretical or data-driven reasoning. Also, you may want to consider other models from the broader ARMA models family. You can read more about this in the article about ARMA models.

6 Takeaway

The reasoning behind an MA model was described by Slutzky (1937) in terms of a causal process: “The magnitude of each consequence is determined by the influence, not of one, but of a number of the preceding causes, as for instance, the size of a crop is determined, not by one day’s rainfall, but by many.” (p.108). The causes are not observed variables in this context, but are represented by the random shocks or innovations that can be inferred from the observed series through fitting an MA model to the data.

While the class of MA models is closely related to the class of AR models, it is a much less well-known class of time series models in psychology. It may be of interest to investigate whether and when MA models should be preferred over AR models. This may depend on substantive ideas you have about the underlying process, but also on whether your research goal is description, prediction, or causal explanation. The parameters of an MA process have a clear interpretation as the weights a random shock has in the moving average that is computed. When thinking of MA models as data generating mechanisms, you can interpret the MA parameters as delayed effects of random shocks (cf. Slutzky, 1937).

MA models are by definition stationary, but constraints are needed to ensure an MA model is invertible. This implies it can be rewritten as an AR(\(\infty\)) model with parameters that converge to zero at larger lags, and that there is a unique MA model associated with a particular autocorrelation function.

7 Further reading

We have collected various topics for you to read more about below.

Read more: Related univariate models
Read more: Related univariate models with time-varying parameters
Read more: Related multivariate models
  • [Vector autoregressive (VAR) models]
  • [Vector autoregressive moving-average (VARMA) models]
  • [Latent VAR models]
Read more: When N>1
  • [Multilevel AR models]
  • [Dynamic structural equation modeling]
  • [Replicated time series analysis]
Read more: Estimation of N=1 models
External resources: ARMA modeling

References

Box, G. E. P., & Jenkins, G. M. (1976). Time series analysis: Forecasting and control (Revised). Holden-Day. https://doi.org/10.1002/9781118619193
Chatfield, C. (2004). The analysis of time series: An introduction (6th ed.). Chapman & Hall/CRC. https://doi.org/10.1007/978-1-4899-2923-5
Hamilton, J. D. (1994). Time series analysis. Princeton University Press. https://doi.org/10.2307/j.ctv14jx6sm
Hyndman, R. J., & Athanasopoulos, G. (2021). Forecasting: Principles and practice (3rd ed.). OTexts. OTexts.com/fpp3. Accessed on: May 14, 2025.
Slutzky, E. (1937). The summation of random causes as the source of cyclic processes. Econometrica, 5, 105–146. https://doi.org/10.2307/1907241

Citation

BibTeX citation:
@article{hamaker2025,
  author = {Hamaker, Ellen L. and Berkhout, Sophie W.},
  title = {Moving-Average Models},
  journal = {MATILDA},
  number = {2025-08-28},
  date = {2025-08-28},
  url = {https://matilda.fss.uu.nl/articles/moving-average-model.html},
  langid = {en}
}
For attribution, please cite this work as:
Hamaker, E. L., & Berkhout, S. W. (2025). Moving-average models. MATILDA, 2025-08-28. https://matilda.fss.uu.nl/articles/moving-average-model.html