Markov-switching autoregressive model
This article is about the Markov-switching autoregressive (MSAR) model, which forms a specific category in the broader class of regime-switching models. Such models are of interest when you are studying a process that is characterized by recurrent switches between two or more distinct states or regimes that have different means, variances, and/or dynamics. For instance, you may believe that a person diagnosed with bipolar disorder switches over time between being in a depressed state, a manic state, and a euthymic state.
In an MSAR model, each regime is formed by a distinct autoregressive (AR) model, which is characterized by its own mean, variance, and dynamics. Typical of an MSAR model is that the switching process is a latent Markov process, which means that the current state the system is in depends only on the state that the system was in at the occasion immediately preceding it, whereas further past states hold no additional information about the current state. Hence, this model can be useful to you when you believe there are distinct states underlying the observed process, but you do not have external information about when the system is in a particular state.
Below you can read more about: 1) how the basic MSAR model is built from a Markov model and autoregressive processes; 2) the characteristics of the distinct regimes of an MSAR model; 3) an alternative formulation of an MSAR model; and 4) estimation and comparisons of MSAR models.
1 The basic MSAR model
An MSAR is based on a latent Markov process which governs the switches between two or more distinct AR processes. In this section you can first read about the underlying Markov model, and then about how this is combined with AR models. Subsequently, an interactive tool is presented that allows you to try out various versions of the MSAR model.
If you are not yet that familiar with simple autoregressive models, you may want to check out the article about the AR model for more details first.
1.1 A Markov model
A Markov model is characterized by switches between two or more discrete states over time. The switching from one state to another depends only on the state the process is currently in, denoted as \(S_{t}\), whereas the states before that carry no additional information. If there are two distinct states, there are in total four probabilities, that is,
\(\pi_{1|1} = P(S_{t+1}=1 | S_{t}=1)\): the probability of staying in state 1 (when currently in state 1)
\(\pi_{2|1} = P(S_{t+1}=2 | S_{t}=1)\): the probability of switching to state 2 (when currently in state 1)
\(\pi_{1|2} = P(S_{t+1}=1 | S_{t}=2)\): the probability of switching to state 1 (when currently in state 2)
\(\pi_{2|2} = P(S_{t+1}=2 | S_{t}=2)\): the probability of staying in state 2 (when currently in state 2).
The first and last probabilities, \(\pi_{1|1}\) and \(\pi_{2|2}\), are referred to as staying probabilities, as they imply the chances of staying in the same state from one occasion to the next, depending on the current state. The other probabilities, \(\pi_{2|1}\) and \(\pi_{1|2}\), are referred to as moving probabilities, as they represent the chances of moving from one state to the other, depending on the current state.
When there are two states, and you are in state 1, then either you stay in it, or you move to the other state. Hence, these two probabilities have to add up to one, that is
\[ \pi_{1|1} + \pi_{2|1} = 1\]
and thus \(1-\pi_{1|1} = \pi_{2|1}\). Similarly, when you are in state 2, you either stay in it or move to state 1, so that
\[ \pi_{2|2} + \pi_{1|2} = 1\]
and thus \(1-\pi_{2|2} = \pi_{1|2}\).
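These constraints make the switching process easy to simulate: each row of probabilities is fully determined by its staying probability. Below is a minimal Python sketch (separate from the R tool presented later in this article); the staying probabilities of 0.8 match the first process in Figure 1:

```python
import random

def simulate_markov(n, p11, p22, seed=1):
    """Simulate a two-state Markov chain; states are labeled 1 and 2."""
    rng = random.Random(seed)
    states = [1]  # start in state 1 (an arbitrary choice)
    for _ in range(n - 1):
        stay = p11 if states[-1] == 1 else p22
        if rng.random() < stay:
            states.append(states[-1])      # stay in the current state
        else:
            states.append(3 - states[-1])  # move to the other state
    return states

states = simulate_markov(200, p11=0.8, p22=0.8)
```

Because the moving probabilities are just one minus the staying probabilities, only `p11` and `p22` need to be supplied.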
In Figure 1 you can see three examples of a Markov process with two states. The first process is characterized by equal staying probabilities, that is \(\pi_{1|1} = \pi_{2|2} = 0.8\). As you can see, when the system is in a particular state, it tends to stay in that state (as the staying probability is larger than the switching probability in both states). The second process is characterized by these same probabilities for state 1 (in purple), but state 2 is characterized by a small probability of staying in it (i.e., \(\pi_{2|2}=0.1\)), and thus a fairly large chance of switching back to state 1 (i.e., \(\pi_{1|2}=1-\pi_{2|2} = 0.9\)). The third process is characterized by this same probability of staying in state 2, but now the probability of staying in state 1 is increased from 0.8 to 0.95 (i.e., \(\pi_{1|1}= 0.95\)).
Samuel wants to study menopause rage, which is described as sudden episodes of intense anger or irritability associated with changing estrogen levels in middle aged women. For this, he wants to collect data consisting of whether or not a woman is experiencing excessive anger at the moment of measurement. He plans to obtain measurements multiple times per day, for a period of two weeks.
Samuel expects that when the participant is in the non-angry state, she is more likely to remain in this state than to switch to the angry state; hence, he is expecting \(\pi_{1|1}>\pi_{2|1}\). Moreover, he expects that, while the degree of anger is intense, this tends to be short-lived, so that the probability of staying in the anger state is quite small, and much smaller than the probability of switching back to the non-angry state; hence, he is expecting \(\pi_{2|2}<\pi_{1|2}\).
When there are two distinct states, there are two probabilities that govern the switching process: If you know \(\pi_{1|1}\) and \(\pi_{2|2}\), the other two are given. This generalizes also to having more than two states: When there are \(k\) discrete states, there are in total \(k\times k\) probabilities of which \(k\) are determined by the others, such that there are \(k\times (k - 1)\) probabilities that define the switching process.
1.2 Combining the Markov model with the AR model
An MSAR model is based on a Markov model for regime switching. This regime-switching process is unobserved, and is therefore said to be latent or hidden. The observations are based on two or more separate AR models; depending on the regime or state the system is in, the observation is generated with one of these models.
In its simplest form, an MSAR model has two regimes and each regime is formed by a distinct first-order autoregressive (AR(1)) process. This can be expressed as
\[ y_t = \begin{cases} c_{(1)} + \phi_{(1)} y_{t-1} + \sigma_{(1)}\epsilon_t & \text{if } S_{t} = 1 \\[2mm] c_{(2)} + \phi_{(2)} y_{t-1} + \sigma_{(2)}\epsilon_t & \text{if } S_{t} = 2 \end{cases} \] where the residuals \(\epsilon_t\) come from a white noise process with a mean of zero and a variance of 1. As a result, the residual terms \(\sigma_{(1)}\epsilon_t\) and \(\sigma_{(2)}\epsilon_t\) will have variances \(\sigma_{(1)}^2\) and \(\sigma_{(2)}^2\) respectively. The difference between the two regimes can thus be in the intercepts \(c_{(1)}\) versus \(c_{(2)}\), in the autoregression \(\phi_{{(1)}}\) versus \(\phi_{(2)}\), and/or in the residual variances \(\sigma_{(1)}^2\) versus \(\sigma_{(2)}^2\).
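Generating data from this two-regime model is a direct transcription of the equation above. The following is a minimal Python sketch (separate from the article's R tool; the parameter values are illustrative, and `sigma` holds the residual standard deviations \(\sigma_{(1)}\) and \(\sigma_{(2)}\)):

```python
import random

def simulate_msar(n, c, phi, sigma, p11, p22, seed=1):
    """Simulate a two-regime MSAR(1); c, phi, sigma are (regime 1, regime 2) pairs."""
    rng = random.Random(seed)
    s = 1                 # start in regime 1 (arbitrary)
    y = [0.0]             # start the series at 0 (arbitrary)
    regimes = [s]
    for _ in range(1, n):
        stay = p11 if s == 1 else p22
        if rng.random() >= stay:
            s = 3 - s     # switch to the other regime
        k = s - 1         # index into the parameter pairs
        # The regime determines which intercept, autoregression, and residual sd apply
        y.append(c[k] + phi[k] * y[-1] + rng.gauss(0.0, sigma[k]))
        regimes.append(s)
    return y, regimes

y, r = simulate_msar(300, c=(1, 3), phi=(0.2, 0.4), sigma=(1, 1), p11=0.9, p22=0.9)
```

Note that only the most recent value \(y_{t-1}\) and the current regime are needed at each step, reflecting the first-order structure of both the AR processes and the Markov chain.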
An illustration of this is given in Figure 2, which uses different colors for the two regimes: The first regime, in purple, is characterized by \(c_{(1)}=1\) and \(\phi_{(1)}=0.2\); the second regime, in green, is characterized by \(c_{(2)}=3\) and \(\phi_{(2)}=0.4\). In both regimes the innovation variance is 1. Moreover, the probability to stay in the same regime is 0.9 for both regimes.
The MSAR model thus allows for a process that is characterized by qualitatively different states, each of which is characterized by its own equilibrium (i.e., long-run mean), regulation (i.e., dynamics), and exposure and reactivity to external factors. How the parameters are related to this, is explained in more detail in the following section.
Samuel has decided that, instead of asking participants whether they are currently experiencing excessive anger or not, he will ask them to what degree they feel currently irritated. He assumes that there are two states that the participant can be in: A state with high levels of irritation, and a state with low levels of irritation. He also thinks that the state with high levels of irritation is characterized by more carry-over from one occasion to the next resulting in a higher autoregression, whereas the state with low irritation is characterized by an enhanced ability to self-regulate, which is reflected by a lower autoregression.
Samuel wonders how dense his measurements should be to make sure that he will capture a few consecutive measurements within the same state; that will help to estimate the dynamics that characterize each of the regimes. He considers talking to participants to ask them how long they believe their episodes of irritation tend to last. He also is thinking of using video recordings or physiological measurements instead of self-report to tap into a more dense timescale; but he is somewhat doubtful about how to capture the emotional state he is interested in when not using self-reports.
1.3 Interactive tool of an MSAR model
You can further explore the kinds of patterns that can be generated with a simple MSAR model with the interactive tool provided below. It is based on an MSAR model with two regimes, each of which is formed by an AR(1) process.
For each of the two regimes, you can specify the probability to stay in this regime (i.e., \(\pi_{1|1}\) and \(\pi_{2|2}\)), the intercept (i.e., \(c_{(1)}\) and \(c_{(2)}\)), the innovation variance (i.e., \(\sigma^2_{\epsilon(1)}\) and \(\sigma^2_{\epsilon(2)}\)), and the autoregressive parameter (i.e., \(\phi_{(1)}\) and \(\phi_{(2)}\)). You can also change the number of occasions. Furthermore, with the refresh button you can generate a new series based on the same settings to see how sampling fluctuations may affect the patterns you get to see.
## file: app.R
library(shiny)
library(bslib)
library(ggplot2)
library(patchwork)
# --- Data simulation function ---
simulateData <- function(n, c, sigma, phi, p11, p22) {
  # Simulate n occasions from a two-regime MSAR(1), after discarding a burn-in
  burnin <- 100
  burnin_t <- n + burnin
  p12 <- 1 - p11
  p21 <- 1 - p22
  P <- matrix(c(p11, p12, p21, p22), 2, 2)  # column j: P(next state | current state = j)
  ini_p <- 0.5
  s <- sample(1:2, 1, prob = c(ini_p, 1 - ini_p))
  regime <- numeric(burnin_t)
  regime[1] <- s
  # sigma holds the innovation variances, so the sd is its square root
  e1 <- rnorm(burnin_t, sd = sqrt(sigma[1]))
  e2 <- rnorm(burnin_t, sd = sqrt(sigma[2]))
  y <- numeric(burnin_t)
  for (t in 2:burnin_t) {
    if (regime[t - 1] == 1) {
      regime[t] <- sample(1:2, 1, prob = P[, 1])
    } else {
      regime[t] <- sample(1:2, 1, prob = P[, 2])
    }
    if (regime[t] == 1) {
      y[t] <- c[1] + phi[1] * y[t - 1] + e1[t]
    } else {
      y[t] <- c[2] + phi[2] * y[t - 1] + e2[t]
    }
  }
  # Drop the burn-in so the retained series is (approximately) free of start-up effects
  y <- y[(burnin + 1):burnin_t]
  r <- regime[(burnin + 1):burnin_t]
  dat <- data.frame(y = y, t = 1:n, r = r)
  attr(dat, "c") <- c
  attr(dat, "phi") <- phi
  return(dat)
}
# --- Plotting function ---
plotTimeSeries <- function(dat, plot_widths = c(2, 5, 1)) {
  c_param <- attr(dat, "c")
  phi_param <- attr(dat, "phi")
  dat$r <- as.factor(dat$r)
  dat$y_lag <- c(NA, dat$y[-length(dat$y)])
  dat$tleft <- dat$t - 0.5
  dat$tright <- dat$t + 0.5
  shades <- c("1" = "#75216a", "2" = "#2fa18f")
  x_breaks <- pretty(1:nrow(dat))
  y_breaks <- pretty(dat$y)
  y_lim <- c(min(y_breaks), max(y_breaks))
  axis_line_width <- 0.3
  meany <- mean(dat$y)
  # --- Scatterplot panel: y_t against y_{t-1}, with regime-specific regression lines ---
  p_ss <- ggplot(dat[!is.na(dat$y_lag), ], aes(x = y_lag, y = y)) +
    geom_abline(intercept = c_param[1], slope = phi_param[1], color = shades["1"], linewidth = 1) +
    geom_abline(intercept = c_param[2], slope = phi_param[2], color = shades["2"], linewidth = 1) +
    geom_vline(xintercept = 0, linewidth = 0.4, colour = "black") +
    geom_hline(yintercept = 0, linewidth = 0.4, colour = "black") +
    geom_point(aes(color = r), size = 2.5, shape = 21, fill = "white", stroke = 1.5) +
    scale_color_manual(values = shades) +
    scale_y_continuous(breaks = y_breaks) +
    scale_x_continuous(breaks = y_breaks) +
    coord_cartesian(ylim = y_lim, xlim = y_lim) +
    theme_void() +
    labs(x = bquote(y[t-1]), y = bquote(y[t])) +
    theme(text = element_text(family = "sans", size = 16),
          axis.text = element_text(margin = margin(5, 5, 5, 5)),
          axis.title = element_text(margin = margin(5, 5, 5, 5)),
          axis.ticks = element_line(lineend = "butt", linewidth = axis_line_width),
          axis.ticks.length = unit(2.5, "pt"),
          plot.margin = margin(0, 5, 0, 0),
          legend.position = "none") +
    geom_segment(x = -Inf, xend = -Inf, y = y_breaks[1], yend = max(y_breaks), linewidth = axis_line_width) +
    geom_segment(y = -Inf, yend = -Inf, x = y_breaks[1], xend = max(y_breaks), linewidth = axis_line_width)
  # --- Time series panel: y against t, with regime shading and regime-specific sample means ---
  p_ts <- ggplot(dat) +
    geom_rect(aes(xmin = tleft, xmax = tright, fill = r), ymin = -Inf, ymax = Inf, alpha = 0.05) +
    scale_fill_manual(values = shades) +
    geom_line(aes(x = t, y = y)) +
    geom_point(aes(x = t, y = y, color = r), size = 2.5, shape = 21, fill = "white", stroke = 1.5) +
    scale_color_manual(values = shades) +
    geom_hline(yintercept = mean(dat$y[dat$r == "1"]), color = shades["1"], linewidth = 1) +
    geom_hline(yintercept = mean(dat$y[dat$r == "2"]), color = shades["2"], linewidth = 1) +
    scale_y_continuous(breaks = y_breaks) +
    scale_x_continuous(breaks = x_breaks) +
    coord_cartesian(ylim = y_lim) +
    theme_void() +
    labs(x = "t") +
    theme(text = element_text(family = "sans", size = 16),
          axis.text.x = element_text(margin = margin(5, 5, 5, 5)),
          axis.title.x = element_text(margin = margin(5, 5, 5, 5)),
          axis.ticks = element_line(lineend = "butt", linewidth = axis_line_width),
          axis.ticks.length = unit(2.5, "pt"),
          plot.margin = margin(0, 5, 0, 0),
          legend.position = "none") +
    geom_segment(x = -Inf, xend = -Inf, y = y_breaks[1], yend = max(y_breaks), linewidth = axis_line_width) +
    geom_segment(y = -Inf, yend = -Inf, x = x_breaks[1], xend = max(x_breaks), linewidth = axis_line_width)
  # --- Histogram panel: marginal distribution, drawn sideways via the y aesthetic ---
  p_hist <- ggplot(dat) +
    geom_histogram(aes(y = y, x = after_stat(density)),  # after_stat() replaces the deprecated ..density..
                   fill = "gray", colour = "black") +
    geom_density(aes(y = y)) +
    geom_hline(yintercept = meany, colour = "gray", linewidth = 0.5) +
    scale_y_continuous(breaks = y_breaks) +
    coord_cartesian(ylim = y_lim) +  # align the y-axis with the middle panel
    theme_void() +
    labs(x = "", y = "") +
    theme(text = element_text(family = "sans", size = 16),
          axis.title = element_text(margin = margin(5, 5, 5, 5)),
          panel.spacing = unit(5, units = "pt"),
          plot.margin = margin(0, 5, 0, 0))
  # Assemble the three panels side by side
  p <- p_ss + p_ts + p_hist + plot_layout(widths = plot_widths)
  return(p)
}
# --- Shiny App ---
ui <- fluidPage(
  fluidRow(
    column(3, sliderInput("p11", label = HTML("<h4>Probability π<sub>1|1</sub></h4>"), min = 0, max = 1, value = 0.8, step = 0.05)),
    column(3, numericInput("c1", label = HTML("<h4>Intercept c<sub>(1)</sub></h4>"), value = -1, step = 1)),
    column(3, numericInput("var1", label = HTML("<h4>Innovation variance σ<sup>2</sup><sub>ϵ(1)</sub></h4>"), value = 1, min = 0, max = 100, step = 0.5)),
    column(3, sliderInput("phi1", label = HTML("<h4>Autoregression \U03D5<sub>(1)</sub></h4>"), min = -0.999, max = 0.999, value = 0.2, step = 0.05))
  ),
  fluidRow(
    column(3, sliderInput("p22", label = HTML("<h4>Probability π<sub>2|2</sub></h4>"), min = 0, max = 1, value = 0.8, step = 0.05),
           numericInput("n", label = h4("Number of occasions T"), value = 100, min = 10, max = 1e5, step = 10)),
    column(3, numericInput("c2", label = HTML("<h4>Intercept c<sub>(2)</sub></h4>"), value = 2, step = 1)),
    column(3, numericInput("var2", label = HTML("<h4>Innovation variance σ<sup>2</sup><sub>ϵ(2)</sub></h4>"), value = 1, min = 0, max = 100, step = 0.5)),
    column(3, sliderInput("phi2", label = HTML("<h4>Autoregression \U03D5<sub>(2)</sub></h4>"), min = -0.999, max = 0.999, value = 0.4, step = 0.05))
  ),
  actionButton("refresh", "Refresh"),
  plotOutput(outputId = "main_plot", height = "400px")
)
server <- function(input, output) {
  dat <- reactive({
    input$refresh  # re-run the simulation whenever the button is pressed
    d <- simulateData(input$n,
                      c = c(input$c1, input$c2),
                      sigma = c(input$var1, input$var2),
                      phi = c(input$phi1, input$phi2),
                      p11 = input$p11, p22 = input$p22)
    attr(d, "c") <- c(input$c1, input$c2)
    attr(d, "phi") <- c(input$phi1, input$phi2)
    d
  })
  output$main_plot <- renderPlot({
    plotTimeSeries(dat())
  })
}
shinyApp(ui, server)
The tool shows three visualizations of the data that were generated with the MSAR model, using the settings you specified. The left plot is the autoregressive scatter plot with \(y_t\) plotted against \(y_{t-1}\), using purple for observations from regime 1 and green for observations from regime 2. The regime-specific regression lines implied by the intercepts (i.e., \(c_{(1)}\) and \(c_{(2)}\)) and slopes (i.e., \(\phi_{(1)}\) and \(\phi_{(2)}\)) are also included using the same coloring.
The middle plot is the time series plotted against time, using the same coloring again. The solid purple line represents the sample mean of observations from the first regime, whereas the solid green line represents the sample mean of observations from the second regime.
The right plot shows the distribution as a histogram and a density line, with a horizontal grey line indicating the observed mean. No distinction is made here between observations from the two distinct regimes; instead, this plot can be used to check whether the marginal distribution of the empirical data shows evidence for multiple regimes in the form of bimodality. Bimodality can flag that there are multiple regimes; however, regimes may also overlap to such an extent that the bimodality does not show up.
1.4 Extensions
There are various ways in which the MSAR model can be extended. Basic extensions are based on: a) having more than two regimes; and b) having higher-order AR processes in each regime.
Asenath is interested in the switches between various affective states in bipolar disorder. Specifically, she assumes that underlying the affective daily diary measurements she obtained from a patient, there are three states: a depressed state, a manic state, and a euthymic state.
Her research question concerns the way in which the patient switches from a depressed state to a manic state and vice versa: Asenath wants to know whether this always happens by first going through the euthymic state, or whether the patient also sometimes switches directly from being manic to being depressed or vice versa.
To account for the day-to-day carry-over, Asenath wants to make use of an MSAR with three states, each of which is characterized by an AR(1) process. She will have to check the parameter estimates to see whether she can interpret the three regimes as representing a depressed, a manic, and a euthymic state: For this, she will need to find the means of the regimes. Once she has been able to label the regimes as a depressed, manic and euthymic state, she will consider the transition probabilities to see what the typical switching pattern is. Specifically, she will investigate whether certain switching probabilities are (very close to) zero.
Another extension consists of having switching probabilities that can change over time. For instance, the probabilities may depend on a time-varying predictor, or on the amount of time spent in the same regime (e.g., switching may become more or less likely when the process has been in a regime for a longer period of time).
The MSAR models described here can also be seen as special cases within the broader class of regime-switching state-space models (Kim & Nelson, 1999). These allow for further extensions, such as the inclusion of moving average terms and/or measurement error in each regime.
2 Characteristics of the MSAR regimes
While the dynamics that characterize the distinct regimes can be read off immediately from the autoregressive parameters, it is less obvious what the means and variances are that characterize each of the regimes. Yet, if you want to tie a substantive interpretation to the regimes, the long-run equilibrium and the variability that characterize each regime are quite important.
2.1 Means of the MSAR regimes
From the article about autoregressive models, you can see that the long-run mean of a linear AR(1) process is actually a function of the intercept and the autoregressive parameter, that is
\[\mu = \frac{c}{1-\phi}.\]
Based on this, you may be tempted to assume this also holds for an MSAR, and that you can derive the long-run regime-specific means by using a similar formula for the parameters of each of the regimes.
However, due to the autoregression, there will be some carry-over of the features of one regime into the other. How big this effect is depends on the autoregression in each regime, the difference in means between the regimes, and how often the process switches between the regimes; this is further shown below.
2.2 Carry-over between regimes
To see how the features of one regime carry-over into the other regime, consider the MSAR process plotted in Figure 3. This MSAR process is based on intercepts \(c_{(1)}=1\) and \(c_{(2)}=3\), and autoregressions \(\phi_{(1)}=0.2\) and \(\phi_{(2)}=0.8\). As a result, the second regime (indicated in green) will be characterized by a higher long-run mean than the first regime (shown in purple).
But what you can see quite clearly, is that each time that the process switches from the first (purple) regime to the second (green) regime, the process tends to continue climbing for quite some time; apparently, it only slowly approaches its new equilibrium. The reason for this behavior is the high autoregression in the second regime. In contrast, the first regime is characterized by a much lower autoregression, and as a result the carry-over from one occasion to the next—and therefore also the effect of the other regime on this one—is much smaller.
For illustrative purposes, the same Markov process was used for the MSAR process in Figure 4; the only difference is that now the autoregression in the second regime was set equal to the autoregression in the first regime, that is, \(\phi_{(2)}=\phi_{(1)}=0.2\). Furthermore, to ensure a similar difference between the two regimes as before, the intercept for the second regime was set to \(c_{(2)}=11\).
You see that when the process switches to the second regime, it reaches its alternative equilibrium almost instantly and there is no longer this slow climb towards a new mean that was present in Figure 3. This illustrates that, due to the lower autoregression, the carry-over of features from the first regime into the second is much smaller now.
The two MSAR models above differ in both the intercept and the autoregression for the second regime. To see what happens when only the autoregression is changed, you can compare the MSAR model in Figure 3 with the one in Figure 5.
You can see that the mean of the second regime (in green) is now much lower; this is only because the autoregressive parameter in the second regime changed from 0.8 to 0.2. This shows that the mean of a regime is determined by both the intercept (see the difference between Figure 4 and Figure 5) and the autoregressive parameter (see the difference between Figure 3 and Figure 5).
You may also have noticed that the fluctuations in the second (green) regime in the MSAR process plotted in Figure 3 seem to take place around an increasing trajectory, rather than around a stable mean. This is not because this regime is characterized by a trend, but rather because it takes time to reach the new equilibrium due to the high autoregression. The process does not stay long enough in the second regime to clearly show it reaches a new equilibrium over time.
To see this more clearly, you can look at the MSAR process in Figure 6. The parameter settings are the same as for the process in Figure 3, except that the probability of remaining in the second (green) regime is now increased from 0.9 to \(\pi_{2|2}=0.95\). As a result, the process tends to stay longer in this regime once it has entered it.
At the start of the observations, the process is in the second (green) regime, and seems to fluctuate around a stable mean with a stable variance. When the process switches to the first regime (in purple), it reaches its regime-specific long-run equilibrium almost immediately as a result of the low autoregression in this regime.
The interesting aspect to notice is when the process switches back to the second regime (in green) again: There you see it really needs to climb back up to the long-run equilibrium, and it does not quite reach that point before switching again. When it switches to the second (green) regime again, it stays there for a longer period of time, and this shows that after about 20 occasions in the second regime, the process seems to have reached its long-run regime-specific equilibrium again.
To summarize, this shows that when you have a process with one or more regimes that have a high autoregression, and the process tends to switch somewhat regularly between the regimes, then the long-run equilibria will typically not be reached, simply because the process does not stay in a regime long enough to get there. This also shows that it is hard to determine the means of the specific regimes based only on the intercepts and autoregressive parameters.
2.3 Variances of the MSAR regimes
Similar to the issue discussed above for the regime-specific means, the regime-specific variances also cannot be based on the well-known expressions for the variance of a regular AR model. That is, due to the autoregression, there is some effect of the variance of the first regime on that of the second regime and vice versa. When there are large mean differences between the regimes and when the switching occurs more often, this effect is further amplified.
While the regime-specific variance is of interest because it tells you something about the amount of variability in a specific regime, the regime-specific innovation variance may also be considered of interest. It represents the degree to which the process is impacted by external factors, and may thus capture the degree to which a person is exposed to variability in such factors, as well as the degree to which a person responds to such factors (Jongerling et al., 2015). This may be considered an important clue about the nature of a regime from a substantive point of view.
Rogier is interested in the variability in reaction times of an individual in a computer task. He assumes that people may switch between two distinct states: One in which they are extremely focused and respond consistently fast, and one in which they are distracted and respond on average slower but most notably also with much more variation in their reaction times.
Rogier therefore wants to use a model with two regimes, and allow for an autoregressive process in each regime. He is specifically interested in differences in the variability that characterizes each regime. But he wonders whether it is the variance of the observed reaction times or the innovation variance that he is most interested in. He realizes that the variance of the observed reaction times may also capture something of the transition from one equilibrium to the other, and that this is actually not what he is interested in. Hence, he concludes that his primary interest is in the innovation variances that characterize the two regimes, and how these differ from each other.
2.4 How to get the regime-specific means and variances
A pragmatic way to get the means and variances that characterize the regimes, is by simulating a very long time series with a specific constellation of parameter values—for instance the values you have estimated based on an empirical data set. Subsequently, you can determine the mean and variance of the simulated observations within each regime. However, this will not inform you about the uncertainty of the estimates; that would require a more extensive simulation approach.
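As a sketch of this pragmatic approach, the following Python snippet (separate from the article's R tool) simulates a long series from the two-regime MSAR(1) of Figure 3 (intercepts 1 and 3, autoregressions 0.2 and 0.8, staying probabilities 0.9) and computes the within-regime means and variances. The naive AR(1) formulas would give \(\mu_{(1)} = 1/0.8 = 1.25\) and \(\mu_{(2)} = 3/0.2 = 15\); because of the carry-over between regimes, the simulated within-regime means land in between these two values:

```python
import random
from statistics import mean, pvariance

def regime_moments(n, c, phi, sigma, p11, p22, seed=1):
    """Simulate a long two-regime MSAR(1); return within-regime (mean, variance) pairs."""
    rng = random.Random(seed)
    s, y = 1, 0.0
    vals = {1: [], 2: []}
    for _ in range(n):
        stay = p11 if s == 1 else p22
        if rng.random() >= stay:
            s = 3 - s           # switch to the other regime
        k = s - 1
        y = c[k] + phi[k] * y + rng.gauss(0.0, sigma[k])
        vals[s].append(y)       # collect the observation under its regime
    return {j: (mean(vals[j]), pvariance(vals[j])) for j in (1, 2)}

moments = regime_moments(200_000, c=(1, 3), phi=(0.2, 0.8), sigma=(1, 1),
                         p11=0.9, p22=0.9)
```

The within-regime mean of regime 1 comes out above 1.25 (it is pulled up by entries from the high regime), and that of regime 2 comes out well below 15 (the process rarely stays long enough to reach its equilibrium), illustrating the carry-over discussed above.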
3 Alternative formulation of an MSAR
Instead of the formulation that was used above, in which the raw lagged outcome (i.e., \(y_{t-1}\)) serves as the predictor, you can also choose a formulation based on regime-specific centering of the lagged variable (Hamilton, 1994). If you consider an MSAR with two (or more) regimes and an AR(1) in each regime, this can be expressed as
\[ y_t = \mu_{(S_t)} + \phi_{(S_t)} (y_{t-1} - \mu_{(S_{t-1})} ) + \sigma_{(S_t)}\epsilon_t \]
where \(S_t\) indicates the regime the process is in at occasion \(t\), and \(S_{t-1}\) indicates the regime the process was in at occasion \(t-1\). In this formulation \(y_t\) is regressed not on \(y_{t-1}\) but on the deviation of \(y_{t-1}\) from the mean of the regime that \(y_{t-1}\) comes from (represented by \(\mu_{(S_{t-1})}\)). This has several implications.
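In code, the centered formulation is again a direct transcription of the equation. A minimal Python sketch (separate from the article's R tool, with illustrative parameter values) makes one of its implications easy to verify: when the innovations are switched off, every simulated value equals one of the two regime means, so a switch lands on the new mean immediately, no matter how large the autoregressive parameter is:

```python
import random

def simulate_msar_centered(n, mu, phi, sigma, p11, p22, seed=1):
    """Two-regime MSAR(1) in the centered formulation; mu, phi, sigma are pairs."""
    rng = random.Random(seed)
    s = 1
    y = [mu[0]]  # start at the regime-1 mean (arbitrary)
    regimes = [s]
    for _ in range(1, n):
        s_prev = s
        stay = p11 if s == 1 else p22
        if rng.random() >= stay:
            s = 3 - s
        k, k_prev = s - 1, s_prev - 1
        # y_t is regressed on the deviation of y_{t-1} from the mean of the
        # regime that y_{t-1} came from, so the level adjusts to a switch at once
        y.append(mu[k] + phi[k] * (y[-1] - mu[k_prev]) + rng.gauss(0.0, sigma[k]))
        regimes.append(s)
    return y, regimes

# With zero innovation variances the series only ever takes the two mean values
y, r = simulate_msar_centered(12, mu=(1.25, 15.0), phi=(0.2, 0.8),
                              sigma=(0.0, 0.0), p11=0.9, p22=0.9)
```

Contrast this with the uncentered formulation, where a high autoregression produces a slow climb towards the new equilibrium after a switch.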
First, when the predictor on the right-hand side is centered and therefore has a mean of zero, then the intercept in this regression equation, \(\mu_{(S_t)}\), will represent the mean of the outcome variable (given it is in regime \(S_t\)). This may be a much more meaningful parameter to obtain than the intercepts \(c_{(S_t)}\) that are included in the model presentation provided above, and this may be a reason to prefer the current formulation.
Second, this model is not mathematically equivalent to the model presented before. This is in stark contrast to these kinds of model formulations in the linear case, which you can read about in the article on the AR model. An easy way to see that the current formulation and the previous one cannot be equivalent, is by thinking again of the examples presented in Figure 3 and Figure 6: These were characterized by a large autoregression in the second (green) regime, which resulted in quite a lot of carry-over from the first regime when the second regime was entered. This showed up as a slow climb towards the long-run equilibrium of the second regime. Such behavior cannot be generated with the current specification based on centering, as it does not use \(y_{t-1}\) but the centered \(y_{t-1} - \mu_{(S_{t-1})}\). As a result, the current model formulation will result in rather sudden switches to the new equilibrium, regardless of how large the autoregressive parameter is.
When estimating an MSAR, it is therefore very important to know which version of the model is implemented. The centered version may be attractive, as the mean parameters have more direct and substantive meaningful interpretations than the intercepts in the model specification above. However, another consideration is whether you believe the switches are abrupt and fully realized at once, or that the transitions are somewhat smoother over time. In the latter case, you should prefer the formulation without centering, as it allows for more carry-over from one regime into the next.
4 Estimating and comparing MSAR models
Estimating the parameters of an MSAR model will depend on the formulation of the model that is used. A rather general modeling framework that can be used for this is based on the regime-switching state-space model, developed by Kim & Nelson (1999); in this framework, no centering is used.
Comparing models with different numbers of regimes to decide how many regimes are present is a tricky problem (Hamilton, 1994). It is somewhat similar to the problem that you encounter if you want to determine the number of regimes in the context of change point models with unknown timing of the change points or TAR models with unknown threshold values. It can be described as the problem of having nuisance parameters that are unidentified under the null model (i.e., the model with fewer regimes).
To see the problem, consider the case where you want to compare an MSAR with two regimes and a linear AR model to determine which of these models is more likely to have given rise to the observed data. The linear AR model is nested under the MSAR model, but you can obtain it in two ways:
you can set the intercepts, autoregressions, and innovation variances equal across the two regimes; in this case, the transition probabilities are unidentified (i.e., they can take on any value); or
you can set the transition probabilities in such a way that you can only be in one of the regimes (e.g., by setting \(\pi_{1|1}=1\) and \(\pi_{2|2}=0\)); then the parameters that characterize the other regime are unidentified (i.e., they can take on any value).
Hence, in both cases, there are nuisance parameters (i.e., parameters that are not of interest) that are unidentified, meaning they can take on various values and there is no unique solution. Of course you can decide to impose all the constraints, but that is actually more than needed to get from the MSAR model to the AR model; while this may help in estimation, it does not solve the problem of model comparison using a likelihood ratio test. Therefore, alternative tests have been developed (Hamilton, 1994).
Alternatively, you can compare these models with information criteria without any adjustment; you simply count the number of free parameters per model and weigh these with the penalty of the information criterion that you use. This is in contrast to the model comparisons for change point models and TAR models with unknown change point or threshold values: In those cases, regular information criteria are not appropriate, because change points or thresholds are nonregular parameters (Ninomiya, 2005).
Samuel wants to determine whether there actually are two regimes or only one in the affective fluctuations of his participants. He decides to use second-order autoregressive models in each regime, and allow for different innovation variances in each regime. Hence, his MSAR model is characterized by the following free parameters: 2 probabilities for the Markov process, 2 intercepts, 4 autoregressive parameters, and 2 innovation variances. Hence, in total there are 10 free parameters.
In contrast, the linear AR model has 1 intercept, 2 autoregressive parameters, and 1 innovation variance, making a total of 4 free parameters. The MSAR model will fit better than the AR model and thus result in a larger log-likelihood; in the AIC, this log-likelihood is multiplied by -2 to quantify the model's misfit. The number of free parameters is then multiplied by 2 to represent the model's complexity, and added to the misfit to obtain the AIC. This way, model fit and model complexity are both taken into account.
The comparison then assesses whether the improvement in fit of the MSAR model is sufficient to outweigh its increased complexity relative to the simpler AR model.
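As a minimal numerical sketch of this comparison, the Python snippet below computes the AIC for both models. The log-likelihood values are made up for illustration (in practice they result from fitting each model to the data), but the parameter counts are the ones derived above.

```python
def aic(loglik, n_free_params):
    """AIC = -2 * log-likelihood + 2 * number of free parameters."""
    return -2.0 * loglik + 2.0 * n_free_params

# Hypothetical fitted log-likelihoods (made up for illustration):
ll_msar = -480.0   # MSAR: 10 free parameters (see the count above)
ll_ar = -495.0     # linear AR(2): 4 free parameters

aic_msar = aic(ll_msar, 10)   # -2 * -480 + 2 * 10 = 980.0
aic_ar = aic(ll_ar, 4)        # -2 * -495 + 2 * 4  = 998.0

# The model with the smaller AIC is preferred.
best = "MSAR" if aic_msar < aic_ar else "AR"
print(best)  # here the gain in fit outweighs the 6 extra parameters
```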
5 Think more about
MSAR models fall within the broader class of regime-switching models, which also includes the change point model and threshold autoregressive (TAR) models. It can be useful to understand when these models become similar or even identical, as this also helps to better understand how they tend to differ from each other.
Typical of a change point model (with two regimes) is that it changes from the first to the second regime only once, such that there is a specific point in time when this switch occurs. If you have an MSAR (with two regimes), and you specify one of the regimes as an absorbing state, meaning that once this regime is entered it is impossible to leave it, this model is actually equivalent to a change point model.
In comparison to the TAR model, the most notable feature of the MSAR model is that the switching process is hidden (or latent), whereas switching in a TAR model occurs based on an observed threshold variable. A model that sits somewhere between these two regime-switching models is an MSAR model in which the switching probabilities depend on an observed time-varying variable \(x_t\). In that case, the switching is not a deterministic function of \(x_t\) (as it is assumed to be in the TAR model), nor is it completely independent of \(x_t\) (as in the model where the switching probabilities are invariant over time).
Furthermore, it may be good to realize that the MSAR model is in general a stationary model. Although it is based on switching between regimes that can have distinct means, variances, and dynamics, the switching is not triggered by time itself. Moreover, while the switching is reversible, the switches do not occur according to a repetitive deterministic temporal pattern. The only exception is when there is an absorbing state as described above (e.g., \(\pi_{2|2}=1\)): In that case, sooner or later the process will enter this regime and then forever stay in it, which makes it a non-stationary regime-switching process.
6 Takeaway
The MSAR model is based on having two or more regimes, each of which is characterized by a distinct AR process with its own parameters. The switching between these regimes is governed by an unobserved Markov process. In estimating an MSAR model, the interest is in obtaining the parameters that characterize the various regimes so that these can be interpreted from a substantive point of view, and to obtain the transition probabilities such that the dynamics of the switching process can be described.
The nature of the various regimes may be somewhat challenging to determine, as the mean and variance that characterize them depend not only on the intercept, autoregressions and innovation variance of that regime, but also on the variance and mean of the other regime(s) and the switching probabilities. To obtain more insight into the features of a regime, it can be helpful to simulate a very long time series based on the parameters that were obtained, and determine the features of interest per regime.
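A minimal sketch of this simulation strategy is given below (in Python, with made-up parameter values standing in for the estimates): it generates a long series from an MSAR model with two AR(1) regimes and then summarizes the observed mean and variance per regime.

```python
import numpy as np

rng = np.random.default_rng(2)

# Made-up parameter values standing in for the estimates obtained from
# fitting the model; intercept formulation with AR(1) regimes.
c = np.array([0.0, 1.0])       # intercepts
phi = np.array([0.3, 0.8])     # autoregressive parameters
sigma = np.array([1.0, 1.5])   # innovation SDs
P = np.array([[0.95, 0.05],    # P[i, j]: probability of moving from
              [0.10, 0.90]])   # regime i to regime j

T = 50_000                     # a very long simulated time series
S = np.zeros(T, dtype=int)     # regime indicator
y = np.zeros(T)
for t in range(1, T):
    S[t] = rng.choice(2, p=P[S[t - 1]])
    y[t] = c[S[t]] + phi[S[t]] * y[t - 1] + sigma[S[t]] * rng.normal()

# Determine the features of interest per regime.
for s in (0, 1):
    print(f"regime {s + 1}: mean {y[S == s].mean():.2f}, "
          f"variance {y[S == s].var():.2f}")
```

Note that these per-regime summaries generally differ from the stationary mean and variance each regime's AR process would have in isolation, precisely because of the carry-over from the other regime and the switching probabilities.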
The MSAR model is based on abrupt switches from one regime to another. However, as you have been able to see above, when there is strong autoregression in a regime, this smooths the abrupt transition somewhat. This is why the MSAR model (like the more general regime-switching state-space model by Kim & Nelson, 1999) can capture both abrupt change in an observed process and, to some extent, more gradual change. There is also an alternative formulation of the MSAR model, in which the lagged terms are centered with the mean of the regime they came from (Hamilton, 1994); this version only allows for quite abrupt switches between regimes. Hence, when interested in using an MSAR model, it is important to decide which of these versions you want to use, and to make sure that this is indeed the model that is being estimated.
7 Further reading
We have collected various topics for you to read more about below.
References
Citation
@article{hamaker2026,
author = {Hamaker, Ellen L. and Berkhout, Sophie W.},
title = {Markov-Switching Autoregressive Model},
journal = {MATILDA},
number = {2026-01-22},
date = {2026-01-22},
url = {https://matilda.fss.uu.nl/articles/ms-ar-model.html},
langid = {en}
}