Autoregressive integrated moving-average (ARIMA) model
This article is about the autoregressive integrated moving-average (ARIMA) model, which forms a key family of time series models in the \(N=1\) time series literature. Members of this family can be used to account for various short- and long-term patterns of dependency in your data.
Understanding how the stationary component of the ARIMA model—consisting of the autoregressive moving-average (ARMA) model—comes together with the non-stationary component—known as the integrated (I) model—helps you to see that there are many possible model forms beyond the commonly used first-order autoregressive (AR(1)) model. Exploring this broader class of models could support you in choosing the one that best captures the temporal dynamics in your intensive longitudinal data (ILD).
Below you find information on: 1) the two building blocks of an ARIMA model and how these come together; 2) four specific examples of ARIMA models; and 3) how to decide on the orders \(p\), \(d\), and \(q\) of the ARIMA model that you will fit to your data.
1 Building blocks
An ARIMA model can be thought of as a stationary autoregressive moving-average (ARMA) model that is embedded in a non-stationary unit root process. The latter is also referred to as an integrated process, which is the reason for the I in ARIMA. In this section you can read more about the ARMA model first, followed by what a unit root or integrated process is, after which you learn how the two are combined to get the ARIMA model.
1.1 The ARMA(\(p,q\)) process
The ARMA process is one of the two building blocks of an ARIMA model. The ARMA model is itself a combination of the autoregressive (AR) model and the moving-average (MA) model. It is by definition stationary, which means that the mean, variance and other characteristics of the time series that are generated by an ARMA model do not change over time.
The general ARMA(\(p,q\)) model combines \(p\) autoregressive (AR) terms with \(q\) moving-average (MA) terms, and can be expressed as
\[y_t = c + \phi_1 y_{t-1} + \phi_2 y_{t-2} + \dots + \phi_p y_{t-p} + \theta_q \epsilon_{t-q} + \dots + \theta_2 \epsilon_{t-2} + \theta_1 \epsilon_{t-1} + \epsilon_t.\]
The \(\phi\)-parameters are the AR coefficients by which the current value of the time series is predicted from preceding values of the time series, and the \(\theta\)-parameters are the MA coefficients by which past random shocks have a predictive effect as well. When thinking about this as a data-generating mechanism, you can think of the AR terms as ways in which past values of the process carry over into later values of the process, while the MA terms capture delayed effects of the random shocks that are given to the system.
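If you find it helpful to see this data-generating mechanism in action, the following minimal sketch (in Python with numpy; an addition to this article, not part of its original materials) simulates an ARMA(\(p,q\)) series directly from the equation above. The coefficient values and series length are illustrative assumptions.

```python
import numpy as np

# Minimal sketch of an ARMA(p, q) data-generating process; the coefficient
# values below are illustrative assumptions, not taken from the article.
rng = np.random.default_rng(1)

def simulate_arma(phi, theta, c=0.0, n=500, burn=100):
    """Simulate y_t = c + sum_i phi_i y_{t-i} + e_t + sum_j theta_j e_{t-j}."""
    p, q = len(phi), len(theta)
    e = rng.normal(size=n + burn)                      # white noise shocks
    y = np.zeros(n + burn)
    for t in range(max(p, q), n + burn):
        ar_part = sum(phi[i] * y[t - 1 - i] for i in range(p))
        ma_part = sum(theta[j] * e[t - 1 - j] for j in range(q))
        y[t] = c + ar_part + e[t] + ma_part
    return y[burn:]                                    # drop the burn-in

y = simulate_arma(phi=[0.5, 0.2], theta=[0.4])         # an ARMA(2,1) as an example
print(round(y.mean(), 2), round(y.std(), 2))           # mean and spread remain stable over time
```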
If you want to gain more insight into the characteristics and workings of an ARMA process, you can read more about this in the article about the ARMA model, and the accompanying articles about its special cases: the pure AR model and the pure MA model.
1.2 The I(\(d\)) process
The second building block of an ARIMA process is the integrated or unit root process, denoted by I(\(d\)). This is the part of the model that makes the process non-stationary. It produces a stochastic trend, and results in ever-increasing uncertainty when you try to predict further into the future (Hyndman & Athanasopoulos, 2021).
An integrated process of order 1—denoted as I(1)—implies that
\[y_t = y_{t-1} + a_t.\]
This can be understood as: Every score is built on the preceding one by adding a random shock to it. The resulting time series is a non-stationary process that is characterized by a single unit root. If you difference these data once, you end up with
\[\Delta y_t = y_t - y_{t-1} = a_t,\]
which is a stationary series. You can study the short-term dynamics in the process by further analyzing the differenced series and seeing whether this forms a white noise sequence, or a stationary process that is characterized by non-zero autocorrelations. When interpreting the results from analyzing the differenced series \(a_t\), it is important to keep in mind that this concerns changes in the observed series \(y_t\). For instance, the autocorrelation of \(a_t\) at lag 1 indicates the correlation between the change in \(y_t\) between two consecutive time points (\(t-1\) and \(t\)) and the change between the two subsequent time points (\(t\) and \(t+1\)).
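As a small illustration of this reasoning, the sketch below (Python with numpy; an addition to the article) builds an I(1) series by cumulatively summing white-noise shocks and shows that differencing once indeed returns the shock series.

```python
import numpy as np

# Sketch of an I(1) process with white-noise shocks: y_t = y_{t-1} + a_t.
rng = np.random.default_rng(2)
a = rng.normal(size=1000)        # stationary random shocks
y = np.cumsum(a)                 # cumulative summation creates the single unit root

dy = np.diff(y)                  # first difference: Delta y_t = y_t - y_{t-1}
print(np.allclose(dy, a[1:]))    # True: differencing once recovers the stationary shocks
```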
An integrated process can contain more than one unit root. Suppose \(d=2\), meaning \(y_t\) is an I(2) process, characterized by two unit roots. This implies that
\[y_t = y_{t-1} + k_t,\]
where
\[k_t = k_{t-1} + a_t,\]
with \(a_t\) being a stationary series of random shocks.
If you have observed an I(2) process, you have to difference the time series twice to obtain a stationary series, that is,
\[\Delta^2 y_t = \Delta y_t - \Delta y_{t-1} = (y_t - y_{t-1}) - (y_{t-1} - y_{t-2}) = y_t - 2 y_{t-1} + y_{t-2} = a_t.\]
Subsequently, you can use the stationary series \(a_t\) to investigate whether there are remaining dynamics in the process. However, in describing the results, you should keep in mind that \(a_t\) represents the change in the change (over time) of \(y\). This can make it rather difficult to interpret autocorrelations or the parameters you use to model the series \(a_t\) in a substantively meaningful way.
In general, when you have an integrated process of order \(d\), this is a process that is said to have \(d\) unit roots. It requires you to difference the data \(d\) times to obtain a stationary series. Hence, to determine how many unit roots there are, you have to determine how often you need to difference the data to obtain a stationary series.
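The same logic can be checked numerically for \(d=2\): the sketch below (Python with numpy, illustrative only) stacks two cumulative summations and shows that two rounds of differencing are needed to get back to the stationary shocks.

```python
import numpy as np

# Sketch of an I(2) process: k_t = k_{t-1} + a_t and y_t = y_{t-1} + k_t.
rng = np.random.default_rng(3)
a = rng.normal(size=1000)            # stationary shocks
k = np.cumsum(a)                     # first unit root
y = np.cumsum(k)                     # second unit root

d1y = np.diff(y)                     # still non-stationary (one unit root left)
d2y = np.diff(y, n=2)                # Delta^2 y_t = y_t - 2 y_{t-1} + y_{t-2}
print(np.allclose(d2y, a[2:]))       # True: differencing twice recovers the stationary shocks
```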
1.3 Putting ARMA(\(p,q\)) and I(\(d\)) together
You can think of an ARIMA(\(p,d,q\)) process as stemming from a sequence of two steps in which white noise \(\epsilon_t\) is transformed into the observed series \(y_t\). This sequence is visualized in Figure 1. First, the white noise series is used as input to an ARMA(\(p,q\)) model, transforming the uncorrelated stationary series \(\epsilon_t\) into an autocorrelated stationary series \(a_t\). Second, the stationary autocorrelated \(a\)’s are used as the random shocks in an I(\(d\)) process, transforming the stationary series into a non-stationary series.
In analyzing the data, you try to reverse these steps by differencing the observed series \(y_t\) often enough to obtain a stationary series \(a_t\), and then modeling the remaining short-term dependencies using an ARMA model to get to the uncorrelated residuals \(\epsilon_t\). When interpreting the parameters of the ARMA model, it is important to keep in mind how often \(y_t\) was differenced, and whether it thus concerns temporal dependencies between changes in \(y\) (when \(d=1\)), changes in changes in \(y\) (when \(d=2\)), or even more complex aspects of \(y\) (when \(d>2\)).
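The two steps, and their reversal during analysis, can be mimicked in a few lines of code. The sketch below (Python with numpy; the values \(\phi_1=0.5\) and \(\theta_1=0.3\) are illustrative assumptions) generates an ARIMA(1,1,1) series by first passing white noise through an ARMA(1,1) filter and then cumulatively summing the result; differencing the observed series recovers the stationary ARMA series.

```python
import numpy as np

# Generating an ARIMA(1,1,1) series in the two steps described above.
rng = np.random.default_rng(4)
n, phi1, theta1 = 1000, 0.5, 0.3

eps = rng.normal(size=n)                     # step 0: uncorrelated white noise
a = np.zeros(n)
for t in range(1, n):                        # step 1: ARMA(1,1) -> autocorrelated but stationary a_t
    a[t] = phi1 * a[t - 1] + eps[t] + theta1 * eps[t - 1]
y = np.cumsum(a)                             # step 2: I(1) -> non-stationary observed series y_t

# Analysis reverses the steps: differencing y recovers the stationary series a,
# which can then be modeled with an ARMA(1,1) to arrive back at white noise.
print(np.allclose(np.diff(y), a[1:]))        # True
```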
2 Four examples of the ARIMA(\(p,d,q\)) model
To help you obtain a better understanding of how the stationary and the non-stationary parts of an ARIMA model come together, four specific examples are shown below: an ARIMA(0,1,0) model; an ARIMA(1,1,1) model; an ARIMA(1,1,0) model; and an ARIMA(0,1,1) model.
2.1 ARIMA(0,1,0) model
The simplest non-stationary process from the ARIMA family is the ARIMA(0,1,0) model. This model is also known as a random walk in the time series literature, and can be expressed as
\[y_t = y_{t-1} + \epsilon_t,\]
where the random shocks \(\epsilon_t\) form a white noise sequence. It is a process in which the current score is based on taking the preceding score and adding a random shock to it. This is visualized in Figure 2.
To get from the observed non-stationary series to a stationary series, you need to difference \(y_t\) once in this case, that is
\[\Delta y_t = y_t - y_{t-1} = \epsilon_t.\]
When the observed data are generated by a random walk, this means that the (partial) autocorrelations of the differenced data will all be (close to) zero. This shows you that the differenced series is a white noise process, and thus that the observed series is a random walk.
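A quick way to see this in practice is sketched below (Python with numpy; the hand-rolled ACF is only for illustration, in an actual analysis you would typically rely on a time series package): after differencing a simulated random walk, all autocorrelations of the differenced series are close to zero.

```python
import numpy as np

# Differencing a random walk should leave (approximately) white noise.
rng = np.random.default_rng(5)
eps = rng.normal(size=2000)
y = np.cumsum(eps)               # random walk: y_t = y_{t-1} + eps_t
dy = np.diff(y)                  # differenced series

def acf(x, max_lag):
    """Sample autocorrelations at lags 1..max_lag."""
    x = x - x.mean()
    denom = np.sum(x ** 2)
    return np.array([np.sum(x[k:] * x[:-k]) / denom for k in range(1, max_lag + 1)])

print(np.round(acf(dy, 5), 3))   # all values close to 0: dy behaves as white noise
```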
2.2 ARIMA(1,1,0) model
When \(y_t\) has one unit root and random shocks that follow an AR(1) process, you have an ARIMA(1,1,0) model. This implies a white noise sequence is first used as the input for an AR(1) process, that is
\[a_t = \phi_1 a_{t-1} + \epsilon_t\] and this autocorrelated series forms the random shocks given to the unit root process
\[y_t = y_{t-1} + a_t.\] This process is visualized as a path diagram in Figure 3.
The autoregressive component in the random shocks results in temporal behavior that is quite distinct from that of a random walk (which you obtain when \(\phi_1=0\)). To see this, consider Figure 4, where an ARIMA(0,1,0) and an ARIMA(1,1,0) are constructed using the same random shocks \(\epsilon_t\); the series differ in that the darker series is a random walk (i.e., \(y_t = y_{t-1}+\epsilon_t\)), while the lighter series is a unit root process with AR(1) shocks (i.e., \(y_t = y_{t-1}+a_t\), where \(a_t=\phi_1 a_{t-1} +\epsilon_t\), with \(\phi_1=0.6\)).
The AR part of the latter process implies that the change upward or downward at the previous occasion (i.e., \(a_{t-1}\)) is partly repeated at the following occasion—albeit less strongly—through \(\phi_1 a_{t-1}\). Another way to think about an ARIMA(1,1,0) process is by rewriting it in terms of the infinite sum of past random shocks \(\epsilon\). You can do this algebraically, but it may be more intuitive to consider the visualization of an ARIMA(1,1,0) in Figure 3.
From the path diagram it can be seen that the contributions of past innovations depend on how far back in the past they occurred. For instance, the effect of \(\epsilon_{t-1}\) on \(y_t\) is \((1 + \phi_1)\), the effect of \(\epsilon_{t-2}\) on \(y_t\) is \((1 + \phi_1 + \phi_1^2)\), the effect of \(\epsilon_{t-3}\) on \(y_t\) is \((1 + \phi_1 + \phi_1^2 + \phi_1^3)\), and so on. This implies that for a positive \(\phi_1\), the further into the past a random shock occurred, the stronger its total effect on \(y_t\). Note that since \(\phi_1\) has to be smaller than 1, the increments by which these effects grow become smaller and smaller as we go further backward in time.
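You can verify this pattern of cumulative effects numerically. The sketch below (Python with numpy) feeds a single unit shock into an ARIMA(1,1,0) mechanism and tracks its effect on later values of \(y\); the value \(\phi_1=0.6\) matches the lighter series in Figure 4.

```python
import numpy as np

# Impulse response of an ARIMA(1,1,0): one unit shock, then silence.
phi1, n = 0.6, 8
eps = np.zeros(n)
eps[0] = 1.0                                   # a single shock at time 0

a = np.zeros(n)
for t in range(n):
    a[t] = (phi1 * a[t - 1] if t > 0 else 0.0) + eps[t]
y = np.cumsum(a)                               # y_t = y_{t-1} + a_t

# m periods after the shock, its accumulated effect on y equals 1 + phi1 + ... + phi1**m.
expected = np.cumsum(phi1 ** np.arange(n))
print(np.allclose(y, expected))                # True
```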
2.3 ARIMA(1,1,1) model
To see what it means to have an ARIMA(1,1,1), we start again with the white noise sequence \(\epsilon_t\) and see how this serves as input to the ARMA process, in this case an ARMA(1,1). This can be expressed as
\[ a_t = \phi_1 a_{t-1} + \epsilon_t + \theta_1 \epsilon_{t-1}.\] Subsequently, the resulting stationary autocorrelated series \(a_t\) serves as the random shocks given to the integrated process resulting in the observed series \(y_t\), that is
\[ y_t = y_{t-1} + a_t.\]
This process can be visualized as in Figure 5, showing how the white noise process is transformed into the non-stationary series through the intermediate stationary autocorrelated series.
As with the random walk described above, since \(d=1\) you need to difference the observed non-stationary data once to obtain a stationary process. But in contrast to the random walk which resulted in white noise when the observed series were differenced, the differenced data from the current model are not a white noise sequence, but are instead characterized by temporal dependencies, which result from the ARMA(\(1,1\)) process.
You can see this by writing
\[ \Delta y_t = y_t - y_{t-1} = a_t.\]
In ARIMA modeling, you proceed with the differenced stationary series and fit an ARMA model to it to account for the remaining dependencies in the time series. You can verify whether all dependencies have been accounted for by checking whether the innovations \(\epsilon_t\) behave as a white noise process (Hyndman & Athanasopoulos, 2021).
It may be interesting to see that you can also write \(y_t\) in terms of past \(y\)’s and current and past \(\epsilon\)’s only, without using \(a_t = \Delta y_t\). This will give you
\[\begin{aligned}
y_t &= y_{t-1} + (\phi_1 \Delta y_{t-1} + \epsilon_t + \theta_1 \epsilon_{t-1})\\
&= y_{t-1} + \phi_1 (y_{t-1} - y_{t-2}) + \epsilon_t + \theta_1 \epsilon_{t-1}\\
&= (1 + \phi_1) y_{t-1} - \phi_1 y_{t-2} + \epsilon_t + \theta_1 \epsilon_{t-1}.
\end{aligned}\]
While the latter may give you the impression that you are dealing with an ARMA(2,1) process, the two autoregressive parameters—that is, \((1+\phi_1)\) for the first autoregressive term and \((-\phi_1)\) for the second autoregressive term—will not obey the stationarity constraints needed for an AR(2) process (for details see the article on AR models). Hence, this process is not stationary, and is therefore not an ARMA process; rather, it is a non-stationary member of the more extended ARIMA family.
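If you want to convince yourself of this rewrite, the sketch below (Python with numpy; \(\phi_1=0.5\) and \(\theta_1=0.3\) are illustrative assumptions) generates an ARIMA(1,1,1) series via the shocks \(a_t\) and checks that the same series satisfies \(y_t = (1+\phi_1)y_{t-1} - \phi_1 y_{t-2} + \epsilon_t + \theta_1\epsilon_{t-1}\).

```python
import numpy as np

# Numerical check of the ARIMA(1,1,1) rewrite in terms of past y's and epsilons.
rng = np.random.default_rng(7)
n, phi1, theta1 = 500, 0.5, 0.3

eps = rng.normal(size=n)
a = np.zeros(n)
for t in range(1, n):
    a[t] = phi1 * a[t - 1] + eps[t] + theta1 * eps[t - 1]
y = np.cumsum(a)                                           # y_t = y_{t-1} + a_t

lhs = y[2:]
rhs = (1 + phi1) * y[1:-1] - phi1 * y[:-2] + eps[2:] + theta1 * eps[1:-1]
print(np.allclose(lhs, rhs))                               # True for all t >= 2
```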
2.4 ARIMA(0,1,1) model
When an integrated process of order 1 has MA(1) random shocks, this implies an ARIMA(0,1,1) model. Hence, it starts with
\[a_t = \theta_1 \epsilon_{t-1} + \epsilon_t,\]
and subsequently, the series \(a_t\) serves as the random shocks in
\[y_t = y_{t-1} + a_t.\]
This process is visualized in Figure 6, showing how the non-stationary series \(y_t\) emerges from the white noise sequence \(\epsilon_t\).
On closer inspection of the model in Figure 6, you may notice that the intermediate layer \(a_t\) is not doing anything in particular. For simplicity, you could remove it from the path diagram, without losing anything. This can also be shown by plugging the expression for \(a_t\) into the expression for \(y_t\), which gives you
\[y_t = y_{t-1} + \theta_1 \epsilon_{t-1} + \epsilon_t.\]
The ARIMA(0,1,1) model has been considered of particular interest in psychology by Fortes et al. (2004). Their interest in this model is based on expressing the predictable part of \(y_t\), denoted as \(\hat{y}_t\), as a function of the predictable part of \(y_{t-1}\), denoted as \(\hat{y}_{t-1}\), and showing the role of the MA parameter \(\theta_1\) in this. To see this, consider first
\[\begin{aligned}
\hat{y}_t &= y_t - \epsilon_t\\
&= y_{t-1} + \theta_1 \epsilon_{t-1}.
\end{aligned}\]
Then replace the previous score \(y_{t-1}\) on the right-hand side by \(y_{t-1} = \hat{y}_{t-1} + \epsilon_{t-1}\), that is, the sum of the predictable and the unpredictable part. This results in
\[\begin{aligned}
\hat{y}_t &= (\hat{y}_{t-1} + \epsilon_{t-1}) + \theta_1 \epsilon_{t-1}\\
&= \hat{y}_{t-1} + (1+\theta_1) \epsilon_{t-1}.
\end{aligned}\]
The latter shows that the predictable part of \(y\) at a certain occasion is equal to the predictable part of \(y\) at the preceding occasion, plus some function of the random white noise shock at the preceding occasion. The weight given to the latter is \((1+\theta_1)\), showing a critical role for the MA parameter \(\theta_1\).
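The sketch below (Python with numpy; \(\theta_1=-0.9\) mirrors the strong-preservation example in Figure 7) verifies this recursion numerically by simulating an ARIMA(0,1,1) series and checking that its predictable part indeed follows \(\hat{y}_t = \hat{y}_{t-1} + (1+\theta_1)\epsilon_{t-1}\).

```python
import numpy as np

# Check of the recursion for the predictable part of an ARIMA(0,1,1).
rng = np.random.default_rng(8)
n, theta1 = 500, -0.9

eps = rng.normal(size=n)
y = np.zeros(n)
y[0] = eps[0]
for t in range(1, n):
    y[t] = y[t - 1] + theta1 * eps[t - 1] + eps[t]

yhat = y - eps                                   # predictable part: y without the current shock
print(np.allclose(yhat[1:], yhat[:-1] + (1 + theta1) * eps[:-1]))   # True
# With theta1 close to -1 the weight (1 + theta1) is small, so yhat barely moves: preservation.
```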
Fortes et al. (2004) interpreted the parameter \(\theta_1\) in an ARIMA(0,1,1) process as balancing two forces, that is:

- adaptation to external forces impacting the process; this force is dominant when \(\theta_1\) is close to 0 (i.e., when the process approximates a random walk), because the process is then moved in the direction of the random shock, which has a strong and lasting effect on the process; and
- preservation of the previous state; this force is dominant when \(\theta_1\) is close to -1, because then the current prediction \(\hat{y}_t\) will be close to the preceding prediction \(\hat{y}_{t-1}\), and random shocks are largely erased after one occasion, with only a small residue persisting.
To see how this preserving effect of a \(\theta_1\) close to -1 plays out, consider the two processes in Figure 7. Both are instances of an ARIMA(0,1,1) process with \(\theta_1=-0.9\), which implies that preservation is relatively strong here. These are non-stationary processes due to their unit root (i.e., they have \(d=1\)), which you can see from the fact that they tend to move away from each other as time continues. Yet, they may easily seem stationary in the shorter run due to their strong degree of preservation.
Marina, Didier and Grégory are interested in the degree to which momentary self-esteem and physical-self experiences are characterized by adaptation versus preservation. They obtained measures of self-esteem and five aspects of the physical self twice daily from seven participants for 228 days.
When considering the autocorrelation functions (ACFs) of the 42 time series, Marina and her colleagues noticed that many of these had significant positive autocorrelations for more than 10 lags. They believed this was most likely the result of a unit root process, and decided to use first-order differencing to make the series stationary. Subsequently, they used the ACFs to determine the order needed for the MA model, and considered the significance of the parameters to see whether the order was not too large. They concluded that most of the differenced series could be well described with an MA(1); for 5 series an MA(2) was needed, and for 2 series an MA(3) was necessary.
The \(\theta_1\) parameters (or the sums of the various \(\theta\) parameters in case of \(q>1\)) that Marina, Didier and Grégory found varied between -0.33 and -0.88; hence, neither full adaptation nor full preservation was found for any of the series. Yet, Marina and her colleagues noticed two interesting regularities. First, 5 of the 7 individuals were characterized by ARIMA(0,1,1) models for all 6 processes; the other two participants were characterized by higher-order MA components for multiple series. Second, the size of \(\theta_1\) (or the sum of the various \(\theta\) parameters in case of \(q>1\)) was relatively similar within a person across the different variables, while it differed across individuals. Marina and her colleagues interpreted these findings as suggesting that there is a certain regularity within a person, which may point to an individual differences characteristic.
This example is based on Fortes et al. (2004).
3 How to decide on the orders of an ARIMA(\(p,d,q\))
When considering an ARIMA(\(p,d,q\)) model for your data, the first step is to decide on the order \(d\), that is, the number of times you need to difference the data to obtain a stationary series. The most appropriate way to determine whether there is a unit root that requires you to difference the data is to use a stationarity test. While the autocorrelation function (ACF) may also provide some indication—that is, slowly decaying autocorrelations—this may also be indicative of other forms of non-stationarity, such as a deterministic trend. Since deterministic and stochastic trends need to be handled differently in your analysis, it is important to make sure you are dealing with a stochastic trend before you start to difference to remove the non-stationarity.
If the test indicates that the series \(y_t\) is characterized by a unit root, you difference the data and then investigate whether the differenced series \(\Delta y_t\) is stationary or whether it is characterized by a second unit root. In the latter case, you difference the differenced series, and submit the resulting \(\Delta^2 y_t\) series again to a stationarity test.
When your differenced series is stationary, you can use the ACF to investigate whether it is characterized by further dependencies over time, or whether it behaves like white noise. When there are dependencies, you can determine the orders \(p\) and \(q\) in the same way as when fitting an ARMA(\(p,q\)) model to a stationary observed series; you can read more about this in the article about ARMA models.
Instead of taking all these steps yourself, you can also make use of an automated procedure for finding the orders of an ARIMA(\(p,d,q\)) model. A procedure for this in R is described in Chapter 9 of Hyndman & Athanasopoulos (2021).
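As a rough illustration of these steps, the sketch below uses Python with the statsmodels package (assumed to be installed; this is an addition to the article, not the procedure of Hyndman & Athanasopoulos). A simulated ARIMA(1,1,1) series stands in for your own data; the ADF unit root test suggests \(d\), the ACF and PACF of the differenced series suggest \(q\) and \(p\), and the chosen ARIMA model is then fitted to the original series.

```python
import numpy as np
from statsmodels.tsa.stattools import adfuller, acf, pacf
from statsmodels.tsa.arima.model import ARIMA

# Simulate an ARIMA(1,1,1) series to stand in for observed data.
rng = np.random.default_rng(9)
n, phi1, theta1 = 300, 0.5, 0.3
eps = rng.normal(size=n)
a = np.zeros(n)
for t in range(1, n):
    a[t] = phi1 * a[t - 1] + eps[t] + theta1 * eps[t - 1]
y = np.cumsum(a)

def choose_d(x, alpha=0.05, max_d=2):
    """Difference until the ADF test no longer points to a unit root."""
    d = 0
    while d < max_d and adfuller(x)[1] > alpha:    # large p-value: unit root not rejected
        x = np.diff(x)
        d += 1
    return d, x

d, x = choose_d(y)
print("suggested d:", d)
print("ACF :", np.round(acf(x, nlags=5), 2))       # inspect for an MA-type pattern
print("PACF:", np.round(pacf(x, nlags=5), 2))      # inspect for an AR-type pattern

# Fit the chosen ARIMA(p, d, q) to the original (undifferenced) series.
print(ARIMA(y, order=(1, d, 1)).fit().summary())
```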
4 Think more about
Sometimes you need to difference a series multiple times to obtain a stationary series. For economic data—which often serve as examples in the time series literature—it is often not necessary to difference more than once or twice to obtain a stationary series. However, when you are dealing with physiological data, these are often clearly non-stationary, and differencing once or twice may not be enough. Such data may then require another approach, like accounting for the underlying trajectory with a flexible trend based on splines, or by including observed predictors that can help to capture the non-stationarity in the process. An example of the latter is when you use posture and movement to account for the non-stationarity in heart rate.
When using differencing to obtain a stationary series, the implication is that the subsequent ARMA model concerns the changes (or changes in changes, if you have differenced twice) in \(y\) over time. For instance, the autoregression in an ARIMA(1,1,0) refers to how the change in \(y\) between occasions \(t-1\) and \(t\) is predicted (or affected) by the change in \(y\) between occasions \(t-2\) and \(t-1\).
The ARIMA(\(p,d,q\)) model presented in this article can be considered a special case of the even broader family of processes known as the fractionally integrated autoregressive moving-average model, denoted as either ARFIMA or FARIMA. In the ARFIMA model, \(d\) is no longer restricted to integer values (i.e., 0, 1, 2, etc.), but it can take on any non-negative real value (i.e., \(d \ge 0\)); this is why it is referred to as fractionally integrated (Hosking, 1981). When \(0<d<0.5\), the process is stationary; when \(0.5\le d <1\), the process is non-stationary, but becomes stationary after differencing.
ARFIMA processes are long-memory processes, meaning that the effect of a random shock tends to die out very slowly, or not at all, depending on whether \(d\) is smaller than 0.5 or not. In both cases, autocorrelations remain high across many lags. This long-memory feature of ARFIMA processes has been considered of particular interest in the field of cognitive psychology, although Wagenmakers et al. (2004) have critiqued the way researchers were establishing the presence of fractional integration. Specifically, they showed that the patterns in the data could often be explained just as well by stationary processes, such as an ARMA(1,1) process. While standard stationarity tests are not designed to detect fractional integration, there are alternative diagnostic methods available; you can read more about these in Torre et al. (2007).
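To give a flavor of what fractional differencing means, the sketch below (Python with numpy) computes the weights of the filter \((1-B)^d\) using the recursion \(\pi_0 = 1,\; \pi_k = \pi_{k-1}(k-1-d)/k\), which follows from the binomial expansion used by Hosking (1981). For integer \(d\) the weights cut off after \(d\) lags, whereas for fractional \(d\) they decay only slowly, which is the source of the long memory.

```python
import numpy as np

def frac_diff_weights(d, n_weights):
    """Weights of (1 - B)^d: pi_0 = 1, pi_k = pi_{k-1} * (k - 1 - d) / k."""
    w = [1.0]
    for k in range(1, n_weights):
        w.append(w[-1] * (k - 1 - d) / k)
    return np.array(w)

print(np.round(frac_diff_weights(1.0, 6), 3))   # 1, -1, 0, 0, ...: ordinary first differencing
print(np.round(frac_diff_weights(0.4, 6), 3))   # weights shrink only slowly toward zero
```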
5 Takeaway
The ARIMA model occupies a central place in the \(N=1\) time series literature and combines a stationary ARMA component with a non-stationary integrated component. Together these components allow you to represent individual time series that are not stable around a fixed mean but instead show both moment-to-moment autocorrelation and longer-term trends or drifts. Stationary ARMA models form a subclass within the broader ARIMA family. Another well-known member of the ARIMA family is the random walk.
To determine whether you are dealing with a non-stationary member of the ARIMA family, you can make use of a stationarity test, also referred to as a unit root test. When the test indicates that there is a unit root, you difference the data once and submit the differenced data again to the test. When you have differenced enough times to obtain a stationary series, you can further investigate the short-term dynamics of this process using ARMA modeling.
Failing to account for non-stationarity in your data is likely to lead to a distorted picture of the short-term dynamics that govern the series. But additionally, it is also important to determine what kind of non-stationarity is present and whether this requires differencing or another approach; you can read more about this in the article about deterministic versus stochastic trends.
When you have to difference the data to make the series stationary, you have to keep this in mind when discussing any further results based on the differenced series. For instance, the autocorrelations of the differenced series will concern autocorrelations in the changes in \(y_t\). Similarly, an ARMA model that is used for the differenced series, concerns the changes in the observed variable \(y\). Moreover, when you need to difference twice, your interpretation must be in terms of the changes in changes. Having to difference more than twice makes it very challenging to attach substantive interpretations to the results for the subsequent analyses.
6 Further reading
We have collected various topics for you to read more about below.
- Multilevel AR models
- Dynamic structural equation modeling
- Replicated time series analysis
- Kalman filter
- State-space model
- Stationarity test
References
Citation
@article{hamaker2026,
author = {Hamaker, Ellen L. and Hoekstra, Ria H. A.},
title = {Autoregressive Integrated Moving-Average {(ARIMA)} Model},
journal = {MATILDA},
number = {2026-01-02},
date = {2026-01-02},
url = {https://matilda.fss.uu.nl/articles/arima-model.html},
langid = {en}
}