Deterministic versus stochastic trends
This article is about different kinds of trends over time that may be present in your data. You can think of a trend as a slow, gradual long-term movement in your data that results in systematic change in a certain direction over time—typically upward or downward, although the direction can also change. While trends may not be of specific interest to you, you nevertheless have to account for them in your analysis: Failing to do so may result in identifying dynamic patterns that are not reflective of the actual dynamics of the process, but rather of the long-term trend that characterizes the process.
In the N=1 time series literature two broad classes of trends are distinguished: deterministic trends versus stochastic trends (Hamilton, 1994; Ryan et al., 2025). While these trends can look similar in the short run, their long-term behavior is quite distinct. Knowing about the difference between these two categories of trends is important, because they require different strategies when analyzing the data, and using the wrong method may fail to eliminate some of the dependencies due to the trend or introduce dependencies that are actually not characteristic of the process under investigation; again, this will distort subsequent results in your analysis (Hamilton, 1994).
Below you can read more about: 1) deterministic trends, stochastic trends, and the combination of the two; 2) what happens when you use detrending or differencing to handle a deterministic or stochastic trend in your data; 3) what happens when you use detrending or differencing when you have a combination of a deterministic and stochastic trend in your data; and 4) how to determine whether you are dealing with a deterministic or a stochastic trend in your data.
1 Some examples of a deterministic or stochastic trend
There are two distinct categories of trends in the time series literature: deterministic trends and stochastic trends. A simple example of the first is a linear increasing or decreasing trend over time. Typical of a deterministic trend is that the trend is a direct function of time \(t\). In contrast, stochastic trends are the result of a unit root process. Examples of this are a random walk (with or without drift), or specific members of the extended family of autoregressive integrated moving-average (ARIMA) models.
Below, a brief description of each category is provided; for more details you can read the articles that deal specifically with these trends. Subsequently, the combination of a deterministic and stochastic trend is described.
1.1 Deterministic trends
There is a wide variety of deterministic trends that could be present in your data, the most basic of which is the linear trend over time. This can be expressed as
\[ y_t = b_0 + b_1 t + \epsilon_t, \] where \(b_0\) is the intercept, \(b_1\) is the slope representing the expected change from one occasion to the next, and \(\epsilon_t\) is the random residual at occasion \(t\) (i.e., the temporal deviation from the underlying deterministic trend). These residuals form a white noise series over time, which means all its autocorrelations are zero.
Instead of white noise residuals, you may have autocorrelated residuals. A general way to express this for a linear trend is
\[ y_t = b_0 + b_1 t + a_t, \] where—as before—\(b_0\) and \(b_1\) are the intercept and slope of the underlying linear trend. But now the temporal deviation from this trend, \(a_t\), is a member of the family of autoregressive moving-average (ARMA) models. The most common version of this is the first-order autoregressive (AR(1)) model, which implies \(a_t = \phi_1 a_{t-1} + \epsilon_t\).
Instead of a linear trend, you may consider higher-order polynomials or other trends that allow for more variation in the underlying trajectory over time. You can read more about this in the article about deterministic trends.
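As an illustration, here is a minimal pure-Python sketch that simulates a linear deterministic trend with AR(1) deviations around it; the parameter values (intercept, slope, and autoregressive parameter) are arbitrary choices for illustration, not values from this article.

```python
import random

random.seed(1)

T = 300
b0, b1, phi = 2.0, 0.5, 0.6   # hypothetical intercept, slope, and AR(1) parameter

# AR(1) deviations around the trend: a_t = phi * a_{t-1} + eps_t
a = [0.0]
for _ in range(T - 1):
    a.append(phi * a[-1] + random.gauss(0, 1))

# Observed series: deterministic linear trend plus autocorrelated residuals
y = [b0 + b1 * t + a[t] for t in range(T)]
```

Because the deviations are stationary, the average change per occasion in such a simulated series stays close to the slope \(b_1\).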
1.2 Stochastic trends
Stochastic trends are the result of a unit root process, of which the random walk is the simplest version. A random walk can be expressed as
\[ y_t = y_{t-1} + \epsilon_t,\] where \(\epsilon_t\) is a random shock that comes from a white noise process, with mean zero and variance \(\sigma_{\epsilon}^2\).
Other versions of a unit root process come from the ARIMA(\(p,d,q\)) family, where the integer \(d\) denotes the number of unit roots of such a process. This indicates the number of times you have to difference the original series \(y_t\) to obtain a stationary series. Subsequently \(p\) and \(q\) indicate the orders of the autoregressive and moving-average components in the model.
For instance, an ARIMA (\(p,1,q\)) model can be written as
\[ y_t = y_{t-1} + a_t,\]
where \(a_t\) is an ARMA(\(p,q\)) process, with innovation variance \(\sigma_{\epsilon}^2\). By differencing once, you get \(\Delta y_t = y_t - y_{t-1} = a_t\), which is the stationary ARMA process that can be further modeled using the right orders \(p\) and \(q\) to ultimately account for all the dependencies in the data.
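To see how this works numerically, the following pure-Python sketch (with an arbitrary seed and series length) simulates a random walk and shows that differencing it once recovers the white noise shocks exactly:

```python
import random

random.seed(42)

T = 500
eps = [random.gauss(0, 1) for _ in range(T)]

# Random walk: y_t = y_{t-1} + eps_t, starting from y_0 = 0
y = [0.0]
for e in eps:
    y.append(y[-1] + e)

# Differencing once recovers the white noise shocks exactly
dy = [y[t] - y[t - 1] for t in range(1, len(y))]
```

The differenced series is stationary, whereas the random walk itself is not: its variance grows with \(t\).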
1.3 Combination of a deterministic and stochastic trend
In the time series literature, there are also models that combine a deterministic trend with a stochastic trend. The most fundamental version of such a model is known as a random walk with drift, which can be expressed as
\[ y_t = \delta + y_{t-1} + \epsilon_t.\]
The constant \(\delta\) is known as the drift parameter. It is added at every occasion, but it is also retained from every past addition through the random walk component. Hence, if you can assume that the process started at \(t=0\), then drift creates a deterministic trend \(\delta t\), similar to the deterministic linear trend discussed above.
To see this, it helps to consider the expressions for \(y\) at subsequent occasions, given a particular starting point \(y_0\):
\(\;\;\;\;\; y_1 = \delta + y_0 + \epsilon_1\)
\(\;\;\;\;\; y_2 = \delta + y_1 + \epsilon_2 = 2\delta + y_0 + \epsilon_1 + \epsilon_2\)
\(\;\;\;\;\; y_3 = \delta + y_2 + \epsilon_3 = 3\delta + y_0 + \epsilon_1 + \epsilon_2 + \epsilon_3\)
\(\;\;\;\;\; y_4 = \delta + y_3 + \epsilon_4 = 4\delta + y_0 + \epsilon_1 + \epsilon_2 + \epsilon_3 + \epsilon_4\)
\(\;\;\;\;\; ...\)
\(\;\;\;\;\; y_t = \delta + y_{t-1} + \epsilon_t = \delta t + y_0 + \epsilon_1 + \epsilon_2 + \dots + \epsilon_t\)
This shows that the random walk with drift \(\delta\) can be rewritten as a deterministic trend \(\delta t\) plus a random walk (without drift). Hence, a random walk with drift combines a deterministic trend with a stochastic trend.
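You can verify this decomposition numerically. The sketch below (with arbitrary values for the drift, the starting value, and the number of occasions) generates a random walk with drift recursively and checks that it matches the closed form derived above:

```python
import random

random.seed(7)

T = 200
delta, y0 = 0.3, 1.0          # hypothetical drift and starting value
eps = [random.gauss(0, 1) for _ in range(T)]

# Recursive form: y_t = delta + y_{t-1} + eps_t
y = [y0]
for e in eps:
    y.append(delta + y[-1] + e)

# Closed form: y_t = delta * t + y_0 + sum of the shocks up to occasion t
closed = [delta * t + y0 + sum(eps[:t]) for t in range(T + 1)]
```

Both forms coincide (up to floating-point rounding), confirming that the drift accumulates into the deterministic trend \(\delta t\) while the shocks accumulate into the stochastic trend.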
Jacky is interested in the process of learning a new language. She decides to focus on vocabulary, and sets up a study in which she tries to learn ten new words every day, while keeping track of the total number of words that she knows every day, including the words she studied on previous days.
Jacky wonders what would be a reasonable model for the learning curve she expects to see in the data: Should she think of this as a deterministic growth curve with some random fluctuations around it? Or is a random walk, with or without drift, a more reasonable option to capture this process that is characterized by both retention and forgetting?
Jacky thinks that from a substantive point of view, the latter makes more sense. But she also realizes that such a process has quite distinct features, and that she should check whether these features are present in her data by using designated stationarity tests.
2 Detrending or differencing?
To handle deterministic trends and stochastic trends, two major strategies exist: detrending and differencing. It is important to use the correct strategy, as they can lead to quite different results (Hamilton, 1994; Hyndman & Athanasopoulos, 2021; Ryan et al., 2025).
In this section you can read about what happens when you use the correct strategy, that is: detrending for a deterministic trend, and differencing for a stochastic trend. Subsequently, you can read about what happens when you use the incorrect strategy, that is: differencing for a deterministic trend, and detrending for a stochastic trend.
2.1 Doing it right: Detrending a deterministic trend
Historically, the advice in the time series literature for handling deterministic trends was to detrend the data. This means that as a first step, you use regression analysis with \(t\) or functions of \(t\)—for instance, \(t\), \(t^2\) or log(\(t\))—to predict \(y_t\). Using the predictions \(\hat{y}_t\) (i.e., the fitted values), you can obtain the residuals (i.e., \(y_t - \hat{y}_t\)), which are referred to as the detrended data, as the systematic time component is removed. These detrended data are then used in a second step for further analysis.
As an example, consider a linear trend. You will fit
\[ \hat{y}_t = \hat{b}_0 + \hat{b}_1 t.\]
Subtracting this from the observations results in \(y_t - \hat{y}_t = \hat{a}_t\). The residuals \(\hat{a}_t\) can then be used for further analysis, for instance, to determine what kind of ARMA structure is present in them. You can use a similar approach for other deterministic trends, such as a quadratic trend or higher-order polynomials, or the log, exponential, or square root of time.
Nowadays it is probably more common to model the deterministic trend and the autocorrelation structure in a single step, rather than in two steps. Such joint analyses are possible with techniques like the Kalman filter based on the state-space model. Strictly speaking, this approach is not based on detrending; rather, it can be described as modeling a trend-stationary process with ARMA errors or estimating a regression with ARMA residuals. You can read more on this in the article about deterministic trends.
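The two-step procedure described above can be sketched in a few lines of pure Python; the trend parameters and series length are arbitrary illustrations, and the slope and intercept are estimated with the usual closed-form least-squares formulas:

```python
import random

random.seed(3)

T = 250
b0, b1 = 10.0, -0.2           # hypothetical trend parameters
y = [b0 + b1 * t + random.gauss(0, 1) for t in range(T)]

# Step 1: estimate the linear trend by ordinary least squares
tbar = (T - 1) / 2            # mean of t = 0, ..., T-1
ybar = sum(y) / T
sxx = sum((t - tbar) ** 2 for t in range(T))
b1_hat = sum((t - tbar) * (y[t] - ybar) for t in range(T)) / sxx
b0_hat = ybar - b1_hat * tbar

# Step 2: the residuals are the detrended data used for further analysis
detrended = [y[t] - (b0_hat + b1_hat * t) for t in range(T)]
```

By construction, the detrended data sum to zero and no longer contain the systematic time component.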
2.2 Doing it right: Differencing a stochastic trend
To handle a stochastic trend resulting from a unit root, the typical approach is to difference the data. This is based on subtracting the previous observed score from the current observed score, to get \(\Delta y_t = y_t - y_{t-1}\). The goal of differencing is to get a series that is stationary, which means that its mean and variance (and other characteristics like skewness and kurtosis) are invariant over time. Sometimes it is necessary to difference more than once to obtain a stationary series.
There are various processes that are characterized by a stochastic trend, all of which can be considered special cases of the ARIMA model. To see how differencing works, consider an ARIMA (\(p,1,q\)) process, which was presented above: \(y_t = y_{t-1} + a_t\) where \(a_t\) is an ARMA(\(p,q\)) process. If you difference the series \(y_t\) once, you get
\[ \Delta y_t = y_t - y_{t-1} = a_t. \] This shows that the differenced series is an ARMA(\(p,q\)) process, which you can further analyze to obtain the parameters of the process.
As differencing does not involve the estimation of any parameters, it does not introduce the additional estimation uncertainty that comes with detrending, where the trend parameters have to be estimated first. However, missing values in the observed variable \(y\) do pose a problem: When there is a missing observation at a particular occasion \(t\), you cannot compute the difference score \(\Delta y_t\), nor the next one, \(\Delta y_{t+1}\), as both involve \(y_t\). This implies that the modeling procedure that you use for the differenced series should be able to handle missing data.
An elegant option for this is to use a Kalman filter approach. This requires you to specify your ARMA model in state-space format. Alternatively, you can specify the entire ARIMA model for the observed series (rather than the ARMA model for the differenced series) in state-space format and use the Kalman filter to estimate its parameters. In that case, the difference scores are obtained as part of the model rather than through a pre-processing step. This one-step approach is probably preferable, as it uses all the available data.
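The missing-data problem itself is easy to illustrate. In the toy series below (hypothetical values), a single missing observation makes two consecutive difference scores incomputable:

```python
# A single missing observation removes two difference scores:
# both Delta y_t and Delta y_{t+1} involve the missing y_t.
y = [1.0, 2.5, None, 3.0, 4.2, 5.0]   # hypothetical series with y_2 missing

dy = []
for t in range(1, len(y)):
    if y[t] is None or y[t - 1] is None:
        dy.append(None)               # difference cannot be computed
    else:
        dy.append(y[t] - y[t - 1])
```

One missing observation in \(y\) thus produces two missing values in \(\Delta y\), which is why the subsequent modeling step needs to handle missing data.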
2.3 Doing it wrong: Differencing a deterministic trend
When you use differencing to handle a deterministic trend, this may lead to unwanted results. To see this, consider a model that has a linear trend with autocorrelated residuals, that is
\[ y_t = b_0 + b_1 t + a_t,\]
where \(a_t\) can be any ARMA(\(p,q\)) process.
When taking differences here, you get
\(\;\;\;\;\;\;\;\;\;\;\;\Delta y_t = y_t - y_{t-1}\)
\(\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;= (b_0 + b_1 t + a_t) - (b_0 + b_1 (t-1) + a_{t-1})\)
\(\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;= b_1 + a_t - a_{t-1}\)
\(\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;= b_1 + \Delta a_t.\)
What this shows is that when you consider the differenced series \(\Delta y_t\), this has a mean \(b_1\) (i.e., the mean is equal to the linear slope in the original series). But instead of having ARMA(\(p,q\)) residuals, its residuals are now a differenced ARMA(\(p,q\)) process, and the dynamic structure of the differenced series \(\Delta y_t\) is much more complicated than the dynamic structure of the residuals around the trend (i.e., \(a_t\)).
Because the above is still a bit abstract, it may help to look at a specific example. Suppose \(a_t\) is a first-order autoregressive (AR(1)) process, which can be expressed as \(a_t = \phi_1 a_{t-1} + \epsilon_t\). In that case the differenced series \(\Delta a_t\) equals
\(\;\;\;\;\;\;\;\;\;\;\;\Delta a_t = (\phi_1 a_{t-1} + \epsilon_t) - a_{t-1}\)
\(\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;= (\phi_1 - 1) a_{t-1} + \epsilon_t.\)
At first, you may think of this as an AR(1) process with autoregressive parameter \((\phi_1 - 1)\); however, this is incorrect, because for something to be an AR process, the outcome on the left must be the same variable as the predictor on the right (albeit at a different occasion). Here you have a change score on the left (i.e., \(\Delta a_t\)), which is predicted by a different lagged variable (i.e., \(a_{t-1}\), not \(\Delta a_{t-1}\)). Hence, the resulting expression is not a member of the ARMA family, and using ARMA modeling for the differenced series will therefore lead to a distorted picture of the underlying short-term dynamics.
Even when there are no autocorrelations in the residuals—meaning the residuals \(a_t\) are just a white noise sequence \(\epsilon_t\)—using differencing to get rid of a linear trend is not appropriate. It results in
\(\;\;\;\;\;\;\;\;\;\;\;\Delta y_t = y_t - y_{t-1}\)
\(\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;= (b_0 + b_1 t + \epsilon_t) - (b_0 + b_1 (t-1) + \epsilon_{t-1})\)
\(\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;= b_1 + \epsilon_t - \epsilon_{t-1}\)
\(\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;= b_1 + \Delta \epsilon_t.\)
The series \(\Delta \epsilon_t\) will be characterized by a non-zero autocorrelation at lag 1, as it correlates \(\epsilon_t - \epsilon_{t-1}\) with \(\epsilon_{t-1} - \epsilon_{t-2}\); this correlation will be \(\rho_1 = -0.5\). All subsequent autocorrelations will be zero. Hence, the differenced series will show the characteristics of a first-order moving-average (MA) process with an MA parameter of 1 (which makes it non-invertible; you can read more about this in the article about ARMA processes).
These examples clearly illustrate that when there is a deterministic linear trend, differencing is in general not the right strategy to tackle this, because it tends to result in temporal dependencies in the differenced data that are not representative of the temporal dependencies in the raw data. There is one exception to this, which you can read about below: when the residuals actually form a random walk.
For reasons of simplicity, the examples above only contain linear trends. A linear trend has the particular characteristic that the expected change between any two consecutive occasions is of the same size \(b_1\). Hence, when you take differences, this linear trend parameter is nicely represented by the constant (or mean) of the differenced series. Differencing series that are characterized by non-linear trends—where the size of the deterministic change itself changes over time—will give less clean results in this regard; this further complicates the interpretation of the residual part.
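The lag-1 autocorrelation of \(-0.5\) is easy to verify by simulation. The sketch below (with an arbitrary slope and series length) differences a linear trend with white noise residuals and computes the lag-1 autocorrelation of the differenced series:

```python
import random

random.seed(11)

T = 5000
b1 = 0.4                      # hypothetical slope of the linear trend
y = [b1 * t + random.gauss(0, 1) for t in range(T)]

# Differencing leaves b1 + (eps_t - eps_{t-1}): a constant plus a
# non-invertible MA(1) with lag-1 autocorrelation -0.5
dy = [y[t] - y[t - 1] for t in range(1, T)]

m = sum(dy) / len(dy)
num = sum((dy[t] - m) * (dy[t - 1] - m) for t in range(1, len(dy)))
den = sum((d - m) ** 2 for d in dy)
rho1 = num / den
```

The mean of the differenced series recovers the slope \(b_1\), while the lag-1 autocorrelation lands near \(-0.5\): the spurious MA structure introduced by differencing.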
2.4 Doing it wrong: Detrending a stochastic trend
You may now wonder what happens when you use detrending when there is actually a stochastic trend in your data resulting from a unit root process. This is harder to show algebraically, as the stochastic trend is not determined by any particular parameter; it is generated by a random sequence of shocks and is therefore inherently random.
While a deterministic trend may be helpful to locally describe such data—showing you whether there is an increase or decrease, or perhaps a decrease first which levels off and is followed by an increase—it does not capture any underlying mechanism that actually generated the data. Moreover, if you want to make forecasts for the next few time points, using a deterministic trend when there is actually a stochastic trend means your predictions will be off and will not involve the right amount of uncertainty (Hamilton, 1994; Hyndman & Athanasopoulos, 2021; Ryan et al., 2025).
3 When a linear and stochastic trend are combined
Above, you have seen an example of the combination of a deterministic and stochastic trend in the form of a random walk with drift, that is, \(y_t = \delta + y_{t-1} + \epsilon_t\). More generally, you can have an ARIMA(\(p,1,q\)) model, which can be expressed as \(y_t = \delta + y_{t-1} + a_t\), where \(a_t\) may be characterized by autocorrelations due to a stationary ARMA(\(p,q\)) process.
For such models, you can handle the deterministic trend that stems from the drift \(\delta\) either through differencing or detrending. Below, you can see how each of these strategies works for a process like this.
3.1 Differencing
You can handle the linear deterministic trend, which results from having drift in a unit root process, through differencing, that is
\(\;\;\;\;\;\;\;\;\;\;\;\Delta y_t = y_t - y_{t-1}\)
\(\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;=(\delta + y_{t-1} + a_t) - y_{t-1}\)
\(\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;= \delta + a_t.\)
This shows that when the differenced series has a non-zero mean, this indicates that there is drift in the original series. The remaining fluctuations in the differenced series form a stationary process that can contain further autocorrelations due to an ARMA structure.
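A quick simulation illustrates this. The sketch below (with an arbitrary drift and AR(1) parameter) generates a unit root process with drift and autocorrelated increments; differencing yields a series whose mean estimates the drift, while the remaining fluctuations are the stationary AR(1) part:

```python
import random

random.seed(5)

T = 5000
delta, phi = 0.7, 0.5         # hypothetical drift and AR(1) parameter

# y_t = delta + y_{t-1} + a_t, where a_t = phi * a_{t-1} + eps_t
a = [0.0]
y = [0.0]
for _ in range(T):
    a.append(phi * a[-1] + random.gauss(0, 1))
    y.append(delta + y[-1] + a[-1])

# Differencing leaves delta + a_t: the mean of the differenced
# series estimates the drift delta
dy = [y[t] - y[t - 1] for t in range(1, len(y))]
mean_dy = sum(dy) / len(dy)
```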
3.2 Detrending
Another way to handle the linear deterministic trend in a random walk with drift is through detrending the data using the linear trend \(\delta t\). If you subtract this trend from both sides, you get
\(\;\;\;\;\;\;\;\;\;\;\;y_t - \delta t = \delta + y_{t-1} + a_t - \delta t\)
\(\;\;\;\;\;\;\;\;\;\;\;\;\;\;= \delta - \delta + (y_{t-1} - \delta (t-1)) + a_t.\)
If you replace \(x_t = y_t - \delta t\) and \(x_{t-1} = y_{t-1} - \delta (t-1)\), this expression becomes
\(\;\;\;\;\;\;\;\;\;\;\;x_t = x_{t-1} + a_t,\)
which you can recognize as a unit root process (or an ARIMA(\(p,1,q\)) model without drift). Hence, using linear detrending gets rid of the drift in a unit root process with drift, which forms the deterministic trend in \(y_t\). Note, however, that to get rid of the stochastic trend, you still need to difference the detrended data, that is
\(\;\;\;\;\;\;\;\;\;\;\Delta x_t = x_t - x_{t-1} = a_t.\)
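The detrend-then-difference route can also be checked numerically. In the sketch below (all values are arbitrary illustrations), subtracting the known trend \(\delta t\) leaves a random walk, and differencing that random walk recovers the stationary shocks:

```python
import random

random.seed(9)

T = 400
delta = 0.25                  # hypothetical drift parameter
a = [random.gauss(0, 1) for _ in range(T)]

# Random walk with drift, started at y_0 = 0
y = [0.0]
for shock in a:
    y.append(delta + y[-1] + shock)

# Detrending with delta * t removes the drift but leaves a random walk,
# so you still need to difference the detrended series to recover a_t
x = [y[t] - delta * t for t in range(len(y))]
dx = [x[t] - x[t - 1] for t in range(1, len(x))]
```

In practice \(\delta\) would of course have to be estimated rather than known, which is one reason the single differencing step discussed below is often more convenient.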
3.3 To conclude
In general, you should use differencing to handle stochastic trends, and detrending (or modeling the trend) to handle deterministic trends. When you have a process in which both a stochastic trend and a linear deterministic trend are present, you can handle the linear deterministic trend either through detrending or through differencing the data.
When you use detrending, you still need to difference the detrended data to get rid of the stochastic trend and obtain the stationary process \(a_t\) that can be used to investigate short-term dynamics. Hence, you might as well difference the original data in this case, as it gets rid of both the deterministic and stochastic trends in one action.
4 How to detect a deterministic versus a stochastic trend
Both a deterministic and a stochastic trend result in an autocorrelation function (ACF) that is characterized by large and slowly decaying autocorrelations. Differencing both kinds of processes will often—although not always—lead to a series with an ACF that no longer shows this feature. This implies that you cannot use the ACF of the original data and the differenced data to distinguish between these two kinds of trends.
Instead, you can make use of specific tests that have been developed to determine whether there is a stochastic trend, or whether there is a stable mean or linear trend with stationary residuals around it; these are known as stationarity tests.
When using these, it is important to keep in mind that each of these tests only targets a specific form of (non-)stationarity, and they do not provide general answers. For instance, when you do a test and establish that the series do not seem to be characterized by a unit root, this does not imply you can say the series are stationary; there may be other forms of non-stationarity present in the data, such as a deterministic trend or a repetitive pattern like a sine or block wave. Moreover, the variability and/or dynamics may change over time, which would also imply the series are non-stationary.
5 Think more about
While the procedures for handling trends in your data are quite well established in the time series literature, it is important to think about the consequences of your actions when interpreting the results. First, whenever you are using a deterministic trend in your analyses, you should keep in mind that—in principle—these trends are just local descriptions: While they may capture specific developmental trajectories that are present for the time that the process was observed, you should be (extremely) cautious in trying to extrapolate this trend beyond the duration of the study, especially if you are trying to make forecasts further into the future.
Second, when you are dealing with physiological data like heart rate or skin conductance, it may be difficult to handle their non-stationarity effectively using the techniques described above. If you use differencing, you may need to difference many times before the data are stationary. This is likely to result in a variable that is hard to interpret from a substantive point of view. Such data may therefore benefit from another approach, like detrending with a high-order polynomial, accounting for the underlying trajectory through splines, or including predictors like physical posture and movement. Sometimes it may be helpful to aggregate the data over time, to make it less granular and thereby removing some of the temporal dependence that is present when data are obtained very frequently.
Third, whatever method you use—whether detrending, differencing, including observed predictors, or time aggregating—it is important to acknowledge this in the interpretation of further results. For instance, when using detrending, you have to realize that your subsequent analyses are concerned with autocorrelation in the temporal deviations from the modeled deterministic trend. Similarly, when analyzing differenced data, you are dealing with autocorrelation in the changes from one occasion to the next (or changes in the changes, when the data were differenced twice). This may make it difficult to arrive at meaningful interpretations from a substantive point of view, and it may be challenging to present the results in an intuitively appealing way.
Fourth, these challenges become even more relevant when you move to the multivariate case, and your interest is specifically in the relatedness between two or more processes over time. It makes a difference whether the temporal deviation from a linear trend in one variable depends on the temporal deviation from the trend in another variable at the preceding time point, or whether the change in one variable between two occasions depends on the preceding change in another variable. For purely predictive purposes, this may be less of an issue, but when your goal is to describe the process or to engage in causal inference, this certainly is something to be careful about.
Finally, in addition to deterministic and stochastic trends, you may also want to consider other deterministic patterns that may be present in the data, most notably, cyclic patterns. These present as systematic, predictable changes that repeat themselves around a baseline. Examples are the circadian rhythm, weekly patterns, monthly cycles, and annual periodicity. While these may also be accounted for using deterministic functions of time, they are typically not referred to as trends but rather as cycles or repeating time structures.
6 Takeaway
Trends are gradual changes that in the long run result in a systematic movement, typically either up or down, although the direction may change along the way. Trends can be either deterministic or stochastic. To handle these trends when analyzing your data, different strategies should be used:
Deterministic trends should be handled by detrending the data or by including the deterministic trend in your model;
Stochastic trends should be handled by differencing the data.
When you use the incorrect method (i.e., differencing for a deterministic trend, or detrending for a stochastic trend), this will typically result in a failure to adequately account for the dependencies that stem from the trend. As a result, the analyses that you use to study the dynamics of the process after the trend has been accounted for will be contaminated by dependencies that are partly left over, or even newly introduced, by using the wrong technique.
7 Further reading
We have collected various topics for you to read more about below.
- Threshold autoregressive (TAR) models
- Time-varying autoregressive (TV-AR) models
- Markov-switching autoregressive (MS-AR) models
- Multilevel AR models
- Dynamic structural equation modeling
- Replicated time series analysis
- Kalman filter
- State-space model
- Stationarity test
References
Citation
@article{hamaker2026,
author = {Hamaker, Ellen L. and Hoekstra, Ria H. A.},
title = {Deterministic Versus Stochastic Trends},
journal = {MATILDA},
number = {2026-01-02},
date = {2026-01-02},
url = {https://matilda.fss.uu.nl/articles/deterministic-vs-stochastic-trend.html},
langid = {en}
}