Description, prediction, causation
This article describes how research questions can be categorized as pertaining to one of three different scientific goals: descriptive research, predictive research, or causal research. Having a clear understanding of your scientific goal is important because it shapes the subsequent steps in your research process, such as decisions regarding study design, choice of variables, measurement, and statistical analyses (Shmueli, 2010). In this article, we also relate the three scientific goals to important methodological considerations in intensive longitudinal data (ILD) research.
Note that no scientific goal is more important or better than the others, and there is no prescribed scientific goal for any particular application. Rather, this article describes what these scientific goals are, what they imply for practical application, and which major methodological concerns are associated with each goal. Ultimately, it is up to you to decide which goal is of interest for your research project.
In this article, you will find an explanation of the three scientific goals: 1) description, 2) prediction, and 3) causation.
1 Description
Generally speaking, descriptive research involves summarizing the data that you have in a manner that you find insightful for a particular purpose. The summary could be a quantitative summary of a single variable, such as the mean stress level that a participant experiences across a day and the variability in their stress level across this period, or a quantitative summary concerning multiple variables, such as the correlation between stress and another variable or a partial correlation (Hernán et al., 2019). Furthermore, you can compute such quantitative summaries for multiple individuals and then inspect or model individual differences in, for example, means, variances, or autoregression. Descriptive research can also involve visual summaries, such as the various network visualizations that represent associations between (a large number of) variables (Stadel et al., 2024).
Characteristically, in descriptive research, researchers do not aim to make claims about the origin of particular relationships (i.e., why certain variables are related). Descriptive research is also not well suited for prediction purposes such as screening, selecting, or identifying individuals and developmental trajectories, or forecasting future events (Hamaker et al., 2020).
Clara wants to know the daily average stress levels of multiple individuals. She wants to use this information to help individuals gain more insight into what their stress levels look like throughout the week (Leertouwer et al., 2022). Therefore, she collects ecological momentary assessment (EMA) data in which individuals report their stress levels multiple times per day, and then computes the average stress level per individual, per day. Based on the personalized feedback Clara provides, individuals can compare their measured daily stress levels with how they retrospectively remember them. As a follow-up study, Clara considers computing other descriptive summaries of the EMA data, such as the variability in stress across a day and the day-to-day autocorrelation, which represents the stability of stress levels.
Simon aims to study whether feelings of positive affect are related to the context that a person is in, such as being engaged in “sport” or “hobby”, or being in the presence of a “friend” or “partner” (Heininga et al., 2019). This information is useful for gaining insight into the contexts in which particular participants are most happy, stressed, and so on. Research methods like the experience sampling method or ambulatory assessment are well suited to study this question, as they provide measurements of time-varying constructs in real-life contexts. Simon can use the ILD from these methods to quantify the relation between positive affect and different contexts using a point-biserial correlation and regression analysis, or, alternatively, visualize measured levels of positive affect per context using plots like radar charts.
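The summaries Clara computes can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: the records, person identifiers, and rating values are invented for illustration, each record is assumed to be a (person, day, stress) triple, and days are assumed to be consecutive with none missing. It computes each person's mean stress and within-day standard deviation per day, plus the lag-1 autocorrelation of the daily means as one way to quantify day-to-day stability.

```python
from collections import defaultdict
from statistics import mean, stdev

# Hypothetical EMA records: (person_id, day, stress rating).
# All values are invented for illustration only.
records = [
    ("p1", 1, 40), ("p1", 1, 50), ("p1", 1, 60),
    ("p1", 2, 60), ("p1", 2, 70), ("p1", 2, 80),
    ("p1", 3, 35), ("p1", 3, 40), ("p1", 3, 45),
    ("p1", 4, 55), ("p1", 4, 60), ("p1", 4, 65),
    ("p2", 1, 20), ("p2", 1, 30),
    ("p2", 2, 25), ("p2", 2, 35),
    ("p2", 3, 30), ("p2", 3, 40),
    ("p2", 4, 35), ("p2", 4, 45),
]


def daily_summaries(records):
    """Mean and within-day SD of stress for every (person, day) pair."""
    by_day = defaultdict(list)
    for person, day, stress in records:
        by_day[(person, day)].append(stress)
    summaries = {}
    for key, values in by_day.items():
        summaries[key] = {
            "mean": mean(values),
            # A sample SD needs at least two reports on that day.
            "sd": stdev(values) if len(values) > 1 else None,
        }
    return summaries


def lag1_autocorrelation(values):
    """Lag-1 autocorrelation of an ordered sequence (here: daily means)."""
    m = mean(values)
    num = sum((values[t] - m) * (values[t - 1] - m) for t in range(1, len(values)))
    den = sum((v - m) ** 2 for v in values)
    return num / den


def day_to_day_autocorrelation(summaries, person):
    """Autocorrelation of one person's daily mean stress (assumes no missing days)."""
    days = sorted(day for (p, day) in summaries if p == person)
    daily_means = [summaries[(person, day)]["mean"] for day in days]
    return lag1_autocorrelation(daily_means)


summaries = daily_summaries(records)
for person in ("p1", "p2"):
    print(person, day_to_day_autocorrelation(summaries, person))
```

With only four days per person, as in this toy example, such an autocorrelation is extremely imprecise; the code only shows what is being computed.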
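As a hypothetical sketch of Simon's analysis for a single participant, the code below computes a point-biserial correlation (a Pearson correlation in which one variable is a 0/1 indicator) between positive affect and whether a friend was present, and fits a simple regression with that indicator as the predictor. The data and the `with_friend` coding are invented, and the sketch does not address how to combine several participants.

```python
from statistics import mean, pstdev

# Hypothetical beeps for one participant: positive affect rating and whether
# a friend was present at that beep. Values are invented for illustration.
positive_affect = [5, 6, 4, 7, 3, 4, 6, 2]
with_friend = [True, True, False, True, False, False, True, False]


def point_biserial(scores, indicator):
    """Pearson correlation between a continuous score and a 0/1 context indicator."""
    x = [1.0 if flag else 0.0 for flag in indicator]
    mx, my = mean(x), mean(scores)
    cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, scores)) / len(x)
    return cov / (pstdev(x) * pstdev(scores))


def dummy_regression(scores, indicator):
    """Least-squares fit of score = intercept + slope * indicator (indicator coded 0/1)."""
    x = [1.0 if flag else 0.0 for flag in indicator]
    mx, my = mean(x), mean(scores)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, scores))
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = my - slope * mx
    return intercept, slope


r = point_biserial(positive_affect, with_friend)
intercept, slope = dummy_regression(positive_affect, with_friend)
# With a 0/1 predictor, the intercept is the mean positive affect without a friend
# present, and the slope is the difference in means between the two contexts.
print(f"point-biserial r = {r:.2f}, intercept = {intercept:.2f}, slope = {slope:.2f}")
```

The correlation and the regression slope always share the same sign; the regression additionally expresses the association in the original units of positive affect.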
2 Prediction
At a general level, predictive research is concerned with estimating new outcomes based on previous and current information (Shmueli, 2010). In the context of ILD, the new outcomes are typically future values of a particular variable for a particular person, and hence this goal is also referred to as forecasting (Hyndman & Athanasopoulos, 2021). The previous and current information are the ILD that you have observed up to a specific point in time.
There are many different applications for predictive research, and the characteristics of these applications determine to a large degree how one can best go about predictive modeling. For example, one can perform predictive research for the purpose of relapse prevention. In this case, a researcher is interested in monitoring patients with psychopathology and then using a statistical model to predict, based on current and past patient measurements, how likely it is for a relapse to occur in the next day or week. That is, a researcher aims to continuously make relatively short-term within-person predictions. If instead a researcher is interested in selecting patients who are likely to respond well to a particular treatment program, prediction happens at the between-person level and might instead focus on longer-term rather than short-term outcomes.
Regardless of the exact application, the goal here is to make predictions that are as accurate as possible, not to explain why particular outcomes occur. An important methodological consideration in this regard is out-of-sample prediction performance. While statistical prediction models are fitted to observed data (the sample), the purpose of prediction is to make predictions about future observations that are not in your sample. The concern is that a model fitted to the observed data may be (too) well optimized for predicting those data, yet fail to produce accurate predictions for new, future observations. Therefore, when the goal is prediction, researchers should take this issue into account in their statistical analysis strategy, using tools such as cross-validation (Bulteel et al., 2018; James et al., 2017).
Sam is a psychologist who wants to build a tool that helps clinicians detect symptom shifts, such as a depressive episode, in patients ahead of time. That is, they want to find early-warning signals: particular patient variables or patterns in the data that predict how likely a symptom shift is to occur in the next two or three days. Sam uses four months of ecological momentary assessments (with five measurements per day) to investigate whether sudden increases or decreases in self-esteem and anxiety levels, but also variables like the weather and time of year, predict future changes in depressive symptomatology in depressed patients.
However, Sam is not limited to the absolute levels of predictors to predict a future outcome. With ILD, researchers can also use the dynamics of a process as a basis for prediction. For example, according to the phenomenon of “critical slowing down”, an increase in inertia (typically captured with an autoregressive parameter) precedes psychopathological episodes (Helmich et al., 2024). To use this theory for prediction, Sam would thus first have to model the dynamics of a process, and then use this information to predict the future probability of a symptom shift occurring.
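The difference between in-sample and out-of-sample performance can be sketched for a simple forecasting model. This is a minimal sketch under assumptions not taken from the article: the data are simulated from a first-order autoregressive (AR(1)) process, the model is fitted by ordinary least squares, and, because ILD are ordered in time, out-of-sample performance is estimated with rolling-origin evaluation (sometimes called time-series cross-validation) rather than by randomly assigning observations to folds. This is one of several possible schemes and is not necessarily the procedure used in the cited works.

```python
import random


def fit_ar1(series):
    """Least-squares estimates (c, phi) for the AR(1) model y_t = c + phi * y_{t-1} + e_t."""
    x, y = series[:-1], series[1:]
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    phi = sxy / sxx
    return my - phi * mx, phi


def in_sample_mse(series):
    """Mean squared one-step error when the model is scored on the data it was fitted to."""
    c, phi = fit_ar1(series)
    errors = [(series[t] - (c + phi * series[t - 1])) ** 2 for t in range(1, len(series))]
    return sum(errors) / len(errors)


def rolling_origin_mse(series, min_train):
    """Mean squared one-step-ahead forecast error, refitting at each forecast origin.

    Each forecast of series[origin] uses a model fitted only to series[:origin],
    i.e., only to observations made before the value being predicted.
    """
    errors = []
    for origin in range(min_train, len(series)):
        c, phi = fit_ar1(series[:origin])
        forecast = c + phi * series[origin - 1]
        errors.append((series[origin] - forecast) ** 2)
    return sum(errors) / len(errors)


# Hypothetical symptom scores for one person, simulated from an AR(1) process.
rng = random.Random(42)
scores = [0.0]
for _ in range(299):
    scores.append(0.6 * scores[-1] + rng.gauss(0, 1))

print("in-sample MSE:     ", round(in_sample_mse(scores), 3))
print("out-of-sample MSE: ", round(rolling_origin_mse(scores, min_train=50), 3))
```

The in-sample figure scores the model on data it has already seen, whereas the rolling-origin figure only ever scores forecasts of values the model was not fitted to, which is the situation a forecasting tool faces in practice.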
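The first step Sam would need, tracking how inertia changes over time, can be sketched as follows. The choices here are illustrative assumptions rather than a method prescribed by the article: inertia is estimated as the lag-1 autocorrelation within moving windows of 100 observations (an arbitrary width), applied to a simulated series whose true autoregressive parameter increases over time. Turning the resulting trend into a predicted probability of a symptom shift is a separate modeling step that is not shown.

```python
import random
from statistics import mean


def lag1_autocorrelation(window):
    """Lag-1 autocorrelation, used here as a simple proxy for inertia."""
    m = mean(window)
    num = sum((window[t] - m) * (window[t - 1] - m) for t in range(1, len(window)))
    den = sum((v - m) ** 2 for v in window)
    return num / den


def rolling_inertia(series, width):
    """Lag-1 autocorrelation in each moving window of `width` consecutive observations."""
    return [lag1_autocorrelation(series[s:s + width]) for s in range(len(series) - width + 1)]


# Hypothetical series whose true autoregressive parameter rises from 0.1 to 0.9,
# mimicking the increase in inertia that critical slowing down refers to.
rng = random.Random(7)
n = 600
series = [0.0]
for t in range(1, n):
    phi_t = 0.1 + 0.8 * t / (n - 1)
    series.append(phi_t * series[-1] + rng.gauss(0, 1))

inertia = rolling_inertia(series, width=100)
print("early windows:", round(mean(inertia[:50]), 2))
print("late windows: ", round(mean(inertia[-50:]), 2))
```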
3 Causation
Generally, causal research is research that answers a question of the form: “What will happen to an outcome if an individual scores high on a particular variable \(X\) versus if they score low on \(X\)?” For example, “How will someone’s happiness change if they do sports once per week versus four times per week?” or “What will someone’s feelings of affection for their romantic partner be if they experience struggles at work, compared to a world in which they experience no struggles at work?”
The gold standard for studying causal questions is the randomized controlled trial (RCT). Because individuals are randomly assigned to treatment arms (i.e., levels of the treatment \(X\)) before the treatment is given, average differences between groups after the treatment can, apart from chance, only be explained by the treatment itself, which allows you to answer a causal research question. However, there are concerns about the ecological validity of RCTs, which has led to the development of various experimental designs within ILD research, such as micro-randomized trials (MRTs; Neubauer et al., 2025) and single case experimental designs (SCEDs; Epstein & Dallery, 2022). These designs combine the strength of obtaining longitudinal measurements of constructs in real-life contexts with the advantages of random assignment. While randomization in ILD is a major advantage from a causal inference perspective, sometimes randomization is not possible due to practical or ethical constraints. Hence, there is also a large body of methodological literature that provides tools for performing causal research with nonexperimental ILD.
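Why random assignment isolates the effect of the treatment can be illustrated with a small simulation. This is a hypothetical between-person toy example (not an MRT or SCED): the effect size of 2.0, the unmeasured characteristic `u`, and the self-selection rule are all assumptions made for illustration. The same difference in group means is computed once when treatment is assigned by a coin flip and once when individuals with higher `u` are more likely to select the treatment themselves.

```python
import random
from statistics import mean

TRUE_EFFECT = 2.0  # hypothetical effect of the treatment on the outcome


def estimated_effect(n, randomized, seed):
    """Difference in mean outcome between treated and untreated individuals."""
    rng = random.Random(seed)
    treated, untreated = [], []
    for _ in range(n):
        u = rng.gauss(0, 1)  # individual characteristic that also affects the outcome
        if randomized:
            t = rng.random() < 0.5  # coin flip: independent of u
        else:
            t = u + rng.gauss(0, 1) > 0  # self-selection: high-u people opt in more often
        y = TRUE_EFFECT * int(t) + 3.0 * u + rng.gauss(0, 1)
        (treated if t else untreated).append(y)
    return mean(treated) - mean(untreated)


print("randomized:   ", round(estimated_effect(20000, True, seed=1), 2))
print("self-selected:", round(estimated_effect(20000, False, seed=1), 2))
```

Under random assignment the difference in means lands close to the true effect, whereas under self-selection it also absorbs the difference in `u` between the groups and overstates the effect.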
There are different perspectives on the purposes of causal research, and different analysis methods associated with these. A currently popular framework is the potential outcomes framework. It frames the overarching purpose of causal research as informing the development of interventions to improve people’s lives and decision-making surrounding these interventions. For example, if your goal is to aid clinicians in the field to make decisions about whether or not to implement a particular intervention, this potential outcomes approach to causal research might suffice.
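The question form introduced at the start of this section can be written in the notation commonly used within the potential outcomes framework (the symbols below are standard in that literature but are not used elsewhere in this article). Here, \(Y_i(x)\) denotes the outcome individual \(i\) would have if \(X\) were set to the value \(x\), \(\tau_i\) is the individual-level causal effect, and ACE is the average causal effect in a population.

```latex
% Y_i(x): outcome individual i would have if X were set to the value x
\tau_i = Y_i(x_{\text{high}}) - Y_i(x_{\text{low}})
\qquad
\text{ACE} = \mathbb{E}\left[Y(x_{\text{high}})\right] - \mathbb{E}\left[Y(x_{\text{low}})\right]
```

For a given individual at a given time, only one of the two potential outcomes can be observed, which is why designs such as RCTs target the average effect: under random assignment, the difference in observed group means is an unbiased estimate of the ACE.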
In contrast, there is the dynamic modeling perspective on causation (also referred to as “explanation”, the “mechanistic view” on causal inference, or “understanding”). Here, the exact underlying causal mechanism through which an exposure affects an outcome is of primary interest. Concretely, this perspective implies a strong interest in important mediators through which the presumed cause impacts the outcome, as well as in biological or even biochemical processes that take place in a particular unit over time. If, for example, the goal of your study is to improve understanding of the mechanism through which an exposure affects an outcome (i.e., why does \(X\) help to improve \(Y\)?), a dynamic modeling perspective on causal research might be more fruitful.
Regardless of these different research designs and different perspectives on causal research, a detailed and precise description of which causal effect is of interest is critical (Hernán, 2018). This includes, amongst other things, specification of (a) the timescale at which the causal effect of interest plays out; (b) the outcome, including how and when it is measured; and (c) the target population, describing which specific group of individuals you attempt to make inferences about and when they become eligible for study. Without such details, the specific causal effect one is interested in studying is ambiguous. This subsequently complicates the interpretation of results and the evaluation of whether the results actually answer the causal question.
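Interest in mediators can be made concrete with a deliberately simplified toy sketch: a hypothetical within-person process in which \(X\) at one occasion affects a mediator \(M\) at the next occasion, which in turn affects \(Y\) at the occasion after that. The coefficients, the lag structure, and the use of the product-of-coefficients idea (multiplying the \(X \to M\) and \(M \to Y\) paths to quantify the indirect effect) are assumptions for illustration; this is not the article's method, and mechanisms of interest in the dynamic modeling perspective are usually more complex than a two-step chain.

```python
import random


def ols_slope(x, y):
    """Least-squares slope of y regressed on x (with an intercept)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx


def simulate_chain(n, a, b, seed):
    """Hypothetical process: X at t-1 drives M at t, and M at t-1 drives Y at t."""
    rng = random.Random(seed)
    x = [rng.gauss(0, 1) for _ in range(n)]
    m = [0.0] * n
    y = [0.0] * n
    for t in range(1, n):
        m[t] = a * x[t - 1] + rng.gauss(0, 1)
        y[t] = b * m[t - 1] + rng.gauss(0, 1)
    return x, m, y


x, m, y = simulate_chain(5000, a=0.5, b=0.6, seed=3)
a_hat = ols_slope(x[:-1], m[1:])  # path from X at t-1 to M at t
b_hat = ols_slope(m[:-1], y[1:])  # path from M at t-1 to Y at t
# Product-of-coefficients estimate of how X at t-2 affects Y at t through M at t-1.
print(round(a_hat, 2), round(b_hat, 2), round(a_hat * b_hat, 2))
```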
4 Think more about
The distinction between descriptive, predictive, and causal research is important, as it influences the practical purposes for which the results of a study can be used (e.g., forecasting, selecting individuals, or informing interventions/treatments). Nonetheless, in practice, a study’s scientific goal can be ambiguous (Grosz et al., 2020). For example, in the social sciences, students are commonly taught to avoid explicit causal language for studies without random assignment. Instead, they are encouraged to use noncausal terms to describe relationships of interest. One then commonly finds implicit causal terms in the hypotheses, discussion, and conclusion sections of a paper, such as “\(X\) protects against \(Y\)”, “\(X\) leads to \(Y\)”, or “the reciprocal effects between \(X\) and \(Y\)”.
However, your study design does not determine your scientific goal, and ambiguity about your research goal leads to confusion among readers about what your study’s results can be used for. Hernán (2018) therefore recommends explicitly stating your scientific goal in the title, the introduction, the method section, and the discussion section when describing the interpretation of the results and their limitations. The results section simply presents the results without giving an (in-depth) interpretation. Being explicit about your scientific goal, whether it is description, prediction, or causation, helps others correctly interpret your study results and allows other researchers to build on your work.
5 Takeaway
Setting a clear scientific goal for your study is critical. For you as a researcher, the scientific goal affects the appropriateness of nearly every aspect of a study, including research design, choice of variables, measurement, and statistical analyses. For you as a consumer of science, the scientific goal affects how you can interpret study results and for what practical purposes you can use a particular study. No research goal is inherently better than the others.
6 Further reading
We have collected various topics for you to read more about.
- Intra-individual versus inter-individual variation
- The ‘within/between’ problem
- Ergodicity
- Common data types: Cross-sectional, time series, panel, and intensive longitudinal data
Acknowledgments
Jeroen D. Mulder is supported by Stress in Action (SiA). SiA is funded through the Gravitation Program of the Dutch Research Council and the Dutch Ministry of Education, Culture and Science (NWO grant number 024.005.010).
References
Citation
@article{mulder2025,
author = {Mulder, Jeroen D.},
title = {Description, Prediction, Causation},
journal = {MATILDA},
number = {2025-05-23},
date = {2025-05-23},
url = {https://matilda.fss.uu.nl/articles/description-prediction-causation.html},
langid = {en}
}