Single item or multiple item measures

Noémi K. Schuurman


Methodology & Statistics Department, Utrecht University

Published

2025-05-23

This article has not been peer-reviewed yet and may be subject to change.


This article considers several theoretical implications of deciding whether to use a single item or multiple items to measure a construct in your intensive longitudinal data (ILD) study.

When a single item is used, the response to that item is used directly in the further analyses and results of your study. Common examples of constructs that are often measured with a single item are ‘age’ and ‘current activity’.
When multiple items are used to measure a construct, the items are aggregated into a single measure for use in the analyses and results: for example, by computing a sum score or mean score, or by using a statistical measurement model (e.g., a latent variable model such as a factor model or an item response theory model; see Bollen, 2002). Common examples of constructs that are often measured with multiple items are ‘math ability’, ‘depression’, and ‘positive affect’.
It is also possible that both single item and multiple item measures are used in a study.
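As a minimal sketch of the two simplest aggregation options mentioned above, the following Python snippet turns each occasion’s item responses into one sum score or one mean score. The function name `aggregate` and the response values are hypothetical and serve only as an illustration; a latent variable model would replace this step with a fitted measurement model.

```python
from statistics import mean

def aggregate(occasions, method="mean"):
    """Collapse each occasion's item responses into one score (sum or mean)."""
    if method == "sum":
        return [sum(items) for items in occasions]
    if method == "mean":
        return [mean(items) for items in occasions]
    raise ValueError(f"unknown method: {method!r}")

# Hypothetical responses to three items (1-7 scale) at three occasions.
occasions = [[5, 6, 4], [2, 3, 1], [7, 6, 7]]
print(aggregate(occasions, method="sum"))   # [15, 6, 20]
print(aggregate(occasions, method="mean"))
```

Both scores weight all items equally, which is itself an assumption about how the items relate to the construct.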

This choice is important because it implies certain assumptions about the mechanism that links the construct you want to measure to the data you observe. These assumptions in turn may affect your analyses, results, and conclusions, as well as how you can validate your measurements.

This article covers 1) using single item measures and 2) using multiple item measures. In part 2, we also discuss typical reasons for using multiple item measures, and how these reasons may conflict with each other.

1 Measuring your construct with single items

Psychology has a tradition of relying on multiple item measures for constructs that cannot be directly observed or derived (e.g., positive or negative affect). Single items are mainly used for constructs that are more straightforward to observe or derive directly (e.g., age, occupation, what someone is currently doing, or who someone is with).

In ILD studies, researchers more often use single item measures, even for the former type of construct. Often this is for important practical reasons, such as keeping participant burden low. However, theoretical considerations can and should also play a role in this decision: for example, a single item may be deemed sufficient, or the most direct or clearest way of measuring the construct of interest. Alternatively, typical latent variable theory (see Borsboom et al., 2003; Borsboom, 2008), which usually underlies multiple item measures, may be deemed unsuitable for a construct.

Example: Social support 1

Kim wants to get a good impression of someone’s daily experienced received social support. They want to a) stay close to the participants’ own impression and interpretation of the social support they experienced, and b) keep the participants’ burden low.

Kim decides to use a single item asking “To what extent did you feel supported by others today?”, rather than asking multiple questions. Kim will use the responses to this item in further analyses of the construct ‘experienced social support’.

Using a single item need not be problematic, provided it measures the construct of interest well enough for the research goals. As with multiple item measures, it is essential to find ways to evaluate whether this is the case.

2 Measuring your construct with multiple items

There are many reasons why a researcher may choose to measure a construct with multiple items. In our experience, common reasons are:

content validity (measuring all relevant aspects of a construct);
reliability (accounting for random measurement error);
mirroring inter-individual difference questionnaires (using an adapted version of a gold-standard questionnaire from inter-individual difference studies, e.g., cross-sectional studies).

It is important to realize that each of these reasons has its own theoretical implications, and that these implications may differ and may not always be reconcilable with each other. We discuss the three reasons in turn below.

2.1 Multiple items for reasons of content validity

When content validity is your main reason for using multiple items, you use different items to cover the different relevant aspects of the construct of interest. You then aggregate the responses to these items in some way to obtain an overall measure of the construct, and use that aggregate in further analyses.

Example: Social support 2

Max wants to get a good impression of someone’s daily experienced received social support. To get an impression of all relevant aspects of social support for their study, Max breaks down the construct into different aspects, measured with different items.

Rather than using a single item asking “How much social support did you receive today?”, they ask multiple questions, such as “Did you receive help with your job today?”, “Did you feel you could reach out for emotional support from your friends/family/colleagues today?”, and “Did you rely on financial support today?”. In this way, the items cover various distinct aspects of receiving social support.

To get an overall measure of experienced social support for use in their analyses, Max will aggregate the responses to these items, using either a mean score or a more complex latent variable model.

If you consider using multiple items to improve content validity, carefully consider the theoretical implications of how you should aggregate the multiple items into a single measure to be used in your analyses.

2.2 Multiple items for reasons of reliability

Another common reason for using multiple items is reliability: having multiple items that measure the same construct makes it possible to use internal consistency approaches to reliability. The general idea behind these techniques is that all items measure the same underlying construct. The variance shared among the items is the result of variation in the true construct, while the unique (non-shared) variance of each item is considered measurement error.

Multiple items that measure the same construct, combined with internal consistency approaches, serve two purposes. First, they can be used to estimate how reliably (a set of) items measures the construct of interest, for example as part of validating a questionnaire. Second, they can be used to account for measurement error in the construct during an analysis, so that measurement error does not bias the results.

Example: Social support 3

Ume has developed a short experience-sampling questionnaire where three items are used to measure daily experienced received social support. The items are “I felt supported by others today…”, “I felt grateful to others today…” and “Other people had my back today…”. The shared variance in the items over time is considered variance in the ‘true’ received social support, and the unique variance in the items over time is considered measurement error.

Well-known internal consistency approaches in inter-individual difference (e.g., cross-sectional) studies include reliability coefficients such as Cronbach’s alpha (Cronbach, 1951; Sijtsma, 2009; Tavakol & Dennick, 2011), and latent variable models such as factor models and item response theory models (Bollen, 2002). These techniques cannot be applied directly to intensive longitudinal data, but extensions and similar alternatives are available for ILD (e.g., see Castro-Alvarez et al., 2022; Geldhof et al., 2014; Molenaar, 1985; Nezlek, 2017; Song & Ferrer, 2012).
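To make the cross-sectional case concrete, here is a minimal Python sketch of Cronbach’s alpha computed from a persons-by-items table with the standard formula alpha = k/(k − 1) × (1 − sum of item variances / variance of the sum score). The function name and the toy data are hypothetical. The sketch assumes one row per person; applying it unchanged to one person’s repeated measures would ignore the dependencies between occasions that the ILD extensions cited above are designed to handle.

```python
from statistics import variance

def cronbach_alpha(rows):
    """Cronbach's alpha for a persons-by-items table (one row per person).

    alpha = k / (k - 1) * (1 - sum of item variances / variance of sum scores)
    """
    k = len(rows[0])
    if k < 2:
        raise ValueError("alpha requires at least two items")
    item_variances = [variance(item) for item in zip(*rows)]
    total_variance = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)

# Hypothetical cross-sectional data: five people answering three items.
rows = [
    [4, 5, 4],
    [2, 2, 3],
    [5, 4, 5],
    [1, 2, 1],
    [3, 3, 4],
]
print(round(cronbach_alpha(rows), 3))  # 0.936
```

Alpha is high here because the three items rise and fall together across people, so most of the sum-score variance is shared variance.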

Example: Social support 4

Ume uses a reflective linear factor model with the three items as indicators and one latent variable that should reflect ‘received social support’. They use the model to estimate the reliability of their short questionnaire as part of their validation study. Reliability will be high when the items consistently increase and decrease together over the repeated measures (that is, when they covary strongly over time).

If Ume’s study supports the validity of the questionnaire, researchers can use the same factor model in future studies to correct for measurement error in their analyses.

When you use multiple items for such reliability purposes, it is important to consider that

each internal consistency reliability technique is based on theoretical assumptions, and hence has theoretical implications. It is important to carefully think about what measurement model is appropriate for your research question and context;
it is possible to evaluate reliability and correct for measurement error with single item measures as well, both for single persons and for multiple persons.

2.3 Multiple items to mirror interindividual difference questionnaires

Another common reason for using multiple item measures is that you want to translate constructs established in inter-individual difference research to intra-individual differences in your ILD study, for example constructs such as ‘positive affect’ or ‘introversion’. In that case, the first intuition may be to simply take the questionnaires from inter-individual difference research and adapt them to the intra-individual context.

Example: PANAS 1

Sasha wants to measure intra-individual differences in ‘Positive Affect’ and ‘Negative Affect’, constructs established in inter-individual difference research. There, they are commonly measured with the Positive and Negative Affect Schedule (PANAS; Tran, 2020; Watson et al., 1988). The PANAS measures how much positive and negative affect people tend to experience in general. It is used to study trait differences between people in how much positive and negative affect they tend to experience, and how these differences relate to other differences between people (e.g., their sex or personality traits).

Sasha takes the original items of the PANAS scale, but changes the phrasing to suit intra-individual difference research.

For example, the original PANAS instruction was “Indicate the extent you have felt this way over the past week.”, followed by various adjectives (e.g., “irritable”, “guilty”, “afraid”) rated on a 5-point Likert scale (from “very slightly/not at all” to “extremely”). Sasha changes the instruction to “Indicate the extent you have felt this way in the last hour.” and keeps everything else the same.

When you consider taking this approach, take into account the following points:

Constructs established in inter-individual difference research may be less relevant, or even irrelevant, for studying intra-individual differences.

Example: PANAS 2

Sylvie is in doubt whether she should measure ‘negative affect’ in her ILD study, adapting the PANAS to this end.

She knows that, based on inter-individual difference research, the items “irritable”, “guilty”, and “afraid” (among others) are considered suitable for establishing consistent differences in ‘overall negative affect’ (NA) between people (Watson et al., 1988). Factor modeling studies (Wedderhoff et al., 2021) have established that people who tend to feel more irritable overall than other people also tend to feel more guilty and afraid than other people. This pattern is captured by a latent variable that is considered to reflect differences in people’s overall ‘negative affect’.

However, she also realizes that this does not mean such a latent variable will also capture intra-individual differences well, that is, how a person’s affect fluctuates over time.

She realizes that if she applies the ‘negative affect’ latent variable to intra-individual differences over time, using the same adjectives as the original scale, this would imply the following pattern over time: If a person feels relatively irritable at a certain occasion, that person tends to also feel relatively guilty and afraid at that same occasion.

She thinks it makes more sense that whether a person feels irritable at a given occasion, and whether they also feel guilty and afraid, depends mainly on the context the person is in at that occasion, rather than on a latent variable ‘negative affect’. For some experiences, the participant may feel all three emotions strongly due to the nature of the event (e.g., being publicly called out on a big mistake), while for other experiences they may mostly feel one of them (e.g., encountering an angry swarm of bees during a walk).

In that case, a person’s emotions over time would be considered separate constructs, and whether they covary depends mostly on the particular features of changing circumstances.

Sylvie concludes that the concept of general negative affect may not be directly relevant for studying how people’s emotions fluctuate over time (intra-individual differences), even though it may be very suitable for studying people’s general tendencies compared to other people (inter-individual differences). She decides to develop a new measurement instrument that suits the purposes of her study instead of adapting the PANAS.

Questionnaires for intra-individual difference studies often need different qualities than questionnaires for inter-individual differences. For example, because they typically need to be administered often, they may need to be shorter. It is also important to consider which items suit your temporal lens of interest, or vice versa, which temporal lens suits your items of interest. Items that measure constructs that change little over time may be well suited to an inter-individual difference questionnaire, but are probably ill suited to an ILD adaptation.

Example: PANAS 3

Mark is considering adapting the PANAS for his ILD study, so that he can measure positive and negative affect with experience sampling. However, he is worried about the adapted item ‘guilty’, for which participants would now see “Indicate the extent you have felt guilty in the last hour.”

He worries that if he collects his data over a short time span, as suits his temporal lens, participants may respond ‘very slightly/not at all’ to this item on most or even all occasions. This would leave little to no within-person variance in the repeated measures of guilt, making intra-individual difference analyses for that item difficult or impossible (see also floor and ceiling effects).
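A simple descriptive check can flag this problem in pilot data. The Python sketch below assumes, hypothetically, that ‘very slightly/not at all’ is coded as 1; it reports one participant’s within-person standard deviation for an item and the share of occasions at the floor. The function name and the ratings are made up for illustration.

```python
from statistics import pstdev

FLOOR = 1  # hypothetical coding of 'very slightly/not at all' on the 1-5 scale

def floor_check(ratings, floor=FLOOR):
    """Within-person SD (population formula) and share of occasions at the floor."""
    share_at_floor = sum(r == floor for r in ratings) / len(ratings)
    return pstdev(ratings), share_at_floor

# Hypothetical hourly 'guilty' ratings of one participant on ten occasions.
guilty = [1, 1, 1, 1, 2, 1, 1, 1, 1, 1]
sd, share = floor_check(guilty)
print(f"within-person SD = {sd:.2f}, share at floor = {share:.0%}")
# within-person SD = 0.30, share at floor = 90%
```

What counts as too little variance depends on the planned analysis; this sketch only describes the data and does not set a threshold.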

Valid and reliable questionnaires from inter-individual difference research may not be valid and reliable for intra-individual difference research.

Example: PANAS 4

Hildebrand wants to be sure that her measurement instruments for her ILD study are reliable.

She adapts the PANAS for her ILD study and determines the reliability of her measurement instrument.

She finds that while the original PANAS for inter-individual differences may have a reliability between .8 and .9 (Thompson, 2007), the reliability of her adapted PANAS for intra-individual differences differs for each of her participants, and those reliabilities are typically lower than .8 (Schuurman & Hamaker, 2019).

Taken together, these three points mean that questionnaires or items adapted from inter-individual difference research need to be completely re-validated for ILD research.

We generally recommend against copying constructs from inter-individual difference research directly to intra-individual difference studies, unless there are very good theoretical reasons to do so. That is, the constructs need to make sense for studying intra-individual differences, given the research question and context, preferably backed by empirical evidence. In many cases it may be necessary, and more fruitful, to establish (new) constructs tailored to intra-individual differences (for an illustration of possible approaches applied to measuring personality, see Andresen et al., 2024).

2.4 Combining different reasons for using multiple items

In practice, you may want to use multiple items for more than one of the reasons discussed above. For example, you may want to cover different aspects of a construct (content validity) and also use the multiple items to estimate the reliability of your measure. It is important to think about whether and how these different goals can be reconciled with each other for a given construct. This will depend on your goals, the nature of the construct, the design of the items, and how you will aggregate the items.

Example: Social support 5

Max wants to get a good impression of someone’s daily experienced received social support. They care about content validity, and they also want to evaluate the measure’s reliability, to make sure it contains as little measurement error as possible.

To ensure their measure is content valid, Max breaks down the construct into different aspects, measured with different items.

Rather than using a single item asking “How much social support did you receive today?”, they ask multiple questions, such as “Did you receive help with your job today?”, “Did you feel you could reach out for emotional support from your friends/family/colleagues today?”, and “Did you rely on financial support today?”. In this way, the items cover various distinct aspects of receiving social support.

Max uses a reflective factor model to estimate the internal consistency reliability of the items. However, this reliability estimate relies on the idea that only the variance shared among the items is reliable, so all unique variability in the items is treated as measurement error. Hence, the unique contributions of the items, which were Max’s reason for using multiple items for content validity in the first place, are not counted as reliable variance. As a result, the reliability estimate is likely to underestimate the true reliability of the measure.
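The mechanism can be illustrated with the simpler sum-score decomposition behind Cronbach’s alpha instead of Max’s factor model; this substitution, and treating the days as independent observations, are simplifying assumptions, but the logic that only shared variance counts as reliable is the same. Because the variance of a sum score equals the sum of the item variances plus all item covariances, only the covariance part is credited as reliable. In the hypothetical, deliberately extreme data below, the three support items occur in every combination, so they are exactly uncorrelated even though each one is recorded without error.

```python
from itertools import product
from statistics import variance

def shared_variance_share(rows):
    """Share of sum-score variance that comes from covariance among items.

    Var(sum) = sum of item variances + sum of item covariances, so
    1 - (sum of item variances / Var(sum)) is the shared (covariance) part.
    """
    item_variances = [variance(item) for item in zip(*rows)]
    total_variance = variance([sum(row) for row in rows])
    return 1 - sum(item_variances) / total_variance

# Hypothetical days coded -1/+1 for job help, emotional support and
# financial support; all 8 combinations occur once, so the items are
# exactly uncorrelated while each item is measured without error.
days = [list(combo) for combo in product([-1, 1], repeat=3)]
print(shared_variance_share(days))
```

The result is zero up to floating-point rounding, so none of this error-free variability would be credited as reliable. Cronbach’s alpha, which equals k/(k − 1) times this share, would be zero as well.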

Max’s internal consistency reliability approach does not mesh well with their approach to content validity.

3 Takeaway

Whether you should use a single item or multiple items to measure a construct of interest depends on your research goals, your research question, and the nature of the construct. There is no reason to assume that single item measures or multiple item measures will perform better or worse in general. Rather, you should take both the nature of your study and the process of interest into account when you select or design your measurement instrument. Moreover, you should ensure that the instrument is valid for your intended use.

Keep in mind that different goals you may have for your instrument, such as high content validity and high reliability, may compete with each other. Moreover, there is no reason to assume that what worked well for cross-sectional or panel studies will also work well for your ILD study. Rather than limiting your ILD study to what has been established in such research, rethink what is relevant when you want to study processes that take place within persons over time.

4 Further reading

We have collected various topics for you to read more about below.

Read more: Evaluating the quality of single item measures

Reliability for single items

Read more: Interindividual versus intraindividual differences

Types of variation
Common data types
The within/between problem
Ecological fallacy
Intra-individual versus inter-individual correlation

References

Andresen, P. K., Schuurman, N. K., & Hamaker, E. L. (2024). How to measure and model personality traits in everyday life: A qualitative analysis of 300 big five personality items. Journal of Research in Personality, 112, 104528. https://doi.org/10.1016/j.jrp.2024.104528

Bollen, K. A. (2002). Latent variables in psychology and the social sciences. Annual Review of Psychology, 53(1), 605–634. https://doi.org/10.1146/annurev.psych.53.100901.135239

Borsboom, D. (2008). Latent variable theory. Measurement: Interdisciplinary Research and Perspectives, 6(1-2), 25–53. https://doi.org/10.1080/15366360802035497

Borsboom, D., Mellenbergh, G. J., & Van Heerden, J. (2003). The theoretical status of latent variables. Psychological Review, 110(2), 203. https://doi.org/10.1037/0033-295X.110.2.203

Castro-Alvarez, S., Tendeiro, J. N., Jonge, P. de, Meijer, R. R., & Bringmann, L. F. (2022). Mixed-effects trait-state-occasion model: Studying the psychometric properties and the person–situation interactions of psychological dynamics. Structural Equation Modeling: A Multidisciplinary Journal, 29(3), 438–451. https://doi.org/10.1080/10705511.2021.1961587

Crawford, J. R., & Henry, J. D. (2004). The positive and negative affect schedule (PANAS): Construct validity, measurement properties and normative data in a large non-clinical sample. British Journal of Clinical Psychology, 43(3), 245–265. https://doi.org/10.1348/0144665031752934

Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16(3), 297–334. https://doi.org/10.1007/BF02310555

Geldhof, G. J., Preacher, K. J., & Zyphur, M. J. (2014). Reliability estimation in a multilevel confirmatory factor analysis framework. Psychological Methods, 19(1), 72. https://doi.org/10.1037/a0032138

Molenaar, P. C. (1985). A dynamic factor model for the analysis of multivariate time series. Psychometrika, 50(2), 181–202. https://doi.org/10.1007/BF02294246

Nezlek, J. B. (2017). A practical guide to understanding reliability in studies of within-person variability. Journal of Research in Personality, 69, 149–155. https://doi.org/10.1016/j.jrp.2016.06.020

Schuurman, N. K., & Hamaker, E. L. (2019). Measurement error and person-specific reliability in multilevel autoregressive modeling. Psychological Methods, 24(1), 70. https://doi.org/10.1037/met0000188

Sijtsma, K. (2009). On the use, the misuse, and the very limited usefulness of Cronbach’s alpha. Psychometrika, 74, 107–120. https://doi.org/10.1007/s11336-008-9101-0

Song, H., & Ferrer, E. (2012). Bayesian estimation of random coefficient dynamic factor models. Multivariate Behavioral Research, 47(1), 26–60. https://doi.org/10.1080/00273171.2012.640593

Tavakol, M., & Dennick, R. (2011). Making sense of Cronbach’s alpha. International Journal of Medical Education, 2, 53. https://doi.org/10.5116/ijme.4dfb.8dfd

Thompson, E. R. (2007). Development and validation of an internationally reliable short-form of the positive and negative affect schedule (PANAS). Journal of Cross-Cultural Psychology, 38(2), 227–242. https://doi.org/10.1177/0022022106297301

Tran, V. (2020). Positive affect negative affect scale (PANAS). In Encyclopedia of behavioral medicine (pp. 1708–1709). Springer. https://doi.org/10.1007/978-3-030-39903-0_978

Watson, D., Clark, L. A., & Tellegen, A. (1988). Development and validation of brief measures of positive and negative affect: The PANAS scales. Journal of Personality and Social Psychology, 54(6), 1063. https://doi.org/10.1037/0022-3514.54.6.1063

Wedderhoff, N., Gnambs, T., Wedderhoff, O., Burgard, T., & Bošnjak, M. (2021). On the structure of affect. Zeitschrift für Psychologie. https://doi.org/10.1027/2151-2604/a000434

Citation

BibTeX citation:

@article{schuurman2025,
  author = {Schuurman, Noémi K.},
  title = {Single Item or Multiple Item Measures},
  journal = {MATILDA Preprints},
  number = {2025-05-23},
  date = {2025-05-23},
  url = {https://matilda.fss.uu.nl/articles/single-item-muliple-item-measures.html},
  langid = {en}
}

For attribution, please cite this work as:

Schuurman, N. K. (2025). Single item or multiple item measures. MATILDA Preprints, 2025-05-23. https://matilda.fss.uu.nl/articles/single-item-muliple-item-measures.html