Aiming for a representative sample: Simulating random versus purposive strategies for hospital selection

Loan R. van Hoeven, Mart P. Janssen, Kit C.B. Roes, Hendrik Koffijberg

Research output: Contribution to journal › Article › Academic › peer-review

Abstract

Background: A ubiquitous issue in research is that of selecting a representative sample from the study population. While random sampling strategies are the gold standard, in practice random sampling of participants is neither always feasible nor necessarily the optimal choice. In our case, 12 hospitals must be selected out of the 89 Dutch hospitals in total, and this selection should also make it possible to estimate blood use in the remaining hospitals. In this paper, we evaluate both random and purposive strategies for the case of estimating blood use in Dutch hospitals.

Methods: Available population-wide data on hospital blood use and number of hospital beds are used to simulate five sampling strategies: (1) select only the largest hospitals, (2) select the largest and the smallest hospitals (‘maximum variation’), (3) select hospitals randomly, (4) select hospitals from as many different geographic regions as possible, and (5) select hospitals from only two regions. Each simulation of a strategy yields a selection of hospitals, which is then used to estimate blood use in the remaining hospitals. These estimates are compared with the actual population values, and the resulting prediction errors indicate the quality of the sampling strategy.
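
The five selection rules above can be written as small functions. The following Python sketch is illustrative only and rests on stated assumptions: hospitals are hypothetical (hospital_id, beds, region) tuples, maximum variation is assumed to take half the sample from each end of the bed ranking, and regional variation is assumed to cycle over regions, drawing one random hospital per region in turn. The abstract does not say how the authors implemented either rule.

```python
import random
from collections import defaultdict

# Each hospital is a hypothetical (hospital_id, beds, region) tuple.


def largest(hospitals, n):
    """Strategy 1: the n hospitals with the most beds."""
    return sorted(hospitals, key=lambda h: h[1], reverse=True)[:n]


def maximum_variation(hospitals, n):
    """Strategy 2 (assumed split): the n // 2 largest plus the remaining smallest."""
    ranked = sorted(hospitals, key=lambda h: h[1], reverse=True)
    n_large = n // 2
    return ranked[:n_large] + ranked[len(ranked) - (n - n_large):]


def simple_random(hospitals, n, rng):
    """Strategy 3: simple random sample without replacement."""
    return rng.sample(hospitals, n)


def regional_variation(hospitals, n, rng):
    """Strategy 4 (assumed rule): cover as many regions as possible by cycling
    over regions and drawing one random hospital from each in turn."""
    assert n <= len(hospitals)
    by_region = defaultdict(list)
    for h in hospitals:
        by_region[h[2]].append(h)
    # Shuffled copy of each region's hospitals; pop() then draws at random.
    pools = [rng.sample(members, len(members)) for members in by_region.values()]
    rng.shuffle(pools)  # random order in which regions are visited
    chosen = []
    while len(chosen) < n:
        for pool in pools:
            if pool and len(chosen) < n:
                chosen.append(pool.pop())
    return chosen


def two_regions(hospitals, n, rng):
    """Strategy 5: sample only from two randomly chosen regions."""
    regions = sorted({h[2] for h in hospitals})
    picked = set(rng.sample(regions, 2))
    eligible = [h for h in hospitals if h[2] in picked]
    return rng.sample(eligible, min(n, len(eligible)))


if __name__ == "__main__":
    rng = random.Random(1)
    # Synthetic population: 89 hospitals spread over 8 invented regions.
    hospitals = [(f"H{i:02d}", rng.randint(100, 1200), f"R{i % 8}") for i in range(89)]
    strategies = {
        "largest": lambda: largest(hospitals, 12),
        "maximum variation": lambda: maximum_variation(hospitals, 12),
        "random": lambda: simple_random(hospitals, 12, rng),
        "regional variation": lambda: regional_variation(hospitals, 12, rng),
        "two regions": lambda: two_regions(hospitals, 12, rng),
    }
    for name, select in strategies.items():
        print(f"{name:>18}: {sorted(h[0] for h in select())}")
```

In this sketch only the random, regional and two-region rules depend on the random number generator; the largest and maximum variation rules are fixed once the bed ranking is known, so repeated simulations return the same selection for them.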

Results: In the case study, maximum variation sampling led to the lowest prediction error, followed by random, regional variation and two-region sampling; sampling only the largest hospitals performed worst. Maximum variation sampling led to a hospital-level prediction error of 15%, whereas random sampling led to a prediction error of 19% (95% CI 17%–26%). Lowering the sample size reduced the differences between maximum variation and the random strategies, whereas increasing the sample size to n = 18 did not change the ranking of the strategies and yielded only slightly better predictions.
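
How a hospital-level prediction error and a confidence interval for random sampling can arise from simulation is sketched below. This is a minimal Python illustration under assumptions the abstract does not confirm: the remaining hospitals are predicted with a hypothetical ratio estimator (sample blood use per bed times each hospital's bed count), the hospital-level error is taken as the mean absolute percentage error, and the interval is the 2.5th to 97.5th percentile over repeated random draws. The population is synthetic, so the printed numbers will not reproduce the 15% and 19% reported above.

```python
import random
import statistics

# Each hospital is a hypothetical (hospital_id, beds, blood_units) tuple.


def fit_ratio(sample):
    """Assumed proxy model: blood use per bed, as a ratio of sample totals."""
    total_units = sum(units for _, _, units in sample)
    total_beds = sum(beds for _, beds, _ in sample)
    return total_units / total_beds


def hospital_level_error(population, sample):
    """Mean absolute percentage error over the hospitals NOT in the sample."""
    ratio = fit_ratio(sample)
    sampled_ids = {h[0] for h in sample}
    rest = [h for h in population if h[0] not in sampled_ids]
    return statistics.mean(abs(ratio * beds - units) / units for _, beds, units in rest)


def random_strategy_summary(population, n, reps, rng):
    """Median error and 2.5th/97.5th percentiles over repeated random samples."""
    errors = [hospital_level_error(population, rng.sample(population, n)) for _ in range(reps)]
    cuts = statistics.quantiles(errors, n=40)  # cut points at 2.5%, 5%, ..., 97.5%
    return statistics.median(errors), cuts[0], cuts[-1]


def maximum_variation_sample(population, n):
    """Assumed split: the n // 2 largest plus the remaining smallest by beds."""
    ranked = sorted(population, key=lambda h: h[1])
    n_large = n // 2
    return ranked[: n - n_large] + ranked[len(ranked) - n_large:]


if __name__ == "__main__":
    rng = random.Random(42)
    # Synthetic population of 89 hospitals (invented numbers): blood use roughly
    # proportional to beds, with multiplicative log-normal noise.
    population = []
    for i in range(89):
        beds = rng.randint(150, 1300)
        units = beds * 12 * rng.lognormvariate(0, 0.25)
        population.append((f"H{i:02d}", beds, units))

    mv = maximum_variation_sample(population, 12)
    print("maximum variation:", round(hospital_level_error(population, mv), 3))
    for n in (12, 18):
        med, lo, hi = random_strategy_summary(population, n, 1000, rng)
        print(f"random, n={n}: {med:.3f} ({lo:.3f}-{hi:.3f})")
```

Because maximum variation is deterministic given the bed ranking in this sketch, it yields a single error rather than an interval. A regression with an intercept would be an equally plausible stand-in for the ratio estimator; the choice here is only for brevity.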

Conclusions: Of the strategies evaluated, maximum variation sampling was optimal for estimating blood use. When proxy data are available, random and purposive sampling strategies can be evaluated using simulations before the start of the study. The results of such simulations enable researchers to make a better-informed choice of an appropriate sampling strategy.

Original language: English
Article number: 90
Journal: BMC Medical Research Methodology
Volume: 15
Issue number: 90
DOI: 10.1186/s12874-015-0089-8
Publication status: Published, 2015

Keywords

  • Sampling strategy
  • Hospital selection
  • Representativeness
  • Random vs. purposive sampling
  • Maximum variation
  • Simulation
  • Model-based inference

Cite this

@article{3bcc935659d7474fb064919a5681180e,
title = "Aiming for a representative sample: Simulating random versus purposive strategies for hospital selection",
abstract = "Background: A ubiquitous issue in research is that of selecting a representative sample from the study population. While random sampling strategies are the gold standard, in practice, random sampling of participants is not always feasible nor necessarily the optimal choice. In our case, a selection must be made of 12 hospitals (out of 89 Dutch hospitals in total). With this selection of 12 hospitals, it should be possible to estimate blood use in the remaining hospitals as well. In this paper, we evaluate both random and purposive strategies for the case of estimating blood use in Dutch hospitals. Methods: Available population-wide data on hospital blood use and number of hospital beds are used to simulate five sampling strategies: (1) select only the largest hospitals, (2) select the largest and the smallest hospitals (‘maximum variation’), (3) select hospitals randomly, (4) select hospitals from as many different geographic regions as possible, (5) select hospitals from only two regions. Simulations of each strategy result in different selections of hospitals, that are each used to estimate blood use in the remaining hospitals. The estimates are compared to the actual population values; the subsequent prediction errors are used to indicate the quality of the sampling strategy. Results: The strategy leading to the lowest prediction error in the case study was maximum variation sampling, followed by random, regional variation and two-region sampling, with sampling the largest hospitals resulting in the worst performance. Maximum variation sampling led to a hospital level prediction error of 15 {\%}, whereas random sampling led to a prediction error of 19 {\%} (95 {\%} CI 17 {\%}--26 {\%}).
While lowering the sample size reduced the differences between maximum variation and the random strategies, increasing sample size to n = 18 did not change the ranking of the strategies and led to only slightly better predictions. Conclusions: The optimal strategy for estimating blood use was maximum variation sampling. When proxy data are available, it is possible to evaluate random and purposive sampling strategies using simulations before the start of the study. The results enable researchers to make a more educated choice of an appropriate sampling strategy.",
keywords = "Sampling strategy, Hospital selection, Representativeness, Random vs. purposive sampling, Maximum variation, Simulation, Model-based inference",
author = "{van Hoeven}, {Loan R.} and Janssen, {Mart P.} and Roes, {Kit C.B.} and Hendrik Koffijberg",
year = "2015",
doi = "10.1186/s12874-015-0089-8",
language = "English",
volume = "15",
pages = "90",
journal = "BMC Medical Research Methodology",
issn = "1471-2288",
publisher = "BioMed Central Ltd.",
number = "90",

}

Aiming for a representative sample: Simulating random versus purposive strategies for hospital selection. / van Hoeven, Loan R.; Janssen, Mart P.; Roes, Kit C.B.; Koffijberg, Hendrik.

In: BMC Medical Research Methodology, Vol. 15, No. 90, 90, 2015.

TY  - JOUR
T1  - Aiming for a representative sample: Simulating random versus purposive strategies for hospital selection
AU  - van Hoeven, Loan R.
AU  - Janssen, Mart P.
AU  - Roes, Kit C.B.
AU  - Koffijberg, Hendrik
PY  - 2015
Y1  - 2015
N2  - Background: A ubiquitous issue in research is that of selecting a representative sample from the study population. While random sampling strategies are the gold standard, in practice, random sampling of participants is not always feasible nor necessarily the optimal choice. In our case, a selection must be made of 12 hospitals (out of 89 Dutch hospitals in total). With this selection of 12 hospitals, it should be possible to estimate blood use in the remaining hospitals as well. In this paper, we evaluate both random and purposive strategies for the case of estimating blood use in Dutch hospitals. Methods: Available population-wide data on hospital blood use and number of hospital beds are used to simulate five sampling strategies: (1) select only the largest hospitals, (2) select the largest and the smallest hospitals (‘maximum variation’), (3) select hospitals randomly, (4) select hospitals from as many different geographic regions as possible, (5) select hospitals from only two regions. Simulations of each strategy result in different selections of hospitals, that are each used to estimate blood use in the remaining hospitals. The estimates are compared to the actual population values; the subsequent prediction errors are used to indicate the quality of the sampling strategy. Results: The strategy leading to the lowest prediction error in the case study was maximum variation sampling, followed by random, regional variation and two-region sampling, with sampling the largest hospitals resulting in the worst performance. Maximum variation sampling led to a hospital level prediction error of 15 %, whereas random sampling led to a prediction error of 19 % (95 % CI 17 %-26 %). While lowering the sample size reduced the differences between maximum variation and the random strategies, increasing sample size to n = 18 did not change the ranking of the strategies and led to only slightly better predictions. Conclusions: The optimal strategy for estimating blood use was maximum variation sampling. When proxy data are available, it is possible to evaluate random and purposive sampling strategies using simulations before the start of the study. The results enable researchers to make a more educated choice of an appropriate sampling strategy.
AB  - Background: A ubiquitous issue in research is that of selecting a representative sample from the study population. While random sampling strategies are the gold standard, in practice, random sampling of participants is not always feasible nor necessarily the optimal choice. In our case, a selection must be made of 12 hospitals (out of 89 Dutch hospitals in total). With this selection of 12 hospitals, it should be possible to estimate blood use in the remaining hospitals as well. In this paper, we evaluate both random and purposive strategies for the case of estimating blood use in Dutch hospitals. Methods: Available population-wide data on hospital blood use and number of hospital beds are used to simulate five sampling strategies: (1) select only the largest hospitals, (2) select the largest and the smallest hospitals (‘maximum variation’), (3) select hospitals randomly, (4) select hospitals from as many different geographic regions as possible, (5) select hospitals from only two regions. Simulations of each strategy result in different selections of hospitals, that are each used to estimate blood use in the remaining hospitals. The estimates are compared to the actual population values; the subsequent prediction errors are used to indicate the quality of the sampling strategy. Results: The strategy leading to the lowest prediction error in the case study was maximum variation sampling, followed by random, regional variation and two-region sampling, with sampling the largest hospitals resulting in the worst performance. Maximum variation sampling led to a hospital level prediction error of 15 %, whereas random sampling led to a prediction error of 19 % (95 % CI 17 %-26 %). While lowering the sample size reduced the differences between maximum variation and the random strategies, increasing sample size to n = 18 did not change the ranking of the strategies and led to only slightly better predictions. Conclusions: The optimal strategy for estimating blood use was maximum variation sampling. When proxy data are available, it is possible to evaluate random and purposive sampling strategies using simulations before the start of the study. The results enable researchers to make a more educated choice of an appropriate sampling strategy.
KW  - Sampling strategy
KW  - Hospital selection
KW  - Representativeness
KW  - Random vs. purposive sampling
KW  - Maximum variation
KW  - Simulation
KW  - Model-based inference
U2  - 10.1186/s12874-015-0089-8
DO  - 10.1186/s12874-015-0089-8
M3  - Article
VL  - 15
SP  - 90
JO  - BMC Med Res Methodol
JF  - BMC Medical Research Methodology
SN  - 1471-2288
IS  - 90
M1  - 90
ER  - 