This study was motivated by the observation that recent epidemiological research is based on existing large databases and that 100% sampling was the usual practice. This is a reversal of the traditional epidemiological practice of selecting a probability sample from a study population in order to reach conclusions about the target population.
The objective of the research was to survey epidemiological research published in 2006 in three high-impact journals to ascertain whether the practice of 100% sampling from large databases had become the norm. The three journals selected for study were the American Journal of Epidemiology, the European Journal of Epidemiology, and the International Journal of Epidemiology.
A pre-tested data abstraction form was used to abstract the following essential information from each original research article: title, authors, issue and volume number, date of publication, type of study (cross-sectional, case-control, cohort, randomized community control, randomized clinical), target population, study population, sampling fraction, type of sampling (simple random, stratified random, systematic random, multi-stage, non-random), source of data (existing database, fresh data collection, prior study), and total number of study subjects.
The data were keyed into an SPSS database for categorical analysis, using the chi-square test statistic to test for association.
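As an illustration of the kind of categorical analysis described, a chi-square test of association for a 2×2 table can be sketched as follows. The table, its labels, and all counts below are invented for illustration only; the study's actual analysis was carried out in SPSS.

```python
def chi_square_2x2(table):
    """Pearson chi-square statistic for a 2x2 contingency table,
    computed as the sum of (observed - expected)^2 / expected."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    return chi2

# Hypothetical cross-classification of articles:
# rows    = probability sample vs. 100%/non-random sample
# columns = existing database vs. fresh data collection
table = [[30, 45],
         [80, 25]]
print(f"chi-square = {chi_square_2x2(table):.2f}")
```

With one degree of freedom for a 2×2 table, the resulting statistic would be compared against the chi-square distribution to judge whether sampling practice is associated with data source.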
Results will be presented showing the increasing trend of basing epidemiological research on large data sets of routinely collected data or data left over from previous research. The research trends will be described and characterized with regard to size of study, methods of sampling, and implications for both internal and external validity.
The findings of the study indicate a major change in epidemiological research, with serious practical and theoretical implications. The availability of large databases and high-speed computers has encouraged epidemiologists to analyze data without probability sampling. A large data set gives very stable parameter estimates, but the same degree of precision could have been obtained from a much smaller sample. What is lost is the epidemiologist's ability to inspect a small, manageable data set, internalize it, and let intuition act before the data are analyzed. The more intimate contact of the epidemiologist with the data traditionally accounted for the deep understanding and discussion that are missed in the new trend. Easy availability of large databases also encourages epidemiologists to plunge into data analysis before giving serious thought to the research questions. In some cases the research questions may be prompted by preliminary analysis, which can introduce numerous biases. Use of large data sets has the advantage of external validity, which has never been the primary objective of epidemiological research. Epidemiologists have traditionally aimed at carrying out a small study based on probability sampling so that they can easily identify and control confounding and other sources of bias, with the ultimate aim of internal validity. They knew that external validity (generalization) would be attained inductively by considering several studies that are internally valid. Use of large sets of routinely collected data also raises the issue of data quality, since such data are collected with service and administrative, rather than research, considerations in mind.
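The claim that a much smaller probability sample can match the precision of a very large data set follows from the square-root law for the standard error of a proportion. A minimal sketch (the 10% prevalence and the two sample sizes are assumed values for illustration, not figures from the study):

```python
import math

def margin_of_error(p, n, z=1.96):
    """Half-width of the approximate 95% confidence interval for a
    proportion p estimated from a simple random sample of size n."""
    return z * math.sqrt(p * (1 - p) / n)

# Assumed prevalence of 10%, at two sample sizes
for n in (10_000, 1_000_000):
    print(f"n = {n:>9,}: margin of error = ±{margin_of_error(0.10, n):.4%}")
```

A hundredfold increase in sample size narrows the confidence interval only tenfold, which is why a modest probability sample already delivers precision adequate for most epidemiological purposes.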
The paper concludes by highlighting that more thought should be given to the implications of the observed change in
the paradigms and practices of epidemiological research.