Data Collection


Data collection is the process of gathering data, typically in the context of a research project or for ongoing surveillance and tracking. It involves measuring variables of interest in a pre-established, systematic way that enables researchers to address research questions, test hypotheses, and evaluate outcomes. Data collection methods may be manual or automated, and the choice of method typically depends on the outcomes to be measured and analyzed. Data can be collected through forms or survey instruments; qualitative interviews or focus groups; extractions from electronic health record (EHR) data; measurements from sensors, scales, and other lab or clinical equipment; video or audio recordings; and review of text corpora.

Whatever the method, data quality should be addressed through data stewardship, which covers data validation, use of standard ontologies and terminologies (e.g., RxNorm, ICD codes), consistent units of measure, instrument calibration, clear audio and video, and data backup and security.
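
As a rough illustration of the data validation and units-of-measure checks mentioned above, the following Python sketch checks a single collected record at entry time. The field names (heart_rate_bpm, weight_kg, diagnosis_code), the plausible ranges, and the helper functions are hypothetical and chosen only for this example; a real project would take its rules from the study protocol and data dictionary. The diagnosis code check tests only the general shape of an ICD-10-CM style code, not whether the code actually exists.

```python
import re

# Hypothetical plausibility ranges, for illustration only.
# Real limits should come from the study's data dictionary.
RANGES = {
    "heart_rate_bpm": (20, 300),
    "weight_kg": (0.2, 700),
}

# Shape check for an ICD-10-CM style code: letter, digit, alphanumeric,
# then an optional dot followed by one to four alphanumerics.
# This does not confirm that the code is a valid, billable diagnosis.
ICD10_SHAPE = re.compile(r"^[A-Z][0-9][0-9A-Z](\.[0-9A-Z]{1,4})?$")

LB_TO_KG = 0.45359237  # exact definition of the avoirdupois pound


def to_kg(value, unit):
    """Convert a weight to kilograms so every record uses one unit."""
    if unit == "kg":
        return value
    if unit == "lb":
        return value * LB_TO_KG
    raise ValueError(f"unsupported weight unit: {unit!r}")


def validate_record(record):
    """Return a list of problems; an empty list means the record passed."""
    problems = []
    for field, (low, high) in RANGES.items():
        value = record.get(field)
        if value is None:
            problems.append(f"{field}: missing")
        elif isinstance(value, bool) or not isinstance(value, (int, float)):
            problems.append(f"{field}: not numeric")
        elif not low <= value <= high:
            problems.append(f"{field}: {value} outside {low}-{high}")

    code = record.get("diagnosis_code")
    if code is not None and not ICD10_SHAPE.match(code):
        problems.append(f"diagnosis_code: {code!r} is not ICD-10 shaped")
    return problems


good = {
    "heart_rate_bpm": 72,
    "weight_kg": to_kg(150, "lb"),
    "diagnosis_code": "I10",
}
bad = {
    "heart_rate_bpm": 720,
    "weight_kg": "heavy",
    "diagnosis_code": "hypertension",
}
```

Calling validate_record(good) returns an empty list, while validate_record(bad) returns one message per failed check. Returning a list of problems rather than raising on the first error lets a data entry form show every issue to the person entering the record at once.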


This article provides an overview of the different data collection approaches commonly used in clinical, public health, and translational research: Saczynski, J. S., McManus, D. D., & Goldberg, R. J. (2013). Commonly used data-collection approaches in clinical research. The American Journal of Medicine, 126(11), 946–950. 

This module by MEASURE Evaluation, funded by USAID, covers the basics of public health surveillance: 


Similar Terms

Data Capture
Clinical Data Collection
Data Extraction
Data Entry
Data Surveillance

REDCap is a widely used tool for clinical research data collection.

Relevant Literature

The U.S. Department of Health and Human Services’ Office of Research Integrity provides a guide to data collection on its “Responsible Conduct in Data Management” website: 
