Big data refers to datasets that are too large to store, analyze, and manage on a personal computer. Compared to traditional, smaller datasets, big data is much larger in volume, is created or added to more quickly, is more varied in structure, and is typically stored on large, cloud-based storage systems.
Researchers working with big data use specialized software tools, supercomputers, and high-performance computing clusters designed to handle the volume and complexity of the datasets. Developers of artificial intelligence often train their models on big data, and researchers may use machine learning to better understand or describe large datasets.
Awesome Public Datasets is a long list of big datasets taken from public data sources and arranged into categories:
Google also hosts public datasets in BigQuery, a cloud service that lets researchers query very large datasets directly:
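As a brief illustration of what querying a BigQuery public dataset looks like, the sketch below uses one of Google's sample datasets (`bigquery-public-data.samples.shakespeare`, a word-count table of Shakespeare's works) to find the most frequent words across all of his plays. Running it requires a Google Cloud account with BigQuery enabled; the dataset and column names shown are from the public samples collection.

```sql
-- Count total occurrences of each word across all of Shakespeare's works,
-- querying a table in BigQuery's public sample datasets.
SELECT
  word,
  SUM(word_count) AS total_occurrences
FROM
  `bigquery-public-data.samples.shakespeare`
GROUP BY
  word
ORDER BY
  total_occurrences DESC
LIMIT 10;
```

Queries like this can be run from the BigQuery web console or from client libraries in languages such as Python; BigQuery scans the full table in the cloud, so no data needs to be downloaded to a personal computer.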
This animated video presents a short history of the European Organization for Nuclear Research, CERN, as a way of describing big data and providing an example of working with massive datasets:
This article shows that big data researchers in psychology and sociology do not share one standard definition for big data, but associate various terms and methodologies with it:
Favaretto M, De Clercq E, Schneble CO, Elger BS (2020) What is your definition of Big Data? Researchers’ understanding of the phenomenon of the decade. PLOS ONE 15(2): e0228987. https://doi.org/10.1371/journal.pone.0228987
This article describes some of the ethical issues involved in working with big data that includes biomedical information:
Saqr M (2017) Big data and the emerging ethical challenges. International Journal of Health Sciences 11(4): 1–2. https://pubmed.ncbi.nlm.nih.gov/29085259/