Data mining is the process of identifying patterns and relationships in large datasets and extracting that information, typically using statistics and/or machine learning techniques. Unlike hypothesis-driven data analysis, data mining is usually exploratory: it starts without a specific hypothesis to test. It often involves the automated collection of large quantities of data in order to extract previously unknown or interesting patterns.
In healthcare, for example, data mining can be used to look for patterns in large sets of electronic health record (EHR) data in order to identify harmful drug interactions.
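The drug-interaction example can be illustrated with a minimal sketch. The records, drug names, and the `min_support` threshold below are all hypothetical, and the approach (comparing each drug pair's adverse-event rate with the overall rate) is only one simple way to surface candidate patterns, not a validated pharmacovigilance method.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy records: (drugs prescribed together, adverse event observed?)
records = [
    ({"drug_a", "drug_b"}, True),
    ({"drug_a", "drug_b"}, True),
    ({"drug_a", "drug_c"}, False),
    ({"drug_b", "drug_c"}, False),
    ({"drug_a", "drug_b", "drug_c"}, True),
    ({"drug_c"}, False),
]

# Count how often each drug pair is co-prescribed, and how often
# an adverse event was recorded alongside that pair.
pair_total = Counter()
pair_events = Counter()
for drugs, had_event in records:
    for pair in combinations(sorted(drugs), 2):
        pair_total[pair] += 1
        if had_event:
            pair_events[pair] += 1

# Overall adverse-event rate across all records, used as a baseline.
baseline = sum(had_event for _, had_event in records) / len(records)


def flagged_pairs(min_support=2):
    """Return pairs seen at least min_support times whose event rate exceeds the baseline."""
    flagged = {}
    for pair, total in pair_total.items():
        rate = pair_events[pair] / total
        if total >= min_support and rate > baseline:
            flagged[pair] = rate
    return flagged


print(flagged_pairs())
```

Note that no hypothesis about a particular drug pair is stated in advance: every co-prescribed pair is counted, and the unusual ones emerge from the data. A flagged pair is only a candidate for further investigation, since co-occurrence alone does not establish that the combination is harmful.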
The tidyverse is a widely used, well-supported collection of R packages with functions for data cleaning, analysis, and visualization.
Pandas is a Python library for data cleaning and analysis, with some basic data visualization functionality.
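As a brief illustration of the cleaning and analysis functions mentioned above, the following sketch removes duplicate and incomplete rows from a small table and then summarizes it. The column names and values are hypothetical and chosen only for the example.

```python
import pandas as pd

# Hypothetical toy data with one duplicated row and one missing age.
df = pd.DataFrame({
    "patient_id": [1, 2, 2, 3, 4],
    "age": [34, 51, 51, None, 67],
    "drug": ["drug_a", "drug_b", "drug_b", "drug_a", "drug_b"],
})

# Cleaning: drop exact duplicate rows, then rows with no recorded age.
clean = df.drop_duplicates().dropna(subset=["age"])

# Analysis: mean patient age for each drug.
mean_age = clean.groupby("drug")["age"].mean()
print(mean_age)
```

For visualization, a result like `mean_age` could be passed to pandas' built-in `plot` method (for example `mean_age.plot(kind="bar")`), which assumes that matplotlib is installed.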
Sadiku, M. N. O., Shadare, A. E., & Musa, S. M. (2015). Data mining: A brief introduction. European Scientific Journal, ESJ, 11(21). Retrieved from https://eujournal.org/index.php/esj/article/view/6017
Gupta, S. (2022). Introduction to data mining: A complete guide. Springboard Blog. Retrieved from https://www.springboard.com/blog/data-science/data-mining