Data wrangling is a broad term for the processes involved in preparing data for analysis. It can include acquiring and enriching data, changing its format and shape, combining, subsetting, and sampling it, and cleaning it.
Some common steps in data wrangling are:
- Discovering and gathering the data needed
- Merging data from different sources, if necessary
- Fixing flaws in the data entries
- Extracting the necessary data and putting it in the proper structure
- Storing it in the proper format for further use
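The steps listed above can be illustrated with a minimal Python sketch that uses only the standard library. The sample records, field names, and function names (`gather`, `merge`, `clean`, `extract`, `wrangle`) are hypothetical and chosen purely for illustration; dropping incomplete rows is one possible cleaning policy, not a recommendation.

```python
import csv
import io
import json

# Hypothetical sources: two small CSV extracts held in memory.
PATIENTS_CSV = """id,name,age
1, Ada ,36
2,grace,
3,Linus,52
"""

VISITS_CSV = """patient_id,visits
1,4
2,1
3,7
"""


def gather(text):
    """Step 1: read raw records from a source."""
    return list(csv.DictReader(io.StringIO(text)))


def merge(patients, visits):
    """Step 2: join the two sources on the patient identifier."""
    visits_by_id = {row["patient_id"]: row["visits"] for row in visits}
    return [{**p, "visits": visits_by_id.get(p["id"])} for p in patients]


def clean(rows):
    """Step 3: trim whitespace, normalize names, drop rows missing an age."""
    cleaned = []
    for row in rows:
        row = {key: (value or "").strip() for key, value in row.items()}
        if not row["age"]:
            continue  # assumption: incomplete rows are dropped, not imputed
        row["name"] = row["name"].title()
        cleaned.append(row)
    return cleaned


def extract(rows):
    """Step 4: keep only the needed fields and convert them to numbers."""
    return [
        {"name": r["name"], "age": int(r["age"]), "visits": int(r["visits"])}
        for r in rows
    ]


def wrangle():
    """Run steps 1 to 4 in order and return the structured records."""
    rows = merge(gather(PATIENTS_CSV), gather(VISITS_CSV))
    return extract(clean(rows))


# Step 5: store the result in a format suited to later use (JSON here).
result_json = json.dumps(wrangle(), indent=2)
```

In practice, libraries such as pandas (Python) or dplyr (R) handle these steps at scale; the sketch only shows how the steps fit together.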
This short Coursera video (What is Data Wrangling?) provides an excellent overview of the data wrangling process and common tasks involved when preparing data for analysis and publication.
Data Science for Practicing Clinicians: Data Wrangling is a Data Carpentry lesson that provides hands-on experience with installing and using dplyr, a core package of the Tidyverse for the R programming language. It gives basic instructions for filtering, summarizing, parsing, and cleaning data.
The book Practical Data Wrangling (2017) by Allan Visochek covers data wrangling techniques in Python.