Data Lake

Definition

A data lake is a storage repository that holds vast amounts of raw data in its native format, so this data can be structured, unstructured, or semi-structured. This is in contrast to a data warehouse, in which data is structured in a common data model.‍ While this allows the flexibility of ingesting different types of data into the data lake in different formats, this lack of structure can lead to a “data swamp” if not carefully managed. Since data lakes store raw data, analysis is flexible and can occur at a granular level to ad-hoc queries. At its core, a data lake is a data storage and processing repository in which all of the data in an organization can be placed and data flows into it and insights flow out. Cloud computing can be used for data lakes, or they can be stored locally, with Hadoop being a popular platform for local storage.

Examples

Hadoop

Azure

 

Search for a Term

Send us your feedback or suggestions for new terms

Contact information
CAPTCHA
6 + 10 =
Solve this simple math problem and enter the result. E.g. for 1+3, enter 4.
This question is to prevent spam submissions. Contact nwso@hshsl.umaryland.edu for any accessibility issues.