Hiprup

What is a Data Lake in AWS?

A Data Lake is a centralized repository that stores structured, semi-structured, and unstructured data at any scale — usually for analytics, ML, and search.

  • Storage layer: Amazon S3 is the canonical Data Lake storage; it is cheap, durable, and scales with virtually no practical limit.

  • Catalog: AWS Glue Data Catalog stores schemas, partitions, and metadata.

  • Ingestion: Kinesis Data Firehose, DMS, Glue, Snowball, partner ETL tools.

  • Query / Analytics: Athena (serverless SQL), Redshift Spectrum, EMR (Spark/Hive), QuickSight.

  • Governance: AWS Lake Formation manages permissions, row- and column-level security, and auditing.
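To make the link between the storage and catalog layers concrete, here is a minimal sketch of the Hive-style `key=value` object layout that partition-aware engines commonly rely on. The table name `clickstream`, the file name, and both helper functions are hypothetical, chosen only for illustration; nothing here calls AWS.

```python
from datetime import date


def partitioned_key(table: str, day: date, filename: str) -> str:
    """Build a Hive-style partitioned object key (key=value path segments)."""
    return (
        f"{table}/year={day.year:04d}/month={day.month:02d}/"
        f"day={day.day:02d}/{filename}"
    )


def parse_partitions(key: str) -> dict:
    """Recover partition columns from the key=value segments of an object key."""
    return dict(
        segment.split("=", 1) for segment in key.split("/") if "=" in segment
    )


# Hypothetical table and file names, used only for illustration.
key = partitioned_key("clickstream", date(2024, 1, 5), "part-0000.parquet")
print(key)
print(parse_partitions(key))
```

Laying objects out this way lets a catalog register `year`, `month`, and `day` as partition columns, so a query that filters on them can skip unrelated prefixes instead of scanning the whole bucket.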

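Data Lake vs Data Warehouse: a lake holds raw data in any format and applies a schema only when the data is read (schema-on-read); a warehouse such as Redshift holds curated, structured data whose schema is enforced when the data is loaded (schema-on-write).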
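The schema-on-read versus schema-on-write distinction can be sketched in a few lines of plain Python. This is a toy illustration, not how S3 or Redshift work internally; the `SCHEMA` mapping, the field names, and both functions are assumptions made up for the example.

```python
import json

# Hypothetical two-column schema, used only for illustration.
SCHEMA = {"user_id": int, "amount": float}


def write_validated(store: list, record: dict) -> None:
    """Schema-on-write: reject a record that does not match before storing it."""
    if set(record) != set(SCHEMA):
        raise ValueError("unexpected or missing fields")
    for field, expected_type in SCHEMA.items():
        if not isinstance(record[field], expected_type):
            raise ValueError(f"{field} is not {expected_type.__name__}")
    store.append(record)


def read_with_schema(raw_lines: list) -> list:
    """Schema-on-read: keep raw text as-is and apply the schema at query time."""
    rows = []
    for line in raw_lines:
        doc = json.loads(line)
        rows.append({field: cast(doc[field]) for field, cast in SCHEMA.items()})
    return rows


# Lake side: the raw line is stored untouched, extra fields and all.
lake = ['{"user_id": "7", "amount": "19.5", "note": "kept in raw storage"}']
rows = read_with_schema(lake)

# Warehouse side: the same loosely typed record is refused at load time.
warehouse = []
try:
    write_validated(warehouse, {"user_id": "7", "amount": "19.5"})
    rejected = False
except ValueError:
    rejected = True

print(rows, rejected, warehouse)
```

The lake accepts anything and pays the interpretation cost on every read; the warehouse pays it once at load time and in return guarantees that stored rows already conform.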

S3 is the data lake. Add Glue Data Catalog + Lake Formation + Athena to sketch the modern AWS lake stack in one breath.
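As a rough sketch of how the query layer sits on top of that stack, the snippet below assembles the keyword arguments that boto3's Athena client accepts for `start_query_execution`. The database name `lake_db`, the table `clickstream`, the results bucket, and the helper `athena_query_request` are all hypothetical; the code only builds the request and makes no AWS call.

```python
def athena_query_request(sql: str, database: str, output_location: str) -> dict:
    """Assemble keyword arguments for an Athena start_query_execution call."""
    return {
        "QueryString": sql,
        # The database is resolved through the Glue Data Catalog.
        "QueryExecutionContext": {"Database": database},
        # Athena writes query results to an S3 location.
        "ResultConfiguration": {"OutputLocation": output_location},
    }


# Hypothetical names throughout; replace with real resources before use.
request = athena_query_request(
    sql="SELECT page, COUNT(*) AS views FROM clickstream GROUP BY page",
    database="lake_db",
    output_location="s3://example-athena-results/",
)
print(request)

# With boto3 installed and credentials configured, this could be submitted as:
#   boto3.client("athena").start_query_execution(**request)
```

The request touches three of the layers named above: the SQL runs in Athena, the table definition comes from the catalog, and both the data and the results live in S3. Lake Formation permissions, where enabled, govern whether the caller may read the table at all.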
