Skip to content

Data Warehousing 🏢 (Analytics at Scale)

A data warehouse is a system designed to store structured data for analytics and reporting.

🧠 If pipelines move data, warehouses make data useful for decision making.


🎯 Why Data Warehouses Exist

Operational databases (OLTP) are not suitable for analytics because:

  • They are optimized for transactions
  • They handle frequent writes and updates
  • They are not optimized for large aggregations

A data warehouse solves this by:

  • Optimizing for read-heavy workloads
  • Storing historical data
  • Supporting complex queries

🧭 OLTP vs OLAP


OLTP (Online Transaction Processing)

Used for day-to-day operations.

Examples:

  • Banking systems
  • E-commerce checkout
  • User login systems

✔ Fast inserts/updates
✔ Real-time operations
❌ Not good for analytics


OLAP (Online Analytical Processing)

Used for analysis and reporting.

Examples:

  • Business dashboards
  • Sales analysis
  • Customer insights

✔ Complex queries
✔ Large-scale aggregations
❌ Not optimized for writes


🏗️ Data Warehouse Architecture

Sources → ETL/ELT → Data Warehouse → BI Tools

Flow:

  • Data is extracted from multiple systems
  • Transformed and cleaned
  • Loaded into warehouse
  • Queried by analysts and dashboards

🧱 Core Components


1. Staging Layer

  • Raw data landing zone
  • Temporary storage
  • Used for preprocessing

2. Data Storage Layer

  • Cleaned and structured data
  • Organized into schemas

3. Presentation Layer

  • Final layer for analytics
  • Used by BI tools and analysts

🧠 Schema Design Models


1. Star Schema ⭐

  • Fact table in center
  • Dimension tables around it

Example:

  • Fact: Sales
  • Dimensions: Customer, Product, Time

✔ Fast queries
✔ Easy to understand
❌ Redundant data


2. Snowflake Schema ❄️

  • Normalized version of star schema
  • More relationships between tables

✔ Less redundancy
❌ More joins → slower queries


⚙️ Key Concepts in Warehousing


1. Fact Tables

  • Contain measurable data
  • Example: sales amount, clicks, revenue

2. Dimension Tables

  • Descriptive data
  • Example: customer info, product details

3. Aggregations

Warehouses are optimized for:

  • SUM()
  • COUNT()
  • AVG()
  • GROUP BY

🔥 Modern Data Warehouses


Cloud-Based Systems

  • Snowflake
  • BigQuery
  • Redshift

Characteristics:

  • Scalable storage
  • Separation of compute & storage
  • Pay-as-you-use model

⚡ Data Lake vs Data Warehouse

FeatureData LakeData Warehouse
DataRawProcessed
SchemaOn readOn write
CostLowHigher
Use caseML, raw dataAnalytics, BI

🚨 Common Warehouse Problems

  • Slow query performance
  • Poor schema design
  • Data duplication
  • High storage costs
  • Stale data issues

🔗 How This Connects

  • Data Modeling → defines structure
  • Storage → holds data
  • Processing → transforms data
  • Pipelines → move data
  • Warehouse → makes data usable for analytics

🎯 Goal of Data Warehousing Knowledge

You should be able to:

  • Design analytics systems
  • Choose schema types
  • Explain OLAP vs OLTP
  • Understand warehouse architecture
  • Optimize query performance

“A data warehouse is where raw data becomes business intelligence.”