Appearance
📊 SQL Aggregations (Data Engineering Focus)
Aggregations are used to summarize data.
In data engineering, aggregations are used in:
- dashboards
- reporting tables
- ETL pipelines
- metrics calculation
🧠 Mental Model
Think of aggregation as:
👉 Converting raw events → meaningful metrics
Example:
- orders → total revenue
- users → active users
- events → daily counts
📌 COUNT
Counts rows.
SELECT COUNT(*)
FROM orders;
📌 SUM
Adds numeric values.
SELECT SUM(amount)
FROM payments;
📌 AVG
Calculates average.
SELECT AVG(salary)
FROM employees;
📌 MIN / MAX
Finds smallest or largest value.
SELECT MIN(salary), MAX(salary)
FROM employees;
📌 GROUP BY (MOST IMPORTANT)
Used to group rows before aggregation.
SELECT department, COUNT(*)
FROM employees
GROUP BY department;
👉 One row per department
📌 HAVING (FILTER AFTER GROUPING)
Used to filter aggregated results.
SELECT department, COUNT()
FROM employees
GROUP BY department
HAVING COUNT() > 5;
👉 Departments with more than 5 employees
🧠 WHERE vs HAVING
WHERE → filters rows before grouping
HAVING → filters after grouping
Example:
SELECT department, COUNT()
FROM employees
WHERE salary > 50000
GROUP BY department
HAVING COUNT() > 3;
📌 REAL WORLD EXAMPLE
Find total revenue per customer:
SELECT customer_id, SUM(amount)
FROM orders
GROUP BY customer_id;
📌 MULTIPLE AGGREGATIONS
SELECT department,
COUNT(*) AS total_employees,
AVG(salary) AS avg_salary
FROM employees
GROUP BY department;
🧠 DATA ENGINEERING USE CASES
Aggregations are used in:
- daily/weekly dashboards
- KPI calculations
- billing systems
- ETL summary tables
⚠️ COMMON MISTAKES
- Using SELECT columns without GROUP BY
- Confusing WHERE and HAVING
- Aggregating without proper grouping
- Using COUNT(*) when COUNT(column) is needed
🎯 INTERVIEW QUESTIONS
- Difference between WHERE and HAVING?
- What happens if you miss GROUP BY?
- COUNT(*) vs COUNT(column)?
- Can we use multiple aggregations together?
🚀 SUMMARY
- COUNT → number of rows
- SUM → total value
- AVG → average value
- GROUP BY → grouping logic
- HAVING → filter groups