Skip to content

📊 SQL Aggregations (Data Engineering Focus)

Aggregations are used to summarize data.

In data engineering, aggregations are used in:

  • dashboards
  • reporting tables
  • ETL pipelines
  • metrics calculation

🧠 Mental Model

Think of aggregation as:

👉 Converting raw events → meaningful metrics

Example:

  • orders → total revenue
  • users → active users
  • events → daily counts

📌 COUNT

Counts rows.

SELECT COUNT(*)
FROM orders;


📌 SUM

Adds numeric values.

SELECT SUM(amount)
FROM payments;


📌 AVG

Calculates average.

SELECT AVG(salary)
FROM employees;


📌 MIN / MAX

Finds smallest or largest value.

SELECT MIN(salary), MAX(salary)
FROM employees;


📌 GROUP BY (MOST IMPORTANT)

Used to group rows before aggregation.

SELECT department, COUNT(*)
FROM employees
GROUP BY department;

👉 One row per department


📌 HAVING (FILTER AFTER GROUPING)

Used to filter aggregated results.

SELECT department, COUNT()
FROM employees
GROUP BY department
HAVING COUNT(
) > 5;

👉 Departments with more than 5 employees


🧠 WHERE vs HAVING

WHERE → filters rows before grouping
HAVING → filters after grouping

Example:

SELECT department, COUNT()
FROM employees
WHERE salary > 50000
GROUP BY department
HAVING COUNT(
) > 3;


📌 REAL WORLD EXAMPLE

Find total revenue per customer:

SELECT customer_id, SUM(amount)
FROM orders
GROUP BY customer_id;


📌 MULTIPLE AGGREGATIONS

SELECT department,
COUNT(*) AS total_employees,
AVG(salary) AS avg_salary
FROM employees
GROUP BY department;


🧠 DATA ENGINEERING USE CASES

Aggregations are used in:

  • daily/weekly dashboards
  • KPI calculations
  • billing systems
  • ETL summary tables

⚠️ COMMON MISTAKES

  • Using SELECT columns without GROUP BY
  • Confusing WHERE and HAVING
  • Aggregating without proper grouping
  • Using COUNT(*) when COUNT(column) is needed

🎯 INTERVIEW QUESTIONS

  • Difference between WHERE and HAVING?
  • What happens if you miss GROUP BY?
  • COUNT(*) vs COUNT(column)?
  • Can we use multiple aggregations together?

🚀 SUMMARY

  • COUNT → number of rows
  • SUM → total value
  • AVG → average value
  • GROUP BY → grouping logic
  • HAVING → filter groups