Skip to content

⚡ SQL Window Functions (Data Engineering Focus)

Window functions are used to perform calculations across rows without collapsing them.

They are critical in:

  • analytics
  • ranking systems
  • time-series processing
  • deduplication
  • ETL pipelines

🧠 Mental Model

Think of window functions as:

👉 “Performing calculations while still keeping row-level detail”

Unlike GROUP BY:

  • GROUP BY → reduces rows
  • Window functions → keeps all rows

📌 Basic Syntax

SELECT column_name,
window_function() OVER (PARTITION BY ... ORDER BY ...)
FROM table;


📌 ROW_NUMBER()

Assigns unique row numbers.

SELECT name, salary,
ROW_NUMBER() OVER (ORDER BY salary DESC) AS rn
FROM employees;

👉 Highest salary gets rank 1


📌 RANK()

Ranks with gaps.

SELECT name, salary,
RANK() OVER (ORDER BY salary DESC) AS rnk
FROM employees;

👉 If two people tie, ranks skip numbers

Example: 1, 2, 2, 4


📌 DENSE_RANK()

Ranks without gaps.

SELECT name, salary,
DENSE_RANK() OVER (ORDER BY salary DESC) AS drnk
FROM employees;

Example: 1, 2, 2, 3


📌 PARTITION BY (VERY IMPORTANT)

Resets ranking inside groups.

SELECT name, department, salary,
RANK() OVER (PARTITION BY department ORDER BY salary DESC) AS rnk
FROM employees;

👉 Rank employees within each department


📌 RUNNING TOTAL

SELECT name, salary,
SUM(salary) OVER (ORDER BY salary) AS running_total
FROM employees;

👉 Keeps cumulative sum


📌 MOVING AVERAGE

SELECT name, salary,
AVG(salary) OVER (ORDER BY salary ROWS BETWEEN 2 PRECEDING AND CURRENT ROW)
FROM employees;

👉 Smooths trends over time


📌 REAL-WORLD USE CASES

1. Top N per group

SELECT * FROM ( SELECT name, department, salary,
ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary DESC) AS rn
FROM employees ) t
WHERE rn <= 3;

👉 Top 3 employees per department


2. Deduplication

SELECT * FROM ( SELECT *,
ROW_NUMBER() OVER (PARTITION BY email ORDER BY created_at DESC) AS rn
FROM users ) t
WHERE rn = 1;

👉 Keep latest record per user


3. Time-series analysis

SELECT date, sales,
SUM(sales) OVER (ORDER BY date) AS cumulative_sales
FROM daily_sales;


🧠 DATA ENGINEERING USE CASES

Window functions are used in:

  • fraud detection
  • user analytics
  • sessionization
  • data deduplication
  • KPI trends

⚠️ COMMON MISTAKES

  • Confusing GROUP BY with window functions
  • Missing PARTITION BY when needed
  • Wrong ORDER BY inside window
  • Using window functions before understanding row-level logic

🎯 INTERVIEW QUESTIONS

  • Difference between RANK, DENSE_RANK, ROW_NUMBER?
  • What is PARTITION BY used for?
  • Difference between GROUP BY and window functions?
  • How do you do deduplication using SQL?

🚀 SUMMARY

  • Window functions do NOT reduce rows
  • PARTITION BY splits data into groups
  • ORDER BY defines sequence
  • Used heavily in real-world analytics