Appearance
⚡ SQL Window Functions (Data Engineering Focus)
Window functions are used to perform calculations across rows without collapsing them.
They are critical in:
- analytics
- ranking systems
- time-series processing
- deduplication
- ETL pipelines
🧠 Mental Model
Think of window functions as:
👉 “Performing calculations while still keeping row-level detail”
Unlike GROUP BY:
- GROUP BY → reduces rows
- Window functions → keeps all rows
📌 Basic Syntax
SELECT column_name,
window_function() OVER (PARTITION BY ... ORDER BY ...)
FROM table;
📌 ROW_NUMBER()
Assigns unique row numbers.
SELECT name, salary,
ROW_NUMBER() OVER (ORDER BY salary DESC) AS rn
FROM employees;
👉 Highest salary gets rank 1
📌 RANK()
Ranks with gaps.
SELECT name, salary,
RANK() OVER (ORDER BY salary DESC) AS rnk
FROM employees;
👉 If two people tie, ranks skip numbers
Example: 1, 2, 2, 4
📌 DENSE_RANK()
Ranks without gaps.
SELECT name, salary,
DENSE_RANK() OVER (ORDER BY salary DESC) AS drnk
FROM employees;
Example: 1, 2, 2, 3
📌 PARTITION BY (VERY IMPORTANT)
Resets ranking inside groups.
SELECT name, department, salary,
RANK() OVER (PARTITION BY department ORDER BY salary DESC) AS rnk
FROM employees;
👉 Rank employees within each department
📌 RUNNING TOTAL
SELECT name, salary,
SUM(salary) OVER (ORDER BY salary) AS running_total
FROM employees;
👉 Keeps cumulative sum
📌 MOVING AVERAGE
SELECT name, salary,
AVG(salary) OVER (ORDER BY salary ROWS BETWEEN 2 PRECEDING AND CURRENT ROW)
FROM employees;
👉 Smooths trends over time
📌 REAL-WORLD USE CASES
1. Top N per group
SELECT * FROM ( SELECT name, department, salary,
ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary DESC) AS rn
FROM employees ) t
WHERE rn <= 3;
👉 Top 3 employees per department
2. Deduplication
SELECT * FROM ( SELECT *,
ROW_NUMBER() OVER (PARTITION BY email ORDER BY created_at DESC) AS rn
FROM users ) t
WHERE rn = 1;
👉 Keep latest record per user
3. Time-series analysis
SELECT date, sales,
SUM(sales) OVER (ORDER BY date) AS cumulative_sales
FROM daily_sales;
🧠 DATA ENGINEERING USE CASES
Window functions are used in:
- fraud detection
- user analytics
- sessionization
- data deduplication
- KPI trends
⚠️ COMMON MISTAKES
- Confusing GROUP BY with window functions
- Missing PARTITION BY when needed
- Wrong ORDER BY inside window
- Using window functions before understanding row-level logic
🎯 INTERVIEW QUESTIONS
- Difference between RANK, DENSE_RANK, ROW_NUMBER?
- What is PARTITION BY used for?
- Difference between GROUP BY and window functions?
- How do you do deduplication using SQL?
🚀 SUMMARY
- Window functions do NOT reduce rows
- PARTITION BY splits data into groups
- ORDER BY defines sequence
- Used heavily in real-world analytics