Introduction to SQL Aggregations
SQL aggregate functions are the powerhouse of data analysis, transforming millions of individual records into meaningful insights. Whether you're calculating total sales, finding average customer ratings, or counting active users, aggregate functions make complex calculations simple and efficient.
Why Aggregations Matter
In the real world, raw data tells only part of the story. A table with a million transactions is overwhelming, but aggregate functions can instantly answer critical business questions: What's our total revenue? How many customers bought products this month? What's the average order value? These insights drive strategic decisions across every industry.
SQL provides five fundamental aggregate functions—COUNT, SUM, AVG, MIN, and MAX—that work on sets of rows to produce single summary values. Combined with GROUP BY and HAVING clauses, these functions become incredibly powerful tools for segmentation, trend analysis, and business intelligence reporting.
Aggregate Functions
Aggregate functions perform calculations on multiple rows of data and return a single result. Unlike regular functions that operate row-by-row, aggregates summarize entire datasets or groups of rows into meaningful metrics like totals, averages, counts, minimums, and maximums.
Why it matters: Every business report—from sales dashboards to customer analytics—relies on aggregations. Understanding these functions is essential for data analysts, business intelligence professionals, and anyone working with databases to extract insights from raw data.
The Five Core Aggregate Functions
SQL's aggregate functions are designed to answer specific types of questions about your data. Let's explore what each function does:
| Function | Purpose | Example Use Case |
|---|---|---|
| COUNT() | Counts the number of rows or non-NULL values | How many customers placed orders this month? How many products are in each category? |
| SUM() | Calculates the total sum of numeric values | What's the total revenue for Q4? What's the sum of all inventory quantities? |
| AVG() | Computes the arithmetic mean of numeric values | What's the average order value? What's the mean customer satisfaction score? |
| MIN() | Finds the smallest value in a set | What's the lowest price in our catalog? When was the first order placed? |
| MAX() | Finds the largest value in a set | What's the highest salary in the company? What's the most recent transaction date? |
Real-World Applications
Aggregate functions appear everywhere in data-driven organizations. Here are common scenarios you'll encounter:
Sales Analytics
Calculate total revenue, average order values, and sales counts by region, product, or time period to track performance and identify trends.
Customer Insights
Count active users, calculate average lifetime value, find top spenders, and segment customers based on aggregated behavior patterns.
Inventory Management
Sum total inventory quantities, find minimum stock levels, calculate average turnover rates, and monitor reorder thresholds.
Basic Aggregation Example
Let's start with a simple example using a sales table. Suppose we have this data:
-- Sample sales table
CREATE TABLE sales (
sale_id INT,
product_name VARCHAR(100),
category VARCHAR(50),
price DECIMAL(10,2),
quantity INT,
sale_date DATE
);
-- View some sample data
SELECT * FROM sales LIMIT 5;
Now let's use aggregate functions to get insights from this data:
-- Total number of sales transactions
SELECT COUNT(*) AS total_transactions
FROM sales;
-- Result: 1247
-- Total revenue across all sales
SELECT SUM(price * quantity) AS total_revenue
FROM sales;
-- Result: 284,592.50
-- Average price per product
SELECT AVG(price) AS average_price
FROM sales;
-- Result: 45.67
-- Lowest and highest prices
SELECT
MIN(price) AS lowest_price,
MAX(price) AS highest_price
FROM sales;
-- Result: lowest_price = 5.99, highest_price = 299.99
COUNT & SUM Functions
COUNT and SUM are the workhorses of data analysis—COUNT tells you how many records exist, while SUM calculates totals. Mastering their variations and nuances is essential for accurate reporting and data validation.
Understanding COUNT()
The COUNT function has three important variations, and understanding the differences is critical for accurate analysis:
| Syntax | Behavior |
|---|---|
| COUNT(*) | Counts all rows in the result set, including rows with NULL values. This is the most common form. |
| COUNT(column_name) | Counts only non-NULL values in the specified column. NULLs are ignored completely. |
| COUNT(DISTINCT column) | Counts only unique non-NULL values. Duplicates and NULLs are excluded from the count. |
Let's see these variations in action with a customers table:
-- Sample customers table
CREATE TABLE customers (
customer_id INT,
name VARCHAR(100),
email VARCHAR(100),
phone VARCHAR(20),
city VARCHAR(50)
);
-- Count all customers (includes everyone)
SELECT COUNT(*) AS total_customers
FROM customers;
-- Result: 500
-- Count customers with email addresses (excludes NULLs)
SELECT COUNT(email) AS customers_with_email
FROM customers;
-- Result: 487 (13 customers have NULL emails)
-- Count unique cities (no duplicates)
SELECT COUNT(DISTINCT city) AS unique_cities
FROM customers;
-- Result: 42
The SUM() Function
SUM calculates the total of numeric values. It's commonly used for revenue calculations, quantity totals, and accumulating numeric metrics. Like COUNT(column), SUM automatically ignores NULL values.
-- Calculate total revenue from orders
SELECT SUM(order_total) AS total_revenue
FROM orders
WHERE order_date >= '2025-01-01';
-- Result: 1,247,850.00
-- Calculate total quantity sold across all products
SELECT SUM(quantity) AS total_units_sold
FROM order_items;
-- Result: 28,459
-- Sum with calculation: total revenue from price * quantity
SELECT SUM(price * quantity) AS calculated_revenue
FROM order_items;
-- Result: 892,340.50
NULL Handling in Aggregates
Most aggregate functions (SUM, AVG, MIN, MAX, COUNT(column)) automatically ignore NULL values. Only COUNT(*) includes rows with NULLs. This behavior is crucial for accurate calculations—a NULL doesn't mean zero, it means "unknown" or "missing data."
Why it matters: If you have 100 orders and 5 have NULL totals, SUM(order_total) calculates the sum of the 95 known values. If you need to treat NULLs as zeros, use COALESCE: SUM(COALESCE(order_total, 0)).
Combining COUNT and SUM
Often you'll use multiple aggregate functions together to get comprehensive insights:
-- Sales summary: count and total
SELECT
COUNT(*) AS total_orders,
COUNT(DISTINCT customer_id) AS unique_customers,
SUM(order_total) AS total_revenue,
SUM(quantity) AS total_items_sold
FROM orders
WHERE order_date >= '2025-01-01'
AND order_date < '2026-01-01';
-- Result:
-- total_orders: 3,847
-- unique_customers: 1,205
-- total_revenue: 487,920.00
-- total_items_sold: 12,489
Conditional Counting with CASE
You can combine COUNT with CASE statements to count rows that meet specific conditions—a powerful technique for creating summary reports:
-- Count orders by status in a single query
SELECT
COUNT(*) AS total_orders,
COUNT(CASE WHEN status = 'completed' THEN 1 END) AS completed_orders,
COUNT(CASE WHEN status = 'pending' THEN 1 END) AS pending_orders,
COUNT(CASE WHEN status = 'cancelled' THEN 1 END) AS cancelled_orders
FROM orders;
-- Result:
-- total_orders: 5,200
-- completed_orders: 4,850
-- pending_orders: 280
-- cancelled_orders: 70
Practice Questions: COUNT & SUM
Test your understanding with these coding challenges.
Task: Write a query to count the total number of products and the number of unique categories in a products table.
Expected columns: total_products, unique_categories
Show Solution
SELECT
COUNT(*) AS total_products,
COUNT(DISTINCT category) AS unique_categories
FROM products;
-- Example output:
-- total_products: 1,247
-- unique_categories: 12
Scenario: You have an order_items table with columns: order_id, product_id, quantity, price.
Task: Calculate the total revenue (price × quantity for all items) and count the total number of items sold.
Show Solution
SELECT
SUM(price * quantity) AS total_revenue,
SUM(quantity) AS total_items_sold,
COUNT(*) AS total_line_items,
COUNT(DISTINCT order_id) AS total_orders
FROM order_items;
-- Example output:
-- total_revenue: 284,592.50
-- total_items_sold: 8,459
-- total_line_items: 5,234
-- total_orders: 1,847
Task: Write a single query that counts orders in three status categories (pending, completed, cancelled) plus total revenue from completed orders only.
Show Solution
SELECT
COUNT(*) AS total_orders,
COUNT(CASE WHEN status = 'pending' THEN 1 END) AS pending_count,
COUNT(CASE WHEN status = 'completed' THEN 1 END) AS completed_count,
COUNT(CASE WHEN status = 'cancelled' THEN 1 END) AS cancelled_count,
SUM(CASE WHEN status = 'completed' THEN order_total ELSE 0 END) AS completed_revenue
FROM orders;
-- Example output:
-- total_orders: 5,200
-- pending_count: 280
-- completed_count: 4,850
-- cancelled_count: 70
-- completed_revenue: 487,920.00
AVG, MIN & MAX Functions
While COUNT and SUM tell you quantities and totals, AVG, MIN, and MAX reveal patterns, extremes, and central tendencies in your data. These statistical functions are essential for pricing analysis, performance metrics, and identifying outliers.
The AVG() Function
AVG calculates the arithmetic mean by summing values and dividing by the count of non-NULL values. It's one of the most commonly used statistics in business reporting:
-- Calculate average order value
SELECT AVG(order_total) AS average_order_value
FROM orders
WHERE order_date >= '2025-01-01';
-- Result: 127.45
-- Average product price by category
SELECT AVG(price) AS avg_price
FROM products;
-- Result: 45.67
-- Average with DISTINCT (unique values only)
SELECT AVG(DISTINCT price) AS avg_unique_price
FROM products;
-- Result: 52.30 (excludes duplicate prices)
MIN() and MAX() Functions
MIN finds the smallest value, MAX finds the largest. They work with numbers, dates, and even text (alphabetically). These functions are perfect for finding extremes, ranges, and boundaries in your data:
-- Find price range in products
SELECT
MIN(price) AS lowest_price,
MAX(price) AS highest_price,
MAX(price) - MIN(price) AS price_range
FROM products;
-- Result: lowest=5.99, highest=299.99, range=294.00
-- Find date range of orders
SELECT
MIN(order_date) AS first_order,
MAX(order_date) AS latest_order
FROM orders;
-- Result: first_order=2024-01-15, latest_order=2025-12-28
-- Alphabetically first and last customer names
SELECT
MIN(name) AS first_alphabetically,
MAX(name) AS last_alphabetically
FROM customers;
-- Result: first=Aaron Adams, last=Zoe Zhang
Statistical Measures
AVG gives you the central tendency (mean), while MIN and MAX show the range extremes. Together, they provide a complete picture: AVG tells you the typical value, MIN/MAX show the spread. A small range suggests consistency; a large range indicates high variability.
Why it matters: In pricing analysis, knowing the average price ($50) plus the range ($10-$200) helps you understand your product mix better than average alone. High variability might indicate diverse product lines or pricing inconsistencies.
Combining All Statistical Aggregates
For comprehensive analysis, use multiple aggregate functions together to get the complete statistical picture:
-- Complete statistical summary of order values
SELECT
COUNT(*) AS total_orders,
SUM(order_total) AS total_revenue,
AVG(order_total) AS average_order,
MIN(order_total) AS smallest_order,
MAX(order_total) AS largest_order,
MAX(order_total) - MIN(order_total) AS order_range
FROM orders
WHERE status = 'completed';
-- Result:
-- total_orders: 4,850
-- total_revenue: 487,920.00
-- average_order: 100.60
-- smallest_order: 5.99
-- largest_order: 1,299.00
-- order_range: 1,293.01
Practical Use Case: Product Performance Analysis
Let's analyze product performance using all five aggregate functions together:
-- Product sales performance summary
SELECT
COUNT(DISTINCT product_id) AS total_products_sold,
SUM(quantity) AS total_units_sold,
SUM(price * quantity) AS total_revenue,
AVG(price) AS avg_selling_price,
MIN(price) AS lowest_price_sold,
MAX(price) AS highest_price_sold,
AVG(quantity) AS avg_quantity_per_order,
MAX(quantity) AS largest_single_order
FROM order_items
WHERE order_date >= '2025-01-01';
-- Result shows:
-- - How many different products were sold
-- - Total volume and revenue
-- - Pricing statistics
-- - Order size patterns
Finding Records with MIN/MAX Values
A common challenge: MIN/MAX give you the value, but how do you find the actual record? Use a subquery:
-- Find the most expensive product (name and price)
SELECT product_name, price
FROM products
WHERE price = (SELECT MAX(price) FROM products);
-- Result: Laptop Pro X1, 1299.99
-- Find customer with the most orders
SELECT
customer_id,
name,
COUNT(*) AS order_count
FROM orders
JOIN customers USING (customer_id)
GROUP BY customer_id, name
HAVING COUNT(*) = (
SELECT MAX(order_count)
FROM (
SELECT customer_id, COUNT(*) AS order_count
FROM orders
GROUP BY customer_id
) AS counts
);
Practice Questions: AVG, MIN & MAX
Test your understanding with these statistical analysis challenges.
Task: Write a query to find the average price, minimum price, maximum price, and price range for all products.
Show Solution
SELECT
AVG(price) AS average_price,
MIN(price) AS lowest_price,
MAX(price) AS highest_price,
MAX(price) - MIN(price) AS price_range
FROM products;
-- Example output:
-- average_price: 67.89
-- lowest_price: 5.99
-- highest_price: 299.99
-- price_range: 294.00
Task: Identify which customer has spent the most money total across all their orders. Show customer name and total spent.
Show Solution
-- Step 1: Calculate total per customer with subquery
SELECT
c.customer_id,
c.name,
SUM(o.order_total) AS total_spent
FROM customers c
JOIN orders o ON c.customer_id = o.customer_id
GROUP BY c.customer_id, c.name
HAVING SUM(o.order_total) = (
SELECT MAX(customer_total)
FROM (
SELECT customer_id, SUM(order_total) AS customer_total
FROM orders
GROUP BY customer_id
) AS totals
);
-- Example output:
-- customer_id: 1847
-- name: Jennifer Martinez
-- total_spent: 12,487.50
Task: Show product name, price, category average price, and calculate how much above/below the category average each product is.
Show Solution
SELECT
p.product_name,
p.price,
p.category,
AVG(p2.price) AS category_avg_price,
ROUND(p.price - AVG(p2.price), 2) AS diff_from_avg,
ROUND((p.price - AVG(p2.price)) / AVG(p2.price) * 100, 1) AS percent_diff
FROM products p
JOIN products p2 ON p.category = p2.category
GROUP BY p.product_id, p.product_name, p.price, p.category
ORDER BY percent_diff DESC
LIMIT 10;
-- Shows which products are priced highest above their category average
GROUP BY Clause
GROUP BY is where aggregate functions truly shine. Instead of summarizing all data into one row, GROUP BY segments your data into categories and calculates aggregates for each group separately. This is the foundation of dimensional analysis and business reporting.
Understanding GROUP BY
Think of GROUP BY as creating buckets: it divides rows into groups based on column values, then applies aggregate functions to each group independently. Each unique value (or combination of values) becomes a separate group with its own aggregate result.
-- Without GROUP BY: one total for everything
SELECT SUM(order_total) AS total_revenue
FROM orders;
-- Result: 487,920.00 (single row)
-- With GROUP BY: totals per category
SELECT
category,
SUM(order_total) AS total_revenue
FROM orders
JOIN order_items USING (order_id)
JOIN products USING (product_id)
GROUP BY category;
-- Results: multiple rows, one per category
-- Electronics | 245,890.00
-- Clothing | 128,450.00
-- Home & Garden| 87,320.00
-- Books | 26,260.00
GROUP BY Rule
Every column in the SELECT clause must either be in the GROUP BY clause OR be inside an aggregate function. This rule ensures each group produces exactly one output row. Non-aggregated columns must be used for grouping, otherwise SQL doesn't know which value to display.
Why it matters: Violating this rule causes errors in most databases. If you SELECT category and product_name but only GROUP BY category, SQL doesn't know which product_name to show for each category (there are many). Solution: either add product_name to GROUP BY, or aggregate it (e.g., COUNT(DISTINCT product_name)).
Common GROUP BY Patterns
Let's explore the most common grouping scenarios you'll encounter in business analytics:
-- 1. Group by single column: Sales by category
SELECT
category,
COUNT(*) AS product_count,
AVG(price) AS avg_price,
SUM(quantity_sold) AS total_units
FROM products
GROUP BY category
ORDER BY total_units DESC;
-- 2. Group by date: Daily sales trend
SELECT
order_date,
COUNT(*) AS daily_orders,
SUM(order_total) AS daily_revenue
FROM orders
WHERE order_date >= '2025-01-01'
GROUP BY order_date
ORDER BY order_date;
-- 3. Group by multiple columns: Sales by region and category
SELECT
region,
category,
COUNT(*) AS order_count,
SUM(order_total) AS revenue
FROM orders
JOIN customers USING (customer_id)
JOIN order_items USING (order_id)
JOIN products USING (product_id)
GROUP BY region, category
ORDER BY region, revenue DESC;
Grouping by Multiple Columns
When you GROUP BY multiple columns, each unique combination becomes a separate group. This is powerful for multi-dimensional analysis:
-- Sales by year and month
SELECT
YEAR(order_date) AS year,
MONTH(order_date) AS month,
COUNT(*) AS orders,
SUM(order_total) AS revenue,
AVG(order_total) AS avg_order_value
FROM orders
GROUP BY YEAR(order_date), MONTH(order_date)
ORDER BY year, month;
-- Result shows trends over time:
-- 2024 | 1 | 245 | 24,567.00 | 100.27
-- 2024 | 2 | 287 | 28,940.00 | 100.84
-- 2024 | 3 | 312 | 31,450.00 | 100.80
-- ... and so on
Practical Use Case: Customer Segmentation
Let's use GROUP BY to segment customers based on their purchasing behavior—a common analytics task:
-- Customer segmentation by order frequency and value
SELECT
CASE
WHEN order_count >= 10 THEN 'Frequent Buyer'
WHEN order_count >= 5 THEN 'Regular Customer'
WHEN order_count >= 2 THEN 'Occasional Buyer'
ELSE 'One-Time Customer'
END AS customer_segment,
COUNT(*) AS customers_in_segment,
AVG(total_spent) AS avg_lifetime_value,
SUM(total_spent) AS segment_revenue
FROM (
SELECT
customer_id,
COUNT(*) AS order_count,
SUM(order_total) AS total_spent
FROM orders
GROUP BY customer_id
) AS customer_stats
GROUP BY
CASE
WHEN order_count >= 10 THEN 'Frequent Buyer'
WHEN order_count >= 5 THEN 'Regular Customer'
WHEN order_count >= 2 THEN 'Occasional Buyer'
ELSE 'One-Time Customer'
END
ORDER BY avg_lifetime_value DESC;
-- Result shows value of each customer segment
-- Frequent Buyer | 487 | 2,847.50 | 1,386,742.50
-- Regular Customer | 1,205 | 892.30 | 1,075,221.50
-- Occasional Buyer | 2,847 | 284.60 | 810,170.20
-- One-Time Customer | 5,461 | 67.80 | 370,257.80
WHERE vs HAVING: Filtering Before vs After Grouping
A critical distinction: WHERE filters rows BEFORE grouping, HAVING filters groups AFTER aggregation. Understanding this difference is essential for writing correct queries:
| Clause | When It Executes | What It Filters |
|---|---|---|
| WHERE | Before grouping and aggregation | Filters individual rows. Cannot use aggregate functions (no SUM, AVG, COUNT in WHERE). |
| HAVING | After grouping and aggregation | Filters groups based on aggregate results. Can use aggregate functions (SUM, AVG, COUNT, etc.). |
-- WHERE filters rows before grouping
SELECT category, COUNT(*) AS product_count
FROM products
WHERE price > 50 -- Filter: only expensive products
GROUP BY category;
-- Result: count of expensive products per category
-- HAVING filters groups after aggregation
SELECT category, COUNT(*) AS product_count
FROM products
GROUP BY category
HAVING COUNT(*) > 10; -- Filter: only categories with many products
-- Result: only categories that have more than 10 products
-- Using both together
SELECT
category,
COUNT(*) AS product_count,
AVG(price) AS avg_price
FROM products
WHERE in_stock = true -- First: filter to in-stock products
GROUP BY category
HAVING COUNT(*) >= 5 -- Then: only categories with 5+ in-stock items
ORDER BY avg_price DESC;
Practice Questions: GROUP BY
Test your grouping and segmentation skills.
Task: Group products by category and show: category name, product count, average price, and lowest/highest prices.
Show Solution
SELECT
category,
COUNT(*) AS product_count,
ROUND(AVG(price), 2) AS avg_price,
MIN(price) AS lowest_price,
MAX(price) AS highest_price
FROM products
GROUP BY category
ORDER BY product_count DESC;
-- Example output:
-- Electronics | 487 | 125.67 | 9.99 | 1299.99
-- Clothing | 892 | 45.30 | 5.99 | 199.99
-- Home & Garden | 245 | 67.89 | 12.50 | 499.00
Task: Group orders by month and calculate: month number, order count, total revenue, and average order value for 2025.
Show Solution
SELECT
MONTH(order_date) AS month,
COUNT(*) AS order_count,
SUM(order_total) AS monthly_revenue,
ROUND(AVG(order_total), 2) AS avg_order_value
FROM orders
WHERE YEAR(order_date) = 2025
GROUP BY MONTH(order_date)
ORDER BY month;
-- Shows monthly trends:
-- 1 | 245 | 24,567.89 | 100.28
-- 2 | 287 | 28,940.50 | 100.84
-- 3 | 312 | 31,450.00 | 100.80
Task: For each region, show the top 5 customers by total spending. Include region, customer name, order count, and total spent.
Show Solution
-- Using window functions (modern SQL)
WITH customer_spending AS (
SELECT
c.region,
c.customer_id,
c.name,
COUNT(*) AS order_count,
SUM(o.order_total) AS total_spent,
ROW_NUMBER() OVER (PARTITION BY c.region ORDER BY SUM(o.order_total) DESC) AS rank
FROM customers c
JOIN orders o ON c.customer_id = o.customer_id
GROUP BY c.region, c.customer_id, c.name
)
SELECT
region,
name,
order_count,
total_spent
FROM customer_spending
WHERE rank <= 5
ORDER BY region, rank;
-- Shows top 5 spenders per region
HAVING Clause
While WHERE filters individual rows before grouping, HAVING filters entire groups after aggregation. This powerful clause lets you answer questions like "Which categories have more than 100 products?" or "Show only customers who've spent over $1,000 total."
Understanding HAVING
HAVING works exclusively with GROUP BY to filter groups based on aggregate conditions. Think of it as WHERE for aggregated data—but with a crucial difference in timing and capability:
-- Show categories with more than 50 products
SELECT
category,
COUNT(*) AS product_count
FROM products
GROUP BY category
HAVING COUNT(*) > 50
ORDER BY product_count DESC;
-- Result: Only categories with 50+ products
-- Electronics | 487
-- Clothing | 892
-- Home & Garden | 245
-- (categories with <50 products are excluded)
WHERE vs HAVING
WHERE filters rows BEFORE grouping (can't use aggregates). HAVING filters groups AFTER aggregation (can use SUM, AVG, COUNT, etc.). WHERE reduces data volume early; HAVING filters final results. Use both together for maximum efficiency and flexibility.
Why it matters: Understanding this distinction prevents errors and optimizes performance. Filter with WHERE when possible (more efficient). Use HAVING only when filtering requires aggregate calculations that don't exist until after grouping.
Common HAVING Patterns
Let's explore typical HAVING use cases in business analytics:
-- 1. Filter by COUNT: High-volume customers
SELECT
customer_id,
COUNT(*) AS order_count,
SUM(order_total) AS total_spent
FROM orders
GROUP BY customer_id
HAVING COUNT(*) >= 5 -- Only customers with 5+ orders
ORDER BY total_spent DESC;
-- 2. Filter by SUM: High-revenue products
SELECT
product_id,
product_name,
SUM(quantity) AS total_sold,
SUM(price * quantity) AS total_revenue
FROM order_items
JOIN products USING (product_id)
GROUP BY product_id, product_name
HAVING SUM(price * quantity) > 10000 -- Only products with $10k+ revenue
ORDER BY total_revenue DESC;
-- 3. Filter by AVG: Above-average performers
SELECT
region,
COUNT(*) AS customer_count,
AVG(order_total) AS avg_order_value
FROM orders
JOIN customers USING (customer_id)
GROUP BY region
HAVING AVG(order_total) > 100 -- Only regions with $100+ average order
ORDER BY avg_order_value DESC;
Combining WHERE and HAVING
The most powerful queries use both WHERE and HAVING together—WHERE filters rows early, HAVING filters aggregated results:
-- Find active categories with high average prices
SELECT
category,
COUNT(*) AS product_count,
AVG(price) AS avg_price,
SUM(quantity_in_stock) AS total_inventory
FROM products
WHERE
status = 'active' -- WHERE: Filter rows first
AND quantity_in_stock > 0 -- (more efficient)
GROUP BY category
HAVING
COUNT(*) >= 10 -- HAVING: Filter groups
AND AVG(price) > 50 -- (after aggregation)
ORDER BY avg_price DESC;
-- Execution order:
-- 1. WHERE filters to active, in-stock products
-- 2. GROUP BY creates category groups
-- 3. Aggregates (COUNT, AVG, SUM) are calculated
-- 4. HAVING filters to categories meeting conditions
-- 5. ORDER BY sorts final results
Multiple Conditions in HAVING
Just like WHERE, you can combine multiple conditions in HAVING using AND/OR:
-- Find valuable customer segments: frequent AND high-spending
SELECT
customer_id,
customer_name,
COUNT(*) AS order_count,
SUM(order_total) AS total_spent,
AVG(order_total) AS avg_order_value
FROM orders
JOIN customers USING (customer_id)
WHERE order_date >= '2025-01-01' -- This year only
GROUP BY customer_id, customer_name
HAVING
COUNT(*) >= 3 -- At least 3 orders
AND SUM(order_total) > 500 -- Total spending over $500
ORDER BY total_spent DESC;
-- Find problem categories: many products but low sales
SELECT
category,
COUNT(*) AS product_count,
SUM(quantity_sold) AS total_sold,
AVG(quantity_sold) AS avg_sold_per_product
FROM products
GROUP BY category
HAVING
COUNT(*) >= 20 -- Many products
AND AVG(quantity_sold) < 10 -- But low average sales
ORDER BY avg_sold_per_product;
-- Identifies categories that may need marketing attention
HAVING with Subqueries
Advanced technique: Use subqueries in HAVING to compare groups against overall statistics:
-- Find categories performing above average
SELECT
category,
COUNT(*) AS product_count,
AVG(price) AS avg_price
FROM products
GROUP BY category
HAVING AVG(price) > (
SELECT AVG(price) FROM products -- Overall average
)
ORDER BY avg_price DESC;
-- Find high-performing sales regions
SELECT
region,
SUM(order_total) AS region_revenue
FROM orders
JOIN customers USING (customer_id)
GROUP BY region
HAVING SUM(order_total) > (
SELECT AVG(region_total)
FROM (
SELECT region, SUM(order_total) AS region_total
FROM orders
JOIN customers USING (customer_id)
GROUP BY region
) AS region_stats
)
ORDER BY region_revenue DESC;
Practice Questions: HAVING Clause
Test your group filtering skills.
Task: Show categories that have more than 100 products, ordered by product count descending.
Show Solution
SELECT
category,
COUNT(*) AS product_count
FROM products
GROUP BY category
HAVING COUNT(*) > 100
ORDER BY product_count DESC;
-- Example output:
-- Clothing | 892
-- Electronics | 487
-- Home & Garden | 245
Task: Find customers who have placed at least 5 orders AND spent at least $1,000 total. Show customer name, order count, and total spent.
Show Solution
SELECT
c.customer_id,
c.name,
COUNT(*) AS order_count,
SUM(o.order_total) AS total_spent,
ROUND(AVG(o.order_total), 2) AS avg_order_value
FROM customers c
JOIN orders o ON c.customer_id = o.customer_id
GROUP BY c.customer_id, c.name
HAVING
COUNT(*) >= 5
AND SUM(o.order_total) >= 1000
ORDER BY total_spent DESC;
-- Shows VIP customers qualifying for both criteria
Task: Find products that are currently in stock (quantity > 0) but have sold fewer than 10 units total. Use WHERE for stock filter, HAVING for sales filter.
Show Solution
SELECT
p.product_id,
p.product_name,
p.category,
p.quantity_in_stock,
COALESCE(SUM(oi.quantity), 0) AS total_sold
FROM products p
LEFT JOIN order_items oi ON p.product_id = oi.product_id
WHERE p.quantity_in_stock > 0 -- In stock
GROUP BY p.product_id, p.product_name, p.category, p.quantity_in_stock
HAVING COALESCE(SUM(oi.quantity), 0) < 10 -- Low sales
ORDER BY total_sold, quantity_in_stock DESC;
-- Identifies products that may need price adjustment or promotion
Advanced Aggregations
Master-level aggregation techniques combine multiple concepts—nested queries, conditional logic, joins with aggregates, and window functions. These advanced patterns solve complex business problems that require sophisticated data analysis.
Nested Aggregations with Subqueries
Sometimes you need to aggregate aggregated data—like finding the average of group totals. This requires subqueries to create intermediate results:
-- Average revenue per category (aggregate of aggregates)
SELECT AVG(category_revenue) AS avg_category_revenue
FROM (
SELECT
category,
SUM(order_total) AS category_revenue
FROM orders
JOIN order_items USING (order_id)
JOIN products USING (product_id)
GROUP BY category
) AS category_totals;
-- Result: 121,980.00 (average across all category totals)
-- Find customers who spent more than their region's average
SELECT
c.customer_id,
c.name,
c.region,
SUM(o.order_total) AS total_spent,
(SELECT AVG(customer_total)
FROM (
SELECT customer_id, SUM(order_total) AS customer_total
FROM orders o2
JOIN customers c2 ON o2.customer_id = c2.customer_id
WHERE c2.region = c.region
GROUP BY customer_id
) AS region_totals
) AS region_avg
FROM customers c
JOIN orders o ON c.customer_id = o.customer_id
GROUP BY c.customer_id, c.name, c.region
HAVING SUM(o.order_total) > (
SELECT AVG(customer_total)
FROM (
SELECT customer_id, SUM(order_total) AS customer_total
FROM orders o2
JOIN customers c2 ON o2.customer_id = c2.customer_id
WHERE c2.region = c.region
GROUP BY customer_id
) AS region_totals
)
ORDER BY region, total_spent DESC;
Correlated Subqueries
A correlated subquery references columns from the outer query, executing once per row. While powerful for row-by-row comparisons against aggregates, they can be slow on large datasets. Modern alternatives include window functions or CTEs (Common Table Expressions).
Why it matters: Correlated subqueries enable sophisticated comparisons—like "customers above their region's average" or "products priced higher than their category median"—but optimization is crucial. Consider window functions for better performance on large tables.
Conditional Aggregation with CASE
CASE statements inside aggregates enable pivot-like reporting and conditional calculations in a single query:
-- Revenue breakdown by product category in columns
SELECT
YEAR(order_date) AS year,
MONTH(order_date) AS month,
SUM(CASE WHEN category = 'Electronics' THEN order_total ELSE 0 END) AS electronics_revenue,
SUM(CASE WHEN category = 'Clothing' THEN order_total ELSE 0 END) AS clothing_revenue,
SUM(CASE WHEN category = 'Books' THEN order_total ELSE 0 END) AS books_revenue,
SUM(order_total) AS total_revenue
FROM orders
JOIN order_items USING (order_id)
JOIN products USING (product_id)
GROUP BY YEAR(order_date), MONTH(order_date)
ORDER BY year, month;
-- Customer purchase frequency segmentation
SELECT
COUNT(*) AS total_customers,
SUM(CASE WHEN order_count = 1 THEN 1 ELSE 0 END) AS one_time_buyers,
SUM(CASE WHEN order_count BETWEEN 2 AND 4 THEN 1 ELSE 0 END) AS occasional,
SUM(CASE WHEN order_count BETWEEN 5 AND 9 THEN 1 ELSE 0 END) AS regular,
SUM(CASE WHEN order_count >= 10 THEN 1 ELSE 0 END) AS frequent,
ROUND(AVG(total_spent), 2) AS avg_lifetime_value
FROM (
SELECT
customer_id,
COUNT(*) AS order_count,
SUM(order_total) AS total_spent
FROM orders
GROUP BY customer_id
) AS customer_stats;
Combining Joins with Aggregations
Real-world analytics often requires joining multiple tables before aggregating. Understanding join order and filtering is critical for correct results:
-- Sales performance by employee and region
SELECT
e.employee_id,
e.name AS employee_name,
e.region,
COUNT(DISTINCT o.order_id) AS orders_processed,
COUNT(DISTINCT o.customer_id) AS unique_customers,
SUM(oi.quantity) AS total_items_sold,
SUM(oi.price * oi.quantity) AS total_revenue,
ROUND(AVG(o.order_total), 2) AS avg_order_value
FROM employees e
LEFT JOIN orders o ON e.employee_id = o.sales_rep_id
LEFT JOIN order_items oi ON o.order_id = oi.order_id
WHERE e.department = 'Sales'
AND o.order_date >= '2025-01-01'
GROUP BY e.employee_id, e.name, e.region
HAVING COUNT(DISTINCT o.order_id) > 0 -- Only employees with orders
ORDER BY total_revenue DESC;
-- Product popularity by customer segment
SELECT
p.product_name,
p.category,
COUNT(DISTINCT CASE WHEN c.age < 25 THEN o.customer_id END) AS young_buyers,
COUNT(DISTINCT CASE WHEN c.age BETWEEN 25 AND 45 THEN o.customer_id END) AS middle_buyers,
COUNT(DISTINCT CASE WHEN c.age > 45 THEN o.customer_id END) AS senior_buyers,
SUM(oi.quantity) AS total_sold,
ROUND(AVG(oi.price), 2) AS avg_selling_price
FROM products p
JOIN order_items oi ON p.product_id = oi.product_id
JOIN orders o ON oi.order_id = o.order_id
JOIN customers c ON o.customer_id = c.customer_id
GROUP BY p.product_id, p.product_name, p.category
HAVING SUM(oi.quantity) >= 100 -- Popular products only
ORDER BY total_sold DESC;
Window Functions for Advanced Analytics
Window functions (also called analytic functions) perform calculations across row sets without collapsing them into groups. They're perfect for rankings, running totals, and comparisons:
-- Running total of daily revenue
SELECT
order_date,
SUM(order_total) AS daily_revenue,
SUM(SUM(order_total)) OVER (ORDER BY order_date) AS running_total,
AVG(SUM(order_total)) OVER (
ORDER BY order_date
ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
) AS moving_avg_7day
FROM orders
WHERE order_date >= '2025-01-01'
GROUP BY order_date
ORDER BY order_date;
-- Rank products by revenue within each category
SELECT
product_id,
product_name,
category,
SUM(price * quantity) AS total_revenue,
RANK() OVER (
PARTITION BY category
ORDER BY SUM(price * quantity) DESC
) AS category_rank,
SUM(SUM(price * quantity)) OVER (
PARTITION BY category
) AS category_total_revenue,
ROUND(
SUM(price * quantity) * 100.0 / SUM(SUM(price * quantity)) OVER (PARTITION BY category),
2
) AS percent_of_category
FROM order_items
JOIN products USING (product_id)
GROUP BY product_id, product_name, category
ORDER BY category, category_rank;
Common Table Expressions (CTEs)
CTEs provide a cleaner, more maintainable way to write complex aggregation queries:
-- Multi-step analysis using CTEs
WITH
-- Step 1: Calculate monthly totals per customer
monthly_customer_sales AS (
SELECT
customer_id,
YEAR(order_date) AS year,
MONTH(order_date) AS month,
COUNT(*) AS order_count,
SUM(order_total) AS monthly_total
FROM orders
GROUP BY customer_id, YEAR(order_date), MONTH(order_date)
),
-- Step 2: Find customer lifetime totals
customer_lifetime AS (
SELECT
customer_id,
SUM(monthly_total) AS lifetime_value,
COUNT(DISTINCT CONCAT(year, '-', month)) AS active_months
FROM monthly_customer_sales
GROUP BY customer_id
),
-- Step 3: Segment customers
customer_segments AS (
SELECT
customer_id,
lifetime_value,
active_months,
CASE
WHEN lifetime_value >= 5000 THEN 'Premium'
WHEN lifetime_value >= 1000 THEN 'Gold'
WHEN lifetime_value >= 500 THEN 'Silver'
ELSE 'Bronze'
END AS segment
FROM customer_lifetime
)
-- Final: Analyze segments
SELECT
segment,
COUNT(*) AS customer_count,
ROUND(AVG(lifetime_value), 2) AS avg_lifetime_value,
ROUND(SUM(lifetime_value), 2) AS segment_total_revenue,
ROUND(AVG(active_months), 1) AS avg_active_months
FROM customer_segments
GROUP BY segment
ORDER BY avg_lifetime_value DESC;
Practice Questions: Advanced Aggregations
Test your mastery of complex aggregation techniques.
Task: Show monthly revenue with separate columns for each product category using conditional aggregation.
Show Solution
SELECT
YEAR(o.order_date) AS year,
MONTH(o.order_date) AS month,
SUM(CASE WHEN p.category = 'Electronics'
THEN oi.price * oi.quantity ELSE 0 END) AS electronics,
SUM(CASE WHEN p.category = 'Clothing'
THEN oi.price * oi.quantity ELSE 0 END) AS clothing,
SUM(CASE WHEN p.category = 'Books'
THEN oi.price * oi.quantity ELSE 0 END) AS books,
SUM(oi.price * oi.quantity) AS total_revenue
FROM orders o
JOIN order_items oi ON o.order_id = oi.order_id
JOIN products p ON oi.product_id = p.product_id
WHERE o.order_date >= '2025-01-01'
GROUP BY YEAR(o.order_date), MONTH(o.order_date)
ORDER BY year, month;
Task: Show products where total sales exceed their category's average sales, including both the product and category averages.
Show Solution
WITH product_sales AS (
SELECT
p.product_id,
p.product_name,
p.category,
SUM(oi.quantity) AS total_sold,
SUM(oi.price * oi.quantity) AS total_revenue
FROM products p
LEFT JOIN order_items oi ON p.product_id = oi.product_id
GROUP BY p.product_id, p.product_name, p.category
),
category_averages AS (
SELECT
category,
AVG(total_revenue) AS avg_category_revenue
FROM product_sales
GROUP BY category
)
SELECT
ps.product_name,
ps.category,
ps.total_revenue,
ROUND(ca.avg_category_revenue, 2) AS category_avg,
ROUND(ps.total_revenue - ca.avg_category_revenue, 2) AS diff_from_avg
FROM product_sales ps
JOIN category_averages ca ON ps.category = ca.category
WHERE ps.total_revenue > ca.avg_category_revenue
ORDER BY diff_from_avg DESC;
Task: Calculate what percentage of customers who made their first purchase each month came back to purchase again.
Show Solution
WITH first_purchase AS (
SELECT
customer_id,
MIN(order_date) AS first_order_date
FROM orders
GROUP BY customer_id
),
first_purchase_cohort AS (
SELECT
DATE_FORMAT(first_order_date, '%Y-%m') AS cohort_month,
customer_id
FROM first_purchase
),
repeat_customers AS (
SELECT
fp.customer_id,
fpc.cohort_month
FROM first_purchase fp
JOIN first_purchase_cohort fpc ON fp.customer_id = fpc.customer_id
JOIN orders o ON fp.customer_id = o.customer_id
WHERE o.order_date > fp.first_order_date
GROUP BY fp.customer_id, fpc.cohort_month
)
SELECT
fpc.cohort_month,
COUNT(DISTINCT fpc.customer_id) AS new_customers,
COUNT(DISTINCT rc.customer_id) AS repeat_customers,
ROUND(
COUNT(DISTINCT rc.customer_id) * 100.0 / COUNT(DISTINCT fpc.customer_id),
2
) AS repeat_rate_percent
FROM first_purchase_cohort fpc
LEFT JOIN repeat_customers rc ON fpc.customer_id = rc.customer_id
GROUP BY fpc.cohort_month
ORDER BY fpc.cohort_month;
Key Takeaways
Aggregate Functions
COUNT, SUM, AVG, MIN, MAX perform calculations on sets of rows, returning single values for analysis and reporting.
GROUP BY Power
GROUP BY divides data into groups, applying aggregate functions to each group separately for category analysis and segmentation.
HAVING vs WHERE
WHERE filters individual rows before grouping; HAVING filters groups after aggregation is complete. Know when to use each.
Business Insights
Aggregations transform raw data into actionable metrics: sales totals, customer counts, average values, and trends over time.
Joins + Aggregations
Combining joins with aggregations enables complex multi-table analytics for comprehensive business reporting and insights.
NULL Handling
Most aggregate functions ignore NULLs; COUNT(*) counts all rows, COUNT(column) excludes NULLs. This matters for accuracy.
Knowledge Check
Test your understanding of SQL aggregation concepts with these questions.