What are SQL aggregate functions?
SQL aggregate functions perform calculations on a set of rows and return a single value. They are commonly used with the GROUP BY clause to summarize data within groups, but can also be applied to an entire table.
Introduction to Aggregate Functions
Aggregate functions are a fundamental part of SQL. They let you summarize large datasets instead of inspecting them row by row. Rather than returning a value for each individual row, an aggregate function operates on a collection of rows and produces a single result. For example, you can calculate total sales, the average price of products, or the number of employees in a department.
Common SQL Aggregate Functions
Here are some of the most frequently used aggregate functions in SQL:
COUNT()
Counts rows or values. COUNT(*) counts all rows, including rows that contain NULLs. COUNT(column_name) counts only the non-NULL values in that column. COUNT(DISTINCT column_name) counts the unique non-NULL values.
SELECT COUNT(*) FROM Employees;
SELECT COUNT(DISTINCT DepartmentID) FROM Employees;
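The three forms of COUNT differ only in how they treat NULLs and duplicates. The sketch below runs them side by side against an in-memory SQLite database through Python's standard sqlite3 module. The Employees rows are hypothetical sample data, and one employee is deliberately given a NULL DepartmentID.

```python
import sqlite3

# In-memory database with a small, hypothetical Employees table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employees (EmployeeID INTEGER, DepartmentID INTEGER)")
conn.executemany(
    "INSERT INTO Employees VALUES (?, ?)",
    [(1, 3), (2, 3), (3, 5), (4, None)],  # employee 4 has no department (NULL)
)

all_rows, non_null, distinct = conn.execute(
    "SELECT COUNT(*), COUNT(DepartmentID), COUNT(DISTINCT DepartmentID) FROM Employees"
).fetchone()

# COUNT(*) sees all 4 rows; COUNT(DepartmentID) skips the NULL;
# COUNT(DISTINCT ...) keeps only the unique non-NULL values 3 and 5.
print(all_rows, non_null, distinct)  # prints: 4 3 2
```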
SUM()
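Adds up the values in a numeric column, ignoring NULLs. If there are no non-NULL values to add (for example, because no rows match the WHERE clause), SUM returns NULL rather than 0.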
SELECT SUM(Salary) FROM Employees WHERE DepartmentID = 3;
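The following sketch runs the query above against an in-memory SQLite database, using Python's sqlite3 module and hypothetical salary data. It shows that a NULL salary is simply skipped, and that SUM over a set with no values yields NULL (None in Python) instead of 0.

```python
import sqlite3

# Hypothetical Employees data; employee 3 has a NULL salary.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Employees (EmployeeID INTEGER, DepartmentID INTEGER, Salary INTEGER)"
)
conn.executemany(
    "INSERT INTO Employees VALUES (?, ?, ?)",
    [(1, 3, 50000), (2, 3, 70000), (3, 3, None), (4, 5, 65000)],
)

# The NULL salary is skipped; only 50000 + 70000 are added.
(dept3_total,) = conn.execute(
    "SELECT SUM(Salary) FROM Employees WHERE DepartmentID = 3"
).fetchone()

# No row belongs to department 9, so SUM has no input values and returns NULL.
(empty_total,) = conn.execute(
    "SELECT SUM(Salary) FROM Employees WHERE DepartmentID = 9"
).fetchone()

print(dept3_total, empty_total)  # prints: 120000 None
```

If a zero is preferred for an empty set, wrapping the aggregate as COALESCE(SUM(Salary), 0) replaces the NULL with 0.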
AVG()
Calculates the average value of a numeric column. NULL values are ignored entirely: they are excluded both from the total and from the number of rows used as the divisor.
SELECT AVG(Price) FROM Products;
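Because NULLs are dropped from the divisor as well as the total, AVG over a column with missing values differs from averaging with those values treated as zero. This sketch uses SQLite via Python's sqlite3 module, with a hypothetical Products table in which one price is NULL, and compares both approaches.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Products (ProductName TEXT, Price REAL)")
conn.executemany(
    "INSERT INTO Products VALUES (?, ?)",
    [("Pen", 2.0), ("Notebook", 4.0), ("Mystery box", None)],  # hypothetical rows
)

# AVG skips the NULL price: (2.0 + 4.0) / 2 = 3.0
(avg_price,) = conn.execute("SELECT AVG(Price) FROM Products").fetchone()

# Treating the missing price as 0 also changes the divisor: (2.0 + 4.0 + 0) / 3 = 2.0
(avg_with_zero,) = conn.execute(
    "SELECT AVG(COALESCE(Price, 0)) FROM Products"
).fetchone()

print(avg_price, avg_with_zero)  # prints: 3.0 2.0
```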
MIN()
Finds the minimum value in a specified column. This can be used with numeric, string, or date/time data types. It ignores NULL values.
SELECT MIN(OrderDate) FROM Orders;
SELECT MIN(ProductName) FROM Products;
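The sketch below runs both MIN queries against SQLite through Python's sqlite3 module, using hypothetical Orders and Products rows. SQLite has no dedicated date type, so the dates are stored as ISO-8601 text ("YYYY-MM-DD"), which sorts chronologically. String comparison follows the column's collation (SQLite's default binary collation here), so results for mixed-case text may differ in other databases.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Orders (OrderID INTEGER, OrderDate TEXT)")
conn.execute("CREATE TABLE Products (ProductName TEXT)")

# Hypothetical data; ISO-8601 date strings compare in chronological order.
conn.executemany(
    "INSERT INTO Orders VALUES (?, ?)",
    [(1, "2024-03-15"), (2, "2023-11-02"), (3, None)],  # NULL date is ignored
)
conn.executemany(
    "INSERT INTO Products VALUES (?)",
    [("Stapler",), ("Binder",), ("Pen",)],
)

(earliest,) = conn.execute("SELECT MIN(OrderDate) FROM Orders").fetchone()
(first_name,) = conn.execute("SELECT MIN(ProductName) FROM Products").fetchone()

print(earliest, first_name)  # prints: 2023-11-02 Binder
```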
MAX()
Finds the maximum value in a specified column. Similar to MIN(), it works with various data types and ignores NULL values.
SELECT MAX(Salary) FROM Employees;
SELECT MAX(ProductName) FROM Products;
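A quick check of the NULL rule for MAX, sketched with SQLite via Python's sqlite3 module and hypothetical salary data. NULLs are skipped when other values exist, and if every candidate value is NULL, the result itself is NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Employees (EmployeeID INTEGER, DepartmentID INTEGER, Salary INTEGER)"
)
conn.executemany(
    "INSERT INTO Employees VALUES (?, ?, ?)",
    [(1, 3, 50000), (2, 3, None), (3, 5, None)],  # hypothetical rows
)

# NULLs are skipped, so the highest actual salary is returned.
(top_salary,) = conn.execute("SELECT MAX(Salary) FROM Employees").fetchone()

# Department 5 has only a NULL salary, so there is nothing to compare: NULL.
(dept5_top,) = conn.execute(
    "SELECT MAX(Salary) FROM Employees WHERE DepartmentID = 5"
).fetchone()

print(top_salary, dept5_top)  # prints: 50000 None
```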
The GROUP BY Clause
Aggregate functions are most powerful when combined with the GROUP BY clause. GROUP BY collects rows that have the same values in the specified columns into groups, and each aggregate function is then computed separately for each group, producing one summary row per group.
SELECT DepartmentID, AVG(Salary)
FROM Employees
GROUP BY DepartmentID;
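To see the one-row-per-group behavior, the sketch below runs the query above against SQLite via Python's sqlite3 module with hypothetical salaries for two departments. An ORDER BY is added because GROUP BY by itself does not guarantee the order of the output rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Employees (EmployeeID INTEGER, DepartmentID INTEGER, Salary INTEGER)"
)
conn.executemany(
    "INSERT INTO Employees VALUES (?, ?, ?)",
    [(1, 3, 50000), (2, 3, 70000), (3, 5, 40000), (4, 5, 60000), (5, 5, 110000)],
)

# One output row per distinct DepartmentID; AVG is computed inside each group.
rows = conn.execute(
    """
    SELECT DepartmentID, AVG(Salary)
    FROM Employees
    GROUP BY DepartmentID
    ORDER BY DepartmentID
    """
).fetchall()

print(rows)  # prints: [(3, 60000.0), (5, 70000.0)]
```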
The HAVING Clause
While the WHERE clause filters individual rows before they are grouped, the HAVING clause filters the groups themselves, typically using conditions on aggregate values. In the query text, HAVING is written after GROUP BY.
SELECT DepartmentID, COUNT(EmployeeID), AVG(Salary)
FROM Employees
GROUP BY DepartmentID
HAVING COUNT(EmployeeID) > 5 AND AVG(Salary) > 60000;
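The sketch below contrasts the two filtering stages using SQLite through Python's sqlite3 module. The data is hypothetical and chosen so that each department fails or passes the HAVING conditions for a different reason. The first query is the one shown above. The second query moves a salary condition into WHERE, which removes rows before any group is formed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Employees (EmployeeID INTEGER, DepartmentID INTEGER, Salary INTEGER)"
)

# Hypothetical data: dept 1 = 6 people at 65000, dept 2 = 6 people at 50000,
# dept 3 = 2 people at 90000.
data = [(1, 65000)] * 6 + [(2, 50000)] * 6 + [(3, 90000)] * 2
conn.executemany(
    "INSERT INTO Employees VALUES (?, ?, ?)",
    [(i, dept, salary) for i, (dept, salary) in enumerate(data, start=1)],
)

# HAVING keeps only groups with more than 5 employees AND an average above 60000.
groups = conn.execute(
    """
    SELECT DepartmentID, COUNT(EmployeeID), AVG(Salary)
    FROM Employees
    GROUP BY DepartmentID
    HAVING COUNT(EmployeeID) > 5 AND AVG(Salary) > 60000
    """
).fetchall()

# WHERE drops rows before grouping, so department 2 never forms a group here.
filtered = conn.execute(
    """
    SELECT DepartmentID, COUNT(EmployeeID)
    FROM Employees
    WHERE Salary > 55000
    GROUP BY DepartmentID
    ORDER BY DepartmentID
    """
).fetchall()

print(groups)    # prints: [(1, 6, 65000.0)]
print(filtered)  # prints: [(1, 6), (3, 2)]
```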
Key Characteristics and Rules
- An aggregate function returns a single value per group, or a single value for all selected rows when there is no GROUP BY clause.
- They ignore NULL values, except for COUNT(*), which counts rows regardless of whether they contain NULLs.
- They cannot be directly used in the WHERE clause because WHERE filters individual rows before grouping. Use HAVING for filtering grouped results.
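- Any column in the SELECT list that is not inside an aggregate function must also appear in the GROUP BY clause. Most databases reject the query otherwise, although some (SQLite, for example) accept such columns and return a value taken from one of the rows in the group.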
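Two of these rules can be checked directly. The sketch below uses SQLite via Python's sqlite3 module with hypothetical data. An aggregate placed in WHERE is rejected (SQLite reports it as an OperationalError, and other databases raise their own error types), while the same condition written in HAVING works. Without GROUP BY, the aggregates still return exactly one row even when no rows match.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Employees (EmployeeID INTEGER, DepartmentID INTEGER, Salary INTEGER)"
)
conn.executemany(
    "INSERT INTO Employees VALUES (?, ?, ?)",
    [(1, 3, 50000), (2, 3, 70000), (3, 5, 40000)],  # hypothetical rows
)

# An aggregate inside WHERE is rejected, because WHERE is evaluated per row.
try:
    conn.execute(
        "SELECT DepartmentID FROM Employees WHERE COUNT(*) > 1 GROUP BY DepartmentID"
    )
    where_failed = False
except sqlite3.OperationalError:
    where_failed = True

# The same condition expressed with HAVING is accepted.
big_depts = conn.execute(
    "SELECT DepartmentID FROM Employees GROUP BY DepartmentID HAVING COUNT(*) > 1"
).fetchall()

# Without GROUP BY there is always one result row, even if no rows match.
no_match = conn.execute(
    "SELECT COUNT(*), MAX(Salary) FROM Employees WHERE DepartmentID = 99"
).fetchall()

print(where_failed, big_depts, no_match)  # prints: True [(3,)] [(0, None)]
```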