What is a correlated subquery?
A correlated subquery is a subquery that depends on the outer query for its values and is, conceptually, evaluated once for each row processed by the outer query. Unlike a non-correlated subquery, it cannot be run on its own, because it refers to columns that exist only in the outer query.
Definition
In a correlated subquery, the inner query references one or more columns from a table in the outer query. Because of this dependency, the subquery is re-evaluated for every row the outer query processes, so its execution is tied to the outer query's processing.
How it Works
- The outer query starts processing its rows.
- For each row selected by the outer query, the correlated subquery is executed.
- The subquery uses a value from the current row of the outer query in its WHERE clause or other conditions.
- The result of the subquery is then used by the outer query to filter or select the current row.
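The steps above can be sketched as a plain nested loop. This is only an illustration of the logical evaluation order, not how any particular database engine implements it; the sample rows and the helper names `department_average` and `above_department_average` are hypothetical, and the scenario (employees paid above their department's average) is chosen for illustration.

```python
# Hypothetical sample rows: (employee_name, salary, department_id).
employees = [
    ("Ana", 5000, 10),
    ("Ben", 3000, 10),
    ("Cy", 4000, 20),
    ("Dee", 6000, 20),
    ("Eli", 7000, 30),
]


def department_average(rows, department_id):
    """Inner query: average salary over rows in the given department."""
    salaries = [salary for _, salary, dept in rows if dept == department_id]
    return sum(salaries) / len(salaries)


def above_department_average(rows):
    """Outer query: keep a row only if it beats its own department's average."""
    selected = []
    for name, salary, dept in rows:
        # The inner query is re-evaluated with a value from the current outer row.
        avg = department_average(rows, dept)
        # The outer query uses the inner result to filter the current row.
        if salary > avg:
            selected.append(name)
    return selected


result = above_department_average(employees)
print(result)
```

The inner function is called once per outer row, which is where the cost on large inputs comes from when the engine cannot find a cheaper plan.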
Example
Suppose you want to find all employees whose salary is greater than the average salary of their respective department.
SELECT E1.employee_name, E1.salary, E1.department_id
FROM Employees E1
WHERE E1.salary > (
    SELECT AVG(E2.salary)
    FROM Employees E2
    WHERE E2.department_id = E1.department_id
);
In this example, the subquery (SELECT AVG(E2.salary) FROM Employees E2 WHERE E2.department_id = E1.department_id) is correlated because it refers to E1.department_id from the outer query (aliased as E1). For each employee (E1) the outer query considers, the inner query calculates the average salary for *that specific employee's department*.
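To try the query end to end, here is a minimal sketch using Python's built-in sqlite3 module with an in-memory database. The column types and the five sample rows are assumptions made up for illustration, and an ORDER BY is added only to make the output order predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical schema and sample data for the Employees table.
conn.executescript("""
    CREATE TABLE Employees (
        employee_name TEXT,
        salary        INTEGER,
        department_id INTEGER
    );
    INSERT INTO Employees VALUES
        ('Ana', 5000, 10),
        ('Ben', 3000, 10),
        ('Cy',  4000, 20),
        ('Dee', 6000, 20),
        ('Eli', 7000, 30);
""")

# The correlated subquery: E1.department_id comes from the outer query.
query = """
    SELECT E1.employee_name, E1.salary, E1.department_id
    FROM Employees E1
    WHERE E1.salary > (
        SELECT AVG(E2.salary)
        FROM Employees E2
        WHERE E2.department_id = E1.department_id
    )
    ORDER BY E1.employee_name;
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

With this data, Ana and Dee are returned. Eli is not, because an employee who is alone in a department earns exactly the department average, and the comparison is strictly greater than.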
Characteristics
- Row-by-Row Execution: Logically evaluated once for each row processed by the outer query. Unless the query optimizer can rewrite it (for example, as a join), this can hurt performance on large datasets.
- Dependency: Explicitly references columns from the outer query.
- Versatility: Useful for complex row-level comparisons that are difficult to express with simple joins.
- Keywords: Often used with EXISTS, NOT EXISTS, comparison operators (=, >, <), or aggregate functions in the subquery's SELECT clause.
When to Use
- To find records that have a specific relationship with other records within the same table (self-referencing logic).
- When an aggregate function needs to be computed for each group defined by the outer query's current row.
- For existence checks (e.g., finding customers who have placed at least one order).
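The existence check mentioned above might look like the following sketch, again run through Python's sqlite3 module. The Customers and Orders tables, their columns, and the sample rows are hypothetical; a NOT EXISTS variant is included for contrast.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical schema: Ben (customer 2) has no orders.
conn.executescript("""
    CREATE TABLE Customers (customer_id INTEGER PRIMARY KEY, customer_name TEXT);
    CREATE TABLE Orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER);
    INSERT INTO Customers VALUES (1, 'Ana'), (2, 'Ben'), (3, 'Cy');
    INSERT INTO Orders VALUES (100, 1), (101, 1), (102, 3);
""")

# Customers with at least one order: the subquery references C.customer_id.
with_orders = conn.execute("""
    SELECT C.customer_name
    FROM Customers C
    WHERE EXISTS (
        SELECT 1 FROM Orders O WHERE O.customer_id = C.customer_id
    )
    ORDER BY C.customer_name;
""").fetchall()

# Customers with no orders at all: same correlation, negated.
without_orders = conn.execute("""
    SELECT C.customer_name
    FROM Customers C
    WHERE NOT EXISTS (
        SELECT 1 FROM Orders O WHERE O.customer_id = C.customer_id
    )
    ORDER BY C.customer_name;
""").fetchall()

print(with_orders)
print(without_orders)
```

Ana appears only once even though she has two orders, because EXISTS only asks whether a matching row exists; a plain join would return her once per order unless duplicates were removed.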
Alternatives
- JOINs with Derived Tables/CTEs: Often, correlated subqueries can be rewritten using JOIN operations combined with Common Table Expressions (CTEs) or derived tables to pre-calculate values, which can be more efficient.
- Non-Correlated Subqueries: For simpler cases where the inner query's result is independent of the outer query, a non-correlated subquery is more appropriate and generally faster.
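As a sketch of the first alternative, the department-average query can be rewritten with a CTE that computes each department's average once and a JOIN back to the employee rows. The CTE name DeptAvg, the column alias avg_salary, and the sample data are assumptions for illustration; whether the rewrite is actually faster depends on the database and its optimizer, so it is worth checking the execution plan rather than assuming.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical schema and sample data for the Employees table.
conn.executescript("""
    CREATE TABLE Employees (
        employee_name TEXT,
        salary        INTEGER,
        department_id INTEGER
    );
    INSERT INTO Employees VALUES
        ('Ana', 5000, 10),
        ('Ben', 3000, 10),
        ('Cy',  4000, 20),
        ('Dee', 6000, 20),
        ('Eli', 7000, 30);
""")

# Correlated form: the average is looked up per outer row.
correlated_sql = """
    SELECT E1.employee_name, E1.salary, E1.department_id
    FROM Employees E1
    WHERE E1.salary > (
        SELECT AVG(E2.salary)
        FROM Employees E2
        WHERE E2.department_id = E1.department_id
    )
    ORDER BY E1.employee_name;
"""

# JOIN + CTE form: averages are computed once per department, then joined.
joined_sql = """
    WITH DeptAvg AS (
        SELECT department_id, AVG(salary) AS avg_salary
        FROM Employees
        GROUP BY department_id
    )
    SELECT E.employee_name, E.salary, E.department_id
    FROM Employees E
    JOIN DeptAvg D ON D.department_id = E.department_id
    WHERE E.salary > D.avg_salary
    ORDER BY E.employee_name;
"""

correlated_rows = conn.execute(correlated_sql).fetchall()
joined_rows = conn.execute(joined_sql).fetchall()
print(correlated_rows == joined_rows)
```

Both forms return the same rows on this data, so the choice between them comes down to readability and the plan the database produces.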