SQL Interview Questions
What is the difference between WHERE and HAVING clause?
In SQL, both the WHERE and HAVING clauses are used to filter data, but they operate at different stages of query processing and on different types of data. Understanding their distinct roles is crucial for writing efficient and correct SQL queries.
The WHERE Clause
The WHERE clause is used to filter individual rows based on specified conditions *before* any grouping occurs. It operates on rows retrieved from the FROM clause and can filter data based on columns that are not aggregated. It cannot directly contain aggregate functions.
SELECT product_name, price
FROM products
WHERE price > 50;
The HAVING Clause
The HAVING clause is used to filter *groups* of rows based on specified conditions *after* the GROUP BY clause has been applied. It typically operates on the results of aggregate functions (like SUM, COUNT, AVG, MAX, MIN) and filters the groups that meet the criteria. If no GROUP BY clause is present, HAVING acts on the entire result set as a single group.
SELECT department, COUNT(employee_id) AS total_employees
FROM employees
GROUP BY department
HAVING COUNT(employee_id) > 5;
Key Differences and Comparison
| Feature | WHERE Clause | HAVING Clause |
|---|---|---|
| Purpose | Filters individual rows | Filters groups of rows |
| Execution Stage | Before GROUP BY | After GROUP BY |
| Applicability | Works on individual rows/columns | Works on aggregate functions/groups |
| Aggregate Functions | Cannot use aggregate functions directly | Can and often does use aggregate functions |
| Data Filtering | Filters data before aggregation | Filters data after aggregation |
| Columns Used | Non-aggregated columns | Aggregated columns (from GROUP BY) or aggregate functions |
Summary
In essence, use WHERE to filter individual records before they are grouped, and use HAVING to filter the results of groups after aggregation. Combining both clauses allows for precise control over data filtering at different stages of a SQL query's execution.
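The two clauses can be combined in one query. The following is a minimal sketch, assuming Python's built-in sqlite3 module as a test harness and a made-up `salary` column and sample rows that are not part of the original examples:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees (employee_id INTEGER PRIMARY KEY, department TEXT, salary REAL)"
)
# Hypothetical sample data, for illustration only.
conn.executemany(
    "INSERT INTO employees (department, salary) VALUES (?, ?)",
    [("Sales", 40000), ("Sales", 60000), ("Sales", 70000),
     ("IT", 80000), ("IT", 30000), ("HR", 55000)],
)

# WHERE removes rows with salary <= 50000 before grouping;
# HAVING then keeps only departments with at least 2 remaining rows.
rows = conn.execute(
    """
    SELECT department, COUNT(employee_id) AS total_employees
    FROM employees
    WHERE salary > 50000
    GROUP BY department
    HAVING COUNT(employee_id) >= 2
    ORDER BY department
    """
).fetchall()
print(rows)
```

Only Sales survives both filters here: IT and HR each have a single row left after the WHERE step, so HAVING discards those groups.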
Explain different types of JOINs in SQL.
SQL JOINs are fundamental operations used to combine rows from two or more tables based on a related column between them. They are essential for retrieving meaningful data from relational databases, allowing you to create a comprehensive view of scattered information.
Introduction to SQL JOINs
In relational databases, data is often distributed across multiple tables to ensure normalization and reduce redundancy. A JOIN clause is used to combine rows from two or more tables, based on a common field between them. The type of JOIN determines which rows are kept from each table when a match is found or not found.
Types of SQL JOINs
INNER JOIN
The INNER JOIN keyword selects all rows from both tables as long as there is a match between the columns in both tables. It returns only the rows where the join condition is met in both tables, effectively discarding rows that do not have a match in the other table.
SELECT orders.order_id, customers.customer_name
FROM orders
INNER JOIN customers ON orders.customer_id = customers.customer_id;
LEFT JOIN (or LEFT OUTER JOIN)
The LEFT JOIN keyword returns all rows from the left table (table1), and the matching rows from the right table (table2). If there is no match in the right table, NULL is used for columns from the right table. It's often used when you want to see all entries from one table, and any related entries from another.
SELECT customers.customer_name, orders.order_id
FROM customers
LEFT JOIN orders ON customers.customer_id = orders.customer_id;
RIGHT JOIN (or RIGHT OUTER JOIN)
The RIGHT JOIN keyword returns all rows from the right table (table2), and the matching rows from the left table (table1). If there is no match in the left table, NULL is used for columns from the left table. This is essentially the mirror image of a LEFT JOIN.
SELECT employees.employee_name, departments.department_name
FROM employees
RIGHT JOIN departments ON employees.department_id = departments.department_id;
FULL OUTER JOIN (or FULL JOIN)
The FULL OUTER JOIN keyword returns all rows from both tables, pairing them wherever the join condition matches. It combines the results of both LEFT and RIGHT outer joins. If there are rows in either table that do not have matches in the other table, those rows will still be included, with NULL values for the columns of the table that lacked a match.
SELECT employees.employee_name, departments.department_name
FROM employees
FULL OUTER JOIN departments ON employees.department_id = departments.department_id;
CROSS JOIN
A CROSS JOIN produces a Cartesian product of the tables involved in the join. This means it combines each row from the first table with every row from the second table. If table A has N rows and table B has M rows, a CROSS JOIN will result in N * M rows. It does not require a join condition.
SELECT products.product_name, colors.color_name
FROM products
CROSS JOIN colors;
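The N * M row count is easy to check on a small case. This sketch assumes Python's sqlite3 module as a harness and invented product and color names:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (product_name TEXT)")
conn.execute("CREATE TABLE colors (color_name TEXT)")
# Hypothetical sample data: 3 products and 2 colors.
conn.executemany("INSERT INTO products VALUES (?)", [("Shirt",), ("Hat",), ("Scarf",)])
conn.executemany("INSERT INTO colors VALUES (?)", [("Red",), ("Blue",)])

# Every product is paired with every color; no join condition is given.
pairs = conn.execute(
    """
    SELECT products.product_name, colors.color_name
    FROM products
    CROSS JOIN colors
    """
).fetchall()
print(len(pairs))  # 3 * 2 combinations
```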
SELF JOIN
A SELF JOIN is a regular join, but the table is joined with itself. It is used to combine rows with other rows in the same table. This is particularly useful for querying hierarchical data or comparing rows within the same table, often requiring table aliases to differentiate between the two instances of the table.
SELECT A.employee_name AS Employee, B.employee_name AS Manager
FROM employees A
JOIN employees B ON A.manager_id = B.employee_id;
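The employee/manager lookup can be tried against a small in-memory table. This is a sketch under assumptions: Python's sqlite3 module is used as a harness, and the names and reporting lines are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees (employee_id INTEGER PRIMARY KEY, employee_name TEXT, manager_id INTEGER)"
)
# Hypothetical hierarchy: Ava has no manager, Ben and Cy report to Ava, Dee reports to Ben.
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [(1, "Ava", None), (2, "Ben", 1), (3, "Cy", 1), (4, "Dee", 2)],
)

# Alias A plays the role of the employee, alias B the role of the manager.
reports = conn.execute(
    """
    SELECT A.employee_name AS Employee, B.employee_name AS Manager
    FROM employees A
    JOIN employees B ON A.manager_id = B.employee_id
    ORDER BY A.employee_id
    """
).fetchall()
print(reports)
```

Ava does not appear in the Employee column because her manager_id is NULL and an inner self join drops unmatched rows; a LEFT JOIN would keep her with a NULL manager.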
What is the difference between INNER JOIN and LEFT JOIN?
SQL JOIN clauses are used to combine rows from two or more tables, based on a related column between them. This document will focus on explaining the fundamental differences and use cases for INNER JOIN and LEFT JOIN.
SQL JOINs Overview
JOINs are a core concept in relational databases, enabling you to retrieve data from multiple tables simultaneously. They establish a relationship between tables based on common columns, typically primary and foreign keys. For our examples, consider two tables: 'Customers' and 'Orders'.
| CustomerID | Name |
|---|---|
| 1 | Alice |
| 2 | Bob |
| 3 | Charlie |
| OrderID | CustomerID | Amount |
|---|---|---|
| 101 | 1 | 150.00 |
| 102 | 2 | 25.00 |
| 103 | 1 | 75.00 |
| 104 | 4 | 50.00 |
INNER JOIN
An INNER JOIN returns only the rows that have matching values in both tables. If a record in one table does not have a matching record in the other table, it is excluded from the result set. It's the most common type of JOIN and is often implied if you simply use the JOIN keyword without specifying a type.
SELECT C.CustomerID, C.Name, O.OrderID, O.Amount
FROM Customers C
INNER JOIN Orders O ON C.CustomerID = O.CustomerID;
Result of the INNER JOIN on 'Customers' and 'Orders':
| CustomerID | Name | OrderID | Amount |
|---|---|---|---|
| 1 | Alice | 101 | 150.00 |
| 1 | Alice | 103 | 75.00 |
| 2 | Bob | 102 | 25.00 |
LEFT JOIN (or LEFT OUTER JOIN)
A LEFT JOIN (also known as LEFT OUTER JOIN) returns all rows from the left table, and the matching rows from the right table. If there is no match for a row in the left table, the columns from the right table will contain NULLs in the result set. It preserves all records from the 'left' table (the first table mentioned in the FROM clause).
SELECT C.CustomerID, C.Name, O.OrderID, O.Amount
FROM Customers C
LEFT JOIN Orders O ON C.CustomerID = O.CustomerID;
Result of the LEFT JOIN on 'Customers' and 'Orders':
| CustomerID | Name | OrderID | Amount |
|---|---|---|---|
| 1 | Alice | 101 | 150.00 |
| 1 | Alice | 103 | 75.00 |
| 2 | Bob | 102 | 25.00 |
| 3 | Charlie | NULL | NULL |
Key Differences Summarized
- Matching Rows: INNER JOIN returns only rows where a match exists in both tables. LEFT JOIN returns all rows from the left table, and matched rows from the right table.
- Unmatched Rows: INNER JOIN excludes unmatched rows from either table. LEFT JOIN includes all rows from the left table, padding columns from the right table with 'NULL's where no match exists.
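- Result Size: An INNER JOIN can return anywhere from zero rows up to the product of the two tables' row counts, because each row appears once for every match it has. A LEFT JOIN always returns at least as many rows as the left table, and more when left rows match several rows on the right.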
When to Use Which?
Use INNER JOIN when:
- You only care about records that have a direct relationship in both tables.
- You want to find all customers who have placed at least one order.
- You need to combine information only where there's a common data point across both datasets.
Use LEFT JOIN when:
- You want to retrieve all records from one table (the 'left' table) regardless of whether they have a match in the second table.
- You want to find all customers, and if they have orders, include order details (otherwise, show 'NULL's for order details).
- You need to identify records in the left table that *do not* have a match in the right table (often achieved by adding WHERE right_table.id IS NULL).
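The last use case, finding left-table rows with no match, can be sketched with the Customers and Orders data from the tables above. Python's sqlite3 module is assumed here purely as a way to run the SQL:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customers (CustomerID INTEGER PRIMARY KEY, Name TEXT)")
conn.execute("CREATE TABLE Orders (OrderID INTEGER PRIMARY KEY, CustomerID INTEGER, Amount REAL)")
conn.executemany(
    "INSERT INTO Customers VALUES (?, ?)",
    [(1, "Alice"), (2, "Bob"), (3, "Charlie")],
)
conn.executemany(
    "INSERT INTO Orders VALUES (?, ?, ?)",
    [(101, 1, 150.0), (102, 2, 25.0), (103, 1, 75.0), (104, 4, 50.0)],
)

join_sql = "SELECT COUNT(*) FROM Customers C {kind} JOIN Orders O ON C.CustomerID = O.CustomerID"
inner_count = conn.execute(join_sql.format(kind="INNER")).fetchone()[0]
left_count = conn.execute(join_sql.format(kind="LEFT")).fetchone()[0]

# Anti-join: keep only the customers whose right-side columns came back as NULL.
no_orders = conn.execute(
    """
    SELECT C.Name
    FROM Customers C
    LEFT JOIN Orders O ON C.CustomerID = O.CustomerID
    WHERE O.OrderID IS NULL
    """
).fetchall()
print(inner_count, left_count, no_orders)
```

The IS NULL test is applied to a column of the right table that can never be NULL in a real match (here the primary key OrderID), which is what makes the filter reliable.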
Conclusion
Choosing between INNER JOIN and LEFT JOIN depends entirely on the specific data you need to retrieve. INNER JOIN is for intersecting data, while LEFT JOIN is for preserving all data from one table while optionally adding related data from another. Understanding their distinct behaviors is crucial for writing accurate and efficient SQL queries.
What is GROUP BY clause?
The SQL GROUP BY clause is a powerful command used with aggregate functions to group rows that have the same values into summary rows. It is an essential tool for data analysis and reporting, allowing you to perform calculations on subsets of data rather than the entire dataset.
Understanding the GROUP BY Clause
The primary function of the GROUP BY clause is to arrange identical data into groups. When combined with aggregate functions like COUNT(), SUM(), AVG(), MAX(), and MIN(), it allows you to compute a single summary value for each group, making it invaluable for generating summarized reports and statistics.
Syntax
SELECT column1, aggregate_function(column2)
FROM table_name
WHERE condition
GROUP BY column1, column3
ORDER BY column1;
Key Concepts
- Aggregate Functions: GROUP BY is almost always used in conjunction with aggregate functions to perform calculations (e.g., sum, average, count) on each group.
- Non-aggregated Columns: Any column that appears in the SELECT list and is not part of an aggregate function must be included in the GROUP BY clause.
- Filtering Groups: Use the HAVING clause to filter groups based on aggregate conditions, unlike WHERE, which filters individual rows *before* grouping occurs.
Example
Consider a table named Orders with columns CustomerID, OrderDate, and Amount. To find the total amount spent by each customer, you would use GROUP BY as follows:
SELECT CustomerID, SUM(Amount) AS TotalAmountSpent
FROM Orders
GROUP BY CustomerID;
This query would return a result set where each row represents a unique CustomerID, and the TotalAmountSpent column shows the sum of all Amount values for orders placed by that specific customer.
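To see the grouping in action, the query can be run on a few rows. This sketch assumes Python's sqlite3 module as a harness; the order dates and amounts are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Orders (CustomerID INTEGER, OrderDate TEXT, Amount REAL)")
# Hypothetical sample data: customer 1 has two orders, customer 2 has one.
conn.executemany(
    "INSERT INTO Orders VALUES (?, ?, ?)",
    [(1, "2024-01-05", 150.0), (2, "2024-01-06", 25.0), (1, "2024-02-01", 75.0)],
)

# One output row per distinct CustomerID, with that customer's amounts summed.
totals = conn.execute(
    """
    SELECT CustomerID, SUM(Amount) AS TotalAmountSpent
    FROM Orders
    GROUP BY CustomerID
    ORDER BY CustomerID
    """
).fetchall()
print(totals)
```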
Common Uses and Best Practices
- Sales Analysis: Grouping sales data by product category, region, or time period to identify trends.
- User Activity: Counting user actions (e.g., logins, purchases) per user or per day.
- Reporting: Generating summary reports, such as monthly sales summaries or departmental expense reports.
- Performance: For very large datasets, ensure you have appropriate indexes on columns used in the GROUP BY clause to optimize query performance.
What is the difference between UNION and UNION ALL?
In SQL, both `UNION` and `UNION ALL` operators are used to combine the result sets of two or more `SELECT` statements into a single result set. While their primary goal is similar, they differ significantly in how they handle duplicate rows and, consequently, their performance characteristics. Understanding these distinctions is crucial for efficient query writing and data manipulation.
Understanding UNION
The UNION operator combines the result sets of two or more SELECT statements and eliminates duplicate rows from the final result. For UNION to work, each SELECT statement must have the same number of columns, and the corresponding columns must have compatible data types. The column names in the final result set are usually taken from the first SELECT statement.
SELECT column1, column2 FROM table1
UNION
SELECT column1, column2 FROM table2;
Because UNION performs a distinct operation to remove duplicates, it often involves an implicit sorting or hashing process, which can make it slower and more resource-intensive, especially when dealing with large datasets. It guarantees a unique set of rows in its output.
Understanding UNION ALL
The UNION ALL operator combines the result sets of two or more SELECT statements, but unlike UNION, it retains all duplicate rows. This means if a row exists in both result sets, or multiple times within a single result set, it will appear as many times in the final output. Like UNION, the SELECT statements must have the same number of columns with compatible data types.
SELECT column1, column2 FROM table1
UNION ALL
SELECT column1, column2 FROM table2;
Since UNION ALL does not perform any distinct operation or duplicate checking, it is generally much faster and less resource-intensive than UNION. It simply appends the results of subsequent SELECT statements to the first one, making it the preferred choice when you know there are no duplicates or when preserving duplicates is desired.
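The difference in duplicate handling can be checked directly. The sketch below assumes Python's sqlite3 module as a harness and made-up rows for the generic table1 and table2 used in the syntax examples:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (column1 INTEGER, column2 TEXT)")
conn.execute("CREATE TABLE table2 (column1 INTEGER, column2 TEXT)")
# Hypothetical data: (2, 'b') appears twice in table1 and once in table2.
conn.executemany("INSERT INTO table1 VALUES (?, ?)", [(1, "a"), (2, "b"), (2, "b")])
conn.executemany("INSERT INTO table2 VALUES (?, ?)", [(2, "b"), (3, "c")])

# UNION removes duplicates, including ones that come from a single input.
union_rows = sorted(conn.execute(
    "SELECT column1, column2 FROM table1 UNION SELECT column1, column2 FROM table2"
).fetchall())

# UNION ALL keeps every row from both inputs.
union_all_rows = sorted(conn.execute(
    "SELECT column1, column2 FROM table1 UNION ALL SELECT column1, column2 FROM table2"
).fetchall())

print(len(union_rows), len(union_all_rows))
```

Neither operator promises an output order, so the results are sorted in Python before being compared.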
Key Differences Summarized
| Feature | UNION | UNION ALL |
|---|---|---|
| Duplicate Rows | Removes duplicates | Includes duplicates |
| Performance | Slower (due to duplicate removal) | Faster (no duplicate removal) |
| Sorting | Often involves implicit sorting/hashing | Does not involve implicit sorting/hashing |
| Resource Usage | Higher (more processing) | Lower (less processing) |
When to Use Which?
Use UNION when:
- You need to combine results from multiple queries and want only unique rows in the final output.
- You are intentionally filtering out duplicate data across your combined sets.
- The overhead of duplicate removal is acceptable for data integrity.
Use UNION ALL when:
- You need to combine results from multiple queries and want to retain all rows, including duplicates.
- Performance is a critical concern, and you are certain there are no duplicates, or duplicates are desired.
- You are simply aggregating all data from different sources without needing to de-duplicate.
What are aggregate functions in SQL?
SQL aggregate functions perform calculations on a set of rows and return a single summary value. They are commonly used with the GROUP BY clause to summarize data for each group, but can also be used to summarize an entire table, providing powerful analytical capabilities.
What Are Aggregate Functions?
Aggregate functions operate on a collection of input values and return a single value summarizing those inputs. Unlike scalar functions, which operate on a single row, aggregate functions process multiple rows to produce one result. They are essential for analytical queries and reporting, allowing users to gain insights into their data by summarizing it.
Common Aggregate Functions
The most frequently used aggregate functions in SQL include COUNT, SUM, AVG, MIN, and MAX. Each serves a distinct purpose in data summarization.
- COUNT(): Returns the number of rows or non-NULL values in a specified column. COUNT(*) counts all rows, while COUNT(column_name) counts non-NULL values.
- SUM(): Calculates the sum of all values in a numeric column. It ignores NULL values.
- AVG(): Computes the average (arithmetic mean) of all values in a numeric column. It also ignores NULL values.
- MIN(): Returns the minimum value in a column. This can apply to numeric, string, or date/time data types.
- MAX(): Returns the maximum value in a column. Similar to MIN(), it works with various data types.
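The NULL handling described in the list is worth seeing on concrete values. This is a small sketch assuming Python's sqlite3 module as a harness and a hypothetical `scores` table with one NULL entry:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scores (value INTEGER)")
# Hypothetical data: three numbers and one NULL.
conn.executemany("INSERT INTO scores VALUES (?)", [(10,), (20,), (None,), (30,)])

summary = conn.execute(
    """
    SELECT COUNT(*),      -- counts every row, including the NULL one
           COUNT(value),  -- counts only non-NULL values
           SUM(value),
           AVG(value),    -- averages over the 3 non-NULL values, not 4 rows
           MIN(value),
           MAX(value)
    FROM scores
    """
).fetchone()
print(summary)
```

Because AVG skips the NULL, the average is 60 / 3 rather than 60 / 4, which matters whenever missing values should have counted as zero.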
Using with GROUP BY Clause
Aggregate functions are often used in conjunction with the GROUP BY clause to divide the rows into groups and perform the aggregation for each group. This allows you to get summary statistics per category, such as the average salary per department or the total sales per product.
SELECT department, AVG(salary) AS average_salary
FROM employees
GROUP BY department;
Using with HAVING Clause
The HAVING clause is used to filter groups based on the results of an aggregate function. It is applied after the GROUP BY clause, whereas the WHERE clause filters individual rows before grouping. This distinction is crucial for filtering on aggregated data.
SELECT department, COUNT(employee_id) AS num_employees
FROM employees
GROUP BY department
HAVING COUNT(employee_id) > 5;
The DISTINCT Keyword
The DISTINCT keyword can be used inside some aggregate functions (like COUNT, SUM, AVG) to operate only on unique values within the specified column, ignoring duplicates. This is particularly useful when you need to count unique occurrences or sum unique values.
SELECT COUNT(DISTINCT city) AS unique_cities
FROM customers;
What is a subquery?
A subquery, also known as an inner query or inner select, is a query nested inside another SQL query. It can be embedded within SELECT, INSERT, UPDATE, or DELETE statements, or even within another subquery. Subqueries are used to return data that will be used by the main query as a condition or for calculation.
What is a Subquery?
In SQL, a subquery is a query (SELECT statement) that is embedded inside another SQL query. For a non-correlated subquery, the inner query is conceptually evaluated first, and its result is then used by the outer query. This allows for more complex data retrieval and manipulation by using the results of one query as input for another.
Subqueries can be used in various clauses of the main query, including WHERE, HAVING, FROM, and SELECT. They are particularly useful for performing operations that require a temporary result set or for filtering data based on values derived from another table or calculation.
Types of Subqueries
Subqueries can be categorized based on the number of rows and columns they return.
Scalar Subquery
A scalar subquery returns a single row and a single column (a single value). It can be used anywhere a single value is expected, such as in the SELECT clause, WHERE clause, or as part of an expression.
SELECT product_name, price
FROM products
WHERE price > (SELECT AVG(price) FROM products);
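The scalar subquery above can be exercised on a tiny table. This sketch assumes Python's sqlite3 module as a harness, with invented product names and prices:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (product_name TEXT, price REAL)")
# Hypothetical data: the average price is (2 + 12 + 40) / 3 = 18.
conn.executemany(
    "INSERT INTO products VALUES (?, ?)",
    [("Pen", 2.0), ("Book", 12.0), ("Lamp", 40.0)],
)

# The inner SELECT yields a single value that each row's price is compared against.
above_average = conn.execute(
    """
    SELECT product_name, price
    FROM products
    WHERE price > (SELECT AVG(price) FROM products)
    """
).fetchall()
print(above_average)
```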
Row Subquery
A row subquery returns a single row but multiple columns. It is often used in the WHERE or HAVING clause where multiple column values need to be compared against a single row.
SELECT employee_id, first_name, last_name
FROM employees
WHERE (department_id, salary) = (SELECT department_id, MAX(salary) FROM employees WHERE department_id = 10 GROUP BY department_id);
Table Subquery
A table subquery returns multiple rows and multiple columns. It is typically used in the FROM clause as a derived table (inline view) or with operators like IN, EXISTS, or ALL/ANY in the WHERE clause.
SELECT c.customer_name, o.order_date
FROM customers c
JOIN (SELECT customer_id, MAX(order_date) AS order_date FROM orders GROUP BY customer_id) o
ON c.customer_id = o.customer_id;
Key Characteristics and Rules
- Subqueries must be enclosed in parentheses.
- An outer query can execute a subquery once for each row processed by the outer query (correlated subquery) or execute once and cache the result (non-correlated subquery).
- Subqueries can return single values, single rows, or multiple rows and columns.
- They can be used with comparison operators (e.g., =, <, >), set operators (e.g., IN, NOT IN, EXISTS), and quantifiers (e.g., ALL, ANY).
- In some systems, the ORDER BY clause cannot be used directly in a subquery unless a row-limiting construct such as TOP (SQL Server) or ROWNUM (Oracle) is also specified.
Advantages of Subqueries
- Improve readability and organization of complex queries.
- Allow for structured queries where the output of one query is used as input for another.
- Provide an alternative to complex joins for certain types of queries.
- Easier to maintain and understand compared to very complex single queries.
Disadvantages of Subqueries
- Can be less efficient than joins in some scenarios, especially with large datasets.
- Poor performance if not optimized, particularly for correlated subqueries.
- Debugging can be more challenging due to the nested nature of the queries.
- Lack of clarity in some complex nested structures if not carefully written.
What is correlated subquery?
A correlated subquery is a subquery that depends on the outer query for its values and executes once for each row processed by the outer query. Unlike a regular subquery, it cannot be executed independently.
Definition
In a correlated subquery, the inner query references one or more columns from the table in the outer query. Because of this dependency, the subquery is re-evaluated for every row returned by the outer query, making its execution intertwined with the outer query's processing.
How it Works
- The outer query starts processing its rows.
- For each row selected by the outer query, the correlated subquery is executed.
- The subquery uses a value from the current row of the outer query in its WHERE clause or other conditions.
- The result of the subquery is then used by the outer query to filter or select the current row.
Example
Suppose you want to find all employees whose salary is greater than the average salary of their respective department.
SELECT E1.employee_name, E1.salary, E1.department_id
FROM Employees E1
WHERE E1.salary > (
SELECT AVG(E2.salary)
FROM Employees E2
WHERE E2.department_id = E1.department_id
);
In this example, the subquery (SELECT AVG(E2.salary) FROM Employees E2 WHERE E2.department_id = E1.department_id) is correlated because it refers to E1.department_id from the outer query (aliased as E1). For each employee (E1) the outer query considers, the inner query calculates the average salary for *that specific employee's department*.
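Running the query on a handful of rows makes the per-department comparison concrete. This is a sketch assuming Python's sqlite3 module as a harness; the employee names and salaries are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employees (employee_name TEXT, department_id INTEGER, salary INTEGER)")
# Hypothetical data: department 10 averages 60000, department 20 averages 50000.
conn.executemany(
    "INSERT INTO Employees VALUES (?, ?, ?)",
    [("Ann", 10, 50000), ("Bob", 10, 70000),
     ("Cid", 20, 40000), ("Dot", 20, 40000), ("Eve", 20, 70000)],
)

# The inner query refers to E1.department_id, so its result depends on the outer row.
above_dept_avg = conn.execute(
    """
    SELECT E1.employee_name, E1.salary, E1.department_id
    FROM Employees E1
    WHERE E1.salary > (
        SELECT AVG(E2.salary)
        FROM Employees E2
        WHERE E2.department_id = E1.department_id
    )
    ORDER BY E1.employee_name
    """
).fetchall()
print(above_dept_avg)
```

Bob and Eve both earn 70000, but each is compared against a different average because the subquery is evaluated with their own department_id.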
Characteristics
- Row-by-Row Execution: Conceptually evaluated once for each row processed by the outer query, which can hurt performance on large datasets, although many optimizers rewrite such queries as joins.
- Dependency: Explicitly references columns from the outer query.
- Versatility: Useful for complex row-level comparisons that are difficult to express with simple joins.
- Keywords: Often used with EXISTS, NOT EXISTS, comparison operators (=, >, <), or aggregate functions in the subquery's SELECT clause.
When to Use
- To find records that have a specific relationship with other records within the same table (self-referencing logic).
- When an aggregate function needs to be computed for each group defined by the outer query's current row.
- For existence checks (e.g., finding customers who have placed at least one order).
Alternatives
- JOINs with Derived Tables/CTEs: Often, correlated subqueries can be rewritten using JOIN operations combined with Common Table Expressions (CTEs) or derived tables to pre-calculate values, which can be more efficient.
- Non-Correlated Subqueries: For simpler cases where the inner query's result is independent of the outer query, a non-correlated subquery is more appropriate and generally faster.
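As one illustration of the derived-table alternative, the "salary above the department average" question can be answered with a join against pre-computed averages. This is a sketch, not a performance claim: it assumes Python's sqlite3 module as a harness and invented sample rows, and whether the join is actually faster depends on the database and its optimizer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employees (employee_name TEXT, department_id INTEGER, salary INTEGER)")
# Hypothetical data: department 10 averages 60000, department 20 averages 50000.
conn.executemany(
    "INSERT INTO Employees VALUES (?, ?, ?)",
    [("Ann", 10, 50000), ("Bob", 10, 70000),
     ("Cid", 20, 40000), ("Dot", 20, 40000), ("Eve", 20, 70000)],
)

correlated = conn.execute(
    """
    SELECT E1.employee_name
    FROM Employees E1
    WHERE E1.salary > (SELECT AVG(E2.salary) FROM Employees E2
                       WHERE E2.department_id = E1.department_id)
    ORDER BY E1.employee_name
    """
).fetchall()

# Derived table D computes each department's average once; the join then compares.
joined = conn.execute(
    """
    SELECT E.employee_name
    FROM Employees E
    JOIN (SELECT department_id, AVG(salary) AS avg_salary
          FROM Employees
          GROUP BY department_id) D
      ON D.department_id = E.department_id
    WHERE E.salary > D.avg_salary
    ORDER BY E.employee_name
    """
).fetchall()
print(correlated == joined, joined)
```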
What is the difference between DELETE and TRUNCATE?
In SQL, both the DELETE and TRUNCATE commands are used to remove data from tables. However, they operate very differently in terms of how they remove data, their performance, logging, and transactional behavior.
Overview
DELETE is a DML (Data Manipulation Language) command that removes rows one by one, allowing for conditional deletion and transaction logging. TRUNCATE is a DDL (Data Definition Language) command that deallocates data pages, making it faster for removing all rows from a table.
DELETE Statement
The DELETE statement is used to remove one or more rows from a table. It can include a WHERE clause to specify which rows to delete. If no WHERE clause is provided, all rows are deleted. DELETE operations are logged, allowing them to be rolled back and triggering ON DELETE triggers.
DELETE FROM Employees WHERE DepartmentID = 10;
DELETE FROM Products;
- DML (Data Manipulation Language) command.
- Removes rows one by one.
- Allows a WHERE clause for conditional deletion.
- Logs each row deletion, so it can be rolled back within a transaction.
- Fires ON DELETE triggers.
- In most systems (for example SQL Server, PostgreSQL, and MySQL with InnoDB), does not reset AUTO_INCREMENT or IDENTITY counters, even when every row is deleted.
- Slower for large tables compared to TRUNCATE.
- Requires DELETE privilege.
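The transactional behavior of DELETE can be sketched with Python's sqlite3 module. Two assumptions apply: the sample rows are invented, and SQLite is only a stand-in engine (it has no TRUNCATE TABLE statement, so only the DELETE side is shown).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employees (EmployeeID INTEGER PRIMARY KEY, DepartmentID INTEGER)")
# Hypothetical data: two employees in department 10, one in department 20.
conn.executemany("INSERT INTO Employees VALUES (?, ?)", [(1, 10), (2, 10), (3, 20)])
conn.commit()  # make the inserts permanent before experimenting

def count_rows():
    return conn.execute("SELECT COUNT(*) FROM Employees").fetchone()[0]

# The sqlite3 module opens a transaction implicitly before the DELETE runs.
cursor = conn.execute("DELETE FROM Employees WHERE DepartmentID = 10")
deleted = cursor.rowcount        # number of rows removed by the WHERE filter
during = count_rows()            # what the open transaction sees

conn.rollback()                  # undo the DELETE
after_rollback = count_rows()
print(deleted, during, after_rollback)
```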
TRUNCATE Statement
The TRUNCATE statement is used to remove all rows from a table quickly and efficiently. It works by deallocating the data pages used by the table and logging only the deallocation of pages, rather than individual row deletions. This makes it much faster than DELETE for large tables, but in some systems (such as MySQL and Oracle) it cannot be rolled back, and it does not fire ON DELETE triggers.
TRUNCATE TABLE Employees;
- DDL (Data Definition Language) command.
- Removes all rows by deallocating data pages.
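- Does not allow a WHERE clause.
- Cannot be rolled back in MySQL or Oracle, where it causes an implicit COMMIT; SQL Server and PostgreSQL allow it to be rolled back inside an explicit transaction.
- Does not fire ON DELETE triggers.
- Resets AUTO_INCREMENT or IDENTITY columns in MySQL and SQL Server; PostgreSQL does so only when RESTART IDENTITY is specified.
- Faster for large tables.
- Required privilege varies by system: DROP in MySQL, ALTER on the table in SQL Server, TRUNCATE in PostgreSQL.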
Key Differences
| Feature | DELETE | TRUNCATE |
|---|---|---|
| Command Type | DML | DDL |
| Row-by-row deletion | Yes | No (deallocates pages) |
| `WHERE` clause | Yes | No |
| Rollback | Yes | Not in MySQL or Oracle (implicit COMMIT); possible inside a transaction in SQL Server and PostgreSQL |
| Triggers | Fires `ON DELETE` triggers | Does not fire triggers |
| Auto-Increment Reset | Generally does not reset the counter | Resets in MySQL and SQL Server; PostgreSQL requires RESTART IDENTITY |
| Performance | Slower for large tables | Faster for large tables |
| Logging | Logs each row deletion | Logs page deallocation |
| Privilege | `DELETE` privilege | Varies: `DROP` (MySQL), `ALTER` (SQL Server), `TRUNCATE` (PostgreSQL) |
What is the difference between DROP and TRUNCATE?
The DROP and TRUNCATE commands are both used in SQL to remove data or objects, but they operate at different levels and have distinct implications. Understanding their differences is crucial for effective database management.
DROP Command
The DROP command is a Data Definition Language (DDL) statement used to remove an entire schema object from the database. This includes tables, indexes, views, stored procedures, functions, and more. When you DROP a table, its entire definition (structure), all data within it, and any associated objects like indexes, constraints, and triggers are permanently removed.
- Removes the table definition and all data.
- Frees up the space occupied by the table and its associated objects.
- Usually cannot be rolled back (depends on specific database features or transaction management).
- In systems such as MySQL and Oracle, implicitly commits the current transaction.
- Removes all related indexes, constraints, and triggers.
DROP TABLE Customers;
TRUNCATE Command
The TRUNCATE command is also a DDL statement used to quickly remove all rows from a table. Unlike DELETE, TRUNCATE deallocates the data pages used by the table, making it very fast and efficient for large tables. However, it preserves the table's structure, including its columns, data types, and associated indexes and constraints. Identity columns (auto-incrementing) are typically reset to their seed value.
- Removes all rows from a table, but keeps the table structure intact.
- It's a DDL command, not DML, despite affecting data.
- Faster than DELETE for large tables because it deallocates data pages.
- Usually cannot be rolled back (depends on specific database features).
- Resets identity columns/sequences to their starting value.
- Does not fire triggers defined on the table.
TRUNCATE TABLE Products;
Key Differences
| Feature | DROP | TRUNCATE |
|---|---|---|
| Purpose | Removes the entire table definition and all data | Removes all rows from a table; table structure remains |
| Type | DDL (Data Definition Language) | DDL (Data Definition Language) |
| Rollback | Usually not possible | Usually not possible |
| Speed | Slower overall (due to dropping metadata and associated objects) | Faster for deleting all rows (by deallocating data pages) |
| Space Reclamation | Frees up space for the table and all its associated objects | Frees up space for data, but not the table definition |
| Indexes/Constraints | Removes all associated indexes and constraints | Keeps all associated indexes and constraints |
| Triggers | Not applicable (the object is gone) | Does not fire any DML triggers |
| Identity Column | Removed (along with the table) | Resets to its seed value |
| Logging | Minimal logging (metadata changes) | Minimal logging (deallocates pages; less than DELETE) |
When to Use?
Use DROP when you want to permanently remove a table and its entire definition from the database. Use TRUNCATE when you need to quickly clear all data from a table while keeping its structure, indexes, and constraints intact, typically for reloading or resetting data.
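The structural difference can be observed by inspecting the schema catalog before and after each operation. This is a sketch under stated assumptions: Python's sqlite3 module is the harness, the index name and rows are invented, and because SQLite has no TRUNCATE TABLE statement, an unfiltered DELETE stands in for "remove all rows but keep the table".

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Products (ProductID INTEGER PRIMARY KEY, Name TEXT)")
conn.execute("CREATE INDEX idx_products_name ON Products (Name)")  # hypothetical index
conn.executemany("INSERT INTO Products (Name) VALUES (?)", [("Pen",), ("Book",)])

def schema_objects():
    # sqlite_master lists every table and index that belongs to Products.
    query = "SELECT name FROM sqlite_master WHERE tbl_name = 'Products'"
    return sorted(row[0] for row in conn.execute(query))

# Stand-in for TRUNCATE: the rows go away, the table and its index stay.
conn.execute("DELETE FROM Products")
remaining_rows = conn.execute("SELECT COUNT(*) FROM Products").fetchone()[0]
after_clear = schema_objects()

# DROP removes the table definition together with its index.
conn.execute("DROP TABLE Products")
after_drop = schema_objects()

try:
    conn.execute("SELECT COUNT(*) FROM Products")
    table_gone = False
except sqlite3.OperationalError:
    table_gone = True  # the table can no longer be queried

print(remaining_rows, after_clear, after_drop, table_gone)
```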