Q101.

What types of SQL joins do you know?

SQL joins are fundamental operations used to combine rows from two or more tables based on a related column between them. Understanding the different types of joins is crucial for retrieving comprehensive and meaningful data from relational databases.

Understanding SQL Joins

Joins are a key component of the SQL language, allowing you to link data from multiple tables. This is essential when your data is normalized across several tables to reduce redundancy and improve data integrity. The type of join you use determines which rows are included in the result set based on whether the join condition is met in one, both, or neither of the tables.

Common SQL Join Types

INNER JOIN

The INNER JOIN keyword returns records that have matching values in both tables. If there are rows in 'TableA' that do not have a match in 'TableB', or vice versa, those rows will not be included in the result set.

sql
SELECT column_name(s)
FROM TableA
INNER JOIN TableB
ON TableA.common_column = TableB.common_column;

LEFT JOIN (or LEFT OUTER JOIN)

The LEFT JOIN keyword returns all records from the left table (TableA), and the matching records from the right table (TableB). If there is no match in the right table, the result is NULL for the columns from the right side. It is also known as LEFT OUTER JOIN.

sql
SELECT column_name(s)
FROM TableA
LEFT JOIN TableB
ON TableA.common_column = TableB.common_column;

RIGHT JOIN (or RIGHT OUTER JOIN)

The RIGHT JOIN keyword returns all records from the right table (TableB), and the matching records from the left table (TableA). If there is no match in the left table, the result is NULL for the columns from the left side. It is also known as RIGHT OUTER JOIN.

sql
SELECT column_name(s)
FROM TableA
RIGHT JOIN TableB
ON TableA.common_column = TableB.common_column;

FULL OUTER JOIN

The FULL OUTER JOIN keyword returns all records when there is a match in either the left (TableA) or the right (TableB) table. If there are no matches, it will still return the rows with NULL values for the columns from the non-matching side. Note: Not all database systems (e.g., MySQL) support FULL OUTER JOIN directly.

sql
SELECT column_name(s)
FROM TableA
FULL OUTER JOIN TableB
ON TableA.common_column = TableB.common_column;

CROSS JOIN

The CROSS JOIN keyword returns the Cartesian product of the rows from the joined tables. This means it combines each row from the first table with every row from the second table, resulting in a result set where the number of rows is the product of the number of rows in both tables. It typically doesn't have an ON clause.

sql
SELECT column_name(s)
FROM TableA
CROSS JOIN TableB;

SELF JOIN

A SELF JOIN is a regular join (usually an INNER JOIN or LEFT JOIN) that joins a table to itself. This is useful when you need to compare rows within the same table, for example, finding all employees who report to the same manager, or hierarchical data.

sql
SELECT A.employee_name, B.employee_name AS manager_name
FROM Employees A, Employees B
WHERE A.manager_id = B.employee_id;

Summary of Join Types

Join TypeDescriptionMatching Rows Included
INNER JOINReturns rows only when there is a match in both tables.Both tables must match
LEFT JOINReturns all rows from the left table, and the matching rows from the right table.Left table and matching right
RIGHT JOINReturns all rows from the right table, and the matching rows from the left table.Right table and matching left
FULL OUTER JOINReturns all rows when there is a match in one of the tables (includes non-matching from both).Both (including non-matching on either side)
CROSS JOINReturns the Cartesian product of rows from both tables (every combination).Every combination of rows
SELF JOINA table is joined with itself using aliases to compare rows within the same table.Based on the join condition within the same table
Q102.

What are SQL aggregate functions?

SQL aggregate functions perform calculations on a set of rows and return a single value. They are commonly used with the GROUP BY clause to summarize data within groups, but can also be applied to an entire table.

Introduction to Aggregate Functions

Aggregate functions are a fundamental part of SQL, allowing users to derive meaningful insights from large datasets by summarizing information. Instead of returning a value for each individual row, they operate on a collection of rows to produce a single result. For example, you can calculate the total sum of sales, the average price of products, or the count of employees in a department.

Common SQL Aggregate Functions

Here are some of the most frequently used aggregate functions in SQL:

COUNT()

Counts the number of rows that match a specified criterion. COUNT(*) counts all rows, COUNT(column_name) counts non-NULL values in a column, and COUNT(DISTINCT column_name) counts unique non-NULL values.

sql
SELECT COUNT(*) FROM Employees;
SELECT COUNT(DISTINCT DepartmentID) FROM Employees;

SUM()

Calculates the total sum of a numeric column. It ignores NULL values.

sql
SELECT SUM(Salary) FROM Employees WHERE DepartmentID = 3;

AVG()

Calculates the average value of a numeric column. It ignores NULL values.

sql
SELECT AVG(Price) FROM Products;

MIN()

Finds the minimum value in a specified column. This can be used with numeric, string, or date/time data types. It ignores NULL values.

sql
SELECT MIN(OrderDate) FROM Orders;
SELECT MIN(ProductName) FROM Products;

MAX()

Finds the maximum value in a specified column. Similar to MIN(), it works with various data types and ignores NULL values.

sql
SELECT MAX(Salary) FROM Employees;
SELECT MAX(ProductName) FROM Products;

The GROUP BY Clause

Aggregate functions are most powerful when combined with the GROUP BY clause. The GROUP BY clause is used to group rows that have the same values in specified columns into summary rows, enabling the aggregate functions to operate on each group independently.

sql
SELECT DepartmentID, AVG(Salary)
FROM Employees
GROUP BY DepartmentID;

The HAVING Clause

While the WHERE clause filters individual rows before grouping, the HAVING clause is used to filter groups based on conditions applied to aggregate functions. It must be used after the GROUP BY clause.

sql
SELECT DepartmentID, COUNT(EmployeeID), AVG(Salary)
FROM Employees
GROUP BY DepartmentID
HAVING COUNT(EmployeeID) > 5 AND AVG(Salary) > 60000;

Key Characteristics and Rules

  • Aggregate functions always return a single value per group or for the entire result set.
  • They ignore NULL values by default, except for COUNT(*).
  • They cannot be directly used in the WHERE clause because WHERE filters individual rows before grouping. Use HAVING for filtering grouped results.
  • When using an aggregate function with other columns in the SELECT statement, those columns must appear in the GROUP BY clause.
Q103.

What are window functions in SQL?

SQL window functions perform calculations across a set of table rows that are related to the current row. Unlike aggregate functions, window functions do not group rows into a single output row; instead, they return a single value for each row in the result set, allowing you to see both individual rows and aggregated or ranked values simultaneously.

What Are Window Functions?

Window functions operate on a 'window' of rows defined by the OVER() clause. This window is a set of rows related to the current row, and the function calculates a value for each row within its window. The key distinction from regular aggregate functions (like SUM, AVG, COUNT) is that window functions do not collapse the rows being aggregated; they return a result for each individual row.

Components of the OVER() Clause

The OVER() clause is fundamental to defining the window on which the function operates. It consists of optional components:

PARTITION BY Clause

Divides the query result set into partitions (groups) to which the window function is applied independently. If omitted, the entire result set is treated as a single partition.

ORDER BY Clause

Specifies the logical order of rows within each partition. This is crucial for functions that depend on order, like RANK(), ROW_NUMBER(), LAG(), and LEAD().

Window Frame

Further refines the set of rows within a partition that are considered for the current row's calculation. This is defined using clauses like ROWS or RANGE, often with PRECEDING, FOLLOWING, or BETWEEN. For example, ROWS BETWEEN 2 PRECEDING AND CURRENT ROW.

Common Window Functions

  • Ranking Functions: ROW_NUMBER(), RANK(), DENSE_RANK(), NTILE(n)
  • Value Functions: LAG(), LEAD(), FIRST_VALUE(), LAST_VALUE()
  • Aggregate Functions: SUM(), AVG(), COUNT(), MIN(), MAX() (when used with OVER() clause)

Example: Ranking Products by Sales within Categories

Consider a table of products with their categories and sales figures. We want to rank products by sales within each category.

sql
SELECT
    product_name,
    category,
    sales,
    RANK() OVER (PARTITION BY category ORDER BY sales DESC) AS rank_within_category
FROM
    products;

In this example, PARTITION BY category divides the products into separate groups for each category. ORDER BY sales DESC ranks products from highest to lowest sales within their respective categories. The RANK() function then assigns a rank to each product, with ties receiving the same rank and subsequent ranks skipping numbers.

Benefits of Using Window Functions

  • Simplifies complex analytical queries, often replacing self-joins or correlated subqueries.
  • Enables calculations like running totals, moving averages, and multi-row comparisons.
  • Improves readability and maintainability of SQL code.
  • Can offer performance advantages for certain types of computations.
Q104.

What is the difference between a view and a materialized view?

In SQL, both views and materialized views offer ways to present data from underlying tables, but they differ significantly in how they store and manage that data, impacting performance and data freshness.

Understanding Views

A standard view is a virtual table based on the result-set of an SQL query. It does not store data itself; instead, it's a stored query that retrieves data from its underlying tables every time it's accessed. Views are primarily used to simplify complex queries, enforce security, and present data in a more user-friendly format.

Understanding Materialized Views

A materialized view, often called a snapshot, is a physical copy of the data from a query's result set, stored on disk. Unlike a regular view, it pre-computes and stores the query results. This provides faster access to data, especially for complex queries that involve joins or aggregations, but requires a mechanism to refresh the stored data to keep it current.

Key Differences

FeatureViewMaterialized View
Data StorageNo physical storage; query executed each time.Physically stores pre-computed result set.
PerformancePotentially slower for complex queries as the query runs on demand.Faster retrieval for complex queries; avoids re-execution.
Data FreshnessAlways shows current data directly from base tables.Data can become stale; requires explicit refresh to show current data.
Refresh MechanismImplicit (query runs on demand).Explicit (manual, scheduled, or on commit).
Use CasesSimplifying complex queries, security, abstraction.Reporting, data warehousing, OLAP applications, remote data access.

When to Use Which?

When to use a View

  • To simplify complex queries without needing to store redundant data.
  • To provide a security layer by restricting access to specific rows and columns of underlying tables.
  • When real-time data freshness is paramount and query performance is acceptable for the given load.

When to use a Materialized View

  • For performance-critical queries that run frequently on large datasets, especially those involving joins and aggregations.
  • In data warehousing scenarios for pre-aggregating and summarizing data for reporting.
  • When network latency or resource consumption on the base tables is a significant concern.
  • When occasional data staleness is acceptable for significantly improved query speed.

Choosing between a view and a materialized view depends on specific requirements related to data freshness, query performance, and resource utilization. Views offer flexibility and real-time data, while materialized views prioritize query speed at the cost of potential data staleness and increased storage.