Q11.

What is a primary key?

A primary key is a crucial concept in relational databases, serving as a unique identifier for each record within a table.

What is a Primary Key?

In a relational database, a primary key is a column or a set of columns that uniquely identifies each row (or record) in a table. It ensures that every record in the table can be distinctly identified and accessed, preventing data duplication and aiding in data integrity.

  • Uniqueness: Each value in the primary key column(s) must be unique across all rows.
  • Non-Nullability: A primary key column cannot contain NULL values; it must always have a value.
  • Stability: Primary key values should ideally not change over time.
  • Single Primary Key: A table can have only one primary key, though it can be composed of multiple columns (a composite primary key).

Why are Primary Keys Important?

Primary keys are fundamental for maintaining data integrity and enabling efficient database operations. They facilitate quick data retrieval, establish relationships between tables (through foreign keys), and enforce the uniqueness of records, thereby ensuring the consistency and reliability of the data stored in the database.

Example of a Primary Key

Consider a table named 'Students'. A common practice is to assign a unique 'student_id' to each student, which can serve as the primary key.

sql
CREATE TABLE Students (
    student_id INT PRIMARY KEY,
    first_name VARCHAR(50),
    last_name VARCHAR(50),
    email VARCHAR(100) UNIQUE
);
student_idfirst_namelast_name
101AliceSmith
102BobJohnson
103CharlieBrown
Q12.

What is a foreign key?

A foreign key is a column or a set of columns in a database table that refers to the primary key of another table. It establishes a link between two tables, enforcing referential integrity and maintaining the consistency of data.

What is a Foreign Key?

A foreign key acts as a cross-reference between tables. It's a field (or collection of fields) in one table that uniquely identifies a row of another table. The table containing the foreign key is called the child table, and the table containing the primary key it refers to is called the parent or referenced table.

Purpose of a Foreign Key

  • Enforces referential integrity: Ensures that relationships between tables remain consistent.
  • Prevents actions that would destroy links between tables (e.g., deleting a parent row when child rows still reference it).
  • Maintains data consistency and accuracy across related tables.
  • Facilitates efficient data querying and joining operations.

How Foreign Keys Work

When you define a foreign key, you're essentially telling the database management system (DBMS) that the values in this column (or columns) in the child table must match values in the primary key column (or columns) of the parent table. This creates a parent-child relationship, where the child table's data is dependent on the parent table's data.

Syntax Example

sql
CREATE TABLE Orders (
    OrderID INT PRIMARY KEY,
    CustomerID INT,
    OrderDate DATE,
    FOREIGN KEY (CustomerID) REFERENCES Customers(CustomerID)
);

Key Characteristics

  • References a primary key or a unique key in another table.
  • Can have NULL values (unless explicitly defined as NOT NULL), meaning a child record might not have a parent.
  • Can be composed of one or more columns.
  • Multiple foreign keys can exist in a single table, linking it to several other tables.

Constraints and Actions

Foreign keys can be configured with actions that specify what happens to child records when the referenced parent record is deleted or updated. These are known as referential actions or constraints.

ActionDescription
ON DELETE CASCADEIf a row in the parent table is deleted, corresponding rows in the child table are also deleted.
ON UPDATE CASCADEIf the primary key in the parent table is updated, the foreign key in the child table is also updated.
ON DELETE SET NULLIf a row in the parent table is deleted, the foreign key column(s) in the child table are set to NULL.
ON UPDATE SET NULLIf the primary key in the parent table is updated, the foreign key column(s) in the child table are set to NULL.
ON DELETE RESTRICTPrevents deletion of a parent row if there are dependent child rows (default behavior for many databases).
ON UPDATE RESTRICTPrevents update of a parent primary key if there are dependent child rows.
Q13.

What is normalization?

Normalization in SQL is a systematic approach to designing relational database tables to minimize data redundancy and improve data integrity. It involves breaking down a large table into smaller, related tables and defining relationships between them.

What is Database Normalization?

Database normalization is a process used to organize a database into tables and columns. The main goals are to reduce data redundancy (storing the same data multiple times) and improve data integrity (ensuring the accuracy and consistency of data). It was first proposed by Edgar F. Codd, the inventor of the relational model.

Goals of Normalization

  • Eliminate redundant data.
  • Ensure data consistency and integrity.
  • Reduce data storage requirements.
  • Improve database design and maintainability.
  • Avoid anomalies (insertion, update, deletion).

Normal Forms (NFs)

Normalization is achieved through a series of steps called normal forms, with each form building upon the previous one. The most common normal forms are 1NF, 2NF, and 3NF.

First Normal Form (1NF)

  • Each table cell must contain a single, atomic value.
  • Each column must have a unique name.
  • The order of data does not matter.

Second Normal Form (2NF)

  • Must be in 1NF.
  • All non-key attributes must be fully functionally dependent on the primary key (no partial dependencies).

Third Normal Form (3NF)

  • Must be in 2NF.
  • No transitive dependencies (non-key attributes should not depend on other non-key attributes).

Boyce-Codd Normal Form (BCNF)

BCNF is a stricter version of 3NF. A table is in BCNF if and only if for every non-trivial functional dependency X → Y, X is a superkey. It addresses certain anomalies not caught by 3NF, especially in tables with multiple candidate keys where the candidate keys overlap.

Example of Normalization

Consider an unnormalized table for 'Orders' where customer details are repeated for each order.

OrderIDProductNameQuantityCustomerNameCustomerAddress
101Laptop1Alice Smith123 Main St
101Mouse1Alice Smith123 Main St
102Keyboard1Bob Johnson456 Oak Ave

This table suffers from redundancy (CustomerName, CustomerAddress repeated for OrderID 101). To normalize, we separate customer information into a 'Customers' table and link it via a CustomerID.

CustomerIDCustomerNameCustomerAddress
C001Alice Smith123 Main St
C002Bob Johnson456 Oak Ave
OrderIDProductNameQuantityCustomerID
101Laptop1C001
101Mouse1C001
102Keyboard1C002

Denormalization

While normalization reduces redundancy, it can sometimes lead to increased query complexity due to joins across multiple tables. Denormalization is the intentional introduction of redundancy into a database to improve query performance, often used in data warehousing or OLAP systems where read performance is critical.

Q14.

Explain different normal forms.

Normalization is a systematic approach to database design that organizes tables in a way that reduces data redundancy and improves data integrity. It aims to eliminate undesirable characteristics like insertion, update, and deletion anomalies. Different normal forms (NFs) define progressively stricter rules for database design, moving from 1NF to BCNF and beyond.

First Normal Form (1NF)

A table is in First Normal Form (1NF) if it adheres to the most basic set of database organization rules. The primary goal of 1NF is to eliminate repeating groups within a table and ensure atomic data values.

  • Each column must contain only atomic (indivisible) values. This means no multi-valued attributes in a single cell.
  • Each column must have a unique name.
  • There are no repeating groups of columns.
  • The order of data storage does not matter.

Example of a table violating 1NF (e.g., a 'PhoneNumbers' column containing multiple numbers separated by commas):

sql
CREATE TABLE Customers (
    CustomerID INT PRIMARY KEY,
    Name VARCHAR(100),
    PhoneNumbers VARCHAR(255) -- Violates 1NF if it holds '123-4567, 987-6543'
);

Second Normal Form (2NF)

A table is in Second Normal Form (2NF) if it meets the criteria for 1NF and all non-key attributes are fully functionally dependent on the primary key. This rule primarily applies to tables with composite primary keys (keys made up of two or more columns).

In simpler terms, no non-key attribute should be dependent on only a *part* of a composite primary key. If a non-key attribute can be determined by only one part of a composite key, it should be moved to a separate table.

Example: Consider an 'OrderDetails' table with (OrderID, ProductID) as a composite primary key. If ProductName and Price depend only on ProductID (a part of the key), they violate 2NF. Quantity, however, would depend on both OrderID and ProductID.

sql
CREATE TABLE OrderDetails (
    OrderID INT,
    ProductID INT,
    ProductName VARCHAR(100), -- Depends only on ProductID (partial dependency)
    Price DECIMAL(10, 2),     -- Depends only on ProductID (partial dependency)
    Quantity INT,
    PRIMARY KEY (OrderID, ProductID)
);

Third Normal Form (3NF)

A table is in Third Normal Form (3NF) if it is in 2NF and there are no transitive dependencies of non-key attributes on the primary key.

A transitive dependency occurs when a non-key attribute is dependent on another non-key attribute, which in turn is dependent on the primary key. If A determines B, and B determines C, then C is transitively dependent on A.

Example: In an 'Employees' table, if EmployeeID determines DepartmentID, and DepartmentID determines DepartmentName, then DepartmentName is transitively dependent on EmployeeID. To achieve 3NF, DepartmentName should be moved to a separate Departments table, with DepartmentID acting as a foreign key in the Employees table.

sql
CREATE TABLE Employees (
    EmployeeID INT PRIMARY KEY,
    EmployeeName VARCHAR(100),
    DepartmentID INT,
    DepartmentName VARCHAR(100) -- Depends on DepartmentID, which depends on EmployeeID (transitive dependency)
);

Boyce-Codd Normal Form (BCNF)

Boyce-Codd Normal Form (BCNF) is a stricter version of 3NF. A table is in BCNF if and only if for every non-trivial functional dependency X → Y, X is a superkey.

This implies that if a non-key attribute determines another attribute, the determining attribute must be a candidate key (or a superkey) itself. BCNF addresses certain anomalies that 3NF might miss, particularly in scenarios involving tables with multiple overlapping candidate keys.

  • BCNF is generally considered to be a stronger form of 3NF.
  • Most tables that are in 3NF are also in BCNF.
  • The primary cases where a 3NF table might not be in BCNF involve tables with multiple, composite, and overlapping candidate keys where a non-key attribute determines part of a candidate key.

While BCNF aims for maximum data integrity, achieving it can sometimes lead to an excessive number of tables and complex joins. In practice, 3NF is often sufficient for most business applications, balancing data integrity with query performance.

Summary and Practical Considerations

Normal FormKey Rules
1NFAtomic values; no repeating groups
2NF1NF + No partial dependencies on a composite primary key
3NF2NF + No transitive dependencies on the primary key
BCNFFor every functional dependency X → Y, X must be a superkey

While higher normal forms reduce data redundancy and improve integrity, they can also increase the number of tables, potentially leading to more complex queries and slower performance due to the need for more joins. The choice of normal form often involves a trade-off between strict data integrity and practical performance considerations. In some specific scenarios, a controlled degree of denormalization might be applied for performance optimization.

Q15.

What is denormalization?

Q16.

What is an index and why is it used?

In the realm of relational databases, an index is a crucial database object designed to improve the speed of data retrieval operations. It's an essential concept for optimizing query performance and is widely used across various database management systems.

What is an Index?

An index in SQL is a special lookup table that the database search engine can use to speed up data retrieval. It is essentially a copy of selected columns of data from a table, organized in a way that allows for very fast lookups. Think of it like the index in the back of a book, which helps you quickly find information without having to read the entire book from start to finish.

Why are Indexes Used?

The primary purpose of an index is to enhance the performance of database queries. By providing a quick path to data, indexes significantly reduce the time required for the database to locate specific rows, especially in large tables. Without an index, the database might have to perform a full table scan, checking every row for matching values, which can be very slow for large datasets.

  • Faster data retrieval for SELECT statements.
  • Improved performance of WHERE clauses that filter data.
  • Quicker execution of JOIN operations between tables.
  • Enforcement of uniqueness on columns (unique indexes).
  • Faster sorting and grouping of data when columns are indexed.

How Indexes Work (Simplified)

When an index is created on a column or set of columns, the database system builds a data structure (commonly a B-tree) that stores the values from the indexed columns along with pointers to the corresponding rows in the actual table. When a query requests data based on these indexed columns, the database can traverse the B-tree much faster than scanning the entire table, leading to quicker results.

Creating an Index

sql
CREATE INDEX idx_customer_lastname
ON Customers (LastName);

This SQL statement creates a non-clustered index named 'idx_customer_lastname' on the 'LastName' column of the 'Customers' table. This would speed up queries that filter or sort by 'LastName'.

Considerations for Indexing

While indexes offer significant performance gains for read operations, they also come with overhead. Each INSERT, UPDATE, or DELETE operation on an indexed table requires the database to update the index as well. This can slow down write operations. Therefore, indexes should be used judiciously.

  • Columns frequently used in WHERE clauses, JOIN conditions, ORDER BY, or GROUP BY clauses.
  • Columns with a high cardinality (many distinct values).
  • Large tables where query performance is critical and read operations are frequent.
  • Small tables where a full table scan is already fast enough.
  • Columns with low cardinality (few distinct values, e.g., a 'gender' column).
  • Tables with very high write activity (frequent inserts, updates, deletes).
Q17.

What is the difference between clustered and non-clustered index?

Understanding the different types of indexes is crucial for optimizing database performance. In SQL, clustered and non-clustered indexes are the two primary types, each serving distinct purposes in how data is stored, organized, and retrieved, significantly impacting query execution times.

Clustered Index

A clustered index determines the physical order of data storage in a table. Because the data rows themselves can only be stored in one order, a table can have only one clustered index. Think of it like a dictionary where the words are physically sorted alphabetically; when you find a word, its definition is right there with it.

  • Data rows are stored in the same physical order as their corresponding index keys.
  • Each table can have only one clustered index.
  • The leaf level of a clustered index contains the actual data pages of the table.
  • Typically created on primary key columns, but can be on any column(s).
  • Provides fast retrieval for range scans and ordered data.

Non-Clustered Index

A non-clustered index, in contrast, does not alter the physical order of data rows. Instead, it creates a separate structure that contains the index key values and pointers (row locators) to the actual data rows wherever they are stored on disk. Imagine an index at the back of a book; it tells you on which page a topic can be found, but the book's content isn't organized by that index.

  • Data rows are stored independently of the index order.
  • A table can have multiple non-clustered indexes (up to 999 in SQL Server).
  • The leaf level of a non-clustered index contains the index keys and row locators (pointers to the data rows).
  • Useful for frequently queried columns that are not part of the clustered index.
  • Faster for specific lookups when the query only needs columns included in the index.

Key Differences

FeatureClustered IndexNon-Clustered Index
Physical OrderDetermines physical storage order of dataDoes not affect physical storage order
Number per TableOneMultiple (up to 999 in SQL Server)
Leaf LevelContains actual data rowsContains index keys and row locators (pointers)
Data StorageData is sorted and stored physically in index orderData is stored separately; index contains pointers to data
PurposeOptimizes range scans and retrieval of entire rowsOptimizes lookups on specific columns, acts as a 'lookup table'

In summary, while both clustered and non-clustered indexes improve query performance, they do so by organizing data differently. Choosing the right type of index depends on the specific query patterns and data access needs of your application, with each having distinct use cases for optimal database efficiency.

Q18.

What is a composite key?

A composite key in SQL is a primary key that consists of two or more columns. It uniquely identifies each record in a table through the combination of its constituent columns, rather than a single column. This approach is often used when no single attribute can guarantee uniqueness on its own, but a combination of attributes can.

What is a Composite Key?

In relational databases, a primary key serves to uniquely identify each row in a table. While often a single column is sufficient for this purpose, there are scenarios where a combination of multiple columns is necessary to guarantee uniqueness. This combination of columns is known as a composite key (or compound key).

The primary function of a composite key is to ensure that every record in the table can be uniquely distinguished from others. Each column within the composite key might not be unique on its own, but their combined values must be unique across all rows. This is particularly useful in many-to-many relationships and when natural identifiers are composed of multiple attributes.

Why Use Composite Keys?

  • Natural Uniqueness: When the natural identifier for an entity inherently involves multiple attributes (e.g., (flight_number, flight_date) for a flight schedule, or (student_id, course_id) for an enrollment).
  • Resolving Many-to-Many Relationships: Often employed in junction (or bridge) tables to link two other tables, where the combination of foreign keys from those tables forms the primary key of the junction table.
  • Data Integrity: Enforces uniqueness constraints more accurately when a single column is not enough to identify a record uniquely, preventing duplicate logical entries.
  • Avoiding Surrogate Keys: Allows using meaningful data columns as identifiers, rather than generating an artificial, non-meaningful surrogate key.

Example: Student Enrollments

Consider a database system for managing student course registrations. A student can enroll in multiple courses, and a course can have multiple students. To track specific enrollments, a junction table, Enrollments, is created.

Column NameData TypeKey TypeDescription
student_idINTPK (part 1), FKReferences the Students table
course_idINTPK (part 2), FKReferences the Courses table
enrollment_dateDATEDate of enrollment
gradeVARCHAR(2)Grade received in the course (e.g., 'A', 'B-')

In the Enrollments table, neither student_id nor course_id can be the primary key on its own. A student can enroll in many courses (student_id is not unique), and a course can have many students (course_id is not unique). However, the combination of (student_id, course_id) uniquely identifies each distinct enrollment instance, as a student can only be enrolled in a specific course once. Thus, (student_id, course_id) forms the composite primary key.

SQL DDL for Composite Key

sql
CREATE TABLE Students (
    student_id INT PRIMARY KEY,
    student_name VARCHAR(100)
);

CREATE TABLE Courses (
    course_id INT PRIMARY KEY,
    course_title VARCHAR(100)
);

CREATE TABLE Enrollments (
    student_id INT,
    course_id INT,
    enrollment_date DATE,
    grade VARCHAR(2),
    PRIMARY KEY (student_id, course_id),
    FOREIGN KEY (student_id) REFERENCES Students(student_id),
    FOREIGN KEY (course_id) REFERENCES Courses(course_id)
);

Considerations for Composite Keys

  • Index Size and Performance: Composite keys can lead to larger indexes compared to single-column keys, which might slightly increase storage requirements and potentially impact query performance for very large tables, especially with many columns in the key.
  • Foreign Key Complexity: When other tables need to reference a table with a composite primary key, their foreign keys must also be composite, matching all columns of the referenced key. This can make schema design and queries slightly more complex.
  • Readability and Maintenance: Queries involving composite keys often require referencing multiple columns in JOIN clauses, WHERE clauses, or GROUP BY clauses, which can make SQL statements more verbose.
  • Trade-off with Surrogate Keys: Developers often weigh the benefits of natural composite keys against the simplicity of a single-column, auto-incrementing surrogate key. The choice depends on specific data modeling requirements and performance considerations.
Q19.

What is a unique constraint?

A UNIQUE constraint is a fundamental SQL concept used to ensure that all values in a column or a group of columns are distinct. It prevents duplicate entries in the specified columns, thereby maintaining data integrity within a database table.

What is a UNIQUE Constraint?

A UNIQUE constraint is a rule applied to one or more columns in a database table to ensure that every value in that column (or combination of columns) is distinct. No two rows can have the same value for the constrained column(s). This is similar to a PRIMARY KEY, but a table can have multiple UNIQUE constraints, whereas it can only have one PRIMARY KEY.

Key Characteristics

Here are the key characteristics of a UNIQUE constraint:

  • Ensures uniqueness of values in the specified column(s).
  • A table can have multiple UNIQUE constraints.
  • Allows NULL values, but only one NULL value per constrained column (as NULL is generally considered distinct from another NULL for this purpose in many SQL databases).
  • Can be applied to a single column or a combination of columns (composite unique constraint).
  • Automatically creates an index on the constrained columns for faster data retrieval and enforcement.

Syntax for Creating a UNIQUE Constraint

UNIQUE constraints can be defined when creating a table or added to an existing table.

Defining at Table Creation

sql
CREATE TABLE Customers (
    CustomerID INT NOT NULL UNIQUE,
    FirstName VARCHAR(50),
    LastName VARCHAR(50),
    Email VARCHAR(100) UNIQUE
);

In this example, CustomerID and Email columns are defined with UNIQUE constraints directly during table creation.

Defining with a Named Constraint

sql
CREATE TABLE Products (
    ProductID INT NOT NULL,
    ProductName VARCHAR(100) NOT NULL,
    ProductSKU VARCHAR(50) NOT NULL,
    CONSTRAINT UC_ProductSKU UNIQUE (ProductSKU)
);

Naming the constraint (UC_ProductSKU) makes it easier to refer to when altering or dropping it later.

Adding to an Existing Table

sql
ALTER TABLE Employees
ADD CONSTRAINT UC_EmployeeID UNIQUE (EmployeeID);

This statement adds a UNIQUE constraint to the EmployeeID column in the Employees table.

Composite UNIQUE Constraint

sql
CREATE TABLE Orders (
    OrderID INT NOT NULL,
    ProductID INT NOT NULL,
    OrderDate DATE,
    CONSTRAINT UC_OrderProduct UNIQUE (OrderID, ProductID)
);

Here, the combination of OrderID and ProductID must be unique. For instance, (1, 101) is allowed once, but (1, 101) cannot appear again. However, (1, 102) and (2, 101) are both allowed.

Differences from PRIMARY KEY

FeatureUNIQUE ConstraintPRIMARY KEY Constraint
Null ValuesAllows one NULL value per columnDoes not allow NULL values
Number per TableMultiple per tableOnly one per table
PurposeEnsures uniqueness of non-identifying columns or alternative keysUniquely identifies each record (main identifier)
IndexCreates a unique indexCreates a unique, clustered index (by default in many DBs)

Benefits

  • Data Integrity: Prevents the insertion of duplicate values, ensuring the quality and reliability of data.
  • Improved Performance: Automatically creates an index, which can speed up data retrieval operations involving the constrained columns.
  • Alternative Keys: Can serve as an alternative candidate key for identifying records, useful for foreign key references if not using the primary key.
Q20.

What is a view?

In SQL, a view is a virtual table based on the result-set of an SQL query. It acts like a regular table, but its data is not stored physically; instead, it is computed dynamically when the view is queried.

What is a SQL View?

A SQL view is a stored query that can be treated as a virtual table. It presents data from one or more underlying tables without storing the data itself. When you query a view, the underlying SQL query that defines the view is executed, and the result set is returned. This allows for data abstraction and simplification of complex queries.

Key Characteristics

  • Virtual table: Does not store data physically.
  • Dynamic: Data is generated at query time from base tables.
  • Based on a SELECT statement: Defined by a SQL query.
  • Can be updated: In some cases, views can be updated, inserted into, or deleted from, subject to certain conditions (e.g., simple views based on a single table).

Syntax Example

sql
CREATE VIEW customer_orders_view AS
SELECT
    c.customer_id,
    c.first_name,
    c.last_name,
    o.order_id,
    o.order_date,
    o.total_amount
FROM
    customers c
JOIN
    orders o ON c.customer_id = o.customer_id
WHERE
    o.total_amount > 100;

This example creates a view named 'customer_orders_view' that combines data from the 'customers' and 'orders' tables, showing only orders with a total amount greater than 100. Users can then query this view as if it were a regular table, without needing to know the underlying join logic.

Benefits of Using Views

  • Security: Restrict data access by showing only specific columns or rows.
  • Simplification: Simplify complex queries and present a simpler interface to users.
  • Consistency: Ensure consistent data presentation across different applications.
  • Data abstraction: Isolate users from changes in the underlying table structure.
  • Reusability: Define a common query once and reuse it.

Limitations

  • Performance overhead: Views are executed every time they are queried, which can add overhead for very complex views compared to directly querying base tables.
  • Updatability issues: Not all views are updatable; complex views (e.g., those involving joins, aggregate functions, or distinct clauses) are often read-only.
  • Dependency management: Changes to underlying tables can affect views, requiring maintenance.
  • No direct indexing: You cannot create indexes directly on a standard view (though materialized views or indexed views in some systems allow this).