Explain different normal forms.
Normalization is a systematic approach to database design that organizes tables in a way that reduces data redundancy and improves data integrity. It aims to eliminate undesirable characteristics like insertion, update, and deletion anomalies. Different normal forms (NFs) define progressively stricter rules for database design, moving from 1NF to BCNF and beyond.
First Normal Form (1NF)
A table is in First Normal Form (1NF) if it adheres to the most basic set of database organization rules. The primary goal of 1NF is to eliminate repeating groups within a table and ensure atomic data values.
- Each column must contain only atomic (indivisible) values. This means no multi-valued attributes in a single cell.
- Each column must have a unique name.
- There are no repeating groups of columns.
- The order of data storage does not matter.
Example of a table violating 1NF (e.g., a 'PhoneNumbers' column containing multiple numbers separated by commas):
CREATE TABLE Customers (
CustomerID INT PRIMARY KEY,
Name VARCHAR(100),
PhoneNumbers VARCHAR(255) -- Violates 1NF if it holds '123-4567, 987-6543'
);
Second Normal Form (2NF)
A table is in Second Normal Form (2NF) if it meets the criteria for 1NF and all non-key attributes are fully functionally dependent on the primary key. This rule primarily applies to tables with composite primary keys (keys made up of two or more columns).
In simpler terms, no non-key attribute should be dependent on only a *part* of a composite primary key. If a non-key attribute can be determined by only one part of a composite key, it should be moved to a separate table.
Example: Consider an 'OrderDetails' table with (OrderID, ProductID) as a composite primary key. If ProductName and Price depend only on ProductID (a part of the key), they violate 2NF. Quantity, however, would depend on both OrderID and ProductID.
CREATE TABLE OrderDetails (
OrderID INT,
ProductID INT,
ProductName VARCHAR(100), -- Depends only on ProductID (partial dependency)
Price DECIMAL(10, 2), -- Depends only on ProductID (partial dependency)
Quantity INT,
PRIMARY KEY (OrderID, ProductID)
);
Third Normal Form (3NF)
A table is in Third Normal Form (3NF) if it is in 2NF and there are no transitive dependencies of non-key attributes on the primary key.
A transitive dependency occurs when a non-key attribute is dependent on another non-key attribute, which in turn is dependent on the primary key. If A determines B, and B determines C, then C is transitively dependent on A.
Example: In an 'Employees' table, if EmployeeID determines DepartmentID, and DepartmentID determines DepartmentName, then DepartmentName is transitively dependent on EmployeeID. To achieve 3NF, DepartmentName should be moved to a separate Departments table, with DepartmentID acting as a foreign key in the Employees table.
CREATE TABLE Employees (
EmployeeID INT PRIMARY KEY,
EmployeeName VARCHAR(100),
DepartmentID INT,
DepartmentName VARCHAR(100) -- Depends on DepartmentID, which depends on EmployeeID (transitive dependency)
);
Boyce-Codd Normal Form (BCNF)
Boyce-Codd Normal Form (BCNF) is a stricter version of 3NF. A table is in BCNF if and only if for every non-trivial functional dependency X → Y, X is a superkey.
This implies that if a non-key attribute determines another attribute, the determining attribute must be a candidate key (or a superkey) itself. BCNF addresses certain anomalies that 3NF might miss, particularly in scenarios involving tables with multiple overlapping candidate keys.
- BCNF is generally considered to be a stronger form of 3NF.
- Most tables that are in 3NF are also in BCNF.
- The primary cases where a 3NF table might not be in BCNF involve tables with multiple, composite, and overlapping candidate keys where a non-key attribute determines part of a candidate key.
While BCNF aims for maximum data integrity, achieving it can sometimes lead to an excessive number of tables and complex joins. In practice, 3NF is often sufficient for most business applications, balancing data integrity with query performance.
Summary and Practical Considerations
| Normal Form | Key Rules |
|---|---|
| 1NF | Atomic values; no repeating groups |
| 2NF | 1NF + No partial dependencies on a composite primary key |
| 3NF | 2NF + No transitive dependencies on the primary key |
| BCNF | For every functional dependency X → Y, X must be a superkey |
While higher normal forms reduce data redundancy and improve integrity, they can also increase the number of tables, potentially leading to more complex queries and slower performance due to the need for more joins. The choice of normal form often involves a trade-off between strict data integrity and practical performance considerations. In some specific scenarios, a controlled degree of denormalization might be applied for performance optimization.