What is normalization?
Normalization in SQL is a systematic approach to designing relational database tables to minimize data redundancy and improve data integrity. It involves breaking down a large table into smaller, related tables and defining relationships between them.
What is Database Normalization?
Database normalization is a process used to organize a database into tables and columns. The main goals are to reduce data redundancy (storing the same data multiple times) and improve data integrity (ensuring the accuracy and consistency of data). It was first proposed by Edgar F. Codd, the inventor of the relational model.
Goals of Normalization
- Eliminate redundant data.
- Ensure data consistency and integrity.
- Reduce data storage requirements.
- Improve database design and maintainability.
- Avoid anomalies (insertion, update, deletion).
Normal Forms (NFs)
Normalization is achieved through a series of steps called normal forms, with each form building upon the previous one. The most common normal forms are 1NF, 2NF, and 3NF.
First Normal Form (1NF)
- Each table cell must contain a single, atomic value.
- Each column must have a unique name.
- The order of data does not matter.
Second Normal Form (2NF)
- Must be in 1NF.
- All non-key attributes must be fully functionally dependent on the primary key (no partial dependencies).
Third Normal Form (3NF)
- Must be in 2NF.
- No transitive dependencies (non-key attributes should not depend on other non-key attributes).
Boyce-Codd Normal Form (BCNF)
BCNF is a stricter version of 3NF. A table is in BCNF if and only if for every non-trivial functional dependency X → Y, X is a superkey. It addresses certain anomalies not caught by 3NF, especially in tables with multiple candidate keys where the candidate keys overlap.
Example of Normalization
Consider an unnormalized table for 'Orders' where customer details are repeated for each order.
| OrderID | ProductName | Quantity | CustomerName | CustomerAddress |
|---|---|---|---|---|
| 101 | Laptop | 1 | Alice Smith | 123 Main St |
| 101 | Mouse | 1 | Alice Smith | 123 Main St |
| 102 | Keyboard | 1 | Bob Johnson | 456 Oak Ave |
This table suffers from redundancy (CustomerName, CustomerAddress repeated for OrderID 101). To normalize, we separate customer information into a 'Customers' table and link it via a CustomerID.
| CustomerID | CustomerName | CustomerAddress |
|---|---|---|
| C001 | Alice Smith | 123 Main St |
| C002 | Bob Johnson | 456 Oak Ave |
| OrderID | ProductName | Quantity | CustomerID |
|---|---|---|---|
| 101 | Laptop | 1 | C001 |
| 101 | Mouse | 1 | C001 |
| 102 | Keyboard | 1 | C002 |
Denormalization
While normalization reduces redundancy, it can sometimes lead to increased query complexity due to joins across multiple tables. Denormalization is the intentional introduction of redundancy into a database to improve query performance, often used in data warehousing or OLAP systems where read performance is critical.