Normalize
Overview
- Simplifying the design of a database to achieve an optimal structure
- Eliminating redundant data
- Ensuring that dependencies make sense (no partial or transient dependencies)
- Usually involves splitting a table into two simpler tables
- Tables are divided in a way that allows them to be put back together again using queries (via "primary key"-"foreign key" relationships)
Normal form
- Listed in increasing order of the quality and efficiency of the design:
First Normal Form 1NF
Second Normal Form 2NF
Third Normal Form 3NF
Boyce Codd Normal Form BCNF
Fourth Normal Form 4NF
Fifth Normal Form 5NF
Domain / Key Normal Form DKNF
- Every higher normal form is a superset of all lower forms; thus, if the design is in 3NF, by definition it is also in 1NF and 2NF
- If the database has been normalized to 3NF, it's likely also BCNF (and maybe 4NF or 5NF)
- 3NF is generally considered good enough for typical applications
Relation
The Normal Forms are based on relations rather than tables. A relation is a special type of table that has the following attributes:
- It describes a single entity
- There are no duplicate rows (the table has a primary key)
- The columns and rows are unordered (manipulation of the table doesn't depend upon a specific ordering of the columns or rows)
Anomaly
- An error or inconsistency in the database
- A poorly designed database may introduce numerous anomalies
- Dependencies cause anomalies
Types of anomalies
Insertion
- Occurs during the insertion of a record (i.e. the insertion of a new row causes a calculated total field stored in another table to report the wrong total)
Update
- Occurs during the updating of a record (i.e. updating a description column for a single part in an inventory database requires you to make a change to thousands of rows)
Deletion
- Occurs during the deletion of a record (i.e. the deletion of a row in the database deletes more information than you wished to delete)
First Normal Form (1NF)
- No redundant columns
- There must be a primary key
- Tables store data for a single entity
- The entity is identified by its primary key
- All entries in a column must be of the same kind (domain)
- All column values must be atomic (not an array or list of values)
-- Invalid example
"Hammer, 2, Wrench, 1, Pliers, 1"
- Repeating groups of columns are not allowed (no three dimensional tables)
-- Invalid examples
Item1, Item2, Item3, ...
Item1, Quantity1; Item2, Quantity2, ...
Second Normal Form (2NF)
- No redundant data
- No partial dependencies
- Every non-key column is dependent on the entire primary key
- The only way to retrieve the right value for a non-key column (i.e. Quantity), is by specifying the entire primary key
- If the correct value can be retrieved using anything less than the entire primary key, then the table is not 2NF
- Remove data that applies to more than one row
- This rule only applies to tables with a composite primary key
Third Normal Form (3NF)
- No derived data
- No transitive dependencies
- No non-key column can be dependent on another non-key column
-- Invalid examples
Quantity, PerItemCost, TotalCost -- TotalCost is dependent on Quantity and PerItemCost
ProductId, ProductDescription -- ProductDescription is dependent on ProductId
Boyce Codd Normal Form (BCNF)
- Every determinant is a candidate key
Fourth Normal Form (4NF)
- No table can have 2 or more independent multi-valued dependencies
- Isolate independent multiple relationships
- No table may contain two or more one-to-many or many-to-many relationships that are not directly related
Fifth Normal Form (5NF)
- aka Projection-Join Normal Form (PJNF)
- Isolate semantically-related multiple relationships
- There may be practical constraints on information that justify separating logically-related many-to-many relationships.
- Every join dependency in R is a consequence of the candidate keys of R
Domain / Key Normal Form (DKNF)
- Modification anomalies are not possible
- Constraints have been applied for the primary key and the column domains
- All constraints and dependencies on the relation can be enforced by enforcing the domain constraints and key constraints on the relation
- Every constraint on the domain is a logical result of the key
- DKNF is not always achievable
Resources URL:
notes/databases/resources
Sources URL:
notes/databases/sources