Normalize

Overview

  • Simplifying the design of a database to achieve an optimal structure
  • Eliminating redundant data
  • Ensuring that dependencies make sense (no partial or transient dependencies)
  • Usually involves splitting a table into two simpler tables
  • Tables are divided in a way that allows them to be put back together again using queries (via "primary key"-"foreign key" relationships)

Normal form

  • Listed in increasing order of the quality and efficiency of the design:
First Normal Form             1NF
Second Normal Form 2NF
Third Normal Form 3NF
Boyce Codd Normal Form BCNF
Fourth Normal Form 4NF
Fifth Normal Form 5NF
Domain / Key Normal Form DKNF
  • Every higher normal form is a superset of all lower forms; thus, if the design is in 3NF, by definition it is also in 1NF and 2NF
  • If the database has been normalized to 3NF, it's likely also BCNF (and maybe 4NF or 5NF)
  • 3NF is generally considered good enough for typical applications

Relation


The Normal Forms are based on relations rather than tables. A relation is a special type of table that has the following attributes:
  • It describes a single entity
  • There are no duplicate rows (the table has a primary key)
  • The columns and rows are unordered (manipulation of the table doesn't depend upon a specific ordering of the columns or rows)
In practice, the term "relation" is often used interchangeably with "table".

Anomaly

  • An error or inconsistency in the database
  • A poorly designed database may introduce numerous anomalies
  • Dependencies cause anomalies

Types of anomalies

Insertion
  • Occurs during the insertion of a record (i.e. the insertion of a new row causes a calculated total field stored in another table to report the wrong total)
Update
  • Occurs during the updating of a record (i.e. updating a description column for a single part in an inventory database requires you to make a change to thousands of rows)
Deletion
  • Occurs during the deletion of a record (i.e. the deletion of a row in the database deletes more information than you wished to delete)

First Normal Form (1NF)

  • No redundant columns
  • There must be a primary key
  • Tables store data for a single entity
  • The entity is identified by its primary key
  • All entries in a column must be of the same kind (domain)
  • All column values must be atomic (not an array or list of values)
-- Invalid example
"Hammer, 2, Wrench, 1, Pliers, 1"
  • Repeating groups of columns are not allowed (no three dimensional tables)
-- Invalid examples
Item1, Item2, Item3, ...
Item1, Quantity1; Item2, Quantity2, ...

Second Normal Form (2NF)

  • No redundant data
  • No partial dependencies
  • Every non-key column is dependent on the entire primary key
  • The only way to retrieve the right value for a non-key column (i.e. Quantity), is by specifying the entire primary key
  • If the correct value can be retrieved using anything less than the entire primary key, then the table is not 2NF
  • Remove data that applies to more than one row
  • This rule only applies to tables with a composite primary key

Third Normal Form (3NF)

  • No derived data
  • No transitive dependencies
  • No non-key column can be dependent on another non-key column
-- Invalid examples
Quantity, PerItemCost, TotalCost -- TotalCost is dependent on Quantity and PerItemCost
ProductId, ProductDescription -- ProductDescription is dependent on ProductId

Boyce Codd Normal Form (BCNF)

  • Every determinant is a candidate key

Fourth Normal Form (4NF)

  • No table can have 2 or more independent multi-valued dependencies
  • Isolate independent multiple relationships
  • No table may contain two or more one-to-many or many-to-many relationships that are not directly related

Fifth Normal Form (5NF)

  • aka Projection-Join Normal Form (PJNF)
  • Isolate semantically-related multiple relationships
  • There may be practical constraints on information that justify separating logically-related many-to-many relationships.
  • Every join dependency in R is a consequence of the candidate keys of R

Domain / Key Normal Form (DKNF)

  • Modification anomalies are not possible
  • Constraints have been applied for the primary key and the column domains
  • All constraints and dependencies on the relation can be enforced by enforcing the domain constraints and key constraints on the relation
  • Every constraint on the domain is a logical result of the key
  • DKNF is not always achievable

Resources URL: 
notes/databases/resources
Sources URL: 
notes/databases/sources

See Also