Normalizing Your Database

If you’ve been working with databases for any length of time, you have heard the term normalization.

Normalization is the process of organizing the data in a database efficiently. It has two goals: eliminating redundant data (for example, storing the same data in more than one table) and ensuring data dependencies make sense (only storing related data in a table). Reaching these two goals reduces the space used by the database and ensures the data is stored logically.

Guidelines have been developed for checking that a database is normalized. These are referred to as normal forms and are numbered from one (the lowest form of normalization, referred to as first normal form or 1NF) through six (sixth normal form or 6NF). However, the fifth and sixth normal forms are rarely used, so I’m not going to cover them below. Also note that most database architects start out designing in third normal form, so it’s not necessary to work through the normal forms one at a time.

First Normal Form (1NF)

First Normal Form (1NF) sets the very basic rules for an organized database:

  • Eliminate duplicative (repeating) columns from the same table, such as Phone1 and Phone2, and keep each column value atomic
  • Create separate tables for each set of related data
  • Identify each row with a unique column (the primary key)
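
As a rough sketch of these rules, here is a small Python example using the built-in sqlite3 module. The table and column names (customer_flat, customer, customer_phone) are made up for illustration. It starts with a table that repeats a phone column and moves the phones into their own table, one row per phone.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Not in 1NF: the phone column is repeated (phone1, phone2)
    CREATE TABLE customer_flat (
        customer_id INTEGER PRIMARY KEY,
        name        TEXT,
        phone1      TEXT,
        phone2      TEXT
    );
    INSERT INTO customer_flat VALUES (1, 'Ann', '555-0100', '555-0101');
    INSERT INTO customer_flat VALUES (2, 'Bob', '555-0200', NULL);

    -- 1NF: each row is identified by a key and holds a single phone value
    CREATE TABLE customer (
        customer_id INTEGER PRIMARY KEY,
        name        TEXT NOT NULL
    );
    CREATE TABLE customer_phone (
        customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
        phone       TEXT NOT NULL,
        PRIMARY KEY (customer_id, phone)
    );

    INSERT INTO customer SELECT customer_id, name FROM customer_flat;
    INSERT INTO customer_phone
        SELECT customer_id, phone1 FROM customer_flat WHERE phone1 IS NOT NULL
        UNION ALL
        SELECT customer_id, phone2 FROM customer_flat WHERE phone2 IS NOT NULL;
""")

# One row per (customer, phone) pair instead of one column per phone
phones = conn.execute(
    "SELECT customer_id, phone FROM customer_phone ORDER BY customer_id, phone"
).fetchall()
print(phones)
```

With this layout a customer can have any number of phones without adding a phone3 column, and there are no NULL placeholders for phones that do not exist.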

Second Normal Form (2NF)

Second normal form (2NF) further addresses the concept of removing duplicative data:

  • Meet all the requirements of the first normal form
  • Remove subsets of data that apply to multiple rows of a table (typically columns that depend on only part of a composite primary key) and place them in separate tables
  • Create relationships between these new tables and their predecessors through the use of foreign keys
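
To illustrate, here is a minimal sketch in Python with sqlite3, using hypothetical tables of my own (order_item_flat, product, order_item). In the flat table the key is (order_id, product_id), but product_name depends on product_id alone, so it is repeated on every order line. The sketch moves it to a product table and links back with a foreign key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces foreign keys when asked
conn.executescript("""
    -- Not in 2NF: product_name depends on only part of the key (product_id)
    CREATE TABLE order_item_flat (
        order_id     INTEGER,
        product_id   INTEGER,
        product_name TEXT,
        quantity     INTEGER,
        PRIMARY KEY (order_id, product_id)
    );
    INSERT INTO order_item_flat VALUES
        (1, 10, 'Widget', 2), (1, 11, 'Gadget', 1), (2, 10, 'Widget', 5);

    -- 2NF: the product data lives in its own table
    CREATE TABLE product (
        product_id   INTEGER PRIMARY KEY,
        product_name TEXT NOT NULL
    );
    CREATE TABLE order_item (
        order_id   INTEGER NOT NULL,
        product_id INTEGER NOT NULL REFERENCES product(product_id),
        quantity   INTEGER NOT NULL,
        PRIMARY KEY (order_id, product_id)
    );

    INSERT INTO product SELECT DISTINCT product_id, product_name FROM order_item_flat;
    INSERT INTO order_item SELECT order_id, product_id, quantity FROM order_item_flat;
""")

# Renaming a product is now a single-row update
conn.execute("UPDATE product SET product_name = 'Widget Pro' WHERE product_id = 10")

product_count = conn.execute("SELECT COUNT(*) FROM product").fetchone()[0]
rows = conn.execute("""
    SELECT oi.order_id, p.product_name, oi.quantity
    FROM order_item oi JOIN product p USING (product_id)
    ORDER BY oi.order_id, p.product_name
""").fetchall()
print(product_count, rows)
```

The rename touches one row in product, yet every order line picks it up through the join. In the flat table the same change would have to be repeated on each line that mentions the product.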

Third Normal Form (3NF)

Third normal form (3NF) goes one large step further:

  • Meet all the requirements of the second normal form
  • Remove columns that are not directly dependent upon the primary key (that is, columns that depend on another non-key column)
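
Here is a small sketch of that rule in Python with sqlite3, again with invented table names (employee_flat, department, employee). In the flat table dept_name depends on dept_id, which is not the key, so the department data is split out. The final query checks that joining the two new tables gives back exactly the original rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Not in 3NF: dept_name depends on dept_id, not on the key emp_id
    CREATE TABLE employee_flat (
        emp_id    INTEGER PRIMARY KEY,
        name      TEXT,
        dept_id   INTEGER,
        dept_name TEXT
    );
    INSERT INTO employee_flat VALUES
        (1, 'Ann', 10, 'Sales'), (2, 'Bob', 10, 'Sales'), (3, 'Cy', 20, 'Support');

    -- 3NF: every non-key column depends directly on its table's key
    CREATE TABLE department (
        dept_id   INTEGER PRIMARY KEY,
        dept_name TEXT NOT NULL
    );
    CREATE TABLE employee (
        emp_id  INTEGER PRIMARY KEY,
        name    TEXT NOT NULL,
        dept_id INTEGER NOT NULL REFERENCES department(dept_id)
    );

    INSERT INTO department SELECT DISTINCT dept_id, dept_name FROM employee_flat;
    INSERT INTO employee SELECT emp_id, name, dept_id FROM employee_flat;
""")

original = conn.execute(
    "SELECT emp_id, name, dept_id, dept_name FROM employee_flat ORDER BY emp_id"
).fetchall()
rebuilt = conn.execute("""
    SELECT e.emp_id, e.name, e.dept_id, d.dept_name
    FROM employee e JOIN department d ON d.dept_id = e.dept_id
    ORDER BY e.emp_id
""").fetchall()
print(rebuilt == original)
```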

Boyce-Codd Normal Form (BCNF or 3.5NF)

The Boyce-Codd Normal Form, also referred to as the “third-and-a-half (3.5) normal form”, adds one more requirement:

  • Meet all the requirements of the third normal form
  • Every determinant must be a candidate key
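
A determinant is a set of columns that fixes the value of some other column. The sketch below is a plain Python check I put together for illustration, not a standard tool. The helper names holds and is_superkey and the enrollment data are my own assumptions. Keep in mind that sample rows can only disprove a dependency; the real dependencies come from the business rules.

```python
def holds(rows, lhs, rhs):
    """Return True if the dependency lhs -> rhs is not contradicted by the rows."""
    seen = {}
    for row in rows:
        key = tuple(row[c] for c in lhs)
        val = tuple(row[c] for c in rhs)
        # The same left-hand values must always map to the same right-hand values
        if seen.setdefault(key, val) != val:
            return False
    return True


def is_superkey(rows, cols):
    """Return True if cols determine every column in the sample rows."""
    return holds(rows, cols, list(rows[0].keys()))


# Assumed rule for this example: each teacher teaches exactly one course
enrollment = [
    {"student": "Ann", "course": "Math", "teacher": "Lee"},
    {"student": "Bob", "course": "Math", "teacher": "Lee"},
    {"student": "Ann", "course": "Art", "teacher": "Kim"},
    {"student": "Cy", "course": "Math", "teacher": "Roy"},
]

teacher_determines_course = holds(enrollment, ["teacher"], ["course"])
teacher_is_key = is_superkey(enrollment, ["teacher"])
student_course_is_key = is_superkey(enrollment, ["student", "course"])

# teacher is a determinant but not a key, so the table is not in BCNF
violates_bcnf = teacher_determines_course and not teacher_is_key
print(violates_bcnf)
```

This table meets 3NF because course is part of a candidate key (student, course), but it fails BCNF because teacher determines course without being a key itself.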

Fourth Normal Form (4NF)

Finally, fourth normal form (4NF) has one additional requirement:

  • Meet all the requirements of the Boyce-Codd normal form
  • A relation is in 4NF if it has no non-trivial multi-valued dependencies other than those determined by a candidate key
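
A multi-valued dependency shows up when one table stores two independent lists about the same thing. As a sketch in plain Python (the employee, skill, and language data are invented for the example), the table below has to store every skill and language combination per employee. Splitting it into two tables removes that, and joining them back reproduces the original rows.

```python
# (employee, skill, language): skills and languages are independent of each other
emp_skill_lang = {
    ("Ann", "SQL", "English"),
    ("Ann", "SQL", "French"),
    ("Ann", "Python", "English"),
    ("Ann", "Python", "French"),
    ("Bob", "SQL", "English"),
}

# 4NF decomposition: one table per independent fact
emp_skill = {(emp, skill) for emp, skill, _ in emp_skill_lang}
emp_lang = {(emp, lang) for emp, _, lang in emp_skill_lang}

# Joining the two tables on employee gives back the original rows
rejoined = {
    (emp, skill, lang)
    for emp, skill in emp_skill
    for emp2, lang in emp_lang
    if emp == emp2
}
print(rejoined == emp_skill_lang, len(emp_skill), len(emp_lang))
```

Adding a third language for Ann takes one new row in emp_lang, where the combined table would need one new row per skill.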

About James Serra

James currently works for Microsoft specializing in big data and data warehousing using the Analytics Platform System (APS), a Massively Parallel Processing (MPP) architecture. Previously he was an independent consultant working as a Data Warehouse/Business Intelligence/MDM architect and developer, specializing in the Microsoft BI stack. He is a SQL Server MVP with over 25 years of IT experience.

4 Responses to Normalizing Your Database

  1. Molly Fagan says:

    It’s also worth noting that 3NF and BCNF are equivalent if a table only contains one candidate key.

  2. Pingback: Data Warehouse Architecture - Kimball and Inmon methodologies | James Serra's Blog

  3. Tom Thomson says:

    Probably it’s worth mentioning that for some patterns of functional dependency it is impossible to normalise a schema to BCNF, because BCNF is not consistent with representing those patterns; and then mentioning EKNF, which lies between 3NF and BCNF, and is the only normal form which is achievable for all functional dependency patterns.
