Topic > Data Normalization - 761

Data NormalizationData normalization is an important step in any database development process. Through this tedious process a developer can eliminate duplication and develop standards against which all data can be measured. This article addresses the history and function of data normalization applied to the course in question. In 1970, Dr. E.F. Codd's seminal article "A Relational Model for Large Shared Databases" was published in Communications of the ACM. This article introduced the topic of data normalization, so called because, at the time, President Nixon was normalizing relations with China. Data normalization is a technique used during logical data modeling to ensure that there is only one way to know a fact, i.e. by removing all structures that provide more than one way to know the same fact represented in a database relationship (table ). The goal of normalization is to control and eliminate redundancy and mitigate the effects of modification anomalies, which are generally insertion and deletion anomalies. (Insertion anomalies occur when storing information about one attribute requires additional information about a second attribute. Deletion anomalies occur when deletion of one fact results in the loss of a second fact.) NormalizationThere are six normal forms of relation generally recognized: first normal form, second normal form, third normal form, Boyce/Codd normal form, fourth normal form and fifth normal form, also called projection/union normal form. Other normal forms exist (e.g. Domain/Key) but they will not be discussed here. Normal forms are hierarchical, meaning each normal form builds on the previous one. Although many people only consider a relationship normalized when it is in third normal form, technically speaking, a relationship only in first normal form can be considered normalized. Normal formsFirst normal form (1NF) - All attributes must be atomic. That is, there cannot be repeated groups in an attribute. For example, in a report describing a student, the student's classes should not be stored in a field, separated by commas. Rather, classes should be moved into their own relationship, which should include a link to the student relationship (called a foreign key). Second Normal Form (2NF) - A relation is in second normal form if it is in first normal form and each attribute is completely functionally dependent on the entire primary key. That is, no subset of the key can determine the value of an attribute. Third normal form (3NF) – A relation is in third normal form if it is in second normal form and every non-key attribute depends completely functionally on the entire primary key and not on any other non-key attribute.