Archive
DB Basics – How to control Data Redundancy in a database system (RDBMS) by Normalization
Redundancy in Database systems occurs with various insert, update, and delete anomalies.
To avoid these anomalies in first step you need to make sure your database tables or relations are in good normal forms or normalized upto a certain level.
Normalization is a systematic way of ensuring that a database structure is suitable for general-purpose querying and free the anomalies discussed above. that could lead to a loss of data integrity.
Normal forms in a database or the concept of Normalization makes a Relation or Table free from insert/update/delete anomalies and saves space by removing duplicate data.
–> According to E. F. Codd the objectives of normalization were stated as follows:
1. To free the collection of relations from undesirable insertion, update and deletion dependencies.
2. To reduce the need for restructuring the collection of relations as new types of data are introduced, and thus increase the life span of application programs.
3. To make the relational model more informative to users.
4. To make the collection of relations neutral to the query statistics, where these statistics are liable to change as time goes by.
As of now there are total 8 normal forms, but to keep our data consistent & non-redundant the first 3 Normal Forms are sufficient.
–> Anomalies like: Let’s say you have a single table that stores Employee and Department details, thus:
1. Insert Anomaly: If you are inserting a detail of an Employee then his department detail will also be entered for every employee record, thus departments details will be repeated with multiple records, thus storing duplicate data for Departments.
2. Update Anomaly: While updating a department detail you have to update the same department for various employees, which may lead to inconsistent state if any record is left while updating or on any error.
3. Delete Anomaly: If a department is closed, then deleting department record will also delete the Employee records, thus missing records.
The process of normalization makes this EmployeeDepartment table to decompose or split into 2 or more tables and linked them by Foreign Keys, thus eliminating duplicate records, data redundancy and making data/records consistent across all relations/tables.
– 1st NF talks about atomic values and non-repeating groups.
– 2nd NF enforces that a non-Key attribute should belong to entire Key attribute.
– 3rd NF makes sure that there should be no transitive dependency between a non-Key and a Key attribute.
For details on these 3 NFs check my blog post on [Database Normalization].
DB Basics – Database Normalization | 1NF, 2NF, 3NF
In the field of Relational Database design, normalization is a systematic way of ensuring that a database structure is suitable for general-purpose querying and free of certain undesirable characteristics—insertion, update, and deletion anomalies—that could lead to a loss of data integrity.
According to E. F. Codd the objectives of normalization were stated as follows:
1. To free the collection of relations from undesirable insertion, update and deletion dependencies.
2. To reduce the need for restructuring the collection of relations as new types of data are introduced, and thus increase the life span of application programs.
3. To make the relational model more informative to users.
4. To make the collection of relations neutral to the query statistics, where these statistics are liable to change as time goes by.
E. F. Code the inventor of Relational Model, introduced the concept of normalization (1NF at 1970, 2-3NF at 1971, then with R. F. Boyce defined the BCFN in 1974).
C. Date, H. Darwin, R. Fagin, N. Lorentzos defined other higher forms upto 6NF by 2002.
As of now there are total 8 normal forms, as follows:
1. First normal form (1NF)
2. Second normal form (2NF)
3. Third normal form (3NF)
4. Boyce-Codd normal form (BCNF)
5. Fourth normal form (4NF)
6. Fifth normal form (5NF)
7. Domain/key normal form (DKNF)
8. Sixth normal form (6NF)
But to keep our data consistent & non-redundant the first 3 Normal Forms are sufficient.
1. The 1st Normal Form:
– There are no duplicate rows and each row should have a unique identifier (or Primary key). A table should be free from repeating groups.
– The values in each column of a table are atomic. Meaning a field value cannot be decomposed into smaller pieces or should not be divided into parts with more than one kind of data in it.
Like: A Person’s Name column could be further divided into First, Middle, Last Name columns.
2. The 2nd Normal Form:
– A table should be in 1st Normal Form.
– Any Candidate key (K) and any Attribute (A) that is not a constituent of a candidate key, A depends upon whole of K rather than just part of it.
Means all its non-prime attributes are functionally dependent on the whole of a candidate key.
In Simple terms, any non-key columns must be dependent on the entire primary key. In the case of a composite primary key, this means that a non-key column cannot depend on only part of the composite key.
3. The 3rd Normal Form:
– A table should be in 2nd Normal Form.
– Every non-prime attribute of R is non-transitively dependent (i.e. directly dependent) on every candidate/primary key of R.
– All columns should depend directly on the primary key. Tables violate the Third Normal Form when one column depends on another column, which in turn depends on the primary key (a transitive dependency).
>> Check & Subscribe my [YouTube videos] on SQL Server.