By Mike Scott
Summary
The larger your database, the higher the possibility of data repetition and inaccuracies that compromise the results you pull from the database. Normalization in DBMS exists to counteract those problems by helping you to create more uniform databases in which redundancies are less likely to occur.
Mastering normalization is a key skill in DBMS for the simple fact that an error-strewn database is of no use to an organization. For example, a retailer that has to deal with a database that has multiple entries for phone numbers and email addresses is a retailer that can’t see as effectively as one that has a simple route to the customer. Let’s look at normalization in DBMS and how it helps you to create a more organized database.
Grab a pack of playing cards and throw them onto the floor. Now, pick up the “Jack of Hearts.” It’s a tough task because the cards are strewn all over the place. Some are facing down and there’s no rhyme, reason, or pattern to how the cards lie, meaning you’re going to have to check every card individually to find the one you want.
That little experiment shows you how critical organization is, even with a small set of “data.” It also highlights the importance of normalization in DBMS. Through normalization, you implement organizational controls using a set of principles designed to achieve the following:
If normalization in DBMS is all about organization, it stands to reason that they would be a set process to follow when normalizing your tables and database:
There isn’t a “single” way to achieve normalization in DBMS because every database (and the tables it contains) is different. Instead, there are six normal forms you may use, with each having its own rules that you need to understand to figure out which to apply.
If a relation can’t contain multiple values, it’s in 1NF. In other words, each attribute in the table can only contain a single (called “atomic”) value.
If a retailer wants to store the details of its customers, it may have attributes in its table like “Customer Name,” “Phone Number,” and “Email Address.” By applying 1NF to this table, you ensure that the attributes that could contain multiple entries (“Phone Number” and “Email Address”) only contain one, making contacting that customer much simpler.
A table that’s in 2NF is in 1NF, with the additional condition that none of its non-prime attributes depend on a subset of candidate keys within the table.
Let’s say an employer wants to create a table that contains information about an employee, the skills they have, and their age. An employee may have multiple skills, leading to multiple records for the same employee in the table, with each denoting a skill while the ID number and age of the employee repeat for each record.
In this table, you’ve achieved 1NF because each attribute has an atomic value. However, the employee’s age is dependent on the employee ID number. To achieve 2NF, you’d break this table down into two tables. The first will contain the employee’s ID number and age, with that ID number linking to a second table that lists each of the skills associated with the employee.
In 3NF, the table you have must already be in 2NF form, with the added rule of removing the transitive functional dependency of the non-prime attribute of any super key. Transitive functional dependency occurs if the dependency is the result of a pair of functional dependencies. For example, the relationship between A and C is a transitive dependency if A depends on B, B depends on C, but B doesn’t depend on A.
Let’s say a school creates a “Students” table with the following attributes:
In this case, the “State,” “District,” and “City” attributes all depend on the “Zip Code” attribute. That “Zip” attribute depends on the “Student ID” attribute, making “State,” “District,” and “City” all transitively depending on “Student ID.”
To resolve this problem, you’d create a pair of tables – “Student” and “Student Zip.” The “Student” table contains the “Student ID,” “Name,” and “Zip Code” attributes, with that “Zip Code” attribute being the primary key of a “Student Zip” table that contains the rest of the attributes and links to the “Student” table.
Often referred to as 3.5NF, BCNF is a stricter version of 3NF. So, this normalization in DBMS rule occurs if your table is in 3NF, and for every functional dependence between two fields (i.e., A -> B), A is the super key of your table.
Sticking with the school example, every student in a school has multiple classes. The school has a table with the following fields:
You have several functional dependencies here:
As a result, both the “Student ID” and “Class” attributes are candidate keys but can’t serve as keys alone. To achieve BCNF normalization, you’d break the above table into three – “Student Nationality,” “Student Class,” and “Class Mapping,” allowing “Student ID” and “Class” to serve as primary keys in their own tables.
In 4NF, the database must meet the requirements of BCNF, in addition to containing no more than a single multivalued dependency. It’s often used in academic circles, as there’s little use for 4NF elsewhere.
Let’s say a college has a table containing the following fields:
Each of these attributes is independent of the others, meaning each can change without affecting the others. For example, the college could change the lecturer of a course without altering the recommended reading or the course’s name. As such, the existence of the course depends on both the “Lecturer” and “Recommended Book” attributes, creating a multivalued dependency. If a DBMS has more than one of these types of dependencies, it’s a candidate for 4NF normalization.
If your table is in 4NF, has no join dependencies, and all joining is lossless, it’s in 5NF. Think of this as the final form when it comes to normalization in DBMS, as you’ve broken your table down so much that you’ve made redundancy impossible.
A college may have a table that tells them which lecturers teach certain subjects during which semesters, creating the following attributes:
Let’s say one of the lecturers teaches both “Physics” and “Math” for “Semester 1,” but doesn’t teach “Math” for Semester 2. That means you need to combine all of the fields in this table to get an accurate dataset, leading to redundancy. Add a third semester to the mix, especially if that semester has no defined courses or lecturers, and you have to join dependencies.
The 5NF solution is to break this table down into three tables:
With normalization in DBMS being so much work, you need to know the following benefits to show that it’s worth your effort:
Normalization in DBMS does have some drawbacks, though these are trade-offs that you accept for the above benefits:
Getting normalization in DBMS is hard, especially when you start feeling like you’re dividing tables into so many small tables that you’re losing track of the database. These tips help you apply normalization correctly:
Normalization may seem needlessly complex, but it serves the crucial role of making the data you extract from your database more refined, accurate, and free of repetition. Mastering normalization in DBMS puts you in the perfect position to create the complex databases many organizations need in a Big Data world. Experiment with the different “normal forms” described in this article as each application of the techniques (even for simple tables) helps you get to grips with normalization.
Visit our FAQ page or get in touch with us!
Write us at +39 335 576 0263
Get in touch at hello@opit.com
Talk to one of our Study Advisors
We can speak in: