This post is part of our series on MySQL. We hope to give the average tester a slightly more technical view into how MySQL works. This blog covers the basics of designing a database and gives you an overview of normalization.
What is Database Modeling?
Database modeling is a process in which you analyze the application requirements, usually retrieved from your clients and design a database schema. Database schema describes the structure of data stored in your database. Before you move on to the application implementation, you should validate whether the schema follows certain good practices (normal forms) and fulfills the client requirements. The outcome of database modeling is database model.
The basic components in a relational database model are:
- Tables that store data
- Columns in tables to store data.
- Primary key in a table to uniquely identify each row in a table.
- Relationships between tables. Data is stored across different tables and we link them by common columns (known as foreign keys).
What is Database Normalization?
Normalization is the process of breaking down our data and storing them in different tables consistently. Without normalization, databases can be inaccurate, slow and inefficient. There are mainly three levels of normalization: first, second, and third normal forms. Other normal forms are also available but you can focus on these three to get started. In this post, I will be explaining about three commonly used normal forms.
First Normal Form (1st NF)
To achieve first normal form for a database, you need to make sure that no table contains multiple columns that you could use to get the same information. For this, we want to achieve the following three rules:
Rule #1: Each row in a table must be uniquely identifiable.
To uniquely identify a row, we define a single column or a group of columns as a primary key. Let’s take an example of a Customer table. Each customer is assigned a unique value for identification.
Rule #2: Each column must only contain one piece of information.
To illustrate this, let’s take an example of Address column, we can see the column as comma separated values in this field. If data is grouped together like this in a single column, it would be difficult to search postal code by country efficiently.
The First Normal Form solves this issue, it requires that each of these values be broken down and stored in its own column. This is known as storing atomic data. To render this data atomic, you separate Address column into several columns like Street, City, PostalCode and Country as shown below
Rule #3 : No two columns should contain the same data
Every column in a row should store data that is different from all other columns. Let’s take an example of order_details table that lists products purchased as shown below
In the table above, there are repeated columns ProductID/ProductID1, UnitPrice/UnitPrice1 and Quantity/Quantity1. For the order (OrderID 1001), the table above stores two products (ProductID = 4 and ProductID1 = 9) in the first row and another two products in the second row. With this structure, it’s difficult to search for information and it creates redundant columns for the same information. To solve the problem, we split the columns and replace them with multiple rows.
Now the table is in First Normal Form, let’s try to understand Second Normal Form. But before that basic understanding of functional dependency is needed here because it’s going to be used in the next two normal forms. Functional dependency describes the concept that all other columns in a table must depend completely on the primary key column.
Second Normal Form (2nd NF)
For a table to be in the Second Normal Form, it must already be in the First Normal Form. After 1st NF, every table has got a primary key or composite primary key (primary key consisting of multiple columns). We should pay attention to other columns (non-key columns) in this table to make sure all these other non-key columns must fully depend on the whole primary key. If not, we will have to remove partial dependencies on the composite primary key and create separate tables for each set of removed data and build relationships between these tables.
If we can’t find partial dependencies of non-key columns on the composite primary key or primary key, we are already at Second Normal Form.
Let’s try to understand this with an example. The table below has a composite primary key that consists of OrderID and ProductID columns. The combination of OrderID and ProductID uniquely identifies each row in this table. Note that UnitPrice and Quantity are the non-key column and are fully dependent on this composite primary key.
Now, look at the other non-key columns OrderDate and DeliverDate. We notice that these columns are repeated. OrderDate and DeliverDate only depend on OrderID column and has nothing to do with ProductID column. We also noticed that there is a partial dependency since the date columns are not fully dependent on the composite primary key. Therefore, this table is not in 2nd NF. Let’s remove the partial dependency by splitting this table into the following two tables. Orders table and Order_details table
Now we have two tables Orders table only stores data related to orders and Order_details tables which stores data related to the combination of OrderID and ProductID. Now if we want to find out OrderDate for OrderID 1002 in Order_details table, we use the relationship between the two tables to work out that information. This relationship is established by OrderID in both tables. This relationship is called foreign key relationship.
Third Normal Form (3rd NF)
To achieve 3rd NF you have to resolve all transitive dependencies. I know that term sounds big and fancy but stick with me for a few more sentences and I’ll give you an easy to understand example. Let’s start with the Products table. The table below shows that ProductID is the primary key of the table. SupplierName is dependent on SupplierID column, which in turn depends on the primary key ProductID. This is called transitive dependency. Third Normal Form requires that if any column that is not DIRECTLY dependent on the primary key or composite primary key, it should be removed and placed into a different table. Here SupplierName clearly violated this rule.
To resolve the problem, we take out SupplierName and put it into a separate Supplier table. See below.
Now Products table only contains SupplierID column which can be linked to Supplier table to get SupplierName.
It may be more practical to apply third normal form only to data that changes frequently. If some dependent fields remain, design your application to require the user to verify all related fields when any one is changed.
Another important thing to keep in mind is that structuring data in normal forms is not a necessary goals for every database. Each one is a guideline. You may choose to use as a reference point when considering the best way to organize your personal database to meet the needs of your business.
© 2013-2017, Indira Nellutla. All rights reserved.