Posts tagged normalization
Database Inheritance : Subtyping
0If you are familiar with any object oriented programming language, the term inheritance may not be new to you. In OO programming such as in Java or C++ you basically put all the common attributes and behavior in a parent class and then you provide specific attribute and behavior implementation in each subclass. This feature is also supported by all major DBMS; it is called subtyping.
Some examples of subtyping are as follows:
- A School can be broken down into: regular school, charter school, administrative school, etc…
- A person can be broken down into: man or woman.
- A customer can be broken down into: an internal customer or an external customer.
- A product can be broken down into several products, books, music, movies.
- A book can be of different types: Hard Cover, PDF format, kindle format etc…
Let’s take a look at it in action:
Let’s depict the diagram above:
Book Entity
- Is the Parent entity, aka Supertype
- The Supertype is at the one and only one side of the relationship.
- Has all the information that is common among all of its subtypes.
- It’s PK is going to be the PK of the Subtype as well so that we can get to the details and/or generic information back and forth.
Paperback Book and Digital Book Entities
- Are the Child entities, aka Subtypes.
- The Subtypes are at the zero or one side of the relationship.
- They contain information specific to them only.
- Their PK is the Parent entity primary key so that you can easily get to the common information.
Alert: Some people put a char column in the parent entity to be able to determine the subtype of that record without having to look at the subtype. Be very careful with this approach because if a parent entity can be of both types the solution would not be consistent; if you do that make sure that the subtypes are mutually exclusive.
Many-to-Many Relationships
1This topic deserves special attention because many people from database designers to application developers (sometimes one person does both) do not get it right. A Many-to-Many relationship occurs when each instance from the entity on the left can have many instances from the entity on the right, and vice versa.
Some examples of many-to-many relationships are as follows:
- Each order can contain many items. Each item can also belong to many orders as it can be purchased by different customers.
- Each student can be signed up for many coursers. Each course can be assigned to multiple students.
Incorrect implementation of many-to-many relationships
We see this problem when people are trying to use foreign keys to solve this problem. Let’s illustrate it with the second example listed above as follows:
What we are trying to say here is that each student can take many courses and that each course can be taken by many students. So to implement this we have put a FK to the student table which will point to the course table.
Do you see any problems with this?
- Yes!!! we will have repeated student rows for each different course a student is taking.
- At the moment you do this your database would no longer be in 3NF because we would have repeated groups.
- This issue above can potentially yield to update anomalies.
- This relationship is said to be unresolved.
Correct implementation of many-to-many relationships
In order to correct the many-to-many implementation above we have to resolve relationship. This is done as follows:
- By adding another table between the two entities; this table is called associative entity or intersection table.
- By making the PK of the associative entity a composite key that consist of each of the parent’s PK (We are using Identifying relationship).
- By naming the intersection table as follows (not required by DBMS, but suggested): ENTITYA_ENTITYB. In our case STUDENT_COURSE or STUDENT_COURSE_REL.
- By Changing the cardinality of the parent tables to be One-to-Many to the associative entity.
Let’s see it in action:
What improvements do you see?
- The student is defined once so we avoid duplicate data.
- The courses are defined once and we avoid the same problem as if we would have put the FK in the course table in scenario #1.
- We do not run the risk of update anomalies.
- We can add meaningful fields in the associative entity. Those are called the Fixed Intersection data.
Alert: The FID will become your FACT tables and the parent entities would become your DIMENSION tables in OLAP database structures. Dimensional modeling is a different modeling technique which uses a lot of denormalization so that you can yield better query performance as data is only 1 join away at most.
Anatomy of a database relationship
0Every relationship is said to have cardinality. The cardinality is the degree of the relationship. There are two types of degrees:
- Optionality: Is the minimum degree of the relationship and it can be of two types: Optional (circle) or Mandatory (perpendicular line).
- Cardinality: Is the maximum degree of the relationship and it can be represented with a crow’s feed for (many) or omitted for one.
Also every relationship has two sentences (or at least you should make an effort of typing them for readability and understanding purposes). The relationship phrases are as follows:
- Parent to child phrase.
- Child to parent phrase.
The following diagram illustrates the anatomy of a database relationship.
As you can see it is very important to determine the cardinality of the relationships in your data model as that will dictate your business rules. One last note, the dependent entity is called the child, and the referenced table is called the parent.
Database Relationships
5In the database world the lines that connect the different entities (tables) together are called relationships; these are the relationships that occur in real life between entities as part of a process. Relationships in the database are very important because can play a significant role in the areas as follows:
- Data quality
- Non repeating groups
- business rules enforcement
- Communication on how the business works (or better said how it has been implemented) to IT staff and business users with some modeling skills.
Every RDBMS out there supports four types of relationships. They are as follows:
- One-to-One relationships.
- One-to-Many relationships.
- Many-to-Many relationships.
- Self referencing relationships (aka recursive relationships)
Relationships can be between 1 or more tables. Here is some nomenclature:
| Number of tables in relationship | Tecnical term | Friendly term |
|---|---|---|
| 1 | Unary | Recursive relationship |
| 2 | Binary | Two table relationship |
| 3 | Ternary | Three table relationship |
| 4 or more | N-ary | Four or more tables relationship |
Update Anomalies and Normalization
3In the database world “update” is a very famous word, one that can make people swallow some saliva in some instances (evil laugh here) . Update anomalies is one of the many computer science paradigms, just so that you can get a feel for the magnitude of the word “update” it also means “insert” and “delete”. Update anomalies in the database space occur as a side effect of having redundant data and a poor normalized data model. The “update” anomalies are as follows:
- Insert anomalies
- Update anomalies
- Delete anomalies
To illustrate these issues we are going to use the database model and its data shown below:
STUDENT_ID STUDENT_FIRST_NAME STUDENT_LAST_NAME
1 Dario Pardo
3 Zinedine Zidane
4 Ronaldinho Gaucho
STUDENT_ID COURSE_CODE COURSE_NAME INSTRUCTOR COURSE_CREDITS
1 1 Data Modeling Graeme Simsion 4
1 2 Analysis Algorithms Weiss 4
3 2 Analysis Algorithms Weiss 4
4 2 Analysis Algorithms Weiss 4
Insert anomalies
What problems do you see?
- Repeated information (course code, course name, instructor, credits).
- Because of the first problem we cannot identify a primary key since every piece of data is repeated across multiple rows.
- Because of the second problem, we have create a composite key that consists of the student id and the course code (This will yield to the insertion anomaly).
The anomaly
The insert anomaly happens when we want to create a course but we do not have a student assigned to it yet. As you can see because of this horrible design we cannot add a new course unless we create a dummy record (in this case a dummy record in the student table) that would be assigned to every new course not available to valid students yet.
Do you see how ugly this solution is?
Do you see how this might be complicated to understand by other developers and business users?
Can you also imagine the ugly code that you will have to do at the application level to filter in/out courses not available yet?
Solution
Normalization. Redesign the database model at least to Third Normal Form.
Update anomalies
What problems do you see?
- Repeated groups of data (Course related and instructor related)
The anomaly
The update anomaly here would happen when we try to update any course related information or any instructor related information information. Say for example that we change the course name “Analysis Algorithms” to “Fundamentals of Algorithms” and that the update only happens on one row instead of on every row possible, we would have left the database in an inconsistent state.
Solution
Normalization. Redesign the database model at least to Third Normal Form.
Delete anomalies
What problems do you see?
- Repeated groups of data (Course related and instructor related)
The anomaly
The delete anomaly here would happen when we try to delete a student . It makes sense that if we delete a student, we also delete any associated rows in the course assignment for that student no? Let’s see what happens if we delete Dario Pardo which is assigned to Data Modeling class among others. Since nobody else is assigned to the Data Modeling class and that was the only row that had information about that class as well as the instructor we would have lost all the data about the course and the instructor. Now…depending on how critical and what type of information that was you or other people above you could loose their job over this; I am not kidding (no evil laugh here)
Solution
Normalization. Redesign the database model at least to Third Normal Form.
A normalized data model
This is a mini solution (it definitely could be much more elaborated, e.g self referencing relationship on courses to handle prerequisites) but for now it will get us away from the update anomalies problem by redesigning the database towards third normal form. Student, Staff, and Courses have been separated out into its own entities and have been connected through associative entities to resolve the many to many relationships to avoid redundant data. With this new design we can do as follows:
- Define new courses without creating a dummy record like we explained in the insertion anomaly.
- Updating course or teacher information now only happens once and touching 1 row only thanks to referential integrity. So we do not have to worry on any possible left outs like explained in the update anomaly.
- Deleting a student and its associated rows in the student_schedule table will not actually delete the course information like explained in the deletion anomaly.









