- A relation with problems
- Redundancy leads to inconsistency
- Deletion anomalies
- Update Anomalies
- Information Loss and Dependency Loss When Decomposing Tables (Warum macht er es nicht gleich richtig?)
- Perfect Decomposition
- Functional Dependencies
- Full functional Dependencies
- Functional Dependencies and Keys
- Overview of Normal Forms
- Normal Forms Definitions
- Kent and Diehr Quote Eselsbrücke
- 1NF BCNF Flow Chart
- Compute With Functional Dependencies | Armstrong’s Rules
- Rule of Reflexivity
- Rule of Augmentation
- Rule of Transitivity
- How to Guarantee Lossless Joins?
- How to Guarantee Preservation of FDs (Functional Dependencies)?
- LineDance App ERD Version 0.3 Relations
- Email >> Interest, 3NF and BCNF
- It Never Happens in Practice
Prof. Leo Mark
Case Study & Team Project
“This will be one of the best and most useful courses you ever took.”
- No redundancy of facts
- No cluttering of facts
- Must preserve information
- Must preserve functional dependencies
Not a relation or NF2 (Non first normal form)
A relation with problems
Email >> CurrCity, Birthyear
Email, birthyear >> Salary
Email, interest >> SinceAge
Redundancy leads to inconsistency
Because, if the student moves from Seattle to somewhere else, you cannot be sure that each of the data gets corrected.
If we insert a new regular user without any interest, then we must insert NULL values for interest and SinceAge (if I do not have a Teaching Video, then I must insert a NULL vlaue, my assumption was right, I cannot do that in proper databse design).
If we insert a pair of BirthYear and Salary, then we must insert NULL for Email, Interest, SinceAge and Current City.
Deletion anomalies
If we want to delete user12 from the database, then we loose the fact that when the birthyear is 1974, then the salary is 38.000:
Update Anomalies
If we update the CurrentCity of a Regular User, then we must do it in multiple places. If I update BirthYear, then we must Update Salary.
Information Loss and Dependency Loss When Decomposing Tables (Warum macht er es nicht gleich richtig?)
If we decompose RegularUser into 2 relations, then we could get too many rows back when we recombine them:
We must include at least one attribute in each of the 2 tables, so that we can recombine them to the original RegularUser Table
Perfect Decomposition
When we want to see RegularUSer again, we join these three tables together.
Functional Dependencies
Let x and y be sets of attributes in R (RELATION)
y is functionally dependent on x in R, iff (and only if) for each x in der Menge von R.x there is precisely one y aus der Menge von R.y
In the 2nd example: Email and Interest together is one entity to look at.
Full functional Dependencies
Let x and y be sets of attributes in Relation
y is FULLY functional dependent on x in Relation iff y is functional dependent on x and y is not functional dependent on any proper subset of x.
In this relation here we would like to represent that Email and Interest together determine SinceAge. That is SinceAge is not determined by Email alone.
This example is not with full functional dependencies, CurrCity is not fully dependent on Email+Interest, because it is dependent on Email alone.
Functional Dependencies and Keys
- We use keys to enforce full functional dependencies, x >> y
- In a relation, the values of the key are unique
- That is why it enforces function
Overview of Normal Forms
To help us identify how well a relation has been defined, we introduce four normal forms:
The whole set of data structures is called Non-First Normal Form NF2. Most of those are not relations in a sense of what we have defined of what a relation is.
A subset of the Non-First Normal Form data structures is the set of First Normal Form relations 1NF.
A subset of the First Normal Form relations is the Second Normal Form relations 2NF. Within the 2NF there is a subset of Third Normal Form relations 3NF.
As a subset of the 3NF is a set of Boyce-Codd Normal Form BCNF relations.
BCNF relations are the relations we are aiming at.
Normal Forms Definitions
NF2 - Non-First Normal Form
1NF - Relation is in 1NF iff all domain values are atomic
2NF - relation is in 2NF iff Relation is in 1NF and every nonkey attribute is fully dependent on the key
3NF - relation is in 3NF iff relation is 2NF and every nonkey attribute is non-transitively dependent on the key
BCNF - Relation is in BCNF iff every determinant is a candidate key
Determinant - a set of attributes in a relation on which some other attribute is fully functionally dependent
Kent and Diehr Quote Eselsbrücke
1NF BCNF Flow Chart
I want to isolate functional dependencies that have different left-hand sides.
Since Email & Interest together determine SinceAge that is a good functional dependency to split out. When you split out that functional dependency, you actually also see that this portion {Email, Interest} of the decomposed relation is in BCNF. Why is that? Because the only determinant of these three is the pair of Email and Interest, and they will have to be a candidate key to enforce that functional dependency.
What remains then is that we need to represent that Email determines CurrCity, BirthYear and Salary and that BirthYear by itself determines salary.
Compute With Functional Dependencies | Armstrong’s Rules
To make sure that we do not loose information and that we preserve the functional dependencies when we decompose relations, we need to be able to compute ßßß meaning. The tools for doing that are calles Armstrong’s rules.
Rule of Reflexivity
If Y is part of Y, then X determines Y as a function. OR IN OTHER WORDS …
If the right-hand side is a subset of the left-hand side, then the left-hand-side determines the right-hand. OR IN OTHER WORDS …
If INTEREST is a subset of EMAIL, INTEREST then EMAIL, INTEREST determines INTEREST.
Rule of Augmentation
If X functional determines Y, then you can augment it with the same thing on the left-hand side as on the right-hand side. OR IN OTHER WORDS
If Email determines BirthYear, then it will also be true that Email, Interest determines BirthYear, Interest
Rule of Transitivity
Finally, very important transitivity.
If X determines Y, and Y determines Z, then X determines Z OR IN OTHER WORDS …
IF Email determines BirthYear and BirthYear determines Salary, then Email determines Salary.
How to Guarantee Lossless Joins?
Question now is, how do we guarantee lossles joins, when we decompose a relation into smaller relations? Here is a very easy way to remember that:
The join field must be a key in at last one of the 2 relations
So, when you look at the above decomposition, the the join field obviuosly is Email, if Email is a key in one of the two tables, the we are guaranteed to not loose information from doing this decomposition. In other words when we join these two together again over Email, then we are guaranteed to not create a tuple that was not there in the relation we started out with.
The simple was to understand why this is the case, that when the join field is a key as it is here, there is no way of blowing duplicates in the Email Colum on the right with the Email column on the left.
How to Guarantee Preservation of FDs (Functional Dependencies)?
The meaning implied by the remaining functional dependencies must be the same as the meaning that was implied by the original set.
So, looking here at the decomposition of the functional dependencies diagram Email >> BirthYear and CurrentCity as one relation and in a separate relation BirthYear and Salary, we can now, by the transisitivity rule see the following:
Email >> BirthYear AND
BirthYear >> Salary THEN
Email >> Salary
LineDance App ERD Version 0.3 Relations
Eigentlich brauche ich doch nur eine Tabelle wie
Email >> Interest, 3NF and BCNF
Let us go back to the example of a good decomposition as we did before starting with the functional dependencies diagram.
1st decomposition
Email needs to be the key
2nd decomposition
There does exist relations which using lossless and dependency preserving decompositions can only be normalized to 3NF not also to BCNF
This only happens in cases where relations have overlapping keys.
It Never Happens in Practice
As illustrated by the Venn diagramm it happens never in practice. Proof by experience:
- it never happend to me in 35 years (what never happened?). I designed a lot of databases in industry, in government and … and in never happenend to me.
- Take a look at one of bookcases in my office. This one bookcase contains database textbooks only. Every single one of these textbooks has in it a chapter on databse normalization. They will show you 1NF, 2NF, 3NF and BCNF. And then, coming to the bottom of the right page, it is said “however, it does exist relations that can only be decomposed to 3NF and not BCNF. It takes a really sick brain to construct an example like that.
So, be confident, if you follow the steps and bring relations through 1NF, 2NF and 3NF, you will be loggy and you will end up in BCNF in practice.