DBMS Normalization

Normalization is a process of arranging relations or set of relations in a manner so that insertion, deletion or updation anomalies do not persist with in the database. It’s being considered foremost while creating any table or relation to maintain consistent data and reduce redundancy. A database table or relation is said to be normalized if it satisfies the following Normal Forms:

1. First Normal Form (1NF)

  • A relation (Set of tuples. Here each Student data like “John who is of class 10th and holds some subjects represent a relation.
  • In RDBMS tables represents relation with other table by establishing Foreign key.) is said to be in 1NF if each attribute in relation contains an atomic value (i.e. ,single valued attribute).
    Consider the below Students table. In which each student is pursuing more than one subject. But it is not the convenient way to represent it in the table. Because our rule state that single value attribute should be maintain by attribute.

Students Table

IdNameSubjectClass
1JohnMaths
Physics
10th
2AmitEnglish
Chemistry
10th

So, we have simplified it by making it single valued attribute as in the below table.

IdNameSubjectClass
1JohnMaths10th
1JohnPhysics10th
2AmitEnglish12th
2AmitChemistry12th

2. Second Normal Form (2NF)

  • 2NF states that relation should be in 1NF firstly.
  • Then all non-prime-key attributes are fully functional dependent on the primary key or we can say no partial dependency.

Consider the below Employee_Project table

EmpIdEmpNameProjectIdProjectNameWorking Hours
E1JohnP1Johnsons4
E2RohnP2Almora6
E3SumitP1Johnsons3
E1JohnP2Almora5

Given table is in 1NF.
Now examine whether the table fulfill our next condition. Here EmpId and ProjectId are primary key attributes and in current table they together are working as composite key. while EmpName and ProjectName are non-primary key attributes.

In the above table ProjectName is non primary key attribute and is functionally dependent on ProjectId and not on primary key EmpId which is making partial dependency. Also, if we have to fetch the working hours for Project Johnsons. Then we will have to use ProjectId in addition with EmpId to uniquely identify a row. Thus ProjectId+EmpId (composite key) act as primary key. We will remove partial dependency by making separate table for Project.

Let’s shatter it into two tables to obey the 2NF rule.

Employee_Project

EmpIdEmpNameProjectIdWorking Hours
E1JohnP14
E2RohanP26
E3SumitP13
E1JohnP25

Projects

ProjectIdProjectName
P1Johnsons
P2Almora

Now both tables are in 2NF.

3. Third Normal Form(3NF)

  • 3NF states that relation in table should be in 2NF.
  • Has no transitive functional dependency for non-prime attributes.
  • Reduce data replication and maintain data integrity.

    Student_Semester table
IdStudentIdSubjectIdMarksSemesterTotalSubjects
11160Sem14
22270Sem25
31280Sem14

Here in the given table, Suppose some student studies in XYZ college. Each semester has its own list of subjects. So, SubjectId functionally depends on Semester, Marks(non-prime attribute) depends on SubjectId(prime attribute) and Semester depends on StudentID. TotalSubjects depends on Semester. Thus TotalSubjects (non-prime attribute)column transitively depends upon StudentId(prime attribute).

Semester depends upon StudentId
TotalSubjects depends upon Semester
Then TotalSubjects indirectly depends upon StudentId

So, we will create three separate tables Student_Subject, Subjects and Semester

StudentIdSubjectIdMarksSemId
11601
22702
31801

Subjects Table

SubjectIdName
1Maths
2English

Semester Table

SemesterIdName
1Semester1
2Semester2

So, now transitive dependency is removed, obey 2NF and our tables are in 3NF. One thing need to remember while normalizing is that each tuple should maintain unique record.

Boyce Codd normal form (BCNF)

  • Advance version of 3NF. Also known as 3.5 Normal Form.
  • States that relation should be in 3NF.
  • For every functional dependency X->Y, X should be the super key (i.e., If Y is prime attribute depends on X then X should not be non-prime attribute).

    Students Table
StudentIdSubjectUniversityTeacher
S1MathsDFGRay
S2EnglishDFGEva
S3ScienceDFGJack
S1CivicsDFGJohn
S1EnglishDFGMack

If we examine then student StudentId + SubjectName together form primary key(i.e. ,composite key). Because with the help of these two we would able to uniquely identify each row.

Subject(prime attribute, say Y) depends on Teacher(non-prime, say X). In the given table BCNF rule is violated:

  • Because prime attribute depends on non-prime.
  • Neither StudentId nor SubjectName independently act as primary key.

Also, this way of storing data in table would generate insert, update and delete anomalies. Consider a scenario, suppose if we have to update the University name then we have to update all rows with value “DFG”. Here we have taken a simple example but think about large database(with 50,000 Or + records) it would be time consuming and increase computational cost. If in any case any of the record left unchanged then will create inconsistency with Updation Anomaly.

As we can see in the given table, same piece of information exist multiple timesl(Like Student with Name S1 and University Name exist multiple times i.e. hold same piece of information).So, we need to normalize it by fragmenting into tables.

Student_University Table

StudentIdUniversityId
S11
S21
S31
S41

Student_Teacher Table

StudentIdTeacherId
1T1
2T2
3T3
1T4
1T5

Teacher_Subject

TeacherIdSubject
T1Maths
T2English
T3Science
T4Civics
T5English

University table

IdName
1DFG
2…… continue

We simply need to change one row to update the University name.

4. Fourth Normal Form(4NF)

  • 4NF states that relation should be in BCNF.
  • For any single value within the table, there should not be any multivalued dependency.
  • Say for a functional dependency X → Y, if for a single value of X, multiple values of Y exists, then the relation will be considered to have multi-valued dependency.
    Consider the below Employee table:
EmpIdSkillsPhoneNo.
12PHP9999999999
12JAVA8888888888
13C++7777777777

Company column has no relation with employee phone number. So, we will disintegrate into two columns.

Emp_Skill

EmpIdSkills
12PHP
12JAVA
13C++

Emp_PhoneNo

EmpIdPhoneNo
129999999999
128888888888
137777777777

Now both tables are in 4NF.

5. Fifth Normal Form(5NF)

  • 5NF states that relation should be in 4NF.
  • It is required that according to the functional dependency, table cannot be further break down into more tables without losing data.
  • Generally disintegration of table is done to avoid redundancy but it should be done without data loss.
  • Should not contain join dependency.
  • Also known as project join normal form.

Let’s firstly try to understand join dependency.

Consider the below table. Here the Student studies subject Electronics from teacher Mario. Mario can teach subject Electronics, C++. One more student Miley studies subject C++ from Mario.

But this is not mandatory in the current relation that any student who study from any teacher is required to take the all subjects taught by any particular teacher.

Students

StudentSubjectTeacher
JohnElectronicsMario
JohnC++Mario
MileyC++Mario
JackElectronicsAlbert

Table contain redundancy as John and Mario appears twice. Now we will decompose it into three tables or create three projections and find out whether the table contain join dependency or not.

Student_Subject

StudentSubject
JohnElectronics
JohnC++
MileyC++
JackElectronics

Subject_Teacher

ElectronicsMario
C++Mario
ElectronicsAlbert

Student_Teacher

StudentTeacher
JohnMario
MileyMario
JackAlbert
StudentSubjectTeacher
JohnElectronicsMario
JohnElectronicsAlbert
JohnC++Mario
MileyC++Mario
JackElectronicsMario
JackElectronicsAlbert

When we have joined Student_Subject and Subject_Teacher. Then 2 more spurious row gets added into the result as shown above. Thus lossy decomposition exist here.

Now we further perform natural join on this result with Student_Teacher and got the original Students table below. Thus the original Students table contains join dependency and is not in 5NF. However table Student_Subject,Subject_Teacher,Student_Teacher are in 5NF. Because these tables can’t be break down further and when we perform natural join on these, original table is generated without loss of information. Also, no any spurious row is added.

If decomposed tables or projections are joined. A table (say Students) is said to have JOIN dependency when there is:

  • No loss of information in the generated table (say Students’).
  • No any spurious row is added in the generated table(say Students’) .

    Students’
StudentSubjectTeacher
JohnElectronicsMario
JohnC++Mario
MileyC++Mario
JackElectronicsAlbert

However in some scenarios, it is not always possible to eliminate the redundancy from the table. Because if we try to do so then it may result in information loss. Consider the Student table below:

Students table

StudentTeacherSubject
JohnMackElectronics
JohnMackC++
JohnAbrahimDBMS
MileyMackDBMS

Student_Teacher

StudentTeacher
JohnMack
JohnAbrahim
MileyMack

Teacher_Subject

TeacherSubject
MackElectronics
MackC++
AbrahimDBMS
MackDBMS

Student_Subject

StudentSubject
JohnElectronics
JohnC++
JohnDBMS
MileyDBMS

So, when we perform join on Student_Teacher and Teacher_Subject, we found two spurious rows within the table.

StudentTeacherSubject
JohnMackElectronics
JohnMackC++
JohnMackDBMS(X(INCORRECT)
JohnAbrahimDBMS
MileyMackElectronicsX(INCORRECT)
MileyMackC++X(INCORRECT)
MileyMackDBMS

Finally when we natural join this resulted table with Student_Subject, we got the below table. Thus it is lossy decomposition and original table including fragmented tables are not in 5NF.

StudentTeacherSubject
JohnMackElectronics
JohnMackC++
JohnMackDBMSX(INCORRECT)
JohnAbrahimDBMS
MileyMackDBMS

Leave a Reply

Your email address will not be published. Required fields are marked *