Fifth normal form
Fifth normal form (5NF), also known as projection–join normal form (PJ/NF), is a level of database normalization designed to remove redundancy in relational databases that record multi-valued facts, by isolating semantically related multiple relationships. A table is in 5NF if and only if every non-trivial join dependency in that table is implied by the candidate keys. It is the final normal form as far as removing redundancy is concerned.
A 6NF also exists, but its purpose is not to remove redundancy and it is therefore only adopted by a few data warehouses, where it can be useful to make tables irreducible.
A join dependency *{A, B, …, Z} on R is implied by the candidate key(s) of R if and only if each of A, B, …, Z is a superkey for R.[1]
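For reference, the join dependency notation used above can be written out as follows. This is the usual textbook formulation rather than something quoted from the cited sources; here A, B, …, Z are sets of attributes whose union is the heading of R, π denotes projection, and ⋈ denotes natural join.

```latex
{*}\{A, B, \ldots, Z\} \text{ holds in } R
\quad\Longleftrightarrow\quad
R \;=\; \pi_{A}(R) \bowtie \pi_{B}(R) \bowtie \cdots \bowtie \pi_{Z}(R)
```

The dependency is trivial when one of A, B, …, Z is the full heading of R.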
The fifth normal form was first described by Ronald Fagin in his 1979 conference paper "Normal forms and relational database operators".[2]
Example
Consider the following example:
Traveling salesman | Brand | Product type |
---|---|---|
Jack Schneider | Acme | Vacuum cleaner |
Jack Schneider | Acme | Breadbox |
Mary Jones | Robusto | Pruning shears |
Mary Jones | Robusto | Vacuum cleaner |
Mary Jones | Robusto | Breadbox |
Mary Jones | Robusto | Umbrella stand |
Louis Ferguson | Robusto | Vacuum cleaner |
Louis Ferguson | Robusto | Telescope |
Louis Ferguson | Acme | Vacuum cleaner |
Louis Ferguson | Acme | Lava lamp |
Louis Ferguson | Nimbus | Tie rack |
The table's predicate is: products of the type designated by product type, made by the brand designated by brand, are available from the traveling salesman designated by traveling salesman.
The primary key is the composite of all three columns. Also note that the table is in 4NF, since there are no multivalued dependencies (2-part join dependencies) in the table: no column (which by itself is not a candidate key or a superkey) is a determinant for the other two columns.
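The 4NF claim can be spot-checked against the sample rows. The sketch below (all function and variable names are illustrative) tries each of the three ways of splitting the table into two overlapping two-column tables and tests whether joining them back yields exactly the original rows. A sample of rows can only refute a dependency, never prove one, so this shows that none of the three two-part join dependencies holds for this data.

```python
from itertools import combinations

ATTRS = ("traveling salesman", "brand", "product type")

# Rows copied from the example table.
ROWS = {
    ("Jack Schneider", "Acme", "Vacuum cleaner"),
    ("Jack Schneider", "Acme", "Breadbox"),
    ("Mary Jones", "Robusto", "Pruning shears"),
    ("Mary Jones", "Robusto", "Vacuum cleaner"),
    ("Mary Jones", "Robusto", "Breadbox"),
    ("Mary Jones", "Robusto", "Umbrella stand"),
    ("Louis Ferguson", "Robusto", "Vacuum cleaner"),
    ("Louis Ferguson", "Robusto", "Telescope"),
    ("Louis Ferguson", "Acme", "Vacuum cleaner"),
    ("Louis Ferguson", "Acme", "Lava lamp"),
    ("Louis Ferguson", "Nimbus", "Tie rack"),
}


def project(rows, cols):
    """Project each row onto the given column positions, dropping duplicates."""
    return {tuple(row[c] for c in cols) for row in rows}


def binary_join_is_lossless(rows, left, right):
    """True if the natural join of the two projections equals `rows`."""
    shared = [c for c in left if c in right]
    right_rows = project(rows, right)
    joined = set()
    for l in project(rows, left):
        lv = dict(zip(left, l))
        for r in right_rows:
            rv = dict(zip(right, r))
            if all(lv[c] == rv[c] for c in shared):
                merged = {**lv, **rv}
                joined.add(tuple(merged[c] for c in range(len(ATTRS))))
    return joined == rows


# The three two-column subsets, taken two at a time, always cover all columns.
two_column_sets = list(combinations(range(len(ATTRS)), 2))
results = {
    (left, right): binary_join_is_lossless(ROWS, left, right)
    for left, right in combinations(two_column_sets, 2)
}

for (left, right), lossless in results.items():
    names = ([ATTRS[c] for c in left], [ATTRS[c] for c in right])
    print(names, "lossless" if lossless else "lossy")
```

Every split comes out lossy because the join produces extra rows; for example, joining on traveling salesman alone would pair Louis Ferguson's Nimbus brand with every product type he sells.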
In the absence of any rules restricting the valid possible combinations of traveling salesman, brand, and product type, the three-attribute table above is necessary in order to model the situation correctly.
Suppose, however, that the following rule applies: A traveling salesman has certain brands and certain product types in their repertoire. If brand B1 and brand B2 are in their repertoire, and product type P is in their repertoire, then (assuming brand B1 and brand B2 both make product type P) the traveling salesman must offer products of product type P made by brand B1 and those made by brand B2.
In that case, it is possible to split the table into three: product types by traveling salesman, brands by traveling salesman, and product types by brand.
Traveling salesman | Product type |
---|---|
Jack Schneider | Vacuum cleaner |
Jack Schneider | Breadbox |
Mary Jones | Pruning shears |
Mary Jones | Vacuum cleaner |
Mary Jones | Breadbox |
Mary Jones | Umbrella stand |
Louis Ferguson | Telescope |
Louis Ferguson | Vacuum cleaner |
Louis Ferguson | Lava lamp |
Louis Ferguson | Tie rack |
Traveling salesman | Brand |
---|---|
Jack Schneider | Acme |
Mary Jones | Robusto |
Louis Ferguson | Robusto |
Louis Ferguson | Acme |
Louis Ferguson | Nimbus |
Brand | Product type |
---|---|
Acme | Vacuum cleaner |
Acme | Breadbox |
Acme | Lava lamp |
Robusto | Pruning shears |
Robusto | Vacuum cleaner |
Robusto | Breadbox |
Robusto | Umbrella stand |
Robusto | Telescope |
Nimbus | Tie rack |
In this case, it's impossible for Louis Ferguson to refuse to offer vacuum cleaners made by Acme (assuming Acme makes vacuum cleaners) if he sells anything else made by Acme (lava lamp) and he also sells vacuum cleaners made by any other brand (Robusto).
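As a rough check, the three smaller tables can be joined back together and compared with the original three-column table. The sketch below uses plain Python sets; the variable and function names are illustrative and not part of any database API.

```python
# The three decomposed tables, copied from the example.
SALESMAN_PRODUCT = {
    ("Jack Schneider", "Vacuum cleaner"),
    ("Jack Schneider", "Breadbox"),
    ("Mary Jones", "Pruning shears"),
    ("Mary Jones", "Vacuum cleaner"),
    ("Mary Jones", "Breadbox"),
    ("Mary Jones", "Umbrella stand"),
    ("Louis Ferguson", "Telescope"),
    ("Louis Ferguson", "Vacuum cleaner"),
    ("Louis Ferguson", "Lava lamp"),
    ("Louis Ferguson", "Tie rack"),
}
SALESMAN_BRAND = {
    ("Jack Schneider", "Acme"),
    ("Mary Jones", "Robusto"),
    ("Louis Ferguson", "Robusto"),
    ("Louis Ferguson", "Acme"),
    ("Louis Ferguson", "Nimbus"),
}
BRAND_PRODUCT = {
    ("Acme", "Vacuum cleaner"),
    ("Acme", "Breadbox"),
    ("Acme", "Lava lamp"),
    ("Robusto", "Pruning shears"),
    ("Robusto", "Vacuum cleaner"),
    ("Robusto", "Breadbox"),
    ("Robusto", "Umbrella stand"),
    ("Robusto", "Telescope"),
    ("Nimbus", "Tie rack"),
}

# The original three-column table.
ORIGINAL = {
    ("Jack Schneider", "Acme", "Vacuum cleaner"),
    ("Jack Schneider", "Acme", "Breadbox"),
    ("Mary Jones", "Robusto", "Pruning shears"),
    ("Mary Jones", "Robusto", "Vacuum cleaner"),
    ("Mary Jones", "Robusto", "Breadbox"),
    ("Mary Jones", "Robusto", "Umbrella stand"),
    ("Louis Ferguson", "Robusto", "Vacuum cleaner"),
    ("Louis Ferguson", "Robusto", "Telescope"),
    ("Louis Ferguson", "Acme", "Vacuum cleaner"),
    ("Louis Ferguson", "Acme", "Lava lamp"),
    ("Louis Ferguson", "Nimbus", "Tie rack"),
}


def three_way_join(salesman_product, salesman_brand, brand_product):
    """Natural join of the three two-column tables."""
    return {
        (s, b, p)
        for (s, b) in salesman_brand
        for (s2, p) in salesman_product
        if s == s2 and (b, p) in brand_product
    }


rejoined = three_way_join(SALESMAN_PRODUCT, SALESMAN_BRAND, BRAND_PRODUCT)
print(rejoined == ORIGINAL)
```

The row for Louis Ferguson, Acme, and vacuum cleaner is not stored anywhere in the decomposed design; it falls out of the join, which is why it cannot be omitted.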
Note how this setup helps to remove redundancy. Suppose that Jack Schneider starts selling Robusto's breadboxes and vacuum cleaners. In the previous setup we would have to add two new entries, one for each product type (<Jack Schneider, Robusto, Breadbox> and <Jack Schneider, Robusto, Vacuum cleaner>). With the new setup we need to add only a single entry (<Jack Schneider, Robusto>) to "brands by traveling salesman".
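The effect of that single insertion can be sketched as follows. To keep the example short, the two salesman tables are restricted to Jack Schneider's rows, which is an assumption made for illustration; the brand table is copied in full. All names are illustrative.

```python
# Jack Schneider's rows only (a deliberate restriction for brevity).
salesman_product = {
    ("Jack Schneider", "Vacuum cleaner"),
    ("Jack Schneider", "Breadbox"),
}
salesman_brand = {("Jack Schneider", "Acme")}

# "Product types by brand", copied in full from the example.
brand_product = {
    ("Acme", "Vacuum cleaner"),
    ("Acme", "Breadbox"),
    ("Acme", "Lava lamp"),
    ("Robusto", "Pruning shears"),
    ("Robusto", "Vacuum cleaner"),
    ("Robusto", "Breadbox"),
    ("Robusto", "Umbrella stand"),
    ("Robusto", "Telescope"),
    ("Nimbus", "Tie rack"),
}


def three_way_join(sp, sb, bp):
    """Natural join of the three two-column tables."""
    return {
        (s, b, p)
        for (s, b) in sb
        for (s2, p) in sp
        if s == s2 and (b, p) in bp
    }


before = three_way_join(salesman_product, salesman_brand, brand_product)

# One new row in "brands by traveling salesman".
salesman_brand_after = salesman_brand | {("Jack Schneider", "Robusto")}
after = three_way_join(salesman_product, salesman_brand_after, brand_product)

# The rows that the single insertion made derivable.
new_rows = after - before
for row in sorted(new_rows):
    print(row)
```

Both of the entries that the three-column design would have needed appear in the join after one insertion.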
Usage
Only in rare situations does a 4NF table not conform to 5NF, for instance when the decomposed tables are cyclic. These are situations in which a complex real-world constraint governing the valid combinations of attribute values in the 4NF table is not implicit in the structure of that table. If such a table is not normalized to 5NF, the burden of maintaining the logical consistency of the data within the table must be carried partly by the application responsible for insertions, deletions, and updates to it; and there is a heightened risk that the data within the table will become inconsistent. In contrast, the 5NF design excludes the possibility of such inconsistencies.
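The kind of check an application would have to carry out on the unnormalized table can be sketched like this. The sketch assumes that the salesman rule from the example is equivalent to the join dependency over the three pairwise projections, and reports rows that the rule requires but the table lacks. The function names are illustrative.

```python
# The three-column table from the example.
TABLE = {
    ("Jack Schneider", "Acme", "Vacuum cleaner"),
    ("Jack Schneider", "Acme", "Breadbox"),
    ("Mary Jones", "Robusto", "Pruning shears"),
    ("Mary Jones", "Robusto", "Vacuum cleaner"),
    ("Mary Jones", "Robusto", "Breadbox"),
    ("Mary Jones", "Robusto", "Umbrella stand"),
    ("Louis Ferguson", "Robusto", "Vacuum cleaner"),
    ("Louis Ferguson", "Robusto", "Telescope"),
    ("Louis Ferguson", "Acme", "Vacuum cleaner"),
    ("Louis Ferguson", "Acme", "Lava lamp"),
    ("Louis Ferguson", "Nimbus", "Tie rack"),
}


def project(rows, cols):
    """Project each row onto the given column positions, dropping duplicates."""
    return {tuple(row[c] for c in cols) for row in rows}


def missing_tuples(rows):
    """Rows implied by the pairwise projections but absent from `rows`."""
    salesman_brand = project(rows, (0, 1))
    salesman_product = project(rows, (0, 2))
    brand_product = project(rows, (1, 2))
    rejoined = {
        (s, b, p)
        for (s, b) in salesman_brand
        for (s2, p) in salesman_product
        if s == s2 and (b, p) in brand_product
    }
    return rejoined - rows


# The table as given satisfies the rule.
consistent = missing_tuples(TABLE)

# An application inserts only one of the two rows the rule requires.
partial_update = TABLE | {("Jack Schneider", "Robusto", "Breadbox")}
violations = missing_tuples(partial_update)
print(violations)
```

In the three-table design there is no way to express the partial update in the first place, so no such check is needed.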
Put differently, a table T is in fifth normal form (5NF), or projection-join normal form (PJ/NF), if it cannot be losslessly decomposed into any number of smaller tables. Decompositions in which all the smaller tables have the same candidate key as T are excluded from this definition.
References
- Analysis of normal forms for anchor-tables
- S. Krishna (1991). Introduction to Data Base and Knowledge Base Systems. ISBN 9810206208. "The fifth normal form was introduced by Fagin."
Further reading
- Kent, W. (1983) A Simple Guide to Five Normal Forms in Relational Database Theory, Communications of the ACM, vol. 26, pp. 120–125.
- Date, C. J., & Darwen, H., & Pascal, F. Database Debunkings.
- Darwen, H.; Date, C. J.; Fagin, R. (2012). "A normal form for preventing redundant tuples in relational databases". Proceedings of the 15th International Conference on Database Theory – ICDT '12 (PDF). pp. 114–126. doi:10.1145/2274576.2274589. ISBN 9781450307918.