473,406 Members | 2,710 Online
Bytes | Software Development & Data Engineering Community
Post Job

Home Posts Topics Members FAQ

Join Bytes to post your question to a community of 473,406 software developers and data experts.

Should I Denormalize?

I have a database design question and wanted to see if folks could lend some
advice. For simplicity sake, I'll illustrate using a book (but the
hierarchy is the same in the real application - just with extra fields).

A book has a title has chapters.
A chapter has a title and has pages.
A page has a title has pictures.
A picture has a title.

My application will show a book, a chapter, a page, or a picture at any
given time. However, regardless of what I'm showing, I need to show my
name, and the names of all my parents (e.g. a page must show it's name, and
the names of it's parent chapter and book).

The question - in database terms, does the second, denormalized table design
give any performance advantage? Here are my two options:

-- Normalized --
book: book_id, title
chapter: chapter_id, book_id, title
page: page_id, chapter_id, title
picture: picture_id, page_id, title

-- Denormalized --
book: book_id, title
chapter: chapter_id, book_id, title
page: page_id, chapter_id, book_id, title
picture: picture_id, page_id, chapter_id, book_id, title

I'll use the most extreme case (the lowest table) as an example. When
viewing a picture, I'll need to get the picture title, and also the titles
of each of my parents. Below are the resulting queries and explain routes:

NORMALIZED QUERY

select book.title, chapter.title, page.title, picture.title
from book, chapter, page, picture
where book.book_id = chapter.book_id
and chapter.chapter_id = page.chapter_id
and page.page_id = picture.page_id
and picture.picture_id = 3601

NORMALIZED EXPLAIN

PRIMARY,fk_picture_2_page
PRIMARY,fk_page_2_chapter
PRIMARY,fk_chapter_2_book
PRIMARY

If I chose the denormalized version, they are:

DENORMALIZED QUERY

select book.title, chapter.title, page.title, picture.title
from book, chapter, page, picture
where book.book_id = picture.book_id
and chapter.chapter_id = picture.chapter_id
and page.page_id = picture.page_id
and picture.picture_id = 3601

DENORMALIZED EXPLAIN

PRIMARY,fk_picture_2_page,fk_picture_2_chapter,fk_ picture_2_book
PRIMARY
PRIMARY
PRIMARY

I've run some basic tests, and haven't noticed any real difference in
performance. I'm pretty new to database design, so if anyone has any
pointers, please let me know.

Thanks!

--
Josh
Jul 23 '05 #1
4 2826
"Josh" <memphis75 - gmail> wrote in message
news:11*************@corp.supernews.com...
<snip>
The question - in database terms, does the second, denormalized table design give any performance advantage?

<snip>

The entire (MySQL) system is designed to perform it's all around best using
textbook
classic normalized tables. The kind every intro to db design preaches. This
seems to be the most difficult thing to "preach" to newcomers to database
design. It just sounds so unconvincing that one *should* spend a bit more
effort writing table joins instead of cramming redundant information into
flat tables. My advice to someone serious about database design would be to
simply take it on faith because the "Ahah!" experience will come to you.
For most of us who begin by hot rodding flat tables at the get go the
revelation experience is more like "Oh sh**! - So that's why you're supposed
to do it that way! ..."

Let me just say that you (might!) get some performance advantage using flat
"denormalized" tables. MySQL would certainly be the *wrong* tool for this.
I would go for a simple Excel table or one of the several flat file database
systems that are out there. As long as the data set is *small* and the
entire system is *simple* you will enjoy a performance advantage and may
never encounter the numerous and onerous booby traps this will set for a
system you expect to grow over time. MySQL, on the other hand, is meant to
scale large. It may not be as fast out of the gate, but neither will it slow
or bog down as the system expands. It's more like a marathon runner. As the
miles wear on the sprinter always falls behind never to catch up again.

But hey! If you are only going to cover 100 or 200 yards - by all means
sprint. Just don't waste your efforts by using a marathon tool like MySQL.

Okay - enough. Now what does your book say about normalizing a database
that you don't quite believe? Or the advantage that you think might be worth
sacrificing for few microseconds faster response time?

Thomas Bartkus

Oh how I wish production MySQL had views or virtual tables. Then when the
next PHP hotshot demands a flat table to simplify his coding I could just
write a nice proper query, store it, and tell him I'm giving him his table -
muttering "jerk!" under my breath.
Jul 23 '05 #2
Thomas - Thank you for taking the time to explain this!
The entire (MySQL) system is designed to perform it's all around best
using textbook classic normalized tables.
This is exactly the information I was hoping to get.
take it on faith because the "Ahah!" experience will come to you.


I loaded a few million records across the tables and ran some performance
tests (multi-threaded). You're absolutely right about the minimal
performance gains. As long as I was joining on primary keys, there was no
significant boost by denormalizing, only extra columns to keep up with!

--
Josh

Jul 23 '05 #3
"Josh" <memphis75 - gmail> wrote in message
news:11*************@corp.supernews.com...
Thomas - Thank you for taking the time to explain this!
The entire (MySQL) system is designed to perform it's all around best
using textbook classic normalized tables.
This is exactly the information I was hoping to get.
take it on faith because the "Ahah!" experience will come to you.


I loaded a few million records across the tables and ran some performance
tests (multi-threaded). You're absolutely right about the minimal
performance gains. As long as I was joining on primary keys, there was no
significant boost by denormalizing, ....

.... only extra columns to keep up with! * * *


Yup - that's one of the ahahs!
Maintaining data integrity is going to be *so much easier and reliable* when
you follow the rules.

There will be more!
Thomas Bartkus
Jul 23 '05 #4
Thomas Bartkus wrote:
"Josh" <memphis75 - gmail> wrote in message
news:11*************@corp.supernews.com...
Thomas - Thank you for taking the time to explain this!

The entire (MySQL) system is designed to perform it's all around best
using textbook classic normalized tables.


This is exactly the information I was hoping to get.

take it on faith because the "Ahah!" experience will come to you.


I loaded a few million records across the tables and ran some performance
tests (multi-threaded). You're absolutely right about the minimal
performance gains. As long as I was joining on primary keys, there was no
significant boost by denormalizing, ....


> .... only extra columns to keep up with! * * *


Yup - that's one of the ahahs!
Maintaining data integrity is going to be *so much easier and reliable* when
you follow the rules.

There will be more!
Thomas Bartkus

Another factor to consider when looking at performance is the
cost of "throwing hardware at the problem". As RAM and hard
disks both become less expensive and faster to access trying for
performance improvements by denormalizing your database (aside
from probably not yielding any significant gains) is short
sighted and will result in additional maintenance costs that
will exceed the cost of additional hardware.
Jul 23 '05 #5

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

303
by: mike420 | last post by:
In the context of LATEX, some Pythonista asked what the big successes of Lisp were. I think there were at least three *big* successes. a. orbitz.com web site uses Lisp for algorithms, etc. b....
3
by: Andy Visniewski | last post by:
Should be easy, but I've been trying to figure this out for about half an hour with no luck. There is a table 'Cybex' which has all the Cybex products we sell, and a table 'Datasheets' which...
36
by: Jack Hughes | last post by:
This argument has come up two or three times lately. We have a web standards document at our company that say "Use valid HTML 4; We don't recommend switching to DIV-based CSS-P for layout just...
12
by: Sanjay | last post by:
hi, We are currently porting our project from VB6 to VB .NET. Earlier we used to make scale transformations on objects like pictureBox , forms etc.Now Such transformations are made on the...
8
by: Chris Mayers | last post by:
I am trying to track a suspected memory leak in an application I'm writing and through repeated simplification of my project, I have found the following which is quite easily replicated should you...
5
by: davidw | last post by:
Last time, I asked a question about if a XPathDocument could be serialized. Actually what I really want to know is how should I do in this case: I have a class that will be serialized. In the...
6
by: Terry Bell | last post by:
We've had a very large A97 app running fine for the last seven years. I've just converted to SQL Server backend, which is being tested, but meanwhile the JET based version, running under terminal...
8
by: FAQ server | last post by:
----------------------------------------------------------------------- FAQ Topic - When should I use eval? ----------------------------------------------------------------------- The ` eval() `...
1
by: baburk | last post by:
Hi all, Is there is any Specific Reasons To Denormalize. Any example?. Thanks
0
by: emmanuelkatto | last post by:
Hi All, I am Emmanuel katto from Uganda. I want to ask what challenges you've faced while migrating a website to cloud. Please let me know. Thanks! Emmanuel
0
BarryA
by: BarryA | last post by:
What are the essential steps and strategies outlined in the Data Structures and Algorithms (DSA) roadmap for aspiring data scientists? How can individuals effectively utilize this roadmap to progress...
1
by: Sonnysonu | last post by:
This is the data of csv file 1 2 3 1 2 3 1 2 3 1 2 3 2 3 2 3 3 the lengths should be different i have to store the data by column-wise with in the specific length. suppose the i have to...
0
by: Hystou | last post by:
There are some requirements for setting up RAID: 1. The motherboard and BIOS support RAID configuration. 2. The motherboard has 2 or more available SATA protocol SSD/HDD slots (including MSATA, M.2...
0
marktang
by: marktang | last post by:
ONU (Optical Network Unit) is one of the key components for providing high-speed Internet services. Its primary function is to act as an endpoint device located at the user's premises. However,...
0
by: Hystou | last post by:
Most computers default to English, but sometimes we require a different language, especially when relocating. Forgot to request a specific language before your computer shipped? No problem! You can...
0
by: Hystou | last post by:
Overview: Windows 11 and 10 have less user interface control over operating system update behaviour than previous versions of Windows. In Windows 11 and 10, there is no way to turn off the Windows...
0
agi2029
by: agi2029 | last post by:
Let's talk about the concept of autonomous AI software engineers and no-code agents. These AIs are designed to manage the entire lifecycle of a software development project—planning, coding, testing,...
0
isladogs
by: isladogs | last post by:
The next Access Europe User Group meeting will be on Wednesday 1 May 2024 starting at 18:00 UK time (6PM UTC+1) and finishing by 19:30 (7.30PM). In this session, we are pleased to welcome a new...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.