
many tables vs one big table

Dear experts,

If I have a number of related groups of data (e.g., stock prices for
different stocks), is it better to put them into many tables (e.g., one
table per stock) or into one big table (e.g., a single table where one
column is the stock id and other columns are the price and other data)?

The single table approach with a clustered primary key including the
stock id and other information that is different for each stock seems
cleaner to me. But I worry that having a single table could hurt
performance when the database gets very large. The clustered primary
key should make queries on the single table fast, but I'm worried about
inserts. According to my vague understanding of a clustered primary
key, records are physically stored on disk based on the primary key. So
if I insert a record for a given stock, does the database have to
physically "move down" all the records that are below it, or does it do
something smarter?

To summarize:

1. Should I use a single table or many tables?

2. If using a single table, how are inserts handled by the DB to
prevent inserts from getting slower and slower as more items are added
to the DB?

Thanks in advance for your help and advice.

Jan 2 '07 #1
7 Replies


Use a single table, not many tables. Use a multi-column key that
includes the stock. When a new stock comes along it should be a user
transaction to add a row to the Stocks master table, not a database
administrator's job to add tables. The database design should be
stable, not in a constant state of change.

Performance of INSERTs does not change much between inserting into a
table with 10,000 rows and one with 10,000,000 rows.
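
A rough sketch of that design in T-SQL (the table and column names
here are only illustrative):

CREATE TABLE Stocks (
    StockID int         NOT NULL PRIMARY KEY,
    Ticker  varchar(10) NOT NULL UNIQUE
);

CREATE TABLE StockPrices (
    -- Illustrative columns; the composite clustered key keeps all
    -- rows for one stock together, ordered by date.
    StockID   int            NOT NULL REFERENCES Stocks (StockID),
    PriceDate datetime       NOT NULL,
    Price     decimal(18, 4) NOT NULL,
    CONSTRAINT PK_StockPrices
        PRIMARY KEY CLUSTERED (StockID, PriceDate)
);

Adding a new stock is then a plain INSERT into Stocks, never a schema
change.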

Roy Harvey
Beacon Falls, CT

On 2 Jan 2007 08:57:21 -0800, "Emin" <em**********@gmail.com> wrote:
(snip)
Jan 2 '07 #2

Emin (em**********@gmail.com) writes:
>If I have a number of related groups of data (e.g., stock prices for
>different stocks), is it better to put them into many tables (e.g., one
>table per stock) or into one big table (e.g., a single table where one
>column is the stock id and other columns are the price and other data)?
There are two reasons why you would have more than one table:

1) The information is so different for different (groups of) stocks
that you get different sets of columns.

2) The table is really big. Then you might make it into a partitioned
view or a partitioned table, split on some condition. But that would
not be one table per stock id. (A sketch of this follows below.)

So, in the logical design, that's one table.
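
A minimal sketch of what such a partitioned view could look like
(SQL 2000 style; the table names, columns, and date ranges are
invented for the example):

-- One member table per range, each with a CHECK constraint on the
-- partitioning column (which must be part of the primary key).
CREATE TABLE StockPrices2006 (
    StockID   int      NOT NULL,
    PriceDate datetime NOT NULL
        CHECK (PriceDate >= '20060101' AND PriceDate < '20070101'),
    Price     decimal(18, 4) NOT NULL,
    PRIMARY KEY (StockID, PriceDate)
);

CREATE TABLE StockPrices2007 (
    StockID   int      NOT NULL,
    PriceDate datetime NOT NULL
        CHECK (PriceDate >= '20070101' AND PriceDate < '20080101'),
    Price     decimal(18, 4) NOT NULL,
    PRIMARY KEY (StockID, PriceDate)
);
GO

-- The view presents the member tables as one logical table.
CREATE VIEW StockPricesAll AS
    SELECT StockID, PriceDate, Price FROM StockPrices2006
    UNION ALL
    SELECT StockID, PriceDate, Price FROM StockPrices2007;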
>The single table approach with a clustered primary key including the
>stock id and other information that is different for each stock seems
>cleaner to me. But I worry that having a single table could hurt
>performance when the database gets very large. The clustered primary
>key should make queries on the single table fast, but I'm worried about
>inserts. According to my vague understanding of a clustered primary
>key, records are physically stored on disk based on the primary key. So
>if I insert a record for a given stock, does the database have to
>physically "move down" all the records that are below it, or does it do
>something smarter?
If the PK is monotonically ascending, new rows are just added at the
end and everything continues smoothly. If the PK is not ascending, but
for instance a random GUID, then you will get a lot of page splits,
and that takes extra work. But the answer in that case is that you
should not cluster on the GUID, but on something else. It's perfectly
possible to have the clustered index on some other column(s).
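
For instance, something along these lines (a sketch only; the names
are invented for illustration):

-- The GUID stays the primary key, but is kept nonclustered ...
CREATE TABLE Trades (
    TradeID   uniqueidentifier NOT NULL DEFAULT NEWID(),
    StockID   int              NOT NULL,
    TradeTime datetime         NOT NULL,
    CONSTRAINT PK_Trades PRIMARY KEY NONCLUSTERED (TradeID)
);

-- ... while the clustered index goes on ascending column(s), so new
-- rows land at the logical end and random page splits are avoided.
CREATE CLUSTERED INDEX CIX_Trades_TradeTime
    ON Trades (TradeTime, StockID);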
--
Erland Sommarskog, SQL Server MVP, es****@sommarskog.se

Books Online for SQL Server 2005 at
http://www.microsoft.com/technet/pro...ads/books.mspx
Books Online for SQL Server 2000 at
http://www.microsoft.com/sql/prodinf...ons/books.mspx
Jan 2 '07 #3

On 2 Jan 2007 08:57:21 -0800, Emin wrote:

(snip)
>if I insert a record for a given stock, does the database have to
>physically "move down" all the records that are below it, or does it do
>something smarter?
Hi Emin,

Roy answered most of your questions, but was friendly enough to leave
this bit for me to comment on.

Yes, it does something smarter. Rows are *NOT* physically stored in the
order imposed by the clustered index. The clustered index imposes a
*logical* ordering on the data, which is implemented by pointers that
form a logical chain of database pages.

If a row is inserted, SQL Server first navigates the clustered index to
find the correct location (database page). Then there are two
possibilities:

1. There's still space left on the page. The row is added and the insert
is finished.

2. There's not enough free space on the page. A page split occurs: half
of the data on the page is moved to a newly allocated page and pointers
are updated to reflect the location of the new page in the chain. After
that, the new row is inserted.

If your database has to process *LOTS* of inserts (on the order of
thousands per second or more), it makes sense to define your clustered
index such that new rows are always inserted at the logical end of the
pointer chain. In that case, page splits will never happen.
For lower insert rates, the overhead of a page split is
insignificant.
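
If you want to see how often splits are occurring, one rough way (a
sketch against the SQL 2005 performance-counter DMV) is:

-- 'Page Splits/sec' is cumulative since the server started;
-- sample it twice and take the difference to get a rate.
SELECT object_name, counter_name, cntr_value
FROM   sys.dm_os_performance_counters
WHERE  counter_name LIKE 'Page Splits%';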

--
Hugo Kornelis, SQL Server MVP
My SQL Server blog: http://sqlblog.com/blogs/hugo_kornelis
Jan 2 '07 #4

Dear Roy, Erland, and Hugo,

Thank you very much for your excellent responses. It sounds like a
single logical table is the way to go, since the frequency of inserts
is not very high. If necessary, where would I get more information
about low-level details like page splits (e.g., how to control the page
size, how to determine when and how many page splits are occurring,
etc.)?

Thanks again,
-Emin

On Jan 2, 5:53 pm, Hugo Kornelis <h...@perFact.REMOVETHIS.info.INVALID>
wrote:
(snip)
Jan 3 '07 #5

One more question: is there a process similar to defragmentation that I
should run periodically to collect split pages together?

Thanks again,
-Emin

On Jan 3, 1:42 pm, "Emin" <emin.shop...@gmail.com> wrote:
(snip)
Jan 3 '07 #6

Emin (em**********@gmail.com) writes:
>One more question: is there a process similar to defragmentation that I
>should run periodically to collect split pages together?
Yes, DBCC DBREINDEX on SQL 2000. The preferred syntax on SQL 2005 is
ALTER INDEX ... REBUILD.
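
For example (the table name is just an illustration):

-- SQL 2000: rebuild all indexes on the table
-- ('' means all indexes; 90 is the fill factor).
DBCC DBREINDEX ('dbo.StockPrices', '', 90);

-- SQL 2005: the ALTER INDEX equivalent.
ALTER INDEX ALL ON dbo.StockPrices REBUILD;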

--
Erland Sommarskog, SQL Server MVP, es****@sommarskog.se

Books Online for SQL Server 2005 at
http://www.microsoft.com/technet/pro...ads/books.mspx
Books Online for SQL Server 2000 at
http://www.microsoft.com/sql/prodinf...ons/books.mspx
Jan 3 '07 #7

On 3 Jan 2007 10:42:46 -0800, Emin wrote:
>If necessary, where would I get more information about low-level
>details like page splits (e.g., how to control the page size, how to
>determine when and how many page splits are occurring, etc.)?
Hi Emin,

For that kind of information, the only serious resource is the set of
books written by Kalen Delaney. Quite advanced stuff, definitely not
entry level, but that's unavoidable if you want to know about page
splits.

--
Hugo Kornelis, SQL Server MVP
My SQL Server blog: http://sqlblog.com/blogs/hugo_kornelis
Jan 4 '07 #8
