SQL Server storing large amounts of data in multiple tables

Hello,
Currently we have a database, and we want it to be able to store
millions of records. The data in the table can be divided up by
client, and each row holds nothing but about 7 integers.
| table |
| id | clientId | int1 | int2 | int3 | ... |

Right now, our benchmarks indicate a drastic increase in performance
if we divide the data into separate tables, for example
table_clientA, table_clientB, table_clientC, even though the tables
contain exactly the same columns. This does not seem very clean or
elegant to me, and rather illogical, since the database exists as a
single file on the hard drive.

| table_clientA |
| id | clientId | int1 | int2 | int3 | ... |

| table_clientB |
| id | clientId | int1 | int2 | int3 | ... |

| table_clientC |
| id | clientId | int1 | int2 | int3 | ... |

Is there any way to duplicate the increase in database performance
gained by splitting the table, perhaps by using a certain type of
index?

Thanks,
Jeff Brubaker
Software Developer
Jul 20 '05 #1
Why not create a view that combines the separate tables back into your
original format? You could even place the different base tables in different
locations (i.e., different servers) - the whole distributed partitioned view
concept.
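
As a minimal sketch, assuming per-client tables named as in your post (the
view name allClients is just for illustration):

/* Combines the per-client tables back into one logical table */
CREATE VIEW allClients
AS
SELECT id, clientId, int1, int2, int3 FROM table_clientA
UNION ALL
SELECT id, clientId, int1, int2, int3 FROM table_clientB
UNION ALL
SELECT id, clientId, int1, int2, int3 FROM table_clientC

For a true partitioned view you would also put a CHECK constraint on clientId
in each base table, so the optimizer can skip member tables that cannot
contain the rows a query asks for.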

Other than this, what do you have indexed on this table? What kind of query
is showing a drastic improvement when you separate the table like this?

-Chuck Urwiler, MCSD, MCDBA

"jeff brubaker" <je**@priva.com> wrote in message
news:b7*************************@posting.google.co m...
Hello,
Currently we have a database, and it is our desire for it to be able
to store millions of records. The data in the table can be divided up
by client, and it stores nothing but about 7 integers.
| table |
| id | clientId | int1 | int2 | int 3 | ... |

Right now, our benchmarks indicate a drastic increase in performance
if we divide the data into different tables. For example,
table_clientA, table_clientB, table_clientC, despite the fact the
tables contain the exact same columns. This however does not seem very
clean or elegant to me, and rather illogical since a database exists
as a single file on the harddrive.

| table_clientA |
| id | clientId | int1 | int2 | int 3 | ...

| table_clientB |
| id | clientId | int1 | int2 | int 3 | ...

| table_clientC |
| id | clientId | int1 | int2 | int 3 | ...

Is there anyway to duplicate this increase in database performance
gained by splitting the table, perhaps by using a certain type of
index?

Thanks,
Jeff Brubaker
Software Developer

Jul 20 '05 #2
[posted and mailed, please reply in news]

jeff brubaker (je**@priva.com) writes:
[original question snipped]


It is not implausible, but without further knowledge of your tables
and the benchmark queries, it is impossible to tell.

You could get a more informative answer if you posted:

o The CREATE TABLE statements (both for the unpartitioned table
  and the partitioned tables).
o Any indexes on the tables.
o The queries you use for the benchmark.
o If you have scripts that generate data for the benchmarks, they
  would be extremely useful. (Provided that they are reasonably small.)

Which client did you use for the benchmark? Query Analyzer?

--
Erland Sommarskog, SQL Server MVP, so****@algonet.se

Books Online for SQL Server SP3 at
http://www.microsoft.com/sql/techinf...2000/books.asp
Jul 20 '05 #3
Erland Sommarskog <so****@algonet.se> wrote in message news:<Xn**********************@127.0.0.1>...
[previous reply snipped]

Okay, sorry for the delay. Here is a SQL script to set up the
experiment. Basically it builds one table with 100,000 records, and 10
small tables with 10,000 records each. Selecting all the records from
table_5 is substantially faster than selecting all the records from
bigTable where clientId = 5.

SET NOCOUNT ON
/* Drop any tables that might exist */

if exists (select * from dbo.sysobjects where id = object_id(N'[dbo].[bigTable]') and OBJECTPROPERTY(id, N'IsUserTable') = 1)
    drop table [dbo].[bigTable]
if exists (select * from dbo.sysobjects where id = object_id(N'[dbo].[table_0]') and OBJECTPROPERTY(id, N'IsUserTable') = 1)
    drop table [dbo].[table_0]
if exists (select * from dbo.sysobjects where id = object_id(N'[dbo].[table_1]') and OBJECTPROPERTY(id, N'IsUserTable') = 1)
    drop table [dbo].[table_1]
if exists (select * from dbo.sysobjects where id = object_id(N'[dbo].[table_2]') and OBJECTPROPERTY(id, N'IsUserTable') = 1)
    drop table [dbo].[table_2]
if exists (select * from dbo.sysobjects where id = object_id(N'[dbo].[table_3]') and OBJECTPROPERTY(id, N'IsUserTable') = 1)
    drop table [dbo].[table_3]
if exists (select * from dbo.sysobjects where id = object_id(N'[dbo].[table_4]') and OBJECTPROPERTY(id, N'IsUserTable') = 1)
    drop table [dbo].[table_4]
if exists (select * from dbo.sysobjects where id = object_id(N'[dbo].[table_5]') and OBJECTPROPERTY(id, N'IsUserTable') = 1)
    drop table [dbo].[table_5]
if exists (select * from dbo.sysobjects where id = object_id(N'[dbo].[table_6]') and OBJECTPROPERTY(id, N'IsUserTable') = 1)
    drop table [dbo].[table_6]
if exists (select * from dbo.sysobjects where id = object_id(N'[dbo].[table_7]') and OBJECTPROPERTY(id, N'IsUserTable') = 1)
    drop table [dbo].[table_7]
if exists (select * from dbo.sysobjects where id = object_id(N'[dbo].[table_8]') and OBJECTPROPERTY(id, N'IsUserTable') = 1)
    drop table [dbo].[table_8]
if exists (select * from dbo.sysobjects where id = object_id(N'[dbo].[table_9]') and OBJECTPROPERTY(id, N'IsUserTable') = 1)
    drop table [dbo].[table_9]
/* Create the tables */

CREATE TABLE [dbo].[bigTable] (
[id1] [int] NULL ,
[id2] [int] NULL ,
[id3] [int] NULL ,
[id4] [int] NULL ,
[id5] [int] NULL ,
[id6] [int] NULL ,
[id7] [int] NULL ,
[id8] [int] NULL ,
[id9] [int] NULL ,
[id10] [int] NULL ,
[id11] [int] NULL ,
[id12] [int] NULL ,
[id13] [int] NULL ,
[id14] [int] NULL ,
[id15] [int] NULL ,
[id16] [int] NULL ,
[id17] [int] NULL ,
[id18] [int] NULL ,
[id19] [int] NULL ,
[id20] [int] NULL ,
[clientid] [int] NULL
) ON [PRIMARY]

CREATE TABLE table_1 (
[id1] [int] NULL ,
[id2] [int] NULL ,
[id3] [int] NULL ,
[id4] [int] NULL ,
[id5] [int] NULL ,
[id6] [int] NULL ,
[id7] [int] NULL ,
[id8] [int] NULL ,
[id9] [int] NULL ,
[id10] [int] NULL ,
[id11] [int] NULL ,
[id12] [int] NULL ,
[id13] [int] NULL ,
[id14] [int] NULL ,
[id15] [int] NULL ,
[id16] [int] NULL ,
[id17] [int] NULL ,
[id18] [int] NULL ,
[id19] [int] NULL ,
[id20] [int] NULL
) ON [PRIMARY]

CREATE TABLE table_2 (
[id1] [int] NULL ,
[id2] [int] NULL ,
[id3] [int] NULL ,
[id4] [int] NULL ,
[id5] [int] NULL ,
[id6] [int] NULL ,
[id7] [int] NULL ,
[id8] [int] NULL ,
[id9] [int] NULL ,
[id10] [int] NULL ,
[id11] [int] NULL ,
[id12] [int] NULL ,
[id13] [int] NULL ,
[id14] [int] NULL ,
[id15] [int] NULL ,
[id16] [int] NULL ,
[id17] [int] NULL ,
[id18] [int] NULL ,
[id19] [int] NULL ,
[id20] [int] NULL
) ON [PRIMARY]
CREATE TABLE table_3 (
[id1] [int] NULL ,
[id2] [int] NULL ,
[id3] [int] NULL ,
[id4] [int] NULL ,
[id5] [int] NULL ,
[id6] [int] NULL ,
[id7] [int] NULL ,
[id8] [int] NULL ,
[id9] [int] NULL ,
[id10] [int] NULL ,
[id11] [int] NULL ,
[id12] [int] NULL ,
[id13] [int] NULL ,
[id14] [int] NULL ,
[id15] [int] NULL ,
[id16] [int] NULL ,
[id17] [int] NULL ,
[id18] [int] NULL ,
[id19] [int] NULL ,
[id20] [int] NULL
) ON [PRIMARY]
CREATE TABLE table_4 (
[id1] [int] NULL ,
[id2] [int] NULL ,
[id3] [int] NULL ,
[id4] [int] NULL ,
[id5] [int] NULL ,
[id6] [int] NULL ,
[id7] [int] NULL ,
[id8] [int] NULL ,
[id9] [int] NULL ,
[id10] [int] NULL ,
[id11] [int] NULL ,
[id12] [int] NULL ,
[id13] [int] NULL ,
[id14] [int] NULL ,
[id15] [int] NULL ,
[id16] [int] NULL ,
[id17] [int] NULL ,
[id18] [int] NULL ,
[id19] [int] NULL ,
[id20] [int] NULL
) ON [PRIMARY]
CREATE TABLE table_5 (
[id1] [int] NULL ,
[id2] [int] NULL ,
[id3] [int] NULL ,
[id4] [int] NULL ,
[id5] [int] NULL ,
[id6] [int] NULL ,
[id7] [int] NULL ,
[id8] [int] NULL ,
[id9] [int] NULL ,
[id10] [int] NULL ,
[id11] [int] NULL ,
[id12] [int] NULL ,
[id13] [int] NULL ,
[id14] [int] NULL ,
[id15] [int] NULL ,
[id16] [int] NULL ,
[id17] [int] NULL ,
[id18] [int] NULL ,
[id19] [int] NULL ,
[id20] [int] NULL
) ON [PRIMARY]
CREATE TABLE table_6 (
[id1] [int] NULL ,
[id2] [int] NULL ,
[id3] [int] NULL ,
[id4] [int] NULL ,
[id5] [int] NULL ,
[id6] [int] NULL ,
[id7] [int] NULL ,
[id8] [int] NULL ,
[id9] [int] NULL ,
[id10] [int] NULL ,
[id11] [int] NULL ,
[id12] [int] NULL ,
[id13] [int] NULL ,
[id14] [int] NULL ,
[id15] [int] NULL ,
[id16] [int] NULL ,
[id17] [int] NULL ,
[id18] [int] NULL ,
[id19] [int] NULL ,
[id20] [int] NULL
) ON [PRIMARY]
CREATE TABLE table_7 (
[id1] [int] NULL ,
[id2] [int] NULL ,
[id3] [int] NULL ,
[id4] [int] NULL ,
[id5] [int] NULL ,
[id6] [int] NULL ,
[id7] [int] NULL ,
[id8] [int] NULL ,
[id9] [int] NULL ,
[id10] [int] NULL ,
[id11] [int] NULL ,
[id12] [int] NULL ,
[id13] [int] NULL ,
[id14] [int] NULL ,
[id15] [int] NULL ,
[id16] [int] NULL ,
[id17] [int] NULL ,
[id18] [int] NULL ,
[id19] [int] NULL ,
[id20] [int] NULL
) ON [PRIMARY]
CREATE TABLE table_8 (
[id1] [int] NULL ,
[id2] [int] NULL ,
[id3] [int] NULL ,
[id4] [int] NULL ,
[id5] [int] NULL ,
[id6] [int] NULL ,
[id7] [int] NULL ,
[id8] [int] NULL ,
[id9] [int] NULL ,
[id10] [int] NULL ,
[id11] [int] NULL ,
[id12] [int] NULL ,
[id13] [int] NULL ,
[id14] [int] NULL ,
[id15] [int] NULL ,
[id16] [int] NULL ,
[id17] [int] NULL ,
[id18] [int] NULL ,
[id19] [int] NULL ,
[id20] [int] NULL
) ON [PRIMARY]
CREATE TABLE table_9 (
[id1] [int] NULL ,
[id2] [int] NULL ,
[id3] [int] NULL ,
[id4] [int] NULL ,
[id5] [int] NULL ,
[id6] [int] NULL ,
[id7] [int] NULL ,
[id8] [int] NULL ,
[id9] [int] NULL ,
[id10] [int] NULL ,
[id11] [int] NULL ,
[id12] [int] NULL ,
[id13] [int] NULL ,
[id14] [int] NULL ,
[id15] [int] NULL ,
[id16] [int] NULL ,
[id17] [int] NULL ,
[id18] [int] NULL ,
[id19] [int] NULL ,
[id20] [int] NULL
) ON [PRIMARY]
CREATE TABLE table_0 (
[id1] [int] NULL ,
[id2] [int] NULL ,
[id3] [int] NULL ,
[id4] [int] NULL ,
[id5] [int] NULL ,
[id6] [int] NULL ,
[id7] [int] NULL ,
[id8] [int] NULL ,
[id9] [int] NULL ,
[id10] [int] NULL ,
[id11] [int] NULL ,
[id12] [int] NULL ,
[id13] [int] NULL ,
[id14] [int] NULL ,
[id15] [int] NULL ,
[id16] [int] NULL ,
[id17] [int] NULL ,
[id18] [int] NULL ,
[id19] [int] NULL ,
[id20] [int] NULL
) ON [PRIMARY]
DECLARE @countPerClient int
SET @countPerClient = 10000
DECLARE @counter int
SET @counter = 1

/* Fill the big table with the 10 clients */

WHILE (@counter <= @countPerClient)
BEGIN
INSERT bigTable (clientId) VALUES (0)
SET @counter = @counter + 1
END
SET @counter=1
WHILE (@counter <= @countPerClient)
BEGIN
INSERT bigTable (clientId) VALUES (1)
SET @counter = @counter + 1
END
SET @counter=1
WHILE (@counter <= @countPerClient)
BEGIN
INSERT bigTable (clientId) VALUES (2)
SET @counter = @counter + 1
END
SET @counter=1
WHILE (@counter <= @countPerClient)
BEGIN
INSERT bigTable (clientId) VALUES (3)
SET @counter = @counter + 1
END
SET @counter=1
WHILE (@counter <= @countPerClient)
BEGIN
INSERT bigTable (clientId) VALUES (4)
SET @counter = @counter + 1
END
SET @counter=1
WHILE (@counter <= @countPerClient)
BEGIN
INSERT bigTable (clientId) VALUES (5)
SET @counter = @counter + 1
END
SET @counter=1
WHILE (@counter <= @countPerClient)
BEGIN
INSERT bigTable (clientId) VALUES (6)
SET @counter = @counter + 1
END
SET @counter=1
WHILE (@counter <= @countPerClient)
BEGIN
INSERT bigTable (clientId) VALUES (7)
SET @counter = @counter + 1
END
SET @counter=1
WHILE (@counter <= @countPerClient)
BEGIN
INSERT bigTable (clientId) VALUES (8)
SET @counter = @counter + 1
END
SET @counter=1
WHILE (@counter <= @countPerClient)
BEGIN
INSERT bigTable (clientId) VALUES (9)
SET @counter = @counter + 1
END
/* Fill each of the small tables with one client's records */
SET @counter = 1
WHILE (@counter <= @countPerClient)
BEGIN
INSERT table_1 DEFAULT VALUES
SET @counter = @counter + 1
END

SET @counter = 1
WHILE (@counter <= @countPerClient)
BEGIN
INSERT table_2 DEFAULT VALUES
SET @counter = @counter + 1
END
SET @counter = 1
WHILE (@counter <= @countPerClient)
BEGIN
INSERT table_3 DEFAULT VALUES
SET @counter = @counter + 1
END
SET @counter = 1
WHILE (@counter <= @countPerClient)
BEGIN
INSERT table_4 DEFAULT VALUES
SET @counter = @counter + 1
END
SET @counter = 1
WHILE (@counter <= @countPerClient)
BEGIN
INSERT table_5 DEFAULT VALUES
SET @counter = @counter + 1
END
SET @counter = 1
WHILE (@counter <= @countPerClient)
BEGIN
INSERT table_6 DEFAULT VALUES
SET @counter = @counter + 1
END
SET @counter = 1
WHILE (@counter <= @countPerClient)
BEGIN
INSERT table_7 DEFAULT VALUES
SET @counter = @counter + 1
END
SET @counter = 1
WHILE (@counter <= @countPerClient)
BEGIN
INSERT table_8 DEFAULT VALUES
SET @counter = @counter + 1
END

SET @counter = 1
WHILE (@counter <= @countPerClient)
BEGIN
INSERT table_9 DEFAULT VALUES
SET @counter = @counter + 1
END
SET @counter = 1
WHILE (@counter <= @countPerClient)
BEGIN
INSERT table_0 DEFAULT VALUES
SET @counter = @counter + 1
END
GO
/* Now time for the queries */
DECLARE @x datetime
SELECT @x = GetDate()
select count(*) from table_5
SELECT 'Split Tables' as label,DateDiff(millisecond, @x, GetDate())

SELECT @x = GetDate()
select count(*) from bigTable where clientId=5
SELECT 'Big Table' as label,DateDiff(millisecond, @x, GetDate())
Jul 20 '05 #4
[posted and mailed, please reply in news]

jeff brubaker (je**@priva.com) writes:
Okay, sorry for the delay. Here is a SQL script to set up the
experiment. Basically it builds one table with 100,000 records, and 10
small tables with 10,000 records each. Selecting all the records from
table_5 is substantially faster than selecting all the records from
bigTable where clientId = 5.


Yes, since there is no index at all on your tables, this is not
strange.

First, I had to increase the number of rows per client to 100000 to
get a significant difference.

When I had run the first test, I ran these two statements:

CREATE CLUSTERED INDEX clientid_ix on bigTable (clientId)
go
DBCC DROPCLEANBUFFERS

The first statement builds an index on bigTable.clientId. The second
just cleans out the cache, so that all data will be read from disk.
(Don't do this on a production machine!)

I then ran the benchmarks. table_5 was still faster with 450 ms, whereas
the SELECT COUNT(*) from bigTable needed 563 ms. However, on
successive runs, table_5 took 110 ms, whereas the read from bigTable
was 16 ms.
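
To see where the difference comes from, you can also compare logical reads
instead of wall-clock time. A sketch, assuming the same tables as in your
script:

/* Reports the number of pages each statement reads */
SET STATISTICS IO ON
SELECT COUNT(*) FROM table_5
SELECT COUNT(*) FROM bigTable WHERE clientId = 5
SET STATISTICS IO OFF

With the clustered index in place, the bigTable query should touch roughly
as many pages as the scan of table_5.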

Before you consider advanced techniques like splitting tables, you should
make sure that you have a sound index strategy. A clustered index stores
the data itself in its leaf pages, so after adding the clustered index,
bigTable is really like table_0 to table_9 glued together.
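
As a sketch of what that could look like for the seven-integer table in your
first post (the table name clientData, the IDENTITY id and the column names
int1 to int7 are only assumptions):

/* One table for all clients, clustered on clientId so that each client's
   rows are stored together */
CREATE TABLE dbo.clientData (
    id        int IDENTITY(1,1) NOT NULL,
    clientId  int NOT NULL,
    int1 int NULL, int2 int NULL, int3 int NULL,
    int4 int NULL, int5 int NULL, int6 int NULL, int7 int NULL,
    CONSTRAINT PK_clientData PRIMARY KEY NONCLUSTERED (id)
)
CREATE CLUSTERED INDEX clientid_ix ON dbo.clientData (clientId)

A query that filters on clientId then only reads that client's portion of
the table, which is the same effect the separate per-client tables give you.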
--
Erland Sommarskog, SQL Server MVP, so****@algonet.se

Books Online for SQL Server SP3 at
http://www.microsoft.com/sql/techinf...2000/books.asp
Jul 20 '05 #5
