Bytes | Software Development & Data Engineering Community

Horizontal Partitioning question

I recently came across a database where the data are horizontally partitioned
into 4 tables. I'm not sure if this was a poor design choice, or if it was
done for valid performance reasons. The schemas of the tables are essentially
the same; it's just that the tables are named differently and the columns are
named differently to differentiate the data from a business usage
perspective. The tables could easily be combined into one by adding a new
column to the clustered index that would be used to differentiate the
business usage. I am trying to evaluate whether combining the tables would
improve performance or if it would be better to leave them the way they are.
Many queries that run against these tables do not request records from more
than one of the tables, which is good. However, there are a number of
processes that query against all of the tables on the identical clustered
index range. I am not sure exactly how many rows are in the tables, but I'm
fairly certain the entire database is < 50 GB.
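A minimal sketch of the consolidation described above, assuming hypothetical table and column names (`detail_a` to `detail_d`, `detail`, `usage`, `item_id`, `amount`) and using Python's built-in `sqlite3` module as a runnable stand-in for SQL Server. Here the new `usage` column is placed after `item_id` in the clustered key, so rows that share a key sit next to each other and the all-tables range query becomes one contiguous scan; putting `usage` first would instead favour the single-table queries.

```python
import sqlite3

# Hypothetical names throughout (detail_a..detail_d, detail, usage, item_id,
# amount); SQLite stands in for SQL Server so the sketch runs anywhere.
USAGES = ("a", "b", "c", "d")


def build_split(conn: sqlite3.Connection) -> None:
    """The current layout: four identically shaped tables, one per usage."""
    for u in USAGES:
        conn.execute(
            f"CREATE TABLE detail_{u} (item_id INTEGER PRIMARY KEY, amount REAL)"
        )


def combine(conn: sqlite3.Connection) -> None:
    """Copy the four tables into one table keyed on (item_id, usage).

    WITHOUT ROWID stores rows in primary-key order, the closest SQLite
    analogue to a SQL Server clustered index.
    """
    conn.execute(
        "CREATE TABLE detail ("
        " item_id INTEGER NOT NULL,"
        " usage TEXT NOT NULL CHECK (usage IN ('a', 'b', 'c', 'd')),"
        " amount REAL,"
        " PRIMARY KEY (item_id, usage)) WITHOUT ROWID"
    )
    for u in USAGES:
        # u comes from the fixed USAGES tuple, so formatting it in is safe.
        conn.execute(
            "INSERT INTO detail (item_id, usage, amount) "
            f"SELECT item_id, '{u}', amount FROM detail_{u}"
        )
    conn.commit()


def range_all_usages(conn, lo, hi):
    """The 'all tables, same key range' query becomes one range scan."""
    return conn.execute(
        "SELECT item_id, usage, amount FROM detail "
        "WHERE item_id BETWEEN ? AND ? ORDER BY item_id, usage",
        (lo, hi),
    ).fetchall()


def range_one_usage(conn, usage, lo, hi):
    """A single-usage query filters on the discriminator column."""
    return conn.execute(
        "SELECT item_id, amount FROM detail "
        "WHERE usage = ? AND item_id BETWEEN ? AND ? ORDER BY item_id",
        (usage, lo, hi),
    ).fetchall()


conn = sqlite3.connect(":memory:")
build_split(conn)
for u in USAGES:
    conn.executemany(
        f"INSERT INTO detail_{u} VALUES (?, ?)", [(i, float(i)) for i in range(1, 6)]
    )
combine(conn)
print(len(range_all_usages(conn, 2, 3)))  # 8: two keys x four usages
print(range_one_usage(conn, "b", 2, 3))   # [(2, 2.0), (3, 3.0)]
```

Whether the discriminator goes first or second in the key is exactly the tradeoff being weighed here, so it is worth benchmarking both orders.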
Jul 20 '05 #1
Hi

You don't say if they have been set up as a partitioned view, but your
comment about business usage would tend to imply they haven't? If they
haven't, then this would be the change I would look at first, especially if
the growth rate of the system indicates that federation will be necessary.

If only a small percentage of queries access all the tables, then this may
also indicate there is a performance benefit to keeping them separate. If
the tables are on different filegroups and on different disc subsystems,
then performance may have been a valid reason to split them up.

Without being there when the decision to partition them was made, you will
not know the underlying statistics or reasons for this design, and I would
bet they have not been documented!

If you are going to combine them, then create a benchmark test so that you
can compare each configuration, and test the two alternatives in a
controlled environment. If you can't do that, then unless there is a
specific reason to change what is already working (and performing well!),
I wouldn't.

John

Jul 20 '05 #2
>> I recently came across a database where the data are horizontally
partitioned into 4 tables. I'm not sure if this was a poor design
choice, or if it was done for valid performance reasons. <<

Without knowing any more than that, the smart money would bet on poor
design.

>> The schemas of the tables are essentially the same; it's just that
the tables are named differently and the columns are named differently
to differentiate the data from a business usage perspective. <<

Here we MAY have a valid design reason. Is the data logically
different in each case? Not just a status change (paid versus unpaid
bills, etc.), but really different? If not, then this is a mess.

>> The tables could easily be combined into one by adding a new column
to the clustered index that would be used to differentiate the
business usage. <<

Bingo! No logical differences, no separate tables in the data model.

>> I am trying to evaluate whether combining the tables would improve
performance or if it would be better to leave them the way they are. <<

Performance is a secondary issue. Correctness and removing redundant
data element names is the first issue. Make it right, then make it
fast.

>> Many queries that run against these tables do not request records
[sic] from more than one of the tables, which is good. However, there
are a number of processes that query against all of the tables on the
identical clustered index range. I am not sure exactly how many rows
are in the tables, but I'm fairly certain the entire database is < 50
GB. <<

Write some VIEWs on the data. Performance with a clustered index
starting on the status column will be fine.
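A sketch of the VIEW approach, again with Python's `sqlite3` standing in for SQL Server and with hypothetical names: one combined table whose clustered key starts with the usage (status) column, plus one view per former table, so existing queries can keep their old table names and even the business-specific column names (modelled here as `<usage>_amount`, an invented convention).

```python
import sqlite3

# Hypothetical names; SQLite stands in for SQL Server. The clustered key
# starts with the usage (status) column.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE detail ("
    " usage TEXT NOT NULL CHECK (usage IN ('a', 'b', 'c', 'd')),"
    " item_id INTEGER NOT NULL,"
    " amount REAL,"
    " PRIMARY KEY (usage, item_id)) WITHOUT ROWID"
)

# One VIEW per former table. Each view filters on usage and restores the
# business-specific column name the old table used.
for u in ("a", "b", "c", "d"):
    conn.execute(
        f"CREATE VIEW detail_{u} AS "
        f"SELECT item_id, amount AS {u}_amount FROM detail WHERE usage = '{u}'"
    )

conn.executemany(
    "INSERT INTO detail VALUES (?, ?, ?)",
    [(u, i, float(i)) for u in "abcd" for i in range(1, 4)],
)
rows = conn.execute(
    "SELECT * FROM detail_b WHERE item_id >= 2 ORDER BY item_id"
).fetchall()
print(rows)  # [(2, 2.0), (3, 3.0)]
```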
Jul 20 '05 #3

> You don't say if they have been set up as a partitioned view, but your
> comment about business usage would tend to imply they haven't?

Correct. There is no partitioned view. I don't think the current design
lends itself to that, since there is currently no column that could be used
for the check constraint. There are data spread across all tables with the
same primary key, and data with the same PK are logically related from a
business perspective. To create a check constraint, I think we'd have to add
another column like the one I mentioned in my original post.
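For comparison, a sketch of what a partitioned view would need under this design: each member table gets the extra constant column, pinned by a CHECK constraint and included in the primary key, and a UNION ALL view ties the tables together. Names are hypothetical, and SQLite (used here only so the sketch runs) accepts the DDL but does not skip member tables based on the CHECK constraints the way SQL Server's optimizer can for a partitioned view.

```python
import sqlite3

# Hypothetical names. SQLite accepts this DDL but, unlike the SQL Server
# optimizer with a partitioned view, does not use the CHECK constraints to
# skip member tables; the sketch only shows the shape of the schema.
USAGES = ("a", "b", "c", "d")
conn = sqlite3.connect(":memory:")

for u in USAGES:
    # The new constant column is part of the key and pinned by a CHECK
    # constraint, so each member table can only hold its own usage.
    conn.execute(
        f"CREATE TABLE detail_{u} ("
        " item_id INTEGER NOT NULL,"
        f" usage TEXT NOT NULL DEFAULT '{u}' CHECK (usage = '{u}'),"
        " amount REAL,"
        " PRIMARY KEY (item_id, usage))"
    )

# The view that callers query instead of the individual tables.
conn.execute(
    "CREATE VIEW detail_all AS "
    + " UNION ALL ".join(
        f"SELECT item_id, usage, amount FROM detail_{u}" for u in USAGES
    )
)

for u in USAGES:
    conn.execute(f"INSERT INTO detail_{u} (item_id, amount) VALUES (1, 10.0)")
print(conn.execute(
    "SELECT usage FROM detail_all WHERE item_id = 1 ORDER BY usage"
).fetchall())  # [('a',), ('b',), ('c',), ('d',)]

try:
    # A row for usage 'b' cannot be stored in the 'a' member table.
    conn.execute("INSERT INTO detail_a (item_id, usage, amount) VALUES (2, 'b', 1.0)")
except sqlite3.IntegrityError as exc:
    print("rejected:", exc)
```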
> If you can't do that, then unless there is a specific reason to change
> what is already working (and performing well!) then I wouldn't.

Performance is definitely a problem, though, with operations that need to
query against all of the tables at the same time. For example, one thing
that users routinely need to do is copy a large range of rows from all of
the tables and insert them back into the same tables (with a new PK, of
course). I will try to find out if different filegroups were used for the
different tables, but I'm guessing this is not the case.
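A sketch of that copy operation under both layouts, assuming hypothetical names and modelling the new PK as a fixed offset added to the old key (the thread does not say how new keys are actually assigned). With separate tables the copy is one INSERT ... SELECT per table; with a combined table it is a single statement over one key range. Python's `sqlite3` stands in for SQL Server.

```python
import sqlite3

# Hypothetical names; SQLite stands in for SQL Server. "New PK" is modelled
# as a fixed key offset, which is only one way to assign new keys.
USAGES = ("a", "b", "c", "d")

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE detail (item_id INTEGER NOT NULL, usage TEXT NOT NULL,"
    " amount REAL, PRIMARY KEY (item_id, usage)) WITHOUT ROWID"
)
for u in USAGES:
    conn.execute(f"CREATE TABLE detail_{u} (item_id INTEGER PRIMARY KEY, amount REAL)")
    rows = [(i, float(i)) for i in range(1, 11)]
    conn.executemany(f"INSERT INTO detail_{u} VALUES (?, ?)", rows)
    conn.executemany(
        "INSERT INTO detail VALUES (?, ?, ?)", [(i, u, a) for i, a in rows]
    )
conn.commit()


def copy_range_split(conn, lo, hi, offset):
    """Split layout: one INSERT ... SELECT per table, in one transaction."""
    with conn:
        for u in USAGES:
            conn.execute(
                f"INSERT INTO detail_{u} (item_id, amount) "
                f"SELECT item_id + ?, amount FROM detail_{u} "
                "WHERE item_id BETWEEN ? AND ?",
                (offset, lo, hi),
            )


def copy_range_combined(conn, lo, hi, offset):
    """Combined layout: a single INSERT ... SELECT covers every usage."""
    with conn:
        conn.execute(
            "INSERT INTO detail (item_id, usage, amount) "
            "SELECT item_id + ?, usage, amount FROM detail "
            "WHERE item_id BETWEEN ? AND ?",
            (offset, lo, hi),
        )


copy_range_split(conn, 3, 5, 100)
copy_range_combined(conn, 3, 5, 100)
print(conn.execute("SELECT COUNT(*) FROM detail").fetchone()[0])    # 52
print(conn.execute("SELECT COUNT(*) FROM detail_a").fetchone()[0])  # 13
```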

In my case, since sometimes we need to access all of the tables at once and
sometimes not, what I need to do is measure the tradeoff between improved
performance in situations where only 1 of the tables needs to be accessed
vs. the penalty paid when all tables need to be accessed. My gut feeling is
that the increase in time spent traversing the B-tree in the combined table
should be less significant than the penalty paid for having the data split
up when we need to access all tables at the same time. But again, I really
need to measure this.
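One way to start measuring that tradeoff is a small harness that runs the single-table and all-tables queries against both layouts. This is a sketch with hypothetical names and row counts; an in-memory SQLite database will not reproduce SQL Server's disk I/O, so the timings it prints only show how to structure the comparison, not what the real result will be.

```python
import sqlite3
import time

# Hypothetical names and volumes. An in-memory SQLite database will not
# reproduce SQL Server's disk I/O; only the structure of the test carries over.
USAGES = ("a", "b", "c", "d")
N = 20_000  # rows per usage; scale toward the real row counts


def setup():
    """Build both layouts side by side with identical data."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE detail (item_id INTEGER NOT NULL, usage TEXT NOT NULL,"
        " amount REAL, PRIMARY KEY (item_id, usage)) WITHOUT ROWID"
    )
    rows = [(i, float(i)) for i in range(N)]
    for u in USAGES:
        conn.execute(f"CREATE TABLE detail_{u} (item_id INTEGER PRIMARY KEY, amount REAL)")
        conn.executemany(f"INSERT INTO detail_{u} VALUES (?, ?)", rows)
        conn.executemany("INSERT INTO detail VALUES (?, ?, ?)",
                         [(i, u, a) for i, a in rows])
    conn.commit()
    return conn


def one_split(conn, lo, hi):
    # Only one business usage, separate-table layout.
    return conn.execute(
        "SELECT item_id, amount FROM detail_b"
        " WHERE item_id BETWEEN ? AND ? ORDER BY item_id", (lo, hi)).fetchall()


def one_combined(conn, lo, hi):
    # Only one business usage, combined layout.
    return conn.execute(
        "SELECT item_id, amount FROM detail"
        " WHERE usage = 'b' AND item_id BETWEEN ? AND ? ORDER BY item_id",
        (lo, hi)).fetchall()


def all_split(conn, lo, hi):
    # Every usage over the same key range, separate-table layout.
    sql = " UNION ALL ".join(
        f"SELECT item_id, '{u}', amount FROM detail_{u} WHERE item_id BETWEEN ? AND ?"
        for u in USAGES)
    return conn.execute(sql + " ORDER BY 1, 2", (lo, hi) * len(USAGES)).fetchall()


def all_combined(conn, lo, hi):
    # Every usage over the same key range, combined layout.
    return conn.execute(
        "SELECT item_id, usage, amount FROM detail"
        " WHERE item_id BETWEEN ? AND ? ORDER BY item_id, usage", (lo, hi)).fetchall()


def bench(fn, conn, repeat=50):
    """Average wall-clock seconds per call over a fixed key range."""
    start = time.perf_counter()
    for _ in range(repeat):
        fn(conn, 1_000, 2_000)
    return (time.perf_counter() - start) / repeat


conn = setup()
for fn in (one_split, one_combined, all_split, all_combined):
    print(f"{fn.__name__:<13} {bench(fn, conn) * 1e3:8.3f} ms/query")
```

Checking that each pair of functions returns identical rows before timing them guards against comparing two queries that do different work.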

Thanks.


Jul 20 '05 #4
> Bingo! No logical differences, no separate tables in the data model.

Hmm, may I describe a problem I had? I was in charge of redesigning a
database. This database contained a table called Directories that held the
absolute paths of some folders frequently used in other tables. There was a
need to differentiate three kinds of folders: input, output and binary
folders. The goal was to use nicknames for the folders in other tables. So I
had this schema:

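CREATE TABLE Directories (
    nick_name varchar(20),
    type      tinyint,        -- 0: input, 1: output, 2: binary
    path      varchar(1000),
    PRIMARY KEY (nick_name, type)
)

CREATE TABLE Jobs (
    input_folder  varchar(20),  -- nick_name of an input folder (type 0)
    output_folder varchar(20),  -- nick_name of an output folder (type 1)
    binary_folder varchar(20)   -- nick_name of a binary folder (type 2)
)
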
I was told that this was not a good design because I could not link the
Jobs table to the Directories one (the join would require a constant: for
example, input_folder is the nick_name and the type is 0). The way to solve
the problem was to create 3 different tables, InputDirectories,
OutputDirectories and BinaryDirectories, and to link the Jobs table to
those 3 tables.

What is the best design?
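One possible alternative to three separate tables, sketched here with hypothetical names and Python's `sqlite3` in place of SQL Server: keep the single Directories table and give Jobs a constant type column for each folder reference, fixed by a CHECK constraint. The composite foreign key can then be declared, and the join uses a column instead of a literal. The `job_id` key and the sample paths are invented for the example.

```python
import sqlite3

# Hypothetical names and data; SQLite stands in for SQL Server.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
CREATE TABLE Directories (
    nick_name TEXT NOT NULL,
    type      INTEGER NOT NULL CHECK (type IN (0, 1, 2)),  -- 0 input, 1 output, 2 binary
    path      TEXT NOT NULL,
    PRIMARY KEY (nick_name, type)
);
CREATE TABLE Jobs (
    job_id        INTEGER PRIMARY KEY,
    input_folder  TEXT NOT NULL,
    input_type    INTEGER NOT NULL DEFAULT 0 CHECK (input_type = 0),
    output_folder TEXT NOT NULL,
    output_type   INTEGER NOT NULL DEFAULT 1 CHECK (output_type = 1),
    binary_folder TEXT NOT NULL,
    binary_type   INTEGER NOT NULL DEFAULT 2 CHECK (binary_type = 2),
    FOREIGN KEY (input_folder, input_type)   REFERENCES Directories (nick_name, type),
    FOREIGN KEY (output_folder, output_type) REFERENCES Directories (nick_name, type),
    FOREIGN KEY (binary_folder, binary_type) REFERENCES Directories (nick_name, type)
);
""")

conn.executemany("INSERT INTO Directories VALUES (?, ?, ?)", [
    ("main", 0, "/data/in"), ("main", 1, "/data/out"), ("tools", 2, "/opt/bin")])
conn.execute("INSERT INTO Jobs (job_id, input_folder, output_folder, binary_folder)"
             " VALUES (1, 'main', 'main', 'tools')")

# 'tools' exists only as a binary folder, so using it as an input folder fails.
try:
    conn.execute("INSERT INTO Jobs (job_id, input_folder, output_folder, binary_folder)"
                 " VALUES (2, 'tools', 'main', 'tools')")
except sqlite3.IntegrityError as exc:
    print("rejected:", exc)

# The join carries the type through a column rather than a literal.
row = conn.execute("""
SELECT i.path, o.path, b.path
FROM Jobs j
JOIN Directories i ON i.nick_name = j.input_folder  AND i.type = j.input_type
JOIN Directories o ON o.nick_name = j.output_folder AND o.type = j.output_type
JOIN Directories b ON b.nick_name = j.binary_folder AND b.type = j.binary_type
WHERE j.job_id = 1
""").fetchone()
print(row)  # ('/data/in', '/data/out', '/opt/bin')
```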

--
Vincent
Jul 20 '05 #5
MissLivvy (Xe************ *******@yahoo.c om) writes:
> In my case, since sometimes we need to access all of the tables at once,
> and sometimes not, what I need to do is measure the tradeoff between
> improved performance in situations where only 1 of the tables needs to be
> accessed vs. the penalty paid when all tables need to be accessed. My gut
> feeling is that the increase in time spent traversing the B-tree in the
> combined table should be less significant than the penalty paid for
> having the data split up when we need to access all tables at the same
> time. But again, I really need to measure this.


One option would be to retain the tables, and then build an indexed view
that combines them. Of course, this will double the disk space, and also
come with a cost for updates. But if the main activity is querying, this
could be the best of both worlds.

Note: to be able to fully use indexed views, you need Enterprise Edition.
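SQLite has no indexed views, so this sketch substitutes triggers that keep a combined copy of two member tables in sync. It is not how SQL Server maintains an indexed view, but it makes the update cost mentioned above concrete: every insert, update or delete on a member table also writes the combined copy. All names are hypothetical.

```python
import sqlite3

# Hypothetical names. Triggers stand in for indexed-view maintenance, which
# SQLite does not have; two member tables keep the sketch short.
USAGES = ("a", "b")
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE detail_all (item_id INTEGER NOT NULL, usage TEXT NOT NULL,"
    " amount REAL, PRIMARY KEY (item_id, usage)) WITHOUT ROWID"
)
for u in USAGES:
    conn.execute(f"CREATE TABLE detail_{u} (item_id INTEGER PRIMARY KEY, amount REAL)")
    # Each write to a member table triggers a second write to the copy.
    conn.executescript(f"""
    CREATE TRIGGER detail_{u}_ins AFTER INSERT ON detail_{u} BEGIN
        INSERT INTO detail_all VALUES (NEW.item_id, '{u}', NEW.amount);
    END;
    CREATE TRIGGER detail_{u}_upd AFTER UPDATE ON detail_{u} BEGIN
        UPDATE detail_all SET item_id = NEW.item_id, amount = NEW.amount
        WHERE item_id = OLD.item_id AND usage = '{u}';
    END;
    CREATE TRIGGER detail_{u}_del AFTER DELETE ON detail_{u} BEGIN
        DELETE FROM detail_all WHERE item_id = OLD.item_id AND usage = '{u}';
    END;
    """)

conn.execute("INSERT INTO detail_a VALUES (1, 5.0)")
conn.execute("INSERT INTO detail_b VALUES (1, 7.0)")
conn.execute("UPDATE detail_a SET amount = 6.0 WHERE item_id = 1")
conn.execute("DELETE FROM detail_b WHERE item_id = 1")
print(conn.execute("SELECT * FROM detail_all").fetchall())  # [(1, 'a', 6.0)]
```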

--
Erland Sommarskog, SQL Server MVP, es****@sommarsk og.se

Books Online for SQL Server SP3 at
http://www.microsoft.com/sql/techinf...2000/books.asp
Jul 20 '05 #6
Thanks, Erland.
Yes, there is a lot of inserting and updating going on with these tables, so
I think we'd be paying too high a price for the querying benefit of the
indexed view.

Jul 20 '05 #7
What about:

CREATE TABLE Directories (
    nick_name varchar(20),
    type      tinyint,        -- 0: input, 1: output, 2: binary
    path      varchar(1000),
    PRIMARY KEY (nick_name, type)
)

CREATE TABLE Job (
    JobID   int PRIMARY KEY,
    JobName varchar(20)
)

CREATE TABLE Job_Directory (
    JobID    int,
    nickname varchar(20),
    type     tinyint,
    PRIMARY KEY (JobID, nickname, type)
)
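A sketch of this junction-table proposal, with Python's `sqlite3` standing in for SQL Server and hypothetical sample data. The foreign key from Job_Directory to Directories needs no constant, but reading a job's three folders back as one row takes a pivot such as the MAX(CASE ...) query in the block.

```python
import sqlite3

# Hypothetical sample data; SQLite stands in for SQL Server.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
CREATE TABLE Directories (
    nick_name TEXT NOT NULL,
    type      INTEGER NOT NULL,  -- 0 input, 1 output, 2 binary
    path      TEXT NOT NULL,
    PRIMARY KEY (nick_name, type)
);
CREATE TABLE Job (
    JobID   INTEGER PRIMARY KEY,
    JobName TEXT
);
CREATE TABLE Job_Directory (
    JobID    INTEGER NOT NULL REFERENCES Job (JobID),
    nickname TEXT NOT NULL,
    type     INTEGER NOT NULL,
    PRIMARY KEY (JobID, nickname, type),
    FOREIGN KEY (nickname, type) REFERENCES Directories (nick_name, type)
);
INSERT INTO Directories VALUES
    ('main', 0, '/data/in'), ('main', 1, '/data/out'), ('tools', 2, '/opt/bin');
INSERT INTO Job VALUES (1, 'nightly');
INSERT INTO Job_Directory VALUES (1, 'main', 0), (1, 'main', 1), (1, 'tools', 2);
""")

# Pivot the three Job_Directory rows back into one row per job.
JOB_PATHS = """
SELECT j.JobID,
       MAX(CASE WHEN d.type = 0 THEN d.path END) AS input_path,
       MAX(CASE WHEN d.type = 1 THEN d.path END) AS output_path,
       MAX(CASE WHEN d.type = 2 THEN d.path END) AS binary_path
FROM Job j
JOIN Job_Directory jd ON jd.JobID = j.JobID
JOIN Directories d ON d.nick_name = jd.nickname AND d.type = jd.type
GROUP BY j.JobID
"""
print(conn.execute(JOB_PATHS).fetchall())  # [(1, '/data/in', '/data/out', '/opt/bin')]
```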

Jul 20 '05 #8
> For example, one thing that users routinely need to do is copy a large
> range of rows from all of the tables and insert them back into the same
> tables (with a new PK, of course).

This seems to me like a lot of redundant data will get created
needlessly. It is probably why the db is as large as it is (up to 50 GB),
and it is also a good indication of poor design. Is this data historic or
frequently updated? If it is historic and is not changed (like a POS sales
record), why copy the data around so much?

Jul 20 '05 #9


Considering that any job has exactly one path of each type, you have a
1-to-3 relationship. I don't know if that is better than 1-to-1, which I
have heard is bad :)
And it makes the SQL queries more complex to write (for no added value).

--
Vincent
Jul 20 '05 #10
