Table partitioning for maximum speed?

I'm sure this is a concept that's been explored here. I have a table
(fairly simple, just two columns, one of which is a 32-digit checksum)
with several million rows (currently, about 7 million). About a million
times a day we do

select * from my_table where md5 = ?

to verify presence or absence of the row, and base further processing on
that information.

The idea bandied about now is to partition this table into 16 (or 256,
or ...) chunks by first digit (or 2, or ...). In the simplest case, this
would mean:

create table my_table_0 as select * from my_table where md5 like '0%';

create table my_table_1 as select * from my_table where md5 like '1%';

....

create table my_table_f as select * from my_table where md5 like 'f%';

Then change the code to examine the checksum and create a query to the
appropriate table based on the first digit.
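
For illustration, the application-side routing described above might look roughly like the sketch below. It is only a sketch under stated assumptions: the second column name (`payload`) is made up, SQLite stands in for the real database so the example runs on its own, and the helper names `partition_for` and `row_exists` are hypothetical rather than anything from the actual application.

```python
import sqlite3

HEX_DIGITS = "0123456789abcdef"


def partition_for(md5_hex):
    """Return the partition table name for a 32-digit hex checksum."""
    digest = md5_hex.lower()
    if len(digest) != 32 or any(c not in HEX_DIGITS for c in digest):
        raise ValueError(f"not a 32-digit hex checksum: {md5_hex!r}")
    # The table name is built only from a validated hex digit, so it is
    # safe to interpolate; the checksum itself stays a bound parameter.
    return f"my_table_{digest[0]}"


def row_exists(conn, md5_hex):
    """Check presence of a checksum in the partition chosen by its first digit."""
    table = partition_for(md5_hex)
    cur = conn.execute(f"select 1 from {table} where md5 = ?", (md5_hex.lower(),))
    return cur.fetchone() is not None


def demo():
    """Build 16 empty partitions in memory, insert one row, look up two."""
    conn = sqlite3.connect(":memory:")
    for digit in HEX_DIGITS:
        # "payload" is an assumed name for the table's second column.
        conn.execute(f"create table my_table_{digit} (md5 text primary key, payload text)")
    known = "5d41402abc4b2a76b9719d911017c592"
    conn.execute(f"insert into {partition_for(known)} values (?, ?)", (known, "x"))
    missing = "f" * 32
    return row_exists(conn, known), row_exists(conn, missing)
```

The routing itself is a single string operation, so any work taken off the database server is small; the cost is that every query site in the application has to go through a helper like `partition_for`.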

Obviously, this is conceptually similar to what the index on the "md5"
column is supposed to do for us. However, partitioning moves just a
little of the processing load off the database server and onto the
machine running the application. That's important, because we can afford
more application machines as load increases, but we can't as easily
upgrade the database server.

Will a query against a table of 0.5 million rows beat a query against a
table of 7 million rows by a margin that makes it worth the hassle of
supporting 15 "extra" tables?

--
Jeff Boes vox 269.226.9550 ext 24
Database Engineer fax 269.349.9076
Nexcerpt, Inc. http://www.nexcerpt.com
...Nexcerpt... Extend your Expertise

Nov 12 '05
Jeff Boes wrote:
Obviously, this is conceptually similar to what the index on the "md5"
column is supposed to do for us. However, partitioning moves just a
little of the processing load off the database server and onto the
machine running the application. That's important, because we can afford
more application machines as load increases, but we can't as easily
upgrade the database server.

Will a query against a table of 0.5 million rows beat a query against a
table of 7 million rows by a margin that makes it worth the hassle of
supporting 15 "extra" tables?


I don't think 16 tables on the same server will help, but if you already
have your app tier physically separate from the database tier, you could
partition your data to more than one database server based on the first
byte of the md5 column. I designed and built something similar a few
years ago. We never got to the point where we really needed that kind of
scalability, but it worked pretty well in (limited) testing.
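
A minimal sketch of that kind of routing, assuming four database servers with made-up host names (the original system's server count and naming are not given here). The first byte of the checksum (two hex digits, 0 to 255) is mapped onto contiguous ranges, one range per server.

```python
# Hypothetical host names, for illustration only.
SERVERS = [
    "db0.example.internal",
    "db1.example.internal",
    "db2.example.internal",
    "db3.example.internal",
]


def server_for(md5_hex, servers=SERVERS):
    """Pick a server from the first byte of a hex checksum."""
    first_byte = int(md5_hex[:2], 16)  # 0..255
    # Split the 256 possible byte values into equal contiguous ranges.
    return servers[first_byte * len(servers) // 256]
```

Because MD5 output is close to uniformly distributed, each server should end up with roughly the same share of rows. Changing the number of servers later moves range boundaries, so existing rows would need to be redistributed.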

Joe
---------------------------(end of broadcast)---------------------------
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to ma*******@postgresql.org)

Nov 12 '05 #11
BULL.

How many times does PG have to scan the whole table because of MVCC?
At least with partitioning there is a fighting chance that that won't be
necessary.
Queries that involve the field on which the table is partitioned execute
faster by an order of magnitude.
It also helps with vacuuming, as PG can vacuum only one partition at a
time.
I have a 17M-row table where all records get frequently updated over a
year.
I would do my own partitioning with inheritance if it was not broken.
Partitioning would be a BIG plus in my book. So would visibility of
records but that is another fight.

JLL

Vivek Khera wrote:
>> "JB" == Jeff Boes <jb***@nexcerpt.com> writes:


JB> Will a query against a table of 0.5 million rows beat a query against
JB> a table of 7 million rows by a margin that makes it worth the hassle
JB> of supporting 15 "extra" tables?

I think you'll be better off with a single table, as you won't have
contention for the index pages in the cache.

One thing to do is to reindex reasonably often (for PG < 7.4) to avoid
index bloat, which will make them not fit in cache. Just check the
size of your index in the pg_class table, and when it gets big,
reindex (assuming you do lots of updates/inserts to the table).
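
As a rough sketch of that check: `pg_class` has `relname` and `relpages` columns, and `relpages` is an estimate (refreshed by VACUUM and ANALYZE) of the relation's size in pages, which are 8 kB by default. The index name, the `%s` placeholder style (as used by common Python PostgreSQL drivers), the baseline figure, and the growth threshold below are all assumptions for illustration.

```python
# Query to run through whatever PostgreSQL client the application uses;
# pass the index name (for example a hypothetical "my_table_md5_idx").
INDEX_SIZE_SQL = "select relname, relpages from pg_class where relname = %s"

BLOCK_SIZE = 8192  # default PostgreSQL page size in bytes


def index_bytes(relpages, block_size=BLOCK_SIZE):
    """Convert a relpages estimate into an approximate size in bytes."""
    return relpages * block_size


def should_reindex(relpages, baseline_relpages, growth_factor=2.0):
    """Flag the index once it has grown well past its size just after a reindex.

    baseline_relpages is the relpages value recorded right after the last
    reindex; growth_factor is an arbitrary threshold, not a recommendation.
    """
    return relpages > baseline_relpages * growth_factor
```

For example, an index recorded at 20,000 pages after a reindex and now at 50,000 pages would be flagged, at an approximate size of 50,000 × 8,192 bytes.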

Your table splitting solution sounds like something I'd do if I were
forced to use mysql ;-)

--
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Vivek Khera, Ph.D. Khera Communications, Inc.
Internet: kh***@kciLink.com Rockville, MD +1-240-453-8497
AIM: vivekkhera Y!: vivek_khera http://www.khera.org/~vivek/

Nov 12 '05 #12
Jean-Luc Lachance wrote:

BULL.

How many times does PG have to scan the whole table because of MVCC?
At least with partitioning there is a fighting chance that that won't be
necessary.
Queries that involve the field on which the table is partitioned execute
faster by an order of magnitude.
It also helps with vacuuming, as PG can vacuum only one partition at a
time.
I have a 17M-row table where all records get frequently updated over a
year.
I would do my own partitioning with inheritance if it was not broken.
Partitioning would be a BIG plus in my book. So would visibility of
records but that is another fight.

I meant to say visibility of record in the index.


JLL

Nov 12 '05 #13
Is this partitioning like the schemas mentioned here:
http://www.postgresql.org/docs/curre...-schemas.html? Would those
help and increase performance?

/B

----- Original Message -----
From: "Jean-Luc Lachance" <jl******@nsd.ca>
To: "Vivek Khera" <kh***@kcilink.com>
Cc: <pg***********@postgresql.org>
Sent: Friday, October 10, 2003 14:23
Subject: Re: [GENERAL] Table partitioning for maximum speed?

BULL.

How many times does PG have to scan the whole table because of MVCC?
At least with partitioning there is a fighting chance that that won't be
necessary.
Queries that involve the field on which the table is partitioned execute
faster by an order of magnitude.
It also helps with vacuuming, as PG can vacuum only one partition at a
time.
I have a 17M-row table where all records get frequently updated over a
year.
I would do my own partitioning with inheritance if it was not broken.
Partitioning would be a BIG plus in my book. So would visibility of
records but that is another fight.

JLL

Nov 12 '05 #14
>>>>> "JL" == Jean-Luc Lachance <jl******@nsd.ca> writes:
JL> BULL.
JL> How many times does PG have to scan the whole table because of MVCC?
JL> At least with partitioning there is a fighting chance that that won't be
JL> necessary.

Huh? His specific query was "WHERE md5 = '....'". Why on earth would
that force a sequential scan if it were an indexed column? Heck, the
btree index should rule out 15/16ths of the rows after the first
character comparison.
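
To put rough numbers on that, here is a back-of-the-envelope sketch of how many B-tree levels a lookup has to descend for the full table versus one of 16 partitions. It uses a deliberately simplified model (every page holds the same number of keys), and the fanout values are assumptions; the real figure depends on key width, page size, and fill factor.

```python
def btree_levels(rows, fanout):
    """Smallest number of levels whose capacity (fanout ** levels) covers rows."""
    levels, capacity = 1, fanout
    while capacity < rows:
        capacity *= fanout
        levels += 1
    return levels


FULL_TABLE = 7_000_000
ONE_PARTITION = FULL_TABLE // 16  # 437,500 rows

for fanout in (100, 256):  # assumed keys per index page
    print(fanout, btree_levels(FULL_TABLE, fanout), btree_levels(ONE_PARTITION, fanout))
```

Under this model, splitting the table 16 ways saves at most about one level of index descent, and with the larger assumed fanout it saves none.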

Nov 12 '05 #15
No. There is a big difference between schemas and partitioned tables.

The closest thing would be a bunch of tables inherited from a base
table, with each table holding a common field whose value is the same
for every row in that table.

The problem is that the planner is not aware of this, so it cannot make
better use of the implicit key.

JLL
David Busby wrote:

Is this partitioning like the schemas mentioned here:
http://www.postgresql.org/docs/curre...-schemas.html? Would those
help and increase performance?

/B

Nov 12 '05 #16
I was replying to the general comment that PG cannot profit from having
partitioned tables.

Vivek Khera wrote:
>> "JL" == Jean-Luc Lachance <jl******@nsd.ca> writes:


JL> BULL.
JL> How many times does PG have to scan the whole table because of MVCC?
JL> At least with partitioning there is a fighting chance that that won't be
JL> necessary.

Huh? His specific query was "WHERE md5 = '....'". Why on earth would
that force a sequential scan if it were an indexed column? Heck, the
btree index should rule out 15/16ths of the rows after the first
character comparison.

Nov 12 '05 #17