Buglist - PostgreSQL Database

Bo Lorentsen

Hi ...

I'm trying to convince my boss to use posgresql (I need RI, transactions
and views), but he keeps comparing the project to mysql. Until now, I
found the answers to he's questions on the www.postgresql.org page, but
now I'm lost :-)

Where do I find a list of bugs both found and solved, or will I need to
ask on the pgsql-bugs list to know the answer ?

Also have anyone tryed to compare the new transaction model in MySQL 4.x
to PostgreSQL ?

I'm looking forward to recive even more constructive arguements :-)

/BL
---------------------------(end of broadcast)---------------------------
TIP 9: the planner will ignore your desire to choose an index scan if your
joining column's datatypes do not match

Nov 11 '05 #1

Subscribe Post Reply

4896

Roderick A. Anderson

On 19 Aug 2003, Bo Lorentsen wrote:

Also have anyone tryed to compare the new transaction model in MySQL 4.x
to PostgreSQL ?
Bo, I've recently started having to deal with MySQL. (Web sites
wanting/using php _need/have-to-have_ MySQL. Their words not mine.) And
from going from a "I dislike MySQL" to "I'm really hating MySQL" has been
getting easier and easier.
My dealings with MySQL are for the 3.xx version but I semi-followed a
thread on this several months ago so feel fully qualified to to throw in
my views. :-) My take on others research was that MySQL transaction
model is a bubble gum and bailing wire add on not an integral part of
MySQL. It _was_ tacked onto the top of the database so if either it or
MySQL failed you were likely to loose data.
I'm looking forward to recive even more constructive arguements :-)

How about "Friends don't let friends use MySQL"?

Hopefully others with a stonger knowledge will provide this.
Rod
--
"Open Source Software - Sometimes you get more than you paid for..."
---------------------------(end of broadcast)---------------------------
TIP 6: Have you searched our list archives?

http://archives.postgresql.org

Nov 11 '05 #2

Shridhar Daithankar

On 19 Aug 2003 at 13:32, Bo Lorentsen wrote:

Hi ...

I'm trying to convince my boss to use posgresql (I need RI, transactions
and views), but he keeps comparing the project to mysql. Until now, I
found the answers to he's questions on the www.postgresql.org page, but
now I'm lost :-)

Where do I find a list of bugs both found and solved, or will I need to
ask on the pgsql-bugs list to know the answer ?
Well, you could look at release notes. That contains lot of information. Of
course archives of pgsql-bugs and pgsql-patches are the ultimate unless you
plan to delve into CVS history and sources..

Also have anyone tryed to compare the new transaction model in MySQL 4.x
to PostgreSQL ?

Check this.. http://www.mysql.com/doc/en/ANSI_diff_Transactions.html

To me, it seems that their definition of transaction is limited to preventing
two guys writing to same row simaltenously, which is of course a limited view
of things.

Few major differences I can see right here. Correct me on mysql side.

- WAL log
- Transactabl DDLs
- Nested transactions coming soon to PG
- PITR coming soon to PG

I would love to see entire checklist but don't have any time to devote to
mysql.

Bye
Shridhar

--
Vulcans never bluff. -- Spock, "The Doomsday Machine", stardate 4202.1
---------------------------(end of broadcast)---------------------------
TIP 1: subscribe and unsubscribe commands go to ma*******@postgresql.org

Nov 11 '05 #3

Bo Lorentsen

On Tue, 2003-08-19 at 14:31, Shridhar Daithankar wrote:

Well, you could look at release notes. That contains lot of information. Of
course archives of pgsql-bugs and pgsql-patches are the ultimate unless you
plan to delve into CVS history and sources.. Ok, I just liked to find something like bugzilla, or an explanation to
how bugs are garantied to be visible. My boos like to compare this to
the Mysql model found on : http://bugs.mysql.com/bugstats
Check this.. http://www.mysql.com/doc/en/ANSI_diff_Transactions.html Hmm, it sound like they have transactions on OmniDB tables, but these
are really slow, and therefor they put much energy into advetising for
the MyISAM files (non transactional). Or, am I missing something.
To me, it seems that their definition of transaction is limited to preventing
two guys writing to same row simaltenously, which is of course a limited view
of things. This sounds like there MyISAM tables, or ???
Few major differences I can see right here. Correct me on mysql side.

- WAL log
- Transactabl DDLs Yes and lets add :
- Views
- subselects
- plperl, plsql, plpython, plXXX
- Nested transactions coming soon to PG
- PITR coming soon to PG Not good for argumenting with my boss about future :-)
I would love to see entire checklist but don't have any time to devote to
mysql.

I do understand, and its no pleasure either :-)

/BL
---------------------------(end of broadcast)---------------------------
TIP 6: Have you searched our list archives?

http://archives.postgresql.org

Nov 11 '05 #4

Tom Lane

"Shridhar Daithankar" <sh*****************@persistent.co.in> writes:

On 19 Aug 2003 at 13:32, Bo Lorentsen wrote:
Where do I find a list of bugs both found and solved, or will I need to
ask on the pgsql-bugs list to know the answer ?
Well, you could look at release notes. That contains lot of information. Of
course archives of pgsql-bugs and pgsql-patches are the ultimate unless you
plan to delve into CVS history and sources..

Also the pgsql-committers archives. (Personally, when I want to look at
a change history, I use cvs2cl to extract one from the CVS server.) The
release notes are a good high-level view, but if you want details you
need to look to the mailing list archives or CVS logs.

BTW, lots of people have the bad habit of reporting bugs on -general or
-hackers; there are significant bug fixes that never get a mention in
the -bugs list. So you really have to troll all the archives if you are
going to use the archives as your primary source. CVS change history
might be a better starting point.

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 4: Don't 'kill -9' the postmaster

Nov 11 '05 #5

Shridhar Daithankar

On Tuesday 19 August 2003 18:59, Bo Lorentsen wrote:

My dealings with MySQL are for the 3.xx version but I semi-followed a
thread on this several months ago so feel fully qualified to to throw in
my views. :-) My take on others research was that MySQL transaction
model is a bubble gum and bailing wire add on not an integral part of
MySQL. It _was_ tacked onto the top of the database so if either it or
MySQL failed you were likely to loose data.

But this goes for 3.x have you tried 4.x and there InnoDB tables ?

Well, if you have some time and hardware to play around, I am sure we can
easily test them side by side.

Interested? This could take couple of weeks taking things to real world
workloads but could provide some insight to community.

If you have mysql 4.x and postgresql 7.4beta installed, you can easily setup
these things..

Shridhar
---------------------------(end of broadcast)---------------------------
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to ma*******@postgresql.org so that your
message can get through to the mailing list cleanly

Nov 11 '05 #6

Shridhar Daithankar

On Tuesday 19 August 2003 19:12, Tom Lane wrote:

BTW, lots of people have the bad habit of reporting bugs on -general or
-hackers; there are significant bug fixes that never get a mention in
the -bugs list. So you really have to troll all the archives if you are
going to use the archives as your primary source. CVS change history
might be a better starting point.

Making pgsql-bugs a open to non-subscription but moderated list might be a
good idea. It really does not matter if a bug gets filed couple of days late
but having to have subscribe to another list could be ditterent.

Or have bugzilla setup somewhere. That way the tracking will be hell lot
visible to outside world..

Shridhar
---------------------------(end of broadcast)---------------------------
TIP 6: Have you searched our list archives?

http://archives.postgresql.org

Nov 11 '05 #7

Bo Lorentsen

On Tue, 2003-08-19 at 15:45, Shridhar Daithankar wrote:

Well, if you have some time and hardware to play around, I am sure we can
easily test them side by side. Interresting project, is this allowed :-)
Interested? This could take couple of weeks taking things to real world
workloads but could provide some insight to community. Yeps, but this is not the propper time, and I am not sure if I have the
knowledge what it takes.
If you have mysql 4.x and postgresql 7.4beta installed, you can easily setup
these things..

Yeps, but how to test ? And how do one test stability :-)

/BL
---------------------------(end of broadcast)---------------------------
TIP 1: subscribe and unsubscribe commands go to ma*******@postgresql.org

Nov 11 '05 #8

Bo Lorentsen

On Tue, 2003-08-19 at 15:47, Shridhar Daithankar wrote:

Or have bugzilla setup somewhere. That way the tracking will be hell lot
visible to outside world..

I agree on this, as it seems messy from outside not to be able to get an
overview of both solved and not solved bugs.

I know that as developer, this may not seem like a big problem, but it
will help non hackers to get an overview.

/BL
---------------------------(end of broadcast)---------------------------
TIP 8: explain analyze is your friend

Nov 11 '05 #9

Tom Lane

Bo Lorentsen <bl@netgroup.dk> writes:

On Tue, 2003-08-19 at 14:37, Roderick A. Anderson wrote:
My take on others research was that MySQL transaction
model is a bubble gum and bailing wire add on not an integral part of
MySQL. It _was_ tacked onto the top of the database so if either it or
MySQL failed you were likely to loose data.
But this goes for 3.x have you tried 4.x and there InnoDB tables ?

It's still bolted on. The entire concept that "transactional integrity
is optional" is ludicrous, IMHO. "Integrity" and "optional" are
contradictory.

One thing you should ask about MySQL is where they keep the system's
metadata (catalog data). In Postgres it's under transactional control
just like everything else, which means it's (a) crash-safe and (b)
rollback-able. This is why all DDL changes are rollback-able in PG.
I honestly don't know what the corresponding arrangements are in MySQL
.... but I suspect that even in an all-InnoDB database, there is critical
system data that is outside the InnoDB table handler and thus not
transaction-safe.

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/docs/faqs/FAQ.html

Nov 11 '05 #10

Lincoln Yeoh

At 03:56 PM 8/19/2003 +0200, Bo Lorentsen wrote:

On Tue, 2003-08-19 at 15:45, Shridhar Daithankar wrote:
Well, if you have some time and hardware to play around, I am sure we can
easily test them side by side.

Interresting project, is this allowed :-)
Interested? This could take couple of weeks taking things to real world
workloads but could provide some insight to community.

Yeps, but this is not the propper time, and I am not sure if I have the
knowledge what it takes.

Install an application that can use both DBs. Muck around with it. If you
can't tell the difference, then I'd say go with postgresql - transactions
isn't bolted on, quite a number of other design wins too. If you can tell
the difference and MySQL is better, many of us here would be interested to
know.

If you have mysql 4.x and postgresql 7.4beta installed, you can easily

setup
these things..

Yeps, but how to test ? And how do one test stability :-)

Do lots of concurrent updates and inserts and selects for a long time?

While you are testing and have no critical data you can do stuff like
pressing the reset button midway during a transaction, or have someone trip
over the power cord. Someone on this list was doing stuff like that to
Postgresql and he said it did pretty well.

I'm not saying postgresql will save you from that, but that's one way to
learn how fault tolerant a product is before the uncontrollable faults
appear. So far there have been quite a number of people with flaky RAM or
hardware and a large percentage of them don't seem to have lost much data
(well it's hard to be 100% sure ;) ).

Have fun!
Link.

---------------------------(end of broadcast)---------------------------
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/docs/faqs/FAQ.html

Nov 11 '05 #11

Lincoln Yeoh

At 03:59 PM 8/19/2003 +0200, Bo Lorentsen wrote:

On Tue, 2003-08-19 at 15:47, Shridhar Daithankar wrote:
Or have bugzilla setup somewhere. That way the tracking will be hell lot
visible to outside world..

I agree on this, as it seems messy from outside not to be able to get an
overview of both solved and not solved bugs.

I know that as developer, this may not seem like a big problem, but it
will help non hackers to get an overview.

AFAIK bugzilla requires mysql (for now).

I've recently installed it and if it can be easily made to work with
postgresql I'd like to know.

Link.

---------------------------(end of broadcast)---------------------------
TIP 6: Have you searched our list archives?

http://archives.postgresql.org

Nov 11 '05 #12

Devrim GUNDUZ

Hi,

On Tue, 19 Aug 2003, Lincoln Yeoh wrote:

AFAIK bugzilla requires mysql (for now).

I've recently installed it and if it can be easily made to work with
postgresql I'd like to know.

https://bugzilla.redhat.com/bugzilla/index.cgi
Bugzilla News
===
January 1st, 2003
Current Red Hat version of Bugzilla using PostgreSQL code available for
download.
====

AFAIK RH runs bugzilla on PostgreSQL (or RHDB, whatever). The code is
available from there.

Regards,

--
Devrim GUNDUZ
de****@gunduz.org de***********@linux.org.tr
http://www.tdmsoft.com
http://www.gunduz.org
---------------------------(end of broadcast)---------------------------
TIP 4: Don't 'kill -9' the postmaster

Nov 11 '05 #13

Bo Lorentsen

On Tue, 2003-08-19 at 16:03, Tom Lane wrote:

It's still bolted on. The entire concept that "transactional integrity
is optional" is ludicrous, IMHO. "Integrity" and "optional" are
contradictory. Good point. Also the problem of MyISAM and InnoDB RI :-)
One thing you should ask about MySQL is where they keep the system's
metadata (catalog data). In Postgres it's under transactional control
just like everything else, which means it's (a) crash-safe and (b)
rollback-able. This is why all DDL changes are rollback-able in PG.
I honestly don't know what the corresponding arrangements are in MySQL
... but I suspect that even in an all-InnoDB database, there is critical
system data that is outside the InnoDB table handler and thus not
transaction-safe.

Thats a really nice thing for temporary tables, but "point in time"
backup is a much stonger argument :-)

/BL
---------------------------(end of broadcast)---------------------------
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to ma*******@postgresql.org so that your
message can get through to the mailing list cleanly

Nov 11 '05 #14

Bo Lorentsen

On Tue, 2003-08-19 at 16:20, Lincoln Yeoh wrote:

Install an application that can use both DBs. Muck around with it. If you
can't tell the difference, then I'd say go with postgresql - transactions
isn't bolted on, quite a number of other design wins too. If you can tell
the difference and MySQL is better, many of us here would be interested to
know. Ok, thanks, we may need to make a test utility, that is open and fair,
but thats a little hard, as PG have some featurs that MySQL does not.
Do lots of concurrent updates and inserts and selects for a long time? I do think I know why you say this :-)
Have fun!

I like to do this, but I'm not sure that I have the time needed. If I
have the time, and I get some results, I let you now.

/BL
---------------------------(end of broadcast)---------------------------
TIP 9: the planner will ignore your desire to choose an index scan if your
joining column's datatypes do not match

Nov 11 '05 #15

Vivek Khera

>>>>> "BL" == Bo Lorentsen <bl@netgroup.dk> writes:

BL> Hi ...
BL> I'm trying to convince my boss to use posgresql (I need RI, transactions
BL> and views), but he keeps comparing the project to mysql. Until now, I
BL> found the answers to he's questions on the www.postgresql.org page, but
BL> now I'm lost :-)

My big reason to choose postgres was concurrency. My application has
transactions and updates all over the place, and even as a single
developer in the early stages, I could already see problems with
table-level locking that mysql was giving me. Who knows what would
have happened in production with hundreds of people hitting the db
simultaneously! The row-level locking in Postgres has made it
possible for an incredible number of simultaneous actions to be
carried out without any waiting for the users.

Try making a table grow beyond your file size limit in mysql. You
can't. Even if you use an OS with 64-bit file pointers (such as
FreeBSD) you can't grow your file beyond 4Gb in mysql (at least with
mysam tables -- dunno about innodb tables). In Postgres, it is
automagically handled for you.

The *only* drawbacks I find with postgres is the need to dump/restore
on major version updates and the need to vacuum the tables
regularly...

Tops on my wish list is that postgres automatically notice when a row
is no longer needed (all transactional references to it are gone) and
'free' it at that time, rather then needing a special scan to
determine the row is no longer needed and freeing it.

--
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Vivek Khera, Ph.D. Khera Communications, Inc.
Internet: kh***@kciLink.com Rockville, MD +1-240-453-8497
AIM: vivekkhera Y!: vivek_khera http://www.khera.org/~vivek/

---------------------------(end of broadcast)---------------------------
TIP 8: explain analyze is your friend

Nov 11 '05 #16

Bruno Wolff III

On Tue, Aug 19, 2003 at 19:17:31 +0530,
Shridhar Daithankar <sh*****************@persistent.co.in> wrote:

Making pgsql-bugs a open to non-subscription but moderated list might be a
good idea. It really does not matter if a bug gets filed couple of days late
but having to have subscribe to another list could be ditterent.

All of the pgsql lists including bugs already work this way.

---------------------------(end of broadcast)---------------------------
TIP 8: explain analyze is your friend

Nov 11 '05 #17

Joshua D. Drake

It's still bolted on. The entire concept that "transactional integrity
is optional" is ludicrous, IMHO. "Integrity" and "optional" are
contradictory.

Obviously you have never voted in a major election ;)
regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/docs/faqs/FAQ.html

---------------------------(end of broadcast)---------------------------
TIP 8: explain analyze is your friend

Nov 11 '05 #18

Bruno Wolff III

On Tue, Aug 19, 2003 at 18:01:33 +0530,
Shridhar Daithankar <sh*****************@persistent.co.in> wrote:

Few major differences I can see right here. Correct me on mysql side.

- WAL log
- Transactabl DDLs
- Nested transactions coming soon to PG
- PITR coming soon to PG

Note that the last two are not coming soon as in 7.4, but as in maybe for 7.5.
Which I can't see being out in less than 6 months.

---------------------------(end of broadcast)---------------------------
TIP 6: Have you searched our list archives?

http://archives.postgresql.org

Nov 11 '05 #19

Shridhar Daithankar

On Tuesday 19 August 2003 21:24, Bruno Wolff III wrote:

On Tue, Aug 19, 2003 at 19:17:31 +0530,

Shridhar Daithankar <sh*****************@persistent.co.in> wrote:
Making pgsql-bugs a open to non-subscription but moderated list might be
a good idea. It really does not matter if a bug gets filed couple of days
late but having to have subscribe to another list could be ditterent.

All of the pgsql lists including bugs already work this way.

No.. Ocassionally when somebody cross-posts and I reply, I get 'Post stalled
for modetaro because you are not member' types error. IIRC, the SQL list is
most frequent ones. I haven't seen any other list doing such stuff..

Anyway, good to know that it's done that way..

Shridhar
---------------------------(end of broadcast)---------------------------
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to ma*******@postgresql.org)

Nov 11 '05 #20

Shridhar Daithankar

On Tuesday 19 August 2003 21:30, Bruno Wolff III wrote:

On Tue, Aug 19, 2003 at 18:01:33 +0530,

Shridhar Daithankar <sh*****************@persistent.co.in> wrote:
Few major differences I can see right here. Correct me on mysql side.

- WAL log
- Transactabl DDLs
- Nested transactions coming soon to PG
- PITR coming soon to PG

Note that the last two are not coming soon as in 7.4, but as in maybe for
7.5. Which I can't see being out in less than 6 months.

Right. But as of now, we know that some code exists for each of these. Given
for how long transactions in mysql was vapourware (What, 2 years?), I think
what I am making is pretty much a statement and not a claim.

Of course, by postgresql standards, it may be bit too early to announce, I
admit...:-)

Shridhar
---------------------------(end of broadcast)---------------------------
TIP 6: Have you searched our list archives?

http://archives.postgresql.org

Nov 11 '05 #21

Bruno Wolff III

On Tue, Aug 19, 2003 at 21:27:15 +0530,
Shridhar Daithankar <sh*****************@persistent.co.in> wrote:

On Tuesday 19 August 2003 21:24, Bruno Wolff III wrote:
On Tue, Aug 19, 2003 at 19:17:31 +0530,

Shridhar Daithankar <sh*****************@persistent.co.in> wrote:
Making pgsql-bugs a open to non-subscription but moderated list might be
a good idea. It really does not matter if a bug gets filed couple of days
late but having to have subscribe to another list could be ditterent.

All of the pgsql lists including bugs already work this way.

No.. Ocassionally when somebody cross-posts and I reply, I get 'Post stalled
for modetaro because you are not member' types error. IIRC, the SQL list is
most frequent ones. I haven't seen any other list doing such stuff..

All that message means is that your message won't be posted until a moderator
approves it.

---------------------------(end of broadcast)---------------------------
TIP 7: don't forget to increase your free space map settings

Nov 11 '05 #22

Vivek Khera

>>>>> "SD" == Shridhar Daithankar <sh*****************@persistent.co.in> writes:

SD> On Tuesday 19 August 2003 21:03, Vivek Khera wrote:

Tops on my wish list is that postgres automatically notice when a row
is no longer needed (all transactional references to it are gone) and
'free' it at that time, rather then needing a special scan to
determine the row is no longer needed and freeing it.

SD> Heh.. we have autovacuum right. Well it does not work the way you
SD> want but it works automatically at least.
There's a big difference between "noticing that a table needs to be
vacuumed and running it" and "automatically having the backend free a
row as soon as we know it is eligible to be (as would normally be
determined by vacuum)".

One of these days when I can afford a 14-hour dump/restore, I'll
upgrade to 7.4 and try autovacuum :-)

--
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Vivek Khera, Ph.D. Khera Communications, Inc.
Internet: kh***@kciLink.com Rockville, MD +1-240-453-8497
AIM: vivekkhera Y!: vivek_khera http://www.khera.org/~vivek/

---------------------------(end of broadcast)---------------------------
TIP 6: Have you searched our list archives?

http://archives.postgresql.org

Nov 11 '05 #23

Vivek Khera

>>>>> "VK" == Vivek Khera <kh***@kcilink.com> writes:

One more nit:

Since the beginning of time (at least MySQL v3.22) MySQL has silently
ignored the foreign key references in table create statement. Now
that they have foreign key support (version 4.x), do they honor those
statements? Nope. You have to use their own syntax to declare your
FKs. They still silently ignore the references in the table create
statements.

Standards are a good thing, especially when everyone does NOT have
their own.

--
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Vivek Khera, Ph.D. Khera Communications, Inc.
Internet: kh***@kciLink.com Rockville, MD +1-240-453-8497
AIM: vivekkhera Y!: vivek_khera http://www.khera.org/~vivek/

---------------------------(end of broadcast)---------------------------
TIP 6: Have you searched our list archives?

http://archives.postgresql.org

Nov 11 '05 #24

Shridhar Daithankar

On Tuesday 19 August 2003 21:43, Vivek Khera wrote:

There's a big difference between "noticing that a table needs to be
vacuumed and running it" and "automatically having the backend free a
row as soon as we know it is eligible to be (as would normally be
determined by vacuum)".
Agreed and it would be nice to have. But functionally it's *almost* the same..
One of these days when I can afford a 14-hour dump/restore, I'll
upgrade to 7.4 and try autovacuum :-)

FYi, it runs perfectly on 7.3.x IIRC from some posts earlier..So you don't
have to wait too long..

Shridhar
---------------------------(end of broadcast)---------------------------
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to ma*******@postgresql.org so that your
message can get through to the mailing list cleanly

Nov 11 '05 #25

scott.marlowe

On Tue, 19 Aug 2003, Bruno Wolff III wrote:

On Tue, Aug 19, 2003 at 18:01:33 +0530,
Shridhar Daithankar <sh*****************@persistent.co.in> wrote:

Few major differences I can see right here. Correct me on mysql side.

- WAL log
- Transactabl DDLs
- Nested transactions coming soon to PG
- PITR coming soon to PG

Note that the last two are not coming soon as in 7.4, but as in maybe for 7.5.
Which I can't see being out in less than 6 months.

Good point. We in no way need to point to future improvements anymore to
say Postgresql is a better database than MySQL. Postgresql 7.3.4 is the
best Open Source database in my humble opinion, and the things MySQL is
missing are complex features that need to be well thought out and well
implemented. Given the haphazard approach of MySQL to standards
compliance, you are gambling your company on it's non-standard SQL
sticking around if you use it, or hundreds of man hours replacing MySQL
specific SQL if you ever convert.

Examples:

|| In Postgresql || is the concatenation operator. Why, in god's name,
is || the concatenation operator. Because the Spec says so. In MySQL
it's the OR operator, contrary to spec.

Their FKs silently fail without so much as a notice. Poof, no fks, but
they swallow the syntax as if they do.

They get precision wrong all the time. Spec says numeric(x1,y1) *
numeric(x2,y2) yeilds numeric(size_of_int_portion.y1+y2)

But in MySQL you get a numeric(size_of_int_portion,max(y1,y2)).

Postgresql gives you the answer.

create table mult (i1 numeric(10,2), i2 numeric(10,3));
insert into mult values (123.33,123.354);
select i1*i2 from mult;
MySQL output:
+-----------+
| i1*i2 |
+-----------+
| 15213.249 |
+-----------+

Postgresql output:
?column?
-------------
15213.24882

I don't know if that's changed since 4.x came out, I gave up on MySQL for
anything other than a text storage / content management engine long ago.
---------------------------(end of broadcast)---------------------------
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/docs/faqs/FAQ.html

Nov 11 '05 #26

Matthew T. O'Connor

On Tue, 2003-08-19 at 12:13, Vivek Khera wrote:

There's a big difference between "noticing that a table needs to be
vacuumed and running it" and "automatically having the backend free a
row as soon as we know it is eligible to be (as would normally be
determined by vacuum)".
<talking beyond my real knowledge>
Changing Postgres to perform as mentioned above is non-trivial, it would
basicially change the entire core of the system. I think this is due to
the fact that postgres uses a non-overwriting storage manager. This has
many benefits including MVCC, the primary disadvantage is that you need
a vacuum type process
</talking beyond my real knowledge>
One of these days when I can afford a 14-hour dump/restore, I'll
upgrade to 7.4 and try autovacuum :-)

pg_autovacuum does with with 7.3.x, but the source is only included in
7.4. Just get the pg_autovacuum directory from contrib and use it.

Matthew
---------------------------(end of broadcast)---------------------------
TIP 4: Don't 'kill -9' the postmaster

Nov 11 '05 #27

Vivek Khera

>>>>> "MTO" == Matthew T O'Connor <ma*****@zeut.net> writes:

MTO> <talking beyond my real knowledge>
MTO> Changing Postgres to perform as mentioned above is non-trivial, it would
MTO> basicially change the entire core of the system. I think this is due to
MTO> the fact that postgres uses a non-overwriting storage manager. This has
MTO> many benefits including MVCC, the primary disadvantage is that you need
MTO> a vacuum type process
MTO> </talking beyond my real knowledge>

I'm not promoting any change in the MVCC. What I'm saying is that it
would be really cool if the backend process itself could recognize
that a row is no longer referenced by any transactions upon
termination of the transaction, and release it back to the system.
This is just what vacuum does but in a batched manner. I would love
to see it incremental. This would result in pretty much near zero
internal fragmentation, I think.

How hard that is, I have no clue. Right now my DB is saturating the
disk and having to squeeze vacuum into a saturated disk bandwidth is
not pleasant. Luckily, the 14-disk raid array just
arrived... hopefully that will have higher bandwidth than the 4-disk
array... ;-)

---------------------------(end of broadcast)---------------------------
TIP 1: subscribe and unsubscribe commands go to ma*******@postgresql.org

Nov 11 '05 #28

Bruno Wolff III

On Tue, Aug 19, 2003 at 21:51:14 -0400,
Vivek Khera <kh***@kcilink.com> wrote:

I'm not promoting any change in the MVCC. What I'm saying is that it
would be really cool if the backend process itself could recognize
that a row is no longer referenced by any transactions upon
termination of the transaction, and release it back to the system.
This is just what vacuum does but in a batched manner. I would love
to see it incremental. This would result in pretty much near zero
internal fragmentation, I think.

Why do you care about about the details of the implementation (rather than
the performance)? If it were faster to do it that way, that's how it would
have been done in the first place. The cost of doing the above is almost
certainly going to be an overall performance loser.

---------------------------(end of broadcast)---------------------------
TIP 1: subscribe and unsubscribe commands go to ma*******@postgresql.org

Nov 11 '05 #29

Vivek Khera

>>>>> "BW" == Bruno Wolff, <Bruno> writes:

to see it incremental. This would result in pretty much near zero
internal fragmentation, I think.

BW> Why do you care about about the details of the implementation (rather than
BW> the performance)? If it were faster to do it that way, that's how it would
BW> have been done in the first place. The cost of doing the above is almost
BW> certainly going to be an overall performance loser.

I care for the performance. And how are you so sure that it was
faster the way it is now? Are you sure it was not done this way
because of ease of implementation?

Seriously, how much slower can it be if the backend were to do the
checking for external references upon updating/deleting a row? The
cost would be distributed across time as opposed to concentrated at
once within a vacuum process. I am fairly certian it would reduce
disk bandwidth requirements since at least one necessary page will
already be in memory.

---------------------------(end of broadcast)---------------------------
TIP 1: subscribe and unsubscribe commands go to ma*******@postgresql.org

Nov 11 '05 #30

Bruno Wolff III

On Wed, Aug 20, 2003 at 10:31:25 -0400,
Vivek Khera <kh***@kcilink.com> wrote:

I care for the performance. And how are you so sure that it was
faster the way it is now? Are you sure it was not done this way
because of ease of implementation?

Seriously, how much slower can it be if the backend were to do the
checking for external references upon updating/deleting a row? The
cost would be distributed across time as opposed to concentrated at
once within a vacuum process. I am fairly certian it would reduce
disk bandwidth requirements since at least one necessary page will
already be in memory.

It would probably be a lot slower. Any transaction that has started
but not yet finished would need to lock all rows that exist at during
the transaction (for serialized transaction isolation you would only
need to worry about rows that existed at the start of the transaction
or that were modified by the transaction). Immediately that is a big
deal since a typical query may need to lock a bunch of rows that it
will never actually touch (but you don't know that when the transaction
starts). Managing all those locks would take up a lot of semiconductor
memory or a lot of disk writes and be a major source of contention.
The current system just has to mark rows when they are created and
deleted (an update does both operations). The normal vacuum clean up
actually isn't going to be much worse than what you would need to do
at both the start and end of each transaction. The overhead of letting
dead rows hang around for a while after they aren't needed isn't that high.

Also, since at least 7.3, normal vacuums aren't normally going to affect the
performance of your database server that much.

The main issue against the current vacuum system is that it requires the
DBA knowing what vacuum does and figuring out how it should be used in
their situation to get reasonable performance. This makes it a bit harder
for non-DBAs to jump right in to Postgres without running into problems.

However, the work on autovacuum seems to be providing a reasonable solution
to that problem.

---------------------------(end of broadcast)---------------------------
TIP 1: subscribe and unsubscribe commands go to ma*******@postgresql.org

Nov 11 '05 #31

scott.marlowe

On Wed, 20 Aug 2003, Vivek Khera wrote:

>> "BW" == Bruno Wolff, <Bruno> writes: to see it incremental. This would result in pretty much near zero
internal fragmentation, I think.

BW> Why do you care about about the details of the implementation (rather than
BW> the performance)? If it were faster to do it that way, that's how it would
BW> have been done in the first place. The cost of doing the above is almost
BW> certainly going to be an overall performance loser.

I care for the performance. And how are you so sure that it was
faster the way it is now? Are you sure it was not done this way
because of ease of implementation?

Seriously, how much slower can it be if the backend were to do the
checking for external references upon updating/deleting a row? The
cost would be distributed across time as opposed to concentrated at
once within a vacuum process. I am fairly certian it would reduce
disk bandwidth requirements since at least one necessary page will
already be in memory.

Time for a mental exercise.

Our server has 2 users. Each backend has to check with all the other
backends when it deletes a tuple (every update is also a delete, remember
every change in an MVCC database is a create / delte cycle.) Let's create
a name for the time it takes to do the update / mark deleted versus the
time it takes to contact each of those other backends. Tw is the Time to
do the work here, and Tc is the time to do the cleanup (i.e. vacuum the
tuple) Note that we'd also need a Ta for answering the requests of all
the other backends, but we can assume that on average, for each request a
child process makes, it will receive exactly that many from each other
backend running. Let x represent the number of backends. So the answer
time is equal to x*Tc

Time = Tw + Tc + Ta

Time = Tw + Tc + (x * Tc)

Time = Tw + ((x+1) * Tc)

and our cleanup time starts to grow at an ugly rate as the number of
backends increases. Lazy vacuuming allows the database to reclaim lost
space in the background, as the newer non-full vacuum does.

Many folks mistake this vacuum process for its older, slower cousin, full
vacuum, which does eat a lot more disk bandwidth and slow the machine
down.

On a Dual CPU X86 box a lazy vacuum running in a continuous loop will eat
about 5% of one CPU and drop pgbench scores by 10 to 15%. The important
thing here, is that the machine will still run quite snappily when you
throw several hundred clients at it, since the lazy vacuum just sits in
the background using the spare cycles and not much more.

that means your storage usage may baloon somewhat under intense usage, but
you won't have an IPC storm kill the performance of the postgresql server.

Knowing the postgresql development team, I'm sure the reasons they chose
are clearly stated in the hackers mailing list somewhere in time, so I'm
gonna go look, but trust me on one thing, the guys programming this
database don't do much because it's easier / faster to implement without
putting something in the TODO list about making it better some day.
---------------------------(end of broadcast)---------------------------
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to ma*******@postgresql.org so that your
message can get through to the mailing list cleanly

Nov 11 '05 #32

Stephan Szabo

On Wed, 20 Aug 2003, Vivek Khera wrote:

>> "BW" == Bruno Wolff, <Bruno> writes: to see it incremental. This would result in pretty much near zero
internal fragmentation, I think.

BW> Why do you care about about the details of the implementation (rather than
BW> the performance)? If it were faster to do it that way, that's how it would
BW> have been done in the first place. The cost of doing the above is almost
BW> certainly going to be an overall performance loser.

I care for the performance. And how are you so sure that it was
faster the way it is now? Are you sure it was not done this way
because of ease of implementation?

Seriously, how much slower can it be if the backend were to do the
checking for external references upon updating/deleting a row? The
cost would be distributed across time as opposed to concentrated at
once within a vacuum process. I am fairly certian it would reduce
disk bandwidth requirements since at least one necessary page will
already be in memory.

In general, excepting a case with only one session running, I would expect
there to exist one or more transactions that can see the old state of the
row.

I would think it'd be something like:

T1: begin;
T1: update foo set bar=cheese where id=3;
T2: begin;
T2: select * from foo;
-- this should see the old state of the row, so that update better
-- not have removed the old one.
T1: end;
-- at this point read committed transactions may see the new
-- state, unless it's currently running a query which is accessing
-- that table, and in practice it's probably more complicated than
-- that (since you might run functions tha access the table).
-- You'd presumably need to wait until say the end of that full
-- statement at the very least. And any serializable transactions
-- would probably mean you can't do it either.

Under some circumstances it might be better, yes. And there are ways to
do things that don't involve non-overwriting (like Oracle's system) but
they all have downsides. This might not be the best solution, but you're
going to have to put alot more work into showing that your solution is
better. ;)

---------------------------(end of broadcast)---------------------------
TIP 4: Don't 'kill -9' the postmaster

Nov 11 '05 #33

Jan Wieck

Alvaro Herrera wrote:

On Wed, Aug 20, 2003 at 10:31:25AM -0400, Vivek Khera wrote:
Seriously, how much slower can it be if the backend were to do the
checking for external references upon updating/deleting a row? The
cost would be distributed across time as opposed to concentrated at
once within a vacuum process. I am fairly certian it would reduce
disk bandwidth requirements since at least one necessary page will
already be in memory.

There's no way to check for "external references", because said
references are actually the running transactions. So you can't drop a
row until all the transactions that were active during your transaction
are finished. Certainly your own backend can't do the checking, because
there's no way to even assert that it will be live when those other
transactions finish. Who will? The last of said transactions? It
certainly will be expensive for a backend to keep track of the
deleted/updated tuples by all _other_ backends, just in case...
Long running transactions can't be neglected, so you can't keep it in
memory.

This isn't so. It's regular backends that turn on the flag in a tuples
header telling that it's committed deleted. And they do so during a
normal scan, so even during a SELECT. It's not the backend who actually
deleted a row that knows, it's a later one checking the tuple visibility
that can tell "this one's dead for all of us and will not come alive via
rollback any more".

But, that doesn't help for the index removal explained earlier.
Jan

--
#================================================= =====================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me. #
#================================================= = Ja******@Yahoo.com #
---------------------------(end of broadcast)---------------------------
TIP 9: the planner will ignore your desire to choose an index scan if your
joining column's datatypes do not match

Nov 11 '05 #34

Vivek Khera

>>>>> "BW" == Bruno Wolff, <Bruno> writes:

BW> It would probably be a lot slower. Any transaction that has started
BW> but not yet finished would need to lock all rows that exist at during
BW> the transaction (for serialized transaction isolation you would only

Why would you need to lock rows? Does the current vacuum need it? I
don't think it does. Why can't the functionality of vacuum be made to
operate incrementally per row delete/update? I don't know if it is
possible.

BW> Also, since at least 7.3, normal vacuums aren't normally going to
BW> affect the performance of your database server that much.

I disagree. Triggering a vacuum on a db that is nearly saturating the
disk bandwidth has a significant impact.

BW> The main issue against the current vacuum system is that it requires the
BW> DBA knowing what vacuum does and figuring out how it should be used in
BW> their situation to get reasonable performance. This makes it a bit harder
BW> for non-DBAs to jump right in to Postgres without running into problems.

BW> However, the work on autovacuum seems to be providing a reasonable solution
BW> to that problem.

Yes, this is a good thing.

---------------------------(end of broadcast)---------------------------
TIP 4: Don't 'kill -9' the postmaster

Nov 11 '05 #35

Bruno Wolff III

On Wed, Aug 20, 2003 at 12:40:03 -0400,
Vivek Khera <kh***@kcilink.com> wrote:

>> "BW" == Bruno Wolff, <Bruno> writes:

BW> It would probably be a lot slower. Any transaction that has started
BW> but not yet finished would need to lock all rows that exist at during
BW> the transaction (for serialized transaction isolation you would only

Why would you need to lock rows? Does the current vacuum need it? I
don't think it does. Why can't the functionality of vacuum be made to
operate incrementally per row delete/update? I don't know if it is
possible.

How do you plan on knowing when no one is using a row any more?
The way vacuum uses is not suitable for deleting the row after the
last transaction that can see the row completes. When rows are created
they are marked with the transaction id of the transaction that created
the row. When they are deleted they are marked with the transaction id
of the transaction that deleted the row. Any transaction with an id
between those two ids can see the row. So it isn't the transaction that
deletes a row that needs to worry about marking its space as available,
but the last transaction that a row is visible to that would have to
do it. Naively, this entails scanning the whole database, just like a vacuum
would, after each completed transaction.
BW> Also, since at least 7.3, normal vacuums aren't normally going to
BW> affect the performance of your database server that much.

I disagree. Triggering a vacuum on a db that is nearly saturating the
disk bandwidth has a significant impact.

If you are running that close to the edge you have potential problems
of which running vacuum is just one.

---------------------------(end of broadcast)---------------------------
TIP 6: Have you searched our list archives?

http://archives.postgresql.org

Nov 11 '05 #36

Vivek Khera

>>>>> "JW" == Jan Wieck <Ja******@Yahoo.com> writes:

JW> remove all the index entries pointing to these ctid's. Your idea is (so
JW> far) lacking a place where to remember all the single removed rows and I
JW> assume you're not planning to pay the cost of a full scan over all
JW> indexes of a table to reclaim the space of one data row, are you?

Well, that pretty much kills my idea... back to autovacuum ;-)
---------------------------(end of broadcast)---------------------------
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to ma*******@postgresql.org so that your
message can get through to the mailing list cleanly

Nov 11 '05 #37

Dennis Gearon

I don't think the problem will be deleting the tuple itself, if I've been following this correclty. If every transaction left it's signature in some common, global array, where each transaction finishing its work could find out 'who's out there', killing unnneed transactions could be part of the exit routine.

But getting the indexes cleaned up, that's another story, from what I've been listening to.

Bruno Wolff III wrote:

On Wed, Aug 20, 2003 at 12:40:03 -0400,
Vivek Khera <kh***@kcilink.com> wrote:
>>>"BW" == Bruno Wolff, <Bruno> writes:

BW> It would probably be a lot slower. Any transaction that has started
BW> but not yet finished would need to lock all rows that exist at during
BW> the transaction (for serialized transaction isolation you would only

Why would you need to lock rows? Does the current vacuum need it? I
don't think it does. Why can't the functionality of vacuum be made to
operate incrementally per row delete/update? I don't know if it is
possible.

How do you plan on knowing when no one is using a row any more?
The way vacuum uses is not suitable for deleting the row after the
last transaction that can see the row completes. When rows are created
they are marked with the transaction id of the transaction that created
the row. When they are deleted they are marked with the transaction id
of the transaction that deleted the row. Any transaction with an id
between those two ids can see the row. So it isn't the transaction that
deletes a row that needs to worry about marking its space as available,
but the last transaction that a row is visible to that would have to
do it. Naively, this entails scanning the whole database, just like a vacuum
would, after each completed transaction.

BW> Also, since at least 7.3, normal vacuums aren't normally going to
BW> affect the performance of your database server that much.

I disagree. Triggering a vacuum on a db that is nearly saturating the
disk bandwidth has a significant impact.

If you are running that close to the edge you have potential problems
of which running vacuum is just one.

---------------------------(end of broadcast)---------------------------
TIP 6: Have you searched our list archives?

http://archives.postgresql.org

---------------------------(end of broadcast)---------------------------
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to ma*******@postgresql.org)

Nov 11 '05 #38

Jan Wieck

Vivek Khera wrote:

>> "JW" == Jan Wieck <Ja******@Yahoo.com> writes:

JW> remove all the index entries pointing to these ctid's. Your idea is (so
JW> far) lacking a place where to remember all the single removed rows and I
JW> assume you're not planning to pay the cost of a full scan over all
JW> indexes of a table to reclaim the space of one data row, are you?

Well, that pretty much kills my idea... back to autovacuum ;-)

Sorry that.

While at the "reclaim" and [non-] overwriting topic, the other day I had
some brilliant idea about index entries, so here comes your chance for
revenge ...

Assume we would have multiple ctid slots per index entry. An UPDATE on a
row that doesn't change indexed fields allways inserts the same index
value with a new ctid, so it'll come across this index tuple anyway. If
said index tuple has an empty slot, just put the ctid there and done. If
it does not, check the ctid's that are there if they point to "known
dead and ready to vacuum" tuples and make space. If that fails too, well
then insert another index tuple.

This has some advantages. Updating rows without changing key fields does
not result in constant index growth. An index scan will be fast even for
a value with lots of dead heap tuples. For non-unique indexes, multiple
data rows can "share" one index entry.

Index cleanup for vacuum remains pretty easy. Clear the ctid slots
pointing to removed tuples. Remove index tuples that have all slots cleared.

Jan

--
#================================================= =====================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me. #
#================================================= = Ja******@Yahoo.com #
---------------------------(end of broadcast)---------------------------
TIP 4: Don't 'kill -9' the postmaster

Nov 11 '05 #39

Tom Lane

Vivek Khera <kh***@kcilink.com> writes:

"JW" == Jan Wieck <Ja******@Yahoo.com> writes:
JW> remove all the index entries pointing to these ctid's. Your idea is (so
JW> far) lacking a place where to remember all the single removed rows and I
JW> assume you're not planning to pay the cost of a full scan over all
JW> indexes of a table to reclaim the space of one data row, are you? Well, that pretty much kills my idea... back to autovacuum ;-)

In addition to the index-cleanup issue that Jan explained, there are
locking problems. The tuple-is-dead hint bit mechanism is very
carefully designed so that a backend can set the hint bits while holding
only a shared lock on the page containing the tuple. Physically
removing a tuple requires a far stronger lock (see the source code for
details). Thus, having ordinary interactive backends remove tuples
would have bad consequences for concurrent performance.

But I think the real point here is that there's no reason to think that
doing tuple deletion on-the-fly in foreground transactions is superior
to doing it in background with a vacuum process. You're taking what
should be noncritical maintenance work and moving it into the critical
paths of your foreground applications. Not only that, but you're
probably doing more total work per tuple --- VACUUM "batches" its work
in more ways than just the index cleanup aspect, IIRC.

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 8: explain analyze is your friend

Nov 11 '05 #40

Andrew Sullivan

On Wed, Aug 20, 2003 at 12:40:03PM -0400, Vivek Khera wrote:

>> "BW" == Bruno Wolff, <Bruno> writes:

BW> Also, since at least 7.3, normal vacuums aren't normally going to
BW> affect the performance of your database server that much.

I disagree. Triggering a vacuum on a db that is nearly saturating the
disk bandwidth has a significant impact.

Vivek is right about this. If your system is already very busy, then
a vacuum on a largish table is painful.

I don't actually think having the process done in real time will
help, though -- it seems to me what would be more useful is an even
lazier vacuum: something that could be told "clean up as cycles are
available, but make sure you stay out of the way." Of course, that's
easy to say glibly, and mighty hard to do, I expect.

A

--
----
Andrew Sullivan 204-4141 Yonge Street
Liberty RMS Toronto, Ontario Canada
<an****@libertyrms.info> M2P 2A8
+1 416 646 3304 x110
---------------------------(end of broadcast)---------------------------
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to ma*******@postgresql.org)

Nov 11 '05 #41

Jan Wieck

Andrew Sullivan wrote:

On Wed, Aug 20, 2003 at 12:40:03PM -0400, Vivek Khera wrote:
>>>>> "BW" == Bruno Wolff, <Bruno> writes:

BW> Also, since at least 7.3, normal vacuums aren't normally going to
BW> affect the performance of your database server that much.

I disagree. Triggering a vacuum on a db that is nearly saturating the
disk bandwidth has a significant impact.

Vivek is right about this. If your system is already very busy, then
a vacuum on a largish table is painful.

I don't actually think having the process done in real time will
help, though -- it seems to me what would be more useful is an even
lazier vacuum: something that could be told "clean up as cycles are
available, but make sure you stay out of the way." Of course, that's
easy to say glibly, and mighty hard to do, I expect.

What about a little hint to the buffer management that if it has to
evict another buffer to physically read this one (meaning the buffer
pool was full already) then it will not put this buffer at the top of
the LRU chain but rather at it's end? This way a vacuum on a large table
will not cause a complete cache eviction.

Might be a useful hint for sequential scans too.
Jan

--
#================================================= =====================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me. #
#================================================= = Ja******@Yahoo.com #
---------------------------(end of broadcast)---------------------------
TIP 6: Have you searched our list archives?

http://archives.postgresql.org

Nov 11 '05 #42

Karsten Hilbert

> Just nice'ing the VACUUM process is likely to be counterproductive

because of locking issues (priority inversion).

OK, getting out the brown paper bag |-(

Is there any concept of db engine idleness obtainable from
states of PG internal variables that might be leveraged ?

Karsten
--
GPG key ID E4071346 @ wwwkeys.pgp.net
E167 67FD A291 2BEA 73BD 4537 78B9 A9F9 E407 1346

---------------------------(end of broadcast)---------------------------
TIP 9: the planner will ignore your desire to choose an index scan if your
joining column's datatypes do not match

Nov 11 '05 #43

Tom Lane

Jan Wieck <Ja******@Yahoo.com> writes:

What about a little hint to the buffer management that if it has to
evict another buffer to physically read this one (meaning the buffer
pool was full already) then it will not put this buffer at the top of
the LRU chain but rather at it's end? This way a vacuum on a large table
will not cause a complete cache eviction.

I think what we really need is a way to schedule VACUUM's I/O at a lower
priority than normal I/Os. Wouldn't be very portable :-( ... but if the
OS offers a facility for requesting this, it'd be worth experimenting
with.

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to ma*******@postgresql.org so that your
message can get through to the mailing list cleanly

Nov 11 '05 #44

Alvaro Herrera

On Wed, Aug 20, 2003 at 05:21:35PM -0400, Jan Wieck wrote:

What about a little hint to the buffer management that if it has to
evict another buffer to physically read this one (meaning the buffer
pool was full already) then it will not put this buffer at the top of
the LRU chain but rather at it's end? This way a vacuum on a large table
will not cause a complete cache eviction.

Might be a useful hint for sequential scans too.

Somebody was playing with using LRU-2 or ARC or some other algorithm for
page replacement instead of the current LRU... wouldn't it help with
this too?

--
Alvaro Herrera (<alvherre[a]dcc.uchile.cl>)
"La vida es para el que se aventura"

---------------------------(end of broadcast)---------------------------
TIP 6: Have you searched our list archives?

http://archives.postgresql.org

Nov 11 '05 #45

Martijn van Oosterhout

On Wed, Aug 20, 2003 at 05:41:18PM -0400, Tom Lane wrote:

Andrew Sullivan <an****@libertyrms.info> writes:
I don't actually think having the process done in real time will
help, though -- it seems to me what would be more useful is an even
lazier vacuum: something that could be told "clean up as cycles are
available, but make sure you stay out of the way." Of course, that's
easy to say glibly, and mighty hard to do, I expect.
I'd love to be able to do that, but I can't think of a good way.

Just nice'ing the VACUUM process is likely to be counterproductive
because of locking issues (priority inversion). Though if anyone cares
to try it on a heavily-loaded system, I'd be interested to hear the
results...

How about the really simple solution: explicit yields. Along the lines of:

for each page
vacuum page
sleep 5ms

As long as you sleep without holding any locks, this means it would (very
slowly) interate over the table. If the auto-vacuum can target tables it
would be even more effective. A slightly more sophisticated version would
be:

VACUUM MAX 10MB/s (or some such syntax)

Then you just keep a count of the number of processed pages and the amount
of time and if you're getting ahead of yourself, sleep a while.

Given lazy vacuum doesn't hold locks for long periods, it could be an idea
to continuously spend 1% of your disk bandwidth on a background vacuum. As
for vacuum full, I don't know if you could do the same thing.

--
Martijn van Oosterhout <kl*****@svana.org> http://svana.org/kleptog/ "All that is needed for the forces of evil to triumph is for enough good
men to do nothing." - Edmond Burke
"The penalty good people pay for not being interested in politics is to be
governed by people worse than themselves." - Plato

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.0.6 (GNU/Linux)
Comment: For info see http://www.gnupg.org

iD8DBQE/RFtNY5Twig3Ge+YRAklHAKDB+zB/FNDsARg2DBySaN4VM0Y5PwCgvzKc
Q9qwEvlTjJkOGQTejP3SSOk=
=fXSP
-----END PGP SIGNATURE-----

Nov 11 '05 #46

Shridhar Daithankar

On 19 Aug 2003 at 15:26, Bo Lorentsen wrote:

On Tue, 2003-08-19 at 14:31, Shridhar Daithankar wrote:
Well, you could look at release notes. That contains lot of information. Of
course archives of pgsql-bugs and pgsql-patches are the ultimate unless you
plan to delve into CVS history and sources.. Ok, I just liked to find something like bugzilla, or an explanation to
how bugs are garantied to be visible. My boos like to compare this to
the Mysql model found on : http://bugs.mysql.com/bugstats
Check this.. http://www.mysql.com/doc/en/ANSI_diff_Transactions.html

Hmm, it sound like they have transactions on OmniDB tables, but these
are really slow, and therefor they put much energy into advetising for
the MyISAM files (non transactional). Or, am I missing something.

Pretty much true but hard to prove by evidence. I would say drop that line f
argument. Mysql has transactions. Period.

OK, if you talk about transactions across two table types, innodb not being
default, might be not being free, then it is valid.

First rule of argument. Always talk facts..

To me, it seems that their definition of transaction is limited to preventing
two guys writing to same row simaltenously, which is of course a limited view
of things.

This sounds like there MyISAM tables, or ???

I haven't used mysql to be that expert.. sorry..

Few major differences I can see right here. Correct me on mysql side.

- WAL log
- Transactabl DDLs Yes and lets add :
- Views
- subselects
- plperl, plsql, plpython, plXXX

Extensible operator and type definition
Table files splitting on 1GB boundary
Rules
Inheritance
True foreign keys
Data integrity ( You should watch some mysql excertps produced on advocacy)

Here is one for your reference...
------------- * PROPER USAGE OF NULL

mysql> select * from ai_test where id is null;
+----+-------+
| id | txt |
+----+-------+
| 1 | hello |
+----+-------+
1 row in set (0.00 sec)

;-). I digress. Off the top of my head, in no particular order:
You're not trying hard enough:

mysql> create table test3 (a date);
Query OK, 0 rows affected (0.00 sec)

mysql> insert into test3 values (-1);
Query OK, 1 row affected (0.01 sec)

mysql> insert into test3 values ('1996-02-31');
Query OK, 1 row affected (0.00 sec)

mysql> insert into test3 values ('1996-67-31');
Query OK, 1 row affected (0.00 sec)

mysql> select * from test3;
+------------+
| a |
+------------+
| 0000-00-00 |
| 1996-02-31 |
| 0000-00-00 |
+------------+
3 rows in set (0.00 sec)
-------------

I wouldn't bet my shoe on such database..

- Nested transactions coming soon to PG
- PITR coming soon to PG

Not good for argumenting with my boss about future :-)

May be right. But a decision maker needs to know roadmap as well. As I said in
another mail, there exists real world code for all of this and it is going to
happen. It's not a vapourware. If you can press it, I think talking about
future is a good idea..

I would love to see entire checklist but don't have any time to devote to
mysql.

I do understand, and its no pleasure either :-)

Just to be clear, my unenthusiaism to study mysql has nothing to do with my
time shortage. If I need to study mysql to understand it better and project
postgresql, I would happily do that. But I seriously don't have time..

Bye
Shridhar

--
Feel free to contact me (flames about my english and the useless of thisdriver
will be redirected to /dev/null, oh no, it's full...).(Michael Beck, describing
the PC-speaker sound device)
---------------------------(end of broadcast)---------------------------
TIP 9: the planner will ignore your desire to choose an index scan if your
joining column's datatypes do not match

Nov 11 '05 #47

Andrew Sullivan

On Thu, Aug 21, 2003 at 03:40:29PM +1000, Martijn van Oosterhout wrote:

Given lazy vacuum doesn't hold locks for long periods, it could be
an idea to continuously spend 1% of your disk bandwidth on a
background vacuum. As for vacuum full, I don't know if you could do
the same thing.

Assuming that one can keep up with the dust bunnies this way, though,
one wouldn't need to do vacuum full. This would definitely be a way
cool feature, if implementable.

A

--
----
Andrew Sullivan 204-4141 Yonge Street
Liberty RMS Toronto, Ontario Canada
<an****@libertyrms.info> M2P 2A8
+1 416 646 3304 x110
---------------------------(end of broadcast)---------------------------
TIP 7: don't forget to increase your free space map settings

Nov 11 '05 #48

Andrew Sullivan

On Wed, Aug 20, 2003 at 11:41:41PM +0200, Karsten Hilbert wrote:

You mean, like, "nice 19" or so ?

ISTR someone reporting problems with locking on the performance list
from doing exactly that. The problem is that the vacuum back end
might take a lock and then not get any processor time -- in which
case everybody else gets their processor slice but can't do anything,
because they have to wait until the niced vacuum process gets back in
line.

A

--
----
Andrew Sullivan 204-4141 Yonge Street
Liberty RMS Toronto, Ontario Canada
<an****@libertyrms.info> M2P 2A8
+1 416 646 3304 x110
---------------------------(end of broadcast)---------------------------
TIP 8: explain analyze is your friend

Nov 11 '05 #49

Shridhar Daithankar

On 21 Aug 2003 at 10:59, Andrew Sullivan wrote:

On Wed, Aug 20, 2003 at 05:58:32PM -0400, Tom Lane wrote:
Jan Wieck <Ja******@Yahoo.com> writes:
the LRU chain but rather at it's end? This way a vacuum on a large table
will not cause a complete cache eviction.

I think what we really need is a way to schedule VACUUM's I/O at a lower
priority than normal I/Os. Wouldn't be very portable :-( ... but if the

Hey, they both sounds like nifty ideas to me! The portability sure
worries me, though, on that I/O trick. Still, Oracle (f'rinstance)
made all kinds of optimisations for Sun (and conversely) partly
because, I expect, that's where a lot of their users were, and the
performance or reliability gains were significant. Whether that is
worth doing for PostgreSQL, when there are probably lots of other
targets to aim at, is an open question.

Well, if you guys remember my posts on performance recently, the said project
will probably drift to mysql as performance requirement on solaris platform
seems pretty steep to postgresql.

Personally I think inserting 5M rows in 11 column table should not take more
than an hour. ( That's the performance criteria). But apparently postgresql is
not making it no matter what..

Just an FYI, I think we need to do something for solaris. If a hourse does not
drink despite being taken to water, throw him in water.. After it's the
database users who are stuck. Not Sun..

Bye
Shridhar

--
Fun Facts, #14: In table tennis, whoever gets 21 points first wins. That's how
it once was in baseball -- whoever got 21 runs first won.
---------------------------(end of broadcast)---------------------------
TIP 6: Have you searched our list archives?

http://archives.postgresql.org

Nov 11 '05 #50