Buglist

Hi ...

I'm trying to convince my boss to use PostgreSQL (I need RI, transactions
and views), but he keeps comparing the project to MySQL. Until now, I
found the answers to his questions on the www.postgresql.org page, but
now I'm lost :-)

Where do I find a list of bugs, both found and solved, or will I need to
ask on the pgsql-bugs list to get the answer?

Also, has anyone tried to compare the new transaction model in MySQL 4.x
to PostgreSQL?

I'm looking forward to receiving even more constructive arguments :-)

/BL

Nov 11 '05
On Wed, Aug 20, 2003 at 10:31:25 -0400,
Vivek Khera <kh***@kcilink.com> wrote:

I care for the performance. And how are you so sure that it was
faster the way it is now? Are you sure it was not done this way
because of ease of implementation?

Seriously, how much slower can it be if the backend were to do the
checking for external references upon updating/deleting a row? The
cost would be distributed across time as opposed to concentrated at
once within a vacuum process. I am fairly certain it would reduce
disk bandwidth requirements since at least one necessary page will
already be in memory.


It would probably be a lot slower. Any transaction that has started
but not yet finished would need to lock all rows that exist during
the transaction (for serialized transaction isolation you would only
need to worry about rows that existed at the start of the transaction
or that were modified by the transaction). Immediately that is a big
deal since a typical query may need to lock a bunch of rows that it
will never actually touch (but you don't know that when the transaction
starts). Managing all those locks would take up a lot of semiconductor
memory or a lot of disk writes and be a major source of contention.
The current system just has to mark rows when they are created and
deleted (an update does both operations). The normal vacuum cleanup
actually isn't going to be much worse than what you would need to do
at both the start and end of each transaction. The overhead of letting
dead rows hang around for a while after they aren't needed isn't that high.
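
To make the create/delete marking concrete, here is a tiny sketch (the table
is invented for illustration; ctid, xmin and xmax are the PostgreSQL system
columns that expose this bookkeeping):

create table foo (id int primary key, bar text);
insert into foo values (1, 'old');

select ctid, xmin, xmax, * from foo;   -- note the tuple's physical location (ctid)

update foo set bar = 'new' where id = 1;

select ctid, xmin, xmax, * from foo;   -- ctid and xmin have changed: the update wrote a
                                        -- new row version, and the old one lingers as a
                                        -- dead tuple until a vacuum reclaims it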

Also, since at least 7.3, normal vacuums aren't normally going to affect the
performance of your database server that much.

The main issue with the current vacuum system is that it requires the
DBA to know what vacuum does and to figure out how it should be used in
their situation to get reasonable performance. This makes it a bit harder
for non-DBAs to jump right into Postgres without running into problems.

However, the work on autovacuum seems to be providing a reasonable solution
to that problem.
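
For what it's worth, newer PostgreSQL releases also expose the dead-row
counts directly in the statistics views, so a rough sketch of checking and
then cleaning up by hand might look like this (the table name is again just
an example):

select relname, n_live_tup, n_dead_tup, last_vacuum, last_autovacuum
from pg_stat_user_tables
where relname = 'foo';

vacuum analyze foo;   -- plain (lazy) vacuum: marks dead row versions reusable
                      -- without blocking normal reads and writes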


Nov 11 '05 #31
On Wed, 20 Aug 2003, Vivek Khera wrote:
>> "BW" == Bruno Wolff, <Bruno> writes: to see it incremental. This would result in pretty much near zero
internal fragmentation, I think.


BW> Why do you care about the details of the implementation (rather than
BW> the performance)? If it were faster to do it that way, that's how it would
BW> have been done in the first place. The cost of doing the above is almost
BW> certainly going to be an overall performance loser.

I care for the performance. And how are you so sure that it was
faster the way it is now? Are you sure it was not done this way
because of ease of implementation?

Seriously, how much slower can it be if the backend were to do the
checking for external references upon updating/deleting a row? The
cost would be distributed across time as opposed to concentrated at
once within a vacuum process. I am fairly certain it would reduce
disk bandwidth requirements since at least one necessary page will
already be in memory.


Time for a mental exercise.

Our server has two users. Each backend has to check with all the other
backends when it deletes a tuple (every update is also a delete; remember,
every change in an MVCC database is a create / delete cycle). Let's give
names to the time it takes to do the update / mark-deleted work versus the
time it takes to contact each of the other backends. Tw is the time to
do the work here, and Tc is the time to do the cleanup (i.e. vacuum the
tuple). Note that we'd also need a Ta for answering the requests of all
the other backends, but we can assume that on average, for each request a
child process makes, it will receive exactly that many from each other
backend running. Let x represent the number of backends, so the answer
time is equal to x * Tc:

Time = Tw + Tc + Ta

Time = Tw + Tc + (x * Tc)

Time = Tw + ((x+1) * Tc)

and our cleanup time starts to grow at an ugly rate as the number of
backends increases; the made-up numbers below give a feel for it. Lazy
vacuuming, on the other hand, allows the database to reclaim lost space
in the background, which is what the newer non-full vacuum does.
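
Say, purely for illustration, that Tw = 1 ms and Tc = 0.1 ms. With the two
backends above, Time = 1 + (3 * 0.1) = 1.3 ms; with a hundred backends,
Time = 1 + (101 * 0.1) = 11.1 ms. The cleanup term has gone from a fraction
of the real work to roughly ten times it.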

Many folks mistake this vacuum process for its older, slower cousin, full
vacuum, which does eat a lot more disk bandwidth and slow the machine
down.

On a Dual CPU X86 box a lazy vacuum running in a continuous loop will eat
about 5% of one CPU and drop pgbench scores by 10 to 15%. The important
thing here is that the machine will still run quite snappily when you
throw several hundred clients at it, since the lazy vacuum just sits in
the background using the spare cycles and not much more.

That means your storage usage may balloon somewhat under intense usage, but
you won't have an IPC storm killing the performance of the PostgreSQL server.

Knowing the PostgreSQL development team, I'm sure the reasons they chose
this design are clearly stated in the hackers mailing list somewhere, so I'm
going to go look. But trust me on one thing: the people programming this
database rarely do something just because it's easier / faster to implement,
without putting an entry in the TODO list about making it better some day.

Nov 11 '05 #32
On Wed, 20 Aug 2003, Vivek Khera wrote:
>> "BW" == Bruno Wolff, <Bruno> writes: to see it incremental. This would result in pretty much near zero
internal fragmentation, I think.


BW> Why do you care about the details of the implementation (rather than
BW> the performance)? If it were faster to do it that way, that's how it would
BW> have been done in the first place. The cost of doing the above is almost
BW> certainly going to be an overall performance loser.

I care for the performance. And how are you so sure that it was
faster the way it is now? Are you sure it was not done this way
because of ease of implementation?

Seriously, how much slower can it be if the backend were to do the
checking for external references upon updating/deleting a row? The
cost would be distributed across time as opposed to concentrated at
once within a vacuum process. I am fairly certain it would reduce
disk bandwidth requirements since at least one necessary page will
already be in memory.


In general, excepting a case with only one session running, I would expect
there to exist one or more transactions that can see the old state of the
row.

I would think it'd be something like:

T1: begin;
T1: update foo set bar=cheese where id=3;
T2: begin;
T2: select * from foo;
-- this should see the old state of the row, so that update better
-- not have removed the old one.
T1: end;
-- at this point read committed transactions may see the new
-- state, unless it's currently running a query which is accessing
-- that table, and in practice it's probably more complicated than
-- that (since you might run functions that access the table).
-- You'd presumably need to wait until say the end of that full
-- statement at the very least. And any serializable transactions
-- would probably mean you can't do it either.

Under some circumstances it might be better, yes. And there are ways to
do things that don't involve non-overwriting (like Oracle's system) but
they all have downsides. This might not be the best solution, but you're
going to have to put a lot more work into showing that your solution is
better. ;)


Nov 11 '05 #33
Alvaro Herrera wrote:
On Wed, Aug 20, 2003 at 10:31:25AM -0400, Vivek Khera wrote:
Seriously, how much slower can it be if the backend were to do the
checking for external references upon updating/deleting a row? The
cost would be distributed across time as opposed to concentrated at
once within a vacuum process. I am fairly certain it would reduce
disk bandwidth requirements since at least one necessary page will
already be in memory.


There's no way to check for "external references", because said
references are actually the running transactions. So you can't drop a
row until all the transactions that were active during your transaction
are finished. Certainly your own backend can't do the checking, because
there's no way to even assert that it will be live when those other
transactions finish. Who will? The last of said transactions? It
certainly will be expensive for a backend to keep track of the
deleted/updated tuples by all _other_ backends, just in case...
Long running transactions can't be neglected, so you can't keep it in
memory.


This isn't so. It's regular backends that turn on the flag in a tuple's
header telling that it's committed deleted. And they do so during a
normal scan, so even during a SELECT. It's not the backend that actually
deleted the row that knows; it's a later one, checking the tuple's
visibility, that can tell "this one's dead for all of us and will not come
alive via rollback any more".

But, that doesn't help for the index removal explained earlier.
Jan


Nov 11 '05 #34
>>>>> "BW" == Bruno Wolff, <Bruno> writes:

BW> It would probably be a lot slower. Any transaction that has started
BW> but not yet finished would need to lock all rows that exist during
BW> the transaction (for serialized transaction isolation you would only

Why would you need to lock rows? Does the current vacuum need it? I
don't think it does. Why can't the functionality of vacuum be made to
operate incrementally per row delete/update? I don't know if it is
possible.

BW> Also, since at least 7.3, normal vacuums aren't normally going to
BW> affect the performance of your database server that much.

I disagree. Triggering a vacuum on a db that is nearly saturating the
disk bandwidth has a significant impact.

BW> The main issue against the current vacuum system is that it requires the
BW> DBA knowing what vacuum does and figuring out how it should be used in
BW> their situation to get reasonable performance. This makes it a bit harder
BW> for non-DBAs to jump right in to Postgres without running into problems.

BW> However, the work on autovacuum seems to be providing a reasonable solution
BW> to that problem.

Yes, this is a good thing.


Nov 11 '05 #35
On Wed, Aug 20, 2003 at 12:40:03 -0400,
Vivek Khera <kh***@kcilink.com> wrote:
>> "BW" == Bruno Wolff, <Bruno> writes:

BW> It would probably be a lot slower. Any transaction that has started
BW> but not yet finished would need to lock all rows that exist during
BW> the transaction (for serialized transaction isolation you would only

Why would you need to lock rows? Does the current vacuum need it? I
don't think it does. Why can't the functionality of vacuum be made to
operate incrementally per row delete/update? I don't know if it is
possible.


How do you plan on knowing when no one is using a row any more?
The way vacuum works is not suitable for deleting the row as soon as the
last transaction that can see it completes. When rows are created
they are marked with the transaction id of the transaction that created
the row. When they are deleted they are marked with the transaction id
of the transaction that deleted the row. Any transaction with an id
between those two ids can see the row. So it isn't the transaction that
deletes a row that needs to worry about marking its space as available,
but the last transaction that a row is visible to that would have to
do it. Naively, this entails scanning the whole database, just like a vacuum
would, after each completed transaction.
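
Borrowing the T1/T2 notation and the foo table from earlier in the thread,
you can actually watch those transaction id markings through the xmin and
xmax system columns (just a sketch):

T1: begin;
T1: delete from foo where id = 2;
-- T1 has not committed yet

T2: select xmin, xmax from foo where id = 2;
-- T2 still sees the row; xmin holds the id of the transaction that
-- created it, and xmax now holds T1's id, marking the pending delete

T1: commit;
-- once no running transaction can see the row any more, only a vacuum
-- will actually reclaim its space
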
BW> Also, since at least 7.3, normal vacuums aren't normally going to
BW> affect the performance of your database server that much.

I disagree. Triggering a vacuum on a db that is nearly saturating the
disk bandwidth has a significant impact.


If you are running that close to the edge you have potential problems
of which running vacuum is just one.
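
As an aside, newer PostgreSQL releases added a cost-based delay that
throttles vacuum's disk I/O instead of letting it run flat out; a rough
sketch, with a purely illustrative value and the usual made-up table name:

show vacuum_cost_delay;        -- 0 means vacuum runs at full speed
set vacuum_cost_delay = 20;    -- sleep ~20 ms whenever vacuum uses up its cost budget
vacuum foo;                    -- this vacuum now trickles its I/O instead of bursting it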


Nov 11 '05 #36
>>>>> "JW" == Jan Wieck <Ja******@Yahoo .com> writes:

JW> remove all the index entries pointing to these ctid's. Your idea is (so
JW> far) lacking a place to remember all the single removed rows, and I
JW> assume you're not planning to pay the cost of a full scan over all
JW> indexes of a table to reclaim the space of one data row, are you?

Well, that pretty much kills my idea... back to autovacuum ;-)

Nov 11 '05 #37
I don't think the problem will be deleting the tuple itself, if I've been
following this correctly. If every transaction left its signature in some
common, global array, where each transaction finishing its work could find
out 'who's out there', killing unneeded tuples could be part of the exit
routine.

But getting the indexes cleaned up, that's another story, from what I've been listening to.

Bruno Wolff III wrote:
On Wed, Aug 20, 2003 at 12:40:03 -0400,
Vivek Khera <kh***@kcilink.com> wrote:
>> "BW" == Bruno Wolff, <Bruno> writes:


BW> It would probably be a lot slower. Any transaction that has started
BW> but not yet finished would need to lock all rows that exist during
BW> the transaction (for serialized transaction isolation you would only

Why would you need to lock rows? Does the current vacuum need it? I
don't think it does. Why can't the functionality of vacuum be made to
operate incrementally per row delete/update? I don't know if it is
possible.

How do you plan on knowing when no one is using a row any more?
The way vacuum uses is not suitable for deleting the row after the
last transaction that can see the row completes. When rows are created
they are marked with the transaction id of the transaction that created
the row. When they are deleted they are marked with the transaction id
of the transaction that deleted the row. Any transaction with an id
between those two ids can see the row. So it isn't the transaction that
deletes a row that needs to worry about marking its space as available,
but the last transaction that a row is visible to that would have to
do it. Naively, this entails scanning the whole database, just like a vacuum
would, after each completed transaction.

BW> Also, since at least 7.3, normal vacuums aren't normally going to
BW> affect the performance of your database server that much.

I disagree. Triggering a vacuum on a db that is nearly saturating the
disk bandwidth has a significant impact.

If you are running that close to the edge you have potential problems
of which running vacuum is just one.


Nov 11 '05 #38
Vivek Khera wrote:
>> "JW" == Jan Wieck <Ja******@Yahoo .com> writes:


JW> remove all the index entries pointing to these ctid's. Your idea is (so
JW> far) lacking a place to remember all the single removed rows, and I
JW> assume you're not planning to pay the cost of a full scan over all
JW> indexes of a table to reclaim the space of one data row, are you?

Well, that pretty much kills my idea... back to autovacuum ;-)


Sorry about that.

While we are on the "reclaim" and [non-]overwriting topic, the other day I
had some brilliant idea about index entries, so here comes your chance for
revenge ...

Assume we would have multiple ctid slots per index entry. An UPDATE on a
row that doesn't change indexed fields always inserts the same index
value with a new ctid, so it'll come across this index tuple anyway. If
said index tuple has an empty slot, just put the ctid there and be done. If
it does not, check whether the ctids that are there point to "known
dead and ready to vacuum" tuples and make space. If that fails too, well,
then insert another index tuple.

This has some advantages. Updating rows without changing key fields does
not result in constant index growth. An index scan will be fast even for
a value with lots of dead heap tuples. For non-unique indexes, multiple
data rows can "share" one index entry.

Index cleanup for vacuum remains pretty easy. Clear the ctid slots
pointing to removed tuples. Remove index tuples that have all slots cleared.

Jan


Nov 11 '05 #39
Vivek Khera <kh***@kcilink.com> writes:
"JW" == Jan Wieck <Ja******@Yahoo.com> writes:
JW> remove all the index entries pointing to these ctid's. Your idea is (so
JW> far) lacking a place to remember all the single removed rows and I
JW> assume you're not planning to pay the cost of a full scan over all
JW> indexes of a table to reclaim the space of one data row, are you?

Well, that pretty much kills my idea... back to autovacuum ;-)


In addition to the index-cleanup issue that Jan explained, there are
locking problems. The tuple-is-dead hint bit mechanism is very
carefully designed so that a backend can set the hint bits while holding
only a shared lock on the page containing the tuple. Physically
removing a tuple requires a far stronger lock (see the source code for
details). Thus, having ordinary interactive backends remove tuples
would have bad consequences for concurrent performance.

But I think the real point here is that there's no reason to think that
doing tuple deletion on-the-fly in foreground transactions is superior
to doing it in background with a vacuum process. You're taking what
should be noncritical maintenance work and moving it into the critical
paths of your foreground applications. Not only that, but you're
probably doing more total work per tuple --- VACUUM "batches" its work
in more ways than just the index cleanup aspect, IIRC.

regards, tom lane


Nov 11 '05 #40
