
Replacing records

Hi,
MySQL has a nice feature, REPLACE, which can be called instead of INSERT:
it inserts the record if it is not present and replaces it if it is.

Does PostgreSQL have a similar feature?

Currently I run a SELECT prior to every INSERT, and then UPDATE or INSERT
depending on the result of the SELECT. The problem is that this doubles
the number of queries. Doing that on a table with 5 million rows, for
100k inserts, takes time, and I would like an efficient way of doing it.
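
[For concreteness, the two-query pattern described above, against a
hypothetical table foo(pk, value) invented for illustration - two round
trips per record:]

    SELECT 1 FROM foo WHERE pk = 1;
    -- if a row came back:
    UPDATE foo SET value = 'new' WHERE pk = 1;
    -- otherwise:
    INSERT INTO foo (pk, value) VALUES (1, 'new');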

Any ideas?
Thanks
Alex


Nov 11 '05 #1


On Thu, 2003-09-04 at 03:00, Alex wrote:
> Hi,
> MySQL has a nice feature, REPLACE, which can be called instead of
> INSERT: it inserts the record if it is not present and replaces it if
> it is.
>
> Does PostgreSQL have a similar feature?
>
> Currently I run a SELECT prior to every INSERT, and then UPDATE or
> INSERT depending on the result of the SELECT. The problem is that this
> doubles the number of queries. Doing that on a table with 5 million
> rows, for 100k inserts, takes time, and I would like an efficient way
> of doing it.


What if you try to do the INSERT, and if it returns with a "key exists"
error, do the UPDATE?

Will the SELECT really slow things down that much, since the record
will be in the buffers after you touch it the first time?

--
Ron Johnson, Jr.
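
[A sketch of Ron's first suggestion as it can be written in later
PostgreSQL versions - PL/pgSQL only gained exception handling in 8.0,
after this thread took place, so none of this was available to the
posters; table foo(pk, value) is hypothetical:]

    CREATE FUNCTION upsert_foo(k integer, v text) RETURNS void AS $$
    BEGIN
        INSERT INTO foo (pk, value) VALUES (k, v);
    EXCEPTION WHEN unique_violation THEN
        -- the row already exists; update it instead (if another session
        -- deletes it at this exact moment, the UPDATE matches no rows)
        UPDATE foo SET value = v WHERE pk = k;
    END;
    $$ LANGUAGE plpgsql;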

Nov 11 '05 #2

[philosophical post regarding a missing feature of Postgres]

Hi all,

This is exactly the problem I'm facing right now, and I found there's no
good solution to this in Postgres.
Basically I have a complex database operation which spans a transaction
across multiple simple DB operations that can also be executed
atomically. The separate pieces must also work separately.

Now one of the pieces updates a table if the row with the given key
exists, and inserts if not. There is a unique constraint on the key. I
found there's no way to avoid failed inserts due to unique constraint
violations, which cause an automatic roll-back of the running
transaction.

Contention on insert has a quite high probability for this operation in
our application.
It's unacceptable to roll back and retry the whole transaction just
because this insert failed, partly because of performance (a lot of work
is done beforehand, and there are lots of threads/clustered machines
doing inserts at the same time, so constantly retrying would painfully
slow things down), and partly because it would make our code a lot more
complex than it already is.
Locking is also a bad option: this is about inserts, so there is nothing
useful to lock short of locking the whole table. In the end I'm using
this approach, because performance-wise it's about the same as retrying
the transaction (in this particular case at least), but I'm completely
unhappy about it.

This problem would be easily solved if the current transaction were not
automatically rolled back on the failed insert. Given that, it would be
as easy as trying the insert and, if it fails, doing the update.

I know this feature is not an easy one, but I would like to point out
that it's really useful, and that it's one of the barriers to porting
complex applications to Postgres, given that other databases have it
readily available.

Cheers,
Csaba.

Nov 11 '05 #3

On 4 Sep 2003 at 12:17, Csaba Nagy wrote:
> This problem would be easily solved if the current transaction were not
> automatically rolled back on the failed insert. Given that, it would be
> as easy as trying the insert and, if it fails, doing the update.


That violates the definition of a transaction. You would need nested
transactions, which aren't there yet.

You could use a sequence for the insert: if the next value of the
sequence is greater than the value you have, somebody has probably
already inserted the row, so modify it instead.

Or do a SELECT ... FOR UPDATE. If it returns no row, there is no record,
so insert; otherwise update.

It might still fail, but the chances are much lower.

HTH

Bye
Shridhar
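
[A sketch of the SELECT ... FOR UPDATE variant, against a hypothetical
table foo(pk, value). Note that FOR UPDATE only locks rows that already
exist; if no row matches, two concurrent sessions can both see nothing
and both try the INSERT - which is why it "might still fail":]

    BEGIN;
    SELECT 1 FROM foo WHERE pk = 1 FOR UPDATE;
    -- if a row came back, it is now locked against concurrent
    -- updates and deletes, so this is safe:
    UPDATE foo SET value = 'new' WHERE pk = 1;
    -- if no row came back, insert; this can still hit a duplicate-key
    -- error if another session inserts pk = 1 first:
    INSERT INTO foo (pk, value) VALUES (1, 'new');
    COMMIT;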


Nov 11 '05 #4

Csaba Nagy <na**@ecircle-ag.com> writes:
> This problem would be easily solved if the current transaction were not
> automatically rolled back on the failed insert. Given that, it would be
> as easy as trying the insert and, if it fails, doing the update.


Yeah, that would be nested transactions, it's on the TODO list :)

Fwiw, even if you took that approach you would still need to handle retrying
if the record was deleted between the attempted insert and the attempted
update. Unless you know nothing is deleting these records.

Is there any possibility of moving this insert outside the transaction into a
transaction of its own? If the rest of the transaction commits but the
insert/update hasn't been committed yet is the database in an invalid state?
If not you could try postponing the insert/update until after the main
transaction commits and then performing it in its own transaction.

A more complex, also flawed, approach would be to do the insert/update in a
separate connection. This would mean it would commit first before the rest of
the transaction was committed.
Out of curiosity, what does the MySQL syntax look like? How would you
handle something where the insert and update are quite different, like:

INSERT INTO foo (pk, value, count, date_ins) VALUES (?, ?, 1, now())
OR UPDATE foo SET value = ?, count = count + 1, date_upd = now() WHERE pk = ?

--
greg

Nov 11 '05 #5

On Thu, 2003-09-04 at 15:52, Greg Stark wrote:
> Csaba Nagy <na**@ecircle-ag.com> writes:
> > This problem would be easily solved if the current transaction were
> > not automatically rolled back on the failed insert. Given that, it
> > would be as easy as trying the insert and, if it fails, doing the
> > update.
>
> Yeah, that would be nested transactions, it's on the TODO list :)

Very good :) The sooner implemented the better ;)

> Fwiw, even if you took that approach you would still need to handle
> retrying if the record was deleted between the attempted insert and the
> attempted update. Unless you know nothing is deleting these records.

In this case there's a burst of inserts/updates and no deletion for
sure. In any case it would be acceptable if the transaction sometimes
failed, but only if that happens with a very low probability.
These records are deleted only a considerable time after all updating is
finished.

> Is there any possibility of moving this insert outside the transaction
> into a transaction of its own? If the rest of the transaction commits
> but the insert/update hasn't been committed yet, is the database in an
> invalid state? If not, you could try postponing the insert/update until
> after the main transaction commits and then performing it in its own
> transaction.
>
> A more complex, also flawed, approach would be to do the insert/update
> in a separate connection. This would mean it would commit first, before
> the rest of the transaction was committed.

Any of these is out of the question: either all should succeed, or
nothing. The problem here is that "success" from a logical point of view
can happen even when some individual queries fail. This is where nested
transactions would come in handy, or the possibility of by default
continuing the in-process transaction instead of failing it (as Oracle
does).

> Out of curiosity, what does the MySQL syntax look like? How would you
> handle something where the insert and update are quite different, like:
>
> INSERT INTO foo (pk, value, count, date_ins) VALUES (?, ?, 1, now())
> OR UPDATE foo SET value = ?, count = count + 1, date_upd = now()
> WHERE pk = ?

No idea, I'm not using MySQL. Just the problem was the same.

Cheers,
Csaba.


Nov 11 '05 #6

P: n/a
In article <87************@stark.dyndns.tv>,
Greg Stark <gs*****@mit.edu> writes:
Out of curiosity, what does the mysql syntax look like? How would you handle
something where the insert and update were quite different like: INSERT INFO foo (pk,value,count,date_ins) values (?,?,1,now())
OR UPDATE foo set value=?, count=count+1, date_upd=now() where pk = ?


You can't. The only thing MySQL has to offer is

REPLACE INTO tbl_name [(col_name,...)] VALUES (expr,...)

`REPLACE' works exactly like `INSERT', except that if an old record in
the table has the same value as a new record on a `UNIQUE' index or
`PRIMARY KEY', the old record is deleted before the new record is
inserted.
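
[A minimal illustration of the semantics described above, against a
hypothetical MySQL table foo(pk, value, other) - the delete-then-insert
behaviour means columns not listed are reset to their defaults rather
than preserved:]

    -- MySQL, not PostgreSQL:
    REPLACE INTO foo (pk, value) VALUES (1, 'x');
    -- if a row with pk = 1 existed, it was deleted first;
    -- foo.other now holds its column default, not its previous value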

Nov 11 '05 #7

Harald Fuchs <no****@sap.com> writes:
> You can't. The only thing MySQL has to offer is

Actually I found two things related to this:

http://www.mysql.com/doc/en/INSERT.html

http://www.mysql.com/doc/en/REPLACE.html

You can do
INSERT INTO tab (...) VALUES (...) ON DUPLICATE KEY UPDATE col=val, ...

It seems to be newer than REPLACE.

In any case, both seem doable in Postgres, since under its MVCC every
update is a delete+insert anyway. It means doing the delete only if
necessary and doing the insert unconditionally.

But I'm not sure how convinced the developers are of its usefulness
beyond satisfying MySQL migrations. I've never used MySQL, and I have
seen a few cases where this might have been useful. Not a lot, but a
few. And it seems to be a frequent question on the mailing list.

--
greg
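
[For readers finding this thread in the archive: PostgreSQL did
eventually gain an equivalent - INSERT ... ON CONFLICT, added in version
9.5, long after this discussion. It also answers Greg's earlier question
about an insert and an update that differ; a sketch against his
hypothetical foo table:]

    INSERT INTO foo (pk, value, count, date_ins)
    VALUES (1, 'x', 1, now())
    ON CONFLICT (pk) DO UPDATE
        SET value = EXCLUDED.value,      -- the proposed new value
            count = foo.count + 1,       -- the existing row's count
            date_upd = now();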

Nov 11 '05 #8

> You can do
> INSERT INTO tab (...) VALUES (...) ON DUPLICATE KEY UPDATE col=val, ...

This would allow making sure the insert won't throw exceptions on unique
constraint violations. Good enough to avoid breaking transactions.

> But I'm not sure how convinced the developers are of its usefulness
> beyond satisfying MySQL migrations. I've never used MySQL, and I have
> seen a few cases where this might have been useful. Not a lot, but a
> few. And it seems to be a frequent question on the mailing list.

If nested transactions are easier to implement, that would be better: it
covers more of the current problems.

Cheers,
Csaba.


Nov 11 '05 #9

I think this would be very useful.

Jon

On 4 Sep 2003, Greg Stark wrote:
> You can do
> INSERT INTO tab (...) VALUES (...) ON DUPLICATE KEY UPDATE col=val, ...

Nov 11 '05 #10

On Thu, Sep 04, 2003 at 12:17:35PM +0200, Csaba Nagy wrote:
> [philosophical post regarding a missing feature of Postgres]
>
> I found there's no way to avoid failed inserts due to unique constraint
> violations, which cause an automatic roll-back of the running
> transaction.
>
> Contention on insert has a quite high probability for this operation in
> our application.


Did you ever try this:

insert into test (a, b, c, d)
(select 1, 2, 3, 4 where not exists
(select 1 from test where a=1 and b=2 and c=3 and d=4)
);

If your table contains a=1, b=2, c=3, and d=4, nothing will happen, and
there will be no failed transaction. If your table does not contain a=1,
b=2, c=3, and d=4, you'll get an insert of a row containing 1, 2, 3, 4.

Nov 11 '05 #11

On Thu, 4 Sep 2003, Richard Ellis wrote:
> On Thu, Sep 04, 2003 at 12:17:35PM +0200, Csaba Nagy wrote:
> > [philosophical post regarding a missing feature of Postgres]
> >
> > I found there's no way to avoid failed inserts due to unique
> > constraint violations, which cause an automatic roll-back of the
> > running transaction.
> >
> > Contention on insert has a quite high probability for this operation
> > in our application.
>
> Did you ever try this:
>
> insert into test (a, b, c, d)
> (select 1, 2, 3, 4 where not exists
>     (select 1 from test where a=1 and b=2 and c=3 and d=4)
> );
>
> If your table contains a=1, b=2, c=3, and d=4, nothing will happen, and
> there will be no failed transaction. If your table does not contain
> a=1, b=2, c=3, and d=4, you'll get an insert of a row containing
> 1, 2, 3, 4.

Unfortunately that doesn't work if two concurrently running transactions
both want to insert a row containing 1,2,3,4.


Nov 11 '05 #12

However, that doesn't cover the case where you want to update the record
if it already exists.

Jon

> insert into test (a, b, c, d)
> (select 1, 2, 3, 4 where not exists
>     (select 1 from test where a=1 and b=2 and c=3 and d=4)
> );
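
[One way to also cover the update case with the same trick - a sketch
only, treating (a, b, c) as the key and d as the payload, which is an
assumption made purely for illustration; it is subject to the same
concurrency caveat discussed above:]

    -- the UPDATE matches no rows if the key is absent; the guarded
    -- INSERT then fires only if the key is still absent
    update test set d = 4 where a = 1 and b = 2 and c = 3;
    insert into test (a, b, c, d)
    (select 1, 2, 3, 4 where not exists
        (select 1 from test where a = 1 and b = 2 and c = 3)
    );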


Nov 11 '05 #13

On Thu, Sep 04, 2003 at 12:29:17PM -0700, Stephan Szabo wrote:
> On Thu, 4 Sep 2003, Richard Ellis wrote:
> > Did you ever try this:
> >
> > insert into test (a, b, c, d)
> > (select 1, 2, 3, 4 where not exists
> >     (select 1 from test where a=1 and b=2 and c=3 and d=4)
> > );
> >
> > If your table contains a=1, b=2, c=3, and d=4, nothing will happen,
> > and there will be no failed transaction. If your table does not
> > contain a=1, b=2, c=3, and d=4, you'll get an insert of a row
> > containing 1, 2, 3, 4.
>
> Unfortunately that doesn't work if two concurrently running
> transactions both want to insert a row containing 1,2,3,4.


True, if the row does not already exist. But in that situation,
because of the unique constraint premise in the original quote, there
is always going to be at least one failed transaction. So the battle
is already lost before it's even begun.

If, however, the same row already exists in the table, then both of these
inserts will silently do nothing, and both transactions will continue
without aborting.


Nov 11 '05 #14

Whatever you guys try or suggest, it's doomed to suffer.

The whole problem stems from using a non-standard feature. And in my
opinion MySQL's "REPLACE INTO" is less a feature or extension to the
standard than another stupid, poorly-thought-through addition of
apparently speed-gaining crap at the cost of proper design.

One possible reason why this sort of "feature" was left out of the SQL
standard could be that the source of an ID that is supposed to be unique
should by default ensure its uniqueness. Defining a column UNIQUE is a
last line of defense, and aborted actions because of constraint
violations should be the exception, not the normal mode of operation. If
it's the DB's job to ensure uniqueness, it has to generate the ID, and
one can use a sequence. If it's the application that generates it, the
application should know whether this is an INSERT or an UPDATE.

Wherever one uses this "REPLACE INTO" language violation, the client
application, or even something in front of it, is generating IDs without
being sure whether it is sending down a new or an existing one. The real
question is "why is this piece of garbage unable to tell whether the ID
is newly created or has to exist already?"

I don't think there should be a way to substitute for this. Fix the
application design instead.

Jan

Nov 11 '05 #15

Richard Ellis <re*****@yahoo.com> writes:
> True, if the row does not already exist. But in that situation, because
> of the unique constraint premise in the original quote, there is always
> going to be at least one failed transaction. So the battle is already
> lost before it's even begun.

Well, no, that was the point. Ideally he wants to catch the duplicate
without producing an exception, because he can't do nested transactions.

Is there a parent record for this record in another table? You could
lock the parent record with SELECT FOR UPDATE, then do a SELECT count(*)
on this table, and do the insert or update as appropriate, then release
the lock on the parent record.

That's not great if you're doing lots of inserts on the same parent
record, or if the parent record is being updated frequently, but it's
way better than taking a table lock.
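
[A sketch of that parent-row locking idea, with hypothetical tables
parent(id) and child(parent_id, k, v) invented for illustration; the row
lock is released automatically at COMMIT:]

    BEGIN;
    -- serialize all upserts for this parent's children on the parent row
    SELECT 1 FROM parent WHERE id = 42 FOR UPDATE;
    SELECT count(*) FROM child WHERE parent_id = 42 AND k = 'x';
    -- if the count was 0:
    --     INSERT INTO child (parent_id, k, v) VALUES (42, 'x', ...);
    -- otherwise:
    --     UPDATE child SET v = ... WHERE parent_id = 42 AND k = 'x';
    COMMIT;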

FWIW:

Jan Wieck <Ja******@Yahoo.com> writes:
> Defining a column UNIQUE is a last line of defense, and aborted actions
> because of constraint violations should be the exception, not the
> normal mode of operation.

Well, that's one approach. I don't agree. The database is a tool, and
unique key constraints are a tool; they're good at doing certain things,
like ensuring atomic semantics for cases just like this. Why try to
reinvent the wheel using inferior tools in the application? You're
doomed to fail and to introduce race conditions.

In fact in this situation I usually prefer to try the insert and handle
exceptions over any of the other approaches. It's cleaner, clearer, faster in
the normal case, and has the least likelihood of race conditions (none if the
table never has deletes).
> Wherever one uses this "REPLACE INTO" language violation, the client
> application, or even something in front of it, is generating IDs
> without being sure whether it is sending down a new or an existing one.
> The real question is "why is this piece of garbage unable to tell
> whether the ID is newly created or has to exist already?"

Well, because that's the database's job. If the application tried to do
that, it would have to solve all the same concurrency and atomicity
issues that the database already solves. That's why I'm using a database
in the first place.
--
greg

Nov 11 '05 #16

[rant mode]
I have to answer this: I'm not trying to use a non-standard feature, I'm
trying to solve a problem, namely to be able to try an insert and on
failure continue the transaction. This is by no means a non-standard
feature.
AFAICT the standard says nothing about automatically rolling back a
transaction on error; it just says that YOU should be able to roll it
back or commit it, and that all or nothing of the changes should then be
executed.
The application design can be "fixed", but that means ugly workarounds.
In my case a simple fix would be to always insert all the possible
records before any update happens, but that would bloat the table
10-fold - I think you agree this is unacceptable.
Please understand me: I'm not out to annoy the Postgres developers by
claiming Postgres is not up to it; I am trying to insist that nested
transactions are a very important feature, which can solve lots of
problems that apparently have nothing to do with nested transactions.

Cheers,
Csaba.

Nov 11 '05 #17

Fine, says he, seeing both sides and wanting to back both sides.

REPLACE INTO throws away data without the user knowing that they have
ditched the data. This means it has side effects, and standard
programming theory says that functions should not have unintended side
effects. REPLACE INTO is in fact a

DELETE followed by an INSERT

It's also unclear what REPLACE should do with missing fields:
1. Fill them in with the defaults.
2. Leave them alone.

If 1, it's very dangerous; if 2, you should know what you are changing
anyway, and at the very least it should return "UPDATE" or "INSERT"
depending on what the "REPLACE" actually did!

On the other hand, nested transactions are a good idea. There is one
problem, however: when do you enforce referential integrity for deferred
checks - at the last commit, or do you need a "check references"
command?

It's beside the point that an UPDATE on most databases implies COPY,
CHANGE COPY, DELETE steps so you can roll back if necessary.

REPLACE also needs to know the table structure to work; UPDATE, INSERT
and DELETE don't - they only need to check the constraints.

As I'm sure I've said before, SQL has huge holes and inconsistencies and
needs a complete re-write. It's like VHS video: not the best, just
what's been sold to everyone.

Peter Childs

Nov 11 '05 #18

Thinking about it, there's probably no easy way to avoid race conditions
(in a truly transactional DB at least) when inserting into a table with
unique constraints. The REPLACE syntax will definitely not do it,
because I can't imagine what it should do when two threads try to
REPLACE the same key in concurrent transactions. Both will see the key
as missing and try to insert it, so we're back at the same problem
INSERT has.

Cheers,
Csaba.


Nov 11 '05 #19

It was not meant against anyone in person, and I agree that nested
transactions and/or catchable exceptions with the ability to continue
afterwards are useful and missing in PostgreSQL. What Stephan and
Richard were actually discussing was more like emulating REPLACE INTO,
and I was responding to that.

However, even with nested transactions and exceptions and all that, your
problem will not be cleanly solvable. You basically have two choices:
try the INSERT first and, if that fails with a duplicate key, do the
UPDATE; or try the UPDATE first and, if no rows got hit, do an INSERT.
Now if two concurrent transactions both try the UPDATE, they can both
fail to find the row and both do the INSERT - one gets a dupkey error.
But if you try to INSERT and get a duplicate key, then in the time
between getting the error and issuing the UPDATE someone else can issue
a DELETE - the row is gone and your UPDATE will fail.

So you have to go into a loop and try INSERT-or-UPDATE-or-INSERT...
until you either get bored or succeed ... that's not exactly what I call
a solution.

Jan
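
[A sketch of that retry loop as it can be written in later PostgreSQL
versions - essentially the UPSERT example the PL/pgSQL documentation
later adopted, requiring the exception handling added in 8.0. Table
foo(pk, value) is hypothetical, extending the one-shot sketch earlier in
the thread:]

    CREATE FUNCTION merge_foo(k integer, v text) RETURNS void AS $$
    BEGIN
        LOOP
            -- first try to update an existing row
            UPDATE foo SET value = v WHERE pk = k;
            IF found THEN
                RETURN;
            END IF;
            -- no row there: try to insert, racing other sessions
            BEGIN
                INSERT INTO foo (pk, value) VALUES (k, v);
                RETURN;
            EXCEPTION WHEN unique_violation THEN
                -- another session inserted concurrently (or the row was
                -- deleted after our failed UPDATE); loop and retry
            END;
        END LOOP;
    END;
    $$ LANGUAGE plpgsql;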


Nov 11 '05 #20

> In particular, it's fairly easy to make sure there's no delete when
> updates happen: select the row for update. Only the insert is the
> problem cause you don't have the row to lock beforehand.

Of course that's stupid: when you do an update, it selects the row for
update... and that will not help in this case.
But the update will not fail; it will just have nothing to update, which
is usually just alright if the row was deleted, meaning that its life
time ended.
BTW, in my particular problem I can make sure there will be no delete
until all inserts/updates are finished.

Cheers,
Csaba.



Nov 11 '05 #21

On Fri, 2003-09-05 at 15:29, Jan Wieck wrote:
> However, even with nested transactions and exceptions and all that,
> your problem will not be cleanly solvable. You basically have two
> choices: try the INSERT first and, if that fails with a duplicate key,
> do the UPDATE; or try the UPDATE first and, if no rows got hit, do an
> INSERT. Now if two concurrent transactions both try the UPDATE, they
> can both fail to find the row and both do the INSERT - one gets a
> dupkey error. But if you try to INSERT and get a duplicate key, then in
> the time between getting the error and issuing the UPDATE someone else
> can issue a DELETE - the row is gone and your UPDATE will fail.
>
> So you have to go into a loop and try INSERT-or-UPDATE-or-INSERT...
> until you either get bored or succeed ... that's not exactly what I
> call a solution.


You're completely right that not all possible problems are solved by
this, but different solutions are better or worse based also on the odds
of the problem actually occurring. My code can deal with broken
transactions; it's just unacceptable if they are broken too often - that
slows down the system. You must admit that the odds of the
insert-update-delete happening at the same time are much lower than just
two inserts happening at the same time, whatever the application usage
pattern. In particular, it's fairly easy to make sure there's no delete
when updates happen: select the row for update. Only the insert is the
problem, because you don't have the row to lock beforehand.

Cheers,
Csaba.


Nov 11 '05 #22

On Fri, 2003-09-05 at 08:29, Jan Wieck wrote:
> It was not meant against anyone in person, and I agree that nested
> transactions and/or catchable exceptions with the ability to continue
> afterwards are useful and missing in PostgreSQL. What Stephan and
> Richard were actually discussing was more like emulating REPLACE INTO,
> and I was responding to that.
>
> However, even with nested transactions and exceptions and all that,
> your problem will not be cleanly solvable. [...] But if you try to
> INSERT and get a duplicate key, then in the time between getting the
> error and issuing the UPDATE someone else can issue a DELETE - the row
> is gone and your UPDATE will fail.


SERIALIZABLE transactions will solve this.

--
Ron Johnson, Jr.

Nov 11 '05 #23

Csaba Nagy <na**@ecircle-ag.com> writes:
> Thinking about it, there's probably no easy way to avoid race
> conditions (in a truly transactional DB at least) when inserting into a
> table with unique constraints. The REPLACE syntax will definitely not
> do it, because I can't imagine what it should do when two threads try
> to REPLACE the same key in concurrent transactions. Both will see the
> key as missing and try to insert it, so we're back at the same problem
> INSERT has.


Assuming that you've got a unique constraint defined, one thread will
succeed in doing the INSERT, and the other will fail with a duplicate
key error --- whereupon it should loop back and try the REPLACE part
again. So what this all comes down to is having control over recovery
from a dup-key error. You have to be able to not have that abort your
whole transaction.

regards, tom lane


Nov 11 '05 #24

Ron Johnson wrote:
> On Fri, 2003-09-05 at 08:29, Jan Wieck wrote:
> > However, even with nested transactions and exceptions and all that,
> > your problem will not be cleanly solvable. [...] the row is gone and
> > your UPDATE will fail.
>
> SERIALIZABLE transactions will solve this.

Sure they will.

Care to elaborate a bit on the side effects of SERIALIZABLE? I mean
semantics AND performance wise ... people tend to use suggestions like
this without thinking (about the consequences).

Jan :-T


Nov 11 '05 #25


On Wed, 2003-09-10 at 00:31, Jan Wieck wrote:
> Ron Johnson wrote:
> > SERIALIZABLE transactions will solve this.
>
> Sure they will.
>
> Care to elaborate a bit on the side effects of SERIALIZABLE? I mean
> semantics AND performance wise ... people tend to use suggestions like
> this without thinking (about the consequences).

Well, unless INSERT/UPDATE/DELETE transactions are very short, there
will definitely be a performance hit because of the increased locking.

However, I prefer that consequence to the artifacts of READ COMMITTED.

--
Ron Johnson, Jr.

Nov 11 '05 #27


This discussion thread is closed; replies have been disabled.