most idiomatic way to "update or insert"?

Mark Harrison

So I have some data that I want to put into a table. If the
row already exists (as defined by the primary key), I would
like to update the row. Otherwise, I would like to insert
the row.

I've been doing something like

delete from foo where name = 'xx';
insert into foo values('xx',1,2,...);

but I've been wondering if there's a more idiomatic or canonical
way to do this.

Many TIA,
Mark

--
Mark Harrison
Pixar Animation Studios

---------------------------(end of broadcast)---------------------------
TIP 7: don't forget to increase your free space map settings

Nov 23 '05 #1

Subscribe Reply

18312

Peter Darley

Mark,
It's not canonical by any means, but what I do is:

update foo set thing='stuff' where name = 'xx' and thing<>'stuff';
insert into foo (name, thing) (select 'xx' as name, 'stuff' as thing where
not exists (select 1 from foo where name='xx'));

I believe if you put these on the same line it will be a single
transaction. It has the benefit of not updating the row if there aren't
real changes. It's plenty quick too, if name is indexed.

Thanks,
Peter Darley

-----Original Message-----
From: pg*****************@postgresql.org
[mailto:pg*****************@postgresql.org]On Behalf Of Mark Harrison
Sent: Wednesday, August 04, 2004 4:26 PM
To: pg***********@postgresql.org
Subject: [GENERAL] most idiomatic way to "update or insert"?
So I have some data that I want to put into a table. If the
row already exists (as defined by the primary key), I would
like to update the row. Otherwise, I would like to insert
the row.

I've been doing something like

delete from foo where name = 'xx';
insert into foo values('xx',1,2,...);

but I've been wondering if there's a more idiomatic or canonical
way to do this.

Many TIA,
Mark

--
Mark Harrison
Pixar Animation Studios

---------------------------(end of broadcast)---------------------------
TIP 7: don't forget to increase your free space map settings
---------------------------(end of broadcast)---------------------------
TIP 7: don't forget to increase your free space map settings

Nov 23 '05 #2

Greg Stark

I'll mention that often I do exactly what you're doing. I find deleting all
existing records and then inserting what I want to appear to be cleaner than
handling the various cases that can arise if you don't.

This happens most often when I have a list of items and have a UI that allows
the user to edit the entire list and commit a whole new list in one action.
It's much easier to simply delete the old list and insert the entire new list
in a single query than to try to figure out which rows to delete and which to
insert.

--
greg
---------------------------(end of broadcast)---------------------------
TIP 4: Don't 'kill -9' the postmaster

Nov 23 '05 #3

Lincoln Yeoh

I don't think that works - there's a race condition if you do not do any
locking.

Why:
Before a transaction that inserts rows is committed, other transactions are
not aware of the inserted rows, so the select returns no rows.

So:
You can either create a unique index and catch insert duplicate failures.

Or:
lock the relevant tables, then do the select ... update/insert or insert
.... select , or whatever it is you want to do.

Or:
both.

Test it out yourself.

At 07:51 AM 8/5/2004, Peter Darley wrote:

Mark,
It's not canonical by any means, but what I do is:

update foo set thing='stuff' where name = 'xx' and thing<>'stuff';
insert into foo (name, thing) (select 'xx' as name, 'stuff' as thing where
not exists (select 1 from foo where name='xx'));

I believe if you put these on the same line it will be a single
transaction. It has the benefit of not updating the row if there aren't
real changes. It's plenty quick too, if name is indexed.

Thanks,
Peter Darley

---------------------------(end of broadcast)---------------------------
TIP 7: don't forget to increase your free space map settings

Nov 23 '05 #4

Pierre-Frédéric Caillaud

I use stored procedures :

create function insertorupdate(....)
UPDATE mytable WHERE ... SET ...
IF NOT FOUND THEN
INSERT INTO mytable ...
END IF;
You lose flecibility in your request though.

I wish Postgresql had an INSERT OR UPDATE like MySQL does. So far it's
the only thing that I regret from MySQL.

---------------------------(end of broadcast)---------------------------
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to ma*******@postgresql.org so that your
message can get through to the mailing list cleanly

Nov 23 '05 #5

Richard Huxton

Mark Harrison wrote:

I've been doing something like

delete from foo where name = 'xx';
insert into foo values('xx',1,2,...);

but I've been wondering if there's a more idiomatic or canonical
way to do this.

The delete+insert isn't quite the same as an update since you might have
foreign keys referring to foo with "ON DELETE CASCADE" - oops, just lost
all your dependant rows. Other people have warned about race conditions
with insert/test/update.

An "update or insert" would be useful sometimes, but it's not always
necessary. Indeed, if I find I don't know whether I'm adding or updating
something I take a long hard look at my design - it ususally means I've
not thought clearly about something.

For a "running total" table it can make more sense to have an entry with
a total of 0 created automatically via a trigger. Likewise with some
other summary tables.

Can you give an actual example of where you need this?

--
Richard Huxton
Archonet Ltd

---------------------------(end of broadcast)---------------------------
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to ma*******@postgresql.org so that your
message can get through to the mailing list cleanly

Nov 23 '05 #6

Tommi Maekitalo

Hi,

I prefer to update and if the number of updated rows equals 0 do an insert. So
in case of update I need only one roundtrip. If insert is far more common in
this case it might be better try insert and catch the error. But I try to
avoid running on an error intentionally.

First delete and then insert works but needs 2 SQL-statements in every case.
And the database need to update indexes at least once. There might be also
problems with cascaded deletes.
Tommi

Am Donnerstag, 5. August 2004 01:25 schrieb Mark Harrison:

So I have some data that I want to put into a table. If the
row already exists (as defined by the primary key), I would
like to update the row. Otherwise, I would like to insert
the row.

I've been doing something like

delete from foo where name = 'xx';
insert into foo values('xx',1,2,...);

but I've been wondering if there's a more idiomatic or canonical
way to do this.

Many TIA,
Mark

---------------------------(end of broadcast)---------------------------
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to ma*******@postgresql.org)

Nov 23 '05 #7

Peter Darley

Lincoln,
It works for me...
I think what you said is wrong because it updates first (if there is a row
to update), then inserts. If there is a row to update the insert won't
insert anything. If there is no row to update the insert inserts a row.
Either way, the insert is the last thing in the transaction. Plus, as shown
in the code to follow, I have almost this exact thing in my application and
I know that it does work for me. :)

Code (Perl):
$Neo::DB::Neo->do ("UPDATE Sample_Settings SET Value=" . Quote($Args{Value})
.. " WHERE Sample_ID=" . Quote($Args{SampleID}) . " AND Setting=" .
Quote($Args{Setting}) . "; INSERT INTO Sample_Settings (Sample_ID, Setting,
Value) (SELECT " . Quote($Args{SampleID}) . ", " . Quote($Args{Setting}) .
", " . Quote($Args{Value}) . " WHERE NOT EXISTS (SELECT 1 FROM
Sample_Settings WHERE Sample_ID=" . Quote($Args{SampleID}) . " AND Setting="
.. Quote($Args{Setting}) . "));");

Thanks,
Peter Darley

-----Original Message-----
From: Lincoln Yeoh [mailto:ly***@pop.jaring.my]
Sent: Wednesday, August 04, 2004 6:49 PM
To: Peter Darley; Mark Harrison; pg***********@postgresql.org
Subject: Re: [GENERAL] most idiomatic way to "update or insert"?
I don't think that works - there's a race condition if you do not do any
locking.

Why:
Before a transaction that inserts rows is committed, other transactions are
not aware of the inserted rows, so the select returns no rows.

So:
You can either create a unique index and catch insert duplicate failures.

Or:
lock the relevant tables, then do the select ... update/insert or insert
.... select , or whatever it is you want to do.

Or:
both.

Test it out yourself.

At 07:51 AM 8/5/2004, Peter Darley wrote:

Mark,
It's not canonical by any means, but what I do is:

update foo set thing='stuff' where name = 'xx' and thing<>'stuff';
insert into foo (name, thing) (select 'xx' as name, 'stuff' as thing where
not exists (select 1 from foo where name='xx'));

I believe if you put these on the same line it will be a single
transaction. It has the benefit of not updating the row if there aren't
real changes. It's plenty quick too, if name is indexed.

Thanks,
Peter Darley

---------------------------(end of broadcast)---------------------------
TIP 1: subscribe and unsubscribe commands go to ma*******@postgresql.org

Nov 23 '05 #8

Csaba Nagy

Peter,

The "does not work" part is not refering to the method not working at
all, but to the fact that it is not safe when you have multiple
transactions operating on the same row at the same time.
There are plenty of discussions about the insert-or-update race
conditions on this list, and the final conclusion was always that it is
not possible to solve the race condition without being prepared to catch
exceptions and retry the whole thing until it succedes...
The reason of the race condition: let's say 2 transactions A and B try
to insert-or-update the same row which does not exist. They do the
update statement at the same time, and BOTH OF THEM gets as a result
that no rows were updated, since the row does not exist yet. Now both
transactions try to insert the row, and obviously one of them will fail.
So your code must be prepared that the insert can fail, and in that case
it should retry with the update.
People tried to devise a method to avoid the race condition and throwing
exception, but it is just not possible.
Now the one bad thing in postgres which people complained about in this
context is that the transaction gets rolled back on any error, so
actually instead of just retrying the update, you will have to redo your
whole transaction.

HTH,
Csaba.
On Thu, 2004-08-05 at 15:28, Peter Darley wrote:

Lincoln,
It works for me...
I think what you said is wrong because it updates first (if there is a row
to update), then inserts. If there is a row to update the insert won't
insert anything. If there is no row to update the insert inserts a row.
Either way, the insert is the last thing in the transaction. Plus, as shown
in the code to follow, I have almost this exact thing in my application and
I know that it does work for me. :)

Code (Perl):
$Neo::DB::Neo->do ("UPDATE Sample_Settings SET Value=" . Quote($Args{Value})
. " WHERE Sample_ID=" . Quote($Args{SampleID}) . " AND Setting=" .
Quote($Args{Setting}) . "; INSERT INTO Sample_Settings (Sample_ID, Setting,
Value) (SELECT " . Quote($Args{SampleID}) . ", " . Quote($Args{Setting}) .
", " . Quote($Args{Value}) . " WHERE NOT EXISTS (SELECT 1 FROM
Sample_Settings WHERE Sample_ID=" . Quote($Args{SampleID}) . " AND Setting="
. Quote($Args{Setting}) . "));");

Thanks,
Peter Darley

-----Original Message-----
From: Lincoln Yeoh [mailto:ly***@pop.jaring.my]
Sent: Wednesday, August 04, 2004 6:49 PM
To: Peter Darley; Mark Harrison; pg***********@postgresql.org
Subject: Re: [GENERAL] most idiomatic way to "update or insert"?
I don't think that works - there's a race condition if you do not do any
locking.

Why:
Before a transaction that inserts rows is committed, other transactions are
not aware of the inserted rows, so the select returns no rows.

So:
You can either create a unique index and catch insert duplicate failures.

Or:
lock the relevant tables, then do the select ... update/insert or insert
... select , or whatever it is you want to do.

Or:
both.

Test it out yourself.

At 07:51 AM 8/5/2004, Peter Darley wrote:
Mark,
It's not canonical by any means, but what I do is:

update foo set thing='stuff' where name = 'xx' and thing<>'stuff';
insert into foo (name, thing) (select 'xx' as name, 'stuff' as thing where
not exists (select 1 from foo where name='xx'));

I believe if you put these on the same line it will be a single
transaction. It has the benefit of not updating the row if there aren't
real changes. It's plenty quick too, if name is indexed.

Thanks,
Peter Darley

---------------------------(end of broadcast)---------------------------
TIP 1: subscribe and unsubscribe commands go to ma*******@postgresql.org

---------------------------(end of broadcast)---------------------------
TIP 6: Have you searched our list archives?

http://archives.postgresql.org

Nov 23 '05 #9

Mike Mascari

Peter Darley wrote:

Lincoln, It works for me... I think what you said is wrong
because it updates first (if there is a row to update), then
inserts. If there is a row to update the insert won't insert
anything. If there is no row to update the insert inserts a row.
Either way, the insert is the last thing in the transaction.
Plus, as shown in the code to follow, I have almost this exact
thing in my application and I know that it does work for me. :)

You're getting lucky. I suggested the same thing four years ago. The
race condition is still there:

http://groups.google.com/groups?hl=e...0sss.pgh.pa.us

HTH,

Mike Mascari
---------------------------(end of broadcast)---------------------------
TIP 9: the planner will ignore your desire to choose an index scan if your
joining column's datatypes do not match

Nov 23 '05 #10

Peter Darley

Mike,
Ahha! I didn't understand what the objection was. I guess I am getting
lucky. :)
It seems to me that this is true with any concurrent inserts, isn't it?
One will succeed and one will fail.
Thanks,
Peter Darley

-----Original Message-----
From: Mike Mascari [mailto:ma*****@mascari.com]
Sent: Thursday, August 05, 2004 6:51 AM
To: Peter Darley
Cc: Lincoln Yeoh; Mark Harrison; pg***********@postgresql.org
Subject: Re: [GENERAL] most idiomatic way to "update or insert"?
Peter Darley wrote:

Lincoln, It works for me... I think what you said is wrong
because it updates first (if there is a row to update), then
inserts. If there is a row to update the insert won't insert
anything. If there is no row to update the insert inserts a row.
Either way, the insert is the last thing in the transaction.
Plus, as shown in the code to follow, I have almost this exact
thing in my application and I know that it does work for me. :)

You're getting lucky. I suggested the same thing four years ago. The
race condition is still there:

http://groups.google.com/groups?hl=e...78158285%40sss
..pgh.pa.us

HTH,

Mike Mascari

---------------------------(end of broadcast)---------------------------
TIP 8: explain analyze is your friend

Nov 23 '05 #11

award

> An "update or insert" would be useful sometimes, but it's not always

necessary. Indeed, if I find I don't know whether I'm adding or updating
something I take a long hard look at my design - it ususally means I've
not thought clearly about something. .... Can you give an actual example of where you need this?

We have an environment where our data collection occurs by screen scraping
(er, web scraping?). Unfortunately, it takes two passes, once across
search results which provide partial data, then a second time over a
detail page loaded for each item in the search results we were given.
Since time is of the essence, we provide the partial data to our
customers, which means dealing with the insert or update. Additionally,
the process is multithreaded, so search results can be touching things
concurrently with details being loaded, otherwise we can't keep up.

I dealt with the problem by wrapping every touch of an item in a single
transaction with a loop around it, as has been recommended here many times
before. Any DB-exception (Python) inside the loop caused by concurrency
type problems causes a restart. As it turns out, the insert/update race
has yet to result in a retry. The real payoff in this design has proven to
be dealing with FK locking... without putting way more effort into fixing
it than the deadlocks are worth, we get around a dozen deadlocks a day
that are automatically retried.

---------------------------(end of broadcast)---------------------------
TIP 9: the planner will ignore your desire to choose an index scan if your
joining column's datatypes do not match

Nov 23 '05 #12

Ron St-Pierre

Richard Huxton wrote:

An "update or insert" would be useful sometimes, but it's not always
necessary. Indeed, if I find I don't know whether I'm adding or
updating something I take a long hard look at my design - it ususally
means I've not thought clearly about something.
Can you give an actual example of where you need this?

How about stocks for a simple example? Let's say you have a simple table
with the stock symbol, stock exchange, high, low, open, close and
volume. Every day you update the data for each stock. But there are
always new stocks listed on an exchange, so when a new stock shows up
you have to do an insert instead of an update.

Ron

---------------------------(end of broadcast)---------------------------
TIP 1: subscribe and unsubscribe commands go to ma*******@postgresql.org

Nov 23 '05 #13

Richard Huxton

Ron St-Pierre wrote:

Richard Huxton wrote:

An "update or insert" would be useful sometimes, but it's not always
necessary. Indeed, if I find I don't know whether I'm adding or
updating something I take a long hard look at my design - it ususally
means I've not thought clearly about something.

Can you give an actual example of where you need this?

How about stocks for a simple example? Let's say you have a simple table
with the stock symbol, stock exchange, high, low, open, close and
volume. Every day you update the data for each stock. But there are
always new stocks listed on an exchange, so when a new stock shows up
you have to do an insert instead of an update.

If it is just a simple table then delete all of them and insert from
scratch. If you wanted to track changes over time (perhaps more likely),
you'd have a separate table with the company name/address etc and a log
table. At which point you'll want to know if it's a new company or not...

--
Richard Huxton
Archonet Ltd

---------------------------(end of broadcast)---------------------------
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to ma*******@postgresql.org)

Nov 23 '05 #14

Ron St-Pierre

Richard Huxton wrote:

Ron St-Pierre wrote:
Richard Huxton wrote:

An "update or insert" would be useful sometimes, but it's not always
necessary. Indeed, if I find I don't know whether I'm adding or
updating something I take a long hard look at my design - it
ususally means I've not thought clearly about something.

Can you give an actual example of where you need this?

How about stocks for a simple example? Let's say you have a simple
table with the stock symbol, stock exchange, high, low, open, close
and volume. Every day you update the data for each stock. But there
are always new stocks listed on an exchange, so when a new stock
shows up you have to do an insert instead of an update.

If it is just a simple table then delete all of them and insert from
scratch. If you wanted to track changes over time (perhaps more
likely), you'd have a separate table with the company name/address etc
and a log table. At which point you'll want to know if it's a new
company or not...

Okay, this simple example really exists, but the simple table also
includes a date that the stock was last traded, so we have:
stock symbol, stock exchange, high, low, open, close, volume, date, plus
a few more fields

But the data isn't always updated at one time, as we can update all
stocks for one exhange and possibly only some of the stocks for a
particular exchange in one go. Even if the data is received for only one
exchange we could delete all stocks for that exchange and insert new
ones, which would work fine. However some stocks are not traded every
day, so we need to show the trading information for the last date that
it was traded, so we can't delete them en masse even for the one exchange.

BTW these updates do take longer than we'd like so I would appreciate
more input on how this setup could be redesigned.

Ron

---------------------------(end of broadcast)---------------------------
TIP 7: don't forget to increase your free space map settings

Nov 23 '05 #15

Greg Stark

Ron St-Pierre <rs*******@syscor.com> writes:

BTW these updates do take longer than we'd like so I would appreciate more
input on how this setup could be redesigned.

Where is the input coming from?

One option is to batch changes. If you just insert into a log table whenever
new data is available, and then do a batch update of many records you would
have a few advantages.

1) You could have a single updater and therefore no worries with concurrency.

2) The optimizer could choose a merge join or at least a nested loop and avoid
multiple round trips.

Something like

update current_stock_price
set price = log.price,
timestamp = log.timestamp
from stock_price log
where current_stock_price.stock = stock_price_log.stock
and stock_price_log.timestamp between ? and ?

You can either just use deterministic time ranges like midnight-midnight or
keep careful track of the last time the job was run.

You would first have to insert into current_stock_price any missing stocks,
but if you're batching them then again you don't have to worry about someone
else inserting them in the middle of your query. And it's more efficient to
add lots of them in one shot than one at a time.

--
greg
---------------------------(end of broadcast)---------------------------
TIP 1: subscribe and unsubscribe commands go to ma*******@postgresql.org

Nov 23 '05 #16

Richard Huxton

Ron St-Pierre wrote:

Okay, this simple example really exists, but the simple table also
includes a date that the stock was last traded, so we have:
stock symbol, stock exchange, high, low, open, close, volume, date, plus
a few more fields [snip more details] BTW these updates do take longer than we'd like so I would appreciate
more input on how this setup could be redesigned.

Well, I'd probably make the primary key (stock_id, trading_date) and
just insert into a log table. From there I'd update into a summary
table, or use a view.

Of course, that might make things slower in your case.

--
Richard Huxton
Archonet Ltd

---------------------------(end of broadcast)---------------------------
TIP 4: Don't 'kill -9' the postmaster

Nov 23 '05 #17

Matteo Beccati

Hi,

I prefer to update and if the number of updated rows equals 0 do an
insert. So in case of update I need only one roundtrip. If insert is far
more common in this case it might be better try insert and catch the
error. But I try to avoid running on an error intentionally.

When logging to a compact table that stores data in an aggregate form, I
used something like that:

BEGIN;
UPDATE ... ;

if (!affected_rows)
{
INSERT ... ;

if (error)
{
ROLLBACK;
UPDATE ... ;
}
}

COMMIT;

I added the error check with a second UPDATE try after INSERT to
increase accuracy. In fact, INSERTs were sometimes failing because of
concurrency, and this was the only viable solution I found to avoid
losing data.
Best regards
--
Matteo Beccati
http://phpadsnew.com/
http://phppgads.com/

---------------------------(end of broadcast)---------------------------
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to ma*******@postgresql.org so that your
message can get through to the mailing list cleanly

Nov 23 '05 #18

Ron St-Pierre

Greg Stark wrote:

Ron St-Pierre <rs*******@syscor.com> writes:

BTW these updates do take longer than we'd like so I would appreciate more
input on how this setup could be redesigned.
Where is the input coming from?

One option is to batch changes.

<snip>

Something like

update current_stock_price
set price = log.price,
timestamp = log.timestamp
from stock_price log
where current_stock_price.stock = stock_price_log.stock
and stock_price_log.timestamp between ? and ?

We check for new stocks and add them, and initially were using a
procedure to do something similar to your code:

CREATE OR REPLACE FUNCTION updateData() RETURNS SETOF datatype AS '
DECLARE
rec RECORD;
BEGIN
FOR rec IN SELECT symbol, tradeDate, tickDate, high, low , open,
close, volume FROM exchangedata LOOP
RETURN NEXT rec;
UPDATE stockdata SET high=rec.high, low=rec.low,
open=rec.low, close=rec.close, volume=rec.volume, tradeDate=rec.tradeDate
WHERE symbol=rec.symbol;
END LOOP;
RETURN;
END;
' LANGUAGE 'plpgsql';
.... but it took too long. Off hand, do you know if your approach above
would be quicker?

Ron

---------------------------(end of broadcast)---------------------------
TIP 7: don't forget to increase your free space map settings

Nov 23 '05 #19

Greg Stark

Matteo Beccati <ph*@beccati.com> writes:

I added the error check with a second UPDATE try after INSERT to increase
accuracy. In fact, INSERTs were sometimes failing because of concurrency, and
this was the only viable solution I found to avoid losing data.

in the general case you could have someone else delete the record again before
you get the chance to update it again. to handle this case you would actually
have to make it a loop.

this is the main reason a built-in merge command could be nice, it would avoid
the need to loop, since it can take the lock on the existing record if it's
there or perform the insert if not.

--
greg
---------------------------(end of broadcast)---------------------------
TIP 1: subscribe and unsubscribe commands go to ma*******@postgresql.org

Nov 23 '05 #20

Jerry Sievers

gs*****@mit.edu (Greg Stark) writes:

This happens most often when I have a list of items and have a UI that allows
the user to edit the entire list and commit a whole new list in one action.
It's much easier to simply delete the old list and insert the entire new list
in a single query than to try to figure out which rows to delete and which to
insert.

Right, but this is going to quit working for you as soon as foreign
key references are involved. For that matter, other types of
dependent objects will prevent this as well.

In other words, when your relational database actually becomes
relational :-)

I too was in the habit of the delete/insert as a lazy mans update on
simple schemad DBs.

HTH

--
-------------------------------------------------------------------------------
Jerry Sievers 305 854-3001 (home) WWW ECommerce Consultant
305 321-1144 (mobile http://www.JerrySievers.com/

Nov 23 '05 #21

most idiomatic way to "update or insert"?

Similar topics