Bytes IT Community

ERROR: Cannot insert a duplicate key into unique index pg_class_relname_nsp_index

ERROR: Cannot insert a duplicate key into unique index pg_class_relname_nsp_index

We've been getting this error in our application every once in a while
--- typically once an hour to once a day, although it varies over time.
The daemon that gets the error exits and restarts a few seconds later.
Usually it's fine then, but sometimes the error happens three or more
times in a row.

Occasionally, instead, we get "ERROR: Cannot insert a duplicate key
into unique index pg_type_typname_nsp_index".

We started seeing this error on November 22, three days after we migrated
from Postgres 7.2.3 and 7.3.2 to 7.3.4. We still see the error with
7.3.5, but not with 7.4.0. We're not sure we're quite ready to migrate
all of our customers to 7.4.0 yet, though.

The daemon that gets this error does the following every 15 seconds:
- start a transaction
- execute a complex and sometimes slow SELECT INTO query, creating a
  temporary table
- lock another table in exclusive mode (and, for no good reason, lock
  the temporary table too; this is embarrassing but could conceivably
  relate to the bug)
- delete the contents of the other table
- copy the contents of the temporary table into the other table
- drop the temporary table (again, embarrassing, sorry)
- commit
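
In SQL terms, each cycle looks roughly like this (a sketch only --- the
table names `live_table` and `scratch` are hypothetical stand-ins, and the
real SELECT is far more complex):

```sql
BEGIN;

-- complex, sometimes slow query; its result set becomes the new contents
SELECT ... INTO TEMPORARY TABLE scratch FROM ...;

-- the two locks described above (the second is the embarrassing one)
LOCK TABLE live_table IN EXCLUSIVE MODE;
LOCK TABLE scratch IN EXCLUSIVE MODE;

DELETE FROM live_table;
INSERT INTO live_table SELECT * FROM scratch;

DROP TABLE scratch;
COMMIT;
```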

Reducing the interval from 15 seconds to 0.2 seconds makes the error
happen several times an hour. We hoped it might make the error happen
several times a minute.

We're doing this temp-table dance to minimize the time the other table
is locked, and to keep anyone who's reading the other table from seeing
an incomplete list of records.

It does all of this in four queries. The second query, the one that
merely creates and populates the temporary table, is the one that gets
the error --- understandably, since I wouldn't expect the other queries
to insert into pg_class or pg_type.

Creating and destroying a temporary table with a lot of attributes every
second is causing other problems, too; the indices on the pg_attribute
table grow to hundreds of megabytes in size, and for some reason,
reindexing our system tables corrupts our database. These huge indices
slow down access to attributes of tables, and therefore our whole
application.
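
One way to watch that growth (a sketch; `relpages` counts 8 kB pages and
is only updated by VACUUM or ANALYZE, so the figures are approximate):

```sql
-- Show how large pg_attribute and its indices have grown.
SELECT relname, relpages, relpages * 8 / 1024 AS approx_mb
FROM pg_class
WHERE relname LIKE 'pg_attribute%'
ORDER BY relpages DESC;
```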

We originally ran the slow SELECT INTO query in an Apache handler, because
we thought it would be fast. Usually it took tens of milliseconds.
But every once in a while, on a loaded system, while other processes were
updating the tables it was selecting from, the query would take 2, 5,
10, 20, 30, even 60 seconds. That's a long time for a web page to load.
We never did figure out why it was slow. Now, the slow query happens
in a daemon, and the Apache handler just queries the table populated by
the daemon, which reliably runs fast. Do you have any suggestions for
how to diagnose this unpredictable performance?

Many thanks.

-Kragen

---------------------------(end of broadcast)---------------------------
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/docs/faqs/FAQ.html

Nov 12 '05 #1


Kragen Sitaker <kr**********@airwave.com> writes:
> ERROR: Cannot insert a duplicate key into unique index pg_class_relname_nsp_index
> We've been getting this error in our application every once in a while
> --- typically once an hour to once a day, although it varies over time.

This seems to me to indicate a collision on name+schema of the temp
table. Now that ought to be impossible :-( --- you should get errors
earlier than this if you were actually creating a duplicately named temp
table, and the design for selecting nonconflicting temp schemas seems
pretty bulletproof to me too.
> We started seeing this error on November 22, three days after we migrated
> from Postgres 7.2.3 and 7.3.2 to 7.3.4. We still see the error with
> 7.3.5, but not with 7.4.0.

Hmm. I'm not aware of any 7.4 bug fix that would affect such a thing,
so I wouldn't want to bet that 7.4 has really solved the issue.

Digging in the 7.3.2-to-7.3.4 change logs, I see one potentially
relevant change:

2003-02-06 20:33  tgl

    * src/: backend/catalog/dependency.c, backend/catalog/namespace.c,
      include/catalog/dependency.h (REL7_3_STABLE): Revise mechanism for
      getting rid of temp tables at backend shutdown. Instead of
      grovelling through pg_class to find them, make use of the handy
      dandy dependency mechanism: just delete everything that depends on
      our temp schema. Unlike the pg_class scan, the dependency
      mechanism is smart enough to delete things in an order that doesn't
      fall foul of any dependency restrictions. Fixes problem reported
      by David Heggie: a temp table with a serial column may cause a
      backend FATAL exit at shutdown time, if it chances to try to delete
      the temp sequence first.

Now this change also exists in 7.4, but perhaps it is malfunctioning in
7.3.*. Or maybe you just haven't stressed the 7.4 installation enough
to reproduce the problem there --- what do you think are the odds of
that?

Given that you're explicitly dropping the temp table before exit, it's
not clear how a problem in this code could cause you grief anyway.
But it's the only starting point I can see. You might try adding some
monitoring code to see if you can detect temp tables being left around
by exited backends.
> Creating and destroying a temporary table with a lot of attributes every
> second is causing other problems, too; the indices on the pg_attribute
> table grow to hundreds of megabytes in size,

Right. 7.4 should fix that though.

> and for some reason, reindexing our system tables corrupts our database.

That seems suspicious as well. What happens exactly? How did you get
out of it??
> Do you have any suggestions for how to diagnose this unpredictable
> performance?

I think you have more urgent things to worry about. Like finding why it
doesn't work reliably.

regards, tom lane


Nov 12 '05 #2

On Fri, Jan 09, 2004 at 12:07:25PM -0800, Kragen Sitaker wrote:
<snip>

Not really related to your problem, but given you're in a transaction, why
do you need to lock anything? What's wrong with:
The daemon that gets this error does the following every 15 seconds:
- start a transaction
- delete the contents of the other table
- execute a complex and sometimes slow SELECT INTO query, creating a
temporary table
- copy the contents of the temporary table into the other table
- drop the temporary table (again, embarrassing, sorry)
- commit
Maybe I'm missing something?
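
In SQL, that lock-free sequence would be something like this (a sketch,
with the hypothetical names `live_table` for the other table and `scratch`
for the temporary one):

```sql
BEGIN;

-- no LOCK TABLE needed: the DELETE is invisible to concurrent
-- READ COMMITTED readers until the COMMIT below
DELETE FROM live_table;

SELECT ... INTO TEMPORARY TABLE scratch FROM ...;
INSERT INTO live_table SELECT * FROM scratch;

DROP TABLE scratch;
COMMIT;
```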
--
Martijn van Oosterhout <kl*****@svana.org>   http://svana.org/kleptog/
> (... have gone from d-i being barely usable even by its developers
> anywhere, to being about 20% done. Sweet. And the last 80% usually takes
> 20% of the time, too, right?) -- Anthony Towns, debian-devel-announce



Nov 12 '05 #3

On Fri, Jan 09, 2004 at 06:19:00PM -0500, Tom Lane wrote:
> Kragen Sitaker <kr**********@airwave.com> writes:
> > ERROR: Cannot insert a duplicate key into unique index pg_class_relname_nsp_index
> > We've been getting this error in our application every once in a while
> > --- typically once an hour to once a day, although it varies over time.
>
> This seems to me to indicate a collision on name+schema of the
> temp table. Now that ought to be impossible :-(

Those were my first thoughts too :)

> --- you should get errors earlier than this if you were actually
> creating a duplicately named temp table, and the design for selecting
> nonconflicting temp schemas seems pretty bulletproof to me too.
Sure. We thought maybe we had two instances of the daemons running at
once, but we tried that, and couldn't make the error happen every time.

It's worth mentioning that the daemon will often run for hours before
dying with this error. Then, when it comes back up a few seconds later,
it's likely to fail again immediately, but it's even more likely to run
without a problem for hours more.
> > We started seeing this error on November 22, three days after we migrated
> > from Postgres 7.2.3 and 7.3.2 to 7.3.4. We still see the error with
> > 7.3.5, but not with 7.4.0.
>
> Hmm. I'm not aware of any 7.4 bug fix that would affect such a thing,
> so I wouldn't want to bet that 7.4 has really solved the issue.

I'm glad to know that.
> Digging in the 7.3.2-to-7.3.4 change logs, I see one potentially
> relevant change:
>
>     2003-02-06 20:33  tgl
>
>     * src/: backend/catalog/dependency.c, backend/catalog/namespace.c,
>       include/catalog/dependency.h (REL7_3_STABLE): Revise mechanism for
>       getting rid of temp tables at backend shutdown. Instead of
>       grovelling through pg_class to find them, make use of the handy
>       dandy dependency mechanism: just delete everything that depends on
>       our temp schema. Unlike the pg_class scan, the dependency
>       mechanism is smart enough to delete things in an order that doesn't
>       fall foul of any dependency restrictions. Fixes problem reported
>       by David Heggie: a temp table with a serial column may cause a
>       backend FATAL exit at shutdown time, if it chances to try to delete
>       the temp sequence first.
>
> Now this change also exists in 7.4, but perhaps it is malfunctioning in
> 7.3.*. Or maybe you just haven't stressed the 7.4 installation enough
> to reproduce the problem there --- what do you think are the odds of
> that?
It's possible. We've re-downgraded that development machine to 7.3.4
to experiment with other ways of solving the problem, and it looks like
our nightly backup script didn't work last night, so I can't inspect
the state of the database that didn't manifest the problems with 7.4.
It's possible it might have had less stuff in it :(

We'll run the experiment again. Should we try 7.3.3 too?
> Given that you're explicitly dropping the temp table before exit, it's
> not clear how a problem in this code could cause you grief anyway.

Well, it's possible the daemon could have gotten killed while it was
inside the transaction, followed shortly by a shutdown of postgres ---
a dozen times or more --- and during development, we frequently kill
the daemon so that it will restart with new code. For our application,
we shut down and restart Postgres every night because it seems to make
VACUUM FULL work better.
> But it's the only starting point I can see. You might try adding some
> monitoring code to see if you can detect temp tables being left around
> by exited backends.

Something like this?

foobar=> select count(*), relnamespace from pg_class group by relnamespace;
 count | relnamespace
-------+--------------
   106 |           11
    70 |           99
   147 |         2200
(3 rows)

foobar=> select oid, * from pg_namespace;
   oid   |  nspname   | nspowner | nspacl
---------+------------+----------+--------
      11 | pg_catalog |        1 | {=U}
      99 | pg_toast   |        1 | {=}
    2200 | public     |        1 | {=UC}
   16765 | pg_temp_1  |        1 |
   17593 | pg_temp_2  |        1 |
   17647 | pg_temp_15 |        1 |
   20278 | pg_temp_16 |        1 |
 1570284 | pg_temp_32 |        1 |
(8 rows)

I wonder why those old namespaces are left around? A new one shows up
whenever I kill and restart the daemon that creates the temporary tables.

We could run this code periodically to see when new namespaces pop up.
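
For that periodic check, a single join (a sketch) would show directly
whether any pg_class rows still belong to a pg_temp_* namespace, without
having to compare OIDs by hand:

```sql
-- Count catalog entries per temp namespace, by name. A nonzero count
-- for a pg_temp_N schema whose backend has exited would indicate a
-- temp table left behind.
SELECT n.nspname, count(*)
FROM pg_class c JOIN pg_namespace n ON c.relnamespace = n.oid
WHERE n.nspname LIKE 'pg_temp%'
GROUP BY n.nspname;
```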
> > Creating and destroying a temporary table with a lot of attributes every
> > second is causing other problems, too; the indices on the pg_attribute
> > table grow to hundreds of megabytes in size,
>
> Right. 7.4 should fix that though.

Great!

> > and for some reason, reindexing our system tables corrupts our database.
>
> That seems suspicious as well. What happens exactly? How did you get
> out of it??


I don't remember what happens exactly. One of us will try this again
this weekend or early next week to get more details. All my coworkers
can remember is that PostgreSQL complained about "something about
the heap".

We never deployed that code to any customer sites, so we recovered from
it by wiping the data directory on the development machines that we had
that problem on.
> > Do you have any suggestions for how to diagnose this unpredictable
> > performance?
>
> I think you have more urgent things to worry about. Like finding why it
> doesn't work reliably.


Well, if we didn't have the unpredictable performance, we wouldn't have
been creating the temporary table in the first place, which is the only
thing that exposed this problem for us. PostgreSQL otherwise works great,
rock-solidly reliably; we've lost data to disk failure, flaky hardware,
filesystem corruption due to power failure on ext2fs, and human error,
in hundreds of thousands of hours of production operation, but never
yet to Postgres.

BTW, we're using the 7.3.4 PGDG RPMs with an extra patch to add
pg_autovacuum.

Thank you very much.

-Kragen
--
Very clever, young man. But if you use turtles for RPC, you'll have a
very, very slow RPC system. -- Jacqueline Arnold


Nov 12 '05 #4

On Sat, Jan 10, 2004 at 11:20:11AM +1100, Martijn van Oosterhout wrote:
> Not really related to your problem, but given you're in a transaction, why
> do you need to lock anything? What's wrong with:
>
> The daemon that gets this error does the following every 15 seconds:
> - start a transaction
> - delete the contents of the other table
> - execute a complex and sometimes slow SELECT INTO query, creating a
>   temporary table
> - copy the contents of the temporary table into the other table
> - drop the temporary table (again, embarrassing, sorry)
> - commit
>
> Maybe I'm missing something?


We don't need to lock anything. We just thought we did. We'd observed
that accessing a table inside a transaction (at the default READ COMMITTED
isolation level) could show us records created by other transactions since
this transaction started (i.e. it doesn't guarantee repeatable reads),
even if we'd already accessed the table.

So, lacking a thorough understanding of section 12.2 (or transaction
isolation levels in general), we thought we might have to lock the table
to keep someone else from accessing it while it was partly empty.
We were wrong, but I didn't know that until this afternoon.
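
A concrete illustration of why the lock is unnecessary (a hypothetical
sketch, with made-up table names; `live_table` stands in for the other
table):

```sql
-- Session 1 (the daemon):
BEGIN;
DELETE FROM live_table;
-- Session 2, concurrently, at READ COMMITTED:
--   SELECT count(*) FROM live_table;
--   still returns the old count; the uncommitted DELETE is invisible.
INSERT INTO live_table SELECT * FROM scratch;
COMMIT;
-- Session 2, after the commit:
--   SELECT count(*) FROM live_table;
--   now returns the new count. No reader ever sees the table empty.
```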

Thank you very much for your help!

-Kragen


Nov 12 '05 #5

Tom Lane wrote:
> Hmm. I'm not aware of any 7.4 bug fix that would affect such a thing,
> so I wouldn't want to bet that 7.4 has really solved the issue.
>
> Digging in the 7.3.2-to-7.3.4 change logs, I see one potentially
> relevant change:

The only thing I can think of is the fix for splitting the first btree
page. We fixed that in 7.4. I remember it happened mostly on SMP
machines.

--
Bruce Momjian | http://candle.pha.pa.us
pg***@candle.pha.pa.us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073


Nov 12 '05 #6

Bruce Momjian <pg***@candle.pha.pa.us> writes:
> Tom Lane wrote:
> > Digging in the 7.3.2-to-7.3.4 change logs, I see one potentially
> > relevant change:
>
> The only thing I can think of is the fix for splitting the first btree
> page.


I paused on that too, but I don't see how it could apply, unless they
were dropping and rebuilding their database every few hours. Besides,
that bug is fixed in 7.3.5, which is still showing the problem.

regards, tom lane


Nov 12 '05 #7

Tom Lane wrote:
> Bruce Momjian <pg***@candle.pha.pa.us> writes:
> > Tom Lane wrote:
> > > Digging in the 7.3.2-to-7.3.4 change logs, I see one potentially
> > > relevant change:
> >
> > The only thing I can think of is the fix for splitting the first btree
> > page.
>
> I paused on that too, but I don't see how it could apply, unless they
> were dropping and rebuilding their database every few hours. Besides,
> that bug is fixed in 7.3.5, which is still showing the problem.


I didn't know we got that into 7.3.5 --- but now I remember: it wasn't
serious enough to force a new 7.3.X release on its own, though it did go
into 7.3.X CVS.



Nov 12 '05 #8

Kragen Sitaker <kr**********@airwave.com> writes:
> We'll run the experiment again. Should we try 7.3.3 too?

No, I don't think 7.3.3 is likely to behave differently from 7.3.4
as far as this goes. What would actually be interesting is whether
you can make 7.4 fail.

> Well, it's possible the daemon could have gotten killed while it was
> inside the transaction, followed shortly by a shutdown of postgres ---
> a dozen times or more --- and during development, we frequently kill
> the daemon so that it will restart with new code.

But you're seeing these errors in production, on a machine where you're
not doing that, no? In any case there is code in place to clean out
a temp schema of any pre-existing junk when a new backend starts to use
it ... perhaps there's a bug in that, but that code was not changed
since 7.3.2 ...

Another question: are you fairly confident that if the same bug had been
in 7.3.2, you would have found it? Were there any changes in your usage
patterns around the time you adopted 7.3.4?

> For our application, we shut down and restart Postgres every night
> because it seems to make VACUUM FULL work better.

[ itch... ] Let's not discuss the wisdom of that just now, but ...

> I wonder why those old namespaces are left around?

They're supposed to be; there's no point in deleting the pg_namespace
entry only to recreate it the next time someone needs it. The real
question is whether you see any tables belonging to those namespaces.
The count(*) query on pg_class looked like a fine way to watch that.

> BTW, we're using the 7.3.4 PGDG RPMs with an extra patch to add
> pg_autovacuum.

If you're not planning to go to 7.4 soon, you might want to think about
an update to 7.3.5, just on general principles.

regards, tom lane


Nov 12 '05 #9
