int8 primary keys still not using index without manual JDBC driverpatch (7.4RC1)

Craig O'Shannessy

Hi all,

Just thought I'd mention that I really think this problem needs to be
fixed. I

I'm patching the 7.4RC1 JDBC drivers as we speak due to this optimiser
bug, and it's the third time I've had to do this. I would think this bug
causes quite a lot of people to evaluate postgres and decide it has awful
primary key performance! I love postgres, and hate to think that this
could be happening.

template1=# explain select * from lineitem where lineitemid=26845437;
QUERY PLAN
--------------------------------------------------------------
Seq Scan on lineitem (cost=0.00..82685.91 rows=1 width=103)
Filter: (lineitemid = 26845437)
(2 rows)

template1=# explain select * from lineitem where lineitemid=26845437::int8;
QUERY PLAN
--------------------------------------------------------------------------------
Index Scan using lineitem_pkey on lineitem (cost=0.00..3.53 rows=1 width=103)
Index Cond: (lineitemid = 26845437::bigint)
(2 rows)

I've noticed this is in the TODO :
Allow SELECT * FROM tab WHERE int2col = 4 to use int2col index, int8,
float4, numeric/decimal too [optimizer])

Too hard to fix before 7.4 final?

Regards,

Craig

---------------------------(end of broadcast)---------------------------
TIP 6: Have you searched our list archives?

http://archives.postgresql.org

Nov 12 '05 #1

Subscribe Post Reply

2216

Martijn van Oosterhout

Any particular reason you can't just put the value in quotes and let
postgres determine the type?

On Fri, Nov 07, 2003 at 10:43:05AM +1100, Craig O'Shannessy wrote:

Hi all,

Just thought I'd mention that I really think this problem needs to be
fixed. I

I'm patching the 7.4RC1 JDBC drivers as we speak due to this optimiser
bug, and it's the third time I've had to do this. I would think this bug
causes quite a lot of people to evaluate postgres and decide it has awful
primary key performance! I love postgres, and hate to think that this
could be happening.

template1=# explain select * from lineitem where lineitemid=26845437;
QUERY PLAN
--------------------------------------------------------------
Seq Scan on lineitem (cost=0.00..82685.91 rows=1 width=103)
Filter: (lineitemid = 26845437)
(2 rows)

template1=# explain select * from lineitem where lineitemid=26845437::int8;
QUERY PLAN
--------------------------------------------------------------------------------
Index Scan using lineitem_pkey on lineitem (cost=0.00..3.53 rows=1 width=103)
Index Cond: (lineitemid = 26845437::bigint)
(2 rows)

I've noticed this is in the TODO :
Allow SELECT * FROM tab WHERE int2col = 4 to use int2col index, int8,
float4, numeric/decimal too [optimizer])

Too hard to fix before 7.4 final?

Regards,

Craig

---------------------------(end of broadcast)---------------------------
TIP 6: Have you searched our list archives?

http://archives.postgresql.org
--
Martijn van Oosterhout <kl*****@svana.org> http://svana.org/kleptog/ "All that is needed for the forces of evil to triumph is for enough good
men to do nothing." - Edmond Burke
"The penalty good people pay for not being interested in politics is to be
governed by people worse than themselves." - Plato

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.0.6 (GNU/Linux)
Comment: For info see http://www.gnupg.org

iD8DBQE/qusNY5Twig3Ge+YRAuaRAJ9/fMV3pczwTcGOXN0Ci61Q4o39awCgj3Gl
y8mpmQesRJc+aBO4E8GXUf4=
=Sjpt
-----END PGP SIGNATURE-----

Nov 12 '05 #2

Craig O'Shannessy

I'm using EJB CMP (Enterprise Java Beans, Container Managed Persistence),
so the SQL is generated. I would think this is a common usage of
PostgreSQL, as a database for a modern EJB container. There are options
for fixing this (not including fixing postgres itself), IMHO the best is
patching the JDBC PreparedStatement code so that setLong() adds '::int8'.
The advantage here is that you can use hand coded prepared statements, as
well as auto CMP ones, and both will get the proper cast.

The real problem is that PostgreSQL out of the box is not really usable
for CMP! This really isn't good, and I'm always suprised that it's not
fixed. It was very luck we found the bug on the website when we were
evaluating PostgreSQL against Oracle, it wasn't easy to track down or fix,
and it causes truly horrible performance problems.

I spose you'd call it my pet peeve.

Craig
On Fri, 7 Nov 2003, Martijn van Oosterhout wrote:

Any particular reason you can't just put the value in quotes and let
postgres determine the type?

On Fri, Nov 07, 2003 at 10:43:05AM +1100, Craig O'Shannessy wrote:
Hi all,

Just thought I'd mention that I really think this problem needs to be
fixed. I

I'm patching the 7.4RC1 JDBC drivers as we speak due to this optimiser
bug, and it's the third time I've had to do this. I would think this bug
causes quite a lot of people to evaluate postgres and decide it has awful
primary key performance! I love postgres, and hate to think that this
could be happening.

template1=# explain select * from lineitem where lineitemid=26845437;
QUERY PLAN
--------------------------------------------------------------
Seq Scan on lineitem (cost=0.00..82685.91 rows=1 width=103)
Filter: (lineitemid = 26845437)
(2 rows)

template1=# explain select * from lineitem where lineitemid=26845437::int8;
QUERY PLAN
--------------------------------------------------------------------------------
Index Scan using lineitem_pkey on lineitem (cost=0.00..3.53 rows=1 width=103)
Index Cond: (lineitemid = 26845437::bigint)
(2 rows)

I've noticed this is in the TODO :
Allow SELECT * FROM tab WHERE int2col = 4 to use int2col index, int8,
float4, numeric/decimal too [optimizer])

Too hard to fix before 7.4 final?

Regards,

Craig

---------------------------(end of broadcast)---------------------------
TIP 6: Have you searched our list archives?

http://archives.postgresql.org

---------------------------(end of broadcast)---------------------------
TIP 4: Don't 'kill -9' the postmaster

Nov 12 '05 #3

Tino Wildenhain

Hi Craig,

Craig O'Shannessy schrieb:

I'm using EJB CMP (Enterprise Java Beans, Container Managed Persistence),
so the SQL is generated. I would think this is a common usage of
PostgreSQL, as a database for a modern EJB container. There are options
for fixing this (not including fixing postgres itself), IMHO the best is
patching the JDBC PreparedStatement code so that setLong() adds '::int8'.
The advantage here is that you can use hand coded prepared statements, as
well as auto CMP ones, and both will get the proper cast.

The real problem is that PostgreSQL out of the box is not really usable
for CMP! This really isn't good, and I'm always suprised that it's not
fixed. It was very luck we found the bug on the website when we were
evaluating PostgreSQL against Oracle, it wasn't easy to track down or fix,
and it causes truly horrible performance problems.

I spose you'd call it my pet peeve.

I agree with you wholeheartly - it also bothers me why
postgresql can cast [0-9]+ to int4, but only
'[0-9]+' to int8 or int2, I really cannot see the
difference.

Any ideas where we have to look for the place to patch?

Regards
Tino
---------------------------(end of broadcast)---------------------------
TIP 9: the planner will ignore your desire to choose an index scan if your
joining column's datatypes do not match

Nov 12 '05 #4

Bruce Momjian

Tino Wildenhain wrote:

Hi Craig,

Craig O'Shannessy schrieb:
I'm using EJB CMP (Enterprise Java Beans, Container Managed Persistence),
so the SQL is generated. I would think this is a common usage of
PostgreSQL, as a database for a modern EJB container. There are options
for fixing this (not including fixing postgres itself), IMHO the best is
patching the JDBC PreparedStatement code so that setLong() adds '::int8'.
The advantage here is that you can use hand coded prepared statements, as
well as auto CMP ones, and both will get the proper cast.

The real problem is that PostgreSQL out of the box is not really usable
for CMP! This really isn't good, and I'm always suprised that it's not
fixed. It was very luck we found the bug on the website when we were
evaluating PostgreSQL against Oracle, it wasn't easy to track down or fix,
and it causes truly horrible performance problems.

I spose you'd call it my pet peeve.

I agree with you wholeheartly - it also bothers me why
postgresql can cast [0-9]+ to int4, but only
'[0-9]+' to int8 or int2, I really cannot see the
difference.

OK, the issue for int8, I think, is that when the lexer/scanner is
looking for tokens, it doesn't know how the token is going to be used,
so it doesn't know if [0-9]+ should be considered an int4 or int8 so it
makes it an int4. When it is a string, there is no type-casting done in
the lexer because it is just a string.

What I don't understand is why we can't just upgrade an int4 to int8 if
the value is greater than an int4, and why we can't just convert it
inside to int8 as needed. I am sure there is a reason, but I can't
remember it.

--
Bruce Momjian | http://candle.pha.pa.us
pg***@candle.pha.pa.us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073

---------------------------(end of broadcast)---------------------------
TIP 8: explain analyze is your friend

Nov 12 '05 #5

Tom Lane

Tino Wildenhain <ti**@wildenhain.de> writes:

I agree with you wholeheartly - it also bothers me why
postgresql can cast [0-9]+ to int4, but only
'[0-9]+' to int8 or int2, I really cannot see the
difference. Any ideas where we have to look for the place to patch?

Try reading the thousands of lines of discussion of this problem that
exist in the last several years of the pgsql-hackers archives.
Basically, we have found no solution that doesn't have side-effects
worse than what it fixes. Here's one recent example of a possible
solution that crashed and burned on takeoff:
http://archives.postgresql.org/pgsql...1/msg00468.php

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 1: subscribe and unsubscribe commands go to ma*******@postgresql.org

Nov 12 '05 #6

Bruce Momjian

Tom Lane wrote:

Tino Wildenhain <ti**@wildenhain.de> writes:
I agree with you wholeheartly - it also bothers me why
postgresql can cast [0-9]+ to int4, but only
'[0-9]+' to int8 or int2, I really cannot see the
difference.

Any ideas where we have to look for the place to patch?

Try reading the thousands of lines of discussion of this problem that
exist in the last several years of the pgsql-hackers archives.
Basically, we have found no solution that doesn't have side-effects
worse than what it fixes. Here's one recent example of a possible
solution that crashed and burned on takeoff:
http://archives.postgresql.org/pgsql...1/msg00468.php

I read that URL and it covered the main stuff. This part was
particularly interesting:

I am wondering about adding some notion of "conversion distance"
associated with casts, and preferring choices that require a smaller
conversion distance; perhaps this could replace the concept of
"preferred type", too. But again I don't have a specific proposal to
make. Any thoughts?

The test case that was actually in the regression tests was

select to_hex(256*256*256 - 1) AS "ffffff";
! ERROR: Function to_hex(smallint) does not exist
! Unable to identify a function that satisfies the given argument types
! You may need to add explicit typecasts

Even had the parser resolved the overloaded to_hex call, this test would
have failed, because int2 multiplication overflows:

regression=# select 256::int2*256::int2*256::int2;
?column?
----------
0
(1 row)

I am thinking that it might be good to eliminate all the basic
arithmetic operators on int2, so that you get int4 as the minimum
width for arithmetic. But this cannot work unless we have some concept
like conversion distance, or the parser will just fail to make a choice
between int4, int8, etc alternatives.

I think your idea of conversion distance is required for any working
solution --- without it, things seem impossible --- you have an int4
value and have function for int2 and int8 --- you need to lean in a
specific direction and can't just give up.

We can try removing most int2 functions and see if that makes such
conversions much easier, or try casting incoming constants to int2 to
see what happens. I see float4/float8/numeric as similar, though that
has precision issues that I am not sure how to address --- when can you
tell if the user wants full precision?

Sorry I haven't focused on this issue but I am ready to do so if I can
be of help. How can we set up some tests of these ideas?

--
Bruce Momjian | http://candle.pha.pa.us
pg***@candle.pha.pa.us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073

---------------------------(end of broadcast)---------------------------
TIP 9: the planner will ignore your desire to choose an index scan if your
joining column's datatypes do not match

Nov 12 '05 #7

Tom Lane

Bruce Momjian <pg***@candle.pha.pa.us> writes:

We can try removing most int2 functions and see if that makes such
conversions much easier, or try casting incoming constants to int2 to
see what happens.

I already did that --- that was exactly the substance of the tests I was
reporting in that message.

I have been thinking lately that the fundamental approach is wrong
anyway. Basically the idea was to make the world safe for
single-datatype index handling by removing all the cross-type comparison
operators. The reason

SELECT ... WHERE int8col = 42

isn't indexable is that the = operator is int84eq, which is not to be
found in the set of operators associated with an index on int8. What
we were thinking was that if we didn't have int84eq then the parser
would be forced to promote the 42 to int8, and then the comparison using
int8eq would be recognized as indexable.

I think this might actually be workable for int8, but it's not going to
work for int2 without changing the initial typing of small integer
constants, and we already know that that opens a Pandora's box of other
problems.

But quite aside from the semantic difficulties of rejiggering all that
stuff, it's going to break other parts of the optimizer if we do it.
In particular it will interfere with handling of mergejoins and
recognizing transitive equality. For example consider

SELECT ... WHERE a.int8col = b.int4col AND b.int4col = 42;

Currently we are able to deduce a.int8col = 42 (where the operator
is int84eq). If we remove int84eq then the output of the parser for
this example will look like

SELECT ... WHERE a.int8col = b.int4col::int8 AND b.int4col = 42;

and the transitive equality will not be recognized because
b.int4col::int8 is not the same expression as b.int4col.

So I'm currently thinking we'd be better off not to try to eliminate
the cross-type comparison operators. Instead we need some solution
that is narrowly focused on the problem of making a non-indexable
comparison indexable, by converting a comparison value of the wrong
datatype into the right datatype locally to the indexscan plan
generation code. I posted some speculation about that here:
http://archives.postgresql.org/pgsql...9/msg00983.php

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 8: explain analyze is your friend

Nov 12 '05 #8

Tom Lane

I said:

So I'm currently thinking we'd be better off not to try to eliminate
the cross-type comparison operators. Instead we need some solution
that is narrowly focused on the problem of making a non-indexable
comparison indexable, by converting a comparison value of the wrong
datatype into the right datatype locally to the indexscan plan
generation code.

BTW, plan C would be to attack the problem head-on by allowing index
opclasses to include cross-datatype operators. This might be the
cleanest solution in the long run, but it seems likely to be a lot of
work and could force us to break existing user-defined operator classes.
I think everyone has shied away from that without much thought, but
in principle at least we could probably do it. (Say, extend pg_amop
and pg_amproc so that the datatype of the other operand becomes part
of the key.)

We are now well outside the charter of pgsql-general, so please redirect
any followup discussion to pgsql-hackers ...

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 7: don't forget to increase your free space map settings

Nov 12 '05 #9

Bruce Momjian

Tom Lane wrote:

Bruce Momjian <pg***@candle.pha.pa.us> writes:
We can try removing most int2 functions and see if that makes such
conversions much easier, or try casting incoming constants to int2 to
see what happens.
I already did that --- that was exactly the substance of the tests I was
reporting in that message.

I have been thinking lately that the fundamental approach is wrong
anyway. Basically the idea was to make the world safe for
single-datatype index handling by removing all the cross-type comparison
operators. The reason

SELECT ... WHERE int8col = 42

isn't indexable is that the = operator is int84eq, which is not to be
found in the set of operators associated with an index on int8. What
we were thinking was that if we didn't have int84eq then the parser
would be forced to promote the 42 to int8, and then the comparison using
int8eq would be recognized as indexable.

Could we not always promote int4 to int8 for indexing purposes? I
realize OID has issues, though, as you mention in that URL:

This algorithm was wrong on both practical and theoretical levels;
in the first place it's not very helpful to only be able to handle
binary-compatible transformations, and in the second place there isn't
any good guarantee that it's not changing the semantics when it replaces
the operator. For instance int4 < and oid < do not act the same.
Depending on equality of operator names was a bad idea even then, and
would be quite unworkable now in the world of schema search paths.

Could we just promote int4 constants to int8 always? I just checked and
2^32-1 is already promoted to int8:

select 4294967295;

so the funny thing is that:

SELECT ... WHERE int8col = 4294967296;

already uses the int8 index.

I think the complex case you mentioned was oid. Let's look at the
various possible constant comparisons against an oid column:

-1 This constant would match no oid, so we could just allow the
sequential scan. I don't think anyone would expect good behavior for
such a comparison.
1 This could be promoted to oid cleanly.

2^31+1 This will come in a int8, so we can just downcast to oid
automatically.

I know your case was "<" comparison. It would be:

SELECT ... WHERE oidcol < -1;
SELECT ... WHERE oidcol < 1;
SELECT ... WHERE oidcol < 2147483649; 2^31+1

These all seem to work, I think. -1 will not use an index, which is OK.

I am concerned about having to add catalog maintenance for every index
case, which seems it could be a lot.

Here is my logic. I am having trouble getting the big picture on this:

int2 fits in the int4 range
int4 fits in the int8 range

oid fits only in the int8 range, some oids fit in int4

This means a valid oid could come in as int4 or int8.

I realize this requires hard-coded comparisons to C include defines to
get the maxium for each type. I know this breaks our type-neutral
style, but in this case, it seems it might be the cleanest way ---
abstracting this out into a table seems too hard.

Now for int2-based indexes. Can't we just downcast constants to int2 if
they fit in the int2 valid range?
I think this might actually be workable for int8, but it's not going to
work for int2 without changing the initial typing of small integer
constants, and we already know that that opens a Pandora's box of other
problems.

But quite aside from the semantic difficulties of rejiggering all that
stuff, it's going to break other parts of the optimizer if we do it.
In particular it will interfere with handling of mergejoins and
recognizing transitive equality. For example consider

SELECT ... WHERE a.int8col = b.int4col AND b.int4col = 42;

Currently we are able to deduce a.int8col = 42 (where the operator
is int84eq). If we remove int84eq then the output of the parser for
this example will look like

SELECT ... WHERE a.int8col = b.int4col::int8 AND b.int4col = 42;

and the transitive equality will not be recognized because
b.int4col::int8 is not the same expression as b.int4col.
Seems we should keep those cross-type comparisons around for col op col
comparisons, at least, as well as internal optimizer use as you
described.
So I'm currently thinking we'd be better off not to try to eliminate
the cross-type comparison operators. Instead we need some solution
that is narrowly focused on the problem of making a non-indexable
comparison indexable, by converting a comparison value of the wrong
datatype into the right datatype locally to the indexscan plan
generation code. I posted some speculation about that here:
http://archives.postgresql.org/pgsql...9/msg00983.php

Agreed.

--
Bruce Momjian | http://candle.pha.pa.us
pg***@candle.pha.pa.us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073

---------------------------(end of broadcast)---------------------------
TIP 4: Don't 'kill -9' the postmaster

Nov 12 '05 #10

Tom Lane

Bruce Momjian <pg***@candle.pha.pa.us> writes:

Here is my logic. I am having trouble getting the big picture on this:

Th big picture is that it doesn't work very well to assume that indexes
only need to handle same-datatype comparisons. I think we are
ultimately going to have to address that issue more-or-less directly.

At least for btree indexes, it doesn't seem that the index mechanics
would have too big a problem with this --- as long as you are using the
proper comparison function, who cares whether the righthand side of the
comparison is the same datatype as the left? It might be harder for
other index types, but 99.9% of the problem is with btrees anyway.
I think if we implemented it only for btrees we'd still have a usable
solution.

I'm currently digging around to see how much of the rest of the backend
really cares about it ...

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 1: subscribe and unsubscribe commands go to ma*******@postgresql.org

Nov 12 '05 #11

Jan Wieck

Tom Lane wrote:

Bruce Momjian <pg***@candle.pha.pa.us> writes:
Here is my logic. I am having trouble getting the big picture on this:

Th big picture is that it doesn't work very well to assume that indexes
only need to handle same-datatype comparisons. I think we are
ultimately going to have to address that issue more-or-less directly.

At least for btree indexes, it doesn't seem that the index mechanics
would have too big a problem with this --- as long as you are using the
proper comparison function, who cares whether the righthand side of the
comparison is the same datatype as the left? It might be harder for
other index types, but 99.9% of the problem is with btrees anyway.
I think if we implemented it only for btrees we'd still have a usable
solution.

I'm currently digging around to see how much of the rest of the backend
really cares about it ...

Wouldn't that logically lead to an "abstract" operator class to be
pointed to in the original indexes operator class?

In the concrete example, int8, a btree index is using int8_ops for
opcamid 403. This operator class is specifically set up with 5 operators
designed for int8 to int8 comparision. Assume we create a pseudo
operator class that only tells the operator names "<, <=, =, >= and >",
but does not already resolve them to the operators and thus the
comparision functions and allow int8_ops to be substituted with this
pseudo operator class. In the case the planner sees a type mismatch in
the comparision, and the index operator class points to our pseudo
opclass, then it could try "on the fly" to construct the operator class
if it finds cross type operators/functions for all 5 needed operators.

Not that I know much about the planner, more asking.
Jan

--
#================================================= =====================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me. #
#================================================= = Ja******@Yahoo.com #
---------------------------(end of broadcast)---------------------------
TIP 7: don't forget to increase your free space map settings

Nov 12 '05 #12

Tom Lane

Jan Wieck <Ja******@Yahoo.com> writes:

Tom Lane wrote:
Th big picture is that it doesn't work very well to assume that indexes
only need to handle same-datatype comparisons. I think we are
ultimately going to have to address that issue more-or-less directly.
Wouldn't that logically lead to an "abstract" operator class to be
pointed to in the original indexes operator class?

I've just posted a proposal in pgsql-hackers that attacks the problem
a little differently: put the cross-type operators directly into the
opclass.

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to ma*******@postgresql.org so that your
message can get through to the mailing list cleanly

Nov 12 '05 #13

Jan Wieck

Tom Lane wrote:

Jan Wieck <Ja******@Yahoo.com> writes:
Tom Lane wrote:
Th big picture is that it doesn't work very well to assume that indexes
only need to handle same-datatype comparisons. I think we are
ultimately going to have to address that issue more-or-less directly.

Wouldn't that logically lead to an "abstract" operator class to be
pointed to in the original indexes operator class?

I've just posted a proposal in pgsql-hackers that attacks the problem
a little differently: put the cross-type operators directly into the
opclass.

I like that approach even better than mine. It needs less work during
the actual planning.
Jan

--
#================================================= =====================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me. #
#================================================= = Ja******@Yahoo.com #
---------------------------(end of broadcast)---------------------------
TIP 7: don't forget to increase your free space map settings

Nov 12 '05 #14

Tom Lane

Jan Wieck <Ja******@Yahoo.com> writes:

I like that approach even better than mine. It needs less work during
the actual planning.

Right. I don't think I will need to touch the planner at all, except
for recording the operator strategy numbers in indexscan plan nodes.

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 8: explain analyze is your friend

Nov 12 '05 #15

int8 primary keys still not using index without manual JDBC driverpatch (7.4RC1)

Similar topics