473,739 Members | 5,405 Online
Bytes | Software Development & Data Engineering Community
+ Post

Home Posts Topics Members FAQ

int8 primary keys still not using index without manual JDBC driverpatch (7.4RC1)

Hi all,

Just thought I'd mention that I really think this problem needs to be
fixed. I

I'm patching the 7.4RC1 JDBC drivers as we speak due to this optimiser
bug, and it's the third time I've had to do this. I would think this bug
causes quite a lot of people to evaluate postgres and decide it has awful
primary key performance! I love postgres, and hate to think that this
could be happening.

template1=# explain select * from lineitem where lineitemid=2684 5437;
QUERY PLAN
--------------------------------------------------------------
Seq Scan on lineitem (cost=0.00..826 85.91 rows=1 width=103)
Filter: (lineitemid = 26845437)
(2 rows)

template1=# explain select * from lineitem where lineitemid=2684 5437::int8;
QUERY PLAN
--------------------------------------------------------------------------------
Index Scan using lineitem_pkey on lineitem (cost=0.00..3.5 3 rows=1 width=103)
Index Cond: (lineitemid = 26845437::bigin t)
(2 rows)

I've noticed this is in the TODO :
Allow SELECT * FROM tab WHERE int2col = 4 to use int2col index, int8,
float4, numeric/decimal too [optimizer])

Too hard to fix before 7.4 final?

Regards,

Craig

---------------------------(end of broadcast)---------------------------
TIP 6: Have you searched our list archives?

http://archives.postgresql.org

Nov 12 '05 #1
14 2242
Any particular reason you can't just put the value in quotes and let
postgres determine the type?

On Fri, Nov 07, 2003 at 10:43:05AM +1100, Craig O'Shannessy wrote:
Hi all,

Just thought I'd mention that I really think this problem needs to be
fixed. I

I'm patching the 7.4RC1 JDBC drivers as we speak due to this optimiser
bug, and it's the third time I've had to do this. I would think this bug
causes quite a lot of people to evaluate postgres and decide it has awful
primary key performance! I love postgres, and hate to think that this
could be happening.

template1=# explain select * from lineitem where lineitemid=2684 5437;
QUERY PLAN
--------------------------------------------------------------
Seq Scan on lineitem (cost=0.00..826 85.91 rows=1 width=103)
Filter: (lineitemid = 26845437)
(2 rows)

template1=# explain select * from lineitem where lineitemid=2684 5437::int8;
QUERY PLAN
--------------------------------------------------------------------------------
Index Scan using lineitem_pkey on lineitem (cost=0.00..3.5 3 rows=1 width=103)
Index Cond: (lineitemid = 26845437::bigin t)
(2 rows)

I've noticed this is in the TODO :
Allow SELECT * FROM tab WHERE int2col = 4 to use int2col index, int8,
float4, numeric/decimal too [optimizer])

Too hard to fix before 7.4 final?

Regards,

Craig



---------------------------(end of broadcast)---------------------------
TIP 6: Have you searched our list archives?

http://archives.postgresql.org
--
Martijn van Oosterhout <kl*****@svana. org> http://svana.org/kleptog/ "All that is needed for the forces of evil to triumph is for enough good
men to do nothing." - Edmond Burke
"The penalty good people pay for not being interested in politics is to be
governed by people worse than themselves." - Plato


-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.0.6 (GNU/Linux)
Comment: For info see http://www.gnupg.org

iD8DBQE/qusNY5Twig3Ge+Y RAuaRAJ9/fMV3pczwTcGOXN0 Ci61Q4o39awCgj3 Gl
y8mpmQesRJc+aBO 4E8GXUf4=
=Sjpt
-----END PGP SIGNATURE-----

Nov 12 '05 #2
I'm using EJB CMP (Enterprise Java Beans, Container Managed Persistence),
so the SQL is generated. I would think this is a common usage of
PostgreSQL, as a database for a modern EJB container. There are options
for fixing this (not including fixing postgres itself), IMHO the best is
patching the JDBC PreparedStateme nt code so that setLong() adds '::int8'.
The advantage here is that you can use hand coded prepared statements, as
well as auto CMP ones, and both will get the proper cast.

The real problem is that PostgreSQL out of the box is not really usable
for CMP! This really isn't good, and I'm always suprised that it's not
fixed. It was very luck we found the bug on the website when we were
evaluating PostgreSQL against Oracle, it wasn't easy to track down or fix,
and it causes truly horrible performance problems.

I spose you'd call it my pet peeve.

Craig
On Fri, 7 Nov 2003, Martijn van Oosterhout wrote:
Any particular reason you can't just put the value in quotes and let
postgres determine the type?

On Fri, Nov 07, 2003 at 10:43:05AM +1100, Craig O'Shannessy wrote:
Hi all,

Just thought I'd mention that I really think this problem needs to be
fixed. I

I'm patching the 7.4RC1 JDBC drivers as we speak due to this optimiser
bug, and it's the third time I've had to do this. I would think this bug
causes quite a lot of people to evaluate postgres and decide it has awful
primary key performance! I love postgres, and hate to think that this
could be happening.

template1=# explain select * from lineitem where lineitemid=2684 5437;
QUERY PLAN
--------------------------------------------------------------
Seq Scan on lineitem (cost=0.00..826 85.91 rows=1 width=103)
Filter: (lineitemid = 26845437)
(2 rows)

template1=# explain select * from lineitem where lineitemid=2684 5437::int8;
QUERY PLAN
--------------------------------------------------------------------------------
Index Scan using lineitem_pkey on lineitem (cost=0.00..3.5 3 rows=1 width=103)
Index Cond: (lineitemid = 26845437::bigin t)
(2 rows)

I've noticed this is in the TODO :
Allow SELECT * FROM tab WHERE int2col = 4 to use int2col index, int8,
float4, numeric/decimal too [optimizer])

Too hard to fix before 7.4 final?

Regards,

Craig

---------------------------(end of broadcast)---------------------------
TIP 6: Have you searched our list archives?

http://archives.postgresql.org


---------------------------(end of broadcast)---------------------------
TIP 4: Don't 'kill -9' the postmaster

Nov 12 '05 #3
Hi Craig,

Craig O'Shannessy schrieb:
I'm using EJB CMP (Enterprise Java Beans, Container Managed Persistence),
so the SQL is generated. I would think this is a common usage of
PostgreSQL, as a database for a modern EJB container. There are options
for fixing this (not including fixing postgres itself), IMHO the best is
patching the JDBC PreparedStateme nt code so that setLong() adds '::int8'.
The advantage here is that you can use hand coded prepared statements, as
well as auto CMP ones, and both will get the proper cast.

The real problem is that PostgreSQL out of the box is not really usable
for CMP! This really isn't good, and I'm always suprised that it's not
fixed. It was very luck we found the bug on the website when we were
evaluating PostgreSQL against Oracle, it wasn't easy to track down or fix,
and it causes truly horrible performance problems.

I spose you'd call it my pet peeve.

I agree with you wholeheartly - it also bothers me why
postgresql can cast [0-9]+ to int4, but only
'[0-9]+' to int8 or int2, I really cannot see the
difference.

Any ideas where we have to look for the place to patch?

Regards
Tino
---------------------------(end of broadcast)---------------------------
TIP 9: the planner will ignore your desire to choose an index scan if your
joining column's datatypes do not match

Nov 12 '05 #4
Tino Wildenhain wrote:
Hi Craig,

Craig O'Shannessy schrieb:
I'm using EJB CMP (Enterprise Java Beans, Container Managed Persistence),
so the SQL is generated. I would think this is a common usage of
PostgreSQL, as a database for a modern EJB container. There are options
for fixing this (not including fixing postgres itself), IMHO the best is
patching the JDBC PreparedStateme nt code so that setLong() adds '::int8'.
The advantage here is that you can use hand coded prepared statements, as
well as auto CMP ones, and both will get the proper cast.

The real problem is that PostgreSQL out of the box is not really usable
for CMP! This really isn't good, and I'm always suprised that it's not
fixed. It was very luck we found the bug on the website when we were
evaluating PostgreSQL against Oracle, it wasn't easy to track down or fix,
and it causes truly horrible performance problems.

I spose you'd call it my pet peeve.

I agree with you wholeheartly - it also bothers me why
postgresql can cast [0-9]+ to int4, but only
'[0-9]+' to int8 or int2, I really cannot see the
difference.


OK, the issue for int8, I think, is that when the lexer/scanner is
looking for tokens, it doesn't know how the token is going to be used,
so it doesn't know if [0-9]+ should be considered an int4 or int8 so it
makes it an int4. When it is a string, there is no type-casting done in
the lexer because it is just a string.

What I don't understand is why we can't just upgrade an int4 to int8 if
the value is greater than an int4, and why we can't just convert it
inside to int8 as needed. I am sure there is a reason, but I can't
remember it.

--
Bruce Momjian | http://candle.pha.pa.us
pg***@candle.ph a.pa.us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073

---------------------------(end of broadcast)---------------------------
TIP 8: explain analyze is your friend

Nov 12 '05 #5
Tino Wildenhain <ti**@wildenhai n.de> writes:
I agree with you wholeheartly - it also bothers me why
postgresql can cast [0-9]+ to int4, but only
'[0-9]+' to int8 or int2, I really cannot see the
difference. Any ideas where we have to look for the place to patch?


Try reading the thousands of lines of discussion of this problem that
exist in the last several years of the pgsql-hackers archives.
Basically, we have found no solution that doesn't have side-effects
worse than what it fixes. Here's one recent example of a possible
solution that crashed and burned on takeoff:
http://archives.postgresql.org/pgsql...1/msg00468.php

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 1: subscribe and unsubscribe commands go to ma*******@postg resql.org

Nov 12 '05 #6
Tom Lane wrote:
Tino Wildenhain <ti**@wildenhai n.de> writes:
I agree with you wholeheartly - it also bothers me why
postgresql can cast [0-9]+ to int4, but only
'[0-9]+' to int8 or int2, I really cannot see the
difference.

Any ideas where we have to look for the place to patch?


Try reading the thousands of lines of discussion of this problem that
exist in the last several years of the pgsql-hackers archives.
Basically, we have found no solution that doesn't have side-effects
worse than what it fixes. Here's one recent example of a possible
solution that crashed and burned on takeoff:
http://archives.postgresql.org/pgsql...1/msg00468.php


I read that URL and it covered the main stuff. This part was
particularly interesting:

I am wondering about adding some notion of "conversion distance"
associated with casts, and preferring choices that require a smaller
conversion distance; perhaps this could replace the concept of
"preferred type", too. But again I don't have a specific proposal to
make. Any thoughts?

The test case that was actually in the regression tests was

select to_hex(256*256* 256 - 1) AS "ffffff";
! ERROR: Function to_hex(smallint ) does not exist
! Unable to identify a function that satisfies the given argument types
! You may need to add explicit typecasts

Even had the parser resolved the overloaded to_hex call, this test would
have failed, because int2 multiplication overflows:

regression=# select 256::int2*256:: int2*256::int2;
?column?
----------
0
(1 row)

I am thinking that it might be good to eliminate all the basic
arithmetic operators on int2, so that you get int4 as the minimum
width for arithmetic. But this cannot work unless we have some concept
like conversion distance, or the parser will just fail to make a choice
between int4, int8, etc alternatives.

I think your idea of conversion distance is required for any working
solution --- without it, things seem impossible --- you have an int4
value and have function for int2 and int8 --- you need to lean in a
specific direction and can't just give up.

We can try removing most int2 functions and see if that makes such
conversions much easier, or try casting incoming constants to int2 to
see what happens. I see float4/float8/numeric as similar, though that
has precision issues that I am not sure how to address --- when can you
tell if the user wants full precision?

Sorry I haven't focused on this issue but I am ready to do so if I can
be of help. How can we set up some tests of these ideas?

--
Bruce Momjian | http://candle.pha.pa.us
pg***@candle.ph a.pa.us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073

---------------------------(end of broadcast)---------------------------
TIP 9: the planner will ignore your desire to choose an index scan if your
joining column's datatypes do not match

Nov 12 '05 #7
Bruce Momjian <pg***@candle.p ha.pa.us> writes:
We can try removing most int2 functions and see if that makes such
conversions much easier, or try casting incoming constants to int2 to
see what happens.


I already did that --- that was exactly the substance of the tests I was
reporting in that message.

I have been thinking lately that the fundamental approach is wrong
anyway. Basically the idea was to make the world safe for
single-datatype index handling by removing all the cross-type comparison
operators. The reason

SELECT ... WHERE int8col = 42

isn't indexable is that the = operator is int84eq, which is not to be
found in the set of operators associated with an index on int8. What
we were thinking was that if we didn't have int84eq then the parser
would be forced to promote the 42 to int8, and then the comparison using
int8eq would be recognized as indexable.

I think this might actually be workable for int8, but it's not going to
work for int2 without changing the initial typing of small integer
constants, and we already know that that opens a Pandora's box of other
problems.

But quite aside from the semantic difficulties of rejiggering all that
stuff, it's going to break other parts of the optimizer if we do it.
In particular it will interfere with handling of mergejoins and
recognizing transitive equality. For example consider

SELECT ... WHERE a.int8col = b.int4col AND b.int4col = 42;

Currently we are able to deduce a.int8col = 42 (where the operator
is int84eq). If we remove int84eq then the output of the parser for
this example will look like

SELECT ... WHERE a.int8col = b.int4col::int8 AND b.int4col = 42;

and the transitive equality will not be recognized because
b.int4col::int8 is not the same expression as b.int4col.

So I'm currently thinking we'd be better off not to try to eliminate
the cross-type comparison operators. Instead we need some solution
that is narrowly focused on the problem of making a non-indexable
comparison indexable, by converting a comparison value of the wrong
datatype into the right datatype locally to the indexscan plan
generation code. I posted some speculation about that here:
http://archives.postgresql.org/pgsql...9/msg00983.php

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 8: explain analyze is your friend

Nov 12 '05 #8
I said:
So I'm currently thinking we'd be better off not to try to eliminate
the cross-type comparison operators. Instead we need some solution
that is narrowly focused on the problem of making a non-indexable
comparison indexable, by converting a comparison value of the wrong
datatype into the right datatype locally to the indexscan plan
generation code.


BTW, plan C would be to attack the problem head-on by allowing index
opclasses to include cross-datatype operators. This might be the
cleanest solution in the long run, but it seems likely to be a lot of
work and could force us to break existing user-defined operator classes.
I think everyone has shied away from that without much thought, but
in principle at least we could probably do it. (Say, extend pg_amop
and pg_amproc so that the datatype of the other operand becomes part
of the key.)

We are now well outside the charter of pgsql-general, so please redirect
any followup discussion to pgsql-hackers ...

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 7: don't forget to increase your free space map settings

Nov 12 '05 #9
Tom Lane wrote:
Bruce Momjian <pg***@candle.p ha.pa.us> writes:
We can try removing most int2 functions and see if that makes such
conversions much easier, or try casting incoming constants to int2 to
see what happens.
I already did that --- that was exactly the substance of the tests I was
reporting in that message.

I have been thinking lately that the fundamental approach is wrong
anyway. Basically the idea was to make the world safe for
single-datatype index handling by removing all the cross-type comparison
operators. The reason

SELECT ... WHERE int8col = 42

isn't indexable is that the = operator is int84eq, which is not to be
found in the set of operators associated with an index on int8. What
we were thinking was that if we didn't have int84eq then the parser
would be forced to promote the 42 to int8, and then the comparison using
int8eq would be recognized as indexable.


Could we not always promote int4 to int8 for indexing purposes? I
realize OID has issues, though, as you mention in that URL:

This algorithm was wrong on both practical and theoretical levels;
in the first place it's not very helpful to only be able to handle
binary-compatible transformations , and in the second place there isn't
any good guarantee that it's not changing the semantics when it replaces
the operator. For instance int4 < and oid < do not act the same.
Depending on equality of operator names was a bad idea even then, and
would be quite unworkable now in the world of schema search paths.

Could we just promote int4 constants to int8 always? I just checked and
2^32-1 is already promoted to int8:

select 4294967295;

so the funny thing is that:

SELECT ... WHERE int8col = 4294967296;

already uses the int8 index.

I think the complex case you mentioned was oid. Let's look at the
various possible constant comparisons against an oid column:

-1 This constant would match no oid, so we could just allow the
sequential scan. I don't think anyone would expect good behavior for
such a comparison.
1 This could be promoted to oid cleanly.

2^31+1 This will come in a int8, so we can just downcast to oid
automatically.

I know your case was "<" comparison. It would be:

SELECT ... WHERE oidcol < -1;
SELECT ... WHERE oidcol < 1;
SELECT ... WHERE oidcol < 2147483649; 2^31+1

These all seem to work, I think. -1 will not use an index, which is OK.

I am concerned about having to add catalog maintenance for every index
case, which seems it could be a lot.

Here is my logic. I am having trouble getting the big picture on this:

int2 fits in the int4 range
int4 fits in the int8 range

oid fits only in the int8 range, some oids fit in int4

This means a valid oid could come in as int4 or int8.

I realize this requires hard-coded comparisons to C include defines to
get the maxium for each type. I know this breaks our type-neutral
style, but in this case, it seems it might be the cleanest way ---
abstracting this out into a table seems too hard.

Now for int2-based indexes. Can't we just downcast constants to int2 if
they fit in the int2 valid range?
I think this might actually be workable for int8, but it's not going to
work for int2 without changing the initial typing of small integer
constants, and we already know that that opens a Pandora's box of other
problems.

But quite aside from the semantic difficulties of rejiggering all that
stuff, it's going to break other parts of the optimizer if we do it.
In particular it will interfere with handling of mergejoins and
recognizing transitive equality. For example consider

SELECT ... WHERE a.int8col = b.int4col AND b.int4col = 42;

Currently we are able to deduce a.int8col = 42 (where the operator
is int84eq). If we remove int84eq then the output of the parser for
this example will look like

SELECT ... WHERE a.int8col = b.int4col::int8 AND b.int4col = 42;

and the transitive equality will not be recognized because
b.int4col::int8 is not the same expression as b.int4col.
Seems we should keep those cross-type comparisons around for col op col
comparisons, at least, as well as internal optimizer use as you
described.
So I'm currently thinking we'd be better off not to try to eliminate
the cross-type comparison operators. Instead we need some solution
that is narrowly focused on the problem of making a non-indexable
comparison indexable, by converting a comparison value of the wrong
datatype into the right datatype locally to the indexscan plan
generation code. I posted some speculation about that here:
http://archives.postgresql.org/pgsql...9/msg00983.php


Agreed.

--
Bruce Momjian | http://candle.pha.pa.us
pg***@candle.ph a.pa.us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073

---------------------------(end of broadcast)---------------------------
TIP 4: Don't 'kill -9' the postmaster

Nov 12 '05 #10

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

7
5350
by: Philip | last post by:
Hey all, (Access 2000) I've been having a horror story with this design problem. My Database is Structured like This: AUTHORS, BOOKS, PAGES. Those are the Tables and each Item in each table needs a unique ID# based on its context. Primary Keys AUTHORS = AuthorID - NO Duplicates
9
3909
by: sonal | last post by:
Hi all, I hv started with python just recently... and have been assigned to make an utility which would be used for data validations... In short we take up various comma separated data files for eg: area.txt, school.txt, students.txt.... and so on (ok?!?) now, 1. area code used in the school.txt must be defined in the area.txt (Primary key in area => area_code defined in area.txt & Foreign key on school => area_code defined in...
115
6252
by: LurfysMa | last post by:
Most of the reference books recommend autonum primary keys, but the Access help says that any unique keys will work. What are the tradeoffs? I have several tables that have unique fields. Can I use them as primary keys or should I define an autonum primary key? One table has information about the 50 states in the US. The table looks like this:
5
5360
by: Monty M. | last post by:
Hello; I was wondering if anyone can assist me with this problem. Here are the tools I am using: Language: C# Database: MS SQL Server 2000 Application: Visual Studio 2005 1. I have a table whose primary key is a varchar data type.
1
933
by: hirak chatterjee | last post by:
Hi, i want to find interrelations among all the primary keys in all the tables. for e.g== the output will be .....is the primary key of table............ .....is the primary key of table..... and foreign key in table...... etc...like this. i am using jdbc and oracle. i will take care of the exact format of the output,but someone please help me in wrting queries that will give me exactly the results i stated
2
9164
by: Danny | last post by:
Hello, We imported a bunch of tables from a database and realized that the primary keys weren't copied to the destination db. In order to re- create the keys, we need to know which tables have them. Is there a command that I can use (on the source db) to find out which tables contain primary keys? The db has hundreds of tables and I'd rather not go through each one to see which has a primary key. Also, for future reference, is there a...
4
3831
by: Peter | last post by:
I am interested in informed feedback on the use of Constraints, Primary Keys and Unique. The following SQL statement creates a Bands tables for a database of bookings Bands into Venues, where the rule of the business is that only band plays on the one night. The SQL statement prevents a Band name being repeated (as it is Unique). Similar statement for the Venues. CREATE TABLE Bands (BandID varchar(5) CONSTRAINT BandID PRIMARY KEY, Band...
0
8792
by: Hystou | last post by:
Most computers default to English, but sometimes we require a different language, especially when relocating. Forgot to request a specific language before your computer shipped? No problem! You can effortlessly switch the default language on Windows 10 without reinstalling. I'll walk you through it. First, let's disable language synchronization. With a Microsoft account, language settings sync across devices. To prevent any complications,...
0
9479
Oralloy
by: Oralloy | last post by:
Hello folks, I am unable to find appropriate documentation on the type promotion of bit-fields when using the generalised comparison operator "<=>". The problem is that using the GNU compilers, it seems that the internal comparison operator "<=>" tries to promote arguments from unsigned to signed. This is as boiled down as I can make it. Here is my compilation command: g++-12 -std=c++20 -Wnarrowing bit_field.cpp Here is the code in...
0
9209
tracyyun
by: tracyyun | last post by:
Dear forum friends, With the development of smart home technology, a variety of wireless communication protocols have appeared on the market, such as Zigbee, Z-Wave, Wi-Fi, Bluetooth, etc. Each protocol has its own unique characteristics and advantages, but as a user who is planning to build a smart home system, I am a bit confused by the choice of these technologies. I'm particularly interested in Zigbee because I've heard it does some...
0
8215
agi2029
by: agi2029 | last post by:
Let's talk about the concept of autonomous AI software engineers and no-code agents. These AIs are designed to manage the entire lifecycle of a software development project—planning, coding, testing, and deployment—without human intervention. Imagine an AI that can take a project description, break it down, write the code, debug it, and then launch it, all on its own.... Now, this would greatly impact the work of software developers. The idea...
0
6054
by: conductexam | last post by:
I have .net C# application in which I am extracting data from word file and save it in database particularly. To store word all data as it is I am converting the whole word file firstly in HTML and then checking html paragraph one by one. At the time of converting from word file to html my equations which are in the word document file was convert into image. Globals.ThisAddIn.Application.ActiveDocument.Select();...
0
4826
by: adsilva | last post by:
A Windows Forms form does not have the event Unload, like VB6. What one acts like?
1
3280
by: 6302768590 | last post by:
Hai team i want code for transfer the data from one system to another through IP address by using C# our system has to for every 5mins then we have to update the data what the data is updated we have to send another system
2
2748
muto222
by: muto222 | last post by:
How can i add a mobile payment intergratation into php mysql website.
3
2193
bsmnconsultancy
by: bsmnconsultancy | last post by:
In today's digital era, a well-designed website is crucial for businesses looking to succeed. Whether you're a small business owner or a large corporation in Toronto, having a strong online presence can significantly impact your brand's success. BSMN Consultancy, a leader in Website Development in Toronto offers valuable insights into creating effective websites that not only look great but also perform exceptionally well. In this comprehensive...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.