indexing with lower(...) -> queries are not optimised very well - Please Help

Martin Hampl

Hi,

I am using PostgreSQL 7.4, but I had the same problem with the
previous version.

I indexed the column word (defined as varchar(64)) using lower(word).
If I use the following query, everything is fine, the index is used and
the query is executed very quickly:

select * from token where lower(word) = 'saxophone';

However, with EXPLAIN you get the following:

QUERY PLAN
----------------------------------------------------------------------------------------
Index Scan using word_lower_idx on token (cost=0.00..98814.08 rows=25382 width=16)
  Index Cond: (lower((word)::text) = 'saxophone'::text)

I indexed the same column without the use of lower(...). Now

explain select * from token where word = 'saxophone';

results in:

QUERY PLAN
-----------------------------------------------------------------------------
Index Scan using word_idx on token (cost=0.00..6579.99 rows=1676 width=16)
  Index Cond: ((word)::text = 'saxophone'::text)

Please note the difference in the estimated cost! Why is there such a
huge difference? Both queries need almost exactly the same time to
execute (all instances of 'saxophone' in the table happen to be
lower-case).

The problem is that if I use this query as part of a more complicated
query, the optimiser chooses a *very* bad query plan.

Please help me. What am I doing wrong? I would appreciate any help on
this very much.

Regards,
Martin.
---------------------------(end of broadcast)---------------------------
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/docs/faqs/FAQ.html

Nov 12 '05 #1
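
For readers who want to reproduce the setup described in the post, a minimal sketch follows. The index names word_lower_idx and word_idx are taken from the EXPLAIN output above. The table definition is an assumption: the post only states that word is varchar(64), so the id column is hypothetical.

```sql
-- Assumed table layout: only the "word" column is described in the post;
-- the "id" column is hypothetical.
CREATE TABLE token (
    id   integer,
    word varchar(64)
);

-- Index on the lower-cased value (a functional / expression index).
CREATE INDEX word_lower_idx ON token (lower(word));

-- Plain index on the column itself.
CREATE INDEX word_idx ON token (word);

-- The two queries whose estimates are compared in the post.
EXPLAIN SELECT * FROM token WHERE lower(word) = 'saxophone';
EXPLAIN SELECT * FROM token WHERE word = 'saxophone';
```

Both statements should report an index scan; the point of interest is the rows= figure in each plan, not the choice of index.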

CoL

hi,

Martin Hampl wrote, On 11/18/2003 7:24 PM:

[ original message snipped ]

And after analyze token; ?

C.

Nov 12 '05 #2
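
The suggestion above amounts to refreshing the planner statistics and then re-checking the plans. A sketch of that check, assuming the token table and word column from the original post; pg_stats is the standard system view over the collected statistics:

```sql
-- Refresh planner statistics for the table.
ANALYZE token;

-- Re-check both estimates once the statistics are up to date.
EXPLAIN SELECT * FROM token WHERE lower(word) = 'saxophone';
EXPLAIN SELECT * FROM token WHERE word = 'saxophone';

-- Inspect what was collected for the plain column.
SELECT attname, n_distinct, most_common_vals
FROM pg_stats
WHERE tablename = 'token' AND attname = 'word';
```

If the estimate for word = 'saxophone' follows the data while the estimate for lower(word) = 'saxophone' stays the same, that would suggest the collected statistics are not being applied to the expression.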

Martin Hampl

Hi,

CoL wrote:

Martin Hampl wrote, On 11/18/2003 7:24 PM:
[ original message snipped ]

And after analyze token; ?

No, that doesn't work (I tried that, of course). But this might be the
problem: how to analyse properly for the use of an index with
lower(...).

Thanks for the answer,
Martin.

Nov 12 '05 #3

Tom Lane

Martin Hampl <Ma**********@gmx.de> writes:

Index Scan using word_lower_idx on token (cost=0.00..98814.08 rows=25382 width=16)
  Index Cond: (lower((word)::text) = 'saxophone'::text)

The rows estimate (and therefore also the cost estimate) is a complete
guess in this situation, because the system keeps no statistics about
the values of lower(word). Improving this situation is on the TODO list.

regards, tom lane

Nov 12 '05 #4
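
To illustrate why the figure is a guess and not a measurement: when no statistics apply, the planner falls back to a fixed default selectivity for equality comparisons. Assuming that default is 0.005 (this value is not stated in the thread and should be treated as an assumption), the estimate of 25382 rows would simply be 0.5% of the table, whatever word is searched for. The arithmetic can be checked like this:

```sql
-- Assumption: a default equality selectivity of 0.005 is used when no
-- statistics apply, so rows estimate = table rows * 0.005.
-- The table size implied by the estimate in the plan is therefore:
SELECT round(25382 / 0.005) AS implied_table_rows;   -- 5076400

-- Compare with the planner's idea of the table size and the real row count.
SELECT reltuples FROM pg_class WHERE relname = 'token';
SELECT count(*) FROM token;
```

If reltuples comes out close to the implied figure, that would be consistent with the estimate being a fixed fraction of the table, unrelated to how often 'saxophone' actually occurs.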

Martin Hampl

On 21.11.2003 at 06:54, Tom Lane wrote:

The rows estimate (and therefore also the cost estimate) is a complete
guess in this situation, because the system keeps no statistics about
the values of lower(word). Improving this situation is on the TODO
list.

Thanks a lot for your answer.

Any idea when this situation will be improved? Until then I have
to find a workaround. Any suggestions?

Regards,
Martin.

Nov 12 '05 #5
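
One workaround sometimes used in this situation (it is not proposed in the thread itself) is to store the lower-cased value in a real column, so that ANALYZE collects ordinary column statistics for it. A sketch under stated assumptions: the names word_lower, word_lower_col_idx, token_set_word_lower and token_word_lower_trg are hypothetical, and the plpgsql language is assumed to be installed in the database.

```sql
-- Hypothetical extra column holding the lower-cased value.
ALTER TABLE token ADD COLUMN word_lower varchar(64);
UPDATE token SET word_lower = lower(word);

-- Keep the column in sync on every insert and update.
CREATE FUNCTION token_set_word_lower() RETURNS trigger AS '
BEGIN
    NEW.word_lower := lower(NEW.word);
    RETURN NEW;
END;
' LANGUAGE plpgsql;

CREATE TRIGGER token_word_lower_trg
    BEFORE INSERT OR UPDATE ON token
    FOR EACH ROW EXECUTE PROCEDURE token_set_word_lower();

-- An ordinary index on an ordinary column, with ordinary statistics.
CREATE INDEX word_lower_col_idx ON token (word_lower);
ANALYZE token;

-- Queries then compare against the stored column, not the expression.
EXPLAIN SELECT * FROM token WHERE word_lower = 'saxophone';
```

The trade-off is extra storage and write overhead for the duplicated column, in exchange for a row estimate that is based on collected statistics.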

Martin Hampl

Hi,

On 21.11.2003 at 06:54, Tom Lane wrote:

The rows estimate (and therefore also the cost estimate) is a complete
guess in this situation, because the system keeps no statistics about
the values of lower(word). Improving this situation is on the TODO
list.

Any ideas when this will work? Is it difficult to implement?

(For those who don't recall the context: I asked about indexing the
lower-cased values of a varchar column ("create index xy_idx on
table(lower(column));") and how the query planner uses this index.)

Regards,
Martin.

Nov 22 '05 #6

Tom Lane

Martin Hampl <Ma**********@gmx.de> writes:

On 21.11.2003 at 06:54, Tom Lane wrote:
[ bad plan for use of a functional index ]

The rows estimate (and therefore also the cost estimate) is a complete
guess in this situation, because the system keeps no statistics about
the values of lower(word). Improving this situation is on the TODO
list.

Any ideas when this will work? Is it difficult to implement?

It strikes me as a small-but-not-trivial project. Possibly someone will
get it done for 7.5. You can find some discussion in the pghackers
archives, IIRC (look for threads about keeping statistics on functional
indexes).

This brings up a thought for Mark Cave-Ayland's project of breaking out
the datatype dependencies in ANALYZE: it would be wise to ensure that
the API for examine_attribute doesn't depend too much on the assumption
that the value(s) being analyzed are part of the relation proper. They
might be coming from a functional index, or even more likely being
computed on-the-fly based on the definition of a functional index.
Not sure what we'd want to change exactly, but it's something to think
about before the API gets set in stone.

regards, tom lane

Nov 22 '05 #7
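
For readers on later releases: the release that followed 7.4 is understood to have made ANALYZE gather statistics for expression indexes, so the original query should get a data-based estimate there. A sketch of how one might check this, assuming the index name from the original post; how the statistics row is labelled in pg_stats is an assumption and may differ between versions:

```sql
-- On a release that keeps statistics for expression indexes:
ANALYZE token;

-- Statistics for an index expression are expected to be listed under the
-- index name and not the table name (assumption; verify on your version).
SELECT tablename, attname, n_distinct
FROM pg_stats
WHERE tablename = 'word_lower_idx';

-- The row estimate should now depend on the value searched for.
EXPLAIN SELECT * FROM token WHERE lower(word) = 'saxophone';
```

If the pg_stats query returns no rows, the server most likely does not keep statistics for the expression, and the estimate remains the default guess discussed above.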
