Connecting Tech Pros Worldwide Forums | Help | Site Map

Index problem.... GIST (tsearch2)

Net Virtual Mailing Lists
Guest
 
Posts: n/a
#1: Nov 23 '05
Hello,

I have a table like this with some indexes as identified:


CREATE TABLE sometable (
data TEXT,
data_fti TSVECTOR,
category1 INTEGER,
category2 INTEGER,
category3 INTEGER
);

CREATE OR REPLACE FUNCTION is_null(anyelement) RETURNS BOOLEAN AS 'SELECT
$1 IS NULL;' LANGUAGE 'SQL' IMMUTABLE;
CREATE FUNCTION sometable_category1_idx ON sometable (category1);
CREATE FUNCTION sometable_category2_idx ON sometable (category2);
CREATE FUNCTION sometable_category3_idx ON sometable (category3);

CREATE FUNCTION sometable_data_fti_idx ON sometable USING gist(data_fti);


When I do a query like this, it uses sometable_category1_idx and is very
fast (it only returns a few rows out of several thousand)

SELECT * from sometable WHERE is_null(category1)='f';

When I do a query like this though it is slow because it insists on doing
the full-text index first:

SELECT * from sometable WHERE is_null(category1)='f' AND data_fti @@
to_tsquery('default', 'postgres');


How can I make this query first use the is_null index?... It strikes me
that this would almost always be faster then doing the full-text search
first, right?...


Thanks!

- Greg






---------------------------(end of broadcast)---------------------------
TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org


Greg Stark
Guest
 
Posts: n/a
#2: Nov 23 '05

re: Index problem.... GIST (tsearch2)



"Net Virtual Mailing Lists" <mailinglists@net-virtual.com> writes:
[color=blue]
> SELECT * from sometable WHERE is_null(category1)='f' AND data_fti @@
> to_tsquery('default', 'postgres');
>
> How can I make this query first use the is_null index?... It strikes me
> that this would almost always be faster then doing the full-text search
> first, right?...[/color]

Well that depends on how many are false versus how many the full-text search
finds.

In this circumstance postgres is trying to compare two unknowns. It doesn't
know how often is_nul() is going to return false, and it doesn't know how many
records the full text search will match.

8.0 will have statistics on how often is_null() will return false. But that
isn't really going to solve your problem since it still won't have any idea
how many rows the full text search will find.

I don't even know of anything you can do to influence the selectivity
estimates of the full text search.

--
greg


---------------------------(end of broadcast)---------------------------
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to majordomo@postgresql.org)

Tom Lane
Guest
 
Posts: n/a
#3: Nov 23 '05

re: Index problem.... GIST (tsearch2)


Greg Stark <gsstark@mit.edu> writes:[color=blue]
> 8.0 will have statistics on how often is_null() will return false. But that
> isn't really going to solve your problem since it still won't have any idea
> how many rows the full text search will find.[/color]
[color=blue]
> I don't even know of anything you can do to influence the selectivity
> estimates of the full text search.[/color]

Write some code ;-) ?

Seriously, we desperately need some people thinking about how to do
statistics and selectivity estimates for these sorts of complex
indexable conditions. Even crude estimates would be better than none
at all, which is where we're at now. I think that as of 8.0 there is
sufficient infrastructure in place to collect datatype-specific stats
and do something with them --- but *what* to do is now the pressing
problem.

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/docs/faqs/FAQ.html

Tom Lane
Guest
 
Posts: n/a
#4: Nov 23 '05

re: Index problem.... GIST (tsearch2)


"Net Virtual Mailing Lists" <mailinglists@net-virtual.com> writes:[color=blue]
> I have a table like this with some indexes as identified:[/color]
[color=blue]
> CREATE OR REPLACE FUNCTION is_null(anyelement) RETURNS BOOLEAN AS 'SELECT
> $1 IS NULL;' LANGUAGE 'SQL' IMMUTABLE;
> CREATE FUNCTION sometable_category1_idx ON sometable (category1);
> CREATE FUNCTION sometable_category2_idx ON sometable (category2);
> CREATE FUNCTION sometable_category3_idx ON sometable (category3);[/color]
[color=blue]
> CREATE FUNCTION sometable_data_fti_idx ON sometable USING gist(data_fti);[/color]

[ raises eyebrow... ] It'd be easier to offer advice if you accurately
depicted what you'd done. The above isn't even syntactically valid.

I suppose what you meant is

CREATE INDEX sometable_category1_idx ON sometable (is_null(category1));

The main problem with this is that before 8.0 there are no stats on
functional indexes, and so the planner has no idea that the condition
is_null(category1)='f' is very selective. (If you looked at the
rowcount estimates from EXPLAIN this would be pretty obvious.)

What I would suggest is that you forget the functional indexes and use
partial indexes:

CREATE INDEX sometable_category1_idx ON sometable (category1)
WHERE category1 IS NOT NULL;

SELECT * from sometable WHERE category1 IS NOT NULL AND data_fti @@
to_tsquery('default', 'postgres');

7.4 has a reasonable chance of figuring out that the category1_idx
is the thing to use if you cast it this way.

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 9: the planner will ignore your desire to choose an index scan if your
joining column's datatypes do not match

Closed Thread