Index problem.... GIST (tsearch2)

Net Virtual Mailing Lists

Hello,

I have a table like this with some indexes as identified:
CREATE TABLE sometable (
data TEXT,
data_fti TSVECTOR,
category1 INTEGER,
category2 INTEGER,
category3 INTEGER
);

CREATE OR REPLACE FUNCTION is_null(anyelement) RETURNS BOOLEAN AS 'SELECT
$1 IS NULL;' LANGUAGE 'SQL' IMMUTABLE;
CREATE FUNCTION sometable_category1_idx ON sometable (category1);
CREATE FUNCTION sometable_category2_idx ON sometable (category2);
CREATE FUNCTION sometable_category3_idx ON sometable (category3);

CREATE FUNCTION sometable_data_fti_idx ON sometable USING gist(data_fti);

When I do a query like this, it uses sometable_category1_idx and is very
fast (it only returns a few rows out of several thousand):

SELECT * from sometable WHERE is_null(category1)='f';

When I do a query like this though it is slow because it insists on doing
the full-text index first:

SELECT * from sometable WHERE is_null(category1)='f' AND data_fti @@
to_tsquery('default', 'postgres');

How can I make this query use the is_null index first? It strikes me
that this would almost always be faster than doing the full-text search
first, right?

Thanks!

- Greg

---------------------------(end of broadcast)---------------------------
TIP 1: subscribe and unsubscribe commands go to ma*******@postgresql.org

Nov 23 '05 #1

Greg Stark

"Net Virtual Mailing Lists" <ma**********@net-virtual.com> writes:

> SELECT * from sometable WHERE is_null(category1)='f' AND data_fti @@
> to_tsquery('default', 'postgres');
>
> How can I make this query first use the is_null index?... It strikes me
> that this would almost always be faster than doing the full-text search
> first, right?...

Well that depends on how many are false versus how many the full-text search
finds.

In this circumstance postgres is trying to compare two unknowns. It doesn't
know how often is_null() is going to return false, and it doesn't know how many
records the full text search will match.
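
One rough way to see which condition is actually the more selective one is to count the matches for each separately. This is only a sketch against the sometable definition and the search term from the original post; it assumes tsearch2 is installed with the 'default' configuration.

```sql
-- Rows that pass the category filter on its own.
SELECT count(*) FROM sometable WHERE is_null(category1) = 'f';

-- Rows that match the full-text condition on its own.
SELECT count(*) FROM sometable
 WHERE data_fti @@ to_tsquery('default', 'postgres');
```

Whichever count is much smaller is the condition you would want the index scan to be driven by.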

8.0 will have statistics on how often is_null() will return false. But that
isn't really going to solve your problem since it still won't have any idea
how many rows the full text search will find.

I don't even know of anything you can do to influence the selectivity
estimates of the full text search.

--
greg

Nov 23 '05 #2

Tom Lane

Greg Stark <gs*****@mit.edu> writes:

> 8.0 will have statistics on how often is_null() will return false. But that
> isn't really going to solve your problem since it still won't have any idea
> how many rows the full text search will find.
>
> I don't even know of anything you can do to influence the selectivity
> estimates of the full text search.

Write some code ;-) ?

Seriously, we desperately need some people thinking about how to do
statistics and selectivity estimates for these sorts of complex
indexable conditions. Even crude estimates would be better than none
at all, which is where we're at now. I think that as of 8.0 there is
sufficient infrastructure in place to collect datatype-specific stats
and do something with them --- but *what* to do is now the pressing
problem.

regards, tom lane

Nov 23 '05 #3

Tom Lane

"Net Virtual Mailing Lists" <ma**********@net-virtual.com> writes:

> I have a table like this with some indexes as identified:
>
> CREATE OR REPLACE FUNCTION is_null(anyelement) RETURNS BOOLEAN AS 'SELECT
> $1 IS NULL;' LANGUAGE 'SQL' IMMUTABLE;
> CREATE FUNCTION sometable_category1_idx ON sometable (category1);
> CREATE FUNCTION sometable_category2_idx ON sometable (category2);
> CREATE FUNCTION sometable_category3_idx ON sometable (category3);
>
> CREATE FUNCTION sometable_data_fti_idx ON sometable USING gist(data_fti);

[ raises eyebrow... ] It'd be easier to offer advice if you accurately
depicted what you'd done. The above isn't even syntactically valid.

I suppose what you meant is

CREATE INDEX sometable_category1_idx ON sometable (is_null(category1));

The main problem with this is that before 8.0 there are no stats on
functional indexes, and so the planner has no idea that the condition
is_null(category1)='f' is very selective. (If you looked at the
rowcount estimates from EXPLAIN this would be pretty obvious.)
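
For reference, a sketch of how those estimates could be inspected, using the query from the original post (the table, function, and tsearch2 'default' configuration are assumed to exist as described there):

```sql
-- Show the chosen plan; the rows= figure on each plan node is the
-- planner's estimate, not a measured count.
EXPLAIN
SELECT * FROM sometable
 WHERE is_null(category1) = 'f'
   AND data_fti @@ to_tsquery('default', 'postgres');
```

Running the same statement with EXPLAIN ANALYZE also executes the query and reports the actual row counts next to the estimates, which makes a bad estimate easy to spot.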

What I would suggest is that you forget the functional indexes and use
partial indexes:

CREATE INDEX sometable_category1_idx ON sometable (category1)
WHERE category1 IS NOT NULL;

SELECT * from sometable WHERE category1 IS NOT NULL AND data_fti @@
to_tsquery('default', 'postgres');

7.4 has a reasonable chance of figuring out that the category1_idx
is the thing to use if you cast it this way.
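
A possible way to check the result, assuming the partial index above has been created: refresh the statistics and look at the plan for the rewritten query. Which index the planner actually picks will depend on the data and the server version, so treat this as a sketch rather than a guaranteed outcome.

```sql
-- Refresh planner statistics for the table.
ANALYZE sometable;

-- Run the rewritten query and show the plan with actual row counts
-- and timings.
EXPLAIN ANALYZE
SELECT * FROM sometable
 WHERE category1 IS NOT NULL
   AND data_fti @@ to_tsquery('default', 'postgres');
```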

regards, tom lane

Nov 23 '05 #4
