tsearch2 queries faster than expected

I have been using tsearch2 for quite a while with a fair amount of
success.

The other day I was playing around with a query and randomly changed
a few things. I noticed a 10 times speedup and didn't know why. Both
queries return identical results.

The idea was to do a proximity search, where one word appears within a
10-minute window of the other (5 minutes either side).
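
To make the intended semantics concrete, here is a minimal Python sketch of the pairing rule that both queries below express: two different messages, one containing each word, with dates no more than 5 minutes apart. The sample rows and the `proximity_pairs` helper are hypothetical, and plain whitespace word matching stands in for tsearch2's `@@` operator, which does more (stemming, stop words).

```python
from datetime import datetime, timedelta

# Hypothetical sample rows: (message_id, message_date, text).
MESSAGES = [
    (1, datetime(2004, 6, 1, 12, 0), "haley is here"),
    (2, datetime(2004, 6, 1, 12, 3), "happy birthday"),
    (3, datetime(2004, 6, 1, 13, 0), "birthday cake"),
    (4, datetime(2004, 6, 1, 12, 4), "haley birthday"),
]


def proximity_pairs(messages, word1, word2, window=timedelta(minutes=5)):
    """Return (id1, id2) pairs of distinct messages where message id1
    contains word1, message id2 contains word2, and their dates differ
    by at most `window` (inclusive, like SQL BETWEEN)."""
    # Naive word match; an assumption standing in for idxfti @@ to_tsquery(...).
    first = [m for m in messages if word1 in m[2].split()]
    second = [m for m in messages if word2 in m[2].split()]
    return [
        (a[0], b[0])
        for a in first
        for b in second
        if a[0] != b[0] and abs(a[1] - b[1]) <= window
    ]


pairs = proximity_pairs(MESSAGES, "haley", "birthday")
print(pairs)
```

Message 4 contains both words but is never paired with itself, which mirrors the `m1.message_id <> m2.message_id` condition.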

I'm not sure if this is a bug or just something weird. I'm using
PostgreSQL 7.4.2 and, I think, the 7.4.2 version of tsearch2.

The two queries:
Fast:
explain analyze
select m1.message_date, m1.message_id
from messages m1, messages m2,
     to_tsquery('haley') q1, to_tsquery('birthday') q2
where m2.message_date between m1.message_date - '5 minutes'::interval
                          and m1.message_date + '5 minutes'::interval
  and m1.idxfti @@ q1
  and m2.idxfti @@ q2
  and m1.message_id <> m2.message_id;

                               QUERY PLAN
------------------------------------------------------------------------
 Nested Loop  (cost=485403.85..549229077.83 rows=1651000057 width=12) (actual time=190.952..221.859 rows=4 loops=1)
   Join Filter: (("outer".message_date >= ("inner".message_date - '00:05:00'::interval)) AND ("outer".message_date <= ("inner".message_date + '00:05:00'::interval)) AND ("inner".message_id <> "outer".message_id))
   ->  Nested Loop  (cost=0.00..484867.85 rows=121898 width=12) (actual time=0.879..33.273 rows=86 loops=1)
         ->  Function Scan on q2  (cost=0.00..12.50 rows=1000 width=32) (actual time=0.034..0.037 rows=1 loops=1)
         ->  Index Scan using fti_idx on messages m2  (cost=0.00..483.33 rows=122 width=44) (actual time=0.831..32.828 rows=86 loops=1)
               Index Cond: (m2.idxfti @@ "outer".q2)
               Filter: (m2.idxfti @@ "outer".q2)
   ->  Materialize  (cost=485403.85..487158.83 rows=121898 width=12) (actual time=0.189..1.477 rows=160 loops=86)
         ->  Nested Loop  (cost=0.00..484867.85 rows=121898 width=12) (actual time=16.132..110.991 rows=160 loops=1)
               ->  Function Scan on q1  (cost=0.00..12.50 rows=1000 width=32) (actual time=0.061..0.065 rows=1 loops=1)
               ->  Index Scan using fti_idx on messages m1  (cost=0.00..483.33 rows=122 width=44) (actual time=16.048..99.997 rows=160 loops=1)
                     Index Cond: (m1.idxfti @@ "outer".q1)
                     Filter: (m1.idxfti @@ "outer".q1)
 Total runtime: 223.481 ms
(14 rows)

Slow:

explain analyze
select m1.message_date, m1.message, m2.message_date
from messages m1, messages m2
where m2.message_date BETWEEN m1.message_date - '5 minutes'::interval
                          and m1.message_date + '5 minutes'::interval
  and m1.idxfti @@ to_tsquery('haley')
  and m2.idxfti @@ to_tsquery('birthday')
  and m1.message_id <> m2.message_id;

                               QUERY PLAN
------------------------------------------------------------------------
 Nested Loop  (cost=0.00..59784.68 rows=1654 width=56) (actual time=746.830..3132.006 rows=4 loops=1)
   Join Filter: (("inner".message_date >= ("outer".message_date - '00:05:00'::interval)) AND ("inner".message_date <= ("outer".message_date + '00:05:00'::interval)) AND ("outer".message_id <> "inner".message_id))
   ->  Index Scan using fti_idx on messages m1  (cost=0.00..483.33 rows=122 width=52) (actual time=8.770..69.013 rows=160 loops=1)
         Index Cond: (idxfti @@ '\'haley\''::tsquery)
         Filter: (idxfti @@ '\'haley\''::tsquery)
   ->  Index Scan using fti_idx on messages m2  (cost=0.00..483.33 rows=122 width=12) (actual time=0.112..18.899 rows=86 loops=160)
         Index Cond: (idxfti @@ '\'birthday\''::tsquery)
         Filter: (idxfti @@ '\'birthday\''::tsquery)
 Total runtime: 3132.665 ms
(9 rows)
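
As a rough cross-check of where the time goes, the sketch below multiplies the per-loop "actual time" by the loop count for the inner side of each top-level join. It uses only figures copied from the two plans above and relies on EXPLAIN ANALYZE reporting "actual time" as a per-loop average; the variable names are made up for illustration.

```python
# Figures copied from the two EXPLAIN ANALYZE outputs.
# "actual time" is an average per loop, so total time is roughly
# (end time per loop) * loops.

# Fast plan: inner side is the Materialize node, rescanned 86 times.
fast_inner_ms = 1.477 * 86

# Slow plan: inner side is the index scan on m2, repeated 160 times.
slow_inner_ms = 18.899 * 160

# Overall ratio of the two reported total runtimes.
total_ratio = 3132.665 / 223.481

print(f"fast plan, inner side: {fast_inner_ms:.1f} ms")
print(f"slow plan, inner side: {slow_inner_ms:.1f} ms")
print(f"total runtime ratio:   {total_ratio:.1f}x")
```

By this estimate nearly all of the slow plan's runtime is spent repeating the index scan on m2 once per row from m1, while the fast plan re-reads a cached result instead.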
---------------------------(end of broadcast)---------------------------
TIP 6: Have you searched our list archives?

http://archives.postgresql.org

Nov 23 '05 #1
Jeffrey Melloy <jm*****@visualdistortion.org> writes:
> The other day I was playing around with a query and randomly changed
> a few things. I noticed a 10 times speedup and didn't know why. Both
> queries return identical results.


It looks like the planner's overestimate of the number of rows returned
by the function causes it to insert a Materialize step in the inside of
the nestloop join, so as to avoid recomputing the inner scan multiple
times. Which is a good idea. I wonder why it didn't do it in both
cases?
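
The effect of a Materialize step can be illustrated with a toy model. This is only a sketch of the general idea, not PostgreSQL's executor: the `CountingScan` class, both join functions, and the join filter are hypothetical, and only the row counts (86 outer rows, 160 inner rows) are borrowed from the fast plan.

```python
class CountingScan:
    """Stand-in for an index scan that counts how often it is executed."""

    def __init__(self, rows):
        self.rows = rows
        self.executions = 0

    def __call__(self):
        self.executions += 1
        return iter(self.rows)


def nested_loop(outer_rows, inner_scan, join_filter):
    """Plain nested loop: the inner scan is re-run for every outer row."""
    return [(o, i) for o in outer_rows for i in inner_scan() if join_filter(o, i)]


def nested_loop_materialized(outer_rows, inner_scan, join_filter):
    """Nested loop over a materialized inner side: scan once, reuse the rows."""
    cached = list(inner_scan())
    return [(o, i) for o in outer_rows for i in cached if join_filter(o, i)]


def near(o, i):
    """Hypothetical join filter, loosely analogous to the 5-minute window."""
    return abs(o - i) <= 5


outer = list(range(86))                       # 86 outer rows, as in the fast plan
plain_scan = CountingScan(list(range(160)))   # 160 inner rows
cached_scan = CountingScan(list(range(160)))

plain = nested_loop(outer, plain_scan, near)
materialized = nested_loop_materialized(outer, cached_scan, near)

print(plain == materialized)   # both strategies produce the same join result
print(plain_scan.executions)   # inner scan executions without materializing
print(cached_scan.executions)  # inner scan executions with materializing
```

Both strategies evaluate the join filter for every pair of rows; what the cached version avoids is repeating the expensive inner scan.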

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 7: don't forget to increase your free space map settings

Nov 23 '05 #2

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

13
by: Nigel J. Andrews | last post by:
This will be a little vague, it was last night and I can't now do the test in that db (see below) so can't give the exact wording. I seem to remember a report a little while ago about tsearch v2...
1
by: psql-mail | last post by:
I have applied the recent tsearch2 patch and recompiled the tsearch2 module but I am still experiencing the same backend crashes as I previously described. Thanks for any help, Mat GDB...
3
by: Diogo Biazus | last post by:
Hi, Is there any performance diference between the following SQL commands: SELECT * FROM documents WHERE content_ix @@ to_tsquery('word1&word2|word3'); SELECT * FROM documents WHERE...
9
by: Pavel Stehule | last post by:
Hello I try tsearch2 within czech environment. It is works fine, but I have two questions. 1. I have words "se", "ve" in my czech stop words. But I get this words in result. Why? Have I...
2
by: Fischer Ulrich | last post by:
Hi I have a problem with the restoring of a database which uses tsearch2. I made a backup as discribed in 'tsearch-v2-intro' on the tsearch2 page. Now i'm trying to restore it into a...
0
by: Jeffrey Melloy | last post by:
I have been using tsearch2 for quite a while with a fair amount of success. The other day I was playiing around with a query, and randomly changed a few things. I noticed a 10 times speedup and...
0
by: Markus Wollny | last post by:
Hi! Sorry to bother you, but I just don't know how to get tsearch2 configured correctly for my setup. I've got a 7.4.3 database-cluster initdb'ed with de_DE@euro as locale, the database is with...
3
by: Marcel Boscher | last post by:
Hello everybody, i tried to "J.U.S.T" install the FullTextSearchTool tsearch2 under the guidiance of : http://www.sai.msu.su/~megera/postgres/gist/tsearch/V2/...
2
by: Net Virtual Mailing Lists | last post by:
Hello, If I have a rule like this: CREATE OR REPLACE RULE sometable_update AS ON UPDATE TO table2 DO UPDATE cache SET updated_dt=NULL WHERE tablename='sometable'; CREATE OR REPLACE RULE...
1
by: CloudSolutions | last post by:
Introduction: For many beginners and individual users, requiring a credit card and email registration may pose a barrier when starting to use cloud servers. However, some cloud server providers now...
0
by: Faith0G | last post by:
I am starting a new it consulting business and it's been a while since I setup a new website. Is wordpress still the best web based software for hosting a 5 page website? The webpages will be...
0
isladogs
by: isladogs | last post by:
The next Access Europe User Group meeting will be on Wednesday 3 Apr 2024 starting at 18:00 UK time (6PM UTC+1) and finishing by 19:30 (7.30PM). In this session, we are pleased to welcome former...
0
by: taylorcarr | last post by:
A Canon printer is a smart device known for being advanced, efficient, and reliable. It is designed for home, office, and hybrid workspace use and can also be used for a variety of purposes. However,...
0
by: Charles Arthur | last post by:
How do i turn on java script on a villaon, callus and itel keypad mobile phone
0
by: aa123db | last post by:
Variable and constants Use var or let for variables and const fror constants. Var foo ='bar'; Let foo ='bar';const baz ='bar'; Functions function $name$ ($parameters$) { } ...
0
by: ryjfgjl | last post by:
In our work, we often receive Excel tables with data in the same format. If we want to analyze these data, it can be difficult to analyze them because the data is spread across multiple Excel files...
1
by: Sonnysonu | last post by:
This is the data of csv file 1 2 3 1 2 3 1 2 3 1 2 3 2 3 2 3 3 the lengths should be different i have to store the data by column-wise with in the specific length. suppose the i have to...
0
by: Hystou | last post by:
There are some requirements for setting up RAID: 1. The motherboard and BIOS support RAID configuration. 2. The motherboard has 2 or more available SATA protocol SSD/HDD slots (including MSATA, M.2...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.