I have been using tsearch2 for quite a while with a fair amount of
success.
The other day I was playiing around with a query, and randomly changed
a few things. I noticed a 10 times speedup and didn't know why. Both
queries return identical results.
The idea was to do a proximity search, where one word appears within 10
minutes of the other.
I'm not sure if this is a bug or something weird. Using postgres
7.4.2, and I think the 7.4.2 version of tsearch2.
The two queries:
Fast:
explain analyze select m1.message_date , m1.message_id from messages m1,
messages m2, to_tsquery('hal ey') q1, to_tsquery('bir thday') q2 where
m2.message_date between m1.message_date - '5 minutes'::inter val and
m1.message_date + '5 minutes'::inter val and m1.idxfti @@ q1 and
m2.idxfti @@ q2 and m1.message_id <> m2.message_id;
QUERY PLAN
------------------------------------------------------------------------
------------------------------------------------------------------------
---------------------------------------------------------------------
Nested Loop (cost=485403.85 ..549229077.83 rows=1651000057 width=12)
(actual time=190.952..2 21.859 rows=4 loops=1)
Join Filter: (("outer".messa ge_date >= ("inner".messag e_date -
'00:05:00'::int erval)) AND ("outer".messag e_date <=
("inner".messag e_date + '00:05:00'::int erval)) AND ("inner".messag e_id
<> "outer".message _id))
-> Nested Loop (cost=0.00..484 867.85 rows=121898 width=12) (actual
time=0.879..33. 273 rows=86 loops=1)
-> Function Scan on q2 (cost=0.00..12. 50 rows=1000 width=32)
(actual time=0.034..0.0 37 rows=1 loops=1)
-> Index Scan using fti_idx on messages m2
(cost=0.00..483 .33 rows=122 width=44) (actual time=0.831..32. 828
rows=86 loops=1)
Index Cond: (m2.idxfti @@ "outer".q2)
Filter: (m2.idxfti @@ "outer".q2)
-> Materialize (cost=485403.85 ..487158.83 rows=121898 width=12)
(actual time=0.189..1.4 77 rows=160 loops=86)
-> Nested Loop (cost=0.00..484 867.85 rows=121898 width=12)
(actual time=16.132..11 0.991 rows=160 loops=1)
-> Function Scan on q1 (cost=0.00..12. 50 rows=1000
width=32) (actual time=0.061..0.0 65 rows=1 loops=1)
-> Index Scan using fti_idx on messages m1
(cost=0.00..483 .33 rows=122 width=44) (actual time=16.048..99 .997
rows=160 loops=1)
Index Cond: (m1.idxfti @@ "outer".q1)
Filter: (m1.idxfti @@ "outer".q1)
Total runtime: 223.481 ms
(14 rows)
Slow:
explain analyze select m1.message_date , m1.message, m2.message_date
from messages m1, messages m2 where m2.message_date BETWEEN
m1.message_date - '5 minutes'::inter val and m1.message_date + '5
minutes'::inter val and m1.idxfti @@ to_tsquery('hal ey') and m2.idxfti
@@ to_tsquery('bir thday') and m1.message_id <> m2.message_id;
QUERY PLAN
------------------------------------------------------------------------
------------------------------------------------------------------------
---------------------------------------------------------------------
Nested Loop (cost=0.00..597 84.68 rows=1654 width=56) (actual
time=746.830..3 132.006 rows=4 loops=1)
Join Filter: (("inner".messa ge_date >= ("outer".messag e_date -
'00:05:00'::int erval)) AND ("inner".messag e_date <=
("outer".messag e_date + '00:05:00'::int erval)) AND ("outer".messag e_id
<> "inner".message _id))
-> Index Scan using fti_idx on messages m1 (cost=0.00..483 .33
rows=122 width=52) (actual time=8.770..69. 013 rows=160 loops=1)
Index Cond: (idxfti @@ '\'haley\''::ts query)
Filter: (idxfti @@ '\'haley\''::ts query)
-> Index Scan using fti_idx on messages m2 (cost=0.00..483 .33
rows=122 width=12) (actual time=0.112..18. 899 rows=86 loops=160)
Index Cond: (idxfti @@ '\'birthday\'': :tsquery)
Filter: (idxfti @@ '\'birthday\'': :tsquery)
Total runtime: 3132.665 ms
(9 rows)
---------------------------(end of broadcast)---------------------------
TIP 6: Have you searched our list archives?
http://archives.postgresql.org