Bytes | Software Development & Data Engineering Community

website doc search is extremely SLOW

Trying to use the 'search' in the docs section of PostgreSQL.org
is extremely SLOW. Considering this is the website for a database,
and databases are supposed to be good at indexing content, I'd
expect much faster performance.

I submitted my search over two minutes ago. I just finished this
email to the list. The results have still not come back. I only
searched for:

SECURITY INVOKER

Perhaps this should be worked on?

Dante

---------------------------(end of broadcast)---------------------------
TIP 9: the planner will ignore your desire to choose an index scan if your
joining column's datatypes do not match

Nov 12 '05
On Dec 31, 2003, at 5:40 AM, Arjen van der Meijden wrote:
The main advantage of taking this out of your sql database is that it
runs on its own custom built storage system (and you could offload it
to another machine, like we did).
Btw, if you really need an "in database" solution, read back the
postings of Eric Ridge at 26-12-2003 20:54 on the hackers list (he's
working on integrating xapian in postgresql as a FTI)


Hi, that's me! I'm working on it right now, and it's coming along really
well. I actually hope to have it integrated with Postgres' storage
subsystem by the end of the day and to have it returning useful
results.

eric

Nov 12 '05 #21
On Wed, 31 Dec 2003, Dave Cramer wrote:
Marc,

No it doesn't spider, it is a specialized tool for searching documents.

I'm curious, what value is there to being able to count the number of
URLs?


Sorry, that was just an example of the # of docs that have to be searched
through ... again, the *biggest* thing that is searched is the mailing
list archives, so without spidering, not sure how we'll be able to pull
that in ... ?
----
Marc G. Fournier Hub.Org Networking Services (http://www.hub.org)
Email: sc*****@hub.org Yahoo!: yscrappy ICQ: 7615664


Nov 12 '05 #22
On Tue, 30 Dec 2003, Joshua D. Drake wrote:
Hello,

Why are we not using Tsearch2?


Because nobody has built it yet? Oleg's stuff is nice, but we want
something that we can build into the existing web sites, not a standalone
site ...

I keep searching the web hoping someone has come up with a 'tsearch2'
based search engine that does the spidering, but, unless it's sitting right
in front of my eyes and I'm not seeing it, I haven't found it yet :(

Out of everything I've found so far, mnogosearch is one of the best ... I
just wish I could figure out where the bottleneck for it was, since, from
reading their docs, their method of storing the data doesn't appear to be
particularly off. I'm tempted to try their caching storage manager, and
getting away from SQL totally, but I *really* want to showcase PostgreSQL
on this :(

----
Marc G. Fournier Hub.Org Networking Services (http://www.hub.org)
Email: sc*****@hub.org Yahoo!: yscrappy ICQ: 7615664


Nov 12 '05 #23
Marc G. Fournier wrote:
On Tue, 30 Dec 2003, Joshua D. Drake wrote:
Hello,

Why are we not using Tsearch2?


Because nobody has built it yet? Oleg's stuff is nice, but we want
something that we can build into the existing web sites, not a standalone
site ...

I keep searching the web hoping someone has come up with a 'tsearch2'
based search engine that does the spidering, but, unless it's sitting right
in front of my eyes and I'm not seeing it, I haven't found it yet :(

Out of everything I've found so far, mnogosearch is one of the best ... I
just wish I could figure out where the bottleneck for it was, since, from
reading their docs, their method of storing the data doesn't appear to be
particularly off. I'm tempted to try their caching storage manager, and
getting away from SQL totally, but I *really* want to showcase PostgreSQL
on this :(


Well, PostgreSQL is being un-showcased in the current setup, that's for
sure. :-(

--
Bruce Momjian | http://candle.pha.pa.us
pg***@candle.pha.pa.us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073


Nov 12 '05 #24
On Wed, 31 Dec 2003, Bruce Momjian wrote:
Out of everything I've found so far, mnogosearch is one of the best ... I
just wish I could figure out where the bottleneck for it was, since, from
reading their docs, their method of storing the data doesn't appear to be
particularly off. I'm tempted to try their caching storage manager, and
getting away from SQL totally, but I *really* want to showcase PostgreSQL
on this :(


Well, PostgreSQL is being un-showcased in the current setup, that's for
sure. :-(


Agreed ... I could install the MySQL backend, which it's designed for, and
advertise it as PostgreSQL? :)

----
Marc G. Fournier Hub.Org Networking Services (http://www.hub.org)
Email: sc*****@hub.org Yahoo!: yscrappy ICQ: 7615664


Nov 12 '05 #25
Marc G. Fournier wrote:
On Wed, 31 Dec 2003, Bruce Momjian wrote:
Out of everything I've found so far, mnogosearch is one of the best ... I
just wish I could figure out where the bottleneck for it was, since, from
reading their docs, their method of storing the data doesn't appear to be
particularly off. I'm tempted to try their caching storage manager, and
getting away from SQL totally, but I *really* want to showcase PostgreSQL
on this :(


Well, PostgreSQL is being un-showcased in the current setup, that's for
sure. :-(


Agreed ... I could install the MySQL backend, which it's designed for, and
advertise it as PostgreSQL? :)


I would be curious to know if it is faster --- that would tell us if it
is tuned only for MySQL.

Have you tried CLUSTER? I think the MySQL ISAM files are
auto-clustered, and clustering is usually important for full-text
searches.
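
For example, clustering one of mnogosearch's word tables on its word index
would look like this (7.4 syntax; the table and index names are from the
mnogosearch schema shown later in this thread, and CLUSTER takes an
exclusive lock while it rewrites the table):

```sql
-- Physically reorder ndict8 by word_id so all rows for a given word
-- are adjacent on disk, then refresh the planner's statistics:
CLUSTER n8_word ON ndict8;
ANALYZE ndict8;
```

CLUSTER is a one-time rewrite, not a property of the table, so it would
have to be re-run periodically as the indexer updates the data.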
--
Bruce Momjian | http://candle.pha.pa.us
pg***@candle.pha.pa.us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073


Nov 12 '05 #26
On Wed, 2003-12-31 at 18:43, Bruce Momjian wrote:
Marc G. Fournier wrote:
[quoted text trimmed]

Well, PostgreSQL is being un-showcased in the current setup, that's for
sure. :-(

In fact this is a very bad advertisement for postgres

--
Dave Cramer
519 939 0336
ICQ # 1467551

Nov 12 '05 #27
On Wed, 31 Dec 2003, Bruce Momjian wrote:
Marc G. Fournier wrote:
On Wed, 31 Dec 2003, Bruce Momjian wrote:
> [quoted text trimmed]

Well, PostgreSQL is being un-showcased in the current setup, that's for
sure. :-(


Agreed ... I could install the MySQL backend, which it's designed for, and
advertise it as PostgreSQL? :)


I would be curious to know if it is faster --- that would tell us if it
is tuned only for MySQL.

Have you tried CLUSTER? I think the MySQL ISAM files are
auto-clustered, and clustering is usually important for full-text
searches.


Actually, check out http://www.mnogosearch.com ... the way they do the
indexing doesn't (at least, as far as I can tell) make use of full-text
searching. Simplistically, it appears to take the web page, sort -u all
the words it finds, remove all 'stopwords' (and, the, in, etc.) from the
result, and then dump the resultant words to the database, link'd to the
URL ...

We're using crc-multi, so a CRC value of the word is what is stored in the
database, not the actual word itself ... the '-multi' part spreads the
words across several tables depending on the word size, to keep total # of
rows down ...

The slow part on the database is finding those words, as can be seen by
the following search on 'SECURITY INVOKER':

Jan  1 01:21:05 pgsql74 postgres[59959]: [44-1] LOG:  statement: SELECT ndict8.url_id,ndict8.intag FROM ndict8, url WHERE ndict8.word_id=417851441 AND url.rec_id=ndict8.url_id
Jan  1 01:21:05 pgsql74 postgres[59959]: [44-2] AND ((url.url || '') LIKE 'http://archives.postgresql.org/%%')
Jan  1 01:22:00 pgsql74 postgres[59959]: [45-1] LOG:  duration: 55015.644 ms
Jan  1 01:22:00 pgsql74 postgres[59959]: [46-1] LOG:  statement: SELECT ndict7.url_id,ndict7.intag FROM ndict7, url WHERE ndict7.word_id=-509484498 AND url.rec_id=ndict7.url_id
Jan  1 01:22:00 pgsql74 postgres[59959]: [46-2] AND ((url.url || '') LIKE 'http://archives.postgresql.org/%%')
Jan  1 01:22:01 pgsql74 postgres[59959]: [47-1] LOG:  duration: 1167.407 ms

ndict8 looks like:

186_archives=# select count(1) from ndict8;
  count
---------
 6320380
(1 row)

186_archives=# select count(1) from ndict8 where word_id=417851441;
 count
-------
 15532
(1 row)

186_archives=# \d ndict8
     Table "public.ndict8"
 Column  |  Type   |     Modifiers
---------+---------+--------------------
 url_id  | integer | not null default 0
 word_id | integer | not null default 0
 intag   | integer | not null default 0
Indexes:
    "n8_url" btree (url_id)
    "n8_word" btree (word_id)

and ndict7 looks like:

186_archives=# select count(1) from ndict7;
  count
---------
 8400616
(1 row)

186_archives=# select count(1) from ndict7 where word_id=-509484498;
 count
-------
   333
(1 row)

186_archives=# \d ndict7
     Table "public.ndict7"
 Column  |  Type   |     Modifiers
---------+---------+--------------------
 url_id  | integer | not null default 0
 word_id | integer | not null default 0
 intag   | integer | not null default 0
Indexes:
    "n7_url" btree (url_id)
    "n7_word" btree (word_id)
The slowdown is the LIKE condition, as the ndict[78] word_id conditions
return near instantly when run individually, and when I run the 'url/LIKE'
condition, it takes "forever" ...

186_archives-# ;
count
--------
304811
(1 row)

186_archives=# explain analyze select count(1) from url where ((url.url || '') LIKE 'http://archives.postgresql.org/%%');
                                                    QUERY PLAN
------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=93962.19..93962.19 rows=1 width=0) (actual time=5833.084..5833.088 rows=1 loops=1)
   ->  Seq Scan on url  (cost=0.00..93957.26 rows=1968 width=0) (actual time=0.069..4387.378 rows=304811 loops=1)
         Filter: ((url || ''::text) ~~ 'http://archives.postgresql.org/%%'::text)
 Total runtime: 5833.179 ms
(4 rows)
Hrmmm ... I don't have much (any) experience with tsearch, but could it be
used to replace the LIKE? Then again, when it's returning 300k rows out of
393k, it wouldn't help much on the above, would it?

The full first query:

SELECT ndict8.url_id,ndict8.intag
  FROM ndict8, url
 WHERE ndict8.word_id=417851441
   AND url.rec_id=ndict8.url_id
   AND ((url.url || '') LIKE 'http://archives.postgresql.org/%%');

returns 13415 rows, and explain analyze shows:

-----------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=0.00..30199.82 rows=17 width=8) (actual time=0.312..1459.504 rows=13415 loops=1)
   ->  Index Scan using n8_word on ndict8  (cost=0.00..12616.09 rows=3219 width=8) (actual time=0.186..387.673 rows=15532 loops=1)
         Index Cond: (word_id = 417851441)
   ->  Index Scan using url_rec_id on url  (cost=0.00..5.45 rows=1 width=4) (actual time=0.029..0.050 rows=1 loops=15532)
         Index Cond: (url.rec_id = "outer".url_id)
         Filter: ((url || ''::text) ~~ 'http://archives.postgresql.org/%%'::text)
 Total runtime: 1520.145 ms
(7 rows)

Which, of course, doesn't come close to matching what the duration showed
in the original, most likely due to caching :(

The server that the database is on rarely jumps above a loadavg of 1, isn't
using any swap (after 77 days up, used swap is 0% -or- 17meg) and the
database itself is on a strip'd file system ...

I'm open to ideas/things to try here ...

The whole 'process' of the search shows the following times for the
queries:

pgsql74# grep 59959 /var/log/pgsql | grep duration
Jan 1 01:21:05 pgsql74 postgres[59959]: [39-1] LOG: duration: 25.663 ms
Jan 1 01:21:05 pgsql74 postgres[59959]: [41-1] LOG: duration: 4.376 ms
Jan 1 01:21:05 pgsql74 postgres[59959]: [43-1] LOG: duration: 11.179 ms
Jan 1 01:22:00 pgsql74 postgres[59959]: [45-1] LOG: duration: 55015.644 ms
Jan 1 01:22:01 pgsql74 postgres[59959]: [47-1] LOG: duration: 1167.407 ms
Jan 1 01:22:01 pgsql74 postgres[59959]: [49-1] LOG: duration: 7.886 ms
Jan 1 01:22:01 pgsql74 postgres[59959]: [51-1] LOG: duration: 1.516 ms
Jan 1 01:22:01 pgsql74 postgres[59959]: [53-1] LOG: duration: 3.539 ms
Jan 1 01:22:01 pgsql74 postgres[59959]: [55-1] LOG: duration: 109.890 ms
Jan 1 01:22:01 pgsql74 postgres[59959]: [57-1] LOG: duration: 15.582 ms
Jan 1 01:22:01 pgsql74 postgres[59959]: [59-1] LOG: duration: 1.631 ms
Jan 1 01:22:01 pgsql74 postgres[59959]: [61-1] LOG: duration: 0.838 ms
Jan 1 01:22:01 pgsql74 postgres[59959]: [63-1] LOG: duration: 2.148 ms
Jan 1 01:22:01 pgsql74 postgres[59959]: [65-1] LOG: duration: 0.810 ms
Jan 1 01:22:01 pgsql74 postgres[59959]: [67-1] LOG: duration: 1.211 ms
Jan 1 01:22:01 pgsql74 postgres[59959]: [69-1] LOG: duration: 0.798 ms
Jan 1 01:22:01 pgsql74 postgres[59959]: [71-1] LOG: duration: 0.861 ms
Jan 1 01:22:01 pgsql74 postgres[59959]: [73-1] LOG: duration: 0.748 ms
Jan 1 01:22:01 pgsql74 postgres[59959]: [75-1] LOG: duration: 0.555 ms

With the two >1000ms queries being the above two ndict[78] queries ...

Doing two subsequent searches, on "setuid functions" and "privilege
rules", just so that caching isn't involved, shows pretty much the same
distribution:

grep 61697 /var/log/pgsql | grep duration
Jan 1 01:44:25 pgsql74 postgres[61697]: [41-1] LOG: duration: 1.244 ms
Jan 1 01:44:25 pgsql74 postgres[61697]: [43-1] LOG: duration: 21.868 ms
Jan 1 01:44:25 pgsql74 postgres[61697]: [45-1] LOG: duration: 17.956 ms
Jan 1 01:44:29 pgsql74 postgres[61697]: [47-1] LOG: duration: 4452.326 ms
Jan 1 01:44:57 pgsql74 postgres[61697]: [49-1] LOG: duration: 27992.581 ms
Jan 1 01:44:58 pgsql74 postgres[61697]: [51-1] LOG: duration: 357.158 ms
Jan 1 01:44:58 pgsql74 postgres[61697]: [53-1] LOG: duration: 1.338 ms
Jan 1 01:44:58 pgsql74 postgres[61697]: [55-1] LOG: duration: 11.438 ms
Jan 1 01:44:58 pgsql74 postgres[61697]: [57-1] LOG: duration: 63.389 ms
Jan 1 01:44:58 pgsql74 postgres[61697]: [59-1] LOG: duration: 134.941 ms
Jan 1 01:44:58 pgsql74 postgres[61697]: [61-1] LOG: duration: 0.570 ms
Jan 1 01:44:58 pgsql74 postgres[61697]: [63-1] LOG: duration: 0.489 ms
Jan 1 01:44:58 pgsql74 postgres[61697]: [65-1] LOG: duration: 0.477 ms
Jan 1 01:44:58 pgsql74 postgres[61697]: [67-1] LOG: duration: 0.470 ms
Jan 1 01:44:58 pgsql74 postgres[61697]: [69-1] LOG: duration: 0.471 ms
Jan 1 01:44:58 pgsql74 postgres[61697]: [71-1] LOG: duration: 0.468 ms
Jan 1 01:44:58 pgsql74 postgres[61697]: [73-1] LOG: duration: 0.473 ms
Jan 1 01:44:58 pgsql74 postgres[61697]: [75-1] LOG: duration: 0.466 ms
Jan 1 01:44:58 pgsql74 postgres[61697]: [77-1] LOG: duration: 0.469 ms
Jan 1 01:44:58 pgsql74 postgres[61697]: [79-1] LOG: duration: 0.515 ms

and:

grep 61869 /var/log/pgsql | grep duration
Jan 1 01:46:50 pgsql74 postgres[61869]: [41-1] LOG: duration: 19.776 ms
Jan 1 01:46:50 pgsql74 postgres[61869]: [43-1] LOG: duration: 58.352 ms
Jan 1 01:46:50 pgsql74 postgres[61869]: [45-1] LOG: duration: 0.897 ms
Jan 1 01:46:53 pgsql74 postgres[61869]: [47-1] LOG: duration: 2859.331 ms
Jan 1 01:47:47 pgsql74 postgres[61869]: [49-1] LOG: duration: 54774.241 ms
Jan 1 01:47:47 pgsql74 postgres[61869]: [51-1] LOG: duration: 14.926 ms
Jan 1 01:47:47 pgsql74 postgres[61869]: [53-1] LOG: duration: 1.502 ms
Jan 1 01:47:47 pgsql74 postgres[61869]: [55-1] LOG: duration: 3.865 ms
Jan 1 01:47:48 pgsql74 postgres[61869]: [57-1] LOG: duration: 110.435 ms
Jan 1 01:47:48 pgsql74 postgres[61869]: [59-1] LOG: duration: 0.646 ms
Jan 1 01:47:48 pgsql74 postgres[61869]: [61-1] LOG: duration: 0.503 ms
Jan 1 01:47:48 pgsql74 postgres[61869]: [63-1] LOG: duration: 0.498 ms
Jan 1 01:47:48 pgsql74 postgres[61869]: [65-1] LOG: duration: 0.484 ms
Jan 1 01:47:48 pgsql74 postgres[61869]: [67-1] LOG: duration: 0.487 ms
Jan 1 01:47:48 pgsql74 postgres[61869]: [69-1] LOG: duration: 0.478 ms
Jan 1 01:47:48 pgsql74 postgres[61869]: [71-1] LOG: duration: 0.479 ms
Jan 1 01:47:48 pgsql74 postgres[61869]: [73-1] LOG: duration: 0.480 ms
Jan 1 01:47:48 pgsql74 postgres[61869]: [75-1] LOG: duration: 0.478 ms
Jan 1 01:47:48 pgsql74 postgres[61869]: [77-1] LOG: duration: 0.477 ms

So it looks like it's those joins that are really killing things ...

Note that I haven't made many changes to the postgresql.conf file, so
there might be something really obvious I've overlooked, but here are the
uncommented ones (i.e. ones I've modified from defaults):

tcpip_socket = true
max_connections = 512
shared_buffers = 10000 # min 16, at least max_connections*2, 8KB each
sort_mem = 10240 # min 64, size in KB
vacuum_mem = 81920 # min 1024, size in KB
syslog = 2 # range 0-2; 0=stdout; 1=both; 2=syslog
syslog_facility = 'LOCAL0'
syslog_ident = 'postgres'
log_connections = true
log_duration = false
log_statement = false
lc_messages = 'C' # locale for system error message strings
lc_monetary = 'C' # locale for monetary formatting
lc_numeric = 'C' # locale for number formatting
lc_time = 'C' # locale for time formatting
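
For what it's worth, two 7.4 settings absent from the list above also feed
the planner's index-vs-seqscan decisions for queries like these. The values
below are illustrative guesses for a dedicated box, not measured
recommendations:

```ini
# effective_cache_size is in 8KB pages; the default of 1000 (~8MB) is
# far too low for a dedicated database server (~512MB of cache assumed):
effective_cache_size = 65536

# Striped disks make random reads relatively cheaper than the default
# assumption of 4:
random_page_cost = 3
```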
----
Marc G. Fournier Hub.Org Networking Services (http://www.hub.org)
Email: sc*****@hub.org Yahoo!: yscrappy ICQ: 7615664


Nov 12 '05 #28
On Wed, 31 Dec 2003, Dave Cramer wrote:
In fact this is a very bad advertisement for postgres


I just posted a very, very long email of what I'm seeing in the logs, as
well as various query runs ... it may just be something that I need to
tune that I'm overlooking :( the queries aren't particularly complex :(

----
Marc G. Fournier Hub.Org Networking Services (http://www.hub.org)
Email: sc*****@hub.org Yahoo!: yscrappy ICQ: 7615664


Nov 12 '05 #29
What is the locale of the database?

LIKE won't use an index unless the database is in the 'C' locale, or you
use 7.4 and build the index with a pattern-matching operator class.
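
Concretely, on 7.4 that would look something like this (a sketch against
the url table quoted below; text_pattern_ops is the operator class that
lets a btree index serve left-anchored LIKE in a non-C locale):

```sql
-- Index url.url for left-anchored pattern matches (7.4+):
CREATE INDEX url_url_pattern ON url (url text_pattern_ops);
```

Note, though, that the queries in this thread filter on (url.url || ''),
and wrapping the column in an expression like that defeats a plain index
on the column in any locale; either the LIKE would have to be rewritten
against the bare column, or the index built on that same expression.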

Dave
On Wed, 2003-12-31 at 20:49, Marc G. Fournier wrote:
[Marc's message of 20:49 quoted in full; trimmed here -- see post #28 above]
--
Dave Cramer
519 939 0336
ICQ # 1467551

Nov 12 '05 #30
