473,854 Members | 1,523 Online
Bytes | Software Development & Data Engineering Community
+ Post

Home Posts Topics Members FAQ

website doc search is extremely SLOW

Trying to use the 'search' in the docs section of PostgreSQL.org
is extremely SLOW. Considering this is a website for a database
and databases are supposed to be good for indexing content, I'd
expect a much faster performance.

I submitted my search over two minutes ago. I just finished this
email to the list. The results have still not come back. I only
searched for:

SECURITY INVOKER

Perhaps this should be worked on?

Dante

---------------------------(end of broadcast)---------------------------
TIP 9: the planner will ignore your desire to choose an index scan if your
joining column's datatypes do not match

Nov 12 '05
83 5982
On Wed, 31 Dec 2003, Dave Cramer wrote:
What is the locale of the database?

like won't use an index, unless it is 'C' locale, or you use 7.4 and
change the operator of the index.


one thing I failed to note ... this is all running on 7.4 ... under 7.3,
it was much much worse :)
----
Marc G. Fournier Hub.Org Networking Services (http://www.hub.org)
Email: sc*****@hub.org Yahoo!: yscrappy ICQ: 7615664

---------------------------(end of broadcast)---------------------------
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddres sHere" to ma*******@postg resql.org)

Nov 12 '05 #31
Marc G. Fournier wrote:
186_archives=# \d ndict7
Table "public.ndi ct7"
Column | Type | Modifiers
---------+---------+--------------------
url_id | integer | not null default 0
word_id | integer | not null default 0
intag | integer | not null default 0
Indexes:
"n7_url" btree (url_id)
"n7_word" btree (word_id)
The slowdown is the LIKE condition, as the ndict[78] word_id conditions
return near instantly when run individually, and when I run the 'url/LIKE'
condition, it takes "forever" ...


Does it help to CLUSTER url.url? Is your data being loaded in so
identical values used by LIKE are next to each other?

--
Bruce Momjian | http://candle.pha.pa.us
pg***@candle.ph a.pa.us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073

---------------------------(end of broadcast)---------------------------
TIP 4: Don't 'kill -9' the postmaster

Nov 12 '05 #32
On Thu, 1 Jan 2004, Bruce Momjian wrote:
Marc G. Fournier wrote:
186_archives=# \d ndict7
Table "public.ndi ct7"
Column | Type | Modifiers
---------+---------+--------------------
url_id | integer | not null default 0
word_id | integer | not null default 0
intag | integer | not null default 0
Indexes:
"n7_url" btree (url_id)
"n7_word" btree (word_id)
The slowdown is the LIKE condition, as the ndict[78] word_id conditions
return near instantly when run individually, and when I run the 'url/LIKE'
condition, it takes "forever" ...


Does it help to CLUSTER url.url? Is your data being loaded in so
identical values used by LIKE are next to each other?


Just tried CLUSTER, and no difference, but ... chat'd with Dave on ICQ
this evening, and was thinking of something ... and it comes back to
something that I mentioned awhile back ...

Taking the ndict8 query that I originally presented, now post CLUSTER, and
an explain analyze looks like:

QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------
Hash Join (cost=13918.23. .26550.58 rows=17 width=8) (actual time=4053.403.. 83481.769 rows=13415 loops=1)
Hash Cond: ("outer".url _id = "inner".rec _id)
-> Index Scan using n8_word on ndict8 (cost=0.00..126 16.09 rows=3219 width=8) (actual time=113.645..7 9163.431 rows=15533 loops=1)
Index Cond: (word_id = 417851441)
-> Hash (cost=13913.31. .13913.31 rows=1968 width=4) (actual time=3920.597.. 3920.597 rows=0 loops=1)
-> Seq Scan on url (cost=0.00..139 13.31 rows=1968 width=4) (actual time=3.837..237 7.853 rows=304811 loops=1)
Filter: ((url || ''::text) ~~ 'http://archives.postgr esql.org/%%'::text)
Total runtime: 83578.572 ms
(8 rows)

Now, if I knock off the LIKE, so that I'm returning all rows from ndict8,
join'd to all the URLs that contain them, you get:

QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------
Nested Loop (cost=0.00..301 83.13 rows=3219 width=8) (actual time=0.299..121 7.116 rows=15533 loops=1)
-> Index Scan using n8_word on ndict8 (cost=0.00..126 16.09 rows=3219 width=8) (actual time=0.144..458 .891 rows=15533 loops=1)
Index Cond: (word_id = 417851441)
-> Index Scan using url_rec_id on url (cost=0.00..5.4 4 rows=1 width=4) (actual time=0.024..0.0 29 rows=1 loops=15533)
Index Cond: (url.rec_id = "outer".url _id)
Total runtime: 1286.647 ms
(6 rows)

So, there are 15333 URLs that contain that word ... now, what I want to
find out is how many of those 15333 URLs contain
'http://archives.postgr esql.org/%%', which is 13415 ...

The problem is that right now, we look at the LIKE first, giving us ~300k
rows, and then search through those for those who have the word matching
.... is there some way of reducing the priority of the LIKE part of the
query, as far as the planner is concerned, so that it will "resolve" the =
first, and then work the LIKE on the resultant set, instead of the other
way around? So that the query is only checking 15k records for the 13k
that match, instead of searching through 300k?

I'm guessing that the reason that the LIKE is taking precidence(sp?) is
because the URL table has less rows in it then ndict8?

----
Marc G. Fournier Hub.Org Networking Services (http://www.hub.org)
Email: sc*****@hub.org Yahoo!: yscrappy ICQ: 7615664

---------------------------(end of broadcast)---------------------------
TIP 8: explain analyze is your friend

Nov 12 '05 #33
Might be worth trying a larger statistics target (say 100), in the hope
that the planner then has better information to work with.

best wishes

Mark

Marc G. Fournier wrote:

he problem is that right now, we look at the LIKE first, giving us ~300k
rows, and then search through those for those who have the word matching
... is there some way of reducing the priority of the LIKE part of the
query, as far as the planner is concerned, so that it will "resolve" the =
first, and then work the LIKE on the resultant set, instead of the other
way around? So that the query is only checking 15k records for the 13k
that match, instead of searching through 300k?

I'm guessing that the reason that the LIKE is taking precidence(sp?) is
because the URL table has less rows in it then ndict8?

---------------------------(end of broadcast)---------------------------
TIP 6: Have you searched our list archives?

http://archives.postgresql.org

Nov 12 '05 #34
Marc G. Fournier wrote:

Now, if I knock off the LIKE, so that I'm returning all rows from ndict8,
join'd to all the URLs that contain them, you get:


Can't you build seperate databases for each domain you want to index?
Than you wouldn't need the like operator at all.

The like-operator doesn't seem to allow a very scalable production
environment. And besides that point, I don't really believe a "record
per word/document-couple" is very scalable (not in SQL, not anywhere).

Anyway, that doesn't help you much, perhaps decreasing the size of the
index-tables can help, are they with OIDs ? If so, wouldn't it help to
recreate them without, so you save yourselves 4 bytes per word-document
couple, therefore allowing it to fit in less pages and by that speeding
up the seqscans.

Are _all_ your queries with the like on the url? Wouldn't it help to
create an index on both the wordid and the urlid for ndict8?

Perhaps you can create your own 'host table' (which could be filled
using a trigger or a slightly adjusted indexer), and a foreign key from
your url table to that, so you can search on url.hostid = X (or a join
with that host table) instead of the like that is used now?

By the way, can a construction like (tablefield || '') ever use an index
in postgresql?

Best regards and good luck,

Arjen van der Meijden

---------------------------(end of broadcast)---------------------------
TIP 1: subscribe and unsubscribe commands go to ma*******@postg resql.org

Nov 12 '05 #35
On Thu, 1 Jan 2004, Arjen van der Meijden wrote:
Marc G. Fournier wrote:

Now, if I knock off the LIKE, so that I'm returning all rows from ndict8,
join'd to all the URLs that contain them, you get:
Can't you build seperate databases for each domain you want to index?
Than you wouldn't need the like operator at all.


First off, that would make searching across multiple domains difficult,
no?

Second, the LIKE is still required ... the LIKE allows the search to
"group" URLs ... for instance, if I wanted to just search on the docs, the
LIKE would look for all URLs that contain:

http://www.postgresql.org/docs/%%

whereas searching the whole site would be:

http://www.postgresql.org/%%
Anyway, that doesn't help you much, perhaps decreasing the size of the
index-tables can help, are they with OIDs ? If so, wouldn't it help to
recreate them without, so you save yourselves 4 bytes per word-document
couple, therefore allowing it to fit in less pages and by that speeding
up the seqscans.
This one I hadn't thought about ... for some reason, I thought that
WITHOUT OIDs was now the default ... looking at that one now ...
Are _all_ your queries with the like on the url? Wouldn't it help to
create an index on both the wordid and the urlid for ndict8?
as mentioned in a previous email, the schema for ndict8 is:

186_archives=# \d ndict8
Table "public.ndi ct8"
Column | Type | Modifiers
---------+---------+--------------------
url_id | integer | not null default 0
word_id | integer | not null default 0
intag | integer | not null default 0
Indexes:
"n8_url" btree (url_id)
"n8_word" btree (word_id)
By the way, can a construction like (tablefield || '') ever use an index
in postgresql?


again, as shown in a previous email, the index is being used for the LIKE
query ... the big problem as I see it is that the result set from the LIKE
is ~20x larger then the result set for the the = ... if there was some way
to telling the planner that going the LIKE route was the more expensive of
the two (even though table size seems to indicate the other way around), I
suspect that that would improve things also ...

----
Marc G. Fournier Hub.Org Networking Services (http://www.hub.org)
Email: sc*****@hub.org Yahoo!: yscrappy ICQ: 7615664

---------------------------(end of broadcast)---------------------------
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddres sHere" to ma*******@postg resql.org)

Nov 12 '05 #36
Mark Kirkwood <ma****@paradis e.net.nz> writes:
Might be worth trying a larger statistics target (say 100), in the hope
that the planner then has better information to work with.


I concur with that suggestion. Looking at Marc's problem:

QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------
Hash Join (cost=13918.23. .26550.58 rows=17 width=8) (actual time=4053.403.. 83481.769 rows=13415 loops=1)
Hash Cond: ("outer".url _id = "inner".rec _id)
-> Index Scan using n8_word on ndict8 (cost=0.00..126 16.09 rows=3219 width=8) (actual time=113.645..7 9163.431 rows=15533 loops=1)
Index Cond: (word_id = 417851441)
-> Hash (cost=13913.31. .13913.31 rows=1968 width=4) (actual time=3920.597.. 3920.597 rows=0 loops=1)
-> Seq Scan on url (cost=0.00..139 13.31 rows=1968 width=4) (actual time=3.837..237 7.853 rows=304811 loops=1)
Filter: ((url || ''::text) ~~ 'http://archives.postgr esql.org/%%'::text)
Total runtime: 83578.572 ms
(8 rows)

the slowness is not really in the LIKE, it's in the indexscan on
ndict8 (79 out of 83 seconds spent there). The planner probably would
not have chosen this plan if it hadn't been off by a factor of 5 on the
rows estimate. So try knocking up the stats target for ndict8.word_id,
re-analyze, and see what happens.

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 7: don't forget to increase your free space map settings

Nov 12 '05 #37
On Thu, 1 Jan 2004, Tom Lane wrote:
Mark Kirkwood <ma****@paradis e.net.nz> writes:
Might be worth trying a larger statistics target (say 100), in the hope
that the planner then has better information to work with.


I concur with that suggestion. Looking at Marc's problem:

QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------
Hash Join (cost=13918.23. .26550.58 rows=17 width=8) (actual time=4053.403.. 83481.769 rows=13415 loops=1)
Hash Cond: ("outer".url _id = "inner".rec _id)
-> Index Scan using n8_word on ndict8 (cost=0.00..126 16.09 rows=3219 width=8) (actual time=113.645..7 9163.431 rows=15533 loops=1)
Index Cond: (word_id = 417851441)
-> Hash (cost=13913.31. .13913.31 rows=1968 width=4) (actual time=3920.597.. 3920.597 rows=0 loops=1)
-> Seq Scan on url (cost=0.00..139 13.31 rows=1968 width=4) (actual time=3.837..237 7.853 rows=304811 loops=1)
Filter: ((url || ''::text) ~~ 'http://archives.postgr esql.org/%%'::text)
Total runtime: 83578.572 ms
(8 rows)

the slowness is not really in the LIKE, it's in the indexscan on ndict8
(79 out of 83 seconds spent there). The planner probably would not have
chosen this plan if it hadn't been off by a factor of 5 on the rows
estimate. So try knocking up the stats target for ndict8.word_id,
re-analyze, and see what happens.


'k, and for todays question ... how does one 'knock up the stats target'?
This is stuff I've not played with yet, so a URL to read up on this would
be nice, vs just how to do it?

----
Marc G. Fournier Hub.Org Networking Services (http://www.hub.org)
Email: sc*****@hub.org Yahoo!: yscrappy ICQ: 7615664

---------------------------(end of broadcast)---------------------------
TIP 1: subscribe and unsubscribe commands go to ma*******@postg resql.org

Nov 12 '05 #38
"Marc G. Fournier" <sc*****@postgr esql.org> writes:
'k, and for todays question ... how does one 'knock up the stats target'?


ALTER TABLE [ ONLY ] name [ * ]
ALTER [ COLUMN ] column SET STATISTICS integer

The default is 10; try 100, or even 1000 (don't think it will let you
go higher than 1000).

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 7: don't forget to increase your free space map settings

Nov 12 '05 #39
On Thu, 1 Jan 2004, Bruce Momjian wrote:
Marc G. Fournier wrote:
186_archives=# \d ndict7
Table "public.ndi ct7"
Column | Type | Modifiers
---------+---------+--------------------
url_id | integer | not null default 0
word_id | integer | not null default 0
intag | integer | not null default 0
Indexes:
"n7_url" btree (url_id)
"n7_word" btree (word_id)
The slowdown is the LIKE condition, as the ndict[78] word_id conditions
return near instantly when run individually, and when I run the 'url/LIKE'
condition, it takes "forever" ...


Does it help to CLUSTER url.url? Is your data being loaded in so
identical values used by LIKE are next to each other?


I'm loading up a MySQL 4.1 database right now, along side of a PgSQL 7.4
one WITHOUT OIDs ... should take several days to fully load, but it will
be interesting to compare them all ...

----
Marc G. Fournier Hub.Org Networking Services (http://www.hub.org)
Email: sc*****@hub.org Yahoo!: yscrappy ICQ: 7615664

---------------------------(end of broadcast)---------------------------
TIP 9: the planner will ignore your desire to choose an index scan if your
joining column's datatypes do not match

Nov 12 '05 #40

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

8
2239
by: bettina | last post by:
I'm re-programming my Website (www.coaster.ch) in PHP and I find it too slow (although I have ADSL). That's more or less how it functions: Here my tables: 'COASTERS' (code of coaster, code of country, etc...) 'COUNTRIES' (code of country, names of countries in different languages, code of continent) 'CONTINENTS' (code of continent, names of continents in different languages)
12
6490
by: Vjay77 | last post by:
Hi, I haven't posted any problem in quite a while now, but I came to the point that I really need to ask for help. I need to create an application which will search through .txt log file and find all lines where email from hotmail occured. All these emails need to be printed to list box on the form. Problem with code you'll see below, is that it takes long time to
4
4778
by: sommes | last post by:
It's only happen on .asp website, what's the problem? Thank you
2
2080
by: tmb | last post by:
When publishing a website the process is excrutiatingly slow - we are talking 3-4 minutes from when the actual transfer to the site has begun to completion. Apparently i'm not the only one experiencing this and searching on the net i found a possible solution: http://blog.n-technologies.be/CommentView.aspx?guid=3df1930b-9517-4b9b-9dd6-b59cbcbbe34d However, i don't quite understand how to actually apply the solution mentioned. I have...
2
5042
by: yasmike | last post by:
I am having a problem with my secure website on our internal network. The secure website is hosted on our Windows 2000 Server running IIS 5.0. If you try and access the website from a browser from another computer on the same internal network using its domain name, https://www.domainname .com, it is extremely slow. If you access it using its IP https://192.168.1.2 it is very quick. It is also quick for anyone outside the internal network to...
0
9751
by: Hystou | last post by:
Most computers default to English, but sometimes we require a different language, especially when relocating. Forgot to request a specific language before your computer shipped? No problem! You can effortlessly switch the default language on Windows 10 without reinstalling. I'll walk you through it. First, let's disable language synchronization. With a Microsoft account, language settings sync across devices. To prevent any complications,...
0
11027
Oralloy
by: Oralloy | last post by:
Hello folks, I am unable to find appropriate documentation on the type promotion of bit-fields when using the generalised comparison operator "<=>". The problem is that using the GNU compilers, it seems that the internal comparison operator "<=>" tries to promote arguments from unsigned to signed. This is as boiled down as I can make it. Here is my compilation command: g++-12 -std=c++20 -Wnarrowing bit_field.cpp Here is the code in...
0
10682
jinu1996
by: jinu1996 | last post by:
In today's digital age, having a compelling online presence is paramount for businesses aiming to thrive in a competitive landscape. At the heart of this digital strategy lies an intricately woven tapestry of website design and digital marketing. It's not merely about having a website; it's about crafting an immersive digital experience that captivates audiences and drives business growth. The Art of Business Website Design Your website is...
1
10761
by: Hystou | last post by:
Overview: Windows 11 and 10 have less user interface control over operating system update behaviour than previous versions of Windows. In Windows 11 and 10, there is no way to turn off the Windows Update option using the Control Panel or Settings app; it automatically checks for updates and installs any it finds, whether you like it or not. For most users, this new feature is actually very convenient. If you want to control the update process,...
0
10371
tracyyun
by: tracyyun | last post by:
Dear forum friends, With the development of smart home technology, a variety of wireless communication protocols have appeared on the market, such as Zigbee, Z-Wave, Wi-Fi, Bluetooth, etc. Each protocol has its own unique characteristics and advantages, but as a user who is planning to build a smart home system, I am a bit confused by the choice of these technologies. I'm particularly interested in Zigbee because I've heard it does some...
0
7082
by: conductexam | last post by:
I have .net C# application in which I am extracting data from word file and save it in database particularly. To store word all data as it is I am converting the whole word file firstly in HTML and then checking html paragraph one by one. At the time of converting from word file to html my equations which are in the word document file was convert into image. Globals.ThisAddIn.Application.ActiveDocument.Select();...
1
4563
by: 6302768590 | last post by:
Hai team i want code for transfer the data from one system to another through IP address by using C# our system has to for every 5mins then we have to update the data what the data is updated we have to send another system
2
4159
muto222
by: muto222 | last post by:
How can i add a mobile payment intergratation into php mysql website.
3
3187
bsmnconsultancy
by: bsmnconsultancy | last post by:
In today's digital era, a well-designed website is crucial for businesses looking to succeed. Whether you're a small business owner or a large corporation in Toronto, having a strong online presence can significantly impact your brand's success. BSMN Consultancy, a leader in Website Development in Toronto offers valuable insights into creating effective websites that not only look great but also perform exceptionally well. In this comprehensive...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.