
website doc search is extremely SLOW

Using the 'search' in the docs section of PostgreSQL.org is extremely
SLOW. Considering this is the website for a database, and databases
are supposed to be good at indexing content, I'd expect much faster
performance.

I submitted my search over two minutes ago. I just finished this
email to the list. The results have still not come back. I only
searched for:

SECURITY INVOKER

Perhaps this should be worked on?

Dante

---------------------------(end of broadcast)---------------------------
TIP 9: the planner will ignore your desire to choose an index scan if your
joining column's datatypes do not match

Nov 12 '05


A content management system is long overdue, I think. Do you have any
good recommendations?

Bricolage
--
Command Prompt, Inc., home of Mammoth PostgreSQL - S/ODBC and S/JDBC
Postgresql support, programming, shared hosting and dedicated hosting.
+1-503-222-2783 - jd@commandprompt.com - http://www.commandprompt.com

Nov 12 '05 #61
On Sat, 3 Jan 2004, Dave Cramer wrote:
On Sat, 2004-01-03 at 09:49, Oleg Bartunov wrote:
Hi there,

I hoped to release a pilot version of www.pgsql.ru with full text search
of postgresql related resources (currently we've crawled 27 sites, about
340K pages), but we started celebrating New Year too early :)
Expect it tomorrow or Monday.

Fantastic!


I'm just working on the web interface to give people the possibility to choose
which collection of documents to search, for example 7.1 documentation or 7.4 documentation.



I'm not sure whether there is some kind of CMS on www.postgresql.org, but
if there is, the best way is to embed tsearch2 into the CMS. You'll have a
fast, incremental search engine. There are many users of tsearch2 and I think
embedding isn't a very difficult problem. I estimate there are at most
10-20K pages of documentation, nothing for tsearch2.


A content management system is long overdue, I think. Do you have any
good recommendations?


*.postgresql.org likes PHP, so let's look in Google for 'php cms' :)


Regards,
Oleg
_________________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: ol**@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83

---------------------------(end of broadcast)---------------------------
TIP 8: explain analyze is your friend

Nov 12 '05 #62
On Thu, 1 Jan 2004, Marc G. Fournier wrote:
On Thu, 1 Jan 2004, Bruce Momjian wrote:
Marc G. Fournier wrote:
186_archives=# \d ndict7
Table "public.ndi ct7"
Column | Type | Modifiers
---------+---------+--------------------
url_id | integer | not null default 0
word_id | integer | not null default 0
intag | integer | not null default 0
Indexes:
"n7_url" btree (url_id)
"n7_word" btree (word_id)
The slowdown is the LIKE condition: the ndict[78] word_id conditions
return near-instantly when run individually, but when I run the 'url/LIKE'
condition, it takes "forever" ...
Does it help to CLUSTER url.url? Is your data being loaded in such that
identical values used by LIKE are next to each other?
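
For reference, clustering the url table on its url text would look something like
the sketch below; the index name is hypothetical, and the CLUSTER syntax shown is
the 7.x form (newer releases spell it CLUSTER url USING url_url_idx):

-- hypothetical index on the url text; CLUSTER rewrites the table in that order
CREATE INDEX url_url_idx ON url (url);
CLUSTER url_url_idx ON url;
ANALYZE url;   -- refresh planner statistics after the rewrite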


Just tried CLUSTER, and no difference, but ... chatted with Dave on ICQ
this evening, and was thinking of something ... and it comes back to
something that I mentioned a while back ...

Taking the ndict8 query that I originally presented, now post-CLUSTER, an
explain analyze looks like:

QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=13918.23..26550.58 rows=17 width=8) (actual time=4053.403..83481.769 rows=13415 loops=1)
   Hash Cond: ("outer".url_id = "inner".rec_id)
   ->  Index Scan using n8_word on ndict8  (cost=0.00..12616.09 rows=3219 width=8) (actual time=113.645..79163.431 rows=15533 loops=1)
         Index Cond: (word_id = 417851441)
   ->  Hash  (cost=13913.31..13913.31 rows=1968 width=4) (actual time=3920.597..3920.597 rows=0 loops=1)
         ->  Seq Scan on url  (cost=0.00..13913.31 rows=1968 width=4) (actual time=3.837..2377.853 rows=304811 loops=1)
               Filter: ((url || ''::text) ~~ 'http://archives.postgresql.org/%%'::text)
Total runtime: 83578.572 ms
(8 rows)

Now, if I knock off the LIKE, so that I'm returning all rows from ndict8,
join'd to all the URLs that contain them, you get:

QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=0.00..30183.13 rows=3219 width=8) (actual time=0.299..1217.116 rows=15533 loops=1)
   ->  Index Scan using n8_word on ndict8  (cost=0.00..12616.09 rows=3219 width=8) (actual time=0.144..458.891 rows=15533 loops=1)
         Index Cond: (word_id = 417851441)
   ->  Index Scan using url_rec_id on url  (cost=0.00..5.44 rows=1 width=4) (actual time=0.024..0.029 rows=1 loops=15533)
         Index Cond: (url.rec_id = "outer".url_id)
Total runtime: 1286.647 ms
(6 rows)

So, there are 15533 URLs that contain that word ... now, what I want to
find out is how many of those 15533 URLs contain
'http://archives.postgresql.org/%%', which is 13415 ...


What's the need for such a query? Are you trying to restrict the search to
the archives? Why not just have a site attribute for each document and use a
simple join?
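
To illustrate that suggestion (the site column and index name below are hypothetical,
not part of the mnogosearch schema), the section could be recorded once per URL and
the search reduced to an indexed equality join:

-- hypothetical extra column: store the section once, instead of re-matching a prefix per query
ALTER TABLE url ADD COLUMN site text;
UPDATE url SET site = 'archives'
 WHERE url LIKE 'http://archives.postgresql.org/%';
CREATE INDEX url_site_idx ON url (site);

-- the word lookup then joins on an indexed equality instead of filtering ~300k rows with LIKE
SELECT n.url_id
  FROM ndict8 n
  JOIN url u ON u.rec_id = n.url_id
 WHERE n.word_id = 417851441
   AND u.site = 'archives';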

The problem is that right now, we look at the LIKE first, giving us ~300k
rows, and then search through those for the ones whose word matches ... is
there some way of reducing the priority of the LIKE part of the query, as far
as the planner is concerned, so that it will "resolve" the = first, and then
apply the LIKE to the resultant set, instead of the other way around? So that
the query is only checking 15k records for the 13k that match, instead of
searching through 300k?

I'm guessing that the reason the LIKE is taking precedence is
because the URL table has fewer rows in it than ndict8?
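
One way to get that evaluation order (a sketch only, not tested against the mnogosearch
schema) is to do the word_id join inside a subquery fenced with OFFSET 0 -- the planner
will not flatten a subquery with LIMIT/OFFSET or push quals down into it -- and apply the
LIKE to its ~15k result rows on the outside:

-- resolve word_id = ... and the join first; OFFSET 0 stops the planner from
-- pushing the LIKE down into the subquery, so it only filters the joined rows
SELECT rec_id
  FROM (SELECT u.rec_id, u.url
          FROM ndict8 n
          JOIN url u ON u.rec_id = n.url_id
         WHERE n.word_id = 417851441
        OFFSET 0) AS hits
 WHERE hits.url LIKE 'http://archives.postgresql.org/%%';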

----
Marc G. Fournier Hub.Org Networking Services (http://www.hub.org)
Email: sc*****@hub.org Yahoo!: yscrappy ICQ: 7615664

---------------------------(end of broadcast)---------------------------
TIP 8: explain analyze is your friend


Regards,
Oleg
_________________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: ol**@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83

---------------------------(end of broadcast)---------------------------
TIP 6: Have you searched our list archives?

http://archives.postgresql.org

Nov 12 '05 #63
Hi there,

I hoped to release a pilot version of www.pgsql.ru with full text search
of postgresql related resources (currently we've crawled 27 sites, about
340K pages), but we started celebrating New Year too early :)
Expect it tomorrow or Monday.

We have developed many search engines. Some of them, like tsearch2 and OpenFTS,
are based on PostgreSQL and are best embedded into a CMS for true online updating.
Their power comes from access to document attributes stored in the database, so one
can perform categorized search and restricted search (different rights, different
document status, etc.). The closest example would be search over the mailing list
archives, which should have this kind of full text search engine embedded.
fts.postgresql.org in its best days was one implementation of such a system. This is
what I hope to have on www.pgsql.ru, if Marc gives us access to the mailing list
archives :)

The other search engines we use are based on the standard technology of
inverted indices; they are best suited for indexing semi-static collections
of documents. We have a full-fledged crawler, indexer and searcher. Online
update of inverted indices is a rather complex technological task and I'm
not sure there are databases which have true online update. On www.pgsql.ru
we use GTSearch, a generic text search engine we developed for
vertical searches (for example, postgresql related resources). It has a
common set of features like phrase search, proximity ranking, site search,
morphology, stemming support, cached documents, spell checking, similar search,
etc.

I see several separate tasks:

* official documents (documentation mostly)

I'm not sure whether there is some kind of CMS on www.postgresql.org, but
if there is, the best way is to embed tsearch2 into the CMS. You'll have a
fast, incremental search engine. There are many users of tsearch2 and I think
embedding isn't a very difficult problem. I estimate there are at most
10-20K pages of documentation, nothing for tsearch2 (see the sketch after this list).

* mailing lists archive

The mailing list archive is constantly growing and also requires incremental
updates, so tsearch2 is needed here too. Nice hardware like Marc has described
would be more than enough. We have a moderate dual PIII 1GHz server and I hope
it will be enough.

* postgresql related resources

I think this task should be solved using the standard technique - crawler,
indexer, searcher. Due to the limited number of sites it's possible to
keep the indices more up to date than the major search engines do, for example
by crawling once a week. This is what we currently have on pgsql.ru because
it doesn't require any permissions or interaction with site officials.
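
To make the tsearch2 suggestion concrete, here is a minimal sketch following the
contrib/tsearch2 pattern; the pages table, its title/body columns and the index and
trigger names are hypothetical stand-ins for whatever the CMS actually stores:

-- hypothetical CMS table with title and body text columns
ALTER TABLE pages ADD COLUMN fti tsvector;
UPDATE pages SET fti = to_tsvector(title || ' ' || body);
CREATE INDEX pages_fti_idx ON pages USING gist (fti);

-- tsearch2's trigger keeps fti current on insert/update, which is what makes the search incremental
CREATE TRIGGER pages_fti_update BEFORE INSERT OR UPDATE ON pages
    FOR EACH ROW EXECUTE PROCEDURE tsearch2(fti, title, body);

-- the query that started this thread, as a full text search
SELECT title FROM pages WHERE fti @@ to_tsquery('security & invoker');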
Regards,
Oleg
On Wed, 31 Dec 2003, Marc G. Fournier wrote:
On Tue, 30 Dec 2003, Joshua D. Drake wrote:
Hello,

Why are we not using Tsearch2?


Because nobody has built it yet? Oleg's stuff is nice, but we want
something that we can build into the existing web sites, not a standalone
site ...

I keep searching the web hoping someone has come up with a 'tsearch2'
based search engine that does the spidering, but, unless it's sitting right
in front of my eyes and I'm not seeing it, I haven't found it yet :(

Out of everything I've found so far, mnogosearch is one of the best ... I
just wish I could figure out where the bottleneck for it was, since, from
reading their docs, their method of storing the data doesn't appear to be
particularly off. I'm tempted to try their caching storage manager, and
getting away from SQL totally, but I *really* want to showcase PostgreSQL
on this :(

----
Marc G. Fournier Hub.Org Networking Services (http://www.hub.org)
Email: sc*****@hub.org Yahoo!: yscrappy ICQ: 7615664

---------------------------(end of broadcast)---------------------------
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddres sHere" to ma*******@postg resql.org)


Regards,
Oleg
_________________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: ol**@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83

---------------------------(end of broadcast)---------------------------
TIP 6: Have you searched our list archives?

http://archives.postgresql.org

Nov 12 '05 #64
On Sat, 2004-01-03 at 09:49, Oleg Bartunov wrote:
Hi there,

I hoped to release a pilot version of www.pgsql.ru with full text search
of postgresql related resources (currently we've crawled 27 sites, about
340K pages), but we started celebrating New Year too early :)
Expect it tomorrow or Monday.

Fantastic!
We have developed many search engines. Some of them, like tsearch2 and OpenFTS,
are based on PostgreSQL and are best embedded into a CMS for true online updating.
Their power comes from access to document attributes stored in the database, so one
can perform categorized search and restricted search (different rights, different
document status, etc.). The closest example would be search over the mailing list
archives, which should have this kind of full text search engine embedded.
fts.postgresql.org in its best days was one implementation of such a system. This is
what I hope to have on www.pgsql.ru, if Marc gives us access to the mailing list
archives :)
I too would like access to the archives.

The other search engines we use are based on the standard technology of
inverted indices; they are best suited for indexing semi-static collections
of documents. We have a full-fledged crawler, indexer and searcher. Online
update of inverted indices is a rather complex technological task and I'm
not sure there are databases which have true online update. On www.pgsql.ru
we use GTSearch, a generic text search engine we developed for
vertical searches (for example, postgresql related resources). It has a
common set of features like phrase search, proximity ranking, site search,
morphology, stemming support, cached documents, spell checking, similar search,
etc.

I see several separate tasks:

* official documents (documentation mostly)

I'm not sure whether there is some kind of CMS on www.postgresql.org, but
if there is, the best way is to embed tsearch2 into the CMS. You'll have a
fast, incremental search engine. There are many users of tsearch2 and I think
embedding isn't a very difficult problem. I estimate there are at most
10-20K pages of documentation, nothing for tsearch2.
A content management system is long overdue, I think. Do you have any
good recommendations?

* mailing lists archive

The mailing list archive is constantly growing and also requires incremental
updates, so tsearch2 is needed here too. Nice hardware like Marc has described
would be more than enough. We have a moderate dual PIII 1GHz server and I hope
it will be enough.

* postgresql related resources

I think this task should be solved using the standard technique - crawler,
indexer, searcher. Due to the limited number of sites it's possible to
keep the indices more up to date than the major search engines do, for example
by crawling once a week. This is what we currently have on pgsql.ru because
it doesn't require any permissions or interaction with site officials.
Regards,
Oleg
On Wed, 31 Dec 2003, Marc G. Fournier wrote:
On Tue, 30 Dec 2003, Joshua D. Drake wrote:
Hello,

Why are we not using Tsearch2?


Because nobody has built it yet? Oleg's stuff is nice, but we want
something that we can build into the existing web sites, not a standalone
site ...

I keep searching the web hoping someone has come up with a 'tsearch2'
based search engine that does the spidering, but, unless it's sitting right
in front of my eyes and I'm not seeing it, I haven't found it yet :(

Out of everything I've found so far, mnogosearch is one of the best ... I
just wish I could figure out where the bottleneck for it was, since, from
reading their docs, their method of storing the data doesn't appear to be
particularly off. I'm tempted to try their caching storage manager, and
getting away from SQL totally, but I *really* want to showcase PostgreSQL
on this :(

----
Marc G. Fournier Hub.Org Networking Services (http://www.hub.org)
Email: sc*****@hub.org Yahoo!: yscrappy ICQ: 7615664

---------------------------(end of broadcast)---------------------------
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddres sHere" to ma*******@postg resql.org)


Regards,
Oleg
_________________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: ol**@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83

--
Dave Cramer
519 939 0336
ICQ # 1467551
---------------------------(end of broadcast)---------------------------
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddres sHere" to ma*******@postg resql.org)

Nov 12 '05 #65


It's a good Mason-driven CMS, but Marc seems to be a PHP fan :)

Well, I know that the advocacy site is looking at implementing Bricolage.
It seems that if we were smart about it, we would pick a platform
(application-wise) and stick with it.

Sincerely,

Joshua D. Drake



Regards,
Oleg
_________________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: ol**@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83

--
Command Prompt, Inc., home of Mammoth PostgreSQL - S/ODBC and S/JDBC
Postgresql support, programming, shared hosting and dedicated hosting.
+1-503-222-2783 - jd@commandprompt.com - http://www.commandprompt.com
Editor-in-Chief - PostgreSQL.Org - http://www.postgresql.org
Nov 12 '05 #66
On Sat, 3 Jan 2004, Joshua D. Drake wrote:

A content management system is long overdue, I think. Do you have any
good recommendations?
Bricolage


It's a good Mason-driven CMS, but Marc seems to be a PHP fan :)


Regards,
Oleg
_________________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: ol**@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83

---------------------------(end of broadcast)---------------------------
TIP 7: don't forget to increase your free space map settings

Nov 12 '05 #67
On Sat, 3 Jan 2004, Oleg Bartunov wrote:
So, there are 15533 URLs that contain that word ... now, what I want to
find out is how many of those 15533 URLs contain
'http://archives.postgresql.org/%%', which is 13415 ...


What's the need for such a query? Are you trying to restrict the search to
the archives? Why not just have a site attribute for each document and use a
simple join?


The searches are designed so that you can do sub-section searches ... i.e.
if you only wanted to search hackers, the LIKE would be:

'http://archives.postgresql.org/pgsql-hackers/%%'

while:

'http://archives.postgresql.org/%%'

would give you a search of *all* the mailing lists ...

In theory, you could go smaller and search on:

'http://archives.postgresql.org/pgsql-hackers/2003-11/%%' for all messages
in November of 2003 ...
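
Since every one of these patterns is an anchored prefix, an index built with
text_pattern_ops (the index name below is hypothetical) would let such LIKEs turn
into index range scans even in a non-C locale -- though only if the query compares
the bare url column, not an expression like url || '' as seen in the earlier
EXPLAIN output:

-- hypothetical index; in a non-C locale a plain btree can't be used for LIKE,
-- but text_pattern_ops makes anchored prefix patterns like these indexable
CREATE INDEX url_prefix_idx ON url (url text_pattern_ops);

-- sub-section search: everything in pgsql-hackers for November 2003
SELECT rec_id
  FROM url
 WHERE url LIKE 'http://archives.postgresql.org/pgsql-hackers/2003-11/%';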
----
Marc G. Fournier Hub.Org Networking Services (http://www.hub.org)
Email: sc*****@hub.org Yahoo!: yscrappy ICQ: 7615664

---------------------------(end of broadcast)---------------------------
TIP 8: explain analyze is your friend

Nov 12 '05 #68
On Sat, 3 Jan 2004, Oleg Bartunov wrote:
fts.postgresql.org in its best days was one implementation of such a system. This is
what I hope to have on www.pgsql.ru, if Marc gives us access to the mailing list archives :)


Access to the archives was provided before New Year's *puzzled look* ... I sent
Teodor the rsync command that he needs to run to download it all from the
IP he provided me previously ...
----
Marc G. Fournier Hub.Org Networking Services (http://www.hub.org)
Email: sc*****@hub.org Yahoo!: yscrappy ICQ: 7615664

---------------------------(end of broadcast)---------------------------
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/docs/faqs/FAQ.html

Nov 12 '05 #69
On Sat, 3 Jan 2004, Marc G. Fournier wrote:
On Sat, 3 Jan 2004, Oleg Bartunov wrote:
fts.postgresql.org in its best days was one implementation of such a system. This is
what I hope to have on www.pgsql.ru, if Marc gives us access to the mailing list archives :)
Access to the archives was provided before New Year's *puzzled look* ... I sent
Teodor the rsync command that he needs to run to download it all from the
IP he provided me previously ...


Hmm, what's the secret rsync command you didn't share with me :)


----
Marc G. Fournier Hub.Org Networking Services (http://www.hub.org)
Email: sc*****@hub.org Yahoo!: yscrappy ICQ: 7615664

---------------------------(end of broadcast)---------------------------
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/docs/faqs/FAQ.html


Regards,
Oleg
_________________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: ol**@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83

---------------------------(end of broadcast)---------------------------
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to ma*******@postgresql.org so that your
message can get through to the mailing list cleanly

Nov 12 '05 #70
