
Query re search engine robots and indexing

Hi

I'm aware of the use of robots.txt and the use of <META NAME="ROBOTS"
CONTENT="index,follow">

However, what would be more useful is to be able to control, within a
page, which elements should be indexed and seen by robots and which
elements are simply page furniture that is safe to ignore or not cache
(e.g. adverts).

Is there any guidance, especially using XHTML, regarding this, and
whether it is possible to tell robots to ignore portions of a page?

The <META NAME="Description" ...> tag is of some use, but the
limitations on its length are generally a problem.
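
For example (the wording here is only illustrative, and most engines
truncate the text after a fairly short length):

    <meta name="description"
          content="Guide to Scotland: Scottish FAQ, wedding info and more.">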

thanks for any info

Craig
--
Craig Cockburn ("coburn"). SiliconGlen.com Ltd. http://SiliconGlen.com
Home to the first online guide to Scotland, founded 1994.
Scottish FAQ, wedding info, website design, stop spam and more!
Jul 20 '05 #1


Craig Cockburn wrote:
> what would be more useful is to be able to control within a page
> which elements of the page should be indexed and seen by robots and
> which elements are simply page furniture and it is safe to ignore or
> not cache (e.g. adverts).

Such a mechanism would be open to severe abuse by web authors. As such,
no search engine has permitted such a thing.

> Is there any guidance, especially using XHTML, regarding this and
> whether it is possible to tell robots to ignore portions of a page

There is no such thing for any search engines that I know of. This
includes Google, the most important one, I think.

--
Brian (remove "invalid" from my address to email me)
http://www.tsmchughs.com/
Jul 20 '05 #2

In article <10*************@corp.supernews.com>,
Brian <us*****@julietremblay.com.invalid> wrote:
> Craig Cockburn wrote:
>> Is there any guidance, especially using XHTML, regarding this and
>> whether it is possible to tell robots to ignore portions of a page
>
> There is no such thing for any search engines that I know of. This
> includes Google, the most important one, I think.


This probably isn't quite what the OP had in mind, but if one is talking
about a search engine for one's own site, PicoSearch does allow one to
exclude portions of a page from being indexed.

This may be because PicoSearch is not quite at Google's level in terms
of sophistication, and of course it can't really rely on external links
as hints as to the relevance of a given page. (I suppose it could use
internal links in that manner ... I think I wish it did.)
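
The tools I've seen that support this wrap the block to be skipped in
comment markers, something along these lines. The marker names below are
invented for illustration only; the real ones are in each tool's
documentation:

    <!-- search_skip_start : invented marker name, for illustration -->
    <div class="adverts"> ... page furniture the indexer should ignore ... </div>
    <!-- search_skip_end -->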

--
Joel.

http://www.cv6.org/
"May she also say with just pride:
I have done the State some service."
Jul 20 '05 #3

In message <10*************@corp.supernews.com>, Brian
<us*****@julietremblay.com.invalid> writes
> Craig Cockburn wrote:
>> what would be more useful is to be able to control within a page
>> which elements of the page should be indexed and seen by robots and
>> which elements are simply page furniture and it is safe to ignore or
>> not cache (e.g. adverts).
>
> Such a mechanism would be open to severe abuse by web authors. As such,
> no search engine has permitted such a thing.

However, using <span lang="en"... versus other languages should have a
similar effect, if the language is taken as being significant. After all,
if I specify an English search, should I receive matches that are in
pages where <html lang="en" but the relevant matching text is within
another language's span block?

(e.g. the Failte on www.siliconglen.com)
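
Roughly the markup I have in mind, with gd as the language code for
Scottish Gaelic (the surrounding wording is just an example):

    <html lang="en">
    ...
    <p><span lang="gd">Failte</span> - welcome to Silicon Glen</p>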

--
Craig Cockburn ("coburn"). SiliconGlen.com Ltd. http://SiliconGlen.com
Home to the first online guide to Scotland, founded 1994.
Scottish FAQ, wedding info, website design, stop spam and more!
Jul 20 '05 #4

Joel Shepherd wrote:
> Brian wrote:
>> Craig Cockburn wrote:
>>> whether it is possible to tell robots to ignore portions of a page
>
> This probably isn't quite what the OP had in mind, but if one is talking
> about a search engine for one's own site, PicoSearch does allow one to
> exclude portions of a page from being indexed.

So does Perlfect Search.

> This may be because PicoSearch is not quite at Google's level in terms
> of sophistication,


Since fooling the search engine for a site search seems self-defeating,
there is no reason to block it. The only reason I can think of to abuse
such a mechanism is to fool external search engines into showing your
site for inappropriate search terms.

--
Brian (remove "invalid" from my address to email me)
http://www.tsmchughs.com/
Jul 20 '05 #5

Craig Cockburn wrote:
> Brian writes
>> Craig Cockburn wrote:
>>> what would be more useful is to be able to control within a page
>>> which elements of the page should be indexed and seen by robots
>>> and which elements are simply page furniture and it is safe to
>>> ignore
>>
>> Such a mechanism would be open to severe abuse by web authors. As
>> such, no search engine has permitted such a thing.

You seem to have misunderstood what I was saying. An example of the
abuse I mentioned would be site authors hiding their real content from
search engines in order to get their pages listed for search terms that
don't match what visitors actually see.

> However, using <span lang="en"... versus other languages should have
> a similar effect

I'm not sure how that would affect search engine results, except perhaps
to show e.g. an English page when one asked for only French pages.

> if the language is taken as being significant.

Lang attributes are, afaik, ignored by search engines.

> if I specify an English search, should I receive matches that are in
> pages where <html lang="en" but the relevant matching text is within
> another language's span block?


Well, I suppose not, but I don't see how an unscrupulous author could
abuse this to his advantage. Perhaps I'm missing something.

--
Brian (remove "invalid" from my address to email me)
http://www.tsmchughs.com/
Jul 20 '05 #6

In article <10*************@corp.supernews.com>,
Brian <us*****@julietremblay.com.invalid> wrote:
> Joel Shepherd wrote:
>> This may be because PicoSearch is not quite at Google's level in terms
>> of sophistication,
>
> Since fooling the search engine for a site search seems self-defeating,
> there is no reason to block it.


I agree in principle, and don't see a need to defeat external search
engines. With a single site search, however, the situation I've
encountered is that there may be a single page with content about some
subject, and several pages with inbound links. The links, being happy
links, often include the page subject in the link text. So the local
search engine is happy to include these pages in the search results.
Since there are only a few pages in the results total, they're nearly as
prominent in the results as the page with the actual content.

Pointing a visitor at a page that simply has a link to the page where
the content that they want resides seems unfriendly to me. So, where it
makes sense, I exclude these links from being indexed.

There are surely better solutions, but this particular one doesn't seem
harmful. I wouldn't be inclined to try this with an external search
engine because I expect it to be more sophisticated, and whatever its
indexing/ranking logic is, it's opaque to me. Trying to optimize for
opaque logic has a high probability of backfiring.
> The only reason I can think of to abuse
> such a mechanism is to fool external search engines into showing your
> site for inappropriate search terms.


I would have phrased this as: "With regard to external search engines,
the only reason I can think of to abuse such a mechanism is to fool them
into showing your site for inappropriate search terms."

No real disagreement there.

--
Joel.

http://www.cv6.org/
"May she also say with just pride:
I have done the State some service."
Jul 20 '05 #7

In message <10*************@corp.supernews.com>, Brian
<us*****@julietremblay.com.invalid> writes
>> if I specify an English search, should I receive matches that are in
>> pages where <html lang="en" but the relevant matching text is within
>> another language's span block?
>
> Well, I suppose not, but I don't see how an unscrupulous author could
> abuse this to his advantage. Perhaps I'm missing something.

I don't see how either. My original point was about *excluding* certain
content on a page from search engines and whether there is any way of
doing this. Both robots.txt and meta tags are all-or-nothing, and it
would seem useful to exclude page furniture and adverts from robots
while still allowing them to see the main body text.
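
Something along these lines is what I have in mind. To be clear, no
robot recognises any such markup today; this is purely to illustrate the
idea:

    <!-- hypothetical markup, not recognised by any search engine -->
    <div robots="noindex"> ... adverts, navigation, page furniture ... </div>
    <div robots="index"> ... the main body content ... </div>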

--
Craig Cockburn ("coburn"). SiliconGlen.com Ltd. http://SiliconGlen.com
Home to the first online guide to Scotland, founded 1994.
Scottish FAQ, wedding info, website design, stop spam and more!
Jul 20 '05 #8

Craig Cockburn wrote:
> My original point was about *excluding* certain
> content on a page from search engines and whether there is any way of
> doing this.


I understood your original point quite well, and answered it. No such
device exists. The reason is that search engines want to produce
accurate results, and providing a means to selectively hide content on
pages would almost certainly lead to authors lying about their content,
thus degrading those results.

--
Brian (remove "invalid" from my address to email me)
http://www.tsmchughs.com/
Jul 20 '05 #9
