Query re search engine robots and indexing

Hi

I'm aware of the use of robots.txt and the use of <META NAME="ROBOTS"
CONTENT="index,follow">
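For reference, both of the mechanisms mentioned above act on whole URLs or whole pages. The minimal Python sketch below assumes a crawler built on the standard library: `urllib.robotparser` for robots.txt, plus a small hypothetical `RobotsMetaParser` (built on `html.parser`) for the robots meta tag. The rules, URLs and page are made-up examples.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: rules can only allow or block whole URL paths.
ROBOTS_TXT = """\
User-agent: *
Disallow: /adverts/
"""


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names, but not values.
        attrs = dict(attrs)
        if tag == "meta" and (attrs.get("name") or "").lower() == "robots":
            content = attrs.get("content") or ""
            self.directives.update(d.strip().lower() for d in content.split(","))


def page_policy(html):
    """Return (may_index, may_follow) for the page as a whole."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return ("noindex" not in parser.directives,
            "nofollow" not in parser.directives)


rp = RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())
print(rp.can_fetch("*", "http://example.com/index.html"))     # True
print(rp.can_fetch("*", "http://example.com/adverts/a.html"))  # False

page = '<html><head><META NAME="ROBOTS" CONTENT="noindex,follow"></head></html>'
print(page_policy(page))  # (False, True)
```

Neither check can say anything about part of a page: the answer is always per URL or per document.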

However, what would be more useful is to be able to control within a
page which elements of the page should be indexed and seen by robots and
which elements are simply page furniture and it is safe to ignore or not
cache (e.g. adverts).

Is there any guidance, especially using XHTML, regarding this and
whether it is possible to tell robots to ignore portions of a page

The <META NAME="Description"... tag is of some use, but generally the
limitations on its length are a problem.

thanks for any info

Craig
--
Craig Cockburn ("coburn"). SiliconGlen.com Ltd. http://SiliconGlen.com
Home to the first online guide to Scotland, founded 1994.
Scottish FAQ, wedding info, website design, stop spam and more!
Jul 20 '05 #1
Craig Cockburn wrote:
> what would be more useful is to be able to control within a page
> which elements of the page should be indexed and seen by robots and
> which elements are simply page furniture and it is safe to ignore or
> not cache (e.g. adverts).

Such a mechanism would be open to severe abuse by web authors. As such,
no search engine has permitted such a thing.

> Is there any guidance, especially using XHTML, regarding this and
> whether it is possible to tell robots to ignore portions of a page

There is no such thing for any search engines that I know of. This
includes Google, the most important one, I think.

--
Brian (remove "invalid" from my address to email me)
http://www.tsmchughs.com/
Jul 20 '05 #2
In article <10*************@corp.supernews.com>,
Brian <us*****@julietremblay.com.invalid> wrote:
> Craig Cockburn wrote:
>> Is there any guidance, especially using XHTML, regarding this and
>> whether it is possible to tell robots to ignore portions of a page
>
> There is no such thing for any search engines that I know of. This
> includes Google, the most important one, I think.

This probably isn't quite what the OP had in mind, but if one is talking
about a search engine for one's own site, PicoSearch does allow one to
exclude portions of a page from being indexed.

This may be because PicoSearch is not quite at Google's level in terms
of sophistication, and of course it can't really rely on external links
as hints as to the relevance of a given page. (I suppose it could use
internal links in that manner ... I think I wish it did.)
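The kind of partial exclusion a site-search tool offers can be sketched as follows. The comment markers `<!-- noindex -->` and `<!-- /noindex -->` are hypothetical, not PicoSearch's actual syntax (each product defines its own). This minimal Python sketch simply drops any text between the markers before the words reach the index:

```python
from html.parser import HTMLParser

# Hypothetical exclusion markers; real site-search tools define their own syntax.
START_MARK = "noindex"
END_MARK = "/noindex"


class ExcludingTextExtractor(HTMLParser):
    """Collects lowercase words, skipping text between the exclusion comments."""

    def __init__(self):
        super().__init__()
        self.depth = 0   # > 0 while inside an excluded region (regions may nest)
        self.words = []

    def handle_comment(self, data):
        marker = data.strip().lower()
        if marker == START_MARK:
            self.depth += 1
        elif marker == END_MARK and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth == 0:
            self.words.extend(data.lower().split())


def indexable_words(html):
    """Return the words a site-search indexer would store for this page."""
    extractor = ExcludingTextExtractor()
    extractor.feed(html)
    extractor.close()
    return extractor.words


page = """<p>Scottish wedding information</p>
<!-- noindex --><div>Buy cheap kilts now!</div><!-- /noindex -->
<p>Ceilidh dances</p>"""
print(indexable_words(page))
# ['scottish', 'wedding', 'information', 'ceilidh', 'dances']
```

The advert text never reaches the index, while the rest of the page is indexed normally.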

--
Joel.

http://www.cv6.org/
"May she also say with just pride:
I have done the State some service."
Jul 20 '05 #3
In message <10*************@corp.supernews.com>, Brian
<us*****@julietremblay.com.invalid> writes
> Craig Cockburn wrote:
>> what would be more useful is to be able to control within a page
>> which elements of the page should be indexed and seen by robots and
>> which elements are simply page furniture and it is safe to ignore or
>> not cache (e.g. adverts).
>
> Such a mechanism would be open to severe abuse by web authors. As such,
> no search engine has permitted such a thing.

However, using <span lang="en"... versus other languages should have a
similar effect, if the language is taken as being significant. After all
if I specify an English search, should I receive matches that are in
pages where <html lang="en" but the relevant matching text is within
another language's span block?

(e.g. the Failte on www.siliconglen.com)
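To make the idea concrete, here is a minimal Python sketch of a hypothetical lang-aware indexer. It tracks the effective language of each piece of text (inherited from the nearest ancestor carrying `lang` or `xml:lang`) and returns only the words filed under the requested language. It assumes well-formed XHTML so that start and end tags pair up, and it illustrates the proposal rather than describing what any real search engine does.

```python
from html.parser import HTMLParser

# Elements that never have an end tag, so they must not touch the lang stack.
VOID_ELEMENTS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
                 "link", "meta", "source", "track", "wbr"}


class LangTextExtractor(HTMLParser):
    """Groups words by the effective language of the element they appear in."""

    def __init__(self):
        super().__init__()
        self.stack = [""]   # effective lang of each open element; "" = unknown
        self.words = {}     # primary language subtag -> list of words

    def handle_starttag(self, tag, attrs):
        if tag in VOID_ELEMENTS:
            return
        attrs = dict(attrs)
        # XHTML 1.0 pages may carry xml:lang as well as (or instead of) lang.
        lang = attrs.get("lang") or attrs.get("xml:lang") or self.stack[-1]
        self.stack.append(lang.lower())

    def handle_endtag(self, tag):
        if tag not in VOID_ELEMENTS and len(self.stack) > 1:
            self.stack.pop()

    def handle_data(self, data):
        # Compare primary subtags only, so "en-GB" is filed under "en".
        primary = self.stack[-1].split("-")[0]
        self.words.setdefault(primary, []).extend(data.lower().split())


def words_for_language(html, lang):
    """Return the words a lang-aware index would file under `lang`."""
    parser = LangTextExtractor()
    parser.feed(html)
    parser.close()
    return parser.words.get(lang, [])


page = ('<html lang="en"><body><p><span lang="gd">Failte</span> '
        'to Scotland</p></body></html>')
print(words_for_language(page, "en"))  # ['to', 'scotland']
print(words_for_language(page, "gd"))  # ['failte']
```

Under this scheme an English-only search would not match the Gaelic greeting, even though the page as a whole is marked `lang="en"`.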

--
Craig Cockburn ("coburn"). SiliconGlen.com Ltd. http://SiliconGlen.com
Home to the first online guide to Scotland, founded 1994.
Scottish FAQ, wedding info, website design, stop spam and more!
Jul 20 '05 #4
Joel Shepherd wrote:
> Brian wrote:
>> Craig Cockburn wrote:
>>> whether it is possible to tell robots to ignore portions of a page
> This probably isn't quite what the OP had in mind, but if one is talking
> about a search engine for one's own site, PicoSearch does allow one to
> exclude portions of a page from being indexed.

So does Perlfect Search.

> This may be because PicoSearch is not quite at Google's level in terms
> of sophistication,

Since fooling the search engine for a site search seems self-defeating,
there is no reason to block it. The only reason I can think of to abuse
such a mechanism is to fool external search engines into showing your
site for inappropriate search terms.

--
Brian (remove "invalid" from my address to email me)
http://www.tsmchughs.com/
Jul 20 '05 #5
Craig Cockburn wrote:
> Brian writes
>> Craig Cockburn wrote:
>>> what would be more useful is to be able to control within a page
>>> which elements of the page should be indexed and seen by robots
>>> and which elements are simply page furniture and it is safe to
>>> ignore
>> Such a mechanism would be open to severe abuse by web authors. As
>> such, no search engine has permitted such a thing.

You seem to have misunderstood what I was saying. An example of the
abuse I mentioned would be site authors hiding their real content from
search engines to get

> However, using <span lang="en"... versus other languages should have
> a similar effect

I'm not sure how that would affect search engine results, except perhaps
to show e.g. an English page when one asked for only French pages.

> if the language is taken as being significant.

Lang attributes are, afaik, ignored by search engines.

> if I specify an English search, should I receive matches that are in
> pages where <html lang="en" but the relevant matching text is within
> another language's span block?

Well, I suppose not, but I don't see how an unscrupulous author could
abuse this to his advantage. Perhaps I'm missing something.

--
Brian (remove "invalid" from my address to email me)
http://www.tsmchughs.com/
Jul 20 '05 #6
In article <10*************@corp.supernews.com>,
Brian <us*****@julietremblay.com.invalid> wrote:
> Joel Shepherd wrote:
>> This may be because PicoSearch is not quite at Google's level in terms
>> of sophistication,
> Since fooling the search engine for a site search seems self-defeating,
> there is no reason to block it.

I agree in principle, and don't see a need to defeat external search
engines. With a single site search, however, the situation I've
encountered is that there may be a single page with content about some
subject, and several pages with inbound links. The links, being happy
links, often include the page subject in the link text. So the local
search engine is happy to include these pages in the search results.
Since there are only a few pages in the results total, they're nearly as
prominent in the results as the page with the actual content.

Pointing a visitor at a page that simply has a link to the page where
the content that they want resides seems unfriendly to me. So, where it
makes sense, I exclude these links from being indexed.

There are surely better solutions, but this particular one doesn't seem
harmful. I wouldn't be inclined to try this with an external search
engine because I expect it to be more sophisticated, and whatever its
indexing/ranking logic is, it's opaque to me. Trying to optimize for
opaque logic has a high probability of backfiring.
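The situation described above can be reproduced with a toy ranking. This sketch assumes a deliberately naive site search that scores pages by raw term frequency, with an optional switch that ignores text inside `<a>` elements, roughly the effect of excluding those links from the index. The three pages and their names are hypothetical, and real site-search tools rank differently.

```python
import re
from collections import Counter
from html.parser import HTMLParser


class PageTextExtractor(HTMLParser):
    """Collects lowercase words, optionally ignoring text inside <a> elements."""

    def __init__(self, skip_links):
        super().__init__()
        self.skip_links = skip_links
        self.link_depth = 0
        self.words = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.link_depth += 1

    def handle_endtag(self, tag):
        if tag == "a" and self.link_depth > 0:
            self.link_depth -= 1

    def handle_data(self, data):
        if self.skip_links and self.link_depth:
            return
        self.words.extend(re.findall(r"\w+", data.lower()))


def term_counts(html, skip_links):
    extractor = PageTextExtractor(skip_links)
    extractor.feed(html)
    extractor.close()
    return Counter(extractor.words)


def search(pages, term, skip_links):
    """Return matching page names, best first, by raw term frequency."""
    scores = {name: term_counts(html, skip_links)[term]
              for name, html in pages.items()}
    hits = [name for name, score in scores.items() if score > 0]
    return sorted(hits, key=lambda name: -scores[name])


# Hypothetical site: one page with the content, two that only link to it.
pages = {
    "ceilidh.html": "<h1>Ceilidh</h1><p>How to dance at a ceilidh.</p>",
    "index.html": '<p>See our <a href="ceilidh.html">ceilidh guide</a>.</p>',
    "links.html": '<ul><li><a href="ceilidh.html">Ceilidh</a></li></ul>',
}
print(search(pages, "ceilidh", skip_links=False))
# ['ceilidh.html', 'index.html', 'links.html']
print(search(pages, "ceilidh", skip_links=True))
# ['ceilidh.html']
```

With anchor text indexed, both linking pages show up next to the real content; with it excluded, only the page that actually carries the content matches.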

> The only reason I can think of to abuse
> such a mechanism is to fool external search engines into showing your
> site for inappropriate search terms.

I would have phrased this as: "With regard to external search engines,
the only reason I can think of to abuse such a mechanism is to fool them
into showing your site for inappropriate search terms."

No real disagreement there.

--
Joel.

http://www.cv6.org/
"May she also say with just pride:
I have done the State some service."
Jul 20 '05 #7
In message <10*************@corp.supernews.com>, Brian
<us*****@julietremblay.com.invalid> writes
>> if I specify an English search, should I receive matches that are in
>> pages where <html lang="en" but the relevant matching text is within
>> another language's span block?
>
> Well, I suppose not, but I don't see how an unscrupulous author could
> abuse this to his advantage. Perhaps I'm missing something.

I don't see how either. My original point was about *excluding* certain
content on a page from search engines and whether there is any way of
doing this. Both robots.txt and meta tags are all-or-nothing, and it
would seem useful to exclude page furniture and adverts from robots
while allowing them to see the main body text.

--
Craig Cockburn ("coburn"). SiliconGlen.com Ltd. http://SiliconGlen.com
Home to the first online guide to Scotland, founded 1994.
Scottish FAQ, wedding info, website design, stop spam and more!
Jul 20 '05 #8
Craig Cockburn wrote:
> My original point was about *excluding* certain
> content on a page from search engines and whether there is any way of
> doing this.

I understood your original point quite well, and answered it. No such
device exists. The reason is that search engines want to produce
accurate results, and providing a means to selectively hide content on
pages would almost certainly lead to authors lying about their content,
thus degrading those results.

--
Brian (remove "invalid" from my address to email me)
http://www.tsmchughs.com/
Jul 20 '05 #9
