Query re search engine robots and indexing

Craig Cockburn

Hi

I'm aware of the use of robots.txt and the use of <META NAME="ROBOTS"
CONTENT="index,follow">

However, what would be more useful is to be able to control within a
page which elements of the page should be indexed and seen by robots and
which elements are simply page furniture and it is safe to ignore or not
cache (e.g. adverts).

Is there any guidance, especially using XHTML, regarding this and
whether it is possible to tell robots to ignore portions of a page?

The <META NAME="Description"... is of some use, but generally the
limitations on length are a problem.

thanks for any info

Craig
--
Craig Cockburn ("coburn"). SiliconGlen.com Ltd. http://SiliconGlen.com
Home to the first online guide to Scotland, founded 1994.
Scottish FAQ, wedding info, website design, stop spam and more!

Jul 20 '05 #1

Brian

Craig Cockburn wrote:

what would be more useful is to be able to control within a page
which elements of the page should be indexed and seen by robots and
which elements are simply page furniture and it is safe to ignore or
not cache (e.g. adverts).

Such a mechanism would be open to severe abuse by web authors. As such,
no search engine has permitted such a thing.

Is there any guidance, especially using XHTML, regarding this and
whether it is possible to tell robots to ignore portions of a page

There is no such thing for any search engines that I know of. This
includes Google, the most important one, I think.

--
Brian (remove "invalid" from my address to email me)
http://www.tsmchughs.com/

Jul 20 '05 #2

Joel Shepherd

In article <10*************@corp.supernews.com>,
Brian <us*****@julietremblay.com.invalid> wrote:

Craig Cockburn wrote:
Is there any guidance, especially using XHTML, regarding this and
whether it is possible to tell robots to ignore portions of a page

There is no such thing for any search engines that I know of. This
includes Google, the most important one, I think.

This probably isn't quite what the OP had in mind, but if one is talking
about a search engine for one's own site, PicoSearch does allow one to
exclude portions of a page from being indexed.

This may be because PicoSearch is not quite at Google's level in terms
of sophistication, and of course it can't really rely on external links
as hints as to the relevance of a given page. (I suppose it could use
internal links in that manner ... I think I wish it did.)

--
Joel.

http://www.cv6.org/
"May she also say with just pride:
I have done the State some service."

Jul 20 '05 #3

Craig Cockburn

In message <10*************@corp.supernews.com>, Brian
<us*****@julietremblay.com.invalid> writes

Craig Cockburn wrote:
what would be more useful is to be able to control within a page
which elements of the page should be indexed and seen by robots and
which elements are simply page furniture and it is safe to ignore or
not cache (e.g. adverts).

Such a mechanism would be open to severe abuse by web authors. As such,
no search engine has permitted such a thing.

However, using <span lang="en"... versus other languages should have a
similar effect, if the language is taken as being significant. After all
if I specify an English search, should I receive matches that are in
pages where <html lang="en" but the relevant matching text is within
another language's span block?

(e.g. the Failte on www.siliconglen.com)
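
To make the question concrete, here is a minimal sketch of how an indexer could, in principle, attribute text to a language by following nested lang attributes. It is not a description of what any real search engine does. The class name, the helper function, the sample markup and the "gd" language code for the Failte greeting are all assumptions for illustration, and the sketch assumes well-formed markup such as XHTML.

```python
from collections import defaultdict
from html.parser import HTMLParser

# Elements with no closing tag; they must not change the nesting stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class LanguageTextCollector(HTMLParser):
    """Group text by the lang attribute in effect where the text appears."""

    def __init__(self):
        super().__init__()
        self.lang_stack = [None]          # effective language per open element
        self.text_by_lang = defaultdict(list)

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        # An element without its own lang inherits from its parent.
        lang = dict(attrs).get("lang") or self.lang_stack[-1]
        self.lang_stack.append(lang)

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if len(self.lang_stack) > 1:
            self.lang_stack.pop()

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.text_by_lang[self.lang_stack[-1]].append(text)


def text_by_language(html: str) -> dict:
    """Return a mapping of language code to the text found in that language."""
    collector = LanguageTextCollector()
    collector.feed(html)
    collector.close()
    return dict(collector.text_by_lang)


# Hypothetical page: an English document with one span in another language.
page = ('<html lang="en"><body><p><span lang="gd">Failte</span> '
        'to the site</p></body></html>')
result = text_by_language(page)
```

With this sample page, the greeting is filed under "gd" and the rest of the sentence under "en", so a search restricted to English could skip the span.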

--
Craig Cockburn ("coburn"). SiliconGlen.com Ltd. http://SiliconGlen.com
Home to the first online guide to Scotland, founded 1994.
Scottish FAQ, wedding info, website design, stop spam and more!

Jul 20 '05 #4

Brian

Joel Shepherd wrote:

Brian wrote:
Craig Cockburn wrote:
whether it is possible to tell robots to ignore portions of a page

This probably isn't quite what the OP had in mind, but if one is talking
about a search engine for one's own site, PicoSearch does allow one to
exclude portions of a page from being indexed.

So does Perlfect Search.

This may be because PicoSearch is not quite at Google's level in terms
of sophistication,

Since fooling the search engine for a site search seems self-defeating,
there is no reason to block it. The only reason I can think of to abuse
such a mechanism is to fool external search engines into showing your
site for inappropriate search terms.

--
Brian (remove "invalid" from my address to email me)
http://www.tsmchughs.com/

Jul 20 '05 #5

Brian

Craig Cockburn wrote:

Brian writes
Craig Cockburn wrote:
what would be more useful is to be able to control within a page
which elements of the page should be indexed and seen by robots
and which elements are simply page furniture and it is safe to
ignore
Such a mechanism would be open to severe abuse by web authors. As
such, no search engine has permitted such a thing.

You seem to have misunderstood what I was saying. An example of the
abuse I mentioned would be site authors hiding their real content from
search engines to get

However, using <span lang="en"... versus other languages should have
a similar effect

I'm not sure how that would affect search engine results, except perhaps
to show e.g. an English page when one asked for only French pages.

if the language is taken as being significant.

Lang attributes are, afaik, ignored by search engines.

if I specify an English search, should I receive matches that are in
pages where <html lang="en" but the relevant matching text is within
another language's span block?

Well, I suppose not, but I don't see how an unscrupulous author could
abuse this to his advantage. Perhaps I'm missing something.

--
Brian (remove "invalid" from my address to email me)
http://www.tsmchughs.com/

Jul 20 '05 #6

Joel Shepherd

In article <10*************@corp.supernews.com>,
Brian <us*****@julietremblay.com.invalid> wrote:

Joel Shepherd wrote:
This may be because PicoSearch is not quite at Google's level in terms
of sophistication,
Since fooling the search engine for a site search seems self-defeating,
there is no reason to block it.

I agree in principle, and don't see a need to defeat external search
engines. With a single site search, however, the situation I've
encountered is that there may be a single page with content about some
subject, and several pages with inbound links. The links, being happy
links, often include the page subject in the link text. So the local
search engine is happy to include these pages in the search results.
Since there are only a few pages in the results total, they're nearly as
prominent in the results as the page with the actual content.

Pointing a visitor at a page that simply has a link to the page where
the content that they want resides seems unfriendly to me. So, where it
makes sense, I exclude these links from being indexed.
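
The kind of exclusion described above can be sketched for a home-grown site search as follows. This is only an illustration under stated assumptions: the data-noindex attribute, the class name and the sample page are hypothetical and are not PicoSearch's or Perlfect Search's actual syntax. The sketch also assumes well-formed markup, so that every non-void element is explicitly closed.

```python
from html.parser import HTMLParser

# Elements with no closing tag; they must not change the nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class SelectiveTextExtractor(HTMLParser):
    """Collect page text, skipping elements that carry the hypothetical
    data-noindex attribute (and everything nested inside them)."""

    def __init__(self):
        super().__init__()
        self.skip_depth = 0   # greater than 0 while inside an excluded element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self.skip_depth > 0:
            self.skip_depth += 1          # nested inside an excluded element
        elif any(name == "data-noindex" for name, _ in attrs):
            self.skip_depth = 1           # start of an excluded element

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if self.skip_depth == 0 and text:
            self.chunks.append(text)


def indexable_text(html: str) -> str:
    """Return only the text a site-search indexer should see."""
    extractor = SelectiveTextExtractor()
    extractor.feed(html)
    extractor.close()
    return " ".join(extractor.chunks)


# Hypothetical page: real content plus a cross-link that should not be indexed.
page = ('<h1>Widgets</h1><p>All about widgets.</p>'
        '<p data-noindex="true">See also '
        '<a href="/gadgets.html">the gadgets page</a>.</p>')
text = indexable_text(page)
```

Here the "See also" paragraph is dropped, so a local search for "gadgets" would not return this page merely because it links to the gadgets page.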

There are surely better solutions, but this particular one doesn't seem
harmful. I wouldn't be inclined to try this with an external search
engine because I expect it to be more sophisticated, and whatever its
indexing/ranking logic is, it's opaque to me. Trying to optimize for
opaque logic has a high probability of backfiring.

The only reason I can think of to abuse
such a mechanism is to fool external search engines into showing your
site for inappropriate search terms.

I would have phrased this as: "With regard to external search engines,
the only reason I can think of to abuse such a mechanism is to fool them
into showing your site for inappropriate search terms."

No real disagreement there.

--
Joel.

http://www.cv6.org/
"May she also say with just pride:
I have done the State some service."

Jul 20 '05 #7

Craig Cockburn

In message <10*************@corp.supernews.com>, Brian
<us*****@julietremblay.com.invalid> writes

if I specify an English search, should I receive matches that are in
pages where <html lang="en" but the relevant matching text is within
another language's span block?

Well, I suppose not, but I don't see how an unscrupulous author could
abuse this to his advantage. Perhaps I'm missing something.

I don't see how either. My original point was about *excluding* certain
content on a page from search engines and whether there is any way of
doing this. Both robots.txt and meta tags are all-or-nothing and it
would seem to be useful to exclude page furniture, adverts from robots
but allow them to see the main text body content.
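
The all-or-nothing nature of robots.txt can be illustrated with Python's standard urllib.robotparser module. This is a minimal sketch: the rules, the "ExampleBot" user agent and the example.com URLs are made up for illustration. The point is that the answer is given per URL, so there is no way to express "fetch this page but ignore its adverts".

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt rules for an example site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /adverts/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text directly, without fetching anything."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# Each check covers a whole URL: a page is either fetchable or it is not.
allowed_article = parser.can_fetch(
    "ExampleBot", "http://example.com/articles/scotland.html")
allowed_advert = parser.can_fetch(
    "ExampleBot", "http://example.com/adverts/banner.html")
```

Under these rules the article URL is allowed and the advert URL is refused. An advert embedded in the article page itself cannot be excluded this way.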

--
Craig Cockburn ("coburn"). SiliconGlen.com Ltd. http://SiliconGlen.com
Home to the first online guide to Scotland, founded 1994.
Scottish FAQ, wedding info, website design, stop spam and more!

Jul 20 '05 #8

Brian

Craig Cockburn wrote:

My original point was about *excluding* certain
content on a page from search engines and whether there is any way of
doing this.

I understood your original point quite well, and answered it. No such
device exists. The reason is that search engines want to produce
accurate results, and providing a means to selectively hide content on
pages would almost certainly lead to authors lying about their content,
thus degrading those results.

--
Brian (remove "invalid" from my address to email me)
http://www.tsmchughs.com/

Jul 20 '05 #9
