Google ignoring robot exclusion tags

Philip Ronan

Hi,

I recently discovered that Google's mobile search robot doesn't
understand the "robots" Meta tag.

Here's an example:

<http://www.google.com/xhtml/search?s...ch-cingular_mb
_xhtml&mrestrict=xhtml&q=robots+noindex&btnG=Search&site=mobile>

When I last looked, the top result for this search was a page at
gustaf.symbiandiaries.com with this tag in the HEAD section:

<meta name="robots" content="noindex,follow" />

It even contains a blog post from its author explaining why the meta tag
was added to this page. That was way back in April, so there's no
getting around the fact that Google is at fault here.
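
For anyone who wants to check a page for the tag themselves, here is a minimal Python sketch using the standard html.parser module. The sample markup is a stand-in, not the real page source, and the names RobotsMetaParser and robots_directives are made up for this example.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # Self-closing tags such as <meta ... /> also end up here, because
        # the default handle_startendtag() calls handle_starttag().
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            for part in content.split(","):
                if part.strip():
                    self.directives.add(part.strip().lower())


def robots_directives(html):
    """Return the set of robots meta directives found in an HTML string."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


# Stand-in markup (assumption): only the meta tag itself comes from the post.
sample = (
    "<html><head>"
    '<meta name="robots" content="noindex,follow" />'
    "</head><body></body></html>"
)

directives = robots_directives(sample)
may_index = "noindex" not in directives    # a compliant robot must not index
may_follow = "nofollow" not in directives  # but it may follow the links
print(sorted(directives), may_index, may_follow)
```

A robot that honours the tag would still crawl the links on such a page but keep the page itself out of its index.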

I've informed Google, but no reply yet.

Just thought you might like to know :-)

Phil

--
philronan [@] blueyonder [dot] co [dot] uk

Dec 1 '06 #1

Spartanicus

Philip Ronan <no****@example.invalid> wrote:

>I recently discovered that Google's mobile search robot doesn't
>understand the "robots" Meta tag.
>
>Here's an example:
>
><http://www.google.com/xhtml/search?s...ch-cingular_mb
>_xhtml&mrestrict=xhtml&q=robots+noindex&btnG=Search&site=mobile>
>
>When I last looked, the top result for this search was a page at
>gustaf.symbiandiaries.com with this tag in the HEAD section:
>
><meta name="robots" content="noindex,follow" />

Note that the site uses this robots.txt:

User-agent: *
Disallow: /cgi-bin
Disallow: /metablog
Disallow: /feedonfeeds
Disallow: /weblog/2004/
Disallow: /weblog/2005/
Disallow: /weblog/2006/
Disallow: /weblog/2007/

and that the given URL is not excluded.
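
That check can be reproduced with Python's standard urllib.robotparser. This is only a sketch: the rules are the ones listed above, but both page URLs are made-up examples, since the exact address of the indexed page isn't given here.

```python
from urllib.robotparser import RobotFileParser

# The rules quoted above, fed to the parser as text rather than fetched.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi-bin
Disallow: /metablog
Disallow: /feedonfeeds
Disallow: /weblog/2004/
Disallow: /weblog/2005/
Disallow: /weblog/2006/
Disallow: /weblog/2007/
"""

rules = RobotFileParser()
rules.parse(ROBOTS_TXT.splitlines())

# Hypothetical URLs (assumptions), chosen to show one allowed and one
# disallowed path on the same host.
front_page = "http://gustaf.symbiandiaries.com/"
archived = "http://gustaf.symbiandiaries.com/weblog/2006/example.html"

print(rules.can_fetch("*", front_page))  # True: no Disallow prefix matches
print(rules.can_fetch("*", archived))    # False: matches /weblog/2006/
```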

IMO it's reasonable to completely ignore such legacy meta tags, more so
if a robots.txt is present.

--
Spartanicus

Dec 1 '06 #2

Philip Ronan

In article <gk********************************@4ax.com>,
Spartanicus <in*****@invalid.invalid> wrote:

>IMO it's reasonable to completely ignore such legacy meta tags, more so
>if a robots.txt is present.

Really.

Do you think it's also OK for Google to ignore their own published
guidelines?

<http://www.google.com/support/webmasters/bin/answer.py?answer=35303>

A site owner might have perfectly good reasons for not wanting to
publicize URLs in a robots.txt file (e.g., preventing users from
siphoning out thousands of web pages with "site download" tools).

And since when has the "robots" meta tag been deprecated? Is that just
an opinion, or can you back that up?

--
If you really must contact me by email, visit
http://rumkin.com/tools/compression/base64.php
and decode the following string of characters:
RW1haWw6IHBoaWxyb25hbkBibHVleW9uZGVyLmNvLnVr

Dec 1 '06 #3

Spartanicus

Philip Ronan <no****@example.invalid> wrote:

>>IMO it's reasonable to completely ignore such legacy meta tags, more so
>>if a robots.txt is present.
>
>Really.
>
>Do you think it's also OK for Google to ignore their own published
>guidelines?

I'm not interested in what Google does WRT their own guidelines.

><http://www.google.com/support/webmasters/bin/answer.py?answer=35303>
>
>A site owner might have perfectly good reasons for not wanting to
>publicize URLs in a robots.txt file (e.g., preventing users from
>siphoning out thousands of web pages with "site download" tools).

In the rare cases of publicly accessible documents that cannot be found
by a link-following spider, there is no point in listing these documents
in a robots.txt.

Publicly accessible documents that can be found by a link-following
spider will be spidered anyway by bots that do not adhere to exclusion
requests.

>And since when has the "robots" meta tag been deprecated? Is that just
>an opinion, or can you back that up?

Legacy != deprecated; legacy = a leftover, a relic.

It makes no sense to use document tags to guide SEs, it never did. This
is reflected by the fact that nowadays they are often ignored. A note
likely written quite some time ago from the robots.txt site [about meta
tags aimed at SEs] : "Note that currently only a few robots implement
this."

There has been a better mechanism for some considerable time now.

--
Spartanicus

Dec 1 '06 #4

Philip Ronan

In article <5m********************************@4ax.com>,
Spartanicus <in*****@invalid.invalid> wrote:

>I'm not interested in what Google does WRT their own guidelines.

Then STFU.

--
If you really must contact me by email, visit
http://rumkin.com/tools/compression/base64.php
and decode the following string of characters:
RW1haWw6IHBoaWxyb25hbkBibHVleW9uZGVyLmNvLnVr

Dec 1 '06 #5

Harlan Messinger

Spartanicus wrote:

>It makes no sense to use document tags to guide SEs, it never did. This
>is reflected by the fact that nowadays they are often ignored. A note
>likely written quite some time ago from the robots.txt site [about meta
>tags aimed at SEs] : "Note that currently only a few robots implement
>this."
>
>There has been a better mechanism for some considerable time now.

Define "better". Robots.txt is a mechanism that's useless to anyone who
doesn't have control over the robots.txt file, which includes any
hosting site with user directories, and any organization web site where
each department maintains its own part of the site.

Robots.txt also has its advantages. So, who says there shouldn't be two
complementary ways to accomplish one goal? Once the META method came to
exist, there's no reason to start ignoring those tags. That's like
deciding that the expression "excuse me" is now a legacy expression and
choosing not to get out of people's way when they politely say, "Excuse
me, please." Dropping an existing courtesy serves no principle and is a
hostile act.

Dec 1 '06 #6

Spartanicus

Harlan Messinger <hm*******************@comcast.net> wrote:

>>It makes no sense to use document tags to guide SEs, it never did. This
>>is reflected by the fact that nowadays they are often ignored. A note
>>likely written quite some time ago from the robots.txt site [about meta
>>tags aimed at SEs] : "Note that currently only a few robots implement
>>this."
>>
>>There has been a better mechanism for some considerable time now.
>
>Define "better".

More efficient, much better supported and better features would be a
start.

>Robots.txt is a mechanism that's useless to anyone who
>doesn't have control over the robots.txt file, which includes any
>hosting site with user directories,

Despite that limitation it is overall a much better mechanism.

>and any organization web site where
>each department maintains its own part of the site.

That doesn't mean that they are excluded from editing a web root
document such as a robots.txt file. And subdomains can be used, each
with its own robots.txt.
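
To illustrate the per-host point: a robot looks for robots.txt at the root of whichever host serves the page, so a user directory shares its host's file while a subdomain gets a separate one. A rough Python sketch, where the function name robots_txt_url and the example.org hosts are made up for illustration:

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url):
    """Return the robots.txt URL that governs the given page URL."""
    parts = urlsplit(page_url)
    # Keep scheme and host; replace path, query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


# Hypothetical hosts (assumptions), not taken from the thread.
print(robots_txt_url("http://www.example.org/~alice/page.html"))
# http://www.example.org/robots.txt
print(robots_txt_url("http://sales.example.org/catalog/index.html"))
# http://sales.example.org/robots.txt
```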

>Robots.txt also has its advantages. So, who says there shouldn't be two
>complementary ways to accomplish one goal? Once the META method came to
>exist, there's no reason to start ignoring those tags.

I think you'd find that bot operators much appreciate the better
efficiency of the robots.txt convention.

>That's like
>deciding that the expression "excuse me" is now a legacy expression and
>choosing not to get out of people's way when they politely say, "Excuse
>me, please." Dropping an existing courtesy serves no principle and is a
>hostile act.

Again: bot support for meta tags aimed at guiding indexing has declined
greatly. But you are free to ignore that.

--
Spartanicus

Dec 1 '06 #7

David E. Ross

Philip Ronan wrote:

>Hi,
>
>I recently discovered that Google's mobile search robot doesn't
>understand the "robots" Meta tag.
>
>Here's an example:
>
><http://www.google.com/xhtml/search?s...ch-cingular_mb
>_xhtml&mrestrict=xhtml&q=robots+noindex&btnG=Search&site=mobile>
>
>When I last looked, the top result for this search was a page at
>gustaf.symbiandiaries.com with this tag in the HEAD section:
>
><meta name="robots" content="noindex,follow" />
>
>It even contains a blog post from its author explaining why the meta tag
>was added to this page. That was way back in April, so there's no
>getting around the fact that Google is at fault here.
>
>I've informed Google, but no reply yet.
>
>Just thought you might like to know :-)
>
>Phil

Just be aware that there are many rogue bots, crawlers, and spiders that
ignore both robots.txt and the META tag. See
<http://www.kloth.net/internet/badbots.php>.

--

David E. Ross
<http://www.rossde.com/>

I use SeaMonkey as my Web browser because I want
a browser that complies with Web standards. See
<http://www.mozilla.org/projects/seamonkey/>.

Dec 1 '06 #8

Philip Ronan

In article <ob******************************@iswest.net>,
"David E. Ross" <no****@nowhere.not> wrote:

>Just be aware that there are many rogue bots, crawlers, and spiders that
>ignore both robots.txt and the META tag. See
><http://www.kloth.net/internet/badbots.php>.

Yeah, I'm aware of that.

--
If you really must contact me by email, visit
http://rumkin.com/tools/compression/base64.php
and decode the following string of characters:
RW1haWw6IHBoaWxyb25hbkBibHVleW9uZGVyLmNvLnVr

Dec 1 '06 #9
