Google ignoring robot exclusion tags

Hi,

I recently discovered that Google's mobile search robot doesn't
understand the "robots" Meta tag.

Here's an example:

<http://www.google.com/xhtml/search?s...ch-cingular_mb_xhtml&mrestrict=xhtml&q=robots+noindex&btnG=Search&site=mobile>

When I last looked, the top result for this search was a page at
gustaf.symbiandiaries.com with this tag in the HEAD section:

<meta name="robots" content="noindex,follow" />

It even contains a blog post from its author explaining why the meta tag
was added to this page. That was way back in April, so there's no
getting around the fact that Google is at fault here.
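
For reference, this is roughly what honouring that tag involves on the crawler side. It is only a minimal Python sketch built on the standard library's html.parser; the names RobotsMetaParser, robots_directives and may_index are made up for illustration, and nothing here describes how Google's robot is actually implemented.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives from every <meta name="robots"> tag (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        # Self-closing tags such as <meta ... /> also end up here.
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").strip().lower() != "robots":
            return
        for token in attr.get("content", "").split(","):
            token = token.strip().lower()
            if token:
                self.directives.add(token)


def robots_directives(html):
    """Return the set of robots directives found in an HTML document."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


def may_index(html):
    """True unless the page asks not to be indexed ("noindex" or "none")."""
    directives = robots_directives(html)
    return "noindex" not in directives and "none" not in directives


page = '<html><head><meta name="robots" content="noindex,follow" /></head><body></body></html>'
print(may_index(page))  # False: the page asks not to be indexed
```

A robot that respects the tag would still follow the links on such a page ("follow") but would leave the page itself out of its index.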

I've informed Google, but no reply yet.

Just thought you might like to know :-)

Phil

--
philronan [@] blueyonder [dot] co [dot] uk
Dec 1 '06 #1
Philip Ronan <no****@example.invalid> wrote:

> I recently discovered that Google's mobile search robot doesn't
> understand the "robots" Meta tag.
>
> Here's an example:
>
> <http://www.google.com/xhtml/search?s...ch-cingular_mb_xhtml&mrestrict=xhtml&q=robots+noindex&btnG=Search&site=mobile>
>
> When I last looked, the top result for this search was a page at
> gustaf.symbiandiaries.com with this tag in the HEAD section:
>
> <meta name="robots" content="noindex,follow" />

Note that the site uses this robots.txt:

User-agent: *
Disallow: /cgi-bin
Disallow: /metablog
Disallow: /feedonfeeds
Disallow: /weblog/2004/
Disallow: /weblog/2005/
Disallow: /weblog/2006/
Disallow: /weblog/2007/

and that the given URL is not excluded.
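
Anyone who wants to verify that kind of claim can feed the rules above to Python's standard urllib.robotparser. This is just a quick sketch: the bot name "ExampleBot" and the two test paths are made up for illustration, since the exact URL of the page in question isn't spelled out here.

```python
from urllib.robotparser import RobotFileParser

# The rules quoted above, copied verbatim.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi-bin
Disallow: /metablog
Disallow: /feedonfeeds
Disallow: /weblog/2004/
Disallow: /weblog/2005/
Disallow: /weblog/2006/
Disallow: /weblog/2007/
"""

SITE = "http://gustaf.symbiandiaries.com"


def is_allowed(path, user_agent="ExampleBot"):
    """Check a path on the site against the rules above (user agent is hypothetical)."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, SITE + path)


# Both paths below are invented examples, not the actual page from the search result.
print(is_allowed("/"))                          # True: no rule matches the site root
print(is_allowed("/weblog/2006/04/post.html"))  # False: matches "Disallow: /weblog/2006/"
```

Note that this only answers whether a robot may fetch a URL. It says nothing about whether the fetched page may be indexed, which is what the meta tag is for.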

IMO it's reasonable to completely ignore such legacy meta tags, more so
if a robots.txt is present.

--
Spartanicus
Dec 1 '06 #2
In article <gk********************************@4ax.com>,
Spartanicus <in*****@invalid.invalid> wrote:

> IMO it's reasonable to completely ignore such legacy meta tags, more so
> if a robots.txt is present.

Really.

Do you think it's also OK for Google to ignore their own published
guidelines?

<http://www.google.com/support/webmasters/bin/answer.py?answer=35303>

A site owner might have perfectly good reasons for not wanting to
publicize URLs in a robots.txt file (e.g., preventing users from
siphoning out thousands of web pages with "site download" tools).

And since when has the "robots" meta tag been deprecated? Is that just
an opinion, or can you back that up?

--
If you really must contact me by email, visit
http://rumkin.com/tools/compression/base64.php
and decode the following string of characters:
RW1haWw6IHBoaWxyb25hbkBibHVleW9uZGVyLmNvLnVr
Dec 1 '06 #3
Philip Ronan <no****@example.invalid> wrote:

>> IMO it's reasonable to completely ignore such legacy meta tags, more so
>> if a robots.txt is present.
>
> Really.
>
> Do you think it's also OK for Google to ignore their own published
> guidelines?

I'm not interested in what Google does WRT their own guidelines.

> <http://www.google.com/support/webmasters/bin/answer.py?answer=35303>
>
> A site owner might have perfectly good reasons for not wanting to
> publicize URLs in a robots.txt file (e.g., preventing users from
> siphoning out thousands of web pages with "site download" tools).

In rare cases of publicly accessible documents that cannot be found by a
link-following spider there is no point in listing these documents in a
robots.txt.

Publicly accessible documents that can be found by a link-following
spider will be spidered anyway by bots that do not adhere to exclude
requests.

> And since when has the "robots" meta tag been deprecated? Is that just
> an opinion, or can you back that up?

Legacy != deprecated, legacy = a leftover, relic.

It makes no sense to use document tags to guide SEs; it never did. This
is reflected by the fact that nowadays they are often ignored. A note
from the robots.txt site [about meta tags aimed at SEs], likely written
quite some time ago, says: "Note that currently only a few robots
implement this."

There has been a better mechanism for some considerable time now.

--
Spartanicus
Dec 1 '06 #4
In article <5m********************************@4ax.com>,
Spartanicus <in*****@invalid.invalid> wrote:

> I'm not interested in what Google does WRT their own guidelines.

Then STFU.

Dec 1 '06 #5
Spartanicus wrote:

> It makes no sense to use document tags to guide SEs, it never did. This
> is reflected by the fact that nowadays they are often ignored. A note
> likely written quite some time ago from the robots.txt site [about meta
> tags aimed at SEs] : "Note that currently only a few robots implement
> this."
>
> There has been a better mechanism for some considerable time now.

Define "better". Robots.txt is a mechanism that's useless to anyone who
doesn't have control over the robots.txt file, which includes any
hosting site with user directories, and any organization web site where
each department maintains its own part of the site.

Robots.txt also has its advantages. So, who says there shouldn't be two
complementary ways to accomplish one goal? Once the META method came to
exist, there's no reason to start ignoring those tags. That's like
deciding that the expression "excuse me" is now a legacy expression and
choosing not to get out of people's way when they politely say, "Excuse
me, please." Dropping an existing courtesy serves no principle and is a
hostile act.
Dec 1 '06 #6
Harlan Messinger <hm*******************@comcast.net> wrote:

>> It makes no sense to use document tags to guide SEs, it never did. This
>> is reflected by the fact that nowadays they are often ignored. A note
>> likely written quite some time ago from the robots.txt site [about meta
>> tags aimed at SEs] : "Note that currently only a few robots implement
>> this."
>>
>> There has been a better mechanism for some considerable time now.
>
> Define "better".

More efficient, much better supported and better features would be a
start.

> Robots.txt is a mechanism that's useless to anyone who
> doesn't have control over the robots.txt file, which includes any
> hosting site with user directories,

Despite that limitation, it is overall a much better mechanism.

> and any organization web site where
> each department maintains its own part of the site.

That doesn't mean that they are excluded from editing a web root
document such as a robots.txt file. And subdomains can be used, each of
which can have its own robots.txt.
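
To make the subdomain point concrete: a robot looks for robots.txt at the root of whatever host it is fetching from, so each subdomain gets its own file. A small Python sketch of that lookup follows; the function name robots_txt_url and the example.org hostnames are invented for illustration.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url):
    """Return the robots.txt location that governs page_url (hypothetical helper).

    The file always lives at the root of the same scheme and host,
    regardless of the page's path or query string.
    """
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


# Hypothetical departmental subdomains, each governed by its own file.
print(robots_txt_url("http://sales.example.org/docs/page.html?x=1"))
print(robots_txt_url("http://support.example.org/faq/"))
```

The first call yields http://sales.example.org/robots.txt and the second http://support.example.org/robots.txt. Departments that share a single host under different directories, on the other hand, all fall under the one file at that host's root.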

> Robots.txt also has its advantages. So, who says there shouldn't be two
> complementary ways to accomplish one goal? Once the META method came to
> exist, there's no reason to start ignoring those tags.

I think you'd find that bot operators much appreciate the better
efficiency of the robots.txt convention.

> That's like
> deciding that the expression "excuse me" is now a legacy expression and
> choosing not to get out of people's way when they politely say, "Excuse
> me, please." Dropping an existing courtesy serves no principle and is a
> hostile act.

Again: bot support for meta tags aimed at guiding indexing has declined
greatly. But you are free to ignore that.

--
Spartanicus
Dec 1 '06 #7
Philip Ronan wrote:

> Hi,
>
> I recently discovered that Google's mobile search robot doesn't
> understand the "robots" Meta tag.
>
> Here's an example:
>
> <http://www.google.com/xhtml/search?s...ch-cingular_mb_xhtml&mrestrict=xhtml&q=robots+noindex&btnG=Search&site=mobile>
>
> When I last looked, the top result for this search was a page at
> gustaf.symbiandiaries.com with this tag in the HEAD section:
>
> <meta name="robots" content="noindex,follow" />
>
> It even contains a blog post from its author explaining why the meta tag
> was added to this page. That was way back in April, so there's no
> getting around the fact that Google is at fault here.
>
> I've informed Google, but no reply yet.
>
> Just thought you might like to know :-)
>
> Phil

Just be aware that there are many rogue bots, crawlers, and spiders that
ignore both robots.txt and the META tag. See
<http://www.kloth.net/internet/badbots.php>.

--

David E. Ross
<http://www.rossde.com/>

I use SeaMonkey as my Web browser because I want
a browser that complies with Web standards. See
<http://www.mozilla.org/projects/seamonkey/>.
Dec 1 '06 #8
In article <ob******************************@iswest.net>,
"David E. Ross" <no****@nowhere.not> wrote:

> Just be aware that there are many rogue bots, crawlers, and spiders that
> ignore both robots.txt and the META tag. See
> <http://www.kloth.net/internet/badbots.php>.

Yeah, I'm aware of that.

Dec 1 '06 #9
