Google ignoring robot exclusion tags

Hi,

I recently discovered that Google's mobile search robot doesn't
understand the "robots" Meta tag.

Here's an example:

<http://www.google.com/xhtml/search?s...ch-cingular_mb_xhtml&mrestrict=xhtml&q=robots+noindex&btnG=Search&site=mobile>

When I last looked, the top result for this search was a page at
gustaf.symbiandiaries.com with this tag in the HEAD section:

<meta name="robots" content="noindex,follow" />

It even contains a blog post from its author explaining why the meta tag
was added to this page. That was way back in April, so there's no
getting around the fact that Google is at fault here.
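
For anyone wondering what honouring that tag involves, here is a rough sketch of how a crawler could read it, using Python's standard html.parser module. The class name RobotsMetaParser, the helper robots_directives and the sample page are all made up for illustration; this is only an assumption about how a well-behaved robot might do it, not a description of Google's robot.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser also routes self-closing tags (<meta ... />) here.
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() != "robots":
            return
        content = attr.get("content") or ""
        for token in content.split(","):
            token = token.strip().lower()
            if token:
                self.directives.add(token)


def robots_directives(html_text):
    """Return the set of robots directives found in an HTML document."""
    parser = RobotsMetaParser()
    parser.feed(html_text)
    return parser.directives


# Hypothetical page carrying the tag quoted above.
sample = (
    '<html><head><meta name="robots" content="noindex,follow" />'
    "</head><body>...</body></html>"
)
directives = robots_directives(sample)
# "none" is shorthand for "noindex,nofollow".
may_index = "noindex" not in directives and "none" not in directives
may_follow = "nofollow" not in directives and "none" not in directives
print(sorted(directives), may_index, may_follow)
```

A robot that respects the tag would still follow the links on such a page but would keep the page itself out of its index.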

I've informed Google, but no reply yet.

Just thought you might like to know :-)

Phil

--
philronan [@] blueyonder [dot] co [dot] uk
Dec 1 '06 #1
Philip Ronan <no****@example.invalid> wrote:
> I recently discovered that Google's mobile search robot doesn't
> understand the "robots" Meta tag.
>
> Here's an example:
>
> <http://www.google.com/xhtml/search?s...ch-cingular_mb_xhtml&mrestrict=xhtml&q=robots+noindex&btnG=Search&site=mobile>
>
> When I last looked, the top result for this search was a page at
> gustaf.symbiandiaries.com with this tag in the HEAD section:
>
> <meta name="robots" content="noindex,follow" />

Note that the site uses this robots.txt:

User-agent: *
Disallow: /cgi-bin
Disallow: /metablog
Disallow: /feedonfeeds
Disallow: /weblog/2004/
Disallow: /weblog/2005/
Disallow: /weblog/2006/
Disallow: /weblog/2007/

and that the given URL is not excluded.
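
A claim like this can be checked mechanically. The sketch below feeds the rules above to Python's standard urllib.robotparser. The exact path of the indexed page isn't given in this thread, so the paths tested here and the "ExampleBot" user agent are placeholders of my own, not the real URL.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt rules quoted above, verbatim.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi-bin
Disallow: /metablog
Disallow: /feedonfeeds
Disallow: /weblog/2004/
Disallow: /weblog/2005/
Disallow: /weblog/2006/
Disallow: /weblog/2007/
"""

SITE = "http://gustaf.symbiandiaries.com"


def allowed(path, user_agent="ExampleBot"):
    """Return True if robots.txt lets user_agent fetch the given path."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, SITE + path)


# Placeholder paths: the site root and an invented 2006 weblog entry.
for path in ("/", "/weblog/2006/04/example.html"):
    print(path, allowed(path))
```

With these rules the site root comes out as fetchable and anything under /weblog/2006/ as excluded, so a page outside the listed directories is governed only by its meta tag.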

IMO it's reasonable to completely ignore such legacy meta tags, more so
if a robots.txt is present.

--
Spartanicus
Dec 1 '06 #2
In article <gk********************************@4ax.com>,
Spartanicus <in*****@invalid.invalid> wrote:
> IMO it's reasonable to completely ignore such legacy meta tags, more so
> if a robots.txt is present.

Really.

Do you think it's also OK for Google to ignore their own published
guidelines?

<http://www.google.com/support/webmasters/bin/answer.py?answer=35303>

A site owner might have perfectly good reasons for not wanting to
publicize URLs in a robots.txt file (e.g., preventing users from
siphoning out thousands of web pages with "site download" tools).

And since when has the "robots" meta tag been deprecated? Is that just
an opinion, or can you back that up?

--
If you really must contact me by email, visit
http://rumkin.com/tools/compression/base64.php
and decode the following string of characters:
RW1haWw6IHBoaWxyb25hbkBibHVleW9uZGVyLmNvLnVr
Dec 1 '06 #3
Philip Ronan <no****@example.invalid> wrote:
>> IMO it's reasonable to completely ignore such legacy meta tags, more so
>> if a robots.txt is present.
>
> Really.
>
> Do you think it's also OK for Google to ignore their own published
> guidelines?

I'm not interested in what Google does WRT their own guidelines.

> <http://www.google.com/support/webmasters/bin/answer.py?answer=35303>
>
> A site owner might have perfectly good reasons for not wanting to
> publicize URLs in a robots.txt file (e.g., preventing users from
> siphoning out thousands of web pages with "site download" tools).

In rare cases of publicly accessible documents that cannot be found by a
link following spider there is no point in listing these documents in a
robots.txt.

Publicly accessible documents that can be found by a link following
spider will be spidered anyway by bots that do not adhere to exclude
requests.

> And since when has the "robots" meta tag been deprecated? Is that just
> an opinion, or can you back that up?

Legacy != deprecated; legacy = a leftover, a relic.

It makes no sense to use document tags to guide SEs, it never did. This
is reflected by the fact that nowadays they are often ignored. A note
likely written quite some time ago from the robots.txt site [about meta
tags aimed at SEs] : "Note that currently only a few robots implement
this."

There has been a better mechanism for some considerable time now.

--
Spartanicus
Dec 1 '06 #4
In article <5m********************************@4ax.com>,
Spartanicus <in*****@invalid.invalid> wrote:
> I'm not interested in what Google does WRT their own guidelines.

Then STFU.

--
If you really must contact me by email, visit
http://rumkin.com/tools/compression/base64.php
and decode the following string of characters:
RW1haWw6IHBoaWxyb25hbkBibHVleW9uZGVyLmNvLnVr
Dec 1 '06 #5
Spartanicus wrote:
> It makes no sense to use document tags to guide SEs, it never did. This
> is reflected by the fact that nowadays they are often ignored. A note
> likely written quite some time ago from the robots.txt site [about meta
> tags aimed at SEs] : "Note that currently only a few robots implement
> this."
>
> There has been a better mechanism for some considerable time now.

Define "better". Robots.txt is a mechanism that's useless to anyone who
doesn't have control over the robots.txt file, which includes any
hosting site with user directories, and any organization web site where
each department maintains its own part of the site.

Robots.txt also has its advantages. So, who says there shouldn't be two
complementary ways to accomplish one goal? Once the META method came to
exist, there's no reason to start ignoring those tags. That's like
deciding that the expression "excuse me" is now a legacy expression and
choosing not to get out of people's way when they politely say, "Excuse
me, please." Dropping an existing courtesy serves no principle and is a
hostile act.
Dec 1 '06 #6
Harlan Messinger <hm*******************@comcast.net> wrote:
>> It makes no sense to use document tags to guide SEs, it never did. This
>> is reflected by the fact that nowadays they are often ignored. A note
>> likely written quite some time ago from the robots.txt site [about meta
>> tags aimed at SEs] : "Note that currently only a few robots implement
>> this."
>>
>> There has been a better mechanism for some considerable time now.
>
> Define "better".

More efficient, much better supported and better features would be a
start.

> Robots.txt is a mechanism that's useless to anyone who
> doesn't have control over the robots.txt file, which includes any
> hosting site with user directories,

Despite that limitation it is overall a much better mechanism.

> and any organization web site where
> each department maintains its own part of the site.

That doesn't mean that they are excluded from editing a web root
document such as a robots.txt file. And subdomains can be used, each
of which can have its own robots.txt.
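
To make the scoping concrete: a robot looks for robots.txt at the root of whichever host serves the page, so the file that applies is fixed by the scheme and host name alone. A minimal sketch with Python's urllib.parse follows; the example.org host names and the helper robots_txt_url are invented for illustration.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url):
    """Return the robots.txt URL that governs page_url (same scheme and host)."""
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


# Hypothetical URLs: a user directory on a shared host, and a departmental subdomain.
for url in (
    "http://www.example.org/~alice/diary.html",
    "http://sales.example.org/catalogue/index.html",
):
    print(url, "->", robots_txt_url(url))
```

The page in the user directory falls under the shared host's root file, which its author may not be able to edit, while the subdomain gets a robots.txt of its own.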

> Robots.txt also has its advantages. So, who says there shouldn't be two
> complementary ways to accomplish one goal? Once the META method came to
> exist, there's no reason to start ignoring those tags.

I think you'd find that bot operators much appreciate the better
efficiency of the robots.txt convention.

> That's like
> deciding that the expression "excuse me" is now a legacy expression and
> choosing not to get out of people's way when they politely say, "Excuse
> me, please." Dropping an existing courtesy serves no principle and is a
> hostile act.

Again: bot support for meta tags aimed at guiding indexing has reduced
greatly. But you are free to ignore that.

--
Spartanicus
Dec 1 '06 #7
Philip Ronan wrote:
> Hi,
>
> I recently discovered that Google's mobile search robot doesn't
> understand the "robots" Meta tag.
>
> Here's an example:
>
> <http://www.google.com/xhtml/search?s...ch-cingular_mb_xhtml&mrestrict=xhtml&q=robots+noindex&btnG=Search&site=mobile>
>
> When I last looked, the top result for this search was a page at
> gustaf.symbiandiaries.com with this tag in the HEAD section:
>
> <meta name="robots" content="noindex,follow" />
>
> It even contains a blog post from its author explaining why the meta tag
> was added to this page. That was way back in April, so there's no
> getting around the fact that Google is at fault here.
>
> I've informed Google, but no reply yet.
>
> Just thought you might like to know :-)
>
> Phil

Just be aware that there are many rogue bots, crawlers, and spiders that
ignore both robots.txt and the META tag. See
<http://www.kloth.net/internet/badbots.php>.

--

David E. Ross
<http://www.rossde.com/>

I use SeaMonkey as my Web browser because I want
a browser that complies with Web standards. See
<http://www.mozilla.org/projects/seamonkey/>.
Dec 1 '06 #8
In article <ob******************************@iswest.net>,
"David E. Ross" <no****@nowhere.not> wrote:
> Just be aware that there are many rogue bots, crawlers, and spiders that
> ignore both robots.txt and the META tag. See
> <http://www.kloth.net/internet/badbots.php>.

Yeah, I'm aware of that.

--
If you really must contact me by email, visit
http://rumkin.com/tools/compression/base64.php
and decode the following string of characters:
RW1haWw6IHBoaWxyb25hbkBibHVleW9uZGVyLmNvLnVr
Dec 1 '06 #9
