Hi,
I recently discovered that Google's mobile search robot doesn't
understand the "robots" Meta tag.
Here's an example:
<http://www.google.com/xhtml/search?s...ch-cingular_mb_xhtml&mrestrict=xhtml&q=robots+noindex&btnG=Search&site=mobile>
When I last looked, the top result for this search was a page at
gustaf.symbiandiaries.com with this tag in the HEAD section:
<meta name="robots" content="noindex,follow" />
It even contains a blog post from its author explaining why the meta tag
was added to this page. That was way back in April, so there's no
getting around the fact that Google is at fault here.
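For anyone unfamiliar with the tag, here is a minimal sketch of how a well-behaved crawler might read it before deciding whether to index a page. The names `RobotsMetaParser` and `may_index` are made up for illustration, and this is not a description of how Google's crawler actually works.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives found in <meta name="robots"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # Self-closing tags such as <meta ... /> also end up here.
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() != "robots":
            return
        content = attr.get("content") or ""
        for token in content.split(","):
            token = token.strip().lower()
            if token:
                self.directives.add(token)


def may_index(html):
    """Return False if the page asks robots not to index it."""
    parser = RobotsMetaParser()
    parser.feed(html)
    # "none" is commonly treated as shorthand for "noindex,nofollow".
    return not ({"noindex", "none"} & parser.directives)
```

With the tag quoted above, `may_index` returns `False`, so a crawler honouring the tag would follow the page's links but leave the page itself out of its index.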
I've informed Google, but no reply yet.
Just thought you might like to know :-)
Phil
--
philronan [@] blueyonder [dot] co [dot] uk
Philip Ronan <no****@example.invalid> wrote:
>I recently discovered that Google's mobile search robot doesn't understand the "robots" Meta tag.
>Here's an example:
><http://www.google.com/xhtml/search?s...ch-cingular_mb_xhtml&mrestrict=xhtml&q=robots+noindex&btnG=Search&site=mobile>
>When I last looked, the top result for this search was a page at gustaf.symbiandiaries.com with this tag in the HEAD section:
><meta name="robots" content="noindex,follow" />
Note that the site uses this robots.txt:
User-agent: *
Disallow: /cgi-bin
Disallow: /metablog
Disallow: /feedonfeeds
Disallow: /weblog/2004/
Disallow: /weblog/2005/
Disallow: /weblog/2006/
Disallow: /weblog/2007/
and that the given URL is not excluded.
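A rough way to check this kind of claim is to feed the rules to Python's standard `urllib.robotparser`. The exact URL of the page is not given in this thread, so the paths below are hypothetical examples, and `ExampleBot` is a made-up user agent.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt rules quoted above.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi-bin
Disallow: /metablog
Disallow: /feedonfeeds
Disallow: /weblog/2004/
Disallow: /weblog/2005/
Disallow: /weblog/2006/
Disallow: /weblog/2007/
"""


def is_allowed(path, user_agent="ExampleBot"):
    """Return True if the rules above let user_agent fetch the given path."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, "http://gustaf.symbiandiaries.com" + path)


# Hypothetical paths, for illustration only.
print(is_allowed("/"))                                # True
print(is_allowed("/weblog/2006/example-post.html"))   # False
print(is_allowed("/weblog/2008/example-post.html"))   # True
```

Anything outside the listed directories is fair game as far as robots.txt is concerned, which is why the meta tag is the only exclusion request that applies to the page in question.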
IMO it's reasonable to completely ignore such legacy meta tags, more so
if a robots.txt is present.
--
Spartanicus
In article <gk********************************@4ax.com>,
Spartanicus <in*****@invalid.invalid> wrote:
>IMO it's reasonable to completely ignore such legacy meta tags, more so
>if a robots.txt is present.
Really.
Do you think it's also OK for Google to ignore their own published
guidelines?
<http://www.google.com/support/webmasters/bin/answer.py?answer=35303>
A site owner might have perfectly good reasons for not wanting to
publicize URLs in a robots.txt file (e.g., preventing users from
siphoning out thousands of web pages with "site download" tools).
And since when has the "robots" meta tag been deprecated? Is that just
an opinion, or can you back that up?
--
If you really must contact me by email, visit http://rumkin.com/tools/compression/base64.php
and decode the following string of characters:
RW1haWw6IHBoaWxyb25hbkBibHVleW9uZGVyLmNvLnVr
Philip Ronan <no****@example.invalid> wrote:
>>IMO it's reasonable to completely ignore such legacy meta tags, more so if a robots.txt is present.
>Really.
>Do you think it's also OK for Google to ignore their own published guidelines?
I'm not interested in what Google does WRT their own guidelines.
><http://www.google.com/support/webmasters/bin/answer.py?answer=35303>
>A site owner might have perfectly good reasons for not wanting to publicize URLs in a robots.txt file (e.g., preventing users from siphoning out thousands of web pages with "site download" tools).
In the rare cases of publicly accessible documents that cannot be found
by a link-following spider, there is no point in listing these documents
in a robots.txt.
Publicly accessible documents that can be found by a link-following
spider will be spidered anyway by bots that do not adhere to exclusion
requests.
>And since when has the "robots" meta tag been deprecated? Is that just an opinion, or can you back that up?
Legacy != deprecated; legacy = a leftover, a relic.
It makes no sense to use document tags to guide SEs, it never did. This
is reflected by the fact that nowadays they are often ignored. A note
likely written quite some time ago from the robots.txt site [about meta
tags aimed at SEs] : "Note that currently only a few robots implement
this."
There has been a better mechanism for some considerable time now.
--
Spartanicus
In article <5m********************************@4ax.com>,
Spartanicus <in*****@invalid.invalid> wrote:
>I'm not interested in what Google does WRT their own guidelines.
Then STFU.
Spartanicus wrote:
>It makes no sense to use document tags to guide SEs, it never did. This
>is reflected by the fact that nowadays they are often ignored. A note
>likely written quite some time ago from the robots.txt site [about meta
>tags aimed at SEs] : "Note that currently only a few robots implement
>this."
>There has been a better mechanism for some considerable time now.
Define "better". Robots.txt is a mechanism that's useless to anyone who
doesn't have control over the robots.txt file, which includes any
hosting site with user directories, and any organization web site where
each department maintains its own part of the site.
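The limitation follows from where robots look for the file: under the exclusion convention, only the `/robots.txt` at the root of the host is consulted, whatever directory the page lives in. A small sketch, using made-up example.com URLs and a hypothetical helper name `robots_txt_url`:

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url):
    """Return the one robots.txt location a crawler consults for page_url."""
    parts = urlsplit(page_url)
    # Only scheme and host are kept; the page's own directory plays no part.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


# A user directory on a shared host: the user cannot edit this file.
print(robots_txt_url("http://www.example.com/~alice/notes/page.html"))
# http://www.example.com/robots.txt

# A subdomain is a separate host and gets its own file.
print(robots_txt_url("http://alice.example.com/notes/page.html"))
# http://alice.example.com/robots.txt
```

A page author in `/~alice/` has no file of their own that robots will read, so the meta tag is the only exclusion request within their reach.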
Robots.txt also has its advantages. So, who says there shouldn't be two
complementary ways to accomplish one goal? Once the META method came to
exist, there's no reason to start ignoring those tags. That's like
deciding that the expression "excuse me" is now a legacy expression and
choosing not to get out of people's way when they politely say, "Excuse
me, please." Dropping an existing courtesy serves no principle and is a
hostile act.
Harlan Messinger <hm*******************@comcast.net> wrote:
>>It makes no sense to use document tags to guide SEs, it never did. This is reflected by the fact that nowadays they are often ignored. A note likely written quite some time ago from the robots.txt site [about meta tags aimed at SEs] : "Note that currently only a few robots implement this."
>>There has been a better mechanism for some considerable time now.
>Define "better".
More efficient, much better supported and better features would be a
start.
>Robots.txt is a mechanism that's useless to anyone who doesn't have control over the robots.txt file, which includes any hosting site with user directories,
Despite that limitation, it is overall a much better mechanism.
>and any organization web site where each department maintains its own part of the site.
That doesn't mean that they are excluded from editing a web root
document such as a robots.txt file. And subdomains can be used, on which
each can use its own robots.txt.
>Robots.txt also has its advantages. So, who says there shouldn't be two complementary ways to accomplish one goal? Once the META method came to exist, there's no reason to start ignoring those tags.
I think you'd find that bot operators much appreciate the better
efficiency of the robots.txt convention.
>That's like deciding that the expression "excuse me" is now a legacy expression and choosing not to get out of people's way when they politely say, "Excuse me, please." Dropping an existing courtesy serves no principle and is a hostile act.
Again: bot support for meta tags aimed at guiding indexing has reduced
greatly. But you are free to ignore that.
--
Spartanicus
Philip Ronan wrote:
>Hi,
>I recently discovered that Google's mobile search robot doesn't
>understand the "robots" Meta tag.
>Here's an example:
><http://www.google.com/xhtml/search?s...ch-cingular_mb_xhtml&mrestrict=xhtml&q=robots+noindex&btnG=Search&site=mobile>
>When I last looked, the top result for this search was a page at
>gustaf.symbiandiaries.com with this tag in the HEAD section:
><meta name="robots" content="noindex,follow" />
>It even contains a blog post from its author explaining why the meta tag
>was added to this page. That was way back in April, so there's no
>getting around the fact that Google is at fault here.
>I've informed Google, but no reply yet.
>Just thought you might like to know :-)
>Phil
Just be aware that there are many rogue bots, crawlers, and spiders that
ignore both robots.txt and the META tag. See
<http://www.kloth.net/internet/badbots.php>.
--
David E. Ross
<http://www.rossde.com/>
I use SeaMonkey as my Web browser because I want
a browser that complies with Web standards. See
<http://www.mozilla.org/projects/seamonkey/>.
In article <ob******************************@iswest.net>,
"David E. Ross" <no****@nowhere.not> wrote:
>Just be aware that there are many rogue bots, crawlers, and spiders that
>ignore both robots.txt and the META tag. See
><http://www.kloth.net/internet/badbots.php>.
Yeah, I'm aware of that.