Bytes | Software Development & Data Engineering Community

html source to prevent web bot searching

I read that there are some tags that can be entered in a web page's
meta tags in order to prevent web bot searching and indexing of the
web page for search engines.

What is the tagging that I would need to use?
Jul 20 '05 #1
*Ludwig77* wrote:
I read that there are some tags that can be entered in a web page's
meta tags in order to prevent web bot searching and indexing of the
web page for search engines.

What is the tagging that I would need to use?


http://www.robotstxt.org/wc/faq.html#noindex
--
Andrew Urquhart
- FAQ: www.htmlhelp.org/faq/html/
- Archive: www.tinyurl.com/2zw7m (Google Groups)
- My reply address is invalid, use: www.andrewu.co.uk/contact/
Jul 20 '05 #2
Ludwig77 <gr********@yahoo.com> wrote:
I read that there are some tags that can be entered in a web page's
meta tags in order to prevent web bot searching and indexing of the
web page for search engines.

What is the tagging that I would need to use?


There's always robots.txt
http://www.searchengineworld.com/rob...s_tutorial.htm
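For reference, a minimal robots.txt along those lines, placed at the site
root (the path is illustrative):

```
User-agent: *
Disallow: /private/
```

This asks all compliant crawlers not to fetch anything under /private/;
like the robots meta tag, it is a request that only well-behaved bots honor.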

--
_Deirdre http://deirdre.net
"Memes are a hoax! Pass it on!"
Jul 20 '05 #3
----- Original Message -----
From: "Ludwig77" <>
Newsgroups: comp.infosystems.www.authoring.html
Sent: Monday, June 14, 2004 2:48 PM
Subject: html source to prevent web bot searching

I read that there are some tags that can be entered in a web page's
meta tags in order to prevent web bot searching and indexing of the
web page for search engines.

What is the tagging that I would need to use?

In the HEAD of the page, insert the following four lines:

<META NAME="ROBOTS" CONTENT="NOINDEX,NOFOLLOW">
<META NAME="robots" CONTENT="noarchive">
<meta name="robots" content="noimageindex, nomediaindex" />
<META HTTP-EQUIV="pragma" CONTENT="no-cache">

PLEASE note: The key word in your inquiry is "prevent."
Neither the use of the aforementioned four lines nor the use of Disallow in
robots.txt PREVENTS ANYTHING.
Rather, the interpretation is that honorable bots will abide by your wishes.
On the other hand, there are many, many dishonorable bots.
The solution to those bots is the implementation of an effective
"htaccess" file. Htaccess is a control rather than a request and, properly
used, enables the PREVENT you inquired about.
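As a sketch of the htaccess approach described above (assuming Apache with
mod_rewrite enabled; the bot names are made up for illustration):

```apache
# .htaccess: refuse requests whose User-Agent matches known bad bots
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} "BadBot" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "EvilCrawler" [NC]
RewriteRule .* - [F]
```

Unlike the meta tags, this is enforced by the server: matching requests get
a 403 Forbidden response regardless of whether the bot chooses to cooperate.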
Jul 20 '05 #4
lostinspace wrote:
Original Message From: "Ludwig77"
I read that there are some tags that can be entered in a web
page's meta tags in order to prevent web bot searching and
indexing of the web page for search engines.
<META NAME="ROBOTS" CONTENT="NOINDEX,NOFOLLOW">
<META NAME="robots" CONTENT="noarchive">


Ok, though it'd be worth mentioning robots.txt as well.
<meta name="robots" content="noimageindex, nomediaindex" />
Why did you switch to xhtml syntax for this one line?
<META HTTP-EQUIV="pragma" CONTENT="no-cache">


Pardon? What does caching have to do with search engine indexing?

--
Brian (remove ".invalid" to email me)
http://www.tsmchughs.com/
Jul 20 '05 #5
----- Original Message -----
From: "Brian" <>
Newsgroups: comp.infosystems.www.authoring.html
Sent: Monday, June 14, 2004 11:07 PM
Subject: Re: html source to prevent web bot searching

Brian,
I've been using those lines for some time.
Google and some other SEs each have their own preferences for page
exclusions.
http://google.netscape.com/webmasters/faq.html#cached
Now isn't it absurd that Google requires something different from the
industry norm?
They aren't alone.
I'm unable to recall which bot the xhtml syntax addresses; however, it is
specific.

Caching vs. indexing?
If they don't see it, they can't read it.

Over a period of time, controlling ALL the cache (by whatever means
possible) will provide your logs with the majority of your visitors :-))
This is contrary to what many folks will tell you.

BTW, the inquiry was specific to "prevent."
If he's unable to find any mention of robots.txt, htaccess, or INDEX/NOINDEX,
or even to do a simple Google on "prevent+web+bot+searching", what in your
opinion is going to be his understanding and experience in these regards?
Jul 20 '05 #6
lostinspace wrote:
From: "Brian" <>
lostinspace wrote:
Original Message From: "Ludwig77"

I read that there are some tags that can be entered in a web
page's meta tags in order to prevent web bot searching and
indexing

<META NAME="ROBOTS" CONTENT="NOINDEX,NOFOLLOW">
<META NAME="robots" CONTENT="noarchive">
<meta name="robots" content="noimageindex, nomediaindex" />
Why did you switch to xhtml syntax for this one line?
<META HTTP-EQUIV="pragma" CONTENT="no-cache">


Pardon? What does caching have to do with search engine indexing?

I've been using those lines for some time.


That may be, but it sheds no light on *why* you use them, and why
you're telling others to use them.
Google and some other SE's "each" have their individual preference
for page exclusions.
Which search engines ignore the robots exclusion policy?
http://google.netscape.com/webmasters/faq.html#cached
This link is a Netscape page; strange that you didn't reference a
google.com page.
Now isn't that absurd that Google requires something different from
the industry norm?
What is absurd is that you are offering advice about something which
you have badly misunderstood.

Google cache is a service by which Google offers users a view of the
page as it was last indexed by Googlebot. It is completely unrelated
to http caching (more on that below). Googlebot does respect the
robots policy. If you don't want the page indexed, editing the page
with the meta robots element or the site's robots.txt file will suffice.
I'm unable to recall which bot the xhtml syntax addresses however
it is specific.
Then I can only assume you're mistaken.
Caching Vs indexing? If they don't see it they can't read it.
Exactly. So including

<META NAME="ROBOTS" CONTENT="NOARCHIVE">

is entirely unnecessary if the robot has been blocked from indexing
the site in the first place.
Over a period of "time" controlling ALL the cache (with what ever
mean possible) will provide your logs with the majority of your
visitors :-)) This is "contrary" to what many folks will tell you.
No, this is not contrary to what many folks will tell me. In fact, a
search of web forums will turn up many people who are as clueless
about caching as you are.

First, there is no way to ensure that your document is not cached
using pragma; you have a better chance with cache-control. Impeding
caches is an incredibly stupid thing to do in most situations, since
it slows down your site with no appreciable gain. It should only be
done if there is a genuine reason to block caching (security, privacy,
etc.). Vain attempts to "improve" your server logs are one of the
silliest reasons to block caches.
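Where blocking caches genuinely is warranted, HTTP/1.1 response headers
sent by the server are more reliable than a pragma meta element, e.g.:

```
Cache-Control: no-store, no-cache, must-revalidate
```

These directives instruct both shared (proxy) and private (browser) caches
not to store or reuse the response without revalidation; they have nothing
to do with Google's "cached page" feature.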

You have muddied the waters further by confusing Google's cache on one
hand with proxy and browser caching on the other. They have *nothing*
to do with each other.
If he's unable to find any mention of robot's txt, htaccess or
INDEX-NOINDEX or even do a simple google on
"prevent+web+bot+searching" what in your opinion is going to be his
understanding and experience in these regards?


It couldn't possibly be more misleading than your post. You -- and the
op -- can start learning about robots exclusion:

http://www.robotstxt.org/wc/robots.html

Google additions can be found on their site:

http://www.google.com/bot.html

And for pete's sake, please stop giving advice on caching until you
understand it better. Start here:

http://www.web-caching.com/mnot_tutorial/

P.S. Please follow the norms for posting in this group: trim your
quotes, and insert your replies after the relevant quoted parts. See

http://www.xs4all.nl/%7ewijnands/nnq/nquote.html

--
Brian (remove ".invalid" to email me)
http://www.tsmchughs.com/
Jul 20 '05 #7
----- Original Message -----
From: "Brian" <>
Newsgroups: comp.infosystems.www.authoring.html
Sent: Tuesday, June 15, 2004 1:18 AM
Subject: Re: html source to prevent web bot searching

Brian,
If you are as knowledgeable about these issues as you believe you are,
then WHY didn't you advise the op before I did?
It's quite easy for you to sit on your backside and tear apart emails
"after the fact."
You really have less of a clue than you understand. I've been using these
methods and htaccess methods for nearly six years on my websites, and today
you're attempting to convey that in that period I learned nothing of traffic
patterns :-))

If my submissions are upsetting you, then filter me out.
Jul 20 '05 #8
On Tue, 15 Jun 2004 13:25:52 GMT, lostinspace
<lo*********@123-universe.com> wrote:
If you are as knowledgeable about these issues as you believe you are,
then WHY didn't you advise the op before I did?
Posts propagate, and people check in, differently. It's possible he never
saw the post before you did.
It's quite easy for you to sit on your backside and tear apart emails
"after the fact."
You really have less of a clue than you understand. I've been using these
methods and htaccess methods for nearly six years on my websites, and today
you're attempting to convey that in that period I learned nothing of traffic
patterns :-))


I think he'd like some evidence, beyond "I heard this works". I would as
well. Google's methods are well known. Where did you learn of the other
methods?
Jul 20 '05 #9
----- Original Message -----
From: "Neal" <>
Newsgroups: comp.infosystems.www.authoring.html
Sent: Tuesday, June 15, 2004 9:43 AM
Subject: Re: html source to prevent web bot searching


Hello Neal,
When I began with my websites, I also started following
alt.html and alt.www.webmaster. From following threads of interest, I began
doing internet searches on lead words which had been supplied in
conversation. Later I participated in a Webmaster World forum on
identifying robots (that forum has since been deactivated).

There is so much more to this than was previously conveyed, however I felt
no reason to overwhelm the original inquiry.

Brian's concerns and interest are of no relevance to me. I provided the OP
with some lines as he requested which will possibly lead him to some
expanded insights, provided he learns how to use SE's :-)))

htaccess? Just do a google.
Proxies? Google or anybody else will be no help here. Most of the
proxy-server cache bots don't even identify themselves when spidering. AOL
is easy; they use the UA "Mozilla/3.01 (compatible;)".

This very extensive thread will provide you with a wealth of information:
http://www.webmasterworld.com/forum1...ht=perfect+ban

I'm not sure if the search capability still exists in that forum:
http://www.webmasterworld.com/forum11/index.htm
The entire defunct forum surrounded what has been touched on here.

In the end, each webmaster does what he or she determines best enhances their
websites. Personally, I've in effect created an intranet on the open
internet by denying countries and regions. Taking the time to explain and
"debate" issues that some folks believe they understand, versus what I
actually see occur, is IMO not worth any time spent convincing them
otherwise. Nor is it my desire to chase URLs that I long ago chased to
solve issues, only to back up an effective solution which I implemented long
ago, merely to support an insight I offered in a mail submission.

I rarely post in this forum, and the detail required to assist somebody
explains why. :-(((
I provided the simple solution that the OP was asking for. In his original
mail, he inquired about something to include in the <head></head>, although
he didn't realize that. He made NO inquiry about robots.txt, htaccess,
proxies, cache or all this other nonsense (at least in regard to his
inquiry). In effect, I answered his question, and I'm required to defend
myself. BS!
Jul 20 '05 #10

This thread has been closed and replies have been disabled.