Bytes IT Community

HttpBrowserCapabilities - recognized Crawlers

Hi,
Would someone know where I could get a list of the crawlers recognized by
HttpBrowserCapabilities?
Is there a way to add new ones or modify the list?

I have a web site for which I want to show different content to search
engine bots. I was planning to rely on HttpBrowserCapabilities.Crawler,
but what if a bot's signature changes, or a new one is added, ...
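For context, that capability is exposed per request through Request.Browser, and the definitions it matches against live in *.browser files (machine-wide, or extendable per site via an App_Browsers folder). A minimal check might look like the sketch below; the two Render* helpers are illustrative placeholders, not real methods:

```vbnet
' A sketch, assuming an ASP.NET Web Forms code-behind.
' Request.Browser returns the HttpBrowserCapabilities for the current
' request; Crawler is True when the user-agent matched a known
' browser definition (.browser) file.
Protected Sub Page_Load(ByVal sender As Object, ByVal e As EventArgs) Handles Me.Load
    If Request.Browser.Crawler Then
        ' Hypothetical helper: serve both languages so the bot indexes everything.
        RenderBothLanguages()
    Else
        ' Hypothetical helper: serve the language from Accept-Language or the cookie.
        RenderPreferredLanguage()
    End If
End Sub
```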

Thanks,
Zolt
Aug 20 '08 #1
4 Replies


Zolt wrote:
I have a web site for which I want to show different content to
search engine bots.
Rather than try to get the site blacklisted by search engines, why not just
use a robots.txt file to exclude them?

Andrew
Aug 21 '08 #2

Andrew,

What I want to do is show the search engines different content, not
prevent them from visiting my site.
The problem is that I have pages containing text in two languages; which
one is shown depends on the browser's preferred language and/or a selected
language saved in a cookie.
Doing it this way, I don't have to expose URLs with ugly query strings like
http://www.mysite.com/default.aspx?lang=en
The problem with search engines is that they only ever see the default
language and can't switch languages to reindex the content in the other one.
My goal is to detect whether the requester is a web crawler and, if it is,
show both languages. If not, continue the normal way.

I have found an interesting post, which I believe I will be able to use
(http://forums.asp.net/p/908519/1012090.aspx#1012090).

I should be able to modify it to detect the major search engines - those
are the only ones I am interested in.

Thanks for the suggestion anyway,
Zolt

Aug 21 '08 #3

Zolt wrote:
Andrew,

What I want to do is show the search engines different content, not
prevent them from visiting my site.
The problem is that I have pages containing text in two languages;
which one is shown depends on the browser's preferred language and/or
a selected language saved in a cookie.
Ahh - it sounded like you might want to do something referred to as web site
cloaking.
[...]
My goal is to detect if the requester is a web crawler, and if it is,
show both languages. If not, continue the normal way.
A regular expression which catches the crawlers which visit our sites is

Dim re As New Regex("bot|spider|slurp|crawler|teoma|DMOZ|;1813|findlinks|tellbaby|ia_archiver|nutch|voyager|wwwster|3dir|scooter|appie|exactseek|feedfetcher|freedir|holmes|panscient|yandex|alef|cfnetwork|kalooga", RegexOptions.Compiled Or RegexOptions.IgnoreCase)

applied to the user-agent string, of course. You could use the Sub
Session_Start in Global.asax.vb as the location to check it.
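Put together in Global.asax.vb, that could look like the sketch below (the Session("IsCrawler") key name and the shortened pattern are illustrative):

```vbnet
' Global.asax.vb - a sketch; the session key and trimmed pattern are illustrative.
Imports System.Text.RegularExpressions

Public Class Global
    Inherits System.Web.HttpApplication

    ' Compiled once and shared across all requests.
    Private Shared ReadOnly CrawlerRe As New Regex( _
        "bot|spider|slurp|crawler|teoma", _
        RegexOptions.Compiled Or RegexOptions.IgnoreCase)

    Sub Session_Start(ByVal sender As Object, ByVal e As EventArgs)
        Dim ua As String = Request.UserAgent
        ' Flag the whole session so pages only have to test the flag.
        Session("IsCrawler") = (ua IsNot Nothing) AndAlso CrawlerRe.IsMatch(ua)
    End Sub
End Class
```

One caveat: most crawlers don't persist session cookies, so Session_Start will typically fire on every bot request anyway; checking per request (e.g. in Application_BeginRequest) is an alternative.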

Then if you can find a URL in the UA string, you can check its TLD for .com,
.fr, .whatever.

(You might want to take out the ";1813" - I put that in to filter out the
AVG link checker thing, which happened to distort the actual user stats on
our sites.)

HTH

Andrew
Aug 21 '08 #4

Thanks a lot Andrew!
Your solution seems to give more choices than the one I found.
I will probably go that route.

Really appreciated,
Zolt
Aug 21 '08 #5
