I have a C# Web Application, and we are getting banged repeatedly by a web
crawler at a specific IP address. I did an internet lookup on the IP address
( http://www.ip2location.com), and it just says it is owned by Cox
Communications. I don't know if it belongs to a specific customer of Cox, or
if it belongs to Cox themselves. What is the normal procedure for finding
the owner of the IP? Do I call Cox and complain about abuse? Or is there a
better tool for finding out the actual customer that is running on the IP so
that I may contact them directly and find out why they are banging our site
so often? I know I can just block the IP address, but right now, I can't be
sure if it's one specific user on a dedicated IP, or if it's one person on a
shared network IP.
Brian Kitt wrote: I have a C# Web Application, and we are getting banged repeatedly by a web crawler at a specific IP address. [snip]
It is probably a cable modem user.
Collect the IP, the dates and times of the abuse, block that IP from your
website, and contact the Cox abuse center: http://www.cox.com/support/selectlocation_contact.asp
Brian Kitt wrote: I have a C# Web Application
That sounds like the only thing in this that has to do with C# -- you might
want to consider an IIS-related group for this type of query.
We are getting banged repeatedly by a web crawler at a specific IP address.
How often?
Do I call Cox and complain about abuse?
If you must.
Or is there a better tool for finding out the actual customer that is running on the IP
No.
I know I can just block the IP address, but right now, I can't be sure if it's one specific user on a dedicated IP, or if it's one person on a shared network IP.
If I were you, the first question I'd ask myself would be "Is this a
problem?", where problem is defined as substantial increase in my bandwidth
use or server load, or perhaps a valid security concern.
Depending on what exactly is being transferred, what seems frequent at first
may actually amount to nothing significant at all. For example, requesting a
page once a minute seems often, but if the response happens to be 2 KB of
HTML (note that if this is a web crawler or another automated process, it
might not be loading images even if you have them), that amounts to 2 KB *
60 * 24 * 30 = 86,400 KB, or roughly 84 MB / month, which is practically nothing.
There's one valid reason for concern which comes to mind -- does your
web application present any data somebody might want to automatically scrape
for use somewhere else? If yes, is this necessarily bad? Assume first it's
not for re-publishing (there are copyright laws for that).
--
Chris Priede
Have you defined a robots.txt file? http://www.searchengineworld.com/rob...s_tutorial.htm
"Brian Kitt" <Br*******@discussions.microsoft.com> wrote in message
news:3E**********************************@microsoft.com... [snip]
The problem is that my site has a lot of links that do searches against
databases and vendor databases. Some of these searches we pay a fee for
(small, but a fee nonetheless). When these crawlers hit our site, they also
invoke every search link on our site. There are maybe 50 or 60 links that
invoke searches.
"Chris Priede" wrote: [snip]
Brian Kitt <Br*******@discussions.microsoft.com> wrote: The problem is that my site has a lot of links that do searches against databases and vendor databases. Some of these searches we pay a fee for. [snip]
A robots.txt file is definitely what you want then - just disallow the
crawlers from following those links, and it should be fine.
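For example, if the search links all share a common path prefix, the disallow rule is a one-liner. This is only a sketch -- the /search prefix here is an assumption; use whatever path your search links actually share:

```
# Placed at the site root as /robots.txt
# /search is a hypothetical prefix for the fee-based search links.
User-agent: *
Disallow: /search
```

Well-behaved crawlers fetch this file before crawling and skip anything under the disallowed prefix.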
--
Jon Skeet - <sk***@pobox.com> http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet
If replying to the group, please do not mail me too
A robots.txt file isn't necessarily going to solve the problem.
Crawlers from respectable companies will respect the robots.txt file, but
it's not those guys you have to worry about.
--
- Nicholas Paldino [.NET/C# MVP]
- mv*@spam.guard.caspershouse.com
"Jon Skeet [C# MVP]" <sk***@pobox.com> wrote in message
news:MP************************@msnews.microsoft.com... [snip]
Brian,
Since you know the specific IP that the request is coming from, you can
block the IP address. Here is a link to an article which explains how to do
it: http://www.15seconds.com/issue/011227.htm
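That article covers doing it in IIS; the same check can also be done at the application level. A minimal sketch (not the article's method -- the class layout and the placeholder address are assumptions; substitute the offending IP, ideally loaded from configuration):

```csharp
// Global.asax.cs -- application-level IP blocking sketch for ASP.NET.
using System;
using System.Web;

public class Global : HttpApplication
{
    // 192.0.2.1 is a placeholder; in practice load this list from config.
    private static readonly string[] BlockedAddresses = { "192.0.2.1" };

    protected void Application_BeginRequest(object sender, EventArgs e)
    {
        string remoteIp = Request.UserHostAddress;
        if (Array.IndexOf(BlockedAddresses, remoteIp) >= 0)
        {
            // Reject before any page (or paid search) executes.
            Response.StatusCode = 403;
            Response.End();
        }
    }
}
```

Blocking in Application_BeginRequest means the request is rejected before any of the fee-based search code runs.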
If they truly want to crawl your site, they will end up contacting you
and making themselves known.
--
- Nicholas Paldino [.NET/C# MVP]
- mv*@spam.guard.caspershouse.com
"Brian Kitt" <Br*******@discussions.microsoft.com> wrote in message
news:50**********************************@microsoft.com... [snip]
Nicholas Paldino [.NET/C# MVP] <mv*@spam.guard.caspershouse.com> wrote: A robots.txt file isn't necessarily going to solve the problem. Crawlers from respectable companies will respect the robots.txt file, but it's not those guys you have to worry about.
It depends - if the site is set up in a way which would make
respectable crawlers hit the "wrong" links, a robots.txt file *would*
sort it out. It's definitely the first thing to try. If that fails,
*then* it's worth going in for IP blocking etc.
--
Jon Skeet - <sk***@pobox.com> http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet
If replying to the group, please do not mail me too
"Jon Skeet [C# MVP]" <sk***@pobox.com> wrote in message
news:MP************************@msnews.microsoft.com... It's definitely the first thing to try. If that fails, *then* it's worth going in for IP blocking etc.
There is also the problem of dynamic pages that change content based on
the link clicked. I have some pages I want Google to index and some I don't,
so I also check the User-Agent string for things like "bot". One page is
used so infrequently that I send myself an email if someone shows up
there, and generally it turns out to be a bot, which I then add to the list
and redirect.
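The check described above can be sketched like this (assuming ASP.NET Web Forms; the page class and redirect target are placeholders, and the "bot" substring test is only a heuristic):

```csharp
// Sketch of a User-Agent "bot" check for a rarely used ASP.NET page.
using System;
using System.Web;

public partial class RarelyUsedPage : System.Web.UI.Page
{
    protected void Page_Load(object sender, EventArgs e)
    {
        string userAgent = Request.UserAgent ?? "";
        if (userAgent.IndexOf("bot", StringComparison.OrdinalIgnoreCase) >= 0)
        {
            // Looks automated: redirect somewhere harmless instead of
            // running the expensive search behind this page.
            Response.Redirect("~/Default.aspx");
        }
    }
}
```

Note that User-Agent strings are trivially spoofed, so this only filters honest crawlers; determined scrapers need the IP-blocking approach.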
--
Mabden