Bytes | Software Development & Data Engineering Community

Screenscraping question ... URGENT

I need to download some publicly available data from a website. I can
access the data manually, without any problem, using my browser, and I am
now trying to automate the download with a C# application.

However, when I send an HTTP GET request from C#, the server responds with
the message "No Automatic downloads allowed".

I have two questions:

1) Has anyone come across this problem before?
2) How do I get round this? Some guidelines, pseudocode, or even a
link to some sample code would be greatly appreciated.

Additionally, I would be grateful if someone could explain technically
what's going on.

Misc info: the site I'm trying to access is running IIS on the .NET 1.1 framework.

Aug 1 '06 #1
It would obviously not be ethical for us to give you a way to bypass
someone's system when they clearly do not want you to download and
screen-scrape their site automatically.

I will, however, point you in the right direction: try using a tool like
http://www.securitysupervisor.com/s/http%20sniffer.php and see what differs
between the HTTP requests your browser sends and the ones your program sends.
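
To see the C# side of that comparison, here is a minimal sketch (the class and method names are illustrative, and http://example.com/ stands in for the real address) that reports the browser-style headers a freshly created HttpWebRequest carries. A browser always sends User-Agent and Accept headers, plus any cookies the site has set; a default HttpWebRequest sends none of these until you set them. That difference is one plausible way the server tells a program from a browser, but this is an assumption: the site's actual check is unknown.

```csharp
using System;
using System.Net;

static class RequestInspector
{
    // Creates a plain GET request the way a minimal C# client would.
    // Nothing is sent over the network until GetResponse() is called.
    public static HttpWebRequest CreateDefaultRequest(string url)
    {
        return (HttpWebRequest)WebRequest.Create(url);
    }

    // Prints the header-related properties worth comparing against the
    // browser's request in a sniffer. Unset properties print as "(not sent)".
    public static void Describe(HttpWebRequest request)
    {
        Console.WriteLine("Method:     " + request.Method);
        Console.WriteLine("User-Agent: " + (request.UserAgent ?? "(not sent)"));
        Console.WriteLine("Accept:     " + (request.Accept ?? "(not sent)"));
        Console.WriteLine("Cookies:    " +
            (request.CookieContainer == null ? "(not sent)" : "container attached"));
    }
}
```

Calling `RequestInspector.Describe(RequestInspector.CreateDefaultRequest("http://example.com/"))` should report GET, with the User-Agent, Accept, and cookie lines all marked as not sent.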

Cheers,

Greg Young
MVP - C#
http://codebetter.com/blogs/gregyoung
Aug 1 '06 #2
My guess is that if they've gone that far, they've probably added other
checks as well. You are probably wasting your time.

--
Robbe Morris - 2004-2006 Microsoft MVP C#
Microsoft .NET Search Engine Scoring Analysis
How does your site rate?
http://www.topichound.com

Aug 2 '06 #3
Try setting the HttpWebRequest.UserAgent property to the User-Agent string
of a known browser, such as
"Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.8.0.4) Gecko/20060508 Firefox/1.5.0.4".
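
In code, that suggestion comes down to one property assignment before the request is sent. A minimal sketch, assuming the server checks only the User-Agent header (if it also looks at cookies or the Referer, this alone will not be enough); the class and method names are illustrative:

```csharp
using System.IO;
using System.Net;

static class BrowserLikeDownloader
{
    // The Firefox 1.5.0.4 User-Agent string quoted above, on a single line.
    const string FirefoxUserAgent =
        "Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.8.0.4) Gecko/20060508 Firefox/1.5.0.4";

    // Builds a GET request that identifies itself with the browser's User-Agent.
    // No network traffic happens until GetResponse() is called.
    public static HttpWebRequest CreateRequest(string url)
    {
        HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
        request.Method = "GET";
        request.UserAgent = FirefoxUserAgent;
        return request;
    }

    // Sends the request and returns the response body as text.
    // GetResponse() throws a WebException for error status codes such as 403.
    public static string Download(string url)
    {
        HttpWebRequest request = CreateRequest(url);
        using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
        using (StreamReader reader = new StreamReader(response.GetResponseStream()))
        {
            return reader.ReadToEnd();
        }
    }
}
```

Usage would be something like `string page = BrowserLikeDownloader.Download("http://www.example.com/data.aspx");`, where the URL is a placeholder for the page you normally open in your browser.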

Cheers,
--
Joerg Jooss
ne********@joergjooss.de
Aug 2 '06 #4
Bit Byte wrote:
> However when I try to use an "HTTP get" from C#, I get a message from
> the server saying that "No Automatic downloads allowed".

That should suggest to you that they really don't want you to download
the data automatically. If you feel you have a valid reason for doing
so, why not email them? Otherwise, show some respect for their wishes.

Jon

Aug 2 '06 #5

I can understand the "moral" objections, but as I said, this data is
already freely available on the internet. I normally download it
manually myself, but I am going away on holiday for two weeks and will not
have access to a computer (and I don't have anybody I want to ask to do
this chore for me whilst I'm away). All I want to do is leave my computer
switched on and let my program download the data on my behalf. It's not as
if I will be using a mega computer to overwhelm the server (I can
understand why a site would want to prevent activity that could quickly
overwhelm it); I just want to do what I normally do every day with my
browser, that's all.

Aug 2 '06 #6


