WebRequest from behind a proxy

Hi

I have been struggling with what should be a simple thing. I want to
crawl the web for specific links, and save any html and files that meet
my criteria to my hard drive.

I thought that WebRequest/WebResponse would be the best way to go,
using a proxy. The page that I am returned is just a proxy generated
file, and not the source code. Is there any way around this?

As a workaround, I used the AxWebBrowser and an mshtml document. This
works well, except when I come to downloading files. As stated above,
the easy options don't work: WebRequest/WebResponse return a 404: File
Not Found error (which you would expect, given that it doesn't get to
the html), WebClient doesn't work behind a proxy, and after doing some
searching, it is proving difficult to programmatically download linked
files using AxWebBrowser.

So, my questions to the smart people out there include:

a) is there any way to get the WebRequest/WebResponse objects working
behind a proxy when only proxy-generated source is returned to the
WebRequest object (and not the file source code)? Could this have
anything to do with proxy authorisation?

b) is there a way to programmatically download pdf files using
AxWebBrowser?

Grateful for any advice.

Cheers

James

Dec 20 '05 #1
Can't you just use the TcpClient class to get the page? The syntax is very
simple and you get the full text of the page back unprocessed, so you can do
what you like with it. All you need to do is send:

GET /pagename.htm HTTP/1.1
HOST: nameofhost.com

then 2 crlfs.
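
As a rough sketch of the request text described above, the bytes could be assembled like this. RawHttp and BuildGetRequest are hypothetical names made up for illustration, and the Connection: close header is an addition of mine so that the server closes the socket once it has sent the response.

```csharp
using System;
using System.Text;

public static class RawHttp
{
    // Hypothetical helper: builds the bytes of a minimal HTTP/1.1 GET request.
    // Each header line ends with CRLF, and one extra CRLF (an empty line)
    // marks the end of the request headers.
    public static byte[] BuildGetRequest(string host, string path)
    {
        string request =
            "GET " + path + " HTTP/1.1\r\n" +
            "Host: " + host + "\r\n" +
            "Connection: close\r\n" +   // assumption: we want one response, then a closed socket
            "\r\n";
        return Encoding.ASCII.GetBytes(request);
    }
}
```

For the example above you would call RawHttp.BuildGetRequest("nameofhost.com", "/pagename.htm") and write the resulting bytes to the socket.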

Michael
Dec 20 '05 #2
Does this help?

http://support.microsoft.com/default...301102&SD=MSDN

Yosh
Dec 20 '05 #3
Thanks Yosh - yes I had seen that and tried webrequest etc, but
couldn't get through.

Michael - could you provide more information (or links to more
information) on how to set up a simple TcpClient? I haven't done it
before.

Thanks

James

Dec 20 '05 #4
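Note that the slash after the GET is the path of the page you're requesting,
in this case the default page for MS. The Connection: close header asks the
server to close the socket once the response has been sent, so the read loop
ends when Read returns 0.

// Needs: using System; using System.Net.Sockets; using System.Text;
TcpClient client = new TcpClient();
client.Connect("www.microsoft.com", 80);
NetworkStream stream = client.GetStream();
// Request line, headers, then an empty line (the final \r\n) to end the request.
byte[] data = Encoding.ASCII.GetBytes(
    "GET / HTTP/1.1\r\nHost: www.microsoft.com\r\nConnection: close\r\n\r\n");
stream.Write(data, 0, data.Length);
data = new byte[256];
int len;
// Read can return fewer bytes than the buffer holds before the response is
// complete, so keep reading until it returns 0 (socket closed by the server).
while ((len = stream.Read(data, 0, data.Length)) > 0)
{
    Console.Write(Encoding.ASCII.GetString(data, 0, len));
}
stream.Close();
client.Close();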
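
The text that comes back this way contains the status line and response headers as well as the page, so if the goal is to save the html you would probably want to separate the two. Here is a minimal sketch, assuming the whole response has already been collected into one string. HttpResponseText and TrySplit are hypothetical names, and the sketch does not handle chunked transfer encoding, which HTTP/1.1 servers are allowed to use.

```csharp
using System;

public static class HttpResponseText
{
    // Hypothetical helper: splits raw response text at the first empty line
    // (CRLF CRLF), which separates the headers from the body.
    // Returns false if no separator is found.
    public static bool TrySplit(string raw, out string headers, out string body)
    {
        int split = raw.IndexOf("\r\n\r\n", StringComparison.Ordinal);
        if (split < 0)
        {
            headers = raw;
            body = "";
            return false;
        }
        headers = raw.Substring(0, split);
        body = raw.Substring(split + 4); // skip the four separator characters
        return true;
    }
}
```

After a successful TrySplit, the body could be written to disk with System.IO.File.WriteAllText.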

Michael
Dec 20 '05 #5
Thanks Michael - that code worked beautifully. I have also found
another proxy server I can use which allows me to use WebRequest and
WebResponse, so case closed ...
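
For anyone who lands here with the same problem, here is a rough sketch of how WebRequest can be pointed at a specific proxy with credentials, which may help when the proxy returns its own page instead of the target. The class name, method names, proxy address, user name and password are all placeholders of mine, not details from this thread, and whether it works depends on the authentication scheme your proxy uses. Newer .NET versions mark WebRequest as obsolete in favour of HttpClient, but it matches the API discussed above.

```csharp
using System;
using System.IO;
using System.Net;

public static class ProxyDownload
{
    // Hypothetical helper: creates a request that goes through the given proxy
    // and presents the given credentials to that proxy.
    public static HttpWebRequest CreateRequest(string url, string proxyUrl,
                                               string user, string password)
    {
        HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
        WebProxy proxy = new WebProxy(proxyUrl);
        proxy.Credentials = new NetworkCredential(user, password);
        request.Proxy = proxy;
        return request;
    }

    // Copies the response body to a file byte for byte, so it is safe for
    // binary files such as PDFs as well as html.
    public static void SaveTo(HttpWebRequest request, string filePath)
    {
        using (WebResponse response = request.GetResponse())
        using (Stream input = response.GetResponseStream())
        using (FileStream output = File.Create(filePath))
        {
            byte[] buffer = new byte[8192];
            int read;
            while ((read = input.Read(buffer, 0, buffer.Length)) > 0)
            {
                output.Write(buffer, 0, read);
            }
        }
    }
}
```

Usage would look something like ProxyDownload.SaveTo(ProxyDownload.CreateRequest(fileUrl, proxyUrl, user, password), localPath), with your own values filled in.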

Dec 20 '05 #6
