Bytes IT Community

WebRequest from behind a proxy

Hi

I have been struggling with what should be a simple thing. I want to
crawl the web for specific links, and save any HTML and files that meet
my criteria to my hard drive.

I thought that WebRequest/WebResponse would be the best way to go,
using a proxy. The page that I am returned is just a proxy-generated
file, and not the source code. Is there any way around this?

As a workaround, I used the AxWebBrowser and an mshtml document. This
works well, except when I come to downloading files. As stated above,
the easy options don't work - WebRequest/WebResponse return a 404: File
Not Found error (which you would expect, given that it doesn't get to
the HTML), WebClient doesn't work behind a proxy, and after doing some
searching, it is proving difficult to programmatically download linked
files using AxWebBrowser.

So, my questions to the smart people out there include:

a) is there any way to get the WebRequest/WebResponse objects working
behind a proxy when only proxy-generated source is returned to the
WebRequest object (and not the file source code) - could this be
anything to do with proxy authorisation?;

b) is there a way to programmatically download PDF files using
AxWebBrowser?

Grateful for any advice.

Cheers

James

Dec 20 '05 #1
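[Editor's note: a minimal sketch of what explicitly configuring the proxy on the request might look like. The proxy address is a placeholder for your own network's proxy, and passing DefaultCredentials is only a guess at the authorisation issue raised in question a).]

```csharp
using System;
using System.IO;
using System.Net;

class ProxyFetch
{
    static void Main()
    {
        // Placeholder address - replace with your network's actual proxy host/port.
        WebProxy proxy = new WebProxy("http://proxy.example.com:8080");
        // Pass the logged-on user's credentials through to the proxy, in case
        // the proxy-generated page is really a failed-authorisation response.
        proxy.Credentials = CredentialCache.DefaultCredentials;

        HttpWebRequest request =
            (HttpWebRequest)WebRequest.Create("http://www.example.com/");
        request.Proxy = proxy;

        using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
        using (StreamReader reader = new StreamReader(response.GetResponseStream()))
        {
            // With the proxy authorised, this should be the page source itself,
            // not the proxy's error page.
            string html = reader.ReadToEnd();
            Console.WriteLine(html);
        }
    }
}
```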
5 Replies


<ja*********@dewr.gov.au> wrote in message
news:11*********************@g14g2000cwa.googlegroups.com...
b) is there a way to programmatically download PDF files using
AxWebBrowser?

Grateful for any advice.


Can't you just use the TcpClient class to get the page? The syntax is very
simple and you get the full text of the page back unprocessed, so you can do
what you like with it. All you need to do is send:

GET /pagename.htm HTTP/1.1
HOST: nameofhost.com

then two CRLFs (a blank line) to terminate the request headers.

Michael
Dec 20 '05 #2

Does this help?

http://support.microsoft.com/default...301102&SD=MSDN

Yosh

Dec 20 '05 #3

Thanks Yosh - yes, I had seen that and tried WebRequest etc., but
couldn't get through.

Michael - could you provide more information (or links to more
information) on how to set up a simple TcpClient? I haven't done it
before.

Thanks

James



Dec 20 '05 #4

<ja*********@dewr.gov.au> wrote in message
news:11**********************@g49g2000cwa.googlegroups.com...
Thanks Yosh - yes, I had seen that and tried WebRequest etc., but
couldn't get through.

Michael - could you provide more information (or links to more
information) on how to set up a simple TcpClient? I haven't done it
before.


Note that the slash after GET is the path of the page you're requesting -
in this case the default page for MS.

// Requires: using System; using System.Net.Sockets; using System.Text;
TcpClient client = new TcpClient();
client.Connect("www.microsoft.com", 80);
NetworkStream stream = client.GetStream();

// Ask the server to close the connection when it's done - with the
// HTTP/1.1 keep-alive default, the read loop below would never see
// end-of-stream.
byte[] data = Encoding.ASCII.GetBytes(
    "GET / HTTP/1.1\r\nHost: www.microsoft.com\r\nConnection: close\r\n\r\n");
stream.Write(data, 0, data.Length);

// Read until the server closes the stream. (Testing for len == 256 is
// not reliable - a read can legitimately return fewer bytes mid-stream.)
data = new byte[256];
int len;
while ((len = stream.Read(data, 0, data.Length)) > 0)
{
    Console.Write(Encoding.ASCII.GetString(data, 0, len));
}
stream.Close();
client.Close();

Michael
Dec 20 '05 #5

Thanks Michael - that code worked beautifully. I have also found
another proxy server I can use which allows me to use WebRequest and
WebResponse, so case closed ...

Dec 20 '05 #6
