Saving a web page

Using HttpWebRequest and HttpWebResponse to retrieve a webpage seems
clear enough.

But unless I am missing something, this will only give me the HTML
source of the webpage requested, and not all the images, stylesheets and
so on. Is there a simple way to get the entire webpage?

The alternatives I see now:
1. Get a WebBrowser in the background to do it, but this seems very nasty.
   There _has_ to be a better way. Besides, how can I select the correct
   file type and enter the name in the background?
2. Interop with mshtml.dll. See above.
3. After getting the HTML file, I could iterate through the images, etc.
   and request each of them separately.

Thank you in advance!

Jul 3 '06 #1
You'll have to find the <img> tags and download the images manually;
basically, write the code that a browser would normally run for you.

So, parse the <img> tags (and <a> tags, if you like), then use
HttpWebRequest to get the images.
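
A minimal sketch of that parsing step, written in Python with only the standard library rather than in .NET. The names ResourceLinkParser and extract_links and the sample markup are made up for illustration; the sketch only collects the src of every <img> tag and the href of every <a> tag and does not download anything.

```python
from html.parser import HTMLParser


class ResourceLinkParser(HTMLParser):
    """Collects image sources and anchor targets from an HTML document."""

    def __init__(self):
        super().__init__()
        self.images = []   # values of <img src="...">
        self.anchors = []  # values of <a href="...">

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; tag and attribute
        # names arrive lower-cased, so <IMG SRC=...> is handled too
        attributes = dict(attrs)
        if tag == "img" and attributes.get("src"):
            self.images.append(attributes["src"])
        elif tag == "a" and attributes.get("href"):
            self.anchors.append(attributes["href"])


def extract_links(html):
    """Return (image_urls, anchor_urls) found in the given HTML text."""
    parser = ResourceLinkParser()
    parser.feed(html)
    parser.close()
    return parser.images, parser.anchors


# Illustrative markup only, not taken from any real site
SAMPLE_HTML = """
<html><body>
  <img src="logo.png" alt="logo">
  <a href="/about.html">About</a>
  <IMG SRC="photos/cat.jpg">
</body></html>
"""

if __name__ == "__main__":
    images, anchors = extract_links(SAMPLE_HTML)
    print(images)
    print(anchors)
```

The returned values are exactly what the page contains, so relative references such as "logo.png" still have to be resolved against the page address before they can be requested.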

HTH
Andy
Jul 3 '06 #2
Hi,

Unfortunately, there isn't a simple way. The way web browsers (usually)
work is that they start rendering the page and download the
images, stylesheets and so on as they need them. They parse the HTML,
find an <img> tag or a <link> tag, and decide to download the file
that the tag references.

You'll need to do the same, i.e. analyse the HTML you've received and
decide what needs to be downloaded by looking at the tags.
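
As a rough sketch of that browser-like behaviour (in Python with the standard library, not .NET), the code below looks at <img>, <script> and stylesheet <link> tags, resolves each reference against the page address, and saves the files. The names PageResourceParser, find_resources and save_page, the UTF-8 decoding, and the file naming scheme are all assumptions made for the example.

```python
import os
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse
from urllib.request import urlopen


class PageResourceParser(HTMLParser):
    """Collects references to images, scripts and stylesheets in a page."""

    def __init__(self):
        super().__init__()
        self.references = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag in ("img", "script") and attributes.get("src"):
            self.references.append(attributes["src"])
        elif tag == "link" and attributes.get("href"):
            # rel may hold several space-separated tokens
            rel_tokens = (attributes.get("rel") or "").lower().split()
            if "stylesheet" in rel_tokens:
                self.references.append(attributes["href"])


def find_resources(page_url, html):
    """Return absolute URLs of the page's resources, without duplicates."""
    parser = PageResourceParser()
    parser.feed(html)
    parser.close()
    absolute = []
    for reference in parser.references:
        url = urljoin(page_url, reference)  # resolves relative references
        if url not in absolute:
            absolute.append(url)
    return absolute


def save_page(page_url, target_dir):
    """Download the page and its resources into target_dir (needs network)."""
    os.makedirs(target_dir, exist_ok=True)
    with urlopen(page_url) as response:
        body = response.read()
    with open(os.path.join(target_dir, "index.html"), "wb") as handle:
        handle.write(body)
    # Assumption: decoding as UTF-8 is good enough for finding the tags
    html = body.decode("utf-8", errors="replace")
    for number, url in enumerate(find_resources(page_url, html)):
        # Hypothetical naming scheme: a counter prefix avoids name clashes
        name = os.path.basename(urlparse(url).path) or "resource"
        with urlopen(url) as response:
            data = response.read()
        with open(os.path.join(target_dir, f"{number}_{name}"), "wb") as handle:
            handle.write(data)
```

This sketch does not rewrite the references inside the saved HTML, ignores any <base> tag, and does not follow url(...) references inside stylesheets, so a complete "save page" feature would need more work than shown here.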

--
Hope this helps,
Tom Spink
Jul 3 '06 #3
Hello al******@mail.ru,

I'd save the page as MHT (web archive) and then parse it to get the images.
BTW, the images are encoded inside the MHT file.

PS: This lib could be used for parsing: http://www.codeproject.com/csharp/mime_project.asp
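
To illustrate the parsing side: an MHT file is a MIME multipart message, so any MIME parser can pull the encoded images back out. The sketch below uses Python's standard email package instead of the library linked above; the function name extract_images is hypothetical, and it assumes each image part carries a Content-Location header.

```python
import email


def extract_images(mht_bytes):
    """Return {content_location: image_bytes} for each image part in an MHT file."""
    message = email.message_from_bytes(mht_bytes)
    images = {}
    for part in message.walk():
        if part.get_content_maintype() == "image":
            # Assumption: parts without Content-Location share one fallback key
            location = part.get("Content-Location", "unnamed")
            # decode=True undoes the base64 (or quoted-printable) encoding
            images[location] = part.get_payload(decode=True)
    return images
```

Reading a saved archive with open(path, "rb").read() and passing the bytes to extract_images gives the original image data keyed by the address each image was loaded from.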
---
WBR,
Michael Nemtsev :: blog: http://spaces.msn.com/laflour

"At times one remains faithful to a cause only because its opponents do not
cease to be insipid." (c) Friedrich Nietzsche
Jul 3 '06 #4

Michael Nemtsev wrote:
Hello al******@mail.ru,

I'd save page into MHT (web archive) and then parse it to get images
BTW images are encoded in the MHT
Ah, thank you. But how do I save it as MHT?

Jul 4 '06 #5

Tom Spink wrote:
Hi,

Unfortunately, there isn't a simple way. The way web-browsers (usually)
work is that they start rendering the page, and download the
images/stylesheets/whatnot as they need them. They're parsing the HTML,
finding an <img> tag or a <link> tag, and deciding to download the file
that the tag is referencing.

You'll need to do this; i.e. analyse the HTML you've received, and decide
what needs to be downloaded by looking at the tags.
Thank you.

Jul 4 '06 #6
Hello al******@mail.ru,

http://groups.google.com/groups/sear...otnet+save+mht
Michael Nemtsev wrote:
I'd save page into MHT (web archive) and then parse it to get images
BTW images are encoded in the MHT
Ah, thank you. But how do I save it as MHT?
---
WBR,
Michael Nemtsev :: blog: http://spaces.msn.com/laflour
Jul 4 '06 #7

Michael Nemtsev wrote:
Hello al******@mail.ru,

http://groups.google.com/groups/sear...otnet+save+mht
Thank you again! Looks like it won't work for websites protected by
password, so I am back to plan A.
Michael Nemtsev wrote:
I'd save page into MHT (web archive) and then parse it to get images
BTW images are encoded in the MHT
Ah, thank you. But how do I save it as MHT?
Jul 4 '06 #8
Hello al******@mail.ru,

What do you mean by "websites protected by password"?
Any example?
Have you tried saving those sites to MHT via IE?
Michael Nemtsev wrote:
Hello al******@mail.ru,

http://groups.google.com/groups/sear...otnet+save+mht
Thank you again! Looks like it won't work for websites protected by
password, so I am back to plan A.
Michael Nemtsev wrote:

I'd save page into MHT (web archive) and then parse it to get
images BTW images are encoded in the MHT

Ah, thank you. But how do I save it as MHT?
---
WBR,
Michael Nemtsev :: blog: http://spaces.msn.com/laflour
Jul 4 '06 #9
