Downloading full page source of a web page

Hello everyone,

I am currently writing a web spider, and I have it working for the
most part using WinINet functions. However, I have found that the WinINet
functions do not come close to retrieving the full page source, and all
I get is the basic HTML.

Main question:
What I really want is the entire page source as you would see it if
you do View > Page Source in Firefox. Can this be done using WinINet,
or would I need to use another connection method?

I looked through all the flags that can be set on the functions
needed to retrieve a file, but I did not find any that would do
what I wanted. The flags are listed here:
http://msdn2.microsoft.com/en-us/library/aa385473.aspx

This is my current code for connecting (minus error checking):

hINet = InternetOpen("InetHTTP/1.0", INTERNET_OPEN_TYPE_PRECONFIG,
    NULL, NULL, 0);

hConnection = InternetConnect(hINet, tempsite.c_str(),
    INTERNET_DEFAULT_HTTP_PORT, NULL, NULL, INTERNET_SERVICE_HTTP, 0, 0);

hData = HttpOpenRequest(hConnection, "GET",
    csite.site.substr(endposition, csite.site.length()).c_str(), NULL,
    NULL, NULL, INTERNET_FLAG_KEEP_CONNECTION, 0);

httpSendRequestSucceeded = HttpSendRequest(hData, NULL, 0, 0, 0);

internetReadFileSucceeded = InternetReadFile(hData, (LPVOID)buffer,
    (ULONG)(BUFFSIZE-1), &dwBytesRead);

Thanks a lot,
Rob

Aug 29 '07 #1
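
One thing worth checking in the code above: a single InternetReadFile call fills the buffer at most once, so anything past the first BUFFSIZE-1 bytes of the response is never read. The WinINet documentation says to keep calling InternetReadFile until it returns TRUE with zero bytes read. Whether that is the cause of the truncated output here is an assumption. Below is a minimal sketch of such a loop; readAll, ReadChunk, and readResponseBody are hypothetical names that do not appear in the original post, and the 4096-byte buffer is an arbitrary choice.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <stdexcept>
#include <string>

// Hypothetical callback type (not from the original post): reads up to
// `capacity` bytes into `buffer`, stores the count in `bytesRead`, and
// returns false on error. This mirrors the shape of InternetReadFile.
using ReadChunk =
    std::function<bool(char* buffer, std::size_t capacity, std::size_t& bytesRead)>;

// Drains a stream by calling readChunk until it succeeds with zero bytes.
std::string readAll(const ReadChunk& readChunk) {
    std::string body;
    char buffer[4096];  // arbitrary chunk size
    for (;;) {
        std::size_t bytesRead = 0;
        if (!readChunk(buffer, sizeof buffer, bytesRead)) {
            throw std::runtime_error("read failed");
        }
        assert(bytesRead <= sizeof buffer);  // reader must respect capacity
        if (bytesRead == 0) {
            break;  // success with 0 bytes means the response is complete
        }
        body.append(buffer, bytesRead);
    }
    return body;
}

#ifdef _WIN32
#include <windows.h>
#include <wininet.h>

// Adapts InternetReadFile to readAll. hData is assumed to be the handle
// from HttpOpenRequest, after HttpSendRequest has already succeeded.
std::string readResponseBody(HINTERNET hData) {
    return readAll([hData](char* buf, std::size_t cap, std::size_t& n) {
        DWORD got = 0;
        BOOL ok = InternetReadFile(hData, buf, static_cast<DWORD>(cap), &got);
        n = got;
        return ok != FALSE;
    });
}
#endif
```

On Windows, readResponseBody(hData) would take the place of the single InternetReadFile call. The loop is kept separate from WinINet so that it can be exercised with a fake reader on any platform.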
he*********@gmail.com wrote:
> I am currently writing a web spider, and I have it working for the
> most part using WinINet functions. However, I have found that the WinINet
> functions do not come close to retrieving the full page source, and all
> I get is the basic HTML.
>
> Main question:
> What I really want is the entire page source as you would see it if
> you do View > Page Source in Firefox. Can this be done using WinINet?
> [..]

Wrong newsgroup. Whatever "wininet" is, it's not part of the C++ language
or the Standard Library. You should consider asking in the newsgroup
for your platform or your compiler (if the compiler includes it in its
package).

V
--
Please remove capital 'A's when replying by e-mail
I do not respond to top-posted replies, please don't ask
Aug 29 '07 #2
On 2007-08-29 05:08, he*********@gmail.com wrote:
> Hello everyone,
>
> I am currently writing a web spider, and I have it working for the
> most part using WinINet functions. However, I have found that the WinINet
> functions do not come close to retrieving the full page source, and all
> I get is the basic HTML.
>
> Main question:
> What I really want is the entire page source as you would see it if
> you do View > Page Source in Firefox. Can this be done using WinINet,
> or would I need to use another connection method?

It seems to me that your understanding of the HTTP protocol and of how
the internet works is a bit lacking. Try to read up on those things and
you'll find that your question will be answered.

--
Erik Wikström
Aug 29 '07 #3
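
For context on the HTTP side: each GET request names one host and one path, and the response body is generally what a browser shows under View > Page Source. Scripts, stylesheets, and images referenced by the page are fetched with separate requests. In WinINet terms, the host goes to InternetConnect (lpszServerName) and the path to HttpOpenRequest (lpszObjectName), which is what tempsite and the substr(endposition, ...) call appear to be doing in the original code. Here is a small sketch of that split, assuming plain http:// URLs with no port or userinfo; splitHttpUrl is a hypothetical helper, not part of WinINet.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical helper (not from the original post): splits an HTTP URL
// into {host, path}. Assumes an optional "http://" prefix and no port,
// userinfo, or fragment handling. The query string stays on the path.
std::pair<std::string, std::string> splitHttpUrl(const std::string& url) {
    assert(!url.empty() && "precondition: non-empty URL");

    const std::string scheme = "http://";
    // Skip the scheme if present; otherwise treat the URL as "host/path".
    const std::size_t hostStart =
        url.compare(0, scheme.size(), scheme) == 0 ? scheme.size() : 0;

    // The first '/' after the host marks the start of the path.
    const std::size_t pathStart = url.find('/', hostStart);
    if (pathStart == std::string::npos) {
        return {url.substr(hostStart), "/"};  // no path given: request "/"
    }
    return {url.substr(hostStart, pathStart - hostStart),
            url.substr(pathStart)};
}
```

For example, splitHttpUrl("http://example.com/docs/index.html") yields the host "example.com" and the path "/docs/index.html". WinINet also provides InternetCrackUrl for parsing URLs, which is likely the better choice in real code.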

