Downloading full page source of a web page

Hello everyone,

I am currently writing a web spider, and I have it working for the
most part using WinINet functions. However, I have found that the WinINet
functions do not come close to retrieving the full page source; all
I get is the basic HTML.

Main question:
What I really want is the entire page source as you would see it if
you did a View > Page Source in Firefox. Can this be done using WinINet,
or would I need to use another connection method?

I looked through all the flags that can be set on the functions
needed to retrieve a file, but I did not find any that would do
what I want. The flags are documented here:
http://msdn2.microsoft.com/en-us/library/aa385473.aspx

This is my current code for connecting (minus error checking):

hINet = InternetOpen("InetHTTP/1.0", INTERNET_OPEN_TYPE_PRECONFIG,
                     NULL, NULL, 0);

hConnection = InternetConnect(hINet, tempsite.c_str(),
                              INTERNET_DEFAULT_HTTP_PORT, NULL, NULL,
                              INTERNET_SERVICE_HTTP, 0, 0);

hData = HttpOpenRequest(hConnection, "GET",
                        csite.site.substr(endposition, csite.site.length()).c_str(),
                        NULL, NULL, NULL, INTERNET_FLAG_KEEP_CONNECTION, 0);

httpSendRequestSucceeded = HttpSendRequest(hData, NULL, 0, 0, 0);

internetReadFileSucceeded = InternetReadFile(hData, (LPVOID)buffer,
                                             (ULONG)(BUFFSIZE - 1), &dwBytesRead);

Thanks a lot,
Rob

Aug 29 '07 #1
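
A note on the code above: InternetReadFile is called only once, so at most BUFFSIZE-1 bytes of the response can arrive, which may be one reason the page looks incomplete. The WinINet documentation says to keep calling InternetReadFile until it succeeds and reports zero bytes read. What follows is a minimal sketch of that loop, not a definitive implementation. To keep it runnable without a network connection, the read call is passed in as a callback with the same shape as InternetReadFile. The names readAll, ChunkReader and makeFakeReader are made up for illustration and are not part of WinINet.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <functional>
#include <string>
#include <vector>

// Same shape as InternetReadFile: returns false on failure; on success it
// stores the number of bytes written to `buffer` in `bytesRead`.
// A successful call that reports 0 bytes means the response is complete.
using ChunkReader =
    std::function<bool(char* buffer, std::size_t capacity, std::size_t& bytesRead)>;

// Calls `readChunk` repeatedly and appends every chunk to `out`.
// Returns false if a read fails; `out` then holds whatever arrived so far.
bool readAll(const ChunkReader& readChunk, std::string& out,
             std::size_t chunkSize = 4096)
{
    assert(chunkSize > 0);
    std::vector<char> buffer(chunkSize);
    for (;;) {
        std::size_t bytesRead = 0;
        if (!readChunk(buffer.data(), buffer.size(), bytesRead))
            return false;                 // read error
        if (bytesRead == 0)
            return true;                  // success with 0 bytes: end of data
        out.append(buffer.data(), bytesRead);
    }
}

// Hypothetical stand-in for a server response, used only to try the loop
// without WinINet. It hands out `body` in pieces of at most `capacity` bytes.
ChunkReader makeFakeReader(std::string body)
{
    std::size_t offset = 0;
    return [body, offset](char* buffer, std::size_t capacity,
                          std::size_t& bytesRead) mutable {
        bytesRead = std::min(capacity, body.size() - offset);
        std::memcpy(buffer, body.data() + offset, bytesRead);
        offset += bytesRead;
        return true;
    };
}

// With WinINet, the reader would wrap the real call (assuming hData is the
// request handle from the post above):
//   [hData](char* buf, std::size_t cap, std::size_t& n) {
//       DWORD got = 0;
//       BOOL ok = InternetReadFile(hData, buf, static_cast<DWORD>(cap), &got);
//       n = got;
//       return ok != FALSE;
//   }
```

Even with the whole response read, the result is the body of one HTTP GET, which is what View > Page Source shows. Images, scripts and stylesheets referenced by the page are separate requests that the spider has to issue itself.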
he*********@gmail.com wrote:
> I am currently writing a web spider, and I have it working for the
> most part using winINet functions. However, I have found that winINet
> functions do not get close to retrieving the full page source, and all
> I get is the basic html.
>
> Main Question:::::
> What I really want is the entire page source as you would see it if
> you do a view->page source in firefox. Can this be done using wininet?
[..]
Wrong newsgroup. Whatever "wininet" is, it's not part of the C++ language
or the Standard Library. You should consider asking in the newsgroup
for your platform or your compiler (if the compiler includes it in its
package).

V
--
Please remove capital 'A's when replying by e-mail
I do not respond to top-posted replies, please don't ask
Aug 29 '07 #2
On 2007-08-29 05:08, he*********@gmail.com wrote:
> Hello everyone,
>
> I am currently writing a web spider, and I have it working for the
> most part using winINet functions. However, I have found that winINet
> functions do not get close to retrieving the full page source, and all
> I get is the basic html.
>
> Main Question:::::
> What I really want is the entire page source as you would see it if
> you do a view->page source in firefox. Can this be done using wininet?
> or would I need to use another connection method.

It seems to me that your understanding of the HTTP protocol and of how
the internet works is a bit lacking. Try to read up on those things and
you'll find that your question will be answered.

--
Erik Wikström
Aug 29 '07 #3
