By using this site, you agree to our updated Privacy Policy and our Terms of Use. Manage your Cookies Settings.
443,357 Members | 1,021 Online
Bytes IT Community
+ Ask a Question
Need help? Post your question and get tips & solutions from a community of 443,357 IT Pros & Developers. It's quick & easy.

How to do screen scraping where the site requires a log in

P: n/a
Hello,

I would like to pull some information off a site that requires a log in.
I have a subscription to a premium content site, and I would like to be
able to do a few automatic requests instead of having to load the site
manually in a browser.

I have seen plenty articles that explain how to do screen scraping in
..NET, others that describe how to do it via a POST, but I couldn't find
any that covered my scenario.

Basically the problem is that the code would first have to call the home
page, then fill in the log in entries and post the page back. Then, the
code would need to hang on to the cookie (which is what I assume they
are using) so that when it does another request (GET would be fine
here), the site will allow the request and not think the requester is
not logged in.

This all works fine in a browser, as the browser handles the cookie for
you, but the code examples I have seen seem to use completely stateless
requests (ie no cookies preserved), so it wouldn't work for a site like
this.

Any ideas? TIA

--
Alan Silver
(anything added below this line is nothing to do with me)
Aug 31 '06 #1
Share this Question
Share on Google+
2 Replies


P: n/a
You can try SWExplorerAutomation (SWEA) (http:\\webunittesting.com).

Alan Silver wrote:
Hello,

I would like to pull some information off a site that requires a log in.
I have a subscription to a premium content site, and I would like to be
able to do a few automatic requests instead of having to load the site
manually in a browser.

I have seen plenty articles that explain how to do screen scraping in
.NET, others that describe how to do it via a POST, but I couldn't find
any that covered my scenario.

Basically the problem is that the code would first have to call the home
page, then fill in the log in entries and post the page back. Then, the
code would need to hang on to the cookie (which is what I assume they
are using) so that when it does another request (GET would be fine
here), the site will allow the request and not think the requester is
not logged in.

This all works fine in a browser, as the browser handles the cookie for
you, but the code examples I have seen seem to use completely stateless
requests (ie no cookies preserved), so it wouldn't work for a site like
this.

Any ideas? TIA

--
Alan Silver
(anything added below this line is nothing to do with me)
Sep 5 '06 #2

P: n/a
In article <11**********************@m73g2000cwd.googlegroups .com>,
al*******@hotmail.com writes
>You can try SWExplorerAutomation (SWEA) (http:\\webunittesting.com).
Thanks, looks interesting. The only shame is that I prefer to write my
own code rather than use someone else's. You don't get to understand
what's going on when you use a 3rd party app to do the grunt work.
>Alan Silver wrote:
>Hello,

I would like to pull some information off a site that requires a log in.
I have a subscription to a premium content site, and I would like to be
able to do a few automatic requests instead of having to load the site
manually in a browser.

I have seen plenty articles that explain how to do screen scraping in
.NET, others that describe how to do it via a POST, but I couldn't find
any that covered my scenario.

Basically the problem is that the code would first have to call the home
page, then fill in the log in entries and post the page back. Then, the
code would need to hang on to the cookie (which is what I assume they
are using) so that when it does another request (GET would be fine
here), the site will allow the request and not think the requester is
not logged in.

This all works fine in a browser, as the browser handles the cookie for
you, but the code examples I have seen seem to use completely stateless
requests (ie no cookies preserved), so it wouldn't work for a site like
this.

Any ideas? TIA

--
Alan Silver
(anything added below this line is nothing to do with me)
--
Alan Silver
(anything added below this line is nothing to do with me)
Sep 5 '06 #3

This discussion thread is closed

Replies have been disabled for this discussion.