473,386 Members | 1,958 Online
Bytes | Software Development & Data Engineering Community
Post Job

Home Posts Topics Members FAQ

Join Bytes to post your question to a community of 473,386 software developers and data experts.

How to do screen scraping where the site requires a log in

Hello,

I would like to pull some information off a site that requires a log in.
I have a subscription to a premium content site, and I would like to be
able to do a few automatic requests instead of having to load the site
manually in a browser.

I have seen plenty articles that explain how to do screen scraping in
..NET, others that describe how to do it via a POST, but I couldn't find
any that covered my scenario.

Basically the problem is that the code would first have to call the home
page, then fill in the log in entries and post the page back. Then, the
code would need to hang on to the cookie (which is what I assume they
are using) so that when it does another request (GET would be fine
here), the site will allow the request and not think the requester is
not logged in.

This all works fine in a browser, as the browser handles the cookie for
you, but the code examples I have seen seem to use completely stateless
requests (ie no cookies preserved), so it wouldn't work for a site like
this.

Any ideas? TIA

--
Alan Silver
(anything added below this line is nothing to do with me)
Aug 31 '06 #1
2 1713
You can try SWExplorerAutomation (SWEA) (http:\\webunittesting.com).

Alan Silver wrote:
Hello,

I would like to pull some information off a site that requires a log in.
I have a subscription to a premium content site, and I would like to be
able to do a few automatic requests instead of having to load the site
manually in a browser.

I have seen plenty articles that explain how to do screen scraping in
.NET, others that describe how to do it via a POST, but I couldn't find
any that covered my scenario.

Basically the problem is that the code would first have to call the home
page, then fill in the log in entries and post the page back. Then, the
code would need to hang on to the cookie (which is what I assume they
are using) so that when it does another request (GET would be fine
here), the site will allow the request and not think the requester is
not logged in.

This all works fine in a browser, as the browser handles the cookie for
you, but the code examples I have seen seem to use completely stateless
requests (ie no cookies preserved), so it wouldn't work for a site like
this.

Any ideas? TIA

--
Alan Silver
(anything added below this line is nothing to do with me)
Sep 5 '06 #2
In article <11**********************@m73g2000cwd.googlegroups .com>,
al*******@hotmail.com writes
>You can try SWExplorerAutomation (SWEA) (http:\\webunittesting.com).
Thanks, looks interesting. The only shame is that I prefer to write my
own code rather than use someone else's. You don't get to understand
what's going on when you use a 3rd party app to do the grunt work.
>Alan Silver wrote:
>Hello,

I would like to pull some information off a site that requires a log in.
I have a subscription to a premium content site, and I would like to be
able to do a few automatic requests instead of having to load the site
manually in a browser.

I have seen plenty articles that explain how to do screen scraping in
.NET, others that describe how to do it via a POST, but I couldn't find
any that covered my scenario.

Basically the problem is that the code would first have to call the home
page, then fill in the log in entries and post the page back. Then, the
code would need to hang on to the cookie (which is what I assume they
are using) so that when it does another request (GET would be fine
here), the site will allow the request and not think the requester is
not logged in.

This all works fine in a browser, as the browser handles the cookie for
you, but the code examples I have seen seem to use completely stateless
requests (ie no cookies preserved), so it wouldn't work for a site like
this.

Any ideas? TIA

--
Alan Silver
(anything added below this line is nothing to do with me)
--
Alan Silver
(anything added below this line is nothing to do with me)
Sep 5 '06 #3

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

4
by: Roland Hall | last post by:
Am I correct in assuming screen scraping is just the response text sent to the browser? If so, would that mean that this could not be screen scraped? function moi() { var tag = '<a href='; var...
0
by: Robert Martinez | last post by:
I've seen a lot about screen scraping with .NET, mostly in VB.net. I have been able to convert most of it over, but it is still just very basic stuff. Can someone help direct me toward some good...
3
by: _eee_ | last post by:
Does anyone know of a simple code module that can do screen scraping, including simulating user-entered pushbuttons, etc. I can get the first screen on a website with HttpWebRequest, but I need...
3
by: Jim Giblin | last post by:
I need to scrape specific information from another website, specifically the prices of precious metals from several different vendors. While I will credit the vendors as the data source, I do not...
14
by: n8 | last post by:
Hi, Hi have to do the followign and have been racking my brain with various solutions that have had no so great results. I want to use the System.Net.WebClient to submit data to a form (log a...
4
by: rachel | last post by:
Hello, I am currently contracted out by a real estate agent. He has a page that he has created himself that has a list of homes.. their images and data in html format. He wants me to take...
2
by: Victor | last post by:
I'm doing screen scraping by retrieving data from one site and entering into another site. I have a problem with logging into the site. User name and password field contain 'name' property, and...
7
by: ljr2600 | last post by:
Hello, I'm very new to python and still familiarizing myself with the language, sorry if the post seems moronic or simple. For a side project I'm working on I need to be able to scrape a...
3
by: Gregory A Greenman | last post by:
I'm trying to screen scrape a site that requires a password. If I access the site's login page in my browser and view the source, I see that it does not contain a viewstate. When my program...
0
by: taylorcarr | last post by:
A Canon printer is a smart device known for being advanced, efficient, and reliable. It is designed for home, office, and hybrid workspace use and can also be used for a variety of purposes. However,...
0
by: Charles Arthur | last post by:
How do i turn on java script on a villaon, callus and itel keypad mobile phone
0
by: aa123db | last post by:
Variable and constants Use var or let for variables and const fror constants. Var foo ='bar'; Let foo ='bar';const baz ='bar'; Functions function $name$ ($parameters$) { } ...
0
by: ryjfgjl | last post by:
In our work, we often receive Excel tables with data in the same format. If we want to analyze these data, it can be difficult to analyze them because the data is spread across multiple Excel files...
0
BarryA
by: BarryA | last post by:
What are the essential steps and strategies outlined in the Data Structures and Algorithms (DSA) roadmap for aspiring data scientists? How can individuals effectively utilize this roadmap to progress...
1
by: Sonnysonu | last post by:
This is the data of csv file 1 2 3 1 2 3 1 2 3 1 2 3 2 3 2 3 3 the lengths should be different i have to store the data by column-wise with in the specific length. suppose the i have to...
0
by: Hystou | last post by:
There are some requirements for setting up RAID: 1. The motherboard and BIOS support RAID configuration. 2. The motherboard has 2 or more available SATA protocol SSD/HDD slots (including MSATA, M.2...
0
marktang
by: marktang | last post by:
ONU (Optical Network Unit) is one of the key components for providing high-speed Internet services. Its primary function is to act as an endpoint device located at the user's premises. However,...
0
by: Hystou | last post by:
Most computers default to English, but sometimes we require a different language, especially when relocating. Forgot to request a specific language before your computer shipped? No problem! You can...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.