How to do screen scraping where the site requires a log in

Hello,

I would like to pull some information off a site that requires a log in.
I have a subscription to a premium content site, and I would like to be
able to do a few automatic requests instead of having to load the site
manually in a browser.

I have seen plenty of articles that explain how to do screen scraping in
.NET, and others that describe how to do it via a POST, but I couldn't find
any that covered my scenario.

Basically the problem is that the code would first have to call the home
page, then fill in the log in entries and post the page back. Then, the
code would need to hang on to the cookie (which is what I assume they
are using) so that when it makes another request (GET would be fine
here), the site will allow it rather than treating the requester as
not logged in.

This all works fine in a browser, as the browser handles the cookie for
you, but the code examples I have seen seem to use completely stateless
requests (i.e. no cookies preserved), so they wouldn't work for a site like
this.

Any ideas? TIA

--
Alan Silver
(anything added below this line is nothing to do with me)
Aug 31 '06 #1
You can try SWExplorerAutomation (SWEA) (http://webunittesting.com).

Sep 5 '06 #2
al*******@hotmail.com writes:
>You can try SWExplorerAutomation (SWEA) (http://webunittesting.com).
Thanks, looks interesting. The only shame is that I prefer to write my
own code rather than use someone else's. You don't get to understand
what's going on when you use a 3rd party app to do the grunt work.
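
For anyone who would rather hand-roll it, the flow described in the original question (load the home page, post the log in form, keep the cookie, then GET the protected page) can be sketched roughly as follows. This is a minimal sketch in Python rather than .NET, and it assumes the site uses ordinary cookies and a plain form POST. The names make_session and log_in_and_fetch, and every URL and form field in the usage comment, are hypothetical. In .NET the corresponding idea is to assign one shared CookieContainer to the CookieContainer property of each HttpWebRequest.

```python
import urllib.parse
import urllib.request
from http.cookiejar import CookieJar


def make_session():
    """Return (opener, jar): an opener that stores cookies and resends them."""
    jar = CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


def log_in_and_fetch(opener, home_url, login_url, form_fields, target_url):
    """Load the home page, POST the log in form, then GET a protected page."""
    # 1. Load the home page first; some sites hand out a session cookie here.
    opener.open(home_url).close()

    # 2. POST the log in form. Supplying data= makes urllib send a POST.
    body = urllib.parse.urlencode(form_fields).encode("ascii")
    opener.open(login_url, data=body).close()

    # 3. GET the protected page; the opener attaches the stored cookies.
    with opener.open(target_url) as response:
        return response.read().decode("utf-8", errors="replace")


# Hypothetical usage (URLs and field names are placeholders, not a real site):
#   opener, jar = make_session()
#   html = log_in_and_fetch(
#       opener,
#       "https://example.com/",
#       "https://example.com/login",
#       {"username": "me", "password": "secret"},
#       "https://example.com/premium/report",
#   )
```

The point is that all three requests go through the same opener, so the cookie set at log in is still there for the later GET. Sites that add hidden form fields (ASP.NET's __VIEWSTATE, for example) would need those values read from the first response and included in the POST; the sketch does not attempt that.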
--
Alan Silver
(anything added below this line is nothing to do with me)
Sep 5 '06 #3
