retrieving https pages

I'm using Linux (Mandriva LE2005) with Python 2.3 (I can also use Python 2.4
on my other system just as well).
Anyway...
I want to get a web page containing my stock grants.
The initial page is served over https, and there is a form on it where you
fill in your username and password and then click "login".
I played with Python's urlopen, and basically it complains "your browser
doesn't support frames", meaning the urlopen call makes it unhappy somehow.
Is it reasonable to think I can build a script to log in to this secure
website, move to a different page (on that site) and download it to disk?
Or am I just looking at a long, complicated task?
I'd really like to get the page because then I can analyze it from a cron
job and email myself my current options value each week or each month.
Thanks
Eric

Jul 21 '05 #1
ncf
It might be checking the browser's User-Agent. My best suggestion for you
would be to use something to record the headers your browser sends
out, and mimic those in Python.

If you look at the source code for urllib (I think you can press
Alt+M and type in "urllib"), under the URLopener definition (which
FancyURLopener inherits from), you should see something like
self.addheaders (not on a box to check it right now, but it's in the
constructor, I remember that much).

Just set all the headers to send out (like your browser would) by
setting that value from your script, i.e.:

# Python 2 (urllib.FancyURLopener lives in urllib.request on Python 3)
import urllib
urlopener = urllib.FancyURLopener()
# addheaders is a list of (name, value) pairs sent with every request
urlopener.addheaders = [('User-agent', 'blah'), ('Header2', 'val'), ('monkey', 'bone')]
# do the other stuff here, e.g. urlopener.open(url) :P
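
For anyone reading this on a newer Python, here is a rough Python 3 sketch of the same idea using urllib.request.build_opener instead of FancyURLopener. The header values and the example.com URL are placeholders, not anything from the site in question; copy the real values from what your own browser sends.

```python
import urllib.request

# Hypothetical header values; replace them with the ones your browser sends.
BROWSER_HEADERS = [
    ("User-Agent", "Mozilla/5.0 (X11; Linux x86_64)"),
    ("Accept", "text/html"),
]


def make_opener(headers=BROWSER_HEADERS):
    """Return an opener that sends the given headers with every request."""
    opener = urllib.request.build_opener()
    # Overwrites the default [('User-agent', 'Python-urllib/x.y')] list.
    opener.addheaders = list(headers)
    return opener


opener = make_opener()
# Network call, left commented out; the URL is a placeholder:
# page = opener.open("https://example.com/login").read()
```

The opener keeps the header list for its whole lifetime, so every later opener.open() call looks the same to the server.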

HTH

-Wes

Jul 21 '05 #2
Eric <BorgMotherShip@AliensR_US.org> writes:
> I'm using Linux (Mandriva LE2005) with Python 2.3 (I can also use Python 2.4
> on my other system just as well).
> Anyway...
> I want to get a web page containing my stock grants.
> The initial page is served over https, and there is a form on it where you
> fill in your username and password and then click "login".
> I played with Python's urlopen, and basically it complains "your browser
> doesn't support frames", meaning the urlopen call makes it unhappy somehow.
> Is it reasonable to think I can build a script to log in to this secure
> website, move to a different page (on that site) and download it to disk?
> Or am I just looking at a long, complicated task?


It's not that bad. It took me about half a day to do this for a site I
wanted scraped regularly, and what I had to do was much more
complicated than what you describe. I had to deal with an optional
second login page (a "security feature" of the site), http-equiv
redirects (which urlopen doesn't follow, since they live in the HTML
rather than in the HTTP response), and then digging the URL of the page
I actually wanted out of the resulting page.
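
A minimal Python 3 sketch of that kind of flow, assuming the site uses an ordinary cookie-based login form: post the form, keep the session cookie, and follow any http-equiv refresh by hand. The function names, the form field names and the example.com URLs are all made up for illustration; read the real ones from the login page's source.

```python
import http.cookiejar
import urllib.parse
import urllib.request
from html.parser import HTMLParser


class MetaRefreshFinder(HTMLParser):
    """Remember the target of the first <meta http-equiv="refresh"> tag."""

    def __init__(self):
        super().__init__()
        self.target = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag != "meta" or self.target is not None:
            return
        if (attrs.get("http-equiv") or "").lower() != "refresh":
            return
        # The content attribute looks like "0; url=/next/page".
        _, _, rest = (attrs.get("content") or "").partition(";")
        key, _, value = rest.strip().partition("=")
        if key.strip().lower() == "url":
            self.target = value.strip().strip("'\"")


def refresh_target(html, base_url):
    """Return the absolute URL of a meta refresh in html, or None."""
    finder = MetaRefreshFinder()
    finder.feed(html)
    if finder.target is None:
        return None
    return urllib.parse.urljoin(base_url, finder.target)


def make_session():
    """Build an opener that stores cookies, so the login persists."""
    jar = http.cookiejar.CookieJar()
    return urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))


def login_and_fetch(opener, login_url, fields, max_hops=5):
    """POST the login form, then follow up to max_hops meta refreshes."""
    data = urllib.parse.urlencode(fields).encode("ascii")
    with opener.open(login_url, data) as resp:
        url, html = resp.geturl(), resp.read().decode("utf-8", "replace")
    for _ in range(max_hops):
        next_url = refresh_target(html, url)
        if next_url is None:
            break
        with opener.open(next_url) as resp:
            url, html = resp.geturl(), resp.read().decode("utf-8", "replace")
    return url, html


# Hypothetical usage (network call, not run here):
# opener = make_session()
# url, html = login_and_fetch(opener, "https://example.com/login",
#                             {"username": "eric", "password": "secret"})
```

A second login page would just be another login_and_fetch call on the same opener, since the cookie jar carries over between requests.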

The complaint about your browser may not be an error at all: it may be
the text they put in the NOFRAMES element of a framed page, as an
inadequate attempt to deal with browser portability. In that case, you
just need to find the URL for the frame that's got the information you
want, and get that page. On the other hand, as Wes said, they may be
browser-sniffing, in which case you'll have to set the User-Agent to
something they won't complain about. Personally, I always try "Your Web
Site Developer Sucks" first to see if they have a list of disallowed
browsers. If that fails, try the User-Agent string of a well-known browser.
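
Finding the frame URLs can be sketched with nothing but the standard library. This is Python 3, and the sample markup and example.com base URL below are invented; the real frameset will have its own names and paths.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class FrameFinder(HTMLParser):
    """Collect (name, absolute URL) for every frame or iframe with a src."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.frames = []

    def handle_starttag(self, tag, attrs):
        if tag in ("frame", "iframe"):
            attrs = dict(attrs)
            src = attrs.get("src")
            if src:
                # Resolve relative src values against the page's own URL.
                self.frames.append((attrs.get("name"), urljoin(self.base_url, src)))


def frame_urls(html, base_url):
    """Return the frames found in html as a list of (name, url) pairs."""
    finder = FrameFinder(base_url)
    finder.feed(html)
    return finder.frames


# Invented sample of what urlopen might have returned.
SAMPLE = """
<frameset cols="20%,80%">
  <frame name="menu" src="menu.html">
  <frame name="main" src="/grants/summary">
  <noframes>Your browser doesn't support frames</noframes>
</frameset>
"""

frames = frame_urls(SAMPLE, "https://example.com/portal/index.html")
```

Each URL in the result can then be fetched with the same opener that did the login, so the session cookie goes along with the request.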

For page scraping, install BeautifulSoup.

<mike
--
Mike Meyer <mw*@mired.org> http://www.mired.org/home/mwm/
Independent WWW/Perforce/FreeBSD/Unix consultant, email for more information.
Jul 21 '05 #3
