Not able to read an HTTPS page from Python

muttu2244

Hi all,

I am trying to read email IDs that appear on a page as links (clicking
one opens Outlook with the corresponding email ID).

These links are on an HTTPS page, i.e. a secured HTTP page.

The point is that I am able to read such links from an HTTP page, but
not when I try the same with HTTPS.

Using the following code based on sgmllib I am able to read the links:

class MyParser(sgmllib.SGMLParser):

    def __init__(self):
        sgmllib.SGMLParser.__init__(self)
        self.inside_a = False
        self.address = ''
        # initialise nickname too, so end_a() never reads an unset attribute
        self.nickname = ''

    def start_a(self, attrs):
        if DEBUG:
            print "start_a"
            print attrs
        for attr, value in attrs:
            if attr == 'href' and value.startswith('mailto:'):
                self.address = value[7:]    # strip the 'mailto:' prefix
        self.inside_a = True

    def end_a(self):
        if DEBUG:
            print "end_a"
        if self.address:
            print '"%s" <%s>' % (self.nickname, self.address)
            mailIdList.append(self.address)
        self.inside_a = False
        self.address = self.nickname = ''

    def handle_data(self, data):
        if self.inside_a:
            self.nickname = data

For the proxy authentication and the HTTPS handler I am using the
following lines of code:

authinfo = urllib2.HTTPBasicAuthHandler()
proxy_support = urllib2.ProxyHandler({"http": "http://user:password@proxyname:port"})
opener = urllib2.build_opener(proxy_support, authinfo, urllib2.HTTPSHandler)
urllib2.install_opener(opener)
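One detail that may be worth checking in the snippet above: the ProxyHandler mapping has only an "http" entry, so it names no proxy for https URLs. The thread does not establish whether this explains the missing links. The following is a minimal sketch, not a confirmed fix. It is written in Python 3 syntax (urllib.request is the successor of urllib2), and the proxy host, port and credentials are hypothetical placeholders. It registers the same proxy for both schemes and makes no network request:

```python
import urllib.request  # Python 3 successor of urllib2

# Hypothetical placeholder: substitute the real proxy host, port and credentials.
PROXY_URL = "http://user:password@proxy.example.com:8080"


def build_proxy_opener(proxy_url):
    """Return an opener whose proxy mapping covers both http and https URLs."""
    proxy_support = urllib.request.ProxyHandler({
        "http": proxy_url,
        "https": proxy_url,  # this entry is absent from the original snippet
    })
    # build_opener() adds the default handlers, including HTTPSHandler
    # when the interpreter has SSL support.
    return urllib.request.build_opener(proxy_support)


opener = build_proxy_opener(PROXY_URL)

# Inspect the configured handlers without touching the network.
proxy_handlers = [h for h in opener.handlers
                  if isinstance(h, urllib.request.ProxyHandler)]
print(sorted(proxy_handlers[0].proxies))
```

Calling urllib.request.install_opener(opener) afterwards would make urlopen() use this configuration, mirroring the install_opener call in the post.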

Then I call the parser on a particular HTTPS page, given as a
command-line argument, which should print all the links on that page:

p = MyParser()
for ln in urllib2.urlopen(sys.argv[1]):
    p.feed(ln)
p.close()

NOTE: I have installed Python with _ssl support as well.

So with this code I am able to read the links from an HTTP page but not
from an HTTPS page.

AM NOT GETTING ANY ERRORS EITHER BUT ITS NOT READING THE LINKS, THAT
ARE PRESENT IN THE GIVEN HTTPS PAGE

Could you please tell me whether I am doing something wrong in the above
code for any of the handlers?

I have been stuck on this for many days; please suggest a solution.

Thanks and regards

YOGI

Nov 9 '05 #1


Fredrik Lundh

<mu*******@yahoo.com> wrote:

AM NOT GETTING ANY ERRORS EITHER BUT ITS NOT READING THE LINKS, THAT
ARE PRESENT IN THE GIVEN HTTPS PAGE

HAVE YOU TRIED ADDING A PRINT STATEMENT TO THE FEED LOOP SO
YOU CAN SEE WHAT YOU'RE GETTING BACK FROM THE SERVER ?

</f>

Nov 9 '05 #2
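Fredrik's suggestion above can be sketched as follows. This is an illustration in Python 3 syntax, not code from the thread. LinkCollector and feed_with_trace are hypothetical names, and an in-memory io.BytesIO stands in for the object urlopen() would return, so nothing is fetched from a real server. The idea is to echo every line before feeding it to the parser, which shows at once whether the server sent the expected page or something else, such as a login form or a proxy error page:

```python
import io
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Hypothetical stand-in for MyParser: records every href it sees."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def feed_with_trace(parser, response, encoding="utf-8"):
    """Echo each raw line before feeding it, so the server's reply is visible."""
    lines_seen = 0
    for raw in response:
        text = raw.decode(encoding, errors="replace")
        print(repr(text))  # the debugging print suggested in the reply
        parser.feed(text)
        lines_seen += 1
    parser.close()
    return lines_seen


# Simulated response body: a page that contains no links at all.
fake_response = io.BytesIO(
    b"<html><body>\n<p>Please log in</p>\n</body></html>\n"
)
parser = LinkCollector()
lines_seen = feed_with_trace(parser, fake_response)
print(lines_seen, parser.links)
```

With a real request, the response object returned by urlopen() would take the place of fake_response.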

Steve Holden

Fredrik Lundh wrote:

<mu*******@yahoo.com> wrote:

AM NOT GETTING ANY ERRORS EITHER BUT ITS NOT READING THE LINKS, THAT
ARE PRESENT IN THE GIVEN HTTPS PAGE

HAVE YOU TRIED ADDING A PRINT STATEMENT TO THE FEED LOOP SO
YOU CAN SEE WHAT YOU'RE GETTING BACK FROM THE SERVER ?

COULD YOU GUYS BE QUIET, PLEASE, I'M TRYING TO WORK HERE!

regards
Steve
--
Steve Holden +44 150 684 7255 +1 800 494 3119
Holden Web LLC www.holdenweb.com
PyCon TX 2006 www.python.org/pycon/

Nov 9 '05 #3

Larry Bates

It is possible that the links have been obscured (something
I do on my own web pages) by inserting JavaScript that creates
the links on the fly using document.write(). That way web
spiders can't go through the web pages and easily pick up email
addresses to send spam to all my employees. Just a thought
since you have spent days on this.

-Larry Bates
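The effect Larry describes can be illustrated with a small sketch. It uses Python 3's html.parser rather than the sgmllib module from the original post, and the page content and addresses are invented for the example. A link written directly into the HTML is found, while a link that JavaScript assembles with document.write() never reaches the parser as an anchor tag, because the parser does not execute scripts:

```python
from html.parser import HTMLParser


class MailtoParser(HTMLParser):
    """Collects the address part of every mailto: link in static HTML."""

    def __init__(self):
        super().__init__()
        self.addresses = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value and value.startswith("mailto:"):
                self.addresses.append(value[len("mailto:"):])


# Invented sample page: one plain link, one link built by JavaScript.
PAGE = """
<html><body>
<a href="mailto:bob@example.com">Bob</a>
<script type="text/javascript">
var user = "jane", host = "example.com";
document.write('<a href="mai' + 'lto:' + user + '@' + host + '">Jane</a>');
</script>
</body></html>
"""

parser = MailtoParser()
parser.feed(PAGE)
parser.close()
found = parser.addresses
print(found)  # only the static link: ['bob@example.com']
```

If the page in question builds its links this way, viewing the raw page source (rather than the rendered page in a browser) would show the script instead of the anchor tags.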

Nov 9 '05 #4
