Problem with urllib.py

Hey,

I want to open a list of URLs automatically with Python's urllib and the
function open(URL). It is important that the program opens ONLY
normal http sites and no https sites with a user/password request.
Is there a way to cancel all site requests that bring up
user/password dialogs?

Thx

--

Volker

Jul 18 '05 #1
On Thu, 22 Jul 2004 10:43:38 +0200, Volker M. wrote:
Hey,

I want to open a list of URLs automatically with Python's urllib and the
function open(URL). It is important that the program opens ONLY
normal http sites and no https sites with a user/password request.
Is there a way to cancel all site requests that bring up
user/password dialogs?


Hi,

urllib is not interactive. If you don't send a
login and password, you get a "not authorized" response
with the corresponding HTTP error code (401).
You can check this return code in your script.

By the way, the user/password request (the browser's pop-up)
is HTTP Basic Authentication; it can be used with
http or https.

HTH,
Thomas

--
Thomas Güttler, http://www.thomas-guettler.de/
Jul 18 '05 #2
Thanks :))

--
Volker
Jul 18 '05 #3
"Volker M." <sp********@gmx.de> writes:
I want to open a list of URLs automatically with Python's urllib and the
function open(URL). It is important that the program opens ONLY
normal http sites and no https sites with a user/password request.
Is there a way to cancel all site requests that bring up
user/password dialogs?


Assuming you mean you don't want to handle Basic HTTP Authentication
(and you don't care whether http or https), you can use
urllib2.urlopen() instead of urllib.urlopen(). You will then get a
urllib2.HTTPError with a .code of 401 when a site wants Basic
Authentication.
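
A minimal sketch of that check, assuming Python 3, where urllib2's functionality lives in urllib.request and urllib.error. The helper name needs_auth and its opener parameter are made up for illustration; they are not part of any library.

```python
import urllib.error
import urllib.request


def needs_auth(url, opener=None, timeout=10):
    """Return True if the server answers 401, False if the page opens.

    Any other HTTP error (404, 500, ...) is re-raised for the caller.
    """
    # Hypothetical convenience: fall back to a default opener if none is given.
    if opener is None:
        opener = urllib.request.build_opener()
    try:
        response = opener.open(url, timeout=timeout)
    except urllib.error.HTTPError as err:
        if err.code == 401:
            # The site wants a login; report it instead of prompting.
            return True
        raise
    response.close()
    return False
```

With a list of URLs, something like `[u for u in urls if not needs_auth(u)]` would keep only the pages that open without a login.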

If you do mean https, though, again with urllib2:

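import urllib2

class NullHTTPSHandler(urllib2.HTTPSHandler):
    # Replaces the default HTTPSHandler and declines every https request.
    def https_open(self, request):
        return None

o = urllib2.build_opener(NullHTTPSHandler())

# https URLs now fail with urllib2.URLError instead of being fetched.
response = o.open(url)

In general, urllib2 splits up the job of opening URLs into handlers,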
so it's more 'turn-off-and-on-able' than urllib.
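
For readers on Python 3, the same handler swap might look roughly like the sketch below. It assumes urllib.request as the successor to urllib2 and relies on the opener raising URLError ("unknown url type") once no handler accepts an https request. The helper https_is_refused is hypothetical and only there to show the effect.

```python
import urllib.error
import urllib.request


class NullHTTPSHandler(urllib.request.HTTPSHandler):
    """Stands in for the default HTTPSHandler and declines every request."""

    def https_open(self, request):
        # Returning None means "not handled". With no other https handler
        # installed, the opener falls through to its unknown-type handler.
        return None


# Passing a subclass instance makes build_opener skip the default HTTPSHandler.
opener = urllib.request.build_opener(NullHTTPSHandler())


def https_is_refused(url):
    """Return True if the opener rejects url as an unknown URL type."""
    try:
        opener.open(url).close()
    except urllib.error.URLError as err:
        return "unknown url type" in str(err.reason)
    return False
```

The http handler is left untouched, so plain http URLs still go through the usual path.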

Since you're writing a robot, one other thing: the alpha version of my
ClientCookie package (urllib2-replacement with addons) contains code
for obeying robots.txt files (albeit not yet well tested, IIRC):

import ClientCookie
o = ClientCookie.build_opener(ClientCookie.HTTPRobotRulesProcessor())

response = o.open(url)

Some time soon I'll have to make a distribution of this stuff that
works properly with 2.4 (which includes changes to urllib2 from
ClientCookie)...
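
As an alternative that needs no extra package, the standard library's robots.txt parser can answer the same question before a URL is opened. This is a different technique from the ClientCookie processor above, sketched here for Python 3 (urllib.robotparser). The robots.txt content, the user-agent name and the helper may_fetch are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content. A real robot would instead call
# rules.set_url("http://example.com/robots.txt") followed by rules.read().
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

rules = RobotFileParser()
rules.parse(ROBOTS_TXT.splitlines())


def may_fetch(url, user_agent="ExampleBot"):
    """Return True if the parsed rules allow user_agent to fetch url."""
    return rules.can_fetch(user_agent, url)
```

A robot could call may_fetch(url) for each entry in its list and skip the ones that come back False.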
John
Jul 18 '05 #4
