Problem with urllib.py

Hey,

I want to open a list of URLs automatically with Python's urllib and its
open(URL) function. It is important that the program opens ONLY
normal http sites and no https sites that ask for a user name and password.
Is there a way to cancel all requests to sites that show
user/password dialogs?

Thx

--

Volker

Jul 18 '05 #1
On Thu, 22 Jul 2004 10:43:38 +0200, Volker M. wrote:
> Hey,
>
> I want to open a list of URLs automatically with Python's urllib and its
> open(URL) function. It is important that the program opens ONLY
> normal http sites and no https sites that ask for a user name and password.
> Is there a way to cancel all requests to sites that show
> user/password dialogs?

Hi,

urllib is not interactive. If you don't send a
login and password, you get a "not authorized" response
with the corresponding HTTP error code (401).
You can check this status code in your script.

By the way, the user/password request (the browser's pop-up)
is HTTP Basic Authentication; it can be used with
http or https.
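
To illustrate: a server that wants Basic Authentication answers with status 401 and a WWW-Authenticate header that names the scheme, whatever the URL scheme is. A minimal Python 3 sketch, assuming the header value has already been read from the response; the helper name auth_scheme is hypothetical:

```python
def auth_scheme(www_authenticate):
    """Return the lower-cased scheme of a WWW-Authenticate value, or None.

    Hypothetical helper: it looks only at the first challenge in the header.
    """
    if not www_authenticate:
        return None
    parts = www_authenticate.strip().split(None, 1)
    return parts[0].lower() if parts else None


# The same challenge can arrive over http or https.
print(auth_scheme('Basic realm="members"'))
```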

HTH,
Thomas

--
Thomas Güttler, http://www.thomas-guettler.de/
Jul 18 '05 #2
Thanks :))

--
Volker
Jul 18 '05 #3
"Volker M." <sp********@gmx.de> writes:
> I want to open a list of URLs automatically with Python's urllib and its
> open(URL) function. It is important that the program opens ONLY
> normal http sites and no https sites that ask for a user name and password.
> Is there a way to cancel all requests to sites that show
> user/password dialogs?

Assuming you mean you don't want to handle HTTP Basic Authentication
(and you don't care whether it's http or https), you can use
urllib2.urlopen() instead of urllib.urlopen(). You will then get a
urllib2.HTTPError with a .code of 401 when a site wants Basic
Authentication.
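
As a rough sketch of that check: in Python 3 the urllib2 names live in urllib.request and urllib.error, so the version below uses those rather than urllib2 itself. The function name fetch_unless_protected, the optional opener argument, and the 10-second timeout are assumptions made for illustration, not part of the original advice.

```python
import urllib.error
import urllib.request


def fetch_unless_protected(url, opener=None):
    """Return the response body, or None if the server answers 401.

    `opener` is optional so that a custom OpenerDirector can be supplied.
    """
    opener = opener or urllib.request.build_opener()
    try:
        with opener.open(url, timeout=10) as response:
            return response.read()
    except urllib.error.HTTPError as err:
        if err.code == 401:  # the site wants a login: skip it
            return None
        raise  # any other HTTP error is left to the caller
```

Looping over the URL list and ignoring every None result would then skip the password-protected sites without any prompt.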

If you do mean https, though, again with urllib2:

import urllib2

class NullHTTPSHandler(urllib2.HTTPSHandler):
    # Returning None declines the request, so https URLs are never fetched.
    def https_open(self, request):
        return None

o = urllib2.build_opener(NullHTTPSHandler())

response = o.open(url)

In general, urllib2 splits up the job of opening URLs into handlers,
so it's more 'turn-off-and-on-able' than urllib.
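
The same switch can be sketched in Python 3, where urllib2's handlers moved to urllib.request. This is a port under that assumption, not the code from the post; the URL is a placeholder. Because the subclass replaces the default HTTPS handler and declines every request, no handler is left for https and open() should raise URLError before any connection is made.

```python
import urllib.error
import urllib.request


class NullHTTPSHandler(urllib.request.HTTPSHandler):
    """Stand-in for the default HTTPS handler that declines every request."""

    def https_open(self, request):
        return None  # "not handled"; no connection is attempted


# build_opener() drops the default HTTPSHandler in favour of the subclass.
opener = urllib.request.build_opener(NullHTTPSHandler())

try:
    opener.open("https://example.com/")
except urllib.error.URLError as err:
    print("skipped:", err.reason)
```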

Since you're writing a robot, one other thing: the alpha version of my
ClientCookie package (urllib2-replacement with addons) contains code
for obeying robots.txt files (albeit not yet well tested, IIRC):

import ClientCookie
o = ClientCookie.build_opener(ClientCookie.HTTPRobotRulesProcessor())

response = o.open(url)

Some time soon I'll have to make a distribution of this stuff that
works properly with 2.4 (which includes changes to urllib2 from
ClientCookie)...
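
For readers without ClientCookie, which is a third-party package, the robots.txt check can also be sketched with the standard library's urllib.robotparser in Python 3. This is a different technique from the processor above: it only answers "may I fetch this URL?" and leaves the fetching to the caller. The rules, the agent name example-bot, and the helper allowed are made up for illustration.

```python
import urllib.robotparser

# Hypothetical robots.txt content, parsed inline so the sketch runs offline.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

parser = urllib.robotparser.RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(url, agent="example-bot"):
    """Return True if the parsed rules let `agent` fetch `url`."""
    return parser.can_fetch(agent, url)


print(allowed("http://example.com/index.html"))
print(allowed("http://example.com/private/data.html"))
```

Against a live site, set_url() followed by read() would load the real robots.txt instead of the inline string.
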
John
Jul 18 '05 #4
