web spider and password protected pages

I've been writing a simple web spider for fun, and I've run into a
problem I can't figure out. The spider hangs (waits for username and
pass) when I hit a page that requires .htaccess authentication.

self.f = urllib.urlopen('http://blogbloc.com/~jay/test/')
#nothing below here gets executed
print self.f.info()
....

It hangs as soon as I call urllib.urlopen(). I was going to try to read
the info and break for pages that require authentication, but it hangs
before I can call self.f.info()

Any ideas?

Jul 18 '05 #1
5 Replies


jdonnell wrote:
I've been writing a simple web spider for fun, and I've run into a
problem I can't figure out. The spider hangs (waits for username and
pass) when I hit a page that requires .htaccess authentication.

self.f = urllib.urlopen('http://blogbloc.com/~jay/test/')
#nothing below here gets executed
print self.f.info()
...

It hangs as soon as I call urllib.urlopen(). I was going to try to read
the info and break for pages that require authentication, but it hangs
before I can call self.f.info()

Any ideas?


I tried Google. First I looked for "python urlopen authentication".
I scanned the top page for the word "authentication" and found a
few references, then something called FancyURLopener. Adding that
to my search, skipping down a couple of links, I quickly found
a page that starts "Here is an explanation about how to handle password
protected sites."

Another approach that often works is to throw in the word
"recipe", hoping perhaps to get a hit in the Python Cookbook
page: try "python http authentication recipe", for example.

I hope that teaches you a bit about how to fish, rather than
just giving you one. ;-)

-Peter
Jul 18 '05 #2

"I quickly found
a page that starts "Here is an explanation about how to handle password
protected sites."

....

I hope that teaches you a bit about how to fish, rather than
just giving you one. ;-) "

Actually, I found a much easier solution, but since you know how to
fish I don't need to tell you what it is ;)

Jul 18 '05 #3

jdonnell wrote:
"I quickly found
a page that starts "Here is an explanation about how to handle password
protected sites."

...

I hope that teaches you a bit about how to fish, rather than
just giving you one. ;-) "

Actually, I found a much easier solution, but since you know how to
fish I don't need to tell you what it is ;)


Nevertheless, perhaps you'll still post the answer here so
that others who come along later can benefit from your
experience in the same way that you benefited from reading
whatever page you found (even if you didn't benefit from
my suggestions...).

That's the way this forum works best -- thanks,
-Peter
Jul 18 '05 #4

"Nevertheless, perhaps you'll still post the answer here so
that others who come along later can benefit from your
experience in the same way that you benefited from reading
whatever page you found (even if you didn't benefit from
my suggestions...). "

You're funny :) Perhaps you should take your own advice. My guess is that
the google search you described will return different results in a few
months. Your first post won't benefit those who "come along later".

"The spider hangs (waits for username and
pass) when I hit a page that requires .htaccess authentication."

I was using urllib.
urllib2 doesn't have this problem. Simply switching from urllib to urllib2
fixed it.
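
For readers on Python 3, where the functionality of urllib2 lives in urllib.request, the fix can be sketched roughly as follows. As far as I can tell, the old urllib.urlopen() hung because its default opener prompts on the terminal for a username and password when the server answers 401, while urllib2 raises an HTTPError instead. The helper name needs_auth and its fetch parameter are made up for this illustration and are not part of either library.

```python
import urllib.error
import urllib.request


def needs_auth(url, fetch=urllib.request.urlopen, timeout=10):
    """Return True if the server answers `url` with 401 Unauthorized.

    `fetch` is a hypothetical injection point (it defaults to urlopen)
    so the check can be exercised without network access.
    """
    try:
        response = fetch(url, timeout=timeout)
    except urllib.error.HTTPError as err:
        if err.code == 401:
            return True  # password protected: the spider can skip it
        raise            # any other HTTP error is left to the caller
    response.close()
    return False
```

A spider could call needs_auth(url) before parsing a page and move on to the next URL when it returns True, rather than waiting for credentials.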

Jul 18 '05 #5

jdonnell wrote:
"Nevertheless, perhaps you'll still post the answer here so
that others who come along later can benefit from your
experience in the same way that you benefited from reading
whatever page you found (even if you didn't benefit from
my suggestions...). "

You're funny :) Perhaps you should take your own advice. My guess is that
the google search you described will return different results in a few
months. Your first post won't benefit those who "come along later".


Sure it will, for those with the wits to understand that I
was trying to show someone how he can *search* for the
information himself, rather than having to beg for others
to do his work for him. Whatever results Google shows in
a few months are irrelevant... it's the technique that matters.

If there'd been the slightest sign that you'd actually tried
to find the answer yourself before you asked, your criticism
would be far more effective.

-Peter
Jul 18 '05 #6
