web spider and password protected pages

I've been writing a simple web spider for fun, and I've run into a
problem I can't figure out. The spider hangs (waits for username and
pass) when I hit a page that requires .htaccess authentication.

self.f = urllib.urlopen('http://blogbloc.com/~jay/test/')
# nothing below here gets executed
print self.f.info()
....

It hangs as soon as I call urllib.urlopen(). I was going to try to read
the info and break for pages that require authentication, but it hangs
before I can call self.f.info().

Any ideas?

Jul 18 '05 #1
jdonnell wrote:
I've been writing a simple web spider for fun, and I've run into a
problem I can't figure out. The spider hangs (waits for username and
pass) when I hit a page that requires .htaccess authentication.

self.f = urllib.urlopen('http://blogbloc.com/~jay/test/')
# nothing below here gets executed
print self.f.info()
...

It hangs as soon as I call urllib.urlopen(). I was going to try to read
the info and break for pages that require authentication, but it hangs
before I can call self.f.info().

Any ideas?


I tried Google. First I looked for "python urlopen authentication".
I scanned the top page for the word "authentication" and found a
few references, then something called FancyURLopener. Adding that
to my search, skipping down a couple of links, I quickly found
a page that starts "Here is an explanation about how to handle password
protected sites."

Another approach that often works is to throw in the word
"recipe", hoping perhaps to get a hit in the Python Cookbook
page: try "python http authentication recipe", for example.

I hope that teaches you a bit about how to fish, rather than
just giving you one. ;-)

-Peter
Jul 18 '05 #2
"I quickly found
a page that starts "Here is an explanation about how to handle password
protected sites."

....

I hope that teaches you a bit about how to fish, rather than
just giving you one. ;-) "

Actually, I found a much easier solution, but since you know how to
fish I don't need to tell you what it is ;)

Jul 18 '05 #3
jdonnell wrote:
"I quickly found
a page that starts "Here is an explanation about how to handle password
protected sites."

...

I hope that teaches you a bit about how to fish, rather than
just giving you one. ;-) "

Actually, I found a much easier solution, but since you know how to
fish I don't need to tell you what it is ;)


Nevertheless, perhaps you'll still post the answer here so
that others who come along later can benefit from your
experience in the same way that you benefited from reading
whatever page you found (even if you didn't benefit from
my suggestions...).

That's the way this forum works best -- thanks,
-Peter
Jul 18 '05 #4
"Nevertheless, perhaps you'll still post the answer here so
that others who come along later can benefit from your
experience in the same way that you benefited from reading
whatever page you found (even if you didn't benefit from
my suggestions...)."

You're funny :) Perhaps you should take your own advice. My guess is that
the Google search you described will return different results in a few
months. Your first post won't benefit those who "come along later".

"The spider hangs (waits for username and
pass) when I hit a page that requires .htaccess authentication."

I was using urllib.
urllib2 doesn't have this problem. Simply switching from urllib to urllib2
fixed the problem.
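
For anyone who comes along later, here is a minimal sketch of how a spider might detect and skip protected pages with the urllib2-style API. The old urllib.urlopen answers a 401 by prompting for a username and password on the terminal, which is the hang described above; the urllib2-style API raises an HTTPError instead. The sketch is written for Python 3, where urllib2's functionality lives in urllib.request and urllib.error; under Python 2 the equivalent names would be urllib2.urlopen and urllib2.HTTPError. The helper name requires_auth, its opener parameter, and the 10-second timeout are assumptions made for illustration, not part of the original spider.

```python
import urllib.error
import urllib.request


def requires_auth(url, opener=urllib.request.urlopen, timeout=10):
    """Return True if fetching `url` is refused with HTTP 401.

    `opener` is a hypothetical hook (any callable with urlopen's
    signature) so the check can be exercised without a network.
    """
    try:
        response = opener(url, timeout=timeout)
    except urllib.error.HTTPError as err:
        if err.code == 401:
            # The server wants credentials; report it instead of prompting.
            return True
        raise  # other HTTP errors (404, 500, ...) are left to the caller
    response.close()
    return False


# Possible use inside a spider loop (not executed here):
#     if requires_auth('http://blogbloc.com/~jay/test/'):
#         print('skipping password-protected page')
```

Checking the status code rather than the headers keeps the spider from ever waiting on input; a page that needs a login is simply reported and skipped.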

Jul 18 '05 #5
jdonnell wrote:
"Nevertheless, perhaps you'll still post the answer here so
that others who come along later can benefit from your
experience in the same way that you benefited from reading
whatever page you found (even if you didn't benefit from
my suggestions...)."

You're funny :) Perhaps you should take your own advice. My guess is that
the Google search you described will return different results in a few
months. Your first post won't benefit those who "come along later".


Sure it will, for those with the wits to understand that I
was trying to show someone how he can *search* for the
information himself, rather than having to beg for others
to do his work for him. Whatever results Google shows in
a few months are irrelevant... it's the technique that mattered.

If there'd been the slightest sign that you'd actually tried
to find the answer yourself before you asked, your criticism
would be far more effective.

-Peter
Jul 18 '05 #6
