I've been writing a simple web spider for fun, and I've run into a
problem I can't figure out. The spider hangs (waits for username and
pass) when I hit a page that requires .htaccess authentication.
    self.f = urllib.urlopen('http://blogbloc.com/~jay/test/')
    # nothing below here gets executed
    print self.f.info()
    ....
It hangs as soon as I call urllib.urlopen(). I was going to try to read
the info and break for pages that require authentication, but it hangs
before I can call self.f.info().
Any ideas?
jdonnell wrote: I've been writing a simple web spider for fun, and I've run into a problem I can't figure out. The spider hangs (waits for username and pass) when I hit a page that requires .htaccess authentication.
    self.f = urllib.urlopen('http://blogbloc.com/~jay/test/')
    # nothing below here gets executed
    print self.f.info()
    ...
It hangs as soon as I call urllib.urlopen(). I was going to try to read the info and break for pages that require authentication, but it hangs before I can call self.f.info().
Any ideas?
I tried Google. First I looked for "python urlopen authentication".
I scanned the top page for the word "authentication" and found a
few references, then something called FancyURLopener. Adding that
to my search, skipping down a couple of links, I quickly found
a page that starts "Here is an explanation about how to handle password
protected sites."
Another approach that often works is to throw in the word
"recipe", hoping perhaps to get a hit in the Python Cookbook
page: try "python http authentication recipe", for example.
I hope that teaches you a bit about how to fish, rather than
just giving you one. ;-)
-Peter
"I quickly found
a page that starts "Here is an explanation about how to handle password
protected sites."
....
I hope that teaches you a bit about how to fish, rather than
just giving you one. ;-) "
Actually, I found a much easier solution, but since you know how to
fish I don't need to tell you what it is ;)
jdonnell wrote: "I quickly found a page that starts "Here is an explanation about how to handle password protected sites."
...
I hope that teaches you a bit about how to fish, rather than just giving you one. ;-) "
Actually, I found a much easier solution, but since you know how to fish I don't need to tell you what it is ;)
Nevertheless, perhaps you'll still post the answer here so
that others who come along later can benefit from your
experience in the same way that you benefited from reading
whatever page you found (even if you didn't benefit from
my suggestions...).
That's the way this forum works best -- thanks,
-Peter
"Nevertheless, perhaps you'll still post the answer here so
that others who come along later can benefit from your
experience in the same way that you benefited from reading
whatever page you found (even if you didn't benefit from
my suggestions...)."
You're funny :) Perhaps you should take your own advice. My guess is that
the Google search you described will return different results in a few
months. Your first post won't benefit those who "come along later".
"The spider hangs (waits for username and
pass) when I hit a page that requires .htaccess authentication."
I was using urllib.
urllib2 doesn't have this problem. Simply switching from urllib to urllib2
fixed the problem.
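A minimal sketch of what that switch might look like, for anyone who lands here later. It relies on the fact that urllib.urlopen goes through FancyURLopener, which prompts on the terminal for a username and password when it gets a 401, whereas urllib2.urlopen raises HTTPError instead. The function name fetch_headers and its opener parameter are made up for illustration, and the Python 3 import fallback is an addition that is not part of the original fix.

```python
try:
    # Python 2: the module discussed in this thread
    from urllib2 import urlopen, HTTPError
except ImportError:
    # Python 3: the same functionality lives in urllib.request / urllib.error
    from urllib.request import urlopen
    from urllib.error import HTTPError


def fetch_headers(url, opener=urlopen):
    """Return the response headers, or None if the page requires authentication.

    `opener` is a hypothetical hook (defaults to urlopen) so the function can
    be exercised without touching the network.
    """
    try:
        response = opener(url)
    except HTTPError as err:
        if err.code == 401:
            # Password-protected page: skip it rather than wait for credentials.
            return None
        # Any other HTTP error (404, 500, ...) is left for the caller to handle.
        raise
    return response.info()
```

In the spider, a call such as fetch_headers(url) would then give back None for any URL that answers 401, so the crawl can move on to the next link instead of blocking.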
jdonnell wrote: "Nevertheless, perhaps you'll still post the answer here so that others who come along later can benefit from your experience in the same way that you benefited from reading whatever page you found (even if you didn't benefit from my suggestions...)."
You're funny :) Perhaps you should take your own advice. My guess is that the Google search you described will return different results in a few months. Your first post won't benefit those who "come along later".
Sure it will, for those with the wits to understand that I
was trying to show someone how he can *search* for the
information himself, rather than having to beg for others
to do his work for him. Whatever results Google shows in
a few months are irrelevant... it's the technique that mattered.
If there'd been the slightest sign that you'd actually tried
to find the answer yourself before you asked, your criticism
would be far more effective.
-Peter