
Agnostic fetching


OK, that may sound stupid. Anyway, I've been learning Python for some time now
and am currently having fun with the urllib and urllib2 modules, but I've run
into a problem: is there any way to fetch (urllib.urlretrieve) files from a
server without knowing the filenames? For instance, there is something like
folder/spam.egg, folder/unpredictable.egg and so on. If not, is there perhaps
some kind of glob to create a list of existing files? I'd really appreciate
some help, since I'm really out of my (newb) depth here.
Aug 2 '08 #1

On Fri, 01 Aug 2008 17:05:00 -0700, jorpheus wrote:
is there any way to fetch (urllib.urlretrieve) files from a server without
knowing the filenames? [...] If not, is there perhaps some kind of glob to
create a list of existing files?
You might try the os.path module and/or the glob module in the standard
Python library.
Aug 2 '08 #2



jorpheus wrote:
is there any way to fetch (urllib.urlretrieve) files from a server without
knowing the filenames? [...] If not, is there perhaps some kind of glob to
create a list of existing files?
If you are asking whether servers will let you go fishing around their
file system, the answer is that HTTP is not designed for that (whereas
FTP is, as long as you stay under the main FTP directory). You can try
random file names, but the server may get unhappy and think you are
trying to break in through a back door or something. You are *expected*
to start at ..../index.html and proceed from the links given there, or
to use a valid filename that was retrieved by that method.
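For the FTP case, the standard library's ftplib can list a directory before
you fetch anything. A minimal sketch, assuming an anonymous FTP server at
ftp.example.com with a folder/ directory (both made-up names):

from ftplib import FTP

ftp = FTP('ftp.example.com')   # hypothetical host
ftp.login()                    # anonymous login
ftp.cwd('folder')              # change into the directory of interest
for name in ftp.nlst():        # ask the server for a name listing
    # assumes everything listed is a plain file, not a subdirectory
    local = open(name, 'wb')
    ftp.retrbinary('RETR ' + name, local.write)
    local.close()
ftp.quit()

Plain HTTP has no equivalent of NLST, which is why you are stuck following
links from a known page.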

Aug 2 '08 #3

Bruce Frederiksen wrote:

You might try the os.path module and/or the glob module in the standard
Python library.
Not on remote locations. They only work on your local filesystem.

Diez
Aug 2 '08 #4

jorpheus wrote:
is there any way to fetch (urllib.urlretrieve) files from a server without
knowing the filenames? [...] If not, is there perhaps some kind of glob to
create a list of existing files?
If you happen to have a URL that simply lists files, then what you have
to do is relatively simple. Just fetch the HTML from the folder URL,
then parse the HTML and look for the anchor tags. You can then fetch
those anchor URLs that interest you. BeautifulSoup can help out with
this; you should be able to list all anchor tags in an HTML string in just
one line of code. Combine urllib2 and BeautifulSoup and you'll have a
winner.
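A minimal sketch of that approach, assuming a hypothetical listing page at
http://example.com/folder/ and the 2008-era imports (urllib2 and
BeautifulSoup 3); the URL and the .egg filter are placeholders:

import urllib
import urllib2
import urlparse
from BeautifulSoup import BeautifulSoup  # BeautifulSoup 3

folder_url = 'http://example.com/folder/'   # hypothetical listing page
html = urllib2.urlopen(folder_url).read()

# the "one line" that pulls every anchor href out of the page
links = [a['href'] for a in BeautifulSoup(html).findAll('a', href=True)]

for href in links:
    if href.endswith('.egg'):                # keep only the files you care about
        file_url = urlparse.urljoin(folder_url, href)
        urllib.urlretrieve(file_url, href.split('/')[-1])

Of course, this only works if the server actually serves a listing (or some
index page) at that URL; otherwise you are back to following links from a
known page, as noted above.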
Aug 2 '08 #5
