
Get directory from http web site

Hi all :)

I was wondering if there's some neat and easy way to get the entire
contents of a directory at a specific web url address.

I have the following link:

http://www.infomedia.it/immagini/riviste/covers/cp

and as you can see it's just a list containing all the files (images)
that I need. Is it possible to retrieve this list (not the physical
files) and have it stored in a variable of type list or something?

And, if so, what would be the easiest and most efficient way?

Thank you so much in advance.

Rock

Jul 22 '05 #1
rock69 enlightened us with:
I was wondering if there's some neat and easy way to get the entire
contents of a directory at a specific web url address. [...] Is it
possible to retrieve this list (not the physical files) and have it
stored in a variable of type list or something?


Check out the chapter on HTML parsing at
http://www.diveintopython.org/
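
As a rough illustration of the HTML-parsing approach, here is a minimal sketch using `html.parser` from the Python 3 standard library (the Dive Into Python chapter predates it and works with older modules). The class name `LinkCollector`, the helper `extract_links`, and the sample markup are made up for this example; a real index page would have to be fetched first and may be laid out differently.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href attribute of every <a> tag (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; tag and attribute
        # names arrive lower-cased.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)


def extract_links(html_text):
    """Return all link targets found in html_text, in document order."""
    parser = LinkCollector()
    parser.feed(html_text)
    parser.close()
    return parser.links


# Made-up sample resembling a server-generated directory listing.
SAMPLE = """
<html><body><pre>
<a href="?N=D">Name</a> <a href="/immagini/riviste/covers/">Parent Directory</a>
<a href="cp100.jpg">cp100.jpg</a>
<a href="cp100sm.jpg">cp100sm.jpg</a>
</pre></body></html>
"""

if __name__ == "__main__":
    print(extract_links(SAMPLE))
```

The result is an ordinary Python list of strings, which is the "variable of type list" asked for in the question.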

Sybren
--
The problem with the world is stupidity. Not saying there should be a
capital punishment for stupidity, but why don't we just take the
safety labels off of everything and let the problem solve itself?
Frank Zappa
Jul 22 '05 #2
rock69 wrote:
Hi all :)

I was wondering if there's some neat and easy way to get the entire
contents of a directory at a specific web url address.

I have the following link:

http://www.infomedia.it/immagini/riviste/covers/cp

and as you can see it's just a list containing all the files (images)
that I need. Is it possible to retrieve this list (not the physical
files) and have it stored in a variable of type list or something?


BeautifulSoup and urllib do this easily:

>>> from BeautifulSoup import BeautifulSoup
>>> import urllib
>>> data = urllib.urlopen('http://www.infomedia.it/immagini/riviste/covers/cp/').read()
>>> soup = BeautifulSoup(data)
>>> anchors = soup.fetch('a')
>>> len(anchors)
164
>>> for a in anchors[:10]:
...     print a['href'], a.string
...
?N=D Name
?M=A Last modified
?S=A Size
?D=A Description
/immagini/riviste/covers/ Parent Directory
cp100.jpg cp100.jpg
cp100sm.jpg cp100sm.jpg
cp101.jpg cp101.jpg
cp101sm.jpg cp101sm.jpg
cp102.jpg cp102.jpg

http://www.crummy.com/software/BeautifulSoup/
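
The list above still contains the column-sorting links and the parent directory. One possible way to keep only the files is sketched below in Python 3, using only the standard library. The function name `file_urls` and the filtering rules are assumptions based on the output shown, not part of BeautifulSoup, and other servers may format their listings differently.

```python
from urllib.parse import urljoin, urlsplit

# Directory URL taken from the question.
BASE_URL = "http://www.infomedia.it/immagini/riviste/covers/cp/"


def file_urls(hrefs, base_url=BASE_URL):
    """Turn raw href values from a directory listing into absolute file URLs.

    Assumed rules: hrefs starting with '?' are column-sorting links, and
    hrefs whose path ends in '/' are directories (including the parent).
    """
    result = []
    for href in hrefs:
        if not href or href.startswith("?"):
            continue  # skip empty values and sort links such as ?N=D
        if urlsplit(href).path.endswith("/"):
            continue  # skip directories
        result.append(urljoin(base_url, href))
    return result


if __name__ == "__main__":
    sample = ["?N=D", "/immagini/riviste/covers/", "cp100.jpg", "cp100sm.jpg"]
    for url in file_urls(sample):
        print(url)
```

Joining each name onto the base URL also gives addresses that can be downloaded later, should the files themselves be needed.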

Kent
Aug 6 '05 #3
You might also want to modify your c:/python/Lib/urllib.py file
by adding or changing the following headers. Both headers go in a single
list; a second assignment to self.addheaders would replace the first.

self.addheaders = [
    ('User-agent', 'Mozilla/4.0'),  # make the server think the request comes from a browser
    ('Referer', 'http://www.infomedia.it'),  # make the site think you followed a link from its own pages
]
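
Editing the library file affects every program on the machine, so a per-request alternative may be preferable. Here is a minimal sketch for Python 3, where `urllib.request.Request` accepts a `headers` dictionary. The helper name `build_request` is hypothetical, and the header values are simply copied from the post above.

```python
import urllib.request


def build_request(url, referer="http://www.infomedia.it"):
    """Build a request with browser-like headers, leaving urllib itself untouched."""
    headers = {
        "User-Agent": "Mozilla/4.0",  # present the client as a browser
        "Referer": referer,           # claim the link was followed from the site
    }
    return urllib.request.Request(url, headers=headers)


if __name__ == "__main__":
    req = build_request("http://www.infomedia.it/immagini/riviste/covers/cp/")
    print(req.header_items())
    # Fetching needs network access, and the site may no longer exist:
    # with urllib.request.urlopen(req) as response:
    #     html_text = response.read().decode("latin-1", errors="replace")
```

The headers then apply only to requests built this way, and the standard library stays unmodified.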

Aug 7 '05 #4
