how to count and extract images

Joe
I'm trying to get the location of the image using

start = s.find('<a href="somefile') + len('<a href="somefile')
stop = s.find('">Save File</a></B>', start)
fileName = s[start:stop]

and then construct the URL from the filename to download the image.
This works fine, because every image has the Save File link, and I can
count the number of images easily. The problem is when there is more than
one image: I try using a while loop to download the files, and it works
fine for the first one but always matches the same one. How can I count
the images and tell the loop to skip the first one if it has been
downloaded and go to the next one, and if the next one is downloaded go to
the one after that, and so on?
Oct 23 '05 #1
2 Replies


Pass the index from where the search must start as the second argument
to the s.find method -- you're already doing that for the second call,
so it should be pretty obvious it will also work for the first one, no?
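
A minimal sketch of that idea, assuming the page markup looks like the snippets in the question. The function name extract_filenames, its default arguments, and the sample page string are made up for illustration; only s.find and its start argument come from the thread.

```python
def extract_filenames(s, prefix='<a href="somefile', suffix='">Save File</a></B>'):
    """Return every substring found between prefix and suffix, in order."""
    names = []
    pos = 0  # index where the next search begins
    while True:
        start = s.find(prefix, pos)   # search only from pos onward
        if start == -1:
            break                     # no more links
        start += len(prefix)
        stop = s.find(suffix, start)
        if stop == -1:
            break                     # opening found but no "Save File" text
        names.append(s[start:stop])
        pos = stop + len(suffix)      # skip past the match just handled
    return names


# Hypothetical sample input, not taken from a real page.
page = ('<B><a href="somefile1.jpg">Save File</a></B>'
        '<B><a href="somefile2.jpg">Save File</a></B>')
print(extract_filenames(page))
```

Because each search starts where the previous match ended, the same link is never matched twice, and len() of the returned list gives the image count.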
Alex
Oct 24 '05 #2

To answer your question, use the first optional argument to find in both
invocations of find:

prefix = '<a href="somefile'
pos = s.find(prefix)
while pos >= 0:  # find returns -1 when nothing is left
    start = pos + len(prefix)
    stop = s.find('">Save File</a></B>', start)
    fileName = s[start:stop]
    pos = s.find(prefix, stop)  # resume after the last match

Now, to give you some advice: don't do this by hand, use an HTML
parsing library. The code above is incredibly fragile, and will break
on any number of minor variations in the input text. Using a real
parser not only avoids all those problems, it makes your code shorter.
I like BeautifulSoup:

soup = BeautifulSoup(s)
for anchor in soup.fetch('a'):
    fileName = anchor['href']

to get all the hrefs. If you only want the ones that have "Save File"
in the link text, you'd do:

soup = BeautifulSoup(s)
for link in soup.fetchText('Save File'):
    fileName = link.findParent('a')['href']
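
If installing a third-party library isn't an option, the same "use a real parser" approach can be sketched with the standard library's html.parser. This swaps in a different parser than BeautifulSoup, and the class name SaveFileLinks and the sample page string are made up for illustration, so treat it as a rough outline rather than a drop-in replacement.

```python
from html.parser import HTMLParser


class SaveFileLinks(HTMLParser):
    """Collect the href of every <a> whose link text is exactly 'Save File'."""

    def __init__(self):
        super().__init__()
        self.hrefs = []
        self._href = None  # href of the <a> currently open, if any
        self._text = []    # text pieces seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == 'a':
            self._href = dict(attrs).get('href')
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == 'a' and self._href is not None:
            if ''.join(self._text).strip() == 'Save File':
                self.hrefs.append(self._href)
            self._href = None


# Hypothetical sample input, not taken from a real page.
page = ('<B><a href="somefile1.jpg">Save File</a></B>'
        '<a href="other.html">Home</a>'
        '<B><a href="somefile2.jpg">Save File</a></B>')
parser = SaveFileLinks()
parser.feed(page)
parser.close()
print(parser.hrefs)
```

Unlike the find-based loop, this does not care about attribute order, extra whitespace, or upper/lower case in the tag names.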

<mike
--
Mike Meyer <mw*@mired.org> http://www.mired.org/home/mwm/
Independent WWW/Perforce/FreeBSD/Unix consultant, email for more information.
Oct 24 '05 #3
