how to count and extract images

Joe
I'm trying to get the location of the image using

start = s.find('<a href="somefile') + len('<a href="somefile')
stop = s.find('">Save File</a></B>', start)
fileName = s[start:stop]

and then construct the URL with the filename to download the image.
This works fine because every image has the Save File link, and I can
count the number of images easily. The problem is when there is more
than one image. I tried using a while loop to download the files; it
works fine for the first one but always matches the same one. How can I
count and tell the loop to skip the first one if it has been downloaded
and go to the next one, and if the next one is downloaded go to the one
after that, and so on?
Oct 23 '05 #1
2 Replies
Joe <di******@lycos.com> wrote:
I'm trying to get the location of the image using

start = s.find('<a href="somefile') + len('<a href="somefile')
stop = s.find('">Save File</a></B>', start)
fileName = s[start:stop]

and then construct the URL with the filename to download the image.
This works fine because every image has the Save File link, and I can
count the number of images easily. The problem is when there is more
than one image. I tried using a while loop to download the files; it
works fine for the first one but always matches the same one. How can I
count and tell the loop to skip the first one if it has been downloaded
and go to the next one, and if the next one is downloaded go to the one
after that, and so on?


Pass the index from where the search must start as the second argument
to the s.find method -- you're already doing that for the second call,
so it should be pretty obvious it will also work for the first one, no?
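
A minimal sketch of that suggestion, assuming the page markup looks like the snippet in the question. The function name iter_filenames, the pos variable, and the sample page string are made up for illustration; only s.find and its optional start argument come from the thread.

```python
def iter_filenames(s, prefix='<a href="somefile', suffix='">Save File</a></B>'):
    """Yield the text between each prefix/suffix pair found in s."""
    pos = 0
    while True:
        start = s.find(prefix, pos)      # search only from pos onward
        if start == -1:
            return                       # no further links
        start += len(prefix)
        stop = s.find(suffix, start)
        if stop == -1:
            return                       # link text not terminated
        yield s[start:stop]
        pos = stop + len(suffix)         # skip past the match just handled


# Hypothetical sample input with two "Save File" links.
page = (
    '<B><a href="somefile1.jpg">Save File</a></B>'
    '<B><a href="somefile2.jpg">Save File</a></B>'
)
names = list(iter_filenames(page))
print(len(names), names)
```

Because each search resumes after the previous match, every link is visited once, and len(names) gives the image count the question asks about.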
Alex
Oct 24 '05 #2
Joe <di******@lycos.com> writes:
start = s.find('<a href="somefile') + len('<a href="somefile')
stop = s.find('">Save File</a></B>', start)
fileName = s[start:stop]

and then construct the URL with the filename to download the image.
This works fine because every image has the Save File link, and I can
count the number of images easily. The problem is when there is more
than one image. I tried using a while loop to download the files; it
works fine for the first one but always matches the same one. How can I
count and tell the loop to skip the first one if it has been downloaded
and go to the next one, and if the next one is downloaded go to the one
after that, and so on?


To answer your question, use the first optional argument to find in both
invocations of find:

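stop = 0
while True:
    start = s.find('<a href="somefile', stop)  # resume after the previous match
    if start < 0:
        break  # no more links
    start += len('<a href="somefile')
    stop = s.find('">Save File</a></B>', start)
    if stop < 0:
        break  # link is not terminated
    fileName = s[start:stop]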

Now, to give you some advice: don't do this by hand, use an HTML
parsing library. The code above is incredibly fragile, and will break
on any number of minor variations in the input text. Using a real
parser not only avoids all those problems, it makes your code shorter.
I like BeautifulSoup:

soup = BeautifulSoup(s)
for anchor in soup.fetch('a'):
    fileName = anchor['href']

to get all the hrefs. If you only want the ones that have "Save File"
in the link text, you'd do:

soup = BeautifulSoup(s)
for link in soup.fetchText('Save File'):
    fileName = link.findParent('a')['href']
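
For readers without BeautifulSoup installed, here is a rough equivalent of the "Save File" filter using Python's standard-library html.parser instead. This is a swapped-in technique, not the BeautifulSoup API shown above; the class name SaveFileLinks and the sample page string are made up for illustration.

```python
from html.parser import HTMLParser


class SaveFileLinks(HTMLParser):
    """Collect the href of every <a> whose link text is 'Save File'."""

    def __init__(self):
        super().__init__()
        self.hrefs = []        # hrefs of matching links, in document order
        self._href = None      # href of the <a> currently open, if any
        self._text = []        # text pieces seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            if "".join(self._text).strip() == "Save File":
                self.hrefs.append(self._href)
            self._href = None


# Hypothetical sample input: two "Save File" links and one unrelated link.
page = (
    '<B><a href="somefile1.jpg">Save File</a></B>'
    '<a href="other.html">Home</a>'
    '<B><a href="somefile2.jpg">Save File</a></B>'
)
parser = SaveFileLinks()
parser.feed(page)
parser.close()
print(len(parser.hrefs), parser.hrefs)
```

It is longer than the BeautifulSoup version, but it still tolerates changes in attribute order, whitespace, and tag case that would break the string-search approach.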

<mike
--
Mike Meyer <mw*@mired.org> http://www.mired.org/home/mwm/
Independent WWW/Perforce/FreeBSD/Unix consultant, email for more information.
Oct 24 '05 #3
