
Spider: why isn't it finding the URL?

This program doesn't produce any output; however, I know from testing
that the URL regexp matches URLs...

import urllib
import re

site = urllib.urlopen("http://www.python.org")

email = re.compile(r'[\w\-][\w\-\.]+@[\w\-][\w\-\.]+[a-zA-Z]{1,4}')
url = re.compile(r"^((ht|f)tp(s?)\:\/\/|~/|/)?([\w]+:\w+@)?([a-zA-Z]{1}"
                 r"([\w\-]+\.)+([\w]{2,5}))(:[\d]{1,5})?((/?\w+/)+|/?)(\w+\.[\w]{3,4})?"
                 r"((\?\w+=\w+)?(&\w+=\w+)*)?")

for row in site:
    obj = url.search(row)
    if obj is not None:
        print obj.group()
Jun 27 '08 #1
1 Reply


On 23 May, 02:02, notnorweg...@yahoo.se wrote:
This program doesn't produce any output; however, I know from testing
that the URL regexp matches URLs...

import urllib
import re

site = urllib.urlopen("http://www.python.org")

email = re.compile(r'[\w\-][\w\-\.]+@[\w\-][\w\-\.]+[a-zA-Z]{1,4}')
url = re.compile(r"^((ht|f)tp(s?)\:\/\/|~/|/)?([\w]+:\w+@)?([a-zA-Z]{1}"
                 r"([\w\-]+\.)+([\w]{2,5}))(:[\d]{1,5})?((/?\w+/)+|/?)(\w+\.[\w]{3,4})?"
                 r"((\?\w+=\w+)?(&\w+=\w+)*)?")

for row in site:
    obj = url.search(row)
    if obj is not None:
        print obj.group()
Hmm, OK, it is printing it row by row. Not what I expected.

Jun 27 '08 #2
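
One possible explanation for the missing output, offered as a sketch and not a confirmed diagnosis: the posted pattern starts with `^`, so `url.search(row)` can only succeed when the URL sits at the very beginning of a row. In an HTML page most URLs appear mid-line, for example inside `href="..."`. The snippet below illustrates the difference on a hypothetical `sample_html` string, so it needs no network access. It uses a deliberately simplified pattern that is an assumption for illustration, not the regexp from the post. It also shows `findall` on the whole text, which returns every match instead of the first one per row.

```python
import re

# Hypothetical stand-in for a few rows of a downloaded page.
sample_html = (
    '<a href="http://www.python.org/about/">About</a>\n'
    'http://www.python.org/psf/ starts this line\n'
    '<p>no links here</p>\n'
)

# Simplified, illustrative patterns (assumption: not the posted regexp).
# The only difference between the two is the leading ^ anchor.
anchored = re.compile(r'^https?://[\w\-.]+(?:/[\w\-./]*)?')
unanchored = re.compile(r'https?://[\w\-.]+(?:/[\w\-./]*)?')


def urls_per_row(pattern, text):
    """Mimic the posted loop: search each row, keep the first match."""
    found = []
    for row in text.splitlines():
        match = pattern.search(row)
        if match is not None:
            found.append(match.group())
    return found


# With ^, only the row that begins with a URL produces a match.
anchored_hits = urls_per_row(anchored, sample_html)

# Without ^, the URL inside href="..." is found as well.
unanchored_hits = urls_per_row(unanchored, sample_html)

# findall scans the whole text and returns every match at once.
all_hits = unanchored.findall(sample_html)

print(anchored_hits)
print(unanchored_hits)
print(all_hits)
```

If the goal is a list of all URLs on the page, not one match per row, reading the page into a single string and calling `findall` on it may be closer to what was expected.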
