a list of tuples.
My code is as follows:
    import re

    # html is assumed to already hold the page source as a unicode string
    url = unicode(r"((http|ftp)://)?(((([\d]+\.)+){3}[\d]+(/[\w./]+)?)|([a-z]\w*((\.\w+)+){2,})([/][\w.~]*)*)")
    m = re.findall(url, html)
    for i in m:
        print i
The code prints many tuples, and none of them seems to represent
a URL. For example, one of them is:
(u'http://', u'http', u'image.zhongsou.com/image/netchina.gif', u'',
u'', u'', u'', u'image.zhongsou.com', u'.com', u'.com',
u'/netchina.gif')
Why are there two "http" entries in it, and why are there so many empty
strings in the tuple above? It's obviously not a URL. How can I get the URLs
correctly?
Thanks in advance.
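
For reference, here is a minimal sketch of what seems to be going on, assuming the pattern is kept exactly as posted. When a pattern contains capturing groups, re.findall returns one tuple per match with one entry per group, not the matched text. The posted pattern has 11 groups, and the nested pair ((http|ftp)://) accounts for both u'http://' and u'http'. Groups that did not take part in the match come back as empty strings. The sample html string and the names URL_PATTERN, tuples and urls are made up for illustration, and the sketch is written for Python 3 rather than the Python 2 of the original post.

```python
import re

# The pattern from the post, unchanged. It contains 11 capturing groups.
URL_PATTERN = re.compile(
    r"((http|ftp)://)?(((([\d]+\.)+){3}[\d]+(/[\w./]+)?)"
    r"|([a-z]\w*((\.\w+)+){2,})([/][\w.~]*)*)"
)

# Hypothetical sample input standing in for the real page source.
html = '<img src="http://image.zhongsou.com/image/netchina.gif">'

# findall() returns one tuple per match, with one entry per capturing group.
tuples = URL_PATTERN.findall(html)

# finditer() yields match objects; group(0) is the whole matched text.
urls = [m.group(0) for m in URL_PATTERN.finditer(html)]

print(urls)
```

Another option would be to turn every group that is only used for grouping into a non-capturing one, written (?:...). With no capturing groups left in the pattern, re.findall returns the matched strings directly.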