How can I extract all URLs in a string using re.findall()?

I want to retrieve all URLs in a string. When I use re.findall(), I get
a list of tuples.
My code is below:

    import re

    # html is assumed to be defined already (see below)
    url = unicode(r"((http|ftp)://)?(((([\d]+\.)+){3}[\d]+(/[\w./]+)?)|([a-z]\w*((\.\w+)+){2,})([/][\w.~]*)*)")
    m = re.findall(url, html)
    for i in m:
        print i

html is a string variable that contains many URLs.
The code prints many tuples, and no tuple looks like a single URL.
For example, one of them is:

(u'http://', u'http', u'image.zhongsou.com/image/netchina.gif', u'',
u'', u'', u'', u'image.zhongsou.com', u'.com', u'.com',
u'/netchina.gif')

Why are there two "http" entries in it, and why are there so many empty
strings in the tuple above? It's obviously not a URL. How can I get the
URLs correctly?

Thanks in advance.
--
Parrots are supremely clever and extremely funny, and are good friends of mankind.
Until one day I realized that I am a parrot.
I am a parrot that climbs over the wall.
Jul 18 '05 #1
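
A note on what appears to be going on: when a pattern contains capturing groups, re.findall() returns the groups instead of the whole match, with one tuple entry per group. The pattern in the post has 11 pairs of capturing parentheses, which would explain the 11-element tuples: 'http://' and 'http' are groups 1 and 2 (the outer and inner parentheses around the scheme), and the empty strings are groups that took no part in the match, such as the IP-address alternative. Below is a minimal sketch of two common workarounds, written for Python 3 rather than the Python 2 of the post. The sample `html` string and the names PATTERN, NONCAPTURING, `tuples`, `urls_finditer` and `urls_findall` are assumptions made for illustration, and the pattern is kept exactly as loose as the original.

```python
import re

# The pattern from the post (without unicode()); it has 11 capturing groups.
PATTERN = r"((http|ftp)://)?(((([\d]+\.)+){3}[\d]+(/[\w./]+)?)|([a-z]\w*((\.\w+)+){2,})([/][\w.~]*)*)"

# Same pattern with every "(" rewritten as the non-capturing "(?:".
NONCAPTURING = r"(?:(?:http|ftp)://)?(?:(?:(?:(?:[\d]+\.)+){3}[\d]+(?:/[\w./]+)?)|(?:[a-z]\w*(?:(?:\.\w+)+){2,})(?:[/][\w.~]*)*)"

# Hypothetical input standing in for the poster's `html` variable.
html = ('<img src="http://image.zhongsou.com/image/netchina.gif"> '
        '<a href="ftp://192.168.0.1/pub/file.txt">x</a>')

# What the post observed: one tuple per match, one entry per group.
tuples = re.findall(PATTERN, html)

# Workaround 1: keep the pattern, but ask each match object for group(0),
# which is always the full matched text.
urls_finditer = [m.group(0) for m in re.finditer(PATTERN, html)]

# Workaround 2: with no capturing groups, findall() returns plain strings.
urls_findall = re.findall(NONCAPTURING, html)

print(len(tuples[0]))
for u in urls_finditer:
    print(u)
```

Workaround 1 leaves the original pattern untouched, which is convenient if the individual groups are still wanted elsewhere; workaround 2 keeps the re.findall() call from the post. For real HTML, an HTML parser is usually a more robust way to collect links than a regular expression.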

