Skip Montanaro wrote on Mon, 24 Nov 2003 21:35:48 -0600:
>> Since I am very poor in regex, can someone show me how to do it using
>> a few examples?
<snip>
Don> http://kodos.sourceforge.net
If you're a Mac Python person there's also Dinu Gherman's excellent
RegexPlor:
http://starship.python.net/crew/gherman/RegexPlor.html
<snip>
I'm biased here, but Kiki (http://project5.freezope.org/kiki) is
cross-platform and depends on wxPython rather than Qt, which is much
easier for Windows users.
Anyway, here's a regex I ripped out of my own code - you might want to
simplify it though:
"""Regex for finding URLs:
URLs start with http(s)/ftp/news (?:http|https|ftp|news)
followed by ://
then any number of non-whitespace characters including
numbers, dots, forward slashes, commas, question marks,
ampersands, equals signs, dashes, underscores and plusses,
but ending in a non-dot and non-plus!
Result:
(?:http|https|ftp|news)://(?:[@a-zA-Z0-9,/%:\&+#\?=\-_~;]+\.*)+[a-zA-Z0-9,/%:\&#\?=\-_]
Tests:
Plain old link:
http://www.mail.yahoo.com.
Containing numbers:
ftp://bla.com/di~ng/co.rt,39,%93 or other
Go to
news://bl_a.com/?ha-h+a&query=tb for more info.
A real link: <a href="http://x.com">http://x.com</a>.
ftp://verylong.org/url/must/be/chopp...itwontfit.html
(long one)
<IMG src="http://b.com/image.gif" /> (a plain image tag)
<a href=http://fixedlink.com/orginialinvalid.html>fixed</a> (original
invalid HTML)
Link containing an anchor
<b>"http://myhomepage.com/index.html#01"</b>.
"""
--
Yours,
Andrei
=====
Mail address in header catches spam. Real contact info (decode with rot13):
ce******@jnanqbb.ay. Fcnz-serr! Cyrnfr qb abg hfr va choyvp cbfgf. V ernq
gur yvfg, fb gurer'f ab arrq gb PP.