Regular Expressions to find URL's in text

I am working on an ASP page that parses text using the VBScript.RegExp
regular expression object. My reg expression right now is as follows:

[a-z]+\.[a-z]+\.[a-z]+/

And it finds URLs like these with no problem: windowsupdate.microsoft.com,
www.cnn.com, etc.

But I need to also find any URL, like these:

www.amazon.com/books/atoz/index.html
OR
msdn.microsoft.com/newsgroups/default.aspx

That is, any URL with a deeper path than something.something.com, if that
makes sense. Any ideas?
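
One way to read the question is to keep the three-part host pattern and allow an optional path after it. The sketch below uses Python's re module instead of VBScript.RegExp, purely so that it can be run as written. The pattern uses only a character class, a non-capturing group and \S, which VBScript.RegExp should also accept, but that is an assumption to verify in ASP. The name find_urls and the sample text are made up for illustration, and the trailing / of the original pattern is dropped on purpose.

```python
import re

# The original three-part host pattern (without its trailing "/"),
# followed by an optional path: a slash and everything up to the
# next whitespace character.
URL_WITH_PATH = re.compile(r"[a-z]+\.[a-z]+\.[a-z]+(?:/\S*)?")

def find_urls(text):
    """Return each three-part host found in text, with its path if present."""
    return URL_WITH_PATH.findall(text)

sample = ("See www.amazon.com/books/atoz/index.html or "
          "msdn.microsoft.com/newsgroups/default.aspx and www.cnn.com today")
print(find_urls(sample))
# prints ['www.amazon.com/books/atoz/index.html',
#         'msdn.microsoft.com/newsgroups/default.aspx', 'www.cnn.com']
```

Because the path part is optional, bare hosts such as www.cnn.com still match. Note that \S* will also swallow punctuation that directly follows a path, so a URL at the end of a sentence may need trimming.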
Jul 19 '05 #1
3 Replies
Nah...

What happens if someone writes a sentence and forgets to put a space between
the last word of the sentence, the period and the first word of the next
sentence?

URLs can take many forms and definitely don't need three parts. Some have
two, some have four. And what happens if someone puts in an IP address?

To get round the path/page name problem, you should be able to specify that
your pattern matches anywhere in the string, rather than matching it exactly.
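
The points above can be tried out with a small sketch. It is written in Python rather than VBScript so that it runs as-is, and the looser pattern (two or more dot-separated labels of letters, digits or hyphens, plus an optional path) is a hypothetical stand-in, not anything from the original post. It searches anywhere in the text, and it shows both the wider coverage and the false positive from a missing space.

```python
import re

# Hypothetical looser pattern: at least two dot-separated labels made of
# letters, digits or hyphens, then an optional path up to the next space.
LOOSE_URL = re.compile(r"[a-z0-9-]+(?:\.[a-z0-9-]+)+(?:/\S*)?", re.IGNORECASE)

def loose_matches(text):
    """Return every substring of text that the loose pattern matches."""
    return [m.group(0) for m in LOOSE_URL.finditer(text)]

sample = ("Try cnn.com, 192.168.0.1/admin or a.b.example.org. "
          "It was late.Then we left.")
print(loose_matches(sample))
# prints ['cnn.com', '192.168.0.1/admin', 'a.b.example.org', 'late.Then']
```

The last hit, late.Then, is exactly the forgotten-space case: a pattern loose enough to accept two-part hosts and IP addresses cannot tell it apart from a URL without extra checks, such as a list of known top-level domains.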

Sorry to be the bearer of bad news.
Jul 19 '05 #2
Why don't you just parse it to the first / character, and see if that conforms?
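
A minimal sketch of that idea, in Python so that it can be run directly: split each whitespace-delimited token at its first / and test only the part before it. The host rule used here (dot-separated labels of letters, digits or hyphens) and the names split_candidate and urls_in are assumptions for illustration.

```python
import re

# Assumed host rule: two or more dot-separated labels of letters,
# digits or hyphens. This is an illustration, not a full URL grammar.
HOST = re.compile(r"[a-z0-9-]+(?:\.[a-z0-9-]+)+", re.IGNORECASE)

def split_candidate(token):
    """Split token at its first '/' and check that the host part conforms.

    Returns (host, path) when the host matches, otherwise None.
    The path is '' when the token contains no slash.
    """
    host, slash, rest = token.partition("/")
    if HOST.fullmatch(host) is None:
        return None
    return host, slash + rest

def urls_in(text):
    """Return (host, path) pairs for every conforming token in text."""
    return [pair for pair in map(split_candidate, text.split()) if pair]

print(urls_in("read msdn.microsoft.com/newsgroups/default.aspx and/or www.cnn.com"))
# prints [('msdn.microsoft.com', '/newsgroups/default.aspx'), ('www.cnn.com', '')]
```

Validating only the host keeps the pattern short, since the path can then be anything. The cost is that the text has to be tokenized first, which here is a plain split on whitespace.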
Jul 19 '05 #3
Thanks for your help. I got my parser to pick up URLs in many forms,
including IP addresses, all from a disorganized HTML file. It is possible,
just a bugger to get going.

"Larry Bud" wrote:
Why don't you just parse it to the first / character, and see if that conforms?

Jul 19 '05 #4