Bytes IT Community

Regular Expressions to find URLs in text

I am working on an ASP page that parses text using the VBScript.RegExp
regular expression object. My regular expression right now is:

[a-z]+\.[a-z]+\.[a-z]+/

It finds URLs like windowsupdate.microsoft.com and www.cnn.com without
a problem.

But I also need to find URLs like these:

www.amazon.com/books/atoz/index.html
OR
msdn.microsoft.com/newsgroups/default.aspx

That is, any URL with a path after the host name, not just
something.something.com, if that makes sense. Any ideas?
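A minimal sketch of one way to extend the pattern, written with Python's `re` so it can be tested outside ASP. It assumes a host is two or more dot-separated labels of letters, digits, or hyphens, optionally followed by a path that starts with "/" and uses a limited set of URL characters. The pattern only uses character classes, non-capturing groups, and quantifiers, so the same string should work as the Pattern of a VBScript.RegExp object with IgnoreCase and Global set to True, but that has not been tried here.

```python
import re

# Assumed rule: host = two or more dot-separated labels (letters, digits, '-'),
# optionally followed by a path of common URL characters starting with '/'.
URL_PATTERN = re.compile(
    r"[a-z0-9-]+(?:\.[a-z0-9-]+)+"   # host, e.g. www.amazon.com
    r"(?:/[\w./?=&%#~-]*)?",         # optional path, e.g. /books/atoz/index.html
    re.IGNORECASE,
)


def find_urls(text):
    """Return every substring of text that looks like host[/path]."""
    return URL_PATTERN.findall(text)


sample = ("See www.amazon.com/books/atoz/index.html or "
          "msdn.microsoft.com/newsgroups/default.aspx, and www.cnn.com.")
print(find_urls(sample))
# ['www.amazon.com/books/atoz/index.html',
#  'msdn.microsoft.com/newsgroups/default.aspx', 'www.cnn.com']
```

Because the path character class leaves out commas and spaces, the comma after default.aspx is not swallowed, and a sentence-ending period after www.cnn.com is dropped because a label must follow every dot.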
Jul 19 '05 #1
3 Replies


Nah...

What happens if someone writes a sentence and forgets to put a space between
the last word of the sentence, the period and the first word of the next
sentence?

URLs can take many forms and definitely don't need three parts. Some have
two, some have four. What happens if someone puts in an IP address?
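To make both problems concrete, here is a rough Python sketch. A loose "two or more labels" pattern happily matches a typo like "here.Then" and also matches IP addresses. The filtering shown is only one possible heuristic: the tiny top-level-domain allow-list is hypothetical and far from complete, and IPv4 addresses are handled by a separate dotted-quad pattern with a 0-255 check on each octet.

```python
import re

# Loose host rule: two or more dot-separated labels of letters, digits, '-'.
LOOSE_HOST = re.compile(r"\b[a-z0-9-]+(?:\.[a-z0-9-]+)+\b", re.IGNORECASE)

# Dotted-quad IPv4 address; octet ranges (0-255) are checked in code below.
IPV4 = re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}\b")

# Hypothetical, deliberately tiny allow-list of top-level domains, used to
# reject "word.Word" typos. A real list would be far longer.
KNOWN_TLDS = {"com", "net", "org", "edu", "gov"}


def plausible_hosts(text):
    """Return host names and IPv4 addresses found in text, in text order."""
    found = []
    for m in IPV4.finditer(text):
        if all(int(octet) <= 255 for octet in m.group().split(".")):
            found.append((m.start(), m.group()))
    for m in LOOSE_HOST.finditer(text):
        tld = m.group().rsplit(".", 1)[-1].lower()
        if tld in KNOWN_TLDS:  # digit-only endings (IP addresses) never pass here
            found.append((m.start(), m.group()))
    return [host for _, host in sorted(found)]


text = "I forgot a space here.Then I typed www.cnn.com and 192.168.0.1 too."
print(LOOSE_HOST.findall(text))  # ['here.Then', 'www.cnn.com', '192.168.0.1']
print(plausible_hosts(text))     # ['www.cnn.com', '192.168.0.1']
```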

To get around the path/page name problem, you should be able to say that your
pattern matches anywhere in the string, rather than matching the whole string exactly.
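A small Python illustration of the difference, assuming the same "two or more labels" host rule. `re.fullmatch` behaves like a pattern wrapped in ^ and $, so the trailing path makes it fail; `re.search` lets the pattern match anywhere, so the host is found and the path is simply left over. In VBScript.RegExp terms this roughly corresponds to leaving ^ and $ off the Pattern.

```python
import re

HOST = r"[a-z0-9-]+(?:\.[a-z0-9-]+)+"   # two or more dot-separated labels


def exact_host(s):
    """Anchored: the whole string must be a host, so any path makes it fail."""
    return re.fullmatch(HOST, s, re.IGNORECASE)


def host_anywhere(s):
    """Unanchored: return the first host found anywhere in s, or None."""
    m = re.search(HOST, s, re.IGNORECASE)
    return m.group() if m else None


url = "msdn.microsoft.com/newsgroups/default.aspx"
print(exact_host(url))     # None
print(host_anywhere(url))  # msdn.microsoft.com
```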

Sorry to be the bearer of bad news.

Jul 19 '05 #2

SROSeaner <SR*******@discussions.microsoft.com> wrote in message news:<FA**********************************@microsoft.com>...


Why don't you just parse it to the first / character, and see if that conforms?
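A minimal Python sketch of that suggestion, assuming candidates are whitespace-separated tokens and that a conforming host is two or more dot-separated labels. Trimming punctuation from the token edges is an extra simplifying assumption so that "default.aspx," at the end of a clause still validates; the helper names are hypothetical.

```python
import re

# Assumed rule for the part before the first '/': two or more labels.
HOST_RE = re.compile(r"[a-z0-9-]+(?:\.[a-z0-9-]+)+", re.IGNORECASE)

# Characters trimmed from token edges (a simplifying assumption).
EDGE_PUNCT = ".,;:!?()\"'"


def split_candidate(token):
    """Split token at its first '/' and check the host part.

    Returns (host, path) with path including the leading '/', or '' when
    there is no path. Returns None if the host part does not conform.
    """
    host, slash, rest = token.partition("/")
    if not HOST_RE.fullmatch(host):
        return None
    return host, slash + rest


def urls_in(text):
    """Check each whitespace-separated token with split_candidate."""
    results = []
    for token in text.split():
        parsed = split_candidate(token.strip(EDGE_PUNCT))
        if parsed is not None:
            results.append(parsed[0] + parsed[1])
    return results


print(split_candidate("www.amazon.com/books/atoz/index.html"))
# ('www.amazon.com', '/books/atoz/index.html')
print(urls_in("Try msdn.microsoft.com/newsgroups/default.aspx, or hello/world."))
# ['msdn.microsoft.com/newsgroups/default.aspx']
```

Validating only the host keeps the regular expression small; whatever follows the first "/" is accepted as-is.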
Jul 19 '05 #3

Thanks for your help. I got my parser to find URLs in many forms,
including IP addresses, in a disorganized HTML file. It is possible,
just a bugger to get going.

"Larry Bud" wrote:
Why don't you just parse it to the first / character, and see if that conforms?

Jul 19 '05 #4
