Hi,
I'm a complete newbie when it comes to regular expressions, so forgive
me if my dilemma sounds stupid.
I have an ASP.NET app that uses URL rewriting. It's a simple bookstore
that lets users view books by author. The author's name is just a
URL-encoded link tag, so spaces show up as plus signs (+). From what
I've seen, author names can be quite tricky, so I've written a regular
expression for each format the names could take. Below are some example
author names, each followed by its regular expression:
J.K. Rowling -- (\w\.\w\.\+\w+)
Dr. Martin Luther King Jr. -- (\w{2}\.\+\w+\+\w+\+\w+\+\w{2})
Nora Roberts -- (\w+\+\w+)
Thomas L. Friedman -- (\w+\+\w\.\+\w+)
Mark Victor Hansen -- (\w+\+\w+\+\w+)
T. Harv Eker -- (\w\.\+\w+\+\w+)
George R. R. Martin -- (\w+\+\w\.\+\w\.\+\w+)
Nigel Da Costa Lewis -- (\w+\+\w+\+\w+\+\w+)
Amiira Ruotola-Behrendt -- (\w+\+\w+-\w+)
Kristie J. Nelson-Neuhaus -- (\w+\+\w\.\+\w+-\w+)
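One way to check name/pattern pairs like these outside the app is a
small script. This is only a sketch in Python rather than ASP.NET, and
it assumes the names are encoded the way quote_plus encodes them
(spaces become +, periods and hyphens are left alone). The CASES list
and the failing_names helper are made up for illustration. It requires
each pattern to match the whole encoded name, which may be stricter
than what the rewrite rule actually does.

```python
import re
from urllib.parse import quote_plus

# (author name, pattern) pairs copied from the list above
CASES = [
    ("J.K. Rowling", r"\w\.\w\.\+\w+"),
    ("Dr. Martin Luther King Jr.", r"\w{2}\.\+\w+\+\w+\+\w+\+\w{2}"),
    ("Nora Roberts", r"\w+\+\w+"),
    ("Thomas L. Friedman", r"\w+\+\w\.\+\w+"),
    ("Mark Victor Hansen", r"\w+\+\w+\+\w+"),
    ("T. Harv Eker", r"\w\.\+\w+\+\w+"),
    ("George R. R. Martin", r"\w+\+\w\.\+\w\.\+\w+"),
    ("Nigel Da Costa Lewis", r"\w+\+\w+\+\w+\+\w+"),
    ("Amiira Ruotola-Behrendt", r"\w+\+\w+-\w+"),
    ("Kristie J. Nelson-Neuhaus", r"\w+\+\w\.\+\w+-\w+"),
]


def failing_names(cases):
    """Return the names whose encoded form is not matched in full by its pattern."""
    failures = []
    for name, pattern in cases:
        encoded = quote_plus(name)  # assumed encoding: "Nora Roberts" -> "Nora+Roberts"
        if re.fullmatch(pattern, encoded) is None:
            failures.append(name)
    return failures


if __name__ == "__main__":
    print(failing_names(CASES))
```

Run against the ten pairs, the only name reported is Dr. Martin Luther
King Jr., because its pattern stops at "Jr" and never consumes the
final period. With a whole-string check like this, the "George R.R.
Martin" spelling would fail against the George R. R. Martin pattern
for the same kind of reason.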
I know some of them work, since I've tested them, but others (the last
four) are difficult. For instance, George R. R. Martin could also be
written without a space between the two initials (George R.R. Martin),
and hyphens (-) in names present a challenge as well.
Rather than keep each regular expression as a separate expression, I
combined them with OR logic using the pipe symbol (|), so the whole
thing looks like this:
(\w\.\w\.\+\w+|\w{2}\.\+\w+\+\w+\+\w+\+\w{2}|\w+\+\w+|\w+\+\w\.\+\w+)
and so forth.
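To see how a combined expression like this behaves, here is a rough
sketch, again in Python rather than ASP.NET, that joins the first four
per-author patterns with | and compares an unanchored version with one
wrapped in ^(?:...)$. The names ALTERNATIVES, UNANCHORED, ANCHORED and
matched_text are hypothetical, and I'm assuming the constructs used
here (\w, |, (?:...), ^ and $) behave the same way in .NET.

```python
import re

# First four per-author patterns from the list above
ALTERNATIVES = [
    r"\w\.\w\.\+\w+",
    r"\w{2}\.\+\w+\+\w+\+\w+\+\w{2}",
    r"\w+\+\w+",
    r"\w+\+\w\.\+\w+",
]

# Same alternation twice: once as a bare group, once anchored at both ends
UNANCHORED = re.compile("(" + "|".join(ALTERNATIVES) + ")")
ANCHORED = re.compile("^(?:" + "|".join(ALTERNATIVES) + ")$")


def matched_text(regex, encoded_name):
    """Return the text the regex matched, or None if nothing matched."""
    m = regex.search(encoded_name)
    return m.group(0) if m else None


if __name__ == "__main__":
    name = "Mark+Victor+Hansen"  # three words, not covered by these four patterns
    print(matched_text(UNANCHORED, name))  # partial match: Mark+Victor
    print(matched_text(ANCHORED, name))    # None
```

Without anchors, the two-word alternative matches just the front of a
three-word name, so a name the list doesn't cover can still appear to
match. Whether that matters depends on how the pattern is embedded in
the rewrite rule.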
Since I'm a newbie, I'm bound to be doing something inefficiently. Can
the regular expressions I've shown be improved? If so, I'd like to
know how.
Thanks,
Mr. RAD