
regexp question

  #1  
February 28th, 2007, 10:05 AM
Kris
I have a text and I want to match several words with a pattern like
"(controller|data|IT)?". At least one of the words in the group should be
mandatory, but I don't want to match a word that is part of another word.
For example, nothing should match inside "datamanager"; only the standalone
word "data", or any other word in the group, should match.

Any suggestions?

  #2  
February 28th, 2007, 10:45 AM
Heiko Richler

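You know there are classes for characters, like \w? Well, there are also
assertions that match a position rather than a character:
^ Start of line or data
$ End of line or data
\b Word boundary: the position between a \w and a \W character (in either
   order), or between a \w character and the start or end of the data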

- "(\b(controller|data|IT)\b)?"
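
A minimal sketch of the effect of \b, assuming Python's re module (the thread does not name a regex engine). The sample text and variable names are made up for illustration. It also shows a variant without the trailing "?", on the assumption that "at least one word is mandatory" means the group should not be optional.

```python
import re

# Hypothetical sample text; not from the original post.
text = "The datamanager sends data to the IT controller."

# Without word boundaries, "data" is also found inside "datamanager".
loose = re.compile(r"(controller|data|IT)")
# With \b on both sides, only whole words match.
strict = re.compile(r"\b(controller|data|IT)\b")

loose_hits = loose.findall(text)
strict_hits = strict.findall(text)

# The pattern from the post keeps the trailing "?", so the whole group is
# optional and the pattern can match the empty string, here at position 0.
optional = re.compile(r"(\b(controller|data|IT)\b)?")
first_optional = optional.match(text).group(0)

print(loose_hits)
print(strict_hits)
print(repr(first_optional))
```

Whether the optional form is wanted depends on how the group is embedded in a larger pattern; on its own it succeeds everywhere, including where none of the words occur.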

But some letters, like German umlauts, may be treated as non-word
characters, depending on your encoding.
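
As an illustration of the umlaut problem, here is a small sketch assuming Python 3, where the re.ASCII flag restricts \w to [a-zA-Z0-9_]. Other engines decide this through locale or encoding settings instead, so the details will differ. The sample word is made up.

```python
import re

# Hypothetical word in which "data" is followed directly by an umlaut.
text = "dataübersicht"

pattern = r"\bdata\b"

# Default for str patterns in Python 3: \w is Unicode-aware, so "ü" is a
# word character and there is no boundary after "data".
unicode_match = re.search(pattern, text)

# With re.ASCII, "ü" counts as a non-word character, so \b matches
# between "a" and "ü" and "data" is found inside the longer word.
ascii_match = re.search(pattern, text, flags=re.ASCII)

print(unicode_match)
print(ascii_match.group(0))
```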

In this case you may try searching for your words between non-letters,
for example whitespace or the start or end of the data.
- "((\s|^)(controller|data|IT)(\s|$))?"

What about "data-base"? With \b, "data" still matches there, because the
hyphen is a non-word character.
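
The two approaches can be compared on "data-base" with a short sketch, again assuming Python's re module and a made-up sample text. The third pattern uses lookarounds, which are not part of the suggestion above; they are shown only as one possible way to test for surrounding whitespace without consuming it.

```python
import re

# Hypothetical sample text for illustration.
text = "data-base data IT"

words = r"(controller|data|IT)"

# \b sees a boundary between "a" and "-", so "data" in "data-base" matches.
boundary_hits = [m.group(1) for m in re.finditer(r"\b" + words + r"\b", text)]

# The whitespace-delimited pattern rejects "data-base", but it consumes the
# trailing space, so a word that follows directly ("IT") is skipped.
space_hits = [m.group(2) for m in re.finditer(r"(\s|^)" + words + r"(\s|$)", text)]

# Lookarounds only check for whitespace and leave it in place, so
# adjacent words are both found.
look_hits = [m.group(1) for m in re.finditer(r"(?<!\S)" + words + r"(?!\S)", text)]

print(boundary_hits)
print(space_hits)
print(look_hits)
```

Which behaviour is right for "data-base" depends on whether a hyphenated compound should count as containing the word.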

Heiko
--
http://portal.richler.de/ Namensportal zu Richler
http://www.richler.de/ Heiko Richler: Computer - Know How!
http://www.richler.info/ private Homepage

  #3  
February 28th, 2007, 11:25 AM
Kris

Hi Heiko

thanks for the reply

I know about boundaries and such, I just couldn't get it to work. Now I
see why: I had tested \b(controller...)\b, which didn't work as intended.

Kris
