Problem Creating Regex Expression

I am finally taking the time to get to know regex, but it seems I have
taken a bit of a tumble.

I have the following (dummy) data:

<td>Name:</td> <td>Kherie Kali</td>

If I use this expression: "<td>Name:</td>\s*<td>Kherie Kali</td>"

I indeed get a match.

The next step I took is to get this without knowing the name in
advance. I then used the following expression:
"<td>Name:</td>\s*<td>([a-zA-Z_$][a-zA-Z0-9_$]*)</td>"

and I didn't get a thing.

Shouldn't the "([a-zA-Z_$][a-zA-Z0-9_$]*)" part match any word or
symbol, and then the * make it multiple words or symbols? Any light
that could be shed on this situation is appreciated. I don't
necessarily want the answer to my quandary, but insight as to what I am
doing wrong.

Thank you,

-Sean

Feb 8 '08 #1
Hi, Sean

"([a-zA-Z_$][a-zA-Z0-9_$]*)" will match any letter, underscore or '$'
character followed by zero or more letters, digits, underscores, '$'
chars.

It seems you haven't taken into account the space in the middle of
"Kherie Kali".
If you write more specific requirements, I could write a regex.

Thanks,
Sergey
Feb 8 '08 #2
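
To make Sergey's point concrete, here is a small sketch in Python. The thread does not say which regex engine Sean is using, so Python's `re` module is an assumption chosen only for illustration; the variable names are made up for the example.

```python
import re

# Dummy data from the original post.
html = "<td>Name:</td> <td>Kherie Kali</td>"

# Literal pattern: the space in "Kherie Kali" is spelled out, so it matches.
literal = re.search(r"<td>Name:</td>\s*<td>Kherie Kali</td>", html)

# Identifier-style class: neither bracket expression contains a space, so the
# group can only consume "Kherie" and the following "</td>" cannot match.
strict = re.search(r"<td>Name:</td>\s*<td>([a-zA-Z_$][a-zA-Z0-9_$]*)</td>", html)

# Dropping the trailing "</td>" shows how far the group actually gets.
partial = re.search(r"<td>Name:</td>\s*<td>([a-zA-Z_$][a-zA-Z0-9_$]*)", html)

print(literal is not None)   # True
print(strict)                # None
print(partial.group(1))      # Kherie
```

The `*` applies to the single character class in front of it, so it repeats characters, not words. A space is simply not in that class, which is why the match stops after the first name.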
The following will capture all content inside <td></td> tags:

(?<=td>)(.*?)(?=<)

The first part is a positive look-behind, indicating that a match must be
preceded by the character sequence "td>" (non-capturing). The second part
matches any character 0 or more times with a lazy quantifier, meaning that
it will match as few characters as possible. The third part is a positive
look-ahead, indicating that the match must be followed by a "<" character.
Since there are no "<" characters in the actual tag's content, this stops
the match at the end of the tag's content.

--
HTH,

Kevin Spencer
Chicken Salad Surgeon
Microsoft MVP
Feb 8 '08 #3
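
A quick way to see what the look-around pattern does is to run it over the dummy data. This is a sketch in Python's `re` module (an assumption; the thread does not name its engine), which supports fixed-width look-behind such as `(?<=td>)`.

```python
import re

html = "<td>Name:</td> <td>Kherie Kali</td>"

# (?<=td>)  look-behind: the match must be preceded by "td>"
# (.*?)     lazy: as few characters as possible
# (?=<)     look-ahead: the match must be followed by "<"
cell = re.compile(r"(?<=td>)(.*?)(?=<)")

first = cell.search(html)
everything = cell.findall(html)

print(first.group(1))   # Name:
print(everything)       # ['Name:', ' ', 'Kherie Kali']
```

Because "td>" also ends a closing </td>, the whitespace between the two cells is picked up as well, and the first hit is the label rather than the name. To pull out the name specifically, the pattern has to be tied to the "Name:" cell.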
Kevin and Sergey,

Kevin: Thank you for the explanation! This finds the "Name:" tag, but
won't find the dummy person's actual name. I think I have to add in a
place for spaces like Sergey said.

Sergey: I have been fiddling with the last sequence by attempting to
add spaces, but I still can't get it to work for some reason. There
really are no specific requirements, I'm just trying to pull that name
out.

Thank you both for the explanations, they were very helpful. I'm just
still having problems understanding why it won't work. The latest one
I used was "([a-zA-Z_$][a-zA-Z0-9_$]\s*)"

-Sean
Feb 8 '08 #4
You should put \s inside the brackets: "([a-zA-Z_$][a-zA-Z0-9_$\s]*)"
If you don't have specific requirements, then you could probably use an
expression similar to what Kevin suggests, or something like
"<td>Name:</td>\s*<td>(.*?)</td>"

Thanks,
Sergey
Feb 8 '08 #5
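
The difference between Sean's latest attempt and Sergey's two suggestions can be checked side by side. This is a sketch in Python's `re` (an assumption, used only for illustration); the sample string is the dummy data from the first post, and `prefix` is just a helper name for the shared start of the pattern.

```python
import re

html = "<td>Name:</td> <td>Kherie Kali</td>"
prefix = r"<td>Name:</td>\s*<td>"

# Sean's attempt: exactly two class characters, then optional whitespace.
# The "*" now repeats only "\s", so the group cannot span the whole name.
attempt = re.search(prefix + r"([a-zA-Z_$][a-zA-Z0-9_$]\s*)</td>", html)

# Sergey's first suggestion: whitespace is allowed inside the repeated class.
with_space = re.search(prefix + r"([a-zA-Z_$][a-zA-Z0-9_$\s]*)</td>", html)

# Sergey's second suggestion: lazily take everything up to the next "</td>".
lazy = re.search(prefix + r"(.*?)</td>", html)

print(attempt)               # None
print(with_space.group(1))   # Kherie Kali
print(lazy.group(1))         # Kherie Kali
```

The character-class version still rejects names containing anything outside the class (a hyphen or an apostrophe, for example), while `.*?` accepts whatever sits before the closing tag. Which one is preferable depends on how strict the match needs to be.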
Perfect, thank you both!

I used your last suggestion, Sergey, and did the following:
"<td>Name:</td>\s*<td>(?<a0>(.*?))</td>", which correctly matched
"Kherie Kali" and put it into the a0 group.

I appreciate both of your help!

-Sean
Feb 8 '08 #6
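
For reference, the same named-group idea can be sketched in Python's `re` (an assumption for illustration, since the thread does not name its engine). Python spells a named group `(?P<a0>...)` rather than the `(?<a0>...)` form Sean used, and the inner parentheses are dropped here because the named group already captures the text.

```python
import re

html = "<td>Name:</td> <td>Kherie Kali</td>"

# Named group "a0" holds whatever sits in the cell after the "Name:" cell.
pattern = re.compile(r"<td>Name:</td>\s*<td>(?P<a0>.*?)</td>")

match = pattern.search(html)
name = match.group("a0") if match else None

print(name)   # Kherie Kali
```

Looking the group up by name keeps the calling code readable if more cells are added to the pattern later.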
