I am having a problem matching some text. It is a very simple pattern
but it doesn't seem to work. Here goes.
<td[^>]*>.*?</td>
That is the pattern; it should match any <td></td> pair. Here is my
input data.
<td valign="top">Buyer<a href="http://www.google.com" >google</a><img src="www.google.com/s.gif" width="4" border="0">(<a href="www.google.com">9</a> )<span> </span></td>
<td valign="top">
Buyer
<a href="http://www.google.com" >google</a><img
src="www.google.com/s.gif" width="4" border="0">
(
<a href="www.google.com">9</a> )<span> </span></td>
The first and second are exactly the same, except that the first has
the whitespace removed. The pattern matches the first but not the
second. I am very confused.
I have run some tests. This pattern matches the first but not the
second.
<td[^>]*>.*?Buyer
This will match both of them.
<td[^>]*>\s*?Buyer
This indicates to me that the '.' is not matching a space character.
Any ideas?
taylorjonl wrote: I am having a problem matching some text. It is a very simple pattern but it doesn't seem to work. Here goes.
<td[^>]*>.*?</td>
That is the pattern, it should match any <td></td> pair.
Just out of interest, what are you expecting the '?' to do? Usually it
comes after a different character that you want to match 0 or 1 times -
but in this case you don't have a previous character (the .* is the
previous bit).
I'm far from an expert on regexes, but I don't understand what that '?'
will actually match.
It may be part of the problem.
Jon
The ? after the * tells the regex to be non-greedy. Normally it is
greedy so if we had the input of
<td>bucket1</td><td>bucket2</td><td>bucket3</td>
<td[^>]*>.*</td>
would match
<td>bucket1</td><td>bucket2</td><td>bucket3</td>
Because, well, it is greedy and will make the largest match. Adding the
? tells it to be non-greedy, so it stops at the first </td> and matches
only <td>bucket1</td>.
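To make the greedy versus non-greedy difference concrete, here is a minimal C# sketch using the bucket input above. The variable names are mine and purely illustrative; it assumes a recent .NET SDK where top-level statements are available.

```csharp
using System;
using System.Text.RegularExpressions;

string input = "<td>bucket1</td><td>bucket2</td><td>bucket3</td>";

// Greedy: .* runs to the last </td>, so the whole input is one match.
string greedy = Regex.Match(input, @"<td[^>]*>.*</td>").Value;

// Lazy: .*? stops at the first </td>, so each cell is its own match.
MatchCollection lazy = Regex.Matches(input, @"<td[^>]*>.*?</td>");

Console.WriteLine(greedy);        // the full input string
Console.WriteLine(lazy.Count);    // 3
Console.WriteLine(lazy[0].Value); // <td>bucket1</td>
```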
Isn't this just due to the dot NOT matching newlines by default (while
\n is included in \s)?
[MSDN about dot:]
"Matches any character except \n. If modified by the Singleline option,
a period character matches any character. For more information, see
Regular Expression Options."
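The quoted behaviour can be checked with a short C# sketch. The multi-line cell below is a shortened, made-up stand-in for the second sample, and the variable names are illustrative only.

```csharp
using System;
using System.Text.RegularExpressions;

// A cell whose content is spread over several lines, like the second sample.
string html = "<td valign=\"top\">\nBuyer\n<a href=\"http://www.google.com\">google</a>\n</td>";
string pattern = @"<td[^>]*>.*?</td>";

// Default options: '.' stops at \n, so the match can never reach </td>.
bool withoutOption = Regex.IsMatch(html, pattern);

// Singleline: '.' also matches \n, so the whole cell matches.
bool withOption = Regex.IsMatch(html, pattern, RegexOptions.Singleline);

// (?s) is the inline spelling of the same option.
bool withInline = Regex.IsMatch(html, "(?s)" + pattern);

Console.WriteLine(withoutOption); // False
Console.WriteLine(withOption);    // True
Console.WriteLine(withInline);    // True
```

Note that \s*? worked in the earlier test only because \s includes \n, which is exactly the character the dot refuses to match by default.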
Hi taylorjonl and Jon Skeet,
I have a few points to make here :
1. Analyzing the sample string you gave and the 1st Regex pattern
(<td[^>]*>.*?</td>), I realized that it matches perfectly. It is what
you need. The only thing that you need to do now, is enable the Regex
option to allow the dot "." to match a Newline character. This equates
to "Dot matches Newline" in other Regex flavours and
RegexOptions.Singleline in .NET.
I don't know which Regex validator you're using to run your tests, but
just try it with that option enabled, and it will definitely work.
2. The ".*?" - This has a special meaning. ".*" alone means "Match any
character any no. of times, as many times as possible (Greedily)" and
".*?" means "Match any character any no. of times, but as few times as
possible (Lazily)".
The difference between a Greedy and a Lazy match is that the former
will match as many occurrences as possible, while the second will match
as few as possible. The latter will give you the shortest match.
I usually use (.*?) to match anything between any other text.
If you simply used the Regex pattern "<td(.*?)</td>" it would still
solve taylorjonl's problem. It just means match anything that comes
between "<td" and the next "</td>" (including spaces, newlines and what
not, once the Singleline option is set !)
3. I think the important point in deciding any Regex Pattern is what
you want to retrieve from it. (what will be stored in the back
reference). For instance, in your sample string, what exactly do you
intend to retrieve ? Whatever it is should be in brackets.
Assuming it's the "Buyer" part, use this Regex pattern (Remember to set
the RegexOptions.Singleline flag option)
<td[^>]*>(.*?)<a.*?</td>
Try a replace action with the replacement pattern "$1" (.NET notation),
and you will have found some Buyers !!! ;-)
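As a rough sketch of that last suggestion in C#: the input is a shortened, made-up version of the sample cell, and the names cell, captured and replaced are only for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

string html = "<td valign=\"top\">\nBuyer\n<a href=\"http://www.google.com\">google</a>\n</td>";

Regex cell = new Regex(@"<td[^>]*>(.*?)<a.*?</td>", RegexOptions.Singleline);

// Group 1 holds whatever sits between the opening tag and the first <a.
string captured = cell.Match(html).Groups[1].Value.Trim();

// Replacing each match with "$1" keeps only the captured text.
string replaced = cell.Replace(html, "$1").Trim();

Console.WriteLine(captured); // Buyer
Console.WriteLine(replaced); // Buyer
```

The Trim() calls are there because the captured text still carries the line breaks around "Buyer".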
Hope this helps,
Regards,
Cerebrus.
You are the bomb, this has been driving me nuts trying to figure it out
and I know it had to be something simple. If .NET would only behave
like the rest of the world when it comes to regular expressions.
Thanks again, works like a charm now.
Well, you know... .NET is... kinda Exceptional !!! ;-)
BTW, what part of that sample string did you want to retrieve ?
Regards,
Cerebrus.
And you're most welcome...
Regards,
Cerebrus.
That string is just a test string I used. I am actually going to be
extracting certain pieces of information from an eBay feedback page.
Since my last post I have come up with the following to do this so far.
using System.Text.RegularExpressions;

Regex regex = new Regex(
    @"<tr[^>]*>[^<]*<td></td>[^<]*<td[^>]*>.*?alt=""(?<type>[^"""
    + @"]+)""></td>[^<]*<td></td>[^<]*<td[^>]*>(?<message>[^<]*)"
    + @"<br></td>[^<]*<td></td>[^<]*<td[^>]*>.*?</td>[^<]*<td>"
    + @"</td>[^<]*<td[^>]*>(?<date>.*?) </td>[^<]*<td></td>[^<]*"
    + @"<td[^>]*>[^>]*>(?<item>\d{10})</a></td>[^<]*<td></td>[^<"
    + @"]*</tr>",
    RegexOptions.IgnoreCase
    | RegexOptions.Multiline
    | RegexOptions.Singleline
    | RegexOptions.IgnorePatternWhitespace
    | RegexOptions.Compiled
);
That will get me all the important sections, which I can reference by
name.
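Reading the named groups back might look like the sketch below. The row string and the shortened pattern are hypothetical stand-ins, since the real eBay markup is not shown in this thread; only the group names type, message and item come from the pattern above.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical, heavily simplified row; the real eBay markup is not shown here.
string row = "<tr><td><img alt=\"Positive\"></td><td>Great seller<br></td><td>1234567890</td></tr>";

// Shortened illustrative pattern that reuses three of the group names.
Regex regex = new Regex(
    @"<tr[^>]*>.*?alt=""(?<type>[^""]+)"".*?<td[^>]*>(?<message>[^<]*)<br>.*?(?<item>\d{10})",
    RegexOptions.IgnoreCase | RegexOptions.Singleline);

Match m = regex.Match(row);
if (m.Success)
{
    // Named groups are looked up by name on the Match object.
    Console.WriteLine(m.Groups["type"].Value);    // Positive
    Console.WriteLine(m.Groups["message"].Value); // Great seller
    Console.WriteLine(m.Groups["item"].Value);    // 1234567890
}
```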
I am using a program called Expresso which is wonderful for testing
these out.
Thanks for the help.
taylorjonl wrote: The ? after the * tells the regex to be non-greedy. Normally it is greedy so if we had the input of
<td>bucket1</td><td>bucket2</td><td>bucket3</td>
<td[^>]*>.*</td>
would match
<td>bucket1</td><td>bucket2</td><td>bucket3</td>
Because well, it is greedy and will make the largest match. Adding the ? tells it to be non-greedy.
Aha - great, thanks for that. There's always more to know about
regexes...
Jon