Regex Match Problem

I have this in my body tag:

something();something();
document.thisForm.textBox1.focus();something();

And I want to find a part between the semicolons that ends in focus()
and remove the entire value between the semicolons.

My Regular Expression looks like this but it is not matching, can
anyone help?

";([^.]*).focus()"
Thanks.
Jul 18 '05 #1
On 10 Mar 2004 07:17:12 -0800, br************@hotmail.com (bdwise)
wrote:
I have this in my body tag:

something();something();
document.thisForm.textBox1.focus();something();

And I want to find a part between the semicolons that ends in focus()
and remove the entire value between the semicolons.

My Regular Expression looks like this but it is not matching, can
anyone help?

";([^.]*).focus()"


You need to escape the metacharacters. Try
r";.*\.focus\(\).*;"
Also, use a raw quote, so you don't have to escape the escapes.

Don't forget to set re.DOTALL if you want the '.*' to capture newlines
also.
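
As a rough illustration of the two points above, the sketch below runs the original pattern and the escaped one against the sample text. The variable names are placeholders chosen for this example, and the sample is joined onto one line on the assumption that it all sits in a single attribute value.

```python
import re

# Sample text from the question, assumed here to sit on a single line.
text = "something();something();document.thisForm.textBox1.focus();something();"

# Original pattern: '.' means "any character" and '()' is an empty group,
# so it never asks for a literal ".focus()".
original = re.search(r";([^.]*).focus()", text)

# Escaped pattern: '\.', '\(' and '\)' match those characters literally.
escaped = re.search(r";.*\.focus\(\).*;", text)

print(original)             # None
print(escaped is not None)  # True
```

The escaped pattern does match, although it covers more text than the question asks for.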

There is a nice interactive tool for testing these expressions at
.../Python23/Tools/Scripts/redemo.py

There is a good intro to regular expressions at
http://www.amk.ca/python/howto/regex/

-- Dave

Jul 18 '05 #2
In message <8n********************************@4ax.com>, David MacQuigg
wrote:
On 10 Mar 2004 07:17:12 -0800, br************@hotmail.com (bdwise)
wrote:
";([^.]*).focus()"

Just thought I'd expand on what you've done above and what Dave suggests
below as I remember what it was like when I was new to REs:

1) . means 'match any one character'
2) ^ at the start of a character set [] means 'match anything except the
following set of characters'.
3) ( and ) are used to a) group expressions together, and b) collect one or
more parts of expressions.

So 1) and 2) mean that '[^.]*' is meaningless as far as REs are concerned.
It indicates something like "match 0 or more of anything that's not one of
any character".

And 3) indicates that, as Dave said:
You need to escape the metacharacters. Try
r";.*\.focus\(\).*;"
That's why he's suggested putting '\' in front of the '(' and ')' when you
want to literally search for those characters. 'Escaping' them in this way
means "match exactly this character, ignoring its usual meaning" where the
usual meaning of '(' and ')' would be "group the pattern between the '('
and ')' and store it for later retrieval".

You'll also notice that Dave has escaped the '.' before 'focus' as you want
to match a literal '.' rather than have it mean "match any character".
Also, use a raw quote, so you don't have to escape the escapes.
Dave has put an 'r' at the front of the string. You may know that Python
interprets certain things in a string. For example, '\n' in a string gets
turned into a newline character. It's always best to make your REs r"raw
strings" in this way so that those special characters are no longer special
and have their regular, literal meaning.
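
A small sketch of the difference, using '\b' as the example because Python and the re module read it differently. The target string is made up for the illustration.

```python
import re

# In an ordinary string Python processes the backslash escapes itself.
print(len("\n"), len(r"\n"))  # 1 2

target = "textBox1.focus()"

# "\b" becomes a backspace character before the re module ever sees it.
plain = re.search("\bfocus\b", target)

# r"\b" reaches the re module intact and means "word boundary".
raw = re.search(r"\bfocus\b", target)

print(plain)        # None
print(raw.group())  # focus
```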
Don't forget to set re.DOTALL if you want the '.*' to capture newlines
also.


By default '.' matches any character except a newline, so a '.*' can never
run past the end of a line. If the text you're searching for will only ever be
wholly on a single line, you don't need to set re.DOTALL.
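
To see the effect of the flag, here is a minimal sketch with a two-line string. The text and pattern are cut down from the thread purely for illustration.

```python
import re

# Two statements separated by a line break.
text = "something();\ndocument.thisForm.textBox1.focus();"
pattern = r";.*\.focus\(\)"

# Without re.DOTALL the '.*' stops at the newline, so nothing matches.
single_line = re.search(pattern, text)

# With re.DOTALL the '.' also matches the newline.
across_lines = re.search(pattern, text, re.DOTALL)

print(single_line)                 # None
print(repr(across_lines.group()))  # ';\ndocument.thisForm.textBox1.focus()'
```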

Given everything above, Dave's RE:

r";.*\.focus\(\).*;"

means "match a single ';' followed by 0 or more of any character followed by
a literal '.' then the letters 'focus', a literal '(' and ')', followed by
0 or more of any character followed by a ';'. Which is more or less what
you want, but not exactly...

There is another thing you should be aware of: RE matching tends to be
'greedy'. That is, if a string that matches part of your pattern occurs
more than once within the search area, it will match the rightmost
occurrence of that part of the pattern. For example, "a.*f" matched against
the string "The big, bad wolf scared him off." will match from the 'a' in
'bad' right up to the 2nd 'f' in 'off' even though there are two other 'f's
before it. Matches are greedy by default and they can swallow up far more
than you intended.
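
The wolf example above can be checked directly. This is only a sketch, and the variable names are arbitrary.

```python
import re

sentence = "The big, bad wolf scared him off."

# Greedy '.*' runs to the end of the string and then backs up to the
# last 'f' it can find.
greedy = re.search(r"a.*f", sentence)

print(greedy.group())  # ad wolf scared him off
```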

There is a way of making these 'wildcard' RE characters non-greedy, which is
by putting a '?' after them. So, you could alter Dave's RE slightly:

r";.*?\.focus\(\).*?;"

to make sure that the first occurrence of '.focus' is matched, followed by
the very next occurrence of ';'.
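
A quick side-by-side of the greedy and non-greedy versions, as a sketch. The sample text is assumed to be on one line here so that re.DOTALL is not needed.

```python
import re

text = "something();something();document.thisForm.textBox1.focus();something();"

# Greedy: the trailing '.*' runs on to the last ';' in the text.
greedy = re.search(r";.*\.focus\(\).*;", text)

# Non-greedy: each '.*?' takes as little as it can get away with.
lazy = re.search(r";.*?\.focus\(\).*?;", text)

print(greedy.group())  # ;something();document.thisForm.textBox1.focus();something();
print(lazy.group())    # ;something();document.thisForm.textBox1.focus();
```

Both versions still start at the first ';' in the text.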

Even the altered RE matches from the first ';' it finds right up to the ';'
after 'focus()'. If you run it on the text you gave in your example (with
re.DOTALL set, because the text spans two lines), it will turn this:

something();something();
document.thisForm.textBox1.focus();something();

into this:

something()something();

In other words, it will match too much: it takes the ';' that ends the first
statement, the whole of the second statement and the line break along with
the 'focus()' call. If you think about it, what you want to do is match from
the character *after* a ';' up to the ';' following 'focus()'. A good RE for
doing this is this:

r"[^;]*?\.focus\(\).*?;"

You'll see that it's a slight alteration to Dave's RE. It first checks for
the shortest sequence of characters that don't include a ';' and that are
immediately followed by '.focus();' - and that's what you want and that's
*all* that you want.
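
To compare the three candidates, the sketch below deletes whatever each one matches. The dictionary and its labels are just scaffolding for the example, and the sample text is again assumed to be on one line.

```python
import re

text = "something();something();document.thisForm.textBox1.focus();something();"

patterns = {
    "greedy": r";.*\.focus\(\).*;",
    "non-greedy": r";.*?\.focus\(\).*?;",
    "no-semicolon class": r"[^;]*?\.focus\(\).*?;",
}

# Delete whatever each pattern matches and keep what is left.
results = {name: re.sub(pat, "", text) for name, pat in patterns.items()}

for name, left_over in results.items():
    print(f"{name}: {left_over}")
# greedy: something()
# non-greedy: something()something();
# no-semicolon class: something();something();something();
```

Only the last pattern leaves the neighbouring statements intact.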

So you can do something like this:

line = re.sub(r"[^;]*?\.focus\(\).*?;", "", linefromfile)

and line will either be the same as linefromfile if the RE didn't match, or
it will be linefromfile with the matching text ("anything.focus();")
deleted.
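
Applied line by line, that could look something like the following. The helper name strip_focus_calls is made up for this sketch, and a plain list stands in for the lines of a file.

```python
import re

# Compile once, since the same pattern is used for every line.
FOCUS_CALL = re.compile(r"[^;]*?\.focus\(\).*?;")

def strip_focus_calls(lines):
    """Return the lines with any 'something.focus();' statement removed."""
    return [FOCUS_CALL.sub("", line) for line in lines]

lines_from_file = [
    "something();something();\n",
    "document.thisForm.textBox1.focus();something();\n",
]

cleaned = strip_focus_calls(lines_from_file)
print(cleaned)
# ['something();something();\n', 'something();\n']
```

Processing one line at a time also keeps the negated class, which does match newlines, from reaching back across a line break.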

I hope this helps and that I haven't "simplicated" things... :o)

--
Garry Knight
ga*********@gmx.net ICQ 126351135
Linux registered user 182025
Jul 18 '05 #3
In message <10***************@echo.uk.clara.net>, Garry Knight wrote:
I remember what it was like when I was new to REs: .... '[^.]*' is meaningless as far as REs are concerned.
It indicates something like "match 0 or more of anything that's not one of
any character".


Seems I'm still fairly new to REs: I forgot that a '.' is literal when used
in a character class. Apologies for that.

--
Garry Knight
ga*********@gmx.net ICQ 126351135
Linux registered user 182025
Jul 18 '05 #4
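
The correction in the last post is easy to confirm with a short sketch: inside a character class a '.' is just a full stop, so '[^.]*' means "zero or more characters that are not a full stop". The test strings are arbitrary.

```python
import re

# '[^.]*' stops at the first literal '.' it meets.
prefix = re.match(r"[^.]*", "document.thisForm").group()

# '[.]' matches a full stop and nothing else.
dot_only = re.fullmatch(r"[.]", ".")
not_dot = re.fullmatch(r"[.]", "x")

print(prefix)                # document
print(dot_only is not None)  # True
print(not_dot)               # None
```

That is also why the pattern in the original question cannot get past the dots in 'document.thisForm.textBox1'.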
