Regular Expression: Matching substring

Hi,

I've run into some confusion about regexes and hopefully you guys
can clear it up for me.

Suppose I have a regular expression (0|(1(01*0)*1))* and two test
strings: 110_1011101_ and _101101_1. (The underscores are not part of
the strings. They are added to show that both strings have a substring
that matches the pattern.) Applying a match() function to the first
string returns true, but it returns false for the second. The difference
is that the first one has an unmatched chunk at the beginning while the
second has one at the end. How does the regex rule work here?

Thanks.

Apr 13 '06 #1
Hi Kevin,

You may notice that, to match the regex (0|(1(01*0)*1))*, the leftmost
three characters of a string must not be "101" unless they are followed
by a '0'.
After reading the first '1', the automaton expects '1', "00", "010",
or "11", right? :)

Apr 13 '06 #2

Leon wrote:
> After reading the first '1', the automaton expects '1', "00", "010",
> or "11", right? :)

Why must it expect "010"? Why not "0110", since 1* can represent zero
or more repetitions?
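
To make the question concrete, here is a minimal sketch that checks a few candidates against the inner group (01*0) on its own. The helper name is_inner_block is made up for illustration, and re.fullmatch assumes Python 3.4 or later.

```python
import re

# The inner group from the thread's pattern, tested in isolation.
INNER = re.compile(r"01*0")

def is_inner_block(s):
    """True when s is a 0, then any number of 1s, then a closing 0."""
    return INNER.fullmatch(s) is not None

for candidate in ["00", "010", "0110", "01110", "011"]:
    print(candidate, is_inner_block(candidate))
```

Under these assumptions "0110" is accepted just as "010" is; only the last candidate, which lacks the closing 0, is rejected.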




Apr 13 '06 #3
You are right. In fact the procedure is as follows:
The substring "101101" is no problem; if the string stopped there, the
match would succeed.
But a trailing '1' follows, so we can imagine the automaton moving to
the state that corresponds to the regexp's outermost ')', ready to
repeat the whole regexp again. In that case, the answer is "yes" only
when at least two '1's follow, but there is only one here.

BTW, the first string is matched in full; in your notation, it
should be written as: _11_0_1011101
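
The point about the trailing '1' can be sketched with Python's re module. The helper names below are invented for illustration, and re.fullmatch assumes Python 3.4 or later; this is only one way to check the claim, not necessarily what the debugger does.

```python
import re

# Kevin's pattern, written without the stray space before '*'.
PATTERN = re.compile(r"(0|(1(01*0)*1))*")

def matched_prefix(s):
    """Return the prefix of s consumed by match(); it may be empty."""
    # match() cannot return None here because the pattern accepts "".
    return PATTERN.match(s).group(0)

def matches_whole(s):
    """True when the pattern accounts for every character of s."""
    return PATTERN.fullmatch(s) is not None

print(matched_prefix("1011011"))     # the second test string
print(matches_whole("1011011"))      # one trailing '1'
print(matches_whole("10110111"))     # two trailing '1's
print(matches_whole("1101011101"))   # the first test string
```

With a single trailing '1' only the prefix "101101" is consumed; adding a second '1' lets the whole string match.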

Apr 13 '06 #4
Oh yeah, I see what my confusion was now. In the first string, I didn't
consider that 11 and 0 match the pattern without the repetition. What I
actually did was enter the pattern and the test strings into Kudos (a
Python regexp debugger) and look at the match groups, which didn't show
11 and 0 as matches, although they do match the pattern "(0|1(01*0)*1)".
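
A likely reason the groups hide 11 and 0 is that a group inside a repetition keeps only its last iteration. The sketch below illustrates that with the re module; UNIT, last_iteration and pieces are hypothetical names, and finditer() is used only to list the pieces (it skips characters that match nothing, so it is not equivalent to match()).

```python
import re

WHOLE = re.compile(r"(0|(1(01*0)*1))*")   # the pattern from the thread
UNIT = re.compile(r"0|1(01*0)*1")         # one repetition, without the outer *

def last_iteration(s):
    """Text held by group 1 after match(); only the final repetition survives."""
    return WHOLE.match(s).group(1)

def pieces(s):
    """Each successive unit found by scanning s from left to right."""
    return [m.group(0) for m in UNIT.finditer(s)]

print(last_iteration("1101011101"))
print(pieces("1101011101"))
print(pieces("1011011"))
```

For the first string, group 1 reports only "1011101", while the scan lists "11", "0" and "1011101" in turn.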

Thank you for the help.

Apr 13 '06 #5
On 13/04/2006 12:33 PM, Kevin CH wrote:
> Hi,
>
> I'm currently running into a confusion on regex and hopefully you guys
> can clear it up for me.
>
> Suppose I have a regular expression (0|(1(01*0)*1))* and two test
> strings: 110_1011101_ and _101101_1. (The underscores are not part of
> the string. They are added to show that both string has a substring
> that matches the pattern.) Applying a match() function on the first
> string returns true while false for the second.

Perhaps you are using grep, or you have stumbled on the old deprecated
"regex" module and are using that instead of the "re" module. Perhaps
not, as you are using only 2 plain vanilla RE operations which should
work the same way everywhere. Perhaps you are having trouble with
search() versus match() -- if so, read the section on this topic in the
re docs. It's rather hard to tell what you are doing without seeing the
code you are using.

> The difference is the
> first one has unmatched chunk in the beginning

With re's match(), the whole string matches.

> while the second at the
> end.

With re's match(), the part you marked with underscores (at the
*beginning*) matches.

> How's the regex rule work here?


Let's abbreviate your pattern as (0|X)*
This means 0 or more occurrences of strings that match either 0 or X.

Case 1 gives us 11 matching X [it's a 1 followed by zero occurrences of
(01*0) followed by a 1], then a 0, then 1011101 matching X [it's a 1
followed by one occurrence of (01*0), namely 01110, followed by a 1].

Case 2 gives us 101101 matching X [it's a 1 followed by one occurrence
of (01*0), namely 0110, followed by a 1] -- then there's a 1 that
doesn't match anything.

Here's some code and its output:

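C:\junk>type kevinch.py
import re

# Kevin's pattern: zero or more repetitions of either "0" or X = 1(01*0)*1.
rx = re.compile(r"(0|(1(01*0)*1))*")

def doit(n, s):
    print("Case %d" % n)
    m = rx.match(s)
    if m:
        print("0123456789")
        print(s)
        # Group 0 is the whole match; groups 1-3 keep their last repetition.
        for k in range(4):
            print("span(%d) -> %r" % (k, m.span(k)))
    else:
        print("... no match")

# The underscores only mark substrings of interest; strip them first.
s1 = "110_1011101_".replace('_', '')
s2 = "_101101_1".replace('_', '')
doit(1, s1)
doit(2, s2)

C:\junk>kevinch.py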
Case 1
0123456789
1101011101
span(0) -> (0, 10)
span(1) -> (3, 10)
span(2) -> (3, 10)
span(3) -> (4, 9)
Case 2
0123456789
1011011
span(0) -> (0, 6)
span(1) -> (0, 6)
span(2) -> (0, 6)
span(3) -> (1, 5)

HTH,
John
Apr 13 '06 #6
Thank you for your reply.
> Perhaps you are using grep, or you have stumbled on the old deprecated
> "regex" module and are using that instead of the "re" module. Perhaps
> not, as you are using only 2 plain vanilla RE operations which should
> work the same way everywhere. Perhaps you are having trouble with
> search() versus match() -- if so, read the section on this topic in the
> re docs. It's rather hard to tell what you are doing without seeing the
> code you are using.

Sorry, I should have said so up front. I'm using Kudos (which I'm sure
uses the re module) to test these strings against the pattern, and got
the match results I stated. (search() of course gives me true, since the
pattern appears in substrings of both strings.)

Apr 13 '06 #7
"Kevin CH" wrote:

> Sorry, I should have said so up front. I'm using Kudos (which I'm sure
> uses the re module) to test these strings against the pattern, and got
> the match results I stated. (search() of course gives me true, since the
> pattern appears in substrings of both strings.)

Python's "match" function doesn't return "true" or "false"; it returns a match
object if the string matches the pattern, and None if not. since your pattern
can match the empty string, it'll match any target string (all strings start with
an empty string), and will never return false.
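
A small sketch of that behaviour, assuming Python 3.4 or later for re.fullmatch; the describe helper and its dictionary keys are made up for illustration.

```python
import re

PATTERN = re.compile(r"(0|(1(01*0)*1))*")

def describe(s):
    """Summarise what match() and fullmatch() report for s."""
    m = PATTERN.match(s)
    # m is never None for this pattern, because the outer * accepts
    # zero repetitions and so matches the empty string at position 0.
    return {
        "match_is_none": m is None,
        "matched_text": m.group(0),
        "span": m.span(),
        "fullmatch": PATTERN.fullmatch(s) is not None,
    }

for text in ["xyz", "1011011", "1101011101"]:
    print(text, describe(text))
```

Even for "xyz" a match object comes back, with an empty span of (0, 0); a whole-string test needs fullmatch() or an explicit end anchor.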

looks like the "debugger" does a great job of hiding how things really work...

</F>

Apr 13 '06 #8
