473,490 Members | 2,473 Online
Bytes | Software Development & Data Engineering Community
Create Post

Home Posts Topics Members FAQ

Regular Expression: Matching substring

Hi,

I'm currently running into a confusion on regex and hopefully you guys
can clear it up for me.

Suppose I have a regular expression (0|(1(01*0)*1))* and two test
strings: 110_1011101_ and _101101_1. (The underscores are not part of
the string. They are added to show that both string has a substring
that matches the pattern.) Applying a match() function on the first
string returns true while false for the second. The difference is the
first one has unmatched chunk in the beginning while the second at the
end. How's the regex rule work here?

Thanks.

Apr 13 '06 #1
7 3256
Hi Kevin,

You may notice that, for matching the regex (0|(1(01*0)*1))*, the left
most
three characters of a string must not be ``101" while not followed by
an `0'.
After reading the first `1', automata expects `1' or ``00" or ``010"
or ``11",
right?:)
Kevin CH 寫道:
Hi,

I'm currently running into a confusion on regex and hopefully you guys
can clear it up for me.

Suppose I have a regular expression (0|(1(01*0)*1))* and two test
strings: 110_1011101_ and _101101_1. (The underscores are not part of
the string. They are added to show that both string has a substring
that matches the pattern.) Applying a match() function on the first
string returns true while false for the second. The difference is the
first one has unmatched chunk in the beginning while the second at the
end. How's the regex rule work here?

Thanks.


Apr 13 '06 #2

Leon wrote:
Hi Kevin,

You may notice that, for matching the regex (0|(1(01*0)*1))*, the left
most
three characters of a string must not be ``101" while not followed by
an `0'.
After reading the first `1', automata expects `1' or ``00" or ``010"
or ``11",
right?:) Why it must expect "010"? Why not say "0110", since 1* can represent 0
or more repetitions.



Kevin CH 寫道:
Hi,

I'm currently running into a confusion on regex and hopefully you guys
can clear it up for me.

Suppose I have a regular expression (0|(1(01*0)*1))* and two test
strings: 110_1011101_ and _101101_1. (The underscores are not part of
the string. They are added to show that both string has a substring
that matches the pattern.) Applying a match() function on the first
string returns true while false for the second. The difference is the
first one has unmatched chunk in the beginning while the second at the
end. How's the regex rule work here?

Thanks.


Apr 13 '06 #3
You are right. In fact the procedure is as follows:
The substr ``101101" is no problem, if stop here, match will
successful.
But the tailing `1' occurs, so we may imagine the working automata move
to a state, which according to the regexp's outer most `)', and ready
to repeat
the whole regexp again. In this case, the answer is ``yes" only when
there exists
at least two ``1", but only one here.

BTW, the first string is matched exactly, according to your notion, it
should be written as: _11_0_1011101

Apr 13 '06 #4
Oh yea, I see what's my confusion now. In the first string, I didn't
consider 11 and 0 matches the pattern without the repetition. Actually
what I did is I entered the pattern and the test strings into kudos (a
python regexp debugger) and got the match groups, which didn't have 11
and 0 as matches, although they do match the pattern "(0|1(01*0)*1)".

Thank you for the help.
Leon wrote:
You are right. In fact the procedure is as follows:
The substr ``101101" is no problem, if stop here, match will
successful.
But the tailing `1' occurs, so we may imagine the working automata move
to a state, which according to the regexp's outer most `)', and ready
to repeat
the whole regexp again. In this case, the answer is ``yes" only when
there exists
at least two ``1", but only one here.

BTW, the first string is matched exactly, according to your notion, it
should be written as: _11_0_1011101


Apr 13 '06 #5
On 13/04/2006 12:33 PM, Kevin CH wrote:
Hi,

I'm currently running into a confusion on regex and hopefully you guys
can clear it up for me.

Suppose I have a regular expression (0|(1(01*0)*1))* and two test
strings: 110_1011101_ and _101101_1. (The underscores are not part of
the string. They are added to show that both string has a substring
that matches the pattern.) Applying a match() function on the first
string returns true while false for the second.
Perhaps you are using grep, or you have stumbled on the old deprecated
"regex" module and are using that instead of the "re" module. Perhaps
not as you are using only 2 plain vanilla RE operations which should
work the same way everywhere. Perhaps you are having trouble with
search() versus match() -- if so, read the section on this topic in the
re docs. It's rather hard to tell what you are doing without seeing the
code you are using.
The difference is the
first one has unmatched chunk in the beginning
With re's match(), the whole string matches.
while the second at the
end.
With re's match(), the part you marked with underscores (at the
*beginning*) matches.

How's the regex rule work here?


Let's abbreviate your pattern as (0|X)*
This means 0 or more occurrences of strings that match either 0 or X.

Case 1 gives us 11 matching X [it's a 1 followed by zero occurrences of
(01*0) followed by a 1], then a 0, then 1011101 matching X [it's a 1
foll. by 1 occ. of (01110) followed by a 1].

Case 2 gives us 101101 matching X [it's a 1 foll. by 1 occ of (0110)
foll by a 1] -- then there's a 1 that doesn't match anything.

Here's some code and its output:

C:\junk>type kevinch.py
import re

rx = re.compile(r"(0|(1(01*0)*1))*")

def doit(n, s):
print "Case", n
m = rx.match(s)
if m:
print "0123456789"
print s
for k in range(4):
print "span(%d) -> %r" % (k, m.span(k))
else:
print "... no match"

s1 = "110_1011101_".replace('_', '')
s2 = "_101101_1".replace('_', '')
doit(1, s1)
doit(2, s2)

C:\junk>kevinch.py
Case 1
0123456789
1101011101
span(0) -> (0, 10)
span(1) -> (3, 10)
span(2) -> (3, 10)
span(3) -> (4, 9)
Case 2
0123456789
1011011
span(0) -> (0, 6)
span(1) -> (0, 6)
span(2) -> (0, 6)
span(3) -> (1, 5)

HTH,
John
Apr 13 '06 #6
Thank you for your reply.
Perhaps you are using grep, or you have stumbled on the old deprecated
"regex" module and are using that instead of the "re" module. Perhaps
not as you are using only 2 plain vanilla RE operations which should
work the same way everywhere. Perhaps you are having trouble with
search() versus match() -- if so, read the section on this topic in the
re docs. It's rather hard to tell what you are doing without seeing the
code you are using.


Sorry I should have said it up front. I'm using Kudos (which I'm sure
uses re module) to test these strings on the pattern, and had the match
results as I stated. (search() of course gives me true since the
pattern appears in the substrings of both strings.)

Apr 13 '06 #7
"Kevin CH" wrote:

news:11**********************@v46g2000cwv.googlegr oups.com...
Thank you for your reply.
Perhaps you are using grep, or you have stumbled on the old deprecated
"regex" module and are using that instead of the "re" module. Perhaps
not as you are using only 2 plain vanilla RE operations which should
work the same way everywhere. Perhaps you are having trouble with
search() versus match() -- if so, read the section on this topic in the
re docs. It's rather hard to tell what you are doing without seeing the
code you are using.


Sorry I should have said it up front. I'm using Kudos (which I'm sure
uses re module) to test these strings on the pattern, and had the match
results as I stated. (search() of course gives me true since the
pattern appears in the substrings of both strings.)


Python's "match" function doesn't return "true" or "false"; it returns a match
object if the string matches the pattern, and None if not. since your pattern
can match the empty string, it'll match any target string (all strings start with
an empty string), and will never return false.

looks like the "debugger" does a great job of hiding how things really work...

</F>

Apr 13 '06 #8

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

1
4154
by: Kenneth McDonald | last post by:
I'm working on the 0.8 release of my 'rex' module, and would appreciate feedback, suggestions, and criticism as I work towards finalizing the API and feature sets. rex is a module intended to make...
3
1996
by: Tom | last post by:
I have struggled with the issue of whether or not to use Regular Expressions for a long time now, and after implementing many text manipulating solutions both ways, I've found that writing...
11
3857
by: Martin Robins | last post by:
I am trying to parse a string that is similar in form to an OLEDB connection string using regular expressions; in principle it is working, but certain character combinations in the string being...
7
2175
by: Patient Guy | last post by:
Coding patterns for regular expressions is completely unintuitive, as far as I can see. I have been trying to write script that produces an array of attribute components within an HTML element. ...
2
5026
by: Jonas | last post by:
I got a string from which I want to extract some info. The string has a format like this "$MyINFO $ALL %s %s$ $%s$%s$%s$|" ie "$MyINFO $ALL smurf hmm$ $LAN(T3)$yes@mail.no$85899345920$|" for doing...
8
2108
by: Natalia DeBow | last post by:
Hi, I am stuck trying to come up with a regular expression for the following pattern: A string that contains "/*" but that does not contain */ within it. Basically I am searching for C-style...
6
9055
by: likong | last post by:
Hi, Any idea about how to write a regular expression that matches a substring xxx as long as the string does NOT contain substring yyy? Thanks. Kong
25
5128
by: Mike | last post by:
I have a regular expression (^(.+)(?=\s*).*\1 ) that results in matches. I would like to get what the actual regular expression is. In other words, when I apply ^(.+)(?=\s*).*\1 to " HEART...
12
363
by: =?Utf-8?B?SlA=?= | last post by:
I am a newbie to regular expressions and want to extract a number from the end of a string. The string would have these formats: image/4567 image/45678 image/456789 I would also want to...
0
7108
marktang
by: marktang | last post by:
ONU (Optical Network Unit) is one of the key components for providing high-speed Internet services. Its primary function is to act as an endpoint device located at the user's premises. However,...
0
6967
by: Hystou | last post by:
Most computers default to English, but sometimes we require a different language, especially when relocating. Forgot to request a specific language before your computer shipped? No problem! You can...
1
6847
by: Hystou | last post by:
Overview: Windows 11 and 10 have less user interface control over operating system update behaviour than previous versions of Windows. In Windows 11 and 10, there is no way to turn off the Windows...
0
7352
tracyyun
by: tracyyun | last post by:
Dear forum friends, With the development of smart home technology, a variety of wireless communication protocols have appeared on the market, such as Zigbee, Z-Wave, Wi-Fi, Bluetooth, etc. Each...
1
4875
isladogs
by: isladogs | last post by:
The next Access Europe User Group meeting will be on Wednesday 1 May 2024 starting at 18:00 UK time (6PM UTC+1) and finishing by 19:30 (7.30PM). In this session, we are pleased to welcome a new...
0
4565
by: conductexam | last post by:
I have .net C# application in which I am extracting data from word file and save it in database particularly. To store word all data as it is I am converting the whole word file firstly in HTML and...
0
1383
by: 6302768590 | last post by:
Hai team i want code for transfer the data from one system to another through IP address by using C# our system has to for every 5mins then we have to update the data what the data is updated ...
1
618
muto222
by: muto222 | last post by:
How can i add a mobile payment intergratation into php mysql website.
0
272
bsmnconsultancy
by: bsmnconsultancy | last post by:
In today's digital era, a well-designed website is crucial for businesses looking to succeed. Whether you're a small business owner or a large corporation in Toronto, having a strong online presence...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.