Hi,
I'm confused about a regex and hope you guys can clear it up for me.
Suppose I have a regular expression (0|(1(01*0)*1))* and two test
strings: 110_1011101_ and _101101_1. (The underscores are not part of
the strings. They are added to show that both strings have a substring
that matches the pattern.) Applying a match() function to the first
string returns true, while it returns false for the second. The
difference is that the first one has an unmatched chunk at the
beginning while the second has one at the end. How does the regex rule
work here?
Thanks.
Hi Kevin,
You may notice that, for a string to match the regex (0|(1(01*0)*1))*,
its leftmost three characters must not be "101" unless they are
followed by a '0'.
After reading the first '1', the automaton expects '1' or "00" or "010"
or "11", right? :)
Why must it expect "010"? Why not, say, "0110", since 1* can represent 0
or more repetitions?
You are right. In fact the procedure is as follows:
The substring "101101" is no problem; if the string stopped there, the
match would succeed.
But a trailing '1' follows, so we can imagine the working automaton
moving to the state that corresponds to the regexp's outermost ')',
ready to repeat the whole regexp again. In this case the answer is
"yes" only if at least two '1's follow, but there is only one here.
BTW, the first string is matched exactly; in your notation it
should be written as: _11_0_1011101
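The automaton picture above can be sketched by hand. The three states and the transition table below are one reading of the pattern (0|(1(01*0)*1))*, worked out on paper; the state names START, AFTER_1 and INSIDE are made up for illustration, and this is not how the re module works internally (re is a backtracking matcher).

```python
# Hand-built DFA for (0|(1(01*0)*1))*; state names are hypothetical.
START, AFTER_1, INSIDE = "START", "AFTER_1", "INSIDE"

TRANSITIONS = {
    (START, "0"): START,     # a lone 0 is a complete chunk
    (START, "1"): AFTER_1,   # opening 1 of 1(01*0)*1
    (AFTER_1, "1"): START,   # closing 1: chunk complete
    (AFTER_1, "0"): INSIDE,  # opening 0 of 01*0
    (INSIDE, "1"): INSIDE,   # the 1* part
    (INSIDE, "0"): AFTER_1,  # closing 0 of 01*0
}

def accepts(s):
    """Return True if the whole string s is in the language of the pattern."""
    state = START
    for ch in s:
        state = TRANSITIONS.get((state, ch))
        if state is None:    # a character other than 0 or 1
            return False
    return state == START    # only the outer-loop state is accepting

print(accepts("1101011101"))  # True
print(accepts("1011011"))     # False: ends in AFTER_1, one more 1 needed
print(accepts("101101"))      # True
```

The second string ends in AFTER_1, which is the "ready to repeat, but only one 1 left" situation described above.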
Oh yeah, I see where my confusion was. In the first string, I didn't
consider that 11 and 0 each match the pattern without the repetition.
What I actually did was enter the pattern and the test strings into
Kudos (a Python regexp debugger) and look at the match groups, which
didn't include 11 and 0, although they do match the pattern
"(0|1(01*0)*1)".
Thank you for the help.
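For anyone who hits the same thing: a group inside a repetition only remembers its last iteration, which would explain why 11 and 0 never show up in the match groups. Here is a small sketch, assuming the tool simply displays what the re module reports; the variable names are illustrative. It lists every chunk by applying the single-chunk pattern repeatedly.

```python
import re

s = "1101011101"

# Repeated group: group(1) only holds the last repetition.
m = re.match(r"(0|1(01*0)*1)*", s)
print(m.group(0))  # 1101011101
print(m.group(1))  # 1011101

# Single-chunk pattern applied repeatedly: every chunk is visible.
chunk = re.compile(r"0|1(?:01*0)*1")
chunks = [c.group(0) for c in chunk.finditer(s)]
print(chunks)  # ['11', '0', '1011101']
```

Note that finditer() silently skips over anything that matches no chunk, so this is a way of listing the pieces, not a validity check.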
On 13/04/2006 12:33 PM, Kevin CH wrote:
> Hi,
> I'm currently running into a confusion on regex and hopefully you guys can clear it up for me.
> Suppose I have a regular expression (0|(1(01*0)*1))* and two test strings: 110_1011101_ and _101101_1. (The underscores are not part of the string. They are added to show that both string has a substring that matches the pattern.) Applying a match() function on the first string returns true while false for the second.
Perhaps you are using grep, or you have stumbled on the old deprecated
"regex" module and are using that instead of the "re" module. Perhaps
not as you are using only 2 plain vanilla RE operations which should
work the same way everywhere. Perhaps you are having trouble with
search() versus match() -- if so, read the section on this topic in the
re docs. It's rather hard to tell what you are doing without seeing the
code you are using.
> The difference is the first one has unmatched chunk in the beginning
With re's match(), the whole string matches.
> while the second at the end.
With re's match(), the part you marked with underscores (at the
*beginning*) matches.
> How's the regex rule work here?
Let's abbreviate your pattern as (0|X)*
This means 0 or more occurrences of strings that match either 0 or X.
Case 1 gives us 11 matching X [it's a 1 followed by zero occurrences of
(01*0) followed by a 1], then a 0, then 1011101 matching X [it's a 1
foll. by 1 occ. of (01110) followed by a 1].
Case 2 gives us 101101 matching X [it's a 1 foll. by 1 occ of (0110)
foll by a 1] -- then there's a 1 that doesn't match anything.
Here's some code and its output:
C:\junk>type kevinch.py
import re

# Kevin's pattern, compiled once.
rx = re.compile(r"(0|(1(01*0)*1))*")

def doit(n, s):
    print("Case", n)
    m = rx.match(s)
    if m:
        print("0123456789")  # column ruler for reading the spans
        print(s)
        for k in range(4):
            print("span(%d) -> %r" % (k, m.span(k)))
    else:
        print("... no match")

# The underscores are only markers, so strip them.
s1 = "110_1011101_".replace('_', '')
s2 = "_101101_1".replace('_', '')
doit(1, s1)
doit(2, s2)
C:\junk>kevinch.py
Case 1
0123456789
1101011101
span(0) -> (0, 10)
span(1) -> (3, 10)
span(2) -> (3, 10)
span(3) -> (4, 9)
Case 2
0123456789
1011011
span(0) -> (0, 6)
span(1) -> (0, 6)
span(2) -> (0, 6)
span(3) -> (1, 5)
HTH,
John
Thank you for your reply.
Sorry, I should have said so up front. I'm using Kudos (which I'm sure
uses the re module) to test these strings against the pattern, and got
the match results I stated. (search() of course gives me true, since
the pattern appears in substrings of both strings.)
Python's "match" function doesn't return "true" or "false"; it returns a match
object if the string matches the pattern, and None if not. since your pattern
can match the empty string, it'll match any target string (all strings start with
an empty string), and will never return false.
looks like the "debugger" does a great job of hiding how things really work...
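a minimal sketch of the above, assuming Python 3.4 or later for re.fullmatch (the variable names are just for illustration):

```python
import re

rx = re.compile(r"(0|(1(01*0)*1))*")

# match() returns a match object or None, never True/False.
m = rx.match("1011011")
print(m is not None)   # True
print(m.span())        # (0, 6): the trailing 1 is simply left over

# The pattern can match the empty string, so match() succeeds on anything.
print(rx.match("abc").span())  # (0, 0)

# To require that the whole string matches, use fullmatch().
print(rx.fullmatch("1011011"))            # None
print(rx.fullmatch("1101011101").span())  # (0, 10)
```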
</F>