473,387 Members | 1,757 Online
Bytes | Software Development & Data Engineering Community
Post Job

Home Posts Topics Members FAQ

Join Bytes to post your question to a community of 473,387 software developers and data experts.

Regex anomaly


Hello,

Has anyone has issue with compiled re's vis-a-vis the re.I (ignore
case) flag? I can't make sense of this compiled re producing a
different match when given the flag, odd both in it's difference from
the uncompiled regex (as I thought the uncompiled api was a wrapper
around a compile-and-execute block) and it's difference from the
compiled version with no flag specified. The match given is utter
nonsense given the input re.

In [48]: import re
In [49]: reStr = r"([a-z]+)://"
In [51]: against = "http://www.hello.com"
In [53]: re.match(reStr, against).groups()
Out[53]: ('http',)
In [54]: re.match(reStr, against, re.I).groups()
Out[54]: ('http',)
In [55]: reCompiled = re.compile(reStr)
In [56]: reCompiled.match(against).groups()
Out[56]: ('http',)
In [57]: reCompiled.match(against, re.I).groups()
Out[57]: ('tp',)

cheers,
-Mike

Jan 3 '06 #1
12 1359
<mi********@gmail.com> wrote:

Hello,

Has anyone has issue with compiled re's vis-a-vis the re.I (ignore
case) flag? I can't make sense of this compiled re producing a
different match when given the flag, odd both in it's difference from
the uncompiled regex (as I thought the uncompiled api was a wrapper
around a compile-and-execute block) and it's difference from the
compiled version with no flag specified. The match given is utter
nonsense given the input re.

In [48]: import re
In [49]: reStr = r"([a-z]+)://"
In [51]: against = "http://www.hello.com"
In [53]: re.match(reStr, against).groups()
Out[53]: ('http',)
In [54]: re.match(reStr, against, re.I).groups()
Out[54]: ('http',)
In [55]: reCompiled = re.compile(reStr)
In [56]: reCompiled.match(against).groups()
Out[56]: ('http',)
In [57]: reCompiled.match(against, re.I).groups()
Out[57]: ('tp',)


LOL, and you'll be LOL too when you see the problem :-)

You can't give the re.I flag to reCompiled.match(). You have to give
it to re.compile(). The second argument to reCompiled.match() is the
position where to start searching. I'm guessing re.I is defined as 2,
which explains the match you got.

This is actually one of those places where duck typing let us down.
If we had type bondage, re.I would be an instance of RegExFlags, and
reCompiled.match() would have thrown a TypeError when the second
argument wasn't an integer. I'm not saying type bondage is inherently
better than duck typing, just that it has its benefits at times.
Jan 3 '06 #2
On 2 Jan 2006 21:00:53 -0800, mi********@gmail.com <mi********@gmail.com> wrote:

Has anyone has issue with compiled re's vis-a-vis the re.I (ignore
case) flag? I can't make sense of this compiled re producing a
different match when given the flag, odd both in it's difference from
the uncompiled regex (as I thought the uncompiled api was a wrapper
around a compile-and-execute block) and it's difference from the
compiled version with no flag specified. The match given is utter
nonsense given the input re.


The re.compile and re.match methods take the flag parameter:

compile( pattern[, flags])
match( pattern, string[, flags])

But the regular expression object method takes different paramters:

match( string[, pos[, endpos]])

It's not a little confusing that the parameters to re.match() and
re.compile().match() are so different, but that's the cause of what
you're seeing.

You need to do:

reCompiled = re.compile(reStr, re.I)
reCompiled.match(against).groups()

to get the behaviour you want.

Andrew
Jan 3 '06 #3
>>>>> mike klaas <mi********@gmail.com> writes:
In [48]: import re
In [49]: reStr = r"([a-z]+)://"
In [51]: against = "http://www.hello.com"
In [53]: re.match(reStr, against).groups()
Out[53]: ('http',)
In [54]: re.match(reStr, against, re.I).groups()
Out[54]: ('http',)
In [55]: reCompiled = re.compile(reStr)
In [56]: reCompiled.match(against).groups()
Out[56]: ('http',)
In [57]: reCompiled.match(against, re.I).groups()
Out[57]: ('tp',)


I can reproduce this on Debian Linux testing, both python 2.3 and python
2.4. Seems like a bug. search() also exhibits the same behavior.

Ganesan
--
Ganesan Rajagopal (rganesan at debian.org) | GPG Key: 1024D/5D8C12EA
Web: http://employees.org/~rganesan | http://rganesan.blogspot.com
Jan 3 '06 #4
Thanks guys, that is probably the most ridiculous mistake I've made in
years <g>

-Mike

Jan 3 '06 #5
In article <11**********************@f14g2000cwb.googlegroups .com>,
mi********@gmail.com wrote:
Thanks guys, that is probably the most ridiculous mistake I've made in
years <g>

-Mike


If that's the more ridiculous you can come up with, you're not trying hard
enough. I've done much worse.
Jan 3 '06 #6
>>>>> mike klaas <mi********@gmail.com> writes:
Thanks guys, that is probably the most ridiculous mistake I've made in
years <g>


I was taken too :-). This is quite embarassing, considering that I remember
reading a big thread in python devel list about this a while back!

Ganesan

--
Ganesan Rajagopal (rganesan at debian.org) | GPG Key: 1024D/5D8C12EA
Web: http://employees.org/~rganesan | http://rganesan.blogspot.com
Jan 3 '06 #7
Would this particular inconsistency be candidate for change in Py3k?
Seems to me the pos and endpos arguments are redundant with slicing,
and the re.match function would benefit from having the same arguments
as pattern.match. Of course, this is a backwards-incompatible change;
that's why I suggested Py3k.

Jan 3 '06 #8
On 3 Jan 2006 02:20:52 -0800, Sam Pointon <fr*************@gmail.com> wrote:
Would this particular inconsistency be candidate for change in Py3k?
Seems to me the pos and endpos arguments are redundant with slicing,
Being able to specify the start and end indices for a search is
important when working with very large strings (multimegabyte) --
where slicing would create a copy, specifying pos and endpos allows
for memory-efficient searching in limited areas of a string.
and the re.match function would benefit from having the same arguments
as pattern.match.


Not at all; the flags need to be specified when the regex is compiled,
as they affect the compiled representation (finite state automaton I
expect) of the regex. If the flags were given in pattern.match(), then
there'd be no performance benefit gained from precompiling the regex.

Andrew
Jan 3 '06 #9
In article <11**********************@f14g2000cwb.googlegroups .com>,
"Sam Pointon" <fr*************@gmail.com> wrote:
Would this particular inconsistency be candidate for change in Py3k?
Seems to me the pos and endpos arguments are redundant with slicing,
and the re.match function would benefit from having the same arguments
as pattern.match. Of course, this is a backwards-incompatible change;
that's why I suggested Py3k.


I don't see any way to implement re.I at match time; it's something that
needs to get done at regex compile time. It's available in the
module-level match() call, because that one is really compile-then-match().
Jan 3 '06 #10
In article <ro***********************@reader2.panix.com>,
Roy Smith <ro*@panix.com> wrote:
In article <11**********************@f14g2000cwb.googlegroups .com>,
"Sam Pointon" <fr*************@gmail.com> wrote:
Would this particular inconsistency be candidate for change in Py3k?
Seems to me the pos and endpos arguments are redundant with slicing,
and the re.match function would benefit from having the same arguments
as pattern.match. Of course, this is a backwards-incompatible change;
that's why I suggested Py3k.


I don't see any way to implement re.I at match time;


It's easy: just compile two machines, one with re.I and one without and
package them as if they were one. Then use the flag to pick a compiled
machine at run time.

rg
Jan 3 '06 #11
Roy Smith wrote:
LOL, and you'll be LOL too when you see the problem :-)

You can't give the re.I flag to reCompiled.match(). You have to give
it to re.compile(). The second argument to reCompiled.match() is the
position where to start searching. I'm guessing re.I is defined as 2,
which explains the match you got.

This is actually one of those places where duck typing let us down.
If we had type bondage, re.I would be an instance of RegExFlags, and
reCompiled.match() would have thrown a TypeError when the second
argument wasn't an integer. I'm not saying type bondage is inherently
better than duck typing, just that it has its benefits at times.

Even with duck-typing, we could cut our users a break. Making
our flags instances of a distinct class doesn't actually require
type bondage.

We could define the __or__ method for RegExFlags, but really,
or-ing together integer flags is old habit from low-level
languages. Really we should pass a set of flags.
--
--Bryan
Jan 5 '06 #12

Bryan> We could define the __or__ method for RegExFlags, but really,
Bryan> or-ing together integer flags is old habit from low-level
Bryan> languages. Really we should pass a set of flags.

Good idea. Added to the Python3.0Suggestions wiki page:

http://wiki.python.org/moin/Python3%2e0Suggestions

Skip
Jan 5 '06 #13

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

9
by: Tim Conner | last post by:
Is there a way to write a faster function ? public static bool IsNumber( char Value ) { if (Regex.IsMatch( Value.ToString(), @"^+$" )) { return true; } else return false; }
20
by: jeevankodali | last post by:
Hi I have an .Net application which processes thousands of Xml nodes each day and for each node I am using around 30-40 Regex matches to see if they satisfy some conditions are not. These Regex...
17
by: clintonG | last post by:
I'm using an .aspx tool I found at but as nice as the interface is I think I need to consider using others. Some can generate C# I understand. Your preferences please... <%= Clinton Gallagher ...
16
by: clintonG | last post by:
At design-time the application just decides to go boom claiming it can't find a dll. This occurs sporadically. Doing a simple edit in the HTML for example and then viewing the application has...
6
by: Extremest | last post by:
I have a huge regex setup going on. If I don't do each one by itself instead of all in one it won't work for. Also would like to know if there is a faster way tried to use string.replace with all...
7
by: Extremest | last post by:
I am using this regex. static Regex paranthesis = new Regex("(\\d*/\\d*)", RegexOptions.IgnoreCase); it should find everything between parenthesis that have some numbers onyl then a forward...
3
by: aspineux | last post by:
My goal is to write a parser for these imaginary string from the SMTP protocol, regarding RFC 821 and 1869. I'm a little flexible with the BNF from these RFC :-) Any comment ? tests= def...
1
by: mai | last post by:
Hi everyone, i'm trying to exhibit FIFO anomaly(page replacement algorithm),, I searched over 2000 random strings but i couldnt find any anomaly,, am i I doing it right?,, Please help,,,The...
15
by: morleyc | last post by:
Hi, i would like to remove a number of characters from my string (\t \r \n which are throughout the string), i know regex can do this but i have no idea how. Any pointers much appreciated. Chris
0
by: Charles Arthur | last post by:
How do i turn on java script on a villaon, callus and itel keypad mobile phone
0
by: ryjfgjl | last post by:
If we have dozens or hundreds of excel to import into the database, if we use the excel import function provided by database editors such as navicat, it will be extremely tedious and time-consuming...
0
by: emmanuelkatto | last post by:
Hi All, I am Emmanuel katto from Uganda. I want to ask what challenges you've faced while migrating a website to cloud. Please let me know. Thanks! Emmanuel
0
BarryA
by: BarryA | last post by:
What are the essential steps and strategies outlined in the Data Structures and Algorithms (DSA) roadmap for aspiring data scientists? How can individuals effectively utilize this roadmap to progress...
1
by: nemocccc | last post by:
hello, everyone, I want to develop a software for my android phone for daily needs, any suggestions?
1
by: Sonnysonu | last post by:
This is the data of csv file 1 2 3 1 2 3 1 2 3 1 2 3 2 3 2 3 3 the lengths should be different i have to store the data by column-wise with in the specific length. suppose the i have to...
0
by: Hystou | last post by:
Most computers default to English, but sometimes we require a different language, especially when relocating. Forgot to request a specific language before your computer shipped? No problem! You can...
0
Oralloy
by: Oralloy | last post by:
Hello folks, I am unable to find appropriate documentation on the type promotion of bit-fields when using the generalised comparison operator "<=>". The problem is that using the GNU compilers,...
0
jinu1996
by: jinu1996 | last post by:
In today's digital age, having a compelling online presence is paramount for businesses aiming to thrive in a competitive landscape. At the heart of this digital strategy lies an intricately woven...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.