Regex anomaly - Python

mike.klaas

Hello,

Has anyone had issues with compiled re's vis-a-vis the re.I (ignore
case) flag? I can't make sense of this compiled re producing a
different match when given the flag, odd both in its difference from
the uncompiled regex (as I thought the uncompiled API was a wrapper
around a compile-and-execute block) and its difference from the
compiled version with no flag specified. The match given is utter
nonsense given the input re.

In [48]: import re
In [49]: reStr = r"([a-z]+)://"
In [51]: against = "http://www.hello.com"
In [53]: re.match(reStr, against).groups()
Out[53]: ('http',)
In [54]: re.match(reStr, against, re.I).groups()
Out[54]: ('http',)
In [55]: reCompiled = re.compile(reStr)
In [56]: reCompiled.match(against).groups()
Out[56]: ('http',)
In [57]: reCompiled.match(against, re.I).groups()
Out[57]: ('tp',)

cheers,
-Mike

Jan 3 '06 #1

Roy Smith

<mi********@gmail.com> wrote:

Hello,

Has anyone had issues with compiled re's vis-a-vis the re.I (ignore
case) flag? I can't make sense of this compiled re producing a
different match when given the flag, odd both in its difference from
the uncompiled regex (as I thought the uncompiled API was a wrapper
around a compile-and-execute block) and its difference from the
compiled version with no flag specified. The match given is utter
nonsense given the input re.

In [48]: import re
In [49]: reStr = r"([a-z]+)://"
In [51]: against = "http://www.hello.com"
In [53]: re.match(reStr, against).groups()
Out[53]: ('http',)
In [54]: re.match(reStr, against, re.I).groups()
Out[54]: ('http',)
In [55]: reCompiled = re.compile(reStr)
In [56]: reCompiled.match(against).groups()
Out[56]: ('http',)
In [57]: reCompiled.match(against, re.I).groups()
Out[57]: ('tp',)

LOL, and you'll be LOL too when you see the problem :-)

You can't give the re.I flag to reCompiled.match(). You have to give
it to re.compile(). The second argument to reCompiled.match() is the
position where to start searching. I'm guessing re.I is defined as 2,
which explains the match you got.
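
That guess can be checked directly. A minimal sketch, reusing the pattern and input from the original post (the names `pattern`, `flag_value`, `with_flag` and `with_pos` are only for illustration):

```python
import re

pattern = re.compile(r"([a-z]+)://")
against = "http://www.hello.com"

# re.I has the integer value 2, so it is accepted as the `pos` argument.
flag_value = int(re.I)

# Passing re.I positionally is the same as asking the match to start at index 2.
with_flag = pattern.match(against, re.I).groups()
with_pos = pattern.match(against, 2).groups()

print(flag_value, with_flag, with_pos)
```

Starting at index 2, the string reads "tp://www.hello.com", which is why the group comes back as 'tp'.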

This is actually one of those places where duck typing let us down.
If we had type bondage, re.I would be an instance of RegExFlags, and
reCompiled.match() would have thrown a TypeError when the second
argument wasn't an integer. I'm not saying type bondage is inherently
better than duck typing, just that it has its benefits at times.
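
As a rough sketch of what such a check could look like, assuming Python 3.6 or later, where the flags are members of `re.RegexFlag`: that class is still an int subclass, so the built-in `match()` keeps accepting a flag as a position. The wrapper `strict_match` below is hypothetical and not part of the re module.

```python
import re

def strict_match(compiled, string, pos=0):
    """Hypothetical helper: like compiled.match(), but refuses a flag as `pos`."""
    # re.RegexFlag is an int subclass, so the built-in method cannot
    # tell a flag apart from a start position. This wrapper can.
    if isinstance(pos, re.RegexFlag):
        raise TypeError("flags belong in re.compile(), not in match()")
    return compiled.match(string, pos)

re_compiled = re.compile(r"([a-z]+)://")

try:
    strict_match(re_compiled, "http://www.hello.com", re.I)
    rejected = False
except TypeError:
    rejected = True

print(rejected)
```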

Jan 3 '06 #2

Andrew Durdin

On 2 Jan 2006 21:00:53 -0800, mi********@gmail.com <mi********@gmail.com> wrote:

Has anyone had issues with compiled re's vis-a-vis the re.I (ignore
case) flag? I can't make sense of this compiled re producing a
different match when given the flag, odd both in its difference from
the uncompiled regex (as I thought the uncompiled API was a wrapper
around a compile-and-execute block) and its difference from the
compiled version with no flag specified. The match given is utter
nonsense given the input re.

The re.compile and re.match methods take the flag parameter:

compile( pattern[, flags])
match( pattern, string[, flags])

But the regular expression object method takes different parameters:

match( string[, pos[, endpos]])

It's not a little confusing that the parameters to re.match() and
re.compile().match() are so different, but that's the cause of what
you're seeing.

You need to do:

reCompiled = re.compile(reStr, re.I)
reCompiled.match(against).groups()

to get the behaviour you want.
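
The input in the original post is all lowercase, so the flag makes no visible difference there. A small sketch with an uppercase input (the string `upper` is made up for illustration) shows the effect of compiling with re.I:

```python
import re

re_str = r"([a-z]+)://"

case_sensitive = re.compile(re_str)
case_insensitive = re.compile(re_str, re.I)

upper = "HTTP://WWW.HELLO.COM"  # hypothetical uppercase input

# Without the flag, [a-z] cannot match "H", so there is no match at all.
no_flag_result = case_sensitive.match(upper)

# With the flag given at compile time, the scheme is captured.
flag_result = case_insensitive.match(upper).groups()

print(no_flag_result, flag_result)
```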

Andrew

Jan 3 '06 #3

Ganesan Rajagopal

>>>>> mike klaas <mi********@gmail.com> writes:

In [48]: import re
In [49]: reStr = r"([a-z]+)://"
In [51]: against = "http://www.hello.com"
In [53]: re.match(reStr, against).groups()
Out[53]: ('http',)
In [54]: re.match(reStr, against, re.I).groups()
Out[54]: ('http',)
In [55]: reCompiled = re.compile(reStr)
In [56]: reCompiled.match(against).groups()
Out[56]: ('http',)
In [57]: reCompiled.match(against, re.I).groups()
Out[57]: ('tp',)

I can reproduce this on Debian Linux testing, both python 2.3 and python
2.4. Seems like a bug. search() also exhibits the same behavior.

Ganesan
--
Ganesan Rajagopal (rganesan at debian.org) | GPG Key: 1024D/5D8C12EA
Web: http://employees.org/~rganesan | http://rganesan.blogspot.com

Jan 3 '06 #4

mike.klaas

Thanks guys, that is probably the most ridiculous mistake I've made in
years <g>

-Mike

Jan 3 '06 #5

Roy Smith

In article <11**********************@f14g2000cwb.googlegroups.com>,
mi********@gmail.com wrote:

Thanks guys, that is probably the most ridiculous mistake I've made in
years <g>

-Mike

If that's the most ridiculous you can come up with, you're not trying hard
enough. I've done much worse.

Jan 3 '06 #6

Ganesan Rajagopal

>>>>> mike klaas <mi********@gmail.com> writes:

Thanks guys, that is probably the most ridiculous mistake I've made in
years <g>

I was taken in too :-). This is quite embarrassing, considering that I remember
reading a big thread on the python-dev list about this a while back!

Ganesan

--
Ganesan Rajagopal (rganesan at debian.org) | GPG Key: 1024D/5D8C12EA
Web: http://employees.org/~rganesan | http://rganesan.blogspot.com

Jan 3 '06 #7

Sam Pointon

Would this particular inconsistency be a candidate for change in Py3k?
Seems to me the pos and endpos arguments are redundant with slicing,
and the re.match function would benefit from having the same arguments
as pattern.match. Of course, this is a backwards-incompatible change;
that's why I suggested Py3k.

Jan 3 '06 #8

Andrew Durdin

On 3 Jan 2006 02:20:52 -0800, Sam Pointon <fr*************@gmail.com> wrote:

Would this particular inconsistency be a candidate for change in Py3k?
Seems to me the pos and endpos arguments are redundant with slicing,

Being able to specify the start and end indices for a search is
important when working with very large strings (multimegabyte) --
where slicing would create a copy, specifying pos and endpos allows
for memory-efficient searching in limited areas of a string.

and the re.match function would benefit from having the same arguments
as pattern.match.

Not at all; the flags need to be specified when the regex is compiled,
as they affect the compiled representation (finite state automaton I
expect) of the regex. If the flags were given in pattern.match(), then
there'd be no performance benefit gained from precompiling the regex.
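
The pos and endpos point made earlier in this reply can be sketched as follows. The short `text` string stands in for a very large one, and the names are only illustrative:

```python
import re

word = re.compile(r"[a-z]+")
text = "0123 hello world"  # stand-in for a multimegabyte string

# Search only the region text[5:10] without building a new string object.
in_place = word.search(text, 5, 10)

# Slicing finds the same text, but copies the region and shifts the offsets.
sliced = word.search(text[5:10])

print(in_place.group(), in_place.span())
print(sliced.group(), sliced.span())
```

Besides avoiding the copy, the pos/endpos form reports match positions relative to the original string, which a slice does not.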

Andrew

Jan 3 '06 #9

Roy Smith

In article <11**********************@f14g2000cwb.googlegroups.com>,
"Sam Pointon" <fr*************@gmail.com> wrote:

Would this particular inconsistency be a candidate for change in Py3k?
Seems to me the pos and endpos arguments are redundant with slicing,
and the re.match function would benefit from having the same arguments
as pattern.match. Of course, this is a backwards-incompatible change;
that's why I suggested Py3k.

I don't see any way to implement re.I at match time; it's something that
needs to get done at regex compile time. It's available in the
module-level match() call, because that one is really compile-then-match().
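
A minimal sketch of that relationship. The function `module_level_match` is a hypothetical stand-in; the real re.match() may do more internally (such as caching compiled patterns), so this only illustrates the compile-then-match idea:

```python
import re

def module_level_match(pattern, string, flags=0):
    """Hypothetical stand-in for re.match(): compile with the flags, then match."""
    return re.compile(pattern, flags).match(string)

sketch_result = module_level_match(r"([a-z]+)://", "HTTP://www.hello.com", re.I).groups()
real_result = re.match(r"([a-z]+)://", "HTTP://www.hello.com", re.I).groups()

print(sketch_result, real_result)
```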

Jan 3 '06 #10

Ron Garret

In article <ro***********************@reader2.panix.com>,
Roy Smith <ro*@panix.com> wrote:

In article <11**********************@f14g2000cwb.googlegroups.com>,
"Sam Pointon" <fr*************@gmail.com> wrote:

Would this particular inconsistency be a candidate for change in Py3k?
Seems to me the pos and endpos arguments are redundant with slicing,
and the re.match function would benefit from having the same arguments
as pattern.match. Of course, this is a backwards-incompatible change;
that's why I suggested Py3k.

I don't see any way to implement re.I at match time;

It's easy: just compile two machines, one with re.I and one without and
package them as if they were one. Then use the flag to pick a compiled
machine at run time.
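
One way that idea might look, purely as a sketch. The class `DualCasePattern` and its `ignore_case` parameter are hypothetical names, not anything the re module provides:

```python
import re

class DualCasePattern:
    """Hypothetical wrapper: one pattern compiled twice, with and without re.I."""

    def __init__(self, pattern):
        self._sensitive = re.compile(pattern)
        self._insensitive = re.compile(pattern, re.I)

    def match(self, string, ignore_case=False):
        # Pick one of the two precompiled machines at match time.
        chosen = self._insensitive if ignore_case else self._sensitive
        return chosen.match(string)

dual = DualCasePattern(r"([a-z]+)://")
strict = dual.match("HTTP://www.hello.com")
relaxed = dual.match("HTTP://www.hello.com", ignore_case=True)

print(strict, relaxed.groups())
```

The trade-off is that every pattern is compiled twice, and supporting more than one match-time flag this way would need one compiled machine per combination of flags.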

rg

Jan 3 '06 #11

Bryan Olson

Roy Smith wrote:

LOL, and you'll be LOL too when you see the problem :-)

You can't give the re.I flag to reCompiled.match(). You have to give
it to re.compile(). The second argument to reCompiled.match() is the
position where to start searching. I'm guessing re.I is defined as 2,
which explains the match you got.

This is actually one of those places where duck typing let us down.
If we had type bondage, re.I would be an instance of RegExFlags, and
reCompiled.match() would have thrown a TypeError when the second
argument wasn't an integer. I'm not saying type bondage is inherently
better than duck typing, just that it has its benefits at times.

Even with duck-typing, we could cut our users a break. Making
our flags instances of a distinct class doesn't actually require
type bondage.

We could define the __or__ method for RegExFlags, but really,
or-ing together integer flags is old habit from low-level
languages. Really we should pass a set of flags.
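
A small sketch of what passing a set of flags could look like on top of the existing module. The helper `compile_with_flags` is hypothetical; it simply or-s the set members together before calling re.compile():

```python
import re
from functools import reduce
from operator import or_

def compile_with_flags(pattern, flags=frozenset()):
    """Hypothetical helper: accept a set of flags instead of an or-ed integer."""
    combined = reduce(or_, flags, 0)
    return re.compile(pattern, combined)

compiled = compile_with_flags(r"^([a-z]+)://", {re.I, re.M})

# re.M lets ^ match at the start of the second line; re.I lets [a-z] match "HTTP".
result = compiled.search("first line\nHTTP://www.hello.com").groups()

print(result)
```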
--
--Bryan

Jan 5 '06 #12

skip

Bryan> We could define the __or__ method for RegExFlags, but really,
Bryan> or-ing together integer flags is old habit from low-level
Bryan> languages. Really we should pass a set of flags.

Good idea. Added to the Python3.0Suggestions wiki page:

http://wiki.python.org/moin/Python3%2e0Suggestions

Skip

Jan 5 '06 #13
