Hello,
Has anyone had an issue with compiled re's vis-a-vis the re.I (ignore
case) flag? I can't make sense of this compiled re producing a
different match when given the flag. It is odd both in its difference from
the uncompiled regex (as I thought the uncompiled API was a wrapper
around a compile-and-execute block) and in its difference from the
compiled version with no flag specified. The match given is utter
nonsense given the input re.
In [48]: import re
In [49]: reStr = r"([a-z]+)://"
In [51]: against = "http://www.hello.com"
In [53]: re.match(reStr, against).groups()
Out[53]: ('http',)
In [54]: re.match(reStr, against, re.I).groups()
Out[54]: ('http',)
In [55]: reCompiled = re.compile(reStr)
In [56]: reCompiled.match(against).groups()
Out[56]: ('http',)
In [57]: reCompiled.match(against, re.I).groups()
Out[57]: ('tp',)
cheers,
-Mike
<mi********@gmail.com> wrote:
Hello,
Has anyone had an issue with compiled re's vis-a-vis the re.I (ignore case) flag? I can't make sense of this compiled re producing a different match when given the flag. It is odd both in its difference from the uncompiled regex (as I thought the uncompiled API was a wrapper around a compile-and-execute block) and in its difference from the compiled version with no flag specified. The match given is utter nonsense given the input re.
In [48]: import re
In [49]: reStr = r"([a-z]+)://"
In [51]: against = "http://www.hello.com"
In [53]: re.match(reStr, against).groups()
Out[53]: ('http',)
In [54]: re.match(reStr, against, re.I).groups()
Out[54]: ('http',)
In [55]: reCompiled = re.compile(reStr)
In [56]: reCompiled.match(against).groups()
Out[56]: ('http',)
In [57]: reCompiled.match(against, re.I).groups()
Out[57]: ('tp',)
LOL, and you'll be LOL too when you see the problem :-)
You can't give the re.I flag to reCompiled.match(). You have to give
it to re.compile(). The second argument to reCompiled.match() is the
position where to start searching. I'm guessing re.I is defined as 2,
which explains the match you got.
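The guess is easy to check. Here is a minimal sketch reusing the pattern and string from the original post (the variable name `pattern` is only for illustration). Because re.I has the integer value 2, it is silently accepted as the pos argument:

```python
import re

pattern = re.compile(r"([a-z]+)://")
against = "http://www.hello.com"

# re.I is an integer-valued flag; this prints its numeric value.
print(int(re.I))

# This call passes pos=2, not "ignore case": matching starts at "tp://...".
print(pattern.match(against, re.I).groups())

# Spelling out the position gives the same result.
print(pattern.match(against, 2).groups())
```

Both match calls return the same group, since the flag is only ever seen as a start position.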
This is actually one of those places where duck typing let us down.
If we had type bondage, re.I would be an instance of RegExFlags, and
reCompiled.match() would have thrown a TypeError when the second
argument wasn't an integer. I'm not saying type bondage is inherently
better than duck typing, just that it has its benefits at times.
On 2 Jan 2006 21:00:53 -0800, mi********@gmail.com <mi********@gmail.com> wrote:
Has anyone had an issue with compiled re's vis-a-vis the re.I (ignore case) flag? I can't make sense of this compiled re producing a different match when given the flag. It is odd both in its difference from the uncompiled regex (as I thought the uncompiled API was a wrapper around a compile-and-execute block) and in its difference from the compiled version with no flag specified. The match given is utter nonsense given the input re.
The module-level re.compile and re.match functions take a flags parameter:
compile( pattern[, flags])
match( pattern, string[, flags])
But the match method of a compiled regular expression object takes different parameters:
match( string[, pos[, endpos]])
It's not a little confusing that the parameters to re.match() and
re.compile().match() are so different, but that's the cause of what
you're seeing.
You need to do:
reCompiled = re.compile(reStr, re.I)
reCompiled.match(against).groups()
to get the behaviour you want.
Andrew
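To make the fix above concrete, here is a small sketch using an upper-case URL so the effect of the flag is visible. The names `re_str`, `sensitive` and `insensitive` are made up for the example:

```python
import re

re_str = r"([a-z]+)://"

# The flag is given at compile time, so the compiled pattern itself
# is case-insensitive.
insensitive = re.compile(re_str, re.I)
sensitive = re.compile(re_str)

print(insensitive.match("HTTP://www.hello.com").groups())  # ('HTTP',)
print(sensitive.match("HTTP://www.hello.com"))             # None
```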
>>>>> mike klaas <mi********@gmail.com> writes:
In [48]: import re
In [49]: reStr = r"([a-z]+)://"
In [51]: against = "http://www.hello.com"
In [53]: re.match(reStr, against).groups()
Out[53]: ('http',)
In [54]: re.match(reStr, against, re.I).groups()
Out[54]: ('http',)
In [55]: reCompiled = re.compile(reStr)
In [56]: reCompiled.match(against).groups()
Out[56]: ('http',)
In [57]: reCompiled.match(against, re.I).groups()
Out[57]: ('tp',)
I can reproduce this on Debian Linux testing, both python 2.3 and python
2.4. Seems like a bug. search() also exhibits the same behavior.
Ganesan
--
Ganesan Rajagopal (rganesan at debian.org) | GPG Key: 1024D/5D8C12EA
Web: http://employees.org/~rganesan | http://rganesan.blogspot.com
Thanks guys, that is probably the most ridiculous mistake I've made in
years <g>
-Mike
In article <11**********************@f14g2000cwb.googlegroups .com>, mi********@gmail.com wrote: Thanks guys, that is probably the most ridiculous mistake I've made in years <g>
-Mike
If that's the most ridiculous you can come up with, you're not trying hard
enough. I've done much worse.
>>>>> mike klaas <mi********@gmail.com> writes: Thanks guys, that is probably the most ridiculous mistake I've made in years <g>
I was taken in too :-). This is quite embarrassing, considering that I remember
reading a big thread on the python-dev list about this a while back!
Ganesan
Would this particular inconsistency be a candidate for change in Py3k?
Seems to me the pos and endpos arguments are redundant with slicing,
and the re.match function would benefit from having the same arguments
as pattern.match. Of course, this is a backwards-incompatible change;
that's why I suggested Py3k.
On 3 Jan 2006 02:20:52 -0800, Sam Pointon <fr*************@gmail.com> wrote:
Sam> Would this particular inconsistency be candidate for change in Py3k?
Sam> Seems to me the pos and endpos arguments are redundant with slicing,
Being able to specify the start and end indices for a search is
important when working with very large strings (multimegabyte) --
where slicing would create a copy, specifying pos and endpos allows
for memory-efficient searching in limited areas of a string.
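A rough sketch of the difference, with a made-up string of about a megabyte (the filler, the sizes and the variable names are all assumptions for illustration). Note also that pos is not a pure synonym for slicing: with pos, '^' still refers to the real beginning of the string.

```python
import re

start = 1_000_000
text = " " * start + "http://www.hello.com"
pattern = re.compile(r"([a-z]+)://")

# Searching with pos: no copy of the string is made, and the reported
# offsets refer to the original string.
m1 = pattern.search(text, start)
print(m1.group(1), m1.start())

# Searching a slice: text[start:] builds a new string first, and the
# offsets are relative to that slice.
m2 = pattern.search(text[start:])
print(m2.group(1), m2.start())
```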
Sam> and the re.match function would benefit from having the same arguments
Sam> as pattern.match.
Not at all; the flags need to be specified when the regex is compiled,
as they affect the compiled representation (finite state automaton I
expect) of the regex. If the flags were given in pattern.match(), then
there'd be no performance benefit gained from precompiling the regex.
Andrew
In article <11**********************@f14g2000cwb.googlegroups .com>,
"Sam Pointon" <fr*************@gmail.com> wrote: Would this particular inconsistency be candidate for change in Py3k? Seems to me the pos and endpos arguments are redundant with slicing, and the re.match function would benefit from having the same arguments as pattern.match. Of course, this is a backwards-incompatible change; that's why I suggested Py3k.
I don't see any way to implement re.I at match time; it's something that
needs to get done at regex compile time. It's available in the
module-level match() call, because that one is really compile-then-match().
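As a rough illustration of "compile-then-match", the sketch below uses a hypothetical stand-in, `module_level_match`, that is not the actual library source. The real module also caches compiled patterns, which this ignores:

```python
import re

def module_level_match(pattern, string, flags=0):
    """Hypothetical stand-in for re.match(): compile with the flags, then match."""
    return re.compile(pattern, flags).match(string)

a = re.match(r"([a-z]+)://", "HTTP://www.hello.com", re.I)
b = module_level_match(r"([a-z]+)://", "HTTP://www.hello.com", re.I)

# Both routes apply the flag at compile time, so both match.
print(a.groups(), b.groups())
```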
In article <ro***********************@reader2.panix.com>,
Roy Smith <ro*@panix.com> wrote:
In article <11**********************@f14g2000cwb.googlegroups .com>, "Sam Pointon" <fr*************@gmail.com> wrote:
Sam> Would this particular inconsistency be candidate for change in Py3k? Seems to me the pos and endpos arguments are redundant with slicing, and the re.match function would benefit from having the same arguments as pattern.match. Of course, this is a backwards-incompatible change; that's why I suggested Py3k.
Roy> I don't see any way to implement re.I at match time;
It's easy: just compile two machines, one with re.I and one without and
package them as if they were one. Then use the flag to pick a compiled
machine at run time.
rg
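A minimal sketch of that idea, assuming a hypothetical wrapper class `DualCasePattern` and an `ignore_case` keyword. Neither exists in the re module:

```python
import re

class DualCasePattern:
    """Hypothetical wrapper holding two compilations of the same pattern."""

    def __init__(self, pattern):
        # Compile both machines up front, one per setting of the flag.
        self._sensitive = re.compile(pattern)
        self._insensitive = re.compile(pattern, re.I)

    def match(self, string, ignore_case=False):
        # Pick the precompiled machine at match time.
        chosen = self._insensitive if ignore_case else self._sensitive
        return chosen.match(string)

dual = DualCasePattern(r"([a-z]+)://")
print(dual.match("HTTP://www.hello.com"))
print(dual.match("HTTP://www.hello.com", ignore_case=True).groups())
```

The cost is one extra compilation per pattern. Supporting every flag this way would double the number of compiled machines for each additional flag.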
Roy Smith wrote: LOL, and you'll be LOL too when you see the problem :-)
You can't give the re.I flag to reCompiled.match(). You have to give it to re.compile(). The second argument to reCompiled.match() is the position where to start searching. I'm guessing re.I is defined as 2, which explains the match you got.
This is actually one of those places where duck typing let us down. If we had type bondage, re.I would be an instance of RegExFlags, and reCompiled.match() would have thrown a TypeError when the second argument wasn't an integer. I'm not saying type bondage is inherently better than duck typing, just that it has its benefits at times.
Even with duck-typing, we could cut our users a break. Making
our flags instances of a distinct class doesn't actually require
type bondage.
We could define the __or__ method for RegExFlags, but really,
or-ing together integer flags is old habit from low-level
languages. Really we should pass a set of flags.
--
--Bryan
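One way this could look, as a sketch only: `RegExFlags` and `compile_with` are hypothetical names, not part of the re module. The flag type is deliberately not an integer, and flags are passed as a set:

```python
import enum
import re

class RegExFlags(enum.Flag):
    """Hypothetical flag type; deliberately not an int subclass."""
    IGNORECASE = 2   # same bit as re.IGNORECASE
    MULTILINE = 8    # same bit as re.MULTILINE

def compile_with(pattern, flags=frozenset()):
    """Hypothetical helper: accept a set of RegExFlags instead of or-ed integers."""
    bits = 0
    for flag in flags:
        bits |= flag.value
    return re.compile(pattern, bits)

compiled = compile_with(r"([a-z]+)://", {RegExFlags.IGNORECASE})
print(compiled.match("HTTP://www.hello.com").groups())

# Passing the flag where pos is expected now fails loudly
# instead of being read as a start position.
try:
    compiled.match("http://www.hello.com", RegExFlags.IGNORECASE)
except TypeError as exc:
    print("TypeError:", exc)
```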
Bryan> We could define the __or__ method for RegExFlags, but really,
Bryan> or-ing together integer flags is old habit from low-level
Bryan> languages. Really we should pass a set of flags.
Good idea. Added to the Python3.0Suggestions wiki page: http://wiki.python.org/moin/Python3%2e0Suggestions
Skip