builtin regular expressions?

Antoine De Groote

Hello,

Can anybody tell me the reason(s) why regular expressions are not built
into Python like it is the case with Ruby and I believe Perl? Like for
example in the following Ruby code

line = 'some string'

case line
when /title=(.*)/
  puts "Title is #$1"
when /track=(.*)/
  puts "Track is #$1"
when /artist=(.*)/
  puts "Artist is #$1"
end

I'm sure there are good reasons, but I just don't see them.

Python Culture says: 'Explicit is better than implicit'. May it be
related to this?

Regards,
antoine

Sep 30 '06 #1

Sybren Stuvel

Antoine De Groote enlightened us with:

Can anybody tell me the reason(s) why regular expressions are not built
into Python like it is the case with Ruby and I believe Perl?

They _are_ built into Python. Python ships with the 're' module.

Python Culture says: 'Explicit is better than implicit'. May it be
related to this?

The creators of Python chose not to include an extension of the Python
syntax for regular expressions. It's a lot simpler to have a
straightforward syntax, and simply pass the regular expressions to
the appropriate functions.
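
For readers who want to see what "pass the regular expressions to the appropriate functions" can look like, here is a minimal sketch of the Ruby example from the question using the 're' module. It assumes Python 3 syntax (the thread itself predates it), and the names PATTERNS and describe are made up for illustration; they are not from any post in this thread.

```python
import re

# Hypothetical table of (compiled pattern, label) pairs, tried in order
# like the branches of the Ruby `case` in the original question.
PATTERNS = [
    (re.compile(r"title=(.*)"), "Title is"),
    (re.compile(r"track=(.*)"), "Track is"),
    (re.compile(r"artist=(.*)"), "Artist is"),
]

def describe(line):
    """Return a message for the first pattern found in line, else None."""
    for pattern, label in PATTERNS:
        # search() is unanchored, like matching against /.../ in Ruby.
        match = pattern.search(line)
        if match:
            return "%s %s" % (label, match.group(1))
    return None

print(describe("title=Blue Train"))
```

The match result is an explicit return value here, rather than an implicit variable such as $1.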

Sybren
--
Sybren Stüvel
Stüvel IT - http://www.stuvel.eu/

Sep 30 '06 #2

Jorge Godoy

Antoine De Groote <an*****@vo.lu> writes:

Hello,

Can anybody tell me the reason(s) why regular expressions are not built into
Python like it is the case with Ruby and I believe Perl? Like for example in
the following Ruby code

line = 'some string'

case line
when /title=(.*)/
  puts "Title is #$1"
when /track=(.*)/
  puts "Track is #$1"
when /artist=(.*)/
  puts "Artist is #$1"
end

I'm sure there are good reasons, but I just don't see them.

Python Culture says: 'Explicit is better than implicit'. May it be related to
this?

See if this isn't better to read:

================================================================================
def print_message(some_str):
    if some_str.startswith('track='):
        print "Your track is", some_str[6:]
    elif some_str.startswith('title='):
        print "It's a title of", some_str[6:]
    elif some_str.startswith('artist='):
        print "It was played by", some_str[7:]
    else:
        print "Oops, I dunno the pattern for this line..."
    return

for line in ['track="My favorite track"', 'title="My favorite song"',
             'artist="Those Dudes"', 'Something else']:
    print_message(line)
================================================================================

Expected output:

================================================================================
Your track is "My favorite track"
It's a title of "My favorite song"
It was played by "Those Dudes"
Oops, I dunno the pattern for this line...
================================================================================

I came from Perl and was used to thinking in regular expressions for
everything. Now I rarely use them. They aren't needed to solve most
problems.

--
Jorge Godoy <jg****@gmail.com>

Sep 30 '06 #3

John Roth

Antoine De Groote wrote:

Hello,

Can anybody tell me the reason(s) why regular expressions are not built
into Python like it is the case with Ruby and I believe Perl? Like for
example in the following Ruby code

line = 'some string'

case line
when /title=(.*)/
  puts "Title is #$1"
when /track=(.*)/
  puts "Track is #$1"
when /artist=(.*)/
  puts "Artist is #$1"
end

I'm sure there are good reasons, but I just don't see them.

Partially it's a matter of language design
philosophy, and partially it's a matter of
the early history of the language.

Guido tends toward a very clean, almost mathematical
minimalist approach: things should not be put into the core
language that duplicate other things. Larry Wall tends
toward what I think of as a "kitchen sink" approach.
Put it in!

The other is early history. Python started out as the
scripting language for an operating system research
project at a university. Perl started out as a language
for doing text manipulation in a working systems
administration environment.

There's a different issue that, I think, illustrates
this very nicely: text substitution. Python uses
the "%" operator for text substitution. I suspect
that Guido doesn't like it very much, because it
has recently grown a second library for text
substitution, and there's a PEP for 3.0 for yet
a third library. And guess what? Neither of them
uses an operator.

Python Culture says: 'Explicit is better than implicit'. May it be
related to this?

It's more "there should be one, and preferably
only one, obvious way to do something."

John Roth

Regards,
antoine

Sep 30 '06 #4

Mirco Wahab

Thus spoke Antoine De Groote (on 2006-09-30 11:24):

Can anybody tell me the reason(s) why regular expressions are not built
into Python like it is the case with Ruby and I believe Perl? Like for
example in the following Ruby code
I'm sure there are good reasons, but I just don't see them.
Python Culture says: 'Explicit is better than implicit'. May it be
related to this?

I think it is exactly because the /\b(\d+)\s+\/\//
together with $_ and $whatever=~/\/\/(?=\d+)/ are
seen as the 'line noise' everybody talks about ;-)

Regex as part of the core language expressions
makes code very very hard to understand for
newcomers, and because Python is ...

To invoke an additional cost in using Regexes,
the language simply prevents them in a lot of
situations. That's it. You have to think three
times before you use them once. In the end,
you solve the problem 'elsewise' because of
the pain invoked ;-)

Regards

Mirco

Sep 30 '06 #5

Mirco Wahab

Thus spoke Jorge Godoy (on 2006-09-30 14:37):

Antoine De Groote <an*****@vo.lu> writes:
>I'm sure there are good reasons, but I just don't see them.
Python Culture says: 'Explicit is better than implicit'. May it be related to
this?

See if this isn't better to read:

def print_message(some_str):
    if some_str.startswith('track='):
        print "Your track is", some_str[6:]
    elif some_str.startswith('title='):
        print "It's a title of", some_str[6:]
    elif some_str.startswith('artist='):
        print "It was played by", some_str[7:]
    else:
        print "Oops, I dunno the pattern for this line..."
    return

for line in ['track="My favorite track"', 'title="My favorite song"',
             'artist="Those Dudes"', 'Something else']:
    print_message(line)

I don't see the point here, this example can be
translated almost 1:1 to Perl and gets much more
readable in the end, consider:
sub print_message {
if (/^(track=)/ ){ print 'Your track is ' .substr($_, length $1)."\n" }
elsif(/^(title=)/ ){ print 'It\'s a title of '.substr($_, length $1)."\n" }
elsif(/^(artist=)/){ print 'It was played by '.substr($_, length $1)."\n" }
else { print "Oops, I dunno the pattern for this line...\n" }
}

print_message for ( 'track="My favorite track"', 'title="My favorite song"',
'artist="Those Dudes"', 'Something else' );
Now one could argue if simple Regexes like
/^track=/ are much worse compared to
more explicit formulations, like
str.startswith('track=')

I came from Perl and was used to think with regular expressions for
everything. Now I rarely use them. They aren't needed to solve
most of the problems.

OK, I do Perl and Python side by side and didn't reach
that point so far, maybe because I read the Friedl book
( http://www.oreilly.com/catalog/regex2/reviews.html )
sometimes and actually *like* the concept of regular expressions.
Regards

Mirco

Sep 30 '06 #6

Duncan Booth

Jorge Godoy <jg****@gmail.com> wrote:

See if this isn't better to read:

================================================================================
def print_message(some_str):
    if some_str.startswith('track='):
        print "Your track is", some_str[6:]
    elif some_str.startswith('title='):
        print "It's a title of", some_str[6:]
    elif some_str.startswith('artist='):
        print "It was played by", some_str[7:]
    else:
        print "Oops, I dunno the pattern for this line..."
    return

for line in ['track="My favorite track"', 'title="My favorite song"',
             'artist="Those Dudes"', 'Something else']:
    print_message(line)
================================================================================

Expected output:

Or you could make it even clearer by using a loop:

messages = [
    ('track=', 'Your track is'),
    ('title=', "It's a title of"),
    ('artist=', "It was played by"),
]

def print_message(s):
    for key, msg in messages:
        if s.startswith(key):
            print msg,s[len(key):]
            break
    else:
        print "Oops, I dunno the pattern for this line..."

for line in ['track="My favorite track"', 'title="My favorite song"',
             'artist="Those Dudes"', 'Something else']:
    print_message(line)

Sep 30 '06 #7

Steve Holden

Mirco Wahab wrote:

Thus spoke Antoine De Groote (on 2006-09-30 11:24):

>>Can anybody tell me the reason(s) why regular expressions are not built
into Python like it is the case with Ruby and I believe Perl? Like for
example in the following Ruby code
I'm sure there are good reasons, but I just don't see them.
Python Culture says: 'Explicit is better than implicit'. May it be
related to this?

I think it is exactly because the /\b(\d+)\s+\/\//
together with $_ and $whatever=~/\/\/(?=\d+)/ are
seen as the 'line noise' everybody talks about ;-)

Regex as part of the core language expressions
makes code very very hard to understand for
newcomers, and because Python is ...

To invoke an additional cost in using Regexes,
the language simply prevents them in a lot of
situations. That's it. You have to think three
times before you use them once. In the end,
you solve the problem 'elsewise' because of
the pain invoked ;-)

Tim Peters frequently says something along the lines of "If you have a
problem and you try to solve it with regexes, then you have TWO
problems". This isn't because the Python use of regexes is more
difficult than Perl's: it's because regexes themselves are inherently
difficult when the patterns get at all sophisticated.

If you think Perl gives better text-handling features nobody is going to
mind if you use Perl - appropriate choice of language is a sign of
maturity. Personally I tend to favour the way Python does it, but I
don't require that anyone else does.

The real answer to "why doesn't Python do it like Perl?" is "Because
Python's not like Perl".

regards
Steve
--
Steve Holden +44 150 684 7255 +1 800 494 3119
Holden Web LLC/Ltd http://www.holdenweb.com
Skype: holdenweb http://holdenweb.blogspot.com
Recent Ramblings http://del.icio.us/steve.holden

Sep 30 '06 #8

Jorge Godoy

Mirco Wahab <wa***@chemie.uni-halle.de> writes:

I don't see the point here, this example can be
translated almost 1:1 to Perl and gets much more
readable in the end, consider:

I could make it shorter in Python as well. But for a newbie who hasn't seen
the docs for strings in Python I thought the terse version would be more
interesting.

At least he'll see that there are methods to do what he wants already built
into the language.

sub print_message {
if (/^(track=)/ ){ print 'Your track is ' .substr($_, length $1)."\n" }
elsif(/^(title=)/ ){ print 'It\'s a title of '.substr($_, length $1)."\n" }
elsif(/^(artist=)/){ print 'It was played by '.substr($_, length $1)."\n" }
else { print "Oops, I dunno the pattern for this line...\n" }
}

print_message for ( 'track="My favorite track"', 'title="My favorite song"',
'artist="Those Dudes"', 'Something else' );

If I were writing in Perl I'd not use substr like this and would write code
similar to the one the OP posted (i.e., /^track=(.*)/).

OK, I do Perl and Python side by side and didn't reach
that point so far, maybe because I read the Friedl book
( http://www.oreilly.com/catalog/regex2/reviews.html )
sometimes and actually *like* the concept of regular expressions.

I like them as well. I just don't see the need to use them everywhere. :-)
--
Jorge Godoy <jg****@gmail.com>

Sep 30 '06 #9

Jorge Godoy

Duncan Booth <du**********@invalid.invalid> writes:

Or you could make it even clearer by using a loop:

Oops. Should have read your message before replying to Mirco. But again,
since the OP didn't read the docs for string operations I tried the terse
approach so that he actually sees the correspondence from Perl to Python.

Dictionaries (and hashes in Perl) are very powerful and solve very interesting
problems (especially when one is looking for something like a "case"
implementation in Python).
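
As a rough illustration of the dictionary-as-"case" idea, here is a small sketch in Python 3 syntax. The names MESSAGES and message_for are invented for this example; the keys and messages are the ones used earlier in the thread.

```python
# Hypothetical lookup table standing in for a "case" statement.
MESSAGES = {
    "track": "Your track is",
    "title": "It's a title of",
    "artist": "It was played by",
}

def message_for(line):
    """Split on the first '=' and look the key up in the dictionary."""
    key, sep, value = line.partition("=")
    # sep is empty when the line contains no '=' at all.
    if sep and key in MESSAGES:
        return "%s %s" % (MESSAGES[key], value)
    return "Oops, I dunno the pattern for this line..."

for line in ['track="My favorite track"', "Something else"]:
    print(message_for(line))
```

Adding a new kind of line then means adding one dictionary entry rather than another elif branch.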
--
Jorge Godoy <jg****@gmail.com>

Sep 30 '06 #10

Mirco Wahab

Thus spoke Jorge Godoy (on 2006-09-30 17:50):

Mirco Wahab <wa***@chemie.uni-halle.de> writes:

I could make it shorter in Python as well. But for a newbie who hasn't seen
the docs for strings in Python I thought the terse version would be more
interesting.

At least he'll see that there are methods to do what he wants already built
into the language.

>sub print_message {
if (/^(track=)/ ){ print 'Your track is ' .substr($_, length $1)."\n" }

....

If I were writing in Perl I'd not use substr like this and would write code
similar to the one the OP posted (i.e., /^track=(.*)/).

Right, I actually tried to match your example as close as I could
get and to use only *simple* Regexes (as stated below). What one
really would do (as you probably meant above) is sth. like:

sub print_message {
if (/^track="(.+?)"/ ){ print "Your track is $1\n" }
...

which has a "more complicated" regex that is usually
not understood easily by newbies.

>OK, I do Perl and Python side by side and didn't reach
that point so far, maybe because I read the Friedl book
( http://www.oreilly.com/catalog/regex2/reviews.html )
sometimes and actually *like* the concept of regular expressions.

I like them as well. I just don't see the need to use them everywhere. :-)

I like Python for its radically plain look, my
underlying feeling of Python is: "Pascal", whereas
Perl feels and tastes like "C" to me ;-)

Regards

Mirco

Sep 30 '06 #11

Jorge Godoy

Mirco Wahab <wa***@chemie.uni-halle.de> writes:

sub print_message {
if (/^track="(.+?)"/ ){ print "Your track is $1\n" }
...

which has a "more complicated" regex that is usually
not understood easily by newbies.

Especially the non-greedy part. :-) I don't believe that non-greediness would
be adequate here since I believe he wants to process the whole line.

===========================================================================
$line = "track='My favorite track'";
if ($line =~ /^track=(.+?)/) { print "My track is $1\n"};
===========================================================================

outputs

===========================================================================
My track is '
===========================================================================

While

===========================================================================
$line = "track='My favorite track'";
if ($line =~ /^track=(.+)/) { print "My track is $1\n"};
===========================================================================

outputs

===========================================================================
My track is 'My favorite track'
===========================================================================

and what I'd use

===========================================================================
$line = "track='My favorite track'";
if ($line =~ /^track=(.*)/) { print "My track is $1\n"};
===========================================================================

has the same output. ;-)

All this running perl 5.8.8.
Be seeing you,
--
Jorge Godoy <jg****@gmail.com>

Sep 30 '06 #12

Antoine De Groote

Just to get it clear at the beginning, I started this thread. I'm not a
newbie (don't get me wrong, I don't see this as an insult whatsoever,
after all, you couldn't know, and I appreciate it being so nice to
newbies anyway). I'm not an expert either, but I'm quite comfortable
with the language by now. It's just that, when I started Python I loved
it for its simplicity and for the small amount of code it takes to get
something done. So the idea behind my original post was that the
Perl/Ruby way takes even less to type (for the regex topic of this
discussion, I'm not generalizing), and that I like a lot. To me (and I
may be alone) the Perl/Ruby way is more "beautiful" (Python culture:
Beautiful is better than ugly) than the Python way (in this particular
case) and therefore I couldn't see the reasons.

Some of you say that this regex stuff is used rarely enough so that
being verbose (and therefore more readable?) is in these few cases the
better choice. To me this is perfectly reasonable and maybe it is just
true (as far as one can talk about true/false for something as subjective
as this). I don't know (yet) ;-)

I just have to learn to accept the fact that Python is more verbose more
often than Ruby (I don't know Perl really). Don't get me wrong though, I
know the benefits of this (at least in some cases) and I can understand
that one opts for it. Hopefully I will end up some day preferring the
Python way.

Thanks for your explanations.

Regards,
antoine

Sep 30 '06 #13

Jorge Godoy

Antoine De Groote <an*****@vo.lu> writes:

Just to get it clear at the beginning, I started this thread. I'm not a newbie

Sorry :-) I got to this wrong conclusion because of the way I read your
message.

an expert either, but I'm quite comfortable with the language by now. It's
just that, when I started Python I loved it for its simplicity and for the
small amount of code it takes to get something done. So the idea behind my

See that being "small" is not all that important. From the Zen of Python:

Explicit is better than implicit.

original post was that the Perl/Ruby way takes even less to type (for the
regex topic of this discussion, I'm not generalizing), and that I like a
lot. To me (and I may be alone) the Perl/Ruby way is more "beautiful" (Python
culture: Beautiful is better than ugly) than the Python way (in this
particular case) and therefore I couldn't see the reasons.

You can import the re module and use regular expressions in Python, but you
probably know that.

Some of you say that this regex stuff is used rarely enough so that being
verbose (and therefore more readable?) is in these few cases the better
choice. To me this is perfectly reasonable and maybe it is just true (as far as
one can talk about true/false for something as subjective as this). I don't know
(yet) ;-)

It is to me. :-) If you're parsing simple structures then it might not be to
you (for complex structures you'd end up with some sort of parser).

I just have to learn to accept the fact that Python is more verbose more often
than Ruby (I don't know Perl really). Don't get me wrong though, I know the
benefits of this (at least in some cases) and I can understand that one opts
for it. Hopefully I will end up some day preferring the Python way.

One thing that is also interesting: code completion. One editor can help you
write "startswith" but it can't help with "/^". The same goes for "endswith"
compared to "$/".

I mention this because the argument that "less code to write leads to
fewer bugs" overlooks that we do not necessarily type everything that is written :-)

--
Jorge Godoy <jg****@gmail.com>

Sep 30 '06 #14

Antoine De Groote

Jorge Godoy wrote:

Antoine De Groote <an*****@vo.lu> writes:

>Just to get it clear at the beginning, I started this thread. I'm not a newbie

Sorry :-) I got to this wrong conclusion because of the way I read your
message.

no problem ;-) maybe my formulation was a bit naive, too...

>an expert either, but I'm quite comfortable with the language by now. It's
just that, when I started Python I loved it for its simplicity and for the
small amount of code it takes to get something done. So the idea behind my

See that being "small" is not all that important. From the Zen of Python:

Explicit is better than implicit.

>original post was that the Perl/Ruby way takes even less to type (for the
regex topic of this discussion, I'm not generalizing), and that I like a
lot. To me (and I may be alone) the Perl/Ruby way is more "beautiful" (Python
culture: Beautiful is better than ugly) than the Python way (in this
particular case) and therefore I couldn't see the reasons.

You can import the re module and use regular expressions in Python, but you
probably know that.

yes I know that ... ;-) again

>Some of you say that this regex stuff is used rarely enough so that being
verbose (and therefore more readable?) is in these few cases the better
choice. To me this is perfectly reasonable and maybe it is just true (as far as
one can talk about true/false for something as subjective as this). I don't know
(yet) ;-)

It is to me. :-) If you're parsing simple structures then it might not be to
you (for complex structures you'd end up with some sort of parser).

>I just have to learn to accept the fact that Python is more verbose more often
than Ruby (I don't know Perl really). Don't get me wrong though, I know the
benefits of this (at least in some cases) and I can understand that one opts
for it. Hopefully I will end up some day preferring the Python way.

One thing that is also interesting: code completion. One editor can help you
write "startswith" but it can't help with "/^". The same goes for "endswith"
compared to "$/".

I mention this because the argument that "less code to write leads to
fewer bugs" overlooks that we do not necessarily type everything that is written :-)

Excellent point! Love it :-) Helps me overcome it.

Regards,
antoine

Sep 30 '06 #15

MRAB

Antoine De Groote wrote:

Just to get it clear at the beginning, I started this thread. I'm not a
newbie (don't get me wrong, I don't see this as an insult whatsoever,
after all, you couldn't know, and I appreciate it being so nice to
newbies anyway). I'm not an expert either, but I'm quite comfortable
with the language by now. It's just that, when I started Python I loved
it for its simplicity and for the small amount of code it takes to get
something done. So the idea behind my original post was that the
Perl/Ruby way takes even less to type (for the regex topic of this
discussion, I'm not generalizing), and that I like a lot. To me (and I
may be alone) the Perl/Ruby way is more "beautiful" (Python culture:
Beautiful is better than ugly) than the Python way (in this particular
case) and therefore I couldn't see the reasons.

Some of you say that this regex stuff is used rarely enough so that
being verbose (and therefore more readable?) is in these few cases the
better choice. To me this is perfectly reasonable and maybe it is just
true (as far as one can talk about true/false for something as subjective
as this). I don't know (yet) ;-)

I just have to learn to accept the fact that Python is more verbose more
often than Ruby (I don't know Perl really). Don't get me wrong though, I
know the benefits of this (at least in some cases) and I can understand
that one opts for it. Hopefully I will end up some day preferring the
Python way.

One of the differences between the Python way and the Perl way is that
the Perl way has a side-effect: Perl assigns to the variables $1, $2,
etc. each time you execute a regular expression whereas Python just
returns a match object for each, so it's not overwriting the results of
the previous one. I find the Python way cleaner.
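
A small sketch of the point about match objects, in Python 3 syntax; the sample strings are made up for illustration. Each call returns its own object, so a later match does not overwrite an earlier result.

```python
import re

first = re.match(r"track=(.*)", "track=So What")
second = re.match(r"title=(.*)", "title=Kind of Blue")

# Running the second match did not disturb the first result.
print(first.group(1))
print(second.group(1))
```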

Sep 30 '06 #16

Thorsten Kampe

* Steve Holden (Sat, 30 Sep 2006 17:46:03 +0100)

Mirco Wahab wrote:
Thus spoke Antoine De Groote (on 2006-09-30 11:24):
Tim Peters frequently says something along the lines of "If you have a
problem and you try to solve it with regexes, then you have TWO
problems".

It's originally from Jamie Zawinski:
'Some people, when confronted with a problem, think "I know, I'll use
regular expressions." Now they have two problems.'

And the simple reason why Regular Expressions are not a part of the
core Python language is that Regular Expressions are overrated. They
are simply not the preferred tool for every kind of text manipulation
and extraction.

Thorsten

Sep 30 '06 #17

Mirco Wahab

Thus spoke Jorge Godoy (on 2006-09-30 19:04):

Mirco Wahab <wa***@chemie.uni-halle.de> writes:
> sub print_message {
if (/^track="(.+?)"/ ){ print "Your track is $1\n" }
...

Especially the non-greedy part. :-) I don't believe that non-greediness would
be adequate here since I believe he's willing to process the whole line.

Ohh, but I think it really is, my intention was
to get the quoted text out of the quotes, if
there is any, e.g.:
sub print_message {
if (/^track="(.+?)"/ ){ print "Your track is $1\n" }
...
}

print_message for 'track="My favorite track"', 'title="My favorite song"',
'artist="Those Dudes"', 'Something else' ;

(... our quoting chars are just inverted.)

$line = "track='My favorite track'";
if ($line =~ /^track=(.+?)/) { print "My track is $1\n"};

outputs
My track is '

Of course, you can't have the nongreedy thing
without a following item, in the case mentioned,
a second \" (which would have been consumed
in the 'greedy' mode).

and what I'd use

$line = "track='My favorite track'";
if ($line =~ /^track=(.*)/) { print "My track is $1\n"};

OK, but to pull the quoted text alone, you'd
need the non-greedy thing, as in

...
if ( /^track='(.+?)'/ ){ print "Your track is $1\n" }
...

Alternatively, you could use the negated character class
for that:

if ( /^track='([^']+)/ ){ print "Your track is $1\n" }

which has exactly the same character count (so taste matters here) ...
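
The same three cases can be tried with Python's 're' module, whose non-greedy quantifiers behave the same way for these patterns. This is only a sketch in Python 3 syntax with the sample line from the Perl snippets; re.match anchors at the start of the string, so the leading ^ is not needed.

```python
import re

line = "track='My favorite track'"

# Non-greedy with nothing after it: matches as little as possible,
# which is a single character (the opening quote).
lone = re.match(r"track=(.+?)", line).group(1)

# Non-greedy followed by a closing quote: captures the quoted text.
lazy = re.match(r"track='(.+?)'", line).group(1)

# Negated character class: same capture for this input.
negated = re.match(r"track='([^']+)", line).group(1)

print(repr(lone))
print(lazy)
print(negated)
```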

Regards

Mirco

Sep 30 '06 #18

Mirco Wahab

Thus spoke MRAB (on 2006-09-30 20:54):

Antoine De Groote wrote:
>I just have to learn to accept the fact that Python is more verbose more
often than Ruby (I don't know Perl really).

One of the differences between the Python way and the Perl way is that
the Perl way has a side-effect: Perl assigns to the variables $1, $2,
etc. each time you execute a regular expression whereas Python just
returns a match object for each, so it's not overwriting the results of
the previous one. I find the Python way cleaner.

This statement reduces to the fact, that

- in Python you keep the matches if you want,
but drop them if you don't

import re
text = "this is 5 some 89 stuff 12 with numbers 21 interspersed"
matches = re.split('\D+', text)

- in Perl you keep the matches if you want,
but drop them if you don't

$text = "this is 5 some 89 stuff 12 with numbers 21 interspersed";
@matches = $text=~/(\d+)/g;

I fail to see the benefit of a re-object, I consider these to
be just there because regexes aren't in the core language.
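
One detail worth checking when comparing the two snippets: splitting on non-digits and collecting digit runs are not quite the same operation in Python. The sketch below (Python 3 syntax, raw strings added) shows the difference; re.findall is the closer counterpart of the Perl /(\d+)/g form.

```python
import re

text = "this is 5 some 89 stuff 12 with numbers 21 interspersed"

# Splitting on runs of non-digits leaves empty strings at both ends,
# because the text starts and ends with non-digit characters.
parts = re.split(r"\D+", text)

# findall returns only the digit runs.
numbers = re.findall(r"\d+", text)

print(parts)
print(numbers)
```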

Regards

Mirco

Sep 30 '06 #19

Marc 'BlackJack' Rintsch

In <ef**********@mlucom4.urz.uni-halle.de>, Mirco Wahab wrote:

I fail to see the benefit of a re-object, I consider these to
be just there because regexes aren't in the core language.

One benefit is that it's an object. You can stuff it into a dictionary or
give it as argument to a function.
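
A minimal sketch of what that can look like, using compiled pattern objects (Python 3 syntax). FIELD_PATTERNS and first_group are hypothetical names chosen for this example.

```python
import re

# Compiled patterns are ordinary objects, so they can be dictionary values.
FIELD_PATTERNS = {
    "track": re.compile(r"track=(.*)"),
    "title": re.compile(r"title=(.*)"),
}

def first_group(pattern, line):
    """Take a compiled pattern as an argument; return group 1 or None."""
    match = pattern.match(line)
    return match.group(1) if match else None

print(first_group(FIELD_PATTERNS["title"], "title=Kind of Blue"))
```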

Ciao,
Marc 'BlackJack' Rintsch

Sep 30 '06 #20

Steve Holden

Mirco Wahab wrote:

Thus spoke MRAB (on 2006-09-30 20:54):

>>Antoine De Groote wrote:

>>>I just have to learn to accept the fact that Python is more verbose more
often than Ruby (I don't know Perl really).

One of the differences between the Python way and the Perl way is that
the Perl way has a side-effect: Perl assigns to the variables $1, $2,
etc. each time you execute a regular expression whereas Python just
returns a match object for each, so it's not overwriting the results of
the previous one. I find the Python way cleaner.

This statement reduces to the fact, that

- in Python you keep the matches if you want,
but drop them if you don't

import re
text = "this is 5 some 89 stuff 12 with numbers 21 interspersed"
matches = re.split('\D+', text)

- in Perl you keep the matches if you want,
but drop them if you don't

$text = "this is 5 some 89 stuff 12 with numbers 21 interspersed";
@matches = $text=~/(\d+)/g;

I fail to see the benefit of a re-object, I consider these to
be just there because regexes aren't in the core language.

Fine. Just because you fail to see the benefit, however, that doesn't
mean there isn't one. Maybe we just aren't explaining it in terms you
can appreciate?

regards
Steve
--
Steve Holden +44 150 684 7255 +1 800 494 3119
Holden Web LLC/Ltd http://www.holdenweb.com
Skype: holdenweb http://holdenweb.blogspot.com
Recent Ramblings http://del.icio.us/steve.holden

Sep 30 '06 #21

Roy Smith

In article <pa***************************@gmx.net>,
Marc 'BlackJack' Rintsch <bj****@gmx.net> wrote:

In <ef**********@mlucom4.urz.uni-halle.de>, Mirco Wahab wrote:

I fail to see the benefit of a re-object, I consider these to
be just there because regexes aren't in the core language.

One benefit is that it's an object. You can stuff it into a dictionary or
give it as argument to a function.

More to the point:

You can stick it in a bottle, you can hold it in your hand, AMEN!

Sep 30 '06 #22

MonkeeSage

Roy Smith wrote:

More to the point:

You can stick it in a bottle, you can hold it in your hand, AMEN!

That's simply an implementation detail of perl regexps; it doesn't
really address the issue. For example, in ruby a regexp is an object.
Matches are also objects when one uses the String#match and
Regexp#match methods rather than the =~ operator:

s = 'file=blah.txt'
r = /file=(.*)/
puts r
# => (?-mix:file=(.*))

m = s.match(r)
puts m.inspect
# => #<MatchData:0xb7c24f90>

m = r.match(s)
puts m.inspect
# => #<MatchData:0xb7c24e64>

Regards,
Jordan

Oct 1 '06 #23

bearophileHUGS

Thorsten Kampe:

And the simple reason why Regular Expressions are not a part of the
core Python language is that Regular Expressions are overrated. They
are simply not the preferred tool for every kind of text manipulation
and extraction.

And their syntax is ugly and low level, they are difficult to read and
debug.
A way better syntax can probably be adopted (reverb and Pyparsing seem
much more pythonic).

Bye,
bearophile

Oct 1 '06 #24

Jorgen Grahn

On Sat, 30 Sep 2006 20:01:57 +0100, Thorsten Kampe <th******@thorstenkampe.de> wrote:
....

It's originally from Jamie Zawinski:
'Some people, when confronted with a problem, think "I know, I'll use
regular expressions." Now they have two problems.'

And the simple reason why Regular Expressions are not a part of the
core Python language is that Regular Expressions are overrated.

It seems to me they are currently /underrated/ in the Python community. Or,
I suspect, everybody disrespects them in public but secretly uses them when
hacking ;-)

They are simply not the preferred tool for every kind of text manipulation
and extraction.

Oh yes, agreed there. str.split, str.startswith, substr in str ... take you
a long way without a single backslash. I use them more frequently these
days; for example, I would have solved the original poster's problem in that
way.
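
A minimal sketch of that approach for the original poster's title/track/artist lines, using only string methods. The function name describe_line and the label table are illustrative assumptions, not code from this thread, and the example is written for Python 3 (print as a function):

```python
# Sketch only: describe_line() and LABELS are hypothetical names.
LABELS = {'title': 'Title', 'track': 'Track', 'artist': 'Artist'}

def describe_line(line):
    """Return a message for a 'key=value' line, or None for anything else."""
    # str.partition splits on the first '=' and never raises;
    # sep is the empty string when no '=' is present.
    key, sep, value = line.partition('=')
    if sep and key in LABELS:
        return '%s is %s' % (LABELS[key], value)
    return None

print(describe_line('title=Blue Monday'))
```

Unlike the Ruby patterns, which match anywhere in the line, this only recognises a key at the very start of the line.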

However, there is a set of common problems which would be hell to solve
without regexes. I used to do that stuff in C when I was young and stupid;
now I would never go back to hand-coded, buggy loops just for the sake of
avoiding regexes.

Possibly I have more need of regexes than some others, because I do a lot of
traditional Unix programming, which is heavy on text processing and
text-based mini-languages.

/Jorgen

--
// Jorgen Grahn <grahn@ Ph'nglui mglw'nafh Cthulhu
\X/ snipabacken.dyndns.org R'lyeh wgah'nagl fhtagn!

Oct 1 '06 #25

Jorgen Grahn

On Sat, 30 Sep 2006 11:24:56 +0200, Antoine De Groote <an*****@vo.lu> wrote:

Hello,

Can anybody tell me the reason(s) why regular expressions are not built
into Python like it is the case with Ruby and I believe Perl?

In a way, there /is/ builtin support -- regexes would be much more painful
to use if you didn't have raw ('regex-quoted') string literals.

r'\s*(\d+),\s*(\d+)\s='
'\\s*(\\d+),\\s*(\\d+)\\s='
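
A quick check that the two spellings above really are the same pattern. The sample input string is made up for illustration, and the snippet is written for Python 3:

```python
import re

raw = r'\s*(\d+),\s*(\d+)\s='
escaped = '\\s*(\\d+),\\s*(\\d+)\\s='

# The raw literal and the doubled-backslash literal hold identical text.
assert raw == escaped

# Hypothetical input: two comma-separated numbers followed by ' ='.
m = re.match(raw, ' 12, 34 =')
print(m.groups())
```

The raw form is simply easier to read and harder to get wrong.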

/Jorgen

--
// Jorgen Grahn <grahn@ Ph'nglui mglw'nafh Cthulhu
\X/ snipabacken.dyndns.org R'lyeh wgah'nagl fhtagn!

Oct 1 '06 #26

Kay Schluehr

Antoine De Groote wrote:

Hello,

Can anybody tell me the reason(s) why regular expressions are not built
into Python like it is the case with Ruby and I believe Perl? Like for
example in the following Ruby code

line = 'some string'

case line
when /title=(.*)/
puts "Title is #$1"
when /track=(.*)/
puts "Track is #$1"
when /artist=(.*)/
puts "Artist is #$1"
end

I'm sure there are good reasons, but I just don't see them.

Python Culture says: 'Explicit is better than implicit'. May it be
related to this?

Regards,
antoine

I notice two issues here. Only one has anything to do with regular
expressions. The other concerns 'explicit is better than implicit': the
many implicit passing operations of Ruby's case statement. Using
pseudo-Python notation this could be refactored into a more explicit
and not much less concise style:

if line.match( "title=(.*)" ) as m:
    print "Title is %s"%m.group(1)
elif line.match( "track=(.*)" ) as m:
    print "Track is %s"%m.group(1)
elif line.match( "artist=(.*)" ) as m:
    print "Artist is %s"%m.group(1)

Here the result of the test line.match( ) is assigned to the local
variable m if bool(line.match( )) == True. Later m can be used in the
subsequent block. Moreover match becomes a string method. No need for
extra importing re and applying re.compile(). Both can be done in
str.match() if necessary.

Kay

Oct 1 '06 #27

Mirco Wahab

Thus spoke Steve Holden (on 2006-09-30 23:58):

Mirco Wahab wrote:

Thus spoke MRAB (on 2006-09-30 20:54):

One of the differences between the Python way and the Perl way is that
the Perl way has a side-effect: ...

I fail to see the benefit of a re-object, I consider these to
be just there because regexes aren't in the core language.

Fine. Just because you fail to see the benefit, however, that doesn't
mean there isn't one. Maybe we just aren't explaining it in terms you
can appreciate?

OK, maybe my communication style is bad or
somehow 'Perl newsgroup-hardened' ;-)

Of course, I'd like to really understand
these things (as they were intended), which will
probably change how I look at programming concepts.

Therefore, I'd like to have a usable and understandable
example of Regex-object juggling, that shows clearly
what its real benefit is (or gives an idea of - ).

Afterwards, I'll surely try to deconstruct
your assumptions hidden there in order to
*get a better understanding* for myself ;-)

Regards and thanks

Mirco

Oct 1 '06 #28

MonkeeSage

Mirco Wahab wrote:

Therefore, I'd like to have a usable and understandable
example of Regex-object juggling, that shows clearly
what its real benefit is (or gives an idea of - ).

Here are some benefits:

DRY - You can assign a regexp to a variable and pass it around or call
specific instance methods on it. Thus...
Overhead - You don't need to keep compiling the same expression if you
need to use it several times, you can just assign it to a variable and
reference that.
Clarity - named methods are clearer than operators. ( .match vs. =~ ).
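
A rough Python sketch of the first two points. FILE_RE and first_groups are hypothetical names used only for illustration, and the snippet is written for Python 3:

```python
import re

# Compile once; the pattern object can be stored, passed around and reused.
FILE_RE = re.compile(r'file=(.*)')

def first_groups(pattern, lines):
    """Return group(1) for every line the given compiled pattern matches."""
    results = []
    for line in lines:
        m = pattern.match(line)  # named method on the pattern object
        if m:
            results.append(m.group(1))
    return results

print(first_groups(FILE_RE, ['file=blah.txt', 'nothing here', 'file=a.py']))
```

Because the pattern is an ordinary object, the same function works unchanged with any other compiled expression passed in.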

Regards,
Jordan

Oct 1 '06 #29

bearophileHUGS

Jorgen Grahn:

However, there is a set of common problems which would be hell to solve
without regexes.

I agree, and I think no one is thinking about dropping REs from the Python
stdlib.
Here people are mostly discussing whether it is better to have them
in the stdlib or inside the core language, and what their syntax
should look like.

Bye,
bearophile

Oct 1 '06 #30

Paddy

Antoine De Groote wrote:

Hello,

Can anybody tell me the reason(s) why regular expressions are not built
into Python like it is the case with Ruby and I believe Perl? Like for
example in the following Ruby code

line = 'some string'

case line
when /title=(.*)/
puts "Title is #$1"
when /track=(.*)/
puts "Track is #$1"
when /artist=(.*)/
puts "Artist is #$1"
end

I'm sure there are good reasons, but I just don't see them.

I'd say that Ruby took a lot of its early design cues from Perl. If
you want to attract Perl programmers then they expect such things.
Python has an emphasis on clarity/readability and on maintainability.
It has other string methods that are much easier to read and maintain
than regular expressions.
There is of course a class of problem well suited to regular
expressions but it is easy for people coming to Python from Perl or AWK
to rely too heavily on RE's.

Personally, my problem with REs is that there are inadequate debug
capabilities for them. The best way I have found is to shove some
example text into Kodos and incrementally refine my RE by viewing the
RE's results on the example text. REs also tend to look like line
noise, even when you use the x switch and use whitespace+comments.
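
For reference, the x switch corresponds to re.VERBOSE (re.X) in Python's re module. A small sketch of what that looks like; the pattern and the name KEY_VALUE are made up for illustration, and the snippet is written for Python 3:

```python
import re

# In verbose mode, whitespace in the pattern is ignored and '#' starts a comment.
KEY_VALUE = re.compile(r"""
    ^(title|track|artist)   # group 1: one of the known keys
    =                       # literal separator
    (.*)$                   # group 2: the rest of the line
    """, re.VERBOSE)

m = KEY_VALUE.match('artist=New Order')
print(m.group(1), m.group(2))
```

Whether this counts as readable is exactly the judgment call being argued over here.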

Python Culture says: 'Explicit is better than implicit'. May it be
related to this?

Well, by putting RE's into the language I presume you also mean doing
the equivalent of setting group variables? That would be a 'magic' side
effect unless you explicitly assigned them, e.g:

person, craving = r"My\s(\S+)likes\s(\S+)" ~~ text
# using ~~ for an RE match operator

The more readable I try and make it though, the more I appreciate the
Python way of doing things.
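
For comparison, a sketch of the same explicit assignment with the existing re module. The sample text is invented, and the pattern adds a \s before "likes" (an assumption about what was intended) so that it matches ordinary spacing:

```python
import re

text = "My cat likes fish"  # hypothetical sample input

# The match object is explicit, and so is the unpacking of its groups.
m = re.search(r"My\s(\S+)\slikes\s(\S+)", text)
if m:
    person, craving = m.groups()
```

Nothing is set behind the programmer's back: if the search fails, m is None and no names are bound.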

- Pad.

Oct 1 '06 #31

Fredrik Lundh

Steve Holden wrote:

Fine. Just because you fail to see the benefit, however, that doesn't
mean there isn't one.

the real reason is of course, as Richard Feynman famously observed, that
in Python, "everything is made of objects".

</F>

Oct 1 '06 #32

Max M

Jorgen Grahn skrev:

On Sat, 30 Sep 2006 20:01:57 +0100, Thorsten Kampe <th******@thorstenkampe.de> wrote:

And the simple reason why Regular Expressions are not a part of the
core Python language is that Regular Expressions are overrated.

It seems to me they are currently /underrated/ in the Python community. Or,
I suspect, everybody disrespects them in public but secretly uses them when
they're hacking ;-)

When I used to program in Perl I used regexes for lots of stuff. In
Python I probably use them once every half year. I simply do not need them.

Max M

Oct 1 '06 #33

MonkeeSage

Max M wrote:

When I used to program in Perl I used regexes for lots of stuff. In
Python I probably use them once every half year. I simply do not need them.

I think you can pretty much do it either way without any big benefits /
losses. There are edge-cases that will break a parser just like there
are ones that will break a regexp. In perl5 you can slice strings ( my
$s="Cat in a tree"; ${s:0:3} == 'Cat' ) and other things like that, but
I seldom see those used except for very trivial cases (i.e., the
seeming reverse of the python practice). But this hasn't caused some
kind of huge epidemic of non-working programs / libraries in the perl
world. If you understand how to use regexps and they are easier for
you, there is no reason not to use them. On the other hand, if they
make it harder for you, don't use them.

Regards,
Jordan

Oct 2 '06 #34

Nick Craig-Wood

Kay Schluehr <ka**********@gmx.net> wrote:

I notice two issues here. Only one has anything to do with regular
expressions. The other concerns 'explicit is better than implicit': the
many implicit passing operations of Ruby's case statement. Using
pseudo-Python notation this could be refactored into a more explicit
and not much less concise style:

if line.match( "title=(.*)" ) as m:
    print "Title is %s"%m.group(1)
elif line.match( "track=(.*)" ) as m:
    print "Track is %s"%m.group(1)
elif line.match( "artist=(.*)" ) as m:
    print "Artist is %s"%m.group(1)

Here the result of the test line.match( ) is assigned to the local
variable m if bool(line.match( )) == True. Later m can be used in the
subsequent block.

Interesting!

This is exactly the area where (for me) Python regexps become
clunkier than Perl's: not being able to assign and test the match
object in one line.

This leads to the rather wordy

m = re.match("title=(.*)", line)
if m:
    print "Title is %s" % m.group(1)
else:
    m = re.match("track=(.*)", line)
    if m:
        print "Track is %s" % m.group(1)
    else:
        m = re.match("artist=(.*)", line)
        if m:
            print "Artist is %s" % m.group(1)

If you could write

if re.match("title=(.*)", line) as m:
    print "Title is %s" % m.group(1)
elif re.match("track=(.*)", line) as m:
    print "Track is %s" % m.group(1)
elif re.match("artist=(.*)", line) as m:
    print "Artist is %s" % m.group(1)

that would be a significant benefit.
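
Python 3.8 and later support assignment expressions (the := operator, PEP 572), which allow something close to this form. A sketch, with describe_line as an illustrative name that is not from this thread:

```python
import re

def describe_line(line):
    """Assign and test the match object in one step (needs Python 3.8+)."""
    if m := re.match(r"title=(.*)", line):
        return "Title is %s" % m.group(1)
    elif m := re.match(r"track=(.*)", line):
        return "Track is %s" % m.group(1)
    elif m := re.match(r"artist=(.*)", line):
        return "Artist is %s" % m.group(1)
    return None

print(describe_line("track=7"))
```

The spelling differs from the "as m" form above, but the flat if/elif chain is the same.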

You can of course define a helper class like this

import re

class Matcher:
    """Regexp matcher helper"""
    def match(self, r, s):
        """Do a regular expression match and return the match object (or None)."""
        self.value = re.match(r, s)
        return self.value
    def __getitem__(self, n):
        """Return n'th matched () item."""
        return self.value.group(n)

Which makes this bit really quite neat

m = Matcher()
if m.match("title=(.*)", line):
    print "Title is %s" % m[1]
elif m.match("track=(.*)", line):
    print "Track is %s" % m[1]
elif m.match("artist=(.*)", line):
    print "Artist is %s" % m[1]

Moreover match becomes a string method. No need for extra importing
re and applying re.compile(). Both can be done in str.match() if
necessary.

I'm happy with the re module. Having transitioned from Perl to Python
some time ago now, I find myself using far fewer regexps due to the
much better built-in string methods of Python. This is a good thing,
because regexps should be used sparingly and they do degenerate into
line noise quite quickly...

--
Nick Craig-Wood <ni**@craig-wood.com> -- http://www.craig-wood.com/nick

Oct 2 '06 #35
