Python's regular expression?

Hi all,

I am a C/C++/Perl user and want to switch to Python (I found Python is
more similar to C).

Does Python support robust regular expressions like Perl?

And for file content manipulation, which is better, Python or Perl?

Any suggestions will be appreciated!
Best regards,
Davy

May 8 '06 #1
"Davy" <zh*******@gmail.com> writes:
Does Python support robust regular expression like Perl?


Yep, Python's regular expressions are robust. Have a look at the Regex Howto:
http://www.amk.ca/python/howto/regex/ and the re module:
http://docs.python.org/lib/module-re.html

--
Lawrence - http://www.oluyede.org/blog
"Nothing is more dangerous than an idea
if it's the only one you have" - E. A. Chartier
May 8 '06 #2
Hi Davy wrote:
I am a C/C++/Perl user and want to switch to Python
OK
(I found Python is more similar to C).
;-) More similar than what?
Does Python support robust regular expression like Perl?
It supports them fairly well, but they are
not 'integrated' - at least they don't feel
integrated to me ;-) If you did a lot of
Perl, you know what 'integrated' means ...
And Python and Perl's File content manipulation, which is better?
What is a 'file content manipulation'?
Did you mean 'good xxx level file IO',
where xxx means either 'low' or 'high'?
Any suggestions will be appreciated!


Just try to start a small project in Python -
from source that you already have in C or Perl
or something.
Regards

Mirco
May 8 '06 #3
Hi Mirco,

Thank you!

More similar than Perl ;-)

And what does 'integrated' mean (must include some library)?

I like C++ file I/O, is it 'low' or 'high'?

Regards,
Davy

May 8 '06 #4
By the way, is there a tutorial on how to use the Python
shell (IDE)? I hope it is as simple as VC++ :)

Regards,
Davy

May 8 '06 #5
Hi Davy
More similar than Perl ;-)
But C has { }'s everywhere, so has Perl ;-)
And what's 'integrated' mean (must include some library)?
Yes. In Python, regular expressions are just
another function library - you use them like
in Java or C.

In Perl, they are part of the core language; you
use the awk-style (e.g. /.../) regular expressions
everywhere you want.

If you used regexp in C/C++ before, you can use them
in almost the same way in Python - which may give you
an easy start.

BTW. Python has some fine extensions to the
perl(5)-Regexes, e.g. 'named backreferences'.
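
As a small illustration of the named groups mentioned above, here is a sketch in Python 3 syntax. The sample text and the group name `word` are made up for the example: `(?P<word>...)` names a group, and `(?P=word)` refers back to whatever that group matched.

```python
import re

# (?P<word>...) names the group; (?P=word) must match the same text again.
doubled = re.compile(r'\b(?P<word>\w+)\s+(?P=word)\b')

text = 'it lacks the the readable quality'
m = doubled.search(text)
if m:
    print(m.group('word'))   # the word that appears twice in a row
```

The same group can still be reached by number (`m.group(1)`), so naming is optional; it mostly helps once a pattern has more than a couple of groups.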

But you won't see many regular expressions
in Python code posted to this group, maybe
because they look clunky - which is unpythonic ;-)

Let's see - a really simple find/match
would look like this in Python:

import re

t = 'blue socks and red shoes'
p = re.compile('(blue|white|red)')
if p.match(t):
    print t

which prints the text 't' because of
the positive pattern match.

In Perl, you write:

use Acme::Pythonic;

$t = 'blue socks and red shoes'
if ($t =~ /(blue|white|red)/):
    print $t

which is one line shorter (no need
to compile the regular expression
in advance).
I like C++ file I/O, is it 'low' or 'high'?


C++ has afaik actually three levels of I/O:

(1) - (from C, very low) operating system level, included
by <io.h> (or <unistd.h> on POSIX systems), which provides direct
access to operating system services (read(), write(), lseek() etc.)

(2) - C-Standard-Library buffered IO, included by <stdio.h>,
provides structured 'mid-level' access like (block-) fread()/
fwrite(), line read (fgets()) and formatted I/O (fprintf()/
fscanf())

(3) - C++/streams library (high level, <fstream>, <iostream>, <sstream>),
which abstracts out the i/o devices, provides the same set of
functionality for any abstract input or output.

Perl provides all three levels of I/O, the 'abstracting' is introduced
by modules which tie 'handle variables' to anything that may receive
or send data.

Python also does a good job on all three levels, but provides
the (low level) operating system I/O by external modules (afaik).
I didn't do much I/O in Python, so I can't say much here.
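
For comparison, the three levels could look roughly like this in Python. This is only a sketch in Python 3 syntax: the mapping onto the C/C++ levels is loose, and the file contents are invented for the example. The `os` module used for the lowest level ships with the standard library.

```python
import io
import os
import tempfile

# Scratch file so the example cleans up after itself.
fd, path = tempfile.mkstemp()

# (1) Operating-system level: raw file descriptors and bytes.
os.write(fd, b'blue socks\nred shoes\n')
os.close(fd)

# (2) Buffered file objects, roughly comparable to <stdio.h>.
with open(path) as f:
    lines = f.readlines()

# (3) Stream abstraction: an in-memory object with the same interface as a file.
buf = io.StringIO('blue socks\nred shoes\n')
same = buf.readlines()

os.remove(path)
print(lines == same)
```

Code written against the file-object interface in (2) usually works unchanged with the in-memory stream in (3), which is the closest analogue to the C++ streams idea.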

Regards

Mirco
May 8 '06 #6
On 8/05/2006 10:31 PM, Mirco Wahab wrote:
[snip]

Lets see - a really simple find/match
would look like this in Python:

import re

t = 'blue socks and red shoes'
p = re.compile('(blue|white|red)')
if p.match(t):

What do you expect when t == "green socks and red shoes"? Is it possible
that you mean to use search() rather than match()?

    print t

which prints the text 't' because of
the positive pattern match.

In Perl, you write:

use Acme::Pythonic;

$t = 'blue socks and red shoes'
if ($t =~ /(blue|white|red)/):
    print $t

which is one line shorter (no need
to compile the regular expression
in advance).


There is no need to compile the regex in advance in Python, either.
Please consider the module-level function search() ...
if re.search(r"blue|white|red", t):
    # also, no need for () in the regex.
    print t
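
To make the question about "green socks" concrete, here is a short sketch (Python 3 syntax; the variable names are only illustrative) that runs both functions on both strings:

```python
import re

pattern = r'blue|white|red'
blue = 'blue socks and red shoes'
green = 'green socks and red shoes'

# match() only tries the pattern at the beginning of the string.
match_blue = re.match(pattern, blue)       # matches 'blue' at position 0
match_green = re.match(pattern, green)     # None: the string starts with 'green'

# search() scans forward until the pattern matches somewhere.
search_green = re.search(pattern, green)   # finds 'red' further along

print(match_blue.group(), match_green, search_green.group())
```

So the original example only printed its text because the string happened to begin with a colour.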

May 8 '06 #7
Mirco Wahab wrote:
Lets see - a really simple find/match
would look like this in Python:

import re

t = 'blue socks and red shoes'
p = re.compile('(blue|white|red)')
if p.match(t):
    print t

which prints the text 't' because of
the positive pattern match.

In Perl, you write:

use Acme::Pythonic;

$t = 'blue socks and red shoes'
if ($t =~ /(blue|white|red)/):
    print $t

which is one line shorter (no need
to compile the regular expression
in advance).


There is no need to compile the regular expression in advance in Python
either:

t = 'blue socks and red shoes'
if re.match('(blue|white|red)', t):
    print t

The only advantage to compiling in advance is a small speed up, and most of
the time that won't be significant.
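
A minimal side-by-side sketch of the two styles (Python 3 syntax; the name `colour` is just for illustration). As far as the documentation describes it, the `re` module also keeps a cache of recently compiled patterns, which is part of why the difference is usually small:

```python
import re

t = 'blue socks and red shoes'

# Module-level function: the pattern string is compiled behind the scenes.
direct = re.search(r'blue|white|red', t)

# Precompiled pattern object: same result, handy when a pattern is reused a lot.
colour = re.compile(r'blue|white|red')
precompiled = colour.search(t)

print(direct.group() == precompiled.group())
```

Precompiling is mostly worth it for readability (giving the pattern a name) or inside tight loops.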
May 8 '06 #8
Hi John
import re

t = 'blue socks and red shoes'
p = re.compile('(blue|white|red)')
if p.match(t):
What do you expect when t == "green socks and red shoes"? Is it possible
that you mean to use search() rather than match()?


This is interesting.
What, then, is the difference in this example between:

import re

t = 'blue socks and red shoes'
if re.compile('blue|white|red').match(t):
    print t

and

t = 'blue socks and red shoes'
if re.search('blue|white|red', t):
    print t

There is no need to compile the regex in advance in Python, either.
Please consider the module-level function search() ...
if re.search(r"blue|white|red", t):
    # also, no need for () in the regex.


That's true. Thank you for pointing this out.
But what would be an appropriate use
of search() vs. match()? When to use what?

I answered the posting in the first place
because I'm also coming from a C/C++/Perl
background and trying to get along with Python.

Thanks,

Mirco

May 8 '06 #9
Hi Duncan
There is no need to compile the regular expression in advance in Python
either:
...
The only advantage to compiling in advance is a small speed up, and most of
the time that won't be significant.


I read 'some' introductions to Python regexes
and was confused at first about when to use
what and why.

After some minutes in this NG I am starting to get
the picture. So I narrowed the above regex question
down to a nice equivalence between Perl and Python:

Python:

import re

t = 'blue socks and red shoes'
if re.match('blue|white|red', t):
    print t

t = 'blue socks and red shoes'
if re.search('blue|white|red', t):
    print t

Perl:

use Acme::Pythonic;

$t = 'blue socks and red shoes'
if $t =~ /blue|white|red/:
    print $t

And Python Regexes eventually lost (for me) some of
their (what I believed) 'clunky appearance' ;-)

Thanks

Mirco
May 8 '06 #10
On 8/05/2006 11:13 PM, Mirco Wahab wrote:
Hi John
import re

t = 'blue socks and red shoes'
p = re.compile('(blue|white|red)')
if p.match(t):
What do you expect when t == "green socks and red shoes"? Is it possible
that you mean to use search() rather than match()?


This is interesting.
What's in this example the difference then between:


I suggest that you (a) read the description on the difference between
search and match in the manual (b) try out search and match on both
your original string and the one I proposed.

import re

t = 'blue socks and red shoes'
if re.compile('blue|white|red').match(t):
    print t

and

t = 'blue socks and red shoes'
if re.search('blue|white|red', t):
    print t

[snip]
But what would be an appropriate use
of search() vs. match()? When to use what?


ReadTheFantasticManual :-)

May 8 '06 #11
Hi John
But what would be an appropriate use
of search() vs. match()? When to use what?


ReadTheFantasticManual :-)


From the manual you mentioned, I don't get
the point of 'match'. So why should you use
an extra function entry match(),

re.match('whatever', t):

which is, according to the FM,
equivalent to (a special case of?)

re.search('^whatever', t):

For me, it looks like match() should
be used on simple string comparisons
like a 'ramped up C-strcmp()'.

Or isn't it? Maybe I don't get it ;-)
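
One place where the two forms are not interchangeable is multi-line text. The sketch below (Python 3 syntax, with an invented two-line string) shows that under `re.MULTILINE` a `'^'` in `search()` can match at the start of any line, while `match()` still only looks at the start of the whole string:

```python
import re

t = 'green socks\nred shoes'

# Without flags, both forms only look at the very start of the string.
a = re.match('red', t)                   # None
b = re.search('^red', t)                 # None

# With re.MULTILINE, '^' also matches after each newline, but match() does not.
c = re.match('red', t, re.MULTILINE)     # still None
d = re.search('^red', t, re.MULTILINE)   # finds 'red' on the second line

print(a, b, c, d.group())
```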

Thanks

Mirco
May 8 '06 #12
Mirco Wahab <pe*********************@gmx.de> wrote:
After some minutes in this NG I start to get
the picture. So I narrowed the above regex-question
down to a nice equivalence between Perl and Python:

Python:

import re

t = 'blue socks and red shoes'
if re.match('blue|white|red', t):
    print t

t = 'blue socks and red shoes'
if re.search('blue|white|red', t):
    print t

Perl:

use Acme::Pythonic;

$t = 'blue socks and red shoes'
if $t =~ /blue|white|red/:
    print $t

And Python Regexes eventually lost (for me) some of
their (what I believed) 'clunky appearance' ;-)


If you are used to perl regexes there is one clunkiness of python
regexpes which you'll notice eventually...

Let's make the above example a bit more "real world", ie use the
matched item in some way...

Perl:

$t = 'blue socks and red shoes';
if ( $t =~ /(blue|white|red)/ )
{
    print "Colour: $1\n";
}

Which prints

Colour: blue

In python you have to express this like

import re

t = 'blue socks and red shoes'
match = re.search('(blue|white|red)', t)
if match:
    print "Colour:", match.group(1)

Note the extra variable "match". You can't do assignment in an
expression in Python, which makes for the extra verbosity, and you
need a variable to store the result of the match in (since Python
doesn't have the magic $1..$9 variables).

This becomes particularly frustrating when you have to do a series of
regexp matches, eg

if ( $t =~ /(blue|white|red)/ )
{
    print "Colour: $1\n";
}
elsif ( $t =~ /(socks|tights)/)
{
    print "Garment: $1\n";
}
elsif ( $t =~ /(boot|shoe|trainer)/)
{
    print "Footwear: $1\n";
}

Which translates to

match = re.search('(blue|white|red)', t)
if match:
    print "Colour:", match.group(1)
else:
    match = re.search('(socks|tights)', t)
    if match:
        print "Garment:", match.group(1)
    else:
        match = re.search('(boot|shoe|trainer)', t)
        if match:
            print "Footwear:", match.group(1)
        # indented ad infinitum!

You can use a helper class to get over this frustration like this

import re

class Matcher:
    def search(self, r,s):
        self.value = re.search(r,s)
        return self.value
    def __getitem__(self, i):
        return self.value.group(i)

m = Matcher()
t = 'blue socks and red shoes'

if m.search(r'(blue|white|red)', t):
    print "Colour:", m[1]
elif m.search(r'(socks|tights)', t):
    print "Garment:", m[1]
elif m.search(r'(boot|shoe|trainer)', t):
    print "Footwear:", m[1]
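
For readers on a newer interpreter: Python 3.8 added assignment expressions (the `:=` operator), which did not exist when this thread was written. Assuming such a version, the chain can be sketched without a helper class; the function name `classify` is hypothetical and only there to make the example self-contained.

```python
import re

def classify(t):
    """Return a (label, text) pair for the first pattern that matches, or None."""
    if m := re.search(r'(blue|white|red)', t):
        return 'Colour', m.group(1)
    elif m := re.search(r'(socks|tights)', t):
        return 'Garment', m.group(1)
    elif m := re.search(r'(boot|shoe|trainer)', t):
        return 'Footwear', m.group(1)
    return None

print(classify('blue socks and red shoes'))
```

On older versions the helper-class approach above remains the way to keep the `elif` chain flat.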

Having made the transition from perl to python a couple of years ago,
I find myself using regexpes much less. In perl everything looks like
it needs a regexp, but python has a much richer set of string methods,
eg .startswith, .endswith, good subscripting and the nice "in"
operator for strings.
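
A few of those string operations, sketched in Python 3 syntax next to the regex each one roughly stands in for (the comparisons are approximate, not exact equivalences):

```python
t = 'blue socks and red shoes'

starts_blue = t.startswith('blue')     # roughly re.match('blue', t)
ends_shoes = t.endswith('shoes')       # roughly re.search('shoes$', t)
has_red = 'red' in t                   # roughly re.search('red', t)
first_word = t.split()[0]              # splitting on whitespace
tail = t[-5:]                          # subscripting: the last five characters

print(starts_blue, ends_shoes, has_red, first_word, tail)
```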

--
Nick Craig-Wood <ni**@craig-wood.com> -- http://www.craig-wood.com/nick
May 8 '06 #13
Nick Craig-Wood wrote:
Which translates to

match = re.search('(blue|white|red)', t)
if match:
    print "Colour:", match.group(1)
else:
    match = re.search('(socks|tights)', t)
    if match:
        print "Garment:", match.group(1)
    else:
        match = re.search('(boot|shoe|trainer)', t)
        if match:
            print "Footwear:", match.group(1)
        # indented ad infinitum!


This of course gives priority to colours and only looks for garments or
footwear if it hasn't matched on a prior pattern. If you actually
wanted to match the first occurrence of any of these (or if the condition
was re.match instead of re.search) then named groups can be a nice way of
simplifying the code:

PATTERN = '''
(?P<c>blue|white|red)
| (?P<g>socks|tights)
| (?P<f>boot|shoe|trainer)
'''
PATTERN = re.compile(PATTERN, re.VERBOSE)
TITLES = { 'c': 'Colour', 'g': 'Garment', 'f': 'Footwear' }

match = PATTERN.search(t)
if match:
    grp = match.lastgroup
    print "%s: %s" % (TITLES[grp], match.group(grp))

For something this simple the titles and group names could be the same, but
I'm assuming real code might need a bit more.
May 8 '06 #14
Hi Duncan
Nick Craig-Wood wrote:
Which translates to
match = re.search('(blue|white|red)', t)
if match:
else:
if match:
else:
if match:
This of course gives priority to colours and only looks for garments or
footwear if the it hasn't matched on a prior pattern. If you actually
wanted to match the first occurrence of any of these (or if the condition
was re.match instead of re.search) then named groups can be a nice way of
simplifying the code:


A good point. And a good example of when to use named
capture group references. This is easily extended
for 'spitting out' all other occurring categories
(see below).
PATTERN = '''
(?P<c>blue|white|red)
...
This is one nice thing in Python's regex syntax;
you have to emulate the ?P-thing in other
regex systems more or less 'awk'-wardly ;-)
For something this simple the titles and group names could be the
same, but I'm assuming real code might need a bit more.

No, no, this is quite good because it involves
some math-generated table-code lookup.

I managed somehow to extend your example in order
to spit out all matches and their corresponding
category:

import re

PATTERN = '''
(?P<c>blue |white |red )
| (?P<g>socks|tights )
| (?P<f>boot |shoe |trainer)
'''

PATTERN = re.compile(PATTERN , re.VERBOSE)
TITLES = { 'c': 'Colour', 'g': 'Garment', 'f': 'Footwear' }

t = 'blue socks and red shoes'
for match in PATTERN.finditer(t):
    grp = match.lastgroup
    print "%s: %s" %( TITLES[grp], match.group(grp) )

which writes out the expected:
Colour: blue
Garment: socks
Colour: red
Footwear: shoe

The corresponding Perl-program would look like this:

$PATTERN = qr/
(blue |white |red )(?{'c'})
| (socks|tights )(?{'g'})
| (boot |shoe |trainer)(?{'f'})
/x;

%TITLES = (c =>'Colour', g =>'Garment', f =>'Footwear');

$t = 'blue socks and red shoes';
print "$TITLES{$^R}: $^N\n" while( $t=~/$PATTERN/g );

and prints the same:
Colour: blue
Garment: socks
Colour: red
Footwear: shoe

You don't have nice named match references (?P<..>)
in Perl 5, so you have to emulate this by an ordinary
code assertion (?{..}) and set some value ($^R) on
the fly - which is not that bad in the end (imho).

(?{..}) means "zero-width code assertion";
this sets the Perl-predefined $^R to the value
evaluated from the {...}

As you can see, the pattern matching related part
reduces from 4 lines to one line.

If you didn't need the dictionary lookup and
could get away with the categories written directly
into the pattern, all you'd have to do would be this:

$PATTERN = qr/
(blue |white |red )(?{'Colour'})
| (socks|tights )(?{'Garment'})
| (boot |shoe |trainer)(?{'Footwear'})
/x;

$t = 'blue socks and red shoes';
print "$^R: $^N\n" while( $t=~/$PATTERN/g );

What's the point of all that? IMHO, Python's
Regex support is quite good and useful, but
won't give you an edge over Perl's in the end.

Thanks & Regards

Mirco

May 10 '06 #15
Davy wrote:
Hi all,
(snip) Does Python support robust regular expression like Perl?
Yes.
And Python and Perl's File content manipulation, which is better?
From a raw perf and write-only POV, Perl clearly beats Python (regarding
I/O, Perl is faster than C - or at least it was the last time I benched
it on a Linux box).

From a readability/maintenance POV, Perl is a perfect nightmare.
Any suggestions will be appreciated!


http://pythonology.org/success&story=esr
--
bruno desthuilliers
python -c "print '@'.join(['.'.join([w[::-1] for w in p.split('.')]) for
p in 'o****@xiludom.gro'.split('@')])"
May 10 '06 #16
Mirco Wahab wrote:
If you wouldn't need dictionary lookup and
get away with associated categories, all
you'd have to do would be this:

$PATTERN = qr/
(blue |white |red )(?{'Colour'})
| (socks|tights )(?{'Garment'})
| (boot |shoe |trainer)(?{'Footwear'})
/x;

$t = 'blue socks and red shoes';
print "$^R: $^N\n" while( $t=~/$PATTERN/g );

What's the point of all that? IMHO, Python's
Regex support is quite good and useful, but
won't give you an edge over Perl's in the end.


If you are desperate to collapse the code down to a single print statement
you can do that easily in Python as well:

PATTERN = ''' (?P<Colour>blue |white |red)
| (?P<Garment>socks|tights)
| (?P<Footwear>boot |shoe |trainer)
'''
t = 'blue socks and red shoes'
print '\n'.join("%s:%s" % (match.lastgroup,
                           match.group(match.lastgroup))
                for match in re.finditer(PATTERN, t, re.VERBOSE))

Colour:blue
Garment:socks
Colour:red
Footwear:shoe
May 10 '06 #17
bruno at modulix wrote:
From a readability/maintenance POV, Perl is a perfect nightmare.


It's certainly true that perl lacks the eminently readable quality of
python. But then so do C, C++, Java, and a lot of other languages.

And I'll grant you that perl is more susceptible to the 'executable
line-noise' style than most other languages. This results from its
heritage as a quick-and-dirty awk/sed type text processing language.

But perl doesn't *have* to look that way, and not every perl program is a
'perfect nightmare'. If you follow good practices like turning on strict
checking, using readable variable names, avoiding $_, etc, you can produce
pretty readable and maintainable code. It takes some discipline, but it's
very doable. I've worked with some perl programs for over 5 years without
any trouble. About the only thing you can't avoid are the sigils
everywhere.

Would I recommend perl for readable, maintainable code? No, not when better
options like Python are available. But it can be done with some effort.

May 10 '06 #18
On Wed, 10 May 2006 06:44:27 GMT in comp.lang.python, Edward Elliott
<no****@127.0.0.1> wrote:


Would I recommend perl for readable, maintainable code? No, not when better
options like Python are available. But it can be done with some effort.


I'm reminded of a comment made a few years ago by John Levine,
moderator of comp.compilers. He said something like "It's clearly
possible to write good code in C++. It's just that no one does."

Regards,
-=Dave

--
Change is inevitable, progress is not.
May 10 '06 #19
Dave Hansen wrote:
On Wed, 10 May 2006 06:44:27 GMT in comp.lang.python, Edward Elliott
<no****@127.0.0.1> wrote:


Would I recommend perl for readable, maintainable code? No, not
when better options like Python are available. But it can be done
with some effort.


I'm reminded of a comment made a few years ago by John Levine,
moderator of comp.compilers. He said something like "It's clearly
possible to write good code in C++. It's just that no one does."


Reminds me of the quote that used to appear on the front page of the
ViewCVS project (seems to have gone now that they've moved and renamed
themselves to ViewVC). Can't recall the attribution off the top of my
head:

"[Perl] combines the power of C with the readability of PostScript"

Scathing ... but very funny :-)
Dave.

--

May 10 '06 #20
