regular expression: perl ==> python

Hi,
I am so used to Perl's regular expressions that I find it hard
to memorize the functions in Python, so I would appreciate it if
people could tell me some equivalents.

1) In Perl:
$line = "The food is under the bar in the barn.";
if ( $line =~ /foo(.*)bar/ ) { print "got <$1>\n"; }

In Python, I don't know how to do this.
How does one capture the $1? (I know it is \1, but it is still not clear
how I can simply print it.)
Thanks

Jul 18 '05 #1
le*******@yahoo.com wrote:

1) In perl:
$line = "The food is under the bar in the barn.";
if ( $line =~ /foo(.*)bar/ ) { print "got <$1>\n"; }

in python, I don't know how I can do this?


I don't know Perl very well, but I believe this is more or less the
equivalent:
import re
line = "The food is under the bar in the barn."
matcher = re.compile(r'foo(.*)bar')
match = matcher.search(line)
print 'got <%s>' % match.group(1)

got <d is under the bar in the >

Of course, you can do this in fewer lines if you like:
print 'got <%s>' % re.search(r'foo(.*bar)', line).group(1)

got <d is under the bar in the bar>
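
The first output above ends just before "barn" because `.*` is greedy: it takes as much text as it can, so the match runs up to the last "bar" in the line. Here is a small sketch of that, written for Python 3 (so `print` is a function, unlike the Python 2 snippets in this thread). The non-greedy `.*?` variant is an addition for comparison and is not used in the posts above.

```python
import re

line = "The food is under the bar in the barn."

# Greedy: .* grabs as much as it can, so the match ends at the LAST "bar".
greedy = re.search(r"foo(.*)bar", line)
# Non-greedy: .*? stops at the FIRST "bar" that lets the pattern succeed.
lazy = re.search(r"foo(.*?)bar", line)

print("got <%s>" % greedy.group(1))  # got <d is under the bar in the >
print("got <%s>" % lazy.group(1))    # got <d is under the >
```

Perl treats `.*` and `.*?` the same way, so the Perl one-liner from the question prints the greedy result too.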

Steve
Jul 18 '05 #2
<le*******@yahoo.com> wrote:
i am so use to perl's regular expression that i find it hard
to memorize the functions in python; so i would appreciate if
people can tell me some equivalents.

1) In perl:
$line = "The food is under the bar in the barn.";
if ( $line =~ /foo(.*)bar/ ) { print "got <$1>\n"; }

in python, I don't know how I can do this?
How does one capture the $1? (I know it is \1 but it is still not clear
how I can simply print it.


in Python, the RE machinery returns match objects, which have methods
that let you dig out more information about the match. "captured groups"
are available via the "group" method:

m = re.search(..., line)
if m:
    print "got", m.group(1)

see the regex howto (or the RE chapter in the library reference) for more
information:

http://www.amk.ca/python/howto/regex/
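
To make the match-object idea concrete, here is a minimal sketch in Python 3 syntax. The group name `middle` is made up for illustration; everything else uses the pattern and line from the question.

```python
import re

line = "The food is under the bar in the barn."
m = re.search(r"foo(?P<middle>.*)bar", line)

# re.search returns None when nothing matches, so test before using it.
if m:
    whole = m.group(0)            # the entire matched text
    first = m.group(1)            # first captured group, like Perl's $1
    by_name = m.group("middle")   # the same group, looked up by name
    everything = m.groups()       # tuple of all captured groups
    print("got <%s>" % first)

no_match = re.search(r"xyzzy", line)   # nothing matches, so this is None
```

Calling `.group(1)` directly on the result of `re.search` raises `AttributeError` whenever there is no match, which is why the `if m:` test matters.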

</F>

Jul 18 '05 #3
JZ
On 21 Dec 2004 21:12:09 -0800, le*******@yahoo.com wrote:
1) In perl:
$line = "The food is under the bar in the barn.";
if ( $line =~ /foo(.*)bar/ ) { print "got <$1>\n"; }

in python, I don't know how I can do this?
How does one capture the $1? (I know it is \1 but it is still not clear
how I can simply print it.
thanks


import re
line = "The food is under the bar in the barn."
if re.search(r'foo(.*)bar',line):
    print 'got %s\n' % _.group(1)

--
JZ ICQ:6712522
http://zabiello.om
Jul 18 '05 #4
"JZ" <wn******@mnovryyb.pbz> wrote:
import re
line = "The food is under the bar in the barn."
if re.search(r'foo(.*)bar',line):
    print 'got %s\n' % _.group(1)

Traceback (most recent call last):
  File "jz.py", line 4, in ?
    print 'got %s\n' % _.group(1)
NameError: name '_' is not defined

</F>

Jul 18 '05 #5
Fredrik Lundh wrote:
"JZ" <wn******@mnovryyb.pbz> wrote:

import re
line = "The food is under the bar in the barn."
if re.search(r'foo(.*)bar',line):
    print 'got %s\n' % _.group(1)

Traceback (most recent call last):
  File "jz.py", line 4, in ?
    print 'got %s\n' % _.group(1)
NameError: name '_' is not defined


He was using the python interactive prompt, which I suspect you already
knew.
Jul 18 '05 #6
JZ
On Wed, 22 Dec 2004 10:27:39 +0100, Fredrik Lundh wrote:
import re
line = "The food is under the bar in the barn."
if re.search(r'foo(.*)bar',line):
    print 'got %s\n' % _.group(1)

Traceback (most recent call last):
  File "jz.py", line 4, in ?
    print 'got %s\n' % _.group(1)
NameError: name '_' is not defined


I forgot to add: I am using Python 2.3.4/Win32 (from ActiveState.com). The
code works in my interpreter.

--
JZ
Jul 18 '05 #7
"JZ" wrote:
import re
line = "The food is under the bar in the barn."
if re.search(r'foo(.*)bar',line):
    print 'got %s\n' % _.group(1)

Traceback (most recent call last):
  File "jz.py", line 4, in ?
    print 'got %s\n' % _.group(1)
NameError: name '_' is not defined


I forgot to add: I am using Python 2.3.4/Win32 (from ActiveState.com). The
code works in my interpreter.


only if you type it into the interactive prompt. see:

http://www.python.org/doc/2.4/tut/no...00000000000000

"In interactive mode, the last printed expression is assigned to the variable _.
This means that when you are using Python as a desk calculator, it is
somewhat easier to continue calculations /.../"

the "_" symbol has no special meaning when you run a Python program, so the
"if re.search" construct won't work.

</F>

Jul 18 '05 #8
JZ
On Wed, 22 Dec 2004 16:55:55 +0100, Fredrik Lundh wrote:
the "_" symbol has no special meaning when you run a Python program,


That's right. So the final code will be:

import re
line = "The food is under the bar in the barn."
found = re.search('foo(.*)bar',line)
if found: print 'got %s\n' % found.group(1)

--
JZ ICQ:6712522
http://zabiello.com
Jul 18 '05 #9
> 1) In perl:
$line = "The food is under the bar in the barn.";
if ( $line =~ /foo(.*)bar/ ) { print "got <$1>\n"; }

in python, I don't know how I can do this?
How does one capture the $1? (I know it is \1 but it is still not clear
how I can simply print it.
thanks

Fredrik Lundh <fr*****@pythonware.com> wrote:
"JZ" <wn******@mnovryyb.pbz> wrote:
import re
line = "The food is under the bar in the barn."
if re.search(r'foo(.*)bar',line):
    print 'got %s\n' % _.group(1)

Traceback (most recent call last):
  File "jz.py", line 4, in ?
    print 'got %s\n' % _.group(1)
NameError: name '_' is not defined


I've found that a slight irritation in python compared to perl - the
fact that you need to create a match object (rather than relying on
the silver thread of $_ (etc) running through your program ;-)

import re
line = "The food is under the bar in the barn."
m = re.search(r'foo(.*)bar',line)
if m:
    print 'got %s\n' % m.group(1)

This becomes particularly irritating when using if, elif etc, to
match a series of regexps, eg

line = "123123"
m = re.search(r'^(\d+)$', line)
if m:
    print "int",int(m.group(1))
else:
    m = re.search(r'^(\d*\.\d*)$', line)
    if m:
        print "float",float(m.group(1))
    else:
        print "unknown thing", line

The indentation keeps growing which looks rather untidy compared to
the perl

$line = "123123";
if ($line =~ /^(\d+)$/) {
    print "int $1\n";
}
elsif ($line =~ /^(\d*\.\d*)$/) {
    print "float $1\n";
}
else {
    print "unknown thing $line\n";
}

Is there an easy way round this? AFAIK you can't assign a variable in
a compound statement, so you can't use elif at all here and hence the
problem?

I suppose you could use a monstrosity like this, which relies on the
fact that list.append() returns None...

line = "123123"
m = []
if m.append(re.search(r'^(\d+)$', line)) or m[-1]:
    print "int",int(m[-1].group(1))
elif m.append(re.search(r'^(\d*\.\d*)$', line)) or m[-1]:
    print "float",float(m[-1].group(1))
else:
    print "unknown thing", line
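
For readers on a much later Python: version 3.8 added assignment expressions (`:=`), which allow the if/elif chain asked about here without the list trick. This is a sketch that assumes Python 3.8 or newer, so it would not have run on the interpreters discussed in this thread. The function name `classify` is made up for illustration.

```python
import re

def classify(line):
    # := binds the match object to m and also yields it as the condition value
    if (m := re.search(r"^(\d+)$", line)):
        return ("int", int(m.group(1)))
    elif (m := re.search(r"^(\d*\.\d*)$", line)):
        return ("float", float(m.group(1)))
    else:
        return ("unknown thing", line)

print(classify("123123"))
print(classify("123.5"))
print(classify("abc"))
```

The patterns are the ones from the post above, so a lone "." still matches the second pattern and then fails inside `float()`.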

--
Nick Craig-Wood <ni**@craig-wood.com> -- http://www.craig-wood.com/nick
Jul 18 '05 #10
Nick Craig-Wood wrote:
I've found that a slight irritation in python compared to perl - the
fact that you need to create a match object (rather than relying on
the silver thread of $_ (etc) running through your program ;-)

the old "regex" engine associated the match with the pattern, but that
approach isn't thread safe...

line = "123123"
m = re.search(r'^(\d+)$', line)
if m:
    print "int",int(m.group(1))
else:
    m = re.search(r'^(\d*\.\d*)$', line)
    if m:
        print "float",float(m.group(1))
    else:
        print "unknown thing", line


that's not a very efficient way to match multiple patterns, though. a
much better way is to combine the patterns into a single one, and use
the "lastindex" attribute to figure out which one matched. see

http://effbot.org/zone/xml-scanner.htm

for more on this topic.
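
A rough sketch of the combined-pattern idea, applied to the int/float example from earlier in the thread. This is not the code from the linked page. The names `number`, `converters` and `convert` are made up for illustration, and the float alternative uses `\d+\.\d*` so that a lone "." is rejected. It is written in Python 3 syntax.

```python
import re

# One pattern, one capturing group per alternative.
number = re.compile(r"^(?:(\d+)|(\d+\.\d*))$")

# Maps the index of the group that matched to the conversion to apply.
converters = {1: int, 2: float}

def convert(line):
    m = number.match(line)
    if m is None:
        return None
    # lastindex is the index of the last capturing group that matched;
    # with non-nested alternatives that identifies the alternative used.
    return converters[m.lastindex](m.group(m.lastindex))

print(convert("123123"))  # 123123
print(convert("123.5"))   # 123.5
print(convert("abc"))     # None
```

The line is scanned by a single compiled pattern, and adding another kind of token means adding one alternative and one dictionary entry.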

</F>

Jul 18 '05 #11

Fredrik Lundh wrote:
"JZ" wrote:
> import re
> line = "The food is under the bar in the barn."
> if re.search(r'foo(.*)bar',line):
>     print 'got %s\n' % _.group(1)

Traceback (most recent call last):
  File "jz.py", line 4, in ?
    print 'got %s\n' % _.group(1)
NameError: name '_' is not defined
I forgot to add: I am using Python 2.3.4/Win32 (from ActiveState.com). The
code works in my interpreter.


only if you type it into the interactive prompt. see:


No, it doesn't work at all, anywhere. Did you actually try this?

http://www.python.org/doc/2.4/tut/no...00000000000000
"In interactive mode, the last printed expression is assigned to the variable _. This means that when you are using Python as a desk calculator, it is somewhat easier to continue calculations /.../"


In the 3 lines that are executed before the exception, there are *no*
printed expressions.

Python 2.4 (#60, Nov 30 2004, 11:49:19) [MSC v.1310 32 bit (Intel)] on
win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import re
>>> line = "The food is under the bar in the barn."
>>> if re.search(r'foo(.*)bar',line):
...     print 'got %s\n' % _.group(1)
...
Traceback (most recent call last):
  File "<stdin>", line 2, in ?
NameError: name '_' is not defined


Jul 18 '05 #12
John Machin wrote:
> I forgot to add: I am using Python 2.3.4/Win32 (from ActiveState.com). The
> code works in my interpreter.


only if you type it into the interactive prompt. see:


No, it doesn't work at all, anywhere. Did you actually try this?


the OP claims that it works in his ActiveState install (PythonWin?). maybe he
played with re.search before typing in the commands he quoted; maybe
PythonWin contains some extra hacks?

as I've illustrated earlier, it definitely doesn't work in a script executed by a standard
Python...

</F>

Jul 18 '05 #13

Fredrik Lundh wrote:
John Machin wrote:
> I forgot to add: I am using Python 2.3.4/Win32 (from ActiveState.com). The
> code works in my interpreter.

only if you type it into the interactive prompt. see:
No, it doesn't work at all, anywhere. Did you actually try this?


the OP claims that it works in his ActiveState install (PythonWin?).

maybe he played with re.search before typing in the commands he quoted; maybe PythonWin contains some extra hacks?

as I've illustrated earlier, it definitely doesn't work in a script executed by a standard Python...

</F>


It is quite possible that the OP played with re.search before
typing in the commands he quoted; however *you* claimed that it [his
quoted commands] worked "only if you type it into the interactive
prompt". It doesn't work, in the unqualified sense that I understood.

Anyway, enough of punch-ups about how many dunces can angle on the hat
of a pun -- I did appreciate your other posting about multiple patterns
and "lastindex"; thanks.

Jul 18 '05 #14
On 22 Dec 2004 17:30:04 GMT, Nick Craig-Wood <ni**@craig-wood.com> wrote:
Is there an easy way round this? AFAIK you can't assign a variable in
a compound statement, so you can't use elif at all here and hence the
problem?

I suppose you could use a monstrosity like this, which relies on the
fact that list.append() returns None...

line = "123123"
m = []
if m.append(re.search(r'^(\d+)$', line)) or m[-1]:
    print "int",int(m[-1].group(1))
elif m.append(re.search(r'^(\d*\.\d*)$', line)) or m[-1]:
    print "float",float(m[-1].group(1))
else:
    print "unknown thing", line


I wrote a scanner for a recursive descent parser a while back. This is
the pattern I used for handling multiple regexps, instead of using an
if/elif/else chain.

import re

# (compiled pattern, conversion) pairs, tried in order
patterns = [
    (re.compile(r'^(\d+)$'),int),
    (re.compile(r'^(\d+\.\d*)$'),float),
]

def convert(s):
    for regexp, action in patterns:
        m = regexp.match(s)
        if not m:
            continue
        return action(m.group(1))
    raise ValueError, "Invalid input %r, was not a numeric string" % (s,)

if __name__ == '__main__':
    tests = [ ("123123",123123), ("123.123",123.123), ("123.",123.) ]
    for input, expected in tests:
        assert convert(input) == expected

    # check each invalid input separately, so one failure cannot hide another
    for bad in ('', 'abc'):
        try:
            convert(bad)
        except ValueError:
            pass
        else:
            assert None,"Should Raise on invalid input"

Of course, I wrote the tests first. I used your regexp's but I was
confused as to why you were always using .group(1), but decided to
leave it. I would probably actually send the entire match object to
the action. Using something like:
    (re.compile('^(\d+)$'),lambda m:int(m.group(1))),
and
    return action(m)

but lambdas are going out of fashion. :(
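
Spelled out, the variant that hands the whole match object to the action could look roughly like this. It is a sketch in Python 3 syntax (hence `raise ValueError(...)`), reusing the names `patterns` and `convert` from the code above. The float pattern here has no capturing group, to show that each action decides for itself what to read from the match.

```python
import re

# Each action receives the match object and picks out what it needs.
patterns = [
    (re.compile(r"^(\d+)$"), lambda m: int(m.group(1))),
    (re.compile(r"^\d+\.\d*$"), lambda m: float(m.group(0))),
]

def convert(s):
    for regexp, action in patterns:
        m = regexp.match(s)
        if m:
            return action(m)
    raise ValueError("Invalid input %r, was not a numeric string" % (s,))

print(convert("123123"))  # 123123
print(convert("123."))    # 123.0
```

Named functions work just as well as lambdas in the table, if lambdas do go out of fashion.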

Stephen Thorne
Jul 18 '05 #15
Fredrik Lundh <fr*****@pythonware.com> wrote:
that's not a very efficient way to match multiple patterns, though. a
much better way is to combine the patterns into a single one, and use
the "lastindex" attribute to figure out which one that matched.

lastindex is useful, yes.

see

http://effbot.org/zone/xml-scanner.htm

for more on this topic.


I take your point. However I don't find the below very readable -
making 5 small regexps into 1 big one, plus a game of count the
brackets doesn't strike me as a huge win...

xml = re.compile(r"""
    <([/?!]?\w+)          # 1. tags
    |&(\#?\w+);           # 2. entities
    |([^<>&'\"=\s]+)      # 3. text strings (no special characters)
    |(\s+)                # 4. whitespace
    |(.)                  # 5. special characters
    """, re.VERBOSE)

It's probably faster though, so I give in gracelessly ;-)

--
Nick Craig-Wood <ni**@craig-wood.com> -- http://www.craig-wood.com/nick
Jul 18 '05 #16
Nick Craig-Wood wrote:
I take your point. However I don't find the below very readable -
making 5 small regexps into 1 big one, plus a game of count the
brackets doesn't strike me as a huge win...


if you're doing that a lot, you might wish to create a helper function.

the undocumented sre.Scanner provides a ready-made mechanism for this
kind of RE matching; see

http://aspn.activestate.com/ASPN/Mai...on-dev/1614344

for some discussion.

here's (a slight variation of) the code example they're talking about:

import sre

# each callback receives the scanner and the matched text
def s_ident(scanner, token): return token
def s_operator(scanner, token): return "op%s" % token
def s_float(scanner, token): return float(token)
def s_int(scanner, token): return int(token)

scanner = sre.Scanner([
    (r"[a-zA-Z_]\w*", s_ident),
    (r"\d+\.\d*", s_float),
    (r"\d+", s_int),
    (r"=|\+|-|\*|/", s_operator),
    (r"\s+", None),     # None means: match and discard
    ])
print scanner.scan("sum = 3*foo + 312.50 + bar")

(['sum', 'op=', 3, 'op*', 'foo', 'op+', 312.5, 'op+', 'bar'], '')
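
The `sre` module name belongs to the Python 2 era. In Python 3 the same undocumented class is reachable as `re.Scanner`. Because it is undocumented it may change without notice, so treat the following as a sketch, not a supported API. The second call is an addition that shows what comes back when the input contains something no pattern matches.

```python
import re

def s_ident(scanner, token): return token
def s_operator(scanner, token): return "op%s" % token
def s_float(scanner, token): return float(token)
def s_int(scanner, token): return int(token)

scanner = re.Scanner([
    (r"[a-zA-Z_]\w*", s_ident),
    (r"\d+\.\d*", s_float),
    (r"\d+", s_int),
    (r"=|\+|-|\*|/", s_operator),
    (r"\s+", None),          # None means: match and discard
])

# scan() returns (list of tokens, unscanned remainder of the input)
tokens, remainder = scanner.scan("sum = 3*foo + 312.50 + bar")
print(tokens)      # ['sum', 'op=', 3, 'op*', 'foo', 'op+', 312.5, 'op+', 'bar']

# Scanning stops at the first character no pattern matches.
partial, rest = scanner.scan("x = 1 $ 2")
print(partial)     # ['x', 'op=', 1]
print(repr(rest))  # '$ 2'
```

A non-empty remainder is the only sign of a lexing failure, so it is worth checking after each scan.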

</F>

Jul 18 '05 #17
Fredrik Lundh <fr*****@pythonware.com> wrote:
the undocumented sre.Scanner provides a ready-made mechanism for this
kind of RE matching; see

http://aspn.activestate.com/ASPN/Mai...on-dev/1614344

for some discussion.

here's (a slight variation of) the code example they're talking about:

def s_ident(scanner, token): return token
def s_operator(scanner, token): return "op%s" % token
def s_float(scanner, token): return float(token)
def s_int(scanner, token): return int(token)

scanner = sre.Scanner([
    (r"[a-zA-Z_]\w*", s_ident),
    (r"\d+\.\d*", s_float),
    (r"\d+", s_int),
    (r"=|\+|-|\*|/", s_operator),
    (r"\s+", None),
    ])
>>> print scanner.scan("sum = 3*foo + 312.50 + bar")

(['sum', 'op=', 3, 'op*', 'foo', 'op+', 312.5, 'op+', 'bar'], '')


That is very cool - exactly the kind of problem I come across quite
often!

I've found the online documentation (using pydoc) for re / sre in
general to be a bit lacking.

For instance nowhere in

pydoc sre

does it tell you what methods a match object has (or even what type it
is). To find this out you have to look at the HTML documentation.
This is probably what Windows people look at by default but Unix
hackers like me expect everything (or at least a hint) to be in the
man/pydoc pages.
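
One workaround that stays in the terminal is to ask a live match object about itself. This is a small sketch in Python 3 syntax; the exact type name printed and the exact list of attributes depend on the Python version, so no output is shown. The variable name `public` is made up for illustration.

```python
import re

m = re.search(r"foo(.*)bar", "The food is under the bar in the barn.")

# The concrete type of the match object.
print(type(m))

# Public methods and attributes, found by introspection rather than pydoc.
public = [name for name in dir(m) if not name.startswith("_")]
print(public)
```

Calling `help(m)` at the interactive prompt shows the same names together with whatever docstrings the installed version provides.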

Just noticed in pydoc2.4 a new section

MODULE DOCS
http://www.python.org/doc/current/lib/module-sre.html

Which is at least a hint that you are looking in the wrong place!
....however that page doesn't exist ;-)

--
Nick Craig-Wood <ni**@craig-wood.com> -- http://www.craig-wood.com/nick
Jul 18 '05 #18
