f*cking re module

jwaixs

arg... I've lost 1.5 hours of my precious time trying to get re to work
correctly. There's really not a single good re tutorial or documentation
I could find! There are only references, and if you don't know how a
module works you won't learn it from a reference!

This is the problem:

import re
str = "blabla<python>Re modules sucks!</python>blabla"
re.search("(<python>)(/python>)", str).group()

Traceback (most recent call last):
File "<stdin>", line 1, in ?
AttributeError: 'NoneType' object has no attribute 'group'

the only thing I want are the number of places blabla, Re modules
sucks! and blabla are.

Noud

Jul 21 '05 #1


Erik Max Francis

jwaixs wrote:

arg... I've lost 1.5 hours of my precious time trying to get re to work
correctly. There's really not a single good re tutorial or documentation
I could find! There are only references, and if you don't know how a
module works you won't learn it from a reference!

Then Google for regular expression tutorials, not regular expression
references.

import re
str = "blabla<python>Re modules sucks!</python>blabla"
re.search("(<python>)(/python>)", str).group()

Traceback (most recent call last):
File "<stdin>", line 1, in ?
AttributeError: 'NoneType' object has no attribute 'group'

the only thing I want are the number of places blabla, Re modules
sucks! and blabla are.

Your question is still not clear. What you're searching for is
'<python></python>', which isn't there, so .search returns None, and so
you get that exception (.group takes an argument, anyway).
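A minimal Python 3 sketch of the point above, using the string from the original post: check the result of re.search for None before calling .group(). (For what it's worth, in Python's re calling .group() with no argument returns the whole match, the same as .group(0).)

```python
import re

s = "blabla<python>Re modules sucks!</python>blabla"

# "(<python>)(/python>)" requires "<python>" to be followed immediately by
# "/python>", which never happens in s, so re.search returns None.
m = re.search(r"(<python>)(/python>)", s)

if m is None:
    print("no match")
else:
    # .group() with no argument returns the whole match, like .group(0)
    print(m.group())
```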

--
Erik Max Francis && ma*@alcyone.com && http://www.alcyone.com/max/
San Jose, CA, USA && 37 20 N 121 53 W && AIM erikmaxfrancis
With such a weapon I could boil the Earth to vapor.
-- Chmeee

Jul 21 '05 #2

Simon Brunning

On 4 Jul 2005 01:04:47 -0700, jwaixs <jw****@gmail.com> wrote:

arg... I've lost 1.5 hours of my precious time trying to get re to work
correctly. There's really not a single good re tutorial or documentation
I could find!

http://www.amk.ca/python/howto/regex/

--
Cheers,
Simon B,
si***@brunningonline.net,
http://www.brunningonline.net/simon/blog/

Jul 21 '05 #3

Matthias Huening

jwaixs (04.07.2005 10:04):

arg... I've lost 1.5 hours of my precious time trying to get re to work
correctly. There's really not a single good re tutorial or documentation
I could find!

Did you try this one?
http://www.amk.ca/python/howto/regex/regex.html

import re
str = "blabla<python>Re modules sucks!</python>blabla"
re.search("(<python>)(/python>)", str).group()
Traceback (most recent call last):
File "<stdin>", line 1, in ?
AttributeError: 'NoneType' object has no attribute 'group'

RE doesn't find "(<python>)(/python>)" because it isn't in your string.
That's why group fails.

import re
s = "blabla<python>Re module is great!</python>blabla"
re.search("(<python>).*(/python>)", s).group()

'<python>Re module is great!</python>'
Matthias

Jul 21 '05 #4

Max M

jwaixs wrote:

arg... I've lost 1.5 hours of my precious time trying to get re to work
correctly. There's really not a single good re tutorial or documentation
I could find! There are only references, and if you don't know how a
module works you won't learn it from a reference!

If you want to try out re interactively you could use:

<Python>\Tools\Scripts\redemo.py
--

hilsen/regards Max M, Denmark

http://www.mxm.dk/
IT's Mad Science

Jul 21 '05 #5

Diez B. Roggisch

jwaixs wrote:

arg... I've lost 1.5 hours of my precious time trying to get re to work
correctly. There's really not a single good re tutorial or documentation
I could find! There are only references, and if you don't know how a
module works you won't learn it from a reference!

This is the problem:

import re
str = "blabla<python>Re modules sucks!</python>blabla"
re.search("(<python>)(/python>)", str).group()

Traceback (most recent call last):
File "<stdin>", line 1, in ?
AttributeError: 'NoneType' object has no attribute 'group'

the only thing I want are the number of places blabla, Re modules
sucks! and blabla are.

Others gave you advice on how to deal with regexes. I'm going to add
that regexes aren't the way to go for this - use HTMLParser. With your
regex, you won't be able to correctly handle either this

<foo>some text</foo><foo>some other text</foo>

as you will get the whole string, not just the first match. You can
alter the so-called longest match behaviour, but then

<foo>some outer text <foo>some inner text</foo> some more outer text</foo>

won't work....
Try not to use regexps. Or at least do it in a way that you tokenize
the text and then can sweep over it collecting the data you need
yourself (but that's basically rewriting the html parsers out there).
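To make Diez's point concrete, here is a small Python 3 sketch using his two sample strings: a greedy `.*` runs from the first `<foo>` to the last `</foo>`; the non-greedy `.*?` (the "longest match" behaviour switched off) splits the flat case correctly, but it still pairs the outer `<foo>` with the inner `</foo>` in the nested case. The variable names are illustrative only.

```python
import re

two = "<foo>some text</foo><foo>some other text</foo>"
nested = "<foo>some outer text <foo>some inner text</foo> some more outer text</foo>"

greedy = re.compile(r"<foo>(.*)</foo>")   # .*  takes as much as possible
lazy = re.compile(r"<foo>(.*?)</foo>")    # .*? takes as little as possible

# Greedy: a single match from the first <foo> to the last </foo>.
print(greedy.findall(two))   # ['some text</foo><foo>some other text']
# Non-greedy: each <foo>...</foo> pair separately.
print(lazy.findall(two))     # ['some text', 'some other text']
# Non-greedy on nested tags: stops at the first </foo>, mixing up elements.
print(lazy.findall(nested))  # ['some outer text <foo>some inner text']
```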

Diez

Jul 21 '05 #6

Gurpreet Sachdeva

try:
    re.search("(<python>)(/python>)", str).group()
except AttributeError:
    print('not found')

otherwise,

re.search(r"(<python>).*?(/python>)", str).group()

this will help!
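Reading "the number of places" in the original question as the character positions of the three pieces (an assumption), a minimal Python 3 sketch combines a non-greedy pattern like the one above with the match object's span() method, which returns the (start, end) offsets of a group. The three-group pattern is illustrative, not taken from the thread.

```python
import re

s = "blabla<python>Re modules sucks!</python>blabla"

# Non-greedy .*? stops at the first closing tag; the three groups capture
# the text before, inside and after the <python>...</python> pair.
m = re.search(r"(.*?)<python>(.*?)</python>(.*)", s, re.DOTALL)
if m is None:
    print("not found")
else:
    for i in (1, 2, 3):
        # span(i) gives the (start, end) offsets of group i within s
        print(repr(m.group(i)), m.span(i))
# 'blabla' (0, 6)
# 'Re modules sucks!' (14, 31)
# 'blabla' (40, 46)
```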

Regards,
Gurpreet Singh

Blogging [at] http://garrythegambler.blogspot.com

--
Thanks and Regards,
GSS

Jul 21 '05 #7

Stephen Harris

"jwaixs" <jw****@gmail.com> wrote in message
news:11**********************@g47g2000cwa.googlegroups.com...

arg... I've lost 1.5 hours of my precious time trying to get re to work
correctly. There's really not a single good re tutorial or documentation
I could find! There are only references, and if you don't know how a
module works you won't learn it from a reference!

PowerGrep is a commercial Windows tool. However, it comes
with a manual that has a 45 page tutorial on regular expressions.
www.powergrep.com/manual/PowerGREP.pdf tutorial: pages 109-156
There is also a new Wrox book besides the O'Reilly/Friedl Owl book.

http://www.uhacc.org/tech_docs/guides/regex1.php
also regex2.php, and regex3.php

Only the delimiters have been changed to protect the innocent,
Stephen

Jul 21 '05 #8

jwaixs

Thank you for your replies, it's much more obvious now. I know more about what I
can and can't do with the re module. But is it possible to search for
more than one string in the same line?

e.g. I want to replace <python> with " ",
</python> with "\n", and everything that's not between the two python
tags must begin with "\nprint \"\"\"" and end with "\"\"\"\n". Or do I
need more than one call?

Jul 21 '05 #9

Cyril BAZIN

If you are looking for HTML tags or something like that, have a look
at HTMLParser (docs.python.org).
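A minimal sketch of the HTMLParser route, assuming Python 3, where the module is called html.parser. The class name PythonTagSplitter is hypothetical; it records each run of text together with a flag saying whether it sits inside a <python> element, counting nesting depth so Diez's nested case does not confuse it. Other tags are simply dropped, and code inside <python> that contains a literal "<" may be misread as markup.

```python
from html.parser import HTMLParser


class PythonTagSplitter(HTMLParser):
    """Hypothetical helper: collect (inside_python, text) chunks."""

    def __init__(self):
        super().__init__()
        self.depth = 0      # how many <python> elements we are inside
        self.chunks = []    # list of (bool, str) pairs

    def handle_starttag(self, tag, attrs):
        if tag == "python":  # tag names arrive lowercased
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == "python" and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        self.chunks.append((self.depth > 0, data))


parser = PythonTagSplitter()
parser.feed("blabla<python>Re modules sucks!</python>blabla")
parser.close()
print(parser.chunks)
```

Calling close() flushes any text the parser is still buffering at the end of the input.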


Jul 21 '05 #10

George Sakkis

"jwaixs" <jw****@gmail.com> wrote:

Thank you for your replies, it's much more obvious now. I know more about what I
can and can't do with the re module. But is it possible to search for
more than one string in the same line?

e.g. I want to replace <python> with " ",
</python> with "\n", and everything that's not between the two python
tags must begin with "\nprint \"\"\"" and end with "\"\"\"\n". Or do I
need more than one call?

You can do it in one call, but it's ugly; as others have told you
already, use HTMLParser or some other parsing package. Now if you
insist...

import re

regex = re.compile(r'''(?:
        (?:<python>)
        (.*?)           # group 1: inside tags
        (?:</python>)
    ) |                 # OR
    ([^<]+)             # group 2: outside tags (+ so there are no empty matches)
    ''', re.DOTALL | re.VERBOSE)

def replace(match):
    g1, g2 = match.groups()
    if g1 is not None:      # matched the <python>...</python> alternative
        return g1
    else:
        return '\nprint """%s"""\n' % g2

text = '''this is <python>a stupid
sentence</python> but still I
<python>have to</python> write it.'''

print(regex.sub(replace, text))

===== Output ==================

print """this is """
a stupid
sentence
print """ but still I
"""
have to
print """ write it."""

=======================

George

Jul 21 '05 #11

Gustavo Niemeyer

the only thing I want are the number of places blabla, Re modules
sucks! and blabla are.

Your question is still not clear. What you're searching for is
'<python></python>', which isn't there, so .search returns None, and so
you get that exception (.group takes an argument, anyway).

That's what I love in that news group. Someone comes with a
stupid and arrogant question, and someone else answers in a
calm and reasonable way.

Thanks Erik.

--
Gustavo Niemeyer
http://niemeyer.net

Jul 21 '05 #12

James Stroud

On Monday 04 July 2005 12:39 pm, Gustavo Niemeyer wrote:

the only thing I want are the number of places blabla, Re modules
sucks! and blabla are.

Your question is still not clear. What you're searching for is
'<python></python>', which isn't there, so .search returns None, and so
you get that exception (.group takes an argument, anyway).

That's what I love in that news group. Someone comes with a
stupid and arrogant question, and someone else answers in a
calm and reasonable way.

It's perhaps that they remember the frustration of being new to programming.
Those "wasted" 1.5 hr sessions getting nowhere add up pretty fast and then
the expletives begin to flow.

--
James Stroud
UCLA-DOE Institute for Genomics and Proteomics
Box 951570
Los Angeles, CA 90095

http://www.jamesstroud.com/

Jul 21 '05 #13

Erik Max Francis

James Stroud wrote:

It's perhaps that they remember the frustration of being new to programming.
Those "wasted" 1.5 hr sessions getting nowhere add up pretty fast and then
the expletives begin to flow.

Also because the best way to make someone who's having a tantrum look
foolish in public is to treat them with dignity and respect while they
continue rambling and ranting.

--
Erik Max Francis && ma*@alcyone.com && http://www.alcyone.com/max/
San Jose, CA, USA && 37 20 N 121 53 W && AIM erikmaxfrancis
Sometimes a cigar is just a cigar.
-- Sigmund Freud

Jul 21 '05 #14

D H

Gustavo Niemeyer wrote:

That's what I love in that news group. Someone comes with a
stupid and arrogant question, and someone else answers in a
calm and reasonable way.

....and then someone else comes along and calls the first person stupid
and arrogant, which is deemed QOTW. :)

Jul 21 '05 #15

François Pinard

[D H]

Gustavo Niemeyer wrote:
That's what I love in that news group. Someone comes with a
stupid and arrogant question, and someone else answers in a
calm and reasonable way.
...and then someone else comes along and calls the first person stupid
and arrogant, which is deemed QOTW. :)

Hey! The "question", not the "person"! One might say "the subject",
but then, it has to be the subject of the message, of course! :-)

--
François Pinard http://pinard.progiciels-bpi.ca

Jul 21 '05 #16

Greg Lindstrom

That's what I love in that news group. Someone comes with a
stupid and arrogant question, and someone else answers in a
calm and reasonable way.

Me, too. Indeed, that's a great reason to be a part of this community.
I didn't see the original question as either stupid or arrogant; I read
it as being from a frustrated user not knowing the regular expression
package. We've all been there. While it would be nice if the question
could be expressed without the expletives, it does no good at all to
return the hostile tone. Another thing I like is that after the
question was answered, other ways to approach the problem were offered
along with reasons as to why the new way is a better approach. And no
one had to get hurt!

Perhaps Python is a victim of its own design here. We, or at least I,
have grown to expect things to be clear and easy to understand/use.
Regular expressions are neither, though I use them all the time (I
learned them when I was a system admin on a Sun network and tend to
fall back on them when I need a "quick fix"). I hear that Perl 6 is
going to have a rewrite of regular expressions; it will be interesting
to see what their hard work produces.

--greg

Jul 21 '05 #17

Michael Hoffman

Greg Lindstrom wrote:

I hear that Perl 6 is
going to have a rewrite of regular expressions; it will be interesting
to see what their hard work produces.

From what I saw a while ago, it didn't look like it would be any
simpler or more elegant. But that was a while ago.
--
Michael Hoffman

Jul 21 '05 #18

jwaixs

To reply to the last part of the discussion, and especially to Gustavo
Niemeyer: I really appreciate the way in which I have been answered. And
I no longer have the questions about the re module that I had before I
posted this thread.

I was frustrated and should probably not have posted this frustration. But
it was never my purpose to ask something stupid or to ask it arrogantly.

The python re module is, in my opinion, not beginner-friendly,
and it's not meant for beginning python programmers. I don't
have any experience with perl or related script/programming languages
like python. (I prefer to do things in c) So the re module is
completely new for me.

If I upset someone's clean mind by posting a "stupid" and "arrogant"
question, I'm sorry and won't post my frustration on this usenet group
anymore.

But thank you for all your help about the re module,

Noud Aldenhoven

Jul 21 '05 #19

George Sakkis

"jwaixs" <jw****@gmail.com> wrote:

To reply to the last part of the discussion, and especially to Gustavo
Niemeyer: I really appreciate the way in which I have been answered. And
I no longer have the questions about the re module that I had before I
posted this thread.

I was frustrated and should probably not have posted this frustration. But
it was never my purpose to ask something stupid or to ask it arrogantly.

I think most people who answered you calmly realized that it was
frustration rather than arrogance and trolling behavior. This happens
often if you're new to something and/or this something is inherently
complicated; I almost made the same mistake when I had to mess with the
loathsome autoconf/automake/autohell files and tools, but at least I
refrained from the expletives.

The python re module is, in my opinion, not beginner-friendly,
and it's not meant for beginning python programmers. I don't
have any experience with perl or related script/programming languages
like python. (I prefer to do things in c) So the re module is
completely new for me.

It's not the python re module that is beginner-unfriendly; it's the regular
expression concepts and syntax that the module implements. Read some
introductory material about regexps first (Wikipedia is a good place to
start: http://en.wikipedia.org/wiki/Regular_expression) and then get
familiar with the re module by trying the examples interactively in the
interpreter.

If I upset someone's clean mind by posting a "stupid" and "arrogant"
question, I'm sorry and won't post my frustration on this usenet group
anymore.

It's always a good idea to wait for a few hours before posting anything
you may regret later, but this doesn't mean you should leave this
group; as you saw already, it's far more tolerant and civilized than
other groups in this respect.

But thank you for all your help about the re module,

Noud Aldenhoven

Cheers,
George

Jul 21 '05 #20

Don

I would recommend that you give Kodos a try:

http://kodos.sourceforge.net/

Doesn't make the re syntax any easier, but I find that it allows you to more
quickly develop workable code.

-Don

Jul 21 '05 #21

Paul McGuire

Your elaboration on what problem you are actually trying to solve gave
me some additional insights into your question. It looks like you are
writing a Python-HTML templating system, by embedding Python within
HTML using <python>...</python> tags.

As many may have already guessed, I worked up a pyparsing treatment of
your problem. As part of the implementation, I reinterpreted your
transformations slightly. You said:

I want to replace the <python> with " ", </python>
with "\n" and everything that's not between the two
python tags must begin with "\nprint \"\"\"" and
end with "\"\"\"\n"

If this were an HTML page with <python> tags, it might look like:

<some HTML>
<python>
x = 1
</python>
<some more HTML>

The corresponding CGI python code would then read:
print """<some HTML>\n"""
x = 1
print """<some more HTML>\n"""

So we can reinterpret your transformation as:
1. From start of file to first <python> tag,
enclose in print """<leading stuff>\n"""
2. From <python> tag to </python> tag, print contents
3. From </python> tag to next <python> tag,
enclose in print """<stuff between tags>\n"""
4. From last </python> tag to end of file,
enclose in print """<ending stuff>\n"""

Or more formally:
<beginning of file> -> 'print r"""'
<python> -> '"""\n'
</python> -> 'print r"""'
<end of file> -> '"""\n'
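For comparison, here is a minimal sketch of the same four rules using only the standard re module rather than pyparsing (a swapped-in technique, not Paul's implementation). The names html_to_python and PYTHON_BLOCK are hypothetical. It assumes Python 3, so the generated code uses print() rather than the print statement, and it assumes the HTML contains no triple quotes and does not end in a backslash, and that code inside the tags starts at column 0.

```python
import re

# A capturing group makes re.split keep the captured text, so the result
# alternates HTML, code, HTML, code, ..., HTML (even indices are HTML).
PYTHON_BLOCK = re.compile(r"<python>(.*?)</python>", re.DOTALL | re.IGNORECASE)


def html_to_python(page):
    """Hypothetical helper: turn HTML with <python> blocks into Python 3 source."""
    out = []
    for i, part in enumerate(PYTHON_BLOCK.split(page)):
        if i % 2 == 0:
            # Outside the tags: emit the HTML via print() of a raw string.
            out.append('print(r"""%s""")' % part)
        else:
            # Inside the tags: pass the code through unchanged.
            out.append(part)
    return "\n".join(out)


page = """<H1>Sample</H1>
<python>
for i in range(3):
    print("This is line %d<br>" % i)
</python>
</BODY>
"""

code = html_to_python(page)
print(code)
exec(code)
```

The trade-off is the one discussed earlier in the thread: a single non-greedy regex is enough for flat, non-nested <python> blocks, while pyparsing gives more control once the markup gets more complicated.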

Now that we have this defined, we can consider adding some standard
imports to the <beginning of file> transformation, such as "import
sys", etc.

Here is a working implementation. The grammar itself is only about 10
lines of code, mostly in defining the replacement transforms. The last
18 lines are the test case itself, printing the transformed string, and
then eval'ing the transformed string.

========================
# Take HTML that has <python> </python> tags interspersed, with python code
# between the <python> tags. Convert to running python cgi program.

# replace <python> with r'"""\n' and </python> with r'\nprint """'
# also put 'print """\ \n' at the beginning and '"""\n' at the end

from pyparsing import *

class OnlyOnce(object):
    def __init__(self, methodCall):
        self.callable = methodCall
        self.called = False
    def __call__(self,s,l,t):
        if not self.called:
            self.called = True
            return self.callable(s,l,t)
        raise ParseException(s,l,"")

stringStartText = """import sys
print "Content-Type: text/html\\n"
print r\"\"\""""
stringEndText = '"""\n'
startPythonText = '"""\n'
endPythonText = '\nprint r"""\n'

# define grammar
pythonStart = CaselessLiteral("<python>")
pythonEnd = CaselessLiteral("</python>")
sStart = StringStart()
sEnd = StringEnd()

sStart.setParseAction( OnlyOnce( replaceWith(stringStartText) ) )
sEnd.setParseAction( replaceWith(stringEndText) )
pythonStart.setParseAction( replaceWith(startPythonText) )
pythonEnd.setParseAction( replaceWith(endPythonText) )

xform = sStart | sEnd | pythonStart | pythonEnd

# run test case
htmlWithPython = r"""<HTML>
<HEAD>
<TITLE>Sample Page Created from Python</TITLE>
</HEAD>
<BODY>
<H1>Sample Page Created from Python</H1>
<python>
for i in range(10):
    print "This is line %d<br>" % i
</python>
</BODY>
</HTML>
"""

generatedPythonCode = xform.transformString( htmlWithPython )
print generatedPythonCode
print
exec(generatedPythonCode)
========================
Here is the output:
import sys
print "Content-Type: text/html\n"
print r"""<HTML>
<HEAD>
<TITLE>Sample Page Created from Python</TITLE>
</HEAD>
<BODY>
<H1>Sample Page Created from Python</H1>
"""

for i in range(10):
    print "This is line %d<br>" % i

print r"""

</BODY>
</HTML>
"""
Content-Type: text/html

<HTML>
<HEAD>
<TITLE>Sample Page Created from Python</TITLE>
</HEAD>
<BODY>
<H1>Sample Page Created from Python</H1>

This is line 0<br>
This is line 1<br>
This is line 2<br>
This is line 3<br>
This is line 4<br>
This is line 5<br>
This is line 6<br>
This is line 7<br>
This is line 8<br>
This is line 9<br>
</BODY>
</HTML>
========================

This exercise was interesting to me in that it uncovered some
unexpected behavior in pyparsing when matching on positional tokens (in
this case StringStart and StringEnd). I learned that:
1. Since StringStart does not advance the parsing position in the
string, it is necessary to ensure that the parse action gets run only
once, and then raise a ParseException on subsequent calls. The little
class OnlyOnce takes care of this (I will probably fold OnlyOnce into
the next point release of pyparsing).
2. StringEnd is not well matched during scanString or transformString
if there is no trailing whitespace at the end of the input. Even a
trailing \n is sufficient. My first example of testdata ended with the
closing </HTML> tag, with no carriage return, and
scanString/transformString failed to match. If I added a newline to
close the </HTML> tag, then scanString could find the StringEnd. This
is not a terrible workaround, but it's another loose end to tie up in
the next release.

Enjoy!
-- Paul

Jul 21 '05 #22

Gustavo Niemeyer

To reply to the last part of the discussion, and especially to Gustavo
Niemeyer: I really appreciate the way in which I have been answered. And
I no longer have the questions about the re module that I had before I
posted this thread.

Great! As I said, that's a nice news group.

I was frustrated and should probably not have posted this frustration. But
it was never my purpose to ask something stupid or to ask it arrogantly.

You can post frustration for sure. But saying "f*cking re module"
and then showing that your problem is not understanding the very
basics of regular expressions is not a nice way to ask for help.

The python re module is, in my opinion, not beginner-friendly,
and it's not meant for beginning python programmers. I don't
have any experience with perl or related script/programming languages
like python. (I prefer to do things in c) So the re module is
completely new for me.

re.match("<p>(.*)</p>", "<p>foobar</p>").group(1)

'foobar'

That's very similar to your problem and looks quite clear and
simple, but you must understand what regular expressions are
about, of course.

If I upset someone's clean mind by posting a "stupid" and "arrogant"
question, I'm sorry and won't post my frustration on this usenet group
anymore.

Not upset at all. You just asked it the wrong way, and I was impressed
that even then there were lots of people to help you. That's not
usual in most places.

But thank you for all your help about the re module,

In my opinion you would benefit from looking at some documentation
talking about regular expressions in general. After you understand
them, the re module will look f*cking simple. ;-)

--
Gustavo Niemeyer
http://niemeyer.net

Jul 21 '05 #23

Paul McGuire

Forgot to mention: Download pyparsing at
http://pyparsing.sourceforge.net.

-- Paul

Jul 21 '05 #24

Terry Hancock

On Tuesday 05 July 2005 10:04 am, jwaixs wrote:

The python re module is, in my opinion, not beginner-friendly,
and it's not meant for beginning python programmers. I don't
have any experience with perl or related script/programming languages
like python. (I prefer to do things in c) So the re module is
completely new for me.

I think you had an error of expectations here: Python provides
a regular expression module which is consistent with regular
expression syntax as it has come to be defined. Your frustration
was entirely with regular expressions themselves, not the Python
implementation. You would've had the same experience in Perl
or Ruby or for that matter, sed or awk.

In that vein, the Python documentation for the re module is only
that -- documentation for the module. It does not (and couldn't
reasonably be expected to) cover the subject of regular expressions
themselves. You might as well expect the Python manual to
explain "object oriented programming", "data structures",
"functional programming" or other semester-long computer
science subjects. There are entire books dedicated to the
subject of learning regular expressions and pattern recognition
in general.

It's a fairly complex subject. After all, it's merely the simplest
one-dimensional case of a pattern recognition system, which
is essentially an AI discipline. Only the fact that the
one-dimensional case of discrete text is an extremely simple
case makes the subject tenable at all for ordinary programs.

If you're really interested in using this new technique, I would
suggest that you be prepared to be patient and tackle the
problem of learning it seriously, just as you did when you
learned to program in C in the first place. And you might
want to read one of those aforementioned books, such as:

Mastering Regular Expressions (2nd ed)
by Jeffrey E. F. Friedl
http://www.oreilly.com/catalog/regex2/

And IMHO, Python actually makes regular expressions a lot
easier to handle than they are in some of the other languages
you could be attempting this in. I've only tried regexes in
Python, Perl, and Javascript, but Python is definitely the one
I find easiest to cope with. ;-)

--
Terry Hancock ( hancock at anansispaceworks.com )
Anansi Spaceworks http://www.anansispaceworks.com

Jul 21 '05 #25

Raymond Hettinger

There's really not a single good re tutorial or documentation
I could find!

With * being a greedy operator, your post's subject line matches
"firetrucking", which, of course, has nothing to do with regular
expressions, or python.org's re how-to guide, or Amazon's 18 books on
the subject, or the hundreds of available on-line tutorials.

http://www.amk.ca/python/howto/regex/
http://www.amazon.com/exec/obidos/se...115182-7050550
http://www.google.com/search?q=regul...ssion+tutorial
Raymond

Jul 21 '05 #26

Simon Brunning

On 6 Jul 2005 01:01:34 -0700, Raymond Hettinger <py****@rcn.com> wrote:

With * being a greedy operator, your post's subject line matches,
"firetrucking"

Nope:

print re.match('f*cking', 'firetrucking')
None

The OP was clearly showing his lack of regex nous here. Clearly he
wanted 'f.*cking':
print re.match('f.*cking', 'firetrucking')

<_sre.SRE_Match object at 0x01196058>
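A quick Python 3 check of the joke: in a regex, f* means "zero or more f", not "f followed by anything". re.match only tries position 0, so it fails; re.search scans the string and matches "cking" with zero f's; and only f.*cking matches the whole word.

```python
import re

subject = "firetrucking"

print(re.match("f*cking", subject))           # None: anchored at position 0
print(re.search("f*cking", subject).group())  # cking: zero f's, found mid-string
print(re.match("f.*cking", subject).group())  # firetrucking: '.*' is "anything"
```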

;-)

--
Cheers,
Simon B,
si***@brunningonline.net,
http://www.brunningonline.net/simon/blog/

Jul 21 '05 #27

Jorgen Grahn

On 5 Jul 2005 08:04:21 -0700, jwaixs <jw****@gmail.com> wrote:
....

The python re module is, in my opinion, not beginner-friendly,
and it's not meant for beginning python programmers. I don't
have any experience with perl or related script/programming languages
like python. (I prefer to do things in c) So the re module is
completely new for me.

Actually, REs are a useful tool even in C, for example for validating input
before parsing it with custom code, so you can forget about some of the
error checking in the parsing code.

.... although you have to be on a Unix system to be reasonably sure that they
are available, and even then you're looking at a much less powerful RE
language than the ones in Python and Perl.

That's another problem with REs -- there are many slightly different RE
languages, and sometimes you cannot even be sure which one you're working
with.

/Jorgen

--
// Jorgen Grahn <jgrahn@ Ph'nglui mglw'nafh Cthulhu
\X/ algonet.se> R'lyeh wgah'nagl fhtagn!

Jul 21 '05 #28

Chris Smith

>>>>> "Michael" == Michael Hoffman <ca*******@mh391.invalid> writes:

Michael> Greg Lindstrom wrote:

I hear that Perl 6 is going to have a rewrite of regular
expressions; it will be interesting to see what their hard work
produces.

Michael> From what I saw a while ago, it didn't look like it
Michael> would be any simpler or more elegant. But that was a
Michael> while ago. -- Michael Hoffman

Oh, come on: what's a Perliodic Table of Operators, between friends?
http://www.ozonehouse.com/mark/blog/...odicTable.html
R,
C

Jul 21 '05 #29

Steven D'Aprano

On Thu, 07 Jul 2005 06:47:54 -0400, Chris Smith wrote:

>> "Michael" == Michael Hoffman <ca*******@mh391.invalid> writes:
Michael> Greg Lindstrom wrote:

>> I hear that Perl 6 is going to have a rewrite of regular
>> expressions; it will be interesting to see what their hard work
>> produces.

Michael> From what I saw a while ago, it didn't look like it
Michael> would be any simpler or more elegant. But that was a
Michael> while ago. -- Michael Hoffman

Oh, come on: what's a Perliodic Table of Operators, between friends?
http://www.ozonehouse.com/mark/blog/...odicTable.html

That, and the discussion on operators by Larry Wall, are two of the most
scary things I've ever seen. Is there any possible sequence of bytes that
will not be a valid Perl expression or operator?

I was going to suggest a single null byte, but then I realised that it
is probably an operator for converting a decimal numeric string to an
associative array mapping integers to German verbs.

All jokes aside, thank goodness Guido isn't Larry. Larry actually gives
the thumbs-up to syntax that will confuse programmers:

[quote]
So Perl could keep track of a unary = operator, even if the human
programmer might be confused. So I'd place a unary = operator in the
category of "OK, but don't use it for anything that will cause widespread
confusion."
[end quote]

There is no unary = operator in Perl, yet, but Larry sees nothing wrong
with the concept of syntax that confuses programmers, or of code which can
be parsed as legal code by Perl even if it can't be displayed on the
developer's computer.

--
Steven.

Jul 21 '05 #30

Mike Meyer

Steven D'Aprano <st***@REMOVETHIScyber.com.au> writes:

On Thu, 07 Jul 2005 06:47:54 -0400, Chris Smith wrote:
Oh, come on: what's a Perliodic Table of Operators, between friends?
http://www.ozonehouse.com/mark/blog/...odicTable.html

That, and the discussion on operators by Larry Wall, are two of the most
scary things I've ever seen. Is there any possible sequence of bytes that
will not be a valid Perl expression or operator?

Yes, but will it get as bad as TECO, where guessing what your name did
as a command sequence was a popular pastime?

<mike
--
Mike Meyer <mw*@mired.org> http://www.mired.org/home/mwm/
Independent WWW/Perforce/FreeBSD/Unix consultant, email for more information.

Jul 21 '05 #31

paron

That is so handy!! Thanks!

Ron

Jul 21 '05 #32

François Pinard

[Mike Meyer]

Steven D'Aprano <st***@REMOVETHIScyber.com.au> writes:
Is there any possible sequence of bytes that will not be a valid
Perl expression or operator?

Perl has lots of syntax, and good warning facilities, it is not so bad.
[...] TECO, where guessing what your name did as a command sequence
was a popular pastime?

Yes, this is much more likely to be meaningful! Yet, don't laugh at
this venerable editor. Remember that Emacs started as an *E*xtended set
of TECO *mac*ros, as its name still says. :-)

I once worked with a PL/I compiler (on a big IBM mainframe), which was
trying to be helpful by spitting pages of:

Error SUCH AND SUCH, assuming that THIS AND THIS was meant.

and continuing compilation nevertheless. It was a common joke to say
that PL/I would compile some random valid program out of any garbage!

--
François Pinard http://pinard.progiciels-bpi.ca

Jul 21 '05 #33

Rocco Moretti

François Pinard wrote:

I once worked with a PL/I compiler (on a big IBM mainframe), which was
trying to be helpful by spitting pages of:

Error SUCH AND SUCH, assuming that THIS AND THIS was meant.

and continuing compilation nevertheless. It was a common joke to say
that PL/I would compile some random valid program out of any garbage!

We may laugh now (and then), but it was likely a valid design decision
at the time. If you're running a job on "big iron", depending on the
situation, you might have had only a block of a few hours on a
timeshared system, perhaps unattended. If the compiler refused to
continue, the rest of your block might have been wasted. (At the very
least, you would have had to sign up for an additional block later.)

If your program had only minor errors, there was likely a good chance
that the compiler might guess correctly, and your program would compile
to what you wanted in the first place. If not, by continuing on, the
compiler can flag additional errors later in your code, allowing you to
fix those bugs sooner. (Instead of choking on the first one and refusing
to continue.)

Error-checking-by-compiling only "works" if you have cheap computing
power you can run attended. (Can you imagine what TDD would be like if
you had to wait 24+ hrs between code executions?)

Jul 21 '05 #34

Kay Schluehr

jwaixs schrieb:

arg... I've lost 1.5 hours of my precious time to try letting re work
correcty.

1.5 hours are not enough to understand regular expressions. But to
be honest: I never had the patience to learn them properly, and I
guess I never will, just as I will never learn sed or awk or
Perl. When my brain hit regexp syntax the first time, I thought about
writing a little language that translates readable Python expressions
into awkward regexps, used as an intermediate language that finally gets
compiled into finite state-machine descriptions. Fortunately I was
not the first one who considered his mental weakness/aesthetic
repulsion a virtue:

http://home.earthlink.net/~jasonrandharper/reverb.py

I never took time for a critical review. The module just worked for my
purposes.
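For readers who share Kay's aversion, here is a minimal sketch of the same idea: tiny Python helpers that assemble a regular expression from readable pieces, applied to the string from the original question. The helper names (`literal`, `capture`, `lazy_anything`, `sequence`) are hypothetical and invented for this illustration; they are not the API of the reverb.py module linked above. Only `re.escape`, `re.search` and the match object's `group`/`span` methods come from the standard `re` module.

```python
import re


def literal(text: str) -> str:
    """Match `text` exactly; re.escape neutralises any regex metacharacters."""
    return re.escape(text)


def capture(pattern: str) -> str:
    """Wrap a sub-pattern in a capturing group."""
    return "(" + pattern + ")"


def lazy_anything() -> str:
    """Match any run of characters other than newlines, as short as possible."""
    return ".*?"


def sequence(*parts: str) -> str:
    """Concatenate sub-patterns so they must match one after another."""
    return "".join(parts)


# Matches the same strings as the hand-written pattern r"<python>(.*?)</python>".
TAGGED = sequence(literal("<python>"), capture(lazy_anything()), literal("</python>"))

text = "blabla<python>Re modules sucks!</python>blabla"
match = re.search(TAGGED, text)
if match:
    print(match.group(1))  # the text between the tags
    print(match.span(1))   # (start, end) offsets of that text within `text`
```

The non-greedy `.*?` stops at the first `</python>`, and `span(1)` answers the "where is it" half of the original question.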

Kay

Jul 21 '05 #35

François Pinard

[Rocco Moretti]

François Pinard wrote:
I once worked with a PL/I compiler (on a big IBM mainframe), which was
trying to be helpful by spitting pages of:

Error SUCH AND SUCH, assuming that THIS AND THIS was meant.

and continuing compilation nevertheless. It was a common joke to say
that PL/I would compile some random valid program out of any garbage!

We may laugh now (and then), but it was likely a valid design decision
at the time. [...] Error-checking-by-compiling only "works" if you
have cheap computing power you can run attended. (Can you imagine what
TDD would be like if you had to wait 24+ hrs between code executions?)

Of course, all granted.[1]

The only way to be really productive, in those times, was to round-robin
oneself between a dozen simultaneous projects, or so, pushing on each
one in turn while concentrating hard to avoid spoiling one run, before
resubmitting that project and moving to the next. This kind of quest for
productivity was somewhat exhausting for the mind.

Nowadays, things are infinitely easier. Even so, Python is sadly
"original" in stopping at the first syntax error, probably to avoid
all concerns about error recovery, which is not always an easy matter.
I suspect recovery might be easier with Python than with many other languages.

--------------------
[1] PL/I was aggressively aiming at both syntactic and semantic recovery.

--
François Pinard http://pinard.progiciels-bpi.ca

Jul 21 '05 #36

Mike Meyer

Rocco Moretti <ro**********@hotpop.com> writes:

François Pinard wrote:
If your program had only minor errors, there was likely a good chance
that the compiler might guess correctly, and your program would
compile to what you wanted in the first place. If not, by continuing
on, the compiler can flag additional errors later in your code,
allowing you to fix those bugs sooner. (Instead of choking on the
first one and refusing to continue.)

Error-checking-by-compiling only "works" if you have cheap computing
power you can run attended. (Can you imagine what TDD would be like if
you had to wait 24+ hrs between code executions?)

Yeah, but how many modern compilers give up after only one error? Most
compilers will reset the parser to a known state, and keep on trying
to parse the input. The known state may be erroneous, leading to bogus
errors - possibly lots of them! - but at least it keeps trying. They
don't have to try to do DWIM to do this; they just have to have a
reasonable way to reset.
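Resetting the parser to a known state and carrying on is what compiler texts usually call panic-mode recovery. Below is a minimal sketch of it in Python for a hypothetical toy language, invented for this illustration, whose statements all have the form `NAME = NUMBER ;`. The `;` serves as the synchronisation token: on an error the parser records a message, skips past the next `;`, and keeps going, so one run reports every broken statement. `tokenize`, `parse` and the message format are made-up names, not part of any library.

```python
import re

NAME_RE = re.compile(r"[A-Za-z_]\w*")


def tokenize(source: str) -> list[str]:
    """Split source into numbers, identifiers and single non-space characters."""
    return re.findall(r"\d+|[A-Za-z_]\w*|\S", source)


def parse(source: str) -> tuple[dict[str, int], list[str]]:
    """Parse `NAME = NUMBER ;` statements, recovering from syntax errors.

    On an error the parser records a message, then discards tokens up to and
    including the next ';' (the synchronisation token) and carries on.
    """
    tokens = tokenize(source)
    pos = 0
    bindings: dict[str, int] = {}
    errors: list[str] = []

    def expect(accepts, what: str) -> str:
        # Consume and return the current token if `accepts` approves of it.
        nonlocal pos
        tok = tokens[pos] if pos < len(tokens) else "<end of input>"
        if pos < len(tokens) and accepts(tok):
            pos += 1
            return tok
        raise SyntaxError(f"token {pos}: expected {what}, got {tok!r}")

    while pos < len(tokens):
        try:
            name = expect(NAME_RE.fullmatch, "a name")
            expect(lambda t: t == "=", "'='")
            value = expect(str.isdigit, "a number")
            expect(lambda t: t == ";", "';'")
            bindings[name] = int(value)
        except SyntaxError as err:
            errors.append(str(err))
            # Panic mode: skip to the next ';', then step past it and resume.
            while pos < len(tokens) and tokens[pos] != ";":
                pos += 1
            pos += 1
    return bindings, errors


bindings, errors = parse("a = 1; b = = 2; c = 3; d 4; e = 5;")
print(bindings)
for message in errors:
    print(message)
```

On the sample input both broken statements (`b = = 2` and `d 4`) are reported, and `a`, `c` and `e` are still bound. The recovery is crude: a missing `;` makes this parser silently discard the following statement as well.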

I only know one compiler that punts after the first error. Even with
lots of cheap computing power, it's still very annoying.

Come to think of it, Python does this, doesn't it? For some reason,
that doesn't annoy me. Maybe because I don't think of it as an
edit/compile/run cycle, but as an edit/run cycle, and I expect those to
stop after one error.
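The behaviour Mike and François describe is easy to see with the built-in `compile()`: given source containing several syntax errors, it raises a single `SyntaxError` whose `lineno` and `msg` attributes describe the first problem only. A minimal sketch; the helper name `first_syntax_error` and the sample sources are made up for illustration.

```python
def first_syntax_error(source: str):
    """Return (line number, message) of the first syntax error, or None."""
    try:
        compile(source, "<example>", "exec")
    except SyntaxError as err:
        return err.lineno, err.msg
    return None


# Two independent mistakes, one per line; only the one on line 1 is reported.
print(first_syntax_error("x = = 1\ny = = 2\n"))
# Fix line 1 and the error on line 2 shows up on the next run.
print(first_syntax_error("x = 1\ny = = 2\n"))
```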

<mike
--
Mike Meyer <mw*@mired.org> http://www.mired.org/home/mwm/
Independent WWW/Perforce/FreeBSD/Unix consultant, email for more information.

Jul 21 '05 #37

George Sakkis

"Mike Meyer" <mw*@mired.org> wrote:

I only know one compiler that punts after the first error. Even with
lots of cheap computing power, it's still very annoying.

Come to think of it, Python does this, doesn't it? For some reason,
that doesn't annoy me. Maybe because I don't think of it as an
edit/compile/run cycle, but as an edit/run cycle, and I expect those to
stop after one error.

Yes, the very short edit/run cycle is one reason. A second one is that
syntax errors are in general a pretty small percentage of errors compared
to runtime errors (at least for anyone who has been using Python for a
week or more), so even if the syntax checker did report all of them at
once, it wouldn't make much difference in the overall debugging time.
Third, at least one editor (Komodo) reports syntax errors on the fly as
you edit, and this helps save a few seconds now and then.

George

Jul 21 '05 #38
