
Regular Expressions

What's the way to go about learning Python's regular expressions? I feel
like such an idiot - being so strong in a programming language but knowing
nothing about RE.
Feb 10 '07 #1
20 Replies


"Geoff Hill" <th*************@gmail.com> writes:
> What's the way to go about learning Python's regular expressions? I feel
> like such an idiot - being so strong in a programming language but knowing
> nothing about RE.

Read the documentation?
Feb 10 '07 #2

On Feb 11, 10:26 am, "Geoff Hill" <thegeoffmeis...@gmail.com> wrote:
> What's the way to go about learning Python's regular expressions? I feel
> like such an idiot - being so strong in a programming language but knowing
> nothing about RE.

I suggest that you work through the re HOWTO
http://www.amk.ca/python/howto/regex/
and by work through, I don't mean "read". I mean as each new concept
is introduced:
1. try the given example(s) yourself at the interactive prompt
2. try variations on the examples
3. read the relevant part of the Library Reference Manual

Also I'd suggest reading threads in this newsgroup where people are
asking for help with re.

HTH,
John

Feb 11 '07 #3

"John Machin" <sj******@lexicon.net> writes:
>> What's the way to go about learning Python's regular expressions? I feel
>> like such an idiot - being so strong in a programming language but knowing
>> nothing about RE.
>
> I suggest that you work through the re HOWTO
> http://www.amk.ca/python/howto/regex/

Also remember Zawinski's law:
http://fishbowl.pastiche.org/2003/08...ar_expressions
Feb 11 '07 #4

On Feb 10, 6:26 pm, "Geoff Hill" <thegeoffmeis...@gmail.com> wrote:
> What's the way to go about learning Python's regular expressions? I feel
> like such an idiot - being so strong in a programming language but knowing
> nothing about RE.

I highly recommend reading the book "Mastering Regular Expressions,"
which I believe is published by O'Reilly. It's a great reference and
helps peel the onion in terms of working through RE. They are a
language unto themselves. A fun brain exercise.

Feb 11 '07 #5

On 10 Feb 2007 18:58:51 -0800, gregarican <gr*********@gmail.com> wrote:
> On Feb 10, 6:26 pm, "Geoff Hill" <thegeoffmeis...@gmail.com> wrote:
>> What's the way to go about learning Python's regular expressions? I feel
>> like such an idiot - being so strong in a programming language but knowing
>> nothing about RE.
>
> I highly recommend reading the book "Mastering Regular Expressions,"
> which I believe is published by O'Reilly. It's a great reference and
> helps peel the onion in terms of working through RE. They are a
> language unto themselves. A fun brain exercise.

Absolutely: Get "Mastering Regular Expressions" by Jeffrey Friedl. Not
only is it easy to read, but you'll get a lot of mileage out of
regexes in general. Grep, Perl one-liners, Python, and other tools use
regexes, and you'll find that they are really clever little creatures
once you befriend a few of them.

Shawn
Feb 11 '07 #6

Thanks. O'Reilly is the way I learned Python, and I'm surprised that I didn't
think of a book by them earlier.
Feb 11 '07 #7

Geoff Hill wrote:
> What's the way to go about learning Python's regular expressions? I feel
> like such an idiot - being so strong in a programming language but knowing
> nothing about RE.

In fact that's a pretty smart stance. A quote attributed variously to
Tim Peters and Jamie Zawinski says "Some people, when confronted with a
problem, think 'I know, I'll use regular expressions.' Now they have two
problems."

regards
Steve
--
Steve Holden +44 150 684 7255 +1 800 494 3119
Holden Web LLC/Ltd http://www.holdenweb.com
Skype: holdenweb http://del.icio.us/steve.holden
Blog of Note: http://holdenweb.blogspot.com
See you at PyCon? http://us.pycon.org/TX2007

Feb 11 '07 #8

On Sun, 11 Feb 2007 07:05:30 +0000, Steve Holden wrote:
> Geoff Hill wrote:
>> What's the way to go about learning Python's regular expressions? I feel
>> like such an idiot - being so strong in a programming language but knowing
>> nothing about RE.
>
> In fact that's a pretty smart stance.

That's a little harsh -- regexes have their place, together with pointer
arithmetic, bit manipulations, reverse polish notation and goto. The
problem is when people use them inappropriately e.g. using a regex when a
simple string.find will do.

> A quote attributed variously to
> Tim Peters and Jamie Zawinski says "Some people, when confronted with a
> problem, think 'I know, I'll use regular expressions.' Now they have two
> problems."

I believe that is correctly attributed to Jamie Zawinski.
--
Steven

Feb 11 '07 #9

On Feb 11, 9:25 pm, Steven D'Aprano
<s...@REMOVE.THIS.cybersource.com.au> wrote:
> On Sun, 11 Feb 2007 07:05:30 +0000, Steve Holden wrote:
>> Geoff Hill wrote:
>>> What's the way to go about learning Python's regular expressions? I feel
>>> like such an idiot - being so strong in a programming language but knowing
>>> nothing about RE.
>> In fact that's a pretty smart stance.
>
> That's a little harsh -- regexes have their place, together with pointer
> arithmetic, bit manipulations, reverse polish notation and goto. The
> problem is when people use them inappropriately e.g. using a regex when a
> simple string.find will do.

Thanks for the tip-off, Steve and Steven. Looks like I'll have to
start hiding my 12C (datecode 2214) with its "GTO" button under the
loose floor-board whenever I hear a knock at the door ;-) Looks like
Agner Fog's gone a million, and there'll be a special place in hell
for people who combine regexes with bit manipulation, like Navarro &
Raffinot. And we won't even mention Heikki Hy,*7g^54d3j+__=

Feb 11 '07 #10

gregarican wrote:
> On Feb 10, 6:26 pm, "Geoff Hill" <thegeoffmeis...@gmail.com> wrote:
>> What's the way to go about learning Python's regular expressions? I feel
>> like such an idiot - being so strong in a programming language but knowing
>> nothing about RE.
>
> I highly recommend reading the book "Mastering Regular Expressions,"
> which I believe is published by O'Reilly. It's a great reference and
> helps peel the onion in terms of working through RE. They are a
> language unto themselves. A fun brain exercise.

There is no real mention of Python in this book, but the first edition
is probably the best programming book I've ever read (excepting, perhaps,
Text Processing in Python by Mertz). Well, come to think of it, check
the latter book out. It has a great chapter on Python regexes. And it's
free to download.

James
Feb 11 '07 #11

> That's a little harsh -- regexes have their place, together with pointer
> arithmetic, bit manipulations, reverse polish notation and goto. The
> problem is when people use them inappropriately e.g. using a regex when a
> simple string.find will do.
>
>> A quote attributed variously to
>> Tim Peters and Jamie Zawinski says "Some people, when confronted with a
>> problem, think 'I know, I'll use regular expressions.' Now they have two
>> problems."
>
> I believe that is correctly attributed to Jamie Zawinski.
>
> --
> Steven

So as a newbie, I have to ask. I've played with the re module now for
a while, I think regular expressions are super fun and useful. As far
as them being a problem I found they can be tricky and sometimes the
regex's I've devised do unexpected things...(which I can think of two
instances where that unexpected thing was something that I had hoped
to get into further down the line, yay for me!). So I guess I don't
really understand why they are a "bad idea" to use. I don't know of
any other way yet to parse specific data out of a text, html, or xml
file without resorting to regular expressions.
What other ways are there?

Feb 11 '07 #12


jwz> Some people, when confronted with a problem, think 'I know, I'll
jwz> use regular expressions.' Now they have two problems.

dbl> So as a newbie, I have to ask.... So I guess I don't really
dbl> understand why they are a "bad idea" to use.

Regular expressions are fine in their place, however, you can get carried
away. For example:

http://www.ex-parrot.com/~pdw/Mail-RFC822-Address.html

Skip
Feb 11 '07 #13

On Sun, 11 Feb 2007 13:35:26 -0300, de**************@gmail.com
<de**************@gmail.com> wrote:
>> (Steven?)
>> That's a little harsh -- regexes have their place, together with pointer
>> arithmetic, bit manipulations, reverse polish notation and goto. The
>> problem is when people use them inappropriately e.g. using a regex when a
>> simple string.find will do.
>
> So as a newbie, I have to ask. I've played with the re module now for
> a while, I think regular expressions are super fun and useful. As far
> as them being a problem I found they can be tricky and sometimes the
> regex's I've devised do unexpected things...(which I can think of two
> instances where that unexpected thing was something that I had hoped
> to get into further down the line, yay for me!). So I guess I don't
> really understand why they are a "bad idea" to use. I don't know of
> any other way yet to parse specific data out of a text, html, or xml
> file without resorting to regular expressions.
> What other ways are there?

For very simple things, it's easier and faster to use string methods like find
or split. For example, splitting "2007-02-11" into y, m, d parts:
    y, m, d = date.split("-")
is a lot faster than matching "(\d+)-(\d+)-(\d+)".
On the other hand, complex tasks like parsing an HTML/XML document
*can't* be done with a regexp alone; but people insist anyway, and then
complain when it doesn't work as expected, and ask how to "fix" the
regexp...
Good use of regexps probably lies somewhere in the middle.
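
The date-splitting comparison can be sketched as follows. This is a minimal illustration, assuming Python 3 and a well-formed input string; the variable names are made up for the example. Both approaches produce the same three fields, but the string method needs no pattern at all.

```python
import re

date = "2007-02-11"

# Plain string method: split on the separator.
y, m, d = date.split("-")

# Regex equivalent: three groups of digits separated by hyphens.
match = re.match(r"(\d+)-(\d+)-(\d+)", date)
ry, rm, rd = match.groups()

print((y, m, d) == (ry, rm, rd))
```

To compare speed on your own machine, the standard timeit module can time both statements; the exact ratio depends on the interpreter and the input.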

--
Gabriel Genellina

Feb 11 '07 #14

On Feb 12, 3:35 am, "deviantbunnyl...@gmail.com"
<deviantbunnyl...@gmail.com> wrote:
>> That's a little harsh -- regexes have their place, together with pointer
>> arithmetic, bit manipulations, reverse polish notation and goto. The
>> problem is when people use them inappropriately e.g. using a regex when a
>> simple string.find will do.
>>> A quote attributed variously to
>>> Tim Peters and Jamie Zawinski says "Some people, when confronted with a
>>> problem, think 'I know, I'll use regular expressions.' Now they have two
>>> problems."
>> I believe that is correctly attributed to Jamie Zawinski.
>> --
>> Steven
>
> So as a newbie, I have to ask. I've played with the re module now for
> a while, I think regular expressions are super fun and useful. As far
> as them being a problem I found they can be tricky and sometimes the
> regex's I've devised do unexpected things...(which I can think of two
> instances where that unexpected thing was something that I had hoped
> to get into further down the line, yay for me!). So I guess I don't
> really understand why they are a "bad idea" to use.

Regexes are not "bad". However people tend to overuse them, whether
they are overkill (like Gabriel's date-splitting example) or underkill
-- see your next sentence :-)

> I don't know of
> any other way yet to parse specific data out of a text, html, or xml
> file without resorting to regular expressions.
> What other ways are there?

Text: Paul McGuire's pyparsing module (Google is your friend); read
David Mertz's book on text processing with Python (free download, I
believe); modules for specific data formats e.g. csv

HTML: htmllib and HTMLParser (both in the Python library),
BeautifulSoup (again GIYF)

XML: xml.* in the Python library. ElementTree (recommended) is
included in Python 2.5; use xml.etree.cElementTree.
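
A rough sketch of what using those library modules looks like in place of hand-written regexes. It assumes Python 3, where the HTMLParser module is available as html.parser and xml.etree.ElementTree is imported directly; the sample data and the LinkCollector class are made up for illustration.

```python
import csv
import io
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# CSV: the csv module handles quoting that a naive split(",") would break on.
rows = list(csv.reader(io.StringIO('name,quote\nSkip,"fine, in their place"\n')))

# XML: ElementTree builds a tree that can be queried by tag name.
root = ET.fromstring("<books><book year='2003'>Text Processing in Python</book></books>")
titles = [book.text for book in root.findall("book")]


class LinkCollector(HTMLParser):
    """Collect href values from <a> start tags (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.links.extend(value for name, value in attrs if name == "href")


collector = LinkCollector()
collector.feed('<p>See <a href="http://www.amk.ca/python/howto/regex/">the HOWTO</a>.</p>')
collector.close()

print(rows[1], titles, collector.links)
```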

HTH,
John

Feb 11 '07 #15

de**************@gmail.com wrote:
>> That's a little harsh -- regexes have their place, together with pointer
>> arithmetic, bit manipulations, reverse polish notation and goto. The
>> problem is when people use them inappropriately e.g. using a regex when a
>> simple string.find will do.
>>> A quote attributed variously to
>>> Tim Peters and Jamie Zawinski says "Some people, when confronted with a
>>> problem, think 'I know, I'll use regular expressions.' Now they have two
>>> problems."
>> I believe that is correctly attributed to Jamie Zawinski.
>>
>> --
>> Steven
>
> So as a newbie, I have to ask. I've played with the re module now for
> a while, I think regular expressions are super fun and useful. As far
> as them being a problem I found they can be tricky and sometimes the
> regex's I've devised do unexpected things...(which I can think of two
> instances where that unexpected thing was something that I had hoped
> to get into further down the line, yay for me!). So I guess I don't
> really understand why they are a "bad idea" to use. I don't know of
> any other way yet to parse specific data out of a text, html, or xml
> file without resorting to regular expressions.
> What other ways are there?

Re's aren't inherently bad. Just avoid using them as a hammer to the
extent that all your problems look like nails.

They wouldn't exist if there weren't problems it was appropriate to use
them on. Just try to use simpler techniques first.

For example, don't use re's to find out if a string starts with a
specific substring when you could instead use the .startswith() string
method.
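
As a small sketch of that advice (assuming Python 3; the sample line is made up), both checks below give the same answer, but the string method states the intent directly:

```python
import re

line = "Subject: Regular Expressions"

# Simple technique first: a string method states the intent directly.
simple = line.startswith("Subject:")

# Regex equivalent: works, but the literal prefix must be escaped
# in case it contains characters that are special in a pattern.
with_re = re.match(re.escape("Subject:"), line) is not None

print(simple, with_re)
```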

regards
Steve

Feb 11 '07 #16

> HTML: htmllib and HTMLParser (both in the Python library),
> BeautifulSoup (again GIYF)
>
> XML: xml.* in the Python library. ElementTree (recommended) is
> included in Python 2.5; use xml.etree.cElementTree.

The source of HTMLParser and xmllib uses regular expressions for
parsing out the data. htmllib calls sgmllib at the beginning of its
code--sgmllib starts off with a bunch of regular expressions used to
parse data. So the only real difference I see there is that someone
saved me the work of writing them ;0). I haven't looked at the source
for Beautiful Soup, though I have the sneaking suspicion that most
processing of HTML/XML is all based on regexes.

Feb 12 '07 #17

On Feb 12, 9:20 pm, "deviantbunnyl...@gmail.com"
<deviantbunnyl...@gmail.com> wrote:
>> HTML: htmllib and HTMLParser (both in the Python library),
>> BeautifulSoup (again GIYF)
>>
>> XML: xml.* in the Python library. ElementTree (recommended) is
>> included in Python 2.5; use xml.etree.cElementTree.
>
> The source of HTMLParser and xmllib uses regular expressions for
> parsing out the data. htmllib calls sgmllib at the beginning of its
> code--sgmllib starts off with a bunch of regular expressions used to
> parse data. So the only real difference I see there is that someone
> saved me the work of writing them ;0). I haven't looked at the source
> for Beautiful Soup, though I have the sneaking suspicion that most
> processing of HTML/XML is all based on regexes.

That's right. Those modules use regexes. You don't. You call functions
& classes in the modules.

Someone has written those modules and tested them and documented them
and they've had a fair old thrashing by quite a few people over the
years -- it may be the only difference in your way of thinking but
it's quite a large difference from you opening up the re docs and
getting stuck in single-handedly :-)

Feb 12 '07 #18

On 2007-02-10, Geoff Hill <th*************@gmail.com> wrote:
> What's the way to go about learning Python's regular
> expressions? I feel like such an idiot - being so strong in a
> programming language but knowing nothing about RE.

A great way to learn regular expressions is to implement them.

--
Neil Cerutti
Feb 12 '07 #19


dbl> The source of HTMLParser and xmllib uses regular expressions for
dbl> parsing out the data. htmllib calls sgmllib at the beginning of its
dbl> code--sgmllib starts off with a bunch of regular expressions used
dbl> to parse data.

I am almost certain those modules use regular expressions for lexical
analysis (splitting the input byte stream into "words"), not for parsing
(extracting the structure of the "sentences").

If I have a simple expression:

(7 + 3.14) * CONST

that's just a stream of bytes, "(", "7", " ", "+", ... Lexical analysis
chunks that stream of bytes into the "words" of the language:

LPAREN (NUMBER, 7) PLUS (NUMBER, 3.14) RPAREN TIMES (IDENT, "CONST")

Parsing then constructs a higher level representation of that stream of
"words" (more commonly called tokens or lexemes). That representation is
application-dependent.

Regular expressions are ideal for lexical analysis. They are not-so-hot for
parsing unless the grammar of the language being parsed is *extremely*
simple.
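
The lexical-analysis step can be sketched with the re module like this. It is only an illustration, assuming Python 3: the token names follow the ones used above, while the patterns and the tokenize function are assumptions made for the example.

```python
import re

# Token names follow the post; the patterns themselves are assumptions.
TOKEN_SPEC = [
    ("NUMBER", r"\d+(?:\.\d+)?"),
    ("IDENT", r"[A-Za-z_]\w*"),
    ("LPAREN", r"\("),
    ("RPAREN", r"\)"),
    ("PLUS", r"\+"),
    ("TIMES", r"\*"),
    ("SKIP", r"\s+"),
    ("MISMATCH", r"."),
]
TOKEN_RE = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(text):
    """Yield (kind, value) pairs; whitespace is dropped."""
    for match in TOKEN_RE.finditer(text):
        kind, value = match.lastgroup, match.group()
        if kind == "SKIP":
            continue
        if kind == "MISMATCH":
            raise ValueError(f"unexpected character {value!r}")
        yield kind, value


tokens = list(tokenize("(7 + 3.14) * CONST"))
print(tokens)
```

The values come back as strings; converting "7" or "3.14" to numbers, and deciding what the parentheses mean structurally, is left to the parsing stage.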

Here are a couple of much better expositions on the topics:

http://en.wikipedia.org/wiki/Lexical_analysis
http://en.wikipedia.org/wiki/Parsing

Skip

Feb 12 '07 #20

On Mon, 12 Feb 2007 07:20:11 -0300, de**************@gmail.com
<de**************@gmail.com> wrote:
> The source of HTMLParser and xmllib uses regular expressions for
> parsing out the data. htmllib calls sgmllib at the beginning of its
> code--sgmllib starts off with a bunch of regular expressions used to
> parse data. So the only real difference I see there is that someone
> saved me the work of writing them ;0). I haven't looked at the source
> for Beautiful Soup, though I have the sneaking suspicion that most
> processing of HTML/XML is all based on regexes.

You can build a parser for SGML/HTML/XML documents using regexps AND
Python code. You can't do that with regexps only.
For example, suppose you work hard to build a correct regexp for matching
an opening <a> tag. You extract this from the document: "<a href='xxx'>".
Is it actually an <a> tag? Maybe. But the text could be inside a comment.
Or in a CDATA section. Or inside JavaScript code. Or...
A regexp is good for recognizing tokens, and this can be used to build a
parser. But regular expressions alone can't parse this kind of document,
simply because its grammar is not regular.
(Python's re engine is stronger than "mathematical" regular expressions, in
the sense that it can handle things like backreferences (?P=...) and
lookahead (?=...), but even so it can't handle HTML.)
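
The comment case can be shown with a short sketch. It assumes Python 3 and a made-up two-line document; the AnchorParser class is a hypothetical helper built on the standard html.parser module. The regex reports the link inside the comment as well, while the parser, which tracks where it is in the document, does not.

```python
import re
from html.parser import HTMLParser

document = """<!-- <a href='old'>disabled link</a> -->
<p><a href='xxx'>real link</a></p>"""

# A regex sees both anchors: it has no notion of being inside a comment.
regex_hrefs = re.findall(r"<a\s+href='([^']*)'", document)


class AnchorParser(HTMLParser):
    """Record href values of <a> tags that are real markup (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.hrefs.extend(value for name, value in attrs if name == "href")


parser = AnchorParser()
parser.feed(document)
parser.close()

print(regex_hrefs, parser.hrefs)
```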

--
Gabriel Genellina

Feb 12 '07 #21

This discussion thread is closed
