Regular Expressions

What's the way to go about learning Python's regular expressions? I feel
like such an idiot - being so strong in a programming language but knowing
nothing about RE.
Feb 10 '07 #1
"Geoff Hill" <th*************@gmail.com> writes:
> What's the way to go about learning Python's regular expressions? I feel
> like such an idiot - being so strong in a programming language but knowing
> nothing about RE.
Read the documentation?
Feb 10 '07 #2
On Feb 11, 10:26 am, "Geoff Hill" <thegeoffmeis...@gmail.com> wrote:
> What's the way to go about learning Python's regular expressions? I feel
> like such an idiot - being so strong in a programming language but knowing
> nothing about RE.
I suggest that you work through the re HOWTO
http://www.amk.ca/python/howto/regex/
and by work through, I don't mean "read". I mean as each new concept
is introduced:
1. try the given example(s) yourself at the interactive prompt
2. try variations on the examples
3. read the relevant part of the Library Reference Manual
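
As a rough illustration of steps 1 and 2, here is the kind of session one might type at the interactive prompt. The pattern and sample strings are only illustrative, in the spirit of the HOWTO's early examples, and are not meant as a required exercise:

```python
import re

pattern = re.compile(r"[a-z]+")

# 1. The example as given: match() only succeeds at the start of the string.
print(pattern.match("tempo"))         # a match object
print(pattern.match("::: message"))   # None

# 2. A variation: search() scans forward, findall() collects every hit.
found = pattern.search("::: message")
print(found.group(), found.span())
words = pattern.findall("12 drummers drumming, 11 pipers piping")
print(words)
```

Step 3 would then be looking up match(), search() and findall() in the re section of the Library Reference to see why they behave differently.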

Also I'd suggest reading threads in this newsgroup where people are
asking for help with re.

HTH,
John

Feb 11 '07 #3
"John Machin" <sj******@lexicon.net> writes:
> > What's the way to go about learning Python's regular expressions? I feel
> > like such an idiot - being so strong in a programming language but knowing
> > nothing about RE.
>
> I suggest that you work through the re HOWTO
> http://www.amk.ca/python/howto/regex/
Also remember Zawinski's law:
http://fishbowl.pastiche.org/2003/08...ar_expressions
Feb 11 '07 #4
On Feb 10, 6:26 pm, "Geoff Hill" <thegeoffmeis...@gmail.com> wrote:
> What's the way to go about learning Python's regular expressions? I feel
> like such an idiot - being so strong in a programming language but knowing
> nothing about RE.
I highly recommend reading the book "Mastering Regular Expressions,"
which I believe is published by O'Reilly. It's a great reference and
helps peel the onion in terms of working through RE. They are a
language unto themselves. A fun brain exercise.

Feb 11 '07 #5
On 10 Feb 2007 18:58:51 -0800, gregarican <gr*********@gmail.com> wrote:
> On Feb 10, 6:26 pm, "Geoff Hill" <thegeoffmeis...@gmail.com> wrote:
> > What's the way to go about learning Python's regular expressions? I feel
> > like such an idiot - being so strong in a programming language but knowing
> > nothing about RE.
>
> I highly recommend reading the book "Mastering Regular Expressions,"
> which I believe is published by O'Reilly. It's a great reference and
> helps peel the onion in terms of working through RE. They are a
> language unto themselves. A fun brain exercise.

Absolutely: Get "Mastering Regular Expressions" by Jeffrey Friedl. Not
only is it easy to read, but you'll get a lot of mileage out of
regexes in general. Grep, Perl one-liners, Python, and other tools use
regexes, and you'll find that they are really clever little creatures
once you befriend a few of them.

Shawn
Feb 11 '07 #6
Thanks. O'Reilly is the way I learned Python, and I'm surprised that I didn't
think of a book by them earlier.
Feb 11 '07 #7
Geoff Hill wrote:
> What's the way to go about learning Python's regular expressions? I feel
> like such an idiot - being so strong in a programming language but knowing
> nothing about RE.

In fact that's a pretty smart stance. A quote attributed variously to
Tim Peters and Jamie Zawinski says "Some people, when confronted with a
problem, think 'I know, I'll use regular expressions.' Now they have two
problems."

regards
Steve
--
Steve Holden +44 150 684 7255 +1 800 494 3119
Holden Web LLC/Ltd http://www.holdenweb.com
Skype: holdenweb http://del.icio.us/steve.holden
Blog of Note: http://holdenweb.blogspot.com
See you at PyCon? http://us.pycon.org/TX2007

Feb 11 '07 #8
On Sun, 11 Feb 2007 07:05:30 +0000, Steve Holden wrote:
> Geoff Hill wrote:
> > What's the way to go about learning Python's regular expressions? I feel
> > like such an idiot - being so strong in a programming language but knowing
> > nothing about RE.
>
> In fact that's a pretty smart stance.
That's a little harsh -- regexes have their place, together with pointer
arithmetic, bit manipulations, reverse polish notation and goto. The
problem is when people use them inappropriately e.g. using a regex when a
simple string.find will do.
> A quote attributed variously to
> Tim Peters and Jamie Zawinski says "Some people, when confronted with a
> problem, think 'I know, I'll use regular expressions.' Now they have two
> problems."
I believe that is correctly attributed to Jamie Zawinski.
--
Steven

Feb 11 '07 #9
On Feb 11, 9:25 pm, Steven D'Aprano
<s...@REMOVE.THIS.cybersource.com.au> wrote:
> On Sun, 11 Feb 2007 07:05:30 +0000, Steve Holden wrote:
> > Geoff Hill wrote:
> > > What's the way to go about learning Python's regular expressions? I feel
> > > like such an idiot - being so strong in a programming language but knowing
> > > nothing about RE.
> > In fact that's a pretty smart stance.
>
> That's a little harsh -- regexes have their place, together with pointer
> arithmetic, bit manipulations, reverse polish notation and goto. The
> problem is when people use them inappropriately e.g. using a regex when a
> simple string.find will do.
Thanks for the tip-off, Steve and Steven. Looks like I'll have to
start hiding my 12C (datecode 2214) with its "GTO" button under the
loose floor-board whenever I hear a knock at the door ;-) Looks like
Agner Fog's gone a million, and there'll be a special place in hell
for people who combine regexes with bit manipulation, like Navarro &
Raffinot. And we won't even mention Heikki Hy,*7g^54d3j+__=

Feb 11 '07 #10
gregarican wrote:
> On Feb 10, 6:26 pm, "Geoff Hill" <thegeoffmeis...@gmail.com> wrote:
> > What's the way to go about learning Python's regular expressions? I feel
> > like such an idiot - being so strong in a programming language but knowing
> > nothing about RE.
>
> I highly recommend reading the book "Mastering Regular Expressions,"
> which I believe is published by O'Reilly. It's a great reference and
> helps peel the onion in terms of working through RE. They are a
> language unto themselves. A fun brain exercise.
There is no real mention of python in this book, but the first edition
is probably the best programming book I've ever read (excepting, perhaps
Text Processing in Python by Mertz.) Well, come to think of it, check
the latter book out. It has a great chapter on Python Regex. And it's
free to download.

James
Feb 11 '07 #11
> That's a little harsh -- regexes have their place, together with pointer
> arithmetic, bit manipulations, reverse polish notation and goto. The
> problem is when people use them inappropriately e.g. using a regex when a
> simple string.find will do.
> > A quote attributed variously to
> > Tim Peters and Jamie Zawinski says "Some people, when confronted with a
> > problem, think 'I know, I'll use regular expressions.' Now they have two
> > problems."
>
> I believe that is correctly attributed to Jamie Zawinski.
>
> --
> Steven
So as a newbie, I have to ask. I've played with the re module now for
a while, I think regular expressions are super fun and useful. As far
as them being a problem I found they can be tricky and sometimes the
regex's I've devised do unexpected things...(which I can think of two
instances where that unexpected thing was something that I had hoped
to get into further down the line, yay for me!). So I guess I don't
really understand why they are a "bad idea" to use. I don't know of
any other way yet to parse specific data out of a text, html, or xml
file without resorting to regular expressions.
What other ways are there?

Feb 11 '07 #12

    jwz> Some people, when confronted with a problem, think 'I know, I'll
    jwz> use regular expressions.' Now they have two problems.

    dbl> So as a newbie, I have to ask.... So I guess I don't really
    dbl> understand why they are a "bad idea" to use.

Regular expressions are fine in their place, however, you can get carried
away. For example:

http://www.ex-parrot.com/~pdw/Mail-RFC822-Address.html

Skip
Feb 11 '07 #13
On Sun, 11 Feb 2007 13:35:26 -0300, de**************@gmail.com
<de**************@gmail.com> wrote:
> > (Steven?)
> > That's a little harsh -- regexes have their place, together with pointer
> > arithmetic, bit manipulations, reverse polish notation and goto. The
> > problem is when people use them inappropriately e.g. using a regex when a
> > simple string.find will do.
>
> So as a newbie, I have to ask. I've played with the re module now for
> a while, I think regular expressions are super fun and useful. As far
> as them being a problem I found they can be tricky and sometimes the
> regex's I've devised do unexpected things...(which I can think of two
> instances where that unexpected thing was something that I had hoped
> to get into further down the line, yay for me!). So I guess I don't
> really understand why they are a "bad idea" to use. I don't know of
> any other way yet to parse specific data out of a text, html, or xml
> file without resorting to regular expressions.
> What other ways are there?
For very simple things, it's easier and faster to use string methods like
find or split. For example, splitting "2007-02-11" into y, m, d parts:
y, m, d = date.split("-")
is a lot faster than matching "(\d+)-(\d+)-(\d+)".
On the other hand, complex tasks like parsing an HTML/XML document
*can't* be done with a regexp alone; but people insist anyway, and then
complain when it doesn't work as expected, and ask how to "fix" the
regexp...
Good use of regexps is maybe somewhere in the middle.
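
A minimal sketch of the date example above, assuming a well-formed "YYYY-MM-DD" string. The function names are made up for illustration; only the split call and the pattern come from the post:

```python
import re

DATE_RE = re.compile(r"(\d+)-(\d+)-(\d+)")

def split_date(date):
    """Split 'YYYY-MM-DD' with a plain string method."""
    # Raises ValueError if there are not exactly three parts.
    y, m, d = date.split("-")
    return y, m, d

def match_date(date):
    """Same result via a regex; returns None when the text does not match."""
    found = DATE_RE.match(date)
    return found.groups() if found else None

print(split_date("2007-02-11"))   # ('2007', '02', '11')
print(match_date("2007-02-11"))   # ('2007', '02', '11')
```

To check the speed difference on your own machine rather than take it on trust, the standard timeit module can time both functions.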

--
Gabriel Genellina

Feb 11 '07 #14
On Feb 12, 3:35 am, "deviantbunnyl...@gmail.com"
<deviantbunnyl...@gmail.com> wrote:
> > That's a little harsh -- regexes have their place, together with pointer
> > arithmetic, bit manipulations, reverse polish notation and goto. The
> > problem is when people use them inappropriately e.g. using a regex when a
> > simple string.find will do.
> > > A quote attributed variously to
> > > Tim Peters and Jamie Zawinski says "Some people, when confronted with a
> > > problem, think 'I know, I'll use regular expressions.' Now they have two
> > > problems."
> > I believe that is correctly attributed to Jamie Zawinski.
> > --
> > Steven
>
> So as a newbie, I have to ask. I've played with the re module now for
> a while, I think regular expressions are super fun and useful. As far
> as them being a problem I found they can be tricky and sometimes the
> regex's I've devised do unexpected things...(which I can think of two
> instances where that unexpected thing was something that I had hoped
> to get into further down the line, yay for me!). So I guess I don't
> really understand why they are a "bad idea" to use.
Regexes are not "bad". However people tend to overuse them, whether
they are overkill (like Gabriel's date-splitting example) or underkill
-- see your next sentence :-)
> I don't know of
> any other way yet to parse specific data out of a text, html, or xml
> file without resorting to regular expressions.
> What other ways are there?
Text: Paul McGuire's pyparsing module (Google is your friend); read
David Mertz's book on text processing with Python (free download, I
believe); modules for specific data formats e.g. csv

HTML: htmllib and HTMLParser (both in the Python library),
BeautifulSoup (again GIYF)

XML: xml.* in the Python library. ElementTree (recommended) is
included in Python 2.5; use xml.etree.cElementTree.
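
A small sketch of what using those library modules looks like, assuming a current Python 3, where HTMLParser lives in html.parser and ElementTree is imported as xml.etree.ElementTree (the Python 2.5 names in the post differ). The LinkCollector class and the sample data are invented for illustration:

```python
import csv
import io
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Hypothetical helper: collect href values from <a> start tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.links.extend(value for name, value in attrs if name == "href")

# HTML: let the parser decide what is a tag.
collector = LinkCollector()
collector.feed('<p>See <a href="http://example.com/">this page</a>.</p>')
collector.close()
print(collector.links)

# XML: navigate the tree instead of matching text.
root = ET.fromstring("<books><book id='1'><title>Text Processing</title></book></books>")
titles = [book.findtext("title") for book in root.findall("book")]
print(titles)

# CSV: the csv module copes with quoted fields that contain commas.
rows = list(csv.reader(io.StringIO('name,comment\nGeoff,"regexes, eventually"\n')))
print(rows)
```

None of the three snippets contains a regular expression written by the caller, which is the point being made above.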

HTH,
John

Feb 11 '07 #15
de**************@gmail.com wrote:
> > That's a little harsh -- regexes have their place, together with pointer
> > arithmetic, bit manipulations, reverse polish notation and goto. The
> > problem is when people use them inappropriately e.g. using a regex when a
> > simple string.find will do.
> > > A quote attributed variously to
> > > Tim Peters and Jamie Zawinski says "Some people, when confronted with a
> > > problem, think 'I know, I'll use regular expressions.' Now they have two
> > > problems."
> > I believe that is correctly attributed to Jamie Zawinski.
> >
> > --
> > Steven
>
> So as a newbie, I have to ask. I've played with the re module now for
> a while, I think regular expressions are super fun and useful. As far
> as them being a problem I found they can be tricky and sometimes the
> regex's I've devised do unexpected things...(which I can think of two
> instances where that unexpected thing was something that I had hoped
> to get into further down the line, yay for me!). So I guess I don't
> really understand why they are a "bad idea" to use. I don't know of
> any other way yet to parse specific data out of a text, html, or xml
> file without resorting to regular expressions.
> What other ways are there?
Re's aren't inherently bad. Just avoid using them as a hammer to the
extent that all your problems look like nails.

They wouldn't exist if there weren't problems it was appropriate to use
them on. Just try to use simpler techniques first.

For example, don't use re's to find out if a string starts with a
specific substring when you could instead use the .startswith() string
method.
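
To make the comparison concrete, here is a short sketch with a made-up sample line. It shows the regex route next to the string-method route for a prefix test and a substring test:

```python
import re

line = "Subject: Regular Expressions"

# Regex version: works, but the prefix needs escaping if it ever
# contains metacharacters such as '.' or '('.
uses_regex = re.match(re.escape("Subject:"), line) is not None

# String-method version: says exactly what is meant.
uses_method = line.startswith("Subject:")

# A plain substring test needs no regex either.
has_word = "Regular" in line
position = line.find("Regular")   # -1 when the substring is absent

print(uses_regex, uses_method, has_word, position)
```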

regards
Steve
--
Steve Holden +44 150 684 7255 +1 800 494 3119
Holden Web LLC/Ltd http://www.holdenweb.com
Skype: holdenweb http://del.icio.us/steve.holden
Blog of Note: http://holdenweb.blogspot.com
See you at PyCon? http://us.pycon.org/TX2007

Feb 11 '07 #16
> HTML: htmllib and HTMLParser (both in the Python library),
> BeautifulSoup (again GIYF)
>
> XML: xml.* in the Python library. ElementTree (recommended) is
> included in Python 2.5; use xml.etree.cElementTree.
The source of HTMLParser and xmllib use regular expressions for
parsing out the data. htmllib calls sgmllib at the beginning of its
code--sgmllib starts off with a bunch of regular expressions used to
parse data. So the only real difference there I see is that someone
saved me the work of writing them ;0). I haven't looked at the source
for Beautiful Soup, though I have the sneaking suspicion that most
processing of html/xml is all based on regex's.

Feb 12 '07 #17
On Feb 12, 9:20 pm, "deviantbunnyl...@gmail.com"
<deviantbunnyl...@gmail.com> wrote:
> > HTML: htmllib and HTMLParser (both in the Python library),
> > BeautifulSoup (again GIYF)
> >
> > XML: xml.* in the Python library. ElementTree (recommended) is
> > included in Python 2.5; use xml.etree.cElementTree.
>
> The source of HTMLParser and xmllib use regular expressions for
> parsing out the data. htmllib calls sgmllib at the beginning of its
> code--sgmllib starts off with a bunch of regular expressions used to
> parse data. So the only real difference there I see is that someone
> saved me the work of writing them ;0). I haven't looked at the source
> for Beautiful Soup, though I have the sneaking suspicion that most
> processing of html/xml is all based on regex's.
That's right. Those modules use regexes. You don't. You call functions
& classes in the modules.

Someone has written those modules and tested them and documented them
and they've had a fair old thrashing by quite a few people over the
years -- it may be the only difference in your way of thinking but
it's quite a large difference from you opening up the re docs and
getting stuck in single-handedly :-)

Feb 12 '07 #18
On 2007-02-10, Geoff Hill <th*************@gmail.com> wrote:
> What's the way to go about learning Python's regular
> expressions? I feel like such an idiot - being so strong in a
> programming language but knowing nothing about RE.
A great way to learn regular expressions is to implement them.
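
As one possible starting point for such an exercise, here is a sketch of a tiny backtracking matcher, modelled on the well-known small matcher Rob Pike wrote for "The Practice of Programming". It handles only literal characters, '.', '*', '^' and '$', and makes no attempt to mirror how Python's re module is implemented:

```python
def match(pattern, text):
    """Return True if pattern matches anywhere in text.

    Supports only literal characters, '.', '*', '^' and '$'.
    """
    if pattern.startswith("^"):
        return match_here(pattern[1:], text)
    # Try every starting position, including the empty tail.
    return any(match_here(pattern, text[i:]) for i in range(len(text) + 1))

def match_here(pattern, text):
    """Match pattern at the very start of text."""
    if not pattern:
        return True
    if len(pattern) >= 2 and pattern[1] == "*":
        return match_star(pattern[0], pattern[2:], text)
    if pattern == "$":
        return text == ""
    if text and pattern[0] in (".", text[0]):
        return match_here(pattern[1:], text[1:])
    return False

def match_star(char, pattern, text):
    """Match zero or more occurrences of char, then the rest of pattern."""
    i = 0
    while True:
        if match_here(pattern, text[i:]):
            return True
        if i < len(text) and char in (".", text[i]):
            i += 1
        else:
            return False

print(match("ab*c", "xxabbbc"))
print(match("^a.c$", "abcd"))
```

Extending it with character classes, '+' or '?' is where most of the learning happens.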

--
Neil Cerutti
Feb 12 '07 #19

    dbl> The source of HTMLParser and xmllib use regular expressions for
    dbl> parsing out the data. htmllib calls sgmllib at the beginning of its
    dbl> code--sgmllib starts off with a bunch of regular expressions used
    dbl> to parse data.

I am almost certain those modules use regular expressions for lexical
analysis (splitting the input byte stream into "words"), not for parsing
(extracting the structure of the "sentences").

If I have a simple expression:

(7 + 3.14) * CONST

that's just a stream of bytes, "(", "7", " ", "+", ... Lexical analysis
chunks that stream of bytes into the "words" of the language:

LPAREN (NUMBER, 7) PLUS (NUMBER, 3.14) RPAREN TIMES (IDENT, "CONST")

Parsing then constructs a higher level representation of that stream of
"words" (more commonly called tokens or lexemes). That representation is
application-dependent.

Regular expressions are ideal for lexical analysis. They are not-so-hot for
parsing unless the grammar of the language being parsed is *extremely*
simple.
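
The lexical-analysis half can be sketched with the re module in a few lines. The token names follow the example above; the pattern table and the tokenize function are assumptions made for illustration, not code from any of the library modules mentioned:

```python
import re

# One named group per token kind; SKIP swallows whitespace.
TOKEN_SPEC = [
    ("NUMBER", r"\d+(?:\.\d+)?"),
    ("IDENT",  r"[A-Za-z_]\w*"),
    ("LPAREN", r"\("),
    ("RPAREN", r"\)"),
    ("PLUS",   r"\+"),
    ("TIMES",  r"\*"),
    ("SKIP",   r"\s+"),
]
TOKEN_RE = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))

def tokenize(text):
    """Split text into (kind, value) tokens; raise on anything unrecognised."""
    tokens = []
    pos = 0
    while pos < len(text):
        found = TOKEN_RE.match(text, pos)
        if found is None:
            raise ValueError(f"unexpected character {text[pos]!r} at position {pos}")
        if found.lastgroup != "SKIP":
            tokens.append((found.lastgroup, found.group()))
        pos = found.end()
    return tokens

print(tokenize("(7 + 3.14) * CONST"))
```

Note that the output is still a flat list. Working out that the addition is nested inside the parentheses and must happen before the multiplication is the parser's job, and that part needs ordinary code (or a parsing library) on top of the regexes.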

Here are a couple much better expositions on the topics:

http://en.wikipedia.org/wiki/Lexical_analysis
http://en.wikipedia.org/wiki/Parsing

Skip

Feb 12 '07 #20
On Mon, 12 Feb 2007 07:20:11 -0300, de**************@gmail.com
<de**************@gmail.com> wrote:
> The source of HTMLParser and xmllib use regular expressions for
> parsing out the data. htmllib calls sgmllib at the beginning of its
> code--sgmllib starts off with a bunch of regular expressions used to
> parse data. So the only real difference there I see is that someone
> saved me the work of writing them ;0). I haven't looked at the source
> for Beautiful Soup, though I have the sneaking suspicion that most
> processing of html/xml is all based on regex's.
You can build a parser for SGML/HTML/XML documents using regexps AND
python code. You can't do that with regexps only.
For example, suppose you work hard to build a correct regexp for matching
an opening <a> tag. You extract this from the document: "<a href='xxx'>".
Is it actually an <a> tag? Maybe. But the text could be inside a comment.
Or in a CDATA section. Or inside javascript code. Or...
A regexp is good for recognizing tokens, and this can be used to build a
parser. But regular expressions alone can't parse this kind of document,
just because its grammar is not regular.
(Python's re engine is stronger than "mathematical" regular expressions, in
the sense that it can handle things like backreferences (?P=...) and
lookahead (?=...), but even so it can't handle HTML.)
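
The comment case is easy to reproduce. In this sketch the document text, the deliberately naive pattern and the AnchorCollector class are all made up for illustration, and html.parser is the Python 3 name of the HTMLParser module:

```python
import re
from html.parser import HTMLParser

DOCUMENT = """<html><body>
<!-- <a href='commented-out'>old link</a> -->
<a href='real'>current link</a>
</body></html>"""

# Naive regex: it has no idea that the first match sits inside a comment.
regex_hrefs = re.findall(r"<a\s+href='([^']*)'", DOCUMENT)

class AnchorCollector(HTMLParser):
    """Hypothetical helper: record href values of real <a> start tags."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.hrefs.extend(value for name, value in attrs if name == "href")

parser = AnchorCollector()
parser.feed(DOCUMENT)
parser.close()

print(regex_hrefs)    # ['commented-out', 'real']
print(parser.hrefs)   # ['real']
```

The parser knows it is inside a comment when it reaches the first link, and that state is exactly what the bare regexp lacks.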

--
Gabriel Genellina

Feb 12 '07 #21
