
unexpected behaviour for python regexp: caret symbol almost useless?

This regexp
'<widget class=".*" id=".*">'

works well with 'grep' for matching lines of the kind
<widget class="GtkWindow" id="window1">

in an XML .glade file.

However, that's not true for the re module in Python, which treats the
regexp as if it had been specified this way: '^<widget class=".*"
id=".*">'

For some reason, regexps in Python seem to match from the start of the
line, whether or not you use the caret symbol '^'.

I had a hard time working out why this regexp wasn't working:
regexp = re.compile(r'<widget class=".*" id="(.*)">')

The solution was to account for the surrounding whitespace:
regexp = re.compile(r'\s*<widget class=".*" id="(.*)">\s*')

To reproduce the behaviour, just take a .glade file and this Python script:
<code>
import re

glade_file_name = 'some.glade'

# pattern that expects the tag at the very start of the line
bad_regexp = re.compile(r'<widget class=".*" id="(.*)">')
# same pattern, but allowing whitespace around the tag
good_regexp = re.compile(r'\s*<widget class=".*" id="(.*)">\s*')

with open(glade_file_name) as glade_file:
    for line in glade_file:
        if bad_regexp.match(line):
            print('bad:', line.strip())
        if good_regexp.match(line):
            print('good:', line.strip())
</code>

The thing is, I would have expected to have to put the caret in explicitly
to tell the regexp to match at the start of the line, something like:
r'^<widget class=".*" id="(.*)">'
However, Python's regexp is taking care of that for me. This is not the
behaviour I would expect from what I know about regexps, but maybe I'm
missing something.

May 28 '06 #1
conan wrote:
The thing is, I would have expected to have to put the caret in explicitly
to tell the regexp to match at the start of the line, something like:
r'^<widget class=".*" id="(.*)">'
However, Python's regexp is taking care of that for me. This is not the
behaviour I would expect from what I know about regexps, but maybe I'm
missing something.


You want search(), not match().

http://docs.python.org/lib/matching-searching.html
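
To make the difference concrete, here is a minimal sketch. The sample line is made up, modeled on the glade line quoted in the question. match() only tries the pattern at position 0 of the string, while search() scans forward for the first position where the pattern matches:

```python
import re

# Hypothetical sample line, indented the way tags inside a .glade file usually are
line = '  <widget class="GtkWindow" id="window1">\n'

pattern = re.compile(r'<widget class=".*" id="(.*)">')

# match() anchors at the start of the string, so the leading spaces make it fail
match_result = pattern.match(line)

# search() scans the string and finds the tag wherever it starts
search_result = pattern.search(line)

print(match_result)             # None
print(search_result.group(1))   # window1
```

With search(), the original pattern works unchanged and the extra \s* is no longer needed.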

Peter
May 28 '06 #2
"conan" <co************@gmail.com> wrote in message
news:11**********************@38g2000cwa.googlegroups.com...
This regexp
'<widget class=".*" id=".*">'

works well with 'grep' for matching lines of the kind
<widget class="GtkWindow" id="window1">

in an XML .glade file.


As Peter Otten has already mentioned, this is the difference between the re
"match" and "search" methods.

As purely a lateral exercise, here is a pyparsing rendition of your program:

------------------------------------
from pyparsing import makeXMLTags, line

# define pyparsing patterns for begin and end XML tags
widgetStart, widgetEnd = makeXMLTags("widget")

# read the file contents
glade_file_name = 'some.glade'
with open(glade_file_name) as glade_file:
    gladeContents = glade_file.read()

# scan the input string for matching tags
for widget, start, end in widgetStart.scanString(gladeContents):
    print("good:", line(start, gladeContents).strip())
    print(widget["class"], widget["id"])
    print("Class: %(class)s; Id: %(id)s" % widget)
------------------------------------
Not quite an exact match: only the good lines get listed. But also check
out some of the other capabilities. To do this with re's, you have to
clutter up the re expression with field names, as in:

(r'<widget class="(?P<class>.*)" id="(?P<id>.*)">')
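
For comparison, a small sketch of how such a named-group pattern might be used with the re module. The sample line is invented for illustration, and the quotes are kept outside both groups so that the captured values come back unquoted:

```python
import re

# Hypothetical sample line modeled on the glade line quoted earlier in the thread
line = '  <widget class="GtkWindow" id="window1">'

# Named groups; "class" is accepted as a group name even though it is a Python keyword
widget_re = re.compile(r'<widget class="(?P<class>.*)" id="(?P<id>.*)">')

# search() rather than match(), so the leading indentation is not a problem
m = widget_re.search(line)

# groupdict() returns the named groups as an ordinary dict
fields = m.groupdict()

print(m.group('class'), m.group('id'))
print("Class: %(class)s; Id: %(id)s" % fields)
```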

The parsing patterns generated by makeXMLTags give dict-like and
attribute-like access to any attributes included with the tag. If not for
the unfortunate attribute name "class" (which is a Python keyword), you
could also reference these values as widget.class and widget.id.

If you are parsing HTML, there is also a makeHTMLTags method, which creates
patterns that are less rigid about upper/lower case and other XML
strictnesses.

-- Paul
May 28 '06 #3
Thank you. I had read this, but somehow I missed it when the issue
arose.

May 29 '06 #4
Thank you Paul.

Since the only thing I'm doing is extracting these fields, and I have no
plans to include other stuff, a regexp is fine. However, I will keep
'pyparsing' in mind for when I need to do more complex parsing.

As you can see in the example I sent, I was trying to get info from a
glade file. In particular, I was tired of doing this every time I need to
access a widget:

some_var = xml.get_widget('some_id')

(Doing this is tiresome when you have more than 10 widgets.)

So I wrote a little module that makes all widgets available as attributes
of an object. For anyone interested, it is at:

http://www.lugmen.org.ar/~p10n/sourc.../GetWidgets.py

However, it is still pretty immature, since it lacks some checks.
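
The general idea can be sketched roughly as follows. This is not the code from GetWidgets.py: the names widget_ids and Widgets, the sample text, and the stand-in lookup function are all assumptions made up for illustration. In real use the lookup would be the get_widget method of the loaded glade XML object:

```python
import re

# Same pattern as earlier in the thread, used with search() so indentation does not matter
WIDGET_RE = re.compile(r'<widget class=".*" id="(.*)">')


def widget_ids(glade_text):
    """Return the id of every <widget> tag found, checking one line at a time."""
    ids = []
    for line in glade_text.splitlines():
        found = WIDGET_RE.search(line)
        if found:
            ids.append(found.group(1))
    return ids


class Widgets(object):
    """Hypothetical holder: exposes each widget as an attribute named after its id."""

    def __init__(self, glade_text, get_widget):
        # get_widget is any callable that maps a widget id to a widget
        for widget_id in widget_ids(glade_text):
            setattr(self, widget_id, get_widget(widget_id))


# Stand-in data and lookup so the sketch runs without GTK installed
sample = '''<glade-interface>
  <widget class="GtkWindow" id="window1">
    <widget class="GtkButton" id="button1">
    </widget>
  </widget>
</glade-interface>'''

widgets = Widgets(sample, lambda widget_id: 'widget:' + widget_id)
print(widgets.window1, widgets.button1)
```

A sketch like this has the same gap mentioned above: it does no checks, for example for ids that are not valid Python attribute names.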

May 29 '06 #5
