Bytes IT Community

re.match and non-alphanumeric characters

Dear all,

this is really driving me nuts and any help would be extremely
appreciated.

I have a string that contains some numeric data. I want to isolate
these data using re.match, as follows.

import re

bogus = "IFC(35m)"
data = re.match(r'(\d+)', bogus)
print data.group(1)

I would expect to have "35" printed out to screen, but instead I get
an error that the regular expression did not match:

Traceback (most recent call last):
  File "C:\Documents and Settings\Mattia\Desktop\Neeltje\read.py", line 20, in <module>
    print data.group(1)
AttributeError: 'NoneType' object has no attribute 'group'

Note that the same holds if I look for "35" straight, instead of "\d+".
If instead I look for "IFC" it works fine. That is, apparently
re.match will match only up to the first non-alphanumeric character
and ignore anything after a "(", "_", "[" and god knows what else.

I am using Python 2.6 (r26:66721, latest stable version). Am I missing
something very big and very important?
Nov 16 '08 #1
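A minimal Python 3 sketch of the behaviour described in this question (the thread itself uses Python 2.6 print statements). It suggests that the "(" is not what stops re.match: the pattern is only tried at index 0, so a pattern that starts where the string starts matches straight through the parenthesis. The helper first_number is hypothetical, added only for illustration.

```python
import re

bogus = "IFC(35m)"

# re.match only tries the pattern at index 0, where the character is "I",
# so both of these return None (and None has no .group() method).
print(re.match(r"(\d+)", bogus))  # None
print(re.match(r"35", bogus))     # None

# The "(" is not the obstacle: escaped as a literal, it is matched
# like any other character when the pattern starts at index 0.
print(re.match(r"IFC\((\d+)", bogus).group(1))  # 35


def first_number(text):
    """Hypothetical helper: return the first run of digits, or None."""
    m = re.search(r"\d+", text)
    # Check for None before calling .group() on the result.
    return m.group() if m is not None else None


print(first_number(bogus))        # 35
print(first_number("no digits"))  # None
```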


r
Try re.search or re.findall; re.match only matches at the beginning
of a string. I almost never use it.
>>> re.search(r'(\d+)', bogus).group()
'35'
>>> re.search(r'(\d+)', bogus).span()
(4, 6)
Nov 16 '08 #2
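The two alternatives suggested in this reply can be compared in a short Python 3 sketch. The input string sample is made up for illustration; re.search stops at the first match, while re.findall collects every non-overlapping match.

```python
import re

sample = "IFC(35m) and IFD(120m)"  # hypothetical input with two numbers

# re.search scans the string and returns the first match only.
m = re.search(r"\d+", sample)
print(m.group(), m.span())  # 35 (4, 6)

# re.findall (with no capturing groups) returns every match as a string.
print(re.findall(r"\d+", sample))  # ['35', '120']

# re.finditer yields match objects, so positions stay available.
for m in re.finditer(r"\d+", sample):
    print(m.group(), m.start(), m.end())
```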

re.match() anchors the match at the start of the string. What you need
is re.search(). It's all in the documentation! :-)
Nov 16 '08 #3
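If you do want to keep re.match, here is a sketch of two options in Python 3, assuming the same bogus string: make the pattern consume the non-digit prefix itself, or use a compiled pattern's match() with a start position, which anchors the match at that index rather than at 0. The name digits is just an illustrative variable.

```python
import re

bogus = "IFC(35m)"

# Option 1: let the pattern consume the non-digit prefix itself.
# \D* matches zero or more non-digits, starting at index 0.
m = re.match(r"\D*(\d+)", bogus)
print(m.group(1))  # 35

# Option 2: a compiled pattern's match() takes an optional start position
# and anchors the match there instead of at index 0.
digits = re.compile(r"\d+")
print(digits.match(bogus))             # None ("I" at index 0)
print(digits.match(bogus, 4).group())  # 35 ("3" at index 4)
```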

On Sun, 16 Nov 2008 14:33:42 -0200, The Web President
<ma************@gmail.com> wrote:
> I have a string that contains some numeric data. I want to isolate
> these data using re.match, as follows.
>
> bogus = "IFC(35m)"
> data = re.match(r'(\d+)',bogus)
> print data.group(1)
>
> I would expect to have "35" printed out to screen, but instead I get
> an error that the regular expression did not match:
http://docs.python.org/library/re.ht...g-vs-searching

--
Gabriel Genellina

Nov 16 '08 #4
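The Python documentation's discussion of match() versus search() also notes a related subtlety with re.MULTILINE. A small Python 3 sketch, using a made-up two-line string:

```python
import re

text = "A\nX"  # made-up two-line string

# match() stays anchored at the start of the string, even with MULTILINE.
print(re.match("X", text, re.MULTILINE))  # None

# search() with "^" and MULTILINE can match at the start of any line.
print(re.search("^X", text, re.MULTILINE).group())  # X

# Without MULTILINE, "^" only matches at the start of the string.
print(re.search("^X", text))  # None
```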

Yep - re.search. Match matches the whole string. You want searching.
Diez
Nov 16 '08 #5

On Nov 17, 4:44 am, "Diez B. Roggisch" <de...@nospam.web.de> wrote:
> Match matches the whole string.
*ONLY* if the pattern ends with "$" or r"\Z"
Nov 16 '08 #6
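John's point can be made slightly more precise, sketched here in Python 3: "$" also matches just before a single trailing newline, so r"\Z" is the stricter end-of-string anchor. Python 3.4 and later also provide re.fullmatch for this whole-string check; it did not exist in the Python 2.6 used in this thread.

```python
import re

s = "abc\n"  # note the trailing newline

# "$" also matches just before a trailing newline at the end ...
print(re.match(r"abc$", s) is not None)   # True
# ... while \Z only matches at the very end of the string.
print(re.match(r"abc\Z", s) is not None)  # False

# Python 3.4+: re.fullmatch succeeds only if the whole string matches.
print(re.fullmatch(r"abc", s))              # None
print(re.fullmatch(r"abc", "abc").group())  # abc
```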

John Machin wrote:
> On Nov 17, 4:44 am, "Diez B. Roggisch" <de...@nospam.web.de> wrote:
>> Match matches the whole string.
>
> *ONLY* if the pattern ends with "$" or r"\Z"

You think so?

import re

rex = re.compile("abc.*def")

if rex.match("abc0123455678def"):
    print "matched"

Diez
Nov 16 '08 #7

Diez B. Roggisch wrote:
> John Machin wrote:
>> On Nov 17, 4:44 am, "Diez B. Roggisch" <de...@nospam.web.de> wrote:
>>> Match matches the whole string.
>>
>> *ONLY* if the pattern ends with "$" or r"\Z"
>
> You think so?
>
> import re
>
> rex = re.compile("abc.*def")
>
> if rex.match("abc0123455678def"):
>     print "matched"
Your test is inconclusive: necessary, but not sufficient.
>>> rex = re.compile("abc.*def")
>>> if rex.match("abc0123455678defPLUSEXTRASTUFF"):
...     print "Matched"
...
Matched
>>>
regards
Steve
--
Steve Holden +1 571 484 6266 +1 800 494 3119
Holden Web LLC http://www.holdenweb.com/

Nov 16 '08 #8
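Steve's counter-example can be inspected further with a short Python 3 sketch: the match object's group() and end() show that the match succeeds while covering only a prefix of the string. Comparing end() with len() is one illustrative way to check whether the whole string was consumed, not the only one.

```python
import re

rex = re.compile("abc.*def")
s = "abc0123455678defPLUSEXTRASTUFF"

m = rex.match(s)
# The match succeeds, but it only covers a prefix of s.
print(m.group())        # abc0123455678def
print(m.end(), len(s))  # 16 30

# Illustrative whole-string check: did the match consume every character?
print(m.end() == len(s))  # False
```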

On Nov 17, 10:19 am, "Diez B. Roggisch" <de...@nospam.web.de> wrote:
> John Machin wrote:
>> On Nov 17, 4:44 am, "Diez B. Roggisch" <de...@nospam.web.de> wrote:
>>> Match matches the whole string.
>> *ONLY* if the pattern ends with "$" or r"\Z"
>
> You think so?
>
> import re
>
> rex = re.compile("abc.*def")
>
> if rex.match("abc0123455678def"):
>     print "matched"
OK, I'll try again:

The following 3-tuples represent (pattern, string,
matched_portion_of_string):
('abc', 'abc', 'abc')
('abc', 'abcdef', 'abc')
('abc$', 'abc', 'abc')
('abc$', 'abcdef', '<no match>')

Saying "Match matches the whole string" is incorrect; see the second
case. If you want to ensure that the whole string matches the pattern,
the pattern needs to be terminated by "$" or "\Z".
Nov 17 '08 #9
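The table above can be reproduced with a few lines of Python 3. This sketch simply runs re.match on each (pattern, string) pair and records "<no match>" when it returns None.

```python
import re

# (pattern, string) pairs taken from the table above.
cases = [
    ("abc", "abc"),
    ("abc", "abcdef"),
    ("abc$", "abc"),
    ("abc$", "abcdef"),
]

rows = []
for pattern, string in cases:
    m = re.match(pattern, string)
    # group() with no argument returns the whole matched portion.
    matched = m.group() if m is not None else "<no match>"
    rows.append((pattern, string, matched))

for row in rows:
    print(row)
```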
