By using this site, you agree to our updated Privacy Policy and our Terms of Use. Manage your Cookies Settings.
444,100 Members | 2,979 Online
Bytes IT Community
+ Ask a Question
Need help? Post your question and get tips & solutions from a community of 444,100 IT Pros & Developers. It's quick & easy.

matching exactly a 4 digit number in python

P: n/a
Hi
I am a few months new into python. I have used regexps before in perl
and java but am a little confused with this problem.

I want to parse a number of strings and extract only those that
contain a 4 digit number anywhere inside a string

However the regexp
p = re.compile(r'\d{4}')

Matches even sentences that have longer than 4 numbers inside
strings ..for example it matches "I have 3324234 and more"

I am very confused. Shouldnt the \d{4,} match exactly four digit
numbers so a 5 digit number sentence should not be matched .

Here is my test program output and the test given below
Thanks for your help
Harijay

PyMate r8111 running Python 2.5.1 (/usr/bin/python)
>>testdigit.py
Matched I have 2004 rupees
Matched I have 3324234 and more
Matched As 3233
Matched 2323423414 is good
Matched 4444 dc sav 2412441 asdf
SKIPPED random1341also and also
SKIPPED
SKIPPED 13
Matched a 1331 saves
SKIPPED and and as dad
SKIPPED A has 13123123
SKIPPED A 13123
SKIPPED 123 adn
Matched 1312 times I have told you
DONE

#!/usr/bin/python
import re
x = [" I have 2004 rupees "," I have 3324234 and more" , " As 3233 " ,
"2323423414 is good","4444 dc sav 2412441 asdf " , "random1341also and
also" ,"","13"," a 1331 saves" ," and and as dad"," A has 13123123","
A 13123","123 adn","1312 times I have told you"]

p = re.compile(r'\d{4} ')

for elem in x:
if re.search(p,elem):
print "Matched " + elem
else:
print "SKIPPED " + elem

print "DONE"
Nov 21 '08 #1
Share this Question
Share on Google+
7 Replies


P: n/a
2008/11/21 harijay <ha*****@gmail.com>:
Hi
I am a few months new into python. I have used regexps before in perl
and java but am a little confused with this problem.

I want to parse a number of strings and extract only those that
contain a 4 digit number anywhere inside a string

However the regexp
p = re.compile(r'\d{4}')

Matches even sentences that have longer than 4 numbers inside
strings ..for example it matches "I have 3324234 and more"
Try with this:

p = re.compile(r'\d{4}$')

The $ character matches the end of the string. It should work.
Nov 21 '08 #2

P: n/a

"harijay" <ha*****@gmail.comwrote in message
news:74**********************************@j38g2000 yqa.googlegroups.com...
I want to parse a number of strings and extract only those that
contain a 4 digit number anywhere inside a string
Try:
p = re.compile(r'\b\d{4}\b')

-Mark

Nov 21 '08 #3

P: n/a
On Nov 22, 8:46*am, harijay <hari...@gmail.comwrote:
Hi
I am a few months new into python. I have used regexps before in perl
and java but am a little confused with this problem.

I want to parse a number of strings and extract only those that
contain a 4 digit number anywhere inside a string

However the regexp
p = re.compile(r'\d{4}')

Matches even sentences that have longer than 4 numbers inside
strings ..for example it matches "I have 3324234 and more"
No it doesn't. When used with re.search on that string it matches
3324, it doesn't "match" the whole sentence.
>
I am very confused. Shouldnt the \d{4,} match exactly four digit
numbers so a 5 digit number sentence should not be matched .
{4} does NOT mean the same as {4,}.
{4} is the same as {4,4}
{4,} means {4,INFINITY}

Ignoring {4,}:

You need to specify a regex that says "4 digits followed by (non-digit
or end-of-string)". Have a try at that and come back here if you have
any more problems.

some test data:
xxx1234
xxx12345
xxx1234xxx
xxx12345xxx
xxx1234xxx1235xxx
xxx12345xxx1234xxx

Nov 21 '08 #4

P: n/a
On Nov 21, 4:46*pm, harijay <hari...@gmail.comwrote:
Hi
I am a few months new into python. I have used regexps before in perl
and java but am a little confused with this problem.

I want to parse a number of strings and extract only those that
contain a 4 digit number anywhere inside a string

However the regexp
p = re.compile(r'\d{4}')

Matches even sentences that have longer than 4 numbers inside
strings ..for example it matches "I have 3324234 and more"

I am very confused. Shouldnt the \d{4,} match exactly four digit
numbers so a 5 digit number sentence should not be matched .
No, why should it ? What you're saying is "give me 4 consecutive
digits", without specifying what should precede or follow these
digits. A correct expression is a bit more hairy:

p = re.compile(r'''
(?:\D|\b) # find a non-digit or word boundary..
(\d{4}) # .. followed by the 4 digits to be matched as group
#1..
(?:\D|\b) # .. which are followed by non-digit or word boundary
''', re.VERBOSE)
HTH,
George
Nov 21 '08 #5

P: n/a
George Sakkis wrote:
On Nov 21, 4:46 pm, harijay <hari...@gmail.comwrote:
>Hi
I am a few months new into python. I have used regexps before in perl
and java but am a little confused with this problem.

I want to parse a number of strings and extract only those that
contain a 4 digit number anywhere inside a string

However the regexp
p = re.compile(r'\d{4}')

Matches even sentences that have longer than 4 numbers inside
strings ..for example it matches "I have 3324234 and more"

I am very confused. Shouldnt the \d{4,} match exactly four digit
numbers so a 5 digit number sentence should not be matched .

No, why should it ? What you're saying is "give me 4 consecutive
digits", without specifying what should precede or follow these
digits. A correct expression is a bit more hairy:

p = re.compile(r'''
(?:\D|\b) # find a non-digit or word boundary..
(\d{4}) # .. followed by the 4 digits to be matched as group
#1..
(?:\D|\b) # .. which are followed by non-digit or word boundary
''', re.VERBOSE)
You want to match a sequence of 4 digits: \d{4}
not preceded by a digit: (?<!\d)
not followed by a digit: (?!\d)

which is: re.compile(r'(?<!\d)\d{4}(?!\d)')
Nov 21 '08 #6

P: n/a
>I am a few months new into python. I have used regexps before in perl
and java but am a little confused with this problem.
>I want to parse a number of strings and extract only those that
contain a 4 digit number anywhere inside a string
>However the regexp
p = re.compile(r'\d{4}')
>Matches even sentences that have longer than 4 numbers inside strings
..for example it matches "I have 3324234 and more"
Try this instead:
>>pat = re.compile(r"(?<!\d)(\d{4})(?!\d)")>>for s in x:
... m = pat.search(s)
... print repr(s),
... print (m is not None) and "matches" or "does not match"
...
' I have 2004 rupees ' matches
' I have 3324234 and more' does not match
' As 3233 ' matches
'2323423414 is good' does not match
'4444 dc sav 2412441 asdf ' matches
'random1341also and also' matches
'' does not match
'13' does not match
' a 1331 saves' matches
' and and as dad' does not match
' A has 13123123' does not match
'A 13123' does not match
'123 adn' does not match
'1312 times I have told you' matches

--
Skip Montanaro - sk**@pobox.com - http://smontanaro.dyndns.org/
Nov 21 '08 #7

P: n/a
Thanks John Machin and Mark Tolonen ..
SO I guess the correct one is to use the word boundary meta character
"\b"

so r'\b\d{4}\b' is what I need since it reads

a 4 digit number in between word boundaries

Thanks a tonne, and this being my second post to comp.lang.python. I
am always amazed at how helpful everyone on this group is

Hari

On Nov 21, 5:12*pm, John Machin <sjmac...@lexicon.netwrote:
On Nov 22, 8:46*am, harijay <hari...@gmail.comwrote:
Hi
I am a few months new into python. I have used regexps before in perl
and java but am a little confused with this problem.
I want to parse a number of strings and extract only those that
contain a 4 digit number anywhere inside a string
However the regexp
p = re.compile(r'\d{4}')
Matches even sentences that have longer than 4 numbers inside
strings ..for example it matches "I have 3324234 and more"

No it doesn't. When used with re.search on that string it matches
3324, it doesn't "match" the whole sentence.
I am very confused. Shouldnt the \d{4,} match exactly four digit
numbers so a 5 digit number sentence should not be matched .

{4} does NOT mean the same as {4,}.
{4} is the same as {4,4}
{4,} means {4,INFINITY}

Ignoring {4,}:

You need to specify a regex that says "4 digits followed by (non-digit
or end-of-string)". Have a try at that and come back here if you have
any more problems.

some test data:
xxx1234
xxx12345
xxx1234xxx
xxx12345xxx
xxx1234xxx1235xxx
xxx12345xxx1234xxx
Nov 21 '08 #8

This discussion thread is closed

Replies have been disabled for this discussion.