Matching exactly a 4-digit number in Python

Hi,
I have been using Python for a few months. I have used regexps before in Perl
and Java, but I am a little confused by this problem.

I want to parse a number of strings and extract only those that
contain a 4-digit number anywhere inside a string.

However, the regexp
p = re.compile(r'\d{4}')

matches even sentences that contain numbers longer than 4 digits;
for example, it matches "I have 3324234 and more".

I am very confused. Shouldn't \d{4,} match exactly four-digit
numbers, so that a sentence with a 5-digit number is not matched?

Here is my test program's output, with the program itself below.
Thanks for your help,
Harijay

PyMate r8111 running Python 2.5.1 (/usr/bin/python)
>>testdigit.py
Matched I have 2004 rupees
Matched I have 3324234 and more
Matched As 3233
Matched 2323423414 is good
Matched 4444 dc sav 2412441 asdf
SKIPPED random1341also and also
SKIPPED
SKIPPED 13
Matched a 1331 saves
SKIPPED and and as dad
SKIPPED A has 13123123
SKIPPED A 13123
SKIPPED 123 adn
Matched 1312 times I have told you
DONE

#!/usr/bin/python
import re
x = [" I have 2004 rupees ", " I have 3324234 and more", " As 3233 ",
     "2323423414 is good", "4444 dc sav 2412441 asdf ",
     "random1341also and also", "", "13", " a 1331 saves",
     " and and as dad", " A has 13123123", "A 13123", "123 adn",
     "1312 times I have told you"]

p = re.compile(r'\d{4} ')

# Report whether each sample string contains a match for p.
for elem in x:
    if p.search(elem):
        print("Matched " + elem)
    else:
        print("SKIPPED " + elem)

print("DONE")
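Note that the program compiles r'\d{4} ' (with a trailing space), which is not quite the r'\d{4}' from the question. The Python 3 sketch below (assuming Python 3 rather than the 2.5.1 interpreter used above) shows what re.search actually finds in one of the sample strings with each pattern. The trailing space is why "random1341also and also" is skipped, while "I have 3324234 and more" still matches on "4234 ".

```python
import re

sentence = " I have 3324234 and more"

# Pattern from the question: any run of 4 digits, no context required.
# re.search scans left to right and returns the first place it fits.
m = re.search(r'\d{4}', sentence)
print(repr(m.group()))  # '3324'

# Pattern from the test program: 4 digits followed by a literal space.
m = re.search(r'\d{4} ', sentence)
print(repr(m.group()))  # '4234 '

# "1341" is followed by "a", not by a space, so this sample is skipped.
print(re.search(r'\d{4} ', "random1341also and also"))  # None
```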
Nov 21 '08 #1
2008/11/21 harijay <ha*****@gmail.com>:
> Hi,
> I have been using Python for a few months. I have used regexps before in Perl
> and Java, but I am a little confused by this problem.
>
> I want to parse a number of strings and extract only those that
> contain a 4-digit number anywhere inside a string.
>
> However, the regexp
> p = re.compile(r'\d{4}')
>
> matches even sentences that contain numbers longer than 4 digits;
> for example, it matches "I have 3324234 and more".

Try with this:

p = re.compile(r'\d{4}$')

The $ character matches the end of the string. It should work.
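A quick Python 3 check of this suggestion against three of the original sample strings (a sketch, assuming Python 3). Because $ anchors the end of the string, the pattern only succeeds when four digits sit at the very end (or just before a final newline). It still matches a longer number that ends the string, and it misses a 4-digit number followed by more text.

```python
import re

end_anchored = re.compile(r'\d{4}$')

samples = [
    " I have 2004 rupees ",        # 4-digit number, but followed by a space
    " A has 13123123",             # 8-digit number at the very end
    "1312 times I have told you",  # 4-digit number at the start
]

for s in samples:
    m = end_anchored.search(s)
    # Show which digits (if any) the end-anchored pattern picked up.
    print(repr(s), '->', m.group() if m else None)

# Prints:
# ' I have 2004 rupees ' -> None
# ' A has 13123123' -> 3123
# '1312 times I have told you' -> None
```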
Nov 21 '08 #2

"harijay" <ha*****@gmail.com> wrote in message
news:74**********************************@j38g2000yqa.googlegroups.com...
> I want to parse a number of strings and extract only those that
> contain a 4-digit number anywhere inside a string.

Try:
p = re.compile(r'\b\d{4}\b')
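A small Python 3 sketch of how this pattern behaves on a few of the original sample strings (assuming Python 3). \b is a boundary between a word character (letter, digit or underscore) and a non-word character or the edge of the string, so adjacent digits never form a boundary and longer numbers are rejected. The same rule means digits glued to letters, as in "random1341also", are rejected too.

```python
import re

word_bounded = re.compile(r'\b\d{4}\b')

# Standalone 4-digit number: boundaries on both sides.
print(bool(word_bounded.search("1312 times I have told you")))  # True
# Longer number: there is no boundary between adjacent digits.
print(bool(word_bounded.search(" I have 3324234 and more")))     # False
# Letters are word characters as well, so "m1341a" has no
# boundaries around the digits and this string is not matched.
print(bool(word_bounded.search("random1341also and also")))      # False
```

Whether that last case should count as "a 4-digit number inside the string" depends on the requirement.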

-Mark

Nov 21 '08 #3
On Nov 22, 8:46 am, harijay <hari...@gmail.com> wrote:
> Hi,
> I have been using Python for a few months. I have used regexps before in Perl
> and Java, but I am a little confused by this problem.
>
> I want to parse a number of strings and extract only those that
> contain a 4-digit number anywhere inside a string.
>
> However, the regexp
> p = re.compile(r'\d{4}')
>
> matches even sentences that contain numbers longer than 4 digits;
> for example, it matches "I have 3324234 and more".

No, it doesn't. When used with re.search on that string, it matches
3324; it doesn't "match" the whole sentence.

> I am very confused. Shouldn't \d{4,} match exactly four-digit
> numbers, so that a sentence with a 5-digit number is not matched?

{4} does NOT mean the same as {4,}.
{4} is the same as {4,4}
{4,} means {4,INFINITY}
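The difference is easy to see in Python 3 with re.fullmatch (available since Python 3.4), which only succeeds if the pattern covers the whole string. A minimal sketch:

```python
import re

# {4} is shorthand for {4,4}: exactly four repetitions.
print(re.fullmatch(r'\d{4}', '1234') is not None)    # True
print(re.fullmatch(r'\d{4}', '12345') is not None)   # False

# {4,} means "four or more", so it happily consumes longer numbers.
print(re.fullmatch(r'\d{4,}', '12345') is not None)  # True
print(re.search(r'\d{4,}', 'I have 3324234 and more').group())  # 3324234
```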

Ignoring {4,}:

You need to specify a regex that says "4 digits followed by (non-digit
or end-of-string)". Have a try at that and come back here if you have
any more problems.

some test data:
xxx1234
xxx12345
xxx1234xxx
xxx12345xxx
xxx1234xxx1235xxx
xxx12345xxx1234xxx
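One way to try the exercise, sketched in Python 3 against the test data above. The first pattern encodes only "4 digits followed by (non-digit or end-of-string)" as a negative lookahead; the second also requires that the digits are not preceded by a digit, using a negative lookbehind.

```python
import re

test_data = [
    "xxx1234",
    "xxx12345",
    "xxx1234xxx",
    "xxx12345xxx",
    "xxx1234xxx1235xxx",
    "xxx12345xxx1234xxx",
]

# Only constrains what follows the 4 digits.
followed_only = re.compile(r'\d{4}(?!\d)')
# Constrains both sides: no digit before and no digit after.
both_sides = re.compile(r'(?<!\d)\d{4}(?!\d)')

for s in test_data:
    print(s, bool(followed_only.search(s)), bool(both_sides.search(s)))
```

With only the lookahead, every line matches: the 5-digit cases still succeed on their last four digits ("2345"). With both assertions, xxx12345 and xxx12345xxx are rejected and the other four match.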

Nov 21 '08 #4
On Nov 21, 4:46 pm, harijay <hari...@gmail.com> wrote:
> Hi,
> I have been using Python for a few months. I have used regexps before in Perl
> and Java, but I am a little confused by this problem.
>
> I want to parse a number of strings and extract only those that
> contain a 4-digit number anywhere inside a string.
>
> However, the regexp
> p = re.compile(r'\d{4}')
>
> matches even sentences that contain numbers longer than 4 digits;
> for example, it matches "I have 3324234 and more".
>
> I am very confused. Shouldn't \d{4,} match exactly four-digit
> numbers, so that a sentence with a 5-digit number is not matched?

No, why should it? What you're saying is "give me 4 consecutive
digits", without specifying what should precede or follow these
digits. A correct expression is a bit more hairy:

p = re.compile(r'''
    (?:\D|\b)   # find a non-digit or word boundary..
    (\d{4})     # .. followed by the 4 digits to be matched as group #1..
    (?:\D|\b)   # .. which are followed by non-digit or word boundary
    ''', re.VERBOSE)
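A Python 3 sketch exercising this pattern, flattened onto one line without re.VERBOSE (assuming Python 3). With search, group 1 holds just the four digits. One caveat: the \D alternatives consume the surrounding character, so with findall two numbers separated by a single letter compete for that separator, and the second one is missed.

```python
import re

# The same pattern, flattened onto one line.
p = re.compile(r'(?:\D|\b)(\d{4})(?:\D|\b)')

m = p.search(" I have 2004 rupees ")
print(repr(m.group(0)))  # ' 2004 '  (the match includes the spaces)
print(repr(m.group(1)))  # '2004'    (group 1 is just the digits)

print(p.search(" I have 3324234 and more"))  # None

# The "x" is consumed as the trailing \D of the first match, so the
# second number has nothing left to satisfy the leading (?:\D|\b).
print(p.findall("1234x5678"))  # ['1234']
```

Lookbehind and lookahead assertions such as (?<!\d) and (?!\d) do not consume characters, so they avoid this problem.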
HTH,
George
Nov 21 '08 #5
George Sakkis wrote:
> On Nov 21, 4:46 pm, harijay <hari...@gmail.com> wrote:
>> Hi,
>> I have been using Python for a few months. I have used regexps before in Perl
>> and Java, but I am a little confused by this problem.
>>
>> I want to parse a number of strings and extract only those that
>> contain a 4-digit number anywhere inside a string.
>>
>> However, the regexp
>> p = re.compile(r'\d{4}')
>>
>> matches even sentences that contain numbers longer than 4 digits;
>> for example, it matches "I have 3324234 and more".
>>
>> I am very confused. Shouldn't \d{4,} match exactly four-digit
>> numbers, so that a sentence with a 5-digit number is not matched?
>
> No, why should it? What you're saying is "give me 4 consecutive
> digits", without specifying what should precede or follow these
> digits. A correct expression is a bit more hairy:
>
> p = re.compile(r'''
>     (?:\D|\b)   # find a non-digit or word boundary..
>     (\d{4})     # .. followed by the 4 digits to be matched as group #1..
>     (?:\D|\b)   # .. which are followed by non-digit or word boundary
>     ''', re.VERBOSE)

You want to match a sequence of 4 digits: \d{4}
not preceded by a digit: (?<!\d)
not followed by a digit: (?!\d)

which is: re.compile(r'(?<!\d)\d{4}(?!\d)')
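Because (?<!\d) and (?!\d) are zero-width assertions, nothing around the digits is consumed, which makes this pattern convenient with findall as well as search. A Python 3 sketch; the helper name four_digit_numbers is hypothetical, just for illustration:

```python
import re

FOUR_DIGITS = re.compile(r'(?<!\d)\d{4}(?!\d)')

def four_digit_numbers(text):
    """Return every run of exactly four digits in text (hypothetical helper)."""
    return FOUR_DIGITS.findall(text)

print(four_digit_numbers("4444 dc sav 2412441 asdf "))  # ['4444']
print(four_digit_numbers("1234x5678"))                  # ['1234', '5678']
print(four_digit_numbers("xxx12345xxx1234xxx"))         # ['1234']
```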
Nov 21 '08 #6
> I have been using Python for a few months. I have used regexps before in Perl
> and Java, but I am a little confused by this problem.
>
> I want to parse a number of strings and extract only those that
> contain a 4-digit number anywhere inside a string.
>
> However, the regexp
> p = re.compile(r'\d{4}')
>
> matches even sentences that contain numbers longer than 4 digits;
> for example, it matches "I have 3324234 and more".

Try this instead:
>>> pat = re.compile(r"(?<!\d)(\d{4})(?!\d)")
>>> for s in x:
...     m = pat.search(s)
...     print repr(s),
...     print (m is not None) and "matches" or "does not match"
...
' I have 2004 rupees ' matches
' I have 3324234 and more' does not match
' As 3233 ' matches
'2323423414 is good' does not match
'4444 dc sav 2412441 asdf ' matches
'random1341also and also' matches
'' does not match
'13' does not match
' a 1331 saves' matches
' and and as dad' does not match
' A has 13123123' does not match
'A 13123' does not match
'123 adn' does not match
'1312 times I have told you' matches
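The session above is Python 2 (print statements and the "and ... or" idiom). A rough Python 3 equivalent, with the list x from the original test program repeated so the snippet runs on its own, would use a conditional expression instead. It prints the same lines as the session above.

```python
import re

# The sample strings from the original test program.
x = [" I have 2004 rupees ", " I have 3324234 and more", " As 3233 ",
     "2323423414 is good", "4444 dc sav 2412441 asdf ",
     "random1341also and also", "", "13", " a 1331 saves",
     " and and as dad", " A has 13123123", "A 13123", "123 adn",
     "1312 times I have told you"]

pat = re.compile(r"(?<!\d)(\d{4})(?!\d)")

for s in x:
    m = pat.search(s)
    # A conditional expression replaces the Python 2 "and ... or" idiom.
    print(repr(s), "matches" if m is not None else "does not match")
```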

--
Skip Montanaro - sk**@pobox.com - http://smontanaro.dyndns.org/
Nov 21 '08 #7
Thanks, John Machin and Mark Tolonen.
So I guess the correct approach is to use the word-boundary metacharacter
"\b", so r'\b\d{4}\b' is what I need, since it reads as "a 4-digit
number between word boundaries".

Thanks a ton. This is only my second post to comp.lang.python, and I
am always amazed at how helpful everyone on this group is.

Hari

On Nov 21, 5:12 pm, John Machin <sjmac...@lexicon.net> wrote:
> On Nov 22, 8:46 am, harijay <hari...@gmail.com> wrote:
>> Hi,
>> I have been using Python for a few months. I have used regexps before in Perl
>> and Java, but I am a little confused by this problem.
>>
>> I want to parse a number of strings and extract only those that
>> contain a 4-digit number anywhere inside a string.
>>
>> However, the regexp
>> p = re.compile(r'\d{4}')
>>
>> matches even sentences that contain numbers longer than 4 digits;
>> for example, it matches "I have 3324234 and more".
>
> No, it doesn't. When used with re.search on that string, it matches
> 3324; it doesn't "match" the whole sentence.
>
>> I am very confused. Shouldn't \d{4,} match exactly four-digit
>> numbers, so that a sentence with a 5-digit number is not matched?
>
> {4} does NOT mean the same as {4,}.
> {4} is the same as {4,4}
> {4,} means {4,INFINITY}
>
> Ignoring {4,}:
>
> You need to specify a regex that says "4 digits followed by (non-digit
> or end-of-string)". Have a try at that and come back here if you have
> any more problems.
>
> some test data:
> xxx1234
> xxx12345
> xxx1234xxx
> xxx12345xxx
> xxx1234xxx1235xxx
> xxx12345xxx1234xxx
Nov 21 '08 #8
