Phone number regular expression...

joemono

Hello everyone!

First, I apologize if this posting isn't proper "netiquette" for this
group.

I've been working with perl for almost 2 years now. However, my regular
expression knowledge is pretty limited. I wrote the following expression to
take (hopefully) any _reasonable_ phone number input, and format it as
(999) 999-9999 x 9999.

Here's what I've come up with. I would like your comments, if you've got the
time. I'm really interested in regular expressions, and I want to know if
what I'm doing is inefficient, slow, etc...

# area code
\({0,1}\s*(\d{3}){0,1}\s*\){0,1}        # optional parentheses, 3 digits, optional parentheses
(?=[-| ]*(\d{3}){1}[-| ]*(\d{4}){1})    # match only if the first match is followed by
                                        # what looks like a phone number
                                        # this is the same match as the standard 7 digit phone number below
# main phone number
[-| ]*
(\d{3}){1} # first 3 digits
[-| ]*
(\d{4}){0,1} # second 4 digits

# extension
[-| |x|X]*
(\d{3,4}){0,1} # extension

For example, here's a question I have. Is there a way to use the look-ahead
match in the area code section _again_ for matching the main number, since
they are the same? I also know that I could use ? instead of {0,1}
(correct?), but I always get confused between that and non-greedy
quantifier. Does that make sense?
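On both questions in the paragraph above, here is a minimal sketch rather than a definitive answer. One way to avoid writing the seven-digit pattern twice is to compile it once with qr// and interpolate it where needed. A single ? after an atom means the same as {0,1}; it only means "non-greedy" when it follows another quantifier (*?, +?, ??). The names $seven and $phone are illustrative and not from the post, and the separators are simplified to [- ].

```perl
use strict;
use warnings;

# Hypothetical names; separators simplified to hyphen or space for the sketch.
my $seven = qr/[- ]*\d{3}[- ]*\d{4}/;    # seven-digit core, compiled once

my $phone = qr/
    ^\s*
    \(? \s* (\d{3})? \s* \)?     # optional area code; ? is the same as {0,1}
    (?=$seven)                   # look-ahead reuses the compiled piece
    [- ]* (\d{3}) [- ]* (\d{4})  # main number, captured
/x;

for my $input ("(310) 123 4567", "123-4567") {
    if (my @parts = $input =~ $phone) {
        # the area code capture is undef for seven-digit input, so skip it
        print join(" ", grep { defined } @parts), "\n";
    }
}

# Greedy versus non-greedy "zero or one":
my ($greedy) = "aaa" =~ /(a?)/;     # "a": prefers one occurrence
my ($lazy)   = "aaa" =~ /(a??)/;    # "":  prefers zero occurrences
```

The loop prints 310 123 4567 and then 123 4567.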

I wrote a script to test it (it generates many different possible phone
number inputs, and then applies the regular expression), and it _seems_ to
work. But like I said, I kinda don't know what I'm doing. I've been using
http://www.perldoc.com/perl5.6/pod/perlre.html heavily. It's pretty useful.

Here's another question, do people ever have extensions less than 3, or
greater than 4 numbers?

Thanks for your help!

Joe

Jul 19 '05 #1

Purl Gurl

joemono wrote:

(snipped)

I wrote the following expression to take (hopefully) any _reasonable_
phone number input, and format it as
(999) 999-9999 x 9999.

Parameter is "reasonable" American style phone numbers.

what I'm doing is inefficient, slow, etc...
(snipped a lot of regex matching)

Yes, very slow, very inefficient. Do not invoke a
regex engine unless you have no choice, or a regex
actually "proves" to be the most efficient method
found within a collection of tested methods.

Is there a way to use the look-ahead match

Never use look-ahead unless you have no choice.
Using any style of look-ahead will almost always
be slow and inefficient compared to other methods.

Note my "almost always" does not mean "always" as some
might ignorantly claim. In some cases, a look-ahead
could be your only choice, or most efficient choice.

do people ever have extensions less than 3, or greater than 4 numbers?

Extensions cannot be predicted. Length of an extension is
directly controlled by an internal PBX system. An extension
length can literally be any length.

What is the length of those extensions you hear during a
recorded menu selection? Is there more than one extension?
These types of numbers could be a problem.

1-800-tru-idiots
if you are stupid, press 1 now
*next menu*
if you are stupid and gullible, press 2 now
*next menu*
if you are stupid, gullible and tired of this, press 3 now
*next menu*
Thank you for calling America Onlame! You are an idiot! Goodbye!
*dial tone*

I count three extensions each with a length of one.

Your methodology allows parentheses, hyphens and such, then
tries to match for all possible combinations. This is quite
inefficient and prone to error.

Remove all characters except numbers, then work with your data.
You are interested in phone numbers, are you not? So work with
numbers, nothing else.

Keep in mind, regardless of what methodology you employ, there
is a good chance there will be false positives and false negatives.
Parsing phone numbers is similar to parsing email addresses; it
is difficult and unpredictable.

Look over my method below. This method eliminates all characters
except numbers, then generates a very uniform output appropriate
for a data file. Output is also easy on the human eye.

Ever wonder why people use "spelled" phone numbers, like

1-800-bite-me

When someone tries to give me a spelled number, I say,

"Don't bother. I will not call you."

Purl Gurl
--
Rock Midis! Science Fiction! Amazing Androids!
http://www.purlgurl.net/~callgirl

My $test_it is used to exemplify a non-destructive
method, needed for a print of invalid numbers. You
could easily use $_ throughout as well, but this
defeats "full" printing of an invalid phone number.

#!perl

while (<DATA>)
{
    my $test_it = $_;           # work on a copy so $_ keeps the original line
    $test_it =~ s/[^\d+]//g;    # strip everything except digits (and literal +)

    if ($test_it =~ tr/0-9// == 7)          # local number
    {
        substr ($test_it, 3, 0, " ");
        print "$test_it\n";
    }
    elsif ($test_it =~ tr/0-9// == 10)      # area code + number
    {
        substr ($test_it, 3, 0, " ");
        substr ($test_it, 7, 0, " ");
        print "$test_it\n";
    }
    elsif ($test_it =~ tr/0-9// > 10)       # area code + number + extension
    {
        substr ($test_it, 3, 0, " ");
        substr ($test_it, 7, 0, " ");
        substr ($test_it, 12, 0, " ");
        print "$test_it\n";
    }
    else
    { print "Phone Number Appears Invalid: $_\n"; }
}
__DATA__
123-4567
123 4567
(310) 123 4567
310-123-4567
310-123-4567 ext 890
310 123 4567 890
123-4567FUBAR
310 123 FUBAR

PRINTED RESULTS:
________________

123 4567
123 4567
310 123 4567
310 123 4567
310 123 4567 890
310 123 4567 890
123 4567
Phone Number Appears Invalid: 310 123 FUBAR

Jul 19 '05 #2

Roy Johnson

I thought that you made a few odd (either esoteric or not Lazy enough)
implementation decisions.

Purl Gurl <pu******@purlgurl.net> wrote in message news:<3F***************@purlgurl.net>...

[...]You could easily use $_ throughout as well, but this
defeats "full" printing of an invalid phone number.

Instead of preserving $_ and working on $test_it, you could have saved
a copy and then worked on $_ itself.

You used s/[^\d+]//g instead of tr/0-9//dc to remove all non-digits.

You used tr/0-9// instead of length.
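
A small sketch of the two points above, using a made-up sample string. Inside a bracketed class, + is a literal plus sign, so s/[^\d+]//g keeps plus signs, whereas tr/0-9//cd deletes everything that is not a digit. Once only digits remain, length gives the same number as counting with tr/0-9//.

```perl
use strict;
use warnings;

my $input = "+1 (310) 123-4567\n";    # made-up sample, not from the thread

(my $by_subst = $input) =~ s/[^\d+]//g;    # keeps digits AND the literal '+'
(my $by_tr    = $input) =~ tr/0-9//cd;     # keeps digits only

print "$by_subst\n";    # +13101234567
print "$by_tr\n";       # 13101234567

# With only digits left, counting digits and taking the length agree.
my $count = ($by_tr =~ tr/0-9//);
print "same count\n" if $count == length $by_tr;
```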

The use of the 4-argument version of substr() was neat, but a
judicious pattern match instead of length-checking makes for tighter
code:

while (<DATA>) {
    my $save = $_;
    tr/0-9//dc;
    if (/(...)?(...)(....)/) {
        printf "%3s %s %s %s\n", $1, $2, $3, $';
    }
    else {
        print "Invalid phone number: $save\n";
    }
}

Now let's go back to the issue of stripping all non-numerics. If you
do that, you can't distinguish 123-4567 x890 from (123) 456 7890.
Granted, when you dial, the phone doesn't know the difference, but
there may be some difference in how the person doing the dialing has
to behave.
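
A quick sketch of the ambiguity described above, using the two example inputs from this post: once the non-digits are deleted, both collapse to the same ten-digit string. The variable names are illustrative only.

```perl
use strict;
use warnings;

my @inputs = ("123-4567 x890", "(123) 456 7890");

my @stripped;
for my $input (@inputs) {
    (my $digits = $input) =~ tr/0-9//cd;    # delete every non-digit
    push @stripped, $digits;
}

print "$_\n" for @stripped;    # both lines print 1234567890
print "indistinguishable\n" if $stripped[0] eq $stripped[1];
```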

If, instead of stripping the non-digits, you just look for groups of
digits (optional 3, then mandatory 3 and 4, then optional however
many) amongst the non-digits, you can address that:

#!perl
while (<DATA>) {
    my $save = $_;
    if (/^\D*(?:(\d{3})\D+)?(\d{3})\D+(\d{4})(?:\D+(\d+))?/) {
        printf "%3s %s %s %s\n", $1, $2, $3, $4;
    }
    else {
        print "Invalid phone number: $save\n";
    }
}

__DATA__
123-4567
123 4567
123 4567 x890 <-- note
(310) 123 4567
310-123-4567
310-123-4567 ext 890
310 123 4567 890
123-4567FUBAR
310 123 FUBAR

Output is:
123 4567
123 4567
123 4567 890
310 123 4567
310 123 4567
310 123 4567 890
310 123 4567 890
123 4567
Invalid phone number: 310 123 FUBAR
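
The original question asked for output shaped like (999) 999-9999 x 9999. As a sketch only, the digit-group pattern from this post can feed such a formatter. The sub name format_phone is hypothetical; it returns undef for input the pattern rejects, and, like the pattern itself, it expects at least one non-digit between the groups.

```perl
use strict;
use warnings;

# Hypothetical helper built on the digit-group pattern from this post.
sub format_phone {
    my ($raw) = @_;
    my ($area, $prefix, $line, $ext) =
        $raw =~ /^\D*(?:(\d{3})\D+)?(\d{3})\D+(\d{4})(?:\D+(\d+))?/
        or return;    # no match: undef in scalar context

    my $out = defined $area ? "($area) " : "";    # area code is optional
    $out .= "$prefix-$line";
    $out .= " x $ext" if defined $ext;            # extension is optional
    return $out;
}

print format_phone("310-123-4567 ext 890"), "\n";    # (310) 123-4567 x 890
print format_phone("123 4567"), "\n";                # 123-4567
```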

Jul 19 '05 #3

Gunnar Hjalmarsson

joemono wrote:

I wrote the following expression to take (hopefully) any
_reasonable_ phone number input, and format it as (999) 999-9999 x
9999.

Hi Joe,

I don't know the likelihood in your case that people outside the US
are asked to enter their phone numbers. The reason why I mention it is
that I have tried to enter my non-US number at quite a few US based
web sites, resulting in error messages...

So, out from that experience, I'd say that a strict phone number
checking is sometimes a really bad idea. ;-)

Gunnar
(Sweden)

--
Gunnar Hjalmarsson
Email: http://www.gunnar.cc/cgi-bin/contact.pl

Jul 19 '05 #4

Purl Gurl

Roy Johnson wrote:

I thought that you made a few odd (either esoteric or not Lazy enough)
implementation decisions.

I have no interest in reading Code Cop Crap.

It is annoying to open an article only to discover
this type of troll mule manure you write.

Respond to the originating author as you should.

You are wasting your time and the time of readers.

Purl Gurl

Jul 19 '05 #5

Roy Johnson

Purl Gurl <pu******@purlgurl.net> wrote in message news:<3F***************@purlgurl.net>...

I have no interest in reading Code Cop Crap.

Interesting. I have no interest in your critiques of my posts that
have nothing to do with Perl.

It's not "trolling" to point out that you're doing bizarre things when
straightforward methods are available. My code was much more clear
than yours, as well as being shorter.

delete $shoulder->{'chip'}

Jul 19 '05 #6
