I want to pick all integers and decimal numbers out of a string.
Would this be the most correct regular expression to use?
"\d+\.?\d*"
On 23 Sep 2004 17:51:17 -0700, gary <ga*********@gmail.com> wrote:
> I want to pick all integers and decimal numbers out of a string.
> Would this be the most correct regular expression to use?
> "\d+\.?\d*"
That will work for numbers such as 0123 12.345 12. 0.5 -- but it
won't work for the following:
0x12AB .5 10e-3 -15 123L
If you want to handle some of those, then you'll need a more complicated regex.
If you want to accept numbers of the form .5 but don't care about 12.
then a better regex would be
\d*\.?\d+
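As a rough check of the two patterns above, the sketch below runs both against the sample tokens from this reply. It is written for Python 3 and uses `re.fullmatch` so that each token has to match from start to end; the names `ORIGINAL`, `LEADING_DOT`, `SAMPLES` and `accepted` are made up for the illustration.

```python
import re

ORIGINAL = re.compile(r"\d+\.?\d*")      # gary's pattern
LEADING_DOT = re.compile(r"\d*\.?\d+")   # the alternative that accepts .5

# The tokens listed in the reply above.
SAMPLES = ["0123", "12.345", "12.", "0.5", "0x12AB", ".5", "10e-3", "-15", "123L"]

def accepted(pattern, tokens):
    """Return the tokens that the pattern matches from start to end."""
    return [t for t in tokens if pattern.fullmatch(t)]

print(accepted(ORIGINAL, SAMPLES))     # ['0123', '12.345', '12.', '0.5']
print(accepted(LEADING_DOT, SAMPLES))  # ['0123', '12.345', '0.5', '.5']
```

Note that `re.findall` on a whole string behaves differently from this whole-token test: it will happily return partial matches, for example the `0` and `12` inside `0x12AB`.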
Andrew Durdin wrote:
> That will work for numbers such as 0123 12.345 12. 0.5 -- but it
> won't work for the following:
> 0x12AB .5 10e-3 -15 123L
This will handle the normal floats including a leading + or -
and trailing exponent, all optional.
r"[+-]?((\d+(\.\d*)?)|\.\d+)([eE][+-]?[0-9]+)?"
Andrew da***@dalkescientific.com
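A small Python 3 sketch of how that pattern might be applied. Because the pattern contains capturing groups, `re.findall` would return tuples of groups rather than the matched text, so this version uses `finditer` and `group(0)`. The helper name `find_numbers` is an assumption for the example.

```python
import re

# The float pattern from the post above, unchanged.
FLOAT_RE = re.compile(r"[+-]?((\d+(\.\d*)?)|\.\d+)([eE][+-]?[0-9]+)?")

def find_numbers(text):
    """Return every substring of text that the float pattern matches."""
    return [m.group(0) for m in FLOAT_RE.finditer(text)]

print(find_numbers("0123 12.345 12. 0.5 .5 10e-3 -15"))
# ['0123', '12.345', '12.', '0.5', '.5', '10e-3', '-15']

# Caveat: a hyphen used as a separator is read as a minus sign.
print(find_numbers("4-11-36%"))
# ['4', '-11', '-36']
```

The second call shows why the optional sign needs care when the input uses hyphens between values.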
gary wrote:
> I want to pick all integers and decimal numbers out of a string.
> Would this be the most correct regular expression to use?
> "\d+\.?\d*"
Examples, including the most extreme cases you want to handle,
are always a good idea.
-Peter
Peter Hansen <pe***@engcorp.com> wrote in message news:<pb********************@powergate.ca>...
> gary wrote:
>> I want to pick all integers and decimal numbers out of a string.
>> Would this be the most correct regular expression to use?
>> "\d+\.?\d*"
> Examples, including the most extreme cases you want to handle, are always a good idea.
> -Peter
Here is an example of what I will be dealing with:
"""
TOTAL FIRST DOWNS 19 21
By Rushing 11 6
By Passing 6 10
By Penalty 2 5
THIRD DOWN EFFICIENCY 4-11-36% 6-14-43%
FOURTH DOWN EFFICIENCY 0-1-0% 0-0-0%
TOTAL NET YARDS 379 271
Total Offensive Plays (inc. times thrown passing) 58 63
Average gain per offensive play 6.5 4.3
NET YARDS RUSHING 264 115
"""
I can only hope that they were nice and put a leading zero in front of
numbers less than 1.
On 25 Sep 2004 13:13:22 -0700, ga*********@gmail.com (gary) wrote:
> Here is an example of what I will be dealing with:
> [...]
> I can only hope that they were nice and put a leading zero in front of numbers less than 1.
Are you sure you want to throw away all the info implicit in the structure of that data?
How about the columns? Will you get other input with more columns? Otherwise, if your
numeric fields are as they appear, maybe just
>>> def extract(s):
...     for a in s.split():
...         if not a[0].isdigit(): continue
...         if a.endswith('%'):
...             for i in map(int,a[:-1].split('-')): yield i
...         elif '.' in a: yield float(a)
...         else: yield int(a)
...
>>> s = (
... """
... TOTAL FIRST DOWNS 19 21
... By Rushing 11 6
... By Passing 6 10
... By Penalty 2 5
... THIRD DOWN EFFICIENCY 4-11-36% 6-14-43%
... FOURTH DOWN EFFICIENCY 0-1-0% 0-0-0%
... TOTAL NET YARDS 379 271
... Total Offensive Plays (inc. times thrown passing) 58 63
... Average gain per offensive play 6.5 4.3
... NET YARDS RUSHING 264 115
... """
... )
>>> for num in extract(s): print num,
...
19 21 11 6 6 10 2 5 4 11 36 6 14 43 0 1 0 0 0 0 379 271 58 63 6.5 4.3 264 115
But I doubt that's what you really want ;-)
Regards,
Bengt Richter
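One way to keep the structure that the question above worries about is to parse each line into a label plus its trailing values instead of flattening everything into one stream. The following is only a sketch in Python 3 under some assumptions: values always sit at the end of the line, a value token starts with a digit (optionally after a minus sign), and hyphen-separated groups such as 4-11-36% should become tuples. The names `convert`, `looks_numeric` and `parse_row` are invented for the example, and malformed tokens (say, `2nd`) would raise `ValueError`.

```python
def convert(token):
    """Turn one numeric token into an int, a float, or a tuple of ints."""
    body = token.rstrip("%")
    if "-" in body[1:]:
        # Hyphen-separated integers such as "4-11-36" or "1-14".
        return tuple(int(part) for part in body.split("-"))
    return float(body) if "." in body else int(body)

def looks_numeric(token):
    """True for tokens that start with a digit, optionally after a minus sign."""
    return token.lstrip("-")[:1].isdigit()

def parse_row(line):
    """Split one stat line into (label, values), reading values from the right."""
    tokens = line.split()
    values = []
    while tokens and looks_numeric(tokens[-1]):
        values.append(convert(tokens.pop()))
    values.reverse()  # values were collected right to left
    return " ".join(tokens), values

print(parse_row("THIRD DOWN EFFICIENCY 4-11-36% 6-14-43%"))
# ('THIRD DOWN EFFICIENCY', [(4, 11, 36), (6, 14, 43)])
print(parse_row("Average gain per offensive play 6.5 4.3"))
# ('Average gain per offensive play', [6.5, 4.3])
```

Keeping the label next to its values makes it possible to tell which number belongs to which statistic and which team, which a flat list of numbers cannot do.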
gary wrote:
> Peter Hansen <pe***@engcorp.com> wrote in message news:<pb********************@powergate.ca>...
>> Examples, including the most extreme cases you want to handle, are always a good idea.
> Here is an example of what I will be dealing with:
> [...]
> I can only hope that they were nice and put a leading zero in front of numbers less than 1.
Good example of the input. Now all you need to do is tell
us exactly what kind of output you would expect to come
from the routine which you seek. ;-)
-Peter
bo**@oz.net (Bengt Richter) wrote in message news:<cj*************************@theriver.com>...
> Are you sure you want to throw away all the info implicit in the structure of that data? How about the columns? Will you get other input with more columns?
There are several other instances in the files that I am extracting
data from where the numbers are not so nicely arranged in columns, so
I am really looking for something that could be used in all instances.
(http://www.nfl.com/gamecenter/gamebo...020929_TEN@OAK)
I do however still need to convert everything from string to numbers.
I was thinking about using the following for that unless someone has a
better solution:
>>> def StrToNum(str):
...     try: return int(str)
...     except ValueError:
...         try: return float(str)
...         except ValueError: return str
...
>>> statlist = ['10', '6', '2002', 'tampa bay buccaneers', 'atlanta falcons',
... 'the georgia dome', '1', '03', 'pm', 'est', 'artificial',
... '0', '3', '7', '10', '0', '20', '3', '0', '3', '0', '0', '6', '15',
... '14', '5', '2', '9', '10', '1', '2', '4', '13', '31', '3', '14', '21',
... '1', '1', '100', '0', '1', '0', '327', '243', '59', '64', '5.5',
... '3.8', '74', '70', '26', '22', '2.8', '3.2', '2', '3', '2', '3',
... '253', '173', '2', '8', '4', '14', '261', '187', '31', '17', '1',
... '38', '17', '4', '7.7', '4.1', '5', '3', '0', '3', '2', '2', '5',
... '43.2', '5', '45.6', '0', '0', '0', '0', '0', '0', '31.2', '41.6',
... '50', '40', '0', '0', '3', '40', '0', '0', '5', '120', '4', '50', '1',
... '0', '6', '35', '6', '41', '1', '1', '0', '0', '2', '0', '0', '0',
... '1', '0', '1', '0', '2', '2', '0', '0', '2', '2', '0', '0', '2', '2',
... '2', '3', '0', '2', '0', '0', '2', '0', '0', '1', '0', '0', '0', '0',
... '0', '0', '20', '6', '29', '34', '30', '26', '3', '37', '9', '59',
... '9', '35', '6', '23', 0, 0, '11', '23', '5', '01', '5', '25', '8',
... '37', 0, 0, '26']
>>> [StrToNum(item) for item in statlist]
[10, 6, 2002, 'tampa bay buccaneers', 'atlanta falcons',
'the georgia dome', 1, 3, 'pm', 'est', 'artificial', 0, 3, 7, 10, 0, 20, 3, 0, 3,
0, 0, 6, 15, 14, 5, 2, 9, 10, 1, 2, 4, 13, 31, 3, 14, 21, 1, 1, 100,
0, 1, 0, 327, 243, 59, 64, 5.5, 3.7999999999999998, 74, 70, 26, 22,
2.7999999999999998, 3.2000000000000002, 2, 3, 2, 3, 253, 173, 2, 8, 4,
14, 261, 187, 31, 17, 1, 38, 17, 4, 7.7000000000000002,
4.0999999999999996, 5, 3, 0, 3, 2, 2, 5, 43.200000000000003, 5,
45.600000000000001, 0, 0, 0, 0, 0, 0, 31.199999999999999,
41.600000000000001, 50, 40, 0, 0, 3, 40, 0, 0, 5, 120, 4, 50, 1, 0, 6,
35, 6, 41, 1, 1, 0, 0, 2, 0, 0, 0, 1, 0, 1, 0, 2, 2, 0, 0, 2, 2, 0, 0,
2, 2, 2, 3, 0, 2, 0, 0, 2, 0, 0, 1, 0, 0, 0, 0, 0, 0, 20, 6, 29, 34,
30, 26, 3, 37, 9, 59, 9, 35, 6, 23, 0, 0, 11, 23, 5, 1, 5, 25, 8, 37,
0, 0, 26]
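For comparison, here is a sketch of the same int-then-float fallback written for Python 3. The name `str_to_num` is an assumption for the example, and the parameter is called `text` so that it does not shadow the built-in `str`. Catching `TypeError` as well is an addition, so that values such as `None` pass through unchanged.

```python
def str_to_num(text):
    """Return text as an int if possible, else as a float, else unchanged."""
    for convert in (int, float):
        try:
            return convert(text)
        except (TypeError, ValueError):
            continue  # try the next converter
    return text

print([str_to_num(item) for item in ['10', '03', '3.8', 'pm', 0]])
# [10, 3, 3.8, 'pm', 0]
```

Two things to keep in mind. Current Python versions print floats with the shortest representation that round-trips, so 3.8 is displayed as 3.8 rather than 3.7999999999999998; the stored value is the same. Also, `float()` accepts strings such as "nan", "inf" and "1e5", so a text field with one of those values would be converted to a float.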
Another thing was that I found a negative number, which kind of screws up
the regexes previously discussed. So I came up with the workaround
below:
>>> import re
>>> str = """
... FGs - PATs Had Blocked 0-0 0-0
... Net Punting Average -6.3 33.3
... TOTAL RETURN YARDAGE (Not Including Kickoffs) 14 257
... No. and Yards Punt Returns 1-14 2-157
... """
>>> # replace number followed by dash with number followed by space
>>> str = re.sub(r"(\d+)-",r"\1 ",str)
>>> # regex discussed before but with an optional negative sign in front
>>> teamstats = re.findall(r"-?\d+\.?\d*",str)
>>> teamstats
['0', '0', '0', '0', '-6.3', '33.3', '14', '257', '1', '14', '2', '157']
>>> [StrToNum(item) for item in teamstats]
[0, 0, 0, 0, -6.2999999999999998, 33.299999999999997, 14, 257, 1, 14, 2, 157]
Gary
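A possible single-pass variant of that workaround, sketched in Python 3: instead of first replacing digit-dash pairs with `re.sub`, a negative lookbehind lets the pattern accept a `-` as a sign only when no digit comes directly before it. The pattern and the names `SIGNED_NUMBER` and `find_stats` are assumptions for the illustration, not something proposed in the thread.

```python
import re

# A "-" counts as a minus sign only if the previous character is not a digit.
SIGNED_NUMBER = re.compile(r"(?<!\d)-?\d+\.?\d*")

def find_stats(text):
    """Return all number strings in text, treating digit-dash-digit as a separator."""
    return SIGNED_NUMBER.findall(text)

sample = """
FGs - PATs Had Blocked 0-0 0-0
Net Punting Average -6.3 33.3
TOTAL RETURN YARDAGE (Not Including Kickoffs) 14 257
No. and Yards Punt Returns 1-14 2-157
"""

print(find_stats(sample))
# ['0', '0', '0', '0', '-6.3', '33.3', '14', '257', '1', '14', '2', '157']
```

On this sample it gives the same list as the two-step version. Like the original pattern, it will also swallow a trailing period after a number (for example `14.` at the end of a sentence), so it is only suitable when the input does not contain such text.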
Peter Hansen <pe***@engcorp.com> wrote in message news:<jf********************@powergate.ca>...
> Good example of the input. Now all you need to do is tell us exactly what kind of output you would expect to come from the routine which you seek. ;-)
> -Peter
Well for that particular example something of the form...
Cleveland at Cincinnati +8
would be nice ;-)
gary wrote:
> Peter Hansen <pe***@engcorp.com> wrote in message news:<jf********************@powergate.ca>...
>> Good example of the input. Now all you need to do is tell us exactly what kind of output you would expect to come from the routine which you seek. ;-)
> Well for that particular example something of the form...
> Cleveland at Cincinnati +8
> would be nice ;-)
I know nothing about American football except that it
isn't played with a puck, so I don't think I get the joke...
-Peter