Parsing strings -> numbers

I've been looking all over in the docs, but I can't figure out how
you're *supposed* to parse formatted strings into numbers (and other
data types, for that matter) in Python.

In C#, you can say

int.Parse(myString)

and it will turn a string like "-12,345" into a proper int. It works
for all sorts of data types with all sorts of formats, and you can
pass it locale parameters to tell it, for example, to parse a German
"12.345,67" into 12345.67. Java does this, too
(Integer.parseInt(myStr), IIRC).

What's the equivalent in Python?

And if the only problem is comma thousand-separators (e.g.,
"12,345.67"), is there a higher-performance way to convert that into
the number 12345.67 than using Python's formal parsers?

Thanks.
Jul 18 '05 #1

tuanglen> I've been looking all over in the docs, but I can't figure out
tuanglen> how you're *supposed* to parse formatted strings into numbers
tuanglen> (and other data types, for that matter) in Python.

Check out the locale module. From "pydoc locale":

    Help on module locale:

    NAME
        locale - Locale support.

    FILE
        /Users/skip/local/lib/python2.4/locale.py

    MODULE DOCS
        http://www.python.org/doc/current/li...le-locale.html

    DESCRIPTION
        The module provides low-level access to the C lib's locale APIs
        and adds high level number formatting APIs as well as a locale
        aliasing engine to complement these.

        ...

    FUNCTIONS
        atof(str, func=<type 'float'>)
            Parses a string as a float according to the locale settings.

        atoi(str)
            Converts a string to an integer according to the locale settings.

        ...

Skip

Jul 18 '05 #2
Hello Tuang,

> In C#, you can say
>
> int.Parse(myString)
>
> and it will turn a string like "-12,345" into a proper int. It works
> for all sorts of data types with all sorts of formats, and you can
> pass it locale parameters to tell it, for example, to parse a German
> "12.345,67" into 12345.67. Java does this, too
> (Integer.parseInt(myStr), IIRC).
>
> What's the equivalent in Python?

Python has built-in "int", "long" and "float" functions. However,
they are more limited than what you want.

> And if the only problem is comma thousand-separators (e.g.,
> "12,345.67"), is there a higher-performance way to convert that into
> the number 12345.67 than using Python's formal parsers?

    f = float("12,345.67".replace(",", ""))

(Use float() here rather than int(): int() rejects a string that
contains a decimal point.)

HTH.
Miki
Jul 18 '05 #3
Skip Montanaro <sk**@pobox.com> wrote:

> tuanglen> I've been looking all over in the docs, but I can't figure out
> tuanglen> how you're *supposed* to parse formatted strings into numbers
> tuanglen> (and other data types, for that matter) in Python.
>
> Check out the locale module. From "pydoc locale":
>
>     Help on module locale:
>
>     NAME
>         locale - Locale support.
>
>     FILE
>         /Users/skip/local/lib/python2.4/locale.py
>
>     MODULE DOCS
>         http://www.python.org/doc/current/li...le-locale.html
>
>     DESCRIPTION
>         The module provides low-level access to the C lib's locale APIs
>         and adds high level number formatting APIs as well as a locale
>         aliasing engine to complement these.
>
>         ...
>
>     FUNCTIONS
>         atof(str, func=<type 'float'>)
>             Parses a string as a float according to the locale settings.
>
>         atoi(str)
>             Converts a string to an integer according to the locale settings.
>
>         ...

Thanks for taking a shot at it, but it doesn't appear to work:

    >>> import locale
    >>> locale.atoi("-12,345")
    Traceback (most recent call last):
      File "<interactive input>", line 1, in ?
      File "C:\Python2321\lib\locale.py", line 179, in atoi
        return atof(str, int)
      File "C:\Python2321\lib\locale.py", line 175, in atof
        return func(str)
    ValueError: invalid literal for int(): -12,345
    >>> locale.getdefaultlocale()
    ('en_US', 'cp1252')
    >>> locale.atoi("-12345")
    -12345

Given the locale it thinks I have, it should be able to parse
"-12,345" if it can handle formats containing thousands separators,
but apparently it can't.

If Python doesn't actually have its own parsing of formatted numbers,
what's the preferred Python approach for taking data, perhaps
formatted currencies such as "-$12,345.00" scraped off a Web page, and
turning it into numerical data?

Thanks.
Jul 18 '05 #4
tu******@hotmail.com (Tuang) wrote:

>     >>> locale.getdefaultlocale()
>     ('en_US', 'cp1252')
>     >>> locale.atoi("-12345")
>     -12345
>
> Given the locale it thinks I have, it should be able to parse
> "-12,345" if it can handle formats containing thousands separators,
> but apparently it can't.
>
> If Python doesn't actually have its own parsing of formatted numbers,
> what's the preferred Python approach for taking data, perhaps
> formatted currencies such as "-$12,345.00" scraped off a Web page, and
> turning it into numerical data?


The problem is that by default the numeric locale is not set up to parse
those numbers. You have to set that up separately:

    >>> import locale
    >>> locale.getlocale(locale.LC_NUMERIC)
    (None, None)
    >>> locale.getlocale()
    ['English_United Kingdom', '1252']
    >>> locale.setlocale(locale.LC_NUMERIC, "English")
    'English_United States.1252'
    >>> locale.atof('1,234')
    1234.0
    >>> locale.setlocale(locale.LC_NUMERIC, "French")
    'French_France.1252'
    >>> locale.atof('1,234')
    1.234

Unless I've missed something, it doesn't support ignoring currency symbols
when parsing numbers, so you still can't handle "-$12,345.00" even if you
do set the numeric and monetary locales.
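
For anyone following along, here is a minimal sketch of the same idea wrapped in a helper that restores the previous setting afterwards. The name parse_float_in_locale is hypothetical (it is not part of the locale module), locale names such as "English" or "French" are platform-specific, and setlocale() changes the setting for the whole process, so treat this as an illustration rather than a recommended API.

```python
import locale


def parse_float_in_locale(text, locale_name):
    """Parse text with locale.atof() under a temporarily set LC_NUMERIC.

    Hypothetical helper, not a standard-library function. Raises
    locale.Error if locale_name is not available on this system.
    """
    # Calling setlocale() with only a category queries the current setting.
    previous = locale.setlocale(locale.LC_NUMERIC)
    try:
        locale.setlocale(locale.LC_NUMERIC, locale_name)
        return locale.atof(text)
    finally:
        # setlocale() is process-wide, so always put the old value back.
        locale.setlocale(locale.LC_NUMERIC, previous)


# The "C" locale is always available; it defines no thousands separator.
print(parse_float_in_locale("1234.5", "C"))  # 1234.5
```

With a locale that defines a thousands separator (for example "English" on Windows, as in the session above), the same call would accept '1,234'; whether a given name is installed depends on the machine.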

--
Duncan Booth du****@rcp.co.uk
int month(char *p){return(124864/((p[0]+p[1]-p[2]&0x1f)+1)%12)["\5\x8\3"
"\6\7\xb\1\x9\xa\2\0\4"];} // Who said my code was obscure?
Jul 18 '05 #5
tuang> Thanks for taking a shot at it, but it doesn't appear to work:
tuang>
tuang>     >>> import locale
tuang>     >>> locale.atoi("-12,345")
tuang>     Traceback (most recent call last):
tuang>       File "<interactive input>", line 1, in ?
tuang>       File "C:\Python2321\lib\locale.py", line 179, in atoi
tuang>         return atof(str, int)
tuang>       File "C:\Python2321\lib\locale.py", line 175, in atof
tuang>         return func(str)
tuang>     ValueError: invalid literal for int(): -12,345
tuang>     >>> locale.getdefaultlocale()
tuang>     ('en_US', 'cp1252')
tuang>     >>> locale.atoi("-12345")
tuang>     -12345

Take a look at the output of locale.localeconv() with various locales set.
I think you'll find that locale.localeconv()['thousands_sep'] is '', not ','.
Failing that, you might want to simply replace the commas and dollar signs
with empty strings before passing to int() or float(), as someone else
suggested.
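
A rough sketch of both suggestions, for illustration only. It pins LC_NUMERIC to the portable "C" locale so the localeconv() result is predictable, then falls back to plain string replacement. The helper name parse_plain_us_amount is made up for this example, and it assumes US-style input where ',' is only ever a thousands separator.

```python
import locale

# Pin the numeric locale to "C" so the inspection below is reproducible.
locale.setlocale(locale.LC_NUMERIC, "C")
conv = locale.localeconv()
print(repr(conv["thousands_sep"]))  # '' in the "C" locale, not ','


def parse_plain_us_amount(text):
    """Strip '$' and ',' and hand the rest to float().

    Hypothetical helper; assumes US-style input such as "-$12,345.00".
    """
    cleaned = text.replace("$", "").replace(",", "")
    return float(cleaned)


print(parse_plain_us_amount("-$12,345.00"))  # -12345.0
```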

Be careful if you're scraping web pages which might not use the same charset
as you do. You may find something like:

$123.456,78

as a quote price on a European website. I don't know how to tell what the
remote site used as its locale when formatting numeric data. Perhaps
knowing the charset of the page is sufficient to make an educated guess.
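
If you already know which convention a page uses, one option is to pass the separators explicitly instead of relying on the process locale. The sketch below is only an illustration: parse_amount and its parameter names are hypothetical, and it returns a Decimal on the assumption that the values are currency amounts.

```python
from decimal import Decimal


def parse_amount(text, thousands_sep=",", decimal_sep=".", currency="$"):
    """Parse a formatted amount using separators supplied by the caller.

    Hypothetical helper: the caller states the convention, nothing is
    guessed from the text or from the locale.
    """
    cleaned = text.strip().replace(currency, "")
    # Drop grouping characters first, then normalise the decimal mark.
    cleaned = cleaned.replace(thousands_sep, "")
    cleaned = cleaned.replace(decimal_sep, ".")
    return Decimal(cleaned)


# European-style grouping, as in the quote price above.
print(parse_amount("$123.456,78", thousands_sep=".", decimal_sep=","))  # 123456.78
# US-style grouping is the default.
print(parse_amount("-$12,345.00"))  # -12345.00
```

The order of the two replacements matters: removing the grouping character before converting the decimal mark keeps "123.456,78" from turning into "123.456.78".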

Skip

Jul 18 '05 #6
Skip Montanaro <sk**@pobox.com> wrote:

> Be careful if you're scraping web pages which might not use the same charset
> as you do. You may find something like:
>
> $123.456,78
>
> as a quote price on a European website. I don't know how to tell what the
> remote site used as its locale when formatting numeric data. Perhaps
> knowing the charset of the page is sufficient to make an educated guess.


Thanks, Skip. I'm not planning some sort of shady screen scraping
operation or anything of that sort. This is more of a generic question
about how to use Python as a convenient utility language.

Sometimes I'll find a table of interesting data somewhere as I'm just
surfing around the Web, and I'll want to grab the data and play with
it a bit. At that scale of operation, I can just look at the page
source and figure out the encoding, what the currency is, etc. I know
how to turn a formatted string into a usable number in other languages
that I use (though I might have to check the docs in some cases to
remind myself of the details), and since the docs didn't really make
it obvious what the "one clear and obvious way to do it" was in
Python, I thought I'd ask.

It appears as though Python doesn't (yet) have the same formal support
for format parsing and internationalization that languages like C# and
Java have, but that's okay for now. I just wanted to make sure I
didn't start creating my own naive, homemade equivalents of functions
that are already part of the standard API.
Jul 18 '05 #7
