Parsing strings -> numbers

I've been looking all over in the docs, but I can't figure out how
you're *supposed* to parse formatted strings into numbers (and other
data types, for that matter) in Python.

In C#, you can say

int.Parse(myString)

and it will turn a string like "-12,345" into a proper int. It works
for all sorts of data types with all sorts of formats, and you can
pass it locale parameters to tell it, for example, to parse a German
"12.345,67" into 12345.67. Java does this, too.
(Integer.parseInt(myStr), IIRC).

What's the equivalent in Python?

And if the only problem is comma thousand-separators (e.g.,
"12,345.67"), is there a higher-performance way to convert that into
the number 12345.67 than using Python's formal parsers?

Thanks.
Jul 18 '05 #1

tuanglen> I've been looking all over in the docs, but I can't figure out
tuanglen> how you're *supposed* to parse formatted strings into numbers
tuanglen> (and other data types, for that matter) in Python.

Check out the locale module. From "pydoc locale":

Help on module locale:

NAME
    locale - Locale support.

FILE
    /Users/skip/local/lib/python2.4/locale.py

MODULE DOCS
    http://www.python.org/doc/current/li...le-locale.html

DESCRIPTION
    The module provides low-level access to the C lib's locale APIs
    and adds high level number formatting APIs as well as a locale
    aliasing engine to complement these.

    ...

FUNCTIONS
    atof(str, func=<type 'float'>)
        Parses a string as a float according to the locale settings.

    atoi(str)
        Converts a string to an integer according to the locale settings.

    ...

Skip

Jul 18 '05 #2
Hello Tuang,
In C#, you can say

int.Parse(myString)

and it will turn a string like "-12,345" into a proper int. It works
for all sorts of data types with all sorts of formats, and you can
pass it locale parameters to tell it, for example, to parse a German
"12.345,67" into 12345.67. Java does this, too.
(Integer.parseInt(myStr), IIRC).

What's the equivalent in Python?

Python has built-in "int", "long" and "float" functions. However,
they are more limited than what you want.

And if the only problem is comma thousand-separators (e.g.,
"12,345.67"), is there a higher-performance way to convert that into
the number 12345.67 than using Python's formal parsers?

f = float("12,345.67".replace(",", ""))
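As a rough sketch of the strip-the-commas approach in current Python 3: note that int() rejects a string that still contains a decimal point (int("12345.67") raises ValueError), so a value like "12,345.67" has to go through float(). The helper name parse_us_number is made up for illustration, and it assumes "," is only ever a thousands separator and "." the decimal point.

```python
def parse_us_number(text):
    """Parse a US-style number such as "-12,345" or "12,345.67".

    Assumption: "," is only a thousands separator, "." the decimal point.
    Returns an int when there is no decimal point, otherwise a float.
    """
    cleaned = text.strip().replace(",", "")
    if "." in cleaned:
        return float(cleaned)
    return int(cleaned)


print(parse_us_number("-12,345"))
print(parse_us_number("12,345.67"))
```

Anything else in the string (a currency symbol, say) still raises ValueError, which is probably what you want for a quick utility script.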

HTH.
Miki
Jul 18 '05 #3
Skip Montanaro <sk**@pobox.com> wrote:
tuanglen> I've been looking all over in the docs, but I can't figure out
tuanglen> how you're *supposed* to parse formatted strings into numbers
tuanglen> (and other data types, for that matter) in Python.

Check out the locale module. From "pydoc locale":

Help on module locale:

NAME
    locale - Locale support.

FILE
    /Users/skip/local/lib/python2.4/locale.py

MODULE DOCS
    http://www.python.org/doc/current/li...le-locale.html

DESCRIPTION
    The module provides low-level access to the C lib's locale APIs
    and adds high level number formatting APIs as well as a locale
    aliasing engine to complement these.

    ...

FUNCTIONS
    atof(str, func=<type 'float'>)
        Parses a string as a float according to the locale settings.

    atoi(str)
        Converts a string to an integer according to the locale settings.

    ...


Thanks for taking a shot at it, but it doesn't appear to work:
>>> import locale
>>> locale.atoi("-12,345")
Traceback (most recent call last):
  File "<interactive input>", line 1, in ?
  File "C:\Python2321\lib\locale.py", line 179, in atoi
    return atof(str, int)
  File "C:\Python2321\lib\locale.py", line 175, in atof
    return func(str)
ValueError: invalid literal for int(): -12,345
>>> locale.getdefaultlocale()
('en_US', 'cp1252')
>>> locale.atoi("-12345")
-12345

Given the locale it thinks I have, it should be able to parse
"-12,345" if it can handle formats containing thousands separators,
but apparently it can't.

If Python doesn't actually have its own parsing of formatted numbers,
what's the preferred Python approach for taking data, perhaps
formatted currencies such as "-$12,345.00" scraped off a Web page, and
turning it into numerical data?

Thanks.
Jul 18 '05 #4
tu******@hotmail.com (Tuang) wrote:

>>> locale.getdefaultlocale()
('en_US', 'cp1252')
>>> locale.atoi("-12345")
-12345

Given the locale it thinks I have, it should be able to parse
"-12,345" if it can handle formats containing thousands separators,
but apparently it can't.

If Python doesn't actually have its own parsing of formatted numbers,
what's the preferred Python approach for taking data, perhaps
formatted currencies such as "-$12,345.00" scraped off a Web page, and
turning it into numerical data?


The problem is that by default the numeric locale is not set up to parse
those numbers. You have to set that up separately:
>>> import locale
>>> locale.getlocale(locale.LC_NUMERIC)
(None, None)
>>> locale.getlocale()
['English_United Kingdom', '1252']
>>> locale.setlocale(locale.LC_NUMERIC, "English")
'English_United States.1252'
>>> locale.atof('1,234')
1234.0
>>> locale.setlocale(locale.LC_NUMERIC, "French")
'French_France.1252'
>>> locale.atof('1,234')
1.234

Unless I've missed something, it doesn't support ignoring currency symbols
when parsing numbers, so you still can't handle "-$12,345.00" even if you
do set the numeric and monetary locales.
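
For a script rather than an interactive session, the same idea might look roughly like the sketch below (Python 3). Locale names are platform-specific ("English" works on Windows, something like "en_US.UTF-8" on most Unix-like systems), so the candidate names here are assumptions and the helper parse_with_locale is hypothetical. It tries each name in turn and puts the previous LC_NUMERIC setting back afterwards, because setlocale() changes state for the whole process.

```python
import locale


def parse_with_locale(text, candidates):
    """Parse text with locale.atof() under the first locale that can be set.

    Returns None if none of the candidate locale names is installed.
    """
    saved = locale.setlocale(locale.LC_NUMERIC)  # query the current setting
    try:
        for name in candidates:
            try:
                locale.setlocale(locale.LC_NUMERIC, name)
            except locale.Error:
                continue  # this locale name is not available here
            return locale.atof(text)
        return None
    finally:
        # Restore whatever was active before the call.
        locale.setlocale(locale.LC_NUMERIC, saved)


# The "C" locale is always available; it has no thousands separator.
print(parse_with_locale("1234.5", ["C"]))

# Assumed names for US English on Unix-like systems and on Windows.
print(parse_with_locale("1,234", ["en_US.UTF-8", "English_United States.1252"]))
```

If a locale name is accepted but carries no thousands separator, atof() will still raise ValueError on "1,234", so it is worth checking localeconv() on the target machine first.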

--
Duncan Booth du****@rcp.co.uk
int month(char *p){return(124864/((p[0]+p[1]-p[2]&0x1f)+1)%12)["\5\x8\3"
"\6\7\xb\1\x9\xa\2\0\4"];} // Who said my code was obscure?
Jul 18 '05 #5
tuang> Thanks for taking a shot at it, but it doesn't appear to work:
>>> import locale
>>> locale.atoi("-12,345")
Traceback (most recent call last):
  File "<interactive input>", line 1, in ?
  File "C:\Python2321\lib\locale.py", line 179, in atoi
    return atof(str, int)
  File "C:\Python2321\lib\locale.py", line 175, in atof
    return func(str)
ValueError: invalid literal for int(): -12,345
>>> locale.getdefaultlocale()
('en_US', 'cp1252')
>>> locale.atoi("-12345")
-12345

Take a look at the output of locale.localeconv() with various locales set.
I think you'll find that locale.localeconv()['thousands_sep'] is '', not ','.
Failing that, you might want to simply replace the commas and dollar signs
with empty strings before passing to int() or float(), as someone else
suggested.
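
A small sketch of that localeconv() check, in Python 3. The helper numeric_separators is a made-up name; it switches LC_NUMERIC to the requested locale just long enough to read the two separators, then restores the previous setting.

```python
import locale


def numeric_separators(locale_name="C"):
    """Return (thousands_sep, decimal_point) for the given numeric locale."""
    saved = locale.setlocale(locale.LC_NUMERIC)  # query the current setting
    try:
        locale.setlocale(locale.LC_NUMERIC, locale_name)
        conv = locale.localeconv()
        return conv["thousands_sep"], conv["decimal_point"]
    finally:
        locale.setlocale(locale.LC_NUMERIC, saved)


# In the plain "C" locale the thousands separator is the empty string,
# which is why atoi("-12,345") fails until a numeric locale is set.
print(numeric_separators("C"))
```

Passing another locale name raises locale.Error if that locale is not installed.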

Be careful if you're scraping web pages which might not use the same charset
as you do. You may find something like:

$123.456,78

as a quote price on a European website. I don't know how to tell what the
remote site used as its locale when formatting numeric data. Perhaps
knowing the charset of the page is sufficient to make an educated guess.
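
When you already know how the page formats its numbers, one option is to skip the locale machinery and state the separators yourself. The sketch below is only that, a sketch: parse_money is a hypothetical helper, it returns a Decimal so currency amounts are not rounded through binary floats, and it simply discards any character that is not a digit, a minus sign or the decimal point.

```python
import re
from decimal import Decimal


def parse_money(text, thousands_sep=",", decimal_sep="."):
    """Parse strings like "-$12,345.00" into a Decimal.

    The caller states the separators explicitly; nothing is guessed.
    """
    cleaned = text.strip().replace(thousands_sep, "")
    if decimal_sep != ".":
        cleaned = cleaned.replace(decimal_sep, ".")
    # Drop currency symbols, spaces and anything else that is not part
    # of a plain signed decimal number.
    cleaned = re.sub(r"[^0-9.\-]", "", cleaned)
    return Decimal(cleaned)


print(parse_money("-$12,345.00"))
print(parse_money("$123.456,78", thousands_sep=".", decimal_sep=","))
```

Input that leaves nothing parseable raises decimal.InvalidOperation, so bad cells in a scraped table show up immediately.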

Skip

Jul 18 '05 #6
Skip Montanaro <sk**@pobox.com> wrote

Be careful if you're scraping web pages which might not use the same charset
as you do. You may find something like:

$123.456,78

as a quote price on a European website. I don't know how to tell what the
remote site used as its locale when formatting numeric data. Perhaps
knowing the charset of the page is sufficient to make an educated guess.


Thanks, Skip. I'm not planning some sort of shady screen scraping
operation or anything of that sort. This is more of a generic question
about how to use Python as a convenient utility language.

Sometimes I'll find a table of interesting data somewhere as I'm just
surfing around the Web, and I'll want to grab the data and play with
it a bit. At that scale of operation, I can just look at the page
source and figure out the encoding, what the currency is, etc. I know
how to turn a formatted string into a usable number in other languages
that I use (though I might have to check the docs in some cases to
remind myself of the details), and since the docs didn't really make
it obvious what the "one clear and obvious way to do it" was in
Python, I thought I'd ask.

It appears as though Python doesn't (yet) have the same formal support
for format parsing and internationalization that languages like C# and
Java have, but that's okay for now. I just wanted to make sure I
didn't start creating my own naive, homemade equivalents of functions
that are already part of the standard API.
Jul 18 '05 #7
