I've been looking all over in the docs, but I can't figure out how
you're *supposed* to parse formatted strings into numbers (and other
data types, for that matter) in Python.
In C#, you can say
int.Parse(myString)
and it will turn a string like "-12,345" into a proper int. It works
for all sorts of data types with all sorts of formats, and you can
pass it locale parameters to tell it, for example, to parse a German
"12.345,67" into 12345.67. Java does this, too.
(Integer.parseInt(myStr), IIRC).
What's the equivalent in Python?
And if the only problem is comma thousand-separators (e.g.,
"12,345.67"), is there a higher-performance way to convert that into
the number 12345.67 than using Python's formal parsers?
Thanks.
tuanglen> I've been looking all over in the docs, but I can't figure out
tuanglen> how you're *supposed* to parse formatted strings into numbers
tuanglen> (and other data types, for that matter) in Python.
Check out the locale module. From "pydoc locale":
Help on module locale:
NAME
locale - Locale support.
FILE
/Users/skip/local/lib/python2.4/locale.py
MODULE DOCS http://www.python.org/doc/current/li...le-locale.html
DESCRIPTION
The module provides low-level access to the C lib's locale APIs
and adds high level number formatting APIs as well as a locale
aliasing engine to complement these.
...
FUNCTIONS
atof(str, func=<type 'float'>)
Parses a string as a float according to the locale settings.
atoi(str)
Converts a string to an integer according to the locale settings.
...
Skip
Hello Tuang,
> In C#, you can say
> int.Parse(myString)
> and it will turn a string like "-12,345" into a proper int. It works for all sorts of data types with all sorts of formats, and you can pass it locale parameters to tell it, for example, to parse a German "12.345,67" into 12345.67. Java does this, too. (Integer.parseInt(myStr), IIRC).
> What's the equivalent in Python?
Python has built-in "int", "long" and "float" functions. However,
they are more limited than what you want.
> And if the only problem is comma thousand-separators (e.g., "12,345.67"), is there a higher-performance way to convert that into the number 12345.67 than using Python's formal parsers?
f = float("12,345.67".replace(",", ""))
HTH.
Miki
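A minimal sketch of the replace-based approach, assuming the input always uses "," for grouping and "." as the decimal point. The helper name parse_us_number is hypothetical, not a standard function. Note that a string with a fractional part has to go through float(), since int() rejects "12345.67".

```python
def parse_us_number(text):
    """Parse a number that uses ',' for grouping and '.' as the decimal point.

    Hypothetical helper for illustration; it does no locale lookup.
    """
    cleaned = text.strip().replace(",", "")
    # Try int() first; fall back to float() for strings int() rejects,
    # such as those with a fractional part.
    try:
        return int(cleaned)
    except ValueError:
        return float(cleaned)


print(parse_us_number("-12,345"))
print(parse_us_number("12,345.67"))
```

This blindly removes every comma, so it is only safe when you already know the source format.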
Skip Montanaro <sk**@pobox.com> wrote:
> Check out the locale module. From "pydoc locale":
> ...
> atof(str, func=<type 'float'>)
>     Parses a string as a float according to the locale settings.
> atoi(str)
>     Converts a string to an integer according to the locale settings.
Thanks for taking a shot at it, but it doesn't appear to work:
>>> import locale
>>> locale.atoi("-12,345")
Traceback (most recent call last):
  File "<interactive input>", line 1, in ?
  File "C:\Python2321\lib\locale.py", line 179, in atoi
    return atof(str, int)
  File "C:\Python2321\lib\locale.py", line 175, in atof
    return func(str)
ValueError: invalid literal for int(): -12,345
>>> locale.getdefaultlocale()
('en_US', 'cp1252')
>>> locale.atoi("-12345")
-12345
Given the locale it thinks I have, it should be able to parse
"-12,345" if it can handle formats containing thousands separators,
but apparently it can't.
If Python doesn't actually have its own parsing of formatted numbers,
what's the preferred Python approach for taking data, perhaps
formatted currencies such as "-$12,345.00" scraped off a Web page, and
turning it into numerical data?
Thanks.
tu******@hotmail.com (Tuang) wrote:
> >>> locale.getdefaultlocale()
> ('en_US', 'cp1252')
> >>> locale.atoi("-12345")
> -12345
> Given the locale it thinks I have, it should be able to parse "-12,345" if it can handle formats containing thousands separators, but apparently it can't.
> If Python doesn't actually have its own parsing of formatted numbers, what's the preferred Python approach for taking data, perhaps formatted currencies such as "-$12,345.00" scraped off a Web page, and turning it into numerical data?
The problem is that by default the numeric locale is not set up to parse
those numbers. You have to set that up separately:
>>> import locale
>>> locale.getlocale(locale.LC_NUMERIC)
(None, None)
>>> locale.getlocale()
['English_United Kingdom', '1252']
>>> locale.setlocale(locale.LC_NUMERIC, "English")
'English_United States.1252'
>>> locale.atof('1,234')
1234.0
>>> locale.setlocale(locale.LC_NUMERIC, "French")
'French_France.1252'
>>> locale.atof('1,234')
1.234
Unless I've missed something, it doesn't support ignoring currency symbols
when parsing numbers, so you still can't handle "-$12,345.00" even if you
do set the numeric and monetary locales.
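To make the role of the separators concrete, here is a rough sketch that parses a number from a localeconv()-style dictionary. The helper parse_localized and the two hand-written separator tables are assumptions for illustration; the tables are spelled out so the example does not depend on which locales are installed. With no table given, it falls back to locale.localeconv(), which reflects whatever LC_NUMERIC is currently set.

```python
import locale


def parse_localized(text, conv=None):
    """Parse a float using separators from a localeconv()-style dict.

    Hypothetical helper: removes the grouping separator and normalizes
    the decimal point before calling float().
    """
    if conv is None:
        conv = locale.localeconv()  # separators of the current locale
    thousands_sep = conv.get("thousands_sep", "")
    decimal_point = conv.get("decimal_point", ".") or "."
    cleaned = text.strip()
    if thousands_sep:
        cleaned = cleaned.replace(thousands_sep, "")
    if decimal_point != ".":
        cleaned = cleaned.replace(decimal_point, ".")
    return float(cleaned)


# Hand-written separator tables (assumed values, not read from the system).
EN_US = {"thousands_sep": ",", "decimal_point": "."}
DE_DE = {"thousands_sep": ".", "decimal_point": ","}

print(parse_localized("12,345.67", EN_US))
print(parse_localized("12.345,67", DE_DE))
```

Like the session above, this does nothing about currency symbols; those still have to be stripped separately.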
--
Duncan Booth du****@rcp.co.uk
int month(char *p){return(124864/((p[0]+p[1]-p[2]&0x1f)+1)%12)["\5\x8\3"
"\6\7\xb\1\x9\xa\2\0\4"];} // Who said my code was obscure?
tuang> Thanks for taking a shot at it, but it doesn't appear to work:
tuang> >>> import locale
tuang> >>> locale.atoi("-12,345")
tuang> Traceback (most recent call last):
tuang>   File "<interactive input>", line 1, in ?
tuang>   File "C:\Python2321\lib\locale.py", line 179, in atoi
tuang>     return atof(str, int)
tuang>   File "C:\Python2321\lib\locale.py", line 175, in atof
tuang>     return func(str)
tuang> ValueError: invalid literal for int(): -12,345
tuang> >>> locale.getdefaultlocale()
tuang> ('en_US', 'cp1252')
tuang> >>> locale.atoi("-12345")
tuang> -12345
Take a look at the output of locale.localeconv() with various locales set.
I think you'll find that locale.localeconv()['thousands_sep'] is '', not ','.
Failing that, you might want to simply replace the commas and dollar signs
with empty strings before passing to int() or float(), as someone else
suggested.
Be careful if you're scraping web pages which might not use the same charset
as you do. You may find something like:
$123.456,78
as a quote price on a European website. I don't know how to tell what the
remote site used as its locale when formatting numeric data. Perhaps
knowing the charset of the page is sufficient to make an educated guess.
Skip
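A minimal sketch of the strip-and-convert approach for scraped currency strings, assuming you tell it which separators the page used (there is no attempt to guess them). The function name parse_amount and its parameters are hypothetical. It returns a Decimal so that money values are not rounded through binary floats.

```python
import re
from decimal import Decimal


def parse_amount(text, thousands_sep=",", decimal_point="."):
    """Turn a formatted amount such as '-$12,345.00' into a Decimal.

    Hypothetical helper; the caller must say which separators the source used.
    """
    cleaned = text.strip().replace(thousands_sep, "")
    if decimal_point != ".":
        cleaned = cleaned.replace(decimal_point, ".")
    # Drop everything except digits, the sign and the decimal point
    # (currency symbols, spaces, stray letters).
    cleaned = re.sub(r"[^0-9.+-]", "", cleaned)
    return Decimal(cleaned)


print(parse_amount("-$12,345.00"))
print(parse_amount("$123.456,78", thousands_sep=".", decimal_point=","))
```

It does not handle accounting-style negatives written in parentheses or other conventions beyond a leading sign.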
Skip Montanaro <sk**@pobox.com> wrote:
> Be careful if you're scraping web pages which might not use the same charset as you do. You may find something like:
> $123.456,78
> as a quote price on a European website. I don't know how to tell what the remote site used as its locale when formatting numeric data. Perhaps knowing the charset of the page is sufficient to make an educated guess.
Thanks, Skip. I'm not planning some sort of shady screen scraping
operation or anything of that sort. This is more of a generic question
about how to use Python as a convenient utility language.
Sometimes I'll find a table of interesting data somewhere as I'm just
surfing around the Web, and I'll want to grab the data and play with
it a bit. At that scale of operation, I can just look at the page
source and figure out the encoding, what the currency is, etc. I know
how to turn a formatted string into a usable number in other languages
that I use (though I might have to check the docs in some cases to
remind myself of the details), and since the docs didn't really make
it obvious what the "one clear and obvious way to do it" was in
Python, I thought I'd ask.
It appears as though Python doesn't (yet) have the same formal support
for format parsing and internationalization that languages like C# and
Java have, but that's okay for now. I just wanted to make sure I
didn't start creating my own naive, homemade equivalents of functions
that are already part of the standard API.