By using this site, you agree to our updated Privacy Policy and our Terms of Use. Manage your Cookies Settings.
446,261 Members | 1,325 Online
Bytes IT Community
+ Ask a Question
Need help? Post your question and get tips & solutions from a community of 446,261 IT Pros & Developers. It's quick & easy.

Parse string to double really fast

P: n/a
Hi,
I would like to know whether someone knows a library or function that
parses a string containing 3 double numbers in the form like
xxxx.yyyyyyyyy xxxx.yyyyyyyyy xxxx.yyyyyyyyy really fast.
Currently I am using "sscanf(line.c_str(), "%lf %lf %lf", &x, &y,
&z);" which is kind of slow.

Thanks in advance,
Thomas Kowalski

Jun 6 '07 #1
Share this Question
Share on Google+
17 Replies


P: n/a

"Thomas Kowalski" <th***@gmx.dewrote in message
news:11*********************@o5g2000hsb.googlegrou ps.com...
Hi,
I would like to know whether someone knows a library or function that
parses a string containing 3 double numbers in the form like
xxxx.yyyyyyyyy xxxx.yyyyyyyyy xxxx.yyyyyyyyy really fast.
Currently I am using "sscanf(line.c_str(), "%lf %lf %lf", &x, &y,
&z);" which is kind of slow.

Thanks in advance,
Thomas Kowalski
#include <iostream>
#include <sstream>
#include <string>

int main()
{
std::string line("123.456 987.654 3.1416");
std::istringstream iss(line);
double x(0);
double y(0);
double z(0);
if(!(iss >x >y >z))
std::cerr << "Conversion error\n";
else
std::cout << x << '\n' << y << '\n' << z << '\n';
return 0;
}

-Mike
Jun 6 '07 #2

P: n/a
On Jun 6, 2:18 pm, Thomas Kowalski <t...@gmx.dewrote:
Hi,
I would like to know whether someone knows a library or function that
parses a string containing 3 double numbers in the form like
xxxx.yyyyyyyyy xxxx.yyyyyyyyy xxxx.yyyyyyyyy really fast.
Currently I am using "sscanf(line.c_str(), "%lf %lf %lf", &x, &y,
&z);" which is kind of slow.

Thanks in advance,
Thomas Kowalski
If they are exactly of the form xxxx.yyyyyyyyy then you can do
somthing like:

double convert( const char * str )
{

return convert_char( str[0], 1000 ) +
convert_char( str[1], 100 ) +
convert_char( str[2], 10 ) +
convert_char( str[3], 1 ) +
convert_char( str[5], 0.1 ) +
.... get the rest ?
}

Where convert_char checks for ' ' or isdigit and does the appropriate
thing.

G

Jun 6 '07 #3

P: n/a
Hi Mike,
thank you for your quick answer. I used the stream-approach already.
My question is rather a hint to some really custom parser.
More about what I tried:
1. approach) Using stringstreams to parse. For my file (about 400.000
lines with 3 doubles) it took about 30s to parse.
2. approach) Using sscanf which took about 13s.
3. approach) Using atof and strchr to parse took about 8s.

The improvements in my opinion show that the IO is not yet the
limiting factor. The CPU is still busy at 100% during the parsing.

Since atof is using local information (we always use the "." as
separator) and also should be able to parse different representations
of float numbers, I guess there is plenty of room for improvement. In
case of a custom parser the search is also not necessary since the we
know that the next double will follow directly one char after the end
of the last.

Does anyone have experience with such optimizations?

Regards,
Thomas Kowalski

Jun 6 '07 #4

P: n/a
Hi Gianni,
thank you for your answer.
If they are exactly of the form xxxx.yyyyyyyyy then you can do
somthing like:
I have to admit that my example was not clear enough. The format is
not that fixed.
Its rather something like "123.456 987.654 3.1416" or "43123.987
654.1234556 3".
Means the representation is not scientific or hex and we use the "."
as a separator agnostic of the local.
double convert( const char * str )
{

return convert_char( str[0], 1000 ) +
convert_char( str[1], 100 ) +
convert_char( str[2], 10 ) +
convert_char( str[3], 1 ) +
convert_char( str[5], 0.1 ) +
.... get the rest ?

}

Where convert_char checks for ' ' or isdigit and does the appropriate
thing.
I doubt that this might be the fastest way to do things..

Thanks again,
Thomas Kowalski

Jun 6 '07 #5

P: n/a
On Tue, 05 Jun 2007 21:18:57 -0700, Thomas Kowalski wrote:
>I would like to know whether someone knows a library or function that
parses a string containing 3 double numbers in the form like
xxxx.yyyyyyyyy xxxx.yyyyyyyyy xxxx.yyyyyyyyy really fast.
try strtod()
--
Roland Pibinger
"The best software is simple, elegant, and full of drama" - Grady Booch
Jun 6 '07 #6

P: n/a
Thomas Kowalski wrote:
Hi Gianni,
thank you for your answer.
>If they are exactly of the form xxxx.yyyyyyyyy then you can do
somthing like:

I have to admit that my example was not clear enough. The format is
not that fixed.
Its rather something like "123.456 987.654 3.1416" or "43123.987
654.1234556 3".
Means the representation is not scientific or hex and we use the "."
as a separator agnostic of the local.
>double convert( const char * str )
{

return convert_char( str[0], 1000 ) +
convert_char( str[1], 100 ) +
convert_char( str[2], 10 ) +
convert_char( str[3], 1 ) +
convert_char( str[5], 0.1 ) +
.... get the rest ?

}

Where convert_char checks for ' ' or isdigit and does the appropriate
thing.

I doubt that this might be the fastest way to do things..
But is it fast enough? You don't normally ever need the "fastest" way
to do things since that might be very hard to write. It might also be
very fast on one platform and very slow on another.

Can you describe accurately what the strings representing these numbers
looks like ? Signed/unsigned ? Max N digits before and M digits after
the decimal point. Do you need error checking ? (i.e. can you assume it
will be a valid non scientific notation floating point number ?).

The only potentially costly thing is the floating point multiply. You
can eliminate all the multiples if you know a few things about your numbers.

If you really need extreme fast, you can probably use SIMD instructions
to convert the chars to digits (0..9) and probably have a couple more to
perform all the adds. You might even create a table like so:
const double digit_array[10][30] =
{
{ 0e+15, 1e+15, 2e+15, .... },
{ 0e+14, 1e+14, ... }
....
{ 0e+0, 1e+0, 2e+0 ...},
....
{ 0e-14, 1e-14 .... )
};

This was no multiplication is needed.
Jun 6 '07 #7

P: n/a
But is it fast enough? You don't normally ever need the "fastest" way
to do things since that might be very hard to write. It might also be
very fast on one platform and very slow on another.
True. But it should be a reference implementation on what is possible.
Therefore I like to take any solution into account that is not too
hard to program and especially stable.
Can you describe accurately what the strings representing these numbers
looks like ? Signed/unsigned ?
Yes, the numbers can be signed or unsigned. There is never an explicit
"+" in front of the numbers.
Max N digits before and M digits after the decimal point.
Arbitrary. There is no limitation on characters before or after.
Do you need error checking ?
No, error checking is not need in the release build (if there is some
in the debug-build its fine).

>(i.e. can you assume it will be a valid non scientific notation floating point number ?).
Yes, I can assume that. In general I should assume that it might be
any string generated by
a sprintf(buffer, "%lf %lf %lf", x,y,z);
If you really need extreme fast, you can probably use SIMD instructions
to convert the chars to digits (0..9) and probably have a couple more to
perform all the adds. You might even create a table like so:

const double digit_array[10][30] =
{
{ 0e+15, 1e+15, 2e+15, .... },
{ 0e+14, 1e+14, ... }
....
{ 0e+0, 1e+0, 2e+0 ...},
....
{ 0e-14, 1e-14 .... )

};

This was no multiplication is needed.
I like the idea with table, although I worry about the portability
issues you mentioned. The future target platforms are x86 (32 and 64-
Bit) and Linux, Windows.

Thanks for your time and help,
Thomas Kowalski
Jun 6 '07 #8

P: n/a
Thomas Kowalski wrote:
Hi,
I would like to know whether someone knows a library or function that
parses a string containing 3 double numbers in the form like
xxxx.yyyyyyyyy xxxx.yyyyyyyyy xxxx.yyyyyyyyy really fast.
Currently I am using "sscanf(line.c_str(), "%lf %lf %lf", &x, &y,
&z);" which is kind of slow.
I have a string to double parser that performs about 2x times
better on my machine than strtod() or sscanf(). This parses a
single double value from a const char*, and advances the
pointer -- by using it three times you will be able to parse
a line akin to what you have.

It supports scientific notation, but does not support radii
different than 10 (0x notation for example). Lookup tables are
used to speed up the process.

The code would linewrap if I pasted it here, so here's a link

http://tiny.pl/frtq

AFAIK, the only modifications would be to define
EException, EParseError and signaling_NaN() or you could remove
the error-handling altogether.

HTH,
- J.
Jun 6 '07 #9

P: n/a
On Wed, 06 Jun 2007 05:59:47 GMT in comp.lang.c++, "Mike Wahler"
<mk******@mkwahler.netwrote,
std::string line("123.456 987.654 3.1416");
std::istringstream iss(line);
double x(0);
double y(0);
double z(0);
if(!(iss >x >y >z))
std::cerr << "Conversion error\n";
What is your reason for thinking that constructing a string,
constructing a stringstream, and extracting the three doubles, would
be faster than the sscanf call that the OP wishes to improve upon?

Jun 6 '07 #10

P: n/a
Hi Thomas,

Thomas Kowalski <th***@gmx.dewrites:
Since atof is using local information (we always use the "." as
separator) and also should be able to parse different representations
of float numbers, I guess there is plenty of room for improvement. In
case of a custom parser the search is also not necessary since the we
know that the next double will follow directly one char after the end
of the last.
I think aside from a custom parser, strtod is your best choice. It has
the added benefit of returning a pointer to the end of value being
parsed so you can just increment it and pass to the next call to strtod.

hth,
-boris
--
Boris Kolpackov
Code Synthesis Tools CC
http://www.codesynthesis.com
Open-Source, Cross-Platform C++ XML Data Binding
Jun 6 '07 #11

P: n/a
I have a string to double parser that performs about 2x times
better on my machine than strtod() or sscanf().
Thanks a lot. I will give it a try. Dzi kuje :)
BTW: Which license do you use for the parser? GPL?

Thomas Kowalski

Jun 7 '07 #12

P: n/a
Thomas Kowalski wrote:
>I have a string to double parser that performs about 2x times
better on my machine than strtod() or sscanf().

Thanks a lot. I will give it a try.
I found is about 60% slower than strtod() on my system (Solaris), so
your money will vary depending on how good your native strtod() family
perform.

--
Ian Collins.
Jun 7 '07 #13

P: n/a
Thomas Kowalski wrote:
>I have a string to double parser that performs about 2x times
better on my machine than strtod() or sscanf().

Thanks a lot. I will give it a try. Dzi kuje :)
BTW: Which license do you use for the parser? GPL?
You're welcome. There's no licence, I rolled this one
during one evening, feel free to have it for free!

If you intend to use it, perhaps you should test it
extensively, I just wrote several simple tests to make
sure it works.

HTH,
- J.
Jun 9 '07 #14

P: n/a
Ian Collins wrote:
Thomas Kowalski wrote:
>>I have a string to double parser that performs about 2x times
better on my machine than strtod() or sscanf().
Thanks a lot. I will give it a try.

I found is about 60% slower than strtod() on my system (Solaris), so
your money will vary depending on how good your native strtod() family
perform.
Thanks for the input. That goes to show how fragile such
hand-rolled optimizations really are. I measured on an Itanium-2
and on an AMD64 and my version was about twice as fast.

cheers,
- J.
Jun 9 '07 #15

P: n/a
Jacek Dziedzic wrote:
Ian Collins wrote:
>Thomas Kowalski wrote:
>>>I have a string to double parser that performs about 2x times
better on my machine than strtod() or sscanf().
Thanks a lot. I will give it a try.

I found is about 60% slower than strtod() on my system (Solaris), so
your money will vary depending on how good your native strtod() family
perform.

Thanks for the input. That goes to show how fragile such
hand-rolled optimizations really are. I measured on an Itanium-2
and on an AMD64 and my version was about twice as fast.
Or how variable implementation's standard libraries can be!

--
Ian Collins.
Jun 9 '07 #16

P: n/a
On Jun 9, 10:25 pm, Ian Collins <ian-n...@hotmail.comwrote:
Jacek Dziedzic wrote:
Ian Collins wrote:
Thomas Kowalski wrote:
I have a string to double parser that performs about 2x times
better on my machine than strtod() or sscanf().
Thanks a lot. I will give it a try.
I found is about 60% slower than strtod() on my system (Solaris), so
your money will vary depending on how good your native strtod() family
perform.
Thanks for the input. That goes to show how fragile such
hand-rolled optimizations really are. I measured on an Itanium-2
and on an AMD64 and my version was about twice as fast.
Or how variable implementation's standard libraries can be!
Note that you can often convert to floating point a lot faster
if 1) you don't mind small errors in the least significant bits,
or 2) you can limit the number of digits you have to deal with
in the input. A quality convertion routine in the standard
library can't do either of these. And needs integer arithmetic
of more than 32 bits to get correct results. If you're running
on a 32 bit machine, and limit the input to, say, 9 digits, you
should be able to do a lot better than the standard library.
This won't necessarily hold for a 64 bit machine, however.

--
James Kanze (Gabi Software) email: ja*********@gmail.com
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34

Jun 9 '07 #17

P: n/a
Note that you can often convert to floating point a lot faster
if 1) you don't mind small errors in the least significant bits,
What exactly are you referring too? Errors caused by addition of
doubles?

Regards,
Thomas Kowalski

Jun 12 '07 #18

This discussion thread is closed

Replies have been disabled for this discussion.