Parse string to double really fast

Thomas Kowalski

Hi,
I would like to know whether someone knows a library or function that
parses a string containing 3 double numbers in the form like
xxxx.yyyyyyyyy xxxx.yyyyyyyyy xxxx.yyyyyyyyy really fast.
Currently I am using "sscanf(line.c_str(), "%lf %lf %lf", &x, &y,
&z);" which is kind of slow.

Thanks in advance,
Thomas Kowalski

Jun 6 '07 #1


Mike Wahler

"Thomas Kowalski" <th***@gmx.de> wrote in message
news:11*********************@o5g2000hsb.googlegroups.com...

> Hi,
> I would like to know whether someone knows a library or function that
> parses a string containing 3 double numbers in the form like
> xxxx.yyyyyyyyy xxxx.yyyyyyyyy xxxx.yyyyyyyyy really fast.
> Currently I am using "sscanf(line.c_str(), "%lf %lf %lf", &x, &y,
> &z);" which is kind of slow.
>
> Thanks in advance,
> Thomas Kowalski

#include <iostream>
#include <sstream>
#include <string>

int main()
{
    std::string line("123.456 987.654 3.1416");
    std::istringstream iss(line);
    double x(0);
    double y(0);
    double z(0);
    // Extract the three values; the stream goes into a failed state on bad input.
    if(!(iss >> x >> y >> z))
        std::cerr << "Conversion error\n";
    else
        std::cout << x << '\n' << y << '\n' << z << '\n';
    return 0;
}

-Mike

Jun 6 '07 #2

Gianni Mariani

On Jun 6, 2:18 pm, Thomas Kowalski <t...@gmx.de> wrote:

> Hi,
> I would like to know whether someone knows a library or function that
> parses a string containing 3 double numbers in the form like
> xxxx.yyyyyyyyy xxxx.yyyyyyyyy xxxx.yyyyyyyyy really fast.
> Currently I am using "sscanf(line.c_str(), "%lf %lf %lf", &x, &y,
> &z);" which is kind of slow.
>
> Thanks in advance,
> Thomas Kowalski

If they are exactly of the form xxxx.yyyyyyyyy then you can do
something like:

double convert( const char * str )
{
    return convert_char( str[0], 1000 ) +
           convert_char( str[1], 100 ) +
           convert_char( str[2], 10 ) +
           convert_char( str[3], 1 ) +
           convert_char( str[5], 0.1 ) +
           .... get the rest ?
}

Where convert_char checks for ' ' or isdigit and does the appropriate
thing.

G

Jun 6 '07 #3

Thomas Kowalski

Hi Mike,
thank you for your quick answer. I have already tried the stream approach.
My question is more a request for hints toward a really custom parser.
More about what I tried:
1. approach) Using stringstreams to parse. For my file (about 400,000
lines with 3 doubles each) it took about 30s to parse.
2. approach) Using sscanf, which took about 13s.
3. approach) Using atof and strchr to parse, which took about 8s.

The improvements in my opinion show that the IO is not yet the
limiting factor. The CPU is still busy at 100% during the parsing.

Since atof uses locale information (we always use the "." as
separator) and also has to be able to parse different representations
of floating-point numbers, I guess there is plenty of room for
improvement. In the case of a custom parser the search is also
unnecessary, since we know that the next double follows directly one
char after the end of the previous one.

Does anyone have experience with such optimizations?

Regards,
Thomas Kowalski

Jun 6 '07 #4

Thomas Kowalski

Hi Gianni,
thank you for your answer.

> If they are exactly of the form xxxx.yyyyyyyyy then you can do
> something like:

I have to admit that my example was not clear enough. The format is
not that fixed.
It's rather something like "123.456 987.654 3.1416" or "43123.987
654.1234556 3".
This means the representation is not scientific or hex, and we use the "."
as a separator regardless of the locale.

> double convert( const char * str )
> {
>     return convert_char( str[0], 1000 ) +
>            convert_char( str[1], 100 ) +
>            convert_char( str[2], 10 ) +
>            convert_char( str[3], 1 ) +
>            convert_char( str[5], 0.1 ) +
>            .... get the rest ?
> }
>
> Where convert_char checks for ' ' or isdigit and does the appropriate
> thing.

I doubt that this is the fastest way to do things.

Thanks again,
Thomas Kowalski

Jun 6 '07 #5

Roland Pibinger

On Tue, 05 Jun 2007 21:18:57 -0700, Thomas Kowalski wrote:

> I would like to know whether someone knows a library or function that
> parses a string containing 3 double numbers in the form like
> xxxx.yyyyyyyyy xxxx.yyyyyyyyy xxxx.yyyyyyyyy really fast.

Try strtod()
--
Roland Pibinger
"The best software is simple, elegant, and full of drama" - Grady Booch

Jun 6 '07 #6

Gianni Mariani

Thomas Kowalski wrote:

> Hi Gianni,
> thank you for your answer.
>
>> If they are exactly of the form xxxx.yyyyyyyyy then you can do
>> something like:
>
> I have to admit that my example was not clear enough. The format is
> not that fixed.
> It's rather something like "123.456 987.654 3.1416" or "43123.987
> 654.1234556 3".
> This means the representation is not scientific or hex, and we use the "."
> as a separator regardless of the locale.
>
>> double convert( const char * str )
>> {
>>     return convert_char( str[0], 1000 ) +
>>            convert_char( str[1], 100 ) +
>>            convert_char( str[2], 10 ) +
>>            convert_char( str[3], 1 ) +
>>            convert_char( str[5], 0.1 ) +
>>            .... get the rest ?
>> }
>>
>> Where convert_char checks for ' ' or isdigit and does the appropriate
>> thing.
>
> I doubt that this is the fastest way to do things.

But is it fast enough? You don't normally ever need the "fastest" way
to do things since that might be very hard to write. It might also be
very fast on one platform and very slow on another.

Can you describe accurately what the strings representing these numbers
look like? Signed/unsigned? Max N digits before and M digits after
the decimal point? Do you need error checking? (i.e. can you assume it
will be a valid non-scientific-notation floating point number?)

The only potentially costly thing is the floating point multiply. You
can eliminate all the multiplies if you know a few things about your numbers.

If you really need extreme speed, you can probably use SIMD instructions
to convert the chars to digits (0..9) and probably a couple more to
perform all the adds. You might even create a table like so:

const double digit_array[30][10] =
{
    { 0e+15, 1e+15, 2e+15, .... },
    { 0e+14, 1e+14, ... },
    ....
    { 0e+0, 1e+0, 2e+0 ... },
    ....
    { 0e-14, 1e-14 .... }
};

This way no multiplication is needed.

Jun 6 '07 #7

Thomas Kowalski

> But is it fast enough? You don't normally ever need the "fastest" way
> to do things since that might be very hard to write. It might also be
> very fast on one platform and very slow on another.

True. But it should be a reference implementation of what is possible.
Therefore I would like to take into account any solution that is not too
hard to program and, above all, stable.

> Can you describe accurately what the strings representing these numbers
> look like? Signed/unsigned?

Yes, the numbers can be signed or unsigned. There is never an explicit
"+" in front of the numbers.

> Max N digits before and M digits after the decimal point.

Arbitrary. There is no limit on the number of characters before or after.

> Do you need error checking?

No, error checking is not needed in the release build (if there is some
in the debug build, that's fine).

> (i.e. can you assume it will be a valid non-scientific-notation floating point number?)

Yes, I can assume that. In general I should assume that it might be
any string generated by
sprintf(buffer, "%lf %lf %lf", x,y,z);

> If you really need extreme speed, you can probably use SIMD instructions
> to convert the chars to digits (0..9) and probably a couple more to
> perform all the adds. You might even create a table like so:
>
> const double digit_array[30][10] =
> {
>     { 0e+15, 1e+15, 2e+15, .... },
>     { 0e+14, 1e+14, ... },
>     ....
>     { 0e+0, 1e+0, 2e+0 ... },
>     ....
>     { 0e-14, 1e-14 .... }
> };
>
> This way no multiplication is needed.

I like the idea with the table, although I worry about the portability
issues you mentioned. The future target platforms are x86 (32- and
64-bit) on Linux and Windows.

Thanks for your time and help,
Thomas Kowalski

Jun 6 '07 #8

Jacek Dziedzic

Thomas Kowalski wrote:

> Hi,
> I would like to know whether someone knows a library or function that
> parses a string containing 3 double numbers in the form like
> xxxx.yyyyyyyyy xxxx.yyyyyyyyy xxxx.yyyyyyyyy really fast.
> Currently I am using "sscanf(line.c_str(), "%lf %lf %lf", &x, &y,
> &z);" which is kind of slow.

I have a string to double parser that performs about 2x
better on my machine than strtod() or sscanf(). It parses a
single double value from a const char* and advances the
pointer, so by calling it three times you will be able to parse
a line like yours.

It supports scientific notation, but does not support radices
other than 10 (0x notation, for example). Lookup tables are
used to speed up the process.

The code would linewrap if I pasted it here, so here's a link

http://tiny.pl/frtq

AFAIK, the only modifications would be to define
EException, EParseError and signaling_NaN() or you could remove
the error-handling altogether.

HTH,
- J.

Jun 6 '07 #9

David Harmon

On Wed, 06 Jun 2007 05:59:47 GMT in comp.lang.c++, "Mike Wahler"
<mk******@mkwahler.net> wrote,

> std::string line("123.456 987.654 3.1416");
> std::istringstream iss(line);
> double x(0);
> double y(0);
> double z(0);
> if(!(iss >> x >> y >> z))
>     std::cerr << "Conversion error\n";

What is your reason for thinking that constructing a string,
constructing a stringstream, and extracting the three doubles, would
be faster than the sscanf call that the OP wishes to improve upon?

Jun 6 '07 #10

Boris Kolpackov

Hi Thomas,

Thomas Kowalski <th***@gmx.de> writes:

> Since atof uses locale information (we always use the "." as
> separator) and also has to be able to parse different representations
> of floating-point numbers, I guess there is plenty of room for
> improvement. In the case of a custom parser the search is also
> unnecessary, since we know that the next double follows directly one
> char after the end of the previous one.

I think aside from a custom parser, strtod is your best choice. It has
the added benefit of reporting a pointer to the end of the value being
parsed (through its second argument), so you can just increment it and
pass it to the next call to strtod.

hth,
-boris
--
Boris Kolpackov
Code Synthesis Tools CC
http://www.codesynthesis.com
Open-Source, Cross-Platform C++ XML Data Binding

Jun 6 '07 #11

Thomas Kowalski

> I have a string to double parser that performs about 2x
> better on my machine than strtod() or sscanf().

Thanks a lot. I will give it a try. Dziękuję :)
BTW: Which license do you use for the parser? GPL?

Thomas Kowalski

Jun 7 '07 #12

Ian Collins

Thomas Kowalski wrote:

>> I have a string to double parser that performs about 2x
>> better on my machine than strtod() or sscanf().
>
> Thanks a lot. I will give it a try.

I found it is about 60% slower than strtod() on my system (Solaris), so
your mileage will vary depending on how well your native strtod() family
performs.

--
Ian Collins.

Jun 7 '07 #13

Jacek Dziedzic

Thomas Kowalski wrote:

>> I have a string to double parser that performs about 2x
>> better on my machine than strtod() or sscanf().
>
> Thanks a lot. I will give it a try. Dziękuję :)
> BTW: Which license do you use for the parser? GPL?

You're welcome. There's no licence, I rolled this one
during one evening, feel free to have it for free!

If you intend to use it, perhaps you should test it
extensively, I just wrote several simple tests to make
sure it works.

HTH,
- J.

Jun 9 '07 #14

Jacek Dziedzic

Ian Collins wrote:

> Thomas Kowalski wrote:
>>> I have a string to double parser that performs about 2x
>>> better on my machine than strtod() or sscanf().
>> Thanks a lot. I will give it a try.
>
> I found it is about 60% slower than strtod() on my system (Solaris), so
> your mileage will vary depending on how well your native strtod() family
> performs.

Thanks for the input. That goes to show how fragile such
hand-rolled optimizations really are. I measured on an Itanium-2
and on an AMD64 and my version was about twice as fast.

cheers,
- J.

Jun 9 '07 #15

Ian Collins

Jacek Dziedzic wrote:

> Ian Collins wrote:
>> Thomas Kowalski wrote:
>>>> I have a string to double parser that performs about 2x
>>>> better on my machine than strtod() or sscanf().
>>> Thanks a lot. I will give it a try.
>>
>> I found it is about 60% slower than strtod() on my system (Solaris), so
>> your mileage will vary depending on how well your native strtod() family
>> performs.
>
> Thanks for the input. That goes to show how fragile such
> hand-rolled optimizations really are. I measured on an Itanium-2
> and on an AMD64 and my version was about twice as fast.

Or how variable implementations' standard libraries can be!

--
Ian Collins.

Jun 9 '07 #16

James Kanze

On Jun 9, 10:25 pm, Ian Collins <ian-n...@hotmail.com> wrote:

> Jacek Dziedzic wrote:
>> Ian Collins wrote:
>>> Thomas Kowalski wrote:
>>>>> I have a string to double parser that performs about 2x
>>>>> better on my machine than strtod() or sscanf().
>>>> Thanks a lot. I will give it a try.
>>>
>>> I found it is about 60% slower than strtod() on my system (Solaris), so
>>> your mileage will vary depending on how well your native strtod() family
>>> performs.
>>
>> Thanks for the input. That goes to show how fragile such
>> hand-rolled optimizations really are. I measured on an Itanium-2
>> and on an AMD64 and my version was about twice as fast.
>
> Or how variable implementations' standard libraries can be!

Note that you can often convert to floating point a lot faster
if 1) you don't mind small errors in the least significant bits,
or 2) you can limit the number of digits you have to deal with
in the input. A quality conversion routine in the standard
library can't do either of these, and it needs integer arithmetic
of more than 32 bits to get correct results. If you're running
on a 32 bit machine, and limit the input to, say, 9 digits, you
should be able to do a lot better than the standard library.
This won't necessarily hold for a 64 bit machine, however.

--
James Kanze (Gabi Software) email: ja*********@gmail.com
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34

Jun 9 '07 #17

Thomas Kowalski

> Note that you can often convert to floating point a lot faster
> if 1) you don't mind small errors in the least significant bits,

What exactly are you referring to? Errors caused by the addition of
doubles?

Regards,
Thomas Kowalski

Jun 12 '07 #18
