Number formatting

Hi all,

I'm currently facing something quite annoying, and perhaps one of you has
an idea of how to solve it efficiently. I have some software (over which I
have no influence!!!!) which delivers data in scientific notation, and I
have to read it. This is fairly simple, but here is the tricky thing. This
software is written in FORTRAN and shows the following feature, which IMHO
is rather a bug than a feature: if numbers get very small, like 7.0614E-238,
it starts writing them out as 7.0614-238. So when I parse the file, what I
get is 7.0614, because the minus is seen as a separator. Of course I could
start reading all the data as strings, tokenizing them and checking for this
rather quirky behavior, but this would slow down the process of reading the
data, which can be really huge! Does any of you have an idea how to "fix"
this problem? I cannot change the software which delivers these, IMHO
corrupted, values, which are FORTRAN standard compliant.

Cheers
Chris
Oct 11 '06 #1
Chris Theis wrote:
> I'm currently facing something which is quite annoying and probably one of
> you might have an idea of how to solve it efficiently. I have some
> software (upon which I have no influence!!!!) which delivers data in
> scientific notation and I have to read it. This is fairly simple, but here
> is the tricky thing. This software is written in FORTRAN and shows the
> following feature, which IMHO is rather a bug than a feature. If numbers
> get very small like 7.0614E-238 it starts writing them out as 7.0614-238.
> So when I parse the file what I get is 7.0614 because the minus is seen as
> a separator. Of course I could start reading all the data as strings,
> tokenizing them and start checking for this rather quirky behavior, but
> this would slow down the process of reading the data which can be really
> huge!

How do you know that parsing the - would slow the program down?

Here's a reprehensibly simple parser:

http://c2.com/cgi/wiki?MsWindowsResourceLint

Here's one of its member functions:

string const &
pullNextToken()
{
    m_priorToken = m_currentToken;
    extractNextToken();
    return m_currentToken;
}

Here's a unit test on that function:

TEST_(TestCase, pullNextToken)
{
    Source aSource("a b\nc\n d");

    string
    token = aSource.pullNextToken();
    CPPUNIT_ASSERT_EQUAL("a", token);
    token = aSource.pullNextToken();
    CPPUNIT_ASSERT_EQUAL("b", token);
    token = aSource.pullNextToken();
    CPPUNIT_ASSERT_EQUAL("c", token);
    token = aSource.pullNextToken();
    CPPUNIT_ASSERT_EQUAL("d", token);
    token = aSource.pullNextToken();
    CPPUNIT_ASSERT_EQUAL("" , token); // EOF!
}

Now imagine if you wrote a dirt-simple parser, using fstream goodies, and
you also wrote unit tests like that. You could add a test that calls a hard
function ten thousand times, and then asserts that the CPU time didn't
exceed some obvious limit, like a thousandth of a second.

You will probably discover that your parser is not slow. If you only stream
characters, and never buffer strings into std::string (possibly slow), then
all your code might run inside the CPU's cache, without excessive data
motion on the main bus.

Never guess what could be slow; measure.

--
Phlip
http://www.greencheese.us/ZeekLand <-- NOT a blog!!!
Oct 11 '06 #2
Pretty much every programmer of scientific code has had that "joy". I'd
be interested myself whether there's some secret "Fortran locale"
that would make all of this obsolete. Looking at LC_NUMERIC and co., I
doubt it :(

In C++ I use code like the following. If you really, really need to go
for speed, you'll have to roll your parser yourself. However, if speed
was an absolute issue, you'd be reading/writing binary data anyways, so
there's no point. Btw, it would be pretty easy to fix this problem from
the Fortran side.

#include <cmath>
#include <iostream>
#include <sstream>

// Thin wrapper around double whose operator>> also accepts Fortran's
// exponent form without the 'E', e.g. 1.2344-200.
struct fortran_double {
    fortran_double() : value(0.0) {}
    fortran_double& operator=(double d) {
        value = d;
        return *this;
    }
    operator double() const { return value; }
    friend std::istream& operator>>(std::istream& in, fortran_double& fd);
private:
    double value;
};

std::istream& operator>>(std::istream& in, fortran_double& fd) {
    double d;
    if (!(in >> d))
        return in;  // not a number: leave fd untouched
    // Peek only while the stream is good; peeking after EOF sets failbit.
    if (in.good()) {
        const int ch = in.peek();
        if (ch == '+' || ch == '-') {
            // A sign directly after the mantissa starts the exponent.
            int exponent;
            if (!(in >> exponent))
                return in;
            d *= std::pow(10.0, exponent);
        }
    }
    fd = d;
    return in;
}

int main() {
    double x = 0;
    fortran_double fd;
    std::istringstream in("1.2344-200");
    in >> fd;
    x = fd;
    std::cout << x << "\n";
}

Oct 11 '06 #3
"Phlip" <ph******@yahoo.com> wrote in message
news:b%*******************@newssvr27.news.prodigy. net...
> Never guess what could be slow; measure.

Hi Phlip,

Now you're actually guessing that I didn't measure, aren't you? ;-) Well,
the thing is that at some point I would have to use strings to assemble the
complete number and finally convert it into a double. All of this is more
work than simply reading and storing a value. Therefore, I was looking for a
solution which doesn't need to re-assemble numbers via strings/characters
but rather some way to emulate this quirky FORTRAN format. However, I get
the impression more and more that this simply doesn't work and that I will
have to try to convince the responsible people to adjust their format
specifiers, as it's just a couple of key punches for them, whereas I would
have to invest quite some time to solve this.

Thanks
Chris

Oct 11 '06 #4

"Phlip" <ph******@yahoo.com> wrote in message
news:b%*******************@newssvr27.news.prodigy. net...
> Here's a unit test on that function:
> Now imagine if you wrote a dirt-simple parser, using fstream goodies, and
> you also wrote unit tests like that.

You've got a little crush on that "unit test" thingie, don't you? C'mon,
fess up, you know you like it...

;-)

Oct 11 '06 #5
Howard wrote:
> You've got a little crush on that "unit test" thingie, don't you? C'mon,
> fess up, you know you like it...

A "crush"? You might also call it a marriage...

Chris Theis wrote:
>> How do you know that parsing the - would slow the program down?
> now you're actually guessing that I didn't measure, aren't you? ;-)

I answer "premature optimization is the root of all evil" too often here...

> Well, the thing is that at one point I would have to use strings to
> assemble the total number and finally convert it into a double.

At the bottom of my post I hinted that dealing in streams instead of strings
would be faster, and more like a parser.

So if you put my technique together with F.J.K.'s, you could use his main()
as your first unit test.

> All of this is more work than simply reading and storing a value.

More coding for you or more work for the CPU? F.J.K.'s solution shows how to
parse and treat each number as you get it, without putting the numbers into
separate std::string objects or anything like that.

> ...Although, I more and more get the impression that this simply doesn't
> work and I will have to try to convince the responsible people to
> adjust their format specifiers, as it's just a couple of key punches for
> them, whereas I would have to invest quite some time to solve this.

And in terms of process, one fixes a bug as close as possible to its source.
Don't output a bug, then detect it and clean up after it with extra
statements.

--
Phlip
http://www.greencheese.us/ZeekLand <-- NOT a blog!!!
Oct 11 '06 #6
Hi there,

> Pretty much every programmer of scientific code has had that "joy". I'd
> be interested myself whether there's some secret "Fortran locale"
> that would make all of this obsolete. Looking at LC_NUMERIC and co., I
> doubt it :(

I did some research, but I honestly doubt it too :-(

> In C++ I use code like the following. If you really, really need to go
> for speed, you'll have to roll your parser yourself. However, if speed
> was an absolute issue, you'd be reading/writing binary data anyways, so
> there's no point.

Binary is a little complicated, as we have to remain portable across a lot
of platforms, and there are already some backwards-compatibility issues with
the program delivering the data. So this topic is unfortunately a little
touchy and beyond my influence.

> Btw, it would be pretty easy to fix this problem from
> the Fortran side.

Yes, that's for sure - it would be adding "E3" to the format string and
that's it. But the tricky thing is to convince the person responsible, a
hardcore FORTRAN developer, to acknowledge that something like 7.0631-236 is
an expression and not a proper scientific notation for a value ;-)

Thanks for the code - it's pretty much what I finally came up with and
implemented as a first work-around.

Thanks a lot guys
Chris


Oct 12 '06 #7
