Aliasing/Torek's strtod() experience

Adam Warner

Hi all,

Message ID <c1********@enews2.newsguy.com> is one of many informative
articles by Chris Torek about C. The particular message discusses aliasing
and concludes with this paragraph:

Under these strict type-aliasing rules, casting from (e.g.) "int *" to
"short *" is not only quite suspicious, it is also likely to cause
puzzling behavior, at least if you expect your "short *" to access or
modify your "int". Even the time-honored, albeit dubious, practise of
breaking a 64-bit IEEE "double" into two 32-bit integers (int or long
depending on the CPU involved) via a union need not work, and sometimes
does not. (We had a problem with strtod() not working right because of
code just like this. It worked in older gcc compilers, and eventually
failed when gcc began doing type-specific alias analysis and
optimizations.)

The code I've written below breaks an 8 byte double into two 4 byte
unsigned integers via a union. How should this code be modified so it
conforms to C's aliasing rules?

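#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdlib.h>
#include <stdio.h>

union u {
    double f64;
    uint32_t u32[2];
};

int main(void) {
    assert(sizeof(double) == 8);
    double val = strtod("1.23", NULL);
    /* Casting a double to a union type is a GNU C extension, not ISO C.
       PRIu32 (from <inttypes.h>) is the printf format for uint32_t. */
    printf("%" PRIu32 " %" PRIu32 "\n",
           ((union u) val).u32[0], ((union u) val).u32[1]);
    return 0;
}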

Many thanks,
Adam

Nov 15 '05 #1

Grumble

Adam Warner wrote:

> [... Torek quotation and original code snipped ...]

I am not sure it is safe to cast 'double' to 'union u'.

In C89, writing to member f64, then reading from member u32 has
implementation-defined behavior - 6.5.2.3 #5.

Nov 15 '05 #2

Adam Warner

On Wed, 29 Jun 2005 10:22:13 +0200, Grumble wrote:

>> [... original code snipped ...]
>
> I am not sure it is safe to cast 'double' to 'union u'.
>
> In C89, writing to member f64, then reading from member u32 has
> implementation-defined behavior - 6.5.2.3 #5.

I suspect aliasing rules are better specified in C99 (6.5 #7):

An object shall have its stored value accessed only by an lvalue
expression that has one of the following types:
-- a type compatible with the effective type of the object,
-- a qualified version of a type compatible with the effective type of
the object,
-- a type that is the signed or unsigned type corresponding to the
effective type of the object,
-- a type that is the signed or unsigned type corresponding to a
qualified version of the effective type of the object,
-- an aggregate or union type that includes one of the aforementioned
types among its members (including, recursively, a member of a
subaggregate or contained union), or
-- a character type.

Doesn't the second to last point mean that writing to member f64 then
reading from member u32 is well specified in C99?

If so, is this approach conforming:

double val = strtod("1.23", NULL);
union u tmp;
tmp.f64 = val;
printf("%" PRIu32 " %" PRIu32 "\n", tmp.u32[0], tmp.u32[1]);

(This eliminates the dubious casts, which is always a good sign!)

Regards,
Adam

Nov 15 '05 #3

Lawrence Kirby

On Wed, 29 Jun 2005 13:29:59 +1200, Adam Warner wrote:

> [... greeting and Torek quotation snipped ...]
>
> The code I've written below breaks an 8 byte double into two 4 byte
> unsigned integers via a union. How should this code be modified so it
> conforms to C's aliasing rules?

What is it you want to achieve by doing this? It is inherently
non-portable even without the aliasing rules. The simple answer would be:
don't do it at all.

> [... original code snipped ...]

One way to get around the aliasing rules is to memcpy() from a double
object to a separate array of uint32_t. Or maybe you don't need uint32_t
at all: you can access any object as an array of unsigned char, which
gives you access to the representation of that object. So given

double f64;
unsigned char *p = (unsigned char *)&f64;

you can access p[0] to p[(sizeof f64)-1]. That's essentially what the
memcpy() is doing.
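A minimal sketch of both suggestions, written as two small helper functions. The names double_to_words() and double_bytes() are hypothetical, introduced only for illustration, and the code assumes double occupies exactly two uint32_t (8 bytes), as the assert() in the original program does.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: copy the object representation of d into two
   32-bit words.  memcpy() reads and writes both objects as bytes, so
   no uint32_t lvalue ever accesses the double itself.  Returns -1 if
   double is not exactly two uint32_t wide on this platform. */
int double_to_words(double d, uint32_t out[2])
{
    if (sizeof d != 2 * sizeof out[0])
        return -1;
    memcpy(out, &d, sizeof d);
    return 0;
}

/* Hypothetical helper: copy the same bytes out through an unsigned
   char pointer, which may alias any object.  buf must have room for
   sizeof(double) bytes; returns the number of bytes copied. */
size_t double_bytes(double d, unsigned char *buf)
{
    const unsigned char *p = (const unsigned char *)&d;
    for (size_t i = 0; i < sizeof d; i++)
        buf[i] = p[i];
    return sizeof d;
}
```

Which array element ends up holding the sign and exponent depends on the platform's byte order, so the result is still non-portable even though the accesses no longer break the aliasing rules.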

Lawrence

Nov 15 '05 #4

Adam Warner

On Wed, 29 Jun 2005 12:32:56 +0100, Lawrence Kirby wrote:

> On Wed, 29 Jun 2005 13:29:59 +1200, Adam Warner wrote:
>> [... greeting and Torek quotation snipped ...]
>>
>> The code I've written below breaks an 8 byte double into two 4 byte
>> unsigned integers via a union. How should this code be modified so it
>> conforms to C's aliasing rules?
>
> What is it you want to achieve by doing this?

Knowledge of how the issue described above might have been worked around.

>> [... original code snipped ...]
>
> [... memcpy() and unsigned char suggestions snipped ...]

That's two ways I hadn't thought of, thanks.

Can you please confirm that my follow-up suggestion, assigning the double
to the union and then accessing it as integers, is also a conforming (to
C99) approach:

double val = strtod("1.23", NULL);
union u tmp;
tmp.f64 = val;
printf("%" PRIu32 " %" PRIu32 "\n", tmp.u32[0], tmp.u32[1]);

Regards,
Adam

Nov 15 '05 #5

Michael Mair

Adam Warner wrote:

> On Wed, 29 Jun 2005 10:22:13 +0200, Grumble wrote:
>> [... original code and Grumble's reply snipped ...]
>
> I suspect aliasing rules are better specified in C99 (6.5 #7):
> [... quotation of C99 6.5 #7 snipped ...]
>
> Doesn't the second to last point mean that writing to member f64 then
> reading from member u32 is well specified in C99?
>
> [... union-assignment example snipped ...]

I do not have a standard handy right now, so I cannot prove the
following by chapter and verse; AFAIR there is nothing explicitly
stating that you can only access a union member you previously stored
to but for something in the infamous Annex J.
For members of the same size the only convincing argument I know
(and saw once upon a time in c.l.c) is that an implementation could
store different members in different places, e.g. the compiler stores
a 64-bit floating point variable in a register and "leaves" the yet
unused array where it is in memory. As there is nothing explicitly
forbidding this, you could see a nasty surprise.
One can come up with volatile, though.

Cheers
Michael
--
E-Mail: Mine is an /at/ gmx /dot/ de address.

Nov 15 '05 #6

Walter Roberson

In article <pa****************************@consulting.net.nz>,
Adam Warner <us****@consulting.net.nz> wrote:

> I suspect aliasing rules are better specified in C99 (6.5 #7):
>
> An object shall have its stored value accessed only by an lvalue
> expression that has one of the following types:
> [...]
> -- an aggregate or union type that includes one of the aforementioned
> types among its members (including, recursively, a member of a
> subaggregate or contained union), or
> [...]
> Doesn't the second to last point mean that writing to member f64 then
> reading from member u32 is well specified in C99?

I don't have the C99 standard available, but such a thing would be
a notable departure from C89.

In C89, it is clear that the only time you can read a union with
"a different type" than you last stored into it, is in the case
where the two union elements have a common prefix, so at the lowest
level you are reading the same type, even if the aggregate type name
is different.
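A small sketch of the common-prefix case; the struct, union and function names are made up for the example. The shared leading int is a "common initial sequence" (C99 spells out the same guarantee in 6.5.2.3 #5), so inspecting it through either structure is permitted even though the value was stored through the other one.

```c
/* Hypothetical example types: both structures begin with an int tag. */
struct as_int   { int tag; int   i; };
struct as_float { int tag; float f; };

union tagged {
    struct as_int   ai;
    struct as_float af;
};

/* Store through the 'ai' member, then inspect the shared tag through
   'af'.  This is covered by the common-initial-sequence guarantee;
   reading af.f afterwards would not be, since float and int are not
   compatible types. */
int tag_through_other_member(int tag)
{
    union tagged t;
    t.ai.tag = tag;
    t.ai.i = 42;
    return t.af.tag;
}
```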

One cannot, though, expect this to work if the common prefix is not
exactly compatible at each element, as there could be differences in
padding. For example, one might know that sizeof(float) == sizeof(int)
but if the prefix in one version was a float, and the prefix in the
other version was an int, then the behaviour of reading the next value
afterwards is not defined, since the padding for float could be
different than the padding for int.

--
'ignorandus (Latin): "deserving not to be known"'
-- Journal of Self-Referentialism

Nov 15 '05 #7

Adam Warner

On Wed, 29 Jun 2005 22:39:39 +0200, Michael Mair wrote:

>> [... earlier discussion snipped ...]
>
> I do not have a standard handy right now, so I cannot prove the
> following by chapter and verse; AFAIR there is nothing explicitly
> stating that you can only access a union member you previously stored
> to but for something in the infamous Annex J.

"The following are unspecified: ... The value of a union member other
than the last one stored into (6.2.6.1)."

There appear to be two instances where this is unspecified in 6.2.6.1:

When a value is stored in an object of structure or union type,
including in a member object, the bytes of the object representation
that correspond to any padding bytes take unspecified values.42) The
values of padding bytes shall not affect whether the value of such an
object is a trap representation. Those bits of a structure or union
object that are in the same byte as a bit-field member, but are not
part of that member, shall similarly not affect whether the value of
such an object is a trap representation.

When a value is stored in a member of an object of union type, the
bytes of the object representation that do not correspond to that
member but do correspond to other members take unspecified values, but
the value of the union object shall not thereby become a trap
representation.

Unspecified instance 1 is not applicable when there are no corresponding
padding bytes to take unspecified values. I mapped two 4 byte integers
onto an 8 byte double (I checked the double was 8 bytes with an assertion).
As all members of a union are aligned to the same starting address, the two
member objects overlap perfectly.

Unspecified instance 2 is also not applicable because the bytes of both
object representations overlap perfectly.

> For members of the same size the only convincing argument I know
> (and saw once upon a time in c.l.c) is that an implementation could
> store different members in different places, e.g. the compiler stores
> a 64-bit floating point variable in a register and "leaves" the yet
> unused array where it is in memory. As there is nothing explicitly
> forbidding this, you could see a nasty surprise.
> One can come up with volatile, though.

"A union type describes an overlapping nonempty set of member objects
...." If member objects behave as if they are stored in different places
then they don't semantically overlap. I don't think this argument you
read is at all convincing. Regardless of how member objects within a union
are implemented they should behave _as if they overlap_.
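Adam's argument rests on every union member starting at the same address (C99 6.7.2.1 says a pointer to a union object, suitably converted, points to each of its members). A small sketch that checks this, together with the platform-dependent assumption that uint32_t[2] spans exactly the bytes of the double, might look like this; members_overlap_exactly() is a hypothetical name.

```c
#include <stddef.h>   /* offsetof */
#include <stdint.h>

union u {
    double f64;
    uint32_t u32[2];
};

/* Returns 1 when both members start at offset 0 (always true for a
   union) and the uint32_t array spans exactly the bytes of the double
   (true on typical 8-byte-double platforms, but not guaranteed). */
int members_overlap_exactly(void)
{
    return offsetof(union u, f64) == 0
        && offsetof(union u, u32) == 0
        && sizeof(uint32_t[2]) == sizeof(double)
        && sizeof(union u) == sizeof(double);
}
```

This only checks layout; it says nothing about whether a compiler must honour a read through the other member, which is the question the rest of the thread is about.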

Regards,
Adam

Nov 15 '05 #8

Tim Rentsch

Michael Mair <Mi**********@invalid.invalid> writes:

> Adam Warner wrote:
>
>> [... storing into one member of a union, accessing another ...]
>
> I do not have a standard handy right now, so I cannot prove the
> following by chapter and verse; AFAIR there is nothing explicitly
> stating that you can only access a union member you previously stored
> to but for something in the infamous Annex J.
> For members of the same size the only convincing argument I know
> (and saw once upon a time in c.l.c) is that an implementation could
> store different members in different places, e.g. the compiler stores
> a 64-bit floating point variable in a register and "leaves" the yet
> unused array where it is in memory. As there is nothing explicitly
> forbidding this, you could see a nasty surprise.
> One can come up with volatile, though.

My understanding is that the possibility of storing one member of a union
in different memory from another member arose from unclear
language in the standard, and that the unclear language is
expected to be addressed through a TC. See:

http://www.open-std.org/jtc1/sc22/wg...ocs/dr_283.htm
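As I understand it, the clarification that came out of DR 283 is the footnote C99 TC3 added to 6.5.2.3: if the member used to read a union is not the one last stored into, the relevant bytes are reinterpreted as the new type (and may turn out to be a trap representation). Under that reading, Adam's union-temporary approach can be sketched as the helper below; split_via_union() is a hypothetical name, and the code again assumes an 8-byte double. The GCC manual likewise documents reading a different union member than the one last written as allowed under -fstrict-aliasing, provided the access goes through the union type.

```c
#include <stdint.h>

union u {
    double f64;
    uint32_t u32[2];
};

/* Store into f64 and read back through u32.  Both accesses go through
   the union object itself, so the compiler can see that they alias. */
void split_via_union(double d, uint32_t out[2])
{
    union u tmp;
    tmp.f64 = d;
    out[0] = tmp.u32[0];
    out[1] = tmp.u32[1];
}
```

Taking the address of a member (for example &tmp.u32[0]) and reading through that pointer is a different matter: GCC's guarantee covers only accesses made through the union type.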

Nov 15 '05 #9

Michael Mair

Tim Rentsch wrote:

> Michael Mair <Mi**********@invalid.invalid> writes:
>> [... earlier discussion snipped ...]
>
> My understanding is that the possibility of storing one member of a union
> in different memory from another member arose from unclear
> language in the standard, and that the unclear language is
> expected to be addressed through a TC. See:
>
> http://www.open-std.org/jtc1/sc22/wg...ocs/dr_283.htm

Thank you very much!
So, this really is not outlawed but only to be used with care
(and, usually, in an implementation-defined way).
Cheers :-)
Michael

Nov 15 '05 #10
