
Misc Qns...

Jin
1. Will simply taking the address of a non-aligned variable (but not
dereferencing it) produce undefined behavior?
If it does, are there any known architectures on which this fails?

2. Is char[MB_CUR_MAX] large enough to hold any arbitrary multibyte char
beginning and ending in the initial shift state? I didn't quite
understand the standard (WG14/N869) 7.1.1 #7 (footnote 141).

3. Must the portable characters on the default "C" locale have the same
bit representation when used on an extended locale? IOW, must extended
locales be backwards compatible with the "C" locale?

4. Can the first byte of a shift-byte sequence in a multibyte string have
the same representation as any character in the portable character set
when in the initial shift state?

Thanks.
Nov 14 '05 #1
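On question 2, a minimal sketch of one conservative buffer choice. It does not settle the footnote question. It only relies on two things I am sure of: MB_LEN_MAX (from <limits.h>) is a compile-time constant that is at least as large as MB_CUR_MAX in any locale, and wctomb() never returns more than MB_CUR_MAX. The helper name encode_wide_char is hypothetical.

```c
#include <assert.h>
#include <limits.h>   /* MB_LEN_MAX */
#include <stdlib.h>   /* MB_CUR_MAX, wctomb, wchar_t */

/* Hypothetical helper (not from the thread): encode wc into out, which
   must have room for MB_LEN_MAX bytes. Returns the number of bytes
   written, or -1 if wc has no multibyte representation. */
int encode_wide_char(wchar_t wc, char out[MB_LEN_MAX])
{
    assert(out != NULL);
    wctomb(NULL, 0);        /* put wctomb's internal state in the initial shift state */
    return wctomb(out, wc); /* result never exceeds MB_CUR_MAX */
}
```

Sizing the buffer with MB_LEN_MAX also avoids making it a variable-length array, since MB_CUR_MAX is not required to be a constant expression.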
20 Replies


Jin wrote:
1. Will simply taking the address of a non-aligned variable (but not
dereferencing it) produce undefined behavior?

Not according to the _Standard_. Alignment restrictions are an
implementation detail, not one of the standard. My understanding is
that accessing data that is not aligned will just take more fetches
by the processor.

If it does, are there any known architectures on which this fails?

Could be, but I don't know all the architectures.

[snip -- multibyte char issues]

--
Thomas Matthews

C++ newsgroup welcome message:
http://www.slack.net/~shiva/welcome.txt
C++ Faq: http://www.parashift.com/c++-faq-lite
C Faq: http://www.eskimo.com/~scs/c-faq/top.html
alt.comp.lang.learn.c-c++ faq:
http://www.raos.demon.uk/acllc-c++/faq.html
Other sites:
http://www.josuttis.com -- C++ STL Library book

Nov 14 '05 #2

Thomas Matthews wrote:

Jin wrote:
1. Will simply taking the address of a non-aligned variable (but not
dereferencing it) produce undefined behavior?

Not according to the _Standard_. Alignment restrictions are an
implementation detail, not one of the standard.


No.
Code which relies on there being no alignment requirements
has undefined behavior.
When you have code that relies on
there being no alignment requirements,
all you know is that either it will work or it won't.
If it's going to fail from violating alignment requirements,
then you have no idea how it will fail: that's undefined behavior.

--
pete
Nov 14 '05 #3

In article <op**************@news.starhub.net.sg>, Jin <-> wrote:
1. Will simply taking the address of a non-aligned variable (but not
dereferencing it) produce undefined behavior?
If it does, are there any known architectures on which this fails?


How would you generate a non-aligned variable without invoking undefined
behavior?
(It's the implementation's responsibility to make sure any variables
it gives you are aligned appropriately, so if taking a pointer to them
gives you an unaligned pointer that would indicate a problem with the
implementation.)
dave

--
Dave Vandervies dj******@csclub.uwaterloo.ca

Note that printf() could be very useful on a microwave oven.
--Richard Heathfield in comp.lang.c
Nov 14 '05 #4

Dave Vandervies <dj******@csclub.uwaterloo.ca> scribbled the following:
In article <op**************@news.starhub.net.sg>, Jin <-> wrote:
1. Will simply taking the address of a non-aligned variable (but not
dereferencing it) produce undefined behavior?
If it does, are there any known architectures on which this fails?
How would you generate a non-aligned variable without invoking undefined
behavior?
(It's the implementation's responsibility to make sure any variables
it gives you are aligned appropriately, so if taking a pointer to them
gives you an unaligned pointer that would indicate a problem with the
implementation.)


I infer from your message that this:
int i[2];
(int *)((char *)i+1);
causes undefined behaviour.

--
/-- Joona Palaste (pa*****@cc.helsinki.fi) ------------- Finland --------\
\-- http://www.helsinki.fi/~palaste --------------------- rules! --------/
"There's no business like slow business."
- Tailgunner
Nov 14 '05 #5

In article <br**********@oravannahka.helsinki.fi>,
Joona I Palaste <pa*****@cc.helsinki.fi> wrote:
Dave Vandervies <dj******@csclub.uwaterloo.ca> scribbled the following:
In article <op**************@news.starhub.net.sg>, Jin <-> wrote:
1. Will simply taking the address of a non-aligned variable (but not
dereferencing it) produce undefined behavior?
If it does, are there any known architectures on which this fails?

How would you generate a non-aligned variable without invoking undefined
behavior?
(It's the implementation's responsibility to make sure any variables
it gives you are aligned appropriately, so if taking a pointer to them
gives you an unaligned pointer that would indicate a problem with the
implementation.)


I infer from your message that this:
int i[2];
(int *)((char *)i+1);
causes undefined behaviour.


I'd have to grovel through the standard (well, my copy of N869) to
be sure, but I think "might cause undefined behavior by generating an
unaligned pointer" is more accurate.

But it's not a pointer to a non-aligned variable generated by the
implementation; it's a (probably non-aligned) pointer to (sizeof(int)-1)
bytes of one int and 1 byte of another int, within a properly aligned
variable (array of 2 ints) generated by the implementation.

A slightly more reasonable way might be:
--------
double foo=SOME_WELL_DEFINED_VALUE;
char *bar=malloc(sizeof foo + 1); /*would be checked in real code, of course*/
double *baz;
memcpy(bar+1,&foo,sizeof foo);
/*Is this valid?*/
baz=(double *)(bar+1);
--------
but here baz is still not pointing to a non-aligned variable, it's
pointing to a non-aligned copy of a variable.

But the OP didn't ask about generating a non-aligned pointer using silly
pointer tricks; the question (possibly just poorly worded; see "pedant")
was about taking a pointer to a non-aligned variable.
(And, since silly pointer tricks require knowing what you're getting
yourself into anyways, I would like to think that they can safely be
excluded from a general question like the OP's.)
dave

--
Dave Vandervies dj******@csclub.uwaterloo.ca
[W]hen I am among non-physician Ph.D.s who go by "Doctor" then I like
to be called "Bachelor Petrofsky" in honor of my B.S..
--Al Petrofsky in comp.lang.scheme
Nov 14 '05 #6

Jin <-> writes:
1. Will simply taking the address of a non-aligned variable (but not
dereferencing it) produce undefined behavior?
If it does, are there any known architectures on which this fails?


As far as I know, you can't get a non-aligned variable in the first
place without invoking undefined behavior.

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <*> <http://www.sdsc.edu/~kst>
Schroedinger does Shakespeare: "To be *and* not to be"
(Note new e-mail address)
Nov 14 '05 #7

Joona I Palaste <pa*****@cc.helsinki.fi> wrote:
Dave Vandervies <dj******@csclub.uwaterloo.ca> scribbled the following:
In article <op**************@news.starhub.net.sg>, Jin <-> wrote:
1. Will simply taking the address of a non-aligned variable (but not
dereferencing it) produce undefined behavior?
If it does, are there any known architectures on which this fails?
How would you generate a non-aligned variable without invoking undefined
behavior?
(It's the implementation's responsibility to make sure any variables
it gives you are aligned appropriately, so if taking a pointer to them
gives you an unaligned pointer that would indicate a problem with the
implementation.)
I infer from your message that this:
int i[2];
(int *)((char *)i+1);
causes undefined behaviour.


You have created a pointer to a region of space that is probably
not properly aligned for an int. AFAIK, this is perfectly legal.
What is not legal is actually storing an int in this space.

E.g.

$ cat e.c
int main(void)
{
    int i[2];
    int *p = (int *)((char *)i+1);

    *p = 2;

    return 0;
}

$ ./a.out
Bus Error (core dumped)

Alex
Nov 14 '05 #8
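For contrast with the bus error above, a minimal sketch of the usual portable workaround: never form or dereference a misaligned int pointer at all, and move the bytes with memcpy instead. The helper names are hypothetical and not from the thread.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: store an int at a byte address that need not be
   aligned for int. Only character-type accesses touch the buffer. */
void store_int_unaligned(unsigned char *dst, int value)
{
    assert(dst != NULL);
    memcpy(dst, &value, sizeof value);
}

/* Hypothetical helper: read back an int stored by the function above. */
int load_int_unaligned(const unsigned char *src)
{
    int value;
    assert(src != NULL);
    memcpy(&value, src, sizeof value);
    return value;
}
```

Because the destination is only ever treated as an array of unsigned char, the alignment requirement of int never comes into play.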

Jin
On Tue, 16 Dec 2003 22:10:37 GMT, Keith Thompson <ks***@mib.org> wrote:
Jin <-> writes:
1. Will simply taking the address of a non-aligned variable (but not
dereferencing it) produce undefined behavior?
If it does, are there any known architectures on which this fails?


As far as I know, you can't get a non-aligned variable in the first
place without invoking undefined behavior.


unsigned char a[16];
unsigned int *b = (unsigned int*)&a[1];

Not really a "variable" in the strict sense, but you get the idea.
Nov 14 '05 #9

Jin wrote:

On Tue, 16 Dec 2003 22:10:37 GMT, Keith Thompson <ks***@mib.org> wrote:
Jin <-> writes:
1. Will simply taking the address of a non-aligned variable (but not
dereferencing it) produce undefined behavior?
If it does, are there any known architectures on which this fails?


As far as I know, you can't get a non-aligned variable in the first
place without invoking undefined behavior.


unsigned char a[16];
unsigned int *b = (unsigned int*)&a[1];

Not really a "variable" in the strict sense, but you get the idea.


That looks like an example of undefined behavior to me.

--
pete
Nov 14 '05 #10

On Tue, 16 Dec 2003 16:00:20 GMT, Thomas Matthews
<Th****************************@sbcglobal.net> wrote in comp.lang.c:
Jin wrote:
1. Will simply taking the address of a non-aligned variable (but not
dereferencing it) produce undefined behavior?

Not according to the _Standard_. Alignment restrictions are an
implementation detail, not one of the standard. My understanding is
that accessing data that is not aligned will just take more fetches
by the processor.


You are very, very wrong here. Some platforms, such as Intel x86,
will perform additional memory cycles, to be sure.

But there are at least two other types of responses in current
architectures, particularly RISC and/or DSP:

1. The low bits of the pointer are simply ignored. On this type of
architecture, an attempt to access a 32-bit value at any of the
addresses 0x1000, 0x1001, 0x1002, or 0x1003 picks up the four
bytes at 0x1000 through 0x1003 inclusive, even if you expected a
32-bit read through a pointer containing 0x1003 to pick up
that byte plus 0x1004, 0x1005, and 0x1006.

2. The processor platform performs automatic alignment checks and
generates a trap or exception if a multi-byte object is accessed at an
address with incorrect alignment. If there is an operating system
involved, it generally terminates the program that did this with
something called "sigbus" or some such.

--
Jack Klein
Home: http://JK-Technology.Com
FAQs for
comp.lang.c http://www.eskimo.com/~scs/C-faq/top.html
comp.lang.c++ http://www.parashift.com/c++-faq-lite/
alt.comp.lang.learn.c-c++ ftp://snurse-l.org/pub/acllc-c++/faq
Nov 14 '05 #11
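The first behavior described above (low address bits ignored) can be modelled in a few lines. This is only an illustration of the address arithmetic, assuming 32-bit words on 4-byte boundaries; it is not code for any particular processor, and the function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: the address a 32-bit load would really use on a
   machine that ignores the two low bits of the pointer. */
uint32_t effective_word_address(uint32_t addr)
{
    uint32_t word = addr & ~(uint32_t)3;  /* clear bits 0 and 1 */
    assert(word % 4 == 0);                /* always a word boundary */
    return word;
}
```

Under this model a load through a pointer holding 0x1003 silently reads 0x1000 through 0x1003, which is exactly the surprise the post warns about.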

On Wed, 17 Dec 2003 02:53:20 GMT, pete <pf*****@mindspring.com> wrote
in comp.lang.c:
Jin wrote:

On Tue, 16 Dec 2003 22:10:37 GMT, Keith Thompson <ks***@mib.org> wrote:
Jin <-> writes:
> 1. Will simply taking the address of a non-aligned variable (but not
> dereferencing it) produce undefined behavior?
> If it does, are there any known architectures on which this fails?

As far as I know, you can't get a non-aligned variable in the first
place without invoking undefined behavior.


unsigned char a[16];
unsigned int *b = (unsigned int*)&a[1];

Not really a "variable" in the strict sense, but you get the idea.


That looks like an example of undefined behavior to me.


Look again, it's perfectly well defined. Can't fail or trap.

Using the pointer to read or write a value of type int, on the other
hand, can fail or trap.

--
Jack Klein
Home: http://JK-Technology.Com
FAQs for
comp.lang.c http://www.eskimo.com/~scs/C-faq/top.html
comp.lang.c++ http://www.parashift.com/c++-faq-lite/
alt.comp.lang.learn.c-c++ ftp://snurse-l.org/pub/acllc-c++/faq
Nov 14 '05 #12

Jack Klein <ja*******@spamcop.net> writes:
On Wed, 17 Dec 2003 02:53:20 GMT, pete <pf*****@mindspring.com> wrote
in comp.lang.c:
Jin wrote:

On Tue, 16 Dec 2003 22:10:37 GMT, Keith Thompson <ks***@mib.org> wrote:

> Jin <-> writes:
>> 1. Will simply taking the address of a non-aligned variable (but not
>> dereferencing it) produce undefined behavior?
>> If it does, are there any known architectures on which this fails?
>
> As far as I know, you can't get a non-aligned variable in the first
> place without invoking undefined behavior.
>

unsigned char a[16];
unsigned int *b = (unsigned int*)&a[1];

Not really a "variable" in the strict sense, but you get the idea.


That looks like an example of undefined behavior to me.


Look again, it's perfectly well defined. Can't fail or trap.


6.3.2.3#7: "A pointer to an object or incomplete type may be converted to
a pointer to a different object or incomplete type. If the resulting
pointer is not correctly aligned for the pointed-to type, the behavior is
undefined. [...]".

Martin
Nov 14 '05 #13
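Given 6.3.2.3#7 as quoted above, one defensive pattern is to test the address before converting the pointer. A minimal sketch with two loud assumptions: uintptr_t exists (it is an optional type) and converting a pointer to it yields a flat byte address, which is implementation-defined. The caller is expected to pass an alignment such as _Alignof(int), which is C11 and so postdates N869. The function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns 1 if p is a multiple of alignment, else 0.
   Assumes the pointer-to-uintptr_t conversion gives a flat byte address,
   which the standard does not guarantee. */
int is_aligned_for(const void *p, size_t alignment)
{
    assert(alignment != 0);
    return (uintptr_t)p % alignment == 0;
}
```

A caller would convert an unsigned char * to int * only when is_aligned_for(ptr, _Alignof(int)) returns 1, and fall back to memcpy otherwise.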

Jack Klein wrote:
On Tue, 16 Dec 2003 16:00:20 GMT, Thomas Matthews
<Th****************************@sbcglobal.net> wrote in comp.lang.c:

Jin wrote:

1. Will simply taking the address of a non-aligned variable (but not
dereferencing it) produce undefined behavior?


Not according to the _Standard_. Alignment restrictions are an
implementation detail, not one of the standard. My understanding is
that accessing data that is not aligned will just take more fetches
by the processor.

You are very, very wrong here. Some platforms, such as Intel x86,
will perform additional memory cycles, to be sure.

But there are at least two other types of responses in current
architectures, particularly RISC and/or DSP:

1. The low bits of the pointer are simply ignored. On this type of
architecture, an attempt to access a 32-bit value at any of the
addresses 0x1000, 0x1001, 0x1002, or 0x1003 picks up the four
bytes at 0x1000 through 0x1003 inclusive, even if you expected a
32-bit read through a pointer containing 0x1003 to pick up
that byte plus 0x1004, 0x1005, and 0x1006.

2. The processor platform performs automatic alignment checks and
generates a trap or exception if a multi-byte object is accessed at an
address with incorrect alignment. If there is an operating system
involved, it generally terminates the program that did this with
something called "sigbus" or some such.


#2 may also direct the trap or exception to a kernel handler that fixes
up the access with the expected results, using byte loads. Of course,
it's horrendously slow. It's sometimes used to help poorly written
(most often x86-centric) software limp along on things like ARM Linux
and such.
Mark F. Haigh
mf*****@sbcglobal.net
Nov 14 '05 #14

Martin Dickopp wrote:
Jack Klein <ja*******@spamcop.net> writes:

On Wed, 17 Dec 2003 02:53:20 GMT, pete <pf*****@mindspring.com> wrote
in comp.lang.c:

Jin wrote:

On Tue, 16 Dec 2003 22:10:37 GMT, Keith Thompson <ks***@mib.org> wrote:
>Jin <-> writes:
>
>>1. Will simply taking the address of a non-aligned variable (but not
>> dereferencing it) produce undefined behavior?
>> If it does, are there any known architectures on which this fails?
>
>As far as I know, you can't get a non-aligned variable in the first
>place without invoking undefined behavior.
>

unsigned char a[16];
unsigned int *b = (unsigned int*)&a[1];

Not really a "variable" in the strict sense, but you get the idea.

That looks like an example of undefined behavior to me.


Look again, it's perfectly well defined. Can't fail or trap.

6.3.2.3#7: "A pointer to an object or incomplete type may be converted to
a pointer to a different object or incomplete type. If the resulting
pointer is not correctly aligned for the pointed-to type, the behavior is
undefined. [...]".

Martin


AARGH!! C99 is truly a minefield.

My C89 draft says (haven't looked at my hardcopy C90 + TC1 yet, as it's
at work):

3.3.4, Semantics

[...] A pointer to an object or incomplete type may be converted to a
pointer to a different object type or a different incomplete type. The
resulting pointer might not be valid if it is improperly aligned for the
type pointed to. It is guaranteed, however, that a pointer to an object
of a given alignment may be converted to a pointer to an object of the
same alignment or a less strict alignment and back again; the result
shall compare equal to the original pointer. (An object that has
character type has the least strict alignment.)

This would appear to be directly in conflict with your quoted 6.3.2.3#7,
above. I see no reason why this change was made -- if anything, I
believe 6.3.2.3#7 should state:

"A pointer to an object or incomplete type may be converted to a pointer
to a different object or incomplete type. If the resulting pointer is
not correctly aligned (50) for the pointed-to type, the pointer is not
valid, and dereferencing the pointer is implementation-defined."

Perhaps someone with insider knowledge can comment?
Mark F. Haigh
mf*****@sbcglobal.net
Nov 14 '05 #15

In article <3g******************@newssvr29.news.prodigy.com >,
"Mark F. Haigh" <mf*****@sbcglobal.ten> wrote:
My C89 draft says (haven't looked at my hardcopy C90 + TC1 yet, as it's
at work):

3.3.4, Semantics

[...] A pointer to an object or incomplete type may be converted to a
pointer to a different object type or a different incomplete type. The
resulting pointer might not be valid if it is improperly aligned for the
type pointed to. It is guaranteed, however, that a pointer to an object
of a given alignment may be converted to a pointer to an object of the
same alignment or a less strict alignment and back again; the result
shall compare equal to the original pointer. (An object that has
character type has the least strict alignment.)

If you convert a correctly aligned (pointer to X) to a (pointer to Y)
and Y has less strict alignment, then the resulting pointer _will_ be
correctly aligned; that is what "less strict alignment" means. And you
can also convert it back: Not all correctly aligned (pointer to Y) are
correctly aligned (pointer to X), but those that were created by casting
a correctly aligned (pointer to X) will be correctly aligned.

This would appear to be directly in conflict with your quoted 6.3.2.3#7,
above. I see no reason why this change was made -- if anything, I
believe 6.3.2.3#7 should state:

"A pointer to an object or incomplete type may be converted to a pointer
to a different object or incomplete type. If the resulting pointer is
not correctly aligned (50) for the pointed-to type, the pointer is not
valid, and dereferencing the pointer is implementation-defined."


That would make it illegal for the implementation to produce a valid,
correctly aligned, but different pointer. With "undefined behavior" this
would be ok.
Nov 14 '05 #16
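The round trip that both the C89 and C99 wording guarantee can be sketched as follows: a correctly aligned int pointer goes to a character pointer (the least strict alignment) and back, and the result compares equal to the original. The function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: converts an int pointer to a character pointer
   and back again. */
int *round_trip_through_char(int *p)
{
    unsigned char *bytes;
    assert(p != NULL);
    bytes = (unsigned char *)p;  /* to a less strictly aligned type */
    return (int *)bytes;         /* back again: compares equal to p */
}
```

The guarantee covers only pointers that started out correctly aligned for int. It says nothing about (int *)(bytes + 1).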

Mark F. Haigh wrote:

(snip regarding unaligned access and pointers)
#2 may also direct the trap or exception to a kernel handler that fixes
up the access with the expected results, using byte loads. Of course,
it's horrendously slow. It's sometimes used to help poorly written
(most often x86-centric) software limp along on things like ARM Linux
and such.


Mostly because Fortran COMMON requires no padding bytes, so it is
easy to generate unaligned data.

(At least it used to. It may have changed by now.)

-- glen

Nov 14 '05 #17

Mark F. Haigh wrote:

(snip)
This would appear to be directly in conflict with your quoted 6.3.2.3#7,
above. I see no reason why this change was made -- if anything, I
believe 6.3.2.3#7 should state:

"A pointer to an object or incomplete type may be converted to a pointer
to a different object or incomplete type. If the resulting pointer is
not correctly aligned (50) for the pointed-to type, the pointer is not
valid, and dereferencing the pointer is implementation-defined."

Perhaps someone with insider knowledge can comment?


I believe there are machines where char pointers and
int pointers have a different representation. On word
addressable machines that have the ability to address
parts of words, this would likely be true. The PDP-10
might be one example, though there aren't many C compilers
for it to test this on.

Still, it is likely that just creating the pointer doesn't
cause any problems, but the standard allows it to.

-- glen

Nov 14 '05 #18

1. Will simply taking the address of a non-aligned variable (but not
dereferencing it) produce undefined behavior?
If it does, are there any known architectures on which this fails?


unsigned char a[16];
unsigned int *b = (unsigned int*)&a[1];


Note that a[i] means *(a+i), so, strictly speaking, evaluating &a[1]
involves dereferencing that pointer (the Standard mentions this).

Also, b could point to an int boundary (there's no requirement that a[]
be aligned, and it could have started 1 byte before the int boundary).

Assuming the intent is to create a pointer that's definitely not on
an int boundary, try this:
int a;
char *a_ptr = (char *)&a;
int *b = (int *) (a_ptr+1);

On one platform I develop for (m68000 in an embedded device), this
causes a runtime panic (the device's equivalent of Windows BSOD).
(Luckily, the compiler issues a warning about "bad alignment,
potential run-time error" if you actually do this).

Despite what other posters have said, the Standard is very correct
and this is undefined behaviour, because of platforms where a hardware
exception occurs if you load a memory access register with an
inaccessible value.

As an aside, for the same reason, it's also undefined behaviour to
construct a pointer that points to something that might not be
accessible by your process, eg.
a_ptr--;
causes undefined behaviour.

Another platform I develop for has three-byte pointers. It has 16-bit int,
and a 64k code-page and a 64k data-page, so the third byte of the pointer
stores whether it's a code-page pointer or a data-page pointer.
(Obviously you can't do tricks like casting pointers to ints, on this
platform). It's not a far stretch of the imagination from this, to a
platform where char pointers and int pointers have different forms.
Nov 14 '05 #19
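On the a_ptr-- aside above: a pointer one past the end of an object may be formed, but a pointer before its start may not. A minimal sketch of walking a buffer backwards without ever computing buf - 1. The function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: sums n bytes from last to first. The pointer is
   decremented before each use, so it never goes below buf. */
unsigned sum_backwards(const unsigned char *buf, size_t n)
{
    const unsigned char *p;
    unsigned total = 0;
    assert(buf != NULL);
    p = buf + n;          /* one past the end: valid to form, not to read */
    while (p != buf) {
        --p;
        total += *p;
    }
    return total;
}
```

The tempting alternative, for (p = buf + n - 1; p >= buf; --p), computes buf - 1 on its final decrement, which is the undefined case described in the post.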

ol*****@inspire.net.nz (Old Wolf) writes:
unsigned char a[16];
unsigned int *b = (unsigned int*)&a[1];


Note that a[i] means *(a+i), so, strictly speaking, evaluating &a[1]
involves dereferencing that pointer (the Standard mentions this).


I think you are misunderstanding 6.5.3.2#3:

| The unary & operator returns the address of its operand. If the operand
| has type ``type'', the result has type ``pointer to type''. If the
| operand is the result of a unary * operator, neither that operator nor
| the & operator is evaluated and the result is as if both were omitted,
| except that the constraints on the operators still apply and the result
| is not an lvalue. Similarly, if the operand is the result of a []
| operator, neither the & operator nor the unary * that is implied by the
| [] is evaluated and the result is as if the & operator were removed and
| the [] operator were changed to a + operator. Otherwise, the result is a
| pointer to the object or function designated by its operand.

That means the opposite of what you said: the expression `&a[1]' does
*not* behave as if `a' were dereferenced. Instead, it behaves like `a+1'.

Martin
Nov 14 '05 #20
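A small sketch of what 6.5.3.2#3 implies in practice, assuming a C99 implementation: &a[i] is evaluated as a + i with nothing dereferenced, so it is fine even for the one-past-the-end index. The function name and the static array are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: distance in elements from the start of a 16-byte
   array to &a[i]. No element is read or written. */
ptrdiff_t offset_of_element_address(size_t i)
{
    static unsigned char a[16];
    assert(i <= 16);     /* 16 is one past the end, still allowed */
    return &a[i] - a;    /* &a[i] behaves like a + i */
}
```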

On Wed, 17 Dec 2003 10:58:33 GMT, glen herrmannsfeldt
<ga*@ugcs.caltech.edu> wrote:
Mark F. Haigh wrote:

(snip regarding unaligned access and pointers)
#2 may also direct the trap or exception to a kernel handler that fixes
up the access with the expected results, using byte loads. Of course,
it's horrendously slow. It's sometimes used to help poorly written
(most often x86-centric) software limp along on things like ARM Linux
and such.
Mostly because Fortran COMMON requires no padding bytes, so it is
easy to generate unaligned data.

To be clear, the Fortran standard *requires* that there *not* be any padding.
(At least it used to. It may have changed by now.)

COMMON hasn't. But there is a new improved and often preferable
*alternative*, a MODULE containing data (and possibly routines), which
does allow alignment and also (standardly) provides better checking.

Much as even C99 still has K&R1 functions, but prototypes are better.

- David.Thompson1 at worldnet.att.net
Nov 14 '05 #21
