Why is prefix increment faster than postfix increment?

In article <11*********************@g44g2000cwa.googlegroups. com>,
jr********@hotmail.com wrote:

I heard people saying prefix increment is faster than postfix
increment, but I don't know what's the difference. They both are
i = i+1.

i++
++i

Please advise. thanks!!

The C answer and the C++ answer for the built-in operators are: Whoever
makes that claim is not only clueless, but is also the type of dangerous
clueless person who thinks they have a clue. Don't take _any_ advice from
them. Ever.

I don't think you are interested in the C++ answer for operator
overloading yet.

Oct 22 '05 #4

Dave Rahardja

On 22 Oct 2005 01:01:37 -0700, jr********@hotmail.com wrote:

I heard people saying prefix increment is faster than postfix
increment, but I don't know what's the difference. They both are
i = i+1.

i++
++i

Please advise. thanks!!

When i is a C++ object with the appropriate operators defined, the prefix
version may eliminate the creation of a temporary object.
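
To make the temporary concrete, here is a minimal sketch that is not from the thread. It assumes the conventional way of writing the two overloads; `Counter`, `g_copies`, `copies_for_prefix` and `copies_for_postfix` are hypothetical names used only for illustration. The copy constructor is instrumented so the copies can be counted.

```cpp
#include <cassert>

// Number of Counter copies made so far (illustrative instrumentation only).
static int g_copies = 0;

struct Counter {
    int value;

    explicit Counter(int v = 0) : value(v) {}
    Counter(const Counter& other) : value(other.value) { ++g_copies; }

    // Prefix form: increment in place and return a reference; no copy needed.
    Counter& operator++() {
        ++value;
        return *this;
    }

    // Postfix form: must hand back the old value, so it copies first.
    Counter operator++(int) {
        Counter old(*this);  // the temporary that the prefix form avoids
        ++value;
        return old;
    }
};

// Returns the number of copies made by n prefix increments.
int copies_for_prefix(int n) {
    g_copies = 0;
    Counter c;
    for (int k = 0; k < n; ++k) ++c;
    return g_copies;
}

// Returns the number of copies made by n postfix increments.
int copies_for_postfix(int n) {
    g_copies = 0;
    Counter c;
    for (int k = 0; k < n; ++k) c++;
    return g_copies;
}
```

Calling `copies_for_prefix(5)` reports no copies, while `copies_for_postfix(5)` reports at least one per increment. The copy is observable here only because the copy constructor has a side effect; for a trivially copyable type an optimizer is free to remove it when the result is unused.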

-dr

Oct 22 '05 #5

Kaz Kylheku

jr********@hotmail.com wrote:

I heard people saying prefix increment is faster than postfix
increment, but I don't know what's the difference. They both are
i = i+1.

i++
++i

If you are throwing away the result, there is no semantic difference at
all, because the only difference between ++i and i++ is whether i or i
+ 1 is returned.

If you don't throw away the result, then they are different operators;
you can't substitute one for the other without making compensating
changes in the surrounding program! You then end up with two different
programs that you have to compare as such. If there is a performance
difference between those programs, it's a result of not just changing
the i++ to ++i, or vice versa, but also a consequence of those other
compensating changes, and how your particular compiler and machine
deals with everything as a whole.
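
A small sketch of both cases for a built-in int, added for illustration (the function names are made up here, not taken from any post). In the two loops the value of the increment expression is discarded, so the loops behave identically; in the last two functions the value is used, and the operators yield different results.

```cpp
#include <cassert>

// Result of the increment discarded: postfix in the loop header.
int sum_with_postfix(int n) {
    int sum = 0;
    for (int i = 0; i < n; i++) sum += i;
    return sum;
}

// Result of the increment discarded: prefix in the loop header.
int sum_with_prefix(int n) {
    int sum = 0;
    for (int i = 0; i < n; ++i) sum += i;
    return sum;
}

// Result used: postfix yields the value i had before the increment.
int value_of_postfix(int i) { return i++; }

// Result used: prefix yields the value i has after the increment.
int value_of_prefix(int i) { return ++i; }
```

Swapping one form for the other in the loop headers changes nothing observable. Swapping them in the last two functions changes the returned value, which is why any timing comparison there is really a comparison of two different programs.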

Note that in C++ (this is cross-posted to comp.lang.c++), both forms of
the operator can be user-defined. If you are dealing with a choice
between two forms of the user-defined operator, and performance is
critical, you obviously have to take that into consideration!

Oct 22 '05 #6

Gordon Burditt

I heard people saying prefix increment is faster than postfix
increment, but I don't know what's the difference. They both are

Any claim that X is faster than Y that doesn't specify a specific
compiler and platform is FALSE. Even if Y is "do X 1000000 times".

Gordon L. Burditt

Oct 22 '05 #7

Dave Rahardja

On Sat, 22 Oct 2005 17:50:01 -0000, go***********@burditt.org (Gordon Burditt)
wrote:

I heard people saying prefix increment is faster than postfix
increment, but I don't know what's the difference. They both are

Any claim that X is faster than Y that doesn't specify a specific
compiler and platform is FALSE. Even if Y is "do X 1000000 times".

I think the original poster heard one of those "rules of thumb" that aren't
absolutely true or mathematically proven, but are considered a good
approximation of the truth most of the time. I also assume that the OP wanted
to know what the motivation was behind the saying.

-dr

Oct 23 '05 #8

Greg

jr********@hotmail.com wrote:

I heard people saying prefix increment is faster than postfix
increment, but I don't know what's the difference. They both are
i = i+1.

i++
++i

Please advise. thanks!!

Consider this program:

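#include <iostream>
#include <vector>

void PrintElement(const std::vector<int>::iterator& i)
{
    std::cout << *i << " ";
}

int main()
{
    std::vector<int> v;

    v.push_back(1); v.push_back(2); v.push_back(3);

    std::vector<int>::iterator i = v.begin();

    // Postfix: the value i had before the increment is passed.
    while (i != v.end())
        PrintElement( i++ );

    // Prefix: i itself is passed after being decremented.
    while (i != v.begin())
        PrintElement( --i );
}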

Which PrintElement() call is likely the more efficient one: the one with
the post-incremented parameter (i++) or the pre-decremented (--i)
parameter? Or is there no reason to think that there would be a
difference?

In this case, it is likely that the first PrintElement call with the
postfix incremented parameter has more overhead than the second, because
the compiler must increment the iterator i before it calls
PrintElement. But when the call to PrintElement is made, the compiler
must pass the value of i (or a reference to a temporary copy of i) that
i had before it was incremented. Therefore the compiler has little
choice but to make a copy of i before incrementing i, so that it has an
iterator with which it can call PrintElement.

The second PrintElement call applies a prefix operator to the parameter;
the compiler can therefore pass i directly to PrintElement, since its
decremented value is the appropriate value to pass to PrintElement.

Of course, the difference is not likely to be great, but there is
nonetheless a basis for expecting postfix operators to be less
efficient than prefix operators, especially when applied to parameters
in a function call.

Greg

Oct 23 '05 #9

Martin Ambuhl

Greg wrote:

Consider this program:

void PrintElement(const std::vector<int>::iterator& i)
{
std::cout << *i << " ";
}

[etc.]

When you respond to posts which are crossposted to <news:comp.lang.c>
and <news:comp.lang.c++>, try to give answers that are acceptable in
both. You have posted a bunch of compilation errors to
<news:comp.lang.c>. There is no need to consider your code at all.

Oct 23 '05 #10

Greg

Gordon Burditt wrote:

I heard people saying prefix increment is faster than postfix
increment, but I don't know what's the difference. They both are

Any claim that X is faster than Y that doesn't specify a specific
compiler and platform is FALSE. Even if Y is "do X 1000000 times".

Not necessarily. If one can show that Y performs every operation that X
performs, and then has to perform additional operations outside of that
set and that require a measurable amount of time to complete, then one
would have successfully proven that X is faster than Y. And in fact
that is the case here: the postincrement operator may have to perform
an additional copy operation that the prefix version does not have to
perform. Otherwise the amount of work required of each is the same.

Greg

Oct 23 '05 #11

Greg wrote:

Gordon Burditt wrote:
I heard people saying prefix increment is faster than postfix
increment, but I don't know what's the difference. They both are

Any claim that X is faster than Y that doesn't specify a specific
compiler and platform is FALSE. Even if Y is "do X 1000000 times".

Not necessarily. If one can show that Y performs every operation that X
performs, and then has to perform additional operations outside of that
set and that require a measurable amount of time to complete, then one
would have successfully proven that X is faster than Y. And in fact
that is the case here: the postincrement operator may have to perform
an additional copy operation that the prefix version does not have to
perform. Otherwise the amount of work required of each is the same.

Nope. Maybe in source code but not necessarily in the generated code.
The optimisation may have a stage which looks for certain patterns;
it is very well possible that the seemingly slower code qualifies for
the optimisation but the "faster" code does not. This may be the
compiler's fault or yours, depending on the problem at hand.

So, you are right most of the time but not always. However, C++ coding
guidelines often cite ++()/()++, including the caveats, as an
example of "avoiding premature pessimization" and rightly so.
Cheers
Michael
--
E-Mail: Mine is an /at/ gmx /dot/ de address.

Oct 23 '05 #12

Gordon Burditt

I heard people saying prefix increment is faster than postfix
increment, but I don't know what's the difference. They both are

Any claim that X is faster than Y that doesn't specify a specific
compiler and platform is FALSE. Even if Y is "do X 1000000 times".

Not necessarily. If one can show that Y performs every operation that X
performs, and then has to perform additional operations outside of that
set and that require a measurable amount of time to complete, then one
would have successfully proven that X is faster than Y.

Not unless you can prove that the compiler generates the code that
way, and in order to do that, you have to choose a specific compiler.
You cannot test *ALL* compilers, including ones written between
your test and publishing the results. And you can't prove, for
example, that the compiler doesn't add a time-wasting loop to X but
not to Y, without referring to a specific compiler.

There are some situations where doing more can be faster.
For example, repeating this statement 1000000 times could be
faster than doing it 10 times:

    unsigned int x;

    x = x << 1;

since if the width of x is less than 1000000 bits, this results in
a constant independent of the original value of x, and the compiler
might realize this, but if the width of x is greater than 10, and it
must be, it has to shift x.

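
The arithmetic behind that claim can be checked with a short sketch (added here for illustration; `shift_left_repeatedly` is a hypothetical helper, and nothing is assumed about what any particular compiler does with the loop). Shifting an unsigned int left by one position as many times as it has value bits always leaves zero, while ten shifts cannot, because unsigned int has at least 16 bits.

```cpp
#include <cassert>
#include <limits>

// Applies x = x << 1 the given number of times and returns the result.
// Each single-bit shift of an unsigned value is well defined; bits that
// leave the top are simply discarded.
unsigned int shift_left_repeatedly(unsigned int x, unsigned long times) {
    for (unsigned long k = 0; k < times; ++k) {
        x = x << 1;
    }
    return x;
}
```

For example, `shift_left_repeatedly(12345u, 1000000)` is 0 for any starting value, whereas `shift_left_repeatedly(1u, 10)` is 1024 and depends on the input. Whether a compiler actually folds the long loop into a constant is exactly the kind of thing that has to be checked per compiler.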
And in fact
that is the case here: the postincrement operator may have to perform
an additional copy operation that the prefix version does not have to
perform. Otherwise the amount of work required of each is the same.

You're assuming things about the underlying instruction set that
may not be true, and assuming that the compiler doesn't do a
poor job of generating code for one and a good job for the other.
The effect of a cache hit/miss can also do funny things to code
that looks like it should run in the same time as other code.

Gordon L. Burditt

Oct 23 '05 #13

In article <11**********************@f14g2000cwb.googlegroups .com>,
"Greg" <gr****@pacbell.net> wrote:

Gordon Burditt wrote:
I heard people saying prefix increment is faster than postfix
increment, but I don't know what's the difference. They both are

Any claim that X is faster than Y that doesn't specify a specific
compiler and platform is FALSE. Even if Y is "do X 1000000 times".

Not necessarily. If one can show that Y performs every operation that X
performs, and then has to perform additional operations outside of that
set and that require a measurable amount of time to complete, then one
would have successfully proven that X is faster than Y.

One would have proven no such thing.

Consider this:

double x, y;
memcpy (&x, &y, sizeof (x) - 1);
memcpy (&x, &y, sizeof (x));

A good compiler will translate the second memcpy to a simple assignment
of a double variable, for example:

Load double y into register
Store register into double x.

In fact, if this is the only time the address of x and y is taken, it is
quite possible that x and y are still kept in floating-point registers,
in which case this is a very fast register-to-register assignment.

On the other hand, the first memcpy will have a much more difficult
implementation, even though it copies one byte less. Not only will it
produce much more code, on most current processors that code will
execute considerably slower.
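
For reference, a minimal sketch (not part of the original post) showing that the full-size memcpy and a plain assignment leave the destination with the same bytes, which is what allows a compiler to treat one as the other. The helper names are hypothetical, and the sketch says nothing about the code any given compiler generates.

```cpp
#include <cassert>
#include <cstring>

// Copies every byte of src into a fresh double using memcpy.
double copy_with_memcpy(double src) {
    double dst = 0.0;
    std::memcpy(&dst, &src, sizeof dst);
    return dst;
}

// Copies src into a fresh double using ordinary assignment.
double copy_with_assignment(double src) {
    double dst = 0.0;
    dst = src;
    return dst;
}

// True when the two doubles have identical object representations.
bool same_bytes(double a, double b) {
    return std::memcmp(&a, &b, sizeof a) == 0;
}
```

With these helpers, `same_bytes(copy_with_memcpy(3.25), copy_with_assignment(3.25))` holds. The partial copy of `sizeof (x) - 1` bytes has no such single-assignment equivalent, which is the asymmetry the example relies on.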

Oct 23 '05 #14

Jack Klein

On 22 Oct 2005 01:01:37 -0700, jr********@hotmail.com wrote in
comp.lang.c:

I heard people saying prefix increment is faster than postfix
increment, but I don't know what's the difference. They both are
i = i+1.

What people are these? What are their credentials so that you, or we,
should place any confidence in their opinions?

i++
++i

Please advise. thanks!!

One possible item of advice would be for you to associate with
different people.

--
Jack Klein
Home: http://JK-Technology.Com
FAQs for
comp.lang.c http://www.eskimo.com/~scs/C-faq/top.html
comp.lang.c++ http://www.parashift.com/c++-faq-lite/
alt.comp.lang.learn.c-c++
http://www.contrib.andrew.cmu.edu/~a...FAQ-acllc.html

Oct 23 '05 #15

"Christian Bau" <ch***********@cbau.freeserve.co.uk> wrote in message
news:ch*********************************@slb-newsm1.svr.pol.co.uk...

In article <11**********************@f14g2000cwb.googlegroups .com>,
"Greg" <gr****@pacbell.net> wrote:
Gordon Burditt wrote:
> >I heard people saying prefix increment is faster than postfix
> >increment, but I don't know what's the difference. They both are
>
> Any claim that X is faster than Y that doesn't specify a specific
> compiler and platform is FALSE. Even if Y is "do X 1000000 times".

Not necessarily. If one can show that Y performs every operation that X
performs, and then has to perform additional operations outside of that
set and that require a measurable amount of time to complete, then one
would have successfully proven that X is faster than Y.

One would have proven no such thing.

Consider this:

double x, y;
memcpy (&x, &y, sizeof (x) - 1);
memcpy (&x, &y, sizeof (x));

A good compiler will translate the second memcpy to a simple assignment
of a double variable, for example:

Load double y into register
Store register into double x.

In fact, if this is the only time the address of x and y is taken, it is
quite possible that x and y are still kept in floating-point registers,
in which case this is a very fast register-to-register assignment.

On the other hand, the first memcpy will have a much more difficult
implementation, even though it copies one byte less. Not only will it
produce much more code, on most current processors that code will
execute considerably slower.

Nonsense. First one will probably generate hardware exception, and
second one will probably work. Then again first one would be much faster
as it is simple nop where sizeof(x) == 1 but second one would copy contents.

Greetings, Bane.

Oct 23 '05 #16

Branimir Maksimovic wrote:

"Christian Bau" <ch***********@cbau.freeserve.co.uk> wrote in message
news:ch*********************************@slb-newsm1.svr.pol.co.uk...
In article <11**********************@f14g2000cwb.googlegroups .com>,
"Greg" <gr****@pacbell.net> wrote:
Gordon Burditt wrote:

>I heard people saying prefix increment is faster than postfix
>increment, but I don't know what's the difference. They both are

Any claim that X is faster than Y that doesn't specify a specific
compiler and platform is FALSE. Even if Y is "do X 1000000 times".

Not necessarily. If one can show that Y performs every operation that X
performs, and then has to perform additional operations outside of that
set and that require a measurable amount of time to complete, then one
would have successfully proven that X is faster than Y.

One would have proven no such thing.

Consider this:

double x, y;
memcpy (&x, &y, sizeof (x) - 1);
memcpy (&x, &y, sizeof (x));

A good compiler will translate the second memcpy to a simple assignment
of a double variable, for example:

Load double y into register
Store register into double x.

In fact, if this is the only time the address of x and y is taken, it is
quite possible that x and y are still kept in floating-point registers,
in which case this is a very fast register-to-register assignment.

On the other hand, the first memcpy will have a much more difficult
implementation, even though it copies one byte less. Not only will it
produce much more code, on most current processors that code will
execute considerably slower.

Nonsense. First one will probably generate hardware exception, and
second one will probably work. Then again first one would be much faster
as it is simple nop where sizeof(x) == 1 but second one would copy contents.

The second one is guaranteed to work and have the same effect
as x = y, the first may lead to a trap representation of x but
can also work.
Are you sure that you are aware of the semantics of memcpy()?
Cheers
Michael
--
E-Mail: Mine is an /at/ gmx /dot/ de address.

Oct 24 '05 #17

John Carson

"Jack Klein" <ja*******@spamcop.net> wrote in message
news:6v********************************@4ax.com

On 22 Oct 2005 01:01:37 -0700, jr********@hotmail.com wrote in
comp.lang.c:
I heard people saying prefix increment is faster than postfix
increment, but I don't know what's the difference. They both are
i = i+1.

What people are these? What are their credentials so that you, or we,
should place any confidence in their opinions?

Herb Sutter and Andrei Alexandrescu, C++ Coding Standards, p.50: "The prefix
form is semantically equivalent, just as much typing, and often slightly
more efficient by creating one less object. This is not premature
optimization; it is avoiding premature pessimization."
--
John Carson

Oct 24 '05 #18

John Carson wrote:

"Jack Klein" <ja*******@spamcop.net> wrote in message
news:6v********************************@4ax.com
On 22 Oct 2005 01:01:37 -0700, jr********@hotmail.com wrote in
comp.lang.c:
I heard people saying prefix increment is faster than postfix
increment, but I don't know what's the difference. They both are
i = i+1.

What people are these? What are their credentials so that you, or we,
should place any confidence in their opinions?

Herb Sutter and Andrei Alexandrescu, C++ Coding Standards, p.50: "The
prefix form is semantically equivalent, just as much typing, and often
slightly more efficient by creating one less object. This is not
premature optimization; it is avoiding premature pessimization."

An excellent _C++_ book by well-renowned authors; however, you neglected
to quote the context: This applies if we are interested only in the
side effect and not at all in the value of the expression.

As this is crossposted to c.l.c and as there are no overloaded operators
in C, this is wrong in its generality.

<OT>In fact, on processors with postincrement/predecrement only and if
stuck with a poorly performing compiler, this may be even completely
wrong.</OT>
Cheers
Michael
--
E-Mail: Mine is an /at/ gmx /dot/ de address.

Oct 24 '05 #19

Michael Mair wrote:

Branimir Maksimovic wrote:
"Christian Bau" <ch***********@cbau.freeserve.co.uk> wrote in message
Consider this:

double x, y;
memcpy (&x, &y, sizeof (x) - 1);
memcpy (&x, &y, sizeof (x));

Nonsense. First one will probably generate hardware exception, and
second one will probably work. Then again first one would be much faster
as it is simple nop where sizeof(x) == 1 but second one would copy contents.

The second one is guaranteed to work and have the same effect
as x = y, the first may lead to a trap representation of x but
can also work.

In case that sizeof(x) == 1, I agree.

Are you sure that you are aware of the semantics of memcpy()?

Well, I don't need to, because I don't use memcpy to assign variables.

Greetings, Bane.

Oct 24 '05 #20

John Carson

"Michael Mair" <Mi**********@invalid.invalid> wrote in message
news:3s************@individual.net

John Carson wrote:
"Jack Klein" <ja*******@spamcop.net> wrote in message
news:6v********************************@4ax.com
On 22 Oct 2005 01:01:37 -0700, jr********@hotmail.com wrote in
comp.lang.c:

I heard people saying prefix increment is faster than postfix
increment, but I don't know what's the difference. They both are
i = i+1.

What people are these? What are their credentials so that you, or
we, should place any confidence in their opinions?
Herb Sutter and Andrei Alexandrescu, C++ Coding Standards, p.50: "The
prefix form is semantically equivalent, just as much typing, and
often slightly more efficient by creating one less object. This is
not premature optimization; it is avoiding premature pessimization."

An excellent _C++_ book by well-renowned authors; however, you
neglected to quote the context: This applies if we are interested
only in the side effect and not at all in the value of the expression.

You mean the original value of the expression. I took that to be understood.

As this is crossposted to c.l.c and as there are no overloaded
operators in C, this is wrong in its generality.

It is clear that its main significance is for C++. Whether it is completely
irrelevant for C I don't know. I would have imagined that i++ could easily
require an additional temporary even when i is an int. But the exact
efficiency consequences of this or some alternative with built-in types
requires a knowledge of compilers/processors that I don't have.

--
John Carson

Oct 24 '05 #21

In article <dj**********@news.eunet.yu>,
"Branimir Maksimovic" <bm***@eunet.yu> wrote:

Nonsense. First one will probably generate hardware exception, and
second one will probably work. Then again first one would be much faster
as it is simple nop where sizeof(x) == 1 but second one would copy contents.

I wish you a successful career. Maybe you should learn a bit about
programming, that might help.

Oct 24 '05 #22

In article <11**********************@g43g2000cwa.googlegroups .com>,
"Branimir Maksimovic" <bm***@volomp.com> wrote:

Michael Mair wrote:
Branimir Maksimovic wrote:
"Christian Bau" <ch***********@cbau.freeserve.co.uk> wrote in message
>Consider this:
>
> double x, y;
> memcpy (&x, &y, sizeof (x) - 1);
> memcpy (&x, &y, sizeof (x));
>

Nonsense. First one will probably generate hardware exception, and
second one will probably work. Then again first one would be much faster
as it is simple nop where sizeof(x) == 1 but second one would copy
contents.

The second one is guaranteed to work and have the same effect
as x = y, the first may lead to a trap representation of x but
can also work.

In case that sizeof(x) == 1 , I agree.
Are you sure that you are aware of the semantics of memcpy()?

Well, I don't need to, cause I don't use memcpy to assign variables.

In other words, you are a complete bullshitter.

Oct 24 '05 #23

peter koch

Christian Bau skrev:

In article <11**********************@f14g2000cwb.googlegroups .com>,
"Greg" <gr****@pacbell.net> wrote:

[snip]

Not necessarily. If one can show that Y performs every operation that X
performs, and then has to perform additional operations outside of that
set and that require a measurable amount of time to complete, then one
would have successfully proven that X is faster than Y.

One would have proven no such thing.

Consider this:

double x, y;
memcpy (&x, &y, sizeof (x) - 1);
memcpy (&x, &y, sizeof (x));

[snip explanation that second memcpy might be faster]

Hi Christian

Just a marvellous example you gave us - code that in C++ (and C) causes
undefined behaviour and is just utterly contrived and useless. You
also have my sympathy when you call a poster who suggests using
assignment to assign for a "complete bullshitter".
In short, you have described yourself and your skills wonderfully in
two short posts.

/Peter

Oct 24 '05 #24

Old Wolf

Greg wrote:

Gordon Burditt wrote:
The OP wrote:
I heard people saying prefix increment is faster than postfix
increment, but I don't know what's the difference. They both are

Any claim that X is faster than Y that doesn't specify a specific
compiler and platform is FALSE. Even if Y is "do X 1000000 times".

Not necessarily. If one can show that Y performs every operation that
X performs, and then has to perform additional operations outside of
that set and that require a measurable amount of time to complete,
then one would have successfully proven that X is faster than Y.

Not true, unless the additional operations are independent of the
X operations.

For example, if you apply the same logic to a file system, then
appending data to a file should increase the amount of space
required to store a file. But for many filesystems that is not
true.

Similar possibilities apply to the CPU case. Maybe the extra
operation fits within some timing interval that had to happen
anyway. Maybe the extra instruction means the whole operation
can be done with different assembly instructions that work out
faster. Maybe the CPU's pipelining is better in one case than
the other. Etc.

Oct 24 '05 #25

Greg

Old Wolf wrote:

Greg wrote:
Gordon Burditt wrote:
The OP wrote:
I heard people saying prefix increment is faster than postfix
increment, but I don't know what's the difference. They both are

Any claim that X is faster than Y that doesn't specify a specific
compiler and platform is FALSE. Even if Y is "do X 1000000 times".

Not necessarily. If one can show that Y performs every operation that
X performs, and then has to perform additional operations outside of
that set and that require a measurable amount of time to complete,
then one would have successfully proven that X is faster than Y.

Not true, unless the additional operations are independent of the
X operations.

For example, if you apply the same logic to a file system, then
appending data to a file should increase the amount of space
required to store a file. But for many filesystems that is not
true.

Similar possibilities apply to the CPU case. Maybe the extra
operation fits within some timing interval that had to happen
anyway. Maybe the extra instruction means the whole operation
can be done with different assembly instructions that work out
faster. Maybe the CPU's pipelining is better in one case than
the other. Etc.

If the additional operations follow the ones in common, then it would
be difficult to see how executing those instructions would be able to
speed up the previous set of instructions that have already executed.

But even if the additional instructions came before or were
interspersed with the ones in common, the only way that the additional
instructions would not add time to the procedure would be if the
program could execute two instructions in less time than it could
execute one of those instructions. [Note that the one instruction must
also be one of the two executed in the comparison]

On a macro scale, because similar operations can be composed of
different sub-operations, adding an operation may make an existing one
faster. But as the granularity of the operations becomes finer, at a
certain point every operation is independent of another and each
executes in constant time.

Greg

Oct 25 '05 #26

peter koch wrote:

Christian Bau skrev:
In article <11**********************@f14g2000cwb.googlegroups .com>,
"Greg" <gr****@pacbell.net> wrote:
[snip]
Not necessarily. If one can show that Y performs every operation that X
performs, and then has to perform additional operations outside of that
set and that require a measurable amount of time to complete, then one
would have successfully proven that X is faster than Y.

One would have proven no such thing.

Consider this:

double x, y;
memcpy (&x, &y, sizeof (x) - 1);
memcpy (&x, &y, sizeof (x));

[snip explanation that second memcpy might be faster]

Hi Christian

Just a marvellous example you gave us - code that in C++ (and C) causes
undefined behaviour

What is the undefined behaviour (assume sizeof (x) > 1)?

and is just utterly contrived and useless.

Contrived, yes. Most simple examples of complex behaviour
are contrived. Useless, no. Indeed this code is not meant
to be used, but the *example* of a "bigger operation"
(copying x bytes rather than x-1 bytes) that might reasonably
be expected to execute faster is useful indeed.

A related question. Is it ever better to use an int
variable, even when a char is big enough?

[For a less useful example consider a perverse
implementation (e.g. the DS2K) which introduces a
delay of say 20 minutes, seemingly at random. If the "smaller"
operation incurs the delay, but the "bigger" does not, then
the larger operation will be faster. While this
is correct, such an implementation cannot be considered
reasonable.]

You
also have my sympathy when you call a poster who suggests using
assignment to assign for a "complete bullshitter".

The poster claimed undefined behaviour, then when challenged
claimed ignorance (and gave a stupid excuse for this
ignorance). The term "complete bullshitter" seems an accurate
description.

- William Hughes

Oct 25 '05 #27

Jordan Abel

On 2005-10-25, William Hughes <wp*******@hotmail.com> wrote:

peter koch wrote:
Christian Bau skrev:
> In article <11**********************@f14g2000cwb.googlegroups .com>,
> "Greg" <gr****@pacbell.net> wrote:
>

[snip]
> > Not necessarily. If one can show that Y performs every operation that X
> > performs, and then has to perform additional operations outside of that
> > set and that require a measurable amount of time to complete, then one
> > would have successfully proven that X is faster than Y.
>
> One would have proven no such thing.
>
> Consider this:
>
> double x, y;
> memcpy (&x, &y, sizeof (x) - 1);
> memcpy (&x, &y, sizeof (x));
>

[snip explanation that second memcpy might be faster]

Hi Christian

Just a marvellous example you gave us - code that in C++ (and C) causes
undefined behaviour

What is the undefined behaviour (assume sizeof (x) >1)

For example, you could end up with a trap representation in x, say, a signalling
NaN of some kind. And in any case you're not guaranteed anything useful about
the value you might get.

Oct 25 '05 #28

William Hughes wrote:

peter koch wrote:
Christian Bau skrev:
In article <11**********************@f14g2000cwb.googlegroups .com>,
"Greg" <gr****@pacbell.net> wrote:

[snip]
> Not necessarily. If one can show that Y performs every operation that X
> performs, and then has to perform additional operations outside of that
> set and that require a measurable amount of time to complete, then one
> would have successfully proven that X is faster than Y.

One would have proven no such thing.

Consider this:

double x, y;
memcpy (&x, &y, sizeof (x) - 1);
memcpy (&x, &y, sizeof (x));

[snip explanation that second memcpy might be faster]

Hi Christian

Just a marvellous example you gave us - code that in C++ (and C) causes
undefined behaviour

What is the undefined behaviour (assume sizeof (x) >1)
and is just utterly contrived and useless.

Contrived yes. Most simple examples of complex behaviour
are contrived. Useless no. Indeed this code is not meant
to be used but the *example* of a "bigger operation"
(copying x bytes rather than x-1 bytes) that might reasonably
be expected to execute faster is useful indeed.

A related question. Is it ever better to use an int
variable, even when a char is big enough?

[For a less useful example consider a perverse
implementation (e.g. the DS2K) which introduces a
delay of say 20 minutes, seemingly at random. If the "smaller"
operation incurs the delay, but the "bigger" does not, then
the larger operation will be faster. While this
is correct, such an implementation cannot be considered
reasonable.]
You
also have my sympathy when you call a poster who suggests using
assignment to assign for a "complete bullshitter".

The poster claimed undefined behaviour, then when challenged
claimed ignorance (and gave a stupid excuse for this
ignorance). The term "complete bullshitter" seems an accurate
description.

No, I didn't claim undefined behavior.
I claimed that the first case would probably produce a hardware exception
and the second one would probably work.
As memcpy is defined to be a copy operation of n characters from
memory location to memory location, behavior is undefined only
when "to" and "from" overlap.
The original message claimed that the compiler can be smart enough
to recognize the use case and, according to the situation, apply different
semantics than those specified by the code.
This leads him to the conclusion that the produced code will be faster when
the compiler applies assignment semantics than memcpy semantics.
This is just the wrong example, but if we observe this:
double x[2];y=0.;
memcpy((char*)x+1,&y,sizeof(y));
double t = *(double*)((char*)x+1); /* depends on
hardware tolerance to alignment */
memcpy(x,&y,sizeof(y));
t = *x;

It is obvious that the second case will always be faster than, or at least
as fast as, the first case, even if memcpy has to copy the same number of bytes
and use RAM instead, or sizeof (x) == 1.
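
As the comment in the snippet says, reading through the cast pointer depends on the hardware tolerating misaligned access; formally, dereferencing a misaligned double* is undefined. A hedged sketch of the portable alternative, added for illustration only (`round_trip_unaligned` is a hypothetical name), reads the value back with memcpy instead of a pointer cast. It makes no claim about which variant is faster.

```cpp
#include <cassert>
#include <cstring>

// Stores value one byte past the start of a two-double buffer, then reads it
// back with memcpy, so no misaligned double* is ever dereferenced.
double round_trip_unaligned(double value) {
    double storage[2] = {0.0, 0.0};
    unsigned char* p = reinterpret_cast<unsigned char*>(storage) + 1;

    // 1 + sizeof(double) bytes fit inside the 2 * sizeof(double) buffer.
    std::memcpy(p, &value, sizeof value);

    double result;
    std::memcpy(&result, p, sizeof result);
    return result;
}
```

Whether a compiler turns either memcpy into a single load or store is, again, something to check on the specific compiler and platform.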

Greetings, Bane.

Oct 25 '05 #29

Branimir Maksimovic wrote:

double x[2];y=0.;
double x[2],y = 0.;
memcpy((char*)x+1,&y,sizeof(y));
double t = *(double*)((char*)x+1); /* depends on
hardware tolerance to alignment */
memcpy(x,&y,sizeof(y));
t = *x;

It is obvious that second case will be always faster or at least
equal then first case, even if memcpy have to copy same number of bytes
and use ram instead or sizeof (x) ==1 .
sizeof (y) == 1;

Greetings, Bane.

Oct 25 '05 #30

Keith Thompson

Jordan Abel <jm****@purdue.edu> writes:

On 2005-10-25, William Hughes <wp*******@hotmail.com> wrote:
peter koch wrote:
Christian Bau skrev:
> In article <11**********************@f14g2000cwb.googlegroups .com>,
> "Greg" <gr****@pacbell.net> wrote:
>
[snip]

> > Not necessarily. If one can show that Y performs every operation that X
> > performs, and then has to perform additional operations outside of that
> > set and that require a measurable amount of time to complete, then one
> > would have successfully proven that X is faster than Y.
>
> One would have proven no such thing.
>
> Consider this:
>
> double x, y;
> memcpy (&x, &y, sizeof (x) - 1);
> memcpy (&x, &y, sizeof (x));
>
[snip explanation that second memcpy might be faster]

Just a marvellous example you gave us - code that in C++ (and C) causes
undefined behaviour

What is the undefined behaviour (assume sizeof (x) >1)

for example you could end up with a trap representation in x. say, a
signalling nan of some kind. and in any case you're not guaranteed
anything useful about the value you might get

Even if the memcpy() stores a trap representation in x, there's no
undefined behavior until you try to read x as a double. The quoted
code doesn't do that.
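A minimal sketch of that distinction, assuming nothing beyond standard C: the partial copy is followed only by reads through unsigned char, so x is never accessed as a double. The helper name partial_copy_matching_bytes is made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copies all but the last byte of *src into *dst,
 * then counts how many of the copied bytes match. Both objects are
 * inspected only as unsigned char, never as double. */
size_t partial_copy_matching_bytes(double *dst, const double *src)
{
    size_t n = sizeof *dst - 1;
    size_t i, matches = 0;
    const unsigned char *d = (const unsigned char *)dst;
    const unsigned char *s = (const unsigned char *)src;

    assert(dst != src);        /* memcpy requires non-overlapping objects */
    memcpy(dst, src, n);

    for (i = 0; i < n; i++)
        if (d[i] == s[i])
            matches++;
    return matches;
}
```

Called with two distinct doubles, this should return sizeof (double) - 1, whatever value the resulting bytes would represent as a double.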

BTW, please keep your text down to about 72 columns so it doesn't
overflow an 80-column screen when quoted. My newsreader lets me
reformat quoted text easily, but others might not.

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <*> <http://users.sdsc.edu/~kst>
We must do something. This is something. Therefore, we must do this.

Oct 25 '05 #31

Mark McIntyre

On 25 Oct 2005 12:06:50 -0700, in comp.lang.c , "Branimir Maksimovic"
<bm***@volomp.com> wrote:

William Hughes wrote:
peter koch wrote:
> Christian Bau skrev:
> > Consider this:
> >
> > double x, y;
> > memcpy (&x, &y, sizeof (x) - 1);
> > memcpy (&x, &y, sizeof (x));
> >
> Just a marvellous example you gave us - code that in C++ (and C) causes
> undefined behaviour
What is the undefined behaviour (assume sizeof (x) >1)

The poster claimed undefined behaviour, then when challenged
claimed ignorance

No I didn't claim undefined behavior.
I claimed that first case would probably produce hardware exception
and second one would probably work.

Given that neither will produce any such thing, and given your very
aggressive attitude in subsequent postings, "bullshitter" seems
entirely reasonable.
The original message claimed

(stuff that's completely irrelevant to your claim that the code quoted
will cause an exception).
--
Mark McIntyre
CLC FAQ <http://www.eskimo.com/~scs/C-faq/top.html>
CLC readme: <http://www.ungerhu.com/jxh/clc.welcome.txt>

Oct 25 '05 #32

Jordan Abel wrote:

On 2005-10-25, William Hughes <wp*******@hotmail.com> wrote:

peter koch wrote:
Christian Bau skrev:

> In article <11**********************@f14g2000cwb.googlegroups .com>,
> "Greg" <gr****@pacbell.net> wrote:
>
[snip]

> > Not necessarily. If one can show that Y performs every operation that X
> > performs, and then has to perform additional operations outside of that
> > set and that require a measurable amount of time to complete, then one
> > would have successfully proven that X is faster than Y.
>
> One would have proven no such thing.
>
> Consider this:
>
> double x, y;
> memcpy (&x, &y, sizeof (x) - 1);
> memcpy (&x, &y, sizeof (x));
>
[snip explanation that second memcpy might be faster]

Hi Christian

Just a marvellous example you gave us - code that in C++ (and C) causes
undefined behaviour
What is the undefined behaviour (assume sizeof (x) >1)

for example you could end up with a trap representation in x. say, a signalling
nan of some kind.

and as this could only cause a problem if x was subsequently read
as a double, there is not undefined behaviour above.
and in any case you're not guaranteed anything useful about
the value you might get

you are guaranteed that the first sizeof (x) - 1 bytes of x
are the same as the corresponding bytes of y. This may be useful
(e.g. if you are treating x and y as arrays of characters)

True, you are not guaranteed that x is meaningful as a double,
but so what. It might be, but this is beside the point,
the original example was not meant as an example of useful code.

- William Hughes

Oct 25 '05 #33

Branimir Maksimovic wrote:

William Hughes wrote:
peter koch wrote:
Christian Bau skrev:

> In article <11**********************@f14g2000cwb.googlegroups .com>,
> "Greg" <gr****@pacbell.net> wrote:
>
[snip]

> > Not necessarily. If one can show that Y performs every operation that X
> > performs, and then has to perform additional operations outside of that
> > set and that require a measurable amount of time to complete, then one
> > would have successfully proven that X is faster than Y.
>
> One would have proven no such thing.
>
> Consider this:
>
> double x, y;
> memcpy (&x, &y, sizeof (x) - 1);
> memcpy (&x, &y, sizeof (x));
>
[snip explanation that second memcpy might be faster]

Hi Christian

Just a marvellous example you gave us - code that in C++ (and C) causes
undefined behaviour

What is the undefined behaviour (assume sizeof (x) >1)
and is just utterly contrived and useless.

Contrived yes. Most simple examples of complex behaviour
are contrived. Useless no. Indeed this code is not meant
to be used but the *example* of a "bigger operation"
(copying x bytes rather than x-1 bytes) that might reasonably
be expected to execute faster is useful indeed.

A related question. Is it ever better to use an int
variable, even when a char is big enough?

[For a less useful example consider a perverse
implementation (e.g. the DS2K) which introduces a
delay of say 20 minutes, seemingly at random. If the "smaller"
operation incurs the delay, but the "bigger" does not, then
the larger operation will be faster. While this
is correct, such an implementation cannot be considered
reasonable.]
You
also have my sympathy when you call a poster who suggests using
assignment to assign for a "complete bullshitter".

The poster claimed undefined behaviour, then when challenged
claimed ignorance (and gave a stupid excuse for this
ignorance). The term "complete bullshitter" seems an accurate
description.

No I didn't claim undefined behavior.
I claimed that first case would probably produce hardware exception

And this differs from undefined behaviour how?
(are you claiming implementation defined behaviour?)

Anyway, you have yet to even attempt to justify your
claim that the first case would probably produce
a hardware exception.

- William Hughes

Oct 25 '05 #34

William Hughes wrote:

Branimir Maksimovic wrote:
William Hughes wrote:
peter koch wrote:
> Christian Bau skrev:
>
> > In article <11**********************@f14g2000cwb.googlegroups .com>,
> > "Greg" <gr****@pacbell.net> wrote:
> >
> [snip]
>
> > > Not necessarily. If one can show that Y performs every operation that X
> > > performs, and then has to perform additional operations outside of that
> > > set and that require a measurable amount of time to complete, then one
> > > would have successfully proven that X is faster than Y.
> >
> > One would have proven no such thing.
> >
> > Consider this:
> >
> > double x, y;
> > memcpy (&x, &y, sizeof (x) - 1);
> > memcpy (&x, &y, sizeof (x));
> >
> [snip explanation that second memcpy might be faster]
>
> Hi Christian
>
> Just a marvellous example you gave us - code that in C++ (and C) causes
> undefined behaviour

What is the undefined behaviour (assume sizeof (x) >1)

> and is just utterly contrived and useless.

Contrived yes. Most simple examples of complex behaviour
are contrived. Useless no. Indeed this code is not meant
to be used but the *example* of a "bigger operation"
(copying x bytes rather than x-1 bytes) that might reasonably
be expected to execute faster is useful indeed.

A related question. Is it ever better to use an int
variable, even when a char is big enough?

[For a less useful example consider a perverse
implementation (e.g. the DS2K) which introduces a
delay of say 20 minutes, seemingly at random. If the "smaller"
operation incurs the delay, but the "bigger" does not, then
the larger operation will be faster. While this
is correct, such an implementation cannot be considered
reasonable.]

> You
> also have my sympathy when you call a poster who suggests using
> assignment to assign for a "complete bullshitter".
The poster claimed undefined behaviour, then when challenged
claimed ignorance (and gave a stupid excuse for this
ignorance). The term "complete bullshitter" seems an accurate
description.
No I didn't claim undefined behavior.
I claimed that first case would probably produce hardware exception

And this differs from undefined behaviour how?
(are you claiming implementation defined behaviour?)

If implementation is allowed to use floating point registers
for memcpy, then yes implementation defined behavior.
Anyway, you have yet to even attempt to justify your
claim that the first case would probably produce
a hardware exception.

In case that implementation use FPU registers for
memcpy of floating point variables that would be
normal. It is irrelevant how many bytes are copied.

Question is: Are such implementations conformant?
eg:
double x, y; /* produces FPU exception if x, y get a trap value? */
memcpy(&x, &y, sizeof(x)); /* produces an exception if FPU registers are
                              used and y has a trap representation value,
                              which is nonconformant as I understand
                              memcpy semantics */

Conclusion: if FPU registers are allowed to be used
for memcpy then it is normal to allow hardware exceptions
during memcpy.
Compiler wouldn't care if memcpy produce exception or not
in that case.

Greetings, Bane.

Oct 25 '05 #35

In article <11**********************@g49g2000cwa.googlegroups .com>,
"peter koch" <pe***************@gmail.com> wrote:

Christian Bau skrev:
In article <11**********************@f14g2000cwb.googlegroups .com>,
"Greg" <gr****@pacbell.net> wrote:

[snip]
Not necessarily. If one can show that Y performs every operation that X
performs, and then has to perform additional operations outside of that
set and that require a measurable amount of time to complete, then one
would have successfully proven that X is faster than Y.

One would have proven no such thing.

Consider this:

double x, y;
memcpy (&x, &y, sizeof (x) - 1);
memcpy (&x, &y, sizeof (x));

[snip explanation that second memcpy might be faster]

Hi Christian

Just a marvellous example you gave us - code that in C++ (and C) causes
undefined behaviour and is just utterly contrived and useless. You
also have my sympathy when you call a poster who suggests using
assignment to assign for a "complete bullshitter".
In short, you have described yourself and your skills wonderfully in
two short posts.

Seems our IQs differ by about 30 points. Let's just disagree about the
direction.

Oct 25 '05 #36

Branimir Maksimovic wrote:

Michael Mair wrote:
Branimir Maksimovic wrote:
"Christian Bau" <ch***********@cbau.freeserve.co.uk> wrote in message

Consider this:

double x, y;
memcpy (&x, &y, sizeof (x) - 1);
memcpy (&x, &y, sizeof (x));

Nonsense. First one will probably generate hardware exception, and
second one will probably work. Then again first one would be much faster
as it is simple nop where sizeof(x) == 1 but second one would copy contents.

The second one is guaranteed to work and have the same effect
as x = y, the first may lead to a trap representation of x but
can also work.

In case that sizeof(x) == 1 , I agree.

Which is nothing more than you did before.

Are you sure that you are aware of the semantics of memcpy()?

Well, I don't need to, cause I don't use memcpy to assign variables.

Beside the point.

--- C99 ---
7.21.2.1 The memcpy function
Synopsis
1 #include <string.h>
void *memcpy(void * restrict s1, const void * restrict s2, size_t n);

Description
2 The memcpy function copies n characters from the object pointed to by
s2 into the object pointed to by s1. If copying takes place between
objects that overlap, the behavior is undefined.

Returns
3 The memcpy function returns the value of s1.
------------

The only way to safely and portably access the representation of an
object is bytewise (unsigned char). memcpy() does exactly that.

The first memcpy() operation can be replaced by
size_t i;
unsigned char *p1= (unsigned char*) &y;
unsigned char *p2= (unsigned char*) &x;

for (i = 0; i < (sizeof x - 1); i++)
*(p2++) = *(p1++);

A conforming implementation has to do this right; you are thinking
of actual hardware and concluding that it cannot work.
Still, the "as if" rule has to hold, the operation has to work. There
must not be any repercussions as long as x is not accessed afterwards.
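The equivalence claimed above can be checked with a small sketch. The names bytewise_copy and same_result_as_memcpy are hypothetical; the destinations are compared with memcmp only, so neither is read as a double after a partial copy.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copies n bytes one unsigned char at a time. */
void *bytewise_copy(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    size_t i;

    for (i = 0; i < n; i++)
        d[i] = s[i];
    return dst;
}

/* Returns 1 if memcpy and the byte loop leave identical bytes in two
 * destinations that started out identical. */
int same_result_as_memcpy(const double *src, size_t n)
{
    double a = 0.0, b = 0.0;

    assert(n <= sizeof a);
    memcpy(&a, src, n);
    bytewise_copy(&b, src, n);
    return memcmp(&a, &b, sizeof a) == 0;
}
```

On a conforming implementation this should return 1 both for n == sizeof (double) and for n == sizeof (double) - 1.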
Cheers
Michael
--
E-Mail: Mine is an /at/ gmx /dot/ de address.

Oct 25 '05 #37

In article <sl********************@random.yi.org>,
Jordan Abel <jm****@purdue.edu> wrote:

On 2005-10-25, William Hughes <wp*******@hotmail.com> wrote:

peter koch wrote:
Christian Bau skrev:

> In article <11**********************@f14g2000cwb.googlegroups .com>,
> "Greg" <gr****@pacbell.net> wrote:
>
[snip]

> > Not necessarily. If one can show that Y performs every operation that
> > X
> > performs, and then has to perform additional operations outside of
> > that
> > set and that require a measurable amount of time to complete, then one
> > would have successfully proven that X is faster than Y.
>
> One would have proven no such thing.
>
> Consider this:
>
> double x, y;
> memcpy (&x, &y, sizeof (x) - 1);
> memcpy (&x, &y, sizeof (x));
>
[snip explanation that second memcpy might be faster]

Hi Christian

Just a marvellous example you gave us - code that in C++ (and C) causes
undefined behaviour

What is the undefined behaviour (assume sizeof (x) >1)

for example you could end up with a trap representation in x. say, a
signalling
nan of some kind. and in any case you're not guaranteed anything useful about
the value you might get

You can end up with a trap representation in x, but that doesn't in
itself invoke undefined behavior. There would be undefined behavior if
you would later on access x as a double value, but that wasn't done. You
could printf () the individual bytes from x. You could memset () seven
bytes in y to zeroes, then copy those seven bytes back from x and y
would be restored to its original value. When writing a double to a
binary stream or file, it is quite likely that a memcpy similar to this
one will happen: Assume your standard library uses a 512 byte buffer to
write to binary streams, all but sizeof (double) - 1 bytes are used up
in a buffer, and you write another double: sizeof (double) - 1 bytes
will be copied to the buffer, the buffer will be flushed, and another
byte will be copied.
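The round trip mentioned above (wipe part of y, then restore it from x) can be sketched as follows. The function name is made up, and sizeof y - 1 is used where the post says "seven bytes", so the sketch does not assume an 8-byte double.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: saves all but the last byte of y in x, zeroes
 * those bytes in y, copies them back, and reports whether y's bytes
 * are unchanged. x and y are never read as doubles in between. */
int round_trip_restores(double y)
{
    double x = 0.0;
    unsigned char before[sizeof y];
    size_t n = sizeof y - 1;

    memcpy(before, &y, sizeof y);   /* remember y's original bytes */
    memcpy(&x, &y, n);              /* the "smaller" copy from the thread */
    memset(&y, 0, n);               /* wipe those bytes in y */
    memcpy(&y, &x, n);              /* restore them from x */

    return memcmp(before, &y, sizeof y) == 0;
}
```

The last byte of y is never touched, so the comparison covers the whole object.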

This code was not supposed to do something particularly useful - it was
supposed to give a clear example where "doing more work" is faster than
"doing less work". Which is exactly what it did.

Oct 25 '05 #38

In article <11*********************@z14g2000cwz.googlegroups. com>,
"Branimir Maksimovic" <bm***@volomp.com> wrote:

No I didn't claim undefined behavior.
Absolutely correct, you never claimed that.
I claimed that first case would probably produce hardware exception
and second one would probably work.
None of the cases will produce any hardware exception. Both are
completely legitimate uses of memcpy. The first one is a bit unusual,
the second one is a bit clumsy as the effect could have been achieved
much easier, but both are absolutely legitimate.
As memcpy is defined to be copy operation of n characters from
memory location to memory location, behavior is undefined only
when "to" and "from" overlap.
The original message claimed that compiler can be smart enough
to recognize use case and according to situation, apply different
semantics then those specified by code.
The compiler wouldn't "apply different semantics", the compiler would
detect that the effect of memcpy can be achieved much quicker and
therefore generate much better code.
This leads him to conclusion that produced code will be faster when
compiler applies assignment semantics then memcpy semantics.
This is just wrong example, but if we observe this:
double x[2];y=0.;
memcpy((char*)x+1,&y,sizeof(y));
double t = *(double*)((char*)x+1); /* depends on
hardware tolerance to alignment */
I would recommend to write ((char *) x) + 1 instead of (char *) x + 1,
so that (1) everyone knows what the expression means without having to
look up the precedence of cast operators, and (2) everyone knows that
what you wrote is what you meant.
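For readers who do want to check the precedence, here is a small sketch (the function name is hypothetical). It compares the two spellings with each other and with the pointer one whole element further on.

```c
#include <assert.h>

/* Returns 1 if (char *) x + 1 and ((char *) x) + 1 are the same
 * address, i.e. the cast is applied before the addition. */
int cast_applies_before_plus(void)
{
    double x[2] = {0.0, 0.0};
    char *unparenthesised = (char *) x + 1;
    char *parenthesised = ((char *) x) + 1;
    char *whole_element = (char *) (x + 1);  /* one double on, not one byte */

    /* The two spellings agree; both differ from x + 1 unless
     * sizeof (double) happens to be 1. */
    return unparenthesised == parenthesised
        && (sizeof (double) == 1 || unparenthesised != whole_element);
}
```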
memcpy(x,&y,sizeof(y));
t = *x;

It is obvious that second case will be always faster or at least
equal then first case, even if memcpy have to copy same number of bytes
and use ram instead or sizeof (x) ==1 .

In this case, the first assignment to t will have undefined behavior.
There are implementations where it will crash, there are others where it
will be set t to the same value as y, just very slowly, but it is
undefined behavior.

Since the compiler can easily detect that this is undefined behavior, it
is free to do whatever it likes - for example, not doing the memcpy and
the initialisation of t at all. Which will make the first case run
_faster_ than the second case.
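A sketch of a defined alternative to the cast-and-dereference, assuming only standard C: the value is copied out of the misaligned location with memcpy instead of being read through a double pointer. The function name is made up for illustration.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: stores value one byte past the start of a
 * double array, then reads it back through memcpy into a properly
 * aligned object instead of using *(double *) p. */
double store_and_load_unaligned(double value)
{
    double storage[2] = {0.0, 0.0};
    unsigned char *p = (unsigned char *) storage + 1;
    double result;

    assert(sizeof storage >= sizeof value + 1);  /* the copy stays in bounds */
    memcpy(p, &value, sizeof value);     /* byte copy, no alignment needed */
    memcpy(&result, p, sizeof result);   /* copy out rather than dereference */
    return result;
}
```

Whether this is faster or slower than an aligned access is left to the implementation; the point is only that its behaviour is defined.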

x is of type double. In common implementations, sizeof (x) == 8. sizeof
(double) == 1 would be extremely unusual.

Oct 25 '05 #39

Michael Mair wrote:

Branimir Maksimovic wrote:
Michael Mair wrote:
Branimir Maksimovic wrote:

"Christian Bau" <ch***********@cbau.freeserve.co.uk> wrote in message

>Consider this:
>
> double x, y;
> memcpy (&x, &y, sizeof (x) - 1);
> memcpy (&x, &y, sizeof (x));

Nonsense. First one will probably generate hardware exception, and
second one will probably work. Then again first one would be much faster
as it is simple nop where sizeof(x) == 1 but second one would copy contents.

The second one is guaranteed to work and have the same effect
as x = y, the first may lead to a trap representation of x but
can also work.

In case that sizeof(x) == 1 , I agree.

Which is nothing more than you did before.

Are you sure that you are aware of the semantics of memcpy()?

Well, I don't need to, cause I don't use memcpy to assign variables.

Beside the point.

--- C99 ---
7.21.2.1 The memcpy function
Synopsis
1 #include <string.h>
void *memcpy(void * restrict s1, const void * restrict s2, size_t n);

Description
2 The memcpy function copies n characters from the object pointed to by
s2 into the object pointed to by s1. If copying takes place between
objects that overlap, the behavior is undefined.

Returns
3 The memcpy function returns the value of s1.
------------

The only way to safely and portably access the representation of an
object is bytewise (unsigned char). memcpy() does exactly that.

The first memcpy() operation can be replaced by
size_t i;
unsigned char *p1= (unsigned char*) &y;
unsigned char *p2= (unsigned char*) &x;

for (i = 0; i < (sizeof x - 1); i++)
*(p2++) = *(p1++);

A conforming implementation has to do this right; you are thinking
of actual hardware and concluding that it cannot work.
Still, the "as if" rule has to hold, the operation has to work. There
must not be any repercussions as long as x is not accessed afterwards.

Thank you for proving my point. memcpy can't have x=y semantics
in any way. It can only have same final effect, but paths are
different as x=y is allowed to produce hardware exception
but memcpy(&x,&y,sizeof(x)); is not
Greetings, Bane.

Oct 25 '05 #40

In article <11*********************@z14g2000cwz.googlegroups. com>,
"Branimir Maksimovic" <bm***@volomp.com> wrote:

If implementation is allowed to use floating point registers
for memcpy, then yes implementation defined behavior.
memcpy has some defined meaning, defined by the C Standard (and the C++
Standard uses the same definition). The implementation is free to do
whatever it likes, as long as it guarantees that the results will be the
same as required.

If I have variables

double x, y;

and a call

memcpy (&x, &y, sizeof (x));

then _in this particular case_ the effect of the memcpy case happens to
be exactly the same as the effect of

(void) (x = y)

(not on every possible implementation, but in many implementations. The
implementation would have to know for example that assigning NaN's or
negative zeroes or denormalised numbers etc. doesn't change the bit
pattern, and doesn't cause any side effects like hardware exceptions).

So if the implementation knows all that, then in this particular case it
can use floating point registers for copying these bytes instead of
calling memcpy.
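A minimal sketch of that claim for ordinary values. The function name is hypothetical, and the sketch assumes y holds a value for which assignment preserves the bit pattern on the implementation at hand, which is exactly the condition stated above.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: returns 1 if x = y and memcpy (&x, &y, sizeof x)
 * leave byte-identical objects behind. Assumes y is an ordinary value
 * whose bit pattern survives assignment. */
int assignment_matches_memcpy(double y)
{
    double by_assign = 0.0, by_memcpy = 0.0;

    by_assign = y;
    memcpy(&by_memcpy, &y, sizeof by_memcpy);
    return memcmp(&by_assign, &by_memcpy, sizeof by_assign) == 0;
}
```

For values such as 1.0 or 0.0 this should return 1; NaNs and other special values are deliberately left out, since those are the cases where an implementation has to be careful.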
Question is: Are such implementations conformant?
eg:
double x, y; /* produces FPU exception if x, y get a trap value? */
memcpy(&x, &y, sizeof(x)); /* produces an exception if FPU registers are
                              used and y has a trap representation value,
                              which is nonconformant as I understand
                              memcpy semantics */

Conclusion: if FPU registers are allowed to be used
for memcpy then it is normal to allow hardware exceptions
during memcpy.

No, this is exactly the wrong way round: If the assignment of trap
values would raise hardware exceptions, then the compiler _wouldn't_ be
allowed to use floating-point registers for memcpy. memcpy is _not_
allowed to raise an exception in this situation.

The compiler is allowed to do _anything_ as long as you can't detect the
difference by observing what the program does. If memcpy would raise a
hardware exception, then you could observe that, so memcpy isn't allowed
to do that.
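As a sketch of what that requirement means in practice, the following copies a bit pattern into a double and back out using memcpy alone. It assumes double is IEEE 754 binary64 and has the same size as uint64_t, under which the chosen pattern is a signalling NaN; the macro and function names are made up.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumption: with IEEE 754 binary64 doubles this is a signalling NaN. */
#define SNAN_BITS UINT64_C(0x7FF0000000000001)

/* Hypothetical helper: pushes a bit pattern through a double object
 * using only memcpy. The object is never read as a double, so no
 * floating-point operation is applied to the value. */
uint64_t copy_bits_through_double(uint64_t bits)
{
    double d;
    uint64_t out = 0;

    assert(sizeof d == sizeof bits);
    memcpy(&d, &bits, sizeof d);
    memcpy(&out, &d, sizeof out);
    return out;
}
```

On a conforming implementation the pattern should come back unchanged and no exception should be raised, however the compiler chooses to move the bytes.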

Oct 25 '05 #41

Christian Bau wrote:

In article <11*********************@z14g2000cwz.googlegroups. com>,
"Branimir Maksimovic" <bm***@volomp.com> wrote:

This is just wrong example, but if we observe this:
double x[2];y=0.;
memcpy((char*)x+1,&y,sizeof(y));
double t = *(double*)((char*)x+1); /* depends on
hardware tolerance to alignment */

I would recommend to write ((char *) x) + 1 instead of (char *) x + 1,
so that (1) everyone knows what the expression means without having to
look up the precedence of cast operators, and (2) everyone knows that
what you wrote is what you meant.
memcpy(x,&y,sizeof(y));
t = *x;

It is obvious that second case will be always faster or at least
equal then first case, even if memcpy have to copy same number of bytes
and use ram instead or sizeof (x) ==1 .

In this case, the first assignment to t will have undefined behavior.
There are implementations where it will crash, there are others where it
will be set t to the same value as y, just very slowly, but it is
undefined behavior.

Only on implementations where alignment requirement for a type
is not met.
This is a basic thing for implementing memory allocators.
memcpy works in all cases because it is defined that char is
aligned on any address.
If that wouldn't be the case then no memory allocator can't be written
in C or C++ without causing undefined behavior.
Remember that objects are defined as a sequence of bytes.
So when you convert object to void* it is plain raw memory
of object size bytes. You can place there anything which is smaller
or equal and meets right alignment.

Greetings, Bane.

Oct 25 '05 #42

Christian Bau wrote:

In article <11*********************@z14g2000cwz.googlegroups. com>,
"Branimir Maksimovic" <bm***@volomp.com> wrote:
If implementation is allowed to use floating point registers
for memcpy, then yes implementation defined behavior.

memcpy has some defined meaning, defined by the C Standard (and the C++
Standard uses the same definition). The implementation is free to do
whatever it likes, as long as it guarantees that the results will be the
same as required.

If I have variables

double x, y;

and a call

memcpy (&x, &y, sizeof (x));

then _in this particular case_ the effect of the memcpy case happens to
be exactly the same as the effect of

(void) (x = y)

(not on every possible implementation, but in many implementations. The
implementation would have to know for example that assigning NaN's or
negative zeroes or denormalised numbers etc. doesn't change the bit
pattern, and doesn't cause any side effects like hardware exceptions).

So if the implementation knows all that, then in this particular case it
can use floating point registers for copying these bytes instead of
calling memcpy.

So basically what you are saying is that if particular hardware
does not cause hardware exceptions then implementation can use
floating point registers?
In such case both memcpy's can use registers without problem.
Case that implementation checks every
size bytes for trap value and use some other means otherwise
to copy is completely unrealistic.

Greetings, Bane.

Oct 25 '05 #43

Keith Thompson

"Branimir Maksimovic" <bm***@volomp.com> writes:

Christian Bau wrote:
In article <11*********************@z14g2000cwz.googlegroups. com>,
"Branimir Maksimovic" <bm***@volomp.com> wrote:
> If implementation is allowed to use floating point registers
> for memcpy, then yes implementation defined behavior.
memcpy has some defined meaning, defined by the C Standard (and the C++
Standard uses the same definition). The implementation is free to do
whatever it likes, as long as it guarantees that the results will be the
same as required.

If I have variables

double x, y;

and a call

memcpy (&x, &y, sizeof (x));

then _in this particular case_ the effect of the memcpy case happens to
be exactly the same as the effect of

(void) (x = y)

(not on every possible implementation, but in many implementations. The
implementation would have to know for example that assigning NaN's or
negative zeroes or denormalised numbers etc. doesn't change the bit
pattern, and doesn't cause any side effects like hardware exceptions).

So if the implementation knows all that, then in this particular case it
can use floating point registers for copying these bytes instead of
calling memcpy.

So basically what you are saying is that if particular hardware
does not cause hardware exceptions then implementation can use
floating point registers?

The implementation can use floating point registers in the
implementation of memcpy() if it can guarantee that doing so meets the
standard's requirements for memcpy(). Hardware exceptions aren't the
only consideration, as Christian Bau very clearly explained (see
above).

For "floating point registers", you can substitute any conceivable
implementation detail, including carrier pigeons carrying clay
tablets. It just has to work.
In such case both memcpy's can use registers without problem.
Case that implemementation checks every
size bytes for trap value and use some other means otherwise
to copy is completely unrealistic.

Unrealistic, but perfectly legal as far as the standard is concerned.

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <*> <http://users.sdsc.edu/~kst>
We must do something. This is something. Therefore, we must do this.

Oct 25 '05 #44

Branimir Maksimovic wrote:

William Hughes wrote:
Branimir Maksimovic wrote:
William Hughes wrote:
> peter koch wrote:
> > Christian Bau skrev:
> >
> > > In article <11**********************@f14g2000cwb.googlegroups .com>,
> > > "Greg" <gr****@pacbell.net> wrote:
> > >
> > [snip]
> >
> > > > Not necessarily. If one can show that Y performs every operation that X
> > > > performs, and then has to perform additional operations outside of that
> > > > set and that require a measurable amount of time to complete, then one
> > > > would have successfully proven that X is faster than Y.
> > >
> > > One would have proven no such thing.
> > >
> > > Consider this:
> > >
> > > double x, y;
> > > memcpy (&x, &y, sizeof (x) - 1);
> > > memcpy (&x, &y, sizeof (x));
> > >
> > [snip explanation that second memcpy might be faster]
> >
> > Hi Christian
> >
> > Just a marvellous example you gave us - code that in C++ (and C) causes
> > undefined behaviour
>
> What is the undefined behaviour (assume sizeof (x) >1)
>
> > and is just utterly contrived and useless.
>
> Contrived yes. Most simple examples of complex behaviour
> are contrived. Useless no. Indeed this code is not meant
> to be used but the *example* of a "bigger operation"
> (copying x bytes rather than x-1 bytes) that might reasonably
> be expected to execute faster is useful indeed.
>
> A related question. Is it ever better to use an int
> variable, even when a char is big enough?
>
> [For a less useful example consider a perverse
> implementation (e.g. the DS2K) which introduces a
> delay of say 20 minutes, seemingly at random. If the "smaller"
> operation incurs the delay, but the "bigger" does not, then
> the larger operation will be faster. While this
> is correct, such an implementation cannot be considered
> reasonable.]
>
> > You
> > also have my sympathy when you call a poster who suggests using
> > assignment to assign for a "complete bullshitter".
>
>
> The poster claimed undefined behaviour, then when challenged
> claimed ignorance (and gave a stupid excuse for this
> ignorance). The term "complete bullshitter" seems an accurate
> description.

No I didn't claim undefined behavior.
I claimed that first case would probably produce hardware exception
And this differs from undefined behaviour how?
(are you claiming implementation defined behaviour?)

If implementation is allowed to use floating point registers
for memcpy, then yes implementation defined behavior.

No. Check the standard. memcpy has to work! An
implementation can use floating point registers
for memcpy only if they do not cause problems.
Anyway, you have yet to even attempt to justify your
claim that the first case would probably produce
a hardware exception.
In case that implementation use FPU registers for
memcpy of floating point variables that would be
normal.

No. Check the standard. memcpy has to work!
It is irrelevant how many bytes are copied.

Question is: Are such implementations conformant?
Yes, this is the important question. Pity you did not
answer it earlier.

eg:
double x, y; /* produces FPU exception if x, y get a trap value? */
memcpy(&x, &y, sizeof(x)); /* produces an exception if FPU registers are
                              used and y has a trap representation value,
                              which is nonconformant as I understand
                              memcpy semantics */
So as memcpy is probably conformant, the statement that
memcpy(&x,&y,sizeof(x)-1) will probably lead to a hardware trap is
wrong.

Conclusion: if FPU registers are allowed to be used
for memcpy then it is normal to allow hardware exceptions
during memcpy.
Yes and if my Grandmother had wheels she would be a bus. If
FPU registers are going to cause problems then they cannot
be used during memcpy.
Compiler wouldn't care if memcpy produce exception or not
in that case.

A conforming compiler cannot produce code that produces an
exception in this case.

- William Hughes

Oct 26 '05 #45

Branimir Maksimovic wrote:

Christian Bau wrote:
In article <11*********************@z14g2000cwz.googlegroups. com>,
"Branimir Maksimovic" <bm***@volomp.com> wrote:

This is just wrong example, but if we observe this:
double x[2];y=0.;
memcpy((char*)x+1,&y,sizeof(y));
double t = *(double*)((char*)x+1); /* depends on
hardware tolerance to alignment */
I would recommend to write ((char *) x) + 1 instead of (char *) x + 1,
so that (1) everyone knows what the expression means without having to
look up the precedence of cast operators, and (2) everyone knows that
what you wrote is what you meant.
memcpy(x,&y,sizeof(y));
t = *x;

It is obvious that second case will be always faster or at least
equal then first case, even if memcpy have to copy same number of bytes
and use ram instead or sizeof (x) ==1 .

In this case, the first assignment to t will have undefined behavior.
There are implementations where it will crash, there are others where it
will be set t to the same value as y, just very slowly, but it is
undefined behavior.

Only on implementations where alignment requirement for a type
is not met.

No! It is undefined behaviour on any implementation.
The fact that it works and works the way you expect does
not make it defined behaviour
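
A minimal sketch of the distinction being argued here, assuming only the standard memcpy: moving the bytes of a double to and from an odd address through memcpy is defined, whereas dereferencing a misaligned double* is not. The helper names (store_double, load_double, roundtrip_misaligned) are made up for illustration and are not from the thread.

```c
#include <assert.h>
#include <string.h>

/* Store a double at an arbitrary, possibly misaligned, byte address. */
static void store_double(unsigned char *dst, double value)
{
    memcpy(dst, &value, sizeof value);
}

/* Load a double from an arbitrary, possibly misaligned, byte address.
   The bytes land in a properly aligned local before being read as a
   double, so no misaligned double access ever happens. */
static double load_double(const unsigned char *src)
{
    double value;
    memcpy(&value, src, sizeof value);
    return value;
}

/* Round-trip a value through offset 1 of a byte buffer. */
double roundtrip_misaligned(double y)
{
    unsigned char buf[2 * sizeof(double)];
    store_double(buf + 1, y);
    return load_double(buf + 1);
}

/* Hypothetical self-check, not part of the original discussion. */
void check_roundtrip(void)
{
    assert(roundtrip_misaligned(0.0) == 0.0);
}
```

The only object ever read as a double is the aligned local, which is why this version does not depend on how tolerant the hardware is.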
This is a basic thing for implementing memory allocators.
memcpy works in all cases because it is defined that char is
aligned on any address.
If that wouldn't be the case then no memory allocator can't be written
in C or C++ without causing undefined behavior.

Assuming you did not intend the double negative, wrong.

It is not clear if you mean

- a memory allocator cannot be written for C
(i.e. malloc cannot be written)

- a memory allocator cannot be written in C
(i.e. a C function, say my_malloc, cannot be written)

However in either case you are incorrect

(as an extreme case consider a memory allocator that
allocates a block of 1 megabyte of memory, suitably
aligned for anything no matter how much memory is
asked for. Ruinously inefficient, but it certainly
can be done.)
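
The extreme case above might look roughly like this. It is only a sketch under stated assumptions: the name my_malloc comes from the post, but BLOCK_SIZE, BLOCK_COUNT, block_t and the pool are invented here, there is no free, and the union only guarantees alignment for the member types it lists (in C11 one could add a max_align_t member instead).

```c
#include <assert.h>
#include <stddef.h>

#define BLOCK_SIZE  (1024u * 1024u)  /* 1 megabyte handed out per request */
#define BLOCK_COUNT 4u               /* assumed pool size for the sketch  */

/* The union makes every block aligned for each of the listed types. */
typedef union {
    long long     ll;
    long double   ld;
    void         *p;
    void        (*fp)(void);
    unsigned char bytes[BLOCK_SIZE];
} block_t;

static block_t pool[BLOCK_COUNT];
static size_t  next_block;           /* index of the next unused block */

/* Hand out one whole block regardless of how little was asked for. */
void *my_malloc(size_t size)
{
    if (size > BLOCK_SIZE || next_block >= BLOCK_COUNT)
        return NULL;
    return pool[next_block++].bytes;
}
```

Ruinously inefficient, as stated, but nothing in it relies on misaligned access or any other undefined behaviour.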

- William Hughes

Oct 26 '05 #46

Branimir Maksimovic wrote:

Michael Mair wrote:
Branimir Maksimovic wrote:
"Christian Bau" <ch***********@cbau.freeserve.co.uk> wrote in message
>Consider this:
>
> double x, y;
> memcpy (&x, &y, sizeof (x) - 1);
> memcpy (&x, &y, sizeof (x));
>

Nonsense. The first one will probably generate a hardware exception, and
the second one will probably work. Then again, the first one would be much
faster, as it is a simple no-op where sizeof(x) == 1, but the second one
would copy the contents.
The second one is guaranteed to work and have the same effect
as x = y, the first may lead to a trap representation of x but
can also work.

In case that sizeof(x) == 1 , I agree.
Are you sure that you are aware of the semantics of memcpy()?

Well, I don't need to, cause I don't use memcpy to assign variables.

Someone who is not aware of the semantics of memcpy but
makes pronouncements about its behaviour is properly called
a bullshitter.

- William Hughes.

Greetings, Bane.

Oct 26 '05 #47

Jordan Abel

On 2005-10-25, Keith Thompson <ks***@mib.org> wrote:

Jordan Abel <jm****@purdue.edu> writes:
On 2005-10-25, William Hughes <wp*******@hotmail.com> wrote:
peter koch wrote:
Christian Bau skrev:
>
> One would have proven no such thing.
>
> Consider this:
>
> double x, y;
> memcpy (&x, &y, sizeof (x) - 1);
> memcpy (&x, &y, sizeof (x));
>
[snip explanation that second memcpy might be faster]

Just a marvellous example you gave us - code that in C++ (and
C) causes undefined behaviour

What is the undefined behaviour (assume sizeof (x) >1)

for example you could end up with a trap representation in x.
say, a signalling nan of some kind. and in any case you're not
guaranteed anything useful about the value you might get

Even if the memcpy() stores a trap representation in x, there's no
undefined behavior until you try to read x as a double. The
quoted code doesn't do that.

BTW, please keep your text down to about 72 columns so it doesn't
overflow an 80-column screen when quoted. My newsreader lets me
reformat quoted text easily, but others might not.

Sorry about that - and, right - if you want to get pedantic about it
there's no undefined behavior invoked _here_ [except possibly for
reading from an uninitialized variable] - and indeed none at all if
you follow the memcpy with ((unsigned char *)&x)[(sizeof x)-1] =
((unsigned char *)&y)[(sizeof x)-1] or otherwise finish the job... I
just assumed you wouldn't have a double unless you intended to use
it as such.
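
Spelled out, "finishing the job" could look something like the sketch below. The function name copy_in_two_steps is invented for illustration; x is initialised only so that no uninitialised byte is ever involved, and it is not read as a double until every byte has been copied from y.

```c
#include <assert.h>
#include <string.h>

/* Copy y into x in two steps: all but the last byte, then the last byte. */
double copy_in_two_steps(double y)
{
    double x = 0.0;

    /* After this call x may hold any bit pattern, possibly a trap
       representation, so it must not be read as a double yet. */
    memcpy(&x, &y, sizeof x - 1);

    /* Access through unsigned char is always allowed; this copies the
       one remaining byte. */
    ((unsigned char *)&x)[sizeof x - 1] =
        ((const unsigned char *)&y)[sizeof x - 1];

    /* x now has the same object representation as y. */
    assert(memcmp(&x, &y, sizeof x) == 0);
    return x;
}
```

Reading x between the two steps through an unsigned char pointer would also be fine; it is only the read as a double that has to wait.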

Oct 26 '05 #48

Keith Thompson

Jordan Abel <jm****@purdue.edu> writes:
[snip]

Sorry about that - and, right - if you want to get pedantic about it
there's no undefined behavior invoked _here_ [except possibly for
reading from an uninitialized variable] - and indeed none at all if
you follow the memcpy with ((unsigned char *)&x)[(sizeof x)-1] =
((unsigned char *)&y)[(sizeof x)-1] or otherwise finish the job... I
just assumed you wouldn't have a double unless you intended to use
it as such.

Any time the term "undefined behavior" is used in a discussion, you
can assume that pedantry is appropriate.

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <*> <http://users.sdsc.edu/~kst>
We must do something. This is something. Therefore, we must do this.

Oct 26 '05 #49