Bytes | Software Development & Data Engineering Community

C performance

a) pre vs post increment/decrement

I have read somewhere that:

“Prefer pre-increment and -decrement to postfix operators. Postfix
operators (i++) copy the existing value to a temporary object, increment
the internal value, and then return the temporary. Prefix operators
(++i) increment the value and return a reference to it. With objects
such as iterators, creating temporary copies is expensive compared to
built-in ints.”

A modern compiler wouldn’t make the optimization, so

i++;

and

++i;

would give the same instructions?
b) I find that realloc() calls sometimes take more time to complete than
malloc() calls. Is this the general case?
c) Why do some people declare all the variables at the start of each
function? And I mean ALL variables, including those that are nested in
deep fors and ifs... I don’t see any obvious performance gains - unless
they do it to remember what they are using...

Nov 14 '05 #1
Papadopoulos Giannis wrote:
(snip)

c) Why do some people declare all the variables at the start of each
function? And I mean ALL variables, including those that are nested in
deep fors and ifs...


Could it be possible that some people don't know that they can declare
variables at the start of each *block* ?-)

Bruno

Nov 14 '05 #2
Bruno Desthuilliers wrote:
Papadopoulos Giannis wrote:
(snip)

c) Why do some people declare all the variables at the start of each
function? And I mean ALL variables, including those that are nested in
deep fors and ifs...

Could it be possible that some people don't know that they can declare
variables at the start of each *block* ?-)

Bruno

Maybe.. But I find it often and I wonder...

Nov 14 '05 #3
In article <bv***********@ulysses.noc.ntua.gr>,
Papadopoulos Giannis <ip******@inf.uth.gr> wrote:
a) pre vs post increment/decrement

I have read somewhere that:

“Prefer pre-increment and -decrement to postfix operators. Postfix
operators (i++) copy the existing value to a temporary object, increment
the internal value, and then return the temporary. Prefix operators
(++i) increment the value and return a reference to it. With objects
such as iterators, creating temporary copies is expensive compared to
built-in ints.”
I bet you didn't read that in a book about C.
b) I find that realloc() calls sometimes take more time to complete than
malloc() calls. Is this the general case?
c) Why do some people declare all the variables at the start of each
function? And I mean ALL variables, including those that are nested in
deep fors and ifs... I don’t see any obvious performance gains - unless
they do it to remember what they are using...


You worry too much about performance, and you worry too much about the
wrong kind of performance. First try to write code that is bug-free and
readable. That is the most important thing.

If there is need to make your code faster: First measure. Get yourself a
profiler, learn how to use it, learn how to interpret the numbers. Then
before trying to figure out how to make an operation faster that you do
a million times, figure out how to do it only 100,000 times or 1000
times. That's how you make a program fast.
Nov 14 '05 #4
Papadopoulos Giannis wrote:
a) pre vs post increment/decrement

I have read somewhere that:

“Prefer pre-increment and -decrement to postfix operators. Postfix
operators (i++) copy the existing value to a temporary object,
increment the internal value, and then return the temporary.
Prefix operators (++i) increment the value
and return a reference to it. With objects such as iterators,
creating temporary copies is expensive compared to built-in ints.”
This must be a reference to
overloaded increment and decrement operators in C++.

Favoring pre-decrement/increment over post decrement/increment
operators is a good habit for C programmers who must also
write C++ programs. Otherwise, it is a matter of style.
A modern compiler wouldn’t make the optimization, so

i++;

and

++i;

would give the same instructions?
Yes.
b) I find that realloc() calls sometimes take more time to complete
than malloc() calls. Is this the general case?
Typically, the difference is hard to measure except in contrived cases.
c) Why do some people declare all the variables
at the start of each function? And I mean *all* variables
including those that are nested in deep for's and ifs...
I don’t see any obvious performance gains -
unless they do it to remember what they are using...


1. Some C programs are translations of Fortran programs.
2. Some C programs are written by Fortran programmers.
3. Early versions of C (before C 89) required this
according to Brian W. Kernighan and Dennis M. Ritchie
in "The C Programming Language",
Chapter 1: A Tutorial Introduction,
Section 2: Variables and Arithmetic, page 8:
"In C, /all/ variables must be declared before use, usually
at the beginning of a function before any executable statements."

Nov 14 '05 #5
On Thu, 29 Jan 2004 23:45:25 +0200, in comp.lang.c , Papadopoulos
Giannis <ip******@inf.uth.gr> wrote:
a) pre vs post increment/decrement

I have read somewhere that:

“Prefer pre-increment and -decrement to postfix operators.
Yes, this is an old chestnut. I can find nothing to support it
nowadays, tho ICBW.
i++;
and
++i;
would give the same instructions?
AFAIK yes. Try it and see.
b) I find that realloc() calls sometimes take more time to complete than
malloc() calls. Is this the general case?
The standard doesn't say.
c) Why do some people declare all the variables at the start of each
function?
Until C99, you pretty much had to do it like that. Plus many people
consider it a good idea to keep your declarations in one place for
easier reference. Spraying declarations around through the body of
your code makes it a lot harder to follow.
And I mean ALL variables, including those that are nested in
deep fors and ifs...
I agree, this is often a bad idea.
I don’t see any obvious performance gains
Again, C doesn't say.
unless they do it to remember what they are using...


Which may be a performance gain in itself of course

--
Mark McIntyre
CLC FAQ <http://www.eskimo.com/~scs/C-faq/top.html>
CLC readme: <http://www.angelfire.com/ms3/bchambless0/welcome_to_clc.html>
----== Posted via Newsfeed.Com - Unlimited-Uncensored-Secure Usenet News==----
http://www.newsfeed.com The #1 Newsgroup Service in the World! >100,000 Newsgroups
---= 19 East/West-Coast Specialized Servers - Total Privacy via Encryption =---
Nov 14 '05 #6


Christian Bau wrote:
In article <bv***********@ulysses.noc.ntua.gr>,
Papadopoulos Giannis <ip******@inf.uth.gr> wrote:

a) pre vs post increment/decrement

I have read somewhere that:

“Prefer pre-increment and -decrement to postfix operators. Postfix
operators (i++) copy the existing value to a temporary object, increment
the internal value, and then return the temporary. Prefix operators
(++i) increment the value and return a reference to it. With objects
such as iterators, creating temporary copies is expensive compared to
built-in ints.”

I bet you didn't read that in a book about C.


This actually depends on the underlying chip architecture,
which is probably off-topic here.

b) I find that realloc() calls sometimes take more time to complete than
malloc() calls. Is this the general case?
c) Why do some people declare all the variables at the start of each
function? And I mean ALL variables, including those that are nested in
deep fors and ifs... I don’t see any obvious performance gains - unless
they do it to remember what they are using...



You worry too much about performance, and you worry too much about the
wrong kind of performance. First try to write code that is bug-free and
readable. That is the most important thing.


AMEN!

If there is need to make your code faster: First measure. Get yourself a ^^^^
The key word is "need". What are the performance requirements? You
mean you didn't get any from the customer? Shame on you! This will
tell you how fast if MUST be in order to be acceptable.
profiler, learn how to use it, learn how to interpret the numbers. Then
before trying to figure out how to make an operation faster that you do
a million times, figure out how to do it only 100,000 times or 1000
times. That's how you make a program fast.


Agreed 1,000%! However, often after the developer has already
written code which uses foo() hundreds of thousands of times,
there is usually an emotional unwillingness to admit that a better
algorithm would do the trick and then they try to optimize foo(), even
if it's a standard library call. Start with requirements, as above,
design your algorithms, prototype and measure those which are going
to be invoked most often in order to find out if there will be
a problem. It's not just the efficiency (or lack thereof) in any
module, it's cpu-cost times frequency of use.

This holds for any language, not just C.
Sheesh, this IS [OT].

--
"It is impossible to make anything foolproof because fools are so
ingenious" - A. Bloch

Nov 14 '05 #7


E. Robert Tisdale wrote:
Papadopoulos Giannis wrote:
a) pre vs post increment/decrement

I have read somewhere that:

“Prefer pre-increment and -decrement to postfix operators. Postfix
operators (i++) copy the existing value to a temporary object,
increment the internal value, and then return the temporary.
Prefix operators (++i) increment the value
and return a reference to it. With objects such as iterators,
creating temporary copies is expensive compared to built-in ints.”



This must be a reference to
overloaded increment and decrement operators in C++.

Favoring pre-decrement/increment over post decrement/increment
operators is a good habit for C programmers who must also
write C++ programs. Otherwise, it is a matter of style.
A modern compiler wouldn’t make the optimization, so

i++;

and

++i;

would give the same instructions?



Yes.

For this trivial example, yes.

For the case of j = i++; vs. j = ++i;
(which are very different in intent), the
emitted code SHOULD be different,
unless you have a broken compiler.

The efficiency of the constructs to implement
these is a function of the underlying chip
architecture and not a language issue.
Nov 14 '05 #8
Mark McIntyre wrote:
Papadopoulos Giannis <ip******@inf.uth.gr> wrote:
.... snip ...
b) I find that realloc() calls sometimes take more time to
complete than malloc() calls. Is this the general case?
The standard doesn't say.


However, if you think about typical implementations, _sometimes_
it is necessary to allocate a whole new block of memory and copy
old data over to it. It will normally take longer to copy than to
not copy.
c) Why do some people declare all the variables at the start
of each function?


Until C99, you pretty much had to do it like that. Plus many people
consider it a good idea to keep your declarations in one place for
easier reference. Spraying declarations around through the body of
your code makes it a lot harder to follow.


IMO if those declarations are getting awkwardly far away from the
place they are used, you are writing overly large functions in the
first place.

--
Chuck F (cb********@yahoo.com) (cb********@worldnet.att.net)
Available for consulting/temporary embedded and systems.
<http://cbfalconer.home.att.net> USE worldnet address!
Nov 14 '05 #9
CBFalconer wrote:
Mark McIntyre wrote:
Until C99, you pretty much had to do it like that. Plus many people
consider it a good idea to keep your declarations in one place for
easier reference. Spraying declarations around through the body of
your code makes it a lot harder to follow.


IMO if those declarations are getting awkwardly far away
from the place they are used,
you are writing overly large functions in the first place.


I agree.
And moving the declarations closer to the point of first use
is the first step in decomposing the function
into a set of smaller functions
that the compiler can inline automatically.

Nov 14 '05 #10
Papadopoulos Giannis wrote:

a) pre vs post increment/decrement

I have read somewhere that:

“Prefer pre-increment and -decrement to postfix operators. Postfix
operators (i++) copy the existing value to a temporary object,
increment the internal value, and then return the temporary.


That's all wrong.
There is no order of operation vs evaluation implied in i++.
Any code which depends on such an order, has a problem.

These loops are semantically equal:
while (i++ != 5) {/*code*/}
while (++i != 6) {/*code*/}

--
pete
Nov 14 '05 #11
Papadopoulos Giannis <ip******@inf.uth.gr> wrote:
A modern compiler wouldn’t make the optimization, so

i++;

and

++i;

would give the same instructions?
Not only will the compiler give you the same instructions -- even if
it didn't the underlying CPU would execute all trivially equivalent
re-expressions of said expression with identical performance. (inc
eax; add eax, 1; sub eax, -1 -- its all the same.)
b) I find that realloc() calls sometimes take more time to complete than
malloc() calls. Is this the general case?
realloc() may have to perform a memcpy(). In general, actually you
should find that realloc() is *MUCH* slower than malloc().
c) Why do some people declare all the variables at the start of each
function? And I mean ALL variables, including those that are nested in
deep fors and ifs... I don’t see any obvious performance gains - unless
they do it to remember what they are using...


There is no performance difference, whatsoever. I also would point
out that I actually try to put any variable declaration that isn't
reused into the deepest possible scope where it can be declared. This
gives the compiler an opportunity to alias variables (even of
different types) while helping catch errors of using dead variables
out of scope.

--
Paul Hsieh
http://www.pobox.com/~qed/
http://bstring.sf.net/
Nov 14 '05 #12
In article <79**************************@posting.google.com >,
qe*@pobox.com (Paul Hsieh) wrote:
realloc() may have to perform a memcpy(). In general, actually you
should find that realloc() is *MUCH* slower than malloc().


Of course it has to do more things. If you tried to do the same things
as realloc by hand (malloc + memcpy + free + all kinds of checks), then
most likely that would end up slower.
Nov 14 '05 #13

"Paul Hsieh" <qe*@pobox.com> wrote in message
news:79**************************@posting.google.c om...
Papadopoulos Giannis <ip******@inf.uth.gr> wrote:
A modern compiler wouldn’t make the optimization, so

i++;

and

++i;

would give the same instructions?


Not only will the compiler give you the same instructions -- even if
it didn't the underlying CPU would execute all trivially equivalent
re-expressions of said expression with identical performance. (inc
eax; add eax, 1; sub eax, -1 -- its all the same.)

No, these instructions don't perform the same, on common platforms which
spell them this way. All the more reason for using a compiler which can
take portable C and choose the best instruction for the target architecture.
The OP assertion, that ++i could be more efficient (when used in subscript
context), was true of gcc on the Mac I once had.
Nov 14 '05 #14

"Papadopoulos Giannis" <ip******@inf.uth.gr> wrote in message
a)
“Prefer pre-increment and -decrement to postfix operators.
You're mixing up C++ overloaded operators with the C types. In C++,
overloading the postincrement operator does indeed force the compiler to
make a temporary copy. In C the machine instructions are identical, but the
convention is to use postfix form where the order of evaluation doesn't
matter.
b) I find that realloc() calls sometimes take more time to complete than
malloc() calls. Is this the general case?
Yes, because realloc() has to copy the reallocated block. However this isn't
always the case, because some libraries set newly-allocated memory to a
fixed value, which takes as much time as copying.
c) Why do some people declare all the variables at the start of each
function?

This is because we already have two levels of scope - global scope and file
scope. Function scopes adds another layer. Adding a fourth, or
multiply-nested block scopes, moves the number of levels beyond what a human
programmer can reasonably be expected to cope with. There is of course no
problem for the computer - it's a human-to-human thing.


Nov 14 '05 #15
"Tim Prince" <tp*****@computer.org> wrote:
"Paul Hsieh" <qe*@pobox.com> wrote:
Papadopoulos Giannis <ip******@inf.uth.gr> wrote:
A modern compiler wouldn’t make the optimization, so

i++;

and

++i;

would give the same instructions?
Not only will the compiler give you the same instructions -- even if
it didn't the underlying CPU would execute all trivially equivalent
re-expressions of said expression with identical performance. (inc
eax; add eax, 1; sub eax, -1 -- its all the same.)


No, these instructions don't perform the same, on common platforms which
spell them this way.


Yes they do. The only difference is inc doesn't create a dependency
on the carry flag. Otherwise they all take a third, quarter or half a
clock depending on which brand of x86 processor is executing them.
This is true of the Intel 80486, and every 486 or better class x86
architecture that has followed it, from whichever vendor.
[...] All the more reason for using a compiler which can
take portable C and choose the best instruction for the target architecture.
The OP assertion, that ++i could be more efficient (when used in subscript
context), was true of gcc on the Mac I once had.


Well ok, then the Mac port of the gcc compiler sucks ass -- but it
also means that the underlying PPC must be somewhat weak not to make
this irrelevant, which I am not sure I believe.

--
Paul Hsieh
http://www.pobox.com/~qed/
http://bstring.sf.net/
Nov 14 '05 #16
>> "Paul Hsieh" <qe*@pobox.com> wrote:
(inc eax; add eax, 1; sub eax, -1 -- its all the same.)
"Tim Prince" <tp*****@computer.org> wrote:
No, these instructions don't perform the same, on common platforms which
spell them this way.

In article <news:79**************************@posting.google. com>
Paul Hsieh <qe*@pobox.com> writes:
Yes they do. The only difference is inc doesn't create a dependency
on the carry flag. Otherwise they all take a third, quarter or half a
clock depending on which brand of x86 processor is executing them.
I will believe you on the cycle count (my x86 documentation is not
handy anyway :-) ), but they differ in another way that *could*
matter in terms of performance: "inc eax" is a single byte opcode
(0x40), "add eax, 1" is a five-byte sequence (05, followed by the four
bytes representing 1), and "sub eax, -1" is also a five-byte sequence
(2D, followed by the four bytes representing -1). Thus, the
first variant saves code space. Depending on instruction cache
usage, this could affect the performance of some loop. (It seems
a bit unlikely.)
[...] All the more reason for using a compiler which can
take portable C and choose the best instruction for the target architecture.
The OP assertion, that ++i could be more efficient (when used in subscript
context), was true of gcc on the Mac I once had.

Well ok, then the Mac port of the gcc compiler sucks ass -- but it
also means that the underlying PPC must be somewhat weak not to make
this irrelevant, which I am not sure I believe.


I think you are making unwarranted assumptions here, such as which
version of gcc was involved, and whether the target CPU was the
PowerPC or the older Mac CPU family, the 680x0. It is, however,
true that the PowerPC has an unusual instruction set, with instructions
like "bdnzf" (decrement count and branch if comparison result not
equal and count not zero) and "rlwinm" (rotate and mask), and
apparently many versions of gcc do not use it very effectively.

About all that can be said with any certainty, when it comes to
actual run time of various C source code constructs, is that "it
depends". :-)
--
In-Real-Life: Chris Torek, Wind River Systems
Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W) +1 801 277 2603
email: forget about it http://web.torek.net/torek/index.html
Reading email is like searching for food in the garbage, thanks to spammers.
Nov 14 '05 #17
Chris Torek wrote:
.... snip ...
I think you are making unwarranted assumptions here, such as which
version of gcc was involved, and whether the target CPU was the
PowerPC or the older Mac CPU family, the 680x0. It is, however,
true that the PowerPC has an unusual instruction set, with instructions
like "bdnzf" (decrement count and branch if comparison result not
equal and count not zero) and "rlwinm" (rotate and mask), and
apparently many versions of gcc do not use it very effectively.

About all that can be said with any certainty, when it comes to
actual run time of various C source code constructs, is that "it
depends". :-)


I consider that a prerequisite for building an efficient code
generator is a good assembly language programmer for that
machine. The next requirement is a good register allocation
scheme. These days things are complicated by pipelining and the
need to know jump probabilities.

--
Chuck F (cb********@yahoo.com) (cb********@worldnet.att.net)
Available for consulting/temporary embedded and systems.
<http://cbfalconer.home.att.net> USE worldnet address!

Nov 14 '05 #18
Chris Torek <no****@torek.net> wrote:
"Paul Hsieh" <qe*@pobox.com> wrote:
(inc eax; add eax, 1; sub eax, -1 -- its all the same.)
"Tim Prince" <tp*****@computer.org> wrote:
No, these instructions don't perform the same, on common platforms which
spell them this way.
In article <news:79**************************@posting.google. com>
Paul Hsieh <qe*@pobox.com> writes:
Yes they do. The only difference is inc doesn't create a dependency
on the carry flag. Otherwise they all take a third, quarter or half a
clock depending on which brand of x86 processor is executing them.


I will believe you on the cycle count (my x86 documentation is not
handy anyway :-) ), but they differ in another way that *could*
matter in terms of performance: "inc eax" is a single byte opcode
(0x40), "add eax, 1" is a five-byte sequence (05, followed by the four
bytes representing 1),


There is a short mode of encoding that allows a byte immediate rather
than a dword immediate (083h 0C0h) making for a total of 3 bytes for the
instruction. Same with sub. Since all x86s can consume 16-bytes
worth per clock in the instruction fetch, this is fine. I would
expect all x86 compilers and assemblers to see this.
Well ok, then the Mac port of the gcc compiler sucks ass -- but it
also means that the underlying PPC must be somewhat weak not to make
this irrelevant, which I am not sure I believe.


I think you are making unwarranted assumptions here, such as which
version of gcc was involved, and whether the target CPU was the
PowerPC or the older Mac CPU family, the 680x0.


The 680x0 is kind of obsolete on the Mac. Not quite as obsolete as
the 386 (where there is a difference between inc and add), but just
about. Modern PowerPCs are out of order and fairly wide executers, so
using special instructions isn't going to really help.

Microprocessor architectures have all been moving forward in a
particular way -- all that matters is the length of the long
dependency chain, not the vague slight differences in expressing each
node on that chain. It doesn't make sense to chase performance in
this way any more.

--
Paul Hsieh
http://www.pobox.com/~qed/
http://bstring.sf.net/
Nov 14 '05 #19
In article <79**************************@posting.google.com >,
qe*@pobox.com (Paul Hsieh) wrote:
"Tim Prince" <tp*****@computer.org> wrote:
[...] All the more reason for using a compiler which can
take portable C and choose the best instruction for the target architecture.
The OP assertion, that ++i could be more efficient (when used in subscript
context), was true of gcc on the Mac I once had.


Well ok, then the Mac port of the gcc compiler sucks ass -- but it
also means that the underlying PPC must be somewhat weak not to make
this irrelevant, which I am not sure I believe.


Not for ++i vs. i++, but for *++p vs. *p++ it made a difference: The
PowerPC has a "load with update" instruction: It calculates an address,
loads the data at that address, and writes the address into a register,
all in one instruction. This makes it easy to implement x = *++p; in one
instruction: Calculate p + 1, load x from address p + 1, store p + 1
into p all in one instruction. x = *p++ needs two instructions instead.
Nov 14 '05 #20
On 1 Feb 2004 06:12:21 -0800, in comp.lang.c , qe*@pobox.com (Paul
Hsieh) wrote:
Chris Torek <no****@torek.net> wrote:
>> "Paul Hsieh" <qe*@pobox.com> wrote:

"Tim Prince" <tp*****@computer.org> wrote:


(misc stuff about x86 assembler, etc. )

Remind me what the hell this has to do with C?
--
Mark McIntyre
CLC FAQ <http://www.eskimo.com/~scs/C-faq/top.html>
CLC readme: <http://www.angelfire.com/ms3/bchambless0/welcome_to_clc.html>
Nov 14 '05 #21
Christian Bau wrote:

In article <79**************************@posting.google.com >,
qe*@pobox.com (Paul Hsieh) wrote:
"Tim Prince" <tp*****@computer.org> wrote:
[...] All the more reason for using a compiler which can
take portable C and choose the best instruction for the target architecture.
The OP assertion, that ++i could be more efficient (when used in subscript
context), was true of gcc on the Mac I once had.


Well ok, then the Mac port of the gcc compiler sucks ass -- but it
also means that the underlying PPC must be somewhat weak not to make
this irrelevant, which I am not sure I believe.


Not for ++i vs. i++, but for *++p vs. *p++ it made a difference:


*++p is semantically different from *p++

There's no way that the evaluation of those two expressions in code,
could generate the same machine instructions in translation.

--
pete
Nov 14 '05 #22
"pete" <pf*****@mindspring.com> wrote in message
news:40***********@mindspring.com...
Christian Bau wrote:

In article <79**************************@posting.google.com >,
qe*@pobox.com (Paul Hsieh) wrote:
"Tim Prince" <tp*****@computer.org> wrote:
> [...] All the more reason for using a compiler which can
> take portable C and choose the best instruction for the target architecture.
> The OP assertion, that ++i could be more efficient (when used in subscript
> context), was true of gcc on the Mac I once had.

Well ok, then the Mac port of the gcc compiler sucks ass -- but it
also means that the underlying PPC must be somewhat weak not to make
this irrelevant, which I am not sure I believe.


Not for ++i vs. i++, but for *++p vs. *p++ it made a difference:


*++p is semantically different from *p++

There's no way that the evaluation of those two expressions in code,
could generate the same machine instructions in translation.


I think what Christian Bau is talking about is the difference between the
following...

char *strcpy(char *s, const char *t)
{
char *p = s;
while (*p++ = *t++);
return s;
}

char *strcpy(char *s, const char *t)
{
char *p = s; /* more usual is: char *p = s - 1; */
if (*p = *t) /* t--; */
while (*++p = *++t);
return s;
}

These two are semantically the same (hopefully! ;-), but you could (and
will) observe different optimisations from compilers targetting different
architectures. The top version targets 680x0, the bottom targets Power PC.
The former has fast post-increment, the latter has fast pre-increment.

Of course, you could argue that I should get a better optimising compiler
when needed, but these are quite simple cases. More difficult challenges for
a compiler are not too hard to come up with.

--
Peter
Nov 14 '05 #23
Peter Nilsson wrote:

"pete" <pf*****@mindspring.com> wrote in message
news:40***********@mindspring.com...
Christian Bau wrote:

In article <79**************************@posting.google.com >,
qe*@pobox.com (Paul Hsieh) wrote:

> "Tim Prince" <tp*****@computer.org> wrote:
> > [...] All the more reason for using a compiler which can
> > take portable C and choose the best instruction for the target architecture.
> > The OP assertion, that ++i could be more efficient (when used in subscript
> > context), was true of gcc on the Mac I once had.
>
> Well ok, then the Mac port of the gcc compiler sucks ass -- but it
> also means that the underlying PPC must be somewhat weak not to make
> this irrelevant, which I am not sure I believe.

Not for ++i vs. i++, but for *++p vs. *p++ it made a difference:


*++p is semantically different from *p++

There's no way that the evaluation of those two expressions in code,
could generate the same machine instructions in translation.


I think what Christian Bau is talking about is the difference between the
following...

char *strcpy(char *s, const char *t)
{
char *p = s;
while (*p++ = *t++);
return s;
}

char *strcpy(char *s, const char *t)
{
char *p = s; /* more usual is: char *p = s - 1; */
if (*p = *t) /* t--; */
while (*++p = *++t);
return s;
}

These two are semantically the same (hopefully! ;-),


The functions are the same, but I think that it
would be asking a lot from a compiler, to see that.
If the values of p and t were supposed to be meaningful
after the loop, it would be different.
The loop semantics are not the same.
When t points to a zero length string,
the top version will increment and the bottom one won't.

In this version of strncpy, the bottom loop is my prefered
method of writing a loop that will execute as many times
as the initial value of n, when I'm not counting clock ticks.
But after the top loop executes,
I need n to represent the number of times still left to go,
so I can't write while(n-- && *s2 != '\0'), there.

char *strncpy(char *s1, const char *s2, size_t n)
{
char *const p1 = s1;

while (n != 0 && *s2 != '\0') {
*s1++ = *s2++;
--n;
}
while (n--) {
*s1++ = '\0';
}
return p1;
}

--
pete
Nov 14 '05 #24
In article <40***********@mindspring.com>,
pete <pf*****@mindspring.com> wrote:
In this version of strncpy, the bottom loop is my prefered
method of writing a loop that will execute as many times
as the initial value of n, when I'm not counting clock ticks.
But after the top loop executes,
I need n to represent the number of times still left to go,
so I can't write while(n-- && *s2 != '\0'), there.

char *strncpy(char *s1, const char *s2, size_t n)
{
char *const p1 = s1;

while (n != 0 && *s2 != '\0') {
*s1++ = *s2++;
--n;
}
while (n--) {
*s1++ = '\0';
}
return p1;
}


The second loop would be an example of making your code unreadable in
the hope of saving a few nanoseconds (without success, for many
compilers). Why not

for (; n > 0; --n)
*s1++ = '\0';
Nov 14 '05 #25
Christian Bau wrote:

In article <40***********@mindspring.com>,
pete <pf*****@mindspring.com> wrote:
In this version of strncpy, the bottom loop is my prefered
method of writing a loop that will execute as many times
as the initial value of n, when I'm not counting clock ticks.
But after the top loop executes,
I need n to represent the number of times still left to go,
so I can't write while(n-- && *s2 != '\0'), there.

char *strncpy(char *s1, const char *s2, size_t n)
{
char *const p1 = s1;

while (n != 0 && *s2 != '\0') {
*s1++ = *s2++;
--n;
}
while (n--) {
*s1++ = '\0';
}
return p1;
}


The second loop would be an example of making your code unreadable
in the hope of saving a few nanoseconds


How do you figure there's a hope of saving a few nanoseconds ?

while (n--){;}
is easy for me to recognize
as a loop that's supposed to execute n times.
That's why I like it.

--
pete
Nov 14 '05 #26


pete wrote:
Christian Bau wrote:

In article <40***********@mindspring.com>,
pete <pf*****@mindspring.com> wrote:
In this version of strncpy, the bottom loop is my prefered
method of writing a loop that will execute as many times
as the initial value of n, when I'm not counting clock ticks.
But after the top loop executes,
I need n to represent the number of times still left to go,
so I can't write while(n-- && *s2 != '\0'), there.

char *strncpy(char *s1, const char *s2, size_t n)
{
    char *const p1 = s1;

    while (n != 0 && *s2 != '\0') {
        *s1++ = *s2++;
        --n;
    }
    while (n--) {
        *s1++ = '\0';
    }
    return p1;
}

The second loop would be an example of making your code unreadable
in the hope of saving a few nanoseconds


How do you figure there's a hope of saving a few nanoseconds ?

while (n--){;}
is easy for me to recognize
as a loop that's supposed to execute n times.
That's why I like it.

Unless there's a compelling reason to null out the remainder of s1,
a single *s1 = '\0' would suffice to null-terminate the string,
instead of the while (n--) loop?

I'm not sure what the spec for strncpy() says regarding whether a single
null is acceptable or whether the remainder of the string should be nulled out.

Nick L.

BTW - I also like the while (n--) construct, but that's because
in m680x0 assembler it was implemented as a single instruction
more or less.


Nov 14 '05 #27
"Nick" <ni***********@excite.com> wrote in message
news:tE*********************@bgtnsc04-news.ops.worldnet.att.net...
....
char *strncpy(char *s1, const char *s2, size_t n)
{ <snip>

Unless there's a compelling reason to null out the remainder of s1,
a single *s1 = '\0' would suffice to null terminate the string,


The 'compelling reason' is supplied by both standards' specification of
strncpy.

--
Peter
Nov 14 '05 #28
Christian Bau wrote:

In article <40***********@mindspring.com>,
pete <pf*****@mindspring.com> wrote:
In this version of strncpy, the bottom loop is my preferred
method of writing a loop that will execute as many times
as the initial value of n, when I'm not counting clock ticks.
But after the top loop executes,
I need n to represent the number of times still left to go,
so I can't write while(n-- && *s2 != '\0'), there.

char *strncpy(char *s1, const char *s2, size_t n)
{ <snip>
    while (n--) {
        *s1++ = '\0';
    }
The second loop would be an example of making your code
unreadable in the hope of saving a few nanoseconds
(without success, for many compilers). Why not

for (; n > 0; --n)
*s1++ = '\0';


As a general rule, I don't like using relational operators
to compare size_t objects against zero constants.

The only reason that I write library functions in C code,
is so that I can post examples to this newsgroup,
without having to explain what they're supposed to do.

--
pete
Nov 14 '05 #29