Mystery: static variables & performance

I've encountered a troublesome inconsistency in the C-language Perl
extension I've written for CPAN (Digest::SHA). The problem involves the
use of a static array within a performance-critical transform function.
When compiling under gcc on my big-endian PowerPC (Mac OS X),
declaring this array as "static" DECREASES the transform throughput by
around 5%. However, declaring it as "static" on gcc/Linux/Intel
INCREASES the throughput by almost 30%.

I would prefer that the array not be "static" so that the underlying C
function will be thread-safe. However, giving up close to 30%
performance on gcc/Linux/Intel is unacceptable for a digest routine,
whose value is often closely tied to speed.

Can anyone enlighten me on this mystery, and recommend a simple, clean,
portable way to assure good performance on all host types?

TIA, Mark
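
One possible way to keep both options open, offered only as a sketch and not as the Digest::SHA code: choose the storage class of the work array at compile time, so each platform can be built with whichever variant measures faster. The macro names `TRANSFORM_STATIC_W` and `W_STORAGE`, and the function `toy_transform`, are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical build switch: compile with -DTRANSFORM_STATIC_W to make the
   work array static (not thread-safe); leave it undefined for an automatic,
   per-call array (thread-safe). */
#ifdef TRANSFORM_STATIC_W
#define W_STORAGE static
#else
#define W_STORAGE
#endif

/* Toy stand-in for a digest transform. This is NOT the Digest::SHA code. */
uint32_t toy_transform(const unsigned char *block)
{
    W_STORAGE uint32_t W[16];
    uint32_t acc = 0;
    size_t i;

    assert(block != NULL);

    /* Load the 64-byte block as sixteen big-endian 32-bit words. */
    for (i = 0; i < 16; i++) {
        W[i] = (uint32_t)block[4 * i] << 24
             | (uint32_t)block[4 * i + 1] << 16
             | (uint32_t)block[4 * i + 2] << 8
             | (uint32_t)block[4 * i + 3];
    }

    /* Rotate-and-xor mix; the result is the same for either storage class. */
    for (i = 0; i < 16; i++) {
        acc = ((acc << 5) | (acc >> 27)) ^ W[i];
    }
    return acc;
}
```

Only the storage class changes between the two builds, so the output is identical; which build is faster on a given host would still have to be measured there.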

Nov 14 '05
Richard Heathfield wrote:
R. Rajesh Jeba Anbiah wrote:
nrk <ra*********@devnull.verizon.net> wrote in message
<snip>
PS: Your book, section 4.2: The WAR style example of strcmp
is atrocious IMO. If you must insist on a single return,
here's a clearer version of strcmp:

int strcmp(const char *s1, const char *s2) {
    while ( *s1 == *s2 && *s1 )
        ++s1, ++s2;

    return *s1 - *s2;
}


Thanks a lot for your interest in the quality of the book. As
you see, it has its bug reporting corner; please don't take
c.l.c as the one. Thanks for your help; thanks for your
understanding.


Since he /did/ post his "clearer version" to comp.lang.c, you
should at least get some feedback as to what is wrong with his
correction. Do you see the flaw? If not, then how can you do
quality control on the bug reports you receive?

comp.lang.c is good at this sort of thing.

(Hint: the problem I can see has nothing to do with the loop.)


Nobody has picked up on the real problem when sizeof char == 1,
which is in the final comparison and can overflow. It should be:

return (*s1 > *s2) - (*s1 < *s2);

or the equivalent.
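
A minimal sketch of why the relational form is safe where subtraction is not, shown on plain `int` values rather than characters. The helper names `three_way`, `diff_would_overflow`, and `three_way_checks` are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <limits.h>

/* Three-way comparison that cannot overflow: each relational operator
   yields 0 or 1, so the result is always -1, 0, or 1. */
int three_way(int a, int b)
{
    return (a > b) - (a < b);
}

/* Returns 1 if the mathematical value of a - b does not fit in an int. */
int diff_would_overflow(int a, int b)
{
    return (b < 0) ? (a > INT_MAX + b) : (a < INT_MIN + b);
}

/* Illustrative checks (hypothetical helper). */
void three_way_checks(void)
{
    assert(diff_would_overflow(INT_MIN, 1));
    assert(!diff_would_overflow(5, 3));
    assert(three_way(INT_MIN, INT_MAX) == -1);
    assert(three_way(7, 7) == 0);
}
```

For character operands the subtraction can only overflow on implementations where the promoted values can span the whole range of `int`, which is the unusual case being debated in this thread.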

--
Chuck F (cb********@yahoo.com) (cb********@worldnet.att.net)
Available for consulting/temporary embedded and systems.
<http://cbfalconer.home.att.net> USE worldnet address!

Nov 14 '05 #101
nrk wrote:
Richard Heathfield wrote:
R. Rajesh Jeba Anbiah wrote:
nrk <ra*********@devnull.verizon.net> wrote in message
news:<Wi*******************@nwrddc01.gnilink.net>. ..

<snip>
PS: Your book, section 4.2: The WAR style example of strcmp is
atrocious
IMO. If you must insist on a single return, here's a clearer version
of strcmp:

int strcmp(const char *s1, const char *s2) {
    while ( *s1 == *s2 && *s1 )
        ++s1, ++s2;

    return *s1 - *s2;
}

Thanks a lot for your interest in the quality of the book. As you
see, it has its bug reporting corner; please don't take c.l.c as the
one. Thanks for your help; thanks for your understanding.


Since he /did/ post his "clearer version" to comp.lang.c, you should at
least get some feedback as to what is wrong with his correction. Do you
see the flaw? If not, then how can you do quality control on the bug
reports you receive?

comp.lang.c is good at this sort of thing.

(Hint: the problem I can see has nothing to do with the loop.)


Ok, here's my take on this:

a) sizeof(int) > 1 for hosted implementations.
http://www.google.com/groups?selm=bu...unnews.cern.ch
So, integer overflow not an issue, yes?

b) Peter's concern still remains. So, does changing the last line to:

return *(unsigned char *)s1 - *(unsigned char *)s2;

make it alright?


Chuck's solution is a good one (although I'm not sure why he insists that
it's only relevant when sizeof(char) is 1, since sizeof(char) is /always/
1!).

But my point was more general than that; that is, comp.lang.c is very good
at chewing over proposed "corrections", making sure that they do actually
improve the code. Corrections you receive in email will not have undergone
that process of review, so how do you know whether you've spotted all the
problems therein?

--
Richard Heathfield : bi****@eton.powernet.co.uk
"Usenet is a strange place." - Dennis M Ritchie, 29 July 1999.
C FAQ: http://www.eskimo.com/~scs/C-faq/top.html
K&R answers, C books, etc: http://users.powernet.co.uk/eton
Nov 14 '05 #102
Richard Heathfield wrote:
.... snip ...
Chuck's solution is a good one (although I'm not sure why he
insists that it's only relevant when sizeof(char) is 1, since
sizeof(char) is /always/ 1!).


Oh very well, said Tom sheepishly. I meant int. Bah humbug.

--
Chuck F (cb********@yahoo.com) (cb********@worldnet.att.net)
Available for consulting/temporary embedded and systems.
<http://cbfalconer.home.att.net> USE worldnet address!
Nov 14 '05 #103
"nrk" <ra*********@devnull.verizon.net> wrote in message
news:yV******************@nwrddc01.gnilink.net...
Richard Heathfield wrote:
R. Rajesh Jeba Anbiah wrote:
nrk <ra*********@devnull.verizon.net> wrote in message
news:<Wi*******************@nwrddc01.gnilink.net>. ..
<snip>
PS: Your book, section 4.2: The WAR style example of strcmp is
atrocious
IMO. If you must insist on a single return, here's a clearer version of strcmp:

int strcmp(const char *s1, const char *s2) {
    while ( *s1 == *s2 && *s1 )
        ++s1, ++s2;

    return *s1 - *s2;
}

Thanks a lot for your interest in the quality of the book. As you
see, it has its bug reporting corner; please don't take c.l.c as the
one. Thanks for your help; thanks for your understanding.


Since he /did/ post his "clearer version" to comp.lang.c, you should at
least get some feedback as to what is wrong with his correction. Do you
see the flaw? If not, then how can you do quality control on the bug
reports you receive?

comp.lang.c is good at this sort of thing.

(Hint: the problem I can see has nothing to do with the loop.)


Ok, here's my take on this:

a) sizeof(int) > 1 for hosted implementations.
http://www.google.com/groups?selm=bu...unnews.cern.ch


The mere fact that Dan Pop says so does not make it so! ;)

There are actual members of the C Committee who disagree on this.
So, integer overflow not an issue, yes?
No. Even if sizeof(int) == 2, you can still have INT_MAX < UCHAR_MAX. [It's
the limits which are important, not the byte size.]

b) Peter's concern still remains. So, does changing the last line to:

return *(unsigned char *)s1 - *(unsigned char *)s2;

make it alright?


No. Reading chars via an unsigned char lvalue can produce a different value
to the original.

Since character constants and I/O are based on unsigned char -> int -> char
_conversions_ when storing plain char strings, the correct answer (assuming
no integer overflow) is to use a _conversion_ of the plain char value to
unsigned char...

return (unsigned char) *s1 - (unsigned char) *s2;

Note that on most implementations (8-bit, 2c, no padding) there is no need
to go to this extreme, although the result is the same.

The most robust answer would seem to be...

return (unsigned char) *s1 > (unsigned char) *s2
- (unsigned char) *s1 < (unsigned char) *s2;

or...

return (unsigned char) *s1 < (unsigned char) *s2 ? -1
: (unsigned char) *s1 > (unsigned char) *s2;

--
Peter
Nov 14 '05 #104
Peter Nilsson wrote:
"nrk" <ra*********@devnull.verizon.net> wrote in message
Richard Heathfield wrote:
R. Rajesh Jeba Anbiah wrote:
> nrk <ra*********@devnull.verizon.net> wrote in message
>>
<snip>

>> PS: Your book, section 4.2: The WAR style example of
>> strcmp is atrocious IMO. If you must insist on a single
>> return, here's a clearer version of strcmp:
>>
>> int strcmp(const char *s1, const char *s2) {
>> while ( *s1 == *s2 && *s1 )
>> ++s1, ++s2;
>>
>> return *s1 - *s2;
>> }
>
.... snip ...
The most robust answer would seem to be...

return (unsigned char) *s1 > (unsigned char) *s2
- (unsigned char) *s1 < (unsigned char) *s2;

or...

return (unsigned char) *s1 < (unsigned char) *s2 ? -1
: (unsigned char) *s1 > (unsigned char) *s2;


Why cast to unsigned char? If native chars are signed, I would
want this routine to respect that. Casts are usually a sign of
evil doings.

--
Chuck F (cb********@yahoo.com) (cb********@worldnet.att.net)
Available for consulting/temporary embedded and systems.
<http://cbfalconer.home.att.net> USE worldnet address!
Nov 14 '05 #105
"CBFalconer" <cb********@yahoo.com> wrote in message
news:40***************@yahoo.com...
Peter Nilsson wrote: ....
>>> int strcmp(const char *s1, const char *s2) {
>>> while ( *s1 == *s2 && *s1 )
>>> ++s1, ++s2;
>>>
>>> return *s1 - *s2;
>>> }...

The most robust answer would seem to be...

return (unsigned char) *s1 > (unsigned char) *s2
- (unsigned char) *s1 < (unsigned char) *s2;


Oops...

return ((unsigned char) *s1 > (unsigned char) *s2)
- ((unsigned char) *s1 < (unsigned char) *s2);

or...

return (unsigned char) *s1 < (unsigned char) *s2 ? -1
: (unsigned char) *s1 > (unsigned char) *s2;


Why cast to unsigned char?


Because that is the specification for strcmp().
If native chars are signed, I would
want this routine to respect that.
If we weren't talking about strcmp, you would be free to do that. But you
may get some unexpected surprises, e.g. "a" > "aé".
Casts are usually a sign of evil doings.


You can do the same thing without casts, if you like.
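
As a rough sketch of the cast-free variant: assigning each character to an `unsigned char` object performs the same conversion as the cast, and the relational form of the return avoids any subtraction. The names `sketch_strcmp` and `sketch_strcmp_checks` are hypothetical, chosen so as not to clash with the library function.

```c
#include <assert.h>

/* Hypothetical name, chosen to avoid clashing with the library's strcmp. */
int sketch_strcmp(const char *s1, const char *s2)
{
    unsigned char c1, c2;

    do {
        c1 = *s1++;   /* implicit conversion to unsigned char, no cast */
        c2 = *s2++;
    } while (c1 == c2 && c1 != '\0');

    /* Yields -1, 0, or 1 without subtracting, so nothing can overflow. */
    return (c1 > c2) - (c1 < c2);
}

/* Illustrative checks (hypothetical helper). */
void sketch_strcmp_checks(void)
{
    assert(sketch_strcmp("abc", "abc") == 0);
    assert(sketch_strcmp("abd", "abc") > 0);
    assert(sketch_strcmp("a", "a\xe9") < 0);
}
```

The last check is the "a" versus "aé" case from this post, assuming é is stored as the single byte 0xE9: compared as unsigned char, the terminating null of the shorter string is less than 0xE9, so the shorter string sorts first.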

--
Peter
Nov 14 '05 #106
nrk <ra*********@devnull.verizon.net> wrote in message news:<YK*******************@nwrddc01.gnilink.net>. ..

<snip>

Ram, sorry for my late follow-up; I was not feeling well for the past
two days. Now I see the thread has some more useful info. Thanks.

--
"Success = 10% sweat + 90% tears"
If you live in USA, please support John Edwards.
http://guideme.itgo.com/atozofc/ - "A to Z of C" Project
Email: rrjanbiah-at-Y!com
Nov 14 '05 #107
Peter Nilsson wrote:
"CBFalconer" <cb********@yahoo.com> wrote in message
Peter Nilsson wrote:

...
> >>> int strcmp(const char *s1, const char *s2) {
> >>> while ( *s1 == *s2 && *s1 )
> >>> ++s1, ++s2;
> >>>
> >>> return *s1 - *s2;
> >>> }

...

The most robust answer would seem to be...
.... snip ...
return (unsigned char) *s1 < (unsigned char) *s2 ? -1
: (unsigned char) *s1 > (unsigned char) *s2;


Why cast to unsigned char?


Because that is the specification for strcmp().
If native chars are signed, I would want this routine to
respect that. Casts are usually a sign of evil doings.


If we weren't talking about strcmp, you would be free to do that.
But you may get some unexpected surprises, e.g. "a" > "aé".


I still see no justification for the cast. I know of nothing that
specifies that strings consist of unsigned chars. However that is
a good argument for having the system specify char to be
unsigned. From N869:

7.21.4.2 The strcmp function

Synopsis
[#1]
#include <string.h>
int strcmp(const char *s1, const char *s2);

Description

[#2] The strcmp function compares the string pointed to by
s1 to the string pointed to by s2.

Returns

[#3] The strcmp function returns an integer greater than,
equal to, or less than zero, accordingly as the string
pointed to by s1 is greater than, equal to, or less than the
string pointed to by s2.

--
Chuck F (cb********@yahoo.com) (cb********@worldnet.att.net)
Available for consulting/temporary embedded and systems.
<http://cbfalconer.home.att.net> USE worldnet address!

Nov 14 '05 #108

On Sat, 21 Feb 2004, CBFalconer wrote:

Peter Nilsson wrote:
"CBFalconer" <cb********@yahoo.com> wrote in message
Peter Nilsson wrote:

...
> > >>> int strcmp(const char *s1, const char *s2) {
> > >>> while ( *s1 == *s2 && *s1 )
> > >>> ++s1, ++s2;
>
> return (unsigned char) *s1 < (unsigned char) *s2 ? -1
> : (unsigned char) *s1 > (unsigned char) *s2;

Why cast to unsigned char?


Because that is the specification for strcmp().


I still see no justification for the cast.


Look about two subsections earlier in N869, where it discusses the
semantics of comparison functions:

7.21.4 Comparison functions

[#1] The sign of a nonzero value returned by the comparison
functions memcmp, strcmp, and strncmp is determined by the
sign of the difference between the values of the first pair
of characters (both interpreted as unsigned char) that
differ in the objects being compared.

IMHO this is a silly requirement; I would *expect* memcmp to do
unsigned comparisons and 'strcmp' to do plain char comparisons,
but for whatever reason the C committee decided otherwise.

HTH,
-Arthur

Nov 14 '05 #109
CBFalconer wrote:

Peter Nilsson wrote:
"CBFalconer" <cb********@yahoo.com> wrote in message
Peter Nilsson wrote:

...
> > >>> int strcmp(const char *s1, const char *s2) {
> > >>> while ( *s1 == *s2 && *s1 )
> > >>> ++s1, ++s2;
> > >>>
> > >>> return *s1 - *s2;
> > >>> }
...
>
> The most robust answer would seem to be...
> ... snip ...
> return (unsigned char) *s1 < (unsigned char) *s2 ? -1
> : (unsigned char) *s1 > (unsigned char) *s2;

Why cast to unsigned char?


Because that is the specification for strcmp().
If native chars are signed, I would want this routine to
respect that. Casts are usually a sign of evil doings.


If we weren't talking about strcmp, you would be free to do that.
But you may get some unexpected surprises, e.g. "a" > "aé".


I still see no justification for the cast. I know of nothing that
specifies that strings consist of unsigned chars. However that is
a good argument for having the system specify char to be
unsigned. From N869:

7.21.4.2 The strcmp function

Synopsis
[#1]
#include <string.h>
int strcmp(const char *s1, const char *s2);

Description

[#2] The strcmp function compares the string pointed to by
s1 to the string pointed to by s2.

Returns

[#3] The strcmp function returns an integer greater than,
equal to, or less than zero, accordingly as the string
pointed to by s1 is greater than, equal to, or less than the
string pointed to by s2.

Don't we have a guarantee that characters in our set are positive? With
signed char, in the range 00..127 (ASCII)? I've read that EBCDIC
implementations, because characters can be > 127 implement unsigned char
just so that characters remain positive.

If this is the case, subtracting one positive integer from another
cannot overflow.
--
Joe Wright http://www.jw-wright.com
"Everything should be made as simple as possible, but not simpler."
--- Albert Einstein ---
Nov 14 '05 #110
"Arthur J. O'Dwyer" wrote:

On Sat, 21 Feb 2004, CBFalconer wrote:

Peter Nilsson wrote:
"CBFalconer" <cb********@yahoo.com> wrote in message
> Peter Nilsson wrote:
...
> > > >>> int strcmp(const char *s1, const char *s2) {
> > > >>> while ( *s1 == *s2 && *s1 )
> > > >>> ++s1, ++s2;
> >
> > return (unsigned char) *s1 < (unsigned char) *s2 ? -1
> > : (unsigned char) *s1 > (unsigned char) *s2;
>
> Why cast to unsigned char?

Because that is the specification for strcmp().


I still see no justification for the cast.


Look about two subsections earlier in N869, where it discusses the
semantics of comparison functions:

7.21.4 Comparison functions

[#1] The sign of a nonzero value returned by the comparison
functions memcmp, strcmp, and strncmp is determined by the
sign of the difference between the values of the first pair
of characters (both interpreted as unsigned char) that
differ in the objects being compared.

IMHO this is a silly requirement; I would *expect* memcmp to do
unsigned comparisons and 'strcmp' to do plain char comparisons,
but for whatever reason the C committee decided otherwise.


Aha - that justifies Peter Nilsson's attitude, and shoots down
mine. It does ensure that the shorter substring compares as less
than the longer.

It would be nice if such encompassing clauses were referenced in
the individual descriptions, i.e. "See also 7.21.4".

--
Chuck F (cb********@yahoo.com) (cb********@worldnet.att.net)
Available for consulting/temporary embedded and systems.
<http://cbfalconer.home.att.net> USE worldnet address!
Nov 14 '05 #111
Peter Nilsson wrote:
it does not make comparisons based on unsigned char values.
[A requirement for the real strcmp().]


I don't see that in the standard.

--
pete
Nov 14 '05 #112
pete wrote:

Peter Nilsson wrote:
it does not make comparisons based on unsigned char values.
[A requirement for the real strcmp().]


I don't see that in the standard.


OK, now I do.

--
pete
Nov 14 '05 #113
pete <pf*****@mindspring.com> writes:
Peter Nilsson wrote:
it does not make comparisons based on unsigned char values.
[A requirement for the real strcmp().]


I don't see that in the standard.


7.21.4 Comparison functions

1 The sign of a nonzero value returned by the comparison
functions memcmp, strcmp, and strncmp is determined by the
sign of the difference between the values of the first pair
of characters (both interpreted as unsigned char) that
differ in the objects being compared.

--
"Debugging is twice as hard as writing the code in the first place.
Therefore, if you write the code as cleverly as possible, you are,
by definition, not smart enough to debug it."
--Brian Kernighan
Nov 14 '05 #114
nrk
Peter Nilsson wrote:
"nrk" <ra*********@devnull.verizon.net> wrote in message
news:yV******************@nwrddc01.gnilink.net...
Richard Heathfield wrote:
> R. Rajesh Jeba Anbiah wrote:
>
>> nrk <ra*********@devnull.verizon.net> wrote in message
>> news:<Wi*******************@nwrddc01.gnilink.net>. ..
>>>
> <snip>
>
>>> PS: Your book, section 4.2: The WAR style example of strcmp is
>>> atrocious
>>> IMO. If you must insist on a single return, here's a clearer version of
>>> strcmp:
>>>
>>> int strcmp(const char *s1, const char *s2) {
>>> while ( *s1 == *s2 && *s1 )
>>> ++s1, ++s2;
>>>
>>> return *s1 - *s2;
>>> }
>>
>> Thanks a lot for your interest in the quality of the book. As you
>> see, it has its bug reporting corner; please don't take c.l.c as the
>> one. Thanks for your help; thanks for your understanding.
>
> Since he /did/ post his "clearer version" to comp.lang.c, you should at
> least get some feedback as to what is wrong with his correction. Do you
> see the flaw? If not, then how can you do quality control on the bug
> reports you receive?
>
> comp.lang.c is good at this sort of thing.
>
> (Hint: the problem I can see has nothing to do with the loop.)
>
Ok, here's my take on this:

a) sizeof(int) > 1 for hosted implementations.
http://www.google.com/groups?selm=bu...unnews.cern.ch


The mere fact that Dan Pop says so does not make it so! ;)

There are actual members of the C Committee who disagree on this.
So, integer overflow not an issue, yes?


No. Even if sizeof(int) == 2, you can still have INT_MAX < UCHAR_MAX.
[It's the limits which are important, not the byte size.]


Sorry, I goofed it up, but if you read the quoted thread, the idea is that
INT_MAX >= UCHAR_MAX for hosted implementations by implication. Can you
point out why there can be a disagreement on that?

b) Peter's concern still remains. So, does changing the last line to:

return *(unsigned char *)s1 - *(unsigned char *)s2;

make it alright?


No. Reading chars via an unsigned char lvalue can produce a different
value to the original.


This is only in the presence of padding bits, right? Or is there something
else that I am missing here?

-nrk.
Since character constants and I/O are based on unsigned char -> int ->
char _conversions_ when storing plain char strings, the correct answer
(assuming no integer overflow) is to use a _conversion_ of the plain char
value to unsigned char...

return (unsigned char) *s1 - (unsigned char) *s2;

Note that on most implementations (8-bit, 2c, no padding) there is no need
to go to this extreme, although the result is the same.

The most robust answer would seem to be...

return (unsigned char) *s1 > (unsigned char) *s2
- (unsigned char) *s1 < (unsigned char) *s2;

or...

return (unsigned char) *s1 < (unsigned char) *s2 ? -1
: (unsigned char) *s1 > (unsigned char) *s2;

--
Peter


--
Remove devnull for email
Nov 14 '05 #115
Peter Nilsson wrote:
No. Even if sizeof(int) == 2, you can still have INT_MAX < UCHAR_MAX. [It's
the limits which are important, not the byte size.]

b) Peter's concern still remains. So, does changing the last line to:

return *(unsigned char *)s1 - *(unsigned char *)s2;

make it alright?
No. Reading chars via an unsigned char lvalue
can produce a different value to the original.


That's what's called for.
Since character constants and I/O are based on
unsigned char -> int -> char
_conversions_ when storing plain char strings,
the correct answer (assuming no integer overflow)
is to use a _conversion_ of the plain char value to
unsigned char...
Conversion is not called for.
The functions which use converted values, have the word "converted"
in their function descriptions.
return (unsigned char) *s1 - (unsigned char) *s2;

Note that on most implementations
(8-bit, 2c, no padding) there is no need
to go to this extreme, although the result is the same.

The most robust answer would seem to be...

return (unsigned char) *s1 > (unsigned char) *s2
- (unsigned char) *s1 < (unsigned char) *s2;

or...

return (unsigned char) *s1 < (unsigned char) *s2 ? -1
: (unsigned char) *s1 > (unsigned char) *s2;


I'm not seeing it that way.
((unsigned char) *s1), is *s1 *Converted* to unsigned char.
(*(unsigned char*)s1), is *s1, interpreted as unsigned char.

N869
7.21.4 Comparison functions
[#1] The sign of a nonzero value returned by the comparison
functions memcmp, strcmp, and strncmp is determined by the
sign of the difference between the values of the first pair
of characters (both interpreted as unsigned char) that
differ in the objects being compared.
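
To make the two readings concrete, here is a small sketch that computes the sign of the difference both ways. The function names are hypothetical. On the common 8-bit two's complement hosts the two agree; they could only differ on other representations, which is the case under dispute.

```c
#include <assert.h>

/* Sign of the difference using *converted* values: (unsigned char)*s. */
int sign_converted(const char *s1, const char *s2)
{
    unsigned char a = (unsigned char)*s1;
    unsigned char b = (unsigned char)*s2;
    return (a > b) - (a < b);
}

/* Sign of the difference using *interpreted* values: *(unsigned char *)s. */
int sign_interpreted(const char *s1, const char *s2)
{
    const unsigned char *p1 = (const unsigned char *)s1;
    const unsigned char *p2 = (const unsigned char *)s2;
    return (*p1 > *p2) - (*p1 < *p2);
}

/* Illustrative checks (hypothetical helper). */
void sign_checks(void)
{
    /* Holds on two's complement hosts; other representations may differ. */
    assert(sign_converted("\x01", "\xff") == sign_interpreted("\x01", "\xff"));
    assert(sign_interpreted("\x01", "\xff") == -1);
}
```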

memchr is a function which uses both converted and
differently interpreted values.

N869
7.21.5.1 The memchr function
Description
[#2] The memchr function locates the first occurrence of c
(converted to an unsigned char) in the initial n characters
(each interpreted as unsigned char) of the object pointed to
by s.

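#include <stddef.h>  /* size_t, NULL */

void *memchr(const void *s, int c, size_t n)
{
    /* Interpret the bytes of the object as unsigned char. */
    const unsigned char *p = s;

    while (n-- != 0) {
        /* c is converted to unsigned char before the comparison. */
        if (*p == (unsigned char)c) {
            return (void *)p;
        }
        ++p;
    }
    return NULL;
}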
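
A short usage sketch against the standard library's own `memchr`, showing a byte above 127 being found whether or not plain char is signed. The helper names `offset_of_byte` and `offset_checks` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: offset of the first byte equal to c, or -1. */
ptrdiff_t offset_of_byte(const char *s, size_t n, int c)
{
    const char *hit = memchr(s, c, n);
    return hit ? hit - s : -1;
}

/* Illustrative checks (hypothetical helper). */
void offset_checks(void)
{
    const char text[] = "a\xe9z";

    assert(offset_of_byte(text, 3, 0xE9) == 1);
    assert(offset_of_byte(text, 3, 'z') == 2);
    assert(offset_of_byte(text, 3, 'q') == -1);
}
```

Because the search value is converted and the searched bytes are interpreted, passing either 0xE9 or the negative value a signed char would hold for that byte finds the same position on an 8-bit implementation.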

--
pete
Nov 14 '05 #116
