Bytes | Developer Community
code portability

My question is more generic, but it involves what I consider ANSI standard C
and portability.

I happen to be a system admin for multiple platforms and as such a lot of
the applications that my users request are a part of the OpenSource
community. Many if not most of those applications strongly require the
presence of the GNU compiling suite to work properly. My assumption is that
this is due to the author/s creating the applications with the GNU suite.
Many of the tools requested/required are GNU replacements for make,
configure, the loader, and lastly the C compiler itself. Where I'm going
with this is, has the OpenSource community as a whole committed itself to at
the very least encouraging its contributing members to conform to ANSI
standards of programming?

My concern is that as an admin I am sometimes compelled to port these
applications to multiple platforms running the same OS, and as the user
community becomes more and more insistent on OpenSource applications, will
gotchas appear due to lack of portability in coding? I fully realize that
independent developers may or may not conform to standards, but again, is
it at least encouraged?

Question 11.32 of the FAQ seemed to at least outline the crux of what I am
asking. If I loaded up my home machine to the gills with all open source
compiler applications (gcc, imake, autoconf, etc.), would the applications
that I compile, link, and load conform?
Aug 1 '06
On 2006-08-04, Ian Collins <ia******@hotmail.com> wrote:
Andrew Poelstra wrote:
>On 2006-08-03, Keith Thompson <ks***@mib.org> wrote:
>>>Richard Heathfield <in*****@invalid.invalid> writes:

The introduction of long long int was, in my continued opinion, a mistake.
All the ISO guys had to do was - nothing at all! Any implementation that
wanted to support 64-bit integers could simply have made long int rather
longer than before - such a system would have continued to be fully
conforming to C90. And if it broke code, well, so what? Any code that
wrongly assumes long int is precisely 32 bits is already broken, and needs
fixing.

That's true, but 64 bits is the effective limit for this. The
following:
char 8 bits
short 16 bits
int 32 bits
long 64 bits
is a reasonable set of types, but if you go beyond that to 128 bits,
you're going to have to leave gaps (for example, there might not be
any 16-bit integer type).


1) This isn't really a problem; you can use a 32-bit variable to store
16-bit values; if you really need 16 bits you might need some debug
macros to artificially constrain the range.

Just beware of overflows!
Yes, that would require more than debug macros to fix, since you'd want
the overflow behavior to be the same whether or not you are debugging!
(It's a bit of a pain, I admit, but there aren't too many times when you
absolutely need an exact number of bits.)
>2) If you've got a 128-bit processor, IMHO, you shouldn't be insisting
on using 8-bit types. That just sounds inefficient. [OT]
Unless your (possibly externally imposed) data happens to be 8 bit.
If I were ISO, I'd consider adding a new specifier to scanf and friends
to read a specified number of bytes (or probably bits, although that
could be a lot harder to implement) into an already defined type. So, if
you wanted to read 8 bits into an int (which is 32 bits on this particular
system), you'd do:
fscanf (fhandle, "%d8b", &charvar);

Since I'm not ISO, nor will they create such a change, I'd stick to
avoiding arbitrary data widths (in general, stick with text files as
long as you can spare the space), and don't worry about changing data
widths: most companies don't switch compilers too often.

If you have Data of a Certain Width imposed on you, you're probably
going to have to fiddle with stuff when changing compilers anyway,
so suddenly having long twice as wide should be an expected problem.

--
Andrew Poelstra <http://www.wpsoftware.net/projects>
To reach me by email, use `apoelstra' at the above domain.
"Do BOTH ends of the cable need to be plugged in?" -Anon.
Aug 4 '06 #51
Richard Heathfield wrote:
Keith Thompson said:
>>"Malcolm" <re*******@btinternet.com> writes:
>>>There is also the problem of "good enough" portability, for instance
assuming ASCII and two's complement integers.
<snip>
>>As for two's complement, I typically don't care about that either.
Numbers are numbers. If I need to do bit-twiddling, I use unsigned.

Indeed. And, on a related note, I find it very difficult to understand this
fascination with integers that have a particular number of bits. If I need
8 bits, I'll use char (or a flavour thereof). If I need 9 to 16 bits, I'll
use int (or unsigned). If I need 17 to 32 bits, I'll use long (or unsigned
long). And if I need more than 32 bits, I'll use a bit array. I see
absolutely no need for int_leastthis, int_fastthat, and int_exacttheother.
Depends on your area of application.
If you are in an embedded environment with two's complement, exact
width integers of width 8, 16, 32, you often want the 16 bit type
because it is sufficient for storing your data and saves memory
you really need. Even if it is only provided as compiler extension.
You compute values "through overflow" to save auxiliary variables
and time and do other things not necessary when in a less restricted
environment.

For other applications, I agree with you. However, I _would_ have
liked to have a clean naming scheme implying what the standard
says instead of nondescript identifiers.
short <-> int_least16_t
int <-> int_fast16_t
long <-> int_least32_t
obviously does not give that in a convenient manner. int16, int32,
intFast16, intExact16 probably would have been a better starting
point for easy extensibility.
Cheers
Michael
--
E-Mail: Mine is an /at/ gmx /dot/ de address.
Aug 4 '06 #52
Richard Heathfield wrote:
The introduction of long long int was, in my continued opinion, a mistake.
All the ISO guys had to do was - nothing at all! Any implementation that
wanted to support 64-bit integers could simply have made long int rather
longer than before - such a system would have continued to be fully
conforming to C90. And if it broke code, well, so what? Any code that
wrongly assumes long int is precisely 32 bits is already broken, and needs
fixing.
So what if it is broken? I wouldn't want to fix it. Let it run as it was
till now.

Leave long as it was and use a new type for 64 bits. This was the
decision of Microsoft.

Gcc decided otherwise. Long becomes 64 bits, and long long stays 64
bits, making this type completely useless.

For lcc-win64 I thought about

char 8 bits
short 16 bits
int 32 bits
long 64 bits
long long 128 bits

but then... I would have been incompatible
with both gcc AND MSVC.

So I decided to follow MSVC under windows-64 and gcc
under unix-64.
Aug 4 '06 #53
Andrew Poelstra <ap*******@false.site> writes:
On 2006-08-04, Ian Collins <ia******@hotmail.com> wrote:
>Andrew Poelstra wrote:
>>On 2006-08-03, Keith Thompson <ks***@mib.org> wrote:

Richard Heathfield <in*****@invalid.invalid> writes:

>The introduction of long long int was, in my continued opinion, a mistake.
>All the ISO guys had to do was - nothing at all! Any implementation that
>wanted to support 64-bit integers could simply have made long int rather
>longer than before - such a system would have continued to be fully
>conforming to C90. And if it broke code, well, so what? Any code that
>wrongly assumes long int is precisely 32 bits is already broken, and needs
>fixing.

That's true, but 64 bits is the effective limit for this. The
following:
char 8 bits
short 16 bits
int 32 bits
long 64 bits
is a reasonable set of types, but if you go beyond that to 128 bits,
you're going to have to leave gaps (for example, there might not be
any 16-bit integer type).

1) This isn't really a problem; you can use a 32-bit variable to store
16-bit values; if you really need 16 bits you might need some debug
macros to artificially constrain the range.

Just beware of overflows!

Yes, that would require more than debug macros to fix, since you'd want
the overflow behavior to be the same whether or not you are debugging!
(It's a bit of a pain, I admit, but there aren't too many times when you
absolutely need an exact number of bits.)
It's not the fact that you absolutely *must* have the bits; rather,
it's that you want defined, repeatable behaviour.

The simplest test condition in the world would be hugely concerned to
know that the programmer didn't care whether it was 16 or 32 bits:

if(!x=func(y))
...;

How many times this would really be a problem, I wouldn't venture to say.
Aug 4 '06 #54
Ian Collins posted:
Keith Thompson wrote:
>>
My objection to C's integer type system is that the names are
arbitrary: "char", "short", "int", "long", "long long", "ginormous
long". I'd like to see a system where the type names follow a regular
pattern, and if you want to have a dozen distinct types the names are
clear and obvious. I have a few ideas, but since this will never
happen in any language called "C" I won't go into any more detail.
Isn't that why we now have (u)int32_t and friends? I tend to use int or
unsigned if I don't care about the size, and one of the exact-size types
if I do.

I use "int unsigned" when I want to store a positive integer.

I use "int signed" when the integer value might be negative.

If the unsigned number might exceed 65535, or if the signed number might
not fit in the range +32767 to -32767, then I'll consider using "int long
unsigned" or "int long signed", or perhaps "int long long unsigned" or "int
long long signed".

I only use "plain" char when I'm dealing with characters.

I only use "unsigned char" when I'm playing with bytes.

I've never used "short", but I'd consider using it if I had a large array
of integers whose value wouldn't exceed 65535.

--

Frederick Gotham
Aug 4 '06 #55
Richard <rg****@gmail.com> writes:
It's not the fact that you absolutely *must* have the bits; rather,
it's that you want defined, repeatable behaviour.

The simplest test condition in the world would be hugely concerned to
know that the programmer didn't care whether it was 16 or 32 bits:

if(!x=func(y))
...;
Why should I care whether func() returns 16 or 32 bits? I only
want to know whether it returned a nonzero value, and the
specific value, be it 1 or 2 or -5 or 0xffffffff, doesn't matter.

(That won't compile, by the way. You forgot a set of parentheses.)
--
"...Almost makes you wonder why Heisenberg didn't include postinc/dec operators
in the uncertainty principle. Which of course makes the above equivalent to
Schrodinger's pointer..."
--Anthony McDonald
Aug 4 '06 #56
Ben Pfaff <bl*@cs.stanford.edu> writes:
Richard <rg****@gmail.com> writes:
>It's not the fact that you absolutely *must* have the bits; rather,
it's that you want defined, repeatable behaviour.

The simplest test condition in the world would be hugely concerned to
know that the programmer didn't care whether it was 16 or 32 bits:

if(!(x=func(y)))
...;

Why should I care whether func() returns 16 or 32 bits? I only
want to know whether it returned a nonzero value, and the
specific value, be it 1 or 2 or -5 or 0xffffffff, doesn't matter.
because 0x10000 won't be zero, since 32 bits hold it. 16 bits
will cycle through 0. Or?
Aug 4 '06 #57
Richard <rg****@gmail.com> writes:
Ben Pfaff <bl*@cs.stanford.edu> writes:
>Richard <rg****@gmail.com> writes:
>>It's not the fact that you absolutely *must* have the bits; rather,
it's that you want defined, repeatable behaviour.

The simplest test condition in the world would be hugely concerned to
know that the programmer didn't care whether it was 16 or 32 bits:

if(!(x=func(y)))
...;

Why should I care whether func() returns 16 or 32 bits? I only
want to know whether it returned a nonzero value, and the
specific value, be it 1 or 2 or -5 or 0xffffffff, doesn't matter.

because 0x10000 won't be zero, since 32 bits hold it. 16 bits
will cycle through 0. Or?
func() should return the proper type for its caller to interpret
it? If it doesn't, then the caller is not going to be able to
interpret correctly. If it does, then the condition makes sense
regardless of the type.
--
"In My Egotistical Opinion, most people's C programs should be indented six
feet downward and covered with dirt." -- Blair P. Houghton
Aug 4 '06 #58
Ben Pfaff wrote:
Richard <rg****@gmail.com> writes:
Ben Pfaff <bl*@cs.stanford.edu> writes:
Richard <rg****@gmail.com> writes:

It's not the fact that you absolutely *must* have the bits; rather,
it's that you want defined, repeatable behaviour.

The simplest test condition in the world would be hugely concerned to
know that the programmer didn't care whether it was 16 or 32 bits:

if(!(x=func(y)))
...;

Why should I care whether func() returns 16 or 32 bits? I only
want to know whether it returned a nonzero value, and the
specific value, be it 1 or 2 or -5 or 0xffffffff, doesn't matter.
because 0x10000 won't be zero, since 32 bits hold it. 16 bits
will cycle through 0. Or?

func() should return the proper type for its caller to interpret
it? If it doesn't, then the caller is not going to be able to
interpret correctly. If it does, then the condition makes sense
regardless of the type.
!(x=func(y)) doesn't test func's return value. It tests func's return
value converted to the type of x. If x is narrower than func(), even
nonzero return values may cause the expression to evaluate to 1.

char ch; while((ch = getchar()) != EOF) { /* ... */ }

Aug 4 '06 #59
"Harald van Dijk" <tr*****@gmail.com> writes:
Ben Pfaff wrote:
>Richard <rg****@gmail.com> writes:
Ben Pfaff <bl*@cs.stanford.edu> writes:

Richard <rg****@gmail.com> writes:

It's not the fact that you absolutely *must* have the bits; rather,
it's that you want defined, repeatable behaviour.

The simplest test condition in the world would be hugely concerned to
know that the programmer didn't care whether it was 16 or 32 bits:

if(!(x=func(y)))
...;

Why should I care whether func() returns 16 or 32 bits? I only
want to know whether it returned a nonzero value, and the
specific value, be it 1 or 2 or -5 or 0xffffffff, doesn't matter.

because 0x10000 won't be zero, since 32 bits hold it. 16 bits
will cycle through 0. Or?

func() should return the proper type for its caller to interpret
it? If it doesn't, then the caller is not going to be able to
interpret correctly. If it does, then the condition makes sense
regardless of the type.

!(x=func(y)) doesn't test func's return value. It tests func's return
value converted to the type of x. If x is narrower than func(), even
nonzero return values may cause the expression to evaluate to 1.
True. I missed that. But the point stands: you should be
assigning it to the proper type. Again, this is important
regardless of the width of the type in question. I don't care if
"int" is 16 or 32 bits as long as "int" is the type that func()
returns.
--
"I ran it on my DeathStation 9000 and demons flew out of my nose." --Kaz
Aug 4 '06 #60
<la************@ugs.com> wrote in message
news:9r************@jones.homeip.net...
Keith Thompson <ks***@mib.org> wrote:
>>
My objection to C's integer type system is that the names are
arbitrary: "char", "short", "int", "long", "long long", "ginormous
long". I'd like to see a system where the type names follow a regular
pattern, and if you want to have a dozen distinct types the names are
clear and obvious.

Well, one wag did propose adding the "very" keyword to C so that,
instead of long long we could have very long (and very very long, and
very very very long, etc.), not to mention very short (and very very
short, etc.). You could even have very const....
Don't forget very NULL:

http://thedailywtf.com/forums/thread/71684.aspx

I do have to wonder what "very unsigned" or "very int" would mean...

S

--
Stephen Sprunk "God does not play dice." --Albert Einstein
CCIE #3723 "God is an inveterate gambler, and He throws the
K5SSS dice at every possible opportunity." --Stephen Hawking

--
Posted via a free Usenet account from http://www.teranews.com

Aug 4 '06 #61
Keith Thompson <ks***@mib.org> wrote:
>
That's a problem with the design of <stdint.h>; the naming scheme
assumes that exact-width types are more important than types with *at
least* a specified size or range.
That's the problem with codifying existing practice rather than making
things up out of whole cloth. The original designers were only
interested in the exact-width types. By the time it got to the
standards committee, we were pretty much stuck with those names; it
would have been antisocial to change them.

-Larry Jones

Oh, now YOU'RE going to start in on me TOO, huh? -- Calvin
Aug 4 '06 #62
jacob navia <ja***@jacob.remcomp.fr> wrote:
>
but then... I would have been incompatible
with both gcc AND MSVC.
You say that like it's a *bad* thing.... ;-)

-Larry Jones

Oh, what the heck. I'll do it. -- Calvin
Aug 4 '06 #63
la************@ugs.com writes:
Keith Thompson <ks***@mib.org> wrote:
>That's a problem with the design of <stdint.h>; the naming scheme
assumes that exact-width types are more important than types with *at
least* a specified size or range.

That's the problem with codifying existing practice rather than making
things up out of whole cloth. The original designers were only
interested in the exact-width types. By the time it got to the
standards committee, we were pretty much stuck with those names; it
would have been antisocial to change them.
I understand and agree; the committee didn't have the option of
designing the ideal language from scratch. (And if they had, there's
no guarantee anyone else would agree with their design decisions.)

"To summarize the summary of the summary: People are a problem."
-- Douglas Adams

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <* <http://users.sdsc.edu/~kst>
We must do something. This is something. Therefore, we must do this.
Aug 4 '06 #64
In article <ln************@nuthaus.mib.org> Keith Thompson <ks***@mib.org> writes:
....
I understand and agree; the committee didn't have the option of
designing the ideal language from scratch. (And if they had, there's
no guarantee anyone else would agree with their design decisions.)
There are a few languages designed from scratch. And indeed in all
cases not everyone did agree with the design decisions. That is why
we have Pascal (an off-spring of the Algol 68 design effort). There
is also reluctance for such languages because the objections will be:
"too difficult to implement", and so you will not see many compilers.
Algol 60 found much opposition from the US because it would be too
difficult to implement (that is a reason why Knuth designed a subset
of the language that threw away some of the major features). For the
same reason also full featured Pascal found much reluctance, it was
only when subsets where implemented that it found its place. (Pascal
level 1, 2 and 3, if I remember correctly.)

I still have a report in my possession, written in the early sixties,
describing the implementation of variable length arrays on the stack
for Algol 60. Still I have worked early on with full implementations
of Algol 60, Algol 68 and Pascal, while people were still muttering
that it was too difficult to implement. Especially CDC's effort on
the full implementation of Algol 68 is worth mentioning (although the
US branch never has acknowledged that they had a compiler). The
compiler was written by some nine workers without any earlier
knowledge of writing compilers. I think that it took about a year to
complete (I was present at some progress meetings). Also consider the
very first Pascal compiler, written as (I think) a Ph.D. thesis by Urs
Ammann. Too difficult to implement? Rather, unwillingness to implement.
--
dik t. winter, cwi, kruislaan 413, 1098 sj amsterdam, nederland, +31205924131
home: bovenover 215, 1025 jn amsterdam, nederland; http://www.cwi.nl/~dik/
Aug 5 '06 #65
Keith Thompson wrote:
"Malcolm" <re*******@btinternet.com> writes:
<we******@gmail.com> wrote in message
news:11**********************@m79g2000cwm.googlegroups.com...
Eigenvector wrote:
[...] I fully realize that
independent developers may or may not conform to standards, but again is
it at least encouraged?

Not really. By its very nature C encourages non-portable programming.
In general, I try to write code portably, but the only thing keeping me
honest is actually compiling my stuff with multiple compilers to see
what happens.
Yes. There is a tension between efficiency and portability. In Java they
resolved it by compromising efficiency, in C we have to be careful to make
our portable code genuinely portable, which is why the topic is so often
discussed.
There is also the problem of "good enough" portability, for instance
assuming ASCII and two's complement integers.

I rarely find it useful to assume ASCII.
Who cares what *YOU* find useful or not.

I would like to auto-initialize an array that maps from unsigned char
-> parse-state, which maps, say, letters to one value, numbers to
another, etc. The reason I want to auto-initialize this rather than
calling some init() routine that sets them up is because I want to
support correct multithreading, and my inner loops that use such an
array are going so fast, that auto-first-time checking actually is
unacceptable overhead.

If I can't assume ASCII, then this solution has simply been taken away
from me. Compare this with the Lua language, which allows unordered
specific index auto-initialization.
[...] It's usually just as easy to
write code that depends only on the guarantees in the standard, and
will just work regardless of the character set. It would be
marginally more convenient to be able to assume that the character
codes for the letters are contiguous, but that's easy enough to work
around.
Yeah, well obviously you don't work in environments where performance
and portability matter.
As for two's complement, I typically don't care about that either.
Numbers are numbers. If I need to do bit-twiddling, I use unsigned.
And if you need a correctly functioning ring modulo 2**n? If you can
assume 2s complement then you've *got one*. Otherwise, you get to
construct one somehow (not sure how hard this is, I have never ever
been exposed to a system that didn't *ONLY* support 2s complement).

--
Paul Hsieh
http://www.pobox.com/~qed/
http://bstring.sf.net/

Aug 5 '06 #66
we******@gmail.com writes:
>Who cares what *YOU* find useful or not.
That's the great attitude that we all love to see!
Keep up the good work,

--
Chris.
Aug 5 '06 #67
we******@gmail.com writes:
Keith Thompson wrote:
>"Malcolm" <re*******@btinternet.com> writes:
<we******@gmail.com> wrote in message
news:11**********************@m79g2000cwm.googlegroups.com...
Eigenvector wrote:
[...] I fully realize that
independent developers may or may not conform to standards, but again is
it at least encouraged?

Not really. By its very nature C encourages non-portable programming.
In general, I try to write code portably, but the only thing keeping me
honest is actually compiling my stuff with multiple compilers to see
what happens.

Yes. There is a tension between efficiency and portability. In Java they
resolved it by compromising efficiency, in C we have to be careful to make
our portable code genuinely portable, which is why the topic is so often
discussed.
There is also the problem of "good enough" portability, for instance
assuming ASCII and two's complement integers.

I rarely find it useful to assume ASCII.

Who cares what *YOU* find useful or not.
Gosh, I don't know. Do you care? Because, as you know, your opinion
matters a great deal to me. It's probably because of your charming
manner.
I would like to auto-initialize an array that maps from unsigned char
-> parse-state, which maps, say, letters to one value, numbers to
another, etc. The reason I want to auto-initialize this rather than
calling some init() routine that sets them up is because I want to
support correct multithreading, and my inner loops that use such an
array are going so fast, that auto-first-time checking actually is
unacceptable overhead.

If I can't assume ASCII, then this solution has simply been taken away
from me. Compare this with the Lua language, which allows unordered
specific index auto-initialization.
I can think of several ways to do this. You can use some automated
process to generate the C code for you during the build process,
perhaps with a build-time option to select ASCII or some other
character set. Or you can explicitly invoke an initialization routine
exactly once as your program is starting up and save the expense of
checking on each use. Or (and this may or may not be available to
you), you can use C99, which lets you do just what you want. Here's a
brief outline of what I think you're trying to do:
========================================================================
#include <stdio.h>#include <limits.h>

enum CHAR_CLASS { OTHER=0, UPPER, LOWER, DIGIT };

static const enum CHAR_CLASS cclass[UCHAR_MAX + 1] =
{ ['a'] = LOWER,
['b'] = LOWER,
/* ... */
['A'] = UPPER,
['B'] = UPPER,
/* ... */
['0'] = DIGIT,
['1'] = DIGIT,
/* ... */
};

int main(void)
{
for (int c = 0; c <= UCHAR_MAX; c ++) {
if (cclass[c] != OTHER) {
printf("'%c' =%d\n", c, cclass[c]);
}
}

return 0;
}
========================================================================

This works with gcc 3.4.5 and 4.1.1 with "-std=c99".

Or you can just (drum roll please) assume ASCII. If you'll look very
closely at what I wrote above:

| I rarely find it useful to assume ASCII.

you'll see the word "rarely", not "never". If assuming ASCII, and
therefore making your code non-portable to non-ASCII platforms, makes
it significantly faster, that's great. I might consider adding a
check at program startup, something like
if ('A' != 65) {
/* yes, it's an incomplete check */
fprintf(stderr, "This program won't work on a non-ASCII system\n");
exit(EXIT_FAILURE);
}
or I might not bother; I'd at least document the assumption somewhere
in the code. (No, you can't reliably test this in the preprocessor;
see C99 6.10.1p3.)

The fact that you've managed to cite a single application where
assuming ASCII happens to be useful does not refute anything I've
said.

Write portable code if you can. If you need to write non-portable
code, keep it as isolated as you can (but you may *sometimes* find
that a portable implementation would have worked just as well in the
first place).
>It's usually just as easy to
write code that depends only on the guarantees in the standard, and
will just work regardless of the character set. It would be
marginally more convenient to be able to assume that the character
codes for the letters are contiguous, but that's easy enough to work
around.

Yeah, well obviously you don't work in environments where performance
and portability matter.
Obviously you have no clue about the environments in which I work.
>As for two's complement, I typically don't care about that either.
Numbers are numbers. If I need to do bit-twiddling, I use unsigned.

And if you need a correctly functioning ring modulo 2**n? If you can
assume 2s complement then you've *got one*. Otherwise, you get to
construct one somehow (not sure how hard this is, I have never ever
been exposed to a system that didn't *ONLY* support 2s complement).
It's been a while since my last abstract algebra class, but isn't a
"ring modulo 2**n" simply the set of integers from 0 to 2**n-1? And
isn't that precisely what C's *unsigned* integer types are?

If you want a particular behavior for *signed* integers on overflow, C
doesn't guarantee this; overflow of a signed integer invokes undefined
behavior. Very commonly it happens to do the obvious 2's-complement
wraparound you're probably thinking of. If you need to write code
that depends on that, go ahead. Be aware that it's not 100% portable,
but it may well be portable enough for your purposes.

In case I've misunderstood your point, I'll expand on what I wrote
above.

I *typically* don't care about two's-complement. If I actually need
to write code that depends on two's-complement, I'll do so. All else
being equal, portable code, particularly code that depends only on
guarantees made by the C standard, is better than non-portable code --
and, personally, I usually find it easier to write and understand.
(Don't bother posting counterexamples; I wrote "usually".) In those
cases where all else *isn't* equal, there's a tradeoff between
portability and performance and convenience, and whatever other
desirable attributes you want to think about.

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <* <http://users.sdsc.edu/~kst>
We must do something. This is something. Therefore, we must do this.
Aug 5 '06 #68
Keith Thompson <ks***@mib.org> writes:
[...]
#include <stdio.h>#include <limits.h>

enum CHAR_CLASS { OTHER=0, UPPER, LOWER, DIGIT };

static const enum CHAR_CLASS cclass[UCHAR_MAX + 1] =
{ ['a'] = LOWER,
['b'] = LOWER,
/* ... */
['A'] = UPPER,
['B'] = UPPER,
/* ... */
['0'] = DIGIT,
['1'] = DIGIT,
/* ... */
};

int main(void)
{
for (int c = 0; c <= UCHAR_MAX; c ++) {
if (cclass[c] != OTHER) {
printf("'%c' =%d\n", c, cclass[c]);
}
}

return 0;
}
Of course the two #include directives need to be on separate lines; I
accidentally joined them when I reformatted the previous paragraph
before posting.

BTW, the output is:

'0' =3
'1' =3
'A' =1
'B' =1
'a' =2
'b' =2

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <* <http://users.sdsc.edu/~kst>
We must do something. This is something. Therefore, we must do this.
Aug 5 '06 #69
Keith Thompson <ks***@mib.org> writes:
I might consider adding a check at program startup, something
like
if ('A' != 65) {
/* yes, it's an incomplete check */
fprintf(stderr, "This program won't work on a non-ASCII system\n");
exit(EXIT_FAILURE);
}
Is there some reason that this can't be done at compile time:
#if 'A' != 65
#error Needs ASCII character set
#endif
--
"When I have to rely on inadequacy, I prefer it to be my own."
--Richard Heathfield
Aug 5 '06 #70
Ben Pfaff <bl*@cs.stanford.edu> writes:
Keith Thompson <ks***@mib.org> writes:
>I might consider adding a check at program startup, something
like
if ('A' != 65) {
/* yes, it's an incomplete check */
fprintf(stderr, "This program won't work on a non-ASCII system\n");
exit(EXIT_FAILURE);
}

Is there some reason that this can't be done at compile time:
#if 'A' != 65
#error Needs ASCII character set
#endif
Yes.

(Barely resisting the temptation to leave it at that...)

N1124 6.10.1p3:

This includes interpreting character constants, which may involve
converting escape sequences into execution character set
members. Whether the numeric value for these character constants
matches the value obtained when an identical character constant
occurs in an expression (other than within a #if or #elif
directive) is implementation-defined.

Footnote:

Thus, the constant expression in the following #if directive and
if statement is not guaranteed to evaluate to the same value in
these two contexts.

#if 'z' - 'a' == 25

if ('z' - 'a' == 25)

I did refer to this upthread, just after the portion you quoted:

| or I might not bother; I'd at least document the assumption somewhere
| in the code. (No, you can't reliably test this in the preprocessor;
| see C99 6.10.1p3.)

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <* <http://users.sdsc.edu/~kst>
We must do something. This is something. Therefore, we must do this.
Aug 5 '06 #71
Keith Thompson <ks***@mib.org> writes:
Ben Pfaff <bl*@cs.stanford.edu> writes:
>Keith Thompson <ks***@mib.org> writes:
>>I might consider adding a check at program startup, something
like
if ('A' != 65) {
/* yes, it's an incomplete check */
fprintf(stderr, "This program won't work on a non-ASCII system\n");
exit(EXIT_FAILURE);
}

Is there some reason that this can't be done at compile time:
#if 'A' != 65
#error Needs ASCII character set
#endif

Yes.
I need to do a better job of reading. Thank you for your
patience.
--
int main(void){char p[]="ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz.\
\n",*q="kl BIcNBFr.NKEzjwCIxNJC";int i=sizeof p/2;char *strchr();int putchar(\
);while(*q){i+=strchr(p,*q++)-p;if(i>=(int)sizeof p)i-=sizeof p-1;putchar(p[i]\
);}return 0;}
Aug 5 '06 #72
Keith Thompson wrote:
we******@gmail.com writes:
Keith Thompson wrote:
"Malcolm" <re*******@btinternet.com> writes:
<we******@gmail.com> wrote:
Eigenvector wrote:
[...] I fully realize that
independent developers may or may not conform to standards, but again is
it at least encouraged?

Not really. By its very nature C encourages non-portable programming.
In general, I try to write code portably, but the only thing keeping me
honest is actually compiling my stuff with multiple compilers to see
what happens.

Yes. There is a tension between efficiency and portability. In Java they
resolved it by compromising efficiency, in C we have to be careful to make
our portable code genuinely portable, which is why the topic is so often
discussed.
There is also the problem of "good enough" portability, for instance
assuming ASCII and two's complement integers.

I rarely find it useful to assume ASCII.
Who cares what *YOU* find useful or not.

Gosh, I don't know. Do you care? Because, as you know, your opinion
matters a great deal to me. It's probably because of your charming
manner.
It's just so typical of you to answer generic questions with what
happens to suit you. As if you represent the only kind of C programmer
that there is, or should be.
I would like to auto-initialize an array that maps from unsigned char
-> parse-state, which maps, say, letters to one value, numbers to
another, etc. The reason I want to auto-initialize this rather than
calling some init() routine that sets them up is because I want to
support correct multithreading, and my inner loops that use such an
array are going so fast, that auto-first-time checking actually is
unacceptable overhead.

If I can't assume ASCII, then this solution has simply been taken away
from me. Compare this with the Lua language, which allows unordered
specific index auto-initialization.

I can think of several ways to do this. You can use some automated
process to generate the C code for you during the build process,
perhaps with a build-time option to select ASCII or some other
character set.
The subject for this thread is "code portability". So of course, I
assume you have a way of doing this portably.
[...] Or you can explicitly invoke an initialization routine
exactly once as your program is starting up and save the expense of
checking on each use.
Ok, read carefully, I just told you I can't do that. If I am willing
to sacrifice performance (remember we're setting up a constant
addressed look-up table so we're expecting a throughput of 1/3 of a
single clock (or even 1/4 of a clock on these new Intel Core CPUs) for
this operation) why would I bother doing this through a look up table
in the first place?
[...] Or (and this may or may not be available to
you), you can use C99,
Again, the subject of this thread is "code portability". Using C99 is
diametrically opposite to this goal.
[...] This works with gcc 3.4.5 and 4.1.1 with "-std=c99".
The irony of this statement is just unbelievable ... . Two versions of
gcc counts as portability?
Or you can just (drum roll please) assume ASCII. If you'll look very
closely at what I wrote above:

| I rarely find it useful to assume ASCII.

you'll see the word "rarely", not "never".
That's nice, but you've removed the context. This is not a response to
the generic question posed. This is just a statement about *your*
predilections. The fact is that *I* rarely find it useful as well,
because I don't write a lot of code that does parsing. But that is
completely irrelevant, which is why, of course, I refrained from making
such ridiculous non sequitur statements. *rarely* is not the only word
you wrote there, you also wrote the word *I*.
[...] If assuming ASCII, and
therefore making your code non-portable to non-ASCII platforms, makes
it significantly faster, that's great. I might consider adding a
check at program startup, something like
if ('A' != 65) {
/* yes, it's an incomplete check */
fprintf(stderr, "This program won't work on a non-ASCII system\n");
exit(EXIT_FAILURE);
}
or I might not bother; I'd at least document the assumption somewhere
in the code. (No, you can't reliably test this in the preprocessor;
see C99 6.10.1p3.)

The fact that you've managed to cite a single application where
assuming ASCII happens to be useful does not refute anything I've
said.
This is a *single* application? I am talking about a technique, not an
application. The fact is, this comes up for a wide variety of string
parsing scenarios, where speed (or in fact *simplicity*) might be a
concern. We're talking about ASCII here -- where else would such a
concern apply?
Write portable code if you can. If you need to write non-portable
code, keep it as isolated as you can (but you may *sometimes* find
that a portable implementation would have worked just as well in the
first place).
Now why couldn't you have posted this more reasoned position instead of
the drivel that you did in the first place?
It's usually just as easy to
write code that depends only on the guarantees in the standard, and
will just work regardless of the character set. It would be
marginally more convenient to be able to assume that the character
codes for the letters are contiguous, but that's easy enough to work
around.
Yeah, well obviously you don't work in environments where performance
and portability matters.

Obviously you have no clue about the environments in which I work.
Ok, well then maybe you are just bad at your job, or maybe you have
long term memory problems like the guy from the movie Memento.
As for two's complement, I typically don't care about that either.
Numbers are numbers. If I need to do bit-twiddling, I use unsigned.
And if you need a correctly functioning ring modulo 2**n? If you can
assume 2s complement then you've *got one*. Otherwise, you get to
construct one somehow (not sure how hard this is, I have never ever
been exposed to a system that didn't *ONLY* support 2s complement).

It's been a while since my last abstract algebra class, but isn't a
"ring modulo 2**n" simply the set of integers from 0 to 2**n-1?
No, that would be a list or a set.

Your bizarre relationship with the definition of technical words is a
real curiosity. How can you pretend to be a computer programmer, and
be so far removed from standard nomenclature? It would be ok if you
just mixed up a few words or something I wouldn't make a big deal about
it. But you appear to not know the concepts on the other side of these
words.
[...] And isn't that precisely what C's *unsigned* integer types are?
First of all no, and second of all if it was, then it wouldn't be a
ring.

A Ring is a set with a 0, a + operator and a * operator. And the point
is that it's completely *closed* under these operations. In typical 2s
complement implementations, I know that integers (signed or not) are
rings. In 1s complement machines -- I have no idea; I don't have
access to such a machine (I never have in the past, and I almost
certainly never will in the future), and just don't have familiarity
with 1s complement. It doesn't have the natural wrapping properties
that 2s complement has, so my intuition is that it's *not* a ring, but I
just don't know.

The reason why this is important is for verification purposes. Suppose
I write the following:

x = (y << 7) - (y << 2);

Well, that should be the same as x = y * 124. How do I know this?
Because I know that y << 7 is the same as y * 128, and y << 2 is the
same as y * 4. After that, there is a concern that one of the operands
of the subtract might wrap around, while the other one doesn't. Or both
might. Because of that, direct verification of this fact might lead
you to believe that you need to look at these as separate cases and
very carefully examine the bits to make sure that the results are still
correct. But we don't have to. If we *know* that the expression is
equivalent to y*128 - y*4, then because 2s complement integers form an
actual ring, we are allowed to rely on ordinary algebra without
concern. Wrap around doesn't matter -- it's always correct.
Verification of just straight *algebra* is unnecessary, we can just
rely on mathematics.

--
Paul Hsieh
http://www.pobox.com/~qed/
http://bstring.sf.net/

Aug 5 '06 #73
In article <11*********************@i42g2000cwa.googlegroups.com>
<we******@gmail.com> wrote (replying to someone else):
>A Ring is a set with a 0, a + operator and a * operator. And the point
is that it's completely *closed* under these operations.
This is ... hardly a thorough definition. You need to add
commutativity (for +) and distribution (of * over +), in particular.
>In typical 2s complement implementations, I know that integers
(signed or not) are rings. In 1s complement machines -- I have
no idea ...
And that is where you have missed Keith Thompson's point -- because
even on ones' complement machines, *unsigned* integers (in C) are
still rings. So use "unsigned"; they give you the very property
you want. They *guarantee* it.
--
In-Real-Life: Chris Torek, Wind River Systems
Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W) +1 801 277 2603
email: forget about it http://web.torek.net/torek/index.html
Reading email is like searching for food in the garbage, thanks to spammers.
Aug 5 '06 #74
Chris Torek wrote:
In article <11*********************@i42g2000cwa.googlegroups.com>
<we******@gmail.com> wrote (replying to someone else):
A Ring is a set with a 0, a + operator and a * operator. And the point
is that it's completely *closed* under these operations.

This is ... hardly a thorough definition.
I didn't claim it was. This isn't a classroom; thoroughness is not the
same as correctness.
[...] You need to add commutativity (for +) and distribution (of * over +), in
particular.
In typical 2s complement implementations, I know that integers
(signed or not) are rings. In 1s complement machines -- I have
no idea ...

And that is where you have missed Keith Thompson's point -- because
even on ones' complement machines, *unsigned* integers (in C) are
still rings. So use "unsigned"; they give you the very property
you want. They *guarantee* it.
And now you are starting to make Keith-style mistakes. What if I need
to do algebra on signed integers? I need the "ring properties" for
proofs of correctness -- this is not a useful end in and of itself. If I
cannot apply these properties to signed integers, then I cannot do
algebra on signed integers without great difficulty.

Compare this to the situation in 2s complement. Suppose it's
*difficult* to prove something on signed integers, but easy to prove it
for unsigned. But if it turns out you can "lift" from signed to
unsigned through casting and your theorem still makes sense, then you
likely can just apply the proof through this mechanism.

What Keith said is tantamount to saying "don't use negative numbers, if
you plan on doing sound arithmetic". This is kind of useless.

--
Paul Hsieh
http://www.pobox.com/~qed/
http://bstring.sf.net/

Aug 5 '06 #75
we******@gmail.com writes:
Keith Thompson wrote:
>we******@gmail.com writes:
Keith Thompson wrote:
"Malcolm" <re*******@btinternet.com> writes:
<we******@gmail.com> wrote:
Eigenvector wrote:
[...] I fully realize that independent developers may or may
not conform to standards, but again is it at least
encouraged?

Not really. By its very nature C encourages non-portable
programming. In general, I try to write code portably, but
the only thing keeping me honest is actually compiling my
stuff with multiple compilers to see what happens.

Yes. There is a tension between efficiency and portability. In
Java they resolved it by compromising efficiency, in C we have
to be careful to make our portable code genuinely portable,
which is why the topic is so often discussed. There is also
the problem of "good enough" portability, for instance
assuming ASCII and two's complement integers.

I rarely find it useful to assume ASCII.

Who cares what *YOU* find useful or not.

Gosh, I don't know. Do you care? Because, as you know, your opinion
matters a great deal to me. It's probably because of your charming
manner.

It's just so typical of you to answer generic questions with what
happens to suit you. As if you represent the only kind of C programmer
that there is, or should be.
I never said or implied that.
I would like to auto-initialize an array that maps from unsigned char
-> parse-state, which maps, say, letters to one value, numbers to
another, etc. The reason I want to auto-initialize this rather than
calling some init() routine that sets them up is because I want to
support correct multithreading, and my inner loops that use such an
array are going so fast, that auto-first-time checking actually is
unacceptable overhead.

If I can't assume ASCII, then this solution has simply been taken away
from me. Compare this with the Lua language, which allows unordered
specific index auto-initialization.

I can think of several ways to do this. You can use some automated
process to generate the C code for you during the build process,
perhaps with a build-time option to select ASCII or some other
character set.

The subject for this thread is "code portability". So of course, I
assume you have a way of doing this portably.
No, I don't.
>[...] Or you can explicitly invoke an initialization routine
exactly once as your program is starting up and save the expense of
checking on each use.

Ok, read carefully, I just told you I can't do that. If I am willing
to sacrifice performance (remember we're setting up a constant
addressed look-up table so we're expecting a throughput of 1/3 of a
single clock (or even 1/4 of a clock on these new Intel Core CPUs) for
this operation) why would I bother doing this through a look up table
in the first place?
You said you wanted "an array that maps from unsigned char ->
parse-state, which makes, say, letters to one value, numbers to
another, etc.". I took that to be a description of a lookup table.
If you were referring to something else, I suggest you write more
clearly.
>[...] Or (and this may or may not be available to
you), you can use C99,

Again, the subject of this thread is "code portability". Using C99 is
diametrically opposite to this goal.
>[...] This works with gcc 3.4.5 and 4.1.1 with "-std=c99".

The irony of this statement is just unbelievable ... . Two versions of
gcc counts as portability?
No, of course not. You described a problem; I suggested some
solutions, and I clearly stated that some of them are not portable.
The fact that the subject of this thread happens to be "code
portability" does not mean that I am obligated to discuss only
portable solutions. In fact, I am discussing portable
vs. non-portable code.
>Or you can just (drum roll please) assume ASCII. If you'll look very
closely at what I wrote above:

| I rarely find it useful to assume ASCII.

you'll see the word "rarely", not "never".

That's nice, but you've removed the context. This is not a response to
the generic question posed. This is just a statement about *your*
predilections. The fact is that *I* rarely find it useful as well,
because I don't write a lot of code that does parsing. But that is
completely irrelevant, which is why, of course, I refrained from making
such ridiculous non sequitur statements. *rarely* is not the only word
you wrote there, you also wrote the word *I*.
Yes, I wrote the word "I" because "I" was talking about my own
experience. If it's not useful to you, that's too bad.
>[...] If assuming ASCII, and
therefore making your code non-portable to non-ASCII platforms, makes
it significantly faster, that's great. I might consider adding a
check at program startup, something like
if ('A' != 65) {
/* yes, it's an incomplete check */
fprintf(stderr, "This program won't work on a non-ASCII system\n");
exit(EXIT_FAILURE);
}
or I might not bother; I'd at least document the assumption somewhere
in the code. (No, you can't reliably test this in the preprocessor;
see C99 6.10.1p3.)

The fact that you've managed to cite a single application where
assuming ASCII happens to be useful does not refute anything I've
said.

This is a *single* application? I am talking about a technique, not an
application. The fact is, this comes up for a wide variety of string
parsing scenarios, where speed (or in fact *simplicity*) might be a
concern. We're talking about ASCII here -- where else would such a
concern apply?
>Write portable code if you can. If you need to write non-portable
code, keep it as isolated as you can (but you may *sometimes* find
that a portable implementation would have worked just as well in the
first place).

Now why couldn't you have posted this more reasoned position instead of
the drivel that you did in the first place?
It's what I've been saying all along. Pay attention.
>It's usually just as easy to
write code that depends only on the guarantees in the standard, and
will just work regardless of the character set. It would be
marginally more convenient to be able to assume that the character
codes for the letters are contiguous, but that's easy enough to work
around.

Yeah, well obviously you don't work in environments where performance
and portability matters.

Obviously you have no clue about the environments in which I work.

Ok, well then maybe you are just bad at your job, or maybe you have
long term memory problems like the guy from the movie Memento.
The subject of this thread is "code portability", not "gratuitous
insults".
>As for two's complement, I typically don't care about that either.
Numbers are numbers. If I need to do bit-twiddling, I use unsigned.

And if you need a correctly functioning ring modulo 2**n? If you can
assume 2s complement then you've *got one*. Otherwise, you get to
construct one somehow (not sure how hard this is, I have never ever
been exposed to a system that didn't *ONLY* support 2s complement).

It's been a while since my last abstract algebra class, but isn't a
"ring modulo 2**n" simply the set of integers from 0 to 2**n-1?

No, that would be a list or a set.

Your bizarre relationship with the definition of technical words is a
real curiosity. How can you pretend to be a computer programmer, and
be so far removed from standard nomenclature? It would be ok if you
just mixed up a few words or something I wouldn't make a big deal about
it. But you appear to not know the concepts on the other side of these
words.
>[...] And isn't that precisely what C's *unsigned* integer types are?

First of all no, and second of all if it was, then it wouldn't be a
ring.

A Ring is a set with a 0, a + operator and a * operator. And the point
is that it's completely *closed* under these operations. In typical 2s
complement implementations, I know that integers (signed or not) are
rings. In 1s complement machines -- I have no idea; I don't have
access to such a machine (I never have in the past, and I almost
certainly never will in the future), and just don't have familiarity
with 1s complement. It doesn't have the natural wrapping properties
that 2s complement has, so my intuition is that it's *not* a ring, but I
just don't know.
A ring is not just a set, nor is it just a set with a 0, a + operator,
and a * operator. There are several other properties it has to have.
You flame me for an incomplete definition, then offer another
incomplete definition yourself.

I believe that unsigned int satisfies those properties, but signed int
may or may not; for example, the standard makes no guarantee that any
signed type is closed under addition. It's probably true that signed
integers on most 2's-complement systems (which are almost all existing
systems) also happen to satisfy those properties.
The reason why this is important is for verification purposes. Suppose
I write the following:

x = (y << 7) - (y << 2);

Well, that should be the same as x = y * 124. How do I know this?
Because I know that y << 7 is the same as y * 128, and y << 2 is the
same as y * 4. After that, there is a concern that one of the operands
of the subtract might wrap around, while the other one doesn't. Or
both might. Because of that, direct verification of this fact might
lead you to believe that you need to look at these as separate cases
and very carefully examine the bits to make sure that the results
are still correct. But we don't have to. If we *know* that the
expression is equivalent to y*128 - y*4, then because 2s complement
integers form an actual ring, we are allowed to rely on ordinary
algebra without concern. Wrap around doesn't matter -- it's always
correct. Verification of just straight *algebra* is unnecessary, we
can just rely on mathematics.
If you *know* that 2's-complement integers form a ring, then you are
depending on properties not guaranteed by the C standard. (You are,
of course, free to do so.)

Incidentally, you might find that it's possible to have a technical
discussion without being a hypocritical jerk. Try it.

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <* <http://users.sdsc.edu/~kst>
We must do something. This is something. Therefore, we must do this.
Aug 5 '06 #76
we******@gmail.com wrote:
Keith Thompson wrote:
>we******@gmail.com writes:
>>Keith Thompson wrote:
"Malcolm" <re*******@btinternet.com> writes:
<we******@gmail.com> wrote:
>Eigenvector wrote:
>>[...] I fully realize that
>>independent developers may or may not conform to standards, but again is
>>it at least encouraged?
>Not really. By its very nature C encourages non-portable programming.
>In general, I try to write code portably, but the only thing keeping me
>honest is actually compiling my stuff with multiple compilers to see
>what happens.
Yes. There is a tension between efficiency and portability. In Java they
resolved it by compromising efficiency, in C we have to be careful to make
our portable code genuinely portable, which is why the topic is so often
discussed.
There is also the problem of "good enough" portability, for instance
assuming ASCII and two's complement integers.
I rarely find it useful to assume ASCII.
Who cares what *YOU* find useful or not.
Gosh, I don't know. Do you care? Because, as you know, your opinion
matters a great deal to me. It's probably because of your charming
manner.

It's just so typical of you to answer generic questions with what
happens to suit you. As if you represent the only kind of C programmer
that there is, or should be.
Unless someone knows *every* domain, which no one does, all they
can do is talk about the areas they do know. Therefore *any* response to
a generic question will be based on what the person answering it comes
across.

Keith rarely finds it useful to assume ASCII, it appears you regularly
find it useful to assume ASCII. Neither shows what the situation is
across all domains.
>>I would like to auto-initialize an array that maps from unsigned char
-> parse-state, which maps, say, letters to one value, numbers to
another, etc. The reason I want to auto-initialize this rather than
calling some init() routine that sets them up is because I want to
support correct multithreading, and my inner loops that use such an
array are going so fast, that auto-first-time checking actually is
unacceptable overhead.

If I can't assume ASCII, then this solution has simply been taken away
from me. Compare this with the Lua language, which allows unordered
specific index auto-initialization.
I can think of several ways to do this. You can use some automated
process to generate the C code for you during the build process,
perhaps with a build-time option to select ASCII or some other
character set.

The subject for this thread is "code portability". So of course, I
assume you have a way of doing this portably.
I'm sure Keith can.
>[...] Or you can explicitly invoke an initialization routine
exactly once as your program is starting up and save the expense of
checking on each use.

Ok, read carefully, I just told you I can't do that. If I am willing
to sacrifice performance (remember we're setting up a constant
addressed look-up table so we're expecting a throughput of 1/3 of a
single clock (or even 1/4 of a clock on these new Intel Core CPUs) for
this operation) why would I bother doing this through a look up table
in the first place?
int main(void)
{
do_init();
/* Throw off as many off topic threads as you want */
/* rest of program */
}

Calling do_init has a major impact on the performance of the program?
>[...] Or (and this may or may not be available to
you), you can use C99,

Again, the subject of this thread is "code portability". Using C99 is
diametrically opposite to this goal.
Keith noted that C99 might not be available to you. However, if it is
available on all platforms of interest then it might be portable enough.
>[...] This works with gcc 3.4.5 and 4.1.1 with "-std=c99".

The irony of this statement is just unbelievable ... . Two versions of
gcc counts as portability?
If it is valid C99, and I have no reason to believe it isn't, there are
other compilers it will work on.
>Or you can just (drum roll please) assume ASCII. If you'll look very
closely at what I wrote above:

| I rarely find it useful to assume ASCII.

you'll see the word "rarely", not "never".

That's nice, but you've removed the context. This is not a response to
the generic question posed. This is just a statement about *your*
predilections. The fact is that *I* rarely find it useful as well,
because I don't write a lot of code that does parsing. But that is
completely irrelevant, which is why, of course, I refrained from making
such ridiculous non sequitur statements. *rarely* is not the only word
you wrote there, you also wrote the word *I*.
Which means that what Keith wrote is perfectly clear. You (and probably
Keith) do not know whether for the majority of programs it is useful to
assume ASCII or not, all you know is the domains you know about.
>[...] If assuming ASCII, and
therefore making your code non-portable to non-ASCII platforms, makes
it significantly faster, that's great. I might consider adding a
check at program startup, something like
if ('A' != 65) {
/* yes, it's an incomplete check */
fprintf(stderr, "This program won't work on a non-ASCII system\n");
exit(EXIT_FAILURE);
}
or I might not bother; I'd at least document the assumption somewhere
in the code. (No, you can't reliably test this in the preprocessor;
see C99 6.10.1p3.)

The fact that you've managed to cite a single application where
assuming ASCII happens to be useful does not refute anything I've
said.

This is a *single* application? I am talking about a technique, not an
application. The fact is, this comes up for a wide variety of string
parsing scenarios, where speed (or in fact *simplicity*) might be a
concern. We're talking about ASCII here -- where else would such a
concern apply?
So if I come up with a technique for two things covering two wide
varieties of scenarios where assuming ASCII provides no benefit, will
that prove that generally you don't need to assume ASCII?

<snip>
>>>It's usually just as easy to
write code that depends only on the guarantees in the standard, and
will just work regardless of the character set. It would be
marginally more convenient to be able to assume that the character
codes for the letters are contiguous, but that's easy enough to work
around.
Yeah, well obviously you don't work in environments where performance
and portability matters.
Obviously you have no clue about the environments in which I work.

Ok, well then maybe you are just bad at your job, or maybe you have
long term memory problems like the guy from the movie Memento.
Or maybe Keith is good at his job and does things where it is rarely
useful to assume ASCII?
>>>As for two's complement, I typically don't care about that either.
Numbers are numbers. If I need to do bit-twiddling, I use unsigned.
And if you need a correctly functioning ring modulo 2**n? If you can
assume 2s complement then you've *got one*. Otherwise, you get to
construct one somehow (not sure how hard this is, I have never ever
been exposed to a system that didn't *ONLY* support 2s complement).
It's been a while since my last abstract algebra class, but isn't a
"ring modulo 2**n" simply the set of integers from 0 to 2**n-1?

No, that would be a list or a set.

Your bizarre relationship with the definition of technical words is a
real curiosity. How can you pretend to be a computer programmer, and
be so far removed from standard nomenclature? It would be ok if you
just mixed up a few words or something I wouldn't make a big deal about
it. But you appear to not know the concepts on the other side of these
words.
There are large fields of computing where algebra is not required.
Certainly large fields where rings are not required.
>[...] And isn't that precisely what C's *unsigned* integer types are?

First of all no, and second of all if it was, then it wouldn't be a
ring.

A Ring is a set with a 0, a + operator and a * operator. And the point
is that it's completely *closed* under these operations.
Which unsigned integer types are.
In typical 2s
complement implementations, I know that integers (signed or not) are
rings.
You obviously know very little about how unsigned integers are defined in
C. They are the same *whatever* representation is used for signed integers.
In 1s complement machines -- I have no idea;
Had you bothered to look you would know that the signed integer
representation does not affect the unsigned integer representation.
Keith *explicitly* stated *unsigned*.
I don't have
access to such a machine (I never have in the past, and I almost
certainly never will in the future), and just don't have familiarity
with 1s complement. It doesn't have the natural wrapping properties
that 2s complement has, so my intuition is that it's *not* a ring, but I
just don't know.
Signed integers are not defined as being a ring *whatever*
representation is used. I've used processors that use 2s complement
where they will limit on overflow of addition/subtraction instead of
wrapping. There are times in signal processing where this is a *very*
useful property.
The reason why this is important is for verification purposes. Suppose
I write the following:

x = (y << 7) - (y << 2);

Well, that should be the same as x = y * 124. How do I know this?
Because I know that y << 7 is the same as y * 128, and y << 2 is the
same as y * 4. After that, there is a concern that one of the operands of
If you understood unsigned integers in C you would understand that it
applies whatever the signed representation is. I would still use
multiplication rather than a shift/subtract when I want multiplication
and let the compiler sort out the optimisation. After all, that is what
the optimisation phase is for. In any case, on some processors it would
be *faster* to multiply because they have single cycle hardware multipliers.
the subtract might wrap around, while the other one doesn't. Or both
might. Because of that, direct verification of this fact might lead
you to believe that you need to look at these as separate cases and
very carefully examine the bits to make sure that the results are still
correct. But we don't have to. If we *know* that the expression is
equivalent to y*128 - y*4, then because 2s complement integers form an
actual ring, we are allowed to rely on ordinary algebra without
concern. Wrap around doesn't matter -- it's always correct.
Verification of just straight *algebra* is unnecessary, we can just
rely on mathematics.
As Keith said, you get these guarantees on unsigned integers. So if you
need a ring use unsigned integers. Since unsigned integers are
guaranteed to be a ring by the C standard.
--
Flash Gordon
Still sigless on this computer.
Aug 5 '06 #77
In article <11**********************@p79g2000cwp.googlegroups .com>,
<we******@gmail.comwrote:
>Chris Torek wrote:
>In article <11*********************@i42g2000cwa.googlegroups. com>
<we******@gmail.comwrote (replying to someone else):
>A Ring is a set with a 0, a + operator and a * operator. And the point
is that its completely *closed* under these operations.
>This is ... hardly a thorough definition.
>I didn't claim it was. This isn't a classroom; thoroughness is not the
same as correctness.
You gave a definition for ring, but there are sets that match your
definition that are NOT rings, because your definition was incomplete
even for common types of rings.
http://mathworld.wolfram.com/Ring.html

It is not clear to me how someone can complain about someone
else's "bizarre relationship to technical terms" and then themselves
misuse a technical term that they themselves have indicated is important
to part of their discussion.
--
Okay, buzzwords only. Two syllables, tops. -- Laurie Anderson
Aug 5 '06 #78
In article <11*********************@h48g2000cwc.googlegroups. com>,
<we******@gmail.comwrote:
>And if you need a correctly functioning ring modulo 2**n? If you can
assume 2s complement then you've *got one*. Otherwise, you get to
construct one somehow (not sure how hard this is, I have never ever
been exposed to a system that didn't *ONLY* support 2s complement).
Caution: on most 2s complement machines, the *signed* integers do
not form a ring. In cases where INT_MIN is (-INT_MAX - 1)
(e.g., INT_MIN is -32768 for an INT_MAX of 32767) then there
is no "additive inverse" for INT_MIN -- no element in the set
such that INT_MIN plus the element is 0.

This is not an issue for *unsigned* integers: operations on the
unsigned integers are defined such that the additive inverse of
the maximum unsigned integer is always 1 [if I recall correctly.]
--
There are some ideas so wrong that only a very intelligent person
could believe in them. -- George Orwell
Aug 5 '06 #79
Walter Roberson wrote:
In article <11*********************@h48g2000cwc.googlegroups. com>,
<we******@gmail.comwrote:
And if you need a correctly functioning ring modulo 2**n? If you can
assume 2s complement then you've *got one*. Otherwise, you get to
construct one somehow (not sure how hard this is, I have never ever
been exposed to a system that didn't *ONLY* support 2s complement).

Caution: on most 2s complement machines, the *signed* integers do
not form a ring. In cases where INT_MIN is (-INT_MAX - 1)
(e.g., INT_MIN is -32768 for an INT_MAX of 32767) then there
is no "additive inverse" for INT_MIN -- no element in the set
such that INT_MIN plus the element is 0.
What do you mean? The additive inverse of INT_MIN is INT_MIN.

--
Paul Hsieh
http://www.pobox.com/~qed/
http://bstring.sf.net/

Aug 5 '06 #80
In article <11**********************@n13g2000cwa.googlegroups .com>,
<we******@gmail.comwrote:
>Walter Roberson wrote:
>Caution: on most 2s complement machines, the *signed* integers do
not form a ring. In cases where INT_MIN is (-INT_MAX - 1)
(e.g., INT_MIN is -32768 for an INT_MAX of 32767) then there
is no "additive inverse" for INT_MIN -- no element in the set
such that INT_MIN plus the element is 0.
>What do you mean? The additive inverse of INT_MIN is INT_MIN.
It is true that the additive inverse is not required to be distinct
from the original value, but in order for the additive inverse of
INT_MIN to be INT_MIN, then INT_MIN + INT_MIN would have to equal 0.
That is not at all guaranteed in C's operator definitions:
the C definition of addition and subtraction on the signed integral
types leaves it up to the implementation (or undefined behaviour) as
to what happens in cases of overflow or underflow.

The C operations on the unsigned integral types are strictly defined;
those on the signed integral types are not.
--
I was very young in those days, but I was also rather dim.
-- Christopher Priest
Aug 5 '06 #81
Walter Roberson wrote:
<we******@gmail.comwrote:
Chris Torek wrote:
In article <11*********************@i42g2000cwa.googlegroups. com>
<we******@gmail.comwrote (replying to someone else):
A Ring is a set with a 0, a + operator and a * operator. And the point
is that its completely *closed* under these operations.
This is ... hardly a thorough definition.
I didn't claim it was. This isn't a classroom; thoroughness is not the
same as correctness.

You gave a definition for ring,
I did? Look. Pay attention:

A cat is an animal with fur, four legs and whiskers.

Is that a definition? It was my intent just to give sufficient
properties of a ring to explain why Keith's notion of what a ring is is
just not going to cut it.
[...] but there are sets that match your
definition that are NOT rings, because your definition was incomplete
even for common types of rings.
http://mathworld.wolfram.com/Ring.html
Right -- so you wanted me to paste that whole thing in here? I know
what the definition is, but as you can clearly see, it's quite wordy
relative to what its actual content is. I could paste in Russell and
Whitehead's proof that 2+2=4 (although the metamath proof appears to be
much shorter) every time I cite that, but I don't think that it would
be very useful to this audience.
It is not clear to me how someone can complain about someone
else's "bizarre relationship to technical terms" and then themselves
misuse a technical term that they themselves have indicated is important
to part of their discussion.
How did I misuse it?

--
Paul Hsieh
http://www.pobox.com/~qed/
http://bstring.sf.net/

Aug 5 '06 #82
Walter Roberson wrote:
<we******@gmail.comwrote:
Walter Roberson wrote:
Caution: on most 2s complement machines, the *signed* integers do
not form a ring. In cases where INT_MIN is (-INT_MAX - 1)
(e.g., INT_MIN is -32768 for an INT_MAX of 32767) then there
is no "additive inverse" for INT_MIN -- no element in the set
such that INT_MIN plus the element is 0.
What do you mean? The additive inverse of INT_MIN is INT_MIN.

It is true that the additive inverse is not required to be distinct
from the original value, but in order for the additive inverse of
INT_MIN to be INT_MIN, then INT_MIN + INT_MIN would have to equal 0.
That is not at all guaranteed in C's operator definitions:
the C definition of addition and subtraction on the signed integral
types leaves it up to the implementation (or undefined behaviour) as
to what happens in cases of overflow or underflow.

The C operations on the unsigned integral types are strictly defined;
those on the signed integral types are not.
But you're losing context again. That the *C standard* has these
weaknesses is well understood. And I'm sure there are 1s complement
machines, or otherwise where these weaknesses are truly manifest. In
real world 2s complement machines however, you will never see any such
problem. The ring properties are there for both signed and unsigned,
just as you may have learned about 2s complement in school. This is my
point -- the standard penalizes your working assumptions, while the
*defacto* standard supports them.

So if you want portability, you have to give up on using math. For
serious environments, that means you have to actually write more
verification code, which are completely pointless on platforms that
support the defacto 2s complement standard.

(Your point also has nothing to do with INT_MIN, BTW. That anomaly
just has to do with intuitive expectations about the negation
operation, which is only a problem of interpretation. It's not an
actual mathematical problem.)

--
Paul Hsieh
http://www.pobox.com/~qed/
http://bstring.sf.net/

Aug 5 '06 #83
we******@gmail.com writes:
Walter Roberson wrote:
> <we******@gmail.comwrote:
>Chris Torek wrote:
In article <11*********************@i42g2000cwa.googlegroups. com>
<we******@gmail.comwrote (replying to someone else):
A Ring is a set with a 0, a + operator and a * operator. And the point
is that its completely *closed* under these operations.
>This is ... hardly a thorough definition.
>I didn't claim it was. This isn't a classroom; thoroughness is not the
same as correctness.

You gave a definition for ring,

I did? Look. Pay attention:

A cat is an animal with fur, four legs and whiskers.

Is that a definition? It was my intent just to give sufficient
properties of a ring to explain why Keith's notion of what a ring is is
just not going to cut it.
Ok. Here's the context. You wrote:
| And if you need a correctly functioning ring modulo 2**n? If you can
| assume 2s complement then you've *got one*. Otherwise, you get to
| construct one somehow (not sure how hard this is, I have never ever
| been exposed to a system that didn't *ONLY* support 2s complement).

and I replied:
| It's been a while since my last abstract algebra class, but isn't a
| "ring module 2**n" simply the set of integers from 0 to 2**n-1? And
| isn't that precisely what C's *unsigned* integer types are?

I wasn't attempting to offer a *definition* of "ring" either. I was
suggesting that the specified set of integers, along with the required
set of operations, is a ring, and that a C unsigned integer type,
along with those same operations, is also a ring.

You're saying that my "notion of what a ring is is just not going to
cut it". Apparently this is based on the fact that I offered a valid
example of a ring.

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <* <http://users.sdsc.edu/~kst>
We must do something. This is something. Therefore, we must do this.
Aug 5 '06 #84
In article <11*********************@h48g2000cwc.googlegroups. comwe******@gmail.com writes:
....
If I can't assume ASCII, then this solution has simply been taken away
from me. Compare this with the Lua language, which allows unordered
specific index auto-initialization.
Eh, well, if you want to limit the usability to languages that use only
the 26 letters of the Latin alphabet...
--
dik t. winter, cwi, kruislaan 413, 1098 sj amsterdam, nederland, +31205924131
home: bovenover 215, 1025 jn amsterdam, nederland; http://www.cwi.nl/~dik/
Aug 6 '06 #85
In article <eb*********@news2.newsguy.comChris Torek <no****@torek.netwrites:
In article <11*********************@i42g2000cwa.googlegroups. com>
<we******@gmail.comwrote (replying to someone else):
A Ring is a set with a 0, a + operator and a * operator. And the point
is that its completely *closed* under these operations.

This is ... hardly a thorough definition. You need to add
commutativity (for +) and distribution (of * over +), in particular.
And associativity.
In typical 2s complement implementations, I know that integers
(signed or not) are rings. In 1s complement machines -- I have
no idea ...

And that is where you have missed Keith Thompson's point -- because
even on ones' complement machines, *unsigned* integers (in C) are
still rings. So use "unsigned"; they give you the very property
you want. They *guarantee* it.
I can look further, but I think that in both also signed arithmetic
forms a ring (when overflow gives a properly defined result). But
of course such is not defined in C.
--
dik t. winter, cwi, kruislaan 413, 1098 sj amsterdam, nederland, +31205924131
home: bovenover 215, 1025 jn amsterdam, nederland; http://www.cwi.nl/~dik/
Aug 6 '06 #86
In article <11**********************@p79g2000cwp.googlegroups .comwe******@gmail.com writes:
....
Compare this to the situation in 2s complement. Suppose its
*difficult* to prove something on signed integers, but easy to prove it
for unsigned. But if it turns out you can "lift" from signed to
unsigned through casting and your theorem still makes sense, then you
likely can just apply the proof through this mechanism.
That is very true. But C does not specify the results on overflow of
signed numbers. For a good reason, various machines handle that
differently.

Assuming you can simply apply mathematical theorems to the arithemetic
provided can be very wrong. For instance, the triangle inequality does
not hold for IEEE floating point arithmetic with rounding to nearest.
And you do not need either NaNs or Infinities or overflow to show that.
--
dik t. winter, cwi, kruislaan 413, 1098 sj amsterdam, nederland, +31205924131
home: bovenover 215, 1025 jn amsterdam, nederland; http://www.cwi.nl/~dik/
Aug 6 '06 #87
In article <eb**********@canopus.cc.umanitoba.caro******@ibd.nrc-cnrc.gc.ca (Walter Roberson) writes:
In article <11*********************@h48g2000cwc.googlegroups. com>,
<we******@gmail.comwrote:
And if you need a correctly functioning ring modulo 2**n? If you can
assume 2s complement then you've *got one*. Otherwise, you get to
construct one somehow (not sure how hard this is, I have never ever
been exposed to a system that didn't *ONLY* support 2s complement).

Caution: on most 2s complement machines, the *signed* integers do
not form a ring. In cases where INT_MIN is (-INT_MAX - 1)
(e.g., INT_MIN is -32768 for an INT_MAX of 32767) then there
is no "additive inverse" for INT_MIN -- no element in the set
such that INT_MIN plus the element is 0.
Yes, there is. It is INT_MIN. Provided that overflow does not bother
the processor (which we assume, otherwise it would not be closed under
either addition or multiplication). That INT_MIN is its own additive
inverse is in itself not a problem. And I think it is indeed a ring.
Just as I think that signed 1's complement numbers form a ring (again,
provided that overflow is ignored).

But Paul Hsieh is talking about a ring mod 2**n, and in that case the
numbers are inherently unsigned. In that ring -1 == 2**n - 1.

Now his question was whether x << 7 would be equal to x * 128.
And indeed, that is the case, as long as there is no overflow,
it is the case, both on 1's complement and on 2's complement machines
(at least all machines I ever did use). For unsigned arithmetic and
for signed arithmetic.

And, contrary to Paul Hsieh's experience, in my career I think that
I have used 1's complement machines about as long as I have used
2's complement machines. Both about 25 years, but there is some
overlap when I used both.
--
dik t. winter, cwi, kruislaan 413, 1098 sj amsterdam, nederland, +31205924131
home: bovenover 215, 1025 jn amsterdam, nederland; http://www.cwi.nl/~dik/
Aug 6 '06 #88
In article <11**********************@i42g2000cwa.googlegroups .comwe******@gmail.com writes:
....
But you're losing context again. That the *C standard* has these
weaknesses is well understood. And I'm sure there are 1s complement
machines, or otherwise where these weaknesses are truly manifest. In
real world 2s complement machines however, you will never see any such
problem.
You will. Unless you think that the Cray-1 and its successors are not
real world machines (but they still were at the time of C89). Or
consider the Gould, also fairly popular around that time. On that
machine INT_MIN was -INT_MAX, although it was 2's complement. What
would normally be seen as INT_MIN on 2's complement machines was a
trap representation on the Gould.
The ring properties are there for both signed and unsigned,
just as you may have learned about 2s complement in school. This is my
point -- the standard penalizes your working assumptions, while the
*defacto* standard supports them.
Your de facto standard is just a subset of the working machines.
So if you want portability, you have to give up on using math. For
serious environments, that means you have to actually write more
verification code, which is completely pointless on platforms that
support the defacto 2s complement standard.
Yes, so if you omit that code you are *implicitly* doing something
non-portable.
--
dik t. winter, cwi, kruislaan 413, 1098 sj amsterdam, nederland, +31205924131
home: bovenover 215, 1025 jn amsterdam, nederland; http://www.cwi.nl/~dik/
Aug 6 '06 #89
>In article <eb**********@canopus.cc.umanitoba.ca>
>ro******@ibd.nrc-cnrc.gc.ca (Walter Roberson) writes:
>Caution: on most 2s complement machines ... there
is no "additive inverse" for INT_MIN -- no element in the set
such that INT_MIN plus the element is 0.
In article <J3********@cwi.nl>, Dik T. Winter <Di********@cwi.nlwrote:
>Yes, there is. It is INT_MIN. Provided that overflow does not bother
the processor (which we assume, otherwise it would not be closed under
either addition or multiplication).
Indeed, the problem is that overflow is not defined in C. It seems
to me odd for anyone to argue that 10414516 * 50120 "should" be
-2010468192 in ordinary (signed integer) arithmetic; it seems to
me more likely that it should be "caught runtime error: integer
overflow" with the default behavior being to terminate the process
(on a Unix-like system anyway). It would be better if the behavior
*were* defined as "caught at runtime" (among other things, perhaps
Yahoo finance would not show NYSE stock trading volume as a negative
number when 2147483600 + 100 becomes -2147483596 -- apparently
someone forgot to use unsigned there too). But, as we see over
and over again in computing, it tends to be more important to get
the wrong answer as fast as possible... :-)
>But Paul Hsieh is talking about a ring mod 2**n, and in that case the
numbers are inherently unsigned. In that ring -1 == 2**n - 1.
Yes ... so I found his later remarks (along the lines of "what if
I want negative numbers in my ring") rather puzzling. "Negative"
numbers are just positive numbers. This is one of the few places
where C gets the math right "right out of the box". Perhaps he
wants certain large positive numbers to print out, for output
purposes, as negative numbers. This is of course easy to achieve.
For instance, suppose we want our ring mod 2**32 (or whatever) to
hold positive numbers up to 17, and then use the simplest "negative
number" form for the rest of the values:

unsigned result;
...
printf("result is %s%u\n",
result > 17 ? "-" : "", result > 17 ? -result : result);

does the trick. This is slightly less convenient than simply lying
with "%d", perhaps -- the lie will work on most machines but is not
portable -- but clearly much more flexible, as in this example.
(So, I do not understand this complaint. Especially when there
are much more useful things to complain about, such as the lack
of a 2n-bit product when multiplying two n-bit numbers, or even
a full-precision (a*b/c) routine, both of which would be quite
useful in certain fields.)
--
In-Real-Life: Chris Torek, Wind River Systems
Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W) +1 801 277 2603
email: forget about it http://web.torek.net/torek/index.html
Reading email is like searching for food in the garbage, thanks to spammers.
Aug 7 '06 #90
<we******@gmail.comwrote in message
news:11*********************@i42g2000cwa.googlegro ups.com...
*something in which the content was lost in a swathe of personal attacks*
Calm down. You do not make your points any clearer by insulting others and
ignoring their arguments. While I think that *what* you were saying has some
validity [1], the way you said it was completely OTT.

You have become my first *plonk*

Philip

[1] Not that Keith Thompson has any less validity.

Aug 7 '06 #91
Chris Torek <no****@torek.netwrites:
>>In article <eb**********@canopus.cc.umanitoba.ca>
ro******@ibd.nrc-cnrc.gc.ca (Walter Roberson) writes:
>>Caution: on most 2s complement machines ... there
is no "additive inverse" for INT_MIN -- no element in the set
such that INT_MIN plus the element is 0.

In article <J3********@cwi.nl>, Dik T. Winter <Di********@cwi.nlwrote:
>>Yes, there is. It is INT_MIN. Provided that overflow does not bother
the processor (which we assume, otherwise it would not be closed under
either addition or multiplication).

Indeed, the problem is that overflow is not defined in C. It seems
to me odd for anyone to argue that 10414516 * 50120 "should" be
-2010468192 in ordinary (signed integer) arithmetic; it seems to
me more likely that it should be "caught runtime error: integer
overflow" with the default behavior being to terminate the process
(on a Unix-like system anyway). It would be better if the behavior
*were* defined as "caught at runtime" (among other things, perhaps
Yahoo finance would not show NYSE stock trading volume as a negative
number when 2147483600 + 100 becomes -2147483596 -- apparently
someone forgot to use unsigned there too). But, as we see over
and over again in computing, it tends to be more important to get
the wrong answer as fast as possible... :-)
I would be surprised and irritated to have a language like C do any
runtime checks. It's why Ada was invented :)

Failing that there are loads of libraries out there which deal with
large ints, financial rounding and so forth.
Aug 7 '06 #92
"Philip Potter" <ph***********@xilinx.comwrites:
<we******@gmail.comwrote in message
news:11*********************@i42g2000cwa.googlegro ups.com...
>*something in which the content was lost in a swathe of personal attacks*

Calm down. You do not make your points any clearer by insulting others and
ignoring their arguments. While I think that *what* you were saying has some
validity [1], the way you said it was completely OTT.

You have become my first *plonk*
He wasn't overly rude: just somewhat miffed, as a lot of posters get,
when dealing with the typically overtly imperious tone taken by a
certain poster.
Aug 7 '06 #93

Frederick Gotham wrote:
Ian Collins posted:
Keith Thompson wrote:
>
My objection to C's integer type system is that the names are
arbitrary: "char", "short", "int", "long", "long long", "ginormous
long". I'd like to see a system where the type names follow a regular
pattern, and if you want to have a dozen distinct types the names are
clear and obvious. I have a few ideas, but since this will never
happen in any language called "C" I won't go into any more detail.
Isn't that why we now have (u)int32_t and friends? I tend to use int or
unsigned if I don't care about the size and one of the exact size type
if I do.


I use "int unsigned" when I want to store a positive integer.
I use "int unsigned" when I want to store an integer nonnegative.

If, on the other hand, I want to store nonnegative integers, I
use "unsigned int" or just "unsigned".

Aug 7 '06 #94

Dik T. Winter wrote:
In article <eb**********@canopus.cc.umanitoba.caro******@ibd.nrc-cnrc.gc.ca (Walter Roberson) writes:
In article <11*********************@h48g2000cwc.googlegroups. com>,
<we******@gmail.comwrote:
>
And if you need a correctly functioning ring modulo 2**n? If you can
assume 2s complement then you've *got one*. Otherwise, you get to
construct one somehow (not sure how hard this is, I have never ever
been exposed to a system that didn't *ONLY* support 2s complement).
>
Caution: on most 2s complement machines, the *signed* integers do
not form a ring. In cases where INT_MIN is (-INT_MAX - 1)
(e.g., INT_MIN is -32768 for an INT_MAX of 32767) then there
is no "additive inverse" for INT_MIN -- no element in the set
such that INT_MIN plus the element is 0.

Yes, there is. It is INT_MIN. Provided that overflow does not bother
the processor (which we assume, otherwise it would not be closed under
either addition or multiplication). That INT_MIN is its own additive
inverse is in itself not a problem. And I think it is indeed a ring.
Just as I think that signed 1's complement numbers form a ring (again,
provided that overflow is ignored).
A ones complement representation can form a ring, depending
on how overflow is handled. Which, in terms of the C standard,
makes it no different from twos complement.
But Paul Hsieh is talking about a ring mod 2**n, and in that case the
numbers are inherently unsigned. In that ring -1 == 2**n - 1.

Now his question was whether x << 7 would be equal to x * 128.
And indeed, that is the case, as long as there is no overflow,
it is the case, both on 1's complement and on 2's complement machines
(at least all machines I ever did use). For unsigned arithmetic and
for signed arithmetic.
For unsigned, x << 7 is equal to x * 128 whether or not there is
overflow.
And, contrary to Paul Hsieh's experience, in my career I think that
I have used 1's complement machines about as long as I have used
2's complement machines. Both about 25 years, but there is some
overlap when I used both.
Funny, I've read about various ones complement machines
but I don't think I've ever used one. I have used a
sign/magnitude machine though.

Aug 7 '06 #95
Ena8t posted:
>I use "int unsigned" when I want to store a positive integer.

I use "int unsigned" when I want to store an integer nonnegative.

If, on the other hand, I want to store nonnegative integers, I
use "unsigned int" or just "unsigned".

Congratulations, you've discovered that the word order is at the discretion
of the programmer.

Good luck with that.

Here's some more toys to play with:

int const inline static *const Func() { ...

static inline const int* const Func() { ...

--

Frederick Gotham
Aug 7 '06 #96
Frederick Gotham <fg*******@SPAM.comwrites:
Ena8t posted:
>>I use "int unsigned" when I want to store a positive integer.

I use "int unsigned" when I want to store an integer nonnegative.

If, on the other hand, I want to store nonnegative integers, I
use "unsigned int" or just "unsigned".


Congratulations, you've discovered that the word order is at the discretion
of the programmer.

Good luck with that.

Here's some more toys to play with:

int const inline static *const Func() { ...

static inline const int* const Func() { ...
Yes, it is. And how exactly is that useful? "int unsigned" is legal,
but all it's going to do is confuse the reader. "unsigned int" or
just "unsigned" is far more common, and won't confuse the reader.

Is there some reason you want to confuse your readers?

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <* <http://users.sdsc.edu/~kst>
We must do something. This is something. Therefore, we must do this.
Aug 7 '06 #97
Keith Thompson posted:
Yes, it is. And how exactly is that useful? "int unsigned" is legal,
but all it's going to do is confuse the reader.

People have different coding styles.

"unsigned int" or just "unsigned" is far more common, and won't confuse
the reader.

Is there some reason you want to confuse your readers?

When I have a definition (or a declaration), I put things in order of
descending importance. The most important thing comes first.

First and foremost, I specify the type.

Next, I specify things such as "long", "short".

Next, I specify signedness, e.g. "signed", "unsigned".

Next, I specify const, or volatile, or neither.

Next, I specify static, or extern, or neither.

Next, I specify inline, or not.

Next, I specify the name, which may involve asterisks, more const/volatile
qualifiers, or perhaps array bounds.

char unsigned const static inline (*const Func(void))[3]
{
char unsigned const static ch[3] = {'a','b','c'};

return &ch;
}
int main()
{
Func();
}

--

Frederick Gotham
Aug 8 '06 #98
Frederick Gotham wrote:
Keith Thompson posted:

>>Yes, it is. And how exactly is that useful? "int unsigned" is legal,
but all it's going to do is confuse the reader.

People have different coding styles.
>>"unsigned int" or just "unsigned" is far more common, and won't confuse
the reader.

Is there some reason you want to confuse your readers?

When I have a definition (or a declaration), I put things in order of
descending importance. The most important thing comes first.

First and foremost, I specify the type.

Next, I specify things such as "long", "short".

Next, I specify signedness, e.g. "signed", "unsigned".

Next, I specify const, or volatile, or neither.

Next, I specify static, or extern, or neither.

Next, I specify inline, or not.

Next, I specify the name, which may involve asterisks, more const/volatile
qualifiers, or perhaps array bounds.

char unsigned const static inline (*const Func(void))[3]
{
char unsigned const static ch[3] = {'a','b','c'};

return &ch;
}
int main()
{
Func();
}
I pity anyone who has to maintain your code.

--
Ian Collins.
Aug 8 '06 #99
Frederick Gotham <fg*******@SPAM.comwrites:
Keith Thompson posted:
>Yes, it is. And how exactly is that useful? "int unsigned" is legal,
but all it's going to do is confuse the reader.

People have different coding styles.
Yes. That doesn't mean all styles are equally good.
>"unsigned int" or just "unsigned" is far more common, and won't confuse
the reader.

Is there some reason you want to confuse your readers?


When I have a definition (or a declaration), I put things in order of
descending importance. The most important thing comes first.

First and foremost, I specify the type.

Next, I specify things such as "long", "short".

Next, I specify signedness, e.g. "signed", "unsigned".
[..]

Ok, so your style is internally consistent, but it's inconsistent with
perhaps 99.73% of other C programmers (I made up that number).
char unsigned const static inline (*const Func(void))[3]
Anyone seeing that is almost certainly going to spend far more time
wondering why the keywords are in that order and mentally translating
it to a more traditional order than appreciating the esthetics.

As Henry Spencer wrote in "The Ten Commandments for C Programmers",
<http://www.lysator.liu.se/c/ten-commandments.html>:

... thy creativity is better used in solving problems than in
creating beautiful new impediments to understanding.

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <* <http://users.sdsc.edu/~kst>
We must do something. This is something. Therefore, we must do this.
Aug 8 '06 #100