Bytes IT Community

On what does size of data types depend?

P: n/a
Hi all,

I am using the gcc compiler on Linux. I compiled a small program:
int main()
{
printf("char : %d\n",sizeof(char));
printf("unsigned char : %d\n",sizeof(unsigned char));
printf("short : %d\n",sizeof(short));
printf("unsigned short : %d\n",sizeof(unsigned short));
printf("int : %d\n",sizeof(int));
printf("unsigned int : %d\n",sizeof(unsigned int));
printf("long : %d\n",sizeof(long));
printf("unsigned long : %d\n",sizeof(unsigned long));
printf("long long : %d\n",sizeof(long long));
printf("unsigned long long : %d\n",sizeof(unsigned long long));
}

Result was

char : 1
unsigned char : 1
short : 2
unsigned short : 2
int : 4
unsigned int : 4
long : 4
unsigned long : 4
long long : 8
unsigned long long : 8
What I want to know is: what will the effect be if I use int in
place of long in applications running on Linux, and on what factors
does the size of data types depend?

Thanks
Sunil.

Nov 15 '05 #1
35 Replies


In general, the code you write should not depend upon the size of a
type on a particular platform.

The size of a type depends upon the processor architecture and the
compiler vendor. For example, a 16-bit MCU might consider an int to be
2 bytes and a long to be 4 bytes.

On a 64-bit processor, a compiler vendor may decide to use 8 bytes to
represent an int.

--
EventStudio System Designer 2.5 - http://www.EventHelix.com/EventStudio
System Design with Sequence Diagrams in PDF and Word EMF

Nov 15 '05 #2

Sunil wrote:
Hi all,

I am using gcc compiler in linux.I compiled a small program
int main()
{
printf("char : %d\n",sizeof(char));
printf("unsigned char : %d\n",sizeof(unsigned char));
printf("short : %d\n",sizeof(short));
printf("unsigned short : %d\n",sizeof(unsigned short));
printf("int : %d\n",sizeof(int));
printf("unsigned int : %d\n",sizeof(unsigned int));
printf("long : %d\n",sizeof(long));
printf("unsigned long : %d\n",sizeof(unsigned long));
printf("long long : %d\n",sizeof(long long));
printf("unsigned long long : %d\n",sizeof(unsigned long long));
}
On some systems I have used, the output would claim
that all the types are of size zero. Hint: What type of
value does "%d" expect, and what type of value does sizeof
produce?

You've also neglected to #include <stdio.h> and to
return a value from main() -- the latter is all right in
C99 but not in C90, and is doubtful practice in any case.
You're using gcc, which can produce a lot of helpful
diagnostics if you ask for them: ask for them.
Result was

char : 1
unsigned char : 1
short : 2
unsigned short : 2
int : 4
unsigned int : 4
long : 4
unsigned long : 4
long long : 8
unsigned long long : 8

What i want to know is what will be the effect if i use int in
place of long in applications running on linux and also on what factors
does the size of datatypes depend.


On your system and using your compiler with your choice
of compile-time options, `long' and `int' appear to have the
same size. This suggests that they probably also have the
same range, but you'd need to display the actual values from
<limits.h> to be completely certain.

But even if they have the same size, the same range, and
the same representation, they remain distinct types. You can
prove this to yourself with a small test program:

int func(void);
long (*fptr)(void) = func;

You may wonder why "A difference that makes no difference"
is, in this case, truly a difference. The answer is that the
fact that `int' and `long' look the same on your machine under
current conditions does not mean that they look the same on all
machines, or even on your machine with different compiler flags.
If you intend never to move your program to another machine,
never to upgrade your current machine, never to move to 64-bit
Linux, and never to change compilers, then you can ignore the
distinction between `int' and `long' (except in situations like
that I've shown above, where the compiler is required to be
more scrupulous than you).

On the other hand, if you think that your current system
and all its software might not be the very last one you ever
want to use, you should not pretend that `int' and `long' are
the same thing. Three useful references:

The Tenth Commandment
http://www.lysator.liu.se/c/ten-commandments.html

FAQ Question 1.1
http://www.eskimo.com/~scs/C-faq/top.html

IAQ Question 1.1
http://www.plethora.net/~seebs/faqs/c-iaq.html
(be sure to read the footnotes)

--
Eric Sosman
es*****@acm-dot-org.invalid
Nov 15 '05 #3


Sunil wrote:
What i want to know is what will be the effect if i use int in
place of long in applications running on linux and also on what factors
does the size of datatypes depend.


The size depends on the implementation. Incidentally, the size is
measured in units of the space occupied by a char, which is not
guaranteed to take 8 bits of space, though it often does.

In most cases, the exact amount of space something takes should not
concern the programmer: use char when talking about characters,
unsigned char when talking about raw memory, short or unsigned short
when space savings is important, signed or unsigned char for even more
savings, long and unsigned long when dealing with large integers, long
long and unsigned long long when the integers may be really long, and
int and unsigned int when you want to use whatever is the most
`natural' integer representation in the implementation (i.e. int or
unsigned int ought to be the one you use unless there is reason to
deviate). Use the signed types if you think of them as integers, use
the unsigned types if you treat them as bit patterns, or need to use
the extra range on the large positive end, or the logic of the program
is such that blindly converting negative numbers to large positive
integers is the `right thing' to do!

The C standard does guarantee some minimum range of values for each of
these types: look at those, and decide when you want space savings
versus when your integers may become large in magnitude in interpreting
the last paragraph. But don't apply the rules blindly ... experience
teaches you what is likely to be the best data type. C99 also gives you
more fine-grained control over integral types: look at them. In rare
cases, bitfields might also be useful.

Do not gratuitously put in `linux' dependencies: and not all linux
platforms will have the exact same behaviour anyway. Even if they are
the same size, int and long are not interchangeable, a program will
often become `incorrect' if you change ints to longs without changing
anything else. Even though this makes the behaviour undefined, in
practice, on current implementations, it is not likely to create a
difference except in warnings from the compiler. But why take the
chance?

Note that sizeof gives you the space occupied in memory: and it is
possible for an implementation to not effectively use all the space, so
use the macros like CHAR_MAX if you need the exact ranges. It may also
not use the same representation in memory for different types of the
same size (For example, there is no bar to using little endian for ints
and big endian for the longs, as long as the compiler takes care to do
the bit manipulations properly; no implementation I know of does that
yet). The implementation can further require different alignments for
the two types (thus, it might decide on a 2 byte alignment for ints and
4 byte alignment for longs: planning on using 16 bit bus operations for
ints and 32 bit operations for longs; again I know of no implementation
that does that).

In short, you are asking questions which you `should' not be. C is
trying to provide a level of abstraction: a division of labour between
the programmer and the implementation. The programmer describes what
he/she wants and some low level things like whether space or range is
more important, the implementation takes care of the hardware and makes
the program run according to the specification. The standard provides
the language for unambiguous communication. Your questions are
crossing the border, violating one of the raisons d'être of high-level
languages.

Sure, there are occasions when you need to know your precise hardware
and how your implementation maps your code to it. The phrasing of your
question seems to suggest you are not in that situation, though.

Nov 15 '05 #4

Thanks for good info.

Nov 15 '05 #5

"Sunil" <su***********@gmail.com> writes:
Hi all,

I am using gcc compiler in linux.I compiled a small program
int main()
{
printf("char : %d\n",sizeof(char));
printf("unsigned char : %d\n",sizeof(unsigned char));
printf("short : %d\n",sizeof(short));


<SNIP>

This brings to mind something that I have wondered about.

I often see advice elsewhere, and in other people's programs,
suggesting hiding all C "fundamental" types behind typedefs such as

typedef char CHAR;
typedef int INT32;
typedef unsigned int UINT32;
typedef char* PCHAR;

The theory is that application code which always uses these typedefs
will be more likely to run on multiple systems (provided the typedefs
are changed of course).

I used to do this. Then I found out that C99 defined things like
"uint32_t", so I started using these versions instead. But after
following this group for a while I now find even these ugly and don't
use them unless unavoidable.

What do people here think is best?

--

John Devereux
Nov 15 '05 #6

John Devereux <jd******@THISdevereux.me.uk> writes:
"Sunil" <su***********@gmail.com> writes:
Hi all,

I am using gcc compiler in linux.I compiled a small program
int main()
{
printf("char : %d\n",sizeof(char));
printf("unsigned char : %d\n",sizeof(unsigned char));
printf("short : %d\n",sizeof(short));


<SNIP>

This brings to mind something that I have wondered about.

I often see advice elsewhere, and in other people's programs,
suggesting hiding all C "fundamental" types behind typedefs such as

typedef char CHAR;
typedef int INT32;
typedef unsigned int UINT32;
typedef char* PCHAR;

The theory is that application code which always uses these typedefs
will be more likely to run on multiple systems (provided the typedefs
are changed of course).

I used to do this. Then I found out that C99 defined things like
"uint32_t", so I started using these versions instead. But after
following this group for a while I now find even these ugly and don't
use them unless unavoidable.

What do people here think is best?


Depends on whether the code really needs to depend on the size of its
variables.

--
Lowell Gilbert, embedded/networking software engineer
http://be-well.ilk.org/~lowell/
Nov 15 '05 #7


John Devereux wrote:
I used to do this. Then I found out that C99 defined things like
"uint32_t", so I started using these versions instead. But after
following this group for a while I now find even these ugly and don't
use them unless unavoidable.

What do people here think is best?


Good coding style can rarely be encapsulated into simple rules! I
suggest that these C99 features be used in favour of pragmas (like `I
want this thing to be really fast') or making unwarranted assumptions
like unsigned int can hold 32 bits. If I can get away by just using
long instead of int, though, I do it in preference to using the more
precise specifications.

The idea is that even though we can avoid them, we often do not write
strictly conforming code because a fast algorithm using bit
manipulations might be available if we made assumptions about the exact
number of bits, and conditioning everything on limits.h may be
unnecessary for the project at hand. Or, we may be tempted to use
compiler flags to guarantee speed or space savings. If the C99
integral type definitions solve the problem (by, at worst, detecting
problem at compile time), use them in preference to silent breakage
when code is ported.

Most often, I find no particular reason to use them, and I do not.

Nov 15 '05 #8

Eric Sosman wrote:
Sunil wrote:
Hi all,

I am using gcc compiler in linux.I compiled a small program
int main()
{
printf("char : %d\n",sizeof(char));
printf("unsigned char : %d\n",sizeof(unsigned char));
printf("short : %d\n",sizeof(short));
printf("unsigned short : %d\n",sizeof(unsigned short));
printf("int : %d\n",sizeof(int));
printf("unsigned int : %d\n",sizeof(unsigned int));
printf("long : %d\n",sizeof(long));
printf("unsigned long : %d\n",sizeof(unsigned long));
printf("long long : %d\n",sizeof(long long));
printf("unsigned long long : %d\n",sizeof(unsigned long long));
}

On some systems I have used, the output would claim
that all the types are of size zero. Hint: What type of
value does "%d" expect, and what type of value does sizeof
produce?

<snip>
What exactly *is* the format specifier for size_t in C90? C99 has "%zu",
but is (say) "%lu" guaranteed to work?

S.
Nov 15 '05 #9

Lowell Gilbert <lg******@be-well.ilk.org> writes:
John Devereux <jd******@THISdevereux.me.uk> writes:
This brings to mind something that I have wondered about.

I often see advice elsewhere, and in other people's programs,
suggesting hiding all C "fundamental" types behind typedefs such as

typedef char CHAR;
typedef int INT32;
typedef unsigned int UINT32;
typedef char* PCHAR;

The theory is that application code which always uses these typedefs
will be more likely to run on multiple systems (provided the typedefs
are changed of course).

I used to do this. Then I found out that C99 defined things like
"uint32_t", so I started using these versions instead. But after
following this group for a while I now find even these ugly and don't
use them unless unavoidable.

What do people here think is best?


Depends on whether the code really needs to depend on the size of its
variables.


What I am asking is whether one should habitually use them *just in
case* something breaks when run on another platform. I have seen
programs where int, long etc. are *never used* except in a "types.h"
header.

--

John Devereux
Nov 15 '05 #10



Skarmander wrote on 10/05/05 10:30:
Eric Sosman wrote:
Sunil wrote:
printf("char : %d\n",sizeof(char));
[...]


On some systems I have used, the output would claim
that all the types are of size zero. Hint: What type of
value does "%d" expect, and what type of value does sizeof
produce?

<snip>
What exactly *is* the format specifier for size_t in C90? C99 has "%zu",
but is (say) "%lu" guaranteed to work?


The usual C90 way is

printf ("size = %lu\n", (unsigned long)sizeof(Type));

This works because size_t must be an unsigned integer type,
C90 has only four such types, and unsigned long can handle
all the values of any of the four.

In C99 the number of unsigned integer types is much
larger, and varies from one implementation to another. The
widest unsigned integer type is uintmax_t, so one could
write (using another C99-invented length modifier)

printf ("size = %ju\n", (uintmax_t)sizeof(Type));

I do not know for sure why the committee decided to
invent the "z" width modifier, but two motivations seem
plausible:

- That silly cast is a pain, and since other length
modifiers were already being invented it was easy
to introduce a new one for size_t.

- On small machines size_t might be as narrow as 16
bits, while uintmax_t must be at least 64 bits.
Working with 64-bit values might require a multi-
precision software library that would otherwise
not be needed. The "z" modifier lets one avoid
using uintmax_t, and might allow the implementation
to exclude the unnecessary library (recall systems
that tried to omit software floating-point support
when they thought the program wouldn't use it.)

--
Er*********@sun.com

Nov 15 '05 #11

Eric Sosman wrote:

Skarmander wrote On 10/05/05 10:30,:
Eric Sosman wrote:

Sunil wrote:
printf("char : %d\n",sizeof(char));
[...]

On some systems I have used, the output would claim
that all the types are of size zero. Hint: What type of
value does "%d" expect, and what type of value does sizeof
produce?

<snip>
What exactly *is* the format specifier for size_t in C90? C99 has "%zu",
but is (say) "%lu" guaranteed to work?

The usual C90 way is

printf ("size = %lu\n", (unsigned long)sizeof(Type));

This works because size_t must be an unsigned integer type,
C90 has only four such types, and unsigned long can handle
all the values of any of the four.

<snip> I do not know for sure why the committee decided to
invent the "z" width modifier, but two motivations seem
plausible:

<snip>

I'm guessing because it offers a valuable abstraction. The argument by
elimination one has to apply for C90 is shaky: a future implementation
that uses 64-bit size_t's but 32-bit longs (and 64-bit long longs,
presumably) will find its format specifiers outdated.

After all, there's a reason size_t wasn't just defined as "unsigned
long", and the format specifiers should allow for it.

S.
Nov 15 '05 #12

Skarmander wrote:
Eric Sosman wrote:
<snip>
I do not know for sure why the committee decided to
invent the "z" width modifier, but two motivations seem
plausible:


<snip>

I'm guessing because it offers a valuable abstraction. The argument by
elimination one has to apply for C90 is shaky: a future implementation
that uses 64-bit size_t's but 32-bit longs (and 64-bit long longs,
presumably) will find its format specifiers outdated.


The point is, C90 and C99 are different languages, and correct code on
one is not necessarily correct code on the other. Pick one of the two,
and set up your compiler options to match that choice.

If you are writing C90 code, there cannot be any integer type larger
than unsigned long. Therefore, casting size_t to unsigned long must
preserve the correct value. Any implementation that has 64-bit size_t
but 32-bit long DOES NOT CONFORM TO C90.

If you are writing for C99, then you should be using the %zu specifier
and not trying to cast to unsigned long.

--
Simon.
Nov 15 '05 #13

Simon Biber wrote:
Skarmander wrote:
Eric Sosman wrote:
<snip>
I do not know for sure why the committee decided to
invent the "z" width modifier, but two motivations seem
plausible:

<snip>

I'm guessing because it offers a valuable abstraction. The argument by
elimination one has to apply for C90 is shaky: a future implementation
that uses 64-bit size_t's but 32-bit longs (and 64-bit long longs,
presumably) will find its format specifiers outdated.

The point is, C90 and C99 are different languages, and correct code on
one is not necessarily correct code on the other. Pick one of the two,
and set up your compiler options to match that choice.

If you are writing C90 code, there cannot be any integer type larger
than unsigned long. Therefore, casting size_t to unsigned long must
preserve the correct value. Any implementation that has 64-bit size_t
but 32-bit long DOES NOT CONFORM TO C90.

Oh, that's a good point. An implementation wouldn't be allowed to do
that in C90 mode even if it could.

That is, I think. The standard *is* worded in such a way that it is
impossible for size_t to be an integral type other than unsigned
char, short, int, or long, right? It'll say something like "the integral
types are such and such" and "size_t must be an unsigned integral type",
so that size_t is always convertible to an unsigned long without loss.
If you are writing for C99, then you should be using the %zu specifier
and not trying to cast to unsigned long.

Yes, but that wasn't exactly the point. The question was why C99 added
it in the first place. And in my opinion, this was to settle the matter
once and for all. Had C99 not added "%zu", then C99 would be subject to
the same problem, possibly limiting the implementation artificially. C90
needs %lu, C99 would have needed %llu, etc. (Not that I imagine many
successors to C99 will boost the ranges of integral types, but you get
the point.)

The comparison here is not between C90 and C99, but between C99 and its
hypothetical successor. The "just use the specifier for the biggest
integer in the language" approach is not stable (and not clean), and the
simple introduction of a new specifier to cover the abstraction solves
this issue now and forever.

S.
Nov 15 '05 #14

John Devereux <jd******@THISdevereux.me.uk> writes:
"Sunil" <su***********@gmail.com> writes:
Hi all,

I am using gcc compiler in linux.I compiled a small program
int main()
{
printf("char : %d\n",sizeof(char));
printf("unsigned char : %d\n",sizeof(unsigned char));
printf("short : %d\n",sizeof(short));


<SNIP>

This brings to mind something that I have wondered about.

I often see advice elsewhere, and in other people's programs,
suggesting hiding all C "fundamental" types behind typedefs such as

typedef char CHAR;
typedef int INT32;
typedef unsigned int UINT32;
typedef char* PCHAR;

The theory is that application code which always uses these typedefs
will be more likely to run on multiple systems (provided the typedefs
are changed of course).

I used to do this. Then I found out that C99 defined things like
"uint32_t", so I started using these versions instead. But after
following this group for a while I now find even these ugly and don't
use them unless unavoidable.


Of the typedefs above, I'd have to say that CHAR and PCHAR are utterly
useless. Presumably there's never any reason to define CHAR as
anything other than char, or PCHAR as anything other than char*. If
so, just use char and char* directly, so the reader doesn't have to
wonder if CHAR and PCHAR have been defined properly. If not, the
names CHAR and PCHAR are misleading.

As for INT32 and UINT32, of course those definitions will have to be
changed for systems where int and unsigned int are something other
than 32 bits. C99, as you've seen, provides int32_t and uint32_t in
<stdint.h>. If you don't have a C99 compiler, you can define them
yourself. Doug Gwyn has written a public domain implementation of
some of the new C99 headers for use with C90; see
<http://www.lysator.liu.se/c/q8/>.

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <*> <http://users.sdsc.edu/~kst>
We must do something. This is something. Therefore, we must do this.
Nov 15 '05 #15

Skarmander wrote:
What exactly *is* the format specifier for size_t in C90? C99 has "%zu",
but is (say) "%lu" guaranteed to work?


Yes, with (unsigned long).
Nov 15 '05 #16

In article <11**********************@g14g2000cwa.googlegroups.com>,
"Sunil" <su***********@gmail.com> wrote:
What i want to know is what will be the effect if i use int in
place of long in applications running on linux and also on what factors
does the size of datatypes depend.


int is guaranteed to be capable of holding values in the range -32767 to
+32767, nothing else. You can use int if your data is in that range.

long is guaranteed to be capable of holding values in the range from
about -2,000,000,000 to +2,000,000,000. Use long if your data can be
outside the range guaranteed to be available for int, but not outside
the larger range.

Size of datatypes depends on whatever the compiler writer thought was a
good idea. Don't be surprised if sizeof (long) or sizeof (void *) is
greater than four.
Nov 15 '05 #17

In article <87************@cordelia.devereux.me.uk>,
John Devereux <jd******@THISdevereux.me.uk> wrote:
I often see advice elsewhere, and in other people's programs,
suggesting hiding all C "fundamental" types behind typedefs such as

typedef char CHAR;
typedef int INT32;
typedef unsigned int UINT32;
typedef char* PCHAR;

The theory is that application code which always uses these typedefs
will be more likely to run on multiple systems (provided the typedefs
are changed of course).

I used to do this. Then I found out that C99 defined things like
"uint32_t", so I started using these versions instead. But after
following this group for a while I now find even these ugly and don't
use them unless unavoidable.


typedef char CHAR; and typedef char* PCHAR; is just plain stupid.

"int", "long", etc., properly used, are the best way to code. However,
they are often not properly used, and there will be lots of trouble
because of that when 64 bit systems become more widely available. If you
use a typedef like INT32 or uint32_t, at least I know what assumptions
you made.
Nov 15 '05 #18

In article <43***********************@news.xs4all.nl>,
Skarmander <in*****@dontmailme.com> wrote:
Eric Sosman wrote:
Sunil wrote:
Hi all,

I am using gcc compiler in linux.I compiled a small program
int main()
{
printf("char : %d\n",sizeof(char));
printf("unsigned char : %d\n",sizeof(unsigned char));
printf("short : %d\n",sizeof(short));
printf("unsigned short : %d\n",sizeof(unsigned short));
printf("int : %d\n",sizeof(int));
printf("unsigned int : %d\n",sizeof(unsigned int));
printf("long : %d\n",sizeof(long));
printf("unsigned long : %d\n",sizeof(unsigned long));
printf("long long : %d\n",sizeof(long long));
printf("unsigned long long : %d\n",sizeof(unsigned long long));
}

On some systems I have used, the output would claim
that all the types are of size zero. Hint: What type of
value does "%d" expect, and what type of value does sizeof
produce?

<snip>
What exactly *is* the format specifier for size_t in C90? C99 has "%zu",
but is (say) "%lu" guaranteed to work?


printf("short: %lu\n", (unsigned long) sizeof(short));

will work as long as a short is fewer than four billion bytes :-)

(I remember seeing a bug while a program was being ported: A function
took an argument of type long, and the value passed was "- sizeof
(short)". The function received a value of 65534 which was a bit
unexpected. And yes, the compiler was right. )
Nov 15 '05 #19

"Christian Bau" <ch***********@cbau.freeserve.co.uk> wrote in message
news:ch*********************************@slb-newsm1.svr.pol.co.uk...
....
printf("short: %lu\n", (unsigned long) sizeof(short));

will work as long as a short is fewer than four billion bytes :-)
Correct :)
(I remember seeing a bug while a program was being ported: A function
took an argument of type long, and the value passed was "- sizeof
(short)". The function received a value of 65534 which was a bit
unexpected. And yes, the compiler was right. )


C is wonderful in this respect. Perhaps because of this, Java AFAIK has
no unsigned types.

Alex
Nov 15 '05 #20

Alexei A. Frounze wrote:
<snip>
(I remember seeing a bug while a program was being ported: A function
took an argument of type long, and the value passed was "- sizeof
(short)". The function received a value of 65534 which was a bit
unexpected. And yes, the compiler was right. )

C is wonderful in this respect. Perhaps because of this Java AFAIK has no
unsigned types.

Which, incidentally, is a spectacular misfeature when they then call the
8-bit signed type "byte". Either you keep promoting things back and
forth to integers, or you use two's complement (which Java conveniently
guarantees). Either way it's annoying. Consistency isn't everything. :-)

S.
Nov 15 '05 #21

"Skarmander" <in*****@dontmailme.com> wrote in message
news:43***********************@news.xs4all.nl...
Alexei A. Frounze wrote:
(I remember seeing a bug while a program was being ported: A function
took an argument of type long, and the value passed was "- sizeof
(short)". The function received a value of 65534 which was a bit
unexpected. And yes, the compiler was right. )


C is wonderful in this respect. Perhaps because of this Java AFAIK has no unsigned types.

Which, incidentally, is a spectacular misfeature when they then call the
8-bit signed type "byte". Either you keep promoting things back and
forth to integers, or you use two's complement (which Java conveniently
guarantees). Either way it's annoying. Consistency isn't everything. :-)


But you know, there are different kinds and levels of consistency. In
certain places I'd like C to behave more like math (e.g. signed vs unsigned,
promotions and related things), to be more humane and straightforward (e.g. the
way the type of a variable in the declaration/definition is specified), etc.
etc. I'm not saying Java or C is definitely better, no. Each has its good
sides and bad sides, and there's always room for improvement, not
necessarily big or very important, but good enough to be considered and
desired...

Alex
Nov 15 '05 #22



Alexei A. Frounze wrote on 10/05/05 18:30:
"Christian Bau" <ch***********@cbau.freeserve.co.uk> wrote in message
news:ch*********************************@slb-newsm1.svr.pol.co.uk...
...
printf("short: %lu\n", (unsigned long) sizeof(short));

will work as long as a short is fewer than four billion bytes :-)

Correct :)

(I remember seeing a bug while a program was being ported: A function
took an argument of type long, and the value passed was "- sizeof
(short)". The function received a value of 65534 which was a bit
unexpected. And yes, the compiler was right. )

C is wonderful in this respect. Perhaps because of this Java AFAIK has no
unsigned types.


Java has two unsigned types (one of which might be
better termed "signless"). IMHO, it would be better if
it had three.

--
Er*********@sun.com

Nov 15 '05 #23

On 05 Oct 2005 14:14:03 +0100, John Devereux
<jd******@THISdevereux.me.uk> wrote in comp.lang.c:
"Sunil" <su***********@gmail.com> writes:
Hi all,

I am using gcc compiler in linux.I compiled a small program
int main()
{
printf("char : %d\n",sizeof(char));
printf("unsigned char : %d\n",sizeof(unsigned char));
printf("short : %d\n",sizeof(short));
<SNIP>

This brings to mind something that I have wondered about.

I often see advice elsewhere, and in other people's programs,
suggesting hiding all C "fundamental" types behind typedefs such as

typedef char CHAR;
typedef int INT32;
typedef unsigned int UINT32;


The first one is useless, and the second two are worse than useless:
they are dangerous, because on another machine int might have only 16
bits and INT32 might need to be a signed long.
typedef char* PCHAR;
This is more dangerous still; never typedef a pointer this way. At
least not if the pointer will ever be dereferenced using that alias.
The theory is that application code which always uses these typedefs
will be more likely to run on multiple systems (provided the typedefs
are changed of course).
More than theory, very real fact.
I used to do this. Then I found out that C99 defined things like
"uint32_t", so I started using these versions instead. But after
following this group for a while I now find even these ugly and don't
use them unless unavoidable.
Nobody says you have to care about portability if you don't want to.
That's between you, your bosses, and your users. If you are writing a
program for your own use, the only one you ever have to answer to is
yourself.

On the other hand, both UNIXy and Windows platforms are having the
same problems with the transition from 32 to 64 bits that they had
moving from 16 to 32 bits, if perhaps not quite so extreme.

For more than a decade, the natural integer type and native machine
word on Windows has been called a DWORD, and on 64 bit Windows the
native machine word is going to be a QWORD.
What do people here think is best?


On one embedded project we had CAN communications between the main
processor, a 32-bit ARM, and slave processors that were 16/32 bit
DSPs.

The only types that were identical between the two were signed and
unsigned short, and signed and unsigned long. In fact, here are the
different integer types for the two platforms:

Type               32-bit ARM        16/32-bit DSP
'plain' char       unsigned 8-bit    signed 16-bit
signed char        signed 8-bit      signed 16-bit
unsigned char      unsigned 8-bit    unsigned 16-bit
signed short       signed 16-bit     signed 16-bit
unsigned short     unsigned 16-bit   unsigned 16-bit
signed int         signed 32-bit     signed 16-bit
unsigned int       unsigned 32-bit   unsigned 16-bit
signed long        signed 32-bit     signed 32-bit
unsigned long      unsigned 32-bit   unsigned 32-bit

Both processors had hardware alignment requirements. The 32-bit
processor can only access 16-bit data at an even address and 32-bit
data on an address divisible by four. The penalty for misaligned
access is a hardware trap. The DSP only addresses memory in 16-bit
words, so there is no misalignment possible for anything but long, and
they had to be aligned on an even address (32-bit alignment). The
penalty for misaligned access is just wrong data (read), or
overwriting the wrong addresses (write).

Now the drivers for the CAN controller hardware are completely
off-topic here, but the end result on both systems is two 32-bit words
in memory containing the 0 to 8 octets (0 to 64 bits) of packet data.
These octets can represent any quantity of 8-bit, signed or unsigned
16-bit, or 32-bit data values that can fit in 64 bits, and have any
alignment.

So your mission, Mr. Phelps, if you decide to accept it, is to write
code that will run on both processors despite their different
character sizes and alignment requirements, that can use a format
specifier to parse 1 to 8 octets into the proper types with the proper
values.

The code I wrote runs on both processors with no modifications. And I
couldn't even use 'uint8_t', since the DSP doesn't have an 8-bit type.
I used 'uint_least8_t' instead.

As for the C99 choice of type definitions like 'uint8_t' and so on,
they are not the best I have ever seen, but they are also far from the
worst. And they have the advantage of being in a C standard, so with
a little luck they will eventually edge out all the others.

--
Jack Klein
Home: http://JK-Technology.Com
FAQs for
comp.lang.c http://www.eskimo.com/~scs/C-faq/top.html
comp.lang.c++ http://www.parashift.com/c++-faq-lite/
alt.comp.lang.learn.c-c++
http://www.contrib.andrew.cmu.edu/~a...FAQ-acllc.html
Nov 15 '05 #24

Jack Klein <ja*******@spamcop.net> writes:
On 05 Oct 2005 14:14:03 +0100, John Devereux
<jd******@THISdevereux.me.uk> wrote in comp.lang.c:
This brings to mind something that I have wondered about.

I often see advice elsewhere, and in other peoples programs,
suggesting hiding all C "fundamental" types behind typedefs such as

typedef char CHAR;
typedef int INT32;
typedef unsigned int UINT32;
The first one is useless, the second two are worse than useless, they
are dangerous, because on another machine int might have only 16 bits
and INT32 might need to be a signed long.


Perhaps I was not clear; the typedefs go in a single "portability"
header file and are specific to the machine. E.g.

#ifdef __X86
typedef short int INT16;
....
#endif
#ifdef __AVR
typedef int INT16;
....
#endif

(made up examples)

It should be understood that this file will need to be changed for
each new machine, but that hopefully nothing else will. By using
UINT32 etc. throughout, nothing needs to change except this one file.
typedef char* PCHAR;

This is more dangerous, yes; never typedef a pointer this way. At
least not if the pointer will ever be dereferenced using that alias.
The theory is that application code which always uses these typedefs
will be more likely to run on multiple systems (provided the typedefs
are changed of course).
More than theory, very real fact.


So that would make them a good thing? Sorry if I miss the point; you
seem to be saying they are "worse than useless" but do improve
portability?
I used to do this. Then I found out that C99 defined things like
"uint32_t", so I started using these versions instead. But after
following this group for a while I now find even these ugly and don't
use them unless unavoidable.


Nobody says you have to care about portability if you don't want to.
That's between you, your bosses, and your users. If you are writing a
program for your own use, the only one you ever have to answer to is
yourself.


I don't really care about portability to the extent sometimes apparent
on CLC. For example, I am quite happy to restrict myself to twos
complement machines. However the idea of writing code the right way
once, rather than the wrong way lots of times, does appeal! I am
starting to see real productivity benefits from my attempts to do this
in my work.
On the other hand, both UNIXy and Windows platforms are having the
same problems with the transition from 32 to 64 bits that they had
moving from 16 to 32 bits, if perhaps not quite so extreme.

For more than a decade, the natural integer type and native machine
word on Windows has been called a DWORD, and on 64 bit Windows the
native machine word is going to be a QWORD.


I had to write a fairly simple windows program last week, and it was
horrible. All those WORDS, DWORDS, LPCSTR, HPARAMS, LPARAMS etc. I
think that experience was what prompted my post.
What do people here think is best?


On one embedded project we had CAN communications between the main
processor, a 32-bit ARM, and slave processors that were 16/32 bit
DSPs.

The only types that were identical between the two were signed and
unsigned short, and signed and unsigned long. In fact, here are the
different integer types for the two platforms:

type             32-bit ARM        16/32-bit DSP
'plain' char     unsigned 8-bit    signed 16-bit
signed char      signed 8-bit      signed 16-bit
unsigned char    unsigned 8-bit    unsigned 16-bit
signed short     signed 16-bit     signed 16-bit
unsigned short   unsigned 16-bit   unsigned 16-bit
signed int       signed 32-bit     signed 16-bit
unsigned int     unsigned 32-bit   unsigned 16-bit
signed long      signed 32-bit     signed 32-bit
unsigned long    unsigned 32-bit   unsigned 32-bit

Both processors had hardware alignment requirements. The 32-bit
processor can only access 16-bit data at an even address and 32-bit
data on an address divisible by four. The penalty for misaligned
access is a hardware trap. The DSP only addresses memory in 16-bit
words, so there is no misalignment possible for anything but long, and
they had to be aligned on an even address (32-bit alignment). The
penalty for misaligned access is just wrong data (read), or
overwriting the wrong addresses (write).

Now the drivers for the CAN controller hardware are completely
off-topic here, but the end result on both systems is two 32-bit words
in memory containing the 0 to 8 octets (0 to 64 bits) of packet data.
These octets can represent any quantity of 8-bit, signed or unsigned
16-bit, or 32-bit data values that can fit in 64 bits, and have any
alignment.

So your mission, Mr. Phelps, if you decide to accept it, is to write
code that will run on both processors despite their different
character sizes and alignment requirements, that can use a format
specifier to parse 1 to 8 octets into the proper types with the proper
values.

The code I wrote runs on both processors with no modifications. And I
couldn't even use 'uint8_t', since the DSP doesn't have an 8-bit type.
I used 'uint_least8_t' instead.

As for the C99 choice of type definitions like 'uint8_t' and so on,
they are not the best I have ever seen, but they are also far from the
worst. And they have the advantage of being in a C standard, so with
a little luck they will eventually edge out all the others.


Thanks for the detailed discussion. I have been working with slightly
similar programming tasks recently, implementing modbus on PC and two
embedded systems. I must be getting better; the generic modbus code I
wrote for the (8 bit) AVR system did compile and run fine on the 32
bit ARM system.

--

John Devereux
Nov 15 '05 #25



John Devereux wrote On 10/07/05 05:40,:

Perhaps I was not clear; the typedefs go in a single "portability"
header file and are specific to the machine. E.g.

#ifdef __X86
typedef short int INT16;
...
#endif
#ifdef __AVR
typedef int INT16;
...
#endif

(made up examples)

It should be understood that this file will need to be changed for
each new machine, but that hopefully nothing else will. By using
UINT32 etc. throughout, nothing needs to change except this one file.


IMHO it's preferable to base such tests on the actual
characteristics of the implementation and not on the name
of one of its constituent parts:

#include <limits.h>
#if INT_MAX == 32767
typedef int INT16;
#elif SHRT_MAX == 32767
typedef short INT16;
#else
#error "DeathStation 2000 not supported"
#endif

This inflicts <limits.h> on every module that includes
the portability header, but that seems a benign side-
effect.

--
Er*********@sun.com

Nov 15 '05 #26

Eric Sosman <er*********@sun.com> writes:
John Devereux wrote On 10/07/05 05:40,:

Perhaps I was not clear; the typedefs go in a single "portability"
header file and are specific to the machine. E.g.

#ifdef __X86
typedef short int INT16;
...
#endif
#ifdef __AVR
typedef int INT16;
...
#endif

(made up examples)

It should be understood that this file will need to be changed for
each new machine, but that hopefully nothing else will. By using
UINT32 etc. throughout, nothing needs to change except this one file.


IMHO it's preferable to base such tests on the actual
characteristics of the implementation and not on the name
of one of its constituent parts:

#include <limits.h>
#if INT_MAX == 32767
typedef int INT16;
#elif SHRT_MAX == 32767
typedef short INT16;
#else
#error "DeathStation 2000 not supported"
#endif

This inflicts <limits.h> on every module that includes
the portability header, but that seems a benign side-
effect.


That does seem much better. Why did I not think of that?

--

John Devereux
Nov 15 '05 #27

In article <di**********@news1brm.Central.Sun.COM>,
Eric Sosman <er*********@sun.com> wrote:
IMHO it's preferable to base such tests on the actual
characteristics of the implementation and not on the name
of one of its constituent parts: #include <limits.h>
#if INT_MAX == 32767
typedef int INT16;
#elif SHRT_MAX == 32767
typedef short INT16;
#else
#error "DeathStation 2000 not supported"
#endif


An implementation is not required to use the entire arithmetic space
possible with its hardware. In theory, INT_MAX == 32767 could
happen on (say) an 18 bit machine.
--
Watch for our new, improved .signatures -- Wittier! Profounder! and
with less than 2 grams of Trite!
Nov 15 '05 #28

John Devereux wrote:
Eric Sosman <er*********@sun.com> writes:

John Devereux wrote On 10/07/05 05:40,:
Perhaps I was not clear; the typedefs go in a single "portability"
header file and are specific to the machine. E.g.

#ifdef __X86
typedef short int INT16;
...
#endif
#ifdef __AVR
typedef int INT16;
...
#endif

(made up examples)

It should be understood that this file will need to be changed for
each new machine, but that hopefully nothing else will. By using
UINT32 etc. throughout, nothing needs to change except this one file.


IMHO it's preferable to base such tests on the actual
characteristics of the implementation and not on the name
of one of its constituent parts:

#include <limits.h>
#if INT_MAX == 32767
typedef int INT16;
#elif SHRT_MAX == 32767
typedef short INT16;
#else
#error "DeathStation 2000 not supported"
#endif

This inflicts <limits.h> on every module that includes
the portability header, but that seems a benign side-
effect.

That does seem much better. Why did I not think of that?

Possibly because when you've got system dependencies, there tend to be
more of them than the size of the data types. So it's very common to get
stuff like

everything.h:
#ifdef __FOONLY
typedef short INT16;
#define HAVE_ALLOCA 1
#define HCF __asm__("hcf")
#define TTY_SUPPORTS_CALLIGRAPHY 1
#include <foonlib.h>
...etc...

In fact, the ever-popular GNU autoconf does this, except that it takes
care of all the tests and writes just one header with the appropriate
defines.

S.
Nov 15 '05 #29



Walter Roberson wrote On 10/07/05 11:16,:
In article <di**********@news1brm.Central.Sun.COM>,
Eric Sosman <er*********@sun.com> wrote:
IMHO it's preferable to base such tests on the actual
characteristics of the implementation and not on the name
of one of its constituent parts:


#include <limits.h>
#if INT_MAX == 32767
typedef int INT16;
#elif SHRT_MAX == 32767
typedef short INT16;
#else
#error "DeathStation 2000 not supported"
#endif

An implementation is not required to use the entire arithmetic space
possible with its hardware. In theory, INT_MAX == 32767 could
happen on (say) an 18 bit machine.


Adjust the tests appropriately for the semantics
you desire for "INT16". As shown they're appropriate
for an "exact" type (which is a pretty silly thing to
ask for in a signed integer; sorry for the bad example).
If you want "fastest," change == to >=. If you want
"at least," change == to >= and test short before int.
If you want some other semantics, test accordingly.

It is not possible to test in this way for every
possible characteristic somebody might want to ask
about -- there's no Standard macro or other indicator
to say what happens on integer overflow, for example.
Still, I believe tests that *can* be made portably
*should* be made portably, and as broadly as possible.
Testing the name of the compiler or of the host machine
is not broad; it's the opposite. Test them if you must,
but test more portably if you can.

--
Er*********@sun.com

Nov 15 '05 #30

John Devereux <jd******@THISdevereux.me.uk> writes:
Jack Klein <ja*******@spamcop.net> writes:
On 05 Oct 2005 14:14:03 +0100, John Devereux
<jd******@THISdevereux.me.uk> wrote in comp.lang.c:
>
> This brings to mind something that I have wondered about.
>
> I often see advice elsewhere, and in other peoples programs,
> suggesting hiding all C "fundamental" types behind typedefs such as
>
> typedef char CHAR;
> typedef int INT32;
> typedef unsigned int UINT32;


The first one is useless, the second two are worse than useless, they
are dangerous, because on another machine int might have only 16 bits
and INT32 might need to be a signed long.


Perhaps I was not clear; the typedefs go in a single "portability"
header file and are specific to the machine. E.g.

#ifdef __X86
typedef short int INT16;
...
#endif
#ifdef __AVR
typedef int INT16;
...
#endif

(made up examples)

It should be understood that this file will need to be changed for
each new machine, but that hopefully nothing else will. By using
UINT32 etc. throughout, nothing needs to change except this one file.


Given that the definitions change for each platform (and assuming that
you always get it right), the INT16 and UINT32 typedefs are reasonable.
Since C99 defines similar typedefs in <stdint.h>, and since it also
distinguishes among exact-width, minimum-width, and fastest types,
you'd probably be better off using <stdint.h> if it's available, or
using a C90-compatible version of it if it's not (see
<http://www.lysator.liu.se/c/q8/>). (I can't connect to that site at
the moment.)

But the typedefs CHAR (for char) and PCHAR (for char*) are either
utterly useless or dangerously misleading. If you want type char, use
char; if you want a pointer to char, use char*. There's no point in
hiding these types behind typedefs that won't change from one platform
to another. And if they are going to change, they should be called
something other than CHAR and PCHAR.
> typedef char* PCHAR;

This is more dangerous yes, never typedef a pointer this way. At
least not if the pointer will ever be dereferenced using that alias.
> The theory is that application code which always uses these typedefs
> will be more likely to run on multiple systems (provided the typedefs
> are changed of course).


More than theory, very real fact.


So that would make them a good thing? Sorry if I miss the point; you
seem to be saying they are "worse than useless" but do improve
portability?


I'm not sure what Jack Klein meant here, but I doubt that he meant
that CHAR and PCHAR are useful.

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <*> <http://users.sdsc.edu/~kst>
We must do something. This is something. Therefore, we must do this.
Nov 15 '05 #31

In article <ln************@nuthaus.mib.org>
Keith Thompson <ks***@mib.org> wrote:
... But the typedefs CHAR (for char) and PCHAR (for char*) are either
utterly useless or dangerously misleading. If you want type char, use
char; if you want a pointer to char, use char*. There's no point in
hiding these types behind typedefs that won't change from one platform
to another. And if they are going to change, they should be called
something other than CHAR and PCHAR.


Indeed. The whole point to "creating a type" (which typedef fails
to do, but that is another problem entirely) is to obtain abstraction:
"moving up a level" in a problem, making irrelevant detail go away
so that you work only with relevant detail. "Pointer to char" is
no more abstract than C's raw "char *": what irrelevant detail has
been removed?
--
In-Real-Life: Chris Torek, Wind River Systems
Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W) +1 801 277 2603
email: forget about it http://web.torek.net/torek/index.html
Reading email is like searching for food in the garbage, thanks to spammers.
Nov 15 '05 #32

In article <87************@cordelia.devereux.me.uk>,
John Devereux <jd******@THISdevereux.me.uk> wrote:
I had to write a fairly simple windows program last week, and it was
horrible. All those WORDS, DWORDS, LPCSTR, HPARAMS, LPARAMS etc. I
think that experience was what prompted my post.


I can feel your pain. I don't mind things like UINT32; it seems to be
quite self-explanatory. I have a real problem with "WORD" and "DWORD"
which is used in Windows programs a lot: "WORD" is defined as a 16 bit
type and DWORD as a 32 bit type, which means that on your average
Pentium or Athlon processor a WORD is a halfword and a DWORD is a word,
whereas on a 64 bit processor a WORD is a quarterword and a DWORD is a
halfword - in other words, these typenames are complete nonsense.

And LPCSTR - "Long Pointer to C String". For heaven's sake, what is a
"long pointer"?
Nov 15 '05 #33

Christian Bau wrote:
In article <87************@cordelia.devereux.me.uk>,
John Devereux <jd******@THISdevereux.me.uk> wrote:

I had to write a fairly simple windows program last week, and it was
horrible. All those WORDS, DWORDS, LPCSTR, HPARAMS, LPARAMS etc. I
think that experience was what prompted my post.

I can feel your pain. I don't mind things like UINT32; it seems to be
quite self-explanatory. I have a real problem with "WORD" and "DWORD"
which is used in Windows programs a lot: "WORD" is defined as a 16 bit
type and DWORD as a 32 bit type, which means that on your average
Pentium or Athlon processor a WORD is a halfword and a DWORD is a word,
whereas on a 64 bit processor a WORD is a quarterword and a DWORD is a
halfword - in other words, these typenames are complete nonsense.

And LPCSTR - "Long Pointer to C String". For heaven's sake, what is a
"long pointer"?


No, LPCSTR is Hungarian abracadabra for "long pointer to *constant*
string". These days, it's the same thing as a regular pointer, and
"LPCSTR" is the same thing as "PCSTR", which, however, is almost never
used for hysterical reasons.

But back when Windows 3.0 roamed the earth, the 8086 segmented memory
model meant Windows too made the difference between "far" and "near"
pointers (calling them "long" and, well, nothing pointers for
consistency), depending on whether a pointer was constrained by the 64K
range of a segment or not.

The problem is that Microsoft tried to abstract away from actual data
types and mostly got it wrong; the abstraction wasn't and code that went
from 16 to 32 bits still broke happily -- though that wasn't Microsoft's
fault, they didn't help matters either.

They had an idea that might have been worthwhile, didn't stop to think
whether it was feasible and went on to implement it in a half-assed way,
yielding the current mess. You see, char* is typedef'ed to PCCHAR (yes,
"pointer to C char", not "constant char" -- const char* has no typedef),
to PSZ ("pointer to string that's zero-terminated", of course), then
char is typedef'ed to CHAR (huh?) and CHAR* is in turn typedef'ed to
PCHAR, LPCH, PCH, NPSTR, LPSTR and PSTR!

The semantic differences these are intended to convey are lost on the
vast majority of Windows programmers out there, and no small wonder too.
Of course the C compiler doesn't give a rat's ass about these fancy
typedefs, which means any "errors" in using them go undetected, except
by people who are fluent in this make-belief type system.

S.
Nov 15 '05 #34

Christian Bau <ch***********@cbau.freeserve.co.uk> writes:
I have a real problem with "WORD" and "DWORD"
which is used in Windows programs a lot: "WORD" is defined as a 16 bit
type and DWORD as a 32 bit type, which means that on your average
Pentium or Athlon processor a WORD is a halfword and a DWORD is a word,
whereas on a 64 bit processor a WORD is a quarterword and a DWORD is a
halfword - in other words, these typenames are complete nonsense.

And LPCSTR - "Long Pointer to C String". For heavens sake, what is a
"long pointer"?


I suppose you do realize that these names refer to the types that
they do for historical reasons? That's not to say that they
aren't deceptive, but there was some sense behind them at the
time they were invented.
--
"When I have to rely on inadequacy, I prefer it to be my own."
--Richard Heathfield
Nov 15 '05 #35

In article <hf********************************@4ax.com>,
Jack Klein <ja*******@spamcop.net> wrote:
On 05 Oct 2005 14:14:03 +0100, John Devereux
<jd******@THISdevereux.me.uk> wrote in comp.lang.c:
I often see advice elsewhere, and in other peoples programs,
suggesting hiding all C "fundamental" types behind typedefs such as

typedef char CHAR;
typedef char* PCHAR;


This is more dangerous, yes; never typedef a pointer this way. At
least not if the pointer will ever be dereferenced using that alias.


What exactly is the danger you are alluding to here?
Nov 15 '05 #36

This discussion thread is closed

Replies have been disabled for this discussion.