Abstraction layer between C and CPU

Hello,

From spending some time in clc, I've come to realize that C's model of
the CPU can be totally different from the actual CPU.

Is it safe to say that almost nothing can be gleaned about physical CPU
behaviour from C level behaviour?

For example:

- do the addresses returned by & have to have (by Standard) any direct
relationship to real addresses?

-- if the address of 2 objects is different by 'x' in C (when using &
operator), are they so in the hardware?

- do the elements of an array usually end up being placed side by side
in most implementations (does the standard require this) ?
how about multidimensional arrays?

-- I've seen code where two or more arrays would be declared side by
side and then the first array with extended indexing would be used to
access the elements of the second/more array. Does this suggest that C
guarantees that declared variables of the same storage class are placed
in ascending order in memory?

int a[10], b[10], c[20];
int i;

for(i = 0; i < 40; i++)
{
a[i] = 0; /* zero initializes all three arrays */
}

- I read in an article once (can't find it now) that a "byte" in C
doesn't necessarily have to be an octet of bits at the hardware level

Any help would be appreciated.
(Are there any links on the net that point to details of C's
abstraction layer? I can't seem to find any. I guess these details
are woven through the standards documents, but I'm talking about a
single cohesive document)

Nov 14 '05 #1


Peter Nilsson

Luke Wu wrote:

From spending some time in clc, I've come to realize that C's
model of the CPU can be totally different from the actual CPU.
C doesn't model a CPU, it models an abstract machine.
Is it safe to say that almost nothing can be gleaned about
physical CPU behaviour from C level behaviour.
True. The whole point of high level languages is to avoid
dealing with low level implementation details.
For example: ...
Your examples are really questions on how implementations might
work. Comp.lang.c is not the place for such questions since the
standard merely supplies semantics. How those semantics are
actually implemented is not specified.[*]

Personally, I think your questions, whilst naturally curious,
are nonetheless dangerous. I've seen countless examples of
newbie programmers who try to analyse C semantics from things
like disassemblies, only to develop false conclusions.

When code based on such false conclusions is ported to other
machines, it can often lead to bugs which are difficult to
diagnose and debug.
...
- I read in an article once (can't find it now) that a "byte"
in C doesn't necessarily have to be an octet of bits at the
hardware level.
Correct. Some architectures are incapable of addressing octets.
Any help would be appreciated.
(Are there any links on the net that point to details of C's
abstraction layer?
The standards _are_ the abstraction layer. Your questions are
about realisations of that abstraction.
I can't seem to find any. I guess these details
are woven through the standards documents, but I'm talking
about a single cohesive document)

You should perhaps look at compiler writing books.
[*] Of course, the standard authors are quite mindful of what
can be implemented efficiently on various existing and future
architectures.

--
Peter

Nov 14 '05 #2

Andrey Tarasevich

Luke Wu wrote:

...
From spending some time in clc, I've come to realize that C's model of the CPU can be totally different from the actual CPU.

From a purely abstract, theoretical point of view: yes, of course it can be.
Is it safe to say that almost nothing can be gleaned about physical CPU
behaviour from C level behaviour.
That's correct.
For example:

- do the addresses returned by & have to have (by Standard) any direct
relationship to real addresses?
If by "real addresses" you mean machine addresses, then no, they don't
have to have any relationship.
-- if the address of 2 objects is different by 'x' in C (when using &
operator), are they so in the hardware?
I don't exactly understand what you mean by "different by 'x'". By 'x'
what? "Bytes" in C sense of the word? Machine bytes? Difference returned
by binary '-' operator?
- do the elements of an array usually end up being placed side by side
in most implementations (does the standard require this) ?
Yes, they do. This means that any padding present between the elements
of the array is part of the element itself, not something added
specifically by the array object. This follows from the fact that in C

sizeof(array) = sizeof(element) * number_of_elements

This is required by the standard.
how about multidimensional arrays?
Multidimensional arrays in C are just arrays of arrays, which means that
the above applies to them as well. Arrays cannot "insert" extra padding
between elements.
-- I've seen code where two or more arrays would be declared side by
side and then the first array with extended indexing would be used to
access the elements of the second/more array.
Does this suggest that C
guarantees that declared variables of the same storage class are placed
in ascending order in memory?
No, there's no such guarantee. Such access is completely illegal in C.
The behavior is undefined.
- I read in an article once (can't find it now) that a "byte" in C
doesn't necessarily have to be an octet of bits at the hardware level
That's true. "Byte" in C (C-byte) is essentially synonymous with 'char'
type. '[unsigned|signed] char' objects in C always consist of 1 C-byte
by definition. A C-byte might consist of any number of machine bytes,
which means that the number of bits in C-byte might be different from 8
(could be 16, for example).
(Are there any links on the net that point to details of C's
abstraction layer? I can't seem to find any. I guess these details
are woven through the standards documents, but I'm talking about a
single cohesive document)

C99 standard has a number of sections specifically dedicated to these
issues.

--
Best regards,
Andrey Tarasevich

Nov 14 '05 #3

Mike Wahler

"Luke Wu" <Lo***********@gmail.com> wrote in message
news:11*********************@f14g2000cwb.googlegroups.com...

Hello,
From spending some time in clc, I've come to realize that C's model of the CPU

C doesn't really model 'a CPU' it defines an 'abstract machine',
and doesn't directly refer to a "CPU" component.
can be totally different from the actual CPU.

Is it safe to say that almost nothing can be gleaned about physical CPU
behaviour from C level behaviour.
Not almost nothing, but nothing. However, by examining an
assembly listing which many compilers can emit, one can
glean some platform-specific information. But then you're
outside the realm of C.

For example:

- do the addresses returned by & have to have (by Standard) any direct
relationship to real addresses?
No. This is especially true for platforms which feature
'virtual memory' and/or separate 'process spaces' as in
e.g. Microsoft Windows.

-- if the address of 2 objects is different by 'x' in C (when using &
operator), are they so in the hardware?
Not necessarily. Also note that the addresses of two separate
objects will not necessarily reflect their relationship in
source code. e.g.:

int i;
int j;

the address of 'j' need not be greater than address of 'i'
nor is their difference guaranteed to be sizeof(int).
(the only time this *is* guaranteed is when the
objects are adjacent elements (the subscript of one
is one more or less than the subscript of the other)
of the same array).

(but the addresses of two separate objects are always
guaranteed to be different)

- do the elements of an array usually
Not usually, but always.
end up being placed side by side
At contiguous addresses (as reported by the & operator),
whose difference is sizeof(array's element type).

int array[2];

&array[1] is guaranteed to lie exactly
sizeof(int) bytes beyond &array[0];
in most implementations
All conforming implementations.
(does the standard require this) ?
Yes.
how about multidimensional arrays?
Yes. "multi-dimensional arrays" in C are really
"arrays of arrays"

for the array:

int arr2d[2][3] = {1, 2, 3, 4, 5, 6};
(sometimes written for clarity as:
int arr2d[2][3] = { {1,2,3}, {4,5,6} }; )

the values are stored (contiguously) in memory in the
order in which the initializer values appear above. That is:
arr2d[0][0] == 1
arr2d[0][1] == 2
arr2d[0][2] == 3
arr2d[1][0] == 4
arr2d[1][1] == 5
arr2d[1][2] == 6

That is, C arrays are stored in 'row major' order, unlike
some other languages.

-- I've seen code where two or more arrays would be declared side by
side
C has rather 'free' formatting rules, e.g. more than one
declaration or statement can appear on a single line.
int array1[] = {1,2,3}; int array2[] = {4,5,6};

However I recommend against this practice.
and then the first array with extended indexing
What do you mean by 'extended indexing'? C does not define
such a term.
would be used to
access the elements of the second/more array.
Any integral expression whose value when added to the address
of an array's first element is within the bounds of that array
can be used to index into it. The fact that these values might
themselves be stored in an array is of no consequence. As a
matter of fact, some 'convoluted' code could be written in which
array element values are used to index that same array. But imo
this is a rather dangerous practice.

Does this suggest that C
guarantees that declared variables of the same storage class are placed
in ascending order in memory?
No. This is only guaranteed for elements of the same array.

int a[10], b[10], c[20];
This is a valid way to define several objects, but I recommend
one object per line. Easier to read and maintain.
int i;

for(i = 0; i < 40; i++)
{
a[i] = 0; /* zero initializes all three arrays */
NO, NO, NO!

You must process each array individually. Their positions
in memory relative to one another are not specified. Also
note that what you wrote above is *not* initialization,
but assignment, which is not the same thing. An object is initialized
when it is defined:

int a[10] = {1,2,3}; /* first three elements are initialized with
1, 2, and 3, respectively, all others to zero */

FWIW, you can initialize all the elements of an array to zero like this:

int a[10] = {0};

(If this definition appears at file scope, or is qualified
with 'static' at block scope, all elements are initialized to
zero implicitly -- but I like to include the initializer(s)
anyway, for clarity, but that is a 'style' issue).
}

- I read in an article once (can't find it now) that a "byte" in C
doesn't necessarily have to be an octet of bits at the hardware level
Correct. It's simply the 'smallest addressable unit of storage',
which is required to have a minimum size of eight bits, but can
be larger (and often is on certain architectures). From a C
perspective, 'byte' and 'character' are synonymous.

This 'abstraction' is there to make the language as platform
neutral as possible, allowing for implementation on the widest
possible variety of existing architectures as well as those that
have yet to be conceived.

Any help would be appreciated.
(Are there any links on the net that point to details of C's
abstraction layer? I can't seem to find any. I guess these details
are woven through the standards documents, but I'm talking about a
single cohesive document)

This single cohesive document *is* the ISO standard, but I'll be
the first to admit it's not easy to read. What you need are
some books. See www.accu.org for peer reviews.

-Mike

Nov 14 '05 #4

Jens.Toerring

Luke Wu <Lo***********@gmail.com> wrote:

From spending some time in clc, I've come to realize that C's model of the CPU can be totally different from the actual CPU.

Is it safe to say that almost nothing can be gleaned about physical CPU
behaviour from C level behaviour.
That's why there is a standard, i.e. in order to be able to write
programs that _don't_ depend on the specific CPU you are using but
that can be ported easily from one to the next system. Otherwise
you wouldn't have much more than a (high-level) assembler.
For example: - do the addresses returned by & have to have (by Standard) any direct
relationship to real addresses?
No. With many modern operating systems the concept of "real addresses"
(in the sense of physical addresses) doesn't even make much sense, since
there's what's called "virtual memory", and the mapping between physical
addresses and what a program sees is completely at the discretion
of the operating system. What the program sees as a fixed address can
be mapped to varying physical addresses (or even get written out to swap
space).
-- if the address of 2 objects is different by 'x' in C (when using &
operator), are they so in the hardware?
No - one of the objects could even be in swap space on the disk while
the other is in memory.
- do the elements of an array usually end up being placed side by side
in most implementations (does the standard require this) ?
how about multidimensional arrays?
As long as what the program sees as the addresses are contiguous in
(virtual) memory everything is fine. But in the sense of physical
memory the elements could be far apart.
-- I've seen code where two or more arrays would be declared side by
side and then the first array with extended indexing would be used to
access the elements of the second/more array. Does this suggest that C
guarantees that declared variables of the same storage class are placed
in ascending order in memory?

int a[10], b[10], c[20];
int i;

for(i = 0; i < 40; i++)
{
a[i] = 0; /* zero initializes all three arrays */
}
No, you can't rely on that, even if you only care about the "virtual"
addresses. Accessing an array element outside of its defined range
of indices is forbidden and leads to undefined behaviour. That code
may work on a certain platform when compiled with a certain compiler
but there's no guarantee that it works with any other compiler or on
a different platform.
- I read in an article once (can't find it now) that a "byte" in C
doesn't necessarily have to be an octet of bits at the hardware level
There's no "byte" in C. What you have is a char (as the smallest
type), and how many bits a char has on the system you're working on
can be found out from the CHAR_BIT macro from <limits.h>. The only
guarantee you have is that CHAR_BIT is at least 8, i.e. a char has
at least 8 bits - but it can be more.
(Are there any links on the net that point to details of C's
abstraction layer? I can't seem to find any. I guess these details
are woven through the standards documents, but I'm talking about a
single cohesive document)

Most of the things you're asking about you won't find in the standard
because they aren't relevant from a C language point of view. How C
code gets compiled to have the resulting executable work as expected
(i.e. as required by the standard) is due to the people writing the
compiler. The standard does not make any requirements how they use the
CPU they are dealing with to manage this. The C standard is basically
a recipe along the lines of "Given this code as input, the resulting
program must behave in that way", but how this is achieved (and
with what kind of hardware) isn't relevant.

Regards, Jens
--
\ Jens Thoms Toerring ___ Je***********@physik.fu-berlin.de
\__________________________ http://www.toerring.de

Nov 14 '05 #5

Luke Wu

Thank you for the responses.

I am now getting the 'feel' for C's abstraction away from hardware
details from reading clc posts. I think I'm almost done erasing all
the assumptions that I got into my head from reading books like The C
Companion, by Allen I. Holub.

Nov 14 '05 #6

Andrey Tarasevich

Andrey Tarasevich wrote:

...
- do the elements of an array usually end up being placed side by side
in most implementations (does the standard require this) ?

Yes, they do. This means that any padding present between the elements
of the array is part of the element itself, not something added
specifically by the array object. This follows from the fact that in C

sizeof(array) = sizeof(element) * number_of_elements

This is required by the standard.
how about multidimensional arrays?

Multidimensional arrays in C are just arrays of arrays, which means that
the above applies to them as well. Arrays cannot "insert" extra padding
between elements.
...

Although it is worth noting that the above requirements are still
formulated at language level. Which means that if some compiler by means
of "compiler magic" can satisfy these requirements and at the same time
place array elements out of order/apart from each other in machine
memory, there wouldn't be anything wrong with it.

--
Best regards,
Andrey Tarasevich

Nov 14 '05 #7

Mike Wahler

<Je***********@physik.fu-berlin.de> wrote in message
news:35*************@uni-berlin.de...

Luke Wu <Lo***********@gmail.com> wrote:

- I read in an article once (can't find it now) that a "byte" in C
doesn't necessarily have to be an octet of bits at the hardware level

There's no "byte" in C.

Au contraire.

ISO/IEC 9899:1999 (E)

3.6

1 byte
addressable unit of data storage large enough to hold
any member of the basic character set of the execution
environment

-Mike

Nov 14 '05 #8

E. Robert Tisdale

Mike Wahler wrote:

Jens.Toerring wrote:
Luke Wu wrote:

- I read in an article once (can't find it now) that a "byte" in C
doesn't necessarily have to be an octet of bits at the hardware level

There's no "byte" in C.

Au contraire.

ISO/IEC 9899:1999 (E)

3.6

1 byte
addressable unit of data storage large enough to hold
any member of the basic character set of the execution
environment

Note that a byte is not a data type
but the *size* of a unit of storage.

In practice, a byte is 8 binary digits (bits) almost everywhere
including machines where four characters are normally "packed"
into 32 bit "words".

Nov 14 '05 #9

Mike Wahler

"E. Robert Tisdale" <E.**************@jpl.nasa.gov> wrote in message
news:cs**********@nntp1.jpl.nasa.gov...

Mike Wahler wrote:
Jens.Toerring wrote:
Luke Wu wrote:
- I read in an article once (can't find it now) that a "byte" in C
doesn't necessarily have to be an octet of bits at the hardware level

There's no "byte" in C.

Au contraire.

ISO/IEC 9899:1999 (E)

3.6

1 byte
addressable unit of data storage large enough to hold
any member of the basic character set of the execution
environment

Note that a byte is not a data type

Note that I never claimed that it is.
but the *size* of a unit of storage.

In practice, a byte is 8 binary digits (bits) almost everywhere
Then imo your 'everywhere' is rather limited.
including machines where four characters are normally "packed"
into 32 bit "words".

Note that on some machines a byte is 32 bits.

-Mike

Nov 14 '05 #10

E. Robert Tisdale

Mike Wahler wrote:

E. Robert Tisdale wrote:
Mike Wahler wrote:
Jens.Toerring wrote:

Luke Wu wrote:

>- I read in an article once (can't find it now) that a "byte" in C
>doesn't necessarily have to be an octet of bits at the hardware level

There's no "byte" in C.

Au contraire.

ISO/IEC 9899:1999 (E)

3.6

1 byte
addressable unit of data storage large enough to hold
any member of the basic character set of the execution
environment

Note that a byte is not a data type

Note that I never claimed that it is.

I never claimed that you claimed that it is. :-)

but the *size* of a unit of storage.

In practice, a byte is 8 binary digits (bits) almost everywhere

Then imo your 'everywhere' is rather limited.
including machines where four characters are normally "packed"
into 32 bit "words".

Note that on some machines a byte is 32 bits.

Name ten.

Perspective is important.

I know that you don't mean to imply
that this is a real problem for C programmers.

Most C programmers will never write a single line of code
that will be ported to a processor with 32 bit bytes.

Nov 14 '05 #11

Thomas Stegen

Mike Wahler wrote:

"Luke Wu" <Lo***********@gmail.com> wrote in message
news:11*********************@f14g2000cwb.googlegroups.com...

- do the addresses returned by & have to have (by Standard) any direct
relationship to real addresses?

No. This is especially true for platforms which feature
'virtual memory' and/or separate 'process spaces' as in
e.g. Microsoft Windows.

Well, there must somewhere be a mapping between pointer
values and actual addresses (even in the abstract machine).
Though there can be several layers of mappings. So there
must be a relationship, but depending on what you mean by
direct, it might not be direct.

But even though this mapping must exist even in the abstract
machine one cannot portably use this for anything
as there is a) no specified mechanism for doing so and b) it
will be very different between platforms.

--
Thomas.

Nov 14 '05 #12

Thomas Stegen

Mike Wahler wrote:

"E. Robert Tisdale" <E.**************@jpl.nasa.gov> wrote in message
news:cs**********@nntp1.jpl.nasa.gov...

[snip usual char byte 8 bit not 8 bit discussion]

Semi OT perhaps but...

Outside a C perspective, didn't IBM first coin the term byte to
refer to 8 bit entities? As far as I know machines such as the
pdp-11 (I think) had 9 bit entities, but never used the term byte.

It is also clear though that in a C context byte does not mean
this. It is also clear that one should establish a context,
implicitly or explicitly, when discussing bytes with anyone.
Are we in the C locale, or in the mere mortals locale?

Here in comp.lang.c the context should be clear to everyone.

--
Thomas.

Nov 14 '05 #13

Jonathan Burd

Thomas Stegen wrote:

Mike Wahler wrote:
"E. Robert Tisdale" <E.**************@jpl.nasa.gov> wrote in message
news:cs**********@nntp1.jpl.nasa.gov...

[snip usual char byte 8 bit not 8 bit discussion]

Semi OT perhaps but...

Outside a C perspective, didn't IBM first coin the term byte to
refer to 8 bit entities? As far as I know machines such as the
pdp-11 (I think) had 9 bit entities, but never used the term byte.

It is also clear though that in a C context byte does not mean
this. It is also clear that one should establish a context,
implicitly or explicitly, when discussing bytes with anyone.
Are we in the C locale, or in the mere mortals locale?

Here in comp.lang.c the context should be clear to everyone.

Perhaps, using the term ``octet" for a group of 8 bits would be
much better. A byte may be an octet and is the most basic
addressable unit in an execution environment. Therefore, a byte,
according to this definition, may also be 4 bits.

To reply to the original context, C does not have
a ``byte" data type. In C, a char contains, at least, enough
bits to represent any element of the basic character set.
A char may at least be a byte or higher.

I don't see how you can safely assume a char to contain at least
8 bits. The standard doesn't say so explicitly.

--
"I'm learning to program because then I can write
programs to do my homework faster." - Andy Anfilofieff

Nov 14 '05 #14

Jonathan Burd

Jonathan Burd wrote:

Thomas Stegen wrote:
Mike Wahler wrote:
"E. Robert Tisdale" <E.**************@jpl.nasa.gov> wrote in message
news:cs**********@nntp1.jpl.nasa.gov...

[snip usual char byte 8 bit not 8 bit discussion]

Semi OT perhaps but...

Outside a C perspective, didn't IBM first coin the term byte to
refer to 8 bit entities? As far as I know machines such as the
pdp-11 (I think) had 9 bit entities, but never used the term byte.

It is also clear though that in a C context byte does not mean
this. It is also clear that one should establish a context,
implicitly or explicitly, when discussing bytes with anyone.
Are we in the C locale, or in the mere mortals locale?

Here in comp.lang.c the context should be clear to everyone.

Perhaps, using the term ``octet" for a group of 8 bits would be
much better. A byte may be an octet and is the most basic
addressable unit in an execution environment. Therefore, a byte,
according to this definition, may also be 4 bits.

To reply to the original context, C does not have
a ``byte" data type. In C, a char contains, at least, enough
bits to represent any element of the basic character set.
A char may at least be a byte or higher.

Correction: A char must at least be a byte.
I don't see how you can safely assume a char to contain at least
8 bits. The standard doesn't say so explicitly.

--
"I'm learning to program because then I can write
programs to do my homework faster." - Andy Anfilofieff

Nov 14 '05 #15

CBFalconer

Mike Wahler wrote:

.... snip ...
FWIW, you can initialize all the elements of an array to zero
like this:

int a[10] = {0};

(If this definition appears at file scope, or is qualified
with 'static' at block scope, all elements are initialized to
zero implicitly -- but I like to include the initializer(s)
anyway, for clarity, but that is a 'style' issue).

But be aware that, on some systems, this may result in heavy
bloating of the final executable file with long strings of zero
bytes. This has nothing whatsoever to do with the language, but
you should be aware of the possibility.

--
"If you want to post a followup via groups.google.com, don't use
the broken "Reply" link at the bottom of the article. Click on
"show options" at the top of the article, then click on the
"Reply" at the bottom of the article headers." - Keith Thompson

Nov 14 '05 #16

Jonathan Burd

Jonathan Burd wrote:

Thomas Stegen wrote:
Mike Wahler wrote:
"E. Robert Tisdale" <E.**************@jpl.nasa.gov> wrote in message
news:cs**********@nntp1.jpl.nasa.gov...

[snip usual char byte 8 bit not 8 bit discussion]

Semi OT perhaps but...

Outside a C perspective, didn't IBM first coin the term byte to
refer to 8 bit entities? As far as I know machines such as the
pdp-11 (I think) had 9 bit entities, but never used the term byte.

It is also clear though that in a C context byte does not mean
this. It is also clear that one should establish a context,
implicitly or explicitly, when discussing bytes with anyone.
Are we in the C locale, or in the mere mortals locale?

Here in comp.lang.c the context should be clear to everyone.

Perhaps, using the term ``octet" for a group of 8 bits would be
much better. A byte may be an octet and is the most basic
addressable unit in an execution environment. Therefore, a byte,
according to this definition, may also be 4 bits.

To reply to the original context, C does not have
a ``byte" data type. In C, a char contains, at least, enough
bits to represent any element of the basic character set.
A char may at least be a byte or higher.

I don't see how you can safely assume a char to contain at least
8 bits. The standard doesn't say so explicitly.

Alright, CHAR_BIT is at least 8 bits. My bad.

Regards,
Jonathan.

--
"I'm learning to program because then I can write
programs to do my homework faster." - Andy Anfilofieff

Nov 14 '05 #17

pete

Jonathan Burd wrote:

A char may at least be a byte or higher.
(sizeof(char) == 1) /* always just exactly one. */
I don't see how you can safely assume a char to contain at least
8 bits. The standard doesn't say so explicitly.

Alright, CHAR_BIT is at least 8 bits. My bad.

--
pete

Nov 14 '05 #18

Chris Croughton

On Fri, 21 Jan 2005 10:11:21 +0000, Thomas Stegen
<th***********@gmail.com> wrote:

Mike Wahler wrote:
"E. Robert Tisdale" <E.**************@jpl.nasa.gov> wrote in message
news:cs**********@nntp1.jpl.nasa.gov... [snip usual char byte 8 bit not 8 bit discussion]

Semi OT perhaps but...

Outside a C perspective, didn't IBM first coin the term byte to
refer to 8 bit entities? As far as I know machines such as the
pdp-11 (I think) had 9 bit entities, but never used the term byte.

This was discussed here recently.

Yes, Werner Buchholz at IBM invented the term in 1956, originally just
as a 1 to 6 bit field used for I/O but by the end of the year it had
come to refer to 8 bit quantities. The DEC PDP-11 was a 16 bit machine,
and DEC did use the term byte to refer to half-words of 8 bits (as far
as I know no PDP-11 actually used words like 'byte' at all, they couldn't
usually speak <g>). The DEC PDP-10 programmers used 'bytes' to refer to
variable bit fields (from 1 to 36 bits I believe).
It is also clear though that in a C context byte does not mean
this. It is also clear that one should establish a context,
implicitly or explicitly, when discussing bytes with anyone.
Are we in the C locale, or in the mere mortals locale?
Use "characters" or "chars" to refer to the C entities and "octets" to
refer to the 8 bit quantities, and shun the overloaded term "bytes".
Here in comp.lang.c the context should be clear to everyone.

It isn't, even those of us who have worked on machines with odd byte
lengths often now use it only about 8 bit quantities, because that
represents the vast majority of machines these days (most of the DSP
programmers I know refer to the basic -- and only -- memory units as
"words").

Chris C

Nov 14 '05 #19

Mike Wahler

"E. Robert Tisdale" <E.**************@jpl.nasa.gov> wrote in message
news:cs**********@nntp1.jpl.nasa.gov...

Mike Wahler wrote:
E. Robert Tisdale wrote:
Mike Wahler wrote:

Jens.Toerring wrote:

>Luke Wu wrote:

>>- I read in an article once (can't find it now) that a "byte" in C
>>doesn't necessarily have to be an octet of bits at the hardware level
>
>There's no "byte" in C.

Au contraire.

ISO/IEC 9899:1999 (E)

3.6

1 byte
addressable unit of data storage large enough to hold
any member of the basic character set of the execution
environment

Note that a byte is not a data type
Note that I never claimed that it is.

I never claimed that you claimed that it is. :-)
but the *size* of a unit of storage.

In practice, a byte is 8 binary digits (bits) almost everywhere

Then imo your 'everywhere' is rather limited.
including machines where four characters are normally "packed"
into 32 bit "words".

Note that on some machines a byte is 32 bits.

Name ten.

No need.

Perspective is important.
More important is abstraction and portability.

I know that you don't mean to imply
that this is a real problem for C programmers.
It can be for some.

Most C programmers will never write a single line of code
that will be ported to a processor with 32 bit bytes.

There you go again, with your 'most [insert whatever]'.
You can't have any idea what 'most' C programmers do
or don't do. You can't know who they are, or how many
of them there are.

-Mike

Nov 14 '05 #20

Mike Wahler

"Thomas Stegen" <th***********@gmail.com> wrote in message
news:35*************@individual.net...

Mike Wahler wrote:
"E. Robert Tisdale" <E.**************@jpl.nasa.gov> wrote in message
news:cs**********@nntp1.jpl.nasa.gov...

[snip usual char byte 8 bit not 8 bit discussion]

Semi OT perhaps but...

Outside a C perspective, didn't IBM first coin the term byte to
refer to 8 bit entities? As far as I know machines such as the
pdp-11 (I think) had 9 bit entities, but never used the term byte.

It is also clear though that in a C context byte does not mean
this. It is also clear that one should establish a context,
implicitly or explicitly, when discussing bytes with anyone.
Are we in the C locale, or in the mere mortals locale?

Here in comp.lang.c the context should be clear to everyone.

Suggestion for new FAQ, and request for answer:

Q: How do I clear the context?
A: __________________________

-Mike

Nov 14 '05 #21

Mike Wahler

"Jonathan Burd" <jo***********@REMOVEMEgmail.com> wrote in message
news:35*************@individual.net...

Thomas Stegen wrote:
Mike Wahler wrote:
"E. Robert Tisdale" <E.**************@jpl.nasa.gov> wrote in message
news:cs**********@nntp1.jpl.nasa.gov...
[snip usual char byte 8 bit not 8 bit discussion]

Semi OT perhaps but...

Outside a C perspective, didn't IBM first coin the term byte to
refer to 8 bit entities? As far as I know machines such as the
pdp-11 (I think) had 9 bit entities, but never used the term byte.

It is also clear though that in a C context byte does not mean
this. It is also clear that one should establish a context,
implicitly or explicitly, when discussing bytes with anyone.
Are we in the C locale, or in the mere mortals locale?

Here in comp.lang.c the context should be clear to everyone.

Perhaps, using the term "octet" for a group of 8 bits would be
much better. A byte may be an octet and is the most basic
addressable unit in an execution environment. Therefore, a byte,
according to this definition, may also be 4 bits.

A 'C byte' must have at least eight bits.

To reply to the original context, C does not have
a "byte" data type. In C, a char contains, at least, enough
bits to represent any element of the basic character set.
A char may at least be a byte or higher.
IOW a char must fit in a byte.

I don't see how you can safely assume a char to contain at least
8 bits. The standard doesn't say so explicitly.

I'll let you decide if this is explicit or not:

ISO/IEC 9899:1999 (E)

5.2.4.2.1 Sizes of integer types <limits.h>

1 The values given below shall be replaced by constant expressions
suitable for use in #if preprocessing directives. Moreover, except
for CHAR_BIT and MB_LEN_MAX, the following shall be replaced by
expressions that have the same type as would an expression that
is an object of the corresponding type converted according to the
integer promotions. Their implementation-defined values shall be
equal or greater in magnitude (absolute value) to those shown, with
the same sign.

-- number of bits for smallest object that is not a bit-field (byte)
CHAR_BIT 8
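
The requirement quoted above can be checked on any given implementation. What follows is only a minimal sketch, assuming a hosted C compiler; the function names bits_per_byte and bits_in are made up for illustration and are not part of any library.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Bits in one C byte on this implementation; the standard requires >= 8. */
int bits_per_byte(void)
{
    assert(CHAR_BIT >= 8);
    return CHAR_BIT;
}

/* Total bits of storage in an object that sizeof reports as `bytes` bytes. */
size_t bits_in(size_t bytes)
{
    return bytes * (size_t)CHAR_BIT;
}
```

Calling bits_per_byte() would typically give 8 on a desktop system, but portable code should not assume that; bits_in(sizeof(int)) gives the storage width of an int in bits, whatever the byte size happens to be.
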
-Mike

Nov 14 '05 #22

Vinko Vrsalovic

Mike Wahler wrote:

[...regarding char type...]

Correct. It's simply the 'smallest addressable unit of storage',
which is required to have a minimum size of eight bits, but can
be larger (and often is on certain architectures). From a C
perspective, 'byte' and 'character' are synonymous.

What about unsigned char then? Shouldn't it be one bit shorter than
char?

If it is one bit shorter, then I tend to think that char isn't the
'smallest addressable unit of storage', or, if it isn't one bit
shorter, what's the point of having that type?

I'm probably missing some fundamental concept, which I'm hoping you can
clarify for me.

V.

Nov 14 '05 #23

Mike Wahler

"Vinko Vrsalovic" <vi****@gmail.com> wrote in message
news:11**********************@f14g2000cwb.googlegroups.com...

Mike Wahler wrote:

[...regarding char type...]
Correct. It's simply the 'smallest addressable unit of storage',
which is required to have a minimum size of eight bits, but can
be larger (and often is on certain architectures). From a C
perspective, 'byte' and 'character' are synonymous.
What about unsigned char then?

All the character types have a size of one (byte) by
definition.

Shouldn't it be one bit shorter than
char?

No.

If it is one bit shorter,

It's not.

then I tend to think that char isn't the
'smallest addressable unit of storage',

A byte is. An object of any of the (3) character types
must fit in a byte.

or, if it isn't one bit
shorter, what's the point of having that type?

For representing unsigned values. One 'side-effect'
is that 'unsigned char' has twice the range as 'signed char'.
(Note that 'plain' char type may be either signed or unsigned,
that's up to the implementation. But 'char', 'signed char',
and 'unsigned char' are treated as three distinct types.)
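
The point about sizes can be put into code. This is only a sketch, assuming a C99-style implementation where signed char has no padding bits; the function names are hypothetical.

```c
#include <assert.h>
#include <limits.h>

/* Every character type has size 1 (one byte) by definition. */
int char_types_are_one_byte(void)
{
    return sizeof(char) == 1
        && sizeof(signed char) == 1
        && sizeof(unsigned char) == 1;
}

/* How far the top of unsigned char's range lies above signed char's.
   unsigned char is no shorter; it just spends no bit on the sign. */
unsigned long extra_unsigned_values(void)
{
    assert(UCHAR_MAX > SCHAR_MAX);
    return (unsigned long)UCHAR_MAX - (unsigned long)SCHAR_MAX;
}
```

With 8-bit bytes the second function works out to 255 - 127; on an implementation with wider bytes the numbers change, but both types still occupy exactly one byte.
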

I'm probably missing some fundamental concept, which I'm hoping you can
clarify for me.

I hope I did.

-Mike

Nov 14 '05 #24

Keith Thompson

"Mike Wahler" <mk******@mkwahler.net> writes:

"Luke Wu" <Lo***********@gmail.com> wrote in message
news:11*********************@f14g2000cwb.googlegroups.com...

[...]

-- if the address of 2 objects is different by 'x' in C (when using &
operator), are they so in the hardware?

Not necessarily. Also note that the addresses of two separate
objects will not necessarily reflect their relationship in
source code. e.g.:

int i;
int j;

the address of 'j' need not be greater than address of 'i'
nor is their difference guaranteed to be sizeof(int).
(the only time this *is* guaranteed is when the
objects are adjacent elements (the subscript of one
is one more or less than the subscript of the other)
of the same array).

It's actually worse than that. Comparing the addresses of two
distinct objects (using <, <=, >, or >=) invokes undefined behavior
unless the objects are both part of some larger object. (Equality or
inequality comparison is ok.)
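
To make the distinction concrete, here is a rough sketch of which pointer operations are well defined. The helper names are invented for this example, and the two distance functions are only valid when both pointers point into the same array (or one past its end).

```c
#include <assert.h>
#include <stddef.h>

/* Distance in elements; both pointers must point into the same array. */
ptrdiff_t element_distance(const int *from, const int *to)
{
    assert(from != NULL && to != NULL);
    return to - from;
}

/* Distance in bytes; again only for pointers into the same array. */
ptrdiff_t byte_distance(const int *from, const int *to)
{
    assert(from != NULL && to != NULL);
    return (const char *)to - (const char *)from;
}

/* == and != are defined for pointers to any two objects. */
int is_same_object(const int *p, const int *q)
{
    return p == q;
}
```

For adjacent elements of one array, element_distance gives 1 and byte_distance gives sizeof(int). Passing the addresses of two unrelated variables to either distance function would be undefined behavior, while is_same_object is fine with them.
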

[...]

-- I've seen code where two or more arrays would be declared side by
side

C has rather 'free' formatting rules, e.g. more than one
declaration or statement can appear on a single line.
int array1[] = {1,2,3}; int array2[] = {4,5,6};

However I recommend against this practice.

I don't think the OP was asking about the arrangement in the source.

The following:

int a[10], b[10], c[20];

int a[10]; int b[10]; int c[20];

int a[10];
int b[10];
int c[20];

are exactly equivalent as far as the language is concerned. In all
three cases, the language guarantees nothing about the placement of
the three array objects in memory. They might or might not be
adjacent, and they could be in any order. (A compiler could choose a
different layout depending on what the source looks like, but it could
just as easily choose a different layout depending on the phase of the
moon.)

It is possible for a program to find out whether they're contiguous;
for example, if a+10 == b, then a and b are contiguous. But no sane
program should ever take advantage of this. Just treat them as three
distinct objects.
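
Treating them as three distinct objects might look like the sketch below. It reuses the array names from the original question; zero_ints and zero_all are hypothetical helpers, not anything the standard library provides.

```c
#include <assert.h>
#include <stddef.h>

/* The three arrays from the original question. Their relative
   placement in memory is unspecified. */
int a[10], b[10], c[20];

/* Set every element of one array to zero, staying inside its bounds. */
void zero_ints(int *arr, size_t count)
{
    size_t i;

    assert(arr != NULL);
    for (i = 0; i < count; i++)
        arr[i] = 0;
}

/* Zero all three arrays, one object at a time. */
void zero_all(void)
{
    zero_ints(a, sizeof a / sizeof a[0]);
    zero_ints(b, sizeof b / sizeof b[0]);
    zero_ints(c, sizeof c / sizeof c[0]);
}
```

Each loop stops at the end of its own array, so the result does not depend on how the compiler lays the arrays out. (Objects with static storage duration start out zeroed anyway; the function matters when the arrays need resetting later.)
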

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <*> <http://users.sdsc.edu/~kst>
We must do something. This is something. Therefore, we must do this.

Nov 14 '05 #25

Keith Thompson

"Mike Wahler" <mk******@mkwahler.net> writes:

"Vinko Vrsalovic" <vi****@gmail.com> wrote in message
news:11**********************@f14g2000cwb.googlegroups.com... [...]
or, if it isn't one bit
shorter, what's the point of having that type?

For representing unsigned values. One 'side-effect'
is that 'unsigned char' has twice the range as 'signed char'.

Not typically. On a typical 8-bit two's-complement implementation,
unsigned char has a range of 0..255, and signed char has a range of
-128..+127.

(Note that 'plain' char type may be either signed or unsigned,
that's up to the implementation. But 'char', 'signed char',
and 'unsigned char' are treated as three distinct types.)

Yes, and in fact plain char must have the same characteristics as
either signed char or unsigned char. (Your description left open the
possibility of plain char being an (un)signed type with a different
range than (un)signed char.)
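
Which of the two plain char follows can be found out from <limits.h>. A small sketch, with made-up function names, assuming nothing beyond the standard macros:

```c
#include <assert.h>
#include <limits.h>

/* Nonzero if plain char is a signed type on this implementation. */
int plain_char_is_signed(void)
{
    return CHAR_MIN < 0;
}

/* Nonzero if plain char has exactly the range of signed char
   or exactly the range of unsigned char. */
int plain_char_matches_a_sibling(void)
{
    int like_signed = (CHAR_MIN == SCHAR_MIN) && (CHAR_MAX == SCHAR_MAX);
    int like_unsigned = (CHAR_MIN == 0) && (CHAR_MAX == UCHAR_MAX);

    /* The two ranges differ, so both cannot hold at once. */
    assert(!(like_signed && like_unsigned));
    return like_signed || like_unsigned;
}
```

On a conforming implementation the second function should always return nonzero; the first tells you which way the implementation went, which portable code should not depend on.
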

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <*> <http://users.sdsc.edu/~kst>
We must do something. This is something. Therefore, we must do this.

Nov 14 '05 #26

Keith Thompson

Chris Croughton <ch***@keristor.net> writes:

On Fri, 21 Jan 2005 10:11:21 +0000, Thomas Stegen
<th***********@gmail.com> wrote:

[...]

It is also clear though that in a C context byte does not mean
this. It is also clear that one should establish a context,
implicitly or explicitly, when discussing bytes with anyone.
Are we in the C locale, or in the mere mortals locale?

Use "characters" or "chars" to refer to the C entities and "octets" to
refer to the 8 bit quantities, and shun the overloaded term "bytes".

Here in comp.lang.c the context should be clear to everyone.

It isn't; even those of us who have worked on machines with odd byte
lengths often now use it only about 8 bit quantities, because that
represents the vast majority of machines these days (most of the DSP
programmers I know refer to the basic -- and only -- memory units as
"words").

The C standard uses the term "byte" extensively, and it uses it
consistently in the way defined in the standard, never as a
specifically 8-bit quantity. In my opinion, there's no need to shun
the use of the word "byte" in this newsgroup. The meaning is clear in
this context.

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <*> <http://users.sdsc.edu/~kst>
We must do something. This is something. Therefore, we must do this.

Nov 14 '05 #27

Mike Wahler

"Keith Thompson" <ks***@mib.org> wrote in message
news:ln************@nuthaus.mib.org...

"Mike Wahler" <mk******@mkwahler.net> writes:
"Vinko Vrsalovic" <vi****@gmail.com> wrote in message
news:11**********************@f14g2000cwb.googlegroups.com... [...]
or, if it isn't one bit
shorter, what's the point of having that type?

For representing unsigned values. One 'side-effect'
is that 'unsigned char' has twice the range as 'signed char'.

Not typically. On a typical 8-bit two's-complement implementation,
unsigned char has a range of 0..255, and signed char has a range of
-128..+127.

Oops, I meant twice the range of positive values.

(Note that 'plain' char type may be either signed or unsigned,
that's up to the implementation. But 'char', 'signed char',
and 'unsigned char' are treated as three distinct types.)

Yes, and in fact plain char must have the same characteristics as
either signed char or unsigned char. (Your description left open the
possibility of plain char being an (un)signed type with a different
range than (un)signed char.)

That's me, incomplete. :-)

-Mike

Nov 14 '05 #28

Albert van der Horst

In article <35*************@uni-berlin.de>,
<Je***********@physik.fu-berlin.de> wrote:

Luke Wu <Lo***********@gmail.com> wrote:
-- I've seen code where two or more arrays would be declared side by
side and then the first array with extended indexing would be used to
access the elements of the second/more array. Does this suggest that C
guarantees that declared variables of the same storage class are placed
in ascending order in memory?

I have seen that code too: In FORTRAN in the seventies.
Not in C lately.

int a[10], b[10], c[20];
int i;
for(i = 0; i < 40; i++)
{
a[i] = 0; /* zero initializes all three arrays */
}

No, you can't rely on that, even if you only care about the "virtual"
addresses. Accessing an array element outside of its defined range
of indices is forbidden and leads to undefined behaviour. That code
may work on a certain platform when compiled with a certain compiler
but there's no guarantee that it works with any other compiler or on
a different platform.

An interesting compiler would be one that takes care of enforcing
array bounds by using hardware means (e.g. a, b and c would each have
their own so-called descriptors in an Intel architecture).
It could very well be that b[0] has
the same physical address as a[10] would have (so to speak),
but addressing it through the 'b' descriptor succeeds, while going
through the 'a' descriptor fails with a memory fault.
A compiler that takes care to use the correct descriptor would be
perfectly conforming, but would do a good job of killing
unwarranted assumptions or wrong-headed conclusions from experiments.

It would be a nice compiler to have at universities.

Regards, Jens
--
\ Jens Thoms Toerring ___ Je***********@physik.fu-berlin.de
\__________________________ http://www.toerring.de

--
Groetjes Albert

--
Albert van der Horst,Oranjestr 8,3511 RA UTRECHT,THE NETHERLANDS
One man-hour to invent,
One man-week to implement,
One lawyer-year to patent.

Nov 14 '05 #29

Walter Roberson

In article <ib********@spenarnc.xs4all.nl>,
Albert van der Horst <al****@spenarnc.xs4all.nl> wrote:
:An interesting compiler would be one that takes care of enforcing
:array bounds by using hardware means (e.g. a, b and c would each have
:their own so-called descriptors in an Intel architecture.)

Ah, like VAX.

It's been more than 20 years, so my memory is probably faulty, but I
seem to recall that DEC had a hard time porting [K&R] C to VAX while
still permitting type-casting. I think I heard that they effectively
ended up turning off all descriptors. Hmmm, when I heard about it then
I didn't think about that, but now I realize that it likely wasn't
really possible to turn off descriptors, since they were
hardware-level. So they perhaps ended up doing something like declaring
all of memory as one large array of bytes. Pretty much the same
solution as on Intel segmented architectures when the 286 or so came
out.
--
Sub-millibarn resolution bio-hyperdimensional plasmatic space
polyimaging is just around the corner. -- Corry Lee Smith

Nov 14 '05 #30

Chris Torek

>In article <ib********@spenarnc.xs4all.nl>,

Albert van der Horst <al****@spenarnc.xs4all.nl> wrote:
:An interesting compiler would be one that takes care of enforcing
:array bounds by using hardware means (e.g. a, b and c would each have
:their own so-called descriptors in an Intel architecture.)

In article <cu**********@canopus.cc.umanitoba.ca>
Walter Roberson <ro******@ibd.nrc-cnrc.gc.ca> wrote:

Ah, like VAX.

It's been more than 20 years, so my memory is probably faulty, but I
seem to recall that DEC had a hard time porting [K&R] C to VAX while
still permitting type-casting. I think I heard that they effectively
ended up turning off all descriptors. ...

The VAX was a pretty conventional machine. VMS used lots of
descriptors but they were not implemented in hardware. You could
be thinking of the INDEX instruction, perhaps; but it merely did
a range check and multiply, and was tremendously outperformed by
doing a separate range-check-and-multiply on any 11/780 with hardware
multiply, because INDEX always used a microcode loop instead of
using the hardware. (Oops.)

The old Burroughs A-series machines would have been much trickier.
--
In-Real-Life: Chris Torek, Wind River Systems
Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W) +1 801 277 2603
email: forget about it http://web.torek.net/torek/index.html
Reading email is like searching for food in the garbage, thanks to spammers.

Nov 14 '05 #31
