
Size of char on a 64-bit machine

How is a character stored in a word-aligned machine? Assuming that on a 64-bit
machine 1 byte is reserved for a char, is it the case that only 1
byte is used to store the character and the remaining 7 bytes are wasted,
or is my assumption wrong? If my assumption is right, what are the
performance issues in retrieving the value of a character variable compared
with other data types like integer, double or float?
Nov 14 '05 #1
aruna wrote:
How is a character stored in a word-aligned machine?
Depends on the machine.
Assuming that on a 64-bit
machine 1 byte is reserved for a char, is it the case that only 1
byte is used to store the character and the remaining 7 bytes are wasted,
or is my assumption wrong?
Your assumption is probably wrong, but it depends on the machine.
If my assumption is right, what are the
performance issues in retrieving the value of a character variable compared
with other data types like integer, double or float?


That depends on the machine.

Essentially, usually, but not necessarily: Characters in an array are
usually stored sequentially without any padding in between. There can
be no padding in unsigned char.

These are not questions about C, but about C implementations.

--
Thomas.

Nov 14 '05 #2
aruna wrote:

How is a character stored in a word-aligned machine? Assuming that on a 64-bit
machine 1 byte is reserved for a char, is it the case that only 1
byte is used to store the character and the remaining 7 bytes are wasted,
or is my assumption wrong? If my assumption is right, what are the
performance issues in retrieving the value of a character variable compared
with other data types like integer, double or float?


These questions cannot be answered by the C language
itself, but only by the particular implementations of it.
Different implementations will do things differently, and
their different choices will lead to different answers.

--
Er*********@sun.com
Nov 14 '05 #3
"aruna" <ar********@yahoo.co.in> wrote in message
news:a2**************************@posting.google.com...
How is a character stored in a word-aligned machine?
You mean a machine where all variables, regardless of size, must be stored
with 8-byte alignment? I'm not aware of such a machine, but if one exists
your assumption would be correct.

The usual case is that each variable must be aligned to its own size: a
char requires only 1-byte alignment, but a 32-bit int requires 4-byte
alignment. Depending on how your variables end up laid out in memory,
several of them may end up in the same 64-bit word.
Assuming that on a 64-bit
machine 1 byte is reserved for a char, is it the case that only 1
byte is used to store the character and the remaining 7 bytes are wasted,
or is my assumption wrong? If my assumption is right, what are the
performance issues in retrieving the value of a character variable compared
with other data types like integer, double or float?


Many modern processors have the same latency for any load that fits within a
single cache line, regardless of its size. However, there is a large performance
benefit in having multiple variables in each cache line, because it reduces
cache misses.

S

--
Stephen Sprunk "Stupid people surround themselves with smart
CCIE #3723 people. Smart people surround themselves with
K5SSS smart people who disagree with them." --Aaron Sorkin

Nov 14 '05 #4
"Stephen Sprunk" <st*****@sprunk.org> writes:
"aruna" <ar********@yahoo.co.in> wrote in message
news:a2**************************@posting.google.com...
How is a character stored in a word-aligned machine?


You mean a machine where all variables, regardless of size, must be stored
with 8-byte alignment? I'm not aware of such a machine [...]


This description seems consistent with descriptions I've seen of
the Cray architecture, but I've never used a Cray and don't
know any of the details.
--
Ben Pfaff
email: bl*@cs.stanford.edu
web: http://benpfaff.org
Nov 14 '05 #5
Thomas Stegen <ts*****@cis.strath.ac.uk> writes:
[...]
Essentially, usually, but not necessarily: Characters in an array are
usually stored sequentially without any padding in between. There can
be no padding in unsigned char.


Actually, that's required by the standard. There can be padding
between members of a structure, but not between elements of an array.
Given an array object arr, the number of elements can be computed by
sizeof arr / sizeof arr[0]
Padding would break that.

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <*> <http://users.sdsc.edu/~kst>
Schroedinger does Shakespeare: "To be *and* not to be"
Nov 14 '05 #6
ar********@yahoo.co.in (aruna) writes:
How is a character stored in a word-aligned machine? Assuming that on a 64-bit
machine 1 byte is reserved for a char, is it the case that only 1
byte is used to store the character and the remaining 7 bytes are wasted,
or is my assumption wrong? If my assumption is right, what are the
performance issues in retrieving the value of a character variable compared
with other data types like integer, double or float?


The C standard doesn't say much about alignment or padding, beyond
allowing it to exist.

The required alignment for a given type can be no greater than the
size of the type. If an implementation supports 8-bit chars, it must
be able to represent an array of char by storing one char in each
byte. On the other hand, the implementation can add padding after a
standalone object or struct member.

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <*> <http://users.sdsc.edu/~kst>
Schroedinger does Shakespeare: "To be *and* not to be"
Nov 14 '05 #7
ar********@yahoo.co.in (aruna) wrote in message news:<a2**************************@posting.google.com>...
How is a character stored in a word-aligned machine?
Within a single byte.
Assuming that on a 64-bit machine 1 byte is reserved for a char,
No need for the assumption. The C language defines what a byte is
(within the context of the language), and all character types are 1
byte in size on a conforming implementation. [CHAR_BIT (the number of
bits in a byte) varies between implementations, but is at least 8.]
is it the case that only 1 byte is used to store the character and the remaining
7 bytes are wasted,
From the programmer's perspective, the size of an object is its 'sizeof'.
An array of N elements, each of size T, occupies N*T bytes. Structures
can have padding bytes, so a char member followed by a 64-bit int may
well 'waste' 7 padding bytes for the purposes of alignment, but this is
nothing new, and padding is not limited to 64-bit machines.
or is my assumption wrong? If my assumption is right, what are the
performance issues in retrieving the value of a character variable compared
with other data types like integer, double or float?


Depends on the implementation. I believe the old Crays (at least) used
64-bit words and had no direct octet addressing. I also believe the
implementors of C compilers mimicked 8-bit bytes by storing the 0..7
octet offset within a word in the high (unused) 3 bits of address
pointers.

Naturally, this would come at an efficiency cost for character
manipulation, but the alternative of trying to create a hosted
implementation (and subsequent programs) where UCHAR_MAX > INT_MAX
wasn't desirable. [Although I think it was due largely to memory
issues in days of yore. Today, there are certainly 32-bit
implementations where characters are 32-bit.]

All that said, the internal specifics were effectively hidden from
the programmers by the implementors.

If you want more detail on implementation specifics, this is not the
right forum, as clc (comp.lang.c) deals with the _virtual_ C machine.

--
Peter
Nov 14 '05 #8
Ben Pfaff <bl*@cs.stanford.edu> writes:
"Stephen Sprunk" <st*****@sprunk.org> writes:
"aruna" <ar********@yahoo.co.in> wrote in message
news:a2**************************@posting.google.com...
How is a character stored in a word-aligned machine?


You mean a machine where all variables, regardless of size, must be stored
with 8-byte alignment? I'm not aware of such a machine [...]


This description seems consistent with descriptions I've seen of
the Cray architecture, but I've never used a Cray and don't
know any of the details.


On a Cray vector machine, like the SV1, there are no machine-level
instructions to access quantities smaller than 64 bits, but the C
compiler uses CHAR_BIT==8 for compatibility with other systems. It
makes accessing character data less efficient, but that's not really
what the machine is for.

Based on the results of a couple of small test programs, standalone
variables of type char are stored on word boundaries, but struct
members and array elements of type char are packed into 8-bit bytes.
I'm not sure why standalone variables are word-aligned. As far as I
know, accessing an 8-bit quantity on a word boundary is no cheaper
than accessing an 8-bit quantity in the middle of a word. I was
thinking that for some operations (such as when the value is promoted
to int) it could just grab the entire word, but it's a big-endian
machine, so that wouldn't work (storing 0xff in a char object and then
accessing the word containing it yields 0xff00000000000000).

In any case, the semantics of arrays are such that an implementation
cannot *require* an alignment boundary larger than the size of a type
(there can be no gaps between array elements), but it can use a larger
alignment if it's convenient (or even if the compiler writer was in an
odd mood that day).

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <*> <http://users.sdsc.edu/~kst>
Schroedinger does Shakespeare: "To be *and* not to be"
Nov 14 '05 #9
In article <2a******************************@news.teranews.com> "Stephen Sprunk" <st*****@sprunk.org> writes:
"aruna" <ar********@yahoo.co.in> wrote in message
news:a2**************************@posting.google.com...
How is a character stored in a word-aligned machine?


You mean a machine where all variables, regardless of size, must be stored
with 8-byte alignment? I'm not aware of such a machine, but if one exists
your assumption would be correct.


The Cray-1 through the Y-MP.
Assuming that on a 64-bit
machine 1 byte is reserved for a char, is it the case that only 1
byte is used to store the character and the remaining 7 bytes are wasted,
or is my assumption wrong? If my assumption is right, what are the
performance issues in retrieving the value of a character variable compared
with other data types like integer, double or float?


Many modern processors have the same latency for any load that fits within a
single cache line, regardless of its size. However, there is a large performance
benefit in having multiple variables in each cache line, because it reduces
cache misses.


As those Crays do not have caches, this is irrelevant for them.

As a standalone variable, a char is stored in a whole (64-bit) word; in
an array, 8 chars are packed into a word. There are some performance
issues for arrays, but not as many as you would think. For variables the
load time is independent of the type (except that loading/storing a
128-bit double takes one cycle more, but all operations on those things
are in software).
--
dik t. winter, cwi, kruislaan 413, 1098 sj amsterdam, nederland, +31205924131
home: bovenover 215, 1025 jn amsterdam, nederland; http://www.cwi.nl/~dik/
Nov 14 '05 #10
In article <63**************************@posting.google.com> ai***@acay.com.au (Peter Nilsson) writes:
....
Depends on the implementation. I believe the old Crays (at least) used
64-bit words and had no direct octet addressing. I also believe the
implementors of C compilers mimicked 8-bit bytes by storing the 0..7
octet offset within a word in the high (unused) 3 bits of address
pointers.
Actually in the upper 16 bits (there was a reason for that...).
Naturally, this would come at an efficiency cost for character
manipulation, but the alternative of trying to create a hosted
implementation (and subsequent programs) where UCHAR_MAX > INT_MAX
wasn't desirable. [Although I think it was due largely to memory
issues in days of yore. Today, there are certainly 32-bit
implementations where characters are 32-bit.]
I think not. Cray had quite some experience with handling characters
on those machines. The compilers (for instance) were *extremely*
fast. The last time I checked, a Fortran routine of 1200 lines
(with nearly no comments) compiled with full optimisation in a few
milliseconds.
All that said, the internal specifics were effectively hidden from
the programmers by the implementors.


As long as you followed the standard. Casting a char pointer to an
int pointer and back would in many cases change the pointer. Assuming
that the low-order bit of a long* would be 0 would result in problems
(seen in the Bourne shell and its derivatives). All kinds of behaviour
that is undefined according to the standard would indeed behave
differently on that machine.
--
dik t. winter, cwi, kruislaan 413, 1098 sj amsterdam, nederland, +31205924131
home: bovenover 215, 1025 jn amsterdam, nederland; http://www.cwi.nl/~dik/
Nov 14 '05 #11
In <a2**************************@posting.google.com> ar********@yahoo.co.in (aruna) writes:
How is a character stored in a word-aligned machine? Assuming that on a 64-bit
machine 1 byte is reserved for a char, is it the case that only 1
byte is used to store the character and the remaining 7 bytes are wasted,
or is my assumption wrong?
It is wrong, due to the special properties of the type unsigned char: it
can be used to examine the representation of any other type. Therefore,
this type cannot, by definition, have "wasted" bits (they are called
padding bits in the C99 standard).

So the possible sizes of char on a 64-bit machine are 8, 16, 32 and 64 bits.
If char is smaller than 64 bits, sizeof applied to a 64-bit word yields more
than 1, and multiple chars can be stored in a word (the word can be aliased
with an array of char).

There is only one known architecture with 64-bit word addressing (no
octet-based addressing) where C was implemented: the Cray vector
processor used in the old Cray supercomputers. char is an 8-bit type
on that particular platform.
If my assumption is right, what are the
performance issues in retrieving the value of a character variable compared
with other data types like integer, double or float?


Because the machine uses word addressing, char pointers need to store more
data than all other pointers (the address or position of the byte inside
the word). There are two ways of storing this additional information:
in the low bits, which optimises char pointer arithmetic, but requires
additional operations when the pointer is dereferenced, or in the upper
bits, which simplifies pointer dereferencing (the higher bits are
ignored, as the address space is only 48-bit) but complicates char
pointer arithmetic. I believe both ways have been used in different
implementations. Either way, after retrieving the word containing the
char, the char itself has to be extracted from the word, and this takes
some additional shifting and masking, so char access is slower. Not
much of a problem in practice, as these machines were not intended for
intensive character manipulation, but for number crunching.

The other, more common, 64-bit architectures use octet-based addressing
and things are no different from the more common 32-bit architectures.

Dan
--
Dan Pop
DESY Zeuthen, RZ group
Email: Da*****@ifh.de
Nov 14 '05 #12
