size and nomenclature of integral types

One problem I've been wrestling with for a long time is how to use the
C++ integral data types, vis-a-vis their size. The C++ rules
guarantee that a char is at least 1 byte, a short and an int at least
2 bytes, and a long at least 4 bytes. The rules also define a size
precedence among the types. Stroustrup's book says that all type
sizes are multiples of the size of a char, "so by definition the size
of a char is 1." According to the rules, that means 1 unit, which can
be any number of bytes.

Why should I trust the compiler to optimize the memory usage of the
types behind my back? As for portability, wouldn't fixed,
unambiguously sized types be much more portable? Doesn't the ambiguity
open the door for me, on my system X with compiler Y, to rely on its
Z-byte representation of int? And if system-dependent optimization is
desired, wouldn't it be easier to do with fixed-size types instead?

One of my gripes is that the terminology is unclear. New programmers
can be especially confused. For example, 'short' and 'long' are
relative adjectives; they don't say how big the types are, or even
how big they must at least be. At the other extreme are names like
__int8, __int16, and __int32 in MSVC++. Wouldn't I be much less
likely to use something called __int8 to count numbers over 255 than
something called char? On the other hand, these keywords fix the size
of the type and allow no room for the compiler to optimize.

If I could invent new types, I would name them something like:

uint8, uint16, uint24, uintN, ... (unsigned integer types)
sint8, sint16, sint24, sintN, ... (signed integer types)

where N is any multiple of 8 greater than 0 (i.e. arbitrary-precision
types would be built in). I feel the signed/unsigned aspect is better
made part of the type name, not kept separate and optional. The
Mozilla sources are instructive in that their cross-platform code
implements macros following a similar convention; but macros are like
pollution.

I'd further have a new keyword like "allowopt", which when placed
after the type keyword grants the compiler permission to optimize the
memory allocation of the type. For example, when I write "uint16
allowopt myCounter;", I would unambiguously be declaring, "Give me a
16-bit unsigned integer called myCounter whose size the compiler may
optimize."

In most compilers, the default setting would be to enable optimization
for all the declarations, and a pragma could turn it off. I have
suspicions about why things are the way they are, but I'd like to hear
the experts' opinions.

Jul 22 '05 #1
Shailesh wrote:
> One problem I've been wrestling with for a long time is how to use
> the C++ integral data types, vis-a-vis their size. The C++ rules
> guarantee that a char is at least 1 byte, a short and an int at
> least 2 bytes, and a long at least 4 bytes. The rules also define a
> size precedence among the types. Stroustrup's book says that all
> type sizes are multiples of the size of a char, "so by definition
> the size of a char is 1." According to the rules, that means 1 unit,
> which can be any number of bytes.
>
> Why should I trust the compiler to optimize the memory usage of the
> types behind my back?

That question can have one of two meanings IMO.

Q1: How do I know the compiler won't introduce an error in its optimisation?
A: You don't. You'll just have to trust its optimisations or disable them.

Q2: How do I know the compiler is coming up with the best, most
optimised code?
A: You don't. If you want assurance, write in assembly code (with many
years experience behind you).

Trivial low-level optimisations like these have a minuscule impact
compared to algorithmic optimisations.

> As for portability, wouldn't fixed, unambiguously sized types be
> much more portable? Doesn't the ambiguity open the door for me, on
> my system X with compiler Y, to rely on its Z-byte representation of
> int? And if system-dependent optimization is desired, wouldn't it be
> easier to do with fixed-size types instead?


The C++ standard specifies the minimum ranges that integral types must
support. If you consider these to be your maximum ranges, then your
code will definitely be portable in that respect. (Note, though, that
the C standard library provides ways of getting the exact ranges.)

In any case, IMHO you should try to avoid thinking in sizes in bits
and bytes when programming in C++, and instead think in the
higher-level terms of ranges.
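
As a minimal sketch of that advice (my illustration, not from the post
above): <climits> and std::numeric_limits expose the exact ranges an
implementation provides, so code can reason about ranges rather than
byte counts.

#include <climits>   // CHAR_BIT, INT_MAX, ...
#include <limits>    // std::numeric_limits
#include <iostream>

int main()
{
    // Exact range of int on this implementation.
    std::cout << "int holds " << std::numeric_limits<int>::min()
              << " .. " << std::numeric_limits<int>::max() << '\n';

    // Bits per byte; need not be 8.
    std::cout << "bits per byte: " << CHAR_BIT << '\n';

    // Portable code asks "can this type hold my range?" instead of
    // assuming a size; long must hold at least +/- 2147483647.
    long population = 2000000000L;
    std::cout << population << '\n';
    return 0;
}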

--
Ben Measures
Software programming, Internet design/programming, Gaming freak.

http://ben.measures.org.uk - when I find time
Jul 22 '05 #2

"Shailesh" <hu******@hotmail.com> wrote in message
news:CN********************@fe3.columbus.rr.com...
> One problem I've been wrestling with for a long time is how to use
> the C++ integral data types, vis-a-vis their size.


Try using the standard C (not C++) header:
#include <stdint.h>
It came on board with C99.

This page shows the names it provides:
http://www.dinkumware.com/manuals/re...&h=stdint.html

My experience is that this header can be used in C++ programs with no
problems in GCC-based compilers.
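
A small, hedged sketch of what that looks like in practice (the
variable names are illustrative only; whether <stdint.h> is visible to
a given C++ compiler depends on the implementation, as noted above):

#include <stdint.h>   // C99 exact-width and least-width integer types
#include <stdio.h>

int main()
{
    uint8_t  flags   = 0xFF;        // exactly 8 bits, if the platform has them
    int32_t  balance = -123456789;  // exactly 32 bits, two's complement
    uint_least16_t counter = 42;    // at least 16 bits, always available

    printf("%u %d %u\n", (unsigned)flags, (int)balance, (unsigned)counter);
    return 0;
}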
Jul 22 '05 #4

"Shailesh" <hu******@hotmail.com> wrote in message
news:CN********************@fe3.columbus.rr.com...
One problem I've been wrestling with for a long time is how to use the
C++ integral data types, vis-a-vis their size.


Try using the standard c (not c++) header:
#include <stdint.h>
it came on board with c99.

This shows the names it allows.
http://www.dinkumware.com/manuals/re...&h=stdint.html

My experience is that this header can be used in c++ programs with no
problems in gcc-based compilers.
Jul 22 '05 #5
On Sat, 03 Apr 2004 08:42:10 GMT, Shailesh <hu******@hotmail.com>
wrote in comp.lang.c++:
> One problem I've been wrestling with for a long time is how to use
> the C++ integral data types, vis-a-vis their size. The C++ rules
> guarantee that a char is at least 1 byte, a short and an int at
> least 2 bytes, and a long at least 4 bytes.

Your statement above is completely incorrect, because you are making
the common mistake of confusing the word "byte" with the word "octet".

In C and C++, the word "byte" is by definition the size of a
character, and is at least 8 bits in width but may be wider. A char
does not contain "at least 1 byte"; it contains exactly one byte,
although that byte may be larger than one octet. A byte in C and C++
may have more than 8 bits.

C++ does not guarantee that short and int are at least two bytes,
although they must be at least two octets. Likewise with long.

There are architectures where char contains more than 8 bits, mostly
digital signal processors. On one such DSP, the minimum addressable
unit is 16 bits; that is, one byte has 16 bits. The char, short, and
int types all contain 16 bits and their sizeof is 1. Another DSP only
addresses 32-bit quantities: all of the integer types, from char
through long, are 32 bits, and all are exactly one byte.
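
A small sketch of that point: sizeof reports sizes in C++ bytes (units
of char), and CHAR_BIT reports how many bits such a byte has on the
implementation at hand. On the DSPs described above, the same program
would print 1 for several of these types.

#include <climits>   // CHAR_BIT
#include <iostream>

int main()
{
    std::cout << "bits per byte: " << CHAR_BIT      << '\n'; // at least 8
    std::cout << "sizeof(char):  " << sizeof(char)  << '\n'; // 1 by definition
    std::cout << "sizeof(short): " << sizeof(short) << '\n'; // may be 1 if CHAR_BIT >= 16
    std::cout << "sizeof(int):   " << sizeof(int)   << '\n';
    std::cout << "sizeof(long):  " << sizeof(long)  << '\n';
    return 0;
}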

> The rules also define a size precedence among the types.
> Stroustrup's book says that all type sizes are multiples of the size
> of a char, "so by definition the size of a char is 1." According to
> the rules, that means 1 unit, which can be any number of bytes.

No, the size of a char in C or C++ is exactly one byte, no matter how
many bits it contains. You are still making the common mistake of
assuming that the term "byte" means exactly 8 bits, which it most
certainly does not. Especially in C and C++, where by definition it
does not.

> Why should I trust the compiler to optimize the memory usage of the
> types behind my back? As for portability, wouldn't fixed,
> unambiguously sized types be much more portable? Doesn't the
> ambiguity open the door for me, on my system X with compiler Y, to
> rely on its Z-byte representation of int? And if system-dependent
> optimization is desired, wouldn't it be easier to do with fixed-size
> types instead?

As others have already pointed out, C++ does not specify the size of
any type in bytes, except for the character types. What it does
specify is the range of values each type must be able to hold. If you
stay within the minimum range of values for a type, your code will be
portable to all implementations.

> One of my gripes is that the terminology is unclear. New programmers
> can be especially confused. For example, 'short' and 'long' are
> relative adjectives; they don't say how big the types are, or even
> how big they must at least be. At the other extreme are names like
> __int8, __int16, and __int32 in MSVC++. Wouldn't I be much less
> likely to use something called __int8 to count numbers over 255 than
> something called char? On the other hand, these keywords fix the
> size of the type and allow no room for the compiler to optimize.
>
> If I could invent new types, I would name them something like:
>
> uint8, uint16, uint24, uintN, ... (unsigned integer types)
> sint8, sint16, sint24, sintN, ... (signed integer types)

As already pointed out, C's <stdint.h> provides this for hardware
platforms that actually support these exact sizes. The <stdint.h> for
the 16-bit DSP I am currently working with does not typedef the int8_t
and uint8_t types because they do not exist on the hardware.
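
As a hedged illustration of that last point, portable code can test at
compile time whether the optional exact-width types exist, because the
header defines the matching limit macros only when the type is
provided (on C99-era C++ compilers you may also need to define
__STDC_LIMIT_MACROS before the include):

#include <stdint.h>

#ifdef INT8_MAX
/* int8_t exists on this implementation. */
typedef int8_t small_int;
#else
/* No exact 8-bit type (e.g. a 16-bit DSP); fall back to the
   least-width type, which is always required to exist. */
typedef int_least8_t small_int;
#endif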

> where N is any multiple of 8 greater than 0 (i.e. arbitrary-precision
> types would be built in). I feel the signed/unsigned aspect is
> better made part of the type name, not kept separate and optional.
> The Mozilla sources are instructive in that their cross-platform
> code implements macros following a similar convention; but macros
> are like pollution.
>
> I'd further have a new keyword like "allowopt", which when placed
> after the type keyword grants the compiler permission to optimize
> the memory allocation of the type. For example, when I write "uint16
> allowopt myCounter;", I would unambiguously be declaring, "Give me a
> 16-bit unsigned integer called myCounter whose size the compiler may
> optimize."

> In most compilers, the default setting would be to enable
> optimization for all the declarations, and a pragma could turn it
> off. I have suspicions about why things are the way they are, but
> I'd like to hear the experts' opinions.


You really need to do a net search for stdint.h, or get and read a
copy of the current C standard. All implementations are required to
provide typedefs for types that eliminate the need for the rather
clumsy concept of a keyword like "allowopt".

You can choose required types that are either the smallest or fastest
to hold at least 8, 16, 32, and 64 bits, and an implementation is free
to provide other widths if it supports them.
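
A minimal sketch of those required typedefs (the names come from
<stdint.h> itself): the "least" types are the smallest with at least N
bits, while the "fast" types are whatever the implementation considers
quickest with at least N bits.

#include <stdint.h>
#include <stdio.h>

int main()
{
    int_least16_t smallest = 30000;  /* smallest type with >= 16 bits */
    int_fast16_t  quickest = 30000;  /* fastest type with >= 16 bits  */

    printf("least16: %u bytes, fast16: %u bytes\n",
           (unsigned)sizeof smallest, (unsigned)sizeof quickest);
    return 0;
}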

The features of C's <stdint.h> will almost certainly be included in
the next revision of the C++ standard.

--
Jack Klein
Home: http://JK-Technology.Com
FAQs for
comp.lang.c http://www.eskimo.com/~scs/C-faq/top.html
comp.lang.c++ http://www.parashift.com/c++-faq-lite/
alt.comp.lang.learn.c-c++
http://www.contrib.andrew.cmu.edu/~a...FAQ-acllc.html
Jul 22 '05 #6
Jack Klein wrote:
> As others have already pointed out, C++ does not specify the size of
> any type in bytes, except for the character types. What it does
> specify is the range of values each type must be able to hold. If
> you stay within the minimum range of values for a type, your code
> will be portable to all implementations.


You're right that I had no idea a byte could be other than 8 bits. I
also hadn't heard of the stdint.h header. I browsed it online, and it
includes exactly the kinds of things I was looking for. I feel that a
standard header like this would be far better than rolling one's own
fixed-size types. Thank you for pointing it out. The fixed-width
types are very helpful for tightly controlling data representation in
files, memory, and network traffic. On the other hand, with so many
flavors of CPU around, I can see how size is less meaningful in that
context.
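
As a hedged sketch of that use case (controlling on-disk or on-wire
layout), fixed-width types plus an explicit byte order keep the
representation identical on every platform that provides them; the
helper name below is made up for the example:

#include <stdint.h>
#include <stdio.h>

/* Write a 32-bit value in big-endian (network) order, one octet at a
   time, so the file layout never depends on the host CPU. */
static void write_u32_be(FILE* out, uint32_t value)
{
    unsigned char bytes[4];
    bytes[0] = (unsigned char)((value >> 24) & 0xFF);
    bytes[1] = (unsigned char)((value >> 16) & 0xFF);
    bytes[2] = (unsigned char)((value >>  8) & 0xFF);
    bytes[3] = (unsigned char)( value        & 0xFF);
    fwrite(bytes, 1, 4, out);
}

int main()
{
    FILE* out = fopen("record.bin", "wb");
    if (!out) return 1;
    write_u32_be(out, 123456789u);   /* the same 4 octets everywhere */
    fclose(out);
    return 0;
}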
Jul 22 '05 #8
