Hi,
I do need to implement something similar to C++'s std::bitset in C; for
this, I use an array of int's to get together any desired number of
bits, possibly larger than 32/64 or anything like this.
So of course it does not matter how many bits a single int has, but I do
need a reliable way to find it out.
I think of something like
CHAR_BITS*sizeof(int)
will do the trick, am I right here?
I'm just confused that it is *CHAR*_BITS; in reference to the usual
example, there are some machines where char's have 9 bitsbut is in
this case int required to have some multiple of 9 bits, too? I.e., does
sizeof(something) always give the size of this as multiples of
sizeof(char) or could such a 9 bit char be paired with 16/32 bit integers?
Thank you very much,
Daniel

Got two DearDanielInstant Messages
by MSN, associate ICQ with stressso
please use good, old EMAIL! 25 4199
Daniel Kraft said:
Hi,
I do need to implement something similar to C++'s std::bitset in C; for
this, I use an array of int's to get together any desired number of
bits, possibly larger than 32/64 or anything like this.
So of course it does not matter how many bits a single int has, but I do
need a reliable way to find it out.
I think of something like
CHAR_BITS*sizeof(int)
will do the trick, am I right here?
Well, you mean CHAR_BIT, but yes, that will give you the total number of
bits occupied by the int  including the sign bit, at least (but possibly
more than) 15 value bits, and at least (but possibly more than) no padding
bits.
I'm just confused that it is *CHAR*_BITS; in reference to the usual
example, there are some machines where char's have 9 bitsbut is in
this case int required to have some multiple of 9 bits, too?
Yes, CHAR_BIT gives the number of bits in a char, and a char is exactly one
byte wide, and every object must be a whole number of bytes wide. If
CHAR_BIT is 9, then objects must be a multiple of 9 bits wide.
The usual way to implement a "bit array", though, is as follows:
1) decide how many bits, B, you want your array to have (if you decide this
at runtime, you'll need to allocate the memory in step 2 dynamically,
check that you've got it, and release it when you're done);
2) allocate (B + CHAR_BIT  1) / CHAR_BIT bytes (unsigned char foo[N] = {0}
or unsigned char *foo = calloc((B + CHAR_BIT  1) / CHAR_BIT, 1),
initialising it all to 0 (you can use = {0} unless you allocate
dynamically, in which case use calloc  one of the rare occasions where
this is a good idea);
3) use macros to get, set, and test individual bits. http://www.snippets.org has some macros that can be used for this purpose.

Richard Heathfield <http://www.cpax.org.uk>
Email: http://www. +rjh@
Google users: <http://www.cpax.org.uk/prg/writings/googly.php>
"Usenet is a strange place"  dmr 29 July 1999
It is CHAR_BIT and not CHAR_BITS, ignoring that, if sizeof(anything)
== 2
we know that it's twice the size of a `char'.
Richard Heathfield <rj*@see.sig.invalidwrites:
Daniel Kraft said:
>I do need to implement something similar to C++'s std::bitset in C;
<snip>
The usual way to implement a "bit array", though, is as follows:
<snip>
2) allocate (B + CHAR_BIT  1) / CHAR_BIT bytes (unsigned char
foo[N] = {0}
<snip>
It is probably worth adding the reason one uses an unsigned integer
type is that shift operations are welldefined on these, and the
reason one uses unsigned char in particular is that it is guaranteed
not to have any padding bits.

Ben.
Ben Bacarisse said:
<snip>
It is probably worth adding the reason one uses an unsigned integer
type is that shift operations are welldefined on these, and the
reason one uses unsigned char in particular is that it is guaranteed
not to have any padding bits.
It is indeed worth adding. Thank you.

Richard Heathfield <http://www.cpax.org.uk>
Email: http://www. +rjh@
Google users: <http://www.cpax.org.uk/prg/writings/googly.php>
"Usenet is a strange place"  dmr 29 July 1999 vi*************@gmail.com wrote:
It is CHAR_BIT and not CHAR_BITS, ignoring that, if sizeof(anything)
== 2
we know that it's twice the size of a `char'.
Yes, but we do not know that it contains twice as many 'usable' bits.
E.g. one could have
CHAR_BIT == 9
sizeof(int) == 2
INT_MIN == 32767
INT_MAX == 32767
and it would be perfactly fine as far as the C standard is concerned.
In this case an int would occupy 18 bits, but 2 of those bits would
be padding and such an int would not be able to hold than a 16bit wide
int without padding would.

<Insert your favourite quote here.>
Erik Trulsson er******@student.uu.se
>>I do need to implement something similar to C++'s std::bitset in C;
<snip>
>The usual way to implement a "bit array", though, is as follows:
<snip>
>2) allocate (B + CHAR_BIT  1) / CHAR_BIT bytes (unsigned char foo[N] = {0}
<snip>
It is probably worth adding the reason one uses an unsigned integer
type is that shift operations are welldefined on these, and the
reason one uses unsigned char in particular is that it is guaranteed
not to have any padding bits.
Hm... Actually I was thinking of using int's, as the main operations in
my case are not accessing individual bits but rather doing binary and's
and or's on the whole bitsetI know, premature optimization... but it
seems to me to be much cleaner and probably "faster" to use ints for
this instead of chars (always unsigned, of course).
So is there a way to get the number of *usable* bits of a single
unsigned? Or should I use unsigned chars and do the binary operations
perchar instead of perint?
Cheers,
Daniel

Got two DearDanielInstant Messages
by MSN, associate ICQ with stressso
please use good, old EMAIL!
Daniel Kraft said:
<snip>
So is there a way to get the number of *usable* bits of a single
unsigned?
Sure. UINT_MAX has to be one less than a power of 2, so you can do this
(once only will do):
#include <limits.h>
int count_value_bits_in_unsigned_int(void)
{
int rv = 0;
unsigned int n = UINT_MAX;
while(n 0)
{
++rv;
n >>= 1;
}
return rv;
}
Or should I use unsigned chars and do the binary operations
perchar instead of perint?
Well, I would, but it's up to you.

Richard Heathfield <http://www.cpax.org.uk>
Email: http://www. +rjh@
Google users: <http://www.cpax.org.uk/prg/writings/googly.php>
"Usenet is a strange place"  dmr 29 July 1999
Richard Heathfield wrote:
>
Daniel Kraft said:
<snip>
So is there a way to get the number of *usable* bits of a single
unsigned?
Sure. UINT_MAX has to be one less than a power of 2,
so you can do this (once only will do):
#include <limits.h>
int count_value_bits_in_unsigned_int(void)
{
int rv = 0;
unsigned int n = UINT_MAX;
while(n 0)
{
++rv;
n >>= 1;
}
return rv;
}
rv should be type unsigned, instead of type int.
The number of value bits in type unsigned isn't
guaranteed to be representable as an int type value.

pete
pete wrote:
Richard Heathfield wrote:
>>int count_value_bits_in_unsigned_int(void) { int rv = 0;
<snip>
> return rv; }
rv should be type unsigned, instead of type int.
The number of value bits in type unsigned isn't
guaranteed to be representable as an int type value.
Can you name one architecture with more than 32767 value bit
in unsigned?
Peter Pichler wrote:
>
pete wrote:
The number of value bits in type unsigned isn't
guaranteed to be representable as an int type value.
Can you name one architecture with more than 32767 value bit
in unsigned?
I've never even seen a padding bit.

pete
Daniel Kraft <d@domob.euwrites:
[...]
Of course, I also thought about calculating this once at runtime
(which is easy and should not cost much), but of course a compiletime
constant is much betterwith C++ template metaprogramming this
shouldn't be hard to solve, but with the C preprocessor it could get a
bit trickier...
However, thanks for this nice macro! I think I can put it into
something I can use (because I do have some restrictions on which bits
are really usable for me).
Another possibility is to write a small C program that computes the
number of value bits and writes out a valid C header file that
declares it as a macro. You can then include this generated header
file in your program.
You'll have to have some mechanism for compiling and executing this
small program as part of your build procedure. Doing so is beyond the
scope of C, but shouldn't be difficult for a "make"like tool  *if*
you're compiling on the same system where you're going to run the
program. If you're crosscompiling, things get a bit trickier.
Or you can maintain it manually, but that's errorprone.

Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <* <http://users.sdsc.edu/~kst>
"We must do something. This is something. Therefore, we must do this."
 Antony Jay and Jonathan Lynn, "Yes Minister"
pete said:
Richard Heathfield wrote:
>> Daniel Kraft said:
<snip>
So is there a way to get the number of *usable* bits of a single
unsigned?
Sure. UINT_MAX has to be one less than a power of 2, so you can do this (once only will do):
#include <limits.h>
int count_value_bits_in_unsigned_int(void) { int rv = 0; unsigned int n = UINT_MAX; while(n 0) { ++rv; n >>= 1; } return rv; }
rv should be type unsigned, instead of type int.
The number of value bits in type unsigned isn't
guaranteed to be representable as an int type value.
We can deduce from the standard that it is representable as an int type
value. The int type uses the same amount of storage as the unsigned int
type, and has the same alignment requirements. Given that UINT_MAX must be
representable as an unsigned int, and given that it must be at least 65535
(at which point rv can be at most 16), and given that n grows very much
faster than log2(n), it follows that rv will grow very slowly.
Since int must be capable of storing the value 32767, my little function
will work on any system with unsigned ints containing 32767 value bits 
that is, even if there were a disparity between the number of the value
bits in the two types so great that INT_MAX were 32767 and UINT_MAX were 1
less than 2 to the power 32767 (a number so large I hesitate to post it
here!), my function would still cope.
And with each extra value bit in an int, you more than double the UINT_MAX
it can cope with.
An implementation on which my function did not work would not merely be
runofthemill clcstyle pathological. It would be psychopathological.

Richard Heathfield <http://www.cpax.org.uk>
Email: http://www. +rjh@
Google users: <http://www.cpax.org.uk/prg/writings/googly.php>
"Usenet is a strange place"  dmr 29 July 1999
pete:
#include <limits.h>
int count_value_bits_in_unsigned_int(void)
{
int rv = 0;
unsigned int n = UINT_MAX;
while(n 0)
{
++rv;
n >>= 1;
}
return rv;
}
rv should be type unsigned, instead of type int.
The number of value bits in type unsigned isn't
guaranteed to be representable as an int type value.
While we're making ridiculous critiques, the most blatantly obvious
flaw if you ask me is the use of a "while" instead of a "do".
As for signed integer types Vs unsigned integer types, I default to
unsigned rather than signed.
My own function would've been something like:
unsigned VRbitsInUInt(void)
{
unsigned vr = 16;
unsigned x = 0x8000u;
while (x <<= 1) ++vr;
return vr;
}
Or a more generic one (if we pretend we can pass a type):
template <typename UIntType>
unsigned VRbitsUIntType(void)
{
unsigned vr = 8;
UIntType x = 0x80u;
while (x <<= 1) ++vr;
return vr;
}
Martin
Richard Heathfield wrote:
>
pete said:
Richard Heathfield wrote:
>
Daniel Kraft said:
<snip>
So is there a way to get the number of *usable* bits of a single
unsigned?
Sure. UINT_MAX has to be one less than a power of 2,
so you can do this (once only will do):
#include <limits.h>
int count_value_bits_in_unsigned_int(void)
{
int rv = 0;
unsigned int n = UINT_MAX;
while(n 0)
{
++rv;
n >>= 1;
}
return rv;
}
rv should be type unsigned, instead of type int.
The number of value bits in type unsigned isn't
guaranteed to be representable as an int type value.
We can deduce from the standard
that it is representable as an int type value.
An implementation on which my function
did not work would not merely be
runofthemill clcstyle pathological.
It would be psychopathological.
You can't have it both ways.
If you can deduce from the standard that it
is representable as an int type value,
then there can't be an implementation,
psychopathological or otherwise,
where your function won't work.

pete
On Fri, 19 Oct 2007 17:54:00 0400, pete wrote:
Richard Heathfield wrote:
>> pete said:
Richard Heathfield wrote:
Daniel Kraft said:
<snip>
So is there a way to get the number of *usable* bits of a single
unsigned?
Sure. UINT_MAX has to be one less than a power of 2, so you can do this (once only will do):
#include <limits.h>
int count_value_bits_in_unsigned_int(void) { int rv = 0; unsigned int n = UINT_MAX; while(n 0) { ++rv; n >>= 1; } return rv; }
rv should be type unsigned, instead of type int. The number of value
bits in type unsigned isn't guaranteed to be representable as an int
type value.
We can deduce from the standard that it is representable as an int type value.
>An implementation on which my function did not work would not merely be runofthemill clcstyle pathological. It would be psychopathological.
You can't have it both ways.
If you can deduce from the standard that it is representable as an int
type value, then there can't be an implementation, psychopathological or
otherwise,
where your function won't work.
The standard doesn't guarantee that
int main(int argc, char *argv[])
{
return 0;
}
will work, since it declares two objects whose size is allowed to be
greater than 65535 bytes. This is already taking the environmental limits
to the extreme. A hosted implementation on which
UINT_MAX (1 << INT_MAX)  1
would require sizeof(int) to not only simply equal or exceed 2^16, but to
equal or exceed 2^32752. A freestanding implementation might be able to
get it done with a lower number of bytes, by using excessively large
bytes, but it would still require such an utterly insane implementation
that you're better off worrying about clowns breaking into your house,
doing a silly dance, and leaving with your toilet paper.
pete said:
Richard Heathfield wrote:
<snip>
>
>An implementation on which my function did not work would not merely be runofthemill clcstyle pathological. It would be psychopathological.
You can't have it both ways.
If you can deduce from the standard that it
is representable as an int type value,
then there can't be an implementation,
psychopathological or otherwise,
where your function won't work.
Yes, I think I wrote that one as I was going along, and forgot to go back
to edit out the claim that it is deducible. (It might yet be, but in fact
I failed to do it.)
Let's just say that I would certainly worry about the possibility of a
nonASCII character set, I have been known to worry about the number of
bits in a byte being greater than 8, I might worry about the endianness of
a machine being different to the endianness of data in a file on that
machine, and I could even conceivably worry about the result of
rightshifting a negative value  but I would not be even slightly
concerned about the possibility of an implementation using more than
INT_MAX value bits in an unsigned int.

Richard Heathfield <http://www.cpax.org.uk>
Email: http://www. +rjh@
Google users: <http://www.cpax.org.uk/prg/writings/googly.php>
"Usenet is a strange place"  dmr 29 July 1999
My question is probably offtopic, but can someone please explain me
what are padding bits in an unsigned integer? (and how can affect the
code)
One more question:
This code
#include <limits.h>
int count_value_bits_in_unsigned_int(void)
{
int rv = 0;
unsigned int n = UINT_MAX;
while(n 0)
{
++rv;
n >>= 1;
}
return rv;
}
should always give 32 on a 4 byte int system, is this correct?
Thanks for your time
DiAvOl said:
My question is probably offtopic, but can someone please explain me
what are padding bits in an unsigned integer?
They are bits that do not contribute to the value of the object, but which
are there for the convenience of the implementor (or, perhaps they are
inconveniently there, and are ignored for the convenience of the
implementor!).
Let's just say, for the sake of argument, that you wanted to write a C
compiler on a system with 8bit bytes, and that you wanted to give a
*guarantee* to your users that any legal pointer could be converted to a
legal unsigned int and vice versa  but you only had a 28bit address
space, i.e. 256 MB of RAM. It would be sensible to keep your byte size at
the natural size for the machine, 8 bits, and it's clear you're going to
need four bytes for an unsigned int on this system. But if *any* unsigned
int value translates to a valid pointer value and you only have 2^28
addresses, then it's clear that you're not going to be able to use the top
four bits of your unsigned int. So you'd probably define UINT_MAX as
268435455 and ignore the top four bits completely. They'd still be there,
of course, but they would not contribute to the value of the unsigned int.
In fact, you'd probably have to mask them out during calcs, just in case
some bright spark managed to set them anyway.
One more question:
This code
#include <limits.h>
int count_value_bits_in_unsigned_int(void)
{
int rv = 0;
unsigned int n = UINT_MAX;
while(n 0)
{
++rv;
n >>= 1;
}
return rv;
}
should always give 32 on a 4 byte int system, is this correct?
No, it isn't correct. Consider a "4 byte int" system with bytes that are
117 bits in size, and no padding bits. On such a system, the function will
return 468.

Richard Heathfield <http://www.cpax.org.uk>
Email: http://www. +rjh@
Google users: <http://www.cpax.org.uk/prg/writings/googly.php>
"Usenet is a strange place"  dmr 29 July 1999
Let's just say, for the sake of argument, that you wanted to write a C
compiler on a system with 8bit bytes, and that you wanted to give a
*guarantee* to your users that any legal pointer could be converted to a
legal unsigned int and vice versa  but you only had a 28bit address
space, i.e. 256 MB of RAM. It would be sensible to keep your byte size at
the natural size for the machine, 8 bits, and it's clear you're going to
need four bytes for an unsigned int on this system. But if *any* unsigned
int value translates to a valid pointer value and you only have 2^28
addresses, then it's clear that you're not going to be able to use the top
four bits of your unsigned int. So you'd probably define UINT_MAX as
268435455 and ignore the top four bits completely. They'd still be there,
of course, but they would not contribute to the value of the unsigned int.
In fact, you'd probably have to mask them out during calcs, just in case
some bright spark managed to set them anyway.
What values would those 4 top "padded bits" contain? 0 or 1?
I'm trying to figure out if padding bits affects the code in any way,
for example what if the user wanted to use those 4 top bits for
bitwise operations?
Can someone give an example how would the code be affected when
running on a system without padding bit ints and when running on one
with padding bit ?
Thanks for the above example & sorry for my bad english.
DiAvOl said:
<snip>
>So you'd probably define UINT_MAX as 268435455 and ignore the top four bits completely. They'd still be there, of course, but they would not contribute to the value of the unsigned int. In fact, you'd probably have to mask them out during calcs, just in case some bright spark managed to set them anyway.
What values would those 4 top "padded bits" contain? 0 or 1?
One or other of those, yes.
I'm trying to figure out if padding bits affects the code in any way,
for example what if the user wanted to use those 4 top bits for
bitwise operations?
The user could actually do this, of course, simply by accessing the object
representation via type punning:
unsigned int n = some_value;
unsigned char *p = (unsigned char *)&n;
/* can now write to n's object representation via p[0] through p[(sizeof n)
 1] */
If the user chooses to bitshift via a series of p[i] hacks, he could, and
the padding bits would then muddy the waters. But if the user instead
shifted the unsigned int itself, well, that's a valuebased operation, so
the padding bits would have no effect on the value of the shifted unsigned
int (and the implementation would have to ensure that this was the case).

Richard Heathfield <http://www.cpax.org.uk>
Email: http://www. +rjh@
Google users: <http://www.cpax.org.uk/prg/writings/googly.php>
"Usenet is a strange place"  dmr 29 July 1999
On Thu, 18 Oct 2007 18:44:05 0400, pete wrote:
Richard Heathfield wrote:
>int count_value_bits_in_unsigned_int(void) { int rv = 0; unsigned int n = UINT_MAX; while(n 0) { ++rv; n >>= 1; } return rv; }
rv should be type unsigned, instead of type int.
The number of value bits in type unsigned isn't
guaranteed to be representable as an int type value.
That'd require one unsigned int to occupy at least 4 KB

Army1987 (Replace "NOSPAM" with "email")
A hamburger is better than nothing.
Nothing is better than eternal happiness.
Therefore, a hamburger is better than eternal happiness.
On Thu, 18 Oct 2007 18:44:05 0400, pete wrote:
The number of value bits in type unsigned isn't
guaranteed to be representable as an int type value.
Finding the greatest lower bound for the fraction of padding bits
among the bits of int on any implementation where log2(UINT_MAX+1.0)
is greater than INT_MAX is left as an exercise to the reader.
Now an implementation which wastes more than 99.5% (Hint: unless
I had something wrong, this is strictly less than the greatest
lower bound) of the bits in an int is an implementation I would
rather never have *anything* to do with (including "being to
within one lightyear of each other").

Army1987 (Replace "NOSPAM" with "email")
A hamburger is better than nothing.
Nothing is better than eternal happiness.
Therefore, a hamburger is better than eternal happiness.
"Army1987" <ar******@NOSPAM.ita écrit dans le message de news: pa****************************@NOSPAM.it...
On Thu, 18 Oct 2007 18:44:05 0400, pete wrote:
>Richard Heathfield wrote:
>>int count_value_bits_in_unsigned_int(void) { int rv = 0; unsigned int n = UINT_MAX; while(n 0) { ++rv; n >>= 1; } return rv; }
rv should be type unsigned, instead of type int. The number of value bits in type unsigned isn't guaranteed to be representable as an int type value.
That'd require one unsigned int to occupy at least 4 KB
And the corresponding int representation to have at least 16368 padding bits
;)

Chqrlie.
On Sat, 20 Oct 2007 15:11:26 +0200, Charlie Gordon wrote:
"Army1987" <ar******@NOSPAM.ita Ã©crit dans le message de news: pa****************************@NOSPAM.it...
>On Thu, 18 Oct 2007 18:44:05 0400, pete wrote:
>>rv should be type unsigned, instead of type int. The number of value bits in type unsigned isn't guaranteed to be representable as an int type value.
That'd require one unsigned int to occupy at least 4 KB
And the corresponding int representation to have at least 16368 padding bits
;)
Have you found a way for such an implementation to have less than
32752 padding bits in int?
(Yes, "at least 16368" is still true if the greatest lower bound
is 32752, but I wander if you found that that lower bound is
wrong.)

Army1987 (Replace "NOSPAM" with "email")
A hamburger is better than nothing.
Nothing is better than eternal happiness.
Therefore, a hamburger is better than eternal happiness.
"Army1987" <ar******@NOSPAM.ita écrit dans le message de news: pa****************************@NOSPAM.it...
On Sat, 20 Oct 2007 15:11:26 +0200, Charlie Gordon wrote:
>"Army1987" <ar******@NOSPAM.ita écrit dans le message de news: pa****************************@NOSPAM.it...
>>On Thu, 18 Oct 2007 18:44:05 0400, pete wrote: rv should be type unsigned, instead of type int. The number of value bits in type unsigned isn't guaranteed to be representable as an int type value. That'd require one unsigned int to occupy at least 4 KB
And the corresponding int representation to have at least 16368 padding bits ;)
Have you found a way for such an implementation to have less than
32752 padding bits in int?
(Yes, "at least 16368" is still true if the greatest lower bound
is 32752, but I wander if you found that that lower bound is
wrong.)
No, your lower bound seems good.

Chqrlie. This discussion thread is closed Replies have been disabled for this discussion. Similar topics
36 posts
views
Thread by Dag 
last post: by

4 posts
views
Thread by Aaron W. West 
last post: by

12 posts
views
Thread by jose luis fernandez diaz 
last post: by

19 posts
views
Thread by alain dhaene 
last post: by

6 posts
views
Thread by arunaling 
last post: by

22 posts
views
Thread by subramanian100in 
last post: by

13 posts
views
Thread by arnuld 
last post: by

4 posts
views
Thread by mattG 
last post: by

275 posts
views
Thread by Astley Le Jasper 
last post: by
          