By using this site, you agree to our updated Privacy Policy and our Terms of Use. Manage your Cookies Settings.
425,855 Members | 865 Online
Bytes IT Community
+ Ask a Question
Need help? Post your question and get tips & solutions from a community of 425,855 IT Pros & Developers. It's quick & easy.

Storing 4 characters using 3 bytes

P: n/a
I need to use three bytes to store four 6-bit integers (4 * 6 = 3 * 8)
like this
11111122|22223333|33444444
Suppose the input is, int c1, c2, c3, c4, range 0 .. 2^6 -1
and the output is int o1,o2,o3, range 0 .. 2^8-1
How to do this in a clever way?

(The 6 bits integers represent characters in range A-Z and 0-9)
Nov 13 '05 #1
Share this Question
Share on Google+
19 Replies


P: n/a
he***************@hotmail.com (becte) wrote in
news:b4**************************@posting.google.c om:
I need to use three bytes to store four 6-bit integers (4 * 6 = 3 * 8)
like this
11111122|22223333|33444444
Suppose the input is, int c1, c2, c3, c4, range 0 .. 2^6 -1
and the output is int o1,o2,o3, range 0 .. 2^8-1
How to do this in a clever way?

(The 6 bits integers represent characters in range A-Z and 0-9)


How about the untested:

#define PACKED_INT_RANGE 0x3F /* 6-bits */

/* pPacked had better be 3 bytes or more
*/
int packInts(int first, int second, int third, int fourth, char *pPacked)
{
int failures = !0;

if ((first & PACKED_INT_RANGE) == first
&& (second & PACKED_INT_RANGE) == second
&& (third & PACKED_INT_RANGE) == third
&& (fourth & PACKED_INT_RANGE) == fourth)
{
pPacked[0] = (first & PACKED_INT_RANGE) << 2;
pPacked[0] |= (second & PACKED_INT_RANGE) >> 4;
pPacked[1] = (second & PACKED_INT_RANGE) << 4;
pPacked[1] |= (third & PACKED_INT_RANGE) >> 2;
pPacked[2] = (third & PACKED_INT_RANGE) << 6;
pPacked[2] |= (fourth & PACKED_INT_RANGE) >> 0;

failures = 0;
}

return failures;
}
--
- Mark ->
--
Nov 13 '05 #2

P: n/a
"becte" <he***************@hotmail.com> wrote in message
news:b4**************************@posting.google.c om...
I need to use three bytes to store four 6-bit integers (4 * 6 = 3 * 8)
like this
11111122|22223333|33444444
Suppose the input is, int c1, c2, c3, c4, range 0 .. 2^6 -1
and the output is int o1,o2,o3, range 0 .. 2^8-1
How to do this in a clever way?

(The 6 bits integers represent characters in range A-Z and 0-9)


You could store the 4 6-bit values contiguously into a 32-bit
unsigned long, then split these 24-bit value back into 3 bytes.

I think you should give it a try, and post some code if you
need additional help. All you need is the bit-shift and
bit-or/bit-and operators ( << >> | & ).
Cheers,
Ivan
--
http://ivan.vecerina.com
Nov 13 '05 #3

P: n/a
Greetings.

In article <b4**************************@posting.google.com >, becte wrote:
I need to use three bytes to store four 6-bit integers (4 * 6 = 3 * 8)
like this
11111122|22223333|33444444
Suppose the input is, int c1, c2, c3, c4, range 0 .. 2^6 -1
and the output is int o1,o2,o3, range 0 .. 2^8-1
How to do this in a clever way?


Sounds suspiciously like a homework problem, so you're unlikely to get
complete solutions here. Have you read up on C's bit-shifting and bitwise
comparison operators? If not, do so and ask again if you still need help.

Regards,
Tristan

--
_
_V.-o Tristan Miller [en,(fr,de,ia)] >< Space is limited
/ |`-' -=-=-=-=-=-=-=-=-=-=-=-=-=-=-= <> In a haiku, so it's hard
(7_\\ http://www.nothingisreal.com/ >< To finish what you
Nov 13 '05 #4

P: n/a
Mark A. Odell <no****@embeddedfw.com> spoke thus:
int packInts(int first, int second, int third, int fourth, char *pPacked)
int failures = !0;


Just FMI, this assumes that !0 isn't a trap representation and that CHAR_BIT
is 8, right? Does it do anything else non-portable?

--
Christopher Benson-Manica | I *should* know what I'm talking about - if I
ataru(at)cyberspace.org | don't, I need to know. Flames welcome.
Nov 13 '05 #5

P: n/a
Christopher Benson-Manica <at***@nospam.cyberspace.org> scribbled the following:
Mark A. Odell <no****@embeddedfw.com> spoke thus:
int packInts(int first, int second, int third, int fourth, char *pPacked)
int failures = !0;
Just FMI, this assumes that !0 isn't a trap representation and that CHAR_BIT
is 8, right? Does it do anything else non-portable?


Umm... what do you think !0 equals?

--
/-- Joona Palaste (pa*****@cc.helsinki.fi) ------------- Finland --------\
\-- http://www.helsinki.fi/~palaste --------------------- rules! --------/
"When a man talks dirty to a woman, that's sexual harassment. When a woman talks
dirty to a man, that's 14.99 per minute + local telephone charges!"
- Ruben Stiller
Nov 13 '05 #6

P: n/a
becte wrote:

I need to use three bytes to store four 6-bit integers (4 * 6 = 3 * 8)
like this
11111122|22223333|33444444
Suppose the input is, int c1, c2, c3, c4, range 0 .. 2^6 -1
and the output is int o1,o2,o3, range 0 .. 2^8-1
How to do this in a clever way?


"Straightforward" suffices:

o1 = (c1 << 2) | (c2 >> 4);
o2 = ((c2 & 0xF) << 4) | (c3 >> 2);
03 = ((c3 & 0x3) << 2) | c4;

For "clever" you might try something like this (requires
an `int' of 25 or more bits):

o1 = (c1 << 18) | (c2 << 12) | (c3 << 6) | c4;
o3 = o1 & 0xFF;
o1 >>= 8;
o2 = o1 & 0xFF;
o1 >>= 8;

--
Er*********@sun.com
Nov 13 '05 #7

P: n/a
Joona I Palaste <pa*****@cc.helsinki.fi> spoke thus:
Umm... what do you think !0 equals?


Um... nevermind? :(

--
Christopher Benson-Manica | I *should* know what I'm talking about - if I
ataru(at)cyberspace.org | don't, I need to know. Flames welcome.
Nov 13 '05 #8

P: n/a
becte wrote:
I need to use three bytes to store four 6-bit integers (4 * 6 = 3 * 8)
like this
11111122|22223333|33444444
Suppose the input is, int c1, c2, c3, c4, range 0 .. 2^6 -1
and the output is int o1,o2,o3, range 0 .. 2^8-1
How to do this in a clever way?


How about bit fields?

Nov 13 '05 #9

P: n/a
typedef union
{
char chars[3];
struct
{
int _1 : 6;
int _2 : 6;
int _3 : 6;
int _4 : 6;
} BITS

} PACKED
--
cody

[Freeware, Games and Humor]
www.deutronium.de.vu || www.deutronium.tk
Nov 13 '05 #10

P: n/a
In article <bm************@ID-176797.news.uni-berlin.de>
cody <do*********************@gmx.de> writes:
typedef union
{
char chars[3];
struct
{
int _1 : 6;
int _2 : 6;
int _3 : 6;
int _4 : 6;
} BITS

} PACKED


Even once the syntax errors are fixed, the code simply does not
work at all on the SPARC. Try it and see.

(You will find that c[0] is never changed. This is a hint as to
what is going wrong. :-) )
--
In-Real-Life: Chris Torek, Wind River Systems
Salt Lake City, UT, USA (4039.22'N, 11150.29'W) +1 801 277 2603
email: forget about it http://67.40.109.61/torek/index.html (for the moment)
Reading email is like searching for food in the garbage, thanks to spammers.
Nov 13 '05 #11

P: n/a
"cody" <do*********************@gmx.de> wrote:
typedef union
{
char chars[3];
struct
{
int _1 : 6;
int _2 : 6;
int _3 : 6;
int _4 : 6;
} BITS

} PACKED


If you'd followed the thread "How to extract bytes from a long?"
you'd already know this is calling for nasal demons.

Regards
--
Irrwahn
(ir*******@freenet.de)
Nov 13 '05 #12

P: n/a
> >typedef union
{
char chars[3];
struct
{
int _1 : 6;
int _2 : 6;
int _3 : 6;
int _4 : 6;
} BITS

} PACKED
Even once the syntax errors are fixed, the code simply does not
work at all on the SPARC. Try it and see.


Missing semicolons? Sorry.
(You will find that c[0] is never changed. This is a hint as to
what is going wrong. :-) )


Why? Structure padding? I don't think the struct BITS will be padding bits
inserted. added at the end maybe but that is not of interest here.

--
cody

[Freeware, Games and Humor]
www.deutronium.de.vu || www.deutronium.tk
Nov 13 '05 #13

P: n/a
becte wrote:

I need to use three bytes to store four 6-bit integers (4 * 6 = 3 * 8)
like this
11111122|22223333|33444444
Suppose the input is, int c1, c2, c3, c4, range 0 .. 2^6 -1
and the output is int o1,o2,o3, range 0 .. 2^8-1
How to do this in a clever way?

(The 6 bits integers represent characters in range A-Z and 0-9)


You can do better. You can store three of those chars into 16
bits. You should include at least a blank, for 37 chars each
represented by a value in 0 through 39

You store it by: i = 40 * (40 * c3 + c2) + c1. You extract by
dividing by 40 and taking the remainder.

This is a very old mechanism, and is at the root of 6 char
filenames in bygone years. Look up Rad50 (the 50 is in octal).

--
Chuck F (cb********@yahoo.com) (cb********@worldnet.att.net)
Available for consulting/temporary embedded and systems.
<http://cbfalconer.home.att.net> USE worldnet address!
Nov 13 '05 #14

P: n/a
[regarding a union of "char" array and bitfields, I wrote:]
Even once the syntax errors are fixed, the code simply does not
work at all on the SPARC. Try it and see.

Actually, I picked the wrong machine; I now think it *would* work
(depending on desired output) on the SPARC. Ones on which it would
likely fail badly are certain Motorola 680x0 implementations.

In article <bm************@ID-176797.news.uni-berlin.de>
cody <do*********************@gmx.de> writes:Missing semicolons? Sorry.
That, and the names "_1" through "_4" are dubious. (I would have
to consult the C standards to see whether those names are reserved
to users in the structure-member namespace. Instead of using names
whose form requires careful scrutiny of a C standard, why not use
names that are clearly off-limits to the implementor?)
Why? Structure padding? I don't think the struct BITS will be padding bits
inserted. added at the end maybe but that is not of interest here.


For discussion purposes, here is the proposed code, turned into
compilable C:

void pack(unsigned char *out, int in1, int in2, int in3, int in4) {
union {
unsigned char c[3];
struct {
unsigned int one:6, two:6, three:6, four:6;
} bits;
} u;
u.bits.one = in1;
u.bits.two = in2;
u.bits.three = in3;
u.bits.four = in4;

out[0] = u.c[0];
out[1] = u.c[1];
out[2] = u.c[2];
}

Now, suppose we have a fairly typical (in today's terms) machine
with 8-bit "char"s and 4-byte "int"s and 4-byte "storage units".
If you print out the size of the union, you will find that it
requires four bytes, not three.

These four bytes are laid out in memory in the following way,
as required by the C standards:

a) when viewed as array-of-char:
at offset 0: u.c[0] (one byte)
at offset 1: u.c[1] (one byte)
at offset 2: u.c[2] (one byte)
at offset 3: <unnamed> (one byte of padding)
(remember that arrays *must* be contiguous)

b) when viewed as unnamed structure with bitfields:
storage unit at offset 0: <four bytes wide, unspecified layout>

Part (b) is where the problem occurs. The layout of the bitfields
within the storage unit is unspecified. Suppose the compiler
chooses to start at "bit 0" of a 32-bit big-endian word. In this
case, u.bits.one occupies bits 0 through 5 inclusive of that 32-bit
word, which live in the unnamed padding byte at offset 3 in the
array in u.c. u.bits.two occupies bits 6 through 11, which uses
two more bits in the unnamed byte and then four bits in the array
element u.c[3]. (At least one Motorola 68020 C compiler handles
-- or handled -- bitfields this way; I used it in the early Sun
days.)

As it happens, SPARC compilers in big-endian mode (which is to say
most of them) are "encouraged" to allocate bitfields starting at
bit 31 and working down towards bit 0. This means that u.bits.one
occupies bits 31 through 26 inclusive, which live in the array
element u.c[0].

On Intel x86 compilers, the situation is reversed: the usual
convention is to allocate bitfields starting at bit 0 and working
up towards bit 31. As it happens, Intel x86 CPUs are little-endian
-- so this means that u.bits.one occupies bits 0 through 5 inclusive,
which are also bits 0 through 5 of u.c[0]. u.bits.two occupies
bits 6 through 11, which are in bits 6 and 7 of u.c[0] (for the
two low-order bits of u.bits.two) and then bits 0 through 3 of
u.c[1].

Of course, there is nothing but convention to prevent an x86 C
compiler from allocating bits from 31 down to 0, in which case
u.c[0] would not map to any of the bits in u.bits. In any case,
the output on the x86 seems a bit peculiar -- the bits are spread
over the bytes in an interesting manner:

#include <stdio.h>

int main(void) {
unsigned char buf[3];
pack(buf, 0x00, 0x3f, 0x00, 0x00);
printf("result: buf[0..3] = 0x%2.2x%2.2x%2.2x\n",
buf[0], buf[1], buf[2]);
return 0;
}

produces:

result: buf[0..3] = 0xc00f00

Packing the tuple <0x00, 0x03, 0x00, 0x00> results in 0xc00000, so
we see that, indeed, buf[0] contains as its two *uppermost* bits
the two *lowermost* bits of the 2nd value.

(The result on the big-endian SPARC is probably closer to what was
expected.)
--
In-Real-Life: Chris Torek, Wind River Systems
Salt Lake City, UT, USA (4039.22'N, 11150.29'W) +1 801 277 2603
email: forget about it http://67.40.109.61/torek/index.html (for the moment)
Reading email is like searching for food in the garbage, thanks to spammers.
Nov 13 '05 #15

P: n/a
In <bm************@ID-176797.news.uni-berlin.de> "cody" <do*********************@gmx.de> writes:
>typedef union
>{
> char chars[3];
> struct
> {
> int _1 : 6;
> int _2 : 6;
> int _3 : 6;
> int _4 : 6;
> } BITS
>
>} PACKED


Even once the syntax errors are fixed, the code simply does not
work at all on the SPARC. Try it and see.


Missing semicolons? Sorry.
(You will find that c[0] is never changed. This is a hint as to
what is going wrong. :-) )


Why? Structure padding? I don't think the struct BITS will be padding bits
inserted. added at the end maybe but that is not of interest here.


Clue: in what order do bit fields get allocated inside the storage unit?
What if this storage unit happens to be a 32-bit entity?

10 An implementation may allocate any addressable storage unit large
enough to hold a bit-field. If enough space remains,
a bit-field that immediately follows another bit-field in
a structure shall be packed into adjacent bits of the same
unit. If insufficient space remains, whether a bit-field that
does not fit is put into the next unit or overlaps adjacent
units is implementation-defined. The order of allocation of
bit-fields within a unit (high-order to low-order or low-order
to high-order) is implementation-defined. The alignment of the
addressable storage unit is unspecified.

Dan
--
Dan Pop
DESY Zeuthen, RZ group
Email: Da*****@ifh.de
Nov 13 '05 #16

P: n/a
Chris Torek wrote:

[regarding a union of "char" array and bitfields, I wrote:]
Even once the syntax errors are fixed, the code simply does
not work at all on the SPARC. Try it and see.

.... snip ...
For discussion purposes, here is the proposed code, turned into
compilable C:

void pack(unsigned char *out, int in1, int in2, int in3, int in4) {
union {
unsigned char c[3];
struct {
unsigned int one:6, two:6, three:6, four:6;
} bits;
} u;
u.bits.one = in1;
u.bits.two = in2;
u.bits.three = in3;
u.bits.four = in4;

out[0] = u.c[0];
out[1] = u.c[1];
out[2] = u.c[2];
}

... snip about problems etc. ...

However, for this problem, simply work with values using
multiplication and division and all the problems and
non-portabilities go away. See my post elsethread.

--
Chuck F (cb********@yahoo.com) (cb********@worldnet.att.net)
Available for consulting/temporary embedded and systems.
<http://cbfalconer.home.att.net> USE worldnet address!

Nov 13 '05 #17

P: n/a
Eric Sosman <Er*********@sun.com> wrote in message news:<3F***************@sun.com>...
becte wrote:

I need to use three bytes to store four 6-bit integers (4 * 6 = 3 * 8)
like this
11111122|22223333|33444444
Suppose the input is, int c1, c2, c3, c4, range 0 .. 2^6 -1
and the output is int o1,o2,o3, range 0 .. 2^8-1
How to do this in a clever way?


"Straightforward" suffices:

o1 = (c1 << 2) | (c2 >> 4);
o2 = ((c2 & 0xF) << 4) | (c3 >> 2);
03 = ((c3 & 0x3) << 2) | c4;


I go for the straightforward solution.
BTW, there is an error in the last line, should be
o3 = ((c3 & 0x3) << 6) | c4; // shift 6, not 2
Nov 13 '05 #18

P: n/a
> Actually, I picked the wrong machine; I now think it *would* work
(depending on desired output) on the SPARC. Ones on which it would
likely fail badly are certain Motorola 680x0 implementations.
Why? byte order doesn't matter here.You put 3 bytes in and get 4 bytes out.
That, and the names "_1" through "_4" are dubious. (I would have
to consult the C standards to see whether those names are reserved
to users in the structure-member namespace. Instead of using names
whose form requires careful scrutiny of a C standard, why not use
names that are clearly off-limits to the implementor?)


OK, I admit that _1 is a stupid identifier.
Why? Structure padding? I don't think the struct BITS will be padding bitsinserted. added at the end maybe but that is not of interest here.


For discussion purposes, here is the proposed code, turned into
compilable C:

void pack(unsigned char *out, int in1, int in2, int in3, int in4) {
union {
unsigned char c[3];
struct {
unsigned int one:6, two:6, three:6, four:6;
} bits;
} u;
u.bits.one = in1;
u.bits.two = in2;
u.bits.three = in3;
u.bits.four = in4;

out[0] = u.c[0];
out[1] = u.c[1];
out[2] = u.c[2];
}

Now, suppose we have a fairly typical (in today's terms) machine
with 8-bit "char"s and 4-byte "int"s and 4-byte "storage units".
If you print out the size of the union, you will find that it
requires four bytes, not three.

These four bytes are laid out in memory in the following way,
as required by the C standards:

a) when viewed as array-of-char:
at offset 0: u.c[0] (one byte)
at offset 1: u.c[1] (one byte)
at offset 2: u.c[2] (one byte)
at offset 3: <unnamed> (one byte of padding)
(remember that arrays *must* be contiguous)

b) when viewed as unnamed structure with bitfields:
storage unit at offset 0: <four bytes wide, unspecified layout>

Part (b) is where the problem occurs. The layout of the bitfields
within the storage unit is unspecified. Suppose the compiler
chooses to start at "bit 0" of a 32-bit big-endian word. In this
case, u.bits.one occupies bits 0 through 5 inclusive of that 32-bit
word, which live in the unnamed padding byte at offset 3 in the
array in u.c. u.bits.two occupies bits 6 through 11, which uses
two more bits in the unnamed byte and then four bits in the array
element u.c[3]. (At least one Motorola 68020 C compiler handles
-- or handled -- bitfields this way; I used it in the early Sun
days.)

As it happens, SPARC compilers in big-endian mode (which is to say
most of them) are "encouraged" to allocate bitfields starting at
bit 31 and working down towards bit 0. This means that u.bits.one
occupies bits 31 through 26 inclusive, which live in the array
element u.c[0].

On Intel x86 compilers, the situation is reversed: the usual
convention is to allocate bitfields starting at bit 0 and working
up towards bit 31. As it happens, Intel x86 CPUs are little-endian
-- so this means that u.bits.one occupies bits 0 through 5 inclusive,
which are also bits 0 through 5 of u.c[0]. u.bits.two occupies
bits 6 through 11, which are in bits 6 and 7 of u.c[0] (for the
two low-order bits of u.bits.two) and then bits 0 through 3 of
u.c[1].

Of course, there is nothing but convention to prevent an x86 C
compiler from allocating bits from 31 down to 0, in which case
u.c[0] would not map to any of the bits in u.bits. In any case,
the output on the x86 seems a bit peculiar -- the bits are spread
over the bytes in an interesting manner:

#include <stdio.h>

int main(void) {
unsigned char buf[3];
pack(buf, 0x00, 0x3f, 0x00, 0x00);
printf("result: buf[0..3] = 0x%2.2x%2.2x%2.2x\n",
buf[0], buf[1], buf[2]);
return 0;
}

produces:

result: buf[0..3] = 0xc00f00

Packing the tuple <0x00, 0x03, 0x00, 0x00> results in 0xc00000, so
we see that, indeed, buf[0] contains as its two *uppermost* bits
the two *lowermost* bits of the 2nd value.

(The result on the big-endian SPARC is probably closer to what was
expected.)


Who says that the size of the struct will be 32bit? The standard just says:
[#9] An implementation may allocate any addressable storage
unit large enough to hold a bit-field. If enough space
remains, a bit-field that immediately follows another bit-
field in a structure shall be packed into adjacent bits of
the same unit. If insufficient space remains, whether a
bit-field that does not fit is put into the next unit or
overlaps adjacent units is implementation-defined. The
order of allocation of bit-fields within a unit (high-order
to low-order or low-order to high-order) is implementation-
defined. The alignment of the addressable storage unit is
unspecified.Additionally you can adjust the structure padding with
compiler options.

union {
unsigned char c[3*4];
struct {
unsigned int bit0:6, bit1:6, bit2:6, bit3:6;
unsigned int bit4:6, bit5:6, bit6:6, bit7:6;
unsigned int bit8:6, bit2:9, bit3:10, bit11:6;
unsigned int bit12:6, bit13:6, bit14:6, bit15:6;
} bits;
} u;

This should solve the problem because the union is now a multiple of 32 bits
in size so there will be no internal padding and byte order won't matter.

--
cody

[Freeware, Games and Humor]
www.deutronium.de.vu || www.deutronium.tk
Nov 13 '05 #19

P: n/a
[I wrote, in part:]
Now, suppose we have a fairly typical (in today's terms) machine
with 8-bit "char"s and 4-byte "int"s and 4-byte "storage units".

In article <bm************@ID-176797.news.uni-berlin.de>
cody <do*********************@gmx.de> wrote: Who says that the size of the struct will be 32bit?
I did, right in the text you quoted (and I retained) above. And
it is -- on today's typical implementations. It does not *have*
to be, but it *is*, right now. (On a few implementations the
storage-units are 16 bits, and on a few more they are 64 bits, and
there are no doubt some oddballs with 24 bits and the like out
there today. But 32 is pretty common at the moment.)
The standard just says:
[#9] An implementation may allocate any addressable storage
unit large enough to hold a bit-field. If enough space
remains, a bit-field that immediately follows another bit-
field in a structure shall be packed into adjacent bits of
the same unit. If insufficient space remains, whether a
bit-field that does not fit is put into the next unit or
overlaps adjacent units is implementation-defined. The
order of allocation of bit-fields within a unit (high-order
to low-order or low-order to high-order) is implementation-
defined. The alignment of the addressable storage unit is
unspecified.
Yes, this is what it says.
Additionally you can adjust the structure padding with compiler options.
(If they exist.)

union {
unsigned char c[3*4];
struct {
unsigned int bit0:6, bit1:6, bit2:6, bit3:6;
unsigned int bit4:6, bit5:6, bit6:6, bit7:6;
unsigned int bit8:6, bit2:9, bit3:10, bit11:6;
unsigned int bit12:6, bit13:6, bit14:6, bit15:6;
} bits;
} u;

This should solve the problem because the union is now a multiple of 32 bits
in size so there will be no internal padding and byte order won't matter.


Will it? (Assume we have fixed the typos above -- bit2:9 and
bit3:10 should be bit9:6 and bit10:6. I let a compiler find this.)

"If enough space remains, a bit-field that immediately follows
another bit-field in a structure shall be packed into adjacent
bits of the same [storage] unit."

Suppose storage-units are 32 bits wide. Then 6+6+6+6+6 = 30, so
that the fields name "bit0" through "bit4" inclusive are packed
into the first 32-bit storage unit. The field named "bit5" now
needs another 6 bits, which is greater than the remaining space
(2 bits), so the Standard's wording continues:

"If insufficient space remains," -- and this is indeed the
case -- "whether a bit-field that does not fit is put into
the next unit or overlaps adjacent units is implementation-
defined."

So the implementation must document whether "bit5" is split across
two storage units, or the two leftover bits are skipped and then
"bit5" is placed in the next storage-unit.

Suppose the implementation's documentation says that bitfields are
*not* so split. In this case, "bit5" through "bit9" inclusive go
in the 2nd 32-bit storage-unit, "bit10" through "bit14" in the
third, and "bit15" occupies some implementation-defined portion
of the last 32-bit storage unit.

Thus, the "struct ... bits" member will need four 32-bit storage
units, or 16 bytes (remember that we are still assuming a common
8-bit-byte 32-bit-int machine). This is indeed a multiple of 32
bits -- but hardly the desired multiple, as illustrated by the
following program and its output:

#include <stdio.h>

union u {
unsigned char c[3*4];
struct s {
unsigned int bit0:6, bit1:6, bit2:6, bit3:6;
unsigned int bit4:6, bit5:6, bit6:6, bit7:6;
unsigned int bit8:6, bit9:6, bit10:6, bit11:6;
unsigned int bit12:6, bit13:6, bit14:6, bit15:6;
} bits;
};

int main(void) {
printf("sizeof union u = %lu\n",
(unsigned long)sizeof(union u));
printf("sizeof struct s = %lu\n",
(unsigned long)sizeof(struct s));
printf("sizeof unsigned char [3*4] = %lu\n",
(unsigned long)sizeof(unsigned char [3*4]));
return 0;
}

When compiled with "gcc -ansi -pedantic" and run, this prints:

sizeof union u = 16
sizeof struct s = 16
sizeof c[3*4] = 12

which means that the various bitfields were in fact padded out in
just the way I suggested above. The array, of course, has no
padding.

Ultimately, you *can* do what the original poster asked using unions
and bitfields -- but the result is inherently implementation
dependent, because of all those "implementation-defined" items
above; and it is surprisingly hard to get the right answers.
(Dennis Ritchie once called C's bitfields "a botch and a blemish",
and I think this helps illustrate why.) As a general rule, if you
want to do bitwise computations in C, it all works a lot better if
you use the bitwise operators instead of bitfield members of
structures. The results are far easier to predict -- thus easier
to get right -- and you have much more hope of producing portable,
or even just "mostly portable", code.
--
In-Real-Life: Chris Torek, Wind River Systems
Salt Lake City, UT, USA (4039.22'N, 11150.29'W) +1 801 277 2603
email: forget about it http://67.40.109.61/torek/index.html (for the moment)
Reading email is like searching for food in the garbage, thanks to spammers.
Nov 13 '05 #20

This discussion thread is closed

Replies have been disabled for this discussion.