Bytes IT Community
Portably extracting data from a bytestring

Let S be a pointer to a bytestring of length L. I would like to extract 4
bytes from S at the location p = S + d, with 0 < d < L - 4, and store them
into an unsigned int. I am looking for suggestions on how to do this

1) In portable ANSI C.
2) As efficiently as possible.
3) Taking full account of the potential data alignment and
endianness issues that this action must tackle.


Nov 15 '05 #1
17 Replies


James S. Singleton wrote:
Let S be a pointer to a bytestring of length L. I would like to extract 4
bytes from S at the location p = S + d, with 0 < d < L - 4, and store them
into an unsigned int. I am looking for suggestions on how to do this

1) In portable ANSI C.
2) As efficiently as possible.
3) Taking full account of the potential data alignment and
endianness issues that this action must tackle.

#include <string.h>  /* memcpy */

typedef union {
    unsigned char c[sizeof(unsigned int)];
    unsigned int i;
} U;

unsigned int convert(char *S, int d)
{
    U u;
    memcpy(&u, S + d, sizeof(unsigned int));
    return u.i;
}

This assumes that at the given location an integer was stored.
The problem is that you did not define what "extract four bytes"
and "store them in an unsigned int" really means.

If you do not care about alignment (x86 architecture), you could write:

unsigned int convert(char *S, int d)
{
    U *u;
    u = (U *)(S + d);  /* S + d may not be suitably aligned for U */
    return u->i;
}

This is more efficient, but you could get an alignment trap.

Both versions assume that
1) An unsigned int was previously stored at that location.
2) You read it back on the same machine architecture.

jacob
Nov 15 '05 #2

jacob navia <ja***@jacob.remcomp.fr> writes:
James S. Singleton wrote:
Let S be a pointer to a bytestring of length L. I would like to extract 4
bytes from S at the location p = S + d, with 0 < d < L - 4, and store them
into an unsigned int. I am looking for suggestions on how to do this
1) In portable ANSI C.
2) As efficiently as possible.
3) Taking full account of the potential data alignment and
endianness issues that this action must tackle.

typedef union {
unsigned char c[sizeof(unsigned int)];
unsigned int i;
} U;

unsigned int convert(char *S,int d)
{
U u;
memcpy(&u,S+d,sizeof(unsigned int));
return u.i;
}


Why not just this:

#include <string.h>  /* memcpy */

unsigned int convert(char *s, int d)
{
    unsigned int i;
    memcpy(&i, s + d, sizeof i);
    return i;
}

Character access is allowed to any type; memcpy() does character access.
--
int main(void){char p[]="ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz.\
\n",*q="kl BIcNBFr.NKEzjwCIxNJC";int i=sizeof p/2;char *strchr();int putchar(\
);while(*q){i+=strchr(p,*q++)-p;if(i>=(int)sizeof p)i-=sizeof p-1;putchar(p[i]\
);}return 0;}
Nov 15 '05 #3

In article <43**********************@news.wanadoo.fr>,
jacob navia <ja***@jacob.remcomp.fr> wrote:
James S. Singleton wrote:
Let S be a pointer to a bytestring of length L. I would like to extract 4
bytes from S at the location p = S + d, with 0 < d < L - 4, and store them
into an unsigned int. I am looking for suggestions on how to do this

1) In portable ANSI C.

typedef union {
unsigned char c[sizeof(unsigned int)];
unsigned int i;
} U;

unsigned int convert(char *S,int d)
{
U u;
memcpy(&u,S+d,sizeof(unsigned int));
return u.i;
}


I can't find the clause at the moment, but I'm relatively sure
that the behaviour is undefined to read a union member out of a
union unless it was the same one last written [except for cases
where you are retrieving from the same fundamental types
in union members with common prefixes.]

You are on safer ground if you cast the object pointer to char * .
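
A minimal sketch of what the char-pointer approach might look like, with no union involved. The helper names store_uint_bytes and load_uint_bytes are hypothetical and not from the thread. Like the memcpy() version, this only makes sense if the bytes were produced from an unsigned int on the same architecture, since the result depends on the host's byte order and object representation.

```c
#include <stddef.h>

/* Hypothetical helper: write the object representation of value
   into the byte string s, starting at offset d. */
void store_uint_bytes(char *s, size_t d, unsigned int value)
{
    const unsigned char *src = (const unsigned char *)&value;
    unsigned char *dst = (unsigned char *)(s + d);
    size_t k;

    for (k = 0; k < sizeof value; k++)
        dst[k] = src[k];
}

/* Hypothetical helper: rebuild an unsigned int from sizeof(unsigned int)
   bytes at s + d by writing through an unsigned char pointer.
   No alignment requirement is placed on s + d. */
unsigned int load_uint_bytes(const char *s, size_t d)
{
    unsigned int value = 0;
    unsigned char *dst = (unsigned char *)&value;
    const unsigned char *src = (const unsigned char *)(s + d);
    size_t k;

    for (k = 0; k < sizeof value; k++)
        dst[k] = src[k];
    return value;
}
```

The loop does by hand what memcpy() does, so in practice memcpy() is the shorter way to write the same thing.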
--
I was very young in those days, but I was also rather dim.
-- Christopher Priest
Nov 15 '05 #4

James S. Singleton wrote:
Let S be a pointer to a bytestring of length L. I would like to extract 4
bytes from S at the location p = S + d, with 0 < d < L - 4, and store them
into an unsigned int. I am looking for suggestions on how to do this

1) In portable ANSI C.
Impossible if I take you literally, since an unsigned int isn't
guaranteed to be any bigger than 16 bits, and 4 bytes will be 32 bits
(since we're presumably talking about 8-bit bytes, not "C bytes" which
can be larger).

Make it an unsigned long instead. You could redescribe the problem as
"extracting sizeof(unsigned int) bytes" too, but this is something
different, and it may not be the problem at hand.

Alternatively, you could mean "in ANSI C that's portable save for the
assumption that an unsigned int is 32 bits". This will be acceptable for
the majority of existing platforms, as long as you keep in mind the
limits of portability here.
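
One way to keep that assumption honest is to let the preprocessor reject platforms where it does not hold. This is only a sketch: the helper name low32_as_uint is hypothetical, and the check tests the range of unsigned int (at least 32 bits) rather than an exact width.

```c
#include <limits.h>

/* Refuse to compile where unsigned int cannot hold a 32-bit value. */
#if UINT_MAX < 0xFFFFFFFF
#error "unsigned int is narrower than 32 bits on this platform"
#endif

/* Hypothetical helper: keep the low 32 bits of v. Thanks to the check
   above, the result is known to fit in an unsigned int. */
unsigned int low32_as_uint(unsigned long v)
{
    return (unsigned int)(v & 0xFFFFFFFFUL);
}
```

On a platform with a 16-bit unsigned int the build stops at the #error line, which is the "portable modulo X" limit made visible.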
2) As efficiently as possible.
That's the trick, isn't it? The most efficient thing you can do is
obviously just interpreting those 4 bytes as an int through a union. But
that's not guaranteed to work (also see below).
3) Taking full account of the potential data alignment and
endianness issues that this action must tackle.


How can we take it into account if you don't describe what endianness
issues there are? What do the bytes in the string mean? Assuming the
four bytes are a contiguous sequence of bits making up the binary
representation of an integer, you'd still need to know in what order
they're stored before you can turn them into a machine integer.

Theoretically there are 24 separate orderings, but of course the only
ones that matter in practice are big-endian (call this B4 B3 B2 B1) and
little-endian (B1 B2 B3 B4), and maybe some mixed form for 16-bit
architectures (B3 B4 B1 B2 and B2 B1 B4 B3, perverse but not unheard
of). You do not need to know the endianness of the target architecture
to perform the conversion (though it may help for efficiency), but you
do need to know the endianness of the bytes in the string.
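
A minimal sketch of that conversion, assuming the string holds 8-bit octets in a known order and that the result goes into an unsigned long, which is guaranteed to hold at least 32 bits. The names load_be32 and load_le32 are hypothetical and not from the thread. The code never asks what the host's endianness is, and it reads one byte at a time, so alignment is not an issue.

```c
/* Hypothetical helper: 4 octets stored most significant first
   (the B4 B3 B2 B1 order above). */
unsigned long load_be32(const unsigned char *p)
{
    /* The 0xFF masks keep only 8 bits per byte in case CHAR_BIT > 8. */
    return ((unsigned long)(p[0] & 0xFFu) << 24)
         | ((unsigned long)(p[1] & 0xFFu) << 16)
         | ((unsigned long)(p[2] & 0xFFu) << 8)
         |  (unsigned long)(p[3] & 0xFFu);
}

/* Hypothetical helper: 4 octets stored least significant first
   (the B1 B2 B3 B4 order above). */
unsigned long load_le32(const unsigned char *p)
{
    return  (unsigned long)(p[0] & 0xFFu)
         | ((unsigned long)(p[1] & 0xFFu) << 8)
         | ((unsigned long)(p[2] & 0xFFu) << 16)
         | ((unsigned long)(p[3] & 0xFFu) << 24);
}
```

With the original poster's names, a call might look like load_be32((const unsigned char *)S + d). The mixed orders would need their own variants with the shifts rearranged.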

For practical approaches, see the "obvious" solutions already posted by
others. It's important to know what problems these solve, and if they
match the problem you described.

S.
Nov 15 '05 #5

Walter Roberson <ro******@ibd.nrc-cnrc.gc.ca> wrote:
I can't find the clause at the moment, but I'm relatively sure
that the behaviour is undefined to read a union member out of a
union unless it was the same one last written [except for cases
where you are retrieving from the same fundamental types
in union members with common prefixes.]


I believe the clause you are looking for is this one, from 3.3.2.3 of
the draft available at http://dev.unicals.com/papers/c89-draft.html:

"With one exception, if a member of a union object is accessed after a
value has been stored in a different member of the object, the
behavior is implementation-defined." [with the one exception being the
one you pointed out]

--
Christopher Benson-Manica | I *should* know what I'm talking about - if I
ataru(at)cyberspace.org | don't, I need to know. Flames welcome.
Nov 15 '05 #6

Ben Pfaff wrote:
jacob navia <ja***@jacob.remcomp.fr> writes:

James S. Singleton wrote:
Let S be a pointer to a bytestring of length L. I would like to extract 4
bytes from S at the location p = S + d, with 0 < d < L - 4, and store them
into an unsigned int. I am looking for suggestions on how to do this
1) In portable ANSI C.
2) As efficiently as possible.
3) Taking full account of the potential data alignment and
endianness issues that this action must tackle.


typedef union {
unsigned char c[sizeof(unsigned int)];
unsigned int i;
} U;

unsigned int convert(char *S,int d)
{
U u;
memcpy(&u,S+d,sizeof(unsigned int));
return u.i;
}

Why not just this:

unsigned int convert(char *s, int d)
{
unsigned int i;
memcpy(&i, s + d, sizeof i);
return i;
}

Character access is allowed to any type; memcpy() does character access.


Well Ben, you are right :-)

Much shorter, and essentially the same stuff.

jacob
Nov 15 '05 #7

ro******@ibd.nrc-cnrc.gc.ca (Walter Roberson) writes:
In article <43**********************@news.wanadoo.fr>,
jacob navia <ja***@jacob.remcomp.fr> wrote:
typedef union {
unsigned char c[sizeof(unsigned int)];
unsigned int i;
} U;

unsigned int convert(char *S,int d)
{
U u;
memcpy(&u,S+d,sizeof(unsigned int));
return u.i;
}


I can't find the clause at the moment, but I'm relatively sure
that the behaviour is undefined to read a union member out of a
union unless it was the same one last written [except for cases
where you are retrieving from the same fundamental types
in union members with common prefixes.]


My best understanding is that it's debatable whether accessing
a union member other than the last one written results in
undefined behavior or in implementation-defined behavior. An
entry in an (informative) annex lists it as implementation-defined.
Nov 15 '05 #8

Christopher Benson-Manica <at***@nospam.cyberspace.org> writes:
Walter Roberson <ro******@ibd.nrc-cnrc.gc.ca> wrote:
I can't find the clause at the moment, but I'm relatively sure
that the behaviour is undefined to read a union member out of a
union unless it was the same one last written [except for cases
where you are retrieving from the same fundamental types
in union members with common prefixes.]


I believe the clause you are looking for is this one, from 3.3.2.3 of
the draft available at http://dev.unicals.com/papers/c89-draft.html:

"With one exception, if a member of a union object is accessed after a
value has been stored in a different member of the object, the
behavior is implementation-defined." [with the one exception being the
one you pointed out]


This sentence has disappeared from the Standard by now. There is
however a similar statement in an informative annex.
Nov 15 '05 #9

Tim Rentsch <tx*@alumnus.caltech.edu> wrote:
This sentence has disappeared from the Standard by now. There is
however a similar statement in an informative annex.


What, exactly, is the difference between "normative" and
"informative"? IIUC, "informative" is not strictly "standard" - does
that mean that there is no "normative" text specifying how
implementations should deal with union member access?

--
Christopher Benson-Manica | I *should* know what I'm talking about - if I
ataru(at)cyberspace.org | don't, I need to know. Flames welcome.
Nov 15 '05 #10

Christopher Benson-Manica <at***@nospam.cyberspace.org> writes:
What, exactly, is the difference between "normative" and
"informative"?


Normative text is part of the standard.
Informative text, like footnotes, examples, and some appendices,
is not part of the standard. It is for information only.
--
Go not to Usenet for counsel, for they will say both no and yes.
Nov 15 '05 #11

Christopher Benson-Manica <at***@nospam.cyberspace.org> writes:
Tim Rentsch <tx*@alumnus.caltech.edu> wrote:
This sentence has disappeared from the Standard by now. There is
however a similar statement in an informative annex.


What, exactly, is the difference between "normative" and
"informative"? IIUC, "informative" is not strictly "standard" - does
that mean that there is no "normative" text specifying how
implementations should deal with union member access?


Taken from ISO/IEC Directives part 3:

3.4

normative elements

those elements setting out the provisions to which it is
necessary to conform in order to be able to claim compliance
with the standard

A "normative element" must be observed in order to conform to the
standard in question. Any "informative" text is supposed to be
right (and presumably useful), but it does not by itself impose
requirements on whatever is being defined in the standard. Both
normative text and informative text are part of a standard, but
only normative text imposes requirements that must be observed.

There is normative text that gives requirements for accessing
union members, but that text is sprinkled through the rest of the
C Standard. So it isn't easy to tell if the logical consequences
of those requirements imply implementation defined behavior.
Nov 15 '05 #12

Ben Pfaff <bl*@cs.stanford.edu> writes:
Christopher Benson-Manica <at***@nospam.cyberspace.org> writes:
What, exactly, is the difference between "normative" and
"informative"?


Normative text is part of the standard.
Informative text, like footnotes, examples, and some appendices,
is not part of the standard. It is for information only.


Not exactly. Both normative elements and informative elements
are part of a standard, but only normative elements give
provisions that must be observed in order to claim conformance
(to whatever it is that's being standardized).

(I admit it's a minor distinction; I thought some people
might appreciate the clarification.)
Nov 15 '05 #13

Tim Rentsch <tx*@alumnus.caltech.edu> wrote:
(I admit it's a minor distinction; I thought some people
might appreciate the clarification.)


I did - thank you.

--
Christopher Benson-Manica | I *should* know what I'm talking about - if I
ataru(at)cyberspace.org | don't, I need to know. Flames welcome.
Nov 15 '05 #14

Skarmander wrote:

James S. Singleton wrote:
Let S be a pointer to a bytestring of length L. I would like to extract 4
bytes from S at the location p = S + d, with 0 < d < L - 4, and store them
into an unsigned int. I am looking for suggestions on how to do this

1) In portable ANSI C.


Impossible if I take you literally, since an unsigned int isn't
guaranteed to be any bigger than 16 bits, and 4 bytes will be 32 bits
(since we're presumably talking about 8-bit bytes, not "C bytes" which
can be larger).


"portable ANSI C" means "C bytes"

You are correct in that it is impossible.

Code doesn't have to be portable to be useful.

Pretending that code is portable when it isn't, is wrong.

--
pete
Nov 15 '05 #15

pete wrote:
<snip>
Code doesn't have to be portable to be useful.

Pretending that code is portable when it isn't, is wrong.


"Portable modulo X" is a meaningful concept. That is, "portable across
all machines where an int is 32 bits" is meaningful. Whether it's
acceptable depends, and nobody could advertise it as "100% portable ISO
C", but it's not "pretending". Just as long as you don't call it
"portable" without qualification.

S.
Nov 15 '05 #16

On 2005-10-27, Skarmander <in*****@dontmailme.com> wrote:
pete wrote:
<snip>
Code doesn't have to be portable to be useful.

Pretending that code is portable when it isn't, is wrong.


"Portable modulo X" is a meaningful concept. That is, "portable across
all machines where an int is 32 bits" is meaningful. Whether it's
acceptable depends, and nobody could advertise it as "100% portable ISO
C", but it's not "pretending". Just as long as you don't call it
"portable" without qualification.


And all that's needed in this case is to use long instead of int.
Nov 15 '05 #17

Jordan Abel wrote:

On 2005-10-27, Skarmander <in*****@dontmailme.com> wrote:
pete wrote:
<snip>
Code doesn't have to be portable to be useful.

Pretending that code is portable when it isn't, is wrong.

"Portable modulo X" is a meaningful concept. That is, "portable across
all machines where an int is 32 bits" is meaningful. Whether it's
acceptable depends, and nobody could advertise it as "100% portable ISO
C", but it's not "pretending". Just as long as you don't call it
"portable" without qualification.


OK.
And all that's needed in this case is to use long instead of int.
Qualification is still required for that.

For this specification:
I would like to extract 4 bytes and store them into an unsigned int.
1) In portable ANSI C.


sizeof(long) can be less than 4 if CHAR_BIT is greater than 8.
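
A small sketch of the distinction, using only <limits.h>. The function names are hypothetical. What the standard guarantees for unsigned long is a minimum range (ULONG_MAX is at least 4294967295), not a minimum sizeof, so the number of C bytes needed for 32 bits depends on CHAR_BIT.

```c
#include <limits.h>
#include <stddef.h>

/* Hypothetical helper: how many C bytes (of CHAR_BIT bits each)
   are needed to hold 32 bits on this implementation. */
size_t bytes_for_32_bits(void)
{
    return (32 + CHAR_BIT - 1) / CHAR_BIT;
}

/* Hypothetical helper: nonzero if unsigned long can represent every
   32-bit value. The standard requires this of every implementation. */
int ulong_holds_32_bits(void)
{
    return ULONG_MAX >= 0xFFFFFFFFUL;
}
```

With CHAR_BIT equal to 8, bytes_for_32_bits() is 4. With CHAR_BIT equal to 16 it is 2, and sizeof(long) may legitimately be 2 as well, so "4 bytes" in the C sense would not fit.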

--
pete
Nov 15 '05 #18
