Bytes | Software Development & Data Engineering Community

Portably extracting data from a bytestring

Let S be a pointer to a bytestring of length L. I would like to extract 4
bytes from S at the location p = S + d, with 0 < d < L - 4, and store them
into an unsigned int. I am looking for suggestions on how to do this

1) In portable ANSI C.
2) As efficiently as possible.
3) Taking full account of the potential data alignment and
endianness issues that this action must tackle.


Nov 15 '05 #1
James S. Singleton wrote:
Let S be a pointer to a bytestring of length L. I would like to extract 4
bytes from S at the location p = S + d, with 0 < d < L - 4, and store them
into an unsigned int. I am looking for suggestions on how to do this

1) In portable ANSI C.
2) As efficiently as possible.
3) Taking full account of the potential data alignment and
endianness issues that this action must tackle.

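#include <string.h> /* memcpy */

typedef union {
    unsigned char c[sizeof(unsigned int)];
    unsigned int i;
} U;

/* Copy sizeof(unsigned int) bytes from S+d into the union, then read them as an int. */
unsigned int convert(char *S,int d)
{
    U u;
    memcpy(&u,S+d,sizeof(unsigned int));
    return u.i;
}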

This assumes that an integer was stored at the given location.
The problem is that you did not define what "extract four bytes"
and "store them in an unsigned int" really mean.

If you do not care about alignment (as on the x86 architecture), you could
cast the pointer directly:

unsigned int convert(char *S,int d)
{
    U *u;
    u = (U *)(S+d); /* S+d may not be suitably aligned for U */
    return u->i;
}

This is more efficient, but you could get an alignment trap.

Both suppose that
1) You previously stored an integer at that location.
2) You read it back on the same machine architecture.

jacob
Nov 15 '05 #2
jacob navia <ja***@jacob.remcomp.fr> writes:
James S. Singleton wrote:
Let S be a pointer to a bytestring of length L. I would like to extract 4
bytes from S at the location p = S + d, with 0 < d < L - 4, and store them
into an unsigned int. I am looking for suggestions on how to do this
1) In portable ANSI C.
2) As efficiently as possible.
3) Taking full account of the potential data alignment and
endianness issues that this action must tackle.

typedef union {
unsigned char c[sizeof(unsigned int)];
unsigned int i;
} U;

unsigned int convert(char *S,int d)
{
U u;
memcpy(&u,S+d,sizeof(unsigned int));
return u.i;
}


Why not just this:

#include <string.h> /* memcpy */

unsigned int convert(char *s, int d)
{
    unsigned int i;
    memcpy(&i, s + d, sizeof i); /* byte-wise copy, so no alignment requirement */
    return i;
}

Character access is allowed to any type; memcpy() does character access.
--
int main(void){char p[]="ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz.\
\n",*q="kl BIcNBFr.NKEzjwCIxNJC";int i=sizeof p/2;char *strchr();int putchar(\
);while(*q){i+=strchr(p,*q++)-p;if(i>=(int)sizeof p)i-=sizeof p-1;putchar(p[i]\
);}return 0;}
Nov 15 '05 #3
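A minimal sketch of how the memcpy() version above might be exercised at an odd offset. The helper store_native() is hypothetical and not from the thread; it is only there to put a value into the buffer in the host's own representation. The parameter types are changed to const char * and size_t purely for illustration.

```c
#include <string.h>

/* Read sizeof(unsigned int) bytes at s + d, in host byte order.
   memcpy() copies byte by byte, so s + d needs no particular alignment. */
unsigned int convert(const char *s, size_t d)
{
    unsigned int i;
    memcpy(&i, s + d, sizeof i);
    return i;
}

/* Hypothetical counterpart: write value at s + d in host representation. */
void store_native(char *s, size_t d, unsigned int value)
{
    memcpy(s + d, &value, sizeof value);
}
```

A value written with store_native() and read back with convert() on the same machine comes back unchanged, whatever the offset. Nothing here says what the bytes mean on a different architecture, which is the limitation already noted in the thread.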
In article <43**********************@news.wanadoo.fr>,
jacob navia <ja***@jacob.remcomp.fr> wrote:
James S. Singleton wrote:
Let S be a pointer to a bytestring of length L. I would like to extract 4
bytes from S at the location p = S + d, with 0 < d < L - 4, and store them
into an unsigned int. I am looking for suggestions on how to do this

1) In portable ANSI C.

typedef union {
unsigned char c[sizeof(unsigned int)];
unsigned int i;
} U;

unsigned int convert(char *S,int d)
{
U u;
memcpy(&u,S+d,sizeof(unsigned int));
return u.i;
}


I can't find the clause at the moment, but I'm relatively sure
that the behaviour is undefined to read a union member out of a
union unless it was the same one last written [except for cases
where you are retrieving from the same fundamental types
in union members with common prefixes.]

You are on safer ground casting the object pointer to char *.
--
I was very young in those days, but I was also rather dim.
-- Christopher Priest
Nov 15 '05 #4
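The suggestion to go through a character pointer instead of a union could look roughly like this. It is only a sketch; the name convert_bytewise is invented for the example, and unsigned char is used rather than plain char so that no byte value is affected by signedness.

```c
#include <stddef.h>

/* Copy the object representation of an unsigned int out of the string,
   one byte at a time, through an unsigned char pointer to the target. */
unsigned int convert_bytewise(const char *s, size_t d)
{
    unsigned int i;
    unsigned char *dst = (unsigned char *)&i;
    const unsigned char *src = (const unsigned char *)s + d;
    size_t k;

    for (k = 0; k < sizeof i; k++)
        dst[k] = src[k];
    return i;
}
```

This does by hand what memcpy() does, so the result still depends on the host's byte order.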
James S. Singleton wrote:
Let S be a pointer to a bytestring of length L. I would like to extract 4
bytes from S at the location p = S + d, with 0 < d < L - 4, and store them
into an unsigned int. I am looking for suggestions on how to do this

1) In portable ANSI C.

Impossible if I take you literally, since an unsigned int isn't
guaranteed to be any bigger than 16 bits, and 4 bytes will be 32 bits
(since we're presumably talking about 8-bit bytes, not "C bytes" which
can be larger).

Make it an unsigned long instead. You could redescribe the problem as
"extracting sizeof(unsigned int) bytes" too, but this is something
different, and it may not be the problem at hand.

Alternatively, you could mean "in ANSI C that's portable save for the
assumption that an unsigned int is 32 bits". This will be acceptable for
the majority of existing platforms, as long as you keep in mind the
limits of portability here.

2) As efficiently as possible.

That's the trick, isn't it? The most efficient thing you can do is
obviously just interpreting those 4 bytes as an int through a union. But
that's not guaranteed to work (also see below).

3) Taking full account of the potential data alignment and
endianness issues that this action must tackle.


How can we take it into account if you don't describe what endianness
issues there are? What do the bytes in the string mean? Assuming the
four bytes are a contiguous sequence of bits making up the binary
representation of an integer, you'd still need to know in what order
they're stored before you can turn them into a machine integer.

Theoretically there are 24 separate orderings, but of course the only
ones that matter in practice are big-endian (call this B4 B3 B2 B1) and
little-endian (B1 B2 B3 B4), and maybe some mixed form for 16-bit
architectures (B3 B4 B1 B2 and B2 B1 B4 B3, perverse but not unheard
of). You do not need to know the endianness of the target architecture
to perform the conversion (though it may help for efficiency), but you
do need to know the endianness of the bytes in the string.

For practical approaches, see the "obvious" solutions already posted by
others. It's important to know what problems these solve, and if they
match the problem you described.

S.
Nov 15 '05 #5
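The point that you need the byte order of the string, not of the host, can be sketched as follows. This assumes the four bytes are 8-bit octets. The names load_be32 and load_le32 are made up for this example and appear nowhere in the thread. unsigned long is used because it is guaranteed to hold at least 32 bits.

```c
/* Big-endian input (B4 B3 B2 B1): most significant octet comes first.
   Masking with 0xFF keeps only 8 bits even where CHAR_BIT is larger. */
unsigned long load_be32(const unsigned char *p)
{
    return ((unsigned long)(p[0] & 0xFFu) << 24)
         | ((unsigned long)(p[1] & 0xFFu) << 16)
         | ((unsigned long)(p[2] & 0xFFu) << 8)
         |  (unsigned long)(p[3] & 0xFFu);
}

/* Little-endian input (B1 B2 B3 B4): least significant octet comes first. */
unsigned long load_le32(const unsigned char *p)
{
    return ((unsigned long)(p[3] & 0xFFu) << 24)
         | ((unsigned long)(p[2] & 0xFFu) << 16)
         | ((unsigned long)(p[1] & 0xFFu) << 8)
         |  (unsigned long)(p[0] & 0xFFu);
}
```

With the names from the original question, a call might look like load_be32((const unsigned char *)S + d). Only single bytes are read, so alignment does not come into it, and the result is the same on any host. A mixed order such as B3 B4 B1 B2 would only change which index gets which shift.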
Walter Roberson <ro******@ibd.nrc-cnrc.gc.ca> wrote:
I can't find the clause at the moment, but I'm relatively sure
that the behaviour is undefined to read a union member out of a
union unless it was the same one last written [except for cases
where you are retrieving from the same fundamental types
in union members with common prefixes.]


I believe the clause you are looking for is this one, from 3.3.2.3 of
the draft available at http://dev.unicals.com/papers/c89-draft.html:

"With one exception, if a member of a union object is accessed after a
value has been stored in a different member of the object, the
behavior is implementation-defined." [with the one exception being the
one you pointed out]

--
Christopher Benson-Manica | I *should* know what I'm talking about - if I
ataru(at)cyberspace.org | don't, I need to know. Flames welcome.
Nov 15 '05 #6
Ben Pfaff wrote:
jacob navia <ja***@jacob.remcomp.fr> writes:

James S. Singleton wrote:
Let S be a pointer to a bytestring of length L. I would like to extract 4
bytes from S at the location p = S + d, with 0 < d < L - 4, and store them
into an unsigned int. I am looking for suggestions on how to do this
1) In portable ANSI C.
2) As efficiently as possible.
3) Taking full account of the potential data alignment and
endianness issues that this action must tackle.


typedef union {
unsigned char c[sizeof(unsigned int)];
unsigned int i;
} U;

unsigned int convert(char *S,int d)
{
U u;
memcpy(&u,S+d,sizeof(unsigned int));
return u.i;
}

Why not just this:

unsigned int convert(char *s, int d)
{
unsigned int i;
memcpy(&i, s + d, sizeof i);
return i;
}

Character access is allowed to any type; memcpy() does character access.


Well Ben, you are right :-)

Much shorter, and essentially the same stuff.

jacob
Nov 15 '05 #7
ro******@ibd.nrc-cnrc.gc.ca (Walter Roberson) writes:
In article <43**********************@news.wanadoo.fr>,
jacob navia <ja***@jacob.remcomp.fr> wrote:
typedef union {
unsigned char c[sizeof(unsigned int)];
unsigned int i;
} U;

unsigned int convert(char *S,int d)
{
U u;
memcpy(&u,S+d,sizeof(unsigned int));
return u.i;
}


I can't find the clause at the moment, but I'm relatively sure
that the behaviour is undefined to read a union member out of a
union unless it was the same one last written [except for cases
where you are retrieving from the same fundamental types
in union members with common prefixes.]


My best understanding is that it's debatable whether accessing
a union member other than the last one written results in
undefined behavior or in implementation-defined behavior. An
entry in an (informative) annex lists it as implementation-defined.
Nov 15 '05 #8
Christopher Benson-Manica <at***@nospam.cyberspace.org> writes:
Walter Roberson <ro******@ibd.nrc-cnrc.gc.ca> wrote:
I can't find the clause at the moment, but I'm relatively sure
that the behaviour is undefined to read a union member out of a
union unless it was the same one last written [except for cases
where you are retrieving from the same fundamental types
in union members with common prefixes.]


I believe the clause you are looking for is this one, from 3.3.2.3 of
the draft available at http://dev.unicals.com/papers/c89-draft.html:

"With one exception, if a member of a union object is accessed after a
value has been stored in a different member of the object, the
behavior is implementation-defined." [with the one exception being the
one you pointed out]


This sentence has disappeared from the Standard by now. There is
however a similar statement in an informative annex.
Nov 15 '05 #9
Tim Rentsch <tx*@alumnus.caltech.edu> wrote:
This sentence has disappeared from the Standard by now. There is
however a similar statement in an informative annex.


What, exactly, is the difference between "normative" and
"informative"? IIUC, "informative" is not strictly "standard" - does
that mean that there is no "normative" text specifying how
implementations should deal with union member access?

--
Christopher Benson-Manica | I *should* know what I'm talking about - if I
ataru(at)cyberspace.org | don't, I need to know. Flames welcome.
Nov 15 '05 #10
Christopher Benson-Manica <at***@nospam.cyberspace.org> writes:
What, exactly, is the difference between "normative" and
"informative"?


Normative text is part of the standard.
Informative text, like footnotes, examples, and some appendices,
are not part of the standard. They are for information only.
--
Go not to Usenet for counsel, for they will say both no and yes.
Nov 15 '05 #11
Christopher Benson-Manica <at***@nospam.cyberspace.org> writes:
Tim Rentsch <tx*@alumnus.caltech.edu> wrote:
This sentence has disappeared from the Standard by now. There is
however a similar statement in an informative annex.


What, exactly, is the difference between "normative" and
"informative"? IIUC, "informative" is not strictly "standard" - does
that mean that there is no "normative" text specifying how
implementations should deal with union member access?


Taken from ISO/IEC Directives part 3:

3.4

normative elements

those elements setting out the provisions to which it is
necessary to conform in order to be able to claim compliance
with the standard

A "normative element" must be observed in order to conform to the
standard in question. Any "informative" text is supposed to be
right (and presumably useful), but it does not by itself impose
requirements on whatever is being defined in the standard. Both
normative text and informative text are part of a standard, but
only normative text imposes requirements that must be observed.

There is normative text that gives requirements for accessing
union members, but that text is sprinkled through the rest of the
C Standard. So it isn't easy to tell if the logical consequences
of those requirements imply implementation defined behavior.
Nov 15 '05 #12
Ben Pfaff <bl*@cs.stanford.edu> writes:
Christopher Benson-Manica <at***@nospam.cyberspace.org> writes:
What, exactly, is the difference between "normative" and
"informative"?


Normative text is part of the standard.
Informative text, like footnotes, examples, and some appendices,
are not part of the standard. They are for information only.


Not exactly. Both normative elements and informative elements
are part of a standard, but only normative elements give
provisions that must be observed in order to claim conformance
(to whatever it is that's being standardized).

(I admit it's a minor distinction; I thought some people
might appreciate the clarification.)
Nov 15 '05 #13
Tim Rentsch <tx*@alumnus.caltech.edu> wrote:
(I admit it's a minor distinction; I thought some people
might appreciate the clarification.)


I did - thank you.

--
Christopher Benson-Manica | I *should* know what I'm talking about - if I
ataru(at)cyberspace.org | don't, I need to know. Flames welcome.
Nov 15 '05 #14
Skarmander wrote:

James S. Singleton wrote:
Let S be a pointer to a bytestring of length L.
I would like to extract 4
bytes from S at the location p = S + d, with 0 < d < L - 4,
and store them
into an unsigned int. I am looking for suggestions on how to do this

1) In portable ANSI C.


Impossible if I take you literally, since an unsigned int isn't
guaranteed to be any bigger than 16 bits, and 4 bytes will be 32 bits
(since we're presumably talking about 8-bit bytes, not "C bytes" which
can be larger).


"portable ANSI C" means "C bytes"

You are correct in that it is impossible.

Code doesn't have to be portable to be useful.

Pretending that code is portable when it isn't, is wrong.

--
pete
Nov 15 '05 #15
pete wrote:
<snip>
Code doesn't have to be portable to be useful.

Pretending that code is portable when it isn't, is wrong.


"Portable modulo X" is a meaningful concept. That is, "portable across
all machines where an int is 32 bits" is meaningful. Whether it's
acceptable depends, and nobody could advertise it as "100% portable ISO
C", but it's not "pretending". Just as long as you don't call it
"portable" without qualification.

S.
Nov 15 '05 #16
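One way to make "portable modulo X" explicit is to state X in the source, so that the code refuses to compile where the assumption fails. A rough sketch, assuming the intended X is "8-bit bytes and a 32-bit unsigned int"; the function name read_u32_native is hypothetical.

```c
#include <limits.h>
#include <string.h>

/* State the assumptions instead of silently relying on them. */
#if CHAR_BIT != 8
#error "this code assumes 8-bit bytes"
#endif
#if UINT_MAX != 0xFFFFFFFF
#error "this code assumes a 32-bit unsigned int"
#endif

/* Host-order read; only meaningful under the assumptions checked above. */
unsigned int read_u32_native(const char *s, size_t d)
{
    unsigned int i;
    memcpy(&i, s + d, sizeof i);
    return i;
}
```

On a platform where the checks fail, the build stops with a message naming the broken assumption. That is about as far as the qualification can be taken in the code itself.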
On 2005-10-27, Skarmander <in*****@dontmailme.com> wrote:
pete wrote:
<snip>
Code doesn't have to be portable to be useful.

Pretending that code is portable when it isn't, is wrong.


"Portable modulo X" is a meaningful concept. That is, "portable across
all machines where an int is 32 bits" is meaningful. Whether it's
acceptable depends, and nobody could advertise it as "100% portable ISO
C", but it's not "pretending". Just as long as you don't call it
"portable" without qualification.


And all that's needed in this case is to use long instead of int.
Nov 15 '05 #17
Jordan Abel wrote:

On 2005-10-27, Skarmander <in*****@dontmailme.com> wrote:
pete wrote:
<snip>
Code doesn't have to be portable to be useful.

Pretending that code is portable when it isn't, is wrong.

"Portable modulo X" is a meaningful concept.
That is, "portable across
all machines where an int is 32 bits" is meaningful. Whether it's
acceptable depends,
and nobody could advertise it as "100% portable ISO
C", but it's not "pretending". Just as long as you don't call it
"portable" without qualification.


OK.
And all that's needed in this case is to use long instead of int.
Qualification is still required for that.

For this specification:
I would like to extract 4 bytes and store them into an unsigned int.
1) In portable ANSI C.


sizeof(long) can be less than 4 if CHAR_BIT is greater than 8.

--
pete
Nov 15 '05 #18
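To illustrate the last remark: sizeof counts C bytes of CHAR_BIT bits each, and the standard only promises that unsigned long can represent values up to at least 4294967295. With CHAR_BIT equal to 16, for example, an unsigned long could be just 2 bytes. A small sketch; both function names are invented for the example.

```c
#include <limits.h>
#include <stddef.h>

/* Number of bits occupied by n C bytes on this implementation. */
unsigned long bits_in_c_bytes(size_t n)
{
    return (unsigned long)n * CHAR_BIT;
}

/* 1 if an object of n C bytes fits inside an unsigned long, else 0. */
int c_bytes_fit_in_ulong(size_t n)
{
    return n <= sizeof(unsigned long);
}
```

On a typical platform with 8-bit bytes, c_bytes_fit_in_ulong(4) is true. On a platform with wider bytes it need not be, which is why the qualification is still required.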
