Portably extracting data from a bytestring

James S. Singleton

Let S be a pointer to a bytestring of length L. I would like to extract 4
bytes from S at the location p = S + d, with 0 < d < L - 4, and store them
into an unsigned int. I am looking for suggestions on how to do this

1) In portable ANSI C.
2) As efficiently as possible.
3) Taking full account of the potential data alignment and
endianness issues that this action must tackle.

Nov 15 '05 #1

jacob navia

James S. Singleton wrote:

Let S be a pointer to a bytestring of length L. I would like to extract 4
bytes from S at the location p = S + d, with 0 < d < L - 4, and store them
into an unsigned int. I am looking for suggestions on how to do this

1) In portable ANSI C.
2) As efficiently as possible.
3) Taking full account of the potential data alignment and
endianness issues that this action must tackle.

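#include <string.h>

typedef union {
    unsigned char c[sizeof(unsigned int)];
    unsigned int i;
} U;

/* Copy sizeof(unsigned int) bytes from S+d into the union and read them as an unsigned int. */
unsigned int convert(char *S,int d)
{
    U u;
    memcpy(&u,S+d,sizeof(unsigned int));
    return u.i;
}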

This assumes that at the given location an integer was stored.
The problem is that you did not define what "extract four bytes"
and "store them in an unsigned int" really mean.

If you do not care about alignment (as on the x86 architecture), you could write:

unsigned int convert(char *S,int d)
{
    U *u;
    u = (U *)(S+d);
    return u->i;
}

This is more efficient, but you could get an alignment trap.

Both versions assume that
1) an unsigned int was previously stored at that location, and
2) you read it back on the same machine architecture.

jacob

Nov 15 '05 #2

Ben Pfaff

jacob navia <ja***@jacob.remcomp.fr> writes:

James S. Singleton wrote:
Let S be a pointer to a bytestring of length L. I would like to extract 4
bytes from S at the location p = S + d, with 0 < d < L - 4, and store them
into an unsigned int. I am looking for suggestions on how to do this
1) In portable ANSI C.
2) As efficiently as possible.
3) Taking full account of the potential data alignment and
endianness issues that this action must tackle.

typedef union {
    unsigned char c[sizeof(unsigned int)];
    unsigned int i;
} U;

unsigned int convert(char *S,int d)
{
    U u;
    memcpy(&u,S+d,sizeof(unsigned int));
    return u.i;
}

Why not just this:

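#include <string.h>

/* Copy sizeof(unsigned int) bytes from s + d directly into an unsigned int. */
unsigned int convert(char *s, int d)
{
    unsigned int i;
    memcpy(&i, s + d, sizeof i);
    return i;
}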

Character access is allowed to any type; memcpy() does character access.
--
int main(void){char p[]="ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz.\
\n",*q="kl BIcNBFr.NKEzjwCIxNJC";int i=sizeof p/2;char *strchr();int putchar(\
);while(*q){i+=strchr(p,*q++)-p;if(i>=(int)sizeof p)i-=sizeof p-1;putchar(p[i]\
);}return 0;}

Nov 15 '05 #3

Walter Roberson

In article <43**********************@news.wanadoo.fr>,
jacob navia <ja***@jacob.remcomp.fr> wrote:

James S. Singleton wrote:
Let S be a pointer to a bytestring of length L. I would like to extract 4
bytes from S at the location p = S + d, with 0 < d < L - 4, and store them
into an unsigned int. I am looking for suggestions on how to do this

1) In portable ANSI C.

typedef union {
    unsigned char c[sizeof(unsigned int)];
    unsigned int i;
} U;

unsigned int convert(char *S,int d)
{
    U u;
    memcpy(&u,S+d,sizeof(unsigned int));
    return u.i;
}

I can't find the clause at the moment, but I'm relatively sure
that the behaviour is undefined to read a union member out of a
union unless it was the same one last written [except for cases
where you are retrieving from the same fundamental types
in union members with common prefixes.]

You are on safer ground if you cast the object pointer to char *.
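
A minimal sketch of that suggestion, for illustration only: the bytes are written into the unsigned int through a character pointer, which is what memcpy() does as well. The function names are made up here, and unsigned char is used in place of plain char so that byte values are never negative. As with the code quoted above, the result is only meaningful if the bytes were originally the representation of an unsigned int on the same machine.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy sizeof(unsigned int) bytes starting at s + d
   into an unsigned int by writing through an unsigned char pointer.
   No alignment requirement is placed on s + d. */
unsigned int load_uint_bytewise(const unsigned char *s, size_t d)
{
    unsigned int value = 0;
    unsigned char *dst = (unsigned char *)&value;
    size_t k;

    for (k = 0; k < sizeof value; k++)
        dst[k] = s[d + k];
    return value;
}

/* The same operation expressed with memcpy, for comparison. */
unsigned int load_uint_memcpy(const unsigned char *s, size_t d)
{
    unsigned int value;

    memcpy(&value, s + d, sizeof value);
    return value;
}
```

Both functions should return the same value for the same input; neither addresses byte order.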
--
I was very young in those days, but I was also rather dim.
-- Christopher Priest

Nov 15 '05 #4

Skarmander

James S. Singleton wrote:

Let S be a pointer to a bytestring of length L. I would like to extract 4
bytes from S at the location p = S + d, with 0 < d < L - 4, and store them
into an unsigned int. I am looking for suggestions on how to do this

1) In portable ANSI C.

Impossible if I take you literally, since an unsigned int isn't
guaranteed to be any bigger than 16 bits, and 4 bytes will be 32 bits
(since we're presumably talking about 8-bit bytes, not "C bytes" which
can be larger).

Make it an unsigned long instead. You could redescribe the problem as
"extracting sizeof(unsigned int) bytes" too, but this is something
different, and it may not be the problem at hand.

Alternatively, you could mean "in ANSI C that's portable save for the
assumption that an unsigned int is 32 bits". This will be acceptable for
the majority of existing platforms, as long as you keep in mind the
limits of portability here.

2) As efficiently as possible.

That's the trick, isn't it? The most efficient thing you can do is
obviously just interpreting those 4 bytes as an int through a union. But
that's not guaranteed to work (also see below).

3) Taking full account of the potential data alignment and
endianness issues that this action must tackle.

How can we take it into account if you don't describe what endianness
issues there are? What do the bytes in the string mean? Assuming the
four bytes are a contiguous sequence of bits making up the binary
representation of an integer, you'd still need to know in what order
they're stored before you can turn them into a machine integer.

Theoretically there are 24 separate orderings, but of course the only
ones that matter in practice are big-endian (call this B4 B3 B2 B1) and
little-endian (B1 B2 B3 B4), and maybe some mixed form for 16-bit
architectures (B3 B4 B1 B2 and B2 B1 B4 B3, perverse but not unheard
of). You do not need to know the endianness of the target architecture
to perform the conversion (though it may help for efficiency), but you
do need to know the endianness of the bytes in the string.

For practical approaches, see the "obvious" solutions already posted by
others. It's important to know what problems these solve, and if they
match the problem you described.

S.

Nov 15 '05 #5

Christopher Benson-Manica

Walter Roberson <ro******@ibd.nrc-cnrc.gc.ca> wrote:

I can't find the clause at the moment, but I'm relatively sure
that the behaviour is undefined to read a union member out of a
union unless it was the same one last written [except for cases
where you are retrieving from the same fundamental types
in union members with common prefixes.]

I believe the clause you are looking for is this one, from 3.3.2.3 of
the draft available at http://dev.unicals.com/papers/c89-draft.html:

"With one exception, if a member of a union object is accessed after a
value has been stored in a different member of the object, the
behavior is implementation-defined." [with the one exception being the
one you pointed out]

--
Christopher Benson-Manica | I *should* know what I'm talking about - if I
ataru(at)cyberspace.org | don't, I need to know. Flames welcome.

Nov 15 '05 #6

jacob navia

Ben Pfaff wrote:

jacob navia <ja***@jacob.remcomp.fr> writes:

James S. Singleton wrote:
Let S be a pointer to a bytestring of length L. I would like to extract 4
bytes from S at the location p = S + d, with 0 < d < L - 4, and store them
into an unsigned int. I am looking for suggestions on how to do this
1) In portable ANSI C.
2) As efficiently as possible.
3) Taking full account of the potential data alignment and
endianness issues that this action must tackle.

typedef union {
    unsigned char c[sizeof(unsigned int)];
    unsigned int i;
} U;

unsigned int convert(char *S,int d)
{
    U u;
    memcpy(&u,S+d,sizeof(unsigned int));
    return u.i;
}

Why not just this:

unsigned int convert(char *s, int d)
{
    unsigned int i;
    memcpy(&i, s + d, sizeof i);
    return i;
}

Character access is allowed to any type; memcpy() does character access.

Well Ben, you are right :-)

Much shorter, and essentially the same stuff.

jacob

Nov 15 '05 #7

Tim Rentsch

ro******@ibd.nrc-cnrc.gc.ca (Walter Roberson) writes:

In article <43**********************@news.wanadoo.fr>,
jacob navia <ja***@jacob.remcomp.fr> wrote:
typedef union {
    unsigned char c[sizeof(unsigned int)];
    unsigned int i;
} U;

unsigned int convert(char *S,int d)
{
    U u;
    memcpy(&u,S+d,sizeof(unsigned int));
    return u.i;
}

I can't find the clause at the moment, but I'm relatively sure
that the behaviour is undefined to read a union member out of a
union unless it was the same one last written [except for cases
where you are retrieving from the same fundamental types
in union members with common prefixes.]

My best understanding is that it's debatable whether accessing
a union member other than the last one written results in
undefined behavior or in implementation-defined behavior. An
entry in an (informative) annex lists it as implementation-defined.

Nov 15 '05 #8

Tim Rentsch

Christopher Benson-Manica <at***@nospam.cyberspace.org> writes:

Walter Roberson <ro******@ibd.nrc-cnrc.gc.ca> wrote:
I can't find the clause at the moment, but I'm relatively sure
that the behaviour is undefined to read a union member out of a
union unless it was the same one last written [except for cases
where you are retrieving from the same fundamental types
in union members with common prefixes.]

I believe the clause you are looking for is this one, from 3.3.2.3 of
the draft available at http://dev.unicals.com/papers/c89-draft.html:

"With one exception, if a member of a union object is accessed after a
value has been stored in a different member of the object, the
behavior is implementation-defined." [with the one exception being the
one you pointed out]

This sentence has disappeared from the Standard by now. There is
however a similar statement in an informative annex.

Nov 15 '05 #9

Christopher Benson-Manica

Tim Rentsch <tx*@alumnus.caltech.edu> wrote:

This sentence has disappeared from the Standard by now. There is
however a similar statement in an informative annex.

What, exactly, is the difference between "normative" and
"informative"? IIUC, "informative" is not strictly "standard" - does
that mean that there is no "normative" text specifying how
implementations should deal with union member access?

--
Christopher Benson-Manica | I *should* know what I'm talking about - if I
ataru(at)cyberspace.org | don't, I need to know. Flames welcome.

Nov 15 '05 #10

Ben Pfaff

Christopher Benson-Manica <at***@nospam.cyberspace.org> writes:

What, exactly, is the difference between "normative" and
"informative"?

Normative text is part of the standard.
Informative text, like footnotes, examples, and some appendices,
is not part of the standard. It is for information only.
--
Go not to Usenet for counsel, for they will say both no and yes.

Nov 15 '05 #11

Tim Rentsch

Christopher Benson-Manica <at***@nospam.cyberspace.org> writes:

Tim Rentsch <tx*@alumnus.caltech.edu> wrote:
This sentence has disappeared from the Standard by now. There is
however a similar statement in an informative annex.

What, exactly, is the difference between "normative" and
"informative"? IIUC, "informative" is not strictly "standard" - does
that mean that there is no "normative" text specifying how
implementations should deal with union member access?

Taken from ISO/IEC Directives part 3:

3.4

normative elements

those elements setting out the provisions to which it is
necessary to conform in order to be able to claim compliance
with the standard

A "normative element" must be observed in order to conform to the
standard in question. Any "informative" text is supposed to be
right (and presumably useful), but it does not by itself impose
requirements on whatever is being defined in the standard. Both
normative text and informative text are part of a standard, but
only normative text imposes requirements that must be observed.

There is normative text that gives requirements for accessing
union members, but that text is sprinkled through the rest of the
C Standard. So it isn't easy to tell if the logical consequences
of those requirements imply implementation defined behavior.

Nov 15 '05 #12

Tim Rentsch

Ben Pfaff <bl*@cs.stanford.edu> writes:

Christopher Benson-Manica <at***@nospam.cyberspace.org> writes:
What, exactly, is the difference between "normative" and
"informative"?

Normative text is part of the standard.
Informative text, like footnotes, examples, and some appendices,
is not part of the standard. It is for information only.

Not exactly. Both normative elements and informative elements
are part of a standard, but only normative elements give
provisions that must be observed in order to claim conformance
(to whatever it is that's being standardized).

(I admit it's a minor distinction; I thought some people
might appreciate the clarification.)

Nov 15 '05 #13

Christopher Benson-Manica

Tim Rentsch <tx*@alumnus.caltech.edu> wrote:

(I admit it's a minor distinction; I thought some people
might appreciate the clarification.)

I did - thank you.

--
Christopher Benson-Manica | I *should* know what I'm talking about - if I
ataru(at)cyberspace.org | don't, I need to know. Flames welcome.

Nov 15 '05 #14

pete

Skarmander wrote:

James S. Singleton wrote:
Let S be a pointer to a bytestring of length L.
I would like to extract 4
bytes from S at the location p = S + d, with 0 < d < L - 4,
and store them
into an unsigned int. I am looking for suggestions on how to do this

1) In portable ANSI C.

Impossible if I take you literally, since an unsigned int isn't
guaranteed to be any bigger than 16 bits, and 4 bytes will be 32 bits
(since we're presumably talking about 8-bit bytes, not "C bytes" which
can be larger).

"portable ANSI C" means "C bytes"

You are correct in that it is impossible.

Code doesn't have to be portable to be useful.

Pretending that code is portable when it isn't, is wrong.

--
pete

Nov 15 '05 #15

Skarmander

pete wrote:
<snip>

Code doesn't have to be portable to be useful.

Pretending that code is portable when it isn't, is wrong.

"Portable modulo X" is a meaningful concept. That is, "portable across
all machines where an int is 32 bits" is meaningful. Whether it's
acceptable depends, and nobody could advertise it as "100% portable ISO
C", but it's not "pretending". Just as long as you don't call it
"portable" without qualification.
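
One way to state such a qualification in the source itself is sketched below, assuming the qualifications are "unsigned int has at least 32 value bits" and "bytes are 8 bits wide". The typedef name is made up for illustration. The idea is that the code refuses to compile where the assumption fails, rather than silently misbehaving.

```c
#include <limits.h>

/* Refuse to compile where unsigned int cannot hold 32 bits. */
#if UINT_MAX < 0xFFFFFFFF
#error unsigned int is narrower than 32 bits
#endif

/* Refuse to compile where a C byte is not an 8-bit octet. */
#if CHAR_BIT != 8
#error this code assumes 8-bit bytes
#endif

/* Hypothetical name for "an unsigned int we have checked is wide enough". */
typedef unsigned int word32;
```

The checks cost nothing at run time and document exactly what "modulo X" means for this code.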

S.

Nov 15 '05 #16

Jordan Abel

On 2005-10-27, Skarmander <in*****@dontmailme.com> wrote:

pete wrote:
<snip>
Code doesn't have to be portable to be useful.

Pretending that code is portable when it isn't, is wrong.

"Portable modulo X" is a meaningful concept. That is, "portable across
all machines where an int is 32 bits" is meaningful. Whether it's
acceptable depends, and nobody could advertise it as "100% portable ISO
C", but it's not "pretending". Just as long as you don't call it
"portable" without qualification.

And all that's needed in this case is to use long instead of int.

Nov 15 '05 #17

pete

Jordan Abel wrote:

On 2005-10-27, Skarmander <in*****@dontmailme.com> wrote:
pete wrote:
<snip>
Code doesn't have to be portable to be useful.

Pretending that code is portable when it isn't, is wrong.

"Portable modulo X" is a meaningful concept.
That is, "portable across
all machines where an int is 32 bits" is meaningful. Whether it's
acceptable depends,
and nobody could advertise it as "100% portable ISO
C", but it's not "pretending". Just as long as you don't call it
"portable" without qualification.

OK.
And all that's needed in this case is to use long instead of int.
Qualification is still required for that.

For this specification:
I would like to extract 4 bytes and store them into an unsigned int.
1) In portable ANSI C.

sizeof(long) can be less than 4 if CHAR_BIT is greater than 8.
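
A small sketch of the arithmetic behind that statement; the function name is hypothetical. It computes how many C bytes are needed to hold a given number of bits. With CHAR_BIT equal to 8, 32 bits need 4 bytes; with CHAR_BIT equal to 16 they need only 2, which is how sizeof(long) can be less than 4 while long still holds 32 bits.

```c
#include <limits.h>
#include <stddef.h>

/* Number of C bytes (each CHAR_BIT bits wide) needed to hold `bits` bits,
   rounded up. */
size_t c_bytes_for_bits(unsigned bits)
{
    return (bits + CHAR_BIT - 1) / CHAR_BIT;
}
```

Because unsigned long is required to represent values up to at least 4294967295, c_bytes_for_bits(32) never exceeds sizeof(unsigned long), whatever CHAR_BIT is.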

--
pete

Nov 15 '05 #18
