Portably extracting data from a bytestring

Let S be a pointer to a bytestring of length L. I would like to extract 4
bytes from S at the location p = S + d, with 0 < d < L - 4, and store them
into an unsigned int. I am looking for suggestions on how to do this

1) In portable ANSI C.
2) As efficiently as possible.
3) Taking full account of the potential data alignment and
endianness issues that this action must tackle.


Nov 15 '05 #1
James S. Singleton wrote:
Let S be a pointer to a bytestring of length L. I would like to extract 4
bytes from S at the location p = S + d, with 0 < d < L - 4, and store them
into an unsigned int. I am looking for suggestions on how to do this

1) In portable ANSI C.
2) As efficiently as possible.
3) Taking full account of the potential data alignment and
endianness issues that this action must tackle.

#include <string.h>  /* memcpy */

typedef union {
    unsigned char c[sizeof(unsigned int)];
    unsigned int i;
} U;

unsigned int convert(char *S, int d)
{
    U u;
    memcpy(&u, S + d, sizeof(unsigned int));
    return u.i;
}

This assumes that an integer was stored at the given location.
The problem is that you did not define what "extract four bytes"
and "store them in an unsigned int" really mean.

If you do not care about alignment (as on the x86 architecture), you
could write:

unsigned int convert(char *S, int d)
{
    U *u;
    u = (U *)(S + d);
    return u->i;
}

This is more efficient, but you could get an alignment trap.

Both assume that
1) You previously stored an integer at that location.
2) You read it back on the same machine architecture.

jacob
Nov 15 '05 #2
jacob navia <ja***@jacob.remcomp.fr> writes:
James S. Singleton wrote:
Let S be a pointer to a bytestring of length L. I would like to extract 4
bytes from S at the location p = S + d, with 0 < d < L - 4, and store them
into an unsigned int. I am looking for suggestions on how to do this
1) In portable ANSI C.
2) As efficiently as possible.
3) Taking full account of the potential data alignment and
endianness issues that this action must tackle.

typedef union {
unsigned char c[sizeof(unsigned int)];
unsigned int i;
} U;

unsigned int convert(char *S,int d)
{
U u;
memcpy(&u,S+d,sizeof(unsigned int));
return u.i;
}


Why not just this:

unsigned int convert(char *s, int d)
{
unsigned int i;
memcpy(&i, s + d, sizeof i);
return i;
}

Character access is allowed to any type; memcpy() does character access.
--
int main(void){char p[]="ABCDEFGHIJKLM NOPQRSTUVWXYZab cdefghijklmnopq rstuvwxyz.\
\n",*q="kl BIcNBFr.NKEzjwC IxNJC";int i=sizeof p/2;char *strchr();int putchar(\
);while(*q){i+= strchr(p,*q++)-p;if(i>=(int)si zeof p)i-=sizeof p-1;putchar(p[i]\
);}return 0;}
Nov 15 '05 #3
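A minimal sketch of the memcpy() suggestion in #3, for reference. The function name read_native_uint and the use of const unsigned char * and size_t are assumptions made here for illustration, not part of the original post. It copies sizeof(unsigned int) bytes (not necessarily 4) and yields the value in the host's own byte order, so it only recovers what was stored if the bytes were written by a machine with the same representation.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy sizeof(unsigned int) bytes starting at s + d into an unsigned int.
   memcpy() accesses the data as characters, so s + d does not need to be
   aligned for unsigned int. The result uses the host's native byte order. */
unsigned int read_native_uint(const unsigned char *s, size_t d)
{
    unsigned int value;

    assert(s != NULL);
    memcpy(&value, s + d, sizeof value);
    return value;
}
```

Calling read_native_uint(buf, 1) on a buffer where an unsigned int was previously memcpy()'d to buf + 1 should return that same value, even though buf + 1 is most likely misaligned.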
In article <43**********************@news.wanadoo.fr>,
jacob navia <ja***@jacob.remcomp.fr> wrote:
James S. Singleton wrote:
Let S be a pointer to a bytestring of length L. I would like to extract 4
bytes from S at the location p = S + d, with 0 < d < L - 4, and store them
into an unsigned int. I am looking for suggestions on how to do this

1) In portable ANSI C.

typedef union {
unsigned char c[sizeof(unsigned int)];
unsigned int i;
} U;

unsigned int convert(char *S,int d)
{
U u;
memcpy(&u,S+d,sizeof(unsigned int));
return u.i;
}


I can't find the clause at the moment, but I'm relatively sure
that the behaviour is undefined to read a union member out of a
union unless it was the same one last written [except for cases
where you are retrieving from the same fundamental types
in union members with common prefixes.]

You are on safer ground if you cast the object pointer to char * .
--
I was very young in those days, but I was also rather dim.
-- Christopher Priest
Nov 15 '05 #4
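One way to read the advice in #4 is to fill the unsigned int through a character pointer instead of going through a union. The following is only a sketch under that interpretation; read_via_char_pointer is a hypothetical name, and unsigned char is used rather than plain char to avoid sign issues. Like memcpy(), it produces a value in the host's native byte order.

```c
#include <assert.h>
#include <stddef.h>

/* Build an unsigned int by writing its bytes one at a time through an
   unsigned char pointer. Character access to any object is permitted,
   so no union is involved and src may be unaligned. */
unsigned int read_via_char_pointer(const unsigned char *src)
{
    unsigned int value = 0;
    unsigned char *dst = (unsigned char *)&value;
    size_t k;

    assert(src != NULL);
    for (k = 0; k < sizeof value; k++)
        dst[k] = src[k];
    return value;
}
```

Passing the address of an existing unsigned int (cast to const unsigned char *) should give back the same value.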
James S. Singleton wrote:
Let S be a pointer to a bytestring of length L. I would like to extract 4
bytes from S at the location p = S + d, with 0 < d < L - 4, and store them
into an unsigned int. I am looking for suggestions on how to do this

1) In portable ANSI C.
Impossible if I take you literally, since an unsigned int isn't
guaranteed to be any bigger than 16 bits, and 4 bytes will be 32 bits
(since we're presumably talking about 8-bit bytes, not "C bytes" which
can be larger).

Make it an unsigned long instead. You could redescribe the problem as
"extracting sizeof(unsigned int) bytes" too, but this is something
different, and it may not be the problem at hand.

Alternatively, you could mean "in ANSI C that's portable save for the
assumption that an unsigned int is 32 bits". This will be acceptable for
the majority of existing platforms, as long as you keep in mind the
limits of portability here.
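As a rough illustration of the width concern, the choice between unsigned int and unsigned long can be made at compile time from <limits.h>. This is a sketch only; u32_holder and check_u32_holder are hypothetical names introduced here.

```c
#include <assert.h>
#include <limits.h>

/* u32_holder (hypothetical name): unsigned int if it can represent every
   value in 0..0xFFFFFFFF, otherwise unsigned long, which the standard
   guarantees to be at least 32 bits wide. */
#if UINT_MAX >= 0xFFFFFFFFUL
typedef unsigned int u32_holder;
#else
typedef unsigned long u32_holder;
#endif

/* Run-time sanity check that the chosen type really holds 32 bits. */
void check_u32_holder(void)
{
    u32_holder all_ones = (u32_holder)0xFFFFFFFFUL;

    assert(all_ones == 0xFFFFFFFFUL);
}
```

On a platform with a 16-bit unsigned int the typedef falls back to unsigned long; on most current platforms it resolves to unsigned int.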
2) As efficiently as possible.
That's the trick, isn't it? The most efficient thing you can do is
obviously just interpreting those 4 bytes as an int through a union. But
that's not guaranteed to work (also see below).
3) Taking full account of the potential data alignment and
endianness issues that this action must tackle.


How can we take it into account if you don't describe what endianness
issues there are? What do the bytes in the string mean? Assuming the
four bytes are a contiguous sequence of bits making up the binary
representation of an integer, you'd still need to know in what order
they're stored before you can turn them into a machine integer.

Theoretically there are 24 separate orderings, but of course the only
ones that matter in practice are big-endian (call this B4 B3 B2 B1) and
little-endian (B1 B2 B3 B4), and maybe some mixed form for 16-bit
architectures (B3 B4 B1 B2 and B2 B1 B4 B3, perverse but not unheard
of). You do not need to know the endianness of the target architecture
to perform the conversion (though it may help for efficiency), but you
do need to know the endianness of the bytes in the string.
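To make that last point concrete, here is a sketch that assembles the value with shifts, so the host's endianness and alignment never enter into it. The function names are made up for illustration, unsigned long is used because it is guaranteed to hold 32 bits, and each byte is masked to 8 bits in case a "C byte" is wider.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes stored most significant first: B4 B3 B2 B1. */
unsigned long load_be32(const unsigned char *p)
{
    assert(p != NULL);
    return ((unsigned long)(p[0] & 0xFFu) << 24) |
           ((unsigned long)(p[1] & 0xFFu) << 16) |
           ((unsigned long)(p[2] & 0xFFu) << 8)  |
            (unsigned long)(p[3] & 0xFFu);
}

/* Bytes stored least significant first: B1 B2 B3 B4. */
unsigned long load_le32(const unsigned char *p)
{
    assert(p != NULL);
    return  (unsigned long)(p[0] & 0xFFu)        |
           ((unsigned long)(p[1] & 0xFFu) << 8)  |
           ((unsigned long)(p[2] & 0xFFu) << 16) |
           ((unsigned long)(p[3] & 0xFFu) << 24);
}

/* Any other ordering: shift[k] is the left shift (0, 8, 16 or 24)
   applied to the byte at p[k]. */
unsigned long load32_shifted(const unsigned char *p,
                             const unsigned int shift[4])
{
    unsigned long value = 0;
    int k;

    assert(p != NULL && shift != NULL);
    for (k = 0; k < 4; k++)
        value |= (unsigned long)(p[k] & 0xFFu) << shift[k];
    return value;
}
```

For the mixed form B3 B4 B1 B2, the shift table would be { 16, 24, 0, 8 }. With the byte sequence 0x12 0x34 0x56 0x78, load_be32 gives 0x12345678 and load_le32 gives 0x78563412 on any host.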

For practical approaches, see the "obvious" solutions already posted by
others. It's important to know what problems these solve, and if they
match the problem you described.

S.
Nov 15 '05 #5
Walter Roberson <ro******@ibd.nrc-cnrc.gc.ca> wrote:
I can't find the clause at the moment, but I'm relatively sure
that the behaviour is undefined to read a union member out of a
union unless it was the same one last written [except for cases
where you are retrieving from the same fundamental types
in union members with common prefixes.]


I believe the clause you are looking for is this one, from 3.3.2.3 of
the draft available at http://dev.unicals.com/papers/c89-draft.html:

"With one exception, if a member of a union object is accessed after a
value has been stored in a different member of the object, the
behavior is implementation-defined." [with the one exception being the
one you pointed out]

--
Christopher Benson-Manica | I *should* know what I'm talking about - if I
ataru(at)cyberspace.org | don't, I need to know. Flames welcome.
Nov 15 '05 #6
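For readers following the union discussion, this is a small sketch of the access pattern in question: a value is stored through member c and then read through member i. Under the C89 draft wording quoted in #6 the result of that read is implementation-defined, so treat this as an illustration of the construct being debated, not as a recommendation. pun_through_union is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>

typedef union {
    unsigned char c[sizeof(unsigned int)];
    unsigned int i;
} U;

/* Store bytes through member c, then read member i. The read accesses a
   member other than the one last stored, which is the case the quoted
   clause covers. */
unsigned int pun_through_union(const unsigned char *src)
{
    U u;
    size_t k;

    assert(src != NULL);
    for (k = 0; k < sizeof u.c; k++)
        u.c[k] = src[k];
    return u.i;
}
```

On common implementations this returns the same native-order value as the memcpy() version, but the guarantee comes from the implementation, not from the clause quoted above.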
Ben Pfaff wrote:
jacob navia <ja***@jacob.remcomp.fr> writes:

James S. Singleton wrote:
Let S be a pointer to a bytestring of length L. I would like to extract 4
bytes from S at the location p = S + d, with 0 < d < L - 4, and store them
into an unsigned int. I am looking for suggestions on how to do this
1) In portable ANSI C.
2) As efficiently as possible.
3) Taking full account of the potential data alignment and
endianness issues that this action must tackle.


typedef union {
unsigned char c[sizeof(unsigned int)];
unsigned int i;
} U;

unsigned int convert(char *S,int d)
{
U u;
memcpy(&u,S+d,sizeof(unsigned int));
return u.i;
}

Why not just this:

unsigned int convert(char *s, int d)
{
unsigned int i;
memcpy(&i, s + d, sizeof i);
return i;
}

Character access is allowed to any type; memcpy() does character access.


Well Ben, you are right :-)

Much shorter, and essentially the same stuff.

jacob
Nov 15 '05 #7
ro******@ibd.nrc-cnrc.gc.ca (Walter Roberson) writes:
In article <43**********************@news.wanadoo.fr>,
jacob navia <ja***@jacob.remcomp.fr> wrote:
typedef union {
unsigned char c[sizeof(unsigned int)];
unsigned int i;
} U;

unsigned int convert(char *S,int d)
{
U u;
memcpy(&u,S+d,sizeof(unsigned int));
return u.i;
}


I can't find the clause at the moment, but I'm relatively sure
that the behaviour is undefined to read a union member out of a
union unless it was the same one last written [except for cases
where you are retrieving from the same fundamental types
in union members with common prefixes.]


My best understanding is that it's debatable whether accessing
a union member other than the last one written results in
undefined behavior or in implementation-defined behavior. An
entry in an (informative) annex lists it as implementation-defined.
Nov 15 '05 #8
Christopher Benson-Manica <at***@nospam.cyberspace.org> writes:
Walter Roberson <ro******@ibd.nrc-cnrc.gc.ca> wrote:
I can't find the clause at the moment, but I'm relatively sure
that the behaviour is undefined to read a union member out of a
union unless it was the same one last written [except for cases
where you are retrieving from the same fundamental types
in union members with common prefixes.]


I believe the clause you are looking for is this one, from 3.3.2.3 of
the draft available at http://dev.unicals.com/papers/c89-draft.html:

"With one exception, if a member of a union object is accessed after a
value has been stored in a different member of the object, the
behavior is implementation-defined." [with the one exception being the
one you pointed out]


This sentence has disappeared from the Standard by now. There is
however a similar statement in an informative annex.
Nov 15 '05 #9
Tim Rentsch <tx*@alumnus.caltech.edu> wrote:
This sentence has disappeared from the Standard by now. There is
however a similar statement in an informative annex.


What, exactly, is the difference between "normative" and
"informative"? IIUC, "informative" is not strictly "standard" - does
that mean that there is no "normative" text specifying how
implementations should deal with union member access?

--
Christopher Benson-Manica | I *should* know what I'm talking about - if I
ataru(at)cyberspace.org | don't, I need to know. Flames welcome.
Nov 15 '05 #10
