Pointing to high and low bytes of something

My code contains this declaration:

: typedef union {
: word Word;
: struct {
: byte Low;
: byte High;
: } Bytes;
: } reg;

The colons are not part of the declaration.

Assume that 'word' is always a 16-bit unsigned integral type, and that
'byte' is always an 8-bit unsigned integral type ('unsigned short int'
and 'unsigned char' respectively on my implementation).

My understanding, after browsing through previous threads on this and
other newsgroups, is that given a variable Var of type reg, accessing
Var.Word after having assigned values to Var.Bytes.Low and
Var.Bytes.High or, conversely, accessing Var.Bytes.Low and
Var.Bytes.High after having assigned a value to Var.Word, results in
implementation-defined behavior (or possibly undefined behavior).

If it is indeed implementation-defined behavior, my question is: can
the implementation only take the liberty to choose whether
Var.Bytes.Low or Var.Bytes.High will contain the LSB of Var.Word, and
whether Var.Bytes.High or Var.Bytes.Low will contain the MSB, or can
the implementation take other liberties?

Intuitively, I would say that there is more than this (specifically,
that the compiler can insert padding after the first member of the
Bytes struct), but some articles I've read seemed to imply otherwise.

Anyway, it all comes down to: assume that I am willing to sacrifice
portability by forcing the maintainer to exchange the positions of the
two members of Bytes depending on the implementation; do I then have a
guarantee that Var.Bytes.Low will always evaluate to the LSB of
Var.Word, and that Var.Bytes.High will always evaluate to the MSB of
Var.Word?

If not, then I would gladly accept suggestions on how to change my
code.
Keep in mind that I need to access:
1) Var.Word (or its equivalent after the change) by address
2) Var.Bytes.Low (or its equivalent) by address
3) Var.Bytes.High (or its equivalent) by address
to the effect that this code can be modified in a straightforward way
to work as intended:

: #include <stdlib.h>
: #include <stdio.h>
:
: int main(void) {
: reg Var;
: word *VarWordP; /* points at the whole 16-bit value */
: byte *VarLSBP; /* points at the low byte */
: byte *VarMSBP; /* points at the high byte */
: VarWordP=&(Var.Word);
: VarLSBP=&(Var.Bytes.Low);
: VarMSBP=&(Var.Bytes.High);
: *VarWordP=0x1234;
: printf("%x %x %x\n", *VarWordP, *VarLSBP, *VarMSBP);
: return 0;
: }

Assume type reg has been defined as above. I should always get
1234 34 12
as the program's output, save any changes that could be needed in the
printf() format specifiers.
by LjL
lj****@tiscali. it
Nov 13 '05
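
One way to get the guarantee asked for above is to make the two bytes the real storage and compute the 16-bit value with shifts and masks, which operate on values rather than on object layout. The following is only a minimal sketch under the post's assumptions (8-bit 'byte', 16-bit 'word'); reg_bytes, reg_get_word and reg_set_word are hypothetical names that do not appear in the thread.

```c
#include <assert.h>

typedef unsigned char  byte;  /* assumed 8 bits wide, as in the post  */
typedef unsigned short word;  /* assumed 16 bits wide, as in the post */

/* Hypothetical register type: two separately addressable bytes. */
typedef struct {
    byte Low;
    byte High;
} reg_bytes;

/* Build the 16-bit value from the two bytes; no layout assumptions. */
static word reg_get_word(const reg_bytes *r)
{
    assert(r != 0);
    return (word)(((unsigned)r->High << 8) | r->Low);
}

/* Split a 16-bit value into its high and low bytes. */
static void reg_set_word(reg_bytes *r, word w)
{
    assert(r != 0);
    r->Low  = (byte)(w & 0xFFu);
    r->High = (byte)((w >> 8) & 0xFFu);
}
```

The trade-off is that only Low and High can be accessed by address; the word exists only as a computed value, so requirement 1) from the post is met through the two functions rather than through a pointer.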
Dan Pop wrote:

Your observation about LSB and MSB is theoretically correct, but C
implementations on 8-bit bytes machines are supposed to use the
underlying hardware bit order, which never assigns the bits randomly.
The only known variation is the byte order, but not the order of bits
inside a byte.


<pet-peeve>

Unless the machine is able to address objects smaller
than a byte, "the order of bits inside a byte" is undetectable
and need not even be meaningful at all.

An example, from the ever-popular DeathStation 9000.
As everyone knows, some models of the DS9000 use an eleven-
bit byte, and the hardware manual says that those eleven
bits occupy all but one of the vertices of a regular
dodecahedron (the twelfth vertex is reserved for future
expansion).

Another DS9000 model also uses eleven-bit bytes, but
arranges them differently: the ten low-order bits are
stored in five four-state "fits" with the sign bit in a
single two-state device positioned between the second
and third fits:

512, 1 : leftmost fit
256, 2 : second fit
sign : bit
128, 4 : third fit
64, 8 : fourth fit
32, 16 : rightmost fit

The challenge is to devise a C program that can
determine which DS9000 model it is running on. I do not
believe the challenge can be met, and so I assert that
"the order of bits inside a byte" is a vacuous concept
on machines that don't support sub-byte addressing.

</pet-peeve>

--
Er*********@sun .com
Nov 13 '05 #11
In <3F************ ***@sun.com> Eric Sosman <Er*********@su n.com> writes:
Dan Pop wrote:

Your observation about LSB and MSB is theoretically correct, but C
implementations on 8-bit bytes machines are supposed to use the
underlying hardware bit order, which never assigns the bits randomly.
The only known variation is the byte order, but not the order of bits
inside a byte.


<pet-peeve>

Unless the machine is able to address objects smaller
than a byte, "the order of bits inside a byte" is undetectable
and need not even be meaningful at all.


When you map a wider object by an array of bytes, it helps a lot if the
order of bits inside the byte is consistent with the order of bits inside
the wider object. Both hardware designers and C implementors seem to
agree on this point.

Dan
--
Dan Pop
DESY Zeuthen, RZ group
Email: Da*****@ifh.de
Nov 13 '05 #12
Dan Pop wrote:

In <3F************ ***@sun.com> Eric Sosman <Er*********@su n.com> writes:
Dan Pop wrote:

Your observation about LSB and MSB is theoretically correct, but C
implementations on 8-bit bytes machines are supposed to use the
underlying hardware bit order, which never assigns the bits randomly.
The only known variation is the byte order, but not the order of bits
inside a byte.


<pet-peeve>

Unless the machine is able to address objects smaller
than a byte, "the order of bits inside a byte" is undetectable
and need not even be meaningful at all.


When you map a wider object by an array of bytes, it helps a lot if the
order of bits inside the byte is consistent with the order of bits inside
the wider object. Both hardware designers and C implementors seem to
agree on this point.


<topicality degree="straying">

I think you've missed the thrust of the argument, or
perhaps the argument's thrust went wide of you. I'm saying
that (1) bit order is not detectable by any C construct I
can imagine, (2) bit order is not detectable by any CPU
instruction on a machine that lacks bit-level addressing,
(3) by Occam's Razor, that which is undetectable is better
omitted from discussion.

I offered two fanciful examples of situations where bit
order could not really be said to exist at all. For a real-
life example, consider the signals that travel between a pair
of modems. Early modems used two easily-distinguished tones
to transmit individual bits: BEEP for zero and BOOP for one,
as it were. Later, it was found that higher speeds could be
obtained by associating the bits with the transitions between
the tones rather than with the tones themselves. Modern
modems go even further: they use a whole palette of tones
(BEEP, BOOP, BRAAP, BZZZ, ...) and encode a whole bunch of
bits in each transition.

The question: What is the "bit order" of the N bits
encoded by one single BZZZ-to-BEEP transition in this scheme?
Note that all N bits leave the transmitter encoded in one
single event and arrive at the receiver the same way: they
are simultaneous and indivisible -- and I say the entire
idea of "bit order" in such a situation is meaningless.

</topicality>

--
Er*********@sun .com
Nov 13 '05 #13
"Simon Biber" <ne**@ralminNOS PAM.cc> wrote in message news:<3f******* *************** *@news.optusnet .com.au>...
"Dan Pop" <Da*****@cern.c h> wrote:
When you map a wider object by an array of bytes, it helps a
lot if the order of bits inside the byte is consistent with
the order of bits inside the wider object. Both hardware
designers and C implementors seem to agree on this point.


I think Dan gets the point while Eric does not.

Even if we assume that the type unsigned short
(a) is 16 bits
(b) is two bytes
(c) has no padding bits

If I wrote:
unsigned short x = 0x1234;
unsigned char *a = (unsigned char *)&x;

a[0] need not be either 0x12 or 0x34, and a[1] need not be
either 0x12 or 0x34. This is because the value bits can be
stored in a DIFFERENT order for unsigned short compared to
unsigned char.

The value 0x1234 could be mapped into the two bytes as:
a[0] == 0x13
a[1] == 0x24
If we then replace:
a[0] = 0x26
a[1] = 0x48
And then see that
x == 0x2468
I believe that would be a conforming implementation.


Sure, an implementation conforming to my desire to strangle it.
But I can see your point. Do you have any suggestion on how to solve
my puzzle (with bit-masks being the only viable solution I suppose,
given the above)? My problem can be basically summarized as:
1) I am given a 'pointer' to a 'byte' or a 'word' (I know in advance
whether it'll be 'byte' or 'word', so I can branch accordingly). While
my 'pointers' are real pointers ATM, feel free to extend the meaning
of 'pointer' as "anything that uniquely identifies the object it
refers to".
2) I should be able to use the dereferenced pointer both as an
expression value and as an lvalue; I need to assign to it.
3) Whenever I assign to a dereferenced pointer, the value of the
dereferenced pointer itself changes (obviously), but there is at least
another dereferenced pointer among those I can get at point 1) that
changes simultaneously. Specifically, if I assign to a deref. pointer
to 'word', two deref. pointers to two 'byte's will change; if I assign
to a deref. pointer to 'byte', one deref. pointer to 'word' will
change.

In real life, in case it's easier to understand, this translates to:
I am simulating a processor that has some registers called B, C, D, E,
H and L. These are 8-bit. The processor, however, can also treat them
as the 16-bit pairs BC, DE and HL.
Given a simulated machine instruction (which by definition tells me
whether it wants to access a register or a register pair), I can call
a function(said instruction) that returns me a pointer to the operand
- that is, depending on the instruction, a pointer to an 8-bit or to a
16-bit value.
I then use the dereferenced pointer how I see fit.

With a solution that has (B, C, D, E, H) and (BC, DE, HL) as separate
variables, when one group gets modified, the other group does not need
to be synchronized immediately; it can wait until the next loop
iteration, if this helps.

Of course, I do have (more than) a solution: for example, I could
simply use an 'assign' function, instead of the normal C assignment
operator, that takes care of synchronizing the values.
But it's not a solution I like too much, and I was hoping someone here
could find a more elegant one (or, dare I say, a 'more efficient' one,
with the word 'efficiency' being defined vaguely enough - say, as few
dumb synchronize-thee function calls as possible).
On a side note, anyone who has the temper to tell me "keep using your
pointers, no implementation in the next 50 years will mess them up"?
:-)
by LjL
lj****@tiscali. it
Nov 13 '05 #14
On 20 Nov 2003 04:11:38 -0800, lj****@tiscalin et.it (Lorenzo J.
Lucchini) wrote:
Sure, an implementation conforming to my desire to strangle it.
But I can see your point. Do you have any suggestion on how to solve
my puzzle (with bit-masks being the only viable solution I suppose,
given the above)?


Sure, but it's OT here. Stop worrying about it, and include a comment
saying "If this should fail when ported to another implementation,
please call Simon."

--
Al Balmer
Balmer Consulting
re************* ***********@att .net
Nov 13 '05 #15
In <1f************ **************@ posting.google. com> lj****@tiscalin et.it (Lorenzo J. Lucchini) writes:
In real life, in case it's easier to understand, this translates to:
I am simulating a processor that has some registers called B, C, D, E,
H and L. These are 8-bit. The processor, however, can also treat them
as the 16-bit pairs BC, DE and HL.
Given a simulated machine instruction (which by definition tells me
whether it wants to access a register or a register pair), I can call
a function(said instruction) that returns me a pointer to the operand
- that is, depending on the instruction, a pointer to an 8-bit or to a
16-bit value.
I then use the dereferenced pointer how I see fit.

With a solution that has (B, C, D, E, H) and (BC, DE, HL) as separate
variables, when one group gets modified, the other group does not need
to be synchronized immediately, it can wait the next loop iteration,
if this helps.


The "no assumptions" solution is to simply use an array of unsigned char
for storing the values of the individual registers, in the order
B, C, D, E, H, L, padding or F, A. This order is made obvious by the
Z80/8080 instruction encoding.

When you need a register pair, you compute it on the fly:

words[DE] = ((unsigned)regs[D] << 8) + regs[E];

When an instruction has modified a register pair (few Z80 and even fewer
8080 instructions can do that), you update the individual registers:

regs[D] = words[DE] >> 8;
regs[E] = words[DE] & 0xFF;

I also believe that this approach will actually simplify the overall
coding of your simulator, because it allows the register fields
inside the opcodes to be used as indices in the array, so you never have
to figure out what is the variable corresponding to a value of 2 in the
register field, you simply use 2 as an index in the registers array.
The instruction decoding becomes a piece of cake, this way.

The words array doesn't have to be kept in sync at all, except when
simulating a word instruction or indirect addressing via HL (and even
then, only the relevant elements have to be synchronised).

Mapping the registers by words doesn't work well for little endian
platforms, because the right way of doing it (in the framework of your
initial assumptions) would be:

unsigned short words[4];
unsigned char *regs = (unsigned char *)words;

But this would map B into the LSB of BC and C into the MSB of BC, which
is wrong. And you really want to store the registers in the order
defined above.

The union approach may look tempting, but it doesn't fit very well into
the scheme of a simple and efficient emulator. Using the right data
structure and format for the registers is essential for the rest of the
code of the simulator and I believe that my solution, apart from relying
on no assumptions, is also optimal for the rest of the program.

Dan
--
Dan Pop
DESY Zeuthen, RZ group
Email: Da*****@ifh.de
Nov 13 '05 #16
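
The register array described in the post above could be sketched roughly as follows. The names regs, get_pair and set_pair and the REG_/PAIR_ constants are assumptions made for illustration, not code from the thread. Note that in C the + operator binds more tightly than <<, so the shift has to be parenthesized (or combined with |) when the two halves are joined.

```c
#include <assert.h>

/* Register indices in the order B, C, D, E, H, L, (padding or F), A. */
enum { REG_B, REG_C, REG_D, REG_E, REG_H, REG_L, REG_F, REG_A, REG_COUNT };

/* Pair p is made of regs[2 * p] (high byte) and regs[2 * p + 1] (low byte). */
enum { PAIR_BC, PAIR_DE, PAIR_HL, PAIR_COUNT };

static unsigned char regs[REG_COUNT];

/* Compute the 16-bit value of a register pair on the fly. */
static unsigned get_pair(int pair)
{
    assert(pair >= 0 && pair < PAIR_COUNT);
    return ((unsigned)regs[pair * 2] << 8) | regs[pair * 2 + 1];
}

/* Write a 16-bit value back into the two 8-bit registers of a pair. */
static void set_pair(int pair, unsigned value)
{
    assert(pair >= 0 && pair < PAIR_COUNT);
    regs[pair * 2]     = (unsigned char)((value >> 8) & 0xFFu);
    regs[pair * 2 + 1] = (unsigned char)(value & 0xFFu);
}
```

With this layout, &regs[REG_D] still gives a pointer to an 8-bit register, while the pairs are reached only through the two functions, so nothing depends on the host's byte order.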
Da*****@cern.ch (Dan Pop) wrote in message news:<bp******* ***@sunnews.cer n.ch>...
In <1f************ **************@ posting.google. com> lj****@tiscalin et.it (Lorenzo J. Lucchini) writes:
[Registers and register pairs on a Z80 and how to handle a
simulation of them in C]
The "no assumptions" solution is to simply use an array of unsigned char
for storing the values of the individual registers, in the order
B, C, D, E, H, L, padding or F, A. This order is made obvious by the
Z80/8080 instruction encoding.

When you need a register pair, you compute it on the fly:

words[DE] = ((unsigned)regs[D] << 8) + regs[E];

When an instruction has modified a register pair (few Z80 and even fewer
8080 instructions can do that), you update the individual registers:

regs[D] = words[DE] >> 8;
regs[E] = words[DE] & 0xFF;

I also believe that this approach will actually simplify the overall
coding of your simulator, because it allows the register fields
inside the opcodes to be used as indices in the array, so you never have
to figure out what is the variable corresponding to a value of 2 in the
register field, you simply use 2 as an index in the registers array.
The instruction decoding becomes a piece of cake, this way.


While it looks like this approach will require some quite extensive
reworking of my code - which I hoped to avoid -, it does look
extremely interesting. I'll do it.

Clearly, I knew perfectly well that instructions have a register
field, which in turn means it's 'obvious' that there is intrinsicly a
preferred order for the registers... Nevertheless, I don't think I
would have ever thought of putting them in an array. I cannot but
thank you for the suggestion.

I'll probably submit some code for review, when it's in a better
shape.
[snip]


by LjL
lj****@tiscali. it
Nov 13 '05 #17
In <1f************ **************@ posting.google. com> lj****@tiscalin et.it (Lorenzo J. Lucchini) writes:
Da*****@cern.c h (Dan Pop) wrote in message news:<bp******* ***@sunnews.cer n.ch>...
In <1f************ **************@ posting.google. com> lj****@tiscalin et.it (Lorenzo J. Lucchini) writes:
> [Registers and register pairs on a Z80 and how to handle a
> simulation of them in C]


The "no assumptions" solution is to simply use an array of unsigned char
for storing the values of the individual registers, in the order
B, C, D, E, H, L, padding or F, A. This order is made obvious by the
Z80/8080 instruction encoding.

When you need a register pair, you compute it on the fly:

words[DE] = ((unsigned)regs[D] << 8) + regs[E];

When an instruction has modified a register pair (few Z80 and even fewer
8080 instructions can do that), you update the individual registers:

regs[D] = words[DE] >> 8;
regs[E] = words[DE] & 0xFF;

I also believe that this approach will actually simplify the overall
coding of your simulator, because it allows the register fields
inside the opcodes to be used as indices in the array, so you never have
to figure out what is the variable corresponding to a value of 2 in the
register field, you simply use 2 as an index in the registers array.
The instruction decoding becomes a piece of cake, this way.


While it looks like this approach will require some quite extensive
reworking of my code - which I hoped to avoid -, it does look
extremely interesting. I'll do it.


To show you how it works, here's a function covering a quarter of the 8080
opcode space:

#include <assert.h>

#define BC 0
#define DE 1
#define HL 2

/* + binds tighter than <<, so the shift needs its own parentheses */
#define COMP(rp) (((unsigned)regs[(rp) * 2] << 8) + regs[(rp) * 2 + 1])
#define SYNC(rp) (words[rp] = COMP(rp))
#define SPILL(rp) (regs[(rp) * 2] = words[rp] >> 8,\
regs[(rp) * 2 + 1] = words[rp] & 0xFF)

unsigned char regs[8], shadowregs[8], mem[0x10000];
unsigned words[3], ix, iy, pc, sp;

void ld8rr(int opcode)
{
int dest = (opcode & 0x38) >> 3;
int src = opcode & 7;

assert(opcode != 0x76); /* that's the HALT opcode! */

if (dest != 6 && src != 6) {
regs[dest] = regs[src];
return;
}

if (dest == 6) {
mem[COMP(HL)] = regs[src];
return;
}

regs[dest] = mem[COMP(HL)];
return;
}

COMP merely computes the value of a register pair, while SYNC also
synchronizes the respective element of the words array. SPILL is used
to update the regs when a 16-bit instruction has changed the value of a
register pair.

Dan
--
Dan Pop
DESY Zeuthen, RZ group
Email: Da*****@ifh.de
Nov 13 '05 #18
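
To make the field extraction in the function above concrete, here is a small sketch that only decodes the two 3-bit register fields of an opcode in the 0x40..0x7F block. The struct ld_fields and the function decode_ld are hypothetical helpers introduced for illustration; the field positions are the ones used in the post (bits 5..3 for the destination, bits 2..0 for the source).

```c
#include <assert.h>

/* Hypothetical helper type: the two register fields of a load opcode. */
struct ld_fields {
    int dest;  /* bits 5..3 of the opcode */
    int src;   /* bits 2..0 of the opcode */
};

/* Extract both register fields from an opcode in the 8-bit load block. */
static struct ld_fields decode_ld(int opcode)
{
    struct ld_fields f;

    assert(opcode >= 0x40 && opcode <= 0x7F);
    f.dest = (opcode >> 3) & 7;
    f.src  = opcode & 7;
    return f;
}
```

For example, opcode 0x41 decodes to dest 0 and src 1, which index B and C in the register array, while a field value of 6 selects the memory operand addressed by HL instead of a register.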
In article <news:PT******* **********@news read1.news.pas. earthlink.net>
mike gillmore <rm********@hot mail.com> writes:
I have used this little program for many years to discover the machine
endian-ness. Use it in good health.
[most of it snipped, but here is the output line]
printf( " %x %s %x isBigEndian = %s(%d)\n",
*firstBytePtr, isBigEndian ? "!=" : "==", ( unsigned char )testValue,
isBigEndian ? "TRUE" : "FALSE", isBigEndian );


So, which endian-ness does this claim for the PDP-11? Which one
*should* it claim? The PDP-11 has little-endian bytes-in-16-bit-words
and big-endian 16-bit-words-in-32-bit-longs (and also in floats).

If you give up control of endian-ness and choose instead to match
the byte order of your local machine(s), be aware that there are
more than two.
--
In-Real-Life: Chris Torek, Wind River Systems
Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W) +1 801 277 2603
email: forget about it http://web.torek.net/torek/index.html
Reading email is like searching for food in the garbage, thanks to spammers.
Nov 13 '05 #19
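
The point about more than two byte orders can be illustrated with a check that has more than two outcomes. This is a sketch only: it assumes a C99 implementation that provides uint32_t and 8-bit bytes, and the names byte_order, classify_bytes and host_byte_order are made up for this example.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical labels for the layouts this sketch recognizes. */
enum byte_order { ORDER_LITTLE, ORDER_BIG, ORDER_PDP, ORDER_OTHER };

/* Classify the four bytes that store the 32-bit value 0x01020304. */
static enum byte_order classify_bytes(const unsigned char *b)
{
    static const unsigned char little[4] = { 0x04, 0x03, 0x02, 0x01 };
    static const unsigned char big[4]    = { 0x01, 0x02, 0x03, 0x04 };
    /* PDP-11 style: big-endian words, little-endian bytes within a word */
    static const unsigned char pdp[4]    = { 0x02, 0x01, 0x04, 0x03 };

    if (memcmp(b, little, 4) == 0) return ORDER_LITTLE;
    if (memcmp(b, big, 4) == 0)    return ORDER_BIG;
    if (memcmp(b, pdp, 4) == 0)    return ORDER_PDP;
    return ORDER_OTHER;
}

/* Report how the host lays out a 32-bit value in memory. */
static enum byte_order host_byte_order(void)
{
    uint32_t v = UINT32_C(0x01020304);
    unsigned char b[4];

    assert(sizeof v == 4);  /* this sketch assumes 8-bit bytes */
    memcpy(b, &v, sizeof b);
    return classify_bytes(b);
}
```

A test that inspects only the first byte of a 16-bit value would report a PDP-11 as little-endian and miss the word order entirely, which is why the classification here looks at all four bytes.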
In <bp**********@e lf.torek.net> Chris Torek <no****@torek.n et> writes:
If you give up control of endian-ness and choose instead to match
the byte order of your local machine(s), be aware that there are
more than two.


However, the large popularity of the terms "big endian" and "little
endian" is a strong hint that all the others are history.

Dan
--
Dan Pop
DESY Zeuthen, RZ group
Email: Da*****@ifh.de
Nov 13 '05 #20
