Problem with asm

Hello!

I got the following Code in Assembler (NASM), which prints out "5" in
realmode:

mov ax, 0xB800
mov es, ax
mov byte [es:0], '5'

I want to do the same in gcc now, but I'm stuck. GCC doesn't like the [es:0]
syntax...

asm("mov %ax, 0xB800");
asm("mov %es, %ax");
asm("mov [%es:0], '5'"); <-- Error about "[" :-(

Can anyone help me how to do this?

Or can I also do this directly in C with pointers and don't use the asm() at
all? I couldn't get this to work either...

Thanks for any help!

Patrik

Nov 14 '05 #1
Patrik Huber <pa**********@balcab.ch> scribbled the following:
>Hello!
>
>I got the following Code in Assembler (NASM), which prints out "5" in
>realmode:
>
>mov ax, 0xB800
>mov es, ax
>mov byte [es:0], '5'
>
>I want to do the same in gcc now, but I'm stuck. GCC doesn't like the [es:0]
>syntax...
>
>asm("mov %ax, 0xB800");
>asm("mov %es, %ax");
>asm("mov [%es:0], '5'"); <-- Error about "[" :-(
>
>Can anyone help me how to do this?

What makes you think a question about assembly language has anything to
do with C? Why not try Fortran or COBOL questions while you're at it?

>Or can I also do this directly in C with pointers and don't use the asm() at
>all? I couldn't get this to work either...


Trying my best to read your assembly code, I figure you want to do
this...
volatile char *p = (volatile char *)0xB8000;
*p = '5';
Which means storing the byte value corresponding to the character '5'
at the address 0xB8000, which is the linear address that the real-mode
segment:offset pair B800:0000 refers to.

Be aware, though, that the above causes undefined behaviour by
indirecting through an absolute address. This might work on your
platform, but it might cause your program to segfault, or worse,
crash the entire computer, on some other platforms.

--
/-- Joona Palaste (pa*****@cc.helsinki.fi) ------------- Finland --------\
\-- http://www.helsinki.fi/~palaste --------------------- rules! --------/
Nov 14 '05 #2
On Sun, 22 Aug 2004 21:28:30 +0200, "Patrik Huber"
<pa**********@balcab.ch> wrote in comp.lang.c:
>Hello!
>
>I got the following Code in Assembler (NASM), which prints out "5" in
>realmode:

And why do you think your question belongs in comp.lang.c? Assembly
language is not C.

[snip off-topic code]

>Can anyone help me how to do this?

Yes, the people in the proper newsgroup, which would be
news:comp.lang.asm.x86 almost certainly can. But you had better
specify what operating system you are using. Unless it is something
old like MS-DOS, most modern operating systems will not let that code
work even if you can get it to build.

>Or can I also do this directly in C with pointers and don't use the asm() at
>all? I couldn't get this to work either...


C allows assigning an arbitrary address, represented as an integer
type, to a pointer with an appropriate cast. The result is
implementation-defined. Attempting to use the pointer to read or
write is completely undefined. So you need to try the assembly
language group.
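
To illustrate the distinction (a sketch added for clarity, not part of the original post): the one integer-to-pointer conversion the standard does pin down is the round trip of a valid object pointer through uintptr_t, on implementations that provide that optional C99 type. The helper name survives_round_trip is made up for this example.

```c
#include <stdint.h>

/* Converts a valid pointer to an integer and back. C99 guarantees the
   result compares equal to the original when going through void * and
   uintptr_t. Converting some unrelated integer (such as 0xB8000) to a
   pointer gives an implementation-defined result instead. */
static int survives_round_trip(void *p)
{
    uintptr_t bits = (uintptr_t)p;   /* pointer -> integer */
    void *back = (void *)bits;       /* integer -> pointer */
    return back == p;
}
```

Calling survives_round_trip(&x) for any object x yields 1. Nothing comparable is promised for an address that was never obtained from a pointer in the first place.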

--
Jack Klein
Home: http://JK-Technology.Com
FAQs for
comp.lang.c http://www.eskimo.com/~scs/C-faq/top.html
comp.lang.c++ http://www.parashift.com/c++-faq-lite/
alt.comp.lang.learn.c-c++
http://www.contrib.andrew.cmu.edu/~a...FAQ-acllc.html
Nov 14 '05 #3
Jack Klein wrote:

[snip]
Or can I also do this directly in C with pointers and don't use the asm() at
all? I couldn't get this to work either...

C allows assigning an arbitrary address, represented as an integer
type, to a pointer with an appropriate cast. The result is
implementation-defined. Attempting to use the pointer to read or
write is completely undefined. So you need to try the assembly
language group.


It may be undefined, but I suspect every computer system you use
is doing it constantly.

Hard to believe something so undefined is relied on to work.
Programmers must be a gullible lot. ;-)

I suspect the guys in comp.arch.embedded or an OS group could
give the OP a hand to convert his program to C (undefined, of course!)

I agree that it is implementation defined, but any implementation
that didn't do something reasonable would be considered broken.

-Rich
--
Richard Pennington
Email: ri**@pennware.com
http://www.pennware.com ftp://ftp.pennware.com

Nov 14 '05 #4
Richard Pennington <ri**@pennware.com> scribbled the following:
>Jack Klein wrote:
>[snip]
>>>Or can I also do this directly in C with pointers and don't use the asm() at
>>>all? I couldn't get this to work either...
>>
>>C allows assigning an arbitrary address, represented as an integer
>>type, to a pointer with an appropriate cast. The result is
>>implementation-defined. Attempting to use the pointer to read or
>>write is completely undefined. So you need to try the assembly
>>language group.
>
>It may be undefined, but I suspect every computer system you use
>is doing it constantly.
>
>Hard to believe something so undefined is relied on to work.
>Programmers must be a gullible lot. ;-)

No one ever said using an arbitrary absolute address was undefined.
Doing it in C is, though.

>I suspect the guys in comp.arch.embedded or an OS group could
>give the OP a hand to convert his program to C (undefined, of course!)
>
>I agree that it is implementation defined, but any implementation
>that didn't do something reasonable would be considered broken.


Of course an implementation can, and frequently does, consistently do
things that cause undefined behaviour by the C standard. It would be
difficult to make an OS or an embedded device otherwise.
But this still doesn't mean that it has to be defined in C, too. The C
language and the implementation it runs on are two different things.
The C standard committee does not want to tie C down to platforms where
using arbitrary absolute addresses has a specific meaning, so they
leave this behaviour as undefined.
Undefined behaviour does not mean "must crash", "must cause an error"
or "must have unpredictable results". Replace "must" with "can" and you
are closer to the real meaning. Nothing is preventing anyone from
making a C implementation where arbitrary absolute addresses work in
the exact same way as they do in the underlying OS or hardware. OTOH,
nothing is preventing anyone from making one where they don't.

--
/-- Joona Palaste (pa*****@cc.helsinki.fi) ------------- Finland --------\
\-- http://www.helsinki.fi/~palaste --------------------- rules! --------/
"O pointy birds, O pointy-pointy. Anoint my head, anointy-nointy."
- Dr. Michael Hfuhruhurr
Nov 14 '05 #5


Joona I Palaste wrote:

[snip - lots of good stuff, including my ;-)]
I agree that it is implementation defined, but any implementation
that didn't do something reasonable would be considered broken.

Of course an implementation can, and frequently does, consistently do
things that cause undefined behaviour by the C standard. It would be
difficult to make an OS or an embedded device otherwise.
But this still doesn't mean that it has to be defined in C, too. The C
language and the implementation it runs on are two different things.
The C standard committee does not want to tie C down to platforms where
using arbitrary absolute addresses has a specific meaning, so they
leave this behaviour as undefined.
Undefined behaviour does not mean "must crash", "must cause an error"
or "must have unpredictable results". Replace "must" with "can" and you
are closer to the real meaning. Nothing is preventing anyone from
making a C implementation where arbitrary absolute addresses work in
the exact same way as they do in the underlying OS or hardware. OTOH,
nothing is preventing anyone from making one where they don't.


Nothing prevents someone from making an implementation where absolute
addresses don't work as people expect. That's true. The implementor
might be sad when no one uses the compiler, however.

-Rich
--
Richard Pennington
Email: ri**@pennware.com
http://www.pennware.com ftp://ftp.pennware.com

Nov 14 '05 #6
Mac
On Sun, 22 Aug 2004 20:33:52 +0000, Richard Pennington wrote:
Jack Klein wrote:

[snip]
Or can I also do this directly in C with pointers and don't use the asm() at
all? I couldn't get this to work either...

C allows assigning an arbitrary address, represented as an integer
type, to a pointer with an appropriate cast. The result is
implementation-defined. Attempting to use the pointer to read or
write is completely undefined. So you need to try the assembly
language group.


It may be undefined, but I suspect every computer system you use
is doing it constantly.

Hard to believe something so undefined is relied on to work.
Programmers must be a gullible lot. ;-)

I suspect the guys in comp.arch.embedded or an OS group could
give the OP a hand to convert his program to C (undefined, of course!)

I agree that it is implementation defined, but any implementation
that didn't do something reasonable would be considered broken.


I think I have to disagree. What you say may be OK for kernels and what
not, but it is certainly not true for application code on systems with
memory management. You cannot assign an absolute address (a hardware
address, if you will) to a pointer and then dereference it.

You seem to be somewhat knowledgeable, and maybe you already realize this,
but you give the impression that it is perfectly normal to use hardware
addresses in any old C code.
-Rich


--Mac

Nov 14 '05 #7
Mac wrote:
I think I have to disagree. What you say may be OK for kernels and what
not, but it is certainly not true for application code on systems with
memory management. You cannot assign an absolute address (a hardware
address, if you will) to a pointer and then dereference it.

You seem to be somewhat knowledgeable, and maybe you already realize this,
but you give the impression that it is perfectly normal to use hardware
addresses in any old C code.

-Rich

--Mac


It could be OK in application code also. Imagine a system that memory-maps
a peripheral into a process's address space. A video frame
buffer, for example.

I agree that this is not normally the case in application code.

I suspect the OP was targeting some kind of system where memory access
is allowed, especially since he mentioned "real mode".

There is nothing in the C standard that says C should only be used for
application code. I suspect, in numbers of processors running C code,
the reverse is true: There are probably many more C programs that *can*
access memory arbitrarily than can't.

My argument is meaningless, of course. ;-) I'm talking about all the
bazillions of embedded microcontrollers.

As for being somewhat knowledgeable, I don't know about that.
I've been doing embedded programming for 26 years and writing
compilers for 25 years. I do tend to forget the operating system
from time to time. Must be getting old.

-Rich

--
Richard Pennington
Email: ri**@pennware.com
http://www.pennware.com ftp://ftp.pennware.com

Nov 14 '05 #8
This is a bit off-topic in comp.lang.c, so I'm mailing my reply directly
to you instead.

On Sun, 2004-08-22 at 21:28 +0200, Patrik Huber wrote:
Hello!

I got the following Code in Assembler (NASM), which prints out "5" in
realmode:

mov ax, 0xB800
mov es, ax
mov byte [es:0], '5'

I want to do the same in gcc now, but I'm stuck. GCC doesn't like the [es:0]
syntax...

asm("mov %ax, 0xB800");
asm("mov %es, %ax");
asm("mov [%es:0], '5'"); <-- Error about "[" :-(

Can anyone help me how to do this?
That's just so wrong on so many levels... sorry, but it really is.

First of all, the reason GCC is complaining to begin with is that gas
uses AT&T syntax rather than the Intel-style syntax of NASM: operands
are written source first and destination last, registers take a "%"
prefix, immediates take a "$" prefix, and memory operands are written as
displacement(base,index,scale) without square brackets. If I recall
correctly, a bare absolute address like that one is written "0(,1)". In
fact, there's not a single part of that assembly code that is correct.
You should read about the differences in the gas texinfo, under the
"Machine Dependencies" section, "i386-Dependent" subsection.
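
To make the syntax difference concrete, here is a small sketch (an addition, assuming GCC or a compatible compiler targeting x86; store_byte is a hypothetical helper name) that stores one byte through an ordinary pointer using extended asm. It touches neither segment registers nor video memory, so it can be tried in a normal user program. On other compilers or architectures it falls back to a plain C store.

```c
#include <stdint.h>

/* Store one byte through a pointer.
   AT&T syntax: the source operand (%1) comes first, the destination
   (%0) last. The compiler fills in the register and memory operands. */
static void store_byte(volatile uint8_t *dst, uint8_t value)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    __asm__ volatile ("movb %1, %0"
                      : "=m" (*dst)     /* output: the byte in memory   */
                      : "q" (value));   /* input: a byte-sized register */
#else
    *dst = value;                       /* portable fallback */
#endif
}
```

Letting the compiler pick the operands through constraints avoids hard-coding registers such as %ax, which the compiler may already be using for something else.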

Second, that code assumes that your program is running in real mode,
which it won't - GCC only compiles 32-bit code. After that, how to do
what you want depends on the operating system the program will be
running on. Many systems won't even allow you to do that, for
multitasking protection purposes. If you run that program on Linux, an
NT kernel (that is, WinNT4, Win2k or WinXP) or some other x86 UN*X,
for example, it will crash and burn.

If you're compiling it for DOS with DJGPP to run under some DPMI
interface like DOS4GW or a Win9x kernel, it's possible to get it to do
what you want, but it was quite some time since I programmed under DPMI
interfaces, so I don't really recall all the details. IIRC, you still
have to unprotect that memory and add it to your segment. When you have
done so, that memory will be available on the linear address that
corresponds to the real-mode address that you wish to access. B800:0000
in real mode corresponds to the linear address 000B8000. This is because
real mode addresses a 20-bit address bus by shifting the segment
register four bits to the left and adding the offset to that to produce
an address bus value.
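
As a sketch of that arithmetic (added for illustration; real_mode_linear is a made-up name, and the function does not model the 20-bit wraparound of the original 8086 address bus):

```c
#include <stdint.h>

/* Real-mode address translation: shift the 16-bit segment left by four
   bits (multiply by 16) and add the 16-bit offset. */
static uint32_t real_mode_linear(uint16_t segment, uint16_t offset)
{
    return ((uint32_t)segment << 4) + offset;
}
```

With this, real_mode_linear(0xB800, 0x0000) gives 0xB8000. The pair B000:8000 gives the same value, since many segment:offset pairs name the same linear address.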

If you manage to look up somewhere how to unprotect the memory (it's
somewhere in the DJGPP manual), the following assembly code would
accomplish your purpose:

asm("movb $0x35, 0xb8000(,1)");

You mustn't touch the segment registers in protected mode unless you
really know what you're doing, since in protected mode, the segment
registers cease to be segment registers, and are instead selector
registers, for selecting the segment descriptor you wish to operate
through (a segment in protected mode is _not_ the same thing as in real
mode). For more info on this, I suggest reading "Intel Architecture
Software Developer's Manual, Volume 3: System Programming", published by
Intel, order number 243192, downloadable as PDF through Intel's website.
Or can I also do this directly in C with pointers and don't use the asm() at
all? I couldn't get this to work either...


Indeed:

/* one text-mode cell: the character, then its colour attribute */
volatile struct {
    char glyph, color;
} *textmem = (volatile void *)0x000b8000;
textmem[0].glyph = '5';

That will accomplish the same as the assembly code I gave above. Of
course, since it will most likely yield the exact same assembler output,
it is subject to the same operating system and protection constraints as
described above.
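
For experimenting without touching real video memory, the same cell layout can be exercised on an ordinary array. This is a sketch added for illustration: the 80x25 screen size is an assumption about the usual colour text mode (it is not stated above), and text_cell and put_glyph are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

/* One text-mode cell: character code followed by colour attribute. */
struct text_cell {
    char glyph;
    uint8_t color;
};

/* Assumed screen geometry (80 columns by 25 rows). */
enum { TEXT_COLS = 80, TEXT_ROWS = 25 };

/* Write one character with its attribute at (row, col).
   Returns 0 on success, -1 if the position is off screen. */
static int put_glyph(volatile struct text_cell *screen,
                     unsigned row, unsigned col,
                     char glyph, uint8_t color)
{
    volatile struct text_cell *cell;

    if (row >= TEXT_ROWS || col >= TEXT_COLS)
        return -1;

    cell = &screen[(size_t)row * TEXT_COLS + col];
    cell->glyph = glyph;
    cell->color = color;
    return 0;
}
```

On a system where the text buffer really is addressable, the screen argument would be the pointer built from 0x000b8000; in a normal user program it can be any array of TEXT_COLS * TEXT_ROWS cells.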

Fredrik Tolf
Nov 14 '05 #9
On Mon, 2004-08-23 at 04:40 +0200, Fredrik Tolf wrote:
This is a bit off-topic in comp.lang.c, so I'm mailing my reply directly
to you instead.


Oops - Sorry about that. It seems Evolution didn't really do what I
thought. Pressing the "Reply to Sender" button posted back to the
newsgroup instead of mailing to the original author, as I had expected
it to.

Sorry for posting off-topic to the group.

Fredrik Tolf
Nov 14 '05 #10
On Sun, 22 Aug 2004 20:52:05 GMT
Richard Pennington <ri**@pennware.com> wrote:
Joona I Palaste wrote:

[snip - lots of good stuff, including my ;-)]
I agree that it is implementation defined, but any implementation
that didn't do something reasonable would be considered broken.

Of course an implementation can, and frequently does, consistently
do things that cause undefined behaviour by the C standard. It would
be difficult to make an OS or an embedded device otherwise.
But this still doesn't mean that it has to be defined in C, too. The
C language and the implementation it runs on are two different
things. The C standard committee does not want to tie C down to
platforms where using arbitrary absolute addresses has a specific
meaning, so they leave this behaviour as undefined.
Undefined behaviour does not mean "must crash", "must cause an
error" or "must have unpredictable results". Replace "must" with
"can" and you are closer to the real meaning. Nothing is preventing
anyone from making a C implementation where arbitrary absolute
addresses work in the exact same way as they do in the underlying OS
or hardware. OTOH, nothing is preventing anyone from making one
where they don't.


Nothing prevents someone from making an implementation where absolute
addresses don't work as people expect. That's true. The implementor
might be sad when no one uses the compiler, however.


Try doing it with Microsoft Visual C++ either in C or C++ in a normal
application and watch it crash. Try doing it with gcc on either a Unix
derivative or a Windows NT derivative and again watch it crash.

Most modern operating systems prevent applications from accessing
arbitrary addresses (including memory mapped hardware) for very good
reasons. So most code running on a computer does not and cannot use
pointers to arbitrary locations.

Software which deals directly with the hardware (such as the OS, device
drivers, or parts of embedded applications) is written using documented
extensions to C or in some other language, where one documented
extension is how to access the HW directly.
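
In practice the access itself is usually spelled as a load or store through a volatile-qualified pointer built from an integer address, with the implementation's documentation saying what that address means. A sketch (added here for illustration; REG8, reg_write8 and reg_read8 are hypothetical names), meant to be exercised on an ordinary variable rather than a real device register:

```c
#include <stdint.h>

/* Hypothetical helper: turn an integer address into a pointer to an
   8-bit device register. What the address refers to is decided by the
   implementation, not by the C standard. */
#define REG8(addr) ((volatile uint8_t *)(uintptr_t)(addr))

/* volatile makes the compiler perform each access as written instead of
   caching the value or eliminating the access. */
static inline void reg_write8(volatile uint8_t *reg, uint8_t value)
{
    *reg = value;
}

static inline uint8_t reg_read8(const volatile uint8_t *reg)
{
    return *reg;
}
```

Whether a given address can actually be accessed this way is still up to the OS and hardware, as described above.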
--
Flash Gordon
Sometimes I think shooting would be far too good for some people.
Although my email address says spam, it is real and I read it.
Nov 14 '05 #11
Flash Gordon wrote:
[snip]

Nothing prevents someone from making an implementation where absolute
addresses don't work as people expect. That's true. The implementor
might be sad when no one uses the compiler, however.

Try doing it with Microsoft Visual C++ either in C or C++ in a normal
application and watch it crash. Try doing it with gcc on either a Unix
derivative or a Windows NT derivative and again watch it crash.

Most modern operating systems prevent applications from accessing
arbitrary addresses (including memory mapped hardware) for very good
reasons. So most code running on a computer does not and cannot use
pointers to arbitrary locations.

Software which deals directly with the hardware (such as the OS, device
drivers, or parts of embedded applications) is written using documented
extensions to C or in some other language, where one documented
extension is how to access the HW directly.


I do it all the time. You do it all the time.
Linux, NetBSD, FreeBSD, Windows (I suspect) are all mostly or completely
written in C. Chances are you're running at least one of those OSs.

The code you run every day.

There are no extensions being used (for memory access at least). Just
implementation defined behavior.

I understand your point about memory protection. Even in a memory
protected system the compiler is doing the "right" thing. It is the
OS that is trapping the illegal access. The compiler still happily
attempts it.

-Rich

--
Richard Pennington
Email: ri**@pennware.com
http://www.pennware.com ftp://ftp.pennware.com

Nov 14 '05 #12
Richard Pennington <ri**@pennware.com> scribbled the following:
>Flash Gordon wrote:
>[snip]
>>>Nothing prevents someone from making an implementation where absolute
>>>addresses don't work as people expect. That's true. The implementor
>>>might be sad when no one uses the compiler, however.
>>
>>Try doing it with Microsoft Visual C++ either in C or C++ in a normal
>>application and watch it crash. Try doing it with gcc on either a Unix
>>derivative or a Windows NT derivative and again watch it crash.
>>
>>Most modern operating systems prevent applications from accessing
>>arbitrary addresses (including memory mapped hardware) for very good
>>reasons. So most code running on a computer does not and cannot use
>>pointers to arbitrary locations.
>>
>>Software which deals directly with the hardware (such as the OS, device
>>drivers, or parts of embedded applications) is written using documented
>>extensions to C or in some other language, where one documented
>>extension is how to access the HW directly.
>
>I do it all the time. You do it all the time.
>Linux, NetBSD, FreeBSD, Windows (I suspect) are all mostly or completely
>written in C. Chances are you're running at least one of those OSs.

Not completely. Parts of them are written in assembly language. It is
pretty much impossible to write a real-world (in contrast to simulated)
OS in pure C.

>The code you run every day.
>
>There are no extensions being used (for memory access at least). Just
>implementation defined behavior.
>
>I understand your point about memory protection. Even in a memory
>protected system the compiler is doing the "right" thing. It is the
>OS that is trapping the illegal access. The compiler still happily
>attempts it.


This still does not change the fact that using arbitrary absolute memory
addresses in C causes undefined behaviour. This means that the C
language does not define anything about the behaviour. The underlying
implementation can define this behaviour, but it does not have to.
If you are really upset about arbitrary absolute memory access causing
undefined behaviour, take the issue up at comp.std.c and submit a
change proposal to the standard. Here at comp.lang.c we stick by what
the standard says.

--
/-- Joona Palaste (pa*****@cc.helsinki.fi) ------------- Finland --------\
\-- http://www.helsinki.fi/~palaste --------------------- rules! --------/
"Stronger, no. More seductive, cunning, crunchier the Dark Side is."
- Mika P. Nieminen
Nov 14 '05 #13
Joona I Palaste wrote:
Richard Pennington <ri**@pennware.com> scribbled the following: [snip]
I do it all the time. You do it all the time.
Linux, NetBSD, FreeBSD, Windows (I suspect) are all mostly or completely
written in C. Chances are you're running at least one of those OSs.

Not completely. Parts of them are written in assembly language. It is
pretty much impossible to write a real-world (in contrast to simulated)
OS in pure C.


The examples I gave (with the possible exception of Windows, I haven't
seen the source) have very little assembly language used in their
implementation. Usually startup code, context switching, etc. and
very little else.

I think all of them, with the possible exception of Windows, are
real-world OSs.
The code you run every day.


There are no extensions being used (for memory access at least). Just
implementation defined behavior.


I understand your point about memory protection. Even in a memory
protected system the compiler is doing the "right" thing. It is the
OS that is trapping the illegal access. The compiler still happily
attempts it.

This still does not change the fact that using arbitrary absolute memory
addresses in C causes undefined behaviour. This means that the C
language does not define anything about the behaviour. The underlying
implementation can define this behaviour, but it does not have to.
If you are really upset about arbitrary absolute memory access causing
undefined behaviour, take the issue up at comp.std.c and submit a
change proposal to the standard. Here at comp.lang.c we stick by what
the standard says.


I'm not upset about anything. I think that many people don't understand
how the real world works. We compiler and OS writers will wink and nod
at each other and continue to rely on undefined behavior.

I do agree that this is off topic here. I'll try to let it drop.

-Rich

--
Richard Pennington
Email: ri**@pennware.com
http://www.pennware.com ftp://ftp.pennware.com

Nov 14 '05 #14
Richard Pennington <ri**@pennware.com> scribbled the following:
>Joona I Palaste wrote:
>>Richard Pennington <ri**@pennware.com> scribbled the following:
>[snip]
>>>I do it all the time. You do it all the time.
>>>Linux, NetBSD, FreeBSD, Windows (I suspect) are all mostly or completely
>>>written in C. Chances are you're running at least one of those OSs.
>>
>>Not completely. Parts of them are written in assembly language. It is
>>pretty much impossible to write a real-world (in contrast to simulated)
>>OS in pure C.
>
>The examples I gave (with the possible exception of Windows, I haven't
>seen the source) have very little assembly language used in their
>implementation. Usually startup code, context switching, etc. and
>very little else.

If they contain any assembly language at all, they're not pure C. Pure C
means you can't even use non-standard libraries that have been written
in assembly language.

>I think all of them, with the possible exception of Windows, are
>real-world OSs.

Even Windows is a real-world OS.

>>>The code you run every day.
>>>
>>>There are no extensions being used (for memory access at least). Just
>>>implementation defined behavior.
>>>
>>>I understand your point about memory protection. Even in a memory
>>>protected system the compiler is doing the "right" thing. It is the
>>>OS that is trapping the illegal access. The compiler still happily
>>>attempts it.
>>
>>This still does not change the fact that using arbitrary absolute memory
>>addresses in C causes undefined behaviour. This means that the C
>>language does not define anything about the behaviour. The underlying
>>implementation can define this behaviour, but it does not have to.
>>If you are really upset about arbitrary absolute memory access causing
>>undefined behaviour, take the issue up at comp.std.c and submit a
>>change proposal to the standard. Here at comp.lang.c we stick by what
>>the standard says.
>
>I'm not upset about anything. I think that many people don't understand
>how the real world works. We compiler and OS writers will wink and nod
>at each other and continue to rely on undefined behavior.

If you are writing your own OS, or your own compiler for a specific OS,
you can rely on undefined behaviour all you want; no one, not even any
of us here, will stop you or think you are doing anything wrong. After
all, when you are doing so, you are effectively working under *two*
definitions: the C language one and your own in addition to that. Your
own definition is free to define anything the C language does not
define.
Once again, undefined behaviour does not mean that nothing anywhere may
ever define the behaviour. All it means is that the C language does not
define it, but anything else, even the underlying implementation, is
allowed to.

>I do agree that this is off topic here. I'll try to let it drop.


You are correct, discussing specific implementations is off-topic here.
This is why we answer questions like "How do I use VGA graphics in C?"
with "By using non-standard libraries which are off-topic here".
This means that it is impossible in pure ISO standard C, but if you
use non-standard libraries, it may be possible. These non-standard
libraries cause undefined behaviour, but this is not necessarily a bad
thing, if your own implementation defines this undefined behaviour and
you accept that your code is now non-portable.

--
/-- Joona Palaste (pa*****@cc.helsinki.fi) ------------- Finland --------\
\-- http://www.helsinki.fi/~palaste --------------------- rules! --------/
"We're women. We've got double standards to live up to."
- Ally McBeal
Nov 14 '05 #15
Hello Fredrik

Thank you very much, you were really helpful!
So do I understand correctly that if I write code with the asm() command in
gcc it compiles that with gas?
>Second, that code assumes that your program is running in real mode,
>which it won't - GCC only compiles 32-bit code. After that, how to do
>what you want depends on the operating system the program will be
>running on.

This really explains the trouble I have...
Can I get GCC to compile my code in 16-bit for running in realmode?
Or is there another C-compiler that can do that?

I do not want to run this on any OS like Linux, WinNT. I'm trying to do this
with my own bootsector (in real-mode). So the direct memory-access shouldn't
be a problem so far.
So basically I want to write my own little real-mode app in C.

>For more info on this, I suggest reading "Intel Architecture
>Software Developer's Manual, Volume 3: System Programming", published by
>Intel, order number 243192, downloadable as PDF through Intel's website.


I'm currently reading this one, thank you anyway for pointing it out :)
To the others who said this is off-topic: You may be right, but I posted
this here because I thought it's a C problem and not an asm-one, since the
asm-code itself works. Sorry about that!

Thanks again

Patrik
Nov 14 '05 #16
On Sun, 2004-08-22 at 16:32 -0700, Mac wrote:
On Sun, 22 Aug 2004 20:33:52 +0000, Richard Pennington wrote:
Jack Klein wrote:

[snip]
Or can I also do this directly in C with pointers and don't use the asm() at
all? I couldn't get this to work either...
C allows assigning an arbitrary address, represented as an integer
type, to a pointer with an appropriate cast. The result is
implementation-defined. Attempting to use the pointer to read or
write is completely undefined. So you need to try the assembly
language group.


It may be undefined, but I suspect every computer system you use
is doing it constantly.

Hard to believe something so undefined is relied on to work.
Programmers must be a gullible lot. ;-)

I suspect the guys in comp.arch.embedded or an OS group could
give the OP a hand to convert his program to C (undefined, of course!)

I agree that it is implementation defined, but any implementation
that didn't do something reasonable would be considered broken.


I think I have to disagree. What you say may be OK for kernels and what
not, but it is certainly not true for application code on systems with
memory management. You cannot assign an absolute address (a hardware
address, if you will) to a pointer and then dereference it.


Correct me if I'm wrong, but I have to think that loading an absolute
address into a pointer cannot be wrong, no matter what code. The
impression I have of a pointer is that of a number which points out an
address. Just because you load an absolute address doesn't mean that
address has to be a hardware address. If the code is running under
memory management, segmentation, paging and what not, that absolute
address just points out an address in the process' address space, is
that not true?

Truly, if you just load an arbitrary pointer and dereference it, you're
likely to generate a page or segmentation fault, but that's an OS issue,
not a compiler issue, right?

I could imagine that the strictest reading of the C standard would leave
that undefined, since maybe not all architectures (x86 real mode, as in
the original message of this thread, is one example) have a concept of
truly linear addresses. However, correct me if I'm wrong - I really
don't know - but isn't ISO C defined on architectures with a linear
address space (whether or not it happens to be segmented, paged and what
not)?

Fredrik Tolf
Nov 14 '05 #17
"Patrik Huber" <pa**********@balcab.ch> wrote in
news:cg**********@newshispeed.ch:
To the others who said this is off-topic: You may be right, but I posted
this here because I thought it's a C problem and not an asm-one, since
the asm-code itself works. Sorry about that!


Say, there are GCC newsgroups where I'd bet people could help you with
this issue, why not try there? See gnu.gcc.help to start maybe.

--
- Mark ->
--
Nov 14 '05 #18
In <cY*****************@newssvr31.news.prodigy.com> Richard Pennington <ri**@pennware.com> writes:
Joona I Palaste wrote:
Richard Pennington <ri**@pennware.com> scribbled the following:

[snip]
I do it all the time. You do it all the time.
Linux, NetBSD, FreeBSD, Windows (I suspect) are all mostly or completely
written in C. Chances are you're running at least one of those OSs.


Not completely. Parts of them are written in assembly language. It is
pretty much impossible to write a real-world (in contrast to simulated)
OS in pure C.


The examples I gave (with the possible exception of Windows, I haven't
seen the source) have very little assembly language used in their
implementation. Usually startup code, context switching, etc. and
very little else.


OTOH, they abound with C code invoking undefined behaviour.

Dan
--
Dan Pop
DESY Zeuthen, RZ group
Email: Da*****@ifh.de
Nov 14 '05 #19
In <10***********************@pc7.dolda2000.com> Fredrik Tolf <fr*****@dolda2000.com> writes:
Correct me if I'm wrong, but I have to think that loading an absolute
address into a pointer cannot be wrong, no matter what code.


1. This is downright impossible in user code running on systems with
virtual memory: all the addresses in the program are interpreted as
virtual addresses.

2. There are platforms where loading an arbitrary address in an address
register generates a fault. If the compiler decides to store the
pointer in an address register or if loading something into a pointer
involves storing the data first into an address register...

Dan
--
Dan Pop
DESY Zeuthen, RZ group
Email: Da*****@ifh.de
Nov 14 '05 #20
On Mon, 2004-08-23 at 13:26 +0000, Dan Pop wrote:
In <10***********************@pc7.dolda2000.com> Fredrik Tolf <fr*****@dolda2000.com> writes:
Correct me if I'm wrong, but I have to think that loading an absolute
address into a pointer cannot be wrong, no matter what code.
1. This is downright impossible in user code running on systems with
virtual memory: all the addresses in the program are interpreted as
virtual addresses.


Precisely my point. If you load an absolute pointer on such a system and
dereference it, it is interpreted as a virtual address, and I can't see
the problem (except the OS will likely raise a page fault or similar,
which is besides the problem). Care to enlighten me?
2. There are platforms where loading an arbitrary address in an address
register generates a fault. If the compiler decides to store the
pointer in an address register or if loading something into a pointer
involves storing the data first into an address register...


You have to forgive me, but I find this hard to believe. Surely, there
must be a way for malloc (and other related function - mmap etc.) to
return the address to the newly allocated chunk, and surely, that must
involve storing the address of the chunk into an address register? Or
did I misunderstand something?

Fredrik Tolf
Nov 14 '05 #21
In <10*********************@pc7.dolda2000.com> Fredrik Tolf <fr*****@dolda2000.com> writes:
On Mon, 2004-08-23 at 13:26 +0000, Dan Pop wrote:
In <10***********************@pc7.dolda2000.com> Fredrik Tolf <fr*****@dolda2000.com> writes:
>Correct me if I'm wrong, but I have to think that loading an absolute
>address into a pointer cannot be wrong, no matter what code.
1. This is downright impossible in user code running on systems with
virtual memory: all the addresses in the program are interpreted as
virtual addresses.


Precisely my point. If you load an absolute pointer on such a system and
dereference it, it is interpreted as a virtual address, and I can't see
the problem (except the OS will likely raise a page fault or similar,
which is besides the problem). Care to enlighten me?


The problem is that you *cannot* load an absolute address into a pointer
in such an execution environment. ALL addresses are virtual, you have no
access to absolute addresses.
2. There are platforms where loading an arbitrary address in an address
register generates a fault. If the compiler decides to store the
pointer in an address register or if loading something into a pointer
involves storing the data first into an address register...


You have to forgive me, but I find this hard to believe.


Then, read the Rationale of the C standard.
Surely, there
must be a way for malloc (and other related function - mmap etc.) to
return the address to the newly allocated chunk, and surely, that must
involve storing the address of the chunk into an address register? Or
did I misunderstand something?


Yup. They first allocate the chunk and only after that they store
its address into an address register. Can't see any problem...

Dan
--
Dan Pop
DESY Zeuthen, RZ group
Email: Da*****@ifh.de
Nov 14 '05 #22
On Mon, 2004-08-23 at 14:43 +0000, Dan Pop wrote:
In <10*********************@pc7.dolda2000.com> Fredrik Tolf <fr*****@dolda2000.com> writes:
On Mon, 2004-08-23 at 13:26 +0000, Dan Pop wrote:
In <10***********************@pc7.dolda2000.com> Fredrik Tolf <fr*****@dolda2000.com> writes:

>Correct me if I'm wrong, but I have to think that loading an absolute
>address into a pointer cannot be wrong, no matter what code.

1. This is downright impossible in user code running on systems with
virtual memory: all the addresses in the program are interpreted as
virtual addresses.


Precisely my point. If you load an absolute pointer on such a system and
dereference it, it is interpreted as a virtual address, and I can't see
the problem (except the OS will likely raise a page fault or similar,
which is besides the problem). Care to enlighten me?


The problem is that you *cannot* load an absolute address into a pointer
in such an execution environment. ALL addresses are virtual, you have no
access to absolute addresses.


Sorry, we seem to be out of sync. By absolute pointer I mean an
arbitrary address, not a physical address.
2. There are platforms where loading an arbitrary address in an address
register generates a fault. If the compiler decides to store the
pointer in an address register or if loading something into a pointer
involves storing the data first into an address register...


You have to forgive me, but I find this hard to believe.


Then, read the Rationale of the C standard.


It's not that I doubt that you have a point, it's just that I'm just
having a hard time digesting it.
Surely, there
must be a way for malloc (and other related function - mmap etc.) to
return the address to the newly allocated chunk, and surely, that must
involve storing the address of the chunk into an address register? Or
did I misunderstand something?


Yup. They first allocate the chunk and only after that they store
its address into an address register. Can't see any problem...


I'm thinking that a malloc implementation must somehow "make up" an
address to allocate, and in that case it seems to me that it's a bit
arbitrary.

I guess my problem in seeing it is I don't see the difference between an
arbitrary pointer and a non-arbitrary one. That is, what's the
difference between these following two statements?

char *buf1 = NULL; /* Should be valid, right? */
char *buf2 = 0xdeadbeef /* Generates an exception, if I follow you. */

If loading NULL, which is defined to zero AFAIK, is allowed, then what
about this?

char *buf = NULL;
buf += 0xdeadbeef;

For surely, you have to be allowed to add offsets to a pointer like
this?

for(p = buf; *p; p++) {...}

Or could it be that they (these platforms) have a concept of an
undefined pointer when storing NULL? In that case, what about this?

char *buf = (char *)malloc(1);
buf += (0xdeadbeef - (int)buf); /* Note incrementation, not storage */

I guess I'm just having a hard time understanding how such a platform
would work... would you mind providing an example of such a platform for
me to study? I guess it is at times like these that I really hate the
fact that archs other than i386 are so prohibitively expensive to get my
hands on... :-(

Fredrik Tolf
Nov 14 '05 #23
Fredrik Tolf <fr*****@dolda2000.com> scribbled the following:
On Mon, 2004-08-23 at 14:43 +0000, Dan Pop wrote:
In <10*********************@pc7.dolda2000.com> Fredrik Tolf <fr*****@dolda2000.com> writes:
>On Mon, 2004-08-23 at 13:26 +0000, Dan Pop wrote:
>> In <10***********************@pc7.dolda2000.com> Fredrik Tolf <fr*****@dolda2000.com> writes:
>> >Correct me if I'm wrong, but I have to think that loading an absolute
>> >address into a pointer cannot be wrong, no matter what code.
>>
>> 1. This is downright impossible in user code running on systems with
>> virtual memory: all the addresses in the program are interpreted as
>> virtual addresses.
>
>Precisely my point. If you load an absolute pointer on such a system and
>dereference it, it is interpreted as a virtual address, and I can't see
>the problem (except the OS will likely raise a page fault or similar,
>which is besides the problem). Care to enlighten me?
The problem is that you *cannot* load an absolute address into a pointer
in such an execution environment. ALL addresses are virtual, you have no
access to absolute addresses.

Sorry, we seem to be out of sync. By absolute pointer I mean an
arbitrary address, not a physical address.
I.e. something like int *p = (int*)0xcafebabe; where the physical
address need not actually be 0xcafebabe?

(snip)
>Surely, there
>must be a way for malloc (and other related function - mmap etc.) to
>return the address to the newly allocated chunk, and surely, that must
>involve storing the address of the chunk into an address register? Or
>did I misunderstand something?


Yup. They first allocate the chunk and only after that they store
its address into an address register. Can't see any problem...

I'm thinking that a malloc implementation must somehow "make up" an
address to allocate, and in that case it seems to me that it's a bit
arbitrary.

I guess my problem in seeing it is I don't see the difference between an
arbitrary pointer and a non-arbitrary one. That is, what's the
difference between these following two statements?

char *buf1 = NULL; /* Should be valid, right? */
char *buf2 = 0xdeadbeef /* Generates an exception, if I follow you. */

If loading NULL, which is defined to zero AFAIK, is allowed, then what
about this?
NULL, which is defined as zero, is a special case. The pointer constant
zero, which need not be the physical address zero, is guaranteed to be
an address which by itself is fully defined, but dereferencing it
causes undefined behaviour. This does not apply to any other absolute
pointer constant. ("Absolute" used in the Tolf meaning, not the Pop
meaning.)
char *buf = NULL;
buf += 0xdeadbeef;
AFAIK the second line causes undefined behaviour by computing the
non-zero absolute pointer value 0xdeadbeef.
For surely, you have to be allowed to add offsets to a pointer like
this?

for(p = buf; *p; p++) {...}
This is different. If buf points to allocated memory and all bytes from
buf up to and including the first zero byte are also in allocated
memory, this is fully defined and safe. The addresses that allocated
memory resides in are always guaranteed to be fully defined. However,
with the exception of zero, no other addresses are. The only way to
legally end up with allocated memory addresses is either to use the
*alloc() functions, or to take the address of a variable or a string
literal (for example char *p="foobar";).
This code, for example, causes undefined behaviour:
char *p = malloc(100);
if (p != NULL) {
p+200;
}
The reason is that the line "p+200;" computes an absolute address
which does not reside in allocated memory.
Or could it be that they (these platforms) have a concept of an
undefined pointer when storing NULL? In that case, what about this?

char *buf = (char *)malloc(1);
buf += (0xdeadbeef - (int)buf); /* Note incrementation, not storage */
This, I think, also causes undefined behaviour, simply because
computing the address 0xdeadbeef causes undefined behaviour, unless
it happens to be in allocated memory.
I guess I'm just having a hard time understanding how such a platform
would work... would you mind providing an example of such a platform for
me to study? I guess it is at times like these that I really hate the
fact that archs other than i386 are so prohibitively expensive to get my
hands on... :-(


Get a second-hand Amiga, Atari ST or old-fashioned Macintosh. You can get
an MC68000 environment, which is definitely non-i386, for less than
$50.

--
/-- Joona Palaste (pa*****@cc.helsinki.fi) ------------- Finland --------\
\-- http://www.helsinki.fi/~palaste --------------------- rules! --------/
"O pointy birds, O pointy-pointy. Anoint my head, anointy-nointy."
- Dr. Michael Hfuhruhurr
Nov 14 '05 #24
Fredrik Tolf <fr*****@dolda2000.com> writes:
[...]
Correct me if I'm wrong, but I have to think that loading an absolute
address into a pointer cannot be wrong, no matter what code. The
impression I have of a pointer is that of a number which points out an
address. Just because you load an absolute address doesn't mean that
address has to be a hardware address. If the code is running under
memory management, segmentation, paging and what not, that absolute
address just points out an address in the process' address space, is
that not true?


It's best not to think of a C pointer as a number. Just think of it
as a pointer. It can point to an object, or it can be a null pointer,
or it can be invalid. You can perform some limited arithmetic and
comparison operations on pointers, but only what's defined by the
standard.

If pointers were numbers, it would make sense to add or multiply two
pointer values. It doesn't.

Pointers, or addresses, are of course implemented as integers on many
systems, but assuming that will get you into trouble when you try to
port your code to a system with a different pointer representation.

Suggested reading: C FAQ, section 4.

Further suggested reading: C FAQ, the whole thing.

<http://www.eskimo.com/~scs/C-faq/faq.html>

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <*> <http://users.sdsc.edu/~kst>
We must do something. This is something. Therefore, we must do this.
Nov 14 '05 #25
Fredrik Tolf <fr*****@dolda2000.com> writes:
[...]
I'm thinking that a malloc implementation must somehow "make up" an
address to allocate, and in that case it seems to me that it's a bit
arbitrary.
malloc() has to know what it's doing. The C code that implements
malloc() (assuming it's implemented in C; it doesn't have to be) very
likely invokes undefined behavior, but it's free to take advantage of
implementation-specific characteristics of the underlying system.

A pointer returned by malloc() isn't "arbitrary" in the sense that
we're using the term here. It's either a null pointer or a pointer to
a newly allocated object.
I guess my problem in seeing it is I don't see the difference between an
arbitrary pointer and a non-arbitrary one. That is, what's the
difference between these following two statements?

char *buf1 = NULL; /* Should be valid, right? */
Right. A NULL pointer is valid for some operations (assignment,
comparison, etc.) but invalid for others (dereferencing, pointer
arithmetic, etc.).
char *buf2 = 0xdeadbeef /* Generates an exception, if I follow you. */
It invokes undefined behavior. That can mean generating an exception,
or it can mean storing a value in buf2 that happens to be valid, or it
can mean making demons fly out your nose.
If loading NULL, which is defined to zero AFAIK, is allowed, then what
about this?

char *buf = NULL;
buf += 0xdeadbeef;

For surely, you have to be allowed to add offsets to a pointer like
this?
Pointer arithmetic is valid as long as the resulting pointer points
into the same object as the original value. (It's actually slightly
more complex than that.) For example:

int arr[10];
int *ptr = arr; /* points to arr[0] */
ptr += 5; /* points to arr[5] */
ptr += 100; /* invalid */

If the original pointer doesn't point to an object, any arithmetic on
it will invoke undefined behavior. (If ptr==NULL, I'm not sure what
the standard says about ptr+0 -- but I'm not sure I care.)

[...]

char *buf = (char *)malloc(1);
buf += (0xdeadbeef - (int)buf); /* Note incrementation, not storage */


You can convert a pointer to an integer, but there's not much you can
portably do with the result. The statement above is very likely to
give you an invalid pointer, so don't do that.

[...]

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <*> <http://users.sdsc.edu/~kst>
We must do something. This is something. Therefore, we must do this.
Nov 14 '05 #26
Joona I Palaste <pa*****@cc.helsinki.fi> writes:
Fredrik Tolf <fr*****@dolda2000.com> scribbled the following:

[...]
I guess I'm just having a hard time understanding how such a platform
would work... would you mind providing an example of such a platform for
me to study? I guess it is at times like these that I really hate the
fact that archs other than i386 are so prohibitively expensive to get my
hands on... :-(


Get a second-hand Amiga, Atari ST or old-fashioned Macintosh. You can get
an MC68000 environment, which is definitely non-i386, for less than
$50.


But a 68000 isn't particularly exotic. In fact, it's a much more
regular architecture than the x86. No segment registers, just a
simple linear address space (and a richer set of general-purpose data
and address registers).

A second-hand IBM AS/400 might be instructive, but I seriously doubt
that you could get a working one for $50.

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <*> <http://users.sdsc.edu/~kst>
We must do something. This is something. Therefore, we must do this.
Nov 14 '05 #27
On Mon, 2004-08-23 at 15:58 +0000, Joona I Palaste wrote:
Fredrik Tolf <fr*****@dolda2000.com> scribbled the following:
On Mon, 2004-08-23 at 14:43 +0000, Dan Pop wrote:
In <10*********************@pc7.dolda2000.com> Fredrik Tolf <fr*****@dolda2000.com> writes:
>On Mon, 2004-08-23 at 13:26 +0000, Dan Pop wrote:
>> In <10***********************@pc7.dolda2000.com> Fredrik Tolf <fr*****@dolda2000.com> writes:
>> >Correct me if I'm wrong, but I have to think that loading an absolute
>> >address into a pointer cannot be wrong, no matter what code.
>>
>> 1. This is downright impossible in user code running on systems with
>> virtual memory: all the addresses in the program are interpreted as
>> virtual addresses.
>
>Precisely my point. If you load an absolute pointer on such a system and
>dereference it, it is interpreted as a virtual address, and I can't see
>the problem (except the OS will likely raise a page fault or similar,
>which is besides the problem). Care to enlighten me?

The problem is that you *cannot* load an absolute address into a pointer
in such an execution environment. ALL addresses are virtual, you have no
access to absolute addresses.
Sorry, we seem to be out of sync. By absolute pointer I mean an
arbitrary address, not a physical address.


I.e. something like int *p = (int*)0xcafebabe; where the physical
address need not actually be 0xcafebabe?


Yeah. An offset into the address space currently in use in the execution
environment, if you will.
[snip]
NULL, which is defined as zero, is a special case. The pointer constant
zero, which need not be the physical address zero, is guaranteed to be
an address which by itself is fully defined, but dereferencing it
causes undefined behaviour. This does not apply to any other absolute
pointer constant. ("Absolute" used in the Tolf meaning, not the Pop
meaning.)


I think I will be having trouble continuing this discussion if one thing
is not made clear.

I am (possibly incorrectly) under the impression that C is defined to
operate in an execution environment where you have exactly one linear
address space, and pointers are defined as offsets into this address
space.

If this is not so and pointers are defined in a more abstract meaning, I
fully understand why all these described actions cause undefined
behavior. Would anyone care to clarify this?

Fredrik Tolf
Nov 14 '05 #28
On Mon, 2004-08-23 at 16:06 +0000, Keith Thompson wrote:
Fredrik Tolf <fr*****@dolda2000.com> writes:
[...]
Correct me if I'm wrong, but I have to think that loading an absolute
address into a pointer cannot be wrong, no matter what code. The
impression I have of a pointer is that of a number which points out an
address. Just because you load an absolute address doesn't mean that
address has to be a hardware address. If the code is running under
memory management, segmentation, paging and what not, that absolute
address just points out an address in the process' address space, is
that not true?


It's best not to think of a C pointer as a number. Just think of it
as a pointer. It can point to an object, or it can be a null pointer,
or it can be invalid. You can perform some limited arithmetic and
comparison operations on pointers, but only what's defined by the
standard.


Indeed, I know that is the best way to think of it. However, I'm in the
dark as to whether it's _possible_ to think of them as numbers. See, I'm
under the impression that C is defined to operate in a linear address
space, with pointers being defined as offsets into this address space.
Please tell me if this is not the case. (Also, if this is not the case,
can someone be as kind as to provide an example of an architecture where
it isn't?)

Fredrik Tolf
Nov 14 '05 #29
In <10**********************@pc7.dolda2000.com> Fredrik Tolf <fr*****@dolda2000.com> writes:
On Mon, 2004-08-23 at 14:43 +0000, Dan Pop wrote:
In <10*********************@pc7.dolda2000.com> Fredrik Tolf <fr*****@dolda2000.com> writes:
>On Mon, 2004-08-23 at 13:26 +0000, Dan Pop wrote:
>> In <10***********************@pc7.dolda2000.com> Fredrik Tolf <fr*****@dolda2000.com> writes:
>>
>> >Correct me if I'm wrong, but I have to think that loading an absolute
                                                                 ^^^^^^^^
>> >address into a pointer cannot be wrong, no matter what code.
    ^^^^^^^
>>
>> 1. This is downright impossible in user code running on systems with
>> virtual memory: all the addresses in the program are interpreted as
>> virtual addresses.
>
>Precisely my point. If you load an absolute pointer on such a system and
>dereference it, it is interpreted as a virtual address, and I can't see
>the problem (except the OS will likely raise a page fault or similar,
>which is besides the problem). Care to enlighten me?
The problem is that you *cannot* load an absolute address into a pointer
in such an execution environment. ALL addresses are virtual, you have no
access to absolute addresses.


Sorry, we seem to be out of sync. By absolute pointer I mean an
arbitrary address, not a physical address.


You have written "absolute address" in the text I was replying to. I was
not aware that "absolute" and "arbitrary" can be used interchangeably in
context. But the issue is clarified now.
>> 2. There are platforms where loading an arbitrary address in an address
>> register generates a fault. If the compiler decides to store the
>> pointer in an address register or if loading something into a pointer
>> involves storing the data first into an address register...
>
>You have to forgive me, but I find this hard to believe.


Then, read the Rationale of the C standard.


It's not that I doubt that you have a point, it's just that I'm just
having a hard time digesting it.
>Surely, there
>must be a way for malloc (and other related function - mmap etc.) to
>return the address to the newly allocated chunk, and surely, that must
>involve storing the address of the chunk into an address register? Or
>did I misunderstand something?


Yup. They first allocate the chunk and only after that they store
its address into an address register. Can't see any problem...


I'm thinking that a malloc implementation must somehow "make up" an
address to allocate, and in that case it seems to me that it's a bit
arbitrary.


You're missing a couple of points:

1. malloc's implementation can use whatever works on the underlying
hardware. If the hardware doesn't support arbitrary addresses in
address registers, then this is not an option.

2. malloc seldom (if ever) has to manipulate arbitrary addresses. The
malloc arena is entirely manipulated using pointer arithmetic. Look
at the sample implementations provided by K&R2 and Plauger.
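
To make point 2 concrete, here is a minimal sketch of an allocator that never invents an address: every pointer it returns is computed from one array. This is not the K&R2 or Plauger code, only a much simpler bump allocator with no free(); the names align_t, ARENA_UNITS and arena_alloc are made up for the illustration, and the union used to force alignment is an assumption about which types matter.

```c
#include <assert.h>
#include <stddef.h>

/* Union whose only job is to force worst-case alignment (assumed set). */
typedef union { long l; double d; void *p; } align_t;

#define ARENA_UNITS 128                 /* hypothetical arena size */

static align_t arena[ARENA_UNITS];      /* the whole "heap" is one object */
static size_t units_used;               /* units handed out so far */

/* Return nbytes of storage from the arena, or NULL when it is full. */
static void *arena_alloc(size_t nbytes)
{
    size_t units;

    assert(units_used <= ARENA_UNITS);
    if (nbytes > sizeof arena)
        return NULL;                    /* also avoids overflow below */
    units = (nbytes + sizeof(align_t) - 1) / sizeof(align_t);
    if (units > ARENA_UNITS - units_used)
        return NULL;
    units_used += units;
    /* arena + index stays inside (or one past) the array: defined */
    return arena + (units_used - units);
}
```

A real malloc gets its arena from the OS rather than from a static array, but the arithmetic on it can look much like this.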
I guess my problem in seeing it is I don't see the difference between an
arbitrary pointer and a non-arbitrary one. That is, what's the
difference between these following two statements?

char *buf1 = NULL; /* Should be valid, right? */
Right.
char *buf2 = 0xdeadbeef /* Generates an exception, if I follow you. */
This is not valid C code. Integers are not converted automatically to
pointers.

char *buf2 = (char *)0xdeadbeef; /* MAY generate an exception */
If loading NULL, which is defined to zero AFAIK, is allowed, then what
about this?

char *buf = NULL;
buf += 0xdeadbeef;
Undefined behaviour. You can perform pointer arithmetic only inside
objects (or one element past the end). You cannot perform pointer arithmetic on
null pointers.
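
A small sketch of arithmetic that stays within those limits: the function below (sum_ints is a made-up name) forms the one-past-the-end pointer and compares against it, but never dereferences it.

```c
#include <assert.h>
#include <stddef.h>

/* Sum n ints starting at first, walking the array with a pointer. */
static long sum_ints(const int *first, size_t n)
{
    const int *end;
    const int *p;
    long total = 0;

    assert(first != NULL);     /* no arithmetic on null pointers */
    end = first + n;           /* one past the last element: may be formed */
    for (p = first; p != end; p++)
        total += *p;           /* only elements inside the array are read */
    return total;
}
```

Computing first + n + 1, or anything further out, would already be undefined, whether or not the result is ever used.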
For surely, you have to be allowed to add offsets to a pointer like
this?
You may want to actually learn C *before* continuing this discussion...
for(p = buf; *p; p++) {...}
This is correct only as long as p stays within buf. And completely
irrelevant to our discussion.
Or could it be that they (these platforms) have a concept of an
undefined pointer when storing NULL? In that case, what about this?
They don't need such a concept, they merely have to ensure that storing
the value of a null pointer in an address register doesn't generate any
exception. Trivially achieved by allocating a reserved object at the
corresponding address.
char *buf = (char *)malloc(1);
buf += (0xdeadbeef - (int)buf); /* Note incrementation, not storage */
Nope, this is compound assignment, not incrementation. But both imply
a storage operation. Which may never happen, because your pointer
arithmetic invokes undefined behaviour, unless 0xdeadbeef - (int)buf
evaluates to 0 or 1.
I guess I'm just having a hard time understanding how such a platform
would work...
You're having a hard time understanding how pointer arithmetic is defined
in C. And this is NOT a guess!
would you mind providing an example of such a platform for me to study?


80286 in protected mode, according to some people. Load some garbage in
a segment register (instead of a proper selector value) and it traps.

Dan
--
Dan Pop
DESY Zeuthen, RZ group
Email: Da*****@ifh.de
Nov 14 '05 #30
Fredrik Tolf <fr*****@dolda2000.com> writes:
[...]
I think I will be having trouble continuing this discussion if one thing
is not made clear.

I am (possibly incorrectly) under the impression that C is defined to
operate in an execution environment where you have exactly one linear
address space, and pointers are defined as offsets into this address
space.

If this is not so and pointers are defined in a more abstract meaning, I
fully understand why all these described actions cause undefined
behavior. Would anyone care to clarify this?


Yes, that impression is incorrect. C does not guarantee a single
linear address space (though it certainly allows one). A valid C
implementation could have a distinct address space for each object
(each chunk of memory returned by malloc() or equivalent, and each
declared variable). Comparing pointers to different objects (other
than for equality) invokes undefined behavior, because there may not
be a defined greater-than/less-than relationship between them.
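
One practical consequence, sketched below with a made-up function name (points_into): code that wants to know whether a pointer lies inside a given array cannot portably test it with < and >, but it can use equality against each element, since == is defined even for pointers to different objects.

```c
#include <assert.h>
#include <stddef.h>

/* Return 1 if p points at one of the n elements starting at base. */
static int points_into(const int *p, const int *base, size_t n)
{
    size_t i;

    assert(base != NULL || n == 0);
    for (i = 0; i < n; i++)
        if (p == base + i)     /* equality is defined across objects */
            return 1;
    return 0;
}
```

It is slow compared with a range check, which is the price of not assuming a linear address space.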

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <*> <http://users.sdsc.edu/~kst>
We must do something. This is something. Therefore, we must do this.
Nov 14 '05 #31
On Mon, 2004-08-23 at 17:01 +0000, Keith Thompson wrote:
Fredrik Tolf <fr*****@dolda2000.com> writes:
[...]
I think I will be having trouble continuing this discussion if one thing
is not made clear.

I am (possibly incorrectly) under the impression that C is defined to
operate in an execution environment where you have exactly one linear
address space, and pointers are defined as offsets into this address
space.

If this is not so and pointers are defined in a more abstract meaning, I
fully understand why all these described actions cause undefined
behavior. Would anyone care to clarify this?


Yes, that impression is incorrect. C does not guarantee a single
linear address space (though it certainly allows one). A valid C
implementation could have a distinct address space for each object
(each chunk of memory returned by malloc() or equivalent, and each
declared variable). Comparing pointers to different objects (other
than for equality) invokes undefined behavior, because there may not
be a defined greater-than/less-than relationship between them.


That does explain a lot of things. Thanks for the clarification.

Fredrik Tolf
Nov 14 '05 #32
"Fredrik Tolf" <fr*****@dolda2000.com> wrote in message
news:10*********************@pc7.dolda2000.com...
On Mon, 2004-08-23 at 13:26 +0000, Dan Pop wrote:
In <10***********************@pc7.dolda2000.com> Fredrik Tolf
<fr*****@dolda2000.com> writes:
>Correct me if I'm wrong, but I have to think that loading an absolute
>address into a pointer cannot be wrong, no matter what code.


1. This is downright impossible in user code running on systems with
virtual memory: all the addresses in the program are interpreted as
virtual addresses.


Precisely my point. If you load an absolute pointer on such a system and
dereference it, it is interpreted as a virtual address, and I can't see
the problem (except the OS will likely raise a page fault or similar,
which is besides the problem). Care to enlighten me?


If the address is not valid, some machines will fault when it is loaded into
a register -- not just when you try to dereference it. That your current
platform does not fault should not be taken as an indication such behavior
is portable.

Odds are that any arbitrary pointer you try to load will be invalid, though
there's a small chance you might get "lucky" and point into a valid area
allocated by some other means.
2. There are platforms where loading an arbitrary address in an address
register generates a fault. If the compiler decides to store the
pointer in an address register or if loading something into a pointer
involves storing the data first into an address register...


You have to forgive me, but I find this hard to believe. Surely, there
must be a way for malloc (and other related function - mmap etc.) to
return the address to the newly allocated chunk, and surely, that must
involve storing the address of the chunk into an address register? Or
did I misunderstand something?


malloc() returns valid addresses, which such a machine will not fault on.
Presumably malloc() does some implementation-defined magic to tell the CPU
which addresses are valid and which aren't. You do not have that luxury if
you want your code to be portable.

There are some addresses on some ABIs that are defined to be invalid,
therefore by definition malloc() (or any similar function) _cannot_ return
them.

S

--
Stephen Sprunk "Those people who think they know everything
CCIE #3723 are a great annoyance to those of us who do."
K5SSS --Isaac Asimov

Nov 14 '05 #33
>> Richard Pennington <ri**@pennware.com> scribbled the following:
[snip]
Linux, NetBSD, FreeBSD, Windows (I suspect) are all mostly or completely
written in C. ...

In article <news:cY*****************@newssvr31.news.prodigy.com>
Richard Pennington <ri**@pennware.com> wrote:
The examples I gave (with the possible exception of Windows, I haven't
seen the source) have very little assembly language used in their
implementation.
Take a look at the NetBSD and FreeBSD <machine/bus.h> headers.

What look like C function calls actually expand to inline assembly.

Device drivers are chock full of assembly code, on Intel-based
systems at least.

Linux does the same sort of thing, but in different headers. None
of these three OSes can be compiled with anything but the GNU
compilers, for this reason.

The BSD code *is* specified well enough to allow (in most cases)
substituting some other compiler, but not easily. (For instance,
the semantics of the bus_read_* and bus_write_* "functions" are
such that you could write assembly-language subroutines to do the
same job.)
... I think that many people don't understand
how the real world works. We compiler and OS writers will wink and nod
at each other and continue to rely on undefined behavior.


There is nothing wrong with using undefined behavior to extend any
given compiler and then write code that depends on it -- but you
(the generic "you") should be *aware* of it when you do it, because
it ties you down. It often removes the freedom to change compilers,
as in the cases mentioned above. You must make sure the price you
pay buys you something worthwhile (and, if you are clever enough,
you can "keep the price relatively low" the way the BSD guys did).
--
In-Real-Life: Chris Torek, Wind River Systems
Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W) +1 801 277 2603
email: forget about it http://web.torek.net/torek/index.html
Reading email is like searching for food in the garbage, thanks to spammers.
Nov 14 '05 #34
>> >Correct me if I'm wrong, but I have to think that loading an absolute
>address into a pointer cannot be wrong, no matter what code.
1. This is downright impossible in user code running on systems with
virtual memory: all the addresses in the program are interpreted as
virtual addresses.


Precisely my point. If you load an absolute pointer on such a system and
dereference it, it is interpreted as a virtual address, and I can't see
the problem (except the OS will likely raise a page fault or similar,
which is beside the point). Care to enlighten me?


Raising a page fault or similar is one of the things the C standard
allows under "undefined behavior".

If you load what you think is a physical address pointer into a
register and then attempt to use it, on many systems it will be
interpreted as a virtual address. One of the worst things that can
happen here is that it appears to work, but you access the *WRONG
MEMORY*. Just because physical address 0x0B0000 or 0xB000:0000 is
video memory doesn't mean that any virtual address refers to video
memory, or if there IS a way to access video memory, that it looks
anything like 0xB0000.

If there is a way to map physical memory, it is likely to be done
by asking the OS nicely, and the return value from the request will
tell you where (in virtual memory) the OS put it. (Consider mmap(),
for example).
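
A minimal sketch of "asking the OS nicely", assuming a POSIX-like system that provides mmap() and the common MAP_ANONYMOUS flag. It maps ordinary anonymous memory, not video memory; mapping a device would need a platform-specific file descriptor and permissions. The helper name hyp_map_scratch is made up for illustration.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE 1   /* exposes MAP_ANONYMOUS on glibc */
#endif
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical helper: ask the OS for `len` bytes of fresh memory.
 * Passing NULL as the first argument lets the OS choose the virtual
 * address; the return value is the only way to learn where it is. */
static void *hyp_map_scratch(size_t len)
{
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}

/* Demo: use the address the OS handed back, then release it. */
static void hyp_map_demo(void)
{
    size_t len = 4096;
    char *p = hyp_map_scratch(len);
    int rc;

    assert(p != NULL);
    p[0] = '5';
    assert(p[0] == '5');

    rc = munmap(p, len);
    assert(rc == 0);
    (void)rc;
}
```

The program never picks the address itself; it only uses whatever pointer the call returns.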
2. There are platforms where loading an arbitrary address in an address
register generates a fault. If the compiler decides to store the
pointer in an address register or if loading something into a pointer
involves storing the data first into an address register...


You have to forgive me, but I find this hard to believe. Surely, there
must be a way for malloc (and other related functions - mmap etc.) to
return the address to the newly allocated chunk, and surely, that must
involve storing the address of the chunk into an address register? Or
did I misunderstand something?


There is a way for malloc() to return the address of a newly allocated
chunk. Put some arbitrary bit pattern in there instead, and KABOOM!
Consider an address with a random segment number on an i386-architecture
in protected mode. The chances that the random segment number refers
to one actually in use are small.

Gordon L. Burditt
Nov 14 '05 #35
>Indeed, I know that is the best way to think of it. However, I'm in the
dark as to whether it's _possible_ to think of them as numbers. See, I'm
under the impression that C is defined to operate in a linear address
space, with pointers being defined as offsets into this address space.
Stop thinking that right now!
C is defined to operate in a non-linear address space, where pointers
are *NOT* necessarily offsets into this address space.

It sorta accidentally happens that a linear address space works too.
Please tell me if this is not the case. (Also, if this is not the case,
can someone be as kind as to provide an example of an architecture where
it isn't?)


Take a hard look at the Intel [3456]86 architecture.

Consider large-model 16-bit real mode, where a pointer consists of
a segment register and an offset.

Consider large-model protected mode (16 or 32 bit) where loading a
segment register with an invalid segment number causes a trap.
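
For the real-mode case, here is a small sketch of why such a pointer is not simply a number. On the 8086 in real mode the linear address is segment * 16 + offset, so different segment:offset pairs can name the same byte. The function name real_mode_linear is made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* 8086 real mode: linear address = segment * 16 + offset. */
static uint32_t real_mode_linear(uint16_t segment, uint16_t offset)
{
    return ((uint32_t)segment << 4) + offset;
}

/* Demo: two different segment:offset pairs, one memory location. */
static void real_mode_demo(void)
{
    assert(real_mode_linear(0xB800, 0x0000) == 0xB8000u);
    assert(real_mode_linear(0xB000, 0x8000) == 0xB8000u);
}
```

Comparing the raw segment and offset bits of those two pointers would call them different, even though both refer to the same byte.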

Gordon L. Burditt
Nov 14 '05 #36
Fredrik Tolf wrote:
On Mon, 2004-08-23 at 13:26 +0000, Dan Pop wrote:
.... snip ...
2. There are platforms where loading an arbitrary address in an
address register generates a fault. If the compiler decides
to store the pointer in an address register or if loading
something into a pointer involves storing the data first
into an address register...


You have to forgive me, but I find this hard to believe. Surely,
there must be a way for malloc (and other related functions -
mmap etc.) to return the address to the newly allocated chunk,
and surely, that must involve storing the address of the chunk
into an address register? Or did I misunderstand something?


But malloc etc. are not returning an _/arbitrary/_ address. They
are returning the address of a chunk of memory that has been
specifically made available and usable to the caller. How it does
this is not specified, and obviously must be system specific.

--
Chuck F (cb********@yahoo.com) (cb********@worldnet.att.net)
Available for consulting/temporary embedded and systems.
<http://cbfalconer.home.att.net> USE worldnet address!
Nov 14 '05 #37
On Mon, 2004-08-23 at 20:45 +0000, Gordon Burditt wrote:
Indeed, I know that is the best way to think of it. However, I'm in the
dark as to whether it's _possible_ to think of them as numbers.

See, I'm
under the impression that C is defined to operate in a linear address
space, with pointers being defined as offsets into this address space.


Stop thinking that right now!
C is defined to operate in a non-linear address space, where pointers
are *NOT* necessarily offsets into this address space.

It sorta accidentally happens that a linear address space works too.


I already found out in another reply, but thanks for replying anyway.
Please tell me if this is not the case. (Also, if this is not the case,
can someone be as kind as to provide an example of an architecture where
it isn't?)


Take a hard look at the Intel [3456]86 architecture.

Consider large-model 16-bit real mode, where a pointer consists of
a segment register and an offset.

Consider large-model protected mode (16 or 32 bit) where loading a
segment register with an invalid segment number causes a trap.


Well, that was precisely the thing: The only real mode C implementation
I've seen (MSVC 1.0, long ago) used a non-compliant extension called
"far pointers" that incorporates the segment, and none of the protected
mode C implementations I've seen deal with selectors at all. That is
what gave me that impression.

Fredrik Tolf
Nov 14 '05 #38
