pointer normalization

sophia

Dear all,

Can anyone explain what is meant by pointer normalization,
as described here:

http://c-faq.com/ansi/norml.html

Jun 27 '08 #1

Richard Heathfield

sophia said:

> Dear all,
>
> can any one explain what is meant by pointer normalization
> given here:-
>
> http://c-faq.com/ansi/norml.html

It is possible for a pointer value to have more than one object
representation. If p1 and p2 are two pointers to the same object, but with
different object representations, then p1 == p2 is required to yield 1, so
the implementation must (behave as if to) supply code to "normalise" the
pointer values used in the comparison - i.e. to convert one or the other
or both to a common form. (This was perfectly common in MS-DOS days.)

Note that the same requirement (of identifying the equality of those two
pointers) is not imposed on memcmp(&p1, &p2, sizeof p1).
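
To make the difference between == and memcmp concrete, here is a minimal sketch in C. It does not use real far pointers (a flat-memory compiler has none); it models a 16-bit segment and a 16-bit offset packed into a uint32_t. The names far_rep, far_make, far_linear, far_equal and far_same_bytes are made up for illustration and are not part of any compiler or library.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical model of a real-mode far pointer's object representation:
   segment in the high 16 bits, offset in the low 16 bits. */
typedef uint32_t far_rep;

/* Build a representation from a segment and an offset. */
static far_rep far_make(uint16_t seg, uint16_t off)
{
    return ((far_rep)seg << 16) | off;
}

/* 20-bit linear address: segment * 16 + offset, wrapped at 1 MB. */
static uint32_t far_linear(far_rep p)
{
    return (((p >> 16) << 4) + (p & 0xFFFFu)) & 0xFFFFFu;
}

/* What == has to behave like: compare addresses, not bit patterns. */
static int far_equal(far_rep a, far_rep b)
{
    return far_linear(a) == far_linear(b);
}

/* What memcmp does: compare the raw bytes of the two representations. */
static int far_same_bytes(far_rep a, far_rep b)
{
    return memcmp(&a, &b, sizeof a) == 0;
}
```

With these, far_make(0x0001, 0x1030) and far_make(0x0002, 0x1020) are equal according to far_equal but not according to far_same_bytes, which is the situation described above.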

--
Richard Heathfield <http://www.cpax.org.uk>
Email: -http://www. +rjh@
Google users: <http://www.cpax.org.uk/prg/writings/googly.php>
"Usenet is a strange place" - dmr 29 July 1999

Jun 27 '08 #2

sophia

On Apr 16, 7:59 am, Richard Heathfield <r...@see.sig.invalid> wrote:

> sophia said:
>
>> Dear all,
>>
>> can any one explain what is meant by pointer normalization
>> given here:-
>>
>> http://c-faq.com/ansi/norml.html
>
> It is possible for a pointer value to have more than one object
> representation. If p1 and p2 are two pointers to the same object,
> but with different object representations,

An object in C means a region of data storage, doesn't it?

I am not getting your point.
The same object with different representations?

Jun 27 '08 #3

Nick Keighley

On 16 Apr, 13:33, sophia <sophia.ag...@gmail.com> wrote:

> On Apr 16, 7:59 am, Richard Heathfield <r...@see.sig.invalid> wrote:
>
>>> can any one explain what is meant by pointer normalization
>>> given here:-
>>>
>>> http://c-faq.com/ansi/norml.html
>>
>> It is possible for a pointer value to have more than one object
>> representation. If p1 and p2 are two pointers to the same object,
>> but with different object representations,
>
> Object in C means region of data storage isn't it ?
>
> i am not getting your point,
> same object with different representations ?

not possible on a sane architecture, but see the "Segmentation"
section of http://en.wikipedia.org/wiki/Intel_8086
--
Nick Keighley

Jun 27 '08 #4

Richard Heathfield

sophia said:

> On Apr 16, 7:59 am, Richard Heathfield <r...@see.sig.invalid> wrote:
>> sophia said:
>>
>>> Dear all,
>>>
>>> can any one explain what is meant by pointer normalization
>>> given here:-
>>>
>>> http://c-faq.com/ansi/norml.html
>>
>> It is possible for a pointer value to have more than one object
>> representation. If p1 and p2 are two pointers to the same object,
>> but with different object representations,
>
> Object in C means region of data storage isn't it ?
>
> i am not getting your point,
> same object with different representations ?

Consider MS-DOS's 20-bit pointers, where logical addresses are described by
two 16-bit values, called "segment" and "offset" respectively. To get the
physical address, we left-shift the segment value by four bits, and then
add the offset value.

Logical    Logical    Physical
segment    offset     address
address    address
  0001       1030      01040
  0002       1020      01040
  0003       1010      01040
  0004       1000      01040
  0005       0FF0      01040

etc.

Five different pointer representations, all pointing to the same physical
object. All must compare equal when compared with ==. The implementation
is responsible for making this work.
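
The shift-and-add rule can be written as a one-line C function. This is only a sketch: the name physical_address is hypothetical, and the 20-bit mask assumes the 1 MB wrap-around of the original 8086.

```c
#include <stdint.h>

/* Real-mode address translation as described above:
   shift the segment left by four bits, then add the offset.
   The result is masked to 20 bits (assumption: 8086-style 1 MB wrap). */
static uint32_t physical_address(uint16_t segment, uint16_t offset)
{
    return (((uint32_t)segment << 4) + offset) & 0xFFFFFu;
}
```

For example, physical_address(0x0001, 0x1030) and physical_address(0x0004, 0x1000) both give 0x01040, so an implementation that compared these two pointers by address would report them equal.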

--
Richard Heathfield <http://www.cpax.org.uk>
Email: -http://www. +rjh@
Google users: <http://www.cpax.org.uk/prg/writings/googly.php>
"Usenet is a strange place" - dmr 29 July 1999

Jun 27 '08 #5

Kenny McCormack

In article <aY*********************@bt.com>,
Richard Heathfield <rj*@see.sig.invalid> wrote:
...

> Consider MS-DOS's 20-bit pointers, where logical addresses are described by
> two 16-bit values, called "segment" and "offset" respectively. To get the
> physical address, we left-shift the segment value by four bits, and then
> add the offset value.

A nitpick surely worthy of this group...

It's not "MS-DOS's 20-bit pointers". This is a feature of the 8086
processor (and its descendants, running in "real mode"). It is not a
function of the OS in any way.

Further note that the descendants still maintain this functionality; it
is just that it is rarely used. It is no longer necessary (at least up
to the 4G mark. I'm not sure what happens if you have a machine with
more than 4G RAM).

Jun 27 '08 #6

Kenny McCormack

In article <fu**********@news.xmission.com>,
Kenny McCormack <ga*****@xmission.xmission.com> wrote:

> In article <aY*********************@bt.com>,
> Richard Heathfield <rj*@see.sig.invalid> wrote:
> ...
>> Consider MS-DOS's 20-bit pointers, where logical addresses are described by
>> two 16-bit values, called "segment" and "offset" respectively. To get the
>> physical address, we left-shift the segment value by four bits, and then
>> add the offset value.
>
> A nitpick surely worthy of this group...

And note that a *real* first-class nitpicker would point out that it's
not even *MS*-DOS, as if this functionality were somehow unique to
and/or invented by Microsoft...

Jun 27 '08 #7

Kenneth Brody

sophia wrote:

> On Apr 16, 7:59 am, Richard Heathfield <r...@see.sig.invalid> wrote:
>> sophia said:
>>
>>> Dear all,
>>>
>>> can any one explain what is meant by pointer normalization
>>> given here:-
>>>
>>> http://c-faq.com/ansi/norml.html
>>
>> It is possible for a pointer value to have more than one object
>> representation. If p1 and p2 are two pointers to the same object,
>> but with different object representations,
>
> Object in C means region of data storage isn't it ?
>
> i am not getting your point,
> same object with different representations ?

Consider, for example, "real mode" on an x86 CPU. On that particular
platform, addresses are represented by a 16-bit segment and a 16-bit
offset. (The physical address is segment*16+offset.) Using this
particular architecture, the following segment/offset pairs all point
to the same physical address:

1234:0000
1230:0040
1200:0340
1000:2340
and even
1111:1230
0235:FFF0
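
One way to normalize such pairs is sketched below in C. The canonical form used here (offset reduced to the range 0..15, everything else folded into the segment) is one common convention and is an assumption of this sketch; the names seg_off, normalize and same_address are hypothetical.

```c
#include <stdint.h>

/* A segment/offset pair, as in the list above. */
typedef struct {
    uint16_t segment;
    uint16_t offset;
} seg_off;

/* Canonical form chosen here (an assumption; other conventions exist):
   the offset is reduced to 0..15 and the rest goes into the segment. */
static seg_off normalize(seg_off p)
{
    uint32_t linear = (((uint32_t)p.segment << 4) + p.offset) & 0xFFFFFu;
    seg_off n;
    n.segment = (uint16_t)(linear >> 4);
    n.offset  = (uint16_t)(linear & 0xFu);
    return n;
}

/* After normalization, equal addresses have identical fields. */
static int same_address(seg_off a, seg_off b)
{
    seg_off na = normalize(a);
    seg_off nb = normalize(b);
    return na.segment == nb.segment && na.offset == nb.offset;
}
```

Under this convention every pair in the list above normalizes to 1234:0000, so a field-by-field comparison of the normalized values is enough to decide equality.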

--
+-------------------------+--------------------+-----------------------+
| Kenneth J. Brody        | www.hvcomputer.com | #include              |
| kenbrody/at\spamcop.net | www.fptech.com     |    <std_disclaimer.h> |
+-------------------------+--------------------+-----------------------+
Don't e-mail me at: <mailto:Th*************@gmail.com>

Jun 27 '08 #8

anoncoholic

Kenny McCormack wrote:

> In article <aY*********************@bt.com>,
> Richard Heathfield <rj*@see.sig.invalid> wrote:
> ...
>> Consider MS-DOS's 20-bit pointers, where logical addresses are described by
>> two 16-bit values, called "segment" and "offset" respectively. To get the
>> physical address, we left-shift the segment value by four bits, and then
>> add the offset value.
>
> A nitpick surely worthy of this group...
>
> It's not "MS-DOS's 20-bit pointers". This is a feature of the 8086
> processors (and its descendants, running in "real mode"). It is not a
> function of the OS in any way.

Well, if we're nitpicking... :P
He didn't claim MS-DOS 'owns or invented' the concept. He just offered
it as an example. MS-DOS did in fact have 20-bit pointers because it ran
on the 8086.

Jun 27 '08 #9

Kenny McCormack

In article <48********@news.acsalaska.net>, anoncoholic <no@no.net> wrote:

> Kenny McCormack wrote:
>> In article <aY*********************@bt.com>,
>> Richard Heathfield <rj*@see.sig.invalid> wrote:
>> ...
>>> Consider MS-DOS's 20-bit pointers, where logical addresses are described by
>>> two 16-bit values, called "segment" and "offset" respectively. To get the
>>> physical address, we left-shift the segment value by four bits, and then
>>> add the offset value.
>>
>> A nitpick surely worthy of this group...
>>
>> It's not "MS-DOS's 20-bit pointers". This is a feature of the 8086
>> processors (and its descendants, running in "real mode"). It is not a
>> function of the OS in any way.
>
> Well if we're nitpicking.. :P
> He didn't claim ms-dos 'owns or invented' the concept. He just offered
> it as an example. MS-DOS did in fact have 20bit pointers because it ran
> on 8086.

I doubt the MSDOS standards document uses the phrase "20 bit"...

Jun 27 '08 #10

santosh

Kenny McCormack wrote:

> In article <aY*********************@bt.com>,
> Richard Heathfield <rj*@see.sig.invalid> wrote:
> ...
>> Consider MS-DOS's 20-bit pointers, where logical addresses are
>> described by two 16-bit values, called "segment" and "offset"
>> respectively. To get the physical address, we left-shift the segment
>> value by four bits, and then add the offset value.
>
> A nitpick surely worthy of this group...
>
> It's not "MS-DOS's 20-bit pointers". This is a feature of the 8086
> processors (and its descendants, running in "real mode"). It is not a
> function of the OS in any way.
>
> Further note that the descendants still maintain this functionality;
> it is just that it is rarely used. It is no longer necessary (at
> least up to the 4G mark. I'm not sure what happens if you have a
> machine with more than 4G RAM).

The system could employ PAE to use up to 64 GB, though applications still
see only 4 GB and remain flat-model based.

Jun 27 '08 #11

Kaz Kylheku

On Apr 16, 5:47 am, Nick Keighley <nick_keighley_nos...@hotmail.com>
wrote:

> On 16 Apr, 13:33, sophia <sophia.ag...@gmail.com> wrote:
>
>> On Apr 16, 7:59 am, Richard Heathfield <r...@see.sig.invalid> wrote:
>>>> can any one explain what is meant by pointer normalization
>>>> given here:-
>>>>
>>>> http://c-faq.com/ansi/norml.html
>>>
>>> It is possible for a pointer value to have more than one object
>>> representation. If p1 and p2 are two pointers to the same object,
>>> but with different object representations,
>>
>> Object in C means region of data storage isn't it ?
>>
>> i am not getting your point,
>> same object with different representations ?
>
> not possible on a sane architecture, but see the "Segmentation"
> section of http://en.wikipedia.org/wiki/Intel_8086

Many sane architectures support virtual memory, by means of which you
can create aliases of the same object at different virtual addresses.

On some architectures, certain bits in an address determine whether,
for instance, the same address range is being accessed cached or
uncached.

There are also sane forms of segmentation, not simply based on
multiplying a segment by some constant and adding the offset.

Jun 27 '08 #12

Richard Tobin

In article <64**********************************@t54g2000hsg.googlegroups.com>,
Kaz Kylheku <kk******@gmail.com> wrote:

>>>> It is possible for a pointer value to have more than one object
>>>> representation. If p1 and p2 are two pointers to the same object,
>>>> but with different object representations,
>>>
>>> Object in C means region of data storage isn't it ?
>>>
>>> i am not getting your point,
>>> same object with different representations ?
>>
>> not possible on a sane architecture, but see the "Segmentation"
>> section of http://en.wikipedia.org/wiki/Intel_8086
>
> Many sane architectures support virtual memory, by means of which you
> can create aliases of the same object at different virtual addresses.
>
> On some architectures, certain bits in an address determine whether,
> for instance, the same address range is being accessed cached or
> uncached.

This is true, but not really relevant to the question. Such different
representations will not arise as a result of standard C operations.
An implementation may well provide a way to get an uncached pointer
to an object, but it is under no obligation to make that compare
equal to the cached version.

-- Richard
--
:wq

Jun 27 '08 #13

Richard Tobin

In article <fu**********@news.xmission.com>,
Kenny McCormack <ga*****@xmission.xmission.com> wrote:

> I doubt the MSDOS standards document uses the phrase "20 bit"...

There's an MSDOS standards document?

-- Richard

--
:wq

Jun 27 '08 #14

Richard Bos

ri*****@cogsci.ed.ac.uk (Richard Tobin) wrote:

> Kenny McCormack <ga*****@xmission.xmission.com> wrote:
>> I doubt the MSDOS standards document uses the phrase "20 bit"...
>
> There's an MSDOS standards document?

Erm, yes? I don't remember what colour it was, but IIRC burgundy.
Techref for the hardware was blue. They came in binders.

Richard

Jun 27 '08 #15

Charlton Wilbur

>>>>> "MD" == Morris Dovey <mr*****@iedu.com> writes:

(quoting someone else, whose attribution I have lost)

>> If the top 4 bits are taken outside, they should be decoded
>> (e.g. zeros). Otherwise extending the memory beyond 4K words
>> becomes difficult. And problems on those top 4 address lines
>> might be missed.

MD> I'm almost embarrassed to point out how easily any such
MD> problems are avoided. All that decoding unused address lines
MD> gets is increased product cost. We already have enough
MD> bloatware without building it into circuits, too.

>> If not then the uP should insist those 4 bits were zero or
>> whatever. Then the same software can run on a version with more
>> address lines.

MD> Nah - it's a non-issue, and only worth mentioning as an
MD> example of multiple addresses pointing to the same memory
MD> location.

Ask the owners of Mac SE/30s who had to struggle with "32-bit clean"
software issues, particularly from Microsoft, if it's a "non-issue."

You see, in the early days of the Macintosh computer line, around the
time that the Macintosh II came out, it was obvious to anyone that
only 24 bits of memory addressing would ever be necessary.
Microsoft's products, most notably Word, used the upper 8 bits of the
32-bit pointer to hold information about the pointer.

Of course, like the famous 640K prediction, this one was also false,
and when Macs came out that actually had enough memory, virtual and
otherwise, so that full 32-bit addressing was useful, people had to
cope with the backwards Microsoft decision, and the resulting random
crashes of Word, destroying hours' worth of work, for several years
while it all got straightened out.
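
The general trick of hiding flags in the unused top byte of an address can be sketched in C as follows. This is an illustration of the pattern only, not a reconstruction of any product's actual code; the names ADDRESS_MASK, with_tag, strip_tag and get_tag are hypothetical, and addresses are modelled as plain uint32_t values.

```c
#include <stdint.h>

#define ADDRESS_MASK 0x00FFFFFFu  /* low 24 bits: the address proper */

/* Store an 8-bit tag in the top byte of a 32-bit value.
   Any address bits above bit 23 are silently discarded. */
static uint32_t with_tag(uint32_t address, uint8_t tag)
{
    return (address & ADDRESS_MASK) | ((uint32_t)tag << 24);
}

/* Strip the tag, recovering the 24-bit address. */
static uint32_t strip_tag(uint32_t tagged)
{
    return tagged & ADDRESS_MASK;
}

/* Read the tag back out of the top byte. */
static uint8_t get_tag(uint32_t tagged)
{
    return (uint8_t)(tagged >> 24);
}
```

Two tagged values for the same address differ as bit patterns until the tag is masked off, which is a form of normalization. The scheme breaks as soon as a real address needs more than 24 bits: with_tag(0x01123456, 0x80) followed by strip_tag yields 0x00123456, a different location.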

So yeah, if you are *sure* that you'll never need the full range of
pointers, don't bother with any sort of normalization. Just use
Microsoft as an object lesson.

Charlton
--
Charlton Wilbur
cw*****@chromatico.net

Jun 27 '08 #16

Erik Trulsson

Charlton Wilbur <cw*****@chromatico.net> wrote:

>>>>>> "MD" == Morris Dovey <mr*****@iedu.com> writes:
>
> (quoting someone else, which attribution I have lost)
>
>>> If the top 4 bits are taken outside, they should be decoded
>>> (eg. zeros). Otherwise extending the memory beyond 4K words
>>> becomes difficult. And problems on those top 4 address lines
>>> might be missed.
>
> MD> I'm almost embarrassed to point out how easily any such
> MD> problems are avoided. All that decoding unused address lines
> MD> gets is increased product cost. We already have enough
> MD> bloatware without building it into circuits, too.
>
>>> If not then the uP should insist those 4 bits were zero or
>>> whatever. Then the same software can run on a version with more
>>> address lines.
>
> MD> Nah - it's a non-issue, and only worth mentioning as an
> MD> example of multiple addresses pointing to the same memory
> MD> location.
>
> Ask the people of Mac SE/30s who had to struggle with "32-bit clean"
> software issues, particularly from Microsoft, if it's a "non-issue."

For a general purpose-computer with lots of third-party software it can
indeed be a problem.

For an embedded device (CPUs with only 4K words are not used in anything
except embedded devices nowadays) it is not a problem.
Here people can switch to a completely different CPU architecture (with
all the porting this entails) if they can save 50 cents per unit.

For embedded devices one usually does not need to care about binary
compatibility (or even source code compatibility) with previous or
future generations of the product.

> You see, in the early days of the Macintosh computer line, around the
> time that the Macintosh II came out, it was obvious to anyone that
> only 24 bits of memory addressing would ever be necessary.
> Microsoft's products, most notably Word, used the upper 8 bits of the
> 32-bit pointer to hold information about the pointer.
>
> Of course, like the famous 640K prediction, this one was also false,
> and when Macs came out that actually had enough memory, virtual and
> otherwise, so that full 32-bit addressing was useful, people had to
> cope with the backwards Microsoft decision, and the resulting random
> crashes of Word, destroying hours' worth of work, for several years
> while it all got straightened out.
>
> So yeah, if you are *sure* that you'll never need the full range of
> pointers, don't bother with any sort of normalization. Just use
> Microsoft as an object lesson.
>
> Charlton

--
<Insert your favourite quote here.>
Erik Trulsson
er******@student.uu.se

Jun 27 '08 #17

Morris Dovey

Charlton Wilbur wrote:

> So yeah, if you are *sure* that you'll never need the full range of
> pointers, don't bother with any sort of normalization. Just use
> Microsoft as an object lesson.

Hmm. My choice would be to _not_ use M$ in /any/ context.

--
Morris Dovey
DeSoto Solar
DeSoto, Iowa USA
http://www.iedu.com/DeSoto/

Jun 27 '08 #18

Charlton Wilbur

>>>>> "MD" == Morris Dovey <mr*****@iedu.com> writes:

MD> Charlton Wilbur wrote:

>> So yeah, if you are *sure* that you'll never need the full
>> range of pointers, don't bother with any sort of normalization.
>> Just use Microsoft as an object lesson.

MD> Hmm. My choice would be to _not_ use M$ in /any/ context.

Even as an example of what not to do?

Charlton
--
Charlton Wilbur
cw*****@chromatico.net

Jun 27 '08 #19

Morris Dovey

Charlton Wilbur wrote:

>>>>>> "MD" == Morris Dovey <mr*****@iedu.com> writes:
>
> MD> Charlton Wilbur wrote:
>
>>> So yeah, if you are *sure* that you'll never need the full
>>> range of pointers, don't bother with any sort of normalization.
>>> Just use Microsoft as an object lesson.
>
> MD> Hmm. My choice would be to _not_ use M$ in /any/ context.
>
> Even as an example of what not to do?

There're enough examples without their contributions.

--
Morris Dovey
DeSoto Solar
DeSoto, Iowa USA
http://www.iedu.com/DeSoto/

Jun 27 '08 #20

sophia

On Apr 16, 5:52 pm, Richard Heathfield <r...@see.sig.invalid> wrote:

> sophia said:
>
>> On Apr 16, 7:59 am, Richard Heathfield <r...@see.sig.invalid> wrote:
>>> sophia said:
>>>
>>>> Dear all,
>>>>
>>>> can any one explain what is meant by pointer normalization
>>>> given here:-
>>>>
>>>> http://c-faq.com/ansi/norml.html
>>>
>>> It is possible for a pointer value to have more than one object
>>> representation. If p1 and p2 are two pointers to the same object,
>>> but with different object representations,
>>
>> Object in C means region of data storage isn't it ?
>>
>> i am not getting your point,
>> same object with different representations ?
>
> Consider MS-DOS's 20-bit pointers, where logical addresses are described by
> two 16-bit values, called "segment" and "offset" respectively. To get the
> physical address, we left-shift the segment value by four bits, and then
> add the offset value.
>
> Logical    Logical    Physical
> segment    offset     address
> address    address
>   0001       1030      01040
>   0002       1020      01040
>   0003       1010      01040
>   0004       1000      01040
>   0005       0FF0      01040
>
> etc.
>
> Five different pointer representations, all pointing to the same physical
> object. All must compare equal when compared with ==. The implementation
> is responsible for making this work.

Peter van der Linden, in his book, says:

An address on the Intel 8086 is formed by combining a 16-bit segment
with a 16-bit offset [...]

In general, there will be 4096 different segment/offset combinations
that point to the same address.

A C compiler writer needs to make sure that pointers are compared in
canonical form on a PC, otherwise two pointers that have different
bit patterns but designate the same address may wrongly compare
unequal. This will be done for you if you use the "huge" keyword, but
does not occur for the large model.
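
The figure of 4096 can be checked by brute force. The sketch below is in C and uses a hypothetical helper, count_aliases; it assumes segment * 16 + offset with no wrap-around past 1 MB, which is a simplification of what real hardware does.

```c
#include <stdint.h>

/* Count the segment/offset pairs that map to a given linear address,
   assuming linear = segment * 16 + offset and no wrap-around at 1 MB. */
static unsigned count_aliases(uint32_t linear)
{
    unsigned count = 0;
    for (uint32_t seg = 0; seg <= 0xFFFFu; seg++) {
        uint32_t base = seg << 4;
        /* The pair is valid if the required offset fits in 16 bits. */
        if (base <= linear && linear - base <= 0xFFFFu)
            count++;
    }
    return count;
}
```

Under that assumption count_aliases(0x12340) is 4096, as the book says. Addresses below 0xFFF0 have fewer aliases because the segment cannot go below zero; count_aliases(0x01040), for instance, is 261.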

Now my question is:

How is this pointer normalization done in present-day 32-bit C
compilers?

Jun 27 '08 #21

Bartc

sophia wrote:

> Peter van der linden in his book says that
>
> An address on the intel 8086 is formed by combining a 16- bit segment
> with a 16 bit offset [...]
>
> In general , there will be 4096 different segment/offset combinations
> that point to the same address.
>
> ....
>
> In present day 32 bit C compilers how this pointer normalization
> done ?

If you're talking about common desktop PCs, there usually isn't any
normalisation needed, each pointer is a linear 32-bit value.

But there will always be a few odd machines with strange ways of addressing
memory. And with 64-bit machines, if 32-bit and 64-bit pointers are ever
mixed in the same application, some normalising is needed there (but the
situation is a little different: a short pointer with a long one).

That's the compiler's headache, however, not yours.

Are you researching an article or something?

--
Bart

Jun 27 '08 #22

Gordon Burditt

> Peter van der linden in his book says that
>
> An address on the intel 8086 is formed by combining a 16- bit segment
> with a 16 bit offset [...]
>
> In general , there will be 4096 different segment/offset combinations
> that point to the same address.
>
> A C - compiler writer needs to make sure that pointers are compared in
> canonical form on a P.C , otherwise two pointers that have different
> bit patterns but designate the same address may wrongly compare
> unequal. this will be done for you if you use the "huge" keyword, but
> does not occur for the large model.

A compiler writer only needs to worry about normalizing pointers
if non-normalized pointers can occur by normal operation of C. For
example, in large model, pointer addition affects only the offset
and you can't have an object larger than 64k, so pointer arithmetic
won't generate un-normalized pointers, so a straight 32-bit comparison
can be used to compare pointers (assuming you've covered all the other
ways of generating un-normalized pointers). However, in huge model,
objects can be larger than 64k, and pointer arithmetic may alter
the segment, so you have a choice of pointer arithmetic always doing
normalizing, or comparing pointers doing normalizing.
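
The difference between the two kinds of pointer arithmetic can be sketched in C. This is a model only: the names far_ptr, far_add, huge_add and bitwise_equal are hypothetical, the canonical form (offset in 0..15) is one possible choice, and real compilers of the period may have differed in detail.

```c
#include <stdint.h>

/* A segment/offset pair standing in for a 16-bit far or huge pointer. */
typedef struct {
    uint16_t segment;
    uint16_t offset;
} far_ptr;

/* Large-model style: only the offset changes and it wraps at 64K.
   The segment is never touched. */
static far_ptr far_add(far_ptr p, uint16_t n)
{
    p.offset = (uint16_t)(p.offset + n);
    return p;
}

/* Huge-model style: work on the 20-bit linear address, then renormalize
   so that the offset ends up in the range 0..15. */
static far_ptr huge_add(far_ptr p, uint32_t n)
{
    uint32_t linear = (((uint32_t)p.segment << 4) + p.offset + n) & 0xFFFFFu;
    p.segment = (uint16_t)(linear >> 4);
    p.offset  = (uint16_t)(linear & 0xFu);
    return p;
}

/* The "straight 32-bit comparison": sufficient only when every pointer
   being compared is already in the same canonical form. */
static int bitwise_equal(far_ptr a, far_ptr b)
{
    return a.segment == b.segment && a.offset == b.offset;
}
```

Starting from 1000:FFFF, far_add with 1 wraps around to 1000:0000 inside the same segment, while huge_add with 1 gives 2000:0000. Because huge_add always normalizes, results reached by different routes can be compared with bitwise_equal, which is the "arithmetic always normalizes" choice described above.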

> In present day 32 bit C compilers how this pointer normalization
> done ?

Present day desktops use linear addresses (unless you're talking
about Pentium 48-bit pointers and a larger than 4GB address space
for an individual process). You can't have an un-normalized pointer,
so it's a non-issue.

Jun 27 '08 #23
