Why is a pointer to "one past" allowed but a pointer to "one before" not?

spibou

Why is a pointer allowed to point to one position past
the end of an array but not to one position before the
beginning of an array ? Is there any reason why the
former is more useful than the latter ?

Spiros Bousbouras

Jun 23 '06 #1


Nils O. Selåsdal

sp****@gmail.com wrote:

Why is a pointer allowed to point to one position past
the end of an array but not to one position before the
beginning of an array ? Is there any reason why the
former is more useful than the latter ?

Consider code such as

char *str = somestring;
while(*str++) {
...
}

str might end up one place past somestring - nice to allow that.

Jun 23 '06 #2

spibou

Nils O. Selåsdal wrote:

sp****@gmail.com wrote:
Why is a pointer allowed to point to one position past
the end of an array but not to one position before the
beginning of an array ? Is there any reason why the
former is more useful than the latter ?

Consider code such as

char *str = somestring;
while(*str++) {
...
}

str might end up one place past somestring - nice to allow that.

Yes it is. My question was why the "opposite" is not allowed too.
One could just as easily have something like
while (source >= beg_of_string) *dest++ = *source-- ;
to copy a string in reverse for example.

Does my example invoke undefined behaviour by the way ?

Spiros Bousbouras

Jun 23 '06 #3

Richard Bos

sp****@gmail.com wrote:

Why is a pointer allowed to point to one position past
the end of an array but not to one position before the
beginning of an array ?

Because a pointer one past any array need only take a single byte (since
only the address of the _first byte_ of the virtual member need be
valid, not any further ones), but a pointer one before the beginning
requires the assignment of memory space the size of an entire array
member. Given that the array member can be a humungous struct containing
arrays of structs of arrays of long doubles, this can cost a lot of
address space that could otherwise be gainfully employed.

Richard

Jun 23 '06 #4

Marc Boyer

On 23-06-2006, sp****@gmail.com <sp****@gmail.com> wrote:

Why is a pointer allowed to point to one position past
the end of an array but not to one position before the
beginning of an array ? Is there any reason why the
former is more useful than the latter ?

More useful, yes, if you agree that there are more
increasing loops than decreasing ones.

But I believe the real reason is that it is easier to
implement on hardware: you just have to waste 1
memory address. That is to say, if your processor
can address from 0 up to 2^N-1, and all
data are stored between 0 and 2^N-2, then
'one position past' is at worst 2^N-1, which is
a valid address for your processor, and pointer
arithmetic still applies.

But 'one position before' is harder. You cannot
put any bound on the size of the memory that must be
reserved at the beginning. If an object S is stored
at address A, then &S+1 is just one char
after the space used to store S, but &S-1 is
sizeof(S) chars before it...

It's a bit hard to explain without any blackboard,
and I am not very good at ASCII art.

Marc Boyer

Jun 23 '06 #5

Richard Tobin

In article <44**************@news.xs4all.nl>,
Richard Bos <rl*@hoekstra-uitgeverij.nl> wrote:

Why is a pointer allowed to point to one position past
the end of an array but not to one position before the
beginning of an array ?
Because a pointer one past any array need only take a single byte (since
only the address of the _first byte_ of the virtual member need be
valid, not any further ones), but a pointer one before the beginning
requires the assignment of memory space the size of an entire array
member.

That's one reason, but I think a much more compelling one was that
there was lots of existing code that did things like

for(p=proc; p<&proc[NPROC]; p++)

and very little that did the reverse.

-- Richard

Jun 23 '06 #6

Nils O. Selåsdal

sp****@gmail.com wrote:

Nils O. Selåsdal wrote:
sp****@gmail.com wrote:
Why is a pointer allowed to point to one position past
the end of an array but not to one position before the
beginning of an array ? Is there any reason why the
former is more useful than the latter ?

Consider code such as

char *str = somestring;
while(*str++) {
...
}

str might end up one place past somestring - nice to allow that.

Yes it is. My question was why the "opposite" is not allowed too.

It's much, *much* more common to iterate this way than the other way,
and probably was when the spec was made :)

Jun 23 '06 #7

Andrey Tarasevich

sp****@gmail.com wrote:

...
Why is a pointer allowed to point to one position past
the end of an array but not to one position before the
beginning of an array ? Is there any reason why the
former is more useful than the latter ?
...

There are several different reasons for that. One of them is described
below.

The storage is normally filled with the objects from smaller addresses
to larger addresses (i.e. in the same direction in which array indices
grow). For this reason, it is not unusual to have an object that resides
close to the beginning of the storage. To create a "before" pointer (and
properly support all pointer operations) for such an object might be
either impossible or unjustifiably difficult (since such a pointer would
have to point somewhere before the beginning of the storage). "Beginning
of the storage" in this case does not necessarily stand for the
beginning of physical memory. On a hardware platform with
segmented memory, the beginning of a segment has similar properties.

--
Best regards,
Andrey Tarasevich

Jun 23 '06 #8

Andrey Tarasevich

sp****@gmail.com wrote:

...
Yes it is. My question was why the "opposite" is not allowed too.
One could just as easily have something like
while (source >= beg_of_string) *dest++ = *source-- ;
to copy a string in reverse for example.

Does my example invoke undefined behaviour by the way ?
...

Formally, it does lead to UB, since it attempts to create a "one before"
pointer.

The problem with your code is similar in nature to the problem with
the following code:

unsigned i;
...
while (i >= 0) { dest[i] = source[i]; i--; }

Note that an unsigned value will never be negative and the loop will
never end.

Essentially the same thing can happen in the case of a pointer. If we
consider pointers (addresses) as arithmetic values, they are unsigned.
Imagine that your 'beg_of_string' pointer points to address 0. How do
you expect your loop to end in this case? How do you expect to represent
a pointer that is less than '0'?

--
Best regards,
Andrey Tarasevich

Jun 23 '06 #9

William Ahern

On Fri, 23 Jun 2006 05:40:43 -0700, spibou wrote:

Nils O. Selåsdal wrote:
sp****@gmail.com wrote:
Why is a pointer allowed to point to one position past the end of an
array but not to one position before the beginning of an array ? Is
there any reason why the former is more useful than the latter ?

Consider code such as

char *str = somestring;
while(*str++) {
...
}

str might end up one place past somestring - nice to allow that.

Yes it is. My question was why the "opposite" is not allowed too. One
could just as easily have something like
while (source >= beg_of_string) *dest++ = *source-- ;
to copy a string in reverse for example.

Does my example invoke undefined behaviour by the way ?

Yes. And to see a real world example of such a program failing because of
this (not that undefined means it must fail), try this compiler

http://fabrice.bellard.free.fr/tcc/

with your code, using the -b switch (bounds checker). I never heeded the
standard on this point until I started using TCC to improve my code
portability.

I must say there are some circumstances where it is indeed desirable to
iterate backwards. For example, I had to tweak many places in a memory
pool library because I would iterate backwards from a given pointer
reading bookkeeping information until I hit a terminator bit. Took me hours to
figure out why my program would crash using TCC:

/*
 * Beginning from *p, work backwards reconstructing the value of an
 * rbitsint_t integer. Stop when the highest order bit of *p is set, which
 * should have been previously preserved as a marker. Return the
 * reconstructed value, setting *end to the last position used of p.
 */
static inline rbitsint_t rbits_get(unsigned char *p, unsigned char **end) {
    rbitsint_t i = 0; /* currently typedef to size_t */
    int n = 0;

    do {
        i |= (*p & ~(1 << (CHAR_BIT - 1))) << (n++ * (CHAR_BIT - 1));
    } while (!(*(p--) & (1 << (CHAR_BIT - 1))));

    *end = p + 1;

    return i;
} /* rbits_get() */

Jun 23 '06 #10

Mark McIntyre

On 23 Jun 2006 05:40:43 -0700, in comp.lang.c , sp****@gmail.com
wrote:

Yes it is. My question was why the "opposite" is not allowed too.

Imagine your object was at the *very start* of memory. One-before
would be nowhere.

Going the other way is no problem - the abstract machine has infinite
memory so there is no 'very end'...

--
Mark McIntyre

"Debugging is twice as hard as writing the code in the first place.
Therefore, if you write the code as cleverly as possible, you are,
by definition, not smart enough to debug it."
--Brian Kernighan

Jun 23 '06 #11

Keith Thompson

Mark McIntyre <ma**********@spamcop.net> writes:

On 23 Jun 2006 05:40:43 -0700, in comp.lang.c , sp****@gmail.com
wrote:
Yes it is. My question was why the "opposite" is not allowed too.

Imagine your object was at the *very start* of memory. one-before
would be nowhere.

Going the other way is no problem - the abstract machine has infinite
memory so there is no 'very end'...

The abstract machine has no "very start" of memory either, and its
memory is limited to 2**(CHAR_BIT * sizeof(void*)) bytes.

The purpose of the rule is to avoid problems on real-world machines
with finite address spaces. To allow a pointer just past the end of
an array, an implementation, at most, has to allocate one extra byte.
To allow a pointer just before the beginning of an array, it might
have to allocate enough space for an entire array element, which can
be almost arbitrarily large. (It may not have to allocate any memory
if it can form an address for memory that doesn't exist, but the
standard allows for the possibility that it does have to do so.)

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <*> <http://users.sdsc.edu/~kst>
We must do something. This is something. Therefore, we must do this.

Jun 24 '06 #12

Skarmander

William Ahern wrote:

On Fri, 23 Jun 2006 05:40:43 -0700, spibou wrote:
Nils O. Selåsdal wrote:
sp****@gmail.com wrote:
Why is a pointer allowed to point to one position past the end of an
array but not to one position before the beginning of an array ? Is
there any reason why the former is more useful than the latter ?
Consider code such as

char *str = somestring;
while(*str++) {
...
}

str might end up one place past somestring - nice to allow that.

Yes it is. My question was why the "opposite" is not allowed too. One
could just as easily have something like
while (source >= beg_of_string) *dest++ = *source-- ;
to copy a string in reverse for example.

Does my example invoke undefined behaviour by the way ?

Yes. And to see a real world example of such a program failing because of
this (not that undefined means it must fail), try this compiler

http://fabrice.bellard.free.fr/tcc/

with your code, using the -b switch (bounds checker). I never heeded the
standard on this point until I started using TCC to improve my code
portability.

I must say there are some circumstances where it is indeed desirable to
iterate backwards. For example, I had to tweak many places in a memory
pool library because I would iterate backwards from a given pointer
reading bookkeeping information until I hit a terminator bit. Took me hours to
figure out why my program would crash using TCC:

/*
* Beginning from *p, work backwards reconstructing the value of an
* rbitsint_t integer. Stop when the highest order bit of *p is set, which
* should have been previously preserved as a marker. Return the
* reconstructed value, setting *end to the last position used of p.
*/
static inline rbitsint_t rbits_get(unsigned char *p, unsigned char **end) {
rbitsint_t i = 0; /* currently typedef to size_t */
int n = 0;

do {
i |= (*p & ~(1 << (CHAR_BIT - 1))) << (n++ * (CHAR_BIT - 1));

Two problems with this line.

First, ~(1 << (CHAR_BIT - 1)) is a questionable expression. 1 << (CHAR_BIT -
1) is a signed integer, to which ~ is applied. The value of the result is
implementation-defined. This will happen to work here because *p is at most
as wide as the bitmask, so the irrelevant bits will be masked out and the
actual value of the expression doesn't matter. Still, this is a habit best
unlearned. Use U<FOO>_MAX >> 1 for the all-ones-except-the-MSB bitmask,
where U<FOO>_MAX can be defined if necessary as ((unsigned foo) -1).

Second, (*p & ~(1 << (CHAR_BIT - 1))) is an int. You then shift this int by
(n++ * (CHAR_BIT - 1)), which has potential for undefined behavior since it
can exceed the width of an int. What you want is to shift an rbitsint_t, not
an int. (Unlike the previous issue, this can be a real problem -- try
typedef'ing rbitsint_t as a 64-bit type on a 32-bit architecture and reading
more than 32 value bits into it to see what I mean.)

This should fix both issues:
i |= (rbitsint_t) (*p & (UCHAR_MAX >> 1)) << (n++ * (CHAR_BIT - 1));
} while (!(*(p--) & (1 << (CHAR_BIT - 1))));

*end = p + 1;

return i;
} /* rbits_get() */

Not inspired by the overflow observation, but here's my take:

#define LOMASK (UCHAR_MAX >> 1)
#define MSB (LOMASK + 1)

static inline rbitsint_t rbits_get(unsigned char *p, unsigned char **end) {
    rbitsint_t i;

    unsigned char *q = p;
    while (!(*q & MSB)) --q;
    *end = q;

    i = *q & LOMASK;
    while (++q <= p) {
        i = i << (CHAR_BIT - 1) | *q;
    }
    return i;
}

Yes, it's more lines; yes, I go backwards just to go forwards again. If the
operations involved are expensive, this is not a good idea, but here the
operations aren't expensive. I personally find this easier to read, and for
my compiler and my machine (YMMV) it's actually faster.

Of course, I assume you've rewritten your code since discovering the
one-before-the-beginning bug, possibly even along these lines, so my point
may be moot.

S.

Jun 24 '06 #13
