libiberty directory in the gcc-3.1.1 source code package

Liang Chen

Is the file "bcopy.c" in the "libiberty" directory the implementation of the
GNU C library function "bcopy"? If so, how can it run so fast, given that it
copies byte by byte rather than word by word? When I copy the code of
"libiberty/bcopy.c" into my own program, it is very slow, even if I turn on
"-O3" and define "NDEBUG". Why?

Nov 14 '05 #1

Gordon Burditt

> Is the file "bcopy.c" in the "libiberty" directory the implement of the
> GNU C library function "bcopy"?

Most likely. I'm not sure I'm looking at the same version you are,
and the interface and functionality of "bcopy()" predate GNU, I
believe, but it looks like it.

> If so, how can it run so fast?

Who claims that it does run fast, whether so or non-so? And with
what evidence?

> (copy-by-byte
> rather than copy-by-word)
> When I copy the code of "libiberty/bcopy.c" to my
> own code, I find that it is so slow even if I turn on "-O3" and define
> "NDEBUG". Why?

That implementation of bcopy() (which seems to be portable to all
platforms) still runs faster than a program that needs bcopy() but
doesn't have any implementation of it at all (and therefore won't
link).

Often you have a tradeoff: pick 1:

    portable and mediocre performance

    unportable, good performance on some platforms,
    won't work or works incorrectly on others

    unportable, good performance on some platforms,
    terrible performance on others

Gordon L. Burditt

Nov 14 '05 #2

Liang Chen

Thank you for your reply. I apologize for expressing my questions so unclearly, which is mainly due to my poor English :(

In "libiberty", I found the file "memcpy.c". It contains the following code fragment:

PTR
DEFUN(memcpy, (out, in, length), PTR out AND const PTR in AND size_t length)
{
  bcopy(in, out, length);
  return out;
}

It is clear that memcpy() calls bcopy(). So I opened "bcopy.c" in the same directory and found this code:

void
bcopy (src, dest, len)
  register char *src, *dest;
  int len;
{
  if (dest < src)
    while (len--)
      *dest++ = *src++;
  else
    {
      char *lasts = src + (len-1);
      char *lastd = dest + (len-1);
      while (len--)
        *(char *)lastd-- = *(char *)lasts--;
    }
}

This version of bcopy() is implemented to behave "correctly" when the memory blocks overlap. According to the C89 standard, memcpy() does not need this kind of "correct" behavior (maybe bcopy() needs it for compatibility reasons), and if a programmer calls memcpy() with two overlapping memory blocks, the behavior is undefined. So I feel that this implementation of memcpy() is awful. The following implementation should be better:

void* memcpy1 (register void* des, register void* src, register size_t len)
{
    void* pdes = des;

    for(; len>0; --len)
        *(char*)des++ = *(char*)src++;

    return pdes;
}

And it can be more efficient to copy a word at a time:

void* memcpy2 (register void* des, register void* src, register size_t len)
{
    void* pdes = des;

    switch(len%sizeof(int))
    {
        case 3: *(char*)des++ = *(char*)src++;
        case 2: *(char*)des++ = *(char*)src++;
        case 1: *(char*)des++ = *(char*)src++;
    }
    for(len/=sizeof(int); len>0; --len)
        *(int*)des++ = *(int*)src++;

    return pdes;
}

It could be even more efficient to copy several words, rather than one, from src to des in each iteration of the "for" loop. In any case, I believe memcpy2() should run faster than memcpy() on large memory blocks. But when I tested them (copying between two 10240-byte memory blocks), I was surprised to find that memcpy() ran the fastest. This result completely confuses me. Do you know the reason? Would you be kind enough to explain it to me? Thank you!
Liang Chen

Nov 14 '05 #3

Gordon Burditt

> This version of bcopy() is implemented to behave more "correctly" when
> memory blocks are overlaped. We know that according to the C89 standard,
> function memcpy() does not need to have this kind of "correct"
> behavior(maybe bcopy() needs for some dependence issues), and if a
> programmer calls memcpy() with two overlaped memory blocks, its behavior
> is not defined. So, I feel that this implementation of memcpy() is too
> awful. The following implementation can be better,

I believe the "definition" of bcopy() (which is not ANSI C, but some
kind of old BSD de-facto non-standard) includes non-destructive
handling of overlapping areas. This is NOT true of memcpy() in
ANSI C but is true of memmove().
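
The overlap handling described here can be sketched in C roughly as follows. This is only an illustration, not the libiberty or libc source. The name copy_overlap_safe is made up for the example, and because comparing pointers into unrelated objects with < is not strictly defined by the standard, the sketch assumes both pointers refer into the same array, which is the only case where the blocks can overlap.

```c
#include <stddef.h>

/*
 * Hypothetical helper: byte copy that tolerates overlapping blocks,
 * in the style of bcopy()/memmove().
 * Assumption: dest and src point into the same array whenever the
 * direction of the copy matters.
 */
void *copy_overlap_safe(void *dest, const void *src, size_t len)
{
    unsigned char *d = dest;
    const unsigned char *s = src;

    if (d < s) {
        /* Destination starts first: copy forward, lowest byte first. */
        while (len--)
            *d++ = *s++;
    } else if (d > s) {
        /* Destination starts later: copy backward, highest byte first,
           so source bytes are read before they are overwritten. */
        d += len;
        s += len;
        while (len--)
            *--d = *--s;
    }
    /* d == s: nothing to do. */
    return dest;
}
```

With a buffer holding "abcdef", copy_overlap_safe(buf + 2, buf, 4) leaves "ababcd"; a plain forward byte loop would leave "ababab" because it rereads bytes it has already overwritten.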

> void* memcpy1 (register void* des, register void* src, register size_t len)
> {
>     void* pdes = des;
>
>     for(; len>0; --len)
>         *(char*)des++ = *(char*)src++;
>
>     return pdes;
> }
>
> And it can be more efficient when copy a word directly,

Warning: source code below appears to have been MIMEd to death.

> void* memcpy2 (register void* des, register void* src, register size_t len)
> {
>     void* pdes = des;
>
>     switch(len%sizeof(int))
>     {
>         case 3: *(char*)des++ = *(char*)src++;
>         case 2: *(char*)des++ = *(char*)src++;
>         case 1: *(char*)des++ = *(char*)src++;
>     }
>     for(len/=sizeof(int); len>0; --len)
>         *(int*)des++ = *(int*)src++;

I can see no reason why the above line won't smegfault on a
majority of calls to memcpy2() on a machine which enforces alignment
restrictions. Nasty example:

    char buf[10240];

    ... something to put some data in buf ...
    memcpy2(buf+3, buf, strlen(buf)+1);

Another possibility is that the machine doesn't enforce alignment
restrictions but comes up with the wrong answer. That is, assuming
4 byte ints,

    *(int *) 0xdeadbee3

fetches or stores the integer at the addresses 0xdeadbee0 thru 0xdeadbee3,
*NOT* 0xdeadbee3 thru 0xdeadbee6.

>     return pdes;
> }
>
> It can be much more efficient if I copy more words rather than one word
> from des to src in "for" loop.

I don't consider "segmentation fault - core dumped" to be more
efficient than anything which doesn't core dump. There are ways
to copy words at a time in the presence of alignment restrictions.
This isn't it.

> Anyhow, memcpy2() should run faster than
> memcpy() does when processing large memory blocks, I believe.

I believe that any such statement about how performance otto-be is
made *BECAUSE* it is wrong.

> But, when
> I test them(copy between two 10240 bytes memory blocks), I am surprised
> to find that memcpy() runs the fastest. This result make me completely
> confused. Do you know the reason? Would you kind to explain it to me?
> Thank you!

I don't see any measurement methodologies or test results here.
Any performance measurements where the difference between two
ways of doing something are less than 1% or less than 10 times
the granularity of the clock being used to measure the time are
likely crap. And multitasking screws things up even worse.
The best performance demonstrations are those where you can
easily measure the difference in time with a wrist watch, *IF*
throwing the test in a loop and repeating it a million times
doesn't screw up what you are trying to measure (e.g. maybe
you don't want the test run completely from cache).
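
A measurement along those lines might be set up as in the sketch below. It is an assumption-laden illustration, not anyone's actual benchmark from this thread: the names copy_fn and time_copies are invented here, clock() measures CPU time at whatever granularity the implementation provides, and repeating the copy this way runs mostly from cache, which is exactly the caveat raised above.

```c
#include <stddef.h>
#include <string.h>   /* memcpy, usable as the baseline routine */
#include <time.h>

/* Hypothetical signature shared by the copy routines under test. */
typedef void *(*copy_fn)(void *, const void *, size_t);

/*
 * Call fn(dst, src, len) `reps` times and return the elapsed CPU time
 * in seconds. One destination byte is read through a volatile object
 * after each call so the compiler is less likely to discard the copies.
 */
double time_copies(copy_fn fn, void *dst, const void *src,
                   size_t len, unsigned long reps)
{
    volatile unsigned char sink = 0;
    unsigned long i;
    clock_t start, end;

    if (len == 0)
        return 0.0;   /* nothing to measure */

    start = clock();
    for (i = 0; i < reps; i++) {
        fn(dst, src, len);
        sink = ((unsigned char *)dst)[i % len];
    }
    end = clock();
    (void)sink;

    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

A call such as time_copies(memcpy, b, a, sizeof a, reps) would then be compared with the same call for another routine, with reps chosen large enough that each run takes whole seconds rather than a few clock ticks.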

Also, are you sure you are using the memcpy() from the libiberty
directory? (As opposed to one in libc?) On FreeBSD the two
are very different.

Gordon L. Burditt

Nov 14 '05 #4

CBFalconer

Liang Chen wrote:

> Part 1.1 Type: Plain Text (text/plain)
> Encoding: quoted-printable

Please do not use html or mime attachments in newsgroups.

--
Chuck F (cb********@yahoo.com) (cb********@worldnet.att.net)
Available for consulting/temporary embedded and systems.
<http://cbfalconer.home.att.net> USE worldnet address!

Nov 14 '05 #5

Liang Chen

> I can see no reason why the above line won't smegfault on a
> majority of calls to memcpy2() on a machine which enforces alignment
> restrictions. Nasty example:
> char buf[10240];
>
> ... something to put some data in buf ...
> memcpy2(buf+3, buf, strlen(buf)+1);

Could I consider memcpy2() to be non-portable and hardware-sensitive?

> Another possibility is that the machine doesn't enforce alignment
> restrictions but comes up with the wrong answer. That is, assuming
> 4 byte ints,
> *(int *) 0xdeadbee3
> fetches or stores the integer at the addresses 0xdeadbee0 thru 0xdeadbee3,
> *NOT* 0xdeadbee3 thru 0xdeadbee6.

I run and test my programs on a PC. The CPU is an Intel Pentium. The OS is
Linux 2.4.18-12L, not xBSD. I used GDB to debug memcpy2(), and I find that
the situation is not as you described above. *(int*)0xdeadbee3 does
fetch and store the integer at the addresses 0xdeadbee3 thru
0xdeadbee6, rather than 0xdeadbee0 thru 0xdeadbee3.

> I don't consider "segmentation fault - core dumped" to be more
> efficient than anything which doesn't core dump. There are ways
> to copy words at a time in the presence of alignment restrictions.
> This isn't it.

I checked my program thoroughly last night. Now memcpy2() looks like
this:

void* memcpy2 (register void* dest, register void* src, register size_t len)
{
    void* pdest = dest;

    for(; len%sizeof(int)!=0; --len, dest=(char*)dest+1, src=(char*)src+1)
        *(char*)dest = *(char*)src;
    for(len/=sizeof(int); len>0; --len, dest=(int*)dest+1, src=(int*)src+1)
        *(int*)dest = *(int*)src;

    return pdest;
}

It is an ANSI C program this time. But my machine doesn't enforce alignment
restrictions, and I do not know how to copy words at a time in the presence of
alignment restrictions. Can you give me some examples or hints?

> I don't see any measurement methodologies or test results here.
> Any performance measurements where the difference between two
> ways of doing something are less than 1% or less than 10 times
> the granularity of the clock being used to measure the time are
> likely crap. And multitasking screws things up even worse.
> The best performance demonstrations are those where you can
> easily measure the difference in time with a wrist watch, *IF*
> throwing the test in a loop and repeating it a million times
> doesn't screw up what you are trying to measure (e.g. maybe
> you don't want the test run completely from cache).

Now memcpy2() is as fast as the memcpy() in the library.

> Also, are you sure you are using the memcpy() from the libiberty
> directory? (As opposed to one in libc?) On FreeBSD the two
> are very different.

When I say "memcpy()", I mean the memcpy() in libc.
Are they different? Do you mean the memcpy() in libiberty is not the real code
that is compiled into libc? But is libc built when I "make" a
GCC package? If it is, where is its source code, whether it is C
or assembly?

Chen L.

Nov 14 '05 #6

Chris Torek

[someone noted possible alignment problems in some code variants]

In article <news:cf**********@mail.cn99.com>
Liang Chen <ch*******@citiz.net> wrote:

> I run and test my programmes on a PC. The CPU is Intel Pentium. ...

Pentium-based systems never[%] enforce alignment constraints.
Try a MIPS, ARM, or SPARC-based system, for instance (if you
can get hold of one).

> When I say "memcpy()", I mean the memcpy() in libc.
> They are different? You mean the memcpy() in libiberty is not the real code
> to be compiled to add into libc? But, does the libc be made when I MAKE a
> GCC package? If it does, where is it's source codes, whatever they are C
> codes or ASM codes?

None of these are really questions about using Standard C, but rather
about how to build GNU programs with nonstandard extensions.

As it happens, the answer (based on your earlier mention of underlying
OS -- which I snipped) is that they are indeed different, the source
code is not in libiberty at all, and the source code *is* available
somewhere (because of the nature of Linux) but it is difficult to
say precisely where (again because of the nature of Linux :-) ).
The Linux C library is built when you build the Linux C library --
which, unless you-the-reader rebuild Linux, is not something you-
the-reader would normally do, even when installing various GNU
software.

As it also happens, if you use the GNU C compiler on a Pentium
system and turn optimization up high, calls to memcpy() often never
even call anything at all -- they turn into inline assembly code
instead. The compiler is allowed to do this because the name
"memcpy" is reserved, so the compiler can be sure precisely what
any call to memcpy() is supposed to do. This in turn means that
if you attempt to replace memcpy(), but do it by supplying a
different memcpy() function, your new function may never get called
at all!

The behavior described in the last paragraph above -- in which an
attempt to replace a C library function with some other substitute
fails -- is allowed by the C standard. If you want your programs
to run on any system that supports Standard C, do not attempt to
override library functions: if it works at all, it may not work
correctly.
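
For experiments like the ones earlier in this thread, one way to stay clear of this problem is to give the routine a name of your own rather than reusing memcpy. The sketch below assumes that is all you need; the name my_copy_bytes is hypothetical. GCC also documents the options -fno-builtin and -fno-builtin-memcpy, which stop it from treating memcpy() calls specially, but relying on them is compiler-specific.

```c
#include <stddef.h>

/*
 * A byte-wise copy under a name that does not clash with any standard
 * library function, so a call to it cannot be mistaken for a call to
 * memcpy(). As with memcpy(), the blocks must not overlap.
 */
void *my_copy_bytes(void *dest, const void *src, size_t len)
{
    unsigned char *d = dest;
    const unsigned char *s = src;

    while (len-- > 0)
        *d++ = *s++;

    return dest;
}
```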
--
In-Real-Life: Chris Torek, Wind River Systems
Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W) +1 801 277 2603
email: forget about it http://web.torek.net/torek/index.html
Reading email is like searching for food in the garbage, thanks to spammers.

Nov 14 '05 #8

Chris Torek

In article <news:cf********@news4.newsguy.com> I wrote:

> Pentium-based systems never[%] enforce alignment constraints.

Gah, I forgot the footnote:

[%] What, never?
No, never!
What, never?
Well, hardly ever!

(The SSE instructions require alignment.)
--
In-Real-Life: Chris Torek, Wind River Systems
Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W) +1 801 277 2603
email: forget about it http://web.torek.net/torek/index.html
Reading email is like searching for food in the garbage, thanks to spammers.

Nov 14 '05 #9

Ben Pfaff

Chris Torek <no****@torek.net> writes:

> In article <news:cf********@news4.newsguy.com> I wrote:
>> Pentium-based systems never[%] enforce alignment constraints.
>
> Gah, I forgot the footnote:
>
> [%] What, never?
> No, never!
> What, never?
> Well, hardly ever!
>
> (The SSE instructions require alignment.)

Also, if you set bit 18, called "AC" or "Alignment Check", in
EFLAGS, then most unaligned accesses in user mode will fault.
--
"I should killfile you where you stand, worthless human." --Kaz

Nov 14 '05 #10

Dan Pop

In <87************@benpfaff.org> Ben Pfaff <bl*@cs.stanford.edu> writes:

>Chris Torek <no****@torek.net> writes:
>> In article <news:cf********@news4.newsguy.com> I wrote:
>>> Pentium-based systems never[%] enforce alignment constraints.
>>
>> Gah, I forgot the footnote:
>>
>> [%] What, never?
>> No, never!
>> What, never?
>> Well, hardly ever!
>>
>> (The SSE instructions require alignment.)
>
> Also, if you set bit 18, called "AC" or "Alignment Check", in
> EFLAGS, then most unaligned accesses in user mode will fault.

Unfortunately, no Pentium-based OS in wide use does it.

Dan
--
Dan Pop
DESY Zeuthen, RZ group
Email: Da*****@ifh.de

Nov 14 '05 #11

Gordon Burditt

>> I can see no reason why the above line won't smegfault on a
>> majority of calls to memcpy2() on a machine which enforces alignment
>> restrictions. Nasty example:
>> char buf[10240];
>>
>> ... something to put some data in buf ...
>> memcpy2(buf+3, buf, strlen(buf)+1);
>
> Could I consider that memcpy2() is un-portable and hardware-sensitive?

Yes.

>> Another possibility is that the machine doesn't enforce alignment
>> restrictions but comes up with the wrong answer. That is, assuming
>> 4 byte ints,
>> *(int *) 0xdeadbee3
>> fetches or stores the integer at the addresses 0xdeadbee0 thru 0xdeadbee3,
>> *NOT* 0xdeadbee3 thru 0xdeadbee6.
>
> I run and test my programmes on a PC. The CPU is Intel Pentium.

This is not a CPU that enforces alignment restrictions, in general.
There's a bit you can turn on to try enforcing restrictions, but I
don't think any major OS running on an i386 platform lets you use it.

> The OS is
> Linux 2.4.18-12L, not xBSD. I use GDB to debug memcpy2(), and I find that
> the situation is not the same as you said above. *(int*)0xdeadbee3 does
> fetch and store the integer, for example, at the addresses 0xdeadbee3 thru
> 0xdeadbee6 rather than 0xdeadbee0 thru 0xdeadbee3.

It could behave that way on some CPU. I didn't say it would
on the one you happen to use.

>> I don't consider "segmentation fault - core dumped" to be more
>> efficient than anything which doesn't core dump. There are ways
>> to copy words at a time in the presence of alignment restrictions.
>> This isn't it.
>
> I checked my programme thoroughly last night. Now, memcpy2() looks like
> this,
>
> void* memcpy2 (register void* dest, register void* src, register size_t len)
> {
>     void* pdest = dest;
>
>     for(; len%sizeof(int)!=0; --len, dest=(char*)dest+1, src=(char*)src+1)
>         *(char*)dest = *(char*)src;
>     for(len/=sizeof(int); len>0; --len, dest=(int*)dest+1, src=(int*)src+1)
>         *(int*)dest = *(int*)src;
>
>     return pdest;
> }

I don't see any significant change: casting a void * pointer to
int * and then dereferencing it can cause a segfault.

> It is a ANSI C program this time.

One which invokes the wrath of undefined behavior under many
combinations of parameters which are perfectly acceptable to pass
to memcpy().

> But my machine doesn't enforce alignment
> restrictions. I do not know how to copy words at a time in the presence of
> alignment restrictions. Can you give me some examples or hints?

Example: if you dereference an int pointer containing an address that
is not a multiple of 4, you get a smegmentation fault.

Question: how do you *PORTABLY* figure out whether a pointer
is aligned to a multiple of 4?

If dest and src are 3 apart, then this:

    *(int*)dest = *(int*)src;

is *GUARANTEED* to cause a smegmentation fault on such a machine, because
one of them MUST be odd. If you increment both of them by the
same amount first, you still have the same problem.

>> I don't see any measurement methodologies or test results here.
>> Any performance measurements where the difference between two
>> ways of doing something are less than 1% or less than 10 times
>> the granularity of the clock being used to measure the time are
>> likely crap. And multitasking screws things up even worse.
>> The best performance demonstrations are those where you can
>> easily measure the difference in time with a wrist watch, *IF*
>> throwing the test in a loop and repeating it a million times
>> doesn't screw up what you are trying to measure (e.g. maybe
>> you don't want the test run completely from cache).
>
> Now memcpy2() is as fast as memcpy() in library.

If you don't tell me how you measured it, or at least establish
credentials in knowing how to do benchmarks, I'm not going to believe
any statement that X is faster than Y on platform Z. This could
just as well mean "X is faster than Y on platform Z by
0.0000000000000001%", which is a meaningless difference.

>> Also, are you sure you are using the memcpy() from the libiberty
>> directory? (As opposed to one in libc?) On FreeBSD the two
>> are very different.
>
> When I say "memcpy()", I mean the memcpy() in libc.
> They are different? You mean the memcpy() in libiberty is not the real code
> to be compiled to add into libc? But, does the libc be made when I MAKE a
> GCC package? If it does, where is it's source codes, whatever they are C
> codes or ASM codes?

When I make a GCC package, I do not make libc, as GCC does not include
a C library at all (on platforms such as FreeBSD, Ultrix, Tru64 aka OSF,
etc.). I believe that even on Linux the C library is not considered
to be part of gcc.

On FreeBSD, the memcpy() and bcopy() code under 'libiberty' is very
different from the code under /usr/src/lib/libc.

Gordon L. Burditt

Nov 14 '05 #12

L. Chen

> Question: how do you *PORTABLY* figure out whether a pointer
> is aligned to a multiple of 4?

How about this one?

void* memcpy3 (register void* dest, register void* src, register size_t len)
{
    void* pdest = dest;

    if( ((unsigned int)dest)%4==((unsigned int)src)%4 )
    {
        for(; dest%4!=0; --len, dest=(char*)dest+1, src=(char*)src+1)
            *(char*)dest = *(char*)src;
        for(; len>0; len-=sizeof(int), dest=(int*)dest+1, src=(int*)src+1)
            *(int*)dest = *(int*)src;
        for(; len>0; --len, dest=(char*)dest+1, src=(char*)src+1)
            *(char*)dest = *(char*)src;
    }
    else
    {
        for(; len>0; --len, dest=(char*)dest+1, src=(char*)src+1)
            *(char*)dest = *(char*)src;
    }

    return pdest;
}

> When I make a GCC package, I do not make libc, as GCC does not include
> a C library at all (on platforms such as FreeBSD, Ultrix, Tru64 aka OSF,
> etc.). I believe that even on Linux the C library is not considered
> to be part of gcc.
>
> On FreeBSD, the memcpy() and bcopy() code under 'libiberty' is very
> different from the code under /usr/src/lib/libc.

Oh, I see. (I always thought that building GCC would automatically
recompile libc.)

Nov 14 '05 #13

CBFalconer

"L. Chen" wrote:

>> Question: how do you *PORTABLY* figure out whether a pointer
>> is aligned to a multiple of 4?
>
> How about this one?
>
> void* memcpy3 (register void* dest, register void* src, register size_t len)
> {
>     void* pdest = dest;
>
>     if( ((unsigned int)dest)%4==((unsigned int)src)%4 )

Nope. Casting a pointer to any form of integer is not guaranteed
to be reversible, and the results are implementation defined.
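
For comparison, the usual non-portable check can be sketched as below. It leans on uintptr_t from C99's <stdint.h>, which is an optional type, and it assumes that the converted integer's low-order bits reflect the address (a flat address space), which the standard does not promise. The name looks_aligned is made up for this example.

```c
#include <stddef.h>
#include <stdint.h>   /* C99; uintptr_t is optional and may be absent */

/*
 * Nonzero if p appears to be aligned to `align` bytes (align != 0).
 * Assumption: the pointer-to-integer conversion yields the plain
 * machine address, as on common flat-memory systems. The result of
 * the conversion is implementation-defined.
 */
int looks_aligned(const void *p, size_t align)
{
    return (uintptr_t)p % align == 0;
}
```

On such a system, exactly one of any four consecutive byte addresses passes looks_aligned(p, 4); on a segmented or word-addressed machine the answer may be meaningless.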

--
Chuck F (cb********@yahoo.com) (cb********@worldnet.att.net)
Available for consulting/temporary embedded and systems.
<http://cbfalconer.home.att.net> USE worldnet address!

Nov 14 '05 #15

L. Chen

> Pentium-based systems never[%] enforce alignment constraints.
> Try a MIPS, ARM, or SPARC-based system, for instance (if you
> can get hold of one).
>
> As it happens, the answer (based on your earlier mention of underlying
> OS -- which I snipped) is that they are indeed different, the source
> code is not in libiberty at all, and the source code *is* available
> somewhere (because of the nature of Linux) but it is difficult to
> say precisely where (again because of the nature of Linux :-) ).
> The Linux C library is built when you build the Linux C library --
> which, unless you-the-reader rebuild Linux, is not something you-
> the-reader would normally do, even when installing various GNU
> software.

I am a beginner with Linux, and sometimes I lack basic knowledge about it :(

> As it also happens, if you use the GNU C compiler on a Pentium
> system and turn optimization up high, calls to memcpy() often never
> even call anything at all -- they turn into inline assembly code
> instead. The compiler is allowed to do this because the name
> "memcpy" is reserved, so the compiler can be sure precisely what
> any call to memcpy() is supposed to do. This in turn means that
> if you attempt to replace memcpy(), but do it by supplying a
> different memcpy() function, your new function may never get called
> at all!

I see that gcc has become more and more clever.

> The behavior described in the last paragraph above -- in which an
> attempt to replace a C library function with some other substitute
> fails -- is allowed by the C standard. If you want your programs
> to run on any system that supports Standard C, do not attempt to
> override library functions: if it works at all, it may not work
> correctly.

I am not going to override them. I was just surprised by the source code
in libiberty. Of course, I now know that it is not the source code of the
memcpy() in libc. :P

---
L. Chen

Nov 14 '05 #16

L. Chen

> Nope. Casting a pointer to any form of integer is not guaranteed
> to be reversible, and the results are implementation defined.

Sometimes I feel it is so difficult to make C programs portable. :(
Here is the modified version:

void* memcpy3 (register void* dest, register void* src, register size_t len)
{
    void* pdest = dest;

    if( (dest-src)%4==0 )
    {
        for(; dest%4!=0; --len, dest=(char*)dest+1, src=(char*)src+1)
            *(char*)dest = *(char*)src;
        for(; len>0; len-=sizeof(int), dest=(int*)dest+1, src=(int*)src+1)
            *(int*)dest = *(int*)src;
        for(; len>0; --len, dest=(char*)dest+1, src=(char*)src+1)
            *(char*)dest = *(char*)src;
    }
    else
    {
        for(; len>0; --len, dest=(char*)dest+1, src=(char*)src+1)
            *(char*)dest = *(char*)src;
    }

    return pdest;
}

(dest-src) yields a ptrdiff_t value, which I then treat as an integer. Is
that OK?
---
L. Chen

Nov 14 '05 #17

Keith Thompson

"L. Chen" <ch*******@citiz.net> writes:

>> Nope. Casting a pointer to any form of integer is not guaranteed
>> to be reversible, and the results are implementation defined.
>
> Sometimes, I feel it is so difficult to make the C programmes portable. :(
> There is the modified one,
>
> void* memcpy3 (register void* dest, register void* src, register size_t len)
> {
>     void* pdest = dest;
>
>     if( (dest-src)%4==0 )
>     {
>         for(; dest%4!=0; --len, dest=(char*)dest+1, src=(char*)src+1)
>             *(char*)dest = *(char*)src;
>         for(; len>0; len-=sizeof(int), dest=(int*)dest+1, src=(int*)src+1)
>             *(int*)dest = *(int*)src;
>         for(; len>0; --len, dest=(char*)dest+1, src=(char*)src+1)
>             *(char*)dest = *(char*)src;
>     }
>     else
>     {
>         for(; len>0; --len, dest=(char*)dest+1, src=(char*)src+1)
>             *(char*)dest = *(char*)src;
>     }
>
>     return pdest;
> }
>
> (dest-src) results a ptrdiff_t variable. Then, I treat it as an integer. Is
> that OK?

You can certainly treat (dest-src) as an integer, but it doesn't
buy you anything.

For one thing, I think you're assuming that sizeof(int)==4.

You're still computing dest%4; the "%" operator doesn't apply to
pointers.

To implement something like memcpy() efficiently, you pretty much have
to make non-portable assumptions. There's no portable way to detect
the alignment of a pointer, but there's almost always a reasonably
efficient non-portable way to do it (such as examining the low-order
bits of the pointer's representation).

Assume the CPU traps on unaligned memory accesses.

If the source and destination are both word-aligned, or are both
misaligned by the same amount, you can probably save some time by
copying a word at a time.

If the source and destination address differ in alignment by 1 byte,
you can't copy data from one to the other using chunks larger than 1
byte; a 4-byte aligned chunk of the source corresponds to a misaligned
4-byte chunk of the target. If they differ in alignment by 2 bytes,
you can probably copy 2-byte chunks.
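
The "same misalignment" case above can be sketched as follows. This is a non-portable illustration under stated assumptions, not a recommended memcpy() replacement: it assumes uintptr_t exists and exposes the address's low-order bits, that the blocks do not overlap, and that the compiler tolerates accessing byte buffers through unsigned int lvalues (which the standard's aliasing rules do not guarantee). The name copy_words is made up.

```c
#include <stddef.h>
#include <stdint.h>   /* uintptr_t: optional in C99, assumed present */

/*
 * Copy len bytes, a word at a time where the alignment allows it.
 * Falls back to a plain byte loop when src and dest are misaligned
 * by different amounts, since no amount of stepping aligns both.
 */
void *copy_words(void *dest, const void *src, size_t len)
{
    unsigned char *d = dest;
    const unsigned char *s = src;
    const size_t W = sizeof(unsigned int);

    if ((uintptr_t)d % W == (uintptr_t)s % W) {
        /* Head: byte copies until both pointers reach a word boundary. */
        while (len > 0 && (uintptr_t)d % W != 0) {
            *d++ = *s++;
            len--;
        }
        /* Body: both pointers are word-aligned here. */
        while (len >= W) {
            *(unsigned int *)d = *(const unsigned int *)s;
            d += W;
            s += W;
            len -= W;
        }
    }

    /* Tail, or the whole block when the alignments differ. */
    while (len > 0) {
        *d++ = *s++;
        len--;
    }
    return dest;
}
```

The 2-byte case mentioned above could be added in the same way with a narrower type, at the cost of one more branch.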

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <*> <http://users.sdsc.edu/~kst>
We must do something. This is something. Therefore, we must do this.

Nov 14 '05 #18

Tim Rentsch

"L. Chen" <ch*******@citiz.net> writes:

>> Question: how do you *PORTABLY* figure out whether a pointer
>> is aligned to a multiple of 4?
>
> How about this one?
>
> void* memcpy3 (register void* dest, register void* src, register size_t len)
> {
>     void* pdest = dest;
>
>     if( ((unsigned int)dest)%4==((unsigned int)src)%4 )
>     {
>         for(; dest%4!=0; --len, dest=(char*)dest+1, src=(char*)src+1)
>             *(char*)dest = *(char*)src;
>         for(; len>0; len-=sizeof(int), dest=(int*)dest+1, src=(int*)src+1)
>             *(int*)dest = *(int*)src;
>         for(; len>0; --len, dest=(char*)dest+1, src=(char*)src+1)
>             *(char*)dest = *(char*)src;
>     }
>     else
>     {
>         for(; len>0; --len, dest=(char*)dest+1, src=(char*)src+1)
>             *(char*)dest = *(char*)src;
>     }
>
>     return pdest;
> }

First let's see if we can fix the definite bugs (not counting any
possible problems with casting a pointer to an unsigned int, I count
at least three), and clean the code up a bit:

void *
memcpy3( register void *dest, register void *src, register size_t len )
{
    char *d = dest, *s = src;
    size_t n = len;
    const size_t K = sizeof(int);

    while( n > 0 && (unsigned int)d % K != 0 ) n--, *d++ = *s++;

    if( (unsigned int)s % K == 0 ){
        while( n >= K ) n -= K, *(int*)d = *(int*)s, d += K, s += K;
    }

    while( n > 0 ) n--, *d++ = *s++;

    return dest;
}

Now let's return to the question at the start of the posting.

Even though the method of testing pointer alignment in the code above
isn't guaranteed to work, the fact is that it will work on many
architectures (probably most architectures, but I expect that depends
on how the counting is done). Since this is so, why not provide a
standard means of checking for it? There could be a C preprocessor
symbol, eg, SINGLE_LINEAR_ADDRESS_SPACE, that could be used to mean
that pointers look like integers. Something along these lines could
be written into the standard to provide a conformant means of writing
code to do this kind of pointer manipulation. Make sense?
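
As a rough sketch of how code might use such a symbol: SINGLE_LINEAR_ADDRESS_SPACE is hypothetical (no C standard defines it), so the example below treats "not defined" as 0. The function name known_aligned is also invented for illustration. The idea is that the pointer-as-integer test is compiled only where the symbol says it is meaningful, and everywhere else the function answers "cannot tell" so the caller takes the byte-by-byte path.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical symbol: no C standard defines it. Treat "not defined" as 0. */
#ifndef SINGLE_LINEAR_ADDRESS_SPACE
#define SINGLE_LINEAR_ADDRESS_SPACE 0
#endif

#if SINGLE_LINEAR_ADDRESS_SPACE
#include <stdint.h>   /* uintptr_t is only needed on the specialized path */
#endif

/* Returns 1 only when both pointers are known to be aligned to a
 * multiple of k bytes. Returns 0 when they are not, or when the
 * platform gives no sanctioned way to find out. */
int known_aligned(const void *dest, const void *src, size_t k)
{
    assert(k != 0);
#if SINGLE_LINEAR_ADDRESS_SPACE
    return (uintptr_t)dest % k == 0 && (uintptr_t)src % k == 0;
#else
    (void)dest;
    (void)src;
    (void)k;
    return 0;   /* mundane answer: assume nothing about pointer layout */
#endif
}
```

A copy routine could call known_aligned() once and choose between a word loop and a plain byte loop, so the same source file still compiles where the symbol is 0.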

Nov 14 '05 #19

Keith Thompson

Tim Rentsch <tx*@alumnus.caltech.edu> writes:
[...]

First let's see if we can fix the definite bugs (not counting any
possible problems with casting a pointer to an unsigned int, I count
at least three), and clean the code up a bit:
[snip]
{

[snip]
if( (unsigned int)s % K == 0 ){

Now let's return to the question at the start of the posting.

Even though the method of testing pointer alignment in the code above
isn't guaranteed to work, the fact is that it will work on many
architectures (probably most architectures, but I expect that depends
on how the counting is done). Since this is so, why not provide a
standard means of checking for it? There could be a C preprocessor
symbol, eg, SINGLE_LINEAR_ADDRESS_SPACE, that could be used to mean
that pointers look like integers. Something along these lines could
be written into the standard to provide a conformant means of writing
code to do this kind of pointer manipulation. Make sense?

I suspect that would encourage programmers to write code that only
works if SINGLE_LINEAR_ADDRESS_SPACE is true. (Too many programmers
do that already, of course.)

Of course you can implement such a preprocessor symbol yourself, and
configure it for each system. It's a little extra work, but frankly
it probably should be.
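
A minimal sketch of what "implement it yourself" could look like, with everything in it labeled as an assumption: MYPROJ_LINEAR_POINTERS is a made-up project-local switch (set by hand per system, for example with -DMYPROJ_LINEAR_POINTERS=1), and pointers_look_linear() and check_pointer_config() are invented names. The run-time probe only checks one small buffer, so passing it does not prove the address space is linear; it can merely catch a switch that was turned on for a system where it is clearly wrong. It also assumes the implementation provides uintptr_t.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical project-local switch, configured by hand for each system. */
#ifndef MYPROJ_LINEAR_POINTERS
#define MYPROJ_LINEAR_POINTERS 0
#endif

/* Returns 1 if, across a small sample buffer, advancing a char pointer
 * by i bytes changes its integer value by exactly i. Returns 0 otherwise. */
int pointers_look_linear(void)
{
    static char probe[64];
    uintptr_t base = (uintptr_t)probe;
    size_t i;

    for (i = 0; i < sizeof probe; i++)
        if ((uintptr_t)(probe + i) - base != (uintptr_t)i)
            return 0;
    return 1;
}

/* Call once at startup: complains if the switch is on but the probe
 * disagrees. With NDEBUG defined the check compiles away. */
void check_pointer_config(void)
{
    assert(!MYPROJ_LINEAR_POINTERS || pointers_look_linear());
}
```

On a system like the one described below, where a char* carries its byte offset in the high bits of the word, a probe of this kind would be expected to fail.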

Currently, if I write portable code that will work even for a
non-linear address space, I can recompile and run it on a
"non-linear" system and it should work.

An example of a system where SINGLE_LINEAR_ADDRESS_SPACE would be
undefined is a Cray vector system, where a machine address points to a
64-bit word. The C compiler has CHAR_BIT==8 to allow for code
portability, but a char* pointer has a 3-bit offset in the top of the
word. Well written portable code works just fine. Code that makes
assumptions about how pointers are represented doesn't. (The systems
run a Unix-based OS, and most Unix-based software compiles and runs
correctly, so the lack of ability to do that kind of low-level pointer
manipulation hasn't been much of a problem.)

Of course something like memcpy() can be made much more efficient if
it can detect pointer alignment and copy word-by-word whenever
possible. That's why memcpy() is in the standard library, where it
can be implemented with non-portable code.

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <*> <http://users.sdsc.edu/~kst>
We must do something. This is something. Therefore, we must do this.

Nov 14 '05 #20

CBFalconer

"L. Chen" wrote:

Nope. Casting a pointer to any form of integer is not guaranteed
to be reversible, and the results are implementation defined.

Sometimes, I feel it is so difficult to make the C programmes
portable. :( There is the modified one,

void* memcpy3 (register void* dest, register void* src, register size_t len)
{
void* pdest = dest;

if( (dest-src)%4==0 )

Not if you stick to operations defined by the standard. The above
is not. That is one of the fundamental reasons that memcpy()
exists as a standard function. You can't duplicate it without
using forbidden (for portability purposes) system knowledge.
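
For contrast, here is a sketch of what can be written using only operations the standard defines. The name memcpy_portable is made up for illustration. It copies through unsigned char pointers, never converts a pointer to an integer, and never subtracts pointers into different objects, so it should behave the same on any conforming implementation; the price is that it moves one byte per iteration.

```c
#include <assert.h>
#include <stddef.h>

/* Byte-by-byte copy using only standard-defined operations.
 * Like memcpy(), the source and destination must not overlap. */
void *memcpy_portable(void *dest, const void *src, size_t len)
{
    unsigned char *d = dest;
    const unsigned char *s = src;

    /* A zero-length copy is tolerated here even with null pointers. */
    assert(len == 0 || (dest != NULL && src != NULL));

    while (len-- > 0)
        *d++ = *s++;

    return dest;
}
```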

--
Chuck F (cb********@yahoo.com) (cb********@worldnet.att.net)
Available for consulting/temporary embedded and systems.
<http://cbfalconer.home.att.net> USE worldnet address!

Nov 14 '05 #21

Tim Rentsch

Keith Thompson <ks***@mib.org> writes:

Tim Rentsch <tx*@alumnus.caltech.edu> writes:
[...]
First let's see if we can fix the definite bugs (not counting any
possible problems with casting a pointer to an unsigned int, I count
at least three), and clean the code up a bit:
[snip]
{

[snip]
if( (unsigned int)s % K == 0 ){

Now let's return to the question at the start of the posting.

Even though the method of testing pointer alignment in the code above
isn't guaranteed to work, the fact is that it will work on many
architectures (probably most architectures, but I expect that depends
on how the counting is done). Since this is so, why not provide a
standard means of checking for it? There could be a C preprocessor
symbol, eg, SINGLE_LINEAR_ADDRESS_SPACE, that could be used to mean
that pointers look like integers. Something along these lines could
be written into the standard to provide a conformant means of writing
code to do this kind of pointer manipulation. Make sense?

I suspect that would encourage programmers to write code that only
works if SINGLE_LINEAR_ADDRESS_SPACE is true. (Too many programmers
do that already, of course.)

Notice the argumentative sleight-of-hand. An opinion is presented
without any reasoning or supporting evidence, then subsequent
discussion implicitly gives the opinion the status of fact.

On the contrary - if something like SINGLE_LINEAR_ADDRESS_SPACE were
written into the standard, then the sort of people who know about such
things would likely provide a specialized implementation for those
systems that had it defined as 1, and a more mundane implementation
(or just an outright error) for those systems that had it defined
as 0. And of course people who didn't know about it wouldn't change
their behavior. In either case the situation is no worse off than
before, unless of course someone thinks that people knowing about
the flag will be encouraged to write code that depends on it being
true and NOT checking for it. That seems a little silly.

Moreover, the presence of such a standard-defined flag would mean
that a system could give a diagnostic for code that seems to make
such assumptions on a system where SINGLE_LINEAR_ADDRESS_SPACE is
defined to be 0. For that matter, compilers could give a diagnostic
for code on *any* system that has code using SLAS-specific behavior
and not wrapped in a '#if' or 'if' testing SINGLE_LINEAR_ADDRESS_SPACE.
I'm not suggesting that such tests be made mandatory, only that they
could be put in place if some compiler writers chose to - and surely
that would *raise* consciousness about what assumptions are reasonable
to make when trying to write portable C code.

Of course you can implement such a preprocessor symbol yourself, and
configure it for each system. It's a little extra work, but frankly
it probably should be.

Another unsupported opinion, and a statement that's just plain false.
It's not a little extra work, it's quite a bit of work, and
furthermore one that most people simply don't have the resources to
support. I for one do not have access to many of the different,
unusual machine architectures to know how they should be labelled;
I doubt most people reading this newsgroup do either. (For the
record that last statement was my opinion - I welcome people who
do have such access to chime in with some evidence.)

Currently, if I write portable code that will work even for a
non-linear address space, I can recompile and run it on a
"non-linear" system and it should work.

The presence of a standard-imposed definition of SLAS wouldn't change
that. It would give you the option of writing specialized code that
ran only on such systems, presumably with some performance gain, and
using the non-SLAS code as fallback on other systems. And your code
would still be portable to all the machines it was before.

An example of a system where SINGLE_LINEAR_ADDRESS_SPACE would be
undefined is a Cray vector system, where a machine address points to a
64-bit word. The C compiler has CHAR_BIT==8 to allow for code
portability, but a char* pointer has a 3-bit offset in the top of the
word. Well written portable code works just fine. Code that makes
assumptions about how pointers are represented doesn't. (The systems
run a Unix-based OS, and most Unix-based software compiles and runs
correctly, so the lack of ability to do that kind of low-level pointer
manipulation hasn't been much of a problem.)

What bugs me here is the phrase "code that makes assumptions....".
The whole point of an SLAS-like proposal is to provide a means whereby
code doesn't "make assumptions" but relies on some standard-defined
behavior to enable certain kinds of code in some situations.

Incidentally, the description of the Cray character addresses is
perhaps interesting, but not especially relevant after the first
sentence. If SLAS were "off" on the Cray, then code that depended
on SLAS being "on" wouldn't be present.

Of course something like memcpy() can be made much more efficient if
it can detect pointer alignment and copy word-by-word whenever
possible. That's why memcpy() is in the standard library, where it
can be implemented with non-portable code.

There are basically two paths we can think about going down here.
One, we can relegate all system-specific behavior to library
functions, and make the library ever larger as more and more functions
are argued about and agreed to in the standards committee. Or, two,
we can try to provide some definitions in the language environment
that allow some system-specific -- but still universally conformant --
code to be written without having to wait for the right library
function to appear. Whether it is a SINGLE_LINEAR_ADDRESS_SPACE
symbol or some other similar mechanism, the second path seems better
than the first path.

Perhaps this should have been posted on comp.std.c rather than
comp.lang.c (or perhaps both). It seemed better to continue
the thread in the group it started, and comp.lang.c seems
at least reasonably appropriate. If anyone has some comments
on that, please feel free to send them to me directly in email.
thanks.

Nov 14 '05 #22

Keith Thompson

Tim Rentsch <tx*@alumnus.caltech.edu> writes:

Keith Thompson <ks***@mib.org> writes:
Tim Rentsch <tx*@alumnus.caltech.edu> writes:
[...]
I suspect that would encourage programmers to write code that only
works if SINGLE_LINEAR_ADDRESS_SPACE is true. (Too many programmers
do that already, of course.)

Notice the argumentative sleight-of-hand. An opinion is presented
without any reasoning or supporting evidence, then subsequent
discussion implicitly gives the opinion the status of fact.

I presented a suspicion, clearly labeled as such. I don't believe
everything I wrote after that depended on the truth of the suspicion.

On the contrary - if something like SINGLE_LINEAR_ADDRESS_SPACE were
written into the standard, then the sort of people who know about such
things would likely provide a specialized implementation for those
systems that had it defined as 1, and a more mundane implementation
(or just an outright error) for those systems that had it defined
as 0. And of course people who didn't know about it wouldn't change
their behavior. In either case the situation is no worse off than
before, unless of course someone thinks that people knowing about
the flag will be encouraged to write code that depends on it being
true and NOT checking for it. That seems a little silly.

Why not just write the "mundane implementation" and be done with it?
(Presumably the answer is improved performance for the SLAS==1 case;
see below for my response to that.)

My concern is with the "outright error" case. With the current
situation, knowledgeable programmers write code that doesn't assume a
single linear address space. It turns out that most of them do a
pretty good job within that restriction. One result of this is that I
can use Perl scripts on a Cray SV1 (Perl's implementation includes
several hundred thousand lines of C); I can also use my favorite text
editor and other tools (more hundreds of thousands of lines of C).

If the programmers who wrote all that code had taken advantage of a
SINGLE_LINEAR_ADDRESS_SPACE macro, one of two things would happen.
Either they'd write two distinct versions of some of their code (and
the SLAS==0 version might never be tested if the authors didn't have
an exotic system at their disposal), or they'd only bother to write
the SLAS==1 version (and the code wouldn't compile on a Cray vector
system in the first place).

Moreover, the presence of such a standard-defined flag would mean
that a system could give a diagnostic for code that seems to make
such assumptions on a system where SINGLE_LINEAR_ADDRESS_SPACE is
defined to be 0. For that matter, compilers could give a diagnostic
for code on *any* system that has code using SLAS-specific behavior
and not wrapped in a '#if' or 'if' testing SINGLE_LINEAR_ADDRESS_SPACE.
I'm not suggesting that such tests be made mandatory, only that they
could be put in place if some compiler writers chose to - and surely
that would *raise* consciousness about what assumptions are reasonable
to make when trying to write portable C code.

Compilers are already free to give a diagnostic for code that assumes
a single linear address space. In effect, the current situation is as
if SINGLE_LINEAR_ADDRESS_SPACE were 0 for all systems.

Of course you can implement such a preprocessor symbol yourself, and
configure it for each system. It's a little extra work, but frankly
it probably should be.

Another unsupported opinion, and a statement that's just plain false.
It's not a little extra work, it's quite a bit of work, and
furthermore one that most people simply don't have the resources to
support. I for one do not have access to many of the different,
unusual machine architectures to know how they should be labelled;
I doubt most people reading this newsgroup do either. (For the
record that last statement was my opinion - I welcome people who
do have such access to chime in with some evidence.)

Ok, that's not a bad point. It's a small amount of work for each
system you want to support. If the SINGLE_LINEAR_ADDRESS_SPACE macro
were required by the language, that small amount of work would be done
approximately once for each platform, by the implementers of the
compiler for that platform, rather than once on each platform by each
programmer who cares about the issue.

[...]

Of course something like memcpy() can be made much more efficient if
it can detect pointer alignment and copy word-by-word whenever
possible. That's why memcpy() is in the standard library, where it
can be implemented with non-portable code.

There are basically two paths we can think about going down here.
One, we can relegate all system-specific behavior to library
functions, and make the library ever larger as more and more functions
are argued about and agreed to in the standards committee. Or, two,
we can try to provide some definitions in the language environment
that allow some system-specific -- but still universally conformant --
code to be written without having to wait for the right library
function to appear. Whether it is a SINGLE_LINEAR_ADDRESS_SPACE
symbol or some other similar mechanism, the second path seems better
than the first path.

The point of SINGLE_LINEAR_ADDRESS_SPACE is to enable certain
techniques to be used on systems that support them. I'm just not
convinced that it's worth it. In my (admittedly limited) experience,
I haven't found the need to make more assumptions about pointers than
what the standard guarantees. The example given in this thread of a
case where it would help was a re-implementation of memcpy() -- but
memcpy() is in the standard library, so its implementation is free to
use any system-specific tricks that will improve its performance. (That
could include things like knowledge of alignment requirements and
block-copy CPU instructions, neither of which would be supported by
SINGLE_LINEAR_ADDRESS_SPACE.)

Even if adding SINGLE_LINEAR_ADDRESS_SPACE wouldn't hurt code
portability, it would require some effort to create a rigorous
definition of what it means, and it would add slightly to the
complexity of the language. To overcome that, you'll have to convince
a number of people (not me) that there's a significant benefit. That
probably means showing that there are significant real-world
algorithms outside the standard library that can be implemented
significantly more efficiently with the assumption of a single linear
address space, and that adding the macro makes more sense than making
an addition to the standard library.

You should also be aware that changes to the language take years to
show up in the standard, and it takes many years after that before
programmers can safely assume a new feature is available widely enough
to be used in portable code.

I'm not trying to discourage you. If you think you can demonstrate
that it would be a good addition to the language, I wish you luck.

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <*> <http://users.sdsc.edu/~kst>
We must do something. This is something. Therefore, we must do this.

Nov 14 '05 #23
