libiberty directory in gcc-3.1.1 source code package

Is the file "bcopy.c" in the "libiberty" directory the implementation of the
GNU C library function "bcopy"? If so, how can it run so fast (it copies byte
by byte rather than word by word)? When I copy the code of "libiberty/bcopy.c"
into my own code, I find that it is so slow, even if I turn on "-O3" and define
"NDEBUG". Why?
Nov 14 '05 #1
> Is the file "bcopy.c" in the "libiberty" directory the implementation of the
> GNU C library function "bcopy"?

Most likely. I'm not sure I'm looking at the same version you are,
and the interface and functionality of "bcopy()" predate GNU, I
believe, but it looks like it.

> If so, how can it run so fast?

Who claims that it does run fast, whether so or non-so? And with
what evidence?

> (copy-by-byte
> rather than copy-by-word)
> When I copy the code of "libiberty/bcopy.c" to my
> own code, I find that it is so slow even if I turn on "-O3" and define
> "NDEBUG". Why?


That implementation of bcopy() (which seems to be portable to all
platforms) still runs faster than a program that needs bcopy() but
doesn't have any implementation of it at all (and therefore won't
link).

Often you have a tradeoff: pick 1:

- portable and mediocre performance
- unportable, good performance on some platforms,
  won't work or works incorrectly on others
- unportable, good performance on some platforms,
  terrible performance on others

Gordon L. Burditt
Nov 14 '05 #2
Thank you for your reply. I am sorry that my questions were unclear, which is mainly due to my poor English :(

In "libiberty", I found the file "memcpy.c". It contains the following code fragment,

PTR
DEFUN(memcpy, (out, in, length), PTR out AND const PTR in AND size_t length)
{
  bcopy(in, out, length);
  return out;
}
It is clear that memcpy() calls bcopy(). So I opened "bcopy.c" in the same directory and found this code,

void
bcopy (src, dest, len)
  register char *src, *dest;
  int len;
{
  if (dest < src)
    while (len--)
      *dest++ = *src++;
  else
    {
      char *lasts = src + (len-1);
      char *lastd = dest + (len-1);
      while (len--)
        *(char *)lastd-- = *(char *)lasts--;
    }
}
This version of bcopy() is implemented to behave "correctly" when the memory blocks overlap. According to the C89 standard, memcpy() does not need this kind of "correct" behavior (maybe bcopy() needs it for some dependency reasons): if a programmer calls memcpy() with two overlapping memory blocks, the behavior is undefined. So I feel that this implementation of memcpy() is quite poor. The following implementation could be better,

void* memcpy1 (register void* des, register void* src, register size_t len)
{
    void* pdes = des;

    for(; len>0; --len)
        *(char*)des++ = *(char*)src++;

    return pdes;
}

And it can be more efficient when it copies a word at a time,

void* memcpy2 (register void* des, register void* src, register size_t len)
{
    void* pdes = des;

    switch(len%sizeof(int))
    {
        case 3: *(char*)des++ = *(char*)src++;
        case 2: *(char*)des++ = *(char*)src++;
        case 1: *(char*)des++ = *(char*)src++;
    }
    for(len/=sizeof(int); len>0; --len)
        *(int*)des++ = *(int*)src++;

    return pdes;
}

It could be much more efficient still if I copied several words, rather than one word, from src to des in each iteration of the "for" loop. Anyhow, I believed memcpy2() should run faster than memcpy() when processing large memory blocks. But when I tested them (copying between two 10240-byte memory blocks), I was surprised to find that memcpy() ran the fastest. This result has me completely confused. Do you know the reason? Would you be kind enough to explain it to me? Thank you!
Liang Chen
Nov 14 '05 #3
> This version of bcopy() is implemented to behave more "correctly" when =
> memory blocks are overlaped. We know that according to the C89 standard, =
> function memcpy() does not need to have this kind of "correct" =
> behavior(maybe bcopy() needs for some dependence issues), and if a =
> programmer calls memcpy() with two overlaped memory blocks, its behavior =
> is not defined. So, I feel that this implementation of memcpy() is too =
> awful. The following implementation can be better,

I believe the "definition" of bcopy() (which is not ANSI C, but some
kind of old BSD de-facto non-standard) includes non-destructive
handling of overlapping areas. This is NOT true of memcpy() in
ANSI C but is true of memmove().
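
To make the overlap point concrete, here is a minimal sketch (not taken from libiberty) that moves a string a few bytes to the right inside a single buffer. The helper name shift_right is hypothetical. It relies on memmove(), which ANSI C defines for overlapping areas; making the same call with memcpy() would be undefined behavior.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: move the NUL-terminated string in buf
   'shift' bytes to the right within the same buffer.  Source and
   destination overlap, so memmove() is required; memcpy() would be
   undefined behavior here.  The caller must guarantee that buf has
   room for strlen(buf) + 1 + shift bytes. */
static char *shift_right(char *buf, size_t shift)
{
    memmove(buf + shift, buf, strlen(buf) + 1);
    return buf + shift;
}
```

With char buf[32] = "abcdefgh", the call shift_right(buf, 3) returns a pointer to an intact copy of "abcdefgh" starting at buf + 3, and the first three bytes of buf are left as they were.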
> void* memcpy1 (register void* des, register void* src, register size_t =
> len)
> {
>     void* pdes =3D des;
>
>     for(; len>0; --len)
>         *(char*)des++ =3D *(char*)src++;
>
>     return pdes;
> }
>
> And it can be more efficient when copy a word directly,

Warning: source code below appears to have been MIMEd to death.

> void* memcpy2 (register void* des, register void* src, register size_t =
> len)
> {
>     void* pdes =3D des;
>
>     switch(len%sizeof(int))
>     {
>         case 3: *(char*)des++ =3D *(char*)src++;
>         case 2: *(char*)des++ =3D *(char*)src++;
>         case 1: *(char*)des++ =3D *(char*)src++;
>     }
>     for(len/=3Dsizeof(int); len>0; --len)
>         *(int*)des++ =3D *(int*)src++;

I can see no reason why the above line won't smegfault on a
majority of calls to memcpy2() on a machine which enforces alignment
restrictions. Nasty example:

    char buf[10240];

    ... something to put some data in buf ...
    memcpy2(buf+3, buf, strlen(buf)+1);

Another possibility is that the machine doesn't enforce alignment
restrictions but comes up with the wrong answer. That is, assuming
4 byte ints,

    *(int *) 0xdeadbee3

fetches or stores the integer at the addresses 0xdeadbee0 thru 0xdeadbee3,
*NOT* 0xdeadbee3 thru 0xdeadbee6.

>     return pdes;
> }

> It can be much more efficient if I copy more words rather than one word =
> from des to src in "for" loop.

I don't consider "segmentation fault - core dumped" to be more
efficient than anything which doesn't core dump. There are ways
to copy words at a time in the presence of alignment restrictions.
This isn't it.

> Anyhow, memcpy2() should run faster than =
> memcpy() does when processing large memory blocks, I believe.

I believe that any such statement about how performance otto-be is
made *BECAUSE* it is wrong.

> But, when =
> I test them(copy between two 10240 bytes memory blocks), I am surprised =
> to find that memcpy() runs the fastest. This result make me completely =
> confused. Do you know the reason? Would you kind to explain it to me? =
> Thank you!


I don't see any measurement methodologies or test results here.
Any performance measurements where the difference between two
ways of doing something are less than 1% or less than 10 times
the granularity of the clock being used to measure the time are
likely crap. And multitasking screws things up even worse.
The best performance demonstrations are those where you can
easily measure the difference in time with a wrist watch, *IF*
throwing the test in a loop and repeating it a million times
doesn't screw up what you are trying to measure (e.g. maybe
you don't want the test run completely from cache).
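
A rough sketch of the kind of measurement described above, assuming the standard clock() timer is adequate for the comparison. The names copy_fn, byte_copy and time_copy are illustrative and do not come from the thread. Calling through a volatile function pointer is only an attempt to discourage the compiler from inlining or replacing the call; it is not a guarantee.

```c
#include <stddef.h>
#include <string.h>
#include <time.h>

typedef void *(*copy_fn)(void *, const void *, size_t);

/* Illustrative byte-at-a-time copy to measure against memcpy(). */
static void *byte_copy(void *dest, const void *src, size_t len)
{
    unsigned char *d = dest;
    const unsigned char *s = src;

    while (len--)
        *d++ = *s++;
    return dest;
}

/* Run 'fn' on the two buffers 'reps' times and return the elapsed
   CPU time in seconds, or -1.0 if clock() is unavailable.  Pick
   'reps' large enough that the result is many times the clock
   granularity (1.0 / CLOCKS_PER_SEC at best). */
static double time_copy(copy_fn fn, void *dest, const void *src,
                        size_t len, unsigned long reps)
{
    volatile copy_fn call = fn;   /* discourage inlining of the call */
    clock_t start, stop;
    unsigned long i;

    start = clock();
    if (start == (clock_t)-1)
        return -1.0;
    for (i = 0; i < reps; i++)
        call(dest, src, len);
    stop = clock();
    if (stop == (clock_t)-1)
        return -1.0;
    return (double)(stop - start) / CLOCKS_PER_SEC;
}
```

For instance, time_copy(memcpy, dst, src, 10240, 1000000UL) and time_copy(byte_copy, dst, src, 10240, 1000000UL) can be compared, but only when both results are far larger than the clock granularity. A loop like this also runs almost entirely from cache, which may not be what you want to measure.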

Also, are you sure you are using the memcpy() from the libiberty
directory? (As opposed to one in libc?) On FreeBSD the two
are very different.

Gordon L. Burditt
Nov 14 '05 #4
Liang Chen wrote:

Part 1.1 Type: Plain Text (text/plain)
Encoding: quoted-printable


Please do not use html or mime attachments in newsgroups.

--
Chuck F (cb********@yahoo.com) (cb********@worldnet.att.net)
Available for consulting/temporary embedded and systems.
<http://cbfalconer.home.att.net> USE worldnet address!
Nov 14 '05 #5
> I can see no reason why the above line won't smegfault on a
> majority of calls to memcpy2() on a machine which enforces alignment
> restrictions. Nasty example:
>
>     char buf[10240];
>
>     ... something to put some data in buf ...
>     memcpy2(buf+3, buf, strlen(buf)+1);

Could I conclude that memcpy2() is unportable and hardware-sensitive?
> Another possibility is that the machine doesn't enforce alignment
> restrictions but comes up with the wrong answer. That is, assuming
> 4 byte ints,
>     *(int *) 0xdeadbee3
> fetches or stores the integer at the addresses 0xdeadbee0 thru 0xdeadbee3,
> *NOT* 0xdeadbee3 thru 0xdeadbee6.

I run and test my programs on a PC. The CPU is an Intel Pentium and the OS is
Linux 2.4.18-12L, not xBSD. I used GDB to debug memcpy2(), and I found that
the situation is not what you described above. *(int*)0xdeadbee3 does
fetch and store the integer at, for example, the addresses 0xdeadbee3 thru
0xdeadbee6 rather than 0xdeadbee0 thru 0xdeadbee3.
> I don't consider "segmentation fault - core dumped" to be more
> efficient than anything which doesn't core dump. There are ways
> to copy words at a time in the presence of alignment restrictions.
> This isn't it.

I checked my program thoroughly last night. Now memcpy2() looks like
this,

void* memcpy2 (register void* dest, register void* src, register size_t len)
{
    void* pdest = dest;

    for(; len%sizeof(int) !=0; --len, dest=(char*)dest+1, src=(char*)src+1)
        *(char*)dest = *(char*)src;
    for(len/=sizeof(int); len>0; --len, dest=(int*)dest+1, src=(int*)src+1)
        *(int*)dest = *(int*)src;

    return pdest;
}

It is an ANSI C program this time. But my machine doesn't enforce alignment
restrictions, and I do not know how to copy words at a time in the presence of
alignment restrictions. Can you give me some examples or hints?
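
One possible shape of an answer to the question above, as a minimal sketch rather than what any real libc does. The names word_t and aligned_copy are hypothetical. It assumes GCC or Clang (for the may_alias type attribute), assumes that uintptr_t exists, and assumes that converting a pointer to uintptr_t exposes the address bits that determine alignment, which holds on common platforms but is not promised by the C standard. Words are used only when source and destination share the same misalignment; real library implementations typically go further and are often written in assembly.

```c
#include <stddef.h>
#include <stdint.h>

/* GCC/Clang extension: a word type that may alias any object, so that
   reading and writing through it does not break the aliasing rules. */
typedef unsigned long __attribute__((__may_alias__)) word_t;

/* Hypothetical word-at-a-time copy for non-overlapping blocks.
   Words are only used when src and dest can be aligned together;
   otherwise it falls back to a plain byte copy. */
static void *aligned_copy(void *dest, const void *src, size_t len)
{
    unsigned char *d = dest;
    const unsigned char *s = src;

    /* Same misalignment on both sides: word copies are possible. */
    if ((uintptr_t)d % sizeof(word_t) == (uintptr_t)s % sizeof(word_t)) {
        /* Copy leading bytes until both pointers are word-aligned. */
        while (len > 0 && (uintptr_t)d % sizeof(word_t) != 0) {
            *d++ = *s++;
            len--;
        }
        /* Copy whole words through aligned pointers. */
        while (len >= sizeof(word_t)) {
            *(word_t *)d = *(const word_t *)s;
            d += sizeof(word_t);
            s += sizeof(word_t);
            len -= sizeof(word_t);
        }
    }
    /* Copy whatever is left (or everything, if alignment differs). */
    while (len > 0) {
        *d++ = *s++;
        len--;
    }
    return dest;
}
```

The design choice here is to never dereference a word pointer at an unaligned address, so the routine should behave the same on machines that enforce alignment and on machines that do not; the price is that differently aligned blocks get no speedup at all.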
> I don't see any measurement methodologies or test results here.
> Any performance measurements where the difference between two
> ways of doing something are less than 1% or less than 10 times
> the granularity of the clock being used to measure the time are
> likely crap. And multitasking screws things up even worse.
> The best performance demonstrations are those where you can
> easily measure the difference in time with a wrist watch, *IF*
> throwing the test in a loop and repeating it a million times
> doesn't screw up what you are trying to measure (e.g. maybe
> you don't want the test run completely from cache).

Now memcpy2() is as fast as the memcpy() in the library.

> Also, are you sure you are using the memcpy() from the libiberty
> directory? (As opposed to one in libc?) On FreeBSD the two
> are very different.

When I say "memcpy()", I mean the memcpy() in libc.
Are they different? Do you mean the memcpy() in libiberty is not the real code
that is compiled into libc? But is libc built when I "make" a
GCC package? If it is, where is its source code, whether it is C
code or assembly code?

Chen L.
Nov 14 '05 #6
[someone noted possible alignment problems in some code variants]

In article <news:cf**********@mail.cn99.com>
Liang Chen <ch*******@citiz.net> wrote:
> I run and test my programmes on a PC. The CPU is Intel Pentium. ...

Pentium-based systems never[%] enforce alignment constraints.
Try a MIPS, ARM, or SPARC-based system, for instance (if you
can get hold of one).

> When I say "memcpy()", I mean the memcpy() in libc.
> They are different? You mean the memcpy() in libiberty is not the real code
> to be compiled to add into libc? But, does the libc be made when I MAKE a
> GCC package? If it does, where is it's source codes, whatever they are C
> codes or ASM codes?


None of these are really questions about using Standard C, but rather
about how to build GNU programs with nonstandard extensions.

As it happens, the answer (based on your earlier mention of underlying
OS -- which I snipped) is that they are indeed different, the source
code is not in libiberty at all, and the source code *is* available
somewhere (because of the nature of Linux) but it is difficult to
say precisely where (again because of the nature of Linux :-) ).
The Linux C library is built when you build the Linux C library --
which, unless you-the-reader rebuild Linux, is not something you-
the-reader would normally do, even when installing various GNU
software.

As it also happens, if you use the GNU C compiler on a Pentium
system and turn optimization up high, calls to memcpy() often never
even call anything at all -- they turn into inline assembly code
instead. The compiler is allowed to do this because the name
"memcpy" is reserved, so the compiler can be sure precisely what
any call to memcpy() is supposed to do. This in turn means that
if you attempt to replace memcpy(), but do it by supplying a
different memcpy() function, your new function may never get called
at all!

The behavior described in the last paragraph above -- in which an
attempt to replace a C library function with some other substitute
fails -- is allowed by the C standard. If you want your programs
to run on any system that supports Standard C, do not attempt to
override library functions: if it works at all, it may not work
correctly.
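
As a small sketch of that advice: an experimental copy routine can be given a name of its own instead of being called memcpy(). The name my_copy below is hypothetical. Note also that, as far as the C standard is concerned, identifiers that begin with "mem" followed by a lowercase letter are reserved for future library use, so a name such as memcpy1 is not strictly safe either.

```c
#include <stddef.h>

/* Hypothetical copy routine under a name of its own; the reserved
   name memcpy is left to the implementation.  Like memcpy(), it is
   only meant for blocks that do not overlap. */
static void *my_copy(void *dest, const void *src, size_t len)
{
    unsigned char *d = dest;
    const unsigned char *s = src;

    while (len--)
        *d++ = *s++;
    return dest;
}
```

Code that wants the experimental routine calls my_copy() explicitly, and every call to memcpy() keeps whatever meaning the compiler and library give it.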
--
In-Real-Life: Chris Torek, Wind River Systems
Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W) +1 801 277 2603
email: forget about it http://web.torek.net/torek/index.html
Reading email is like searching for food in the garbage, thanks to spammers.
Nov 14 '05 #8
In article <news:cf********@news4.newsguy.com> I wrote:
> Pentium-based systems never[%] enforce alignment constraints.


Gah, I forgot the footnote:

[%] What, never?
No, never!
What, never?
Well, hardly ever!

(The SSE instructions require alignment.)
--
In-Real-Life: Chris Torek, Wind River Systems
Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W) +1 801 277 2603
email: forget about it http://web.torek.net/torek/index.html
Reading email is like searching for food in the garbage, thanks to spammers.
Nov 14 '05 #9
Chris Torek <no****@torek.net> writes:
> In article <news:cf********@news4.newsguy.com> I wrote:
>> Pentium-based systems never[%] enforce alignment constraints.
>
> Gah, I forgot the footnote:
>
> [%] What, never?
> No, never!
> What, never?
> Well, hardly ever!
>
> (The SSE instructions require alignment.)


Also, if you set bit 18, called "AC" or "Alignment Check", in
EFLAGS, then most unaligned accesses in user mode will fault.
--
"I should killfile you where you stand, worthless human." --Kaz
Nov 14 '05 #10
