C Slower than Python?

Bartc

I'd been benchmarking my own pet language against Python for manipulation of
short strings. This tested the expression a=b+c for strings, and the Python
code looks like:

b="abc"
c="xyz"
for i in xrange(10000000):
    a=b+c
print "A=",a

This took about 2.5 secs with Python 2.5 on my machine (my own efforts
achieved 0.7 secs..)

Pretty good, but how fast could C do it? I expected both of these to be
thrashed, yet the code below took over 4 seconds (mingw 3.4.5 with -O2).

(Timings for longer 60-char strings were 3.5 secs for Python and 7.5 secs
for C. All timings are elapsed time.)

OK, this code is naive and simplistic, but how else would you do it in C?
(BTW I've omitted malloc checking, which is in my own code and I presume is
in Python.)

/* Evaluate string a=b+c 10m times */

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

int main(void) {
    int i,t;
    char *a=NULL;
    char *b="abc";
    char *c="xyz";

    t=clock();

    for (i=0; i<10000000; ++i) {
        free(a);
        a=malloc(strlen(b)+strlen(c)+1);
        strcpy(a,b);
        strcat(a,c);
    }

    printf("Result=%s\n",a);

    printf("Time: %d\n",clock()-t);
}

--
Bartc

Oct 8 '08 #1


Richard Heathfield

Bartc said:

I'd been benchmarking my own pet language against Python for manipulation
of short strings. This tested the expression a=b+c for strings, and the
Python code looks like:

b="abc"
c="xyz"
for i in xrange(10000000):
a=b+c
print "A=",a

This took about 2.5 secs with Python 2.5 on my machine (my own efforts
achieved 0.7 secs..)

Pretty good, but how fast could C do it?

Here are some timings for an ISO C program which produces the same output.

me@here> time ./foo
abcxyz

real 0m0.015s
user 0m0.000s
sys 0m0.010s
me@here> time ./foo > /dev/null

real 0m0.003s
user 0m0.000s
sys 0m0.000s
me@here> cat foo.c
#include <stdio.h>

int main(void)
{
    printf ("abcxyz\n");

    return 0;
}

You might need to ask a better question.

--
Richard Heathfield <http://www.cpax.org.uk>
Email: -http://www. +rjh@
Google users: <http://www.cpax.org.uk/prg/writings/googly.php>
"Usenet is a strange place" - dmr 29 July 1999

Oct 8 '08 #2

Richard Tobin

In article <RN*******************@text.news.virginmedia.com>,
Bartc <bc@freeuk.com> wrote:

>I'd been benchmarking my own pet language against Python for manipulation of
>short strings.

Your tests are almost certainly primarily measuring memory allocation
speed. Python may well use a more specialised allocator than
malloc(), and malloc() varies greatly between implementations.

(On my computer, the C program is about 3 times faster than the Python.)
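
A minimal sketch of how that claim could be checked: time the malloc()/free() pairs and the copying separately. The function names and the `escape` variable are made up for this illustration, and the numbers will differ between C libraries and compilers. The volatile pointer is only there to make it harder for an optimizer to discard the work.

```c
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Hypothetical sink: storing each pointer here makes it escape, so the
   compiler cannot simply drop the allocation or the copy. */
static void *volatile escape;

/* CPU seconds spent on `iterations` malloc/free pairs of `size` bytes,
   or -1.0 if an allocation fails. */
double time_alloc_only(long iterations, size_t size)
{
    clock_t start = clock();
    for (long i = 0; i < iterations; ++i) {
        char *p = malloc(size);
        if (p == NULL)
            return -1.0;
        escape = p;
        free(p);
    }
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}

/* CPU seconds spent concatenating b and c into a fixed buffer,
   with no allocation at all; -1.0 if the result would not fit. */
double time_copy_only(long iterations, const char *b, const char *c)
{
    char buf[256];
    size_t len_b = strlen(b);
    size_t len_c = strlen(c);
    if (len_b + len_c + 1 > sizeof buf)
        return -1.0;

    clock_t start = clock();
    for (long i = 0; i < iterations; ++i) {
        memcpy(buf, b, len_b);
        memcpy(buf + len_b, c, len_c + 1);  /* +1 copies the terminator */
        escape = buf;
    }
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Calling both with 10000000 iterations and comparing the two results gives a rough idea of which part dominates on a given implementation.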

-- Richard
--
Please remember to mention me / in tapes you leave behind.

Oct 8 '08 #3

vippstar

On Oct 8, 2:26 pm, "Bartc" <b...@freeuk.com> wrote:

> I'd been benchmarking my own pet language against Python for manipulation of
> short strings. This tested the expression a=b+c for strings, and the Python
> code looks like:
>
> b="abc"
> c="xyz"
> for i in xrange(10000000):
>     a=b+c
> print "A=",a
>
> This took about 2.5 secs with Python 2.5 on my machine (my own efforts
> achieved 0.7 secs..)
>
> Pretty good, but how fast could C do it? I expected both of these to be
> thrashed, yet the code below took over 4 seconds (mingw 3.4.5 with -O2).
>
> (Timings for longer 60-chars strings were 3.5 secs for Python and 7.5 secs
> for C. All timings are elapsed time)
>
> OK, this code is naive and simplistic, but how else would you do it in C?
> (BTW I've omitted malloc checking, which is in my own code and I presume is
> in Python.)

So WHERE's the C question? You should know better than posting
benchmarking stuff here.

> /* Evaluate string a=b+c 10m times */
>
> #include <stdio.h>
> #include <stdlib.h>
> #include <string.h>
> #include <time.h>
>
> int main(void) {
> int i,t;
> char *a=NULL;
> char *b="abc";
> char *c="xyz";
>
> t=clock();

t is an integer. clock returns clock_t, which is an arithmetic type,
but not necessarily an int.
It could be a long, or unsigned; in which case depending on the value
you may be "invoking" impl-defined behavior or raising an impl-defined
signal.
It could be a float, in which case you'd lose information. (and if the
value is bigger than INT_MAX or less than INT_MIN, also what I said
before)

> for (i=0; i<10000000; ++i) {

If INT_MAX < 10000000, infinite loop and then you invoke undefined
behavior. (integer overflow)

> free(a);
> a=malloc(strlen(b)+strlen(c)+1);

What if malloc returns NULL? You don't care?

> strcpy(a,b);
> strcat(a,c);

Not an actual bug: if b is a zero length string your code doesn't
work. (it obviously isn't in your code, but I'm only mentioning this
in case you decide to let the user choose the strings)

> }
>
> printf("Result=%s\n",a);
>
> printf("Time: %d\n",clock()-t);

If clock returns any integer type larger than int, or a floating point
type, you invoke UB.

> }

If you're going to time implementations, do it with CONFORMING
programs. Blaming the implementation for its output when your program
is incorrect is plain silly.
Moreover, if you're going to post your results, please do it in an
appropriate group, not comp.lang.c.
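
The clock_t objections can be met by keeping the value in a clock_t and converting to double only when reporting. A minimal sketch, in which the helper names `elapsed_seconds` and `report_time` are invented for illustration:

```c
#include <stdio.h>
#include <time.h>

/* Convert two clock() readings into seconds without assuming that
   clock_t is int, long, or any other particular arithmetic type. */
double elapsed_seconds(clock_t start, clock_t end)
{
    return (double)(end - start) / CLOCKS_PER_SEC;
}

/* Print the interval with a conversion that matches the argument type. */
void report_time(clock_t start, clock_t end)
{
    printf("Time: %.3f s\n", elapsed_seconds(start, end));
}
```

For the loop counter, long is guaranteed to hold 10000000, which int is not.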

Oct 8 '08 #4

vippstar

On Oct 8, 2:55 pm, vipps...@gmail.com wrote:

> On Oct 8, 2:26 pm, "Bartc" <b...@freeuk.com> wrote:
>
>> strcpy(a,b);
>> strcat(a,c);
>
> Not an actual bug: if b is a zero length string your code doesn't
> work. (it obviously isn't in your code, but I'm only mentioning this
> in case you decide to let the user choose the strings)

Hmm, nevermind that. If b is a zero length string, strcpy(a, b); sets
a[0] to 0.

Oct 8 '08 #5

Richard Tobin

In article <ed**********************************@s1g2000prg.googlegroups.com>,
<vi******@gmail.com> wrote:

>So WHERE's the C question?

Give it a rest.

-- Richard
--
Please remember to mention me / in tapes you leave behind.

Oct 8 '08 #6

vippstar

On Oct 8, 3:15 pm, rich...@cogsci.ed.ac.uk (Richard Tobin) wrote:

> In article <ed6266c6-cb27-4620-8ad8-ed681238b...@s1g2000prg.googlegroups.com>,
> <vipps...@gmail.com> wrote:
>
>> So WHERE's the C question?
>
> Give it a rest.

Oh come on, do you really want a benchmarking discussion here?

Oct 8 '08 #7

Richard Tobin

In article <2b**********************************@q26g2000prq.googlegroups.com>,
<vi******@gmail.com> wrote:

>Oh come on, do you really want a benchmarking discussion here?

If it concerns aspects of the C language, then yes, of course.

-- Richard

--
Please remember to mention me / in tapes you leave behind.

Oct 8 '08 #8

Kenny McCormack

In article <gc***********@pc-news.cogsci.ed.ac.uk>,
Richard Tobin <ri*****@cogsci.ed.ac.uk> wrote:

>In article <ed**********************************@s1g2000prg.googlegroups.com>,
><vi******@gmail.com> wrote:
>>So WHERE's the C question?
>
>Give it a rest.

Oh come on, he's just doing what all the regs have been doing for years
(decades?) now. The problem is that vippy and CBF just aren't *quite*
as good at it as are Heathfield and Thompson (et al) and their coteries
(in whose numbers I count yourself). This is in much the same way that
McCain isn't quite as good at it (BS'ing the masses) as GWB is/was. The
yokels (in both cases - here and in American politics) are beginning to
figure it out.

And when they do, the curtain is going to fall. Fast and firm.

P.S. Another way to say this is that, with vippy and CBF, you can see
the wires. It ruins the effect.

Oct 8 '08 #9

Dik T. Winter

In article <RN*******************@text.news.virginmedia.com> "Bartc" <bc@freeuk.com> writes:

> I'd been benchmarking my own pet language against Python for manipulation of
> short strings. This tested the expression a=b+c for strings, and the Python
> code looks like:

...

> OK, this code is naive and simplistic, but how else would you do it in C?
> (BTW I've omitted malloc checking, which is in my own code and I presume is
> in Python.)

...

The difference in time is almost certainly because Python keeps track of the
length of strings, which C does not. And moreover, Python probably uses a
special memory allocator.
--
dik t. winter, cwi, kruislaan 413, 1098 sj amsterdam, nederland, +31205924131
home: bovenover 215, 1025 jn amsterdam, nederland; http://www.cwi.nl/~dik/

Oct 8 '08 #10

Willem

Dik T. Winter wrote:
) The difference in time is almost certainly because Python keeps track of the
) length of strings, which C does not. And moreover, Python probably uses a
) special memory allocator.

I disagree.
The difference is almost certainly because of the memory allocator.

Keeping track of the length of the string has a negligible effect on such
small strings.
SaSW, Willem
--
Disclaimer: I am in no way responsible for any of the statements
made in the above text. For all I know I might be
drugged or something..
No I'm not paranoid. You all think I'm paranoid, don't you !
#EOT

Oct 8 '08 #11

Dik T. Winter

In article <K8********@cwi.nl> "Dik T. Winter" <Di********@cwi.nl> writes:

> In article <RN*******************@text.news.virginmedia.com> "Bartc" <bc@freeuk.com> writes:

...

> The difference in time is almost certainly because Python keeps track of the
> length of strings, which C does not. And moreover, Python probably uses a
> special memory allocator.

I just checked. Keeping track of the string length in C reduced the time
for the program from 2.1 to 1.6 seconds on my machine.
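
One way of keeping track of the length in C is to carry it next to the pointer. The sketch below is an assumption about what such a change could look like, not the code that was actually timed. The type `cstr` and the functions `cstr_from` and `cstr_concat` are hypothetical names.

```c
#include <stdlib.h>
#include <string.h>

/* A counted string: len excludes the terminating '\0', which is still
   stored so that data can be handed to ordinary C string functions. */
typedef struct {
    size_t len;
    char  *data;
} cstr;

/* Build a counted string from a C string; data is NULL on failure. */
cstr cstr_from(const char *s)
{
    cstr out = { 0, NULL };
    size_t n = strlen(s);           /* the only strlen call needed */
    out.data = malloc(n + 1);
    if (out.data != NULL) {
        memcpy(out.data, s, n + 1);
        out.len = n;
    }
    return out;
}

/* out = b + c. Returns 0 on success, -1 if allocation fails, in which
   case out is left unchanged. out may be the same object as b or c. */
int cstr_concat(cstr *out, const cstr *b, const cstr *c)
{
    size_t len = b->len + c->len;
    char *p = malloc(len + 1);
    if (p == NULL)
        return -1;
    memcpy(p, b->data, b->len);
    memcpy(p + b->len, c->data, c->len + 1);  /* includes terminator */
    free(out->data);                           /* free(NULL) is a no-op */
    out->data = p;
    out->len = len;
    return 0;
}
```

With this layout the loop body needs no strlen() and no strcat() scan, only the allocation and two memcpy() calls.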
--
dik t. winter, cwi, kruislaan 413, 1098 sj amsterdam, nederland, +31205924131
home: bovenover 215, 1025 jn amsterdam, nederland; http://www.cwi.nl/~dik/

Oct 8 '08 #12

Richard Bos

"Dik T. Winter" <Di********@cwi.nl> wrote:

> In article <RN*******************@text.news.virginmedia.com> "Bartc" <bc@freeuk.com> writes:
>> I'd been benchmarking my own pet language against Python for manipulation of
>> short strings. This tested the expression a=b+c for strings, and the Python
>> code looks like:
> ...
>> OK, this code is naive and simplistic, but how else would you do it in C?
>> (BTW I've omitted malloc checking, which is in my own code and I presume is
>> in Python.)
> ...
> The difference in time is almost certainly because Python keeps track of the
> length of strings, which C does not. And moreover, Python probably uses a
> special memory allocator.

And moreover, Python probably optimises specifically for string
manipulation by default, while Bartc's C compiler probably optimises for
general purpose programming.

Richard

Oct 8 '08 #13

Richard Harter

On Wed, 08 Oct 2008 11:26:41 GMT, "Bartc" <bc@freeuk.com> wrote:

>I'd been benchmarking my own pet language against Python for manipulation of
>short strings. This tested the expression a=b+c for strings, and the Python
>code looks like:

b="abc"
c="xyz"
for i in xrange(10000000):
a=b+c
print "A=",a

This took about 2.5 secs with Python 2.5 on my machine (my own efforts
achieved 0.7 secs..)

Pretty good, but how fast could C do it? I expected both of these to be
thrashed, yet the code below took over 4 seconds (mingw 3.4.5 with -O2).

(Timings for longer 60-chars strings were 3.5 secs for Python and 7.5 secs
for C. All timings are elapsed time)

OK, this code is naive and simplistic, but how else would you do it in C?
(BTW I've omitted malloc checking, which is in my own code and I presume is
in Python.)

/* Evaluate string a=b+c 10m times */

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

int main(void) {
int i,t;
char *a=NULL;
char *b="abc";
char *c="xyz";

t=clock();

for (i=0; i<10000000; ++i) {
free(a);
a=malloc(strlen(b)+strlen(c)+1);
strcpy(a,b);
strcat(a,c);
}

printf("Result=%s\n",a);

printf("Time: %d\n",clock()-t);
}

Fundamentally, your C code is quite inefficient and the code
inside python that implements the loop is optimized. In what way
is your code inefficient?

(a) You don't use a scratch buffer. One of the important
optimizations is to reuse space rather than allocating and
freeing temporary space. This is a dangerous optimization
because you are responsible for keeping track of whether your
scratch buffers are big enough and whether they are actually
free. However it is an important optimization; careful design
can reduce the number of malloc/free calls by orders of
magnitude.

(b) You don't keep track of string lengths. Many operations are
cheap if you know the string length and not so cheap if you
don't.

(c) You use strcat. The trouble with strcat(a,c) is that it has
to find the end of a before it can start copying in c.

Below is some code that, in the grand tradition of clc, has not
been tested. It is probably closer to what the internal code in
Python is actually doing in your test. You might take a look at
it and compare its timings with your original code.

As a final remark, writing tests like this is always problematic.
To do it right you have to arrange the code in a way that only
tests the time of the operation you are measuring while not
giving the optimizer a chance to optimize away your test.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

int main(void) {
    char * a = NULL;
    char * b = "abc";
    char * c = "xyx";
    char * buf = 0;
    size_t szbuf = 0;
    size_t len_a = 0;
    size_t len_b = 0;
    size_t len_c = 0;
    size_t i;
    size_t t;

    len_b = strlen(b);
    len_c = strlen(c);
    len_a = len_b + len_c + 1;

    t = clock();
    for (i=0;i<10000000;i++) {
        if (len_a > szbuf) {
            buf = malloc(len_a);
            if (!buf) {
                fprintf("Die Earth Pig, you have no memory.\n");
                exit(EXIT_FAILURE);
            }
            szbuf = len_a;
        }
        strcpy(a,b);
        strcpy(a+len_b,c);
    }
    printf(
        "You have just wasted %lu clocks of CPU time.\n",
        clock()-t);
    exit(EXIT_SUCCESS);
}
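
Since the code above was posted untested, here is a hedged sketch of the same idea (a reusable scratch buffer, lengths tracked by the caller, no strcat) as a standalone function. The name `concat_scratch` and its parameter list are invented for this illustration and are not part of the original post.

```c
#include <stdlib.h>
#include <string.h>

/* Concatenate b and c into a scratch buffer that is reused across calls.
   *buf and *szbuf describe the buffer and must start as NULL and 0.
   The buffer only grows, so a loop over same-length strings calls
   malloc exactly once. b and c must not point into *buf.
   Returns *buf, or NULL if allocation fails (buffer left untouched). */
char *concat_scratch(char **buf, size_t *szbuf,
                     const char *b, size_t len_b,
                     const char *c, size_t len_c)
{
    size_t need = len_b + len_c + 1;
    if (need > *szbuf) {
        char *p = malloc(need);
        if (p == NULL)
            return NULL;
        free(*buf);                 /* release the smaller old buffer */
        *buf = p;
        *szbuf = need;
    }
    memcpy(*buf, b, len_b);
    memcpy(*buf + len_b, c, len_c + 1);   /* +1 copies the terminator */
    return *buf;
}
```

The caller computes len_b and len_c once, outside the timed loop, and frees *buf when it is finished with the buffer.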
Richard Harter, cr*@tiac.net
http://home.tiac.net/~cri, http://www.varinoma.com
Save the Earth now!!
It's the only planet with chocolate.

Oct 8 '08 #14

Antoninus Twink

On 8 Oct 2008 at 12:35, Kenny McCormack wrote:

Richard Tobin <ri*****@cogsci.ed.ac.uk> wrote:
><vi******@gmail.com> wrote:
>>>So WHERE's the C question?

Give it a rest.

Oh come on, he's just doing what all the regs have been doing for
years (decades?) now. The problem is that vippy and CBF just aren't
*quite* as good at it as are Heathfield and Thompson (et al) and their
coteries (in whose numbers I count yourself).

I don't think that's fair. RT, unlike a number of the other Richards,
has historically proved himself much more open to broader discussions in
clc.

It's interesting that even within The Clique, it's possible to just be
too much of a dick and start getting marginalized even if you have the
right topicality politics - the attacks on CBF from all quarters are one
example, and now "VIP Star" is proving too much of an embarrassment even
for Heathfield.

Oct 8 '08 #15

Richard

vi******@gmail.com writes:

On Oct 8, 2:26 pm, "Bartc" <b...@freeuk.com> wrote:
>I'd been benchmarking my own pet language against Python for manipulation of
short strings. This tested the expression a=b+c for strings, and the Python
code looks like:

b="abc"
c="xyz"
for i in xrange(10000000):
a=b+c
print "A=",a

This took about 2.5 secs with Python 2.5 on my machine (my own efforts
achieved 0.7 secs..)

Pretty good, but how fast could C do it? I expected both of these to be
thrashed, yet the code below took over 4 seconds (mingw 3.4.5 with -O2).

(Timings for longer 60-chars strings were 3.5 secs for Python and 7.5 secs
for C. All timings are elapsed time)

OK, this code is naive and simplistic, but how else would you do it in C?
(BTW I've omitted malloc checking, which is in my own code and I presume is
in Python.)

So WHERE's the C question? You should know better than posting
benchmarking stuff here.

It seems blindingly obvious what it is to me.

Can you not see it?

Hint : he has written some C which is much slower than Python. He wants
to know why.

You really have become an obstructive arse.

Oct 8 '08 #16

Richard

vi******@gmail.com writes:

On Oct 8, 3:15 pm, rich...@cogsci.ed.ac.uk (Richard Tobin) wrote:
>In article <ed6266c6-cb27-4620-8ad8-ed681238b...@s1g2000prg.googlegroups.com>,

<vipps...@gmail.com> wrote:
>So WHERE's the C question?

Give it a rest.

Oh come on, do you really want a benchmarking discussion here?

Try reading the post, then start your posturing and showing off, you
self-important blowhard.

Writing "efficient" code can certainly be considered in terms of ISO C
even IF compilers come into it.

Oct 8 '08 #17

Malcolm McLean

"Willem" <wi****@stack.nl> wrote in message

I disagree.
The difference is almost certainly because of the memory allocator.

Keeping track of the length of the string has a negligible effect on such
small strings.

However you are doing the length calculation 2 * 10 million times, and not
much else in that loop.
Memory reads are expensive. Doing four * 2 instead of one could well account
for the bulk of the difference.

--
Free games and programming goodies.
http://www.personal.leeds.ac.uk/~bgy1mm

Oct 8 '08 #18

Michael

On Wed, 08 Oct 2008 11:26:41 +0000, Bartc wrote:

I'd been benchmarking my own pet language against Python for manipulation of
short strings. This tested the expression a=b+c for strings, and the Python
code looks like:

I don't know how your code can be made more efficient, but
since my research shows that python itself is written in C,
I also don't see how well written C code can be slower than
Python.

--
HTH
Mike

Oct 8 '08 #19

Antoninus Twink

On 8 Oct 2008 at 20:21, Michael wrote:

I don't know how your code can be made more efficient, but since my
research shows that python itself is written in C, I also don't see
how well written C code can be slower than Python.

Moreover, in C you can always drop down to inline assembly if there's a
bottleneck you really need to speed up. I don't know how easy that is in
Python.

Oct 8 '08 #20

Willem

Malcolm McLean wrote:
) However you are doing the length calculation 2 * 10 million times, and not
) much else in that loop.

I don't have the original code at hand, but doesn't the 'not much else'
include calling malloc() ? So malloc() is being called 10 million times
as well.

) Memory reads are expensive. Doing four * 2 instead of one could well account
) for the bulk of the difference.

When cache is involved, reading one byte usually gets the next few into
the cache as well, making those following reads a lot quicker.
SaSW, Willem
--
Disclaimer: I am in no way responsible for any of the statements
made in the above text. For all I know I might be
drugged or something..
No I'm not paranoid. You all think I'm paranoid, don't you !
#EOT

Oct 8 '08 #21

Flash Gordon

Richard Harter wrote, On 08/10/08 16:42:

On Wed, 08 Oct 2008 11:26:41 GMT, "Bartc" <bc@freeuk.com> wrote:

>I'd been benchmarking my own pet language against Python for manipulation of
short strings. This tested the expression a=b+c for strings, and the Python
code looks like:

b="abc"
c="xyz"
for i in xrange(10000000):
a=b+c
print "A=",a

This took about 2.5 secs with Python 2.5 on my machine (my own efforts
achieved 0.7 secs..)

Pretty good, but how fast could C do it? I expected both of these to be
thrashed, yet the code below took over 4 seconds (mingw 3.4.5 with -O2).

<snip>

Below is some code that, in the grand tradition of clc, has not
been tested.

In that grand tradition it also has bugs :-)

It is probably closer to what the internal code in
Python is actually doing in your test. You might take a look at
it and compare its timings with your original code.

As a final remark, writing tests like this is always problematic.
To do it right you have to arrange the code in a way that only
tests the time of the operation you are measuring while not
giving the optimizer a chance to optimize away your test.

Whereas in the real world, of course, the optimiser might be able to do
some of those very tricks to improve it further :-)

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

int main(void) {
char * a = NULL;
char * b = "abc";
char * c = "xyx";
char * buf = 0;

Not an error, but didn't you mean
char * buf = NULL;

size_t szbuf = 0;
size_t len_a = 0;
size_t len_b = 0;
size_t len_c = 0;
size_t i;

Why a size_t for i? Surely long or unsigned long would be more appropriate.

size_t t;

Don't you mean clock_t?

Neither of the above are really relevant to your point though.

len_b = strlen(b);
len_c = strlen(c);
len_a = len_b + len_c + 1;

t = clock();
for (i=0;i<10000000;i++) {
if (len_a > szbuf) {
buf = malloc(len_a);

Memory leak (i.e. the real big bug I spotted). You should have done a
free(buf) first.

if (!buf) {
fprintf("Die Earth Pig, you have no memory.\n");
exit(EXIT_FAILURE);
}
szbuf = len_a;
}
strcpy(a,b);
strcpy(a+len_b,c);

How about...
memcpy(a,b,len_b);
memcpy(a+len_b,c,len_c+1);
Alternatively lose the +1 and add a[len_a-1]='\0';

}
printf(
"You have just wasted %lu clocks of CPU time.\n",
clock()-t);

You should have a cast there. Just after people have been discussing
times when they say you should not have a cast :-)

exit(EXIT_SUCCESS);
}

--
Flash Gordon
If spamming me sent it to sm**@spam.causeway.com
If emailing me use my reply-to address
See the comp.lang.c Wiki hosted by me at http://clc-wiki.net/

Oct 8 '08 #22

Bartc

"Richard Harter" <cr*@tiac.net> wrote in message
news:48****************@news.sbtc.net...

On Wed, 08 Oct 2008 11:26:41 GMT, "Bartc" <bc@freeuk.com> wrote:

>>b="abc"
c="xyz"
for i in xrange(10000000):
a=b+c

>>This took about 2.5 secs with Python 2.5 on my machine (my own efforts
achieved 0.7 secs..)

Pretty good, but how fast could C do it? I expected both of these to be
thrashed, yet the code below took over 4 seconds (mingw 3.4.5 with -O2).

>>OK, this code is naive and simplistic, but how else would you do it in C?

Fundamentally, your C code is quite inefficient and the code
inside python that implements the loop is optimized. In what way
is your code inefficient?

(a) You don't use a scratch buffer. One of the important
optimizations is to reuse space rather than allocating and
freeing temporary space. This is a dangerous optimization
because you are responsible for keeping track of whether your
scratch buffers are big enough and whether they are actually
free. However it is an important optimization; careful design
can reduce the number of malloc/free calls by orders of
magnitude.

(b) You don't keep track of string lengths. Many operations are
cheap if you know the string length and not so cheap if you
don't.

(c) You use strcat. The trouble with strcat(a,c) is that it has
to find the end of a before it can start copying in c.
Below is some code that, in the grand tradition of clc, has not
been tested. It is probably closer to what the internal code in
Python is actually doing in your test. You might take a look at
it and compare its timings with your original code.

I ran your code, with a mod or two (setting a=buf for example).

My original timings were about 4.5 seconds for each of lccwin32 and Mingw.
DMC just tested was 3.2 seconds.

Your code produced 2 seconds for lccwin32, but Mingw and DMC showed a
dramatic reduction to 0.15 seconds.

However, this code doesn't realistically emulate the short string handling I
was trying to test; in practice each string will have a different length,
and the destination may already have content that needs to be cleared.
Imagine a function:

void addstring(char **a, char *b, char *c);

which does the equivalent of a=b+c, /where *a may already point to a
string/.

My own code for this (to implement an interpreted and dynamically typed
language, a lot of overheads C doesn't have) achieved some 0.7 seconds. This
uses special fast routines for allocation and freeing and copying (for
example malloc() is only called once per megabyte, free() I think never, for
small allocations).
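
The kind of allocator described here (one malloc() per megabyte, no free() for small blocks) is usually called a bump or arena allocator. The following is only a sketch of the general technique, not the actual routines from the interpreter. `bump_pool`, `bump_alloc` and `CHUNK_SIZE` are names assumed for illustration, and exhausted chunks are deliberately never released.

```c
#include <stddef.h>
#include <stdlib.h>

#define CHUNK_SIZE (1024u * 1024u)   /* one malloc per megabyte */

typedef struct {
    char  *chunk;   /* current chunk, NULL before the first allocation */
    size_t used;    /* bytes already handed out from the chunk */
} bump_pool;

/* Hand out `size` bytes from the current chunk, fetching a fresh chunk
   from malloc only when the current one is exhausted. Individual blocks
   are never freed. Returns NULL on failure or if size > CHUNK_SIZE. */
void *bump_alloc(bump_pool *pool, size_t size)
{
    if (size > CHUNK_SIZE)
        return NULL;
    /* Round up to a multiple of 16 so every block stays aligned
       (assumes 16 bytes is sufficient alignment on the target). */
    size = (size + 15u) & ~(size_t)15u;

    if (pool->chunk == NULL || pool->used + size > CHUNK_SIZE) {
        char *fresh = malloc(CHUNK_SIZE);
        if (fresh == NULL)
            return NULL;
        pool->chunk = fresh;   /* the old chunk is abandoned, not freed */
        pool->used = 0;
    }

    void *p = pool->chunk + pool->used;
    pool->used += size;
    return p;
}
```

Each small allocation then costs an addition and a comparison, which is the main reason such schemes can beat a general-purpose malloc() on a loop like this one. The price is that memory is only reclaimed in bulk, if at all.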

I suppose these same routines could be used in a C library to speed things
up. *But* using what is already provided in C, the obvious way to program my
addstring() routine is to use malloc, free, strlen, strcpy and so on:

void addstring(char **a, char*b, char*c){
    free(*a);
    *a=malloc(strlen(b)+strlen(c)+1);
//  *a=malloc(7);
    strcpy(*a,b);
    strcat(*a,c);
//  memcpy(*a,b,3);
//  memcpy(*a+3,c,4);
}

This took DMC about 3.2 seconds to execute 10 million times.

Taking string length calculation out of the equation (using the commented
lines above) took 1.8 seconds (don't try this unless b and c have strlen
3!).

Remember my Python 2.5 took (appropriately) 2.5 seconds; nearer 2.0 seconds
without the loop overhead (C's loop overhead was negligible).

--
Bartc

Oct 8 '08 #23

Peter Nilsson

vipps...@gmail.com wrote:

...If you're going to time implementations, do it
with CONFORMING programs.

Nit: Undefined behaviour does not preclude programs
from being conforming. Of course, such programs aren't
strictly conforming.

--
Peter

Oct 8 '08 #24

Flash Gordon

Bartc wrote, On 08/10/08 23:38:

<snip>

I suppose these same routines could be used in a C library to speed things
up. *But* using what is already provided in C, the obvious way to
program my
addstring() routine is to use malloc, free, strlen, strcpy and so on:

void addstring(char **a, char*b, char*c){
free(*a);
*a=malloc(strlen(b)+strlen(c)+1);
// *a=malloc(7);
strcpy(*a,b);
strcat(*a,c);
// memcpy(*a,b,3);
// memcpy(*a+3,c,4);

Note that as has been pointed out you are calling strlen above so you
could actually use the correct values for whatever parameters are passed
in with no additional overhead.

}

This took DMC about 3.2 seconds to execute 10 million times.

Taking string length calculation out of the equation (using the commented
lines above) took 1.8 seconds (don't try this unless b and c have strlen
3!).

Well, 1.8 seconds is less than 2 seconds. Note what I say above about
already having the correct numbers available programmatically for the
memcpy calls.

Remember my Python 2.5 took (appropriately) 2.5 seconds; nearer 2.0
seconds without the loop overhead (C's loop overhead was negligible).

Well, since less than 2 seconds is less than "nearer 2 seconds" and
significantly less than 2.5 seconds you have now shown that C is faster
than Python for this task if you write efficient C.
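
A sketch of the point about reusing the strlen() results: the lengths are computed once and then drive both the allocation and the memcpy() calls, so nothing is hard-coded. The int return value and the const qualifiers are additions for this illustration and differ from the void function quoted above.

```c
#include <stdlib.h>
#include <string.h>

/* *a = b + c, where *a may already point to a malloc'd string (or be
   NULL). Returns 0 on success, -1 if allocation fails, in which case
   *a is left unchanged. b or c may be the old *a. */
int addstring(char **a, const char *b, const char *c)
{
    size_t len_b = strlen(b);
    size_t len_c = strlen(c);
    char *result = malloc(len_b + len_c + 1);
    if (result == NULL)
        return -1;
    memcpy(result, b, len_b);
    memcpy(result + len_b, c, len_c + 1);   /* +1 copies the terminator */
    free(*a);            /* freed only after the copy, so aliasing is safe */
    *a = result;
    return 0;
}
```

Compared with the strcpy()/strcat() version, each input string is scanned once for its length instead of strcat() rescanning the destination.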
--
Flash Gordon
If spamming me sent it to sm**@spam.causeway.com
If emailing me use my reply-to address
See the comp.lang.c Wiki hosted by me at http://clc-wiki.net/

Oct 8 '08 #25

Mark McIntyre

Dik T. Winter wrote:

The difference in time is almost certainly because Python keeps track of the
length of strings, which C does not. And moreover, Python probably uses a
special memory allocator.

Using memcpy speeds up the code by about 30% on my PC.

Mind you the C version runs in about 0.78 seconds, and the python one in
about 6 seconds. The phrase "QOI" is starting to come to mind...

--
Mark McIntyre

CLC FAQ <http://c-faq.com/>
CLC readme: <http://www.ungerhu.com/jxh/clc.welcome.txt>

Oct 8 '08 #26

Bartc

"Flash Gordon" <sm**@spam.causeway.com> wrote in message
news:sm************@news.flash-gordon.me.uk...

Bartc wrote, On 08/10/08 23:38:

>void addstring(char **a, char*b, char*c){
free(*a);
*a=malloc(strlen(b)+strlen(c)+1);
// *a=malloc(7);
strcpy(*a,b);
strcat(*a,c);
// memcpy(*a,b,3);
// memcpy(*a+3,c,4);

Note that as has been pointed out you are calling strlen above so you
could actually use the correct values for whatever parameters are passed
in with no additional overhead.

>This took DMC about 3.2 seconds to execute 10 million times.

Taking string length calculation out of the equation (using the commented
lines above) took 1.8 seconds (don't try this unless b and c have strlen
3!).

Well, 1.8 seconds is less than 2 seconds. Note what I say above about
already having the correct numbers available programmatically for the
memcpy calls.

>Remember my Python 2.5 took (appropriately) 2.5 seconds; nearer 2.0
seconds without the loop overhead (C's loop overhead was negligible).

Well, since less than 2 seconds is less than "nearer 2 seconds" and
significantly less than 2.5 seconds you have now shown that C is faster
than Python for this task if you write efficient C.

Only just, and by pretending that string lengths are available. Normally
they are not without extra effort.

But other posters have said their C code is quite a bit faster than Python,
by the sort of factors I would have expected.

--
Bartc

Oct 8 '08 #27

Richard Harter

On Wed, 08 Oct 2008 22:03:18 +0100, Flash Gordon
<sm**@spam.causeway.com> wrote:

>Richard Harter wrote, On 08/10/08 16:42:

<snip>

Many thanks for rooting through the code.

><snip>

>Below is some code that, in the grand tradition of clc, has not
been tested.

In that grand tradition it also has bugs :-)

> It is probably closer to what the internal code in
Python is actually doing in your test. You might take a look at
it and compare its timings with your original code.

As a final remark, writing tests like this is always problematic.
To do it right you have to arrange the code in a way that only
tests the time of the operation you are measuring while not
giving the optimizer a chance to optimize away your test.

Whereas in the real world, of course, the optimiser might be able to do
some of those very tricks to improve it further :-)

>#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

int main(void) {
char * a = NULL;
char * b = "abc";
char * c = "xyx";
char * buf = 0;

Not an error, but didn't you mean
char * buf = NULL;

Well, no, I meant 0. I never use NULL. The only reason it is in
the code is that I copied the first few lines of the original.

>
> size_t szbuf = 0;
size_t len_a = 0;
size_t len_b = 0;
size_t len_c = 0;
size_t i;

Why a size_t for i? Surely long or unsigned long would be more appropriate.

Appropriate, perhaps, but size_t for indexing through an object
is always right.

>
> size_t t;

Don't you mean clock_t?

I do. I hardly ever use clock so I looked it up. I misread the
standard.

>
Neither of the above are really relevant to your point though.

> len_b = strlen(b);
len_c = strlen(c);
len_a = len_b + len_c + 1;

t = clock();
for (i=0;i<10000000;i++) {
if (len_a > szbuf) {
buf = malloc(len_a);

Memory leak (i.e. the real big bug I spotted). You should have done a
free(buf) first.

Generally speaking, it is a bug, but in this program it is not -
malloc is called exactly once. That said, the code should have a
free.

>
> if (!buf) {
fprintf("Die Earth Pig, you have no memory.\n");
exit(EXIT_FAILURE);
}
szbuf = len_a;
}
strcpy(a,b);
strcpy(a+len_b,c);

How about...
memcpy(a,b,len_b);
memcpy(a+len_b,c,len_c+1);
Alternatively lose the +1 and add a[len_a-1]='\0';

Good catch. That should be faster.

>
> }
printf(
"You have just wasted %lu clocks of CPU time.\n",
clock()-t);

You should have a cast there. Just after people have been discussing
times when they say you should not have a cast :-)

Well, for reasons that are personally embarrassing, that code
would have been correct if clock had a size_t return.

>
> exit(EXIT_SUCCESS);
}
--
Flash Gordon
If spamming me sent it to sm**@spam.causeway.com
If emailing me use my reply-to address
See the comp.lang.c Wiki hosted by me at http://clc-wiki.net/

Richard Harter, cr*@tiac.net
http://home.tiac.net/~cri, http://www.varinoma.com
Save the Earth now!!
It's the only planet with chocolate.

Oct 9 '08 #28

Flash Gordon

Richard Harter wrote, On 09/10/08 01:18:

On Wed, 08 Oct 2008 22:03:18 +0100, Flash Gordon
<sm**@spam.causeway.com> wrote:

>Richard Harter wrote, On 08/10/08 16:42:
<snip>

Many thanks for rooting through the code.

It's one of the things the group is here for :-)

<snip>

>>int main(void) {
char * a = NULL;
char * b = "abc";
char * c = "xyx";
char * buf = 0;
Not an error, but didn't you mean
char * buf = NULL;

Well, no, I meant 0. I never use NULL. The only reason it is in
the code is that I copied the first few lines of the original.

OK, I can live with either as long as it is consistent.

>> size_t szbuf = 0;
size_t len_a = 0;
size_t len_b = 0;
size_t len_c = 0;
size_t i;
Why a size_t for i? Surely long or unsigned long would be more appropriate.

Appropriate, perhaps, but size_t for indexing through an object
is always right.

Well, in this case the iteration is longer than size_t is guaranteed to
allow.

>> size_t t;
Don't you mean clock_t?

I do. I hardly ever use clock so I looked it up. I misread the
standard.

Easily done.

<snip>

>> t = clock();
for (i=0;i<10000000;i++) {
if (len_a > szbuf) {
buf = malloc(len_a);
Memory leak (i.e. the real big bug I spotted). You should have done a
free(buf) first.

Generally speaking, it is a bug, but in this program it is not -
malloc is called exactly once. That said, the code should have a
free.

True. I was forgetting the length does not change and thinking about the
general principle. I've written code for ever-increasing buffers so
often it was automatic.
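The general principle discussed here, a buffer that is reallocated only when it is too small and whose old storage is freed when that happens, might look roughly like this. It is a sketch under assumptions: ensure_capacity is a hypothetical name, and the old contents are deliberately discarded (malloc plus free rather than realloc) because the benchmark overwrites the buffer anyway.

```c
#include <stdlib.h>

/* Hypothetical helper: make sure *buf can hold at least `need` bytes.
   Returns 0 on success, -1 if allocation fails (the old buffer is kept).
   Old contents are NOT preserved when the buffer grows. */
int ensure_capacity(char **buf, size_t *cap, size_t need)
{
    if (need > *cap) {
        char *p = malloc(need);
        if (p == NULL)
            return -1;
        free(*buf);          /* free(NULL) is a no-op, so a NULL start is fine */
        *buf = p;
        *cap = need;
    }
    return 0;
}
```

With constant string lengths, as in the benchmark, the allocation branch is taken exactly once, which matches the observation that malloc is called only one time.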

<snip>

>> strcpy(a,b);
strcpy(a+len_b,c);
How about...
memcpy(a,b,len_b);
memcpy(a+len_b,c,len_c+1);
Alternatively lose the +1 and add a[len_a]='\0';

Good catch. That should be faster.

I've written a mystrdup implementation and thought very carefully about
it because it was likely to be called a lot of times. Same principle. Of
course, there is no guarantee of it being faster, we just both expect it
to be :-)
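For reference, the memcpy variant suggested above can be written as a small function. The name concat_known is invented for this sketch; the caller is assumed to have already computed both lengths and to supply a destination with room for len_b + len_c + 1 bytes.

```c
#include <string.h>

/* Hypothetical helper: copy b then c into dst using known lengths.
   dst must have room for len_b + len_c + 1 bytes. */
void concat_known(char *dst, const char *b, size_t len_b,
                  const char *c, size_t len_c)
{
    memcpy(dst, b, len_b);
    /* The +1 copies c's terminating '\0' along with its characters. */
    memcpy(dst + len_b, c, len_c + 1);
}
```

Whether this beats strcpy depends on the library and compiler, as noted above; the only certain gain is that neither call has to search for a terminator.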

<snip>
--
Flash Gordon
If spamming me sent it to sm**@spam.causeway.com
If emailing me use my reply-to address
See the comp.lang.c Wiki hosted by me at http://clc-wiki.net/

Oct 9 '08 #29

Flash Gordon

Bartc wrote, On 09/10/08 00:24:

>
"Flash Gordon" <sm**@spam.causeway.com> wrote in message
news:sm************@news.flash-gordon.me.uk...
>Bartc wrote, On 08/10/08 23:38:

>>void addstring(char **a, char*b, char*c){
free(*a);
*a=malloc(strlen(b)+strlen(c)+1);

^^^^^^^^^ ^^^^^^^^^

>>// *a=malloc(7);
strcpy(*a,b);
strcat(*a,c);
// memcpy(*a,b,3);
// memcpy(*a+3,c,4);

Note that as has been pointed out you are calling strlen above so you
could actually use the correct values for whatever parameters are passed
in with no additional overhead.

?

Look at the bit of your code I have underlined above. The bit where you
find the lengths of the strings. That you don't keep the information as
was shown by a previous poster in this thread is your poor choice.

>>This took DMC about 3.2 seconds to execute 10 million times.

Taking string length calculation out of the equation (using the
commented
lines above) took 1.8 seconds (don't try this unless b and c have strlen
3!).

Well, 1.8 seconds is less than 2 seconds. Note what I say above about
already having the correct numbers available programmatically for the
memcpy calls.

>>Remember my Python 2.5 took (appropriately) 2.5 seconds; nearer 2.0
seconds without the loop overhead (C's loop overhead was negligible).

Well, since less than 2 seconds is less than "nearer 2 seconds" and
significantly less than 2.5 seconds you have now shown that C is faster
than Python for this task if you write efficient C.

Only just, and by pretending that string lengths are available. Normally
they are not without extra effort.

Ah, but they *are* available. Try adding two extra variables to store
the lengths when you calculate them and then using those variables in
the two places where you need the length.
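A sketch of what that suggestion could look like is given below. It keeps the addstring name from the thread but is not the poster's code: the const qualifiers, the NULL-on-failure behaviour, and the choice to allocate before freeing (so that b or c may alias the old *a) are assumptions made for this example.

```c
#include <stdlib.h>
#include <string.h>

/* Sketch: *a = b + c, releasing whatever *a pointed to before.
   Each length is computed once and reused for malloc and memcpy.
   On allocation failure *a is set to NULL. */
void addstring(char **a, const char *b, const char *c)
{
    size_t len_b = strlen(b);
    size_t len_c = strlen(c);
    char *p = malloc(len_b + len_c + 1);

    if (p != NULL) {
        memcpy(p, b, len_b);
        memcpy(p + len_b, c, len_c + 1);   /* includes the '\0' */
    }
    free(*a);   /* done after the copy, so b or c may be the old *a */
    *a = p;
}
```

Compared with the strcpy/strcat version, the strings are still scanned by strlen, but strcat no longer has to rescan the destination to find its end.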

But other posters have said their C code is quite a bit faster than
Python, by the sort of factors I would have expected.

Try turning your optimiser up!
--
Flash Gordon
If spamming me sent it to sm**@spam.causeway.com
If emailing me use my reply-to address
See the comp.lang.c Wiki hosted by me at http://clc-wiki.net/

Oct 9 '08 #30

Richard Bos

"Bartc" <bc@freeuk.com> wrote:

However, this code doesn't realistically emulate the short string handling I
was trying to test; in practice each string will have a different length,
and the destination may already have content that needs to be cleared.

Then ISTM that your original benchmark wasn't greatly to the point,
because it didn't take any of that into account, either.

Richard

Oct 9 '08 #31

Dik T. Winter

In article <sl********************@snail.stack.nl> Willem <wi****@stack.nl> writes:

Dik T. Winter wrote:
) The difference in time is almost certainly because Python keeps track of
) the length of strings, which C does not. And moreover, Python probably
) uses a special memory allocator.

I disagree.
The difference is almost certainly because of the memory allocator.

Keeping track of the length of the string has a negligible effect on such
small strings.

All operations are on small strings. But I did measure and I found a 25%
difference when I removed strlen and kept track of the length of the strings
in another way.
--
dik t. winter, cwi, kruislaan 413, 1098 sj amsterdam, nederland, +31205924131
home: bovenover 215, 1025 jn amsterdam, nederland; http://www.cwi.nl/~dik/

Oct 9 '08 #32

Dik T. Winter

In article <Si******************@text.news.virginmedia.com> "Bartc" <bc@freeuk.com> writes:

"Flash Gordon" <sm**@spam.causeway.com> wrote in message
news:sm************@news.flash-gordon.me.uk...

....

Well, since less than 2 seconds is less than "nearer 2 seconds" and
significantly less than 2.5 seconds you have now shown that C is faster
than Python for this task if you write efficient C.

Only just, and by pretending that string lengths are available. Normally
they are not without extra effort.

In Python the string lengths are always available without effort, that is
by design. You can do the same in C if you design your program properly
but in that case the language will not help you.
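One way to "design your program properly" in this sense is a counted string, where the length is stored next to the data. The following is a minimal sketch and not a description of how Python implements its strings; struct lstr, lstr_from and lstr_concat are hypothetical names introduced only for illustration.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical counted-string type: the length travels with the data. */
struct lstr {
    size_t len;
    char *data;   /* len characters followed by a terminating '\0' */
};

/* Build a counted string from a C string; data is NULL on failure. */
struct lstr lstr_from(const char *s)
{
    struct lstr r;
    r.len = strlen(s);               /* the only strlen call needed */
    r.data = malloc(r.len + 1);
    if (r.data != NULL)
        memcpy(r.data, s, r.len + 1);
    else
        r.len = 0;
    return r;
}

/* Concatenate two valid counted strings (data must be non-NULL).
   No strlen is needed because both lengths are already stored. */
struct lstr lstr_concat(struct lstr b, struct lstr c)
{
    struct lstr r;
    r.len = b.len + c.len;
    r.data = malloc(r.len + 1);
    if (r.data != NULL) {
        memcpy(r.data, b.data, b.len);
        memcpy(r.data + b.len, c.data, c.len + 1);
    } else {
        r.len = 0;
    }
    return r;
}
```

The caller owns each data pointer and must free it; that bookkeeping is the "extra effort" the language does not provide.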
--
dik t. winter, cwi, kruislaan 413, 1098 sj amsterdam, nederland, +31205924131
home: bovenover 215, 1025 jn amsterdam, nederland; http://www.cwi.nl/~dik/

Oct 9 '08 #33

Richard Harter

On Wed, 08 Oct 2008 22:38:58 GMT, "Bartc" <bc@freeuk.com> wrote:

>
"Richard Harter" <cr*@tiac.net> wrote in message
news:48****************@news.sbtc.net...

<snip>

>Below is some code that, in the grand tradition of clc, has not
been tested. It is probably closer to what the internal code in
Python is actually doing in your test. You might take a look at
it and compare its timings with your original code.

I ran your code, with a mod or two (setting a=buf for example).

My original timings were about 4.5 seconds for each of lccwin32 and Mingw.
DMC just tested was 3.2 seconds.

Your code produced 2 seconds for lccwin32, but Mingw and DMC showed a
dramatic reduction to 0.15 seconds.

However, this code doesn't realistically emulate the short string handling I
was trying to test; in practice each string will have a different length,
and the destination may already have content that needs to be cleared.
Imagine a function:

void addstring(char **a, char *b, char *c);

which does the equivalent of a=b+c, /where *a may already point to a
string/.

<snip>

Your timings are interesting. I think, however, that you have
discovered an unfortunate truth and haven't quite understood the
implications of what you have discovered. Don't take this amiss
- I think many strange things.

The truth is that the C library is the consolidation and
clarification of existing practice as it was 20-30 years ago in
environments quite different from those of today. It was never
designed to provide efficient general purpose string processing.
If you want that, you have to build it yourself, or use a package
that someone else developed.

Many of us have done just that, including yourself from what you
say. That wheel has been reinvented numerous times. It is
unfortunate that the standardizers of twenty years ago did not
mandate a standard package, but it would have been quite
impossible. Clarifying existing practice and filling in the
holes was enough of a challenge.

Richard Harter, cr*@tiac.net
http://home.tiac.net/~cri, http://www.varinoma.com
Save the Earth now!!
It's the only planet with chocolate.

Oct 9 '08 #34

MisterE

OK, this code is naive and simplistic, but how else would you do it in C?

(BTW I've omitted malloc checking, which is in my own code and I presume
is
in Python.)

Why are you malloc'ing? You might as well compare apples to oranges. Without
the mallocing the program runs waaaaaaaaaaaaay faster.
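A malloc-free variant could look something like the sketch below, assuming the caller can supply a buffer that is large enough. The name concat_into and its (size_t)-1 error return are inventions for this example; note that it no longer models a language where every string result is a fresh allocation.

```c
#include <string.h>

/* Hypothetical helper: concatenate b and c into a caller-supplied
   buffer of `cap` bytes. Returns the result length, or (size_t)-1
   if the result (including the '\0') would not fit. */
size_t concat_into(char *dst, size_t cap, const char *b, const char *c)
{
    size_t len_b = strlen(b);
    size_t len_c = strlen(c);

    if (len_b + len_c + 1 > cap)
        return (size_t)-1;
    memcpy(dst, b, len_b);
    memcpy(dst + len_b, c, len_c + 1);   /* includes the '\0' */
    return len_b + len_c;
}
```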

Oct 11 '08 #35
