473,320 Members | 1,982 Online
Bytes | Software Development & Data Engineering Community
Post Job

Home Posts Topics Members FAQ

Join Bytes to post your question to a community of 473,320 software developers and data experts.

Hmmm, It's Curious

The following code implements a strcat-like function which can receive a
variable number of arguments. I thought it would be faster if I kept a
pointer to the end of the string as it is built so that strcat() would
not always have to find the end of the string as it gets longer and
longer. To test this I called the vstrcat() function with a list of
pointers, then use the regular strcat() function in a loop to build the
same string. After 10,000,000 iteration the regular strcat() won every
time, sometimes by as much as 10 seconds. I've tried several different
approaches (as can be seen with code commented out in the vstrcat()
function) but strcat() always wins. This is using my old DOS C
compiler. On my UNIX machine they are about even.
/************************************************** ********************/
/* File Id. vstrcat.c. */
/* Author: Stan Milam. */
/* Date Written: 01 Jun. 92. */
/* Description: */
/* Implement a variardic string concatenation function which will */
/* be much more efficient than successive calls to strcat(). */
/* */
/************************************************** ********************/

#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <stdarg.h>

/************************************************** ********************/
/* Name: */
/* vstrcat() - Fast, string concatenation. */
/* */
/* Description: */
/* This function receives a variable number of string pointers */
/* and concatenates them together. The only two requirements */
/* is that the first argument which will be the target of the */
/* concatenation must have enough space to contain the entire */
/* concatenated string. The second requirement is the variable */
/* string pointers must be terminated by a NULL pointer. */
/* */
/* Arguments: */
/* char *string - A pointer to a character string where the */
/* remaining character string will be concatenated. */
/* The remaining arguments may vary in number, but must be */
/* pointers to character strings and be NULL terminated. */
/* */
/* Returns: A pointer to the concatenated string. */
/* */
/************************************************** ********************/

char *
vstrcat(char *string, ...)
{
if ( string == NULL )
errno = EINVAL;
else {
va_list argptr;
char *end, *wrk;

/************************************************** ************/
/* Get the address to the list of pointers. Then find the */
/* initial end of the string. */
/************************************************** ************/

va_start(argptr, string);
end = *string == 0 ? string : string + (strlen(string));

/************************************************** ************/
/* Pull each pointer from the list and concatenate to the end */
/* and maintain the end of the string throughout. This way */
/* we only count the characters added once. This speeds up */
/* the process tremendously. */
/************************************************** ************/

while (!((wrk = va_arg(argptr, char *)) == NULL)) {
/*
strcat( end, wrk );
*/
strcpy( end, wrk );
end += strlen( wrk );
}

va_end(argptr);
}
return string;
}

#ifdef TEST
#include <time.h>
#include "adjust.h"

int
main( void )
{
int x_sub;
long l_sub;
time_t begin, end;

char wrkbuf[1000];
char *wrkarray[] = {
"This is a ",
"bunch of strings ",
"that we will concatenate ",
"very efficiently by always ",
"knowing where the end of the string is going to be. ",
"This makes vstrcat() much ",
"more efficient than successive calls to strcat!",
NULL
};

begin = time(NULL);
for ( l_sub = 0; l_sub < 10000000; l_sub++ ) {
memset(wrkbuf, 0, sizeof(wrkbuf) );
vstrcat(wrkbuf, wrkarray[0],
wrkarray[1],
wrkarray[2],
wrkarray[3],
wrkarray[4],
wrkarray[5],
wrkarray[6],
wrkarray[7],
wrkarray[8]);
}
end = time(NULL);
printf("Total time for %ld iterations of vstrcat(): %ld\n",
l_sub, (long) end - begin);

begin = time(NULL);

for( l_sub = 0; l_sub < 10000000; l_sub++ ) {
memset(wrkbuf, 0, sizeof(wrkbuf) );
for ( x_sub = 0; wrkarray[x_sub]; x_sub++ )
strcat( wrkbuf, wrkarray[x_sub] );
}

end = time(NULL);
printf("Total time for strcat(): %ld\n", (long) end - begin);
return 0;
}
#endif /* TEST */
Nov 15 '05 #1
35 1673
Stan Milam wrote:
errno = EINVAL;
EINVAL isn't standard C.
char *wrkarray[] = {
"This is a ",
"bunch of strings ",
"that we will concatenate ",
"very efficiently by always ",
"knowing where the end of the string is going to be. ",
"This makes vstrcat() much ",
"more efficient than successive calls to strcat!",
NULL
That array has 8 elements.
vstrcat(wrkbuf, wrkarray[0],
wrkarray[1],
wrkarray[2],
wrkarray[3],
wrkarray[4],
wrkarray[5],
wrkarray[6],
wrkarray[7],
wrkarray[8]);


wrkarray[8] is the ninth element of an 8 element array.

--
pete
Nov 15 '05 #2
Stan Milam wrote:
.... snip ... char *wrkarray[] = {
"This is a ",
"bunch of strings ",
"that we will concatenate ",
"very efficiently by always ",
"knowing where the end of the string is going to be. ",
"This makes vstrcat() much ",
"more efficient than successive calls to strcat!",
NULL
};

begin = time(NULL);
for ( l_sub = 0; l_sub < 10000000; l_sub++ ) {
memset(wrkbuf, 0, sizeof(wrkbuf) );
vstrcat(wrkbuf, wrkarray[0],
wrkarray[1],
wrkarray[2],
wrkarray[3],
wrkarray[4],
wrkarray[5],
wrkarray[6],
wrkarray[7],
wrkarray[8]);
}


There is no wrkarray[8], so you are invoking undefined behavior
here.

--
"If you want to post a followup via groups.google.com, don't use
the broken "Reply" link at the bottom of the article. Click on
"show options" at the top of the article, then click on the
"Reply" at the bottom of the article headers." - Keith Thompson
Nov 15 '05 #3
CBFalconer wrote:
Stan Milam wrote:

... snip ...
char *wrkarray[] = {
"This is a ",
"bunch of strings ",
"that we will concatenate ",
"very efficiently by always ",
"knowing where the end of the string is going to be. ",
"This makes vstrcat() much ",
"more efficient than successive calls to strcat!",
NULL
};

begin = time(NULL);
for ( l_sub = 0; l_sub < 10000000; l_sub++ ) {
memset(wrkbuf, 0, sizeof(wrkbuf) );
vstrcat(wrkbuf, wrkarray[0],
wrkarray[1],
wrkarray[2],
wrkarray[3],
wrkarray[4],
wrkarray[5],
wrkarray[6],
wrkarray[7],
wrkarray[8]);
}

There is no wrkarray[8], so you are invoking undefined behavior
here.


I fixed it. It is still slower.
Nov 15 '05 #4
Stan Milam wrote:
memset(wrkbuf, 0, sizeof(wrkbuf) );
It is still slower.


Try
*wrkbuf = '\0';
intstead of
memset(wrkbuf, 0, sizeof(wrkbuf) );

--
pete
Nov 15 '05 #5
Stan Milam wrote:
end = *string == 0 ? string : string + (strlen(string));
Why not just
end = string + strlen(string);
?

I'm philosophically opposed to optimizing the degenerate case
at any expense at all to the general case.
{
/*
strcat( end, wrk );
*/
strcpy( end, wrk );
end += strlen( wrk );
}
You could also try:

{
size_t length = strlen( wrk );

memcpy( end, wrk, 1 + length );
end += length ;
}
#include "adjust.h"


It works better on my system if I replace that with
#include <stdio.h>

--
pete
Nov 15 '05 #6
Stan Milam wrote:

The following code implements a strcat-like
function which can receive a
variable number of arguments.
I thought it would be faster if I kept a
pointer to the end of the string as it is built so that strcat() would
not always have to find the end of the string as it gets longer and
longer. To test this I called the vstrcat() function with a list of
pointers,
then use the regular strcat() function in a loop to build the
same string.
After 10,000,000 iteration the regular strcat() won every
time, sometimes by as much as 10 seconds.
I've tried several different
approaches (as can be seen with code commented out in the vstrcat()
function) but strcat() always wins. This is using my old DOS C
compiler. On my UNIX machine they are about even.


I changed it a little bit. This is what I got:

Total tme for strcat(): 9.328000
Total time for 10000000 iterations of vstrcat(): 8.454000

/* BEGIN new.c */

#include <string.h>
#include <stdarg.h>

char *
vstrcat(char *string, ...)
{
va_list argptr;
char *end, *wrk;

end = string + strlen(string);
va_start(argptr, string);
wrk = va_arg(argptr, char *);
while (wrk != NULL) {
strcpy( end, wrk );
end += strlen( wrk ) ;
wrk = va_arg(argptr, char *);
}
va_end(argptr);
return string;
}

#include <time.h>
#include <stdio.h>

int
main( void )
{
long l_sub;
time_t begin, end;
char wrkbuf[1000];
char *wrkarray[] = {
"This is a ",
"bunch of strings ",
"that we will concatenate ",
"very efficiently by always ",
"knowing where the end of the string is going to be. ",
"This makes vstrcat() much ",
"more efficient than successive calls to strcat!",
NULL
};

begin = clock();
for( l_sub = 0; l_sub < 10000000; l_sub++ ) {
*wrkbuf = '\0';
strcat( wrkbuf, wrkarray[0 ] );
strcat( wrkbuf, wrkarray[1 ] );
strcat( wrkbuf, wrkarray[2 ] );
strcat( wrkbuf, wrkarray[3 ] );
strcat( wrkbuf, wrkarray[4 ] );
strcat( wrkbuf, wrkarray[5 ] );
strcat( wrkbuf, wrkarray[6 ] );
}
end = clock();
printf("Total tme for strcat(): %f\n",
(double) (end - begin) / CLOCKS_PER_SEC);
begin = clock();
for ( l_sub = 0; l_sub < 10000000; l_sub++ ) {
*wrkbuf = '\0';
vstrcat(wrkbuf, wrkarray[0],
wrkarray[1],
wrkarray[2],
wrkarray[3],
wrkarray[4],
wrkarray[5],
wrkarray[6]);
}
end = clock();
printf("Total time for %ld iterations of vstrcat(): %f\n",
l_sub, (double) (end - begin) / CLOCKS_PER_SEC);
return 0;
}

/* END new.c */
--
pete
Nov 15 '05 #7
pete wrote:
int
main( void )
{
long l_sub;
time_t begin, end;
That should be clock_t instead of time_t.
printf("Total tme for strcat(): %f\n",
(double) (end - begin) / CLOCKS_PER_SEC); printf("Total time for %ld iterations of vstrcat(): %f\n",
l_sub, (double) (end - begin) / CLOCKS_PER_SEC);


The last argument to both of those printf calls, should be:

(double) (end - begin) / (double) CLOCKS_PER_SEC

in case CLOCKS_PER_SEC is of type long double.

--
pete
Nov 15 '05 #8
On Fri, 24 Jun 2005 05:23:31 +0000, pete wrote:
Stan Milam wrote:
I changed it a little bit. This is what I got:

Total tme for strcat(): 9.328000
Total time for 10000000 iterations of vstrcat(): 8.454000


And this is what I got:

Total tme for strcat(): 8.850000
Total time for 10000000 iterations of vstrcat(): 24.930000
vstrcat(wrkbuf, wrkarray[0], [...] wrkarray[6]);
What happened to wrkarray[7]? You now have no NULL terminator, for your
test: while (wrk != NULL) {
I added it and with that fixed, I got 12.750000s for vstrcat.

So the OP's complaint would appear at first sight to be still true (on
my system anyway). What's going on? Well the answer is that firstly
different functions are being compared: vstrcat is in fact using strcpy
which we are comparing to multiple runs of strcat; and secondly varargs
are being used which for some reason causes the slowdown on my system.

I tried the alternative:
strcat( end, wrk );
and I got 9.290000s for vstrcat.

A lot better but still slower that multiple strcats.

So I tried an alternative function to test whether it was the varargs that
were causing the slowdown:

char *
vstrcat(char *string, char **strings) {
char *end, *wrk;

end = string + strlen(string);
wrk = *strings;
while (wrk) {
strcpy( end, wrk );
end += strlen( wrk ) ;
strings++;
wrk = *strings;
}
return string;
}

This gave times of:

11.700000s (using strcpy as above) - slightly faster than the
varargs-vstrcat; and,
8.650000s for the strcat alternative; this is faster than varargs-vstrcat
_and_ the multiple strcats... so we can finally invalidate our OP's
complaint.

Unfortunately it's less convenient to call this function than the varargs
alternative; but perhaps the OP will just have to accept that his function
isn't necessarily going to save time on all systems - this is true at
least on mine and his; though judging by your post possibly not yours pete.

Using your other suggestion:
size_t length = strlen( wrk );
memcpy( end, wrk, 1 + length );
end += length ;
I got 10.990000 secs for varargs-vstrcat and 11.320000s for array-vstrcat.

No one else has commented on the incongruity of: strcpy( end, wrk );
end += strlen( wrk );


The vstrcat function was designed to save the cost of running strlen on
the entire in-progress string, but here we have a separate duplication
because this strlen repeats some of the preceding strcpy's work. So I
tried replacing these 2 lines with:
for ( ; *end = *wrk; end++, wrk++) ;
and I got 16.870000s for varargs-vstrcat and 18.470000s for array-vstrcat.
I didn't get the improvement I'd hoped for; obviously there's a bit of
optimisation going on in those library functions.

So the moral of the story is... you can't predict savings... but at least
try to compare apples with apples and eliminate/evaluate any differences
between the two approaches where you can.

Nov 15 '05 #9
Netocrat wrote:
On Fri, 24 Jun 2005 05:23:31 +0000, pete wrote:

Stan Milam wrote:

I changed it a little bit. This is what I got:

Total tme for strcat(): 9.328000
Total time for 10000000 iterations of vstrcat(): 8.454000

And this is what I got:

Total tme for strcat(): 8.850000
Total time for 10000000 iterations of vstrcat(): 24.930000

vstrcat(wrkbuf, wrkarray[0],


[...]
wrkarray[6]);

What happened to wrkarray[7]? You now have no NULL terminator, for your
test:
while (wrk != NULL) {

I added it and with that fixed, I got 12.750000s for vstrcat.

So the OP's complaint would appear at first sight to be still true (on
my system anyway). What's going on? Well the answer is that firstly
different functions are being compared: vstrcat is in fact using strcpy
which we are comparing to multiple runs of strcat; and secondly varargs
are being used which for some reason causes the slowdown on my system.

I tried the alternative:
strcat( end, wrk );
and I got 9.290000s for vstrcat.

A lot better but still slower that multiple strcats.

So I tried an alternative function to test whether it was the varargs that
were causing the slowdown:

char *
vstrcat(char *string, char **strings) {
char *end, *wrk;

end = string + strlen(string);
wrk = *strings;
while (wrk) {
strcpy( end, wrk );
end += strlen( wrk ) ;
strings++;
wrk = *strings;
}
return string;
}

This gave times of:

11.700000s (using strcpy as above) - slightly faster than the
varargs-vstrcat; and,
8.650000s for the strcat alternative; this is faster than varargs-vstrcat
_and_ the multiple strcats... so we can finally invalidate our OP's
complaint.

Unfortunately it's less convenient to call this function than the varargs
alternative; but perhaps the OP will just have to accept that his function
isn't necessarily going to save time on all systems - this is true at
least on mine and his; though judging by your post possibly not yours pete.

Using your other suggestion:
size_t length = strlen( wrk );
memcpy( end, wrk, 1 + length );
end += length ;
I got 10.990000 secs for varargs-vstrcat and 11.320000s for array-vstrcat.

No one else has commented on the incongruity of:
strcpy( end, wrk );
end += strlen( wrk );

The vstrcat function was designed to save the cost of running strlen on
the entire in-progress string, but here we have a separate duplication
because this strlen repeats some of the preceding strcpy's work. So I
tried replacing these 2 lines with:
for ( ; *end = *wrk; end++, wrk++) ;
and I got 16.870000s for varargs-vstrcat and 18.470000s for array-vstrcat.
I didn't get the improvement I'd hoped for; obviously there's a bit of
optimisation going on in those library functions.

So the moral of the story is... you can't predict savings... but at least
try to compare apples with apples and eliminate/evaluate any differences
between the two approaches where you can.


Thanks! I really wasn't complaining. This is the kind of discussion I
was hoping for, not more o that "undefined behavior" shit. I just
thought it curious that I was trying to get around what appeared obvious
to me: that strcat() would have to spend time finding the end of a
continually longer string upon each call. I did a little digging into
the implementation of the string functions for my old, cheap C compiler.
The string functions were written in assembler and strcat() is the
model of efficiency. It sets a count register to 0, establishes an
address to the string in an index register then uses just one
instruction to find the end of the string. You can't beat that. Now,
if the string library had been written in C I think my vstrcat() might
have performed better. Hmmm, I feel a new project coming on.... :-)

To the poster who pointed out the EINVAL is not standard. Thanks, I did
not know that. However, I pulled out my copy of Plauger's "Standard C
Library" and found there are only three or four required values (EDOM,
ERANGE) and perhaps a couple of others, but an implementation may define
it's own set of values and be completely within the standard. So, I
suppose that is where EINVAL comes in. Anyway, it has always worked on
any machine I've ever developed on.

Regards,
Stan Milam.
Nov 15 '05 #10
Stan Milam wrote:
I did a little digging into
the implementation of the string functions for my old, cheap C compiler.
The string functions were written in assembler and strcat() is the
model of efficiency. It sets a count register to 0, establishes an
address to the string in an index register then uses just one
instruction to find the end of the string. You can't beat that.


That isn't obvious. Sometimes, complicated single instructions
run more slowly than sequence of simple instructions with the
same effect (and are also less amenable to optimisation).

[I'm particularly thinking of the VAX `index` instruction, which
on at least some machines was slower than the corresponding
sequence of simple integer instructions. It all depends on how
the instruction is implemented and how many special cases it
has to cater for and how much room there is on the chip and
whether the moon is made of green cheese.]

--
Chris "electric hedgehog" Dollin
It's called *extreme* programming, not *stupid* programming.
Nov 15 '05 #11
Netocrat wrote:
No one else has commented on the incongruity of:
strcpy( end, wrk );
end += strlen( wrk );

The vstrcat function was designed to save the cost of running strlen on
the entire in-progress string, but here we have a separate duplication
because this strlen repeats some of the preceding strcpy's work. So I
tried replacing these 2 lines with:
for ( ; *end = *wrk; end++, wrk++) ;
and I got 16.870000s for varargs-vstrcat and 18.470000s for array-vstrcat.
I didn't get the improvement I'd hoped for; obviously there's a bit of
optimisation going on in those library functions.

So the moral of the story is... you can't predict savings... but at least
try to compare apples with apples and eliminate/evaluate any differences
between the two approaches where you can.


You are completely correct. That is incongrous. I changed the crital
loop within vstrcat to:
while (!((wrk = va_arg(argptr, char *)) == NULL))
while ( *wrk != 0 ) *end++ = *wrk++;

The result? More than a 50% increase in speed! It now blows away the
repeatitive strcat().

Thanks for jarring the mental cobwebs!

Regards,
Stan Milam.
Nov 15 '05 #12
Chris Dollin wrote:
Stan Milam wrote:

I did a little digging into
the implementation of the string functions for my old, cheap C compiler.
The string functions were written in assembler and strcat() is the
model of efficiency. It sets a count register to 0, establishes an
address to the string in an index register then uses just one
instruction to find the end of the string. You can't beat that.

That isn't obvious. Sometimes, complicated single instructions
run more slowly than sequence of simple instructions with the
same effect (and are also less amenable to optimisation).

[I'm particularly thinking of the VAX `index` instruction, which
on at least some machines was slower than the corresponding
sequence of simple integer instructions. It all depends on how
the instruction is implemented and how many special cases it
has to cater for and how much room there is on the chip and
whether the moon is made of green cheese.]


Well, I was trying not to be too specific. I used to program a fair
amount in 8086 assembler (seems like a lifetime ago) and I was refering
to the "repnz scasb" instruction which is extremely fast.

Stan.
Nov 15 '05 #13
On Fri, 24 Jun 2005 14:06:08 +0000, Stan Milam wrote:
Netocrat wrote:
<snip analysis>
Thanks! I really wasn't complaining.
Inaccurate manner-of-speech: s/complaint/interesting problem/.
This is the kind of discussion I
was hoping for, not more o that "undefined behavior" shit. I just thought
it curious that I was trying to get around what appeared obvious to me:
that strcat() would have to spend time finding the end of a continually
longer string upon each call.
Well I find that undefined behaviour discussion useful since it's good to
know whether your code is guaranteed to compile properly on
standards-compliant compilers. But, yes I do find it interesting working
out what's going on with problems like this, especially when results don't
match my intuitions/summary analysis. It's counter to what I would expect
that using varargs actually impedes performance on my machine to the point
that it eliminates the gains we get from avoiding strcat having to seek to
the end of the increasing string. But there it is - and I can't find
fault with the test.
I did a little digging into the
implementation of the string functions for my old, cheap C compiler.
The string functions were written in assembler and strcat() is the
model of efficiency. It sets a count register to 0, establishes an
address to the string in an index register then uses just one
instruction to find the end of the string. You can't beat that.
It sounds fast, but you weren't trying to beat it; you were eliminating it
altogether. So no matter what, I would have expected your function to
perform better... except that there are obviously other factors at play
which I _believe_ is the varargs - seems to be on my system anyway.
Now, if the string library had been written in C I think my vstrcat()
might have performed better. Hmmm, I feel a new project coming on.... :-)


If you're going to be rewriting your function in assembler, consider an
assembler implementation of my for loop - it will avoid the duplication of
the strlen.

Nov 15 '05 #14
"pete" <pf*****@mindspring.com> wrote in message
news:42***********@mindspring.com...
Stan Milam wrote:

The following code implements a strcat-like
function which can receive a
variable number of arguments.
I thought it would be faster if I kept a
pointer to the end of the string as it is built so that strcat() would
not always have to find the end of the string as it gets longer and
longer. To test this I called the vstrcat() function with a list of
pointers,
then use the regular strcat() function in a loop to build the
same string.
After 10,000,000 iteration the regular strcat() won every
time, sometimes by as much as 10 seconds.
I've tried several different
approaches (as can be seen with code commented out in the vstrcat()
function) but strcat() always wins. This is using my old DOS C
compiler. On my UNIX machine they are about even.
I changed it a little bit. This is what I got:

Total tme for strcat(): 9.328000
Total time for 10000000 iterations of vstrcat(): 8.454000


It shouldn't have worked at all, the call to vstrcat() invokes UBH
how will va_arg know where to stop? (no (char *)NULL terminator)

As for your implementation, a little messy and contains unnecessary
overhead...
Here's how I would code it:

char *
vstrcat(char *s, ...)
{
char *p = &s[strlen(s)], *wrk;
va_list ap;
va_start(ap, s);
while((wrk = va_arg(ap, char *)) != NULL)
p += sprintf(p, wrk);
va_end(ap);
return s;
}
Untested (I know it won't work with the driver below for as
I've previously stated, it's flawed) but it should outperform
both your and the original poster's functions ;)
If you do decide to test it, let us know the results.

Mark


/* BEGIN new.c */

#include <string.h>
#include <stdarg.h>

char *
vstrcat(char *string, ...)
{
va_list argptr;
char *end, *wrk;

end = string + strlen(string);
va_start(argptr, string);
wrk = va_arg(argptr, char *);
while (wrk != NULL) {
strcpy( end, wrk );
end += strlen( wrk ) ;
wrk = va_arg(argptr, char *);
}
va_end(argptr);
return string;
}

#include <time.h>
#include <stdio.h>

int
main( void )
{
long l_sub;
time_t begin, end;
char wrkbuf[1000];
char *wrkarray[] = {
"This is a ",
"bunch of strings ",
"that we will concatenate ",
"very efficiently by always ",
"knowing where the end of the string is going to be. ",
"This makes vstrcat() much ",
"more efficient than successive calls to strcat!",
NULL
};

begin = clock();
for( l_sub = 0; l_sub < 10000000; l_sub++ ) {
*wrkbuf = '\0';
strcat( wrkbuf, wrkarray[0 ] );
strcat( wrkbuf, wrkarray[1 ] );
strcat( wrkbuf, wrkarray[2 ] );
strcat( wrkbuf, wrkarray[3 ] );
strcat( wrkbuf, wrkarray[4 ] );
strcat( wrkbuf, wrkarray[5 ] );
strcat( wrkbuf, wrkarray[6 ] );
}
end = clock();
printf("Total tme for strcat(): %f\n",
(double) (end - begin) / CLOCKS_PER_SEC);
begin = clock();
for ( l_sub = 0; l_sub < 10000000; l_sub++ ) {
*wrkbuf = '\0';
vstrcat(wrkbuf, wrkarray[0],
wrkarray[1],
wrkarray[2],
wrkarray[3],
wrkarray[4],
wrkarray[5],
wrkarray[6]);
}
end = clock();
printf("Total time for %ld iterations of vstrcat(): %f\n",
l_sub, (double) (end - begin) / CLOCKS_PER_SEC);
return 0;
}

/* END new.c */
--
pete

Nov 15 '05 #15
Stan Milam <st*****@swbell.net> writes:
I used to program a fair
amount in 8086 assembler (seems like a lifetime ago) and I was
refering to the "repnz scasb" instruction which is extremely fast.


As I understand it, `repnz scasb' is faster than the equivalent
sequence of simple instructions on some chips, but slower on
others.
--
"This is a wonderful answer.
It's off-topic, it's incorrect, and it doesn't answer the question."
--Richard Heathfield
Nov 15 '05 #16
Ben Pfaff wrote:
Stan Milam <st*****@swbell.net> writes:

I used to program a fair
amount in 8086 assembler (seems like a lifetime ago) and I was
refering to the "repnz scasb" instruction which is extremely fast.

As I understand it, `repnz scasb' is faster than the equivalent
sequence of simple instructions on some chips, but slower on
others.


I have heard that before, but like I said it was a long time ago for me
and I have not kept up with that corner of the tech field. Now I use C
for just about everything :-).

Stan.
Nov 15 '05 #17
On Fri, 24 Jun 2005 16:53:01 +0000, Mark wrote:
"pete" <pf*****@mindspring.com> wrote in message
news:42***********@mindspring.com...
Stan Milam wrote:
As for your implementation, a little messy and contains unnecessary
overhead... [snip code] Untested (I know it won't work with the driver below for as I've
previously stated, it's flawed) but it should outperform both your and the
original poster's functions ;) If you do decide to test it, let us know
the results.


Approximately 4 times slower on my machine. Why did you expect better
performance? Even though you are getting back the length and don't have
to do a separate strlen, sprintf is by its nature much more complex than
strcat or strcpy and no doubt these two simple library functions are
much more highly optimised for this task, probably by using processor
instructions that copy many bytes at a time, which I would not expect
sprintf to be optimised to do.

But yes, your implementation is neat and concise.

Nov 15 '05 #18
Stan Milam wrote:
.... snip ...
Well, I was trying not to be too specific. I used to program a
fair amount in 8086 assembler (seems like a lifetime ago) and I
was refering to the "repnz scasb" instruction which is extremely
fast.


Which has widely variable running time, and has to access every
byte on the way up. In the bad old days it was also subject to
failure to restart after an interrupt.

--
"If you want to post a followup via groups.google.com, don't use
the broken "Reply" link at the bottom of the article. Click on
"show options" at the top of the article, then click on the
"Reply" at the bottom of the article headers." - Keith Thompson
Nov 15 '05 #19
On Fri, 24 Jun 2005 17:13:55 +0000, Stan Milam wrote:
Ben Pfaff wrote:
Stan Milam <st*****@swbell.net> writes:
I used to program a fair
amount in 8086 assembler (seems like a lifetime ago) and I was refering
to the "repnz scasb" instruction which is extremely fast.


As I understand it, `repnz scasb' is faster than the equivalent sequence
of simple instructions on some chips, but slower on others.


I have heard that before, but like I said it was a long time ago for me
and I have not kept up with that corner of the tech field. Now I use C
for just about everything :-).

Stan.


Stan if you're interested in collaborating on some fusion asm-C, send me
an email at the address above.

Nov 15 '05 #20

"Netocrat" <ne******@dodo.com.au> wrote in message
news:pa****************************@dodo.com.au...
On Fri, 24 Jun 2005 16:53:01 +0000, Mark wrote:
"pete" <pf*****@mindspring.com> wrote in message
news:42***********@mindspring.com...
Stan Milam wrote:

As for your implementation, a little messy and contains unnecessary
overhead...

[snip code]
Untested (I know it won't work with the driver below for as I've
previously stated, it's flawed) but it should outperform both your and
the
original poster's functions ;) If you do decide to test it, let us know
the results.


Approximately 4 times slower on my machine. Why did you expect better
performance? Even though you are getting back the length and don't have
to do a separate strlen, sprintf is by its nature much more complex than
strcat or strcpy and no doubt these two simple library functions are
much more highly optimised for this task, probably by using processor
instructions that copy many bytes at a time, which I would not expect
sprintf to be optimised to do.

But yes, your implementation is neat and concise.


You're right... I guess I should have tested it first.
I expected sprintf()'s performance to be nearly the same
as strcpy()'s when no conversion specifiers were present
in the format string... and that 1 pass with sprintf()
would outperform the 2 passes needed by strcpy() and strlen().
Guess I was mistaken. :-)
Here's another version... tested this time... and this one should
substantially outperform the other.

char *
vstrcat(char *s, ...)
{
register char *p = &s[strlen(s)];
register char *arg;
va_list ap;
va_start(ap, s);
while((arg = va_arg(ap, char *)) != NULL)
while((*p++ = *arg++) != 0);
va_end(ap);
return s;
}

Mark
Nov 15 '05 #21

"Mark" <so***@localbar.com> wrote in message
news:O1******************@newshog.newsread.com...

"Netocrat" <ne******@dodo.com.au> wrote in message
news:pa****************************@dodo.com.au...
On Fri, 24 Jun 2005 16:53:01 +0000, Mark wrote:
"pete" <pf*****@mindspring.com> wrote in message
news:42***********@mindspring.com...
Stan Milam wrote:

As for your implementation, a little messy and contains unnecessary
overhead...

[snip code]
Untested (I know it won't work with the driver below for as I've
previously stated, it's flawed) but it should outperform both your and
the
original poster's functions ;) If you do decide to test it, let us know
the results.


Approximately 4 times slower on my machine. Why did you expect better
performance? Even though you are getting back the length and don't have
to do a separate strlen, sprintf is by its nature much more complex than
strcat or strcpy and no doubt these two simple library functions are
much more highly optimised for this task, probably by using processor
instructions that copy many bytes at a time, which I would not expect
sprintf to be optimised to do.

But yes, your implementation is neat and concise.


You're right... I guess I should have tested it first.
I expected sprintf()'s performance to be nearly the same
as strcpy()'s when no conversion specifiers were present
in the format string... and that 1 pass with sprintf()
would outperform the 2 passes needed by strcpy() and strlen().
Guess I was mistaken. :-)
Here's another version... tested this time... and this one should
substantially outperform the other.

char *
vstrcat(char *s, ...)
{
register char *p = &s[strlen(s)];
register char *arg;
va_list ap;
va_start(ap, s);
while((arg = va_arg(ap, char *)) != NULL)
while((*p++ = *arg++) != 0);
va_end(ap);
return s;
}

Mark


Might as well post it before someone else does...
without optimizations - my new function still sucks!
Following results with 10,000,000 iterations on my system

gcc X.c
strcat() = 12.460928
pete-vstrcat=9.812500
mine=15.085938

with optimizations, very different results
gcc -O1 X.c
strcat() = 12.437500
pete-vstrcat=9.164062
mine=4.445312

gcc -02 X.c
strcat() = 12.468750
pete-vstrcat=9.156250
mine=4.468750

gcc -03 X.c
strcat() = 12.453125
pete-vstrcat=9.148438
mine=4.460938

Neither the strcat() or pete's vstrcat() seem to be
affected much by the optimizations, though it makes
one hell of a difference to my code!

Mark
Nov 15 '05 #22
On Fri, 24 Jun 2005 18:11:58 +0000, Mark wrote:

"Netocrat" <ne******@dodo.com.au> wrote in message
news:pa****************************@dodo.com.au...
On Fri, 24 Jun 2005 16:53:01 +0000, Mark wrote:
"pete" <pf*****@mindspring.com> wrote in message
news:42***********@mindspring.com...
Stan Milam wrote:

Here's another version... tested this time...


Not too thoroughly though :P

I already posted an equivalent solution to that except that it didn't use
the register keyword, and boy was it slow. With register though it is the
best solution.

Anyhow the problem is that the increments should occur within the loop,
not in the test. A for loop is more appropriate. Also the != 0 is
redundant.
while((*p++ = *arg++) != 0); becomes
for( ; *p = *arg; p++, arg++) ;
and this one should
substantially outperform the other.


It sure does. It's about 25% faster than multiple strcats.

Nov 15 '05 #23
On Fri, 24 Jun 2005 18:29:04 +0000, Mark wrote:
Might as well post it before someone else does... without optimizations -
my new function still sucks! Following results with 10,000,000 iterations
on my system

gcc X.c
strcat() = 12.460928
pete-vstrcat=9.812500
mine=15.085938

with optimizations, very different results gcc -O1 X.c
strcat() = 12.437500
pete-vstrcat=9.164062
mine=4.445312

gcc -02 X.c
strcat() = 12.468750
pete-vstrcat=9.156250
mine=4.468750

gcc -03 X.c
strcat() = 12.453125
pete-vstrcat=9.148438
mine=4.460938

Neither the strcat() or pete's vstrcat() seem to be affected much by the
optimizations, though it makes one hell of a difference to my code!


Actually that's it - I thought it was the register keyword that improved
my originally very slow for loop, but I tested it with/without that
keyword and with/without optimisations:

reg no reg
opt 6.98 6.98
no opt 15.59 18.88

Seems that registers are being used anyhow under optimisation; without
optimisation there's not very much improvement using registers. Which
goes to show what I'm coming to believe more and more... leave
optimisation to the compiler. Keywords like register are funky and all,
but in most cases that sort of decision is best made by the compiler.

Nov 15 '05 #24

"Netocrat" <ne******@dodo.com.au> wrote in message
news:pa****************************@dodo.com.au...
On Fri, 24 Jun 2005 18:11:58 +0000, Mark wrote:
Not too thoroughly though :P Pretty thouroughly, ;) results posted elsewhere ;)
Anyhow the problem is that the increments should occur within the loop,
not in the test. It is being done in the loop, after the test... no?
A for loop is more appropriate. Also the != 0 is
redundant.
I prefer the redundancy in that instance (assignment in condition) as it
makes the code clear to others who read it someday!

for instance, consider:
if(a = b)
valid line, may be exactly what you want to do, but forces others to examine
surrounding code to see if you've made a mistake... not to mention the fact
that your compiler may complain about it if you've got warnings turned on.

if((a = b))
also a valid line, may stop your compiler from complaining, but forces
others to examine surrounding code to wonder if you threw that in merely
to stop the compiler from bitching...

if((a = b) != 0)
equivalent to both previous examples... but now both your compiler
AND your co-workers should be happy. No question as to your intentions.

while((*p++ = *arg++) != 0);
becomes
for( ; *p = *arg; p++, arg++) ;

I did use a for() loop initially (noticed that's what my implementation
did in strcpy()) but I got slightly better results with the while() for some
reason when I was testing... so that's the version I posted.

Mark
Nov 15 '05 #25
CBFalconer wrote:
Stan Milam wrote:

... snip ...
Well, I was trying not to be too specific. I used to program a
fair amount in 8086 assembler (seems like a lifetime ago) and I
was refering to the "repnz scasb" instruction which is extremely
fast.

Which has widely variable running time, and has to access every
byte on the way up. In the bad old days it was also subject to
failure to restart after an interrupt.


I never heard about the failure to restart, or maybe I forgot(it was a
long time ago), and it is true that it scanned every byte on the way up,
but since it is a single instruction executing in the processor it is
way faster than say C code running down to find the end of the string as
a C version of strcat() would have to do.

Regards,
Stan.
Nov 15 '05 #26
On Fri, 24 Jun 2005 19:30:06 +0000, Mark wrote:

"Netocrat" <ne******@dodo.com.au> wrote in message
news:pa****************************@dodo.com.au...
On Fri, 24 Jun 2005 18:11:58 +0000, Mark wrote: Not too thoroughly
though :P Pretty thouroughly, ;) results posted elsewhere ;)


You have a good case to make that you did some benchmarking. As for
thorough testing... well, here's the output of your function:

"This is a "

which should have been

"This is a bunch of strings that we will concatenate very efficiently by
always knowing where the end of the string is going to be. This makes
vstrcat() much more efficient than successive calls to strcat!"

Doh! :P
Anyhow the problem is that the increments should occur within the loop,
not in the test.

It is being done in the loop, after the test... no?


Sure thing. Problem is that even when the test is true and the loop
exits, the increments still occur. So the pointers get incremented past
the terminating '\0'.

When you use a for loop or put the increments into the loop body
rather than the test, this isn't a problem because the increments don't
occur when the loop exits.
A for loop is more appropriate. Also the != 0 is
redundant.


I prefer the redundancy in that instance (assignment in condition) as it
makes the code clear to others who read it someday!

[snip] if((a = b) != 0)
equivalent to both previous examples... but now both your compiler AND
your co-workers should be happy. No question as to your intentions.


Fair enough, it's a matter of preference and style and you have reasons
for your choice.
while((*p++ = *arg++) != 0);
becomes
for( ; *p = *arg; p++, arg++) ;

I did use a for() loop initially (noticed that's what my implementation
did in strcpy()) but I got slightly better results with the while() for
some reason when I was testing... so that's the version I posted.


Yes I noticed that the incorrect version is slightly faster than the fixed
version under gcc - which compiler we're both using; I don't know why -
it's counter-intuitive - the code requires more increment operations yet
it's faster...

Nov 15 '05 #27
"Netocrat" <ne******@dodo.com.au> wrote in message
news:pa****************************@dodo.com.au...
On Fri, 24 Jun 2005 19:30:06 +0000, Mark wrote:
"Netocrat" <ne******@dodo.com.au> wrote in message
news:pa****************************@dodo.com.au...
On Fri, 24 Jun 2005 18:11:58 +0000, Mark wrote: Not too thoroughly
though :P

Pretty thouroughly, ;) results posted elsewhere ;)

You have a good case to make that you did some benchmarking. As for
thorough testing... well, here's the output of your function:
"This is a "
which should have been
"This is a bunch of strings that we will concatenate very efficiently by
always knowing where the end of the string is going to be. This makes
vstrcat() much more efficient than successive calls to strcat!"
Doh! :P

:-) :-) :-) :-) :-) :-) :-) :-)
ouch, you got me ;-)
(those are my shit-eating grins)

Nov 15 '05 #28
Stan Milam <st*****@swbell.net> writes:
Ben Pfaff wrote:
Stan Milam <st*****@swbell.net> writes:
I used to program a fair
amount in 8086 assembler (seems like a lifetime ago) and I was
refering to the "repnz scasb" instruction which is extremely fast.

As I understand it, `repnz scasb' is faster than the equivalent
sequence of simple instructions on some chips, but slower on
others.


I have heard that before, but like I said it was a long time ago for
me and I have not kept up with that corner of the tech field. Now I
use C for just about everything :-).


The point I was trying to make is that performance is very much
system-dependent and thus not a great topic for comp.lang.c,
which focuses on the C language, independent of particular
implementations.
--
"This is a wonderful answer.
It's off-topic, it's incorrect, and it doesn't answer the question."
--Richard Heathfield
Nov 15 '05 #29

On 25/06/2005 01:20, Ben Pfaff wrote:
Stan Milam <st*****@swbell.net> writes:
Ben Pfaff wrote:
Stan Milam <st*****@swbell.net> writes:

I used to program a fair
amount in 8086 assembler (seems like a lifetime ago) and I was
refering to the "repnz scasb" instruction which is extremely fast.
As I understand it, `repnz scasb' is faster than the equivalent
sequence of simple instructions on some chips, but slower on
others.


I have heard that before, but like I said it was a long time ago for
me and I have not kept up with that corner of the tech field. Now I
use C for just about everything :-).


The point I was trying to make is that performance is very much
system-dependent and thus not a great topic for comp.lang.c,
which focuses on the C language, independent of particular
implementations.


But it's good to write portable C code more likely to be optimized, so
that's not completely off-topic either. Indeed, it's worth the "pain" of
learning the C standard, and complying to it, if we know that the compiler
is able to optimize, and in which way, because then we don't need to use
processor specific extensions, or at least we need them less often.
So it's a good advertisement to show how portable C is translated.
There were already examples on this NG, with the thread on "switch", and the
other on "converting 4 bytes to an int", and probably much more.

Nov 15 '05 #30
pete wrote:
Stan Milam wrote:

errno = EINVAL;

EINVAL isn't standard C.


From the C standard for errno.h:

Additional macro definitions, beginning with E and a digit or E and an
upper-case letter, may also be specified by the implementation.

Regards,
Stan Milam.
Nov 15 '05 #31
>pete wrote:
EINVAL isn't standard C.

In article <fE**************@newssvr30.news.prodigy.com>
Stan Milam <st*****@swbell.net> wrote: From the C standard for errno.h:

Additional macro definitions, beginning with E and a digit or E and an
upper-case letter, may also be specified by the implementation.


Unfortunately, all this means is that you, the programmer, cannot
do things like:

#include <errno.h>
...
/* flags for object description */
#define ELASTIC 1 /* stretchy */
#define ELEPHANTINE 2 /* very large */
#define EMERALD 4 /* a nice green color */
#define EMERY 8 /* rough */
#define EXPENSIVE 16 /* costs over a million bucks */
#define EXQUISITE 32 /* and worth it */

because the implementor might put conflicting definitions of each of
those macros in his <errno.h>. You cannot count on them being in
<errno.h> either, of course -- nor, at least in the context of
"pure ANSI/ISO C", can you count on EINVAL, EISDIR, and E2BIG
being present.

(Aside: I am not sure what is very large, green, rough, stretchy,
and worth over a million bucks... :-) )

I tend to ignore this particular part of the standard, as I consider
taking away "almost every identifier beginning with uppercase E"
a form of overreaching. In fact, I tend to ignore almost everything
the C standard says about <errno.h>, preferring other standards that
are at least more useful (such as POSIX). But here in comp.lang.c
I will not recommend that others do so. :-)
--
In-Real-Life: Chris Torek, Wind River Systems
Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W) +1 801 277 2603
email: forget about it http://web.torek.net/torek/index.html
Reading email is like searching for food in the garbage, thanks to spammers.
Nov 15 '05 #32
Chris Torek wrote:
.... snip ...
Unfortunately, all this means is that you, the programmer, cannot
do things like:

#include <errno.h>
...
/* flags for object description */
#define ELASTIC 1 /* stretchy */
#define ELEPHANTINE 2 /* very large */
#define EMERALD 4 /* a nice green color */
#define EMERY 8 /* rough */
#define EXPENSIVE 16 /* costs over a million bucks */
#define EXQUISITE 32 /* and worth it */

because the implementor might put conflicting definitions of each
of those macros in his <errno.h>. You cannot count on them being
in <errno.h> either, of course -- nor, at least in the context of
"pure ANSI/ISO C", can you count on EINVAL, EISDIR, and E2BIG
being present.

(Aside: I am not sure what is very large, green, rough, stretchy,
and worth over a million bucks... :-) )


ROTFL. Kermit the Frog meets most of the requirements. Will you
settle for 61?

--
"If you want to post a followup via groups.google.com, don't use
the broken "Reply" link at the bottom of the article. Click on
"show options" at the top of the article, then click on the
"Reply" at the bottom of the article headers." - Keith Thompson
Nov 15 '05 #33
Stan Milam wrote:
I never heard about the failure to restart, or maybe I forgot ...


IIRC the problem arose when there were two prefixes, e.g.
REP and segment-override. Restart overlooked one of the
prefixes.

Talk of fast strcpy()/memcpy() loops reminds me of the
68020-based Workstation-3 from WellKnown Respected Vendor, Inc. (tm).
I was amazed that a simple memcpy() in C dramatically outperformed
the library routine from this Highly Respected company, so I
examined their (assembly language) source code. It was a 2-instruction
loop and the comment told the story:
"optimized for the cache of the 68010."
The 68010 had been used in the company's Workstation-2 which had
been out of production for several years!

James

Nov 15 '05 #34
On Mon, 27 Jun 2005 02:32:52 -0700, jdallen2000 wrote:
Stan Milam wrote:
I never heard about the failure to restart, or maybe I forgot ...


IIRC the problem arose when there were two prefixes, e.g. REP and
segment-override. Restart overlooked one of the prefixes.

Talk of fast strcpy()/memcpy() loops reminds me of the 68020-based
Workstation-3 from WellKnown Respected Vendor, Inc. (tm). I was amazed
that a simple memcpy() in C dramatically outperformed the library routine
from this Highly Respected company, so I examined their (assembly
language) source code. It was a 2-instruction loop and the comment told
the story:
"optimized for the cache of the 68010." The 68010 had been used in the
company's Workstation-2 which had been out of production for several
years!


I'm finding something similar with memcmp as implemented by glibc for
pentiums. A naive C implementation compiled with full optimisation
outperforms the library version by more than a factor of three for the
first 3 bytes and is well ahead for at least the first 35. After 100
bytes there doesn't seem to be any significant difference.

The reason: glibc uses the instruction repe; cmpsb which is apparently
quite slow on some chips.

Nov 15 '05 #35
On Tue, 28 Jun 2005 16:48:32 -0500, Netocrat wrote
(in article <pa****************************@dodo.com.au>):
I'm finding something similar with memcmp as implemented by glibc for
pentiums. A naive C implementation compiled with full optimisation
outperforms the library version by more than a factor of three for the
first 3 bytes and is well ahead for at least the first 35. After 100
bytes there doesn't seem to be any significant difference.


think that is interesting, try the MSVC implementation of
memcmp(), it is simply HORRIBLE on large blocks. The native
linux glibc implementation is pretty good, about 30-40% faster
on the same hardware. Resorting to an inline asm implementation
of memcmp() on Windows will get almost identical performance.
Basically, memcmp() with MSVC is not worth using if it forms any
part of the critical path for your application.

Nov 15 '05 #36

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

0
by: James Andrus | last post by:
I don't know anything about making web pages, .NET, etc. I have noticed that my Norton Internet Security (2003) ad/popup blocker is blocking the loading of .aspx pages. The results of a Google...
67
by: Serve La | last post by:
I asked if MS has any plans to support C99 in the next VisualC. This is their answer. I think we should whine more :-) We feel that C++ addresses this space sufficiently. In general we have no...
0
by: G | last post by:
Does every one here use s/w patterns with C#? What is the programming background of you guys and gals? (Java, C++, C, VB, Delphi or other) I'm just curious Regards G
30
by: Tibby | last post by:
Okay, I've made my DLL with COM interop so it can be used with VB6, and it works like a charm, my question is, how do I register it and it's .tlb file on another computer so I can use it there? ...
0
by: DolphinDB | last post by:
The formulas of 101 quantitative trading alphas used by WorldQuant were presented in the paper 101 Formulaic Alphas. However, some formulas are complex, leading to challenges in calculation. Take...
0
isladogs
by: isladogs | last post by:
The next Access Europe meeting will be on Wednesday 6 Mar 2024 starting at 18:00 UK time (6PM UTC) and finishing at about 19:15 (7.15PM). In this month's session, we are pleased to welcome back...
1
isladogs
by: isladogs | last post by:
The next Access Europe meeting will be on Wednesday 6 Mar 2024 starting at 18:00 UK time (6PM UTC) and finishing at about 19:15 (7.15PM). In this month's session, we are pleased to welcome back...
0
by: ArrayDB | last post by:
The error message I've encountered is; ERROR:root:Error generating model response: exception: access violation writing 0x0000000000005140, which seems to be indicative of an access violation...
1
by: PapaRatzi | last post by:
Hello, I am teaching myself MS Access forms design and Visual Basic. I've created a table to capture a list of Top 30 singles and forms to capture new entries. The final step is a form (unbound)...
0
by: CloudSolutions | last post by:
Introduction: For many beginners and individual users, requiring a credit card and email registration may pose a barrier when starting to use cloud servers. However, some cloud server providers now...
0
by: Defcon1945 | last post by:
I'm trying to learn Python using Pycharm but import shutil doesn't work
0
by: Shællîpôpï 09 | last post by:
If u are using a keypad phone, how do u turn on JavaScript, to access features like WhatsApp, Facebook, Instagram....
0
by: af34tf | last post by:
Hi Guys, I have a domain whose name is BytesLimited.com, and I want to sell it. Does anyone know about platforms that allow me to list my domain in auction for free. Thank you

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.