469,602 Members | 1,838 Online
Bytes | Developer Community
New Post

Home Posts Topics Members FAQ

Post your question to a community of 469,602 developers. It's quick & easy.

Comment on trim string function please

Just looking for a few eyes on this code other than my own.

void TrimCString(char *str)
{
// Trim whitespace from beginning:
size_t i = 0;
size_t j;

while(isspace(str[i]))
{
i++;
}
if(i 0)
{
for(j = 0; i < strlen(str); j++, i++)
{
str[j] = str[i];
}
str[j] = '\0';
}

// Trim whitespace from end:
i = strlen(str) - 1;

while(isspace(str[i]))
{
i--;
}
if(i < (strlen(str) - 1))
{
str[i + 1] = '\0';
}
}
Jul 10 '08 #1
121 4307
sw***********@gmail.com <sw***********@gmail.comwrote:
Just looking for a few eyes on this code other than my own.
void TrimCString(char *str)
{
// Trim whitespace from beginning:
I guess I would trim from the end first since then you have less
copying to do afterwards.
size_t i = 0;
size_t j;
while(isspace(str[i]))
isspace() expects an int and not a char as it's argument.
{
i++;
}
if(i 0)
{
for(j = 0; i < strlen(str); j++, i++)
{
str[j] = str[i];
}
str[j] = '\0';
}
An alternative would be to use memmove() here, so you don't
have to do it byte by byte. Also callling strlen() each time
through the loop is a bit of a waste of time - it doesn't
change and can be replaced by a check if str[i] is '\0'.
// Trim whitespace from end:
i = strlen(str) - 1;
Careful: This could set 'i' to -1 (if the string consistet of white
space only) and then the rest won't work anymore.
while(isspace(str[i]))
{
i--;
}
if(i < (strlen(str) - 1))
{
str[i + 1] = '\0';
}
}
Here's an alternative version using pointers (and trying to
minimize the number of calls of strlen()):

void
TrimCString( char *str )
{
char *p,
*q;

/* Check that we've got something that looks like a string */

if ( ! str || ! * str )
return;

/* Trim from end */

for ( p = str + strlen( str ) - 1; p != str && isspace( ( int ) *p ); p-- )
/* empty */ ;

if ( p == str ) /* only white space in string */
{
*str = '\0';
return;
}

*++p = '\0';

/* Trim from start */

for ( q = str; isspace( ( int ) *q ); q++ )
/* empty */ ;

if ( q != str )
memmove( str, q, p - q + 1 );
}
Regards, Jens
--
\ Jens Thoms Toerring ___ jt@toerring.de
\__________________________ http://toerring.de
Jul 10 '08 #2
sw***********@gmail.com wrote:
Just looking for a few eyes on this code other than my own.

void TrimCString(char *str)
Why not return a char *, like most other string functions?

char *TrimCString(char *str)

using char * enables you to piggyback your function in the
middle of other functions, eg
printf("%s\n", TrimCString(someString));
{
// Trim whitespace from beginning:
size_t i = 0;
size_t j;

while(isspace(str[i]))
{
i++;
}
if(i 0)
{
for(j = 0; i < strlen(str); j++, i++)
move the strlen() outside the loop.
Maybe use memmove() instead of the loop.
{
str[j] = str[i];
}
str[j] = '\0';
}

// Trim whitespace from end:
i = strlen(str) - 1;
Use the strlen you computed before, when
you moved it out of the for loop above :)
>
while(isspace(str[i]))
{
i--;
}
if(i < (strlen(str) - 1))
{
str[i + 1] = '\0';
No need for the test.
when i >= (strlen(str) - 1) -- it can only be equal, anyway -- the
assignment overwrites a '\0' with a brand new '\0'.
Anyway, if you want to keep the test, use the computed strlen.
}
}

A couple what-if's

* what if a pass NULL to the function?
TrimCString(NULL);

* what if I pass a constant string literal to the function?
TrimCString(" 4 spaces at both ends ");
Jul 10 '08 #3
sw***********@gmail.com wrote:
Just looking for a few eyes on this code other than my own.
This pair of eyes sees three bugs, two occurring twice each
and the other perhaps due to too much snippage. There are also
some opportunities to improve speed and style. So, here we go:

Bug: Since you're using isspace() and strlen(), you need to
#include <ctype.hand <string.hto get their declarations.
Without the declarations, a compiler operating under C99 rules
must generate a diagnostic. Under C89 rules the code will work,
but might not be as fast as if the vendor's "magic" declarations
were present.
void TrimCString(char *str)
{
// Trim whitespace from beginning:
size_t i = 0;
Style: Instead of initializing `i' at its declaration and then
relying on the initial value later on, consider initializing it
closer to its use. The `for' statement is convenient for such
purposes.
size_t j;

while(isspace(str[i]))
Bug: If `str' contains negative-valued characters, this use
may violate the "contract" of isspace() by passing an out-of-range
argument. Use `while (isspace( (unsigned char)str[i] ))'. (This
is one of those occasions where a cast is almost always required
and almost always omitted, as opposed to the more usual situation
where a cast is almost always inserted and almost always wrong.)
{
i++;
}
if(i 0)
{
for(j = 0; i < strlen(str); j++, i++)
Speed: This loop calculates strlen(str) on every iteration.
Since it will return the same value each time, all calls after the
first are wasted effort. Call strlen() once before the loop and
store the result in a variable, and use that variable to control
the loop.
{
str[j] = str[i];
}
str[j] = '\0';
}

// Trim whitespace from end:
i = strlen(str) - 1;
Bug: If `str' is the empty string (either because it was
empty to begin with or because it contained only white space),
this calculation will produce a very large `i' that is almost
assuredly wrong, R-O-N-G.
while(isspace(str[i]))
Bug: Same missing cast as above.
{
i--;
}
if(i < (strlen(str) - 1))
Bug: Same mishandling of the empty string as above.

Speed: strlen(str) is still the same as it was a few lines
ago, so there's no point in computing it again.
{
str[i + 1] = '\0';
Speed: It's probably quicker -- and certainly less verbose --
to do this assignment unconditionally than to test whether it's
needed. If it isn't, you'll just store a zero on top of an
existing zero, which is harmless.
}
}
Summary: Not quite ready for prime time, but not as bad as
some attempts I've seen.

Challenge: See if you can think of a way to do the job in
just one pass over the string (calling strlen() counts as a
"pass"). Hint: During the copy-to-front operation, can you
somehow figure out where the final '\0' should land without
going back and inspecting the moved characters a second time?

--
Er*********@sun.com
Jul 10 '08 #4
On Jul 10, 1:32*pm, j...@toerring.de (Jens Thoms Toerring) wrote:
swengineer...@gmail.com <swengineer...@gmail.comwrote:
Just looking for a few eyes on this code other than my own.
void TrimCString(char *str)
{
* * * * // Trim whitespace from beginning:

I guess I would trim from the end first since then you have less
copying to do afterwards.
* * * * size_t i = 0;
* * * * size_t j;
* * * * while(isspace(str[i]))

isspace() expects an int and not a char as it's argument.
Doesn't this promotion happen implicitly and without loss of
information since I am actually dealing with characters and not using
the char as a holder for small integers?
>
* * * * {
* * * * * * * * i++;
* * * * }
* * * * if(i 0)
* * * * {
* * * * * * * * for(j = 0; i < strlen(str); j++, i++)
* * * * * * {
* * * * * * * * * * * * str[j] = str[i];
* * * * * * * * }
* * * * * * * * str[j] = '\0';
* * * * }

An alternative would be to use memmove() here, so you don't
have to do it byte by byte. Also callling strlen() each time
through the loop is a bit of a waste of time - it doesn't
change and can be replaced by a check if str[i] is '\0'.
* * * * // Trim whitespace from end:
* * * * i = strlen(str) - 1;

Careful: This could set 'i' to -1 (if the string consistet of white
space only) and then the rest won't work anymore.
i is of type size_t which I believe can't be negative?
>
* * * * while(isspace(str[i]))
* * * * {
* * * * * * * * i--;
* * * * }
* * * * if(i < (strlen(str) - 1))
* * * * {
* * * * * * * * str[i + 1] = '\0';
* * * * }
}

Here's an alternative version using pointers (and trying to
minimize the number of calls of strlen()):

void
TrimCString( char *str )
{
* * char *p,
* * * * * * **q;

* * /* Check that we've got something that looks like a string */

* * * * if ( ! str || ! * str )
* * * * * * * * return;

* * * * /* Trim from end */

* * for ( p = str + strlen( str ) - 1; p != str && isspace( ( int) *p ); p-- )
* * * * * * /* empty */ ;

* * * * if ( p == str ) * * */* only white space in string */
* * * * {
* * * * * * *str = '\0';
* * * * * * * * return;
* * }

* * * * *++p = '\0';

* * /* Trim from start */

* * * * for ( q = str; isspace( ( int ) *q ); q++ )
* * * * /* empty */ ;

* * if ( q != str )
* * * * memmove( str, q, p - q + 1 );}
Thanks for the code and the comments. This is useful.
Jul 10 '08 #5
On Jul 10, 1:39*pm, badc0...@gmail.com wrote:
swengineer...@gmail.com wrote:
Just looking for a few eyes on this code other than my own.
void TrimCString(char *str)

Why not return a char *, like most other string functions?

char *TrimCString(char *str)

* * * * using char * enables you to piggyback your function in the
* * * * middle of other functions, eg
* * * * printf("%s\n", TrimCString(someString));
Good point, I had not thought of it.
>
{
* *// Trim whitespace from beginning:
* *size_t i = 0;
* *size_t j;
* *while(isspace(str[i]))
* *{
* * * * * *i++;
* *}
* *if(i 0)
* *{
* * * * * *for(j = 0; i < strlen(str); j++, i++)

move the strlen() outside the loop.
Maybe use memmove() instead of the loop.
* * * *{
* * * * * * * * * *str[j] = str[i];
* * * * * *}
* * * * * *str[j] = '\0';
* *}
* *// Trim whitespace from end:
* *i = strlen(str) - 1;

Use the strlen you computed before, when
you moved it out of the for loop above :)
the length of the string has changed since then, possibly.
>

* *while(isspace(str[i]))
* *{
* * * * * *i--;
* *}
* *if(i < (strlen(str) - 1))
* *{
* * * * * *str[i + 1] = '\0';

No need for the test.
when i >= (strlen(str) - 1) -- it can only be equal, anyway -- the
assignment overwrites a '\0' with a brand new '\0'.
Anyway, if you want to keep the test, use the computed strlen.
* *}
}

A couple what-if's

* what if a pass NULL to the function?
* TrimCString(NULL);
I added a check for this.
>
* what if I pass a constant string literal to the function?
* TrimCString(" * *4 spaces at both ends * *");
How can I account for this?

Jul 10 '08 #6
On Jul 10, 1:41*pm, Eric Sosman <Eric.Sos...@sun.comwrote:
swengineer...@gmail.com wrote:
Just looking for a few eyes on this code other than my own.

* * *This pair of eyes sees three bugs, two occurring twice each
and the other perhaps due to too much snippage. *There are also
some opportunities to improve speed and style. *So, here we go:

* * *Bug: Since you're using isspace() and strlen(), you need to
#include <ctype.hand <string.hto get their declarations.
Without the declarations, a compiler operating under C99 rules
must generate a diagnostic. *Under C89 rules the code will work,
but might not be as fast as if the vendor's "magic" declarations
were present.
void TrimCString(char *str)
{
* *// Trim whitespace from beginning:
* *size_t i = 0;

* * *Style: Instead of initializing `i' at its declaration and then
relying on the initial value later on, consider initializing it
closer to its use. *The `for' statement is convenient for such
purposes.
* *size_t j;
* *while(isspace(str[i]))

* * *Bug: If `str' contains negative-valued characters, this use
may violate the "contract" of isspace() by passing an out-of-range
argument. *Use `while (isspace( (unsigned char)str[i] ))'. *(This
is one of those occasions where a cast is almost always required
and almost always omitted, as opposed to the more usual situation
where a cast is almost always inserted and almost always wrong.)
* *{
* * * * * *i++;
* *}
* *if(i 0)
* *{
* * * * * *for(j = 0; i < strlen(str); j++, i++)

* * *Speed: This loop calculates strlen(str) on every iteration.
Since it will return the same value each time, all calls after the
first are wasted effort. *Call strlen() once before the loop and
store the result in a variable, and use that variable to control
the loop.
* * * *{
* * * * * * * * * *str[j] = str[i];
* * * * * *}
* * * * * *str[j] = '\0';
* *}
* *// Trim whitespace from end:
* *i = strlen(str) - 1;

* * *Bug: If `str' is the empty string (either because it was
empty to begin with or because it contained only white space),
this calculation will produce a very large `i' that is almost
assuredly wrong, R-O-N-G.
* *while(isspace(str[i]))

* * *Bug: Same missing cast as above.
* *{
* * * * * *i--;
* *}
* *if(i < (strlen(str) - 1))

* * *Bug: Same mishandling of the empty string as above.

* * *Speed: strlen(str) is still the same as it was a few lines
ago, so there's no point in computing it again.
* *{
* * * * * *str[i + 1] = '\0';

* * *Speed: It's probably quicker -- and certainly less verbose --
to do this assignment unconditionally than to test whether it's
needed. *If it isn't, you'll just store a zero on top of an
existing zero, which is harmless.
* *}
}

* * *Summary: Not quite ready for prime time, but not as bad as
some attempts I've seen.

* * *Challenge: See if you can think of a way to do the job in
just one pass over the string (calling strlen() counts as a
"pass"). *Hint: During the copy-to-front operation, can you
somehow figure out where the final '\0' should land without
going back and inspecting the moved characters a second time?

--
Eric.Sos...@sun.com
Thanks for the input. Someone else had pointed out one of the issues
you mentioned but I didn't understsnd what he was saying until I read
your description.
Jul 10 '08 #7
Here is my second go at it after reading the comments provided.

//This was in my original file but just not immediately before this
function
#include <ctype.h>
#include <string.h>

//Added char* return as suggested
char* TrimCString(char *str)
{
// Trim whitespace from beginning:
size_t i = 0;
size_t j;

//Added validation of the argument for NULL
if(str == NULL)
{
return str;
}

//Added cast as suggested. I have always avoided casts for various
reasons
//discussed in the FAQ and elsewhere but it is good to know this
case
while(isspace((unsigned char)str[i]))
{
i++;
}
if(i 0)
{
//Calculate length once
size_t length = strlen(str);
//Decided to leave the bytewise copy I just think I can
understand it a little better
for(j = 0; i < length; j++, i++)
{
str[j] = str[i];
}
str[j] = '\0';
}

// Trim whitespace from end:
//Added check to catch the empty string
i = (strlen(str) ? (strlen(str) - 1) : 0);

while(isspace((unsigned char)str[i]))
{
i--;
}
//Removed check that was not needed
str[i + 1] = '\0';
}
Jul 10 '08 #8
sw***********@gmail.com wrote:
On Jul 10, 1:39 pm, badc0...@gmail.com wrote:
swengineer...@gmail.com wrote:
Just looking for a few eyes on this code other than my own.
* what if I pass a constant string literal to the function?
TrimCString(" 4 spaces at both ends ");
How can I account for this?
Well ... you can't!
But you can do (at least) one of two things:

a) add a comment forbidding the caller to pass a string literal.
The comment goes near the function in the file with the code,
it also goes in the header file and in the documentation
provided for TrimCString()

b) Rewrite TrimCString to write the trimmed string to a
different memory area (variable, array, whatever), a bit
like strcpy() works

char *TrimCString(char *dest, const char *source)
/* caller must ensure `dest` has enough space for
the trimmed string */
/* if source is NULL, trims `dest` in-place. `dest`
*MUST* be writable */
Jul 10 '08 #9
Eric Sosman <Er*********@sun.comwrites:
sw***********@gmail.com wrote:
>Just looking for a few eyes on this code other than my own.
<snip>
> size_t i = 0;

Style: Instead of initializing `i' at its declaration and then
relying on the initial value later on, consider initializing it
closer to its use. The `for' statement is convenient for such
purposes.
If you are suggesting replacing the while (isspace(str[i]) with a
for (int i = 0; isspace((unsigned char)str[i]; i++); then this will not
allow i to tested outside the for:
> while(isspace(str[i]))
{
i++;
}
if(i 0)
here.

--
Ben.
Jul 10 '08 #10
sw***********@gmail.com wrote:
Here is my second go at it after reading the comments provided.
//Added char* return as suggested
char* TrimCString(char *str)
{
[...]
//Calculate length once
size_t length = strlen(str);
//Decided to leave the bytewise copy [...]
for(j = 0; i < length; j++, i++)
{
str[j] = str[i];
}
str[j] = '\0';
}

// Trim whitespace from end:
//Added check to catch the empty string
i = (strlen(str) ? (strlen(str) - 1) : 0);
Here, strlen(str) is the same as j.
You can replace the call to strlen() with j :)

i = j ? j - 1 : 0;
while(isspace((unsigned char)str[i]))
{
i--;
}
//Removed check that was not needed
str[i + 1] = '\0';
And, since you changed the function to return a char *,
do return a char * to the caller.

return str;
}
Jul 10 '08 #11
On 2008-07-10, sw***********@gmail.com <sw***********@gmail.comwrote:
Here is my second go at it after reading the comments provided.
<snip>
// Trim whitespace from end:
//Added check to catch the empty string
i = (strlen(str) ? (strlen(str) - 1) : 0);

while(isspace((unsigned char)str[i]))
{
i--;
}
//Removed check that was not needed
str[i + 1] = '\0';
}
2 things I found:

1. You are missing a return statement at the end.

2. This will not be correct if someone calls your function with an array
of length one (with its only element set to '\0'). The index in your
assignment at the last line will then be 1.

--
Michael Brennan

Jul 10 '08 #12
On Jul 10, 2:59*pm, badc0...@gmail.com wrote:
swengineer...@gmail.com wrote:
Here is my second go at it after reading the comments provided.
//Added char* return as suggested
char* TrimCString(char *str)
{

[...]
* * * * //Calculate length once
* * * * size_t length = strlen(str);
* * * * //Decided to leave the bytewise copy [...]
* * * * for(j = 0; i < length; j++, i++)
* * * * {
* * * * * * str[j] = str[i];
* * * * }
* * * * str[j] = '\0';
* * }
* * // Trim whitespace from end:
* * //Added check to catch the empty string
* * i = (strlen(str) ? (strlen(str) - 1) : 0);

Here, strlen(str) is the same as j.
You can replace the call to strlen() with j :)
This is only true if there was whitespace to trim from the front and
not in the more general case.
Jul 10 '08 #13
sw***********@gmail.com wrote:
Here is my second go at it after reading the comments provided.

//This was in my original file but just not immediately before this
function
#include <ctype.h>
#include <string.h>

//Added char* return as suggested
char* TrimCString(char *str)
{
// Trim whitespace from beginning:
size_t i = 0;
size_t j;

//Added validation of the argument for NULL
if(str == NULL)
{
return str;
}

//Added cast as suggested. I have always avoided casts for various
reasons
//discussed in the FAQ and elsewhere but it is good to know this
case
while(isspace((unsigned char)str[i]))
{
i++;
}
if(i 0)
{
//Calculate length once
size_t length = strlen(str);
//Decided to leave the bytewise copy I just think I can
understand it a little better
for(j = 0; i < length; j++, i++)
{
str[j] = str[i];
}
str[j] = '\0';
}
Why move the characters at all? The original version needed
to because it returned no value, so the only way it could report
the position of the first non-space was to squeeze out the leading
spaces. But now that you're returning a pointer, why not just
point at the first non-space no matter where you find it?
// Trim whitespace from end:
//Added check to catch the empty string
i = (strlen(str) ? (strlen(str) - 1) : 0);

while(isspace((unsigned char)str[i]))
{
i--;
}
This still mis-handles the empty string. Work through it:
You'll set `i' to zero, and then you'll test whether `str[0]'
is a space. It's the '\0', so it's not a space, so you decrement
`i' from zero to a very large number. Then you check whether
`str[very_large_number]' is a space, and there's no telling
what will happen except that it's not likely to be good ...
//Removed check that was not needed
str[i + 1] = '\0';
}
You still haven't picked up on the hints about an easier way
to deal with the trailing spaces. Let's try another analogy:
Imagine you're watching a football ("soccer") game that has
reached the shoot-out tie-breaking phase. Abelard shoots and
the goalkeeper makes the save (space). Then Bertrand shoots
and scores (non-space). Then Claude shoots, and Daniel, and
Edouard, ... When the match is over, you know you will be asked
which shooter scored the side's final goal (which may or may not
have been followed by a few saves). Unfortunately, your memory
is terrible -- but you have an erasable slate with enough space
to write one name at a time. Can you think of a method that will
ensure you can answer the question correctly?

--
Er*********@sun.com
Jul 10 '08 #14
sw***********@gmail.com wrote:
On Jul 10, 2:59 pm, badc0...@gmail.com wrote:
swengineer...@gmail.com wrote:
Here is my second go at it after reading the comments provided.
//Added char* return as suggested
char* TrimCString(char *str)
{
[...]
missing in my previous post
if (i 0)
{
//Calculate length once
size_t length = strlen(str);
//Decided to leave the bytewise copy [...]
for(j = 0; i < length; j++, i++)
{
str[j] = str[i];
}
str[j] = '\0';
}
// Trim whitespace from end:
//Added check to catch the empty string
i = (strlen(str) ? (strlen(str) - 1) : 0);
Here, strlen(str) is the same as j.
You can replace the call to strlen() with j :)
This is only true if there was whitespace to trim from the front and
not in the more general case.
Oops ... you're right, thanks for pointing it out.
But I still think you could do without some of those strlen() calls.
Jul 10 '08 #15
Ben Bacarisse wrote:
) If you are suggesting replacing the while (isspace(str[i]) with a
) for (int i = 0; isspace((unsigned char)str[i]; i++); then this will not
) allow i to tested outside the for:

I assume he meant C89 style:
for (i = 0; isspace((unsigned char)str[i]; i++);
SaSW, Willem
--
Disclaimer: I am in no way responsible for any of the statements
made in the above text. For all I know I might be
drugged or something..
No I'm not paranoid. You all think I'm paranoid, don't you !
#EOT
Jul 10 '08 #16
Eric Sosman wrote:
) Why move the characters at all? The original version needed
) to because it returned no value, so the only way it could report
) the position of the first non-space was to squeeze out the leading
) spaces. But now that you're returning a pointer, why not just
) point at the first non-space no matter where you find it?

Because then the subsequent call to free() would bomb.
SaSW, Willem
--
Disclaimer: I am in no way responsible for any of the statements
made in the above text. For all I know I might be
drugged or something..
No I'm not paranoid. You all think I'm paranoid, don't you !
#EOT
Jul 10 '08 #17
I sort of took a different approach based on the comments here.

#include <ctype.h>
#include <string.h>

char* TrimCString(char *str)
{
// Trim whitespace from beginning:
size_t i = 0;
size_t length = strlen(str);
char* new_start;

if((str == NULL) || (length == 0))
{
return str;
}

while(isspace((unsigned char)str[i]))
{
i++;
}

new_start = str + i;
length -= i;

// Trim whitespace from end:
i = (length ? (length - 1) : length);

if(i != 0)
{
while(isspace((unsigned char)str[i]))
{
i--;
length--;
}
str[i + 1] = '\0';
}
memmove(str, new_start, (length + 1));
return str;
}
Jul 10 '08 #18
Ben Bacarisse wrote:
Eric Sosman <Er*********@sun.comwrites:
>sw***********@gmail.com wrote:
>>Just looking for a few eyes on this code other than my own.
<snip>
>> size_t i = 0;
Style: Instead of initializing `i' at its declaration and then
relying on the initial value later on, consider initializing it
closer to its use. The `for' statement is convenient for such
purposes.

If you are suggesting replacing the while (isspace(str[i]) with a
for (int i = 0; isspace((unsigned char)str[i]; i++); then this will not
allow i to tested outside the for:
>> while(isspace(str[i]))
{
i++;
}
if(i 0)

here.
Sorry if I wasn't clear: I wanted to suggest moving the
initialization of `i', not its declaration. And the reason
for moving it is that I just hate reading code that goes

int i = 0;
/*
* forty-two lines not mentioning i
*/
while (foo[i++] != bar) ...

In the O.P.'s case the "forty-two" was only two, but fostering
good habits is a Good Thing. Besides, I called it a "style"
point and suggested he "consider" the change; I'm trying to
lead the horse to Kool-Aid[*] without insisting he drink it.
[*] A misattribution, BTW.

--
Er*********@sun.com
Jul 10 '08 #19
Willem wrote:
Eric Sosman wrote:
) Why move the characters at all? The original version needed
) to because it returned no value, so the only way it could report
) the position of the first non-space was to squeeze out the leading
) spaces. But now that you're returning a pointer, why not just
) point at the first non-space no matter where you find it?

Because then the subsequent call to free() would bomb.
Well, the whole thing suffers from the lack of a clear
usage description, as another poster pointed out. But why
should anyone expect the pointer to be free-able? Do you
expect to pass the result of strchr() to free? Or the
end-pointer produced by strtod()?

Unless it's part of the function's contract that it will
return its first argument, there's no reason to want to call
free() on it. And since the O.P. is willing to consider changes
to the interface and there's no contract in evidence, it seems
unwarranted to assume anything special about the returned value.
The thing is still being designed; we can still change the rules.

--
Er*********@sun.com
Jul 10 '08 #20
sw***********@gmail.com wrote:
I sort of took a different approach based on the comments here.
It probably isn't relevant, but
I profiled 1,000,000 calls of your first and last versions:

index % time self children called name
[2] 94.2 6.26 0.00 1000000 TrimCString1 [2]
[3] 4.7 0.31 0.00 1000000 TrimCString2 [3]

Increase in speed was ( 6.26 / 0.31 = 20+ ) twentyfold :)
Jul 10 '08 #21
sw***********@gmail.com wrote:
I sort of took a different approach based on the comments here.

#include <ctype.h>
#include <string.h>

char* TrimCString(char *str)
{
// Trim whitespace from beginning:
size_t i = 0;
size_t length = strlen(str);
char* new_start;

if((str == NULL) || (length == 0))
{
return str;
}

while(isspace((unsigned char)str[i]))
{
i++;
}

new_start = str + i;
length -= i;

// Trim whitespace from end:
i = (length ? (length - 1) : length);

if(i != 0)
{
while(isspace((unsigned char)str[i]))
{
i--;
length--;
}
str[i + 1] = '\0';
}
memmove(str, new_start, (length + 1));
return str;
}
You're still working far too hard, IMHO. Ponder this one:

#include <ctype.h>

/* Remove trailing white space from a string, and return
* a pointer to the string's first non-space character
* (which might be its terminating '\0').
*/
char *TrimString(char *str)
{
char *ptr;
char *end;

/* skip leading spaces */
while (isspace( (unsigned char)*str ))
++str;

/* find start of trailing spaces (if any) */
end = str;
for (ptr = str; *ptr != '\0'; ++ptr) {
if (! isspace( (unsigned char)*ptr ))
end = ptr + 1;
}

/* clip off trailing spaces (if any) */
*end = '\0';

return str;
}

Or you might prefer to write the second loop this way:

/* find start of trailing spaces (if any) */
end = ptr = str;
while (*ptr != '\0') {
if (! isspace( (unsigned char)*ptr++ ))
end = ptr;
}

Use whichever seems clearer to you; I think it's a toss-up.

--
Er*********@sun.com
Jul 10 '08 #22
Eric Sosman wrote:
sw***********@gmail.com wrote:
>Here is my second go at it after reading the comments provided.
[...]
// Trim whitespace from end:
//Added check to catch the empty string
i = (strlen(str) ? (strlen(str) - 1) : 0);

while(isspace((unsigned char)str[i]))
{
i--;
}

This still mis-handles the empty string. [...]
Oh, botheration. Ignore this remark; I'm completely wrong.
Sorry.

--
Er*********@sun.com
Jul 10 '08 #23
Eric Sosman wrote, On 10/07/08 18:41:
sw***********@gmail.com wrote:
>Just looking for a few eyes on this code other than my own.

This pair of eyes sees three bugs, two occurring twice each
and the other perhaps due to too much snippage. There are also
some opportunities to improve speed and style. So, here we go:

Bug: Since you're using isspace() and strlen(), you need to
#include <ctype.hand <string.hto get their declarations.
Without the declarations, a compiler operating under C99 rules
must generate a diagnostic. Under C89 rules the code will work,
but might not be as fast as if the vendor's "magic" declarations
were present.
Actually, as the functions do not return an int (which under C89 rules
the compiler is required to assume they return) it invokes undefined
behaviour. So although I am not aware of any implementations where it
would fail it still *could* fail.
--
Flash Gordon
Jul 10 '08 #24

Jens Thoms Toerring <jt@toerring.dewrote in message
news:6d************@mid.uni-berlin.de...
sw***********@gmail.com <sw***********@gmail.comwrote:
Just looking for a few eyes on this code other than my own.
for(j = 0; i < strlen(str); j++, i++)
{
str[j] = str[i];
}
An alternative would be to use memmove() here, so you don't
have to do it byte by byte.
What would be the actual metrics to make a decision here
one way or the other? I myself just assume for this kind of stuff
that it will only be done on relatively small strings, just a few
characters, so it seems that a call to memmove() might be overkill.
Or is it? DOES it depend on the size of the block you'll
be moving? How many cycles does it take to assign one
character to a previous array position, versus the overhead of
a call to memmove(), and the operation of the function itself?
Also callling strlen() each time
through the loop is a bit of a waste of time - it doesn't
change and can be replaced by a check if str[i] is '\0'.
Or just do a very simple while() loop that compares pointers:

char *start=str;
char *end=str+strlen(str);

while(start!=end) {
/* do something */
/* start++ for going forward from start */
/* end-- for going backwards from end */
}

Can't go wrong there...I will admit that I use to always use array
sub-scripting such as the OP because I found it easier conceptually
to understand, but then abandoned it largely for the equivalent
pointers as part of my paranoia about compiler "optimizations"
(I turn them ALL off, which means sub-scripting is not converted
to pointers, which I believe takes longer, so to retain the speed
I've manually converted most of the stuff like this).

---
William Ernest Reid

Jul 11 '08 #25
On Thu, 10 Jul 2008 10:56:18 -0700 (PDT), "sw***********@gmail.com"
<sw***********@gmail.comwrote in comp.lang.c:
On Jul 10, 1:32*pm, j...@toerring.de (Jens Thoms Toerring) wrote:
swengineer...@gmail.com <swengineer...@gmail.comwrote:
[snip]
* * * * size_t i = 0;
[snip]
* * * * // Trim whitespace from end:
* * * * i = strlen(str) - 1;
Careful: This could set 'i' to -1 (if the string consistet of white
space only) and then the rest won't work anymore.
i is of type size_t which I believe can't be negative?
You are correct that size_t is an unsigned integer type, and can never
be negative. But that doesn't mean that underflowing it won't cause a
very serious problem. Consider:

#include <stdio.h>

int main(void)
{
size_t i = 0;
--i;
printf("%lu\n", (unsigned long)i);
return 0;
}

Compile and execute this and see what you get.

--
Jack Klein
Home: http://JK-Technology.Com
FAQs for
comp.lang.c http://c-faq.com/
comp.lang.c++ http://www.parashift.com/c++-faq-lite/
alt.comp.lang.learn.c-c++
http://www.club.cc.cmu.edu/~ajo/docs/FAQ-acllc.html
Jul 11 '08 #26
another version, i don't know if it's correct, but work fine here

#include <stdio.h>
#include <string.h>
#include <ctype.h>
#include <stdlib.h>

char *stringtrim(char *str)
{
char *res, *ini, *end, *ptr;
int size;

/* safe mode to NULL string */
if (!str)
return NULL;

size = strlen(str);

if (!size)
return NULL;

ini = str;

/* go to the first non blank character */
while (isspace((unsigned char)*ini)) {
++ini;
--size;
}

if (size <= 0)
return NULL;

end = str + strlen(str) - 1; /* point to the last character */
/* go to the last non blank character */
while (isspace((unsigned char)*end)) {
--end;
--size;
}

if (size <= 0)
return NULL;

res = malloc(sizeof(char) * size);
for (ptr = res; ini != (end+1); ini++)
*ptr++ = *ini;
*ptr = '\0';

return res;
}

int main(void)
{
printf("|%s|\n", stringtrim(" hello, world"));
printf("|%s|\n", stringtrim(" 1234 5678 90 "));
char teste[] = " good bye, cruel world ";
printf("|%s|\n", stringtrim(teste));
return 0;
}
Jul 11 '08 #27
Bill Reid wrote:
) (I turn them ALL off, which means sub-scripting is not converted
) to pointers, which I believe takes longer, so to retain the speed
) I've manually converted most of the stuff like this).

As far as I know, there is no speed difference between sub-scripting
and pointers on most modern CPUs.
SaSW, Willem
--
Disclaimer: I am in no way responsible for any of the statements
made in the above text. For all I know I might be
drugged or something..
No I'm not paranoid. You all think I'm paranoid, don't you !
#EOT
Jul 11 '08 #28
On Jul 10, 5:01*pm, Eric Sosman <Eric.Sos...@sun.comwrote:
swengineer...@gmail.com wrote:
I sort of took a different approach based on the comments here.
#include <ctype.h>
#include <string.h>
char* TrimCString(char *str)
{
* * // Trim whitespace from beginning:
* * size_t i = 0;
* * size_t length = strlen(str);
* * char* new_start;
* * if((str == NULL) || (length == 0))
* * {
* * * * return str;
* * }
* * while(isspace((unsigned char)str[i]))
* * {
* * * * i++;
* * }
* * new_start = str + i;
* * length -= i;
* * // Trim whitespace from end:
* * i = (length ? (length - 1) : length);
* * if(i != 0)
* * {
* * * * while(isspace((unsigned char)str[i]))
* * * * {
* * * * * * i--;
* * * * * * length--;
* * * * }
* * * * str[i + 1] = '\0';
* * }
* * memmove(str, new_start, (length + 1));
* * return str;
}

* * *You're still working far too hard, IMHO. *Ponder this one:

* * * * #include <ctype.h>

* * * * /* Remove trailing white space from a string, and return
* * * * ** a pointer to the string's first non-space character
* * * * ** (which might be its terminating '\0').
* * * * **/
* * * * char *TrimString(char *str)
* * * * {
* * * * * * char *ptr;
* * * * * * char *end;

* * * * * * /* skip leading spaces */
* * * * * * while (isspace( (unsigned char)*str ))
* * * * * * * * ++str;

* * * * * * /* find start of trailing spaces (if any) */
* * * * * * end = str;
* * * * * * for (ptr = str; **ptr != '\0'; *++ptr) {
* * * * * * * * if (! isspace( (unsigned char)*ptr ))
* * * * * * * * * * end = ptr + 1;
* * * * * * }

* * * * * * /* clip off trailing spaces (if any) */
* * * * * * *end = '\0';

* * * * * * return str;
* * * * }

Or you might prefer to write the second loop this way:

* * * * /* find start of trailing spaces (if any) */
* * * * end = ptr = str;
* * * * while (*ptr != '\0') {
* * * * * * if (! isspace( (unsigned char)*ptr++ ))
* * * * * * * * end = ptr;
* * * * }

Use whichever seems clearer to you; I think it's a toss-up.

--
Eric.Sos...@sun.com- Hide quoted text -

- Show quoted text -
I think this is more of a stylistic difference than anything else. For
me I feel my version is more understandable. For you maybe not.

I am curious though, did I address your concerns about an empty string
sufficiently?

Thanks for the help.
Jul 11 '08 #29
On Jul 10, 4:48*pm, badc0...@gmail.com wrote:
swengineer...@gmail.com wrote:
I sort of took a different approach based on the comments here.

It probably isn't relevant, but
I profiled 1,000,000 calls of your first and last versions:

index % time * *self *children * *called * * name
[2] * * 94.2 * *6.26 * *0.00 1000000 * * * * TrimCString1 [2]
[3] * * *4.7 * *0.31 * *0.00 1000000 * * * * TrimCString2 [3]

Increase in speed was ( 6.26 / 0.31 = 20+ ) twentyfold :)
Interesting. I would not have guessed it would be that much of a
difference.

I did not introduce any bugs with this change that you see did I? In
my testing it still performs as I would expect.
Jul 11 '08 #30
voidpointer wrote:
another version, i don't know if it's correct, but work fine here

#include <stdio.h>
#include <string.h>
#include <ctype.h>
#include <stdlib.h>

char *stringtrim(char *str)
{
char *res, *ini, *end, *ptr;
int size;

/* safe mode to NULL string */
if (!str)
return NULL;

size = strlen(str);

if (!size)
return NULL;
Why stringtrim("") returns NULL?
"" is a perfectly valid string.
ini = str;

/* go to the first non blank character */
while (isspace((unsigned char)*ini)) {
++ini;
--size;
}

if (size <= 0)
return NULL;
Why stringtrim(" ") returns NULL?
"" is a perfectly valid string.
end = str + strlen(str) - 1; /* point to the last character */
/* go to the last non blank character */
while (isspace((unsigned char)*end)) {
--end;
--size;
}

if (size <= 0)
return NULL;

res = malloc(sizeof(char) * size);
for (ptr = res; ini != (end+1); ini++)
*ptr++ = *ini;
*ptr = '\0';

return res;
}
There's a memory leak in your program.
You never free the memory you malloc'd for the result.

[snip]

Oh ... `stringtrim` is a reserved name
http://www.gnu.org/software/libtool/...ved-Names.html
Jul 11 '08 #31
On Jul 10, 5:01*pm, Eric Sosman <Eric.Sos...@sun.comwrote:
swengineer...@gmail.com wrote:
I sort of took a different approach based on the comments here.
#include <ctype.h>
#include <string.h>
char* TrimCString(char *str)
{
* * // Trim whitespace from beginning:
* * size_t i = 0;
* * size_t length = strlen(str);
* * char* new_start;
* * if((str == NULL) || (length == 0))
* * {
* * * * return str;
* * }
* * while(isspace((unsigned char)str[i]))
* * {
* * * * i++;
* * }
* * new_start = str + i;
* * length -= i;
* * // Trim whitespace from end:
* * i = (length ? (length - 1) : length);
* * if(i != 0)
* * {
* * * * while(isspace((unsigned char)str[i]))
* * * * {
* * * * * * i--;
* * * * * * length--;
* * * * }
* * * * str[i + 1] = '\0';
* * }
* * memmove(str, new_start, (length + 1));
* * return str;
}

* * *You're still working far too hard, IMHO. *Ponder this one:

* * * * #include <ctype.h>

* * * * /* Remove trailing white space from a string, and return
* * * * ** a pointer to the string's first non-space character
* * * * ** (which might be its terminating '\0').
* * * * **/
* * * * char *TrimString(char *str)
* * * * {
* * * * * * char *ptr;
* * * * * * char *end;

* * * * * * /* skip leading spaces */
* * * * * * while (isspace( (unsigned char)*str ))
* * * * * * * * ++str;

* * * * * * /* find start of trailing spaces (if any) */
* * * * * * end = str;
* * * * * * for (ptr = str; **ptr != '\0'; *++ptr) {
* * * * * * * * if (! isspace( (unsigned char)*ptr ))
* * * * * * * * * * end = ptr + 1;
* * * * * * }

* * * * * * /* clip off trailing spaces (if any) */
* * * * * * *end = '\0';

* * * * * * return str;
* * * * }

Or you might prefer to write the second loop this way:

* * * * /* find start of trailing spaces (if any) */
* * * * end = ptr = str;
* * * * while (*ptr != '\0') {
* * * * * * if (! isspace( (unsigned char)*ptr++ ))
* * * * * * * * end = ptr;
* * * * }

Use whichever seems clearer to you; I think it's a toss-up.

--
Eric.Sos...@sun.com- Hide quoted text -

- Show quoted text -
Actually, now that I look at this a little closer won't this truncate
a string with a space in the middle. That is not what I want. A space
in the middle is fine I just want to trim off the front and back.
Jul 11 '08 #32
"sw***********@gmail.com" <sw***********@gmail.comwrites:
On Jul 10, 5:01*pm, Eric Sosman <Eric.Sos...@sun.comwrote:
<snip>
>* * * * * * /* find start of trailing spaces (if any) */
* * * * * * end = str;
* * * * * * for (ptr = str; **ptr != '\0'; *++ptr) {
* * * * * * * * if (! isspace( (unsigned char)*ptr ))
* * * * * * * * * * end = ptr + 1;
* * * * * * }

* * * * * * /* clip off trailing spaces (if any) */
* * * * * * *end = '\0';

* * * * * * return str;
* * * * }

Or you might prefer to write the second loop this way:

* * * * /* find start of trailing spaces (if any) */
* * * * end = ptr = str;
* * * * while (*ptr != '\0') {
* * * * * * if (! isspace( (unsigned char)*ptr++ ))
* * * * * * * * end = ptr;
* * * * }

Use whichever seems clearer to you; I think it's a toss-up.

--
Eric.Sos...@sun.com- Hide quoted text -

- Show quoted text -

Actually, now that I look at this a little closer won't this truncate
a string with a space in the middle.
Not as far as I can see. In both cases end is set to point just after
something that is know not to be space. Because the loop runs to the
end of the string, end points just after the *last* thing know not to
be a space. This will be the null or the first trailing space.

--
Ben.
Jul 11 '08 #33
sw***********@gmail.com wrote:
I did not introduce any bugs with this change that you see did I? In
my testing it still performs as I would expect.
Hmmm ... if I pass NULL to your TrimCString it'll try to do strlen on
that.
I'm not sure you can do that.

: char* TrimCString(char *str)
: {
: // Trim whitespace from beginning:
: size_t i = 0;
: size_t length = strlen(str); /* <=== strlen(NULL)
is ok? */
: char* new_start;
:
: if((str == NULL) || (length == 0))
: {
: return str;
: }
: [rest of program]

Maybe try this instead

char* TrimCString(char *str)
{
// Trim whitespace from beginning:
size_t i = 0;
size_t length; /* not initialized */
char* new_start;

if(str == NULL) /* only test for NULL */
{
return str;
}
length = strlen(str);
if(length == 0) /* only test for
length == 0 */
{
return str;
}
[rest of program]
Jul 11 '08 #34
"Bill Reid" <ho********@happyhealthy.netwrites:
Jens Thoms Toerring <jt@toerring.dewrote in message
news:6d************@mid.uni-berlin.de...
[...]
>An alternative would be to use memmove() here, so you don't
have to do it byte by byte.

What would be the actual metrics to make a decision here
one way or the other? I myself just assume for this kind of stuff
that it will only be done on relatively small strings, just a few
characters, so it seems that a call to memmove() might be overkill.
Or is it? DOES it depend on the size of the block you'll
be moving? How many cycles does it take to assign one
character to a previous array position, versus the overhead of
a call to memmove(), and the operation of the function itself?
All this is system-specific.

[snip]
Can't go wrong there...I will admit that I use to always use array
sub-scripting such as the OP because I found it easier conceptually
to understand, but then abandoned it largely for the equivalent
pointers as part of my paranoia about compiler "optimizations"
(I turn them ALL off, which means sub-scripting is not converted
to pointers, which I believe takes longer, so to retain the speed
I've manually converted most of the stuff like this).
Subscripting might be faster than pointer manipulation on some
systems; pointer manipulation might be faster on others. A decent
optimizing compiler will know which is faster, and can often convert
either form into whichever is best for the target system.

Consider letting the compiler do its job.

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
Nokia
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
Jul 11 '08 #35
On Jul 11, 9:37*am, badc0...@gmail.com wrote:
swengineer...@gmail.com wrote:
I did not introduce any bugs with this change that you see did I? In
my testing it still performs as I would expect.

Hmmm ... if I pass NULL to your TrimCString it'll try to do strlen on
that.
I'm not sure you can do that.

: char* TrimCString(char *str)
: {
: * * // Trim whitespace from beginning:
: * * size_t i = 0;
: * * size_t length = strlen(str); * * * * * * * /* <=== strlen(NULL)
is ok? */
: * * char* new_start;
:
: * * if((str == NULL) || (length == 0))
: * * {
: * * * * return str;
: * * }
: * *[rest of program]

Maybe try this instead

char* TrimCString(char *str)
{
* * // Trim whitespace from beginning:
* * size_t i = 0;
* * size_t length; * * * * * * * * * * * * * /* not initialized */
* * char* new_start;

* * if(str == NULL) * * * * * * * * * * * * /* only test for NULL */
* * {
* * * * return str;
* * }
* * length = strlen(str);
* * if(length == 0) * * * * * * * * * * * * * * /* only test for
length == 0 */
* * {
* * * * return str;
* * }
* * [rest of program]
Thanks. That could have been a disaster.
Jul 11 '08 #36

Willem <wi****@stack.nlwrote in message
news:sl********************@snail.stack.nl...
Bill Reid wrote:
) (I turn them ALL off, which means sub-scripting is not converted
) to pointers, which I believe takes longer, so to retain the speed
) I've manually converted most of the stuff like this).

As far as I know, there is no speed difference between sub-scripting
and pointers on most modern CPUs.
DAMN!!! More time I've wasted because I JUST WON'T LISTEN!!!

In any event, does the following win the prize for most efficient
implementation of the presumed requirements of the function?

char *remove_beg_end_non_text(char *text) {
char *beg;
size_t length;

for(beg=text;*beg!='\0';beg++)
if(!isspace(*beg)) break;

length=strlen(beg);

while(length>0)
if(!isspace(*(beg+(--length)))) break;

*(beg+(++length))='\0';

return beg==text ? text : memmove(text,beg,length+1);
}

Gotta admit, you couldn't reduce the cycles too much on
that, could you? And it could even win a little bonus prize
for obfuscatory conditions like if(!isspace(*(beg+(--length))))...

---
William Ernest Reid

Jul 12 '08 #37

Keith Thompson <ks***@mib.orgwrote in message
news:ln************@nuthaus.mib.org...
"Bill Reid" <ho********@happyhealthy.netwrites:
Jens Thoms Toerring <jt@toerring.dewrote in message
news:6d************@mid.uni-berlin.de...
[...]
An alternative would be to use memmove() here, so you don't
have to do it byte by byte.
What would be the actual metrics to make a decision here
one way or the other? I myself just assume for this kind of stuff
that it will only be done on relatively small strings, just a few
characters, so it seems that a call to memmove() might be overkill.
Or is it? DOES it depend on the size of the block you'll
be moving? How many cycles does it take to assign one
character to a previous array position, versus the overhead of
a call to memmove(), and the operation of the function itself?

All this is system-specific.
This is not really responsive. I make a distinction between doing
an "delete" of about 20 characters from a text block of 300K and moving
10 characters one position forward. I find it difficult to believe that
some "specific" systems make no or counter-intuitive operational
distinctions between the two different methods of accomplishing the
operation.

But I'm "listening"; tell me how many cycles different "systems"
would consume to perform each method...or is memmove() kind
of like fread() and fwrite(), you THINK that they're doing something
"special", but really they're using fgetc() and fputc() which in most
cases you probably could have just used yourself and eliminated
the "middleman"...
[snip]
Can't go wrong there...I will admit that I use to always use array
sub-scripting such as the OP because I found it easier conceptually
to understand, but then abandoned it largely for the equivalent
pointers as part of my paranoia about compiler "optimizations"
(I turn them ALL off, which means sub-scripting is not converted
to pointers, which I believe takes longer, so to retain the speed
I've manually converted most of the stuff like this).

Subscripting might be faster than pointer manipulation on some
systems; pointer manipulation might be faster on others. A decent
optimizing compiler will know which is faster, and can often convert
either form into whichever is best for the target system.

Consider letting the compiler do its job.
The compiler has "told" me which it considers faster, since it is
documented that the default optimization is to replace sub-scripts
with pointers. Therefore I cannot really be preventing the compiler
from doing its "job" (your opinion) by pre-emptively writing my
code to do the same thing as the "optimizer", at least for this
particular "optimization".

Riiiiiiiiiiiiiiight? Now as far as some other optimizations are
concerned, I can't re-write my code to emulate those, but THOSE
tend to be the ones I am really concerned about in the first place,
with GOOD reason...

---
William Ernest Reid

Jul 12 '08 #38
"Bill Reid" <ho********@happyhealthy.netwrites:
<snip>
In any event, does the following win the prize for most efficient
implementation of the presumed requirements of the function?
I think you have a bug.
char *remove_beg_end_non_text(char *text) {
char *beg;
size_t length;

for(beg=text;*beg!='\0';beg++)
if(!isspace(*beg)) break;
I'd guess

for (beg = text; isspace((unsigned char)*beg); beg++);

makes shorter code with some compilers.
length=strlen(beg);

while(length>0)
if(!isspace(*(beg+(--length)))) break;

*(beg+(++length))='\0';
If length never is never decremented (because it was zero after the
initial space scan) then this writes outside the string. It always
helps to walk through what your code does in boundary cases like ""
and " ".
return beg==text ? text : memmove(text,beg,length+1);
}

Gotta admit, you couldn't reduce the cycles too much on
that, could you? And it could even win a little bonus prize
for obfuscatory conditions like if(!isspace(*(beg+(--length))))...
You might have obscure it even to yourself!

--
Ben.
Jul 12 '08 #39
Bill Reid wrote:
Keith Thompson <ks***@mib.orgwrote in message
news:ln************@nuthaus.mib.org...
>"Bill Reid" <ho********@happyhealthy.netwrites:
>>Jens Thoms Toerring <jt@toerring.dewrote in message
news:6d************@mid.uni-berlin.de...
[...]
>>>An alternative would be to use memmove() here, so you don't
have to do it byte by byte.
What would be the actual metrics to make a decision here
one way or the other? I myself just assume for this kind of stuff
that it will only be done on relatively small strings, just a few
characters, so it seems that a call to memmove() might be overkill.
Or is it? DOES it depend on the size of the block you'll
be moving? How many cycles does it take to assign one
character to a previous array position, versus the overhead of
a call to memmove(), and the operation of the function itself?
All this is system-specific.

This is not really responsive. ....I find it difficult to believe that
some "specific" systems make no or counter-intuitive operational
distinctions between the two different methods of accomplishing the
operation.
You misunderstand. Keith's point is that how this is done is not defined
by hte C language, it is specific to individual implementations. Your
system may do it differently to mine.

So to get a good answer, you need to ask in a system-specific news group.
But I'm "listening"; tell me how many cycles different "systems"
would consume to perform each method...
Between 0 and a billion, depending on how your particular system
implements the function.....

--
Mark McIntyre

CLC FAQ <http://c-faq.com/>
CLC readme: <http://www.ungerhu.com/jxh/clc.welcome.txt>
Jul 12 '08 #40
Bill Reid wrote:
Keith Thompson <ks***@mib.orgwrote in message
news:ln************@nuthaus.mib.org...
>"Bill Reid" <ho********@happyhealthy.netwrites:
>>Jens Thoms Toerring <jt@toerring.dewrote in message
news:6d************@mid.uni-berlin.de...
[...]
>>>An alternative would be to use memmove() here, so you don't
have to do it byte by byte.
What would be the actual metrics to make a decision here
one way or the other? I myself just assume for this kind of stuff
that it will only be done on relatively small strings, just a few
characters, so it seems that a call to memmove() might be overkill.
Or is it? DOES it depend on the size of the block you'll
be moving? How many cycles does it take to assign one
character to a previous array position, versus the overhead of
a call to memmove(), and the operation of the function itself?
All this is system-specific.

This is not really responsive.
Valid complaint: Although most of your paragraph was about
system-specific nonsense, the actual question "What would be
the actual metrics?" can be addressed at least in part without
system specificity. Here are some "actual metrics" for your
consideration; others may occur to your thought:

- Brevity
- Clarity
- Maintainability
- Program size
- Execution time
- Price of tea in China
But I'm "listening"; tell me how many cycles different "systems"
would consume to perform each method...or is memmove() kind
of like fread() and fwrite(), you THINK that they're doing something
"special", but really they're using fgetc() and fputc() which in most
cases you probably could have just used yourself and eliminated
the "middleman"...
All this is system-specific.

--
Eric Sosman
es*****@ieee-dot-org.invalid
Jul 12 '08 #41

Ben Bacarisse <be********@bsb.me.ukwrote in message
news:87************@bsb.me.uk...
"Bill Reid" <ho********@happyhealthy.netwrites:
<snip>
In any event, does the following win the prize for most efficient
implementation of the presumed requirements of the function?

I think you have a bug.
char *remove_beg_end_non_text(char *text) {
char *beg;
size_t length;

for(beg=text;*beg!='\0';beg++)
if(!isspace(*beg)) break;

I'd guess

for (beg = text; isspace((unsigned char)*beg); beg++);

makes shorter code with some compilers.
Yeah, but does it fly past the terminating null character? I think
you may have out-obfuscated me...

Also, I've not been following why the cast is important here...
length=strlen(beg);

while(length>0)
if(!isspace(*(beg+(--length)))) break;

*(beg+(++length))='\0';

If length never is never decremented (because it was zero after the
initial space scan) then this writes outside the string. It always
helps to walk through what your code does in boundary cases like ""
and " ".
Actually, the "boundary case" for this is the size of array; as long
as it is at least one character bigger than the "string" then this will
always work. But you are correct if you replace the word "string" with
the word "array fully-populated by the string" above.

There are actually only five possible test cases, which all should
ASSUME a "array fully-populated by the string" (which I unfortunately
didn't): spaces before text, spaces after the text, spaces before and
after the text, just spaces, empty string (six if you count the stupidity
of passing NULL). Pass those five (or six) cases, the thing is
perfect...
return beg==text ? text : memmove(text,beg,length+1);
}

Gotta admit, you couldn't reduce the cycles too much on
that, could you? And it could even win a little bonus prize
for obfuscatory conditions like if(!isspace(*(beg+(--length))))...

You might have obscure it even to yourself!
OK, I need to work on a few things to make it the perfect piece
of obscure ruthlessly efficient code...

---
William Ernest Reid

Jul 12 '08 #42

Mark McIntyre <ma**********@TROUSERSspamcop.netwrote in message
news:GK*********************@en-nntp-06.dc1.easynews.com...
Bill Reid wrote:
Keith Thompson <ks***@mib.orgwrote in message
news:ln************@nuthaus.mib.org...
"Bill Reid" <ho********@happyhealthy.netwrites:
Jens Thoms Toerring <jt@toerring.dewrote in message
news:6d************@mid.uni-berlin.de...
[...]
An alternative would be to use memmove() here, so you don't
have to do it byte by byte.
>What would be the actual metrics to make a decision here
one way or the other? I myself just assume for this kind of stuff
that it will only be done on relatively small strings, just a few
characters, so it seems that a call to memmove() might be overkill.
Or is it? DOES it depend on the size of the block you'll
be moving? How many cycles does it take to assign one
character to a previous array position, versus the overhead of
a call to memmove(), and the operation of the function itself?
All this is system-specific.
This is not really responsive. ....I find it difficult to believe that
some "specific" systems make no or counter-intuitive operational
distinctions between the two different methods of accomplishing the
operation.

You misunderstand. Keith's point is that how this is done is not defined
by hte C language, it is specific to individual implementations. Your
system may do it differently to mine.
You misunderstand the English language, since I specifically used
the word "system" above.
So to get a good answer, you need to ask in a system-specific news group.
OK, I'll take that as a "I have no idea"...
But I'm "listening"; tell me how many cycles different "systems"
would consume to perform each method...

Between 0 and a billion, depending on how your particular system
implements the function.....
OK, I'll take that as a "I REALLY have no idea", since all you had
to do was pick ANY system (or two or three) that you were familiar with
to answer the question...

---
William Ernest Reid

Jul 12 '08 #43

Eric Sosman <es*****@ieee-dot-org.invalidwrote in message
news:17******************************@comcast.com. ..
Bill Reid wrote:
Keith Thompson <ks***@mib.orgwrote in message
news:ln************@nuthaus.mib.org...
"Bill Reid" <ho********@happyhealthy.netwrites:
Jens Thoms Toerring <jt@toerring.dewrote in message
news:6d************@mid.uni-berlin.de...
[...]
An alternative would be to use memmove() here, so you don't
have to do it byte by byte.
>What would be the actual metrics to make a decision here
one way or the other? I myself just assume for this kind of stuff
that it will only be done on relatively small strings, just a few
characters, so it seems that a call to memmove() might be overkill.
Or is it? DOES it depend on the size of the block you'll
be moving? How many cycles does it take to assign one
character to a previous array position, versus the overhead of
a call to memmove(), and the operation of the function itself?
All this is system-specific.
This is not really responsive.

Valid complaint: Although most of your paragraph was about
system-specific nonsense, the actual question "What would be
the actual metrics?" can be addressed at least in part without
system specificity. Here are some "actual metrics" for your
consideration; others may occur to your thought:

- Brevity
- Clarity
- Maintainability
- Program size
- Execution time
- Price of tea in China
I am left wondering how you manage to cope with everyday life
with such poor reading comprehension. I specifically asked about
"cycles", and you respond with "price of tea"...
But I'm "listening"; tell me how many cycles different "systems"
would consume to perform each method...or is memmove() kind
of like fread() and fwrite(), you THINK that they're doing something
"special", but really they're using fgetc() and fputc() which in most
cases you probably could have just used yourself and eliminated
the "middleman"...

All this is system-specific.
I was looking for the "Answers Department", but accidentally walked
into the "Department of Begging the Question" (part of the "Clueless
Pedants Department", sharing a common desk with the "Pointless
Arguments Department")...

In any event, I guess "Jens Thoms Toerring" was totally off-base in
suggesting memmove() over re-writing the string in place, since he
clearly didn't have access to the non-information about the specific
"system"...

---
William Ernest Reid

Jul 12 '08 #44
Bill Reid <ho********@happyhealthy.netwrote:
>
This is not really responsive. I make a distinction between doing
an "delete" of about 20 characters from a text block of 300K and moving
10 characters one position forward. I find it difficult to believe that
some "specific" systems make no or counter-intuitive operational
distinctions between the two different methods of accomplishing the
operation.
Then you need to practice believing more things. :-)

On some systems, a call to memmove() will generate an actual call to a
library function that moves one byte at a time just like your loop, and
thus will always be slower.

On other systems, a call to memmove() will generate an actual call to a
library function that uses a few (maybe even just one) machine
instructions to move the bytes much faster than your byte at a time
loop. Which is faster depends on the number of bytes moved, how much
overhead there is in a function call, and how much faster the machine
instructions are than your loop. It is likely that your loop will be
faster for "short" strings and slower for "long" strings, but exactly
where the break-even point is will vary from system to system. It might
be just a few bytes, it might be tens of bytes, it might even be
hundreds of bytes (although probably not).

On still other systems, a call to memmove() will generate those few (or
maybe even just one) machine instructions inline, which will almost
certainly be faster than your byte at a time loop all the time.
or is memmove() kind
of like fread() and fwrite(), you THINK that they're doing something
"special", but really they're using fgetc() and fputc() which in most
cases you probably could have just used yourself and eliminated
the "middleman"...
Although the C standard describes fread()/fwrite() as working as if they
called fgetc()/fputc() for each byte, I've never seen an implementation
where they actually do that; real implementations nearly always *do* do
something special instead. On the other hand, I've seen actual
implementations use all three of the above methods of implementing
memmove().
--
Larry Jones

All this was funny until she did the same thing to me. -- Calvin
Jul 12 '08 #45
On Jul 11, 5:55 am, Jack Klein <jackkl...@spamcop.netwrote:
On Thu, 10 Jul 2008 10:56:18 -0700 (PDT), "swengineer...@gmail.com"
<snip i = strlen(s)-1>
i is of type size_t which I believe can't be negative?

You are correct that size_t is an unsigned integer type, and can never
be negative. But that doesn't mean that underflowing it won't cause a
very serious problem. Consider:
<snip (unsigned long)(size_t)-1>
>
Compile and execute this and see what you get.
I don't see underflowing occuring anywhere in that code. Unsigned
integers can not underflow or overflow.
I do see your point, but your terminology was not correct.
Only signed integers can overflow, and when that happends the behavior
is not defined.
Jul 12 '08 #46
"Bill Reid" <ho********@happyhealthy.netwrites:
Keith Thompson <ks***@mib.orgwrote in message
news:ln************@nuthaus.mib.org...
>"Bill Reid" <ho********@happyhealthy.netwrites:
Jens Thoms Toerring <jt@toerring.dewrote in message
news:6d************@mid.uni-berlin.de...
[...]
>An alternative would be to use memmove() here, so you don't
have to do it byte by byte.

What would be the actual metrics to make a decision here
one way or the other? I myself just assume for this kind of stuff
that it will only be done on relatively small strings, just a few
characters, so it seems that a call to memmove() might be overkill.
Or is it? DOES it depend on the size of the block you'll
be moving? How many cycles does it take to assign one
character to a previous array position, versus the overhead of
a call to memmove(), and the operation of the function itself?

All this is system-specific.

This is not really responsive.
Ok, here's a more responsive answer.

I don't know.

Specifically, I don't know how many cycles any given operation will
take on whatever system you're using, partly because I don't know what
system you're using. Also, I don't know how many cycles it would take
on the system(s) *I'm* using, since I've never bothered to measure it.

I would expect that a call to memmove() would be faster than an
equivalent explicit loop on some systems, and vice versa on others.
memmove() has some overhead to determine whether the operands overlap,
but if the compiler can determine that they don't, or if it knows
which way they overlap and knows how memcpy() behaves, it might be
able to avoid that overhead. memmove() or memcpy() likely has some
overhead due to the function call, but an optimizing compiler might
replace either with inline code in some cases. An implementation of
memmove() or memcpy() might take advantage of copying chunks larger
than 1 byte; whether it can do this might depend on the alignment of
the operands. And so forth.

Perfect knowledge of the C language (something I don't claim to have)
would not enable someone to answer your question; that should be a
clue that this is not the place to ask it. Consider measuring it
yourself, on your own system with your own code.

[...]
But I'm "listening";
I won't speculate on what the quotation marks are supposed to mean.
tell me how many cycles different "systems"
would consume to perform each method...
See above.
or is memmove() kind
of like fread() and fwrite(), you THINK that they're doing something
"special", but really they're using fgetc() and fputc() which in most
cases you probably could have just used yourself and eliminated
the "middleman"...
The behavior of fread() and fwrite() is defined in terms of fgetc()
and fputc(), but there's no requirement that they actually be
implemented that way. Conversely, fgetc() and fputc() typically use
buffering, which means that 1000 calls to fputc() won't necessarily be
faster or slower than an equivalent single call to fwrite(). Use
whatever is clearer.

[...]
>Consider letting the compiler do its job.

The compiler has "told" me which it considers faster, since it is
documented that the default optimization is to replace sub-scripts
with pointers. Therefore I cannot really be preventing the compiler
from doing its "job" (your opinion) by pre-emptively writing my
code to do the same thing as the "optimizer", at least for this
particular "optimization".
Perhaps for this particular compiler. Other compilers for other
systems might behave differently.
Riiiiiiiiiiiiiiight? Now as far as some other optimizations are
concerned, I can't re-write my code to emulate those, but THOSE
tend to be the ones I am really concerned about in the first place,
with GOOD reason...
--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
Nokia
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
Jul 12 '08 #47
"Bill Reid" <ho********@happyhealthy.netwrites:
Ben Bacarisse <be********@bsb.me.ukwrote in message
news:87************@bsb.me.uk...
>"Bill Reid" <ho********@happyhealthy.netwrites:
><snip>
In any event, does the following win the prize for most efficient
implementation of the presumed requirements of the function?

I think you have a bug.
char *remove_beg_end_non_text(char *text) {
char *beg;
size_t length;

for(beg=text;*beg!='\0';beg++)
if(!isspace(*beg)) break;

I'd guess

for (beg = text; isspace((unsigned char)*beg); beg++);

makes shorter code with some compilers.

Yeah, but does it fly past the terminating null character?
Nope. Why would you think that?
I think
you may have out-obfuscated me...
I am happy to oblige. Those were your rules: you wanted minimal code;
your own example suggests clarity does not matter (although I think my
loop is clearer, but that is simply option).
Also, I've not been following why the cast is important here...
length=strlen(beg);

while(length>0)
if(!isspace(*(beg+(--length)))) break;

*(beg+(++length))='\0';

If length never is never decremented (because it was zero after the
initial space scan) then this writes outside the string. It always
helps to walk through what your code does in boundary cases like ""
and " ".

Actually, the "boundary case" for this is the size of array; as long
as it is at least one character bigger than the "string" then this will
always work. But you are correct if you replace the word "string" with
the word "array fully-populated by the string" above.

There are actually only five possible test cases, which all should
ASSUME a "array fully-populated by the string" (which I unfortunately
didn't): spaces before text, spaces after the text, spaces before and
after the text, just spaces, empty string (six if you count the stupidity
of passing NULL). Pass those five (or six) cases, the thing is
perfect...
I am getting lost in all the words. I think your code is wrong (in
two of the five cases you are considering). Are you saying it is
correct? Like many "off by one" bugs it may not produce undefined
behaviour. For example passing char test[2] = ""; is fine because of
the extra byte, but it is still a bug.
return beg==text ? text : memmove(text,beg,length+1);
}

Gotta admit, you couldn't reduce the cycles too much on
that, could you? And it could even win a little bonus prize
for obfuscatory conditions like if(!isspace(*(beg+(--length))))...

You might have obscure it even to yourself!

OK, I need to work on a few things to make it the perfect piece
of obscure ruthlessly efficient code...
I'd want to be sure it is correct first. I would be the first admit I
make mistakes but you have not persuaded me that I am wrong about
this. I tried to be as clear about the bug as possible.

--
Ben.
Jul 12 '08 #48
Bill Reid wrote:
I was looking for the "Answers Department",
Answers for what sort of questions? The price of tea in china? how to
iron shirts? Did you try yahoo answers?

but accidentally walked
into the "Department of Begging the Question" (part of the "Clueless
Pedants Department", sharing a common desk with the "Pointless
Arguments Department")...
Feel free to insult the experts who hang out here, I'm sure they'll be
even keener to help you after you've been rude to them.
---
William Ernest Reid
ps one too many dashes, and no space in your sigsep.

--
Mark McIntyre

CLC FAQ <http://c-faq.com/>
CLC readme: <http://www.ungerhu.com/jxh/clc.welcome.txt>
Jul 12 '08 #49
Bill Reid wrote:
>(I wrote)
>So to get a good answer, you need to ask in a system-specific news group.

OK, I'll take that as a "I have no idea"...
Take it how you like. It means what it says.
For the record, I have several answers, depending on which system you're
interested in. However you have zero chance of getting any of them from
me now.
>all you had
to do was pick ANY system (or two or three) that you were familiar with
to answer the question...
And had you asked the question over in a system-specific group, I'd have
answered with a relevant answer.

However its not topical here, so given your abusive attitude I suggest
you swivel on it.
--
Mark McIntyre

CLC FAQ <http://c-faq.com/>
CLC readme: <http://www.ungerhu.com/jxh/clc.welcome.txt>
Jul 13 '08 #50

This discussion thread is closed

Replies have been disabled for this discussion.

By using this site, you agree to our Privacy Policy and Terms of Use.