473,508 Members | 3,833 Online
Bytes | Software Development & Data Engineering Community
+ Post

Home Posts Topics Members FAQ

An Example for Discussion

I've been away from C programming since 1999 and I've recently taken it
up again. Below is a string function I wrote this morning that will be
going into my home-grown library of string functions. I brought the
idea of it from a function of the same name in PL/SQL. I thought it
would be handy in my C library. Mine differs in that the source string
argument is modified and returned from the function.

Working with C strings has always been problematic because you can crash
a program faster than you can say "Jack's you're uncle!" if you are not
careful. Some of my string functions I take the strcpy() approach and
copy in the modified string into a destination pointer that you can only
hope is pointing to something big enough and in your program's data
space. With others I modify the source string in place and return a
pointer to it. With this function I felt it safe to modify the string
in place because at worst one character would be changed to another and
at best the character would be removed from the source string, so there
is no chance of overflowing the buffer. Of course, someone will try to
use a string constant for the sorce string (i.e. "This is a string").
I thought I would post the code and see what the experts had to say. For
reference "strtools.h" is my header file for my homegrown string
functions, and chrsubst() is used to substitute characters in a string,
and str_rmchars() removes a set of characters from a string.

/**FUNCTION**************************************** ********************/
/* Name: */
/* translate(). */
/* */
/* Synopsis: */
/* #include <strings.h> */
/* #include "strtools.h" */
/* char *translate(char *src, char *fromstr, char *tostr); */
/* */
/* Description: */
/* The translate function will translate characters that appear */
/* in both the src and fromstr strings. If the fromstr character */
/* has a corresponding value in the tostr string all characters */
/* in the src string will be translated to that of the */
/* corresponding tostr character. If there is no corresponding */
/* character in the tostr string the character will be removed */
/* from the source string. */
/* */
/* Arguments: */
/* char *source - The string to be examined and translated. */
/* char *fromstr- The characters to translate in the source. */
/* char *tostr - The corresponding characters the source */
/* characters are translated to. */
/* */
/* Return Value: */
/* The starting address of the source string. */
/* */
/************************************************** **********FUNCTION**/

char *
translate ( char *p_source, char *p_from_string, char *p_to_string )
{
unsigned x_sub = 0, l_len = strlen( p_to_string );

while ( *p_from_string ) {

if ( strchr( p_source, *p_from_string ) != NULL ) {
if ( x_sub < l_len )
chrsubst( p_source, *p_from_string, p_to_string[x_sub] );
else {
char l_ch[] = "";
l_ch[0] = *p_from_string;
str_rmchars( p_source, l_ch );
}
}

x_sub++;
p_from_string++;

}

return p_source;
}
Nov 14 '05 #1
11 1408
Stan Milam <st*****@swbell.net> wrote:
I've been away from C programming since 1999 and I've recently taken it
up again. Below is a string function I wrote this morning that will be
going into my home-grown library of string functions. I brought the
idea of it from a function of the same name in PL/SQL. I thought it
would be handy in my C library. Mine differs in that the source string
argument is modified and returned from the function. Working with C strings has always been problematic because you can crash
a program faster than you can say "Jack's you're uncle!" if you are not
careful. Some of my string functions I take the strcpy() approach and
copy in the modified string into a destination pointer that you can only
hope is pointing to something big enough and in your program's data
space. With others I modify the source string in place and return a
pointer to it. With this function I felt it safe to modify the string
in place because at worst one character would be changed to another and
at best the character would be removed from the source string, so there
is no chance of overflowing the buffer. Of course, someone will try to
use a string constant for the sorce string (i.e. "This is a string").
I thought I would post the code and see what the experts had to say. For
reference "strtools.h" is my header file for my homegrown string
functions, and chrsubst() is used to substitute characters in a string,
and str_rmchars() removes a set of characters from a string. /**FUNCTION**************************************** ********************/
/* Name: */
/* translate(). */
/* */
/* Synopsis: */
/* #include <strings.h> */
/* #include "strtools.h" */
/* char *translate(char *src, char *fromstr, char *tostr); */
/* */
/* Description: */
/* The translate function will translate characters that appear */
/* in both the src and fromstr strings. If the fromstr character */
/* has a corresponding value in the tostr string all characters */
/* in the src string will be translated to that of the */
/* corresponding tostr character. If there is no corresponding */
/* character in the tostr string the character will be removed */
/* from the source string. */
/* */
/* Arguments: */
/* char *source - The string to be examined and translated. */
/* char *fromstr- The characters to translate in the source. */
/* char *tostr - The corresponding characters the source */
/* characters are translated to. */
/* */
/* Return Value: */
/* The starting address of the source string. */
/* */
/************************************************** **********FUNCTION**/ char *
translate ( char *p_source, char *p_from_string, char *p_to_string )
{
unsigned x_sub = 0, l_len = strlen( p_to_string );
Since strlen() returns a size_t type I would think that it's
prudent to keep that type for both 'x_sub' and 'l_len'.
while ( *p_from_string ) { if ( strchr( p_source, *p_from_string ) != NULL ) {
if ( x_sub < l_len )
chrsubst( p_source, *p_from_string, p_to_string[x_sub] );
else {
char l_ch[] = "";
Here you create an array with 1 element.
l_ch[0] = *p_from_string;
Now 'l_ch' isn't a string anymore (the final '\0' is missing) but
just an array (with on char)
str_rmchars( p_source, l_ch );
Since you pass an array as the second argument but you can't determine
in the called function how long it is (you neither pass the length nor
has what you pass to it a final '\0') this smells extremely fishy, at
least if I assume correctly that you wouldn't pass it an array of chars
when the function only operates on a single char.
} } x_sub++;
p_from_string++; } return p_source;
}


A few remarks concerning efficiency:

a) You can stop with your main loop immediately when 'x_sub' is equal
to l_len' - all left in this case is calling str_rmchar with what
'p_from_string' then points to (you will have to test if the chars
are in 'p_source' within the function anyway in order to find them,
so it doesn't make too much sense to do that also in the callin
function).

b) Since you also need to find in chrsubstr() where the function to be
replaced is located you automatically test if it's there at all, so
also for this you don't need the strstr() call in the caller. Actu-
ally, what chrsubst() might be that simple that you can avoid the
fucntion call overhead.

So I guess you can reduce the whole function to (untested!)

char *
translate( char *p_source, char *p_from_string, char *p_to_string )
{
size_t x_sub = 0,
from_len = strlen( p_from_string ),
to_len = strlen( p_to_string );

if ( from_len > to_len )
{
str_rmchars( p_source, p_from_string + to_len );
from_len = to_len;
}

for ( ; x_sub < from_len; x_sub++, p_from_string++ )
{
char *where = p_source;
while ( ( where = strchr( where, *p_from_string ) ) != NULL )
*where++ = p_to_string[ x_sub ];
}

return p_source;
}

again assuming that str_rmchar() removes all chars that are passed via
the second argument.
Regards, Jens
--
\ Jens Thoms Toerring ___ Je***********@physik.fu-berlin.de
\__________________________ http://www.toerring.de
Nov 14 '05 #2

On Fri, 4 Mar 2005, Stan Milam wrote:

I've been away from C programming since 1999 and I've recently taken it up
again. Below is a string function I wrote this morning that will be going
into my home-grown library of string functions. I brought the idea of it
from a function of the same name in PL/SQL.
This being a C group, and not an Oracle group, one shouldn't expect the
readers to know the PL/SQL library. I went and tracked it down. It turns
out that the 'TRANSLATE' function in PL/SQL does basically what the 'tr'
command in *nix does by default: given a list of "from" characters and a
matching list of "to" characters, replace each "from[i]" with "to[i]" in
the given string.
Working with C strings has always been problematic because you can crash a
program faster than you can say "Jack's you're uncle!" if you are not
careful.
True enough.
I thought I would post the code and see what the experts had to say. For
reference "strtools.h" is my header file for my homegrown string functions,
and chrsubst() is used to substitute characters in a string, and
str_rmchars() removes a set of characters from a string.
That's not very descriptive. Moreover, I think your aggressive
"modularization" ends up hurting both clarity and performance --- and
not just because of the weird function names!

(All criticism above and below is intended in the best of spirits, of
course. I hope it's useful to you. Now, on with the dissection!)

/**FUNCTION**************************************** ********************/
/* Name: */
/* translate(). */
Seriously, if the reader can't tell this much from the code, do you
/really/ want him reading it? This entire comment block is much too
verbose; it's almost longer than the function, and it /will/ be longer
than the function once I get through with it!
/* */
/* Synopsis: */
/* #include <strings.h> */
You never explained what the above header is. Is it a typo, perhaps?
And why should the reader care what dependencies this function has? That's
the implementor's job --- to make sure that the source file that defines
this function #includes <strings.h> (or <string.h>) at the top. There's
no reason to repeat any of this stuff here.
/* #include "strtools.h" */
/* char *translate(char *src, char *fromstr, char *tostr); */
This is doubly weird: first, because again the reader ought to be able
to read a prototype himself without seeing it twice; and second, because
the prototype above uses completely different variable names from the
one you actually use in the source code. They should match; there's no
reason they shouldn't.
/* */
/* Description: */
/* The translate function will translate characters that appear */
/* in both the src and fromstr strings. If the fromstr character */
/* has a corresponding value in the tostr string all characters */
/* in the src string will be translated to that of the */
/* corresponding tostr character. If there is no corresponding */
/* character in the tostr string the character will be removed */
/* from the source string. */
This is both dense and vague. I didn't understand what it was trying
to say until I read the appropriate page in the SQL/PL manual. Surely
there's a better way to describe the procedure! (In fact, part of your
problem is that the reader can't immediately tell from the source code
what's going on. Once we clarify the code, the algorithm will be
obvious, and thus will need less prose explanation.)
/* Arguments: */
/* char *source - The string to be examined and translated. */
/* char *fromstr- The characters to translate in the source. */
/* char *tostr - The corresponding characters the source */
/* characters are translated to. */
/* */
/* Return Value: */
/* The starting address of the source string. */
/* */
/************************************************** **********FUNCTION**/

char *
translate ( char *p_source, char *p_from_string, char *p_to_string )
Hungarian notation. And done wrong. Since when does "p" stand for
"string"? None of those three parameters is a "pointer" to any data
type in the abstract sense; they're all strings. If you were going to
use any Hungarian prefix, you'd be using "s": 's_source', 's_from',
and so on. ('s_from_string' would be a stupid name, as I'm sure you can
see.)
Drop the obfuscatory notation.
{
unsigned x_sub = 0, l_len = strlen( p_to_string );

while ( *p_from_string ) {
(I disagree with your whitespace style, too, but that's a personal
religious issue. I mention it only because I'll be using my own style
later on, so if you want to see my suggested style, you should scroll
down now.)
if ( strchr( p_source, *p_from_string ) != NULL ) {
if ( x_sub < l_len )
chrsubst( p_source, *p_from_string, p_to_string[x_sub] );
I have no idea what this is doing, but I bet it scans through the
entire string 'p_source', replacing each instance of '*p_from_string'
with 'p_to_string[x_sub]'. And you do this 'l_len' times. That's a
lot of looping for little reason. And I've already mentioned the poor
naming scheme: 'chrsubst'!?
else {
char l_ch[] = "";
This is equivalent to
char l_ch[1] = {0};
The following two lines lead me to believe that perhaps this is another
typo, that you meant 'char l_ch[2]' instead. Otherwise, why bother making
a copy of a single character?
l_ch[0] = *p_from_string;
str_rmchars( p_source, l_ch );
}
}

x_sub++;
p_from_string++;
Any time you see a loop, in C, which ends with an increment or a
pointer-chasing assignment, that should send up warning flags that
something hasn't been thought out properly. Here, it indicates that
you're missing a 'for' loop. See below.
}

return p_source;
}

So that was the code. Now here's my overall analysis.
You end a 'while' loop with an increment. That's the sign of a 'for'
loop struggling to get out. So we rewrite the loop:

[...] unsigned x_sub;

for (x_sub=0; *p_from_string; ++x_sub, ++p_from_string) {
if ( strchr( p_source, *p_from_string ) != NULL ) { [...] }
} [...]

The next thing I noticed was that now we have a 'for' loop with two
increments in the increment clause. That's a sign that we have dependent
loop variables that need refactoring --- and indeed, we are really using
'x_sub' as a loop counter, and merely incrementing 'p_from_string' to keep
it pointing at the 'x_sub'th element of that array. So we merge the two
dependent variables; that yields
char *translate(char *p_source, char *p_from_string, char *p_to_string)
{
unsigned l_len = strlen( p_to_string );
unsigned x_sub;

for (x_sub=0; p_from_string[x_sub]; ++x_sub) {
if (strchr(p_source, p_from_string[x_sub]) != NULL) {
if ( x_sub < l_len )
chrsubst(p_source, p_from_string[x_sub], p_to_string[x_sub]);
else {
char l_ch[1];
l_ch[0] = p_from_string[x_sub];
str_rmchars(p_source, l_ch);
}
}
}

return p_source;
}


Suddenly the parallel structure in the call to 'chrsubst' stands out!
We have de-obfuscated a critical part of the algorithm: We now clearly
see that characters from 'p_from_string' are matched only with the
corresponding characters from 'p_to_string'.

Next, I prefer to remove one level of indentation by changing that
first gigantic 'if' to a conditional 'continue'; at the same time, to
conserve even more space, I'll switch to a saner naming convention.
The two input-only parameters are properly const-qualified. This yields
what I consider the "clean" version of the procedure you wrote:

/*
Translates characters that appear in the |from| string to their
corresponding characters in the |to| string. If |to| is shorter
than |from|, the unmatched |from| characters are deleted from |src|.
Overwrites |src| with the output string, and returns the new |src|.
*/
char *translate(char *src, const char *from, const char *to)
{
size_t len_to = strlen(to);
size_t i;

for (i=0; from[i] != '\0'; ++i)
{
if (strchr(src, from[i]) == NULL)
continue;
if (i < len_to) {
chrsubst(src, from[i], to[i]);
}
else {
char ch[1]; /* Should probably be char ch[2] = {0}; */
ch[0] = from[i];
str_rmchars(src, ch);
}
}

return src;
}

But we're not done! Remember, I wanted to get rid of those weird
library-routine calls for two reasons: first, for clarity, and second,
for efficiency. There's no reason to be scanning so many times through
the input string! What's more, we can really elucidate the algorithm
much better with "simple" code like this:

char *translate(char *src, const char *from, const char *to)
{
size_t in, out;
size_t len_to = strlen(to);
for (in=out=0; src[in] != '\0'; ++in)
{
/* Is the current character in the |from| array? */
const char *t = strchr(from, src[in]);
/* No; just echo it to the output string */
if (t == NULL)
src[out++] = src[in];
/* Yes; if it has a match in |to|, then echo that entry. */
else if (t - from < len_to)
src[out++] = to[t - from];
}

src[out] = '\0';
return src;
}
A version of this routine using the 'strpbrk' library function
might be interesting to see, also, though I doubt it would be any
cleaner or quicker.

Anyway, I hope you get the idea --- "clean and simple" beats pretty
much any other strategy, any day. I highly recommend "The Elements of
Programming Style" (Kernighan & Plauger) to anyone who wants to learn
to write (or critique!) source code.

HTH,
-Arthur
Nov 14 '05 #3
Arthur J. O'Dwyer wrote:

On Fri, 4 Mar 2005, Stan Milam wrote:

This being a C group, and not an Oracle group, one shouldn't expect the
readers to know the PL/SQL library. I went and tracked it down. It turns
out that the 'TRANSLATE' function in PL/SQL does basically what the 'tr'
command in *nix does by default: given a list of "from" characters and a
matching list of "to" characters, replace each "from[i]" with "to[i]" in
the given string.
I have used the 'tr' command very little in my UNIX career, but I found
the PL/SQL translate function very handy. I thought it would be fun to
implement one in C now that I am writing with it again (along with
PL/SQL). Pretty much all the more-or-less standard C functions were
conceived to solve some problem. And a lot of them have funny names
too. But who could beat IEBR14 (I think that is what it was called) and
IEBGENR from the IBM mainframe world?
I thought I would post the code and see what the experts had to say.
For reference "strtools.h" is my header file for my homegrown string
functions, and chrsubst() is used to substitute characters in a
string, and str_rmchars() removes a set of characters from a string.

That's not very descriptive. Moreover, I think your aggressive
"modularization" ends up hurting both clarity and performance --- and
not just because of the weird function names!


Well, I thought it much better than what is many times provided by other
posters, which is nothing.

(All criticism above and below is intended in the best of spirits, of
course. I hope it's useful to you. Now, on with the dissection!)

OK, fair enough.
/**FUNCTION**************************************** ********************/
/* Name: */
/* translate(). */

Seriously, if the reader can't tell this much from the code, do you
/really/ want him reading it? This entire comment block is much too
verbose; it's almost longer than the function, and it /will/ be longer
than the function once I get through with it!


There is a method to my madness. I write the documentation when I write
the function. This is pretty much the documentation. I use a shell
script to extract these comment blocks an reformat them for printing.
/* */
/* Synopsis: */
/* #include <strings.h> */

You never explained what the above header is. Is it a typo, perhaps?
And why should the reader care what dependencies this function has? That's
the implementor's job --- to make sure that the source file that defines
this function #includes <strings.h> (or <string.h>) at the top. There's
no reason to repeat any of this stuff here.


Yep, I fat fingered it. It should be <string.h>. Again, this becomes
the written document for the function description.
/* #include "strtools.h" */
/* char *translate(char *src, char *fromstr, char *tostr); */

This is doubly weird: first, because again the reader ought to be able
to read a prototype himself without seeing it twice; and second, because
the prototype above uses completely different variable names from the
one you actually use in the source code. They should match; there's no
reason they shouldn't.


No this is what is in the user docs. They do not need to know what I
called them internally.


/* */
/* Description: */
/* The translate function will translate characters that appear */
/* in both the src and fromstr strings. If the fromstr character */
/* has a corresponding value in the tostr string all characters */
/* in the src string will be translated to that of the */
/* corresponding tostr character. If there is no corresponding */
/* character in the tostr string the character will be removed */
/* from the source string. */

This is both dense and vague. I didn't understand what it was trying
to say until I read the appropriate page in the SQL/PL manual. Surely
there's a better way to describe the procedure! (In fact, part of your
problem is that the reader can't immediately tell from the source code
what's going on. Once we clarify the code, the algorithm will be
obvious, and thus will need less prose explanation.)


Thanks. I'll rewrite it. I've been reading too much IBM documentation!
/* Arguments: */
/* char *source - The string to be examined and translated. */
/* char *fromstr- The characters to translate in the source. */
/* char *tostr - The corresponding characters the source */
/* characters are translated to. */
/* */
/* Return Value: */
/* The starting address of the source string. */
/* */
/************************************************** **********FUNCTION**/

char *
translate ( char *p_source, char *p_from_string, char *p_to_string )

Hungarian notation. And done wrong. Since when does "p" stand for
"string"? None of those three parameters is a "pointer" to any data
type in the abstract sense; they're all strings. If you were going to
use any Hungarian prefix, you'd be using "s": 's_source', 's_from',
and so on. ('s_from_string' would be a stupid name, as I'm sure you can
see.)
Drop the obfuscatory notation.


Ah, this is a holdover from my PL/SQL days. PL/SQL does not have
pointers so that did not even occur to me. No, it is not Hungarian
notation, it is Milam notation. I've tried Hungarian and I have come to
truly despise it. The p_ prefix simply means it is a PARAMETER
variable, and the l_ prefix means it is a LOCAL variable. I use the
x_sub, y_sub etc... name for subscripts. Just a glance and I know what
I am dealing with.
{
unsigned x_sub = 0, l_len = strlen( p_to_string );

while ( *p_from_string ) {

(I disagree with your whitespace style, too, but that's a personal
religious issue. I mention it only because I'll be using my own style
later on, so if you want to see my suggested style, you should scroll
down now.)


I like mine. It has been a long time evolving. But thanks for the offer.
if ( strchr( p_source, *p_from_string ) != NULL ) {
if ( x_sub < l_len )
chrsubst( p_source, *p_from_string, p_to_string[x_sub] );

I have no idea what this is doing, but I bet it scans through the
entire string 'p_source', replacing each instance of '*p_from_string'
with 'p_to_string[x_sub]'. And you do this 'l_len' times. That's a
lot of looping for little reason. And I've already mentioned the poor
naming scheme: 'chrsubst'!?


When I wrote that function I was working with a compiler that only
allowed a length of 8 on external function names. So, that's been
awhile and I never changed the name. There are a lot of hold-over names
in software where the operating environment dictated constraints. I'll
create an alias name for chrsubst and mark it as deprecated.
else {
char l_ch[] = "";

This is equivalent to
char l_ch[1] = {0};
The following two lines lead me to believe that perhaps this is another
typo, that you meant 'char l_ch[2]' instead. Otherwise, why bother making
a copy of a single character?


Hey, you are right! Thanks! (I slap myself on the head and say "What
were you thinking?")
l_ch[0] = *p_from_string;
str_rmchars( p_source, l_ch );
}
}

x_sub++;
p_from_string++;

Any time you see a loop, in C, which ends with an increment or a
pointer-chasing assignment, that should send up warning flags that
something hasn't been thought out properly. Here, it indicates that
you're missing a 'for' loop. See below.


Why should there be warning flags? Has the while keyword fallen into
disfavor? I see no advantage of one over the other. It is a matter of
preference.
}

return p_source;
}
So that was the code. Now here's my overall analysis.
You end a 'while' loop with an increment. That's the sign of a 'for'
loop struggling to get out. So we rewrite the loop:

[...]
unsigned x_sub;

for (x_sub=0; *p_from_string; ++x_sub, ++p_from_string) {
if ( strchr( p_source, *p_from_string ) != NULL ) {


[...]


See my comments above. This is like saying we should all write at the
same level, use the style. C is a language and it allows enough
latitude to express ourselves - however we feel on a given day.
}
}
[...]

The next thing I noticed was that now we have a 'for' loop with two
increments in the increment clause. That's a sign that we have dependent
loop variables that need refactoring --- and indeed, we are really using
'x_sub' as a loop counter, and merely incrementing 'p_from_string' to
keep it pointing at the 'x_sub'th element of that array. So we merge the
two
dependent variables; that yields
char *translate(char *p_source, char *p_from_string, char *p_to_string)
{
unsigned l_len = strlen( p_to_string );
unsigned x_sub;

for (x_sub=0; p_from_string[x_sub]; ++x_sub) {
if (strchr(p_source, p_from_string[x_sub]) != NULL) {
if ( x_sub < l_len )
chrsubst(p_source, p_from_string[x_sub],
p_to_string[x_sub]);
else {
char l_ch[1];
l_ch[0] = p_from_string[x_sub];
str_rmchars(p_source, l_ch);
}
}
}

return p_source;
}


Yes that is an improvement.


Suddenly the parallel structure in the call to 'chrsubst' stands out!
We have de-obfuscated a critical part of the algorithm: We now clearly
see that characters from 'p_from_string' are matched only with the
corresponding characters from 'p_to_string'.

Next, I prefer to remove one level of indentation by changing that
first gigantic 'if' to a conditional 'continue'; at the same time, to
conserve even more space, I'll switch to a saner naming convention.
The two input-only parameters are properly const-qualified. This yields
what I consider the "clean" version of the procedure you wrote:

I'll go for that.
/*
Translates characters that appear in the |from| string to their
corresponding characters in the |to| string. If |to| is shorter
than |from|, the unmatched |from| characters are deleted from |src|.
Overwrites |src| with the output string, and returns the new |src|.
*/
char *translate(char *src, const char *from, const char *to)
{
size_t len_to = strlen(to);
size_t i;

for (i=0; from[i] != '\0'; ++i)
{
if (strchr(src, from[i]) == NULL)
continue;
if (i < len_to) {
chrsubst(src, from[i], to[i]);
}
else {
char ch[1]; /* Should probably be char ch[2] = {0}; */
ch[0] = from[i];
str_rmchars(src, ch);
}
}

return src;
}

But we're not done! Remember, I wanted to get rid of those weird
library-routine calls for two reasons: first, for clarity, and second,
for efficiency. There's no reason to be scanning so many times through
the input string! What's more, we can really elucidate the algorithm
much better with "simple" code like this:

char *translate(char *src, const char *from, const char *to)
{
size_t in, out;
size_t len_to = strlen(to);
for (in=out=0; src[in] != '\0'; ++in)
{
/* Is the current character in the |from| array? */
const char *t = strchr(from, src[in]);
/* No; just echo it to the output string */
if (t == NULL)
src[out++] = src[in];
/* Yes; if it has a match in |to|, then echo that entry. */
else if (t - from < len_to)
src[out++] = to[t - from];
}

src[out] = '\0';
return src;
}
A version of this routine using the 'strpbrk' library function
might be interesting to see, also, though I doubt it would be any
cleaner or quicker.

Anyway, I hope you get the idea --- "clean and simple" beats pretty
much any other strategy, any day. I highly recommend "The Elements of
Programming Style" (Kernighan & Plauger) to anyone who wants to learn
to write (or critique!) source code.

HTH,
-Arthur


Thanks. I have been sitting on the bench for a couple of years. I have
another life besides being a programmer and I decided to live it for a
while. The problem has been my skills are a bit rusty and I've got to
get back "In The Zone."

Regards,
Stan Milam.
Nov 14 '05 #4
Arthur J. O'Dwyer wrote:

But we're not done! Remember, I wanted to get rid of those weird
library-routine calls for two reasons: first, for clarity, and second,
for efficiency. There's no reason to be scanning so many times through
the input string! What's more, we can really elucidate the algorithm
much better with "simple" code like this:

char *translate(char *src, const char *from, const char *to)
{
size_t in, out;
size_t len_to = strlen(to);
for (in=out=0; src[in] != '\0'; ++in)
{
/* Is the current character in the |from| array? */
const char *t = strchr(from, src[in]);
/* No; just echo it to the output string */
if (t == NULL)
src[out++] = src[in];
/* Yes; if it has a match in |to|, then echo that entry. */
else if (t - from < len_to)
src[out++] = to[t - from];
}

src[out] = '\0';
return src;
}

You're right, we are not done. Your version does not do the same thing
as mine. Here are the differences:

1. In mine if the char is in the "to" and "from" strings, but not the
"source" string then leave the source character alone. So far we agree.

2. In mine when the character is in the "source" and "from" strings
then *all instances* of the character are replaced with the
corresponding character from the "to" string. Hence the call to
chrsubst() in my code. Yours does too but it is not readily obvious what
you are doing. With mine I took the what I call the straight-ahead
approach and once this condition is met I change them all. But I like
yours very much as you only change the characters when you have to with
no function call.

3. In mine when the character is found in both the "source" and "from"
strings but there is no corresponding character in the "to" string *all
instances* of the character will be squeezed out of the "source" string.
This is what str_rmchars() does. A casual glance at yours tells me
yours does not squeeze out the characters at all.

A version of this routine using the 'strpbrk' library function
might be interesting to see, also, though I doubt it would be any
cleaner or quicker.

Anyway, I hope you get the idea --- "clean and simple" beats pretty
much any other strategy, any day. I highly recommend "The Elements of
Programming Style" (Kernighan & Plauger) to anyone who wants to learn
to write (or critique!) source code.

HTH,
-Arthur


I really did not this thread to be about functionality or writing style,
but rather the legitimacy of modifying the source string. Also, one of
my programming tenants is to not re invent the wheel, hence the calls to
my existing routines. I have heard the argument about the overhead of
setting up and calling a function before. My answer has always been "so
what?, Programmer time is much more expensive than computer time." As an
example many years ago I was asked by my employer to review C code for a
rather large program. I was horrified to see the contractors had
written in-line, data specific quick sorts and binary search routines.
When asked why they did not use qsort() and bsearch() their reply was
they did not want to incur the overhead of calling a library routine.
They also wrote their own versions of standard library calls like
strcpy() because they thought they could do it better than the vendor
when the vendor docs clearly stated that when optimization was turned on
the string functions would be turned into in-line assembly code. I was
livid. These guys had spent many hours at significant dollar rates to
write, test, and document code that did not need to be written.

Arthur, you have helped tremendously. Thanks.

Regards,
Stan Milam.
Nov 14 '05 #5
>Arthur J. O'Dwyer wrote:

[snipped to just the code]
char *translate(char *src, const char *from, const char *to)
{
size_t in, out;
size_t len_to = strlen(to);
for (in=out=0; src[in] != '\0'; ++in)
{
/* Is the current character in the |from| array? */
const char *t = strchr(from, src[in]);
/* No; just echo it to the output string */
if (t == NULL)
src[out++] = src[in];
/* Yes; if it has a match in |to|, then echo that entry. */
else if (t - from < len_to)
src[out++] = to[t - from];
}

src[out] = '\0';
return src;
}

In article <GL*****************@newssvr30.news.prodigy.com>
Stan Milam <st*****@swbell.net> wrote:You're right, we are not done. Your version does not do the same thing
as mine. Here are the differences:

1. In mine if the char is in the "to" and "from" strings, but not the
"source" string then leave the source character alone. So far we agree.
I am not sure why you list this as a "difference"... :-)
2. In mine when the character is in the "source" and "from" strings
then *all instances* of the character are replaced with the
corresponding character from the "to" string.
This happens in Arthur's as well. He loops over all characters in
the string -- the array whose first element "src" points to -- and
applies the "replace or delete character" algorithm to each character.
... Yours does too
So this is not a difference either. (Or is it? See below.)
but it is not readily obvious what you are doing.
It seems quite clear to me (thought I would probably have used
somewhat different variable names).
With mine I took the what I call the straight-ahead
approach and once this condition is met I change them all.
This has a potential problem. Suppose we replace "a" with "b", and
"b" with "c", i.e.:

char buf[] = "abcdedcba";

(void) translate(buf, "ab", "bc");

If you first replace *all* the 'a's with 'b's, we change buf[]'s
contents from "abcdedcba" to "bbcdedcbb". Next, we replace *all*
the 'b's with 'c's, producing "cccdedccc".

I suspect that anyone making that call would *want* "bccdedccb"
in the buf[] array. To obtain that as the result, the translation
must not be "re-applied" to translated characters. To me, the
easiest way to guarantee this is to make a clear distinction
between "as-yet-untranslated" characters and "after-translation"
characters. In Arthur's code, this is precisely the same as the
array index variable "in": characters at src[in] and beyond are
not-yet-translated; other characters are.

Of course, if "cccdedccc" *is* the desired result, Arthur's code
will not achieve it.
3. In mine when the character is found in both the "source" and "from"
strings but there is no corresponding character in the "to" string *all
instances* of the character will be squeezed out of the "source" string.
Arthur's does this as well, because the "translate single character"
algorithm (embedded within the loop) is:

if (t == NULL)
src[out++] = src[in];
/* Yes; if it has a match in |to|, then echo that entry. */
else if (t - from < len_to)
src[out++] = to[t - from];
/* else do nothing, which effectively deletes the character */

Note that, initially, out==in. If every "from" character has a
corresponding "to" character, so that (t - from) is always a
valid index in "to", this "out==in" condition will remain true
throughout the entire loop, so that the antepenultimate line
in the function:

src[out] = '\0';

does not change src[out] at all. On the other hand, if some
"from" characters are to be deleted -- e.g., from[] = "abc"
while to[] = "bc", which will change buf[] = "abcdedcba" to
"bcdedba\0a\0" -- then, in this case, "out" sometimes does not
increment when "in" does, so that eventually out < in. This
is why the src[out] = '\0' line is included.
I really did not this thread to be about functionality or writing style,
but rather the legitimacy of modifying the source string.
There is nothing ultimately *wrong* with this, although if one
is to do it, I, at least, would not call it a "source string"
at all, but rather a "buffer" or something similar. If one is
to modify the array, this must be made clear to callers, so
that they do not attempt things like this:

char *result = translate("hello", "h", "j");

This code has undefined behavior, because translate() attempts to
replace the "h" in the anonymous array created by the string
literal "hello", which may be in read-only storage and/or shared
with other "hello"s in the program.

On the other hand, if you choose *not* to modify the original
array, you are faced with the question of where to put the
storage space for the result. (This is a FAQ; see 7.5a and 7.5b.)
Also, one of my programming tenants is to not re invent the wheel,
hence the calls to my existing routines.


While this is a good tenet, there are times when the existing wheels
are unsuitable, for whatever reason (too large, too small, too
triangular, etc. :-) ). The problem of translating "abcdedcba" to
"bccdedccb" (rather than "cccdedccc") is, I think, one case of such
unsuitable wheels.
--
In-Real-Life: Chris Torek, Wind River Systems
Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W) +1 801 277 2603
email: forget about it http://web.torek.net/torek/index.html
Reading email is like searching for food in the garbage, thanks to spammers.
Nov 14 '05 #6

On Sat, 5 Mar 2005, Chris Torek wrote:
Arthur J. O'Dwyer wrote:
char *translate(char *src, const char *from, const char *to)
{
size_t in, out;
size_t len_to = strlen(to);
for (in=out=0; src[in] != '\0'; ++in)
{
/* Is the current character in the |from| array? */
const char *t = strchr(from, src[in]);
/* No; just echo it to the output string */
if (t == NULL)
src[out++] = src[in];
/* Yes; if it has a match in |to|, then echo that entry. */
else if (t - from < len_to)
src[out++] = to[t - from];
}

src[out] = '\0';
return src;
}
Stan Milam <st*****@swbell.net> wrote:
You're right, we are not done. Your version does not do the same thing
as mine. Here are the differences:
[...] With mine I took the what I call the straight-ahead
approach and once this condition is met I change them all.


This has a potential problem. Suppose we replace "a" with "b", and
"b" with "c", i.e.:

char buf[] = "abcdedcba";

(void) translate(buf, "ab", "bc");

If you first replace *all* the 'a's with 'b's, we change buf[]'s
contents from "abcdedcba" to "bbcdedcbb". Next, we replace *all*
the 'b's with 'c's, producing "cccdedccc".

I suspect that anyone making that call would *want* "bccdedccb"
in the buf[] array.


Good catch! I recall that many of the examples in Kernighan & Plauger
turned out to have hidden bugs which were revealed by the refactoring
process. In this case, it looks to me like a hidden bug existed, and
was unintentionally corrected /without/ being revealed --- at least,
not to me! It's a good thing Chris Torek was on the job. ;-)
Of course, if "cccdedccc" *is* the desired result, Arthur's code
will not achieve it.


Right.

[A very good explanation of #3 snipped.]

Thanks,
-Arthur
Nov 14 '05 #7
Chris Torek wrote:
With mine I took the what I call the straight-ahead
approach and once this condition is met I change them all.

This has a potential problem. Suppose we replace "a" with "b", and
"b" with "c", i.e.:

char buf[] = "abcdedcba";

(void) translate(buf, "ab", "bc");

If you first replace *all* the 'a's with 'b's, we change buf[]'s
contents from "abcdedcba" to "bbcdedcbb". Next, we replace *all*
the 'b's with 'c's, producing "cccdedccc".

I suspect that anyone making that call would *want* "bccdedccb"
in the buf[] array. To obtain that as the result, the translation
must not be "re-applied" to translated characters. To me, the
easiest way to guarantee this is to make a clear distinction
between "as-yet-untranslated" characters and "after-translation"
characters. In Arthur's code, this is precisely the same as the
array index variable "in": characters at src[in] and beyond are
not-yet-translated; other characters are.

Of course, if "cccdedccc" *is* the desired result, Arthur's code
will not achieve it.


Yes, that is the desired behavior.

3. In mine when the character is found in both the "source" and "from"
strings but there is no corresponding character in the "to" string *all
instances* of the character will be squeezed out of the "source" string.

Arthur's does this as well, because the "translate single character"
algorithm (embedded within the loop) is:

if (t == NULL)
src[out++] = src[in];
/* Yes; if it has a match in |to|, then echo that entry. */
else if (t - from < len_to)
src[out++] = to[t - from];
/* else do nothing, which effectively deletes the character */

Note that, initially, out==in. If every "from" character has a
corresponding "to" character, so that (t - from) is always a
valid index in "to", this "out==in" condition will remain true
throughout the entire loop, so that the antepenultimate line
in the function:

src[out] = '\0';

does not change src[out] at all. On the other hand, if some
"from" characters are to be deleted -- e.g., from[] = "abc"
while to[] = "bc", which will change buf[] = "abcdedcba" to
"bcdedba\0a\0" -- then, in this case, "out" sometimes does not
increment when "in" does, so that eventually out < in. This
is why the src[out] = '\0' line is included.

I really did not this thread to be about functionality or writing style,
but rather the legitimacy of modifying the source string.

There is nothing ultimately *wrong* with this, although if one
is to do it, I, at least, would not call it a "source string"
at all, but rather a "buffer" or something similar. If one is
to modify the array, this must be made clear to callers, so
that they do not attempt things like this:

char *result = translate("hello", "h", "j");

This code has undefined behavior, because translate() attempts to
replace the "h" in the anonymous array created by the string
literal "hello", which may be in read-only storage and/or shared
with other "hello"s in the program.

On the other hand, if you choose *not* to modify the original
array, you are faced with the question of where to put the
storage space for the result. (This is a FAQ; see 7.5a and 7.5b.)


Yep, this is what I was talking about.

Also, one of my programming tenants is to not re invent the wheel,
hence the calls to my existing routines.

While this is a good tenet, there are times when the existing wheels
are unsuitable, for whatever reason (too large, too small, too
triangular, etc. :-) ). The problem of translating "abcdedcba" to
"bccdedccb" (rather than "cccdedccc") is, I think, one case of such
unsuitable wheels.


Nah, just using the tools you already have.
Nov 14 '05 #8
Stan Milam wrote:
Chris Torek wrote:
With mine I took the what I call the straight-ahead
approach and once this condition is met I change them all.


This has a potential problem. Suppose we replace "a" with "b",
and "b" with "c", i.e.:

char buf[] = "abcdedcba";

(void) translate(buf, "ab", "bc");

If you first replace *all* the 'a's with 'b's, we change buf[]'s
contents from "abcdedcba" to "bbcdedcbb". Next, we replace *all*
the 'b's with 'c's, producing "cccdedccc".

I suspect that anyone making that call would *want* "bccdedccb"
in the buf[] array. To obtain that as the result, the translation
must not be "re-applied" to translated characters. To me, the
easiest way to guarantee this is to make a clear distinction
between "as-yet-untranslated" characters and "after-translation"
characters. In Arthur's code, this is precisely the same as the
array index variable "in": characters at src[in] and beyond are
not-yet-translated; other characters are.

Of course, if "cccdedccc" *is* the desired result, Arthur's code
will not achieve it.


Yes, that is the desired behavior.


If you replace all the a's and b's etc. with identifiers (words),
and specify that i/o is done with streams (no backtracking) you
have the behaviour of id2id (my utility for mass reorganization of
identifiers in source code). This strictly eliminates any such
'reapplication'. It also has the advantage of needing no presized
overflowable buffers. See id2id-20 in:

<http://cbfalconer.home.att.net/download/>

--
"If you want to post a followup via groups.google.com, don't use
the broken "Reply" link at the bottom of the article. Click on
"show options" at the top of the article, then click on the
"Reply" at the bottom of the article headers." - Keith Thompson
Nov 14 '05 #9
Je***********@physik.fu-berlin.de wrote:
Stan Milam <st*****@swbell.net> wrote:

Snip of good post
he second argument.
Regards, Jens


Thanks Jens. This was the best reply I've had.

Stan.
Nov 14 '05 #10

On Sun, 6 Mar 2005, CBFalconer wrote:
Stan Milam wrote:
Chris Torek wrote:
char buf[] = "abcdedcba";
(void) translate(buf, "ab", "bc");

If you first replace *all* the 'a's with 'b's, we change buf[]'s
contents from "abcdedcba" to "bbcdedcbb". Next, we replace *all*
the 'b's with 'c's, producing "cccdedccc".

I suspect that anyone making that call would *want* "bccdedccb"[.] [...] Of course, if "cccdedccc" *is* the desired result, Arthur's code
will not achieve it.


Yes, that is the desired behavior.


If you replace all the a's and b's etc. with identifiers (words),
and specify that i/o is done with streams (no backtracking) you
have the behaviour of id2id (my utility for mass reorganization of
identifiers in source code). This strictly eliminates any such
'reapplication'.


*If*. :) Anyway, you misread Stan's reply; he says that "cccdedccc",
the result of reapplication, *IS* the desired behavior. (Now, I
personally think he's just being defensive, since I can't see any
sensible application for that result. But perhaps he knows of one,
and I'm just being re-defensive. ;)

-Arthur
Nov 14 '05 #11
Arthur J. O'Dwyer wrote:

On Sun, 6 Mar 2005, CBFalconer wrote:
Stan Milam wrote:
Chris Torek wrote:

char buf[] = "abcdedcba";
(void) translate(buf, "ab", "bc");

If you first replace *all* the 'a's with 'b's, we change buf[]'s
contents from "abcdedcba" to "bbcdedcbb". Next, we replace *all*
the 'b's with 'c's, producing "cccdedccc".

I suspect that anyone making that call would *want* "bccdedccb"[.]
[...]
Of course, if "cccdedccc" *is* the desired result, Arthur's code
will not achieve it.
Yes, that is the desired behavior.

If you replace all the a's and b's etc. with identifiers (words),
and specify that i/o is done with streams (no backtracking) you
have the behaviour of id2id (my utility for mass reorganization of
identifiers in source code). This strictly eliminates any such
'reapplication'.

*If*. :) Anyway, you misread Stan's reply; he says that "cccdedccc",
the result of reapplication, *IS* the desired behavior. (Now, I
personally think he's just being defensive, since I can't see any
sensible application for that result. But perhaps he knows of one,
and I'm just being re-defensive. ;)

-Arthur


Actually, I came up with an application for this behavior today. But
then I suppose I am being re-re-defensive! :-)

By the way, I rewrote the function today using a lot of your insights. I
removed the calls to chrsubst() and str_rmchars(), used the for() loop
and continue statements you advocated. Actually, I wrote two new
versions. The second removed the call to strlen() for the *p_to string
and incremented the pointer each time it was used to translate a
character. At the top I am testing to see if it is '\0'. This test
decides whether the character is translated or squeezed out of the
target string. I also rewrote the description. Note to self: don't try
doing this stuff when I am tired and sleepy - it just gets messy.

Regards,
Stan Milam.
Nov 14 '05 #12

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

1
2351
by: Hung Jung Lu | last post by:
Hi, I have been looking into AOP (Aspect-Oriented Programming) for sometime, now. I frankly don't like the syntax of any of the approaches I have seen so far. I am kind playing around with some...
23
3172
by: Alberto | last post by:
An OUTSTANDING example of a rebuttal. Worth a glance. Rather long, OBVIOUSLY feel fee to dismiss it entirely if not interested. But this habit of the rebuttals must go to a stop (forgive odd...
3
3379
by: PL | last post by:
I want to pass a 2D array from Python to C++, manipulate it in C++ (for example, add 1 to each element) and pass it back to Python. With these building blocks I will be able to figure out all the...
42
2178
by: lylefair | last post by:
The file is now available as http://www.ffdba.com/downloads/testingNZ3.dat (rename .dat to .mdb) or http://www.ffdba.com/downloads/testingNZ3.mdb (At time of posting I have not opened the file.)
0
2290
by: Harry Smith | last post by:
This was posted in news.groups, but it was not posted to the list. REQUEST FOR DISCUSSION (RFD) unmoderated group comp.databases.postgresql.admin unmoderated group...
2
59064
NeoPa
by: NeoPa | last post by:
CHAPTER 1 - TABLE OF CONTENTS (Including attached database) CHAPTER 2 - INTRODUCTION CHAPTER 3 - TABLE LAYOUT CHAPTER 4 - FORM LAYOUT CHAPTER 5 - FORM MODULE CHAPTER 6 - CODE DISCUSSION (FILTER...
0
822
by: VanL | last post by:
Hello, A couple months ago there was an example posted in a blog of a rest interface for validating zip codes. If I recall correctly, the backend validator was written in python. The...
9
1613
by: andrew cooke | last post by:
Hi, Thanks for the help a couple of days ago. I completed what I was doing and wrote a summary which I've posted at http://acooke.org/cute/PythonMeta0.html (it's kind of long to post here). I...
6
3094
by: Guy Macon | last post by:
While I agree with the sentiment, the oringinal title on this thread ("OT: Specially for , why you should always use example.com for obfuscating domains") is wrong. There are other reserved domain...
7
1670
by: raylopez99 | last post by:
On Aug 16, 3:49 pm, Marc Gravell <marc.grav...@gmail.comwrote: If you cannot understand such a simple post, then you don't understand generic delegate types. Apparently you can only read your...
0
7229
marktang
by: marktang | last post by:
ONU (Optical Network Unit) is one of the key components for providing high-speed Internet services. Its primary function is to act as an endpoint device located at the user's premises. However,...
0
7129
by: Hystou | last post by:
Most computers default to English, but sometimes we require a different language, especially when relocating. Forgot to request a specific language before your computer shipped? No problem! You can...
0
7333
Oralloy
by: Oralloy | last post by:
Hello folks, I am unable to find appropriate documentation on the type promotion of bit-fields when using the generalised comparison operator "<=>". The problem is that using the GNU compilers,...
0
7398
jinu1996
by: jinu1996 | last post by:
In today's digital age, having a compelling online presence is paramount for businesses aiming to thrive in a competitive landscape. At the heart of this digital strategy lies an intricately woven...
1
7061
by: Hystou | last post by:
Overview: Windows 11 and 10 have less user interface control over operating system update behaviour than previous versions of Windows. In Windows 11 and 10, there is no way to turn off the Windows...
0
7502
tracyyun
by: tracyyun | last post by:
Dear forum friends, With the development of smart home technology, a variety of wireless communication protocols have appeared on the market, such as Zigbee, Z-Wave, Wi-Fi, Bluetooth, etc. Each...
0
5637
agi2029
by: agi2029 | last post by:
Let's talk about the concept of autonomous AI software engineers and no-code agents. These AIs are designed to manage the entire lifecycle of a software development project—planning, coding, testing,...
0
4716
by: conductexam | last post by:
I have .net C# application in which I am extracting data from word file and save it in database particularly. To store word all data as it is I am converting the whole word file firstly in HTML and...
0
3194
by: adsilva | last post by:
A Windows Forms form does not have the event Unload, like VB6. What one acts like?

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.