Bug/Gross InEfficiency in HeathField's fgetline program

Antoninus Twink

The function below is from Richard HeathField's fgetline program. For
some reason, it makes three passes through the string (a strlen(), a
strcpy() then another pass to change dots) when two would clearly be
sufficient. This could lead to unnecessarily bad performance on very
long strings. It is also written in a hard-to-read and clunky style.

char *dot_to_undersc ore(const char *s)
{
char *t = malloc(strlen(s ) + 1);
if(t != NULL)
{
char *u;
strcpy(t, s);
u = t;
while(*u)
{
if(*u == '.')
{
*u = '_';
}
++u;
}
}
return
t;
}

Proposed solution:

char *dot_to_undersc ore(const char *s)
{
char *t, *u;
if(t=u=malloc(s trlen(s)+1))
while(*u++=(*s= ='.' ? s++, '_' : *s++));
return t;
}

Oct 7 '07 #1

Subscribe Reply

334

11441

1
2
3
11
>
Last »

Richard Heathfield

Antoninus Twink said:

The function below is from Richard HeathField's fgetline program.

It appears to be from my emgen utility, in fact.

For
some reason, it makes three passes through the string (a strlen(), a
strcpy() then another pass to change dots) when two would clearly be
sufficient. This could lead to unnecessarily bad performance on very
long strings.

If it becomes a problem, I'll fix it. So far, it has not been a problem.

It is also written in a hard-to-read and clunky style.

A matter of opinion. Which bit did you find hard to read?

char *dot_to_undersc ore(const char *s)
{
char *t = malloc(strlen(s ) + 1);
if(t != NULL)
{
char *u;
strcpy(t, s);
u = t;
while(*u)
{
if(*u == '.')
{
*u = '_';
}
++u;
}
}
return
t;
}

Proposed solution:

char *dot_to_undersc ore(const char *s)
{
char *t, *u;
if(t=u=malloc(s trlen(s)+1))
while(*u++=(*s= ='.' ? s++, '_' : *s++));
return t;
}

It is not obvious to me that this code correctly replaces the code I wrote.

--
Richard Heathfield <http://www.cpax.org.uk >
Email: -http://www. +rjh@
Google users: <http://www.cpax.org.uk/prg/writings/googly.php>
"Usenet is a strange place" - dmr 29 July 1999

Oct 7 '07 #2

Mark McIntyre

On Mon, 8 Oct 2007 00:16:07 +0200 (CEST), in comp.lang.c , Antoninus
Twink <no****@nospam. comwrote:

>The function below is from Richard HeathField's fgetline program.

Troll alert.

>it makes three passes through the string (a strlen(), a
strcpy() then another pass to change dots) when two would clearly be
sufficient. This could lead to unnecessarily bad performance on very
long strings.

Possibly but consider the three laws of optimisation, and the data
typically being processed.

>It is also written in a hard-to-read and clunky style.

I disagree.

char *t, *u;
if(t=u=malloc(s trlen(s)+1))

hilarious !

--
Mark McIntyre

"Debugging is twice as hard as writing the code in the first place.
Therefore, if you write the code as cleverly as possible, you are,
by definition, not smart enough to debug it."
--Brian Kernighan

Oct 7 '07 #3

Tor Rustad

Antoninus Twink wrote:

The function below is from Richard HeathField's fgetline program. For
some reason, it makes three passes through the string (a strlen(), a
strcpy() then another pass to change dots) when two would clearly be
sufficient.

What new, R.H. has written slow code for years. ;-)

This could lead to unnecessarily bad performance on very
long strings. It is also written in a hard-to-read and clunky style.

ROTFL

Proposed solution:

char *dot_to_undersc ore(const char *s)
{
char *t, *u;
if(t=u=malloc(s trlen(s)+1))
while(*u++=(*s= ='.' ? s++, '_' : *s++));
return t;
}

Hmm.. who's code was hard-to-read and clunky?
Anyway, you are missing the point. Strings are usually rather short, and
when located in L1 cache, is doesn't matter much, scanning it 2 or 3 times.

However, a real spead-up here, would be to drop malloc() and use fixed
sized strings. That's the way, to beat the crap out of OO code.

--
Tor <torust [at] online [dot] no>

C-FAQ: http://c-faq.com/

Oct 8 '07 #4

Richard Heathfield

Tor Rustad said:

<snip>

>
Anyway, you are missing the point. Strings are usually rather short, and
when located in L1 cache, is doesn't matter much, scanning it 2 or 3
times.

The important thing is to avoid reading it n times.

However, a real spead-up here, would be to drop malloc() and use fixed
sized strings. That's the way, to beat the crap out of OO code.

If performance were the primary consideration, that's exactly what I'd have
done. Since it wasn't, it isn't. I considered robustness in the face of
long strings to be more important.

--
Richard Heathfield <http://www.cpax.org.uk >
Email: -http://www. +rjh@
Google users: <http://www.cpax.org.uk/prg/writings/googly.php>
"Usenet is a strange place" - dmr 29 July 1999

Oct 8 '07 #5

Peter Nilsson

Antoninus Twink <nos...@nospam. comwrote:

The function below is from Richard HeathField's fgetline
program. For some reason, it makes three passes through
the string (a strlen(), a strcpy() then another pass to
change dots) when two would clearly be sufficient. ...

For some reason you've capitalised three letters in his
name, when two would clearly be sufficient.

--
Peter

Oct 8 '07 #6

CBFalconer

Mark McIntyre wrote:

Antoninus Twink <no****@nospam. comwrote:

.... snip ...

>
> char *t, *u;
if(t=u=malloc(s trlen(s)+1))

hilarious !

What is hilarious? It should detect the failure of malloc quite
reliably. Of course the lack of blanks is rather foul.

--
Chuck F (cbfalconer at maineline dot net)
Available for consulting/temporary embedded and systems.
<http://cbfalconer.home .att.net>

--
Posted via a free Usenet account from http://www.teranews.com

Oct 8 '07 #7

Chris Thomasson

"Peter Nilsson" <ai***@acay.com .auwrote in message
news:11******** *************@g 4g2000hsf.googl egroups.com...

Antoninus Twink <nos...@nospam. comwrote:
>The function below is from Richard HeathField's fgetline
program. For some reason, it makes three passes through
the string (a strlen(), a strcpy() then another pass to
change dots) when two would clearly be sufficient. ...

For some reason you've capitalised three letters in his
name, when two would clearly be sufficient.

Perhaps when he says his name out loud with a nervous tinge, it sound as if
there should be to capital letters... Speaking out load:

"That darn Heath, possible minor/stutter, Field!"

Na... Nobody is that serious about the pronouncement of somebody's name...
Ahh, it could be a simple typo...

Oct 8 '07 #8

Richard Heathfield

Peter Nilsson said:

Antoninus Twink <nos...@nospam. comwrote:
>The function below is from Richard HeathField's fgetline
program. For some reason, it makes three passes through
the string (a strlen(), a strcpy() then another pass to
change dots) when two would clearly be sufficient. ...

For some reason you've capitalised three letters in his
name, when two would clearly be sufficient.

The mis-capitalisation is consistent with that used by a troll named "Paul"
(with an email address containing "paulcr"), who once threatened to break
my nose, apparently because he didn't know C very well.

--
Richard Heathfield <http://www.cpax.org.uk >
Email: -http://www. +rjh@
Google users: <http://www.cpax.org.uk/prg/writings/googly.php>
"Usenet is a strange place" - dmr 29 July 1999

Oct 8 '07 #9

Antoninus Twink

On 7 Oct 2007 at 22:55, Richard Heathfield wrote:

Antoninus Twink said:

>The function below is from Richard HeathField's fgetline program.

It appears to be from my emgen utility, in fact.

>For
some reason, it makes three passes through the string (a strlen(), a
strcpy() then another pass to change dots) when two would clearly be
sufficient. This could lead to unnecessarily bad performance on very
long strings.

If it becomes a problem, I'll fix it. So far, it has not been a problem.

But what's frustrating is that it's an inefficiency that's completely
gratuitous! We all know that micro-optimization is bad, but this is a
micro-anti-optimization, which is surely worse!

The most natural way to look at this is "copy the characters from one
string to another, replacing . by _ when we see it". This has the
benefit of being a 1-pass algorithm. Instead, you split this up "first
copy one string to another; then go back to the beginning and swap . for
_". This makes a simple single operation into two, at the same time
introducing an extra pass through the string! It's not as if there's so
much fiendish complexity here that there's any benefit in breaking it up
into two separate operations.

>
>It is also written in a hard-to-read and clunky style.

A matter of opinion. Which bit did you find hard to read?

The function is a completely trivial one, yet I can't see it all at once
in my editor without scrolling! Whitespace can help readability, but
excessive whitespace can reduce it, and at the same time give too much
weight to things that aren't important.

>
>char *dot_to_undersc ore(const char *s)
{
char *t = malloc(strlen(s ) + 1);
if(t != NULL)
{
char *u;
strcpy(t, s);
u = t;
while(*u)
{
if(*u == '.')
{
*u = '_';
}
++u;
}
}
return
t;
}

Proposed solution:

char *dot_to_undersc ore(const char *s)
{
char *t, *u;
if(t=u=malloc(s trlen(s)+1))
while(*u++=(*s= ='.' ? s++, '_' : *s++));
return t;
}

It is not obvious to me that this code correctly replaces the code I wrote.

If you believe that it doesn't correctly replace the code you wrote, it
would be easy to demonstrate that by pointing out a specific input s for
which it gives a different result, or an error (syntax error or
undefined behavior or whatever).

Oct 8 '07 #10