strcmp but with '\n' as the terminator

Allan Bruce

Hi there,
I am reading a file into a char array, and I want to find if a string exists
in a given line.
I can't use strcmp since the line ends with '\n' and not '\0'. Is there a
similar function that will do this, or will I have to write my own?
Thanks
Allan

Nov 13 '05
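One way to approach the question above, as a minimal sketch: treat the line as a counted range of bytes rather than as a C string, so the '\n' never has to be replaced by '\0'. The name line_contains is hypothetical (it is not a standard function), and the sketch assumes `end` points one past the last valid byte read from the file.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns 1 if `needle` occurs in the line that starts
 * at `line` and stops at the first '\n' (or at `end` if there is none),
 * 0 otherwise. `end` is assumed to point one past the last valid byte. */
int line_contains(const char *line, const char *end, const char *needle)
{
    const char *eol = memchr(line, '\n', (size_t)(end - line));
    size_t line_len = (size_t)((eol != NULL ? eol : end) - line);
    size_t n = strlen(needle);
    size_t i;

    if (n > line_len)
        return 0;

    /* Compare the needle at every position where it still fits in the line. */
    for (i = 0; i + n <= line_len; i++) {
        if (memcmp(line + i, needle, n) == 0)
            return 1;
    }
    return 0;
}
```

If modifying the buffer is acceptable, overwriting the '\n' with '\0' and calling strstr would probably be simpler; the version above is only for the case where the buffer has to stay untouched.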

Dan Pop

In <pa****************************@and.org> "James Antill" <ja***********@and.org> writes:

On Tue, 22 Jul 2003 15:55:58 +0000, Dan Pop wrote:
In <ne********************@tomato.pcug.org.au> Kevin Easton <kevin@-nospam-pcug.org.au> writes:
Consider repeated concatenation of strings onto a destination - if we
concatenate 20 strings, each character in the original buffer is
inspected at least 20 times, each character of the second string at
least 19 times, ...
This can be trivially avoided by using sprintf instead of strcat :-)

Errm, what?

Say you have...

List *scan = NULL;
char buf[4096]; /* we "know" this is long enough */

buf[0] = 0;
scan = beg;
while (scan)
{
strcat(buf, scan->data);
scan = scan->next;
}

...how does sprintf() help? Ok, so you can do something like...

ptr = buf;
while (scan)
{
ptr += sprintf(ptr, "%s", scan->data); /* assume sprintf() has an ISO
* return value*/

We normally assume that standard library functions return what the
standard says they do. Without this assumption, the standard library
becomes (next to) useless.
scan = scan->next;
}

...but then you might as well just do...

ptr = buf;
while (scan)
{
size_t len = strlen(scan->data);

memcpy(ptr, scan->data, len);
ptr += len;

scan = scan->next;
}
Except that it requires more code and is, therefore, less readable and
that it requires one more statement, after the loop, to properly terminate
the string.
...and after you do that more than once you realize that you want...

char *my_stpcpy(char *dst, const char *src)
You can simply name it stpcpy(), especially since this is the name you
use below :-)
{
size_t len = strlen(src);

memcpy(dst, src, len + 1); /* + 1 also copies the terminating '\0' */
dst += len;

return (dst);
}

ptr = buf;
while (scan)
{
ptr = stpcpy(ptr, scan->data);
scan = scan->next;
}

...at which point you've just _reinvented the wheel_ for about the
millionth time, creating your own clumsy string API.
Which is pointless, considering that the sprintf-based solution achieves
the same thing, with the same source code complexity, while staying with
the standard API.
All because the c
library string APIs are deficient ... which is pretty much what was
argued.

The only deficiency I can see is that strcpy and strcat (and friends)
have a (mostly) useless return value. For the rare cases when this is
a problem, sprintf provides a solution without needing to reinvent
anything and without having to take the overhead of repetitive strcat()
calls (sprintf has its own overhead, but it is constant per call).
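The sprintf-based concatenation described above can be sketched as follows. This is an illustration under stated assumptions, not code from the thread: `struct node` is a hypothetical stand-in for the List type used earlier, and snprintf (C99) is swapped in for sprintf so that the remaining space can be checked on every call.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical list node, standing in for the thread's List type. */
struct node {
    const char *data;
    struct node *next;
};

/* Concatenates every node's data into buf without rescanning buf.
 * Returns the resulting length, or -1 if buf (of `size` bytes) is too small. */
int join_list(char *buf, size_t size, const struct node *scan)
{
    char *ptr = buf;
    size_t left = size;

    if (size == 0)
        return -1;
    buf[0] = '\0';

    for (; scan != NULL; scan = scan->next) {
        /* snprintf returns the number of characters it wanted to write,
         * not counting the terminator. */
        int n = snprintf(ptr, left, "%s", scan->data);

        if (n < 0 || (size_t)n >= left)
            return -1;              /* encoding error or truncation */
        ptr += n;                   /* now points at the new terminator */
        left -= (size_t)n;
    }
    return (int)(ptr - buf);
}
```

Each string is visited once, and the destination is never scanned for its end, which is the point being made about repeated strcat calls.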

Dan
--
Dan Pop
DESY Zeuthen, RZ group
Email: Da*****@ifh.de

Nov 13 '05 #41

Paul Hsieh

dj******@csclub.uwaterloo.ca (Dave Vandervies) wrote in message news:<bf**********@tabloid.uwaterloo.ca>...

In article <MP************************@news.sf.sbcglobal.net>,
Paul Hsieh <qe*@pobox.com> wrote:
do******@address.co.uk.invalid says...
Paul Hsieh wrote: The implicit requirement to
> scan for the end of the string implicit in most of the string library
> belies its propensity for being slow, a haven for buffer overflows, and
> generally just the wrong set of primitives for string manipulation.

And yet a goodly number of C programmers manage perfectly well with
null-terminated strings in their fast, well-written code.
By what standards can you say any of that? Buffer overflows are the #1
occurring bug, and the vast majority of them occur in the C string
library.

Accidents are the #1 cause of death for people under the age of 35 in
the United States[1][2], and the vast majority of them are motor vehicle
accidents[3]. Does that mean we should stop using motor vehicles?

But there is only so much you can do about people who have accidents.
Furthermore things *ARE* done to minimize them. That's why cars have
bumpers, crumple zones, air bags and seat belts. That's why microwave
ovens can't operate without the door being closed. That's why razors
have bizarrely shaped enclosures around the blade. The infrastructure
evolves around the need to minimize accidents even if you could argue
that the accident was really the fault of the victim.

Compare this with the C language. In order to make it work and be
adopted, in 1989, compromises were made and lots of questionable
practices were rubber stamped. Ok fine -- for 1989 it was a good
decision because it allowed the language to be rapidly and widely
implemented and adopted. But in the 20+ year lifetime of this language,
we now know this language has serious problems. Nearly every hack,
most general program failures and every buffer overflow->stack hijack
attack can be traced back to the C standard.

Ok -- so what is to be done about this sad state of affairs? Simple,
do *something* whenever there is a standards revision. 1999 was the C
committee's perfect opportunity to do something, *ANYTHING* to try to
mitigate these problems. Even the single solitary act of deprecating
gets() would have at least been a signal that they were thinking about
these issues.

But no, they added in complex numbers that worsen C++ compatibility,
and numerous other irrelevancies to codify "standard practice" for no
good reason. Not surprisingly, C99 has gotten no serious support from
any major vendor -- the closest thing is gcc, and they are still
working on it.

As to being fast -- that's impossible unless the functions are absolutely
trivial. The C-library basically imposes an additional minimum O(n) on all
non-trivial string manipulations.

Can you give an example of a nontrivial string manipulation that doesn't
already have O(n) time?

My claim is that there is an *additional* O(n) paid. For those in
theoretical Comp. Sci., this may mean nothing to you if the operation
is O(n) anyways (especially if we ignore the fact that many operations
have an "m" as well as "n"), but Buffer Overflows, paging, and cache
thrashing probably don't mean anything to you either. In which case
real world performance won't mean anything to you either.
[...] I strongly suspect that anything you can come up
with could be done with the C string library with no additional overhead.

I don't claim there is no additional overhead. But all the overhead
is O(1).

C still exposes the best core speed for someone willing to work around
the compiler and pretty much the only useful language with inline assembly
language. So I am stuck with it.

Really? If you go right to assembly, you stop having to work around
the compiler (since there's no longer a compiler to work around), [...]

Look, I don't care whether or not you understand why C (+ assembly
sometimes) is the only real option for writing maintainable and high
performance software.

--
Paul Hsieh
http://www.pobox.com/~qed/
http://bstring.sourceforge.net/

Nov 13 '05 #42

Paul Hsieh

Richard Heathfield <in*****@address.co.uk.invalid> wrote in message news:<3f******@news2.power.net.uk>...

Kevin Easton wrote:
Dave Vandervies <dj******@csclub.uwaterloo.ca> wrote:
In article <MP************************@news.sf.sbcglobal.net>,
Paul Hsieh <qe*@pobox.com> wrote: [...] As to being fast -- that's impossible unless the functions are absolutely
trivial. The C-library basically imposes an additional minimum O(n) on
all non-trivial string manipulations.

Can you give an example of a nontrivial string manipulation that doesn't
already have O(n) time? I strongly suspect that anything you can come up
with could be done with the C string library with no additional overhead.

Concatenation of a string of length m with a string of length n is
O(n+m) using strcat, but O(m) if you use a string type that has its
length explicitly stored, rather than indicated by a sentinel.
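To make the O(m) claim concrete, here is a minimal sketch of appending to a string whose length is stored alongside it. `struct cstr` and cstr_append are hypothetical names invented for this illustration; they are not taken from any library mentioned in the thread.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical counted string: length and capacity travel with the data. */
struct cstr {
    char *data;   /* always '\0'-terminated once non-NULL */
    size_t len;   /* number of characters, excluding the terminator */
    size_t cap;   /* bytes allocated for data */
};

/* Appends n bytes from src. The destination is never scanned: the write
 * position is data + len. Returns 0 on success, -1 on failure. */
int cstr_append(struct cstr *s, const char *src, size_t n)
{
    size_t need;

    if (n >= (size_t)-1 - s->len)
        return -1;                       /* len + n + 1 would overflow */
    need = s->len + n + 1;

    if (need > s->cap) {
        size_t cap = s->cap != 0 ? s->cap : 16;
        char *p;

        while (cap < need) {
            if (cap > (size_t)-1 / 2) {  /* doubling would overflow */
                cap = need;
                break;
            }
            cap *= 2;
        }
        p = realloc(s->data, cap);
        if (p == NULL)
            return -1;
        s->data = p;
        s->cap = cap;
    }

    memcpy(s->data + s->len, src, n);    /* work proportional to n only */
    s->len += n;
    s->data[s->len] = '\0';
    return 0;
}
```

Starting from `struct cstr s = { NULL, 0, 0 };`, repeated calls touch only the appended bytes (plus whatever realloc has to move), whereas strcat walks the whole destination each time.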

Using no additional overhead [1], remember

Remember?!?!? Remember where? In one of your processor's 6 precious
registers? You also have to *remember* how much memory you have
allocated and make sure you don't spill over as well, BTW. Oh yes,
and if you are communicating with a library are you going to pass
these remembered quantities around along with the string data? Or
will you let it work it all out with strlen by itself? Of course it's
kind of hard to deduce the actual amount of memory from this
information so you either have to figure it all out from the caller
(thus duplicating some of the logic of the library) or you have to
pass it as a parameter (burning an additional register or stack.)

Or you could screw it and just buffer overflow like everyone else
does.

In most other cases you will end up doing a copy of the string that you
need to find the length of, so the overall time complexity doesn't
change by avoiding the scan to find string length - but the constant
factors can often be reduced quite significantly (consider an operation
like search-and-replace).

This can all be managed perfectly satisfactorily using C strings and
temporary variables.

Which my library (and others) is living proof of, of course. Of
course trying to do it all by hand yourself without a centralized
library ... well you read about the weekly buffer overflow attacks
that get reported to www.securityfocus.com or Risks Digest or
www.news.com to see what happens when you try to do that.

--
Paul Hsieh
http://www.pobox.com/~qed/
http://bstring.sourceforge.net/

Nov 13 '05 #43

Paul Hsieh

"James Antill" <ja***********@ and.org> wrote:

But this says nothing about how good or bad the C-library string API is.

memcpy: Useful, but requires the programmer to keep track of metadata for
dst.
memmove: Same as memcpy.
strcpy: Most commonly used for buffer overflows, as with all the str*
functions to create data the two inputs cannot be the same.
strncpy: Most broken interface ever
strcat: O(n)
strncat: O(n) Plus dst must be a valid NIL terminated c style string
memcmp: Useful, but requires the programmer to keep track of metadata for
both arguments and properly merge them (you can "fix" having to
merge the metadata by using strncpy() but I wouldn't recommend
this).
strcmp: Useful, assuming you have valid c style strings.
strcoll: Same as strcmp
strncmp: Same as memcmp
strxfrm: Can be used as a non-broken strncpy() if you don't mind confusing
everyone (and you don't use LC_COLLATE).
memchr: Same as memcpy
strchr, strcspn, strpbrk, strrchr, strspn, strstr, strlen: Same as strcmp
strtok: Often used badly, destroys its input ... sometimes even horribly
abused as a side band parameter to functions.
memset: Same as memcpy
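As an illustration of the strncpy complaint in the list above: strncpy leaves the destination without a terminator when the source holds n or more characters, and zero-pads the rest of the destination when the source is shorter. A minimal sketch of a bounded copy that always terminates follows; copy_trunc is a hypothetical name, not a standard function.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copies src into dst (of `size` bytes), always leaving
 * dst '\0'-terminated when size > 0. Returns 0 if src fit completely,
 * -1 if it was truncated or size was 0. */
int copy_trunc(char *dst, size_t size, const char *src)
{
    size_t len = strlen(src);

    if (size == 0)
        return -1;
    if (len >= size) {
        memcpy(dst, src, size - 1);   /* keep as much as fits */
        dst[size - 1] = '\0';
        return -1;
    }
    memcpy(dst, src, len + 1);        /* includes the terminator */
    return 0;
}
```

The return value lets the caller tell truncation apart from success, which strncpy does not report.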
Oooh! Nice list. I wonder where you got the idea for doing this from
.... ;)
[2] Malloc implementations I've seen require at least 16 bytes of overhead
per object, so you get 16 + 4 + 1 vs. 16 + 1

Yeah, and more importantly, people trying to mitigate buffer overflows
by allocating for the worst case will, of course, waste *far more* in
overhead on average.

--
Paul Hsieh
http://www.pobox.com/~qed/
http://bstring.sourceforge.net/

Nov 13 '05 #44

Richard Heathfield

Paul Hsieh wrote:

Richard Heathfield <in*****@address.co.uk.invalid> wrote in message
news:<3f******@news2.power.net.uk>...
Kevin Easton wrote:
> Dave Vandervies <dj******@csclub.uwaterloo.ca> wrote:
>> In article <MP************************@news.sf.sbcglobal.net>,
>> Paul Hsieh <qe*@pobox.com> wrote: [...]
>>>As to being fast -- that's impossible unless the functions are
>>>absolutely
>>>trivial. The C-library basically imposes an additional minimum O(n)
>>>on all non-trivial string manipulations.
>>
>> Can you give an example of a nontrivial string manipulation that
>> doesn't
>> already have O(n) time? I strongly suspect that anything you can come
>> up with could be done with the C string library with no additional
>> overhead.
>
> Concatenation of a string of length m with a string of length n is
> O(n+m) using strcat, but O(m) if you use a string type that has its
> length explicitly stored, rather than indicated by a sentinel.

Using no additional overhead [1], remember

Remember?!?!? Remember where?

In a size_t object.
In one of your processor's 6 precious
registers?
<shrug> The number of registers my processors have is not something that
concerns me when I'm writing portable code. For all I know, the program
might be running on Peter Seebach.
You also have to *remember* how much memory you have
allocated and make sure you don't spill over as well, BTW.
Thanks for reminding me. It had quite slipped my mind.
Oh yes,
and if you are communicating with a library are you going to pass
these remembered quantities around along with the string data?
That would be wise, don't you agree?
Or
will you let it work it all out with strlen by itself?
That depends on the library, of course.
Of course it's
kind of hard to deduce the actual amount of memory from this
information so you either have to figure it all out from the caller
(thus duplicating some of the logic of the library) or you have to
pass it as a parameter (burning an additional register or stack.)
Yes. This is called "programming".
Or you could screw it and just buffer overflow like everyone else
does.

Can't be bothered.

This can all be managed perfectly satisfactorily using C strings and
temporary variables.

Which my library (and others) is living proof of, of course. Of
course trying to do it all by hand yourself without a centralized
library ... well you read about the weekly buffer overflow attacks
that get reported to www.securityfocus.com or Risks Digest or
www.news.com to see what happens when you try to do that.

I've never seen any of my production programs reported there yet.

--
Richard Heathfield : bi****@eton.powernet.co.uk
"Usenet is a strange place." - Dennis M Ritchie, 29 July 1999.
C FAQ: http://www.eskimo.com/~scs/C-faq/top.html
K&R answers, C books, etc: http://users.powernet.co.uk/eton

Nov 13 '05 #45

Dan Pop

In <ne********************@tomato.pcug.org.au> Kevin Easton <kevin@-nospam-pcug.org.au> writes:

thing). The question is why builtin C strings use a sentinel method
rather than a length/end-pointer method to indicate their extent[%] - are
there any downsides to the latter?

On the PDP11, strcpy is simpler and faster with null-terminated strings;
here's the complete implementation, assuming the arguments are passed in
registers (DST is the register receiving the first argument, SRC is the
register receiving the second argument and R0 contains the return value):

STRCPY: MOV DST, R0
LOOP: MOVB (SRC)+, (DST)+
BNE LOOP
RET
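For readers who do not know PDP-11 assembly, roughly the same loop can be written in C as below. This is only a sketch; the name my_strcpy is hypothetical and is used to avoid clashing with the library's strcpy.

```c
/* C rendering of the loop above: copy bytes, including the terminating
 * '\0', and return the original destination pointer (as R0 does). */
char *my_strcpy(char *dst, const char *src)
{
    char *ret = dst;

    /* The assignment yields the byte just copied; the loop ends once
     * that byte is the terminator, mirroring MOVB followed by BNE. */
    while ((*dst++ = *src++) != '\0')
        ;
    return ret;
}
```

The copy and the end-of-string test are the same operation, which is why the sentinel representation maps so directly onto that instruction sequence.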

But the real reason must be searched elsewhere. Languages using counted
strings provide a higher level API for string manipulation, i.e. they
take care of allocation issues in a transparent fashion and the character
count specifies not only the string length but also the size of the
space allocated to the string. If you copy a string, space for the
destination string will be automatically allocated, if you shrink a
string, the additional bytes will be automatically reclaimed by the
run time system. OTOH, such languages don't have pointers that can
point in the middle of a string and be effectively used as substrings.

The last sentence above also hints the advantage of C strings:
flexibility with minimum overhead:

char *path = "/foo/bar/baz.c";
char *file = strrchr(path, '/');
if (file == NULL) file = path;
else file++;

With counted strings, the above is impossible: a new string has to
be created to hold the file name.

C strings are well suited to a language like C, the only glitch is the
return value of strcpy and strcat: a pointer to the null character in the
destination string would be a lot more useful when concatenating together
many short strings.
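A copy function with the return value described above might look like the following sketch. The name copy_end is hypothetical; a function of this shape exists as stpcpy on POSIX systems, but it is not part of the ISO C library discussed in this thread.

```c
#include <stddef.h>
#include <string.h>

/* Copies src (terminator included) to dst and returns a pointer to the
 * '\0' just written, so the next copy can start there directly. */
char *copy_end(char *dst, const char *src)
{
    size_t len = strlen(src);

    memcpy(dst, src, len + 1);
    return dst + len;
}
```

Used as `p = copy_end(p, a); p = copy_end(p, b);`, each call starts where the previous one stopped, so the destination is never rescanned. The caller still has to guarantee that the destination is large enough.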

Dan
--
Dan Pop
DESY Zeuthen, RZ group
Email: Da*****@ifh.de

Nov 13 '05 #46

Kevin Easton

Dan Pop <Da*****@cern.ch> wrote:
[...]

But the real reason must be searched elsewhere. Languages using counted
strings provide a higher level API for string manipulation, i.e. they
take care of allocation issues in a transparent fashion and the character
count specifies not only the string length but also the size of the
space allocated to the string. If you copy a string, space for the
destination string will be automatically allocated, if you shrink a
string, the additional bytes will be automatically reclaimed by the
run time system. OTOH, such languages don't have pointers that can
point in the middle of a string and be effectively used as substrings.

The last sentence above also hints the advantage of C strings:
flexibility with minimum overhead:

char *path = "/foo/bar/baz.c";
char *file = strrchr(path, '/');
if (file == NULL) file = path;
else file++;

With counted strings, the above is impossible: a new string has to
be created to hold the file name.
I was thinking about something more like a struct-that-isn't (similar to
_Complex, in some ways?) - where

_String path = "/foo/bar/baz.c";

creates path with a pointer to the start of the string literal and a
length of 14 - when you do:

_String file = strrchr(path, '/');

strrchr would return a _String with an internal pointer to the last / of
the string literal, and a length of 6 (so both _String objects reference
the same memory - more like augmented pointers than fully encapsulated
strings).
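Something close to this can be approximated in today's C with an ordinary struct, as in the sketch below. `struct strview`, sv_from and sv_rchr are hypothetical names for illustration only; the sketch is a library-level stand-in, not the built-in _String type being proposed.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical "augmented pointer": start of the text plus its length.
 * It owns nothing; several views may reference the same memory. */
struct strview {
    const char *ptr;
    size_t len;
};

/* Builds a view over an ordinary '\0'-terminated string. */
struct strview sv_from(const char *s)
{
    struct strview v;

    v.ptr = s;
    v.len = strlen(s);
    return v;
}

/* Returns a view starting at the last occurrence of c in s and running to
 * the end of s, or { NULL, 0 } if c does not occur. */
struct strview sv_rchr(struct strview s, char c)
{
    struct strview r = { NULL, 0 };
    size_t i = s.len;

    while (i-- > 0) {
        if (s.ptr[i] == c) {
            r.ptr = s.ptr + i;
            r.len = s.len - i;
            break;
        }
    }
    return r;
}
```

For "/foo/bar/baz.c" this gives a view of length 14, and sv_rchr with '/' gives a view of length 6 into the same memory, matching the numbers above.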
C strings are well suited to a language like C, the only glitch is the
return value of strcpy and strcat: a pointer to the null character in the
destination string would be a lot more useful when concatenating together
many short strings.

It would - it would also have been nice to have the limit-pointer
versions like strlcat().

- Kevin.

Nov 13 '05 #47

Dan Pop

In <ne********************@tomato.pcug.org.au> Kevin Easton <kevin@-nospam-pcug.org.au> writes:

Dan Pop <Da*****@cern.ch> wrote:
[...]
But the real reason must be searched elsewhere. Languages using counted
strings provide a higher level API for string manipulation, i.e. they
take care of allocation issues in a transparent fashion and the character
count specifies not only the string length but also the size of the
space allocated to the string. If you copy a string, space for the
destination string will be automatically allocated, if you shrink a
string, the additional bytes will be automatically reclaimed by the
run time system. OTOH, such languages don't have pointers that can
point in the middle of a string and be effectively used as substrings.

The last sentence above also hints the advantage of C strings:
flexibility with minimum overhead:

char *path = "/foo/bar/baz.c";
char *file = strrchr(path, '/');
if (file == NULL) file = path;
else file++;

With counted strings, the above is impossible: a new string has to
be created to hold the file name.

I was thinking about something more like a struct-that-isn't (similar to
_Complex, in some ways?) - where

_String path = "/foo/bar/baz.c";

creates path with a pointer to the start of the string literal and a
length of 14 - when you do:

_String file = strrchr(path, '/');

strrchr would return a _String with an internal pointer to the last / of
the string literal, and a length of 6 (so both _String objects reference
the same memory - more like augmented pointers than fully encapsulated
strings).

If you think about it deeper, you'll realise that it would take too much
complexity hidden behind a single language feature. You have to support
all the pointer operations on the _String type, but also provide special
operations for manipulating the pointer component and the length component
separately (e.g. you need to point your _String to some allocated memory
block or to truncate your _String). The semantics of == are also
"interesting". The more I think about it, the more I see the
complexities of C++ creeping into C ;-)

Dan
--
Dan Pop
DESY Zeuthen, RZ group
Email: Da*****@ifh.de

Nov 13 '05 #48

James Antill

On Wed, 23 Jul 2003 07:47:08 +0000, Richard Heathfield wrote:

James Antill wrote:
strcpy: Most commonly used for buffer overflows,

That's a little unfair on strcpy. If the programmer is careful (as all
programmers should be), strcpy is perfectly safe.
That's a little optimistic, there are very few cases where you couldn't
just as easily use memcpy() ... that aren't errors.

I disagree (although of course that might just mean that I have less
experience of fighting malware than you do). I find strcpy to have

I meant that often you need to find the length and sanity check it
anyway, so you almost always have all the inputs you need for a call to
memcpy()
expressive power, which is why I prefer it to memcpy when strings are
involved.

This is nice, like using NULL instead of 0, the problem comes when you
have a length metadata variable that is implicitly part of the call (Ie.
things change if/when you alter it) ... but doesn't appear in the
arguments.

as with all the str*
functions to create data the two inputs cannot be the same.

Why would you want to copy a string onto itself?

I've seen code like...

strcpy(s1, s1 + 1);

Um, yes, I've seen code like that too. My LART had memmove written on it (on
the bit just surrounding the sticky-out nail), in large letters. Once the
blood had stopped flowing out quite so freely.
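For reference, the overlapping copy above has undefined behaviour with strcpy, while memmove is specified to handle overlapping regions. A minimal sketch of the memmove form, using the hypothetical helper name drop_first:

```c
#include <stddef.h>
#include <string.h>

/* Removes the first character of s in place. memmove is used because the
 * source (s + 1) and destination (s) overlap. */
void drop_first(char *s)
{
    size_t len = strlen(s);

    if (len > 0) {
        /* len bytes starting at s + 1: the remaining len - 1 characters
         * plus the terminating '\0'. */
        memmove(s, s + 1, len);
    }
}
```

The empty-string check matters: without it, s + 1 would point past the terminator of an empty string.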

*breaks into song* ... "If I had a LART, I'd LART all over this world."

Of course there are six string functions to add data (including s(n)printf())
and only one memmove().

--
James Antill -- ja***@and.org
Need an efficient and powerful string library for C?
http://www.and.org/vstr/

Nov 13 '05 #49

Dan Pop

In <pa****************************@and.org> "James Antill" <ja***********@and.org> writes:

On Wed, 23 Jul 2003 10:49:39 +0000, Dan Pop wrote:
All because the c
library string APIs are deficient ... which is pretty much what was
argued.
The only deficiency I can see is that strcpy and strcat (and friends)
have a (mostly) useless return value. For the rare cases when this is

That's the only deficiency?
Maybe you meant that's the only deficiency in the example. Arbitrary
sized source, source with NIL characters, substituting data, removing
parts of the data or dynamically working out what size the destination
needs to be to hold all the data. These are all handled poorly or not at
all.

You're badly missing the point of C strings. They are not supposed to
provide a general solution to *any* text manipulation problem. If you
need Perl, you know where to find it.

a problem, sprintf provides a solution without needing to reinvent
anything and without having to take the overhead of repetitive strcat()
calls (sprintf has its own overhead, but it is constant per call).

1. A lot of people don't normally see sprintf()/snprintf() used like this,
and so it's much easier for them to understand something that looks like
strcpy()/strncpy() with the correct semantics.

Arguments based on people's incompetence are bogus. Especially in a case
like this, where it is trivial to figure out what happens, even if you
aren't familiar with the technique.
2. People who sometimes use sprintf()/snprintf() in this way screw it up
enough that I would recommend something easier to use.
See above. People can easily misuse each and every feature of the
language and its library.
3. The constant overhead for sprintf() is non-trivial, so you might as
well use the stpcpy() solution anyway ...
Only if, after profiling, you have determined that this is the performance
bottleneck of your application. Only fools microoptimise before
determining whether it is necessary or not. Unlike sprintf(), stpcpy()
is not a standard library function. Therefore, its usage reduces the
code readability, which is not acceptable without a *good* reason.
or think ahead and use something
better where other people have already written/tested the extra functions
for you.

Same comment as above: using extra functions reduces the code readability.
So, there must be a compelling reason for using them.

Dan
--
Dan Pop
DESY Zeuthen, RZ group
Email: Da*****@ifh.de

Nov 13 '05 #50