How is strlen implemented?

roy
Hi,

I was wondering how strlen is implemented.
What if the input string doesn't have a null terminator, namely the
'\0'?
Thanks a lot
Roy

Nov 14 '05
On Sun, 24 Apr 2005 05:02:10 +0000, Keith Thompson wrote:
Stan Milam <st*****@swbell.net> writes:
Keith Thompson wrote:
Mark McIntyre <ma**********@spamcop.net> writes:
On 22 Apr 2005 20:59:49 -0700, in comp.lang.c, "roy"
<ro*****@hotmail.com> wrote:
[...]

>But from my experimental results, it seems
>that strlen can still return the number of characters of a char array.
[...]
>I am just not sure whether I am just lucky or sth else happened inside
>strlen.

lucky
No, if he'd been lucky it would have crashed the program (with a
meaningful diagnostic) rather than quietly returning a meaningless
result.


So, you are saying this is a poorly implemented compiler?


Not at all.

First, strlen() is part of the runtime library, not part of the
compiler.


It is part of the implementation which covers both compiler and library.
Many compilers can generate their own inline code for strlen() in which
case the "library" as a separate concept has little to do with it.

Lawrence
Nov 14 '05 #31

Flash Gordon wrote:
Gregory Pietsch wrote:
I checked my libraries,
Do you mean your personal libraries or your implementation's? Remember
that the implementation is allowed to do things you are not allowed to do.

It was my implementation, based on unravelling the "while(*s)s++" loop.
> and the following may be faster than the above:
What above? Please quote enough of the message you are replying to
for us to see what you are talking about. There is an option that gets
Google to do the right thing, and if you search the group I'm sure you
will find the instructions. It's in someone's sig, but I can't remember who.
#include <string.h>
#ifndef _OPTIMIZED_FOR_SIZE
An implementation could declare that or not for any reason it wants.


If _OPTIMIZED_FOR_SIZE is not defined, the implementation tries to unravel
the "while(*s)s++" loop somewhat.
#include <limits.h>
/* Nonzero if X is not aligned on a "long" boundary. */
#ifdef _ALIGN
Again, a compiler could declare that or not as it saw fit.


There's no way to portably detect whether a pointer-to-char is aligned
on a long boundary, is there?
#define UNALIGNED1(X) ((long)(X) & (sizeof(long) - 1))
There is no guarantee that this will tell you if it is aligned. Some
people around here have worked on word addressed systems where the
byte within the word was flagged in the *high* bits of the address.
I bet that makes for some funky internal pointer arithmetic!
#else
#define UNALIGNED1(X) 0
#endif

/* Macros for detecting endchar */
#if ULONG_MAX == 0xFFFFFFFFUL
#define DETECTNULL(X) (((X) - 0x01010101) & ~(X) & 0x80808080)
Misleading name, I initially read that as a screwy attempt to detect
a NULL pointer. DETECTNULCHAR would be better.
#elif ULONG_MAX == 0xFFFFFFFFFFFFFFFFUL
/* Nonzero if X (a long int) contains a NULL byte. */
#define DETECTNULL(X) (((X) - 0x0101010101010101) & ~(X) & 0x8080808080808080)
#else
#define _OPTIMIZED_FOR_SIZE
Isn't that macro you are defining in the implementation name space?
Anything could happen.


I tried two types of optimizations, one for time (try to unravel the
loop) and one for size. If I get a kind of system where casting
a pointer-to-char to a pointer-to-unsigned-long doesn't make much
sense, #defining _OPTIMIZED_FOR_SIZE allows me to leave out code that
wouldn't work in that situation.
#endif

#ifdef DETECTNULL
#define DETECTCHAR(X,MASK) DETECTNULL((X) ^ (MASK))
#endif

#endif
/* strlen */
size_t (strlen)(const char *s)
{
    const char *t = s;
#ifndef _OPTIMIZED_FOR_SIZE
    unsigned long *aligned_addr;

    if (!UNALIGNED1(s)) {
        aligned_addr = (unsigned long *) s;
        while (!DETECTNULL(*aligned_addr))
            aligned_addr++;


The above could read bytes off the end of a properly nul terminated
string. For example,
size_t len = strlen("a");


I'm testing for having a null character somewhere among the characters
that make up the area that aligned_addr points to. If I don't get a
sane environment (as indicated by the _OPTIMIZED_FOR_SIZE macro), this
code isn't even compiled in.

Here's the general idea: suppose, for example, sizeof(unsigned long) is
4. I can freely cast a pointer-to-char to a pointer-to-unsigned-long. I
don't care if *aligned_addr is big-end-aligned or little-end-aligned.
Oh, well, is there a better way to unravel "while(*s)s++"?
        /* The block of bytes currently pointed to by aligned_addr
           contains a null. We catch it using the bytewise search. */
        s = (const char *) aligned_addr;
    }
#endif
    while (*s)
        s++;
    return (size_t) (s - t);
No need to cast the result of the subtraction. The compiler already
knows it is returning a size_t so will do the conversion anyway.


The cast is only for my eyes. ;-)
}

/* Gregory Pietsch */

--
Flash Gordon
Living in interesting times.
Although my email address says spam, it is real and I read it.


Gregory Pietsch

Nov 14 '05 #32
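
The zero-byte test debated in this post can be looked at in isolation. What follows is only a sketch under stated assumptions, not the code from the post: it assumes uint32_t is available, the names has_zero_byte and word_strlen are made up for illustration, and the caller is assumed to know the size of the buffer. Loading each word with memcpy avoids the alignment question, and the cap parameter avoids reading past the end of the array, which was the objection raised above.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Nonzero if any of the four bytes of x is zero (the DETECTNULL idea). */
static uint32_t has_zero_byte(uint32_t x)
{
    return (x - 0x01010101u) & ~x & 0x80808080u;
}

/* Hypothetical helper: length of the string stored in buf, an array of
   cap bytes. Scans four bytes at a time while a whole word still fits
   inside the array, then finishes one byte at a time. If no '\0' is
   found within cap bytes, cap is returned. */
static size_t word_strlen(const char *buf, size_t cap)
{
    size_t i = 0;

    while (i + 4 <= cap) {
        uint32_t w;
        memcpy(&w, buf + i, sizeof w);  /* no alignment assumption */
        if (has_zero_byte(w))
            break;                      /* a '\0' is in this word */
        i += 4;
    }
    while (i < cap && buf[i] != '\0')   /* bytewise search for the exact spot */
        i++;
    return i;
}
```

Because the word test only says that some byte is zero, not which one, the final bytewise loop is still needed, just as in the posted code.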
Lawrence Kirby <lk****@netactive.co.uk> writes:
On Sun, 24 Apr 2005 05:02:10 +0000, Keith Thompson wrote:

[...]
First, strlen() is part of the runtime library, not part of the
compiler.


It is part of the implementation which covers both compiler and library.
Many compilers can generate their own inline code for strlen() in which
case the "library" as a separate concept has little to do with it.


You're right. I should have said that strlen() is *typically
implemented as* part of the runtime library, not part of the compiler.
(I don't know how many compilers generate inline code, and therefore
how accurate "typically" is.)

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <*> <http://users.sdsc.edu/~kst>
We must do something. This is something. Therefore, we must do this.
Nov 14 '05 #33
In article <11**********************@l41g2000cwc.googlegroups.com>,
"roy" <ro*****@hotmail.com> wrote:
Thanks. Maybe my question should be "what if the input is a char array
without a null terminator". But from my experimental results, it seems
that strlen can still return the number of characters of a char array.
I am just not sure whether I am just lucky or sth else happened inside
strlen.


You are not lucky, you are unlucky.

If you were lucky, your program would crash as soon as you try this, and
then you would know there is a bug that needs fixing. If you are
unlucky, you get a result that doesn't show the bug.
Nov 14 '05 #34

"Keith Thompson" <ks***@mib.org> wrote in message
news:ln************@nuthaus.mib.org...
Lawrence Kirby <lk****@netactive.co.uk> writes:
On Sun, 24 Apr 2005 05:02:10 +0000, Keith Thompson wrote:

[...]
First, strlen() is part of the runtime library, not part of the
compiler.


It is part of the implementation which covers both compiler and library.
Many compilers can generate their own inline code for strlen() in which
case the "library" as a separate concept has little to do with it.


You're right. I should have said that strlen() is *typically
implemented as* part of the runtime library, not part of the compiler.
(I don't know how many compilers generate inline code, and therefore
how accurate "typically" is.)

Several common compilers, both commercial and free software, have both
in-line and library implementations, as provided for in standard C (both C89
and C99). In normal usage, not allowing for both possibilities would open
up the possibility of Undefined Behavior.
Nov 14 '05 #35
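
The point about strlen existing both as inline code and as a library function can be illustrated with a small sketch. The wrapper names below are hypothetical. Standard C allows a library function to be additionally provided as a macro, and parenthesizing the name, or taking its address, refers to the real function. A compiler may of course still expand any of these calls inline.

```c
#include <stddef.h>
#include <string.h>

/* Uses whatever strlen is in scope: possibly a macro or a builtin. */
static size_t len_default(const char *s)
{
    return strlen(s);
}

/* The parentheses suppress any function-like macro named strlen,
   so this names the actual library function. */
static size_t len_function(const char *s)
{
    return (strlen)(s);
}

/* Taking the address requires a real function to exist. */
static size_t len_via_pointer(const char *s)
{
    size_t (*fn)(const char *) = strlen;
    return fn(s);
}
```

All three return the same value for a properly terminated string; they differ only in which form of strlen the implementation is asked for.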
In article <11**********************@z14g2000cwz.googlegroups.com>,
Gregory Pietsch <GK**@flash.net> wrote:
There's no way to portably detect whether a pointer-to-char is aligned
on a long boundary, is there?
No (at least, not if by "portable" you mean what we usually do in
comp.lang.c :-) ... there are versions that are "portable" to those
systems that define an alignment function or macro, such as all
the BSD variants).

[code using things like]
#define DETECTNULL(X) (((X) - 0x01010101) & ~(X) & 0x80808080)

I tried two types of optimizations, one for time (try to unravel the
loop) and one for size. ...
Here's the general idea: suppose, for example, sizeof(unsigned long) is
4. I can freely cast a pointer-to-char to a pointer-to-unsigned-long. I
don't care if *aligned_addr is big-end-aligned or little-end-aligned.
Oh, well, is there a better way to unravel "while(*s)s++"?


Maybe, maybe not. It is quite CPU-dependent.

For whatever it is worth (perhaps not much at this point), I tried
the above trick in SPARC assembly code when I was writing the 4.4BSD
C library routines for the SPARC. (I wrote many of the "portable"
routines as well; we set things up so that when you built for VAX,
Tahoe, or SPARC, you got either the machine-specific version or the
generic, depending on whether we had written a machine-specific
version.)

The result was that the fancy version using "four bytes at a time"
scans (on aligned pointers) was significantly *slower* than the
dumb, simple, one-byte-at-a-time version, even for relatively long
strings. I was a bit surprised; and the results might be different
on a more modern CPU (this was back in 1991 or so).

(I wrote the whole thing in assembly -- well, in C at first, compiled
to assembly, then hand-edited -- so I know it was not the compiler
doing anything tricky, either.)

It turns out that in most C programs, most strings are very short.
The "Dhrystone" tests that many people used to use to compare C
library implementations use strings that are significantly longer
than average, and overemphasize the time behavior of strlen(),
strcpy(), and strcmp() on relatively long strings. Even for these
longer strings, the "optimized" strlen() was still slower.

Of course, this "most C strings are short" rule of thumb may come
about because most C libraries are optimized for short strings
because most strings are short because most C libraries are optimized
for short strings, etc. :-) In other words, if you have a lot of
long strings, and you do program optimization, you will avoid
calling strlen() on them so much.

Even if one breaks this initial chicken-and-egg loop (by calling
strlen() repeatedly on long strings), and then optimizes the heck
out of strlen(), one can probably still speed up one's programs by
fixing the repeated calls to strlen(). There is another rule of
thumb that applies beyond just C programming, or even computers:

The shortest, fastest, cheapest, and most reliable parts of
any system are the ones that are not there.

(This is another way of putting the "KISS" principle. Of course,
marketing usually gets in the way of this idea. :-) )
--
In-Real-Life: Chris Torek, Wind River Systems
Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W) +1 801 277 2603
email: forget about it http://web.torek.net/torek/index.html
Reading email is like searching for food in the garbage, thanks to spammers.
Nov 14 '05 #36
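
The advice about fixing repeated calls to strlen() can be shown with a small sketch. The function name count_char is made up for illustration. The usual offender is a loop condition such as "i < strlen(s)", which may rescan the string on every iteration unless the compiler can prove the string does not change.

```c
#include <stddef.h>
#include <string.h>

/* Counts how many times c occurs in s. strlen() is called once,
   before the loop, instead of once per iteration. */
static size_t count_char(const char *s, char c)
{
    size_t n = strlen(s);   /* hoisted out of the loop */
    size_t hits = 0;

    for (size_t i = 0; i < n; i++) {
        if (s[i] == c)
            hits++;
    }
    return hits;
}
```

In this particular case the call could be dropped altogether by looping until s[i] is '\0', which is the "parts that are not there" rule applied literally.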
Keith Thompson wrote:
Stan Milam <st*****@swbell.net> writes:
Keith Thompson wrote:
Mark McIntyre <ma**********@spamcop.net> writes:

On 22 Apr 2005 20:59:49 -0700, in comp.lang.c, "roy"
<ro*****@hotmail.com> wrote:

[...]
>But from my experimental results, it seems
>that strlen can still return the number of characters of a char array.
[...]
>I am just not sure whether I am just lucky or sth else happened inside
>strlen.

lucky

No, if he'd been lucky it would have crashed the program (with a
meaningful diagnostic) rather than quietly returning a meaningless
result.


So, you are saying this is a poorly implemented compiler?

Not at all.

First, strlen() is part of the runtime library, not part of the
compiler.

An implementation of strlen() that was able to detect the case where
the argument points to the first element of an array that doesn't
contain any '\0' characters would most likely add significant overhead
to *all* operations. The obvious way to implement it is to make all
pointers "fat", so each pointer includes both the base address and
bounds information; strlen() would then have to check the bounds.


A simpler way would be to insert a padding byte containing zero after
every char array.

-Peter

--
Pull out a splinter to reply.
Nov 14 '05 #37
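
The "fat pointer" idea quoted above can be approximated by hand in ordinary C. This is a sketch only: struct fat_str and checked_strlen are hypothetical names, and the caller is assumed to know the size of the underlying array. The standard memchr function does the bounded search.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical "fat" pointer: base address plus bounds information. */
struct fat_str {
    const char *base;
    size_t      cap;    /* number of bytes in the underlying array */
};

/* Stores the length in *len and returns true if a '\0' lies within
   bounds. Returns false and leaves *len untouched if the array holds
   no terminator. */
static bool checked_strlen(struct fat_str s, size_t *len)
{
    const char *end = memchr(s.base, '\0', s.cap);

    if (end == NULL)
        return false;
    *len = (size_t)(end - s.base);
    return true;
}
```

With this, an array such as char bad[3] = {'a', 'b', 'c'} is reported as unterminated instead of being scanned past its end. The cost is the overhead mentioned in the post: every string has to carry its bounds around.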
Chris Torek wrote:
.... snip ...
Even if one breaks this initial chicken-and-egg loop (by calling
strlen() repeatedly on long strings), and then optimizes the heck
out of strlen(), one can probably still speed up one's programs by
fixing the repeated calls to strlen(). There is another rule of
thumb that applies beyond just C programming, or even computers:

The shortest, fastest, cheapest, and most reliable parts of
any system are the ones that are not there.

(This is another way of putting the "KISS" principle. Of course,
marketing usually gets in the way of this idea. :-) )


My suggestion is to try to return the length from most routines
that uncover it, in place of an insipid pointer to the original
string. strlcpy and strlcat follow this practice. So do printf
and sprintf.

--
Chuck F (cb********@yahoo.com) (cb********@worldnet.att.net)
Available for consulting/temporary embedded and systems.
<http://cbfalconer.home.att.net> USE worldnet address!
Nov 14 '05 #38
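
The suggestion to hand back the length from routines that already know it can be sketched with snprintf, whose return value is the number of characters that would be written. The helper name format_pair and its "key=value" format are invented for illustration.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: writes "key=value" into out, which holds cap
   bytes. Returns the length of the result, or 0 if it did not fit or
   formatting failed (a successful result is never empty, because it
   always contains '='). */
static size_t format_pair(char *out, size_t cap,
                          const char *key, const char *value)
{
    int n = snprintf(out, cap, "%s=%s", key, value);

    if (n < 0 || (size_t)n >= cap)
        return 0;
    return (size_t)n;   /* the caller never needs strlen(out) */
}
```

A caller that appends further text can start writing at out + length directly, with no rescan of what was just produced.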
Stan Milam wrote:

Stan Milam wrote:
Keith Thompson wrote:


No, if he'd been lucky it would have crashed the program (with a
meaningful diagnostic) rather than quietly returning a meaningless
result.


So, you are saying this is a poorly implemented compiler?


Okay guys, that was a joke.


No, it wasn't.
Your posts in the "C FAQ 3.1" thread show that you don't see
the beauty of the concept of undefined behavior.

If you're going to write bad code,
then the C standard committee doesn't care about
what happens as a consequence.

This philosophy was in C originally,
and is maintained in the current C99 standard.

It's not that R was in too much of a hurry specifying C,
so that he didn't have enough time
to also specify what garbage code should do,
but rather it's the case that compiler writers
are in too much of a hurry writing compilers
to want to care about how to translate garbage code.

--
pete
Nov 14 '05 #39
Gregory Pietsch wrote:

There has to be a null terminator somewhere.

Here's a short implementation:

#include <string.h>
size_t (strlen)(const char *s)
{
    const char *p = s;

    while (*p != '\0')
        p++;
    return (size_t)(p - s);
}


The ptrdiff_t type of (p - s) disqualifies this code
from being an example of portable C code.

If the following description of undefined behavior doesn't
apply to your code, then it doesn't apply to anything.
N869
6.5.6 Additive operators
[#9] When two pointers are subtracted, both shall point to
elements of the same array object, or one past the last
element of the array object; the result is the difference
of the subscripts of the two array elements. The size of
the result is implementation-defined, and its type (a signed
integer type) is ptrdiff_t defined in the <stddef.h> header.
If the result is not representable in an object of that
type, the behavior is undefined.

--
pete
Nov 14 '05 #40
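
One way to sidestep the ptrdiff_t objection is to count in a size_t and never subtract pointers. This is a minimal sketch, with the hypothetical name my_strlen chosen so that it does not clash with the library function; it is not claimed to be how any particular library does it.

```c
#include <stddef.h>

/* Counts characters up to, but not including, the terminating '\0'.
   The count is kept in a size_t, so no pointer difference is formed. */
static size_t my_strlen(const char *s)
{
    size_t n = 0;

    while (s[n] != '\0')
        n++;
    return n;
}
```

The behavior is still undefined if s does not point to a terminated string; only the subtraction issue goes away.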
