
strlen() speed test

I need to write a couple of my own string manipulation routines (e.g.
a strcpy() alternative that returns the number of chars copied). I've
started with one of the simpler functions, strlen(). I've written a
very simple version called StringLength(), but it performs significantly
slower than its CRT counterparts.

Here's my implementation:
inline unsigned int StringLength( const char *pszString )
{
    unsigned int cch = 0;
    while ( *pszString++ )
        cch++;
    return cch;
}

I've compared my version to strlen() and _mbslen() using my own timer
function, which is a wrapper around the Windows
QueryPerformanceCounter() API.

My only guess is that the CRT versions use the processor registers for
their counters and my version uses RAM. (You can't make use of the
"register" keyword with the VC++ compiler.)

How can I make my string manipulation functions compete speedwise with
the CRT functions?
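
For reference, the strcpy() alternative mentioned at the top could be sketched roughly as below. The name CopyString and its exact contract are assumptions made up for this illustration, not anything from the CRT; like strcpy(), it trusts the caller to supply a large enough destination.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (not a CRT function): copies src, including its
// terminating '\0', into dest and returns the number of characters
// copied, not counting the terminator.
// Assumption: dest has room for the whole string plus the terminator.
inline std::size_t CopyString(char* dest, const char* src)
{
    assert(dest != nullptr && src != nullptr);
    const char* p = src;
    // Copy one character per iteration; stop once the '\0' has been copied.
    while ((*dest++ = *p) != '\0')
        ++p;
    // p now points at the terminator in src, so the difference is the length.
    return static_cast<std::size_t>(p - src);
}
```

Returning std::size_t instead of unsigned int is a design choice here, so the result has the same type that strlen() returns.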

Jul 23 '05 #1
12 Replies
Nollie wrote:
> I need to write a couple of my own string manipulation routines (e.g.
> a strcpy() alternative that returns the number of chars copied). I've
> started with one of the simpler functions, strlen(). I've written a
> very simple version called StringLength(), but it performs significantly
> slower than its CRT counterparts.
>
> Here's my implementation:
> inline unsigned int StringLength( const char *pszString )
> {
>     unsigned int cch = 0;
>     while ( *pszString++ )
>         cch++;
>     return cch;
> }

How about:

inline unsigned int StringLength( const char *pszString )
{
    const char* p = pszString;
    while (*p) p++;
    return p - pszString;
}

> I've compared my version to strlen() and _mbslen() using my own timer
> function, which is a wrapper around the Windows
> QueryPerformanceCounter() API.
>
> My only guess is that the CRT versions use the processor registers for
> its counters and my version uses RAM.

The compiler will decide what to use. Did you set maximum optimization
level?

> (You can't make use of the "register" keyword with the VC++ compiler.)

Why?

> How can I make my string manipulation functions compete speedwise with
> the CRT functions?


There might be some optimizations that you can do (e.g. not going through
each single character in the loop but instead checking multiple characters
at once, handling the string end separately).
OTOH, the library routine possibly uses some special assembler instructions
to speed this up. Systems like x86 have special built-in string
instructions to search quickly for a specific value in an array.

Jul 23 '05 #2
>> (You can't make use of the "register" keyword with the VC++ compiler.)
> Why?

Dunno. Here's MS docs:
http://msdn.microsoft.com/library/de...er_keyword.asp

> How about:
> inline unsigned int StringLength( const char *pszString )
> {
>     const char* p = pszString;
>     while (*p) p++;
>     return p - pszString;
> }

Thank you! Yes, this is a better algorithm. However, though it does
make StringLength() faster, it still does not rival the CRT functions.

> The compiler will decide what to use. Did you set maximum optimization
> level?

Yes, this is the answer. Sorry, I should have tried this before
posting. My timer only worked in unoptimized, DEBUG builds, but I've
rewritten it to work for all builds. StringLength() and strlen() now
use practically the exact same processing time.
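
A timing harness of this kind might look something like the sketch below. It is not the timer used in this thread: that one wraps QueryPerformanceCounter(), while this sketch assumes a C++11 compiler and uses std::chrono::steady_clock as a portable stand-in. The name TimeLengthFunction is made up for the example.

```cpp
#include <chrono>
#include <cstddef>
#include <cstring>

// Hypothetical helper: calls fn(text) `iterations` times and returns the
// elapsed wall-clock time in nanoseconds.
template <typename Fn>
long long TimeLengthFunction(Fn fn, const char* text, int iterations)
{
    // Reading the pointer through a volatile object on every pass is meant
    // to discourage the optimizer from hoisting the call out of the loop.
    const char* volatile input = text;
    // Accumulating into a volatile sink keeps the results "used", so the
    // calls cannot simply be discarded in an optimized build.
    volatile std::size_t sink = 0;

    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < iterations; ++i)
        sink = sink + fn(input);
    const auto stop = std::chrono::steady_clock::now();

    return std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start).count();
}
```

One would call it once with a lambda wrapping strlen() and once with a lambda wrapping StringLength(), on the same input and in the same optimized build, and compare the two numbers. Results from a DEBUG build say very little about either function.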

However, now I have discovered another peculiarity...

Comparing 6 strlen() alternatives:
1. strlen() // CRT function
2. _tcslen() // redundant, maps to strlen
3. lstrlen() // Windows API
4. StringLength() // my function
5. _mbslen() // for multibyte strings
6. _mbstrlen() // for multibyte strings

Every one uses almost the exact same processing time, except _mbslen()
and _mbstrlen(), which are almost twice as fast as the others. I've
arranged the test so that the _mbs* functions are tested first, last,
and in between, so caching is not an issue.

It seems to me the _mbs* functions should be slower because they have to
check for double-byte characters, but they are in fact considerably faster.
Why might this be?

Jul 23 '05 #3

Nollie wrote:
> (You can't make use of the
> "register" keyword with the VC++ compiler.)

Says who?

Brian

Jul 23 '05 #4
>Says who?

Topic = strlen() speed test.

http://msdn.microsoft.com/library/de...er_keyword.asp
Jul 23 '05 #5
> I need to write a couple of my own string manipulation routines (e.g.
> a strcpy() alternative that returns the number of chars copied). I've
> started with one of the simpler functions, strlen(). I've written a
> very simple version called StringLength(), but it performs significantly
> slower than its CRT counterparts.


Which it may well do.
Vendor implementations may well be written in assembler.
In particular for 32-bit Intel, there is a trick where it is possible to
process 4 bytes at once, doing bit manipulation to see if the '\0' is in any
of the 4 bytes. And if MMX is available, 8 bytes can be processed at once.
Either approach is likely to be faster than your code.
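
One widely known formulation of that 4-byte test, written in C++ rather than assembler, is sketched below. It is not claimed to be what any particular vendor's strlen() uses, and the name HasZeroByte is made up for the example.

```cpp
#include <cstdint>

// Returns true if at least one of the four bytes packed into v is zero.
// Subtracting 1 from every byte borrows through a byte only when that byte
// is 0x00; masking with ~v and 0x80808080 keeps a flag bit only for bytes
// that did not already have their top bit set. The combined expression is
// nonzero exactly when some byte of v is zero.
inline bool HasZeroByte(std::uint32_t v)
{
    return ((v - 0x01010101u) & ~v & 0x80808080u) != 0;
}
```

For example, HasZeroByte(0x41414141u) is false ("AAAA" has no terminator), while HasZeroByte(0x41410041u) is true. The same idea extends to 64-bit words by widening the two constants.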

But in any case, why bother?

Stephen Howe
Jul 23 '05 #6
Nollie wrote:
> How can I make my string manipulation functions compete speedwise with
> the CRT functions?


Here are some {processor specific} ideas:
1. Rewrite in assembly using processor string instructions.
2. Instead of fetching one char at a time, fetch as many
as will fit into the processor's native register.
For example, if the processor register is 32-bits and
a char is 8-bits, then fetch 4 chars at a time.

The reasoning is that a "fetch" from memory costs the
same whether it is one char or one word. So get as
much data as possible out of each fetch.

3. If the processor has the ability, then fill as many
registers as possible during one fetch instruction
cycle. Note that there is a penalty for short strings,
because there is the wasted extra data.
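
Idea 2 can be sketched in portable C++ roughly as follows. This is only an illustration under stated assumptions: char is 8 bits, the caller knows the size of the buffer (so no read ever goes past it), and the name BoundedLength is invented here. A real strlen() has no size parameter and has to rely on alignment tricks instead.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper: returns the length of the string stored in
// buffer[0..capacity), or capacity if no '\0' is found in that range.
// It examines four chars per step while a whole 32-bit word still fits
// inside the buffer, then finishes one char at a time.
inline std::size_t BoundedLength(const char* buffer, std::size_t capacity)
{
    assert(buffer != nullptr);
    std::size_t i = 0;
    while (capacity - i >= sizeof(std::uint32_t)) {
        std::uint32_t word;
        // memcpy gives a well-defined load even from an unaligned address;
        // compilers typically turn it into a single move instruction.
        std::memcpy(&word, buffer + i, sizeof word);
        // Nonzero exactly when one of the four bytes is '\0'.
        if (((word - 0x01010101u) & ~word & 0x80808080u) != 0)
            break;
        i += sizeof word;
    }
    // Byte loop: locates the terminator inside the flagged word, and also
    // handles the last few chars when fewer than four remain.
    while (i < capacity && buffer[i] != '\0')
        ++i;
    return i;
}
```

Because the exact position of the terminator is found by the byte loop, the sketch does not depend on the processor's byte order.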

So, why are you wasting your time optimizing library
routines?
Does your program work correctly?
Are the users complaining about the speed?
Is the program missing events due to slowness?
Is your program causing the computer to miss events
because it is too slow, or locking up resources?

Don't optimize until the program works correctly.

I had a program that programmed a Flash memory device,
which took 5 minutes (but it worked correctly). I
reread the data sheet and optimized it, then the program
took 30 seconds. And yes, developers were complaining
about the time.

--
Thomas Matthews

C++ newsgroup welcome message:
http://www.slack.net/~shiva/welcome.txt
C++ Faq: http://www.parashift.com/c++-faq-lite
C Faq: http://www.eskimo.com/~scs/c-faq/top.html
alt.comp.lang.learn.c-c++ faq:
http://www.comeaucomputing.com/learn/faq/
Other sites:
http://www.josuttis.com -- C++ STL Library book
http://www.sgi.com/tech/stl -- Standard Template Library
Jul 23 '05 #7

Nollie wrote:

[context restored]
>>> (You can't make use of the
>>> "register" keyword with the VC++ compiler.)
>> Says who?
>
> [MSDN link]

So by "make use of" you mean "can't get a register". Well, I'll break
the news that it's the same for almost every modern implementation.
Microsoft is just upfront about it.

Brian

Jul 23 '05 #8
> So, why are you wasting your time optimizing library
> routines?


It all started with a need for a strcpy() alternative that returns the
number of characters copied. But I actually have an OCD complex where
I have to know exactly how everything works. ;-) The blackbox concept
of programming with libraries and modules irks me to no end.

I have, however, decided that you guys are right, and that the CRT
functions will serve my purpose just fine. Thanks for all the input.
This newsgroup really is a great example of how helpful Usenet can be.
Jul 23 '05 #9
Nollie wrote:
>> So, why are you wasting your time optimizing library
>> routines?
>
> It all started with a need for a strcpy() alternative that returns the
> number of characters copied. But I actually have an OCD complex where
> I have to know exactly how everything works. ;-) The blackbox concept
> of programming with libraries and modules irks me to no end.


Then you should use more opensource/free software that comes with complete
source code. At least you don't need to write things yourself then, but
just look into the sources to "know exactly how everything is working". ;-)

Jul 23 '05 #10
The CRT source is available. I can't remember where you get it but it's
on my PC and I put a breakpoint in strlen. Here's the source. It's in
assembler and reads 4 bytes at a time (if it's word aligned) -
CODESEG

public strlen

strlen proc

.FPO ( 0, 1, 0, 0, 0, 0 )

string equ [esp + 4]

mov ecx,string ; ecx -> string
test ecx,3 ; test if string is aligned on 32 bits
je short main_loop

str_misaligned:
; simple byte loop until string is aligned
mov al,byte ptr [ecx]
inc ecx
test al,al
je short byte_3
test ecx,3
jne short str_misaligned

add eax,dword ptr 0 ; 5 byte nop to align label below

align 16 ; should be redundant

main_loop:
mov eax,dword ptr [ecx] ; read 4 bytes
mov edx,7efefeffh
add edx,eax
xor eax,-1
xor eax,edx
add ecx,4
test eax,81010100h
je short main_loop
; found zero byte in the loop
mov eax,[ecx - 4]
test al,al ; is it byte 0
je short byte_0
test ah,ah ; is it byte 1
je short byte_1
test eax,00ff0000h ; is it byte 2
je short byte_2
test eax,0ff000000h ; is it byte 3
je short byte_3
jmp short main_loop ; taken if bits 24-30 are clear and bit 31 is set

byte_3:
lea eax,[ecx - 1]
mov ecx,string
sub eax,ecx
ret
byte_2:
lea eax,[ecx - 2]
mov ecx,string
sub eax,ecx
ret
byte_1:
lea eax,[ecx - 3]
mov ecx,string
sub eax,ecx
ret
byte_0:
lea eax,[ecx - 4]
mov ecx,string
sub eax,ecx
ret

strlen endp

end
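
Read as C++, the test at the top of main_loop comes out roughly as in the sketch below. The function name MayContainZeroByte is made up; the two constants and the order of operations are taken from the listing above.

```cpp
#include <cstdint>

// Mirrors the main_loop test from the listing. A false result means none
// of the four bytes is '\0', so the loop can move on to the next word.
// A true result only means "look closer": it also fires when the highest
// byte is 0x80 (bits 24-30 clear, bit 31 set), which is why the listing
// checks each byte afterwards and may jump back into the loop.
inline bool MayContainZeroByte(std::uint32_t word)
{
    const std::uint32_t sum = word + 0x7efefeffu;  // mov edx,7efefeffh / add edx,eax
    return ((sum ^ ~word) & 0x81010100u) != 0;     // xor eax,-1 / xor eax,edx / test eax,81010100h
}
```

For instance, MayContainZeroByte(0x41414141u) is false, MayContainZeroByte(0x41410041u) is true, and MayContainZeroByte(0x80414141u) is also true even though that word holds no zero byte.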

Jul 23 '05 #11
Peter Smithson wrote:
> The CRT source is available. I can't remember where you get it but it's
> on my PC and I put a breakpoint in strlen. Here's the source. It's in
> assembler and reads 4 bytes at a time (if it's word aligned) -
> CODESEG
>
> public strlen
>
> strlen proc
>
> .FPO ( 0, 1, 0, 0, 0, 0 )
>
> string equ [esp + 4]
>
> mov ecx,string ; ecx -> string
> test ecx,3 ; test if string is aligned on


This doesn't work on my embedded ARM7TDMI platform.
Care to post a solution in:
ARM Assembly
Z80 Assembly
8051 Assembly
Sparc Assembly
.. etc.

In other words, assembly is off-topic in this newsgroup.
However, the algorithm for assembly, when coded in C++
is on topic. One can have good luck at forcing the
compiler to generate desired code by writing simple
C++ that imitates the assembly language statements.
--
Thomas Matthews

Jul 23 '05 #12

Thomas Matthews wrote:
> In other words, assembly is off-topic in this newsgroup.
> However, the algorithm for assembly, when coded in C++
> is on topic. One can have good luck at forcing the
> compiler to generate desired code by writing simple
> C++ that imitates the assembly language statements.


OK - I'll not do it again.

In my defence the poster is asking how he can write code as fast as the
CRT code that comes with the compiler he's using (I know, OT). By
looking at the assembly you can see how to improve the C++ algorithm.
Would it have been on topic if I'd said "I've looked at the assembly to
the routines you've been asking about .." and then described the
algorithm without using C++ or assembler but just words?

This reminds me of how in the UK the actual voices of IRA leaders were
not allowed on the TV in case they corrupted us. So we'd see news clips
of them with dubbed-over voices!

Jul 23 '05 #13
