strlen() speed test

I need to write a couple of my own string manipulation routines (e.g.
a strcpy() alternative that returns the number of chars copied). I've
started with one of the simpler functions, strlen(). I've written a
very simple version called StringLength(), but it performs significantly
slower than its CRT counterparts.
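The motivating routine, a strcpy() variant that reports how many characters it copied, can be sketched in a few lines. This is only an illustration: the name StringCopy() and its size_t return type are assumptions, not code from the thread, and like strcpy() it trusts the caller to supply a large enough, non-overlapping destination.

```cpp
#include <cstddef>

// Hypothetical strcpy() alternative: copies src (including its '\0') into
// dst and returns the number of characters copied, not counting the '\0'.
// Like strcpy(), it assumes dst is large enough and the buffers don't overlap.
inline std::size_t StringCopy(char *dst, const char *src)
{
    const char *start = src;
    while ((*dst++ = *src++) != '\0')
        ;  // copy until the terminator has been written
    // src now points one past the '\0' it just copied.
    return static_cast<std::size_t>(src - start - 1);
}
```

On POSIX systems, stpcpy() returns a pointer to the '\0' it wrote into the destination, so `stpcpy(dst, src) - dst` yields the same count; it is not part of ISO C, so it may not be available in the VC++ CRT.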

Here's my implementation:
inline unsigned int StringLength( const char *pszString )
{
    unsigned int cch = 0;
    while ( *pszString++ )
        cch++;
    return cch;
}

I've compared my version to strlen() and _mbslen() using my own timer
function, which is a wrapper around the Windows
QueryPerformanceCounter() API.
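For readers without the Windows API, a comparable measurement can be sketched with std::chrono::steady_clock instead of QueryPerformanceCounter(). NanosecondsPerCall() and TimeCrtStrlen() are hypothetical names, not the poster's timer. As the later replies show, results only mean something in an optimized build, and even then a compiler may compute a pure call like strlen(s) once and reuse it, so a serious benchmark should vary its input.

```cpp
#include <chrono>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical portable timer: average nanoseconds per call of fn(s).
// iterations must be greater than zero.
template <typename Fn>
double NanosecondsPerCall(Fn fn, const char *s, int iterations)
{
    // The volatile store runs every iteration, but the compiler may still
    // evaluate fn(s) only once if it can prove the call has no side effects.
    volatile std::size_t sink = 0;
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < iterations; ++i)
        sink = sink + fn(s);
    const auto stop = std::chrono::steady_clock::now();
    const std::chrono::duration<double, std::nano> elapsed = stop - start;
    return elapsed.count() / iterations;
}

// Example: time the CRT strlen() on a 1000-character string.
inline double TimeCrtStrlen()
{
    static const std::string text(1000, 'x');
    return NanosecondsPerCall(
        [](const char *p) { return std::strlen(p); }, text.c_str(), 100000);
}
```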

My only guess is that the CRT versions use the processor registers for
their counters and my version uses RAM. (You can't make use of the
"register" keyword with the VC++ compiler.)

How can I make my string manipulation functions compete speedwise with
the CRT functions?

Jul 23 '05 #1
Nollie wrote:
> I need to write a couple of my own string manipulation routines (e.g.
> a strcpy() alternative that returns the number of chars copied). I've
> started with one of the simpler functions, strlen(). I've written a
> very simple version called StringLength(), but it performs significantly
> slower than its CRT counterparts.
>
> Here's my implementation:
> inline unsigned int StringLength( const char *pszString )
> {
>     unsigned int cch = 0;
>     while ( *pszString++ )
>         cch++;
>     return cch;
> }

How about:

inline unsigned int StringLength( const char *pszString )
{
    const char* p = pszString;
    while (*p) p++;
    return p - pszString;
}
> I've compared my version to strlen() and _mbslen() using my own timer
> function, which is a wrapper around the Windows
> QueryPerformanceCounter() API.
>
> My only guess is that the CRT versions use the processor registers for
> their counters and my version uses RAM.

The compiler will decide what to use. Did you set maximum optimization
level?

> (You can't make use of the "register" keyword with the VC++ compiler.)

Why?

> How can I make my string manipulation functions compete speedwise with
> the CRT functions?

There might be some optimizations that you can do (e.g. not going through
each single character in the loop but instead checking multiple characters
at once, handling the string end separately).
OTOH, the library routine possibly uses some special assembler instructions
to speed this up. Systems like x86 have special built-in string
instructions to search quickly for a specific value in an array.
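Here is a minimal C++ sketch of the "multiple characters at once" idea: step byte by byte to a 4-byte boundary, scan one 32-bit word per iteration, then locate the exact '\0' byte by byte. StringLengthWords() and WordHasZeroByte() are hypothetical names. The sketch assumes the memory up to the next 4-byte boundary after the terminator is readable. An aligned word never straddles a page boundary, which is why library versions can do this, but in strict ISO C++ that over-read is undefined behaviour, so treat this as an illustration rather than a drop-in replacement.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// True if at least one of the four bytes in v is 0x00.
inline bool WordHasZeroByte(std::uint32_t v)
{
    return ((v - 0x01010101u) & ~v & 0x80808080u) != 0;
}

// Hypothetical word-at-a-time strlen. ASSUMPTION: the bytes up to the next
// 4-byte boundary after the '\0' are readable (strictly, UB in ISO C++).
inline std::size_t StringLengthWords(const char *pszString)
{
    const char *p = pszString;

    // 1. Byte loop until p is 4-byte aligned (the string may end here).
    while (reinterpret_cast<std::uintptr_t>(p) % 4 != 0) {
        if (*p == '\0')
            return static_cast<std::size_t>(p - pszString);
        ++p;
    }

    // 2. Check four characters per iteration.
    for (;;) {
        std::uint32_t word;
        std::memcpy(&word, p, sizeof word);  // aligned 4-byte load
        if (WordHasZeroByte(word))
            break;
        p += 4;
    }

    // 3. Handle the string end: the '\0' is somewhere in this last word.
    while (*p != '\0')
        ++p;
    return static_cast<std::size_t>(p - pszString);
}
```

Using std::memcpy for the load, rather than casting the char pointer to a uint32_t pointer, avoids strict-aliasing problems; optimizing compilers typically turn it into a single load.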

Jul 23 '05 #2
>> (You can't make use of the "register" keyword with the VC++ compiler.)
> Why?

Dunno. Here's the MS documentation:
http://msdn.microsoft.com/library/de...er_keyword.asp

> How about:
> inline unsigned int StringLength( const char *pszString )
> {
>     const char* p = pszString;
>     while (*p) p++;
>     return p - pszString;
> }

Thank you! Yes, this is a better algorithm. However, though it does
make StringLength() faster, it still does not rival the CRT functions.

> The compiler will decide what to use. Did you set maximum optimization
> level?

Yes, this is the answer. Sorry, I should have tried this before
posting. My timer only worked in unoptimized DEBUG builds, but I've
rewritten it to work for all builds. StringLength() and strlen() now
take practically the same processing time.

However, now I have discovered another peculiarity...

Comparing 6 strlen() alternatives:
1. strlen() // CRT function
2. _tcslen() // redundant, maps to strlen
3. lstrlen() // Windows API
4. StringLength() // my function
5. _mbslen() // for multibyte strings
6. _mbstrlen() // for multibyte strings

Every one uses almost the exact same processing time, except _mbslen()
and _mbstrlen(), which are almost twice as fast as the others. I've
arranged the test so that the _mbs* functions are tested first, last,
and in between, so caching is not an issue.

It seems to me the _mbs* functions should be slower because they have to
check for double-byte characters, but they are in fact considerably faster.
Why might this be?

Jul 23 '05 #3

Nollie wrote:
> (You can't make use of the
> "register" keyword with the VC++ compiler.)

Says who?

Brian

Jul 23 '05 #4
>Says who?

Topic = strlen() speed test.

http://msdn.microsoft.com/library/de...er_keyword.asp
Jul 23 '05 #5
> I need to write a couple of my own string manipulation routines (e.g.
> a strcpy() alternative that returns the number of chars copied). I've
> started with one of the simpler functions, strlen(). I've written a
> very simple version called StringLength(), but it performs significantly
> slower than its CRT counterparts.


Which it may well do.
Vendor implementations may well be written in assembler.
In particular, for 32-bit Intel there is a trick that makes it possible to
process 4 bytes at once, using bit manipulation to see whether the '\0' is
in any of the 4 bytes. And if MMX is available, 8 bytes can be processed
at once. Either approach is likely to be faster than your code.
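The bit manipulation in question is commonly written as `(w - 0x01010101) & ~w & 0x80808080`: the result is nonzero exactly when at least one byte of `w` is zero. Below is a sketch of that test as a template, so the same code checks 4 bytes (`std::uint32_t`) or 8 bytes (`std::uint64_t`) at once. The 8-byte version uses an ordinary 64-bit integer, not MMX, and `HasZeroByte()` is a hypothetical name.

```cpp
#include <cstdint>
#include <type_traits>

// Hypothetical helper: true if any byte of the unsigned word w is zero.
// Word = std::uint32_t checks 4 bytes at once, std::uint64_t checks 8.
template <typename Word>
inline bool HasZeroByte(Word w)
{
    static_assert(std::is_unsigned<Word>::value &&
                      (sizeof(Word) == 4 || sizeof(Word) == 8),
                  "expects a 32- or 64-bit unsigned word");
    const Word ones  = static_cast<Word>(~Word(0)) / 0xFF;  // 0x0101...01
    const Word highs = static_cast<Word>(ones * 0x80u);     // 0x8080...80
    // A byte's high bit survives both masks only if that byte was 0x00,
    // or if a borrow from a lower zero byte rippled into it.
    return ((w - ones) & ~w & highs) != 0;
}
```

In a loop you would load an aligned word with std::memcpy and fall back to a byte-by-byte check only once HasZeroByte() fires.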

But in any case, why bother?

Stephen Howe
Jul 23 '05 #6
Nollie wrote:
> How can I make my string manipulation functions compete speedwise with
> the CRT functions?


Here are some {processor-specific} ideas:
1. Rewrite in assembly using processor string instructions.
2. Instead of fetching one char at a time, fetch as many
   as will fit into the processor's native register.
   For example, if the processor register is 32 bits and
   a char is 8 bits, then fetch 4 chars at a time.

   The reasoning is that a "fetch" from memory costs
   the same whether it is one char or one word,
   so get as much data as possible out of each fetch.

3. If the processor has the ability, then fill as many
   registers as possible during one fetch instruction
   cycle. Note that there is a penalty for short strings,
   because the extra data fetched is wasted.

So, why are you wasting your time optimizing library
routines?
Does your program work correctly?
Are the users complaining about the speed?
Is the program missing events due to slowness?
Is your program causing the computer to miss events
because it is too slow, or locking up resources?

Don't optimize until the program works correctly.

I had a program that programmed a Flash memory device,
which took 5 minutes (but it worked correctly). I
reread the data sheet and optimized it, then the program
took 30 seconds. And yes, developers were complaining
about the time.

--
Thomas Matthews

C++ newsgroup welcome message:
http://www.slack.net/~shiva/welcome.txt
C++ Faq: http://www.parashift.com/c++-faq-lite
C Faq: http://www.eskimo.com/~scs/c-faq/top.html
alt.comp.lang.learn.c-c++ faq:
http://www.comeaucomputing.com/learn/faq/
Other sites:
http://www.josuttis.com -- C++ STL Library book
http://www.sgi.com/tech/stl -- Standard Template Library
Jul 23 '05 #7

Nollie wrote:

[context restored]
> > > (You can't make use of the
> > > "register" keyword with the VC++ compiler.)
> > Says who?
>
> [MSDN link]

So by "can't make use of" you mean "can't get a register". Well, I'll
break the news that it's the same for almost every modern implementation.
Microsoft is just upfront about it.

Brian

Jul 23 '05 #8
> So, why are you wasting your time optimizing library
> routines?

It all started with a need for a strcpy() alternative that returns the
number of characters copied. But I actually have an OCD complex where
I have to know exactly how everything works. ;-) The black-box concept
of programming with libraries and modules irks me to no end.

I have, however, decided that you guys are right, and that the CRT
functions will serve my purpose just fine. Thanks for all the input.
This newsgroup really is a great example of how helpful Usenet can be.
Jul 23 '05 #9
Nollie wrote:
>> So, why are you wasting your time optimizing library
>> routines?
>
> It all started with a need for a strcpy() alternative that returns the
> number of characters copied. But I actually have an OCD complex where
> I have to know exactly how everything works. ;-) The black-box concept
> of programming with libraries and modules irks me to no end.


Then you should use more opensource/free software that comes with complete
source code. At least you don't need to write things yourself then, but
just look into the sources to "know exactly how everything is working". ;-)

Jul 23 '05 #10
The CRT source is available. I can't remember where you get it, but it's
on my PC and I put a breakpoint in strlen. Here's the source. It's in
assembler; it steps byte by byte until the pointer is 4-byte aligned and
then reads 4 bytes at a time:
        CODESEG

        public  strlen

strlen  proc

        .FPO    ( 0, 1, 0, 0, 0, 0 )

string  equ     [esp + 4]

        mov     ecx,string              ; ecx -> string
        test    ecx,3                   ; test if string is aligned on 32 bits
        je      short main_loop

str_misaligned:
        ; simple byte loop until string is aligned
        mov     al,byte ptr [ecx]
        inc     ecx
        test    al,al
        je      short byte_3
        test    ecx,3
        jne     short str_misaligned

        add     eax,dword ptr 0         ; 5 byte nop to align label below

        align   16                      ; should be redundant

main_loop:
        mov     eax,dword ptr [ecx]     ; read 4 bytes
        mov     edx,7efefeffh
        add     edx,eax
        xor     eax,-1
        xor     eax,edx
        add     ecx,4
        test    eax,81010100h
        je      short main_loop
        ; found zero byte in the loop
        mov     eax,[ecx - 4]
        test    al,al                   ; is it byte 0
        je      short byte_0
        test    ah,ah                   ; is it byte 1
        je      short byte_1
        test    eax,00ff0000h           ; is it byte 2
        je      short byte_2
        test    eax,0ff000000h          ; is it byte 3
        je      short byte_3
        jmp     short main_loop         ; taken if bits 24-30 are clear and bit
                                        ; 31 is set

byte_3:
        lea     eax,[ecx - 1]
        mov     ecx,string
        sub     eax,ecx
        ret
byte_2:
        lea     eax,[ecx - 2]
        mov     ecx,string
        sub     eax,ecx
        ret
byte_1:
        lea     eax,[ecx - 3]
        mov     ecx,string
        sub     eax,ecx
        ret
byte_0:
        lea     eax,[ecx - 4]
        mov     ecx,string
        sub     eax,ecx
        ret

strlen  endp

        end
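The algorithm behind this listing can be written in C++. The sketch transliterates main_loop: in `~w ^ (w + 0x7efefeff)`, each of bits 8, 16, 24 and 31 is set when no carry reached that bit, and a zero byte below it is what stops the carry. The test can also fire when the most significant byte is 0x80, the case the listing handles with `jmp short main_loop`, so every hit is confirmed byte by byte. MaybeHasZeroByte() and StringLengthMagic() are hypothetical names, this is not Microsoft's code, and it carries the same assumption as the listing: the buffer must be readable up to the next 4-byte boundary, which strict ISO C++ treats as undefined behaviour.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// main_loop's test: nonzero means "this word may hold a zero byte".
// It also fires when the most significant byte is 0x80 (bits 24-30 clear,
// bit 31 set), so a hit must be confirmed byte by byte.
inline std::uint32_t MaybeHasZeroByte(std::uint32_t word)
{
    std::uint32_t sum = word + 0x7efefeffu;  // mov edx,7efefeffh / add edx,eax
    return (~word ^ sum) & 0x81010100u;      // xor eax,-1 / xor eax,edx / test
}

// C++ transliteration of the listing (a sketch).
// ASSUMPTION: bytes up to the next 4-byte boundary after the '\0' are readable.
inline std::size_t StringLengthMagic(const char *s)
{
    const char *p = s;
    // str_misaligned: byte loop until p is 4-byte aligned
    while (reinterpret_cast<std::uintptr_t>(p) % 4 != 0) {
        if (*p == '\0')
            return static_cast<std::size_t>(p - s);
        ++p;
    }
    // main_loop: read 4 bytes per iteration
    for (;;) {
        std::uint32_t word;
        std::memcpy(&word, p, sizeof word);
        if (MaybeHasZeroByte(word) != 0) {
            // byte_0 .. byte_3: find which byte, if any, is the '\0'
            for (int i = 0; i < 4; ++i)
                if (p[i] == '\0')
                    return static_cast<std::size_t>(p + i - s);
            // false positive: keep scanning, like "jmp short main_loop"
        }
        p += 4;
    }
}
```

The byte-order difference between this and the listing does not affect correctness: a zero byte in memory is a zero byte of the loaded word either way, and only the position of the possible false positive changes.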

Jul 23 '05 #11
Peter Smithson wrote:
> The CRT source is available. I can't remember where you get it but it's
> on my PC and I put a breakpoint in strlen. Here's the source. It's in
> assembler and reads 4 bytes at a time (if it's word aligned) -
> CODESEG
>
> public strlen
>
> strlen proc
>
> .FPO ( 0, 1, 0, 0, 0, 0 )
>
> string equ [esp + 4]
>
> mov ecx,string ; ecx -> string
> test ecx,3 ; test if string is aligned on


This doesn't work on my embedded ARM7TDMI platform.
Care to post a solution in:
ARM Assembly
Z80 Assembly
8051 Assembly
Sparc Assembly
... etc.

In other words, assembly is off-topic in this newsgroup.
However, the algorithm behind the assembly, when coded in C++,
is on topic. One can often coax the compiler into generating
the desired code by writing simple C++ that imitates the
assembly language statements.
--
Thomas Matthews

Jul 23 '05 #12

Thomas Matthews wrote:
> In otherwords, assembly is off-topic in this newsgroup.
> However, the algorithm for assembly, when coded in C++
> is on topic. One can have good luck at forcing the
> compiler to generate desired code by writing simple
> C++ that imitates the assembly language statements.


OK, I won't do it again.

In my defence, the poster is asking how he can write code as fast as the
CRT code that comes with the compiler he's using (I know, OT). By
looking at the assembly you can see how to improve the C++ algorithm.
Would it have been on topic if I'd said "I've looked at the assembly for
the routines you've been asking about .." and then described the
algorithm without using C++ or assembler, just words?

This reminds me of how in the UK the actual voices of IRA leaders were
not allowed on TV in case they corrupted us. So we'd see news clips
of them with dubbed-over voices!

Jul 23 '05 #13
