
strtok behavior with multiple consecutive delimiters

Hello, and good whatever time of day it is where you are.
Can somebody please tell me what the standard behavior of strtok should be
when it encounters two or more consecutive delimiters, as in the following
(checks omitted):

char tst[] = "this\nis\n\nan\nempty\n\n\nline";
                      ^^^^         ^^^^^^
char *tok = strtok(tst, "\n");
tok = strtok(NULL, "\n");
and so on.

Will the groups of '\n' marked above be consumed one at a time, or will
each whole group be consumed together?

Thank you very much
May 6 '06
an**@servocomm.freeserve.co.uk wrote:
CBFalconer wrote:

The OP can simply use the following replacement function, which
does not have those objectionable features. The testing code is
longer than the function.

OTOH, by using C++, life becomes more productive, less error-prone,
less complicated and more elegant:

#include <sstream>
#include <string>
#include <vector>
#include <iostream>

int main()
{

char tst[] = "this\nis\n\nan\nempty\n\n\nline";

std::stringstream s;
s << tst;

std::vector<std::string> tokens;
while (! s.eof() ){
std::string str;
getline(s,str,'\n');
tokens.push_back(str);
}

for (std::vector<std::string>::const_iterator iter
= tokens.begin();
iter !=tokens.end();
++iter){
std::cout << "token: \""<< *iter <<"\"\n";
}

}

regards
Andy Little


I compiled your program in C++ using the VS 2005 compiler. The
executable size of that stuff was 180 224 bytes.

Then I compiled Chuck's version using his strtok function using the
lcc-win32 compiler (a C compiler, not a C++ one). The size was 14 645 bytes.

Then I eliminated the output from both programs, compiled them without
any optimizations, and inserted a loop that runs 1 million times.

C++ took 1.234 seconds
C took 0.375 seconds

Then I compiled both programs using VS 2005 (64 bits) with full
optimization:

C++ took 0.234 seconds
C took 0.156 seconds

I do not say that these measurements are important for everybody. But
maybe they are important for *some* people.

jacob
May 6 '06 #11
jacob navia wrote:
I compiled your program in C++ using the VS 2005 compiler. The
executable size of that stuff was 180 224 bytes.
It comes out at around 112 K for me. What were your command line
options?
Then I compiled Chuck's version using his strtok function using the
lcc-win32 compiler (a C compiler, not a C++ one). The size was 14 645 bytes.

Then I eliminated the output from both programs, compiled them without
any optimizations, and inserted a loop that runs 1 million times.

C++ took 1.234 seconds
C took 0.375 seconds

Then I compiled both programs using VS 2005 (64 bits) with full
optimization:

C++ took 0.234 seconds
C took 0.156 seconds
(It would be nice to see the full source code that you were testing,
FWIW.) The C++ version did rather better than I would have expected;
good optimiser! ...;-)
I do not say that these measurements are important for everybody. But
maybe they are important for *some* people.


Sure, C++ will handle the C-style code as well if necessary, but the
amount of time you need to spend writing, testing and debugging is a
major factor for some people too.

And of course... in what real situation are you going to spend a long
time tokenising string literals?

regards
Andy Little

May 6 '06 #12
Command lines for the non-optimized version:
cl /EHsc toksplit.cpp
lc toksplit.c

Command lines for the optimized version:
cl /Ox /EHsc toksplit.cpp
cl /Ox toksplit.c
Here is the code:
--------------------------------------------------toksplit.h
#ifndef H_toksplit_h
# define H_toksplit_h

# ifdef __cplusplus
extern "C" {
# endif

#include <stddef.h>

/* copy over the next token from an input string, after
skipping leading blanks (or other whitespace?). The
token is terminated by the first appearance of tokchar,
or by the end of the source string.

The caller must supply sufficient space in token to
receive any token; otherwise tokens will be truncated.

Returns: a pointer past the terminating tokchar.

This will happily return an infinity of empty tokens if
called with src pointing to the end of a string. Tokens
will never include a copy of tokchar.

released to Public Domain, by C.B. Falconer.
Published 2006-02-20. Attribution appreciated.
*/

const char *toksplit(const char *src, /* Source of tokens */
char tokchar, /* token delimiting char */
char *token, /* receiver of parsed token */
size_t lgh); /* length token can receive */
/* not including final '\0' */

# ifdef __cplusplus
}
# endif
#endif
--------------------------------------------end of toksplit.h
Now toksplit.c
/* ------- file toksplit.c ----------*/
#include "toksplit.h"

/* copy over the next token from an input string, after
skipping leading blanks (or other whitespace?). The
token is terminated by the first appearance of tokchar,
or by the end of the source string.

The caller must supply sufficient space in token to
receive any token; otherwise tokens will be truncated.

Returns: a pointer past the terminating tokchar.

This will happily return an infinity of empty tokens if
called with src pointing to the end of a string. Tokens
will never include a copy of tokchar.

A better name would be "strtkn", except that is reserved
for the system namespace. Change to that at your risk.

released to Public Domain, by C.B. Falconer.
Published 2006-02-20. Attribution appreciated.
*/

const char *toksplit(const char *src, /* Source of tokens */
char tokchar, /* token delimiting char */
char *token, /* receiver of parsed token */
size_t lgh) /* length token can receive */
/* not including final '\0' */
{
if (src) {
while (' ' == *src) src++; /* skip leading blanks */

while (*src && (tokchar != *src)) {
if (lgh) {
*token++ = *src;
--lgh;
}
src++;
}
if (*src && (tokchar == *src)) src++;
}
*token = '\0';
return src;
} /* toksplit */

#include <stdio.h>

#define ABRsize 64 /* length of acceptable token abbreviations */

int main(void)
{
char teststring[] = "this\nis\n\nan\nempty\n\n\nline";

const char *t, *s = teststring;
int i;
char token[ABRsize + 1];
int count;

count=0;
do {
t = s; i = 0;
while (*t) {
t = toksplit(t, '\n', token, ABRsize);
//putchar(i + '1'); putchar(':');
//puts(token);
i++;
}
count++;
} while (count < 1000000);
return 0;
} /* main */

--------------------------------------------------------------toksplit.c

Now the C++ version:
--------------------------------------------------------------toksplit.cpp
#include <sstream>
#include <string>
#include <vector>
#include <iostream>

int main()
{

char tst[] = "this\nis\n\nan\nempty\n\n\nline";

std::stringstream s;
s << tst;

std::vector<std::string> tokens;
int count=0;
do {
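s << tst; // note: eofbit stays set after the first pass, so this insertion
          // then has no effect and the while loop below does not run again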
while (! s.eof() ){
std::string str;
getline(s,str,'\n');
tokens.push_back(str);
}

for (std::vector<std::string>::const_iterator iter
= tokens.begin();
iter !=tokens.end();
++iter){
//std::cout << "token: \""<< *iter <<"\"\n";
}
count++;
} while (count < 1000000);

}
--------------------------------------------------------------end of toksplit.cpp

May 6 '06 #13
On 2006-05-06, jacob navia <ja***@jacob.remcomp.fr> wrote:
[...]
I compiled your program in C++ using the VS 2005 compiler. The
executable size of that stuff was 180 224 bytes.

Then I compiled Chuck's version using his strtok function using the
lcc-win32 compiler (a C compiler, not a C++ one). The size was 14 645 bytes.

Then I eliminated the output from both programs, compiled them without
any optimizations, and inserted a loop that runs 1 million times.

C++ took 1.234 seconds
C took 0.375 seconds

Then I compiled both programs using VS 2005 (64 bits) with full
optimization:

C++ took 0.234 seconds
C took 0.156 seconds

I do not say that these measurements are important for everybody. But
maybe they are important for *some* people.


Interesting. Can you do a timing of VS 2005 with full optimizations on
the C version? I think this would complete the picture.
May 6 '06 #14
jacob navia wrote:

Now the C++ version:
--------------------------------------------------------------toksplit.cpp
#include <sstream>
#include <string>
#include <vector>
#include <iostream>

int main()
{

char tst[] = "this\nis\n\nan\nempty\n\n\nline";

std::stringstream s;
s << tst;

std::vector<std::string> tokens;
int count=0;
do {
s << tst;
while (! s.eof() ){
std::string str;
getline(s,str,'\n');
tokens.push_back(str);
}

for (std::vector<std::string>::const_iterator iter
= tokens.begin();
iter !=tokens.end();
++iter){
//std::cout << "token: \""<< *iter <<"\"\n";
}
count++;
} while (count < 1000000);

}
--------------------------------------------------------------end of toksplit.cpp

If you are going to eliminate output for comparison, you should comment
out the entire last for loop as the C version outputs inline.

Also, to make things more equal, remove the vector, as this is only used
to store tokens for output.

--
Ian Collins.
May 6 '06 #15
In comp.lang.c an**@servocomm.freeserve.co.uk wrote:
OTOH, by using C++, life becomes more productive, less error-prone,
less complicated and more elegant:
Not always... (digression warning)
std::cout << "token: \""<< *iter <<"\"\n";


IMHO this is harder for the programmer to read than

printf( "token: \"%s\"\n", str );

To a certain extent this is a question of religion, but the difference
between the prevailing styles becomes more pronounced with heavily
formatted output:

printf( "%6s %2.2f %-18s:%u\n", val1, val2, val3, val4 );

Accomplishing the same thing with std::cout would be messy.

--
Christopher Benson-Manica | I *should* know what I'm talking about - if I
ataru(at)cyberspace.org | don't, I need to know. Flames welcome.
May 6 '06 #16
In comp.lang.c Ben Pfaff <bl*@cs.stanford.edu> wrote:
* It can only be used once at a time. If a sequence of
strtok() calls is ongoing and another one is started,
the state of the first one is lost.


<ot>For OP, if this is a problem for you, strtok_r() may be
available, depending on your system and portability constraints.</ot>

For all the pitfalls of strtok(), it is still possible to use it
correctly for fun and profit, a point which I think has not been
emphasized in this thread. It may well be the appropriate function
for the OP, but of course, given his question, it might also be
unusable.

--
Christopher Benson-Manica | I *should* know what I'm talking about - if I
ataru(at)cyberspace.org | don't, I need to know. Flames welcome.
May 6 '06 #17
Geometer wrote:
I did :). I just wanted to know whether this is the behavior required by
the standard and whether there is a difference between C and C++.


Can we add to the FAQ "Please don't ask about strtok(), because everyone
here is ready to complain about it in endless ways"?

--
Phlip
http://c2.com/cgi/wiki?ZeekLand <-- NOT a blog!!!
May 7 '06 #18

jacob navia wrote:
Command line for non optimized version:
cl /EHsc toksplit.cpp
(Assuming this is my original as above)
Using these switches, it comes out at 120 KB on my system.
lc toksplit.c

Command line for the optimized version:
cl /Ox /EHsc toksplit.cpp


Using these switches, it comes out at 112 KB on my system.

regards
Andy Little

May 7 '06 #19
Ian Collins wrote:
If you are going to eliminate output for comparison, you should comment
out the entire last for loop as the C version outputs inline.

Also, to make things more equal, remove the vector, as this is only used
to store tokens for output.


FWIW, below is my version of the comparison. Moving the construction of
the stringstream into the loop really kills the performance of the
stringstream version. However, this is IMO a more realistic *simple*
usage. I also modified the other code into C++ style, but that's by the
way. With this approach the C code is an order of magnitude faster (I
had to decrease the number of loops to avoid waiting on the
stringstream code), but it's not really a fair comparison. The killer of
the C version for me is that you can't have arbitrary-length tokens: you
are limited to whatever the value of ABRsize is. If the C coders want
to write a version that can handle arbitrary-length C-style strings,
then it would be a fairer comparison IMO (though my previous comments
re ease of coding, testing etc. remain). BTW, I used boost::timer for
timing. If you haven't got the Boost distribution, you will just have to
modify those parts. I'm too lazy to do that...

regards
Andy Little

#include <cstddef>
#include <sstream>
#include <string>
#include <vector>
#include <iostream>
#include <boost/timer.hpp>

int const ABRsize = 64;
int const NLOOPS = 100000;

const char *
toksplit(
const char *src,
char tokchar,
char *token,
size_t lgh
);

int main()
{
char tst[] = "this\nis\n\nan\nempty\n\n\nline";

std::cout << "Timing stringstream version: ";
boost::timer t0;
for( int count = 0; count < NLOOPS; ++count) {
std::stringstream ss;
ss << tst;
while (! ss.eof() ){
std::string str;
getline(ss,str, '\n');
}
}
std::cout << t0.elapsed() << "s\n";
std::cout << "Timing toksplit version: ";
boost::timer t1;
for( int count =0;count < NLOOPS;++count) {
char token[ABRsize + 1];
const char *t = tst;
while (*t) {
t = toksplit(t, '\n', token, ABRsize);
}
}
std::cout << t1.elapsed() << "s\n";

}

const char *toksplit(
const char *src, /* Source of tokens */
char tokchar, /* token delimiting char */
char *token, /* receiver of parsed token */
size_t lgh) /* length token can receive */
/* not including final '\0' */
{
if (src) {
while (' ' == *src) src++; /* skip leading blanks */
while (*src && (tokchar != *src)) {
if (lgh) {
*token++ = *src;
--lgh;
}
src++;
}
if (*src && (tokchar == *src)) src++;
}
*token = '\0';
return src;
} /* toksplit */

May 7 '06 #20
