Does strtok require a non-null token?

I'm using strtok to break apart a colon-delimited string. It basically
works, but it looks like strtok skips over empty sections. In other
words, if the string has 2 colons in a row, it doesn't treat that as a
null token, it just treats the 2 colons as a single delimiter.

Is that the intended behavior?

Oct 12 '06
On 12 Oct 2006 16:47:31 -0700, "William Hughes"
<wp*******@hotmail.com> wrote:
>Well, never is probably too strong. However, strtok() is dominated by
>a good general purpose parsing method. Since you need a good
>general purpose parsing method, why not use that instead of
>strtok()?

Where one is needed, I do, and am obligated to supply it with the rest
of the code. Anyone maintaining that code is then obligated to read
and understand it.

Where it's not needed, strtok is already there, and the maintainer
already knows what it does.

It's the same reason I don't supply my own version of other parts of
the standard library.

--
Al Balmer
Sun City, AZ
Oct 13 '06 #11
Ben Pfaff wrote:
>ry********@gmail.com writes:
>
>>I'm using strtok to break apart a colon-delimited string. It
>>basically works, but it looks like strtok skips over empty
>>sections. In other words, if the string has 2 colons in a row,
>>it doesn't treat that as a null token, it just treats the 2
>>colons as a single delimiter.
>
>strtok() has at least these problems:
>
>  * It merges adjacent delimiters. If you use a comma as your
>    delimiter, then "a,,b,c" will be divided into three tokens,
>    not four. This is often the wrong thing to do. In fact, it
>    is only the right thing to do, in my experience, when the
>    delimiter set contains white space (for dividing a string
>    into "words") or it is known in advance that there will be
>    no adjacent delimiters.
>
>  * The identity of the delimiter is lost, because it is
>    changed to a null terminator.
>
>  * It modifies the string that it tokenizes. This is bad
>    because it forces you to make a copy of the string if
>    you want to use it later. It also means that you can't
>    tokenize a string literal with it; this is not
>    necessarily something you'd want to do all the time but
>    it is surprising.
>
>  * It can only be used once at a time. If a sequence of
>    strtok() calls is ongoing and another one is started,
>    the state of the first one is lost. This isn't a
>    problem for small programs but it is easy to lose track
>    of such things in hierarchies of nested functions in
>    large programs. In other words, strtok() breaks
>    encapsulation.
Whence sprang toksplit, which returns a pointer to the src string
just past the delimiting char, except at end of string. The only
possible nuisance IMO is that it handles only one possible token
delimiter char (apart from '\0').

const char *toksplit(const char *src, /* Source of tokens */
                     char tokchar,    /* token delimiting char */
                     char *token,     /* receiver of parsed token */
                     size_t lgh)      /* length token can receive */
                                      /* not including final '\0' */

--
Some informative links:
<news:news.announce.newusers>
<http://www.geocities.com/nnqweb/>
<http://www.catb.org/~esr/faqs/smart-questions.html>
<http://www.caliburn.nl/topposting.html>
<http://www.netmeister.org/news/learn2quote.html>
<http://cfaj.freeshell.org/google/>
Oct 13 '06 #12

Al Balmer wrote:
>On 12 Oct 2006 16:47:31 -0700, "William Hughes"
><wp*******@hotmail.com> wrote:
>>Well, never is probably too strong. However, strtok() is dominated by
>>a good general purpose parsing method. Since you need a good
>>general purpose parsing method, why not use that instead of
>>strtok()?
>
>Where one is needed, I do, and am obligated to supply it with the rest
>of the code. Anyone maintaining that code is then obligated to read
>and understand it.
>
>Where it's not needed, strtok is already there, and the maintainer
>already knows what it does.
>
>It's the same reason I don't supply my own version of other parts of
>the standard library.

Ok. I can see why, if you expect the code to be maintained by
others (a common setup), you would want to use standard functions.
And, usually, a bad standard is better than no standard. But there
are limits!

In any case, I don't think your average maintenance drone would
know how strtok() works, or that said drone would be better
at reading the documentation for strtok() than any documentation
you supply with a good general purpose routine.

One reason for my opinion is that personally I don't find
strtok() very useful (indeed, outside of a couple of exercises that
mandated its use, I don't think I have ever used it). This is in
part because, if possible, I don't use C for string manipulation.
But even when I do use C I don't use strtok(). Clearly, my
situation may not be the most usual one (or even common).
- William Hughes

Oct 13 '06 #13
William Hughes wrote:
>Default User wrote:
>>William Hughes wrote:
>>>ry********@gmail.com wrote:
>>>>I'm using strtok to break apart a colon-delimited string. It
>>>>basically works, but it looks like strtok skips over empty
>>>>sections. In other words, if the string has 2 colons in a row, it
>>>>doesn't treat that as a null token, it just treats the 2 colons as
>>>>a single delimiter.
>>>>
>>>>Is that the intended behavior?
>>>Yes. Just one more reason to avoid strtok().
>>Unless that's the behavior you want. Example, breaking lines into words
>>with white space. You don't want a bunch of "null" words.
>The point is not that the function's behaviour is not sometimes
>what you want. The point is
>
>-the default behaviour is surprising

Perhaps. About the only thing surprising to me was that the argument
you pass it is affected.

>-the default behaviour is not even
>usually what you want

So far I've only ever needed the default behaviour with respect to
collapsing adjacent tokens. In fact, I *expected* this! That is, for
the majority of the reasons I need to tokenize a string, this default
behaviour is exactly what I want.

>-the default behaviour throws information away

Not sure what you mean here, but I assume you are referring to how it
munges its argument. I guess I just never care about this because we
always store strings in a struct that is passed around, or make copies
of things we tokenized and care about.

>-if you don't like the default behaviour, see
>figure 1.

I assume figure 1 is a picture of your own implementation that has
non-default requirements :)

>Personally I'm with the Linux man pages on this one. Under Bugs
>is the advice "Never use this function".

Well, I'll ignore this advice. For the trivial case of needing
to tokenize a string to store in my own array of buffers, it works just fine.

For those requirements that strtok() does not fit we have our own
internal tokenizing routines. If all I need is to parse out (say) a
bunch of email addresses passed as a list and store them in a char**
[which was the last time I used strtok()] then it fits perfectly. In
this case I don't even care if the calling code screwed up the list. I
either get one or more valid strings or I don't. I return success or
failure and let them howl!

Of course, if I'd been bitten by the function in the past, I'd be
arguing differently.

Many of the str_ routines in the Standard have some legacy use that
explains design decisions [e.g., strncpy() and database column width].
I wonder if strtok() also has history that explains why the defaults
cause so much consternation?
Oct 13 '06 #14

Clever Monkey wrote:
>William Hughes wrote:
>>Default User wrote:
>>>William Hughes wrote:
>>>>ry********@gmail.com wrote:
>>>>>I'm using strtok to break apart a colon-delimited string. It
>>>>>basically works, but it looks like strtok skips over empty
>>>>>sections. In other words, if the string has 2 colons in a row, it
>>>>>doesn't treat that as a null token, it just treats the 2 colons as
>>>>>a single delimiter.
>>>>>
>>>>>Is that the intended behavior?
>>>>Yes. Just one more reason to avoid strtok().
>>>Unless that's the behavior you want. Example, breaking lines into words
>>>with white space. You don't want a bunch of "null" words.
>>The point is not that the function's behaviour is not sometimes
>>what you want. The point is
>>
>>-the default behaviour is surprising
>Perhaps. About the only thing surprising to me was that the argument
>you pass it is affected.
>>-the default behaviour is not even
>>usually what you want
>So far I've only ever needed the default behaviour with respect to
>collapsing adjacent tokens. In fact, I *expected* this! That is, for
>the majority of the reasons I need to tokenize a string, this default
>behaviour is exactly what I want.
>>-the default behaviour throws information away
>Not sure what you mean here, but I assume you are referring to how it
>munges its argument.

No, it also throws away the number [and identity] of the tokens.

>I guess I just never care about this because we
>always store strings in a struct that is passed around, or make copies
>of things we tokenized and care about.
>>-if you don't like the default behaviour, see
>>figure 1.
>I assume figure 1 is a picture of your own implementation that has
>non-default requirements :)

Nope. See the jargon file.

>>Personally I'm with the Linux man pages on this one. Under Bugs
>>is the advice "Never use this function".
>Well, I'll ignore this advice.

Chacun à son goût.

>For the trivial case of needing
>to tokenize a string to store in my own array of buffers, it works just fine.
>
>For those requirements that strtok() does not fit we have our own
>internal tokenizing routines.

And your reason for not using them in preference to strtok()?

>If all I need is to parse out (say) a
>bunch of email addresses passed as a list and store them in a char**
>[which was the last time I used strtok()] then it fits perfectly. In
>this case I don't even care if the calling code screwed up the list. I
>either get one or more valid strings or I don't. I return success or
>failure and let them howl!
>
>Of course, if I'd been bitten by the function in the past, I'd be
>arguing differently.
>
>Many of the str_ routines in the Standard have some legacy use that
>explains design decisions [e.g., strncpy() and database column width].
>I wonder if strtok() also has history that explains why the defaults
>cause so much consternation?

I am sure that the defaults were chosen for what was at the
time a good reason (maybe because
the immediate need was removing whitespace). The fact remains
they are not a good choice for a general purpose routine
(and the fact that they are "mandatory defaults" makes things
even worse).

- William Hughes

Oct 13 '06 #15
William Hughes wrote:
>Clever Monkey wrote:
>>William Hughes wrote:
>>>Default User wrote:
>>>>William Hughes wrote:
>>>>>ry********@gmail.com wrote:
>>>>>>I'm using strtok to break apart a colon-delimited string. It
>>>>>>basically works, but it looks like strtok skips over empty
>>>>>>sections. In other words, if the string has 2 colons in a row, it
>>>>>>doesn't treat that as a null token, it just treats the 2 colons as
>>>>>>a single delimiter.
>>>>>>
>>>>>>Is that the intended behavior?
>>>>>Yes. Just one more reason to avoid strtok().
>>>>Unless that's the behavior you want. Example, breaking lines into words
>>>>with white space. You don't want a bunch of "null" words.
[...]
>>>-the default behaviour throws information away
>>Not sure what you mean here, but I assume you are referring to how it
>>munges its argument.
>
>No, it also throws away the number [and identity] of the tokens.

Ah. I always just keep track of them myself, usually as an index into
the array of strings I'm building. I think every book on the standard
library has similar example code.

>>For the trivial case of needing
>>to tokenize a string to store in my own array of buffers, it works just fine.
>>
>>For those requirements that strtok() does not fit we have our own
>>internal tokenizing routines.
>
>And your reason for not using them in preference to strtok()?

A few reasons come to mind. It might be too heavy-weight for my
purpose, or too specific for the simplest case of "get 0 or more things
from this delimited string", which strtok() fits perfectly. I have an
almost clichéd chunk of code that I use to walk the
string, get the pieces and exit with a count.

That is to say, we've never found the need for a better_strtok(), as the
standard implementation satisfies all the necessary requirements.

At this time it has not been obvious that we need to factor this out to
a general-purpose string tokenization routine. Add to that that the
code I maintain is well-established, and I can't simply refactor for the
purpose of refactoring. Adding this much risk to a stable codebase this
late in the day is actually worse than living with standard functions
with warts.

I actually just counted the number of times we invoke strtok() in a
major part of our product, and I found 6 discrete instances. Some of
that is dead code that has been deprecated. Two of them are places I've
added new functionality.

We _could_ have factored that out to our own function, but, quite
frankly, we never saw the point (except maybe to replace 3-5 lines of
cliche code with a single function call [which is nothing to sneeze at],
but this is usually the last thing to drive maintenance in my experience).

Anyway, I understand why strtok() is not recommended. But I also think
that once you understand the limitations and caveats that go along with
it, there is no reason not to use it for those cases where it is a good fit.

I actually look forward to the time here I'm bitten by strtok(). It
seems like the main sin it commits is being useful to some and
completely useless for others.
Oct 13 '06 #16
On 12 Oct 2006 20:42:00 -0700, "William Hughes"
<wp*******@hotmail.com> wrote:
>In any case, I don't think your average maintenance drone would
>know how strtok() works,

Huh! I resent that ;-)

This maintenance drone has known about strtok for many years, as well
as all the other functions in the standard library. "Maintenance
drones" not only need to be capable of writing good, solid,
maintainable code, but have the added burden of needing to figure out
what some cowboy coder really meant to do.

--
Al Balmer
Sun City, AZ
Oct 16 '06 #17
Al Balmer <al******@att.net> writes:
>On 12 Oct 2006 20:42:00 -0700, "William Hughes"
><wp*******@hotmail.com> wrote:
>>In any case, I don't think your average maintenance drone would
>>know how strtok() works,
>
>Huh! I resent that ;-)
>
>This maintenance drone has known about strtok for many years, as well
>as all the other functions in the standard library.
[...]

Perhaps you're not average? 8-)}

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <*> <http://users.sdsc.edu/~kst>
We must do something. This is something. Therefore, we must do this.
Oct 16 '06 #18

Al Balmer wrote:
>On 12 Oct 2006 20:42:00 -0700, "William Hughes"
><wp*******@hotmail.com> wrote:
>>In any case, I don't think your average maintenance drone would
>>know how strtok() works,
>
>Huh! I resent that ;-)

My apologies for an unintended insult.

>This maintenance drone has known about strtok for many years, as well
>as all the other functions in the standard library. "Maintenance
>drones" not only need to be capable of writing good, solid,
>maintainable code, but have the added burden of needing to figure out
>what some cowboy coder really meant to do.

I agree. Maintenance is difficult work and the best programmers
should be placed on maintenance, not new development.
In my experience, however, this is not the case. The term
"maintenance drone" is all too often appropriate.

-William Hughes

Oct 16 '06 #19
On Mon, 16 Oct 2006 20:38:37 GMT, Keith Thompson <ks***@mib.org>
wrote:
>Al Balmer <al******@att.net> writes:
>>On 12 Oct 2006 20:42:00 -0700, "William Hughes"
>><wp*******@hotmail.com> wrote:
>>>In any case, I don't think your average maintenance drone would
>>>know how strtok() works,
>>
>>Huh! I resent that ;-)
>>
>>This maintenance drone has known about strtok for many years, as well
>>as all the other functions in the standard library.
>[...]
>
>Perhaps you're not average? 8-)}

Well, I'd like to think that ;-)

Truthfully, though I've worked for large companies, and done much
maintenance, I've never worked with many other maintenance
programmers, so I'm not at all qualified to know what the average is.
Indeed, I've seen many developers of new programs that were not
qualified. It might be more appropriate just to remove the word
"maintenance" from William's claim.

--
Al Balmer
Sun City, AZ
Oct 17 '06 #20
