473,785 Members | 2,235 Online
Bytes | Software Development & Data Engineering Community
+ Post

Home Posts Topics Members FAQ

looking for implementation of strtok

can anyone point me out to its implementation in C before I roll my
own.

thx
Jun 27 '08 #1
11 17178
magicman wrote:
can anyone point me out to its implementation in C before I roll my
own.

thx
1ste Google result:
http://www.openbsd.org/cgi-bin/cvswe...-cvsweb-markup

- Jensen
Jun 27 '08 #2
On Apr 19, 5:27 pm, Jensen Somers <jensen.som...@ gmail.comwrote:
magicman wrote:
can anyone point me out to its implementation in C before I roll my
own.
thx

1ste Google result:http://www.openbsd.org/cgi-bin/cvswe...ng/strtok.c?re...

- Jensen
what does s[-1] = 0; in the implementation mean?
Jun 27 '08 #3
magicman wrote:
>
can anyone point me out to its implementation in C before I roll
my own.
Why? strtok() is part of the standard C library, and should always
be available.

7.21.5.8 The strtok function

Synopsis
[#1]
#include <string.h>
char *strtok(char * restrict s1,
const char * restrict s2);

--
[mail]: Chuck F (cbfalconer at maineline dot net)
[page]: <http://cbfalconer.home .att.net>
Try the download section.
** Posted from http://www.teranews.com **
Jun 27 '08 #4
On Apr 20, 2:17 am, CBFalconer <cbfalco...@yah oo.comwrote:
magicman wrote:
can anyone point me out to its implementation in C before I roll
my own.
char *strtok(char * restrict s1, const char * restrict s2) {

static char *p;
char *v;

if(s1 != NULL) p = s1;
while(strchr(s2 , *p)) p++;
v = p;
for(; *p != 0; p++)
if(strchr(s2, *p) { *p = 0; for(i++; strchr(s2, *p); *p++ = 0);
return v; }
return NULL;
}
>
Why? strtok() is part of the standard C library, and should always
be available.
Most likely to learn.
Jun 27 '08 #5
On Sat, 19 Apr 2008 14:25:43 -0700 (PDT), magicman
<ir*********@gm ail.comwrote:
>can anyone point me out to its implementation in C before I roll my
own.
The function does not have a single implementation. Each run-time
library has its own implementation of the function. Some libraries
include the source for the functions. Google can probably point you
to some publicly available ones. Whether any particular
implementation has any relationship to your system is something you
will have to determine.
Remove del for email
Jun 27 '08 #6
CBFalconer wrote:
magicman wrote:
can anyone point me out to its implementation in C before I roll
my own.

Why? strtok() is part of the standard C library, and should always
be available.
It needn't be available on a freestanding implementation.
7.21.5.8 The strtok function

Synopsis
[#1]
#include <string.h>
char *strtok(char * restrict s1,
const char * restrict s2);
4p6 "...A conforming freestanding implementation shall accept
any strictly conforming program that does not use complex types
and in which the use of the features specified in the library clause
(clause 7) is confined to the contents of the standard headers
<float.h>, <iso646.h>, <limits.h>, <stdarg.h>, <stdbool.h>,
<stddef.h>, and <stdint.h>."

Note the absence of <string.h>

--
Peter
Jun 27 '08 #7
>magicman wrote:
>>can anyone point me out to its implementation in C before I roll my
own.
[As others noted elsethread, there is not really an "its
implementation" , which implies exactly one single implementation.]
>On Apr 19, 5:27 pm, Jensen Somers <jensen.som...@ gmail.comwrote:
>1ste Google
result:http://www.openbsd.org/cgi-bin/cvswe...ng/strtok.c?re...
In article <ce************ *************** *******@24g2000 hsh.googlegroup s.com>
magicman <ir*********@gm ail.comwrote:
>what does s[-1] = 0; in the implementation mean?
The version above (with the truncated URL) is a rehash of the
implementation I wrote in 1988. Mine did not have a strtok_r()
(which is not even in C99, much less C89), and in any case it
would be better to put strtok_r() in a separate file.

The C99 "restrict" qualifiers are missing, and should be added
if you have a C99-based system (though most people do not).

These days it probably would be better, in some sense, to go ahead
and call strspn() and strcspn(), rather than writing them in-line
as I did. (When I wrote this, subroutine calls were relatively
expensive, and compilers never did any inline expansion, so we had
to do it by hand wherever it was profitable.)

The assignment:

s[-1] = 0;

writes a '\0' at s[-1], of course. If you mean "what does s[-1]
even *mean*", it means: "add -1 to the pointer in s, and then
follow the resulting pointer". Adding an integer to a pointer
moves "forward" by that number of objects, so adding -1 moves
"forward" negative-one "char"s, or in other words, moves backward
by one "char". Earlier we had:

c = *s++;

so s now points one "char" beyond the location that holds the
character stored in "c". Pictorially, suppose we call strtok()
with:

char buf[] = ",hello,/,world";
char *result;

result = strtok(buf, ",/");

When we enter strtok(), we have "s" pointing to &buf[0], which
contains a ',' character. The static variable "last" is initially
NULL, or is left over from a previous call to strtok(), but either
way we are not going to use it, and are soon going to overwrite
it.

The strtok() call then calls strtok_r(), passing s (&buf[0]), delim
(pointing to the ',' in the string ",/"), and &last. The strtok_r()
function begins with (I will update this code for C99 while writing
this; note this changes the name of the third argument to "lastp",
as I would have called the variable if I had written this):

char *strtok_r(char *restrict s, const char *restrict delim, char **lastp) {
const char *restrict spanp;
char c, sc;
char *restrict tok;

if (s == NULL && (s = *lastp) == NULL)
return NULL;

Since s is not NULL, we do not even look at *lastp, much less
return NULL. (However, if we call strtok() again later, passing
NULL for s, we *will* look at *lastp, i.e., strtok()'s "last",
at that point. This is how we will save a "sideways" result
for later strtok() calls. This is not a *good* way to do it;
see the BSD strsep() function for a better method.)

Next, we enter the ugly loop that simply implements strspn():

cont:
c = *s++;
for (spanp = delim; (sc = *spanp++) != '\0';) {
if (c == sc)
goto cont;
}

In this case, "delim" points to ",/" and "c" initially picks up
the ',' from &buf[0], leaving s pointing to &buf[1]. The inner
loop sets spanp to delim, then loops while seting sc (the "span
character") from *spanp++ until sc is '\0'. Because the
loop increments spanp each time, it will look at all the non-'\0'
characters in delim, i.e., ',' and '/' respectively, until
something happens. In this case, the "something" happens
right away: c==',' and sc==',', so c==sc, so we execute the
"goto cont" line and thus start the outer loop over again.

This time, c gets the 'h' from buf[1], and s winds up pointing
to the 'e' in buf[2]. Since c=='h', and the inner loop looks
for ',' and '/', we get all the way through the inner loop
without doing a "goto cont". This leaves c=='h' and exits the
outer loop.

(The outer loop logically should use something other than a
"goto", but more realistically, we could just replace the whole
thing with:

s += strspn(s, delim);
c = *s++;

The strspn() function counts the number of characters that are
in the "span set", up to the first one that is not in it. That
is, since s points to ",hello,/,world" and the span-set is ",/",
strspn() would return 1, and s += 1 would then advance over the
initial comma. If s pointed to ",/,hello", strspn() would
return 3, and s += 3 would advance over all three initial
characters.)

In any case, now that we have skipped leading "non-token"
characters, we are ready to look for tokens:

if (c == '\0') {
*lastp = NULL;
return NULL;
}

If there are no more tokens, we set *lastp to NULL (this is, I
think, allowed but not required by the C standard) and return NULL.
However, c=='h', so we do not do this, but instead plow on to the
trickiest, densest part of the code:

tok = s - 1;
for (;;) {
c = *s++;
spanp = delim;
do {
if ((sc = *spanp++) == c) {
[to be shown and explained in a moment]
}
} while (sc != '\0');
}
/* NOTREACHED */

Remember that s points one character *past* the beginning of
the token -- in this case, to the 'e' in "hello" -- because we
did "c = *s++". So we set tok to s-1, so that tok points to
the 'h' in "hello". We then enter the outer loop, which -- as
the comment notes -- really just implements strcspn().

For each trip through the outer loop, we look at the next character
of the proposed token, via "c = *s++" as before. In this case,
it first sets c to the 'e' in buf[2], leaving s pointing to buf[3]
(the first 'l').

The inner loop then runs through each character in "delim", putting
it in sc each time, checking to see whether c==sc. Since the two
delimiter characters are ',' and '/', and 'e' is neither a comma nor
a slash, this will not find them. But here is one of the tricky
bits: since "delim" is a C string, it ends with a '\0' character.
I want *all* the loops to stop if c=='\0'. Rather than putting
in a special test for c=='\0', I simply allow the inner loop to
test c against the '\0' that ends the delimiter string.

In other words, since delim points to {',', '/', '\0'}, I can let
sc take on all *three* of these "char"s, and compare c==sc each
time. I stop the inner loop only *after* both c!=sc *and* sc=='\0'.
So this allows me to test c=='\0' without writing a separate "if"
statement.

In any case, none of 'e', 'l', 'l', and 'o' are token-delimiters,
so eventually "c = *s++" sets c to ',', leaving s pointing to the
'/' in ",hello,/,world". This time the inner loop *does* find
a token-delimiter, and we get to the code inside the "if":

if (c == '\0')
s = NULL;
else
s[-1] = '\0';
*lastp = s;
return tok;

Here, we check to see if c=='\0' first. A '\0' ends the string --
meaning s points one past the end -- and the end of the string is
necessarily the end of the token. But it also means there cannot
possibly be any additional tokens. So we want to sent *lastp to
NULL, so that a future call to strtok() will return NULL. Setting
s to NULL and *lastp to s will accomplish that. (In this case,
since c==sc, we will also have sc=='\0'. That is, we found both
the end of the delimiter string *and* the end of the token. This
is OK; we do not care which "token-ending character" we found.)

In our case, though, c!='\0', because c==','. So we set s[-1] to
'\0'. Here s points to the '/' in buf[7], so this replaces the
',' in buf[6] with '\0'. (Remember, buf is initially set to
",hello,/,world", so buf[0] is ',', buf[1] is 'h', buf[2] is 'e',
buf[3] is 'l', buf[4] is 'l', buf[5] is 'o', buf[6] is ',', buf[7]
is '/', buf[8] is ',', buf[9] is 'w', and so on.) We leave s==&buf[7],
and set *lastp = s -- so now the variable "last" in strtok()
points to the '/' in buf[7].

Finally, we return "tok", which is &buf[1]. The contents of buf[]
have been modified slightly, and the static variable "last" in
strtok() points just past the first token, so that a later call to
strtok() can find more tokens.

Again, the two nasty loops can be replaced by strspn() -- which
will "span" (or skip over) initial delimiters -- and strcspn(),
which will span the "c"omplemen t of the delimiter-set, i.e., skip
over initial "non-delimiters". Since everything that is not
a delimiter is a token character, this will give us just what
we want -- and we can rewrite the entire thing as:

char *strtok(char *restrict s, const char *restrict delim) {
static char *last;
char *restrict tok;

/* pick up from previous stopping point, if told to do so */
if (s == NULL && (s = last) == NULL)
return NULL;

/* skip initial delimiters to find start of token */
tok = s + strspn(s, delim);

/* skip over non-delimiters to find end of token */
s = tok + strcspn(tok, delim);

/*
* Save state for next call, and return token. If there
* is no token, both *tok and *s will be '\0'. If there
* is one token that is ended with '\0', *s will be '\0'.
* (If there are trailing delimiters, we will skip them
* later.)
*/
last = *s == '\0' ? NULL : s + 1;
return *tok == '\0' ? NULL : tok;
}

This code is now short enough that we can just use a separate copy
for strtok_r(), if we like; or we can use the BSD-style "strtok is
a wrapper around strtok_r", which just needs the obvious changes
with "lastp".

It is also short enough to demonstrate that strtok() is not a
particularly useful function: you can call strspn() and strcspn()
yourself, and avoid the problems that eventually crop up with the
saved state in the static variable called "last" here.
--
In-Real-Life: Chris Torek, Wind River Systems
Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W) +1 801 277 2603
email: gmail (figure it out) http://web.torek.net/torek/index.html
Jun 27 '08 #8
Peter Nilsson wrote:
CBFalconer wrote:
>magicman wrote:
>>can anyone point me out to its implementation in C before I roll
my own.

Why? strtok() is part of the standard C library, and should always
be available.

It needn't be available on a freestanding implementation.
.... snip ...

In which case the op would be well advised to look for or design
his own other routines. One available is tknsplit, as follows:

/* ------- file tknsplit.h ----------*/
#ifndef H_tknsplit_h
# define H_tknsplit_h

# ifdef __cplusplus
extern "C" {
# endif

#include <stddef.h>

/* copy over the next tkn from an input string, after
skipping leading blanks (or other whitespace?). The
tkn is terminated by the first appearance of tknchar,
or by the end of the source string.

The caller must supply sufficient space in tkn to
receive any tkn, Otherwise tkns will be truncated.

Returns: a pointer past the terminating tknchar.

This will happily return an infinity of empty tkns if
called with src pointing to the end of a string. Tokens
will never include a copy of tknchar.

released to Public Domain, by C.B. Falconer.
Published 2006-02-20. Attribution appreciated.
revised 2007-05-26 (name)
*/

const char *tknsplit(const char *src, /* Source of tkns */
char tknchar, /* tkn delimiting char */
char *tkn, /* receiver of parsed tkn */
size_t lgh); /* length tkn can receive */
/* not including final '\0' */

# ifdef __cplusplus
}
# endif
#endif
/* ------- end file tknsplit.h ----------*/

/* ------- file tknsplit.c ----------*/
#include "tknsplit.h "

/* copy over the next tkn from an input string, after
skipping leading blanks (or other whitespace?). The
tkn is terminated by the first appearance of tknchar,
or by the end of the source string.

The caller must supply sufficient space in tkn to
receive any tkn, Otherwise tkns will be truncated.

Returns: a pointer past the terminating tknchar.

This will happily return an infinity of empty tkns if
called with src pointing to the end of a string. Tokens
will never include a copy of tknchar.

A better name would be "strtkn", except that is reserved
for the system namespace. Change to that at your risk.

released to Public Domain, by C.B. Falconer.
Published 2006-02-20. Attribution appreciated.
Revised 2006-06-13 2007-05-26 (name)
*/

const char *tknsplit(const char *src, /* Source of tkns */
char tknchar, /* tkn delimiting char */
char *tkn, /* receiver of parsed tkn */
size_t lgh) /* length tkn can receive */
/* not including final '\0' */
{
if (src) {
while (' ' == *src) src++;

while (*src && (tknchar != *src)) {
if (lgh) {
*tkn++ = *src;
--lgh;
}
src++;
}
if (*src && (tknchar == *src)) src++;
}
*tkn = '\0';
return src;
} /* tknsplit */

#ifdef TESTING
#include <stdio.h>

#define ABRsize 6 /* length of acceptable tkn abbreviations */

/* ---------------- */

static void showtkn(int i, char *tok)
{
putchar(i + '1'); putchar(':');
puts(tok);
} /* showtkn */

/* ---------------- */

int main(void)
{
char teststring[] = "This is a test, ,, abbrev, more";

const char *t, *s = teststring;
int i;
char tkn[ABRsize + 1];

puts(teststring );
t = s;
for (i = 0; i < 4; i++) {
t = tknsplit(t, ',', tkn, ABRsize);
showtkn(i, tkn);
}

puts("\nHow to detect 'no more tkns' while truncating");
t = s; i = 0;
while (*t) {
t = tknsplit(t, ',', tkn, 3);
showtkn(i, tkn);
i++;
}

puts("\nUsing blanks as tkn delimiters");
t = s; i = 0;
while (*t) {
t = tknsplit(t, ' ', tkn, ABRsize);
showtkn(i, tkn);
i++;
}
return 0;
} /* main */

#endif
/* ------- end file tknsplit.c ----------*/

--
[mail]: Chuck F (cbfalconer at maineline dot net)
[page]: <http://cbfalconer.home .att.net>
Try the download section.

** Posted from http://www.teranews.com **
Jun 27 '08 #9
In article <fu*********@ne ws3.newsguy.com I wrote, in part,
a "simplified implementation" of my old strtok, but in the
process put in a major bug:
char *strtok(char *restrict s, const char *restrict delim) {
static char *last;
char *restrict tok;

/* pick up from previous stopping point, if told to do so */
if (s == NULL && (s = last) == NULL)
return NULL;

/* skip initial delimiters to find start of token */
tok = s + strspn(s, delim);

/* skip over non-delimiters to find end of token */
s = tok + strcspn(tok, delim);

/*
* Save state for next call, and return token. If there
* is no token, both *tok and *s will be '\0'. If there
* is one token that is ended with '\0', *s will be '\0'.
* (If there are trailing delimiters, we will skip them
* later.)
*/
last = *s == '\0' ? NULL : s + 1;
return *tok == '\0' ? NULL : tok;
}
I forgot to '\0'-terminate the (nonempty case) token! The assignment
to "last" is too simple and should be expanded out to:

if (*s == '\0')
last = NULL;
else {
*s++ = '\0';
last = s;
}

Writing this without an "if" is possible, but I think ugly:

last = *s == '\0' ? NULL : (*s++ = '\0', s);
--
In-Real-Life: Chris Torek, Wind River Systems
Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W) +1 801 277 2603
email: gmail (figure it out) http://web.torek.net/torek/index.html
Jun 27 '08 #10

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

12
7608
by: Anil | last post by:
Hi, Where can I get the source code that has the implementation of the below C operators/functions: malloc sizeof new strcmp strtok
7
4367
by: Peter | last post by:
hi all, the strtok() cannot phrase the token within another token, am i correct? For example, i want to get the second word of every row of a file, how to use strok to complete this? thanks from Peter (cmk128@hotmail.com)
8
1931
by: hu | last post by:
hi, everybody! I'm testing the fuction of strtok(). The environment is WinXP, VC++6.0. Program is simple, but mistake is confusing. First, the below code can get right outcome:"ello world, hello dreams." #include <stdafx.h> #include <string.h> #include <stdio.h> int main()
18
3497
by: Robbie Hatley | last post by:
A couple of days ago I dedecided to force myself to really learn exactly what "strtok" does, and how to use it. I figured I'd just look it up in some book and that would be that. I figured wrong! Firstly, Bjarne Stroustrup's "The C++ Programming Language" said: (nothing)
26
4422
by: ryampolsky | last post by:
I'm using strtok to break apart a colon-delimited string. It basically works, but it looks like strtok skips over empty sections. In other words, if the string has 2 colons in a row, it doesn't treat that as a null token, it just treats the 2 colons as a single delimiter. Is that the intended behavior?
9
5567
by: Zack | last post by:
The c99 standard states "The implementation shall behave as if no library function calls the strtok function." What exactly does this mean? Does this dictate any requirements to architectures supporting proccesses or threads? -- Zack
29
2587
by: Pietro Cerutti | last post by:
Hello, here I have a strange problem with a real simple strtok example. The program is as follows: ### BEGIN STRTOK ### #include <string.h> #include <stdio.h>
1
1749
by: Givisi | last post by:
Hello all! I am a completly newbie at C and I dont have good basis from Java so I cant get the linked lists part from C. I am trying to run the code from the function that is written at the end of this message but it has run-time errors so its hard to find the bugs. I really know that my logic is not so good since I cant understand that bit so well. I am using dm compiler. Could somebody possibly tell me what my mistakes are. Thank you...
75
25140
by: siddhu | last post by:
Dear experts, As I know strtok_r is re-entrant version of strtok. strtok_r() is called with s1(lets say) as its first parameter. Remaining tokens from s1 are obtained by calling strtok_r() with a null pointer for the first parameter. My confusion is that this behavior is same as strtok. So I assume strtok_r must also be using any function static variable to keep the information about s1. If this is the case then how strtok_r is re-...
0
9646
marktang
by: marktang | last post by:
ONU (Optical Network Unit) is one of the key components for providing high-speed Internet services. Its primary function is to act as an endpoint device located at the user's premises. However, people are often confused as to whether an ONU can Work As a Router. In this blog post, we’ll explore What is ONU, What Is Router, ONU & Router’s main usage, and What is the difference between ONU and Router. Let’s take a closer look ! Part I. Meaning of...
0
9484
by: Hystou | last post by:
Most computers default to English, but sometimes we require a different language, especially when relocating. Forgot to request a specific language before your computer shipped? No problem! You can effortlessly switch the default language on Windows 10 without reinstalling. I'll walk you through it. First, let's disable language synchronization. With a Microsoft account, language settings sync across devices. To prevent any complications,...
0
10350
Oralloy
by: Oralloy | last post by:
Hello folks, I am unable to find appropriate documentation on the type promotion of bit-fields when using the generalised comparison operator "<=>". The problem is that using the GNU compilers, it seems that the internal comparison operator "<=>" tries to promote arguments from unsigned to signed. This is as boiled down as I can make it. Here is my compilation command: g++-12 -std=c++20 -Wnarrowing bit_field.cpp Here is the code in...
0
10157
jinu1996
by: jinu1996 | last post by:
In today's digital age, having a compelling online presence is paramount for businesses aiming to thrive in a competitive landscape. At the heart of this digital strategy lies an intricately woven tapestry of website design and digital marketing. It's not merely about having a website; it's about crafting an immersive digital experience that captivates audiences and drives business growth. The Art of Business Website Design Your website is...
1
10097
by: Hystou | last post by:
Overview: Windows 11 and 10 have less user interface control over operating system update behaviour than previous versions of Windows. In Windows 11 and 10, there is no way to turn off the Windows Update option using the Control Panel or Settings app; it automatically checks for updates and installs any it finds, whether you like it or not. For most users, this new feature is actually very convenient. If you want to control the update process,...
1
7505
isladogs
by: isladogs | last post by:
The next Access Europe User Group meeting will be on Wednesday 1 May 2024 starting at 18:00 UK time (6PM UTC+1) and finishing by 19:30 (7.30PM). In this session, we are pleased to welcome a new presenter, Adolph Dupré who will be discussing some powerful techniques for using class modules. He will explain when you may want to use classes instead of User Defined Types (UDT). For example, to manage the data in unbound forms. Adolph will...
0
5518
by: adsilva | last post by:
A Windows Forms form does not have the event Unload, like VB6. What one acts like?
2
3658
muto222
by: muto222 | last post by:
How can i add a mobile payment intergratation into php mysql website.
3
2887
bsmnconsultancy
by: bsmnconsultancy | last post by:
In today's digital era, a well-designed website is crucial for businesses looking to succeed. Whether you're a small business owner or a large corporation in Toronto, having a strong online presence can significantly impact your brand's success. BSMN Consultancy, a leader in Website Development in Toronto offers valuable insights into creating effective websites that not only look great but also perform exceptionally well. In this comprehensive...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.