Hi all,
I heard that strtok is not thread safe. So I want to write a
sample program which will tokenize string without using strtok.
Can I get a sample source code for the same.
For exp:
0.0.0.0--->I want to tokenize the string using delimiter as as dot.
Regards
Sujeet 20 17008
In article <11**********************@f14g2000cwb.googlegroups .com>,
<bu*********@yahoo.co.in> wrote: I heard that strtok is not thread safe. So I want to write a sample program which will tokenize string without using strtok.
Correct, strtok is not thread safe, but C itself is not thread safe
so the matter is not within the perview of the C standard.
Can I get a sample source code for the same.
If your system provides thread facilities, it may well also
provide strtok_r(), a common name for a thread-safe work-alike to
strtok() .
For exp: 0.0.0.0--->I want to tokenize the string using delimiter as as dot.
Use strcspn(), then one of the counted-size memory copy routines.
But after that you'll have to figure out what you want to do if there
are multiple delimeters -- for example, how would you want to parse
0.0..255.255
Is that 0.0.0.255 with an extra .255 at the end, or is it
0.0.255.255 with the extra delimeter skipped, or is it
0.0 then an indicator that you've reached the end of the first grouping,
which is then followed by a second grouping 255.255, or is
it to be parsed as indicating the range 0.0 through 255.255 ?
--
"No one has the right to destroy another person's belief by
demanding empirical evidence." -- Ann Landers
<bu*********@yahoo.co.in> wrote in message
news:11**********************@f14g2000cwb.googlegr oups.com... Hi all, I heard that strtok is not thread safe. So I want to write a sample program which will tokenize string without using strtok.
Can I get a sample source code for the same.
For exp: 0.0.0.0--->I want to tokenize the string using delimiter as as dot.
Regards Sujeet
One way is to write a function similar to strtok() but keep track of the
last place that a NULL was placed. In general this will probably mean two
extra parameters: the index in the array, and what the character there was.
Something like:
char *mystrtok( char *string, char *delimiters, int *index, char *c )
{
/* write the real code for your homework here to find n and to set index
and c */
return string+n; /* the token */
}
--
Fred L. Kleinschmidt
Boeing Associate Technical Fellow
Technical Architect, Software Reuse Project
On 2006-02-16, Walter Roberson <ro******@ibd.nrc-cnrc.gc.ca> wrote: In article <11**********************@f14g2000cwb.googlegroups .com>, <bu*********@yahoo.co.in> wrote: I heard that strtok is not thread safe. So I want to write a sample program which will tokenize string without using strtok.
Correct, strtok is not thread safe, but C itself is not thread safe so the matter is not within the perview of the C standard.
strtok is also otherwise unsafe for the same reason - you can't start a
tokenization on a string while you're in the middle of another one and
then come back to the first.
Jordan Abel <ra*******@gmail.com> writes: strtok is also otherwise unsafe for the same reason - you can't start a tokenization on a string while you're in the middle of another one and then come back to the first.
There are multiple other reasons to avoid strtok():
* It merges adjacent delimiters. If you use a comma as
your delimiter, then "a,,b,c" is three tokens, not
four. This is often the wrong thing to do. In fact,
it is only the right thing to do, in my experience,
when the delimiter set is limited to white space.
* The identity of the delimiter is lost, because it is
changed to a null terminator.
* It modifies the string that it tokenizes. This is bad
because it forces you to make a copy of the string if
you want to use it later. It also means that you can't
tokenize a string literal with it; this is not
necessarily something you'd want to do all the time but
it is surprising.
--
"For those who want to translate C to Pascal, it may be that a lobotomy
serves your needs better." --M. Ambuhl
"Here are the steps to create a C-to-Turbo-Pascal translator..." --H. Schildt bu*********@yahoo.co.in wrote: Hi all, I heard that strtok is not thread safe.
Well, technically, its not "thread-well defined". You *can* make it
threadsafe by using a thread local static for its implicit storage, but
this means that strtok doesn't literally follow its claimed
functionality if you try to tokenize across multiple threads (which is
a marginal case).
The issue is that strtok is not *re-entrant* which is a far worse
problem. It means you cannot even *nest* strtoks in two loops. For
example, it is very typical to read a textfile and want to break it
into lines, then from each line you would like to break it into
sub-tokens like words of some kind. So what you would *like* to do is
to write a two level nested loop where you strtok for lines on the
outer loop, and for words in the inner loop. You can't do that because
successive calls to strtok *have* to perform all the tokenization
before the next string is processed, which means nested processing is
not possible.
[...] So I want to write a sample program which will tokenize string without using strtok.
The good news is that strspn and strcspn perform basically equivalent
functionality, except that you have to do the little bit of pointer
tracking yourself ...
Can I get a sample source code for the same.
.... the bad news is that nobody ever uses these functions. And so
example code is hard to come by.
For exp: 0.0.0.0--->I want to tokenize the string using delimiter as as dot.
Well, by itself, this doesn't require nesting, so you can just use
strtok directly. But I would avoid strtok anyways, since it would
otherwise prevent you from extending the inner loop of your code to do
further or other parsing.
For example, lets say you write some code using strtok that works and
does the IP address parsing you want. Then you want to see if the IP
address matches a given mask, which also has to be parsed. A neat, and
clever way to do it would be to simply simultaneously parse through the
IP address and the IP mask, interleaved, then match each address byte
as you go. The point being that if it mismatches early, then you don't
have to parse all the way to know there is a mismatch (much like strcmp
doesn't have to go down the whole string so long as there is an early
mismatch.)
But strtok screws you, because you can't bob back and forth between two
strings as you tokenize with it.
Its interesting that there have been great outside efforts to improve
strncpy and strncat via the functions strlcpy and strlcat, but that
similar effort has not been made for strtok.
In any event, if you want to avoid all these issues, you can use "The
Better String Library" (Bstrlib) given in the URL at the very bottom of
this post. As a rule of thumb, Bstrlib never has these or any kind of
similar issue, and the length of code to perform any non-trivial string
manipulation is typically half that required for equivalent C-library
code. In this case, the main functionality is one line of code:
struct bstrList * pl = bsplit (ipstr, '.');
--
Paul Hsieh http://www.pobox.com/~qed/ http://bstring.sf.net/
Ben Pfaff wrote: There are multiple other reasons to avoid strtok():
* It merges adjacent delimiters. If you use a comma as your delimiter, then "a,,b,c" is three tokens, not four. This is often the wrong thing to do. In fact, it is only the right thing to do, in my experience, when the delimiter set is limited to white space.
I will disagree slightly. If you are parsing text sentences into words
and don't care about punctuation, then merging is also correct.
Brian
On Thu, 16 Feb 2006 16:17:00 +0000, Walter Roberson wrote: ... but C itself is not thread safe so the matter is not within the perview of the C standard.
You keep saying thing that stop me short! I am interesting in the general
view of this. Are you saying that standard C does not provide enough
guarantees to implement a shared-memory mutex using varaibale testing and
setting alone?
I can see that how the threads get there in the first place is OT, but I
always thought the standard said enough to allow one to implement such a
thing if one were forced to need it.
Appoligies if this is OT, but I thought that what all the minimum
"abstract machine" requirements for "volatile" were about!
--
Ben.
"Default User" <de***********@yahoo.com> writes: Ben Pfaff wrote:
There are multiple other reasons to avoid strtok():
* It merges adjacent delimiters. If you use a comma as your delimiter, then "a,,b,c" is three tokens, not four. This is often the wrong thing to do. In fact, it is only the right thing to do, in my experience, when the delimiter set is limited to white space.
I will disagree slightly. If you are parsing text sentences into words and don't care about punctuation, then merging is also correct.
Good point. I'll revise my boilerplate text. How about this:
* It merges adjacent delimiters. If you use a comma as your
delimiter, then "a,,b,c" will be divided into three tokens,
not four. This is often the wrong thing to do. In fact, it
is only the right thing to do, in my experience, when the
delimiter set contains white space (for dividing a string
into "words") or it is known in advance that there will be
no adjacent delimiters.
--
"Am I missing something?"
--Dan Pop
In article <pa****************************@bsb.me.uk>,
Ben Bacarisse <be********@bsb.me.uk> wrote: On Thu, 16 Feb 2006 16:17:00 +0000, Walter Roberson wrote:
... but C itself is not thread safe so the matter is not within the perview of the C standard.
You keep saying thing that stop me short! I am interesting in the general view of this. Are you saying that standard C does not provide enough guarantees to implement a shared-memory mutex using varaibale testing and setting alone?
C has no idea what shared memory is, and provides no
operations that can test-and-set any variable atomically.
It does have sig_atomic_t
which is the integral type of an object that can be accessed
as an atomic entity, even in the presence of asychronous interrupts
The relevant section of C89 does not define "access", but the
C89 Rationale indicates,
The Committee concluded that about the only thing a strictly
conforming program can do in a signal handler is to assign a value
to a volatile static variable which can be written uninterruptedly
and promptly return. (THe header <signal.h> specifies a type
sig_atomic_t which can be so written.)
This clarifies that "access" includes at least -writing- to the
sig_atomic_t variable, but it does not tell us about -reading- the
sig_atomic_t variable within the signal handler.
Nothing in C89 gives test-and-set assurances, so although
you could possibly read the sig_atomic_t variable, test its value, and
write a new value based upon what you found there, you don't know
that you won't have had an interrupt between the time of the read and
the time of the write, so you might clobber a mutex lock that something
else has written there while you blinked.
The closest that you might be able to get is to mask off all
maskable signals except one, and then when you want to operate on
your mutex, raise() the signal and then rely upon the signal handler
mechanism to not allow anything else to raise() the same signal while
you are in the handler. That might work if your threading is single
processor with single dispatcher, but if your threading is multiprocessor
or multidispatcher then you can't be sure that no other copies of
the signal handler are in play.
I can see that how the threads get there in the first place is OT, but I always thought the standard said enough to allow one to implement such a thing if one were forced to need it.
No, because the standard is written to encompass processors which
have no test-and-set functionality. It is not uncommon for small
processors to not have that or equivilent operations, and to instead
rely upon interrupt levels to manage concurrancy. Obviously forcing
the existance of interrupt levels or test-and-set operations is beyond
what the standard should concern itself with.
In a case like this, it is what the standard does NOT say that is
important. The standard does NOT say that you cannot have an
implementation call that will implement mutexes (somehow). The
standard does not prohibit shared memory. But neither does the
C89 standard provide any certainty at all that the environment
is even -able- to support mutexes (or threads) through -any- means.
--
I was very young in those days, but I was also rather dim.
-- Christopher Priest
Ben Pfaff wrote: "Default User" <de***********@yahoo.com> writes: I will disagree slightly. If you are parsing text sentences into words and don't care about punctuation, then merging is also correct.
Good point. I'll revise my boilerplate text. How about this:
* It merges adjacent delimiters. If you use a comma as your delimiter, then "a,,b,c" will be divided into three tokens, not four. This is often the wrong thing to do. In fact, it is only the right thing to do, in my experience, when the delimiter set contains white space (for dividing a string into "words") or it is known in advance that there will be no adjacent delimiters.
I'm not sure the text captured what I was going for, I'll give an
example:
"In fact, it is only the right thing to do, in my experience, when the
delimiter set contains white space."
When parsing the text above into words, we want to count the ", "
following "fact", "do", or "experience" as a single delimiter. The
merging behavior of strtok() does that.
Brian
Ben Pfaff wrote: "Default User" <de***********@yahoo.com> writes:
Ben Pfaff wrote:
There are multiple other reasons to avoid strtok():
* It merges adjacent delimiters. If you use a comma as your delimiter, then "a,,b,c" is three tokens, not four. This is often the wrong thing to do. In fact, it is only the right thing to do, in my experience, when the delimiter set is limited to white space.
I will disagree slightly. If you are parsing text sentences into words and don't care about punctuation, then merging is also correct.
Good point. I'll revise my boilerplate text. How about this:
* It merges adjacent delimiters. If you use a comma as your delimiter, then "a,,b,c" will be divided into three tokens, not four. This is often the wrong thing to do. In fact, it is only the right thing to do, in my experience, when the delimiter set contains white space (for dividing a string into "words") or it is known in advance that there will be no adjacent delimiters.
It seems to me that strtok() is a solution in search of a problem. I
learned it several years ago when I began C programming but have never
used it in any 'real' program since. What good is it?
--
Joe Wright
"Everything should be made as simple as possible, but not simpler."
--- Albert Einstein ---
Joe Wright wrote: It seems to me that strtok() is a solution in search of a problem. I learned it several years ago when I began C programming but have never used it in any 'real' program since. What good is it?
It's a pretty compact way to tokenize a string, if you can handle the
limitations and are comfortable with the delimiter merging. It forms
the core of the parser for my text-adventure game, for instance, that's
why I was right on that "breaking strings into words" business.
Brian
Joe Wright <jo********@comcast.net> writes: It seems to me that strtok() is a solution in search of a problem. I learned it several years ago when I began C programming but have never used it in any 'real' program since. What good is it?
It's dangerous. I never use it, except in toy programs. I use
replacements that don't suffer from recursion issues, such as (in
programs where it is available) strtok_r.
I write a lot of programs that do various kinds of parsing of
strings. If you don't, then you might never find the need for
such a thing.
--
"A lesson for us all: Even in trivia there are traps."
--Eric Sosman
On Thu, 16 Feb 2006 19:26:05 +0000, Walter Roberson wrote: Nothing in C89 gives test-and-set assurances, so although you could possibly read the sig_atomic_t variable, test its value, and write a new value based upon what you found there, you don't know that you won't have had an interrupt between the time of the read and the time of the write, so you might clobber a mutex lock that something else has written there while you blinked.
I think I see what you are talking about. I thought you were making a
more general point -- that one could not, even in principle, implement a
mutual exclusion algorithm (such as Peterson's) with what little
assurances standard C provides. I find this part of the standard obtuse
so I would not be happy to assert that one can (although my money would be
on a "yes" rather than a "no" at the moment!).
--
Ben.
On 2006-02-16, Default User <de***********@yahoo.com> wrote: Joe Wright wrote:
It seems to me that strtok() is a solution in search of a problem. I learned it several years ago when I began C programming but have never used it in any 'real' program since. What good is it?
It's a pretty compact way to tokenize a string, if you can handle the limitations and are comfortable with the delimiter merging. It forms the core of the parser for my text-adventure game, for instance, that's why I was right on that "breaking strings into words" business.
You can also use it to parse any of the half-dozen or so unix-ish file
formats that have ':'-delimited fields.
Default User wrote: Ben Pfaff wrote: "Default User" <de***********@yahoo.com> writes:
I will disagree slightly. If you are parsing text sentences into words and don't care about punctuation, then merging is also correct.
Good point. I'll revise my boilerplate text. How about this:
* It merges adjacent delimiters. If you use a comma as your delimiter, then "a,,b,c" will be divided into three tokens, not four. This is often the wrong thing to do. In fact, it is only the right thing to do, in my experience, when the delimiter set contains white space (for dividing a string into "words") or it is known in advance that there will be no adjacent delimiters.
I'm not sure the text captured what I was going for, I'll give an example:
"In fact, it is only the right thing to do, in my experience, when the delimiter set contains white space."
When parsing the text above into words, we want to count the ", " following "fact", "do", or "experience" as a single delimiter. The merging behavior of strtok() does that.
I think a more useful general approach is:
Absorb leading blanks, followed by parsing the item up to
the first occurence of the delimiter. Absorb that delimiter.
This can detect an empty item.
Repeat
--
"The power of the Executive to cast a man into prison without
formulating any charge known to the law, and particularly to
deny him the judgement of his peers, is in the highest degree
odious and is the foundation of all totalitarian government
whether Nazi or Communist." -- W. Churchill, Nov 21, 1943
>On Thu, 16 Feb 2006 19:26:05 +0000, Walter Roberson wrote: Nothing in C89 gives test-and-set assurances ...
(This is true for C99 as well. On every real machine that really
supports multiprocessing and/or threads, one can write a suitable
mutual exclusion routine, callable from C code, but it is rare that
one can write a good one even in machine-specific C. It usually
requires some kind of assembly code: either inline __asm__, or just
a simple C-callable assembly routine.)
... so although you could possibly read the sig_atomic_t variable, test its value, and write a new value based upon what you found there, you don't know that you won't have had an interrupt between the time of the read and the time of the write, so you might clobber a mutex lock that something else has written there while you blinked.
In article <pa****************************@bsb.me.uk>
Ben Bacarisse <be********@bsb.me.uk> wrote:I think I see what you are talking about. I thought you were making a more general point -- that one could not, even in principle, implement a mutual exclusion algorithm (such as Peterson's) with what little assurances standard C provides.
You cannot, because on some real multiprocessor machines, memory
synchronization instructions are required to obtain store/load
sequencing guarantees. Consider the following "obvious" code:
# cpu 4 # cpu 7
store reg,[addr] nop
nop store reg,[addr]
load reg,[addr] nop
where instructions written on the same source line are executed in
the same clock tick, and [addr] is the same on both machines. You
would expect that the "load" on CPU 4 would see the value stored
by the "store" on CPU 7. But in fact, it may not (this depends on
unpredictable events like interrupts, making debugging difficult).
To make the result predictable, and make cpu 4 "see" cpu 7's store,
we have to replace the "nop" no-ops with "memory synchronize"
instructions (possibly in several flavors, and maybe only one is
required on each CPU; it depends quite a bit on the CPU architecture).
While the Java language has built-in thread and locking support,
it turns out that the locking was broken on some architectures,
such as the Alpha. Getting it right was tricky; a committee spent
years defining the memory model. See
<http://www.cs.umd.edu/~pugh/java/memoryModel>.
--
In-Real-Life: Chris Torek, Wind River Systems
Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W) +1 801 277 2603
email: forget about it http://web.torek.net/torek/index.html
Reading email is like searching for food in the garbage, thanks to spammers.
On Fri, 17 Feb 2006 04:49:21 +0000, Chris Torek wrote: n article <pa****************************@bsb.me.uk> Ben Bacarisse <be********@bsb.me.uk> wrote:I think I see what you are talking about. I thought you were making a more general point -- that one could not, even in principle, implement a mutual exclusion algorithm (such as Peterson's) with what little assurances standard C provides.
You cannot, because on some real multiprocessor machines, memory synchronization instructions are required to obtain store/load sequencing guarantees. Consider the following "obvious" code:
# cpu 4 # cpu 7 store reg,[addr] nop nop store reg,[addr] load reg,[addr] nop
where instructions written on the same source line are executed in the same clock tick, and [addr] is the same on both machines. You would expect that the "load" on CPU 4 would see the value stored by the "store" on CPU 7. But in fact, it may not (this depends on unpredictable events like interrupts, making debugging difficult). To make the result predictable, and make cpu 4 "see" cpu 7's store, we have to replace the "nop" no-ops with "memory synchronize" instructions (possibly in several flavors, and maybe only one is required on each CPU; it depends quite a bit on the CPU architecture).
OK, I see that problem. I had assumed that declaring a variable volatile
would oblige the compiler to insert such synchronisation operations at
least at each sequence point prior to accessing it. Looking at the latest
ISO draft, I can't say that I find any support for this asssumption, but
then I can't see any wording that says that such synchronisation is not
required.
--
Ben.
>On Fri, 17 Feb 2006 04:49:21 +0000, Chris Torek wrote: ... on some real multiprocessor machines, memory synchronization instructions are required to obtain store/load sequencing guarantees. ...
In article <pa****************************@bsb.me.uk>
Ben Bacarisse <be********@bsb.me.uk> wrote:OK, I see that problem. I had assumed that declaring a variable volatile would oblige the compiler to insert such synchronisation operations at least at each sequence point prior to accessing it.
There are those who argue that it should. But consider the V9 SPARC,
for instance. There are six commonly-used memory barrier instructions:
membar #LoadLoad
membar #LoadStore
membar #StoreLoad
membar #StoreStore
membar #MemIssue
membar #Sync
Which of these should the compiler use? Which one(s) do you actually
need?
Looking at the latest ISO draft, I can't say that I find any support for this asssumption, but then I can't see any wording that says that such synchronisation is not required.
Presumably, the C99 folks (wisely) decided that this area was too
complex to tackle, and simply left it undefined, so that those
dealing with the systems can define them.
The problem with standardizing behavior is that you are then stuck
with it, even if it is not the right behavior. (Of course, the problem
with *not* standarizing behavior is that everyone runs in different
directions. But sometimes this is what you want anyway.)
--
In-Real-Life: Chris Torek, Wind River Systems
Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W) +1 801 277 2603
email: forget about it http://web.torek.net/torek/index.html
Reading email is like searching for food in the garbage, thanks to spammers.
On Fri, 17 Feb 2006 17:58:46 +0000, Chris Torek wrote: On Fri, 17 Feb 2006 04:49:21 +0000, Chris Torek wrote: ... on some real multiprocessor machines, memory synchronization instructions are required to obtain store/load sequencing guarantees. ... In article <pa****************************@bsb.me.uk> Ben Bacarisse <be********@bsb.me.uk> wrote:OK, I see that problem. I had assumed that declaring a variable volatile would oblige the compiler to insert such synchronisation operations at least at each sequence point prior to accessing it.
<snip> There are those who argue that it should. Presumably, the C99 folks (wisely) decided that this area was too complex to tackle, and simply left it undefined, so that those dealing with the systems can define them.
Thanks for the clarification. I imagine it is mostlt of academic interest
only since "real" concurrent programs will use some more grown-up way to
synchonise which can do whatever the hardware requires, but I appreciate
your explanation.
--
Ben. This thread has been closed and replies have been disabled. Please start a new discussion. Similar topics
by: Lans |
last post by:
I have a string that I need to tokenize but I need to use a string
token
see example i am trying the following but strtok only uses characters
as delimiters and I need to seperate bu a certain...
|
by: Kelvin |
last post by:
hi:
in C, we can use strtok() to tokenize a char*
but i can't find any similar member function of string that can tokenize a
string
so how so i tokenize a string in C++?
do it the C way?
...
|
by: Ram Laxman |
last post by:
Hi all,
How can I tokenize the integers using strtok. For example:
If I have some thing like:
"ram":"laxman":"deepak"
then I can safely use strtok.But if I have something like below:...
|
by: wreckingcru |
last post by:
I'm trying to tokenize a C++ string with the following code:
#include <iostream>
#include <stdio>
#include <string>
using namespace std;
int main(int argc, char* argv)
|
by: noobcprogrammer |
last post by:
#include "IndexADT.h"
int IndexInit(IndexADT* word)
{
word->head = NULL;
word->wordCount = 0;
return 1;
}
int IndexCreate(IndexADT* wordList,char* argv)
|
by: isladogs |
last post by:
The next Access Europe meeting will be on Wednesday 2 August 2023 starting at 18:00 UK time (6PM UTC+1) and finishing at about 19:15 (7.15PM)
The start time is equivalent to 19:00 (7PM) in Central...
|
by: erikbower65 |
last post by:
Using CodiumAI's pr-agent is simple and powerful. Follow these steps:
1. Install CodiumAI CLI: Ensure Node.js is installed, then run 'npm install -g codiumai' in the terminal.
2. Connect to...
|
by: erikbower65 |
last post by:
Here's a concise step-by-step guide for manually installing IntelliJ IDEA:
1. Download: Visit the official JetBrains website and download the IntelliJ IDEA Community or Ultimate edition based on...
|
by: kcodez |
last post by:
As a H5 game development enthusiast, I recently wrote a very interesting little game - Toy Claw ((http://claw.kjeek.com/))。Here I will summarize and share the development experience here, and hope it...
|
by: Taofi |
last post by:
I try to insert a new record but the error message says the number of query names and destination fields are not the same
This are my field names
ID, Budgeted, Actual, Status and Differences
...
|
by: DJRhino1175 |
last post by:
When I run this code I get an error, its Run-time error# 424 Object required...This is my first attempt at doing something like this. I test the entire code and it worked until I added this -
If...
|
by: Rina0 |
last post by:
I am looking for a Python code to find the longest common subsequence of two strings. I found this blog post that describes the length of longest common subsequence problem and provides a solution in...
|
by: lllomh |
last post by:
Define the method first
this.state = {
buttonBackgroundColor: 'green',
isBlinking: false, // A new status is added to identify whether the button is blinking or not
}
autoStart=()=>{
|
by: DJRhino |
last post by:
Was curious if anyone else was having this same issue or not....
I was just Up/Down graded to windows 11 and now my access combo boxes are not acting right. With win 10 I could start typing...
| |