strtok/strtok_r woes

P: n/a
I had written some code using strtok_r for parsing strings with two
"levels" of delimiters. The code was working like a charm, until it
suddenly broke one day. You see, I was applying strtok on the input
string itself, and as long as a variable string was passed to the
function, everything was hunky dory. Then one day somebody decided to
pass a constant string to my code ---- and everything came collapsing
down like a domino.

I believe that this is a pitfall into which many programmers may fall.
I have now changed the signature of my function to treat the input
string as a constant string, and I am now making a local copy of the
string and operating on that. Of course, I now have to ensure that I
free up the dynamically allocated memory.
Ciao
KB

[I have posted my source listing to comp.sources.d]
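A minimal C sketch of the copy-then-tokenize approach described above, assuming a POSIX system that provides strtok_r. The function name parse_records, the fixed ":" and "," delimiters, and the printed format are illustrative assumptions and are not taken from the posted listing.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L   /* asks <string.h> to declare strtok_r */
#endif
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: parses "Name,1,2:Name,3" without touching the
   caller's string. Prints each record and returns the number of records,
   or -1 if the working copy cannot be allocated. */
int parse_records(const char *input)
{
    size_t len = strlen(input) + 1;
    char *copy = malloc(len);
    if (copy == NULL)
        return -1;
    memcpy(copy, input, len);            /* strtok_r only ever sees the copy */

    int records = 0;
    char *outer_state;                   /* position in the record list */
    for (char *rec = strtok_r(copy, ":", &outer_state);
         rec != NULL;
         rec = strtok_r(NULL, ":", &outer_state)) {
        char *inner_state;               /* position inside one record */
        char *name = strtok_r(rec, ",", &inner_state);
        if (name == NULL)
            continue;                    /* record held only delimiters */
        printf("name: %s\n", name);
        for (char *attr = strtok_r(NULL, ",", &inner_state);
             attr != NULL;
             attr = strtok_r(NULL, ",", &inner_state))
            printf("  attribute: %ld\n", strtol(attr, NULL, 10));
        records++;
    }

    free(copy);                          /* the copy is owned by this function */
    return records;
}
```

Because each level keeps its own state pointer, the inner loop does not disturb the outer one, and the caller may pass a string literal safely since only the private copy is modified.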

Nov 14 '05 #1
23 Replies


P: n/a
Modified source listing for my program using strtok_r in which I make a
local copy of the input string and operate strtok on that
// Program that parses strings of the form:
//   Name, SS#, Emp#, Other#::Name, SS#, Emp#, Other#::Name, SS#, Emp#, Other#
// and extracts the name and other attributes associated with an employee
#include <stdio.h>
#include <stdlib.h>   // atoi, exit
#include <string.h>   // strcpy, strlen, strstr, memset; strtok_r on POSIX systems
#include <iostream>
#include <fstream>
#include <list>

using namespace std;

#define MAX_DATA_LEN 128
#define EMPLOYEE_DELIMITER_STR ":"
#define ATTR_DELIMITER_STR ","
#define ATTR_DELIMITER_CHAR ','
struct EmployeeInfo
{
char _name[MAX_DATA_LEN];
size_t _attributeCount;
size_t *_attribute;

EmployeeInfo(const char *str):_attributeCount(0), _attribute(NULL)
{
if(!strstr(str,ATTR_DELIMITER_STR))
strcpy(_name, str);
else
{
char *temp1 = new char[strlen(str) + 1];
strcpy(temp1, str);
char *temp2 = temp1;

char* holdingBuf[1];

strcpy(_name, strtok_r(temp1, (const char *) ATTR_DELIMITER_STR,
holdingBuf));
temp1 += (strlen(temp1) + 1);
_attributeCount = numChars(temp1, ATTR_DELIMITER_CHAR) + 1;
_attribute = new size_t[_attributeCount];
memset(_attribute, '\0', _attributeCount*sizeof(*_attribute));

char* stringWithTokens = temp1;
char* token;
int i;
for(i =0;(token = strtok_r(stringWithTokens, (const char *)
ATTR_DELIMITER_STR, holdingBuf)) != NULL;
i++)
{
stringWithTokens = NULL;
_attribute[i] = atoi(token);
}

_attributeCount = i;

delete [] temp2;
}
}

virtual ~EmployeeInfo()
{
delete [] _attribute; // safe on a null pointer; also frees the array when no attribute was parsed
}

void toString()
{
cout << "\nEmployee Name: " << _name << endl;
for(size_t i = 0; i < _attributeCount; i++)
cout << "Attribute " << i << ": " << _attribute[i] << endl;
}

protected:
static size_t numChars(const char* str, const char delimiter)
{
size_t retVal = 0;
char ch;
for(size_t i = 0; ch = str[i]; i++)
{
if(ch == delimiter)
retVal++;
}
return retVal;
}
};
void
parseNameString(const char* nameString)
{
char *temp1 = new char[strlen(nameString) + 1];
strcpy(temp1, nameString);
char *temp2 = temp1;
char* holdingBuf[1];

list<EmployeeInfo*> empInfoList;

char* stringWithTokens = temp1;
char* token;
while((token = strtok_r(stringWithTokens, (const char *)
EMPLOYEE_DELIMITER_STR, holdingBuf)) != NULL)
{
stringWithTokens = NULL;
EmployeeInfo *newInfo = new EmployeeInfo(token);
empInfoList.push_back(newInfo);
}

delete [] temp2;

list<EmployeeInfo*>::iterator itor, endOfList=empInfoList.end();

for(itor = empInfoList.begin(); itor != endOfList; itor++)
{
EmployeeInfo* thisInfo = *itor;
thisInfo->toString();
}

for(itor = empInfoList.begin(); itor != endOfList; itor++)
{

EmployeeInfo* thisInfo = *itor;
delete thisInfo;
}
}

int
main(int argc, char* argv[])
{
if(argc < 2)
{
cerr << "usage: " << argv[0] << " <inputfile>. Exiting ...." <<
endl;
exit(-1);
}

ifstream inFile(argv[1]);
if(!inFile)
{
cerr << "Error reading file " << argv[1] << " Exiting ...." <<
endl;
exit(-2);
}

char inputStr[MAX_DATA_LEN + 1];

/*
while(inFile >> inputStr)
{
cout << "** Parsing string " << inputStr << " **\n\n";
parseNameString(inputStr);
cout << "\n\n\n";
}
*/

while(inFile.getline(inputStr, MAX_DATA_LEN))
{
cout << "** Parsing string " << inputStr << " **\n\n";
parseNameString(inputStr);
cout << "\n\n\n";
}

exit(0);
}

Nov 14 '05 #2

P: n/a
The posting of the entire source listing to the comp.lang.c newsgroup
was an inadvertent mistake which is deeply regretted.

Nov 14 '05 #3

P: n/a
kb***@kaxy.com wrote:
I had written some code using strtok_r for parsing strings with two
"levels" of delimiters. The code was working like a charm, until it
suddenly broke one day. You see, I was applying strtok on the input
string itself, and as long as a variable string was passed to the
function, everything was hunky dory. Then one day somebody decided to
pass a constant string to my code ---- and everything came collapsing
down like a domino.

I believe that this is a pitfall into which many programmers may fall.
I have now changed the signature of my function to treat the input
string as a constant string, and I am now making a local copy of the
string and operating on that. Of course, I now have to ensure that I
free up the dynamically allocated memory.
Ciao
KB

[I have posted my source listing to comp.sources.d]


char* strtok (char*, const char*);

strtok works by inserting '\0' in place of the delimiter each
successive call. Passing a string literal will definitely
cause trouble. Pass a string buffer. ;)
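The in-place modification can be made visible with a short C sketch. The helper name count_overwritten is made up for illustration; it tokenizes a writable buffer and then counts how many bytes of the original text strtok replaced with '\0'.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: tokenizes buf in place and returns how many bytes
   of the original text were overwritten with '\0' along the way.
   buf must be writable, so never pass a string literal here. */
size_t count_overwritten(char *buf, const char *delims)
{
    size_t len = strlen(buf);            /* length before strtok touches it */
    size_t zeros = 0;

    for (char *tok = strtok(buf, delims); tok != NULL;
         tok = strtok(NULL, delims))
        ;                                /* the tokens themselves are ignored */

    for (size_t i = 0; i < len; i++)
        if (buf[i] == '\0')
            zeros++;
    return zeros;
}
```

Called on an array such as char buf[] = "a,b,c"; the buffer afterwards holds three separate strings, which is why the argument has to be memory the program is allowed to write to.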

In fact, Herb-the-infamous-clown's book does what you did too!
Stay away from his books, unless you want an exercise
in debugging. ;)

Regards,
Jonathan.

--
"Women should come with documentation." - Dave
Nov 14 '05 #4

P: n/a
"Jonathan Burd" <jo***********@REMOVEMEgmail.com> wrote in message
news:35*************@individual.net...

strtok works by inserting '\0' in place of the delimiter each
successive call. Passing a string literal will definitely
cause trouble. Pass a string buffer. ;)

In fact, Herb-the-infamous-clown's book does what you did too!
Wow, I knew he was spreading much misinformation, but
I didn't think it was *that* bad.

Stay away from his books, unless you want an exercise
in debugging. ;)


Too true.

-Mike
Nov 14 '05 #5

P: n/a
Mike Wahler wrote:
"Jonathan Burd" <jo***********@REMOVEMEgmail.com> wrote in message
news:35*************@individual.net...
strtok works by inserting '\0' in place of the delimiter each
successive call. Passing a string literal will definitely
cause trouble. Pass a string buffer. ;)

In fact, Herb-the-infamous-clown's book does what you did too!

Wow, I knew he was spreading much misinformation, but
I didn't think it was *that* bad.

<snip>

I wonder if he even compiled the example he has given.
I've put large crosses in that book wherever I've discovered
such errors. (That's the only book that I've marked so far.
Heh.)

Regards,
Jonathan.

--
"Women should come with documentation." - Dave
Nov 14 '05 #6

P: n/a

"Jonathan Burd" <jo***********@REMOVEMEgmail.com> wrote in message
news:35*************@individual.net...
Mike Wahler wrote:
"Jonathan Burd" <jo***********@REMOVEMEgmail.com> wrote in message
news:35*************@individual.net...
strtok works by inserting '\0' in place of the delimiter each
successive call. Passing a string literal will definitely
cause trouble. Pass a string buffer. ;)

In fact, Herb-the-infamous-clown's book does what you did too!

Wow, I knew he was spreading much misinformation, but
I didn't think it was *that* bad.

<snip>

I wonder if he even compiled the example he has given.
I've put large crosses in that book wherever I've discovered
such errors. (That's the only book that I've marked so far.
Heh.)


Depending upon the implementation(s) used, compiling and
testing might not have caught it. Especially on older
systems I've used, I've seen them happily accept modifying
literals, and 'seem to work'. I think stuff like this
(depending upon behavior of particular implementation(s)
rather than the language rules for determining correctness)
could easily be the cause of many of the errors in his work.

-Mike
Nov 14 '05 #7

P: n/a
In the interest of "public disclosure" I am reproducing the coding
sample that I see in my copy of "C: The Complete Reference"

--NS

#include "stdio.h" /* NS: Shouldn't this be <stdio.h>? */
#include "string.h" /* NS: Shouldn't this be <string.h>? */

void main(void)
{
char *p;

p = strtok("The summer soldier, the sunshine patriot", " ");
/* NS: Passing a constant string to strtok */

printf(p); /* NS: Isn't %s missing here? */
do {
p = strtok('\0', ", ");
if(p) printf("|%s", p);
} while(p);
}

Nov 14 '05 #8

P: n/a
On 24 Jan 2005 18:37:09 -0800, ni*************@yahoo.com wrote in
comp.lang.c:
In the interest of "public disclosure" I am reproducing the coding
sample that I see in my copy of "C: The Complete Reference"

--NS

#include "stdio.h" /* NS: Shouldn't this be <stdio.h>? */
#include "string.h" /* NS: Shouldn't this be <string.h>? */
Yes, they both should. The #include "sometext" format is for
including source files, the #include <sometext> format is for
including standard headers, which need not be files.
void main(void)
This of course is just plain undefined behavior.
{
char *p;

p = strtok("The summer soldier, the sunshine patriot", " ");
/* NS: Passing a constant string to strtok */
No, your comment is incorrect. He is passing the address of a string
literal, but the type of a string literal in C is "array of char" and
NOT "array of const char". Attempting to modify a string literal in C
does indeed produce undefined behavior, but not because the characters
are const, just because the C standard specifically says so.

Because attempting to modify a string literal is undefined, compilers
are free to actually make them const, but are not required to do so.
No strictly conforming program can tell one way or the other.
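One way to stay clear of this, sketched in C under the assumption that the text is known at compile time: initialize a char array from the literal, which gives a writable copy, and keep any pointer to the literal itself const-qualified so the compiler objects to writes through it. The function name first_word_length is only for illustration.

```c
#include <string.h>

/* Hypothetical example: returns the length of the first space-delimited
   word, tokenizing a writable array rather than the literal. */
size_t first_word_length(void)
{
    /* The initializer copies the literal's characters into the array. */
    char text[] = "The summer soldier";
    /* const documents that the literal must only be read. */
    const char *literal = "The summer soldier";

    char *word = strtok(text, " ");      /* modifies text, not the literal */
    if (word == NULL || strncmp(literal, word, strlen(word)) != 0)
        return 0;
    return strlen(word);
}
```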
printf(p); /* NS: Isn't %s missing here? */
Not necessarily. If you pass a pointer to char to printf, assuming
that the pointer points to a valid string, printf will just output the
string. Unless of course the string happens to contain a conversion
specifier. If p points to "%s" for example, yet another incident of
undefined behavior occurs.
do {
p = strtok('\0', ", ");
if(p) printf("|%s", p);
} while(p);
}


Let it never be said that Schildt lacks the talent to cram multiple
examples of undefined behavior in such a small piece of code.

--
Jack Klein
Home: http://JK-Technology.Com
FAQs for
comp.lang.c http://www.eskimo.com/~scs/C-faq/top.html
comp.lang.c++ http://www.parashift.com/c++-faq-lite/
alt.comp.lang.learn.c-c++
http://www.contrib.andrew.cmu.edu/~a...FAQ-acllc.html
Nov 14 '05 #9

P: n/a
ni*************@yahoo.com wrote:
In the interest of "public disclosure" I am reproducing the coding
sample that I see in my copy of "C: The Complete Reference"

--NS

#include "stdio.h" /* NS: Shouldn't this be <stdio.h>? */
#include "string.h" /* NS: Shouldn't this be <string.h>? */

void main(void)
{
char *p;

p = strtok("The summer soldier, the sunshine patriot", " ");
/* NS: Passing a constant string to strtok */

printf(p); /* NS: Isn't %s missing here? */
No.

I prefer using puts() for testing certain string functions
instead of printf() though. Take, for example, a function
that generates a string consisting of printable characters when you
specify a range, say "^A-Z" (all printable characters except those in
the range A-Z; ^ negates), the string will contain '%' and if you use
printf to test such a routine, it will result in UB. It is
easy to forget that your string may contain '%', so when
testing string functions, generally avoid using printf() unless
you really need it and know what you are doing.
do {
p = strtok('\0', ", ");
if(p) printf("|%s", p);
} while(p);
}


Another error is his incorrect usage of feof().

See his usage in his book and then read this:
http://www.drpaulcarter.com/cs/common-c-errors.php#4.2
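For reference, a C sketch of the read loop usually recommended instead, assuming the mistake meant here is the common one of testing feof() before each read. The function name count_lines and the 256-byte buffer are arbitrary choices for the example.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: counts the lines readable from fp, or returns -1
   on a read error. The loop condition is the result of the read itself,
   so the body never runs on data from a failed read. */
long count_lines(FILE *fp)
{
    char line[256];
    long lines = 0;
    int pending = 0;                     /* text seen with no '\n' yet */

    while (fgets(line, sizeof line, fp) != NULL) {
        size_t n = strlen(line);
        if (n > 0 && line[n - 1] == '\n') {
            lines++;
            pending = 0;
        } else {
            pending = 1;                 /* long line or unterminated last line */
        }
    }
    if (pending)
        lines++;

    /* Only after a read has failed is it meaningful to ask why. */
    if (ferror(fp))
        return -1;
    return lines;
}
```

feof() and ferror() are useful after the loop to tell end of file from an error, not as the loop condition.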

Regards,
Jonathan.

--
"Women should come with documentation." - Dave
Nov 14 '05 #10

P: n/a
Jonathan Burd wrote:
ni*************@yahoo.com wrote:


<snip>
printf(p); /* NS: Isn't %s missing here? */

No.

I prefer using puts() for testing certain string functions
instead of printf() though. Take, for example, a function
that generates a string consisting of printable characters when you
specify a range, say "^A-Z" (all printable characters except those in
the range A-Z; ^ negates), the string will contain '%' and if you use
printf to test such a routine, it will result in UB. It is
easy to forget that your string may contain '%', so when
testing string functions, generally avoid using printf() unless
you really need it and know what you are doing.


In short, use printf ("%s", p); when you need to use it.
printf (p); may cause palpitations. ;)
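A small C sketch of the two safe forms, with hypothetical helper names. In both, the text is data and never the format string, so a '%' inside it is just another character.

```c
#include <stdio.h>

/* Hypothetical helper: copies text into out exactly as given. Because
   text is an argument to "%s" and not the format itself, any '%' in it
   is treated as an ordinary character. Returns what snprintf returns. */
int format_verbatim(char *out, size_t size, const char *text)
{
    return snprintf(out, size, "%s", text);
}

/* Same idea for direct output: fputs() takes no format string at all. */
int print_verbatim(const char *text)
{
    return fputs(text, stdout);
}
```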

<snip>

Regards,
Jonathan.

--
"Women should come with documentation." - Dave
Nov 14 '05 #11

P: n/a
Jonathan Burd <jo***********@REMOVEMEgmail.com> writes:
ni*************@yahoo.com wrote:
printf(p); /* NS: Isn't %s missing here? */


No.


Yes. Write printf("%s", p); instead, unless you really and truly
composed a format string, which is rare indeed.
--
"I don't have C&V for that handy, but I've got Dan Pop."
--E. Gibbons
Nov 14 '05 #12

P: n/a
Ben Pfaff wrote:
.... snip ...
--
"I don't have C&V for that handy, but I've got Dan Pop."
--E. Gibbons


Do you? Speaking of whom, where has he taken his diplomatic
skills? When last seen around here he had a sig intimating that he
was job hunting.

Another sorely missing regular is Richard Heathfield.

--
"If you want to post a followup via groups.google.com, don't use
the broken "Reply" link at the bottom of the article. Click on
"show options" at the top of the article, then click on the
"Reply" at the bottom of the article headers." - Keith Thompson
Nov 14 '05 #13

P: n/a
CBFalconer wrote:
Another sorely missing regular is Richard Heathfield.


http://www.iso-9899.info/wiki/Usenet

has some interesting hints about Richard now using a new "nom de plume".

- Larry Weiss
Nov 14 '05 #14

P: n/a
ni*************@yahoo.com wrote:
do {
p = strtok('\0', ", ");


Apart from what the others have written, that is a misleading strtok()
call. The first parameter to strtok() is a char pointer, either to a
(modifiable) string, or a null pointer. Now it so happens that, through
a quirk of the C Standard, a null character is (unfortunately) also a
null pointer constant. However, to use it as such is misguided, and
could be confusing.

Richard
Nov 14 '05 #15

P: n/a
rl*@hoekstra-uitgeverij.nl (Richard Bos) writes:
ni*************@yahoo.com wrote:
do {
p = strtok('\0', ", ");


Apart from what the others have written, that is a misleading strtok()
call. The first parameter to strtok() is a char pointer, either to a
(modifiable) string, or a null pointer. Now it so happens that, through
a quirk of the C Standard, a null character is (unfortunately) also a
null pointer constant. However, to use it as such is misguided, and
could be confusing.


I'm sure you meant to say that a null character *constant* is a null
pointer constant. A character variable whose current value happens to
be '\0' won't work.
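A C sketch of the conventional spelling, using the sentence from the book example. The helper name split_tokens and its signature are invented for illustration. Continuation calls pass NULL, which tells the reader "null pointer" where '\0' suggests "character".

```c
#include <stddef.h>   /* NULL, size_t */
#include <string.h>

/* Hypothetical helper: splits a writable buffer on delims, storing up to
   max token pointers in out. Returns the number of tokens stored. */
size_t split_tokens(char *buf, const char *delims, char **out, size_t max)
{
    size_t n = 0;

    /* The first call names the buffer; later calls pass NULL to continue
       from where the previous call stopped. */
    for (char *tok = strtok(buf, delims); tok != NULL && n < max;
         tok = strtok(NULL, delims))
        out[n++] = tok;
    return n;
}
```

With char text[] = "The summer soldier, the sunshine patriot"; and the delimiters ", " this yields six tokens, the first being "The" and the last "patriot".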

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <*> <http://users.sdsc.edu/~kst>
We must do something. This is something. Therefore, we must do this.
Nov 14 '05 #16

P: n/a
Keith Thompson <ks***@mib.org> wrote:
rl*@hoekstra-uitgeverij.nl (Richard Bos) writes:
ni*************@yahoo.com wrote:
do {
p = strtok('\0', ", ");
Apart from what the others have written, that is a misleading strtok()
call. The first parameter to strtok() is a char pointer, either to a
(modifiable) string, or a null pointer. Now it so happens that, through
a quirk of the C Standard, a null character is (unfortunately) also a
null pointer constant. However, to use it as such is misguided, and
could be confusing.


I'm sure you meant to say that a null character *constant* is a null
pointer constant.


Yes, so I did.
A character variable whose current value happens to be '\0' won't work.


No more than an int object whose value is 0, no.

Richard
Nov 14 '05 #17

P: n/a
On 24 Jan 2005 18:37:09 -0800,
ni*************@yahoo.com <ni*************@yahoo.com> wrote
in Msg. <11**********************@z14g2000cwz.googlegroups .com>
In the interest of "public disclosure" I am reproducing the coding
sample that I see in my copy of "C: The Complete Reference"


[ mind-boggling piece of shi^h^h^hcode deleted ]

Is this really Schildt? I've never seen his book, but this is worse than I
would have ever dreamed.

--Daniel
Nov 14 '05 #18

P: n/a
On Thu, 27 Jan 2005 09:12:56 +0000, Daniel Haude wrote:
On 24 Jan 2005 18:37:09 -0800,
ni*************@yahoo.com <ni*************@yahoo.com> wrote
in Msg. <11**********************@z14g2000cwz.googlegroups .com>
In the interest of "public disclosure" I am reproducing the coding
sample that I see in my copy of "C: The Complete Reference"


[ mind-boggling piece of shi^h^h^hcode deleted ]

Is this really Schildt? I've never seen his book, but this is worse than I
would have ever dreamed.


Try a Google search on the word "bullschildt"

Lawrence
Nov 14 '05 #19

P: n/a
In article <sl*****************@kir.physnet.uni-hamburg.de>,
ha***@kir.physnet.uni-hamburg.de says...
On 24 Jan 2005 18:37:09 -0800,
ni*************@yahoo.com <ni*************@yahoo.com> wrote
in Msg. <11**********************@z14g2000cwz.googlegroups .com>
In the interest of "public disclosure" I am reproducing the coding
sample that I see in my copy of "C: The Complete Reference"


[ mind-boggling piece of shi^h^h^hcode deleted ]

Is this really Schildt? I've never seen his book, but this is worse than I
would have ever dreamed.


Yes, he's really that bad. I would have used the couple books of his I
bought (years ago before I realized the problem) to start fires with, but
now I save them in case somebody doesn't believe how bad he is. I keep
them out of sight, it's embarrassing to even have a copy, you
definitely don't want another programmer seeing you with one.

--
Randy Howard (2reply remove FOOBAR)
Nov 14 '05 #20

P: n/a
In article <ct********@library2.airnews.net>, lf*@airmail.net says...
CBFalconer wrote:
Another sorely missing regular is Richard Heathfield.


http://www.iso-9899.info/wiki/Usenet

has some interesting hints about Richard now using a new "nom de plume".


I have a guess, but the even more interesting question is *why*
would he change it now?

--
Randy Howard (2reply remove FOOBAR)
Nov 14 '05 #21

P: n/a
"Randy Howard" <ra*********@FOOverizonBAR.net> wrote in message
news:MP************************@news.verizon.net.. .
In article <ct********@library2.airnews.net>, lf*@airmail.net says...
CBFalconer wrote:
Another sorely missing regular is Richard Heathfield.
http://www.iso-9899.info/wiki/Usenet

has some interesting hints about Richard now using a new "nom de plume".

I have a guess, but the even more interesting question is *why*
would he change it now?


Pshaw.

Much more interesting is the reference:

"Dan Pop Dennis Ritchie (yes, the R of K&R)"

I never knew...

--
Mabden
Nov 14 '05 #22

P: n/a
In article <a6*******************@newssvr13.news.prodigy.com> ,
mabden@sbc_global.net says...
"Randy Howard" <ra*********@FOOverizonBAR.net> wrote in message
news:MP************************@news.verizon.net.. .
In article <ct********@library2.airnews.net>, lf*@airmail.net says...
CBFalconer wrote:
> Another sorely missing regular is Richard Heathfield.

http://www.iso-9899.info/wiki/Usenet

has some interesting hints about Richard now using a new "nom de plume".

I have a guess, but the even more interesting question is *why*
would he change it now?


Pshaw.

Much more interesting is the reference:

"Dan Pop Dennis Ritchie (yes, the R of K&R)"

I never knew...


Yes, I noticed the missing comma also. Being a wiki, maybe somebody
will fix it. Or maybe Dan likes it. :-)

Nov 14 '05 #23

P: n/a
Randy Howard <ra*********@FOOverizonBAR.net> writes:
In article <a6*******************@newssvr13.news.prodigy.com> ,
mabden@sbc_global.net says...

[...]
Much more interesting is the reference:

"Dan Pop Dennis Ritchie (yes, the R of K&R)"

I never knew...


Yes, I noticed the missing comma also. Being a wiki, maybe somebody
will fix it. Or maybe Dan likes it. :-)


I just fixed it.

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <*> <http://users.sdsc.edu/~kst>
We must do something. This is something. Therefore, we must do this.
Nov 14 '05 #24
