
String tokens/parsing

(if this is a FAQ, I apologize for not finding it)

I have a C-style string that I'd like to cleanly separate into tokens
(based on the '.' character) and then convert those tokens to unsigned
integers. What is the best standard(!) C++ way to accomplish this?

--
Christopher Benson-Manica | I *should* know what I'm talking about - if I
ataru(at)cyberspace.org | don't, I need to know. Flames welcome.
Jul 22 '05 #1
10 Replies


Christopher Benson-Manica wrote:
(if this is a FAQ, I apologize for not finding it)

I have a C-style string that I'd like to cleanly separate into tokens
(based on the '.' character) and then convert those tokens to unsigned
integers. What is the best standard(!) C++ way to accomplish this?


The strtok function will find tokens in the string and
modify your string.

Perhaps strchr to find the '.'.

Another function is sscanf. I've heard that you can set
the format descriptor string so that it parses correctly
(which may be a difficult task). I'm sure if you post
to news:comp.lang.c, Dan Pop will show the way.
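
For a fixed number of fields, the sscanf route mentioned above can be sketched roughly as follows. This is only a sketch under assumptions: it expects exactly three fields, and parse_three is a hypothetical name, not something from this thread.

```cpp
#include <cstdio>

// Hypothetical helper: parses exactly three '.'-separated unsigned fields,
// e.g. "10.20.30". Returns true only if all three fields were converted.
bool parse_three(const char* s, unsigned& a, unsigned& b, unsigned& c)
{
    // %u skips leading whitespace and reads an unsigned decimal integer;
    // each literal '.' in the format must match a '.' in the input.
    return std::sscanf(s, "%u.%u.%u", &a, &b, &c) == 3;
}
```

Note that this check does not notice trailing characters after the third field, so stricter validation would need more work.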

As for C++, you may want to convert to a std::string
and use the "find" methods and maybe a stringstream
for converting to an int.
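
The std::string approach described above might look something like this. It is a minimal sketch, not a definitive implementation; split_to_uints is a hypothetical name, and rejecting empty or non-digit pieces is an assumption about what "cleanly" should mean here.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: splits `text` on '.' using std::string::find and
// converts each piece to unsigned with a string stream.
// Returns false if any piece is empty or contains a non-digit.
bool split_to_uints(const char* text, std::vector<unsigned>& out)
{
    const std::string s(text);
    std::string::size_type start = 0;
    for (;;) {
        const std::string::size_type dot = s.find('.', start);
        // When there is no further '.', take everything up to the end.
        const std::string piece = s.substr(
            start, dot == std::string::npos ? std::string::npos : dot - start);
        if (piece.empty() ||
            piece.find_first_not_of("0123456789") != std::string::npos)
            return false;
        std::istringstream in(piece);
        unsigned value = 0;
        if (!(in >> value))  // stream extraction failed
            return false;
        out.push_back(value);
        if (dot == std::string::npos)
            break;
        start = dot + 1;     // continue after the '.'
    }
    return true;
}
```

Calling split_to_uints(argv[1], v) would fill v with one value per field, leaving the original C-style string untouched.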

--
Thomas Matthews

C++ newsgroup welcome message:
http://www.slack.net/~shiva/welcome.txt
C++ Faq: http://www.parashift.com/c++-faq-lite
C Faq: http://www.eskimo.com/~scs/c-faq/top.html
alt.comp.lang.learn.c-c++ faq:
http://www.raos.demon.uk/acllc-c++/faq.html
Other sites:
http://www.josuttis.com -- C++ STL Library book

Jul 22 '05 #2

Thomas Matthews <Th****************************@sbcglobal.net> spoke thus:
The strtok function will find tokens in the string and
modify your string.

Perhaps strchr to find the '.'.

Another function is sscanf. I've heard that you can set
the format descriptor string so that it parses correctly
(which may be a difficult task). I'm sure if you post
to news:comp.lang.c, Dan Pop will show the way.


Believe me, I'm perfectly capable of doing this with C, and am no
stranger to comp.lang.c. I posted here specifically because I'm
interested in improving on the C methods, if that is in fact possible.
The original C code (we're stuck in a "C-style-C++" paradigm,
unfortunately) strikes me as being distinctively hack-y.

--
Christopher Benson-Manica | I *should* know what I'm talking about - if I
ataru(at)cyberspace.org | don't, I need to know. Flames welcome.
Jul 22 '05 #3

Christopher Benson-Manica wrote:
Believe me, I'm perfectly capable of doing this with C, and am no
stranger to comp.lang.c. I posted here specifically because I'm
interested in improving on the C methods, if that is in fact possible.
The original C code (we're stuck in a "C-style-C++" paradigm,
unfortunately) strikes me as being distinctively hack-y.

Then you need to be more specific about what you already have, and what
your requirements are. Do you object to converting the C-style strings
to std::strings or std::stringstreams? What limitations does your
"C-style-C++" paradigm impose? To what extent can you deviate from the C
standard library?

You have a very vague question.


Brian Rodenborn
Jul 22 '05 #4

Default User <fi********@boeing.com.invalid> spoke thus:
Then you need to be more specific about what you already have,

The original code looked like the following unholy mess (which I did
not write):

// unsigned int typedef'ed as uint
// assume appropriate #includes

char sTemp[64];
uint uDeptTime, uArrvTime;
if( argc>=2 ) {
    if( sameas(argv[1], "") ) { // sameas ~ strcmp() with flavor
        // error
    }
    strncpy( sTemp, argv[1], sizeof(sTemp) );
    cp=strchr(sTemp, '.');
    if( cp == NULL ) {
        // error
    }
    uint const lene=strlen(cp);
    uint const lenb=strlen(sTemp);
    uint const lenr=lenb-lene;
    sTemp[lenr]='\0';
    uDeptTime=(uint)atoi(sTemp);
    if( cp+1 ) {
        uArrvTime=(uint)atoi(cp+1);
    }
}

I wrote the following as a first approximation to a decent solution:

char *cp;
vector<uint> v;
char sTemp[64];
uint uDeptTime, uArrvTime, uMaxGT; // new variable
for( cp=argv[1] ; (cp=strchr(cp,'.')) != NULL ; ) {
    v.push_back( atoi(cp++) ); // atoi() wraps the "standard" atoi, but
                               // I don't know the details
}
if( v.size() < 3 ) {
    // error
}
uDeptTime=v[0];
uArrvTime=v[1];
uMaxGT=v[2];

and what your requirements are.

Straightline code only. (no additional class declarations)

Do you object to converting the C-style strings to std::strings or
std::stringstreams?

I'd love to use std::strings and/or std::stringstreams if they offer a
cleaner (not necessarily more "efficient") solution.

What limitations does your "C-style-C++" paradigm impose?

The STL is never used in our code, and vectors in particular seem to
be frowned upon. std::stringstreams might be pushing the envelope.

You have a very vague question.


Better?

--
Christopher Benson-Manica | I *should* know what I'm talking about - if I
ataru(at)cyberspace.org | don't, I need to know. Flames welcome.
Jul 22 '05 #5

Chris,

Christopher Benson-Manica wrote:
Default User <fi********@boeing.com.invalid> spoke thus:

Then you need to be more specific about what you already have,


[snip]
<sigh>

Sometimes those responding to messages in this group can be a little...
well... pedantic. If you are looking for a decent C++ implementation of
a tokenizer, have a look at the boost.org site where they have a decent
tokenizer.

The Boost web site contains many useful template libraries that
complement the STL. As it turns out, many of the STL authors contribute
to this site. The way they put it, many of their submissions didn't make
it into the standard, but are nonetheless useful and worthy of use.

Jul 22 '05 #6

Evan Carew <te*******@pobox.com> spoke thus:
Sometimes those responding to messages in this group can be a little...
well... pedantic.

And I wouldn't want it any other way :)

If you are looking for a decent C++ implementation of
a tokenizer, have a look at the boost.org site where they have a decent
tokenizer.


Unfortunately, boost is out of the question here. I'm working at a
company where any code not written in-house (i.e., by my boss) is
considered suspect, so in effect I'm trying to sneak some "real" C++
in the code here and there below the radar. There are times where
std::strings can really make life simple, so I toss them in
occasionally, but for the most part C-style strings rule the day.

--
Christopher Benson-Manica | I *should* know what I'm talking about - if I
ataru(at)cyberspace.org | don't, I need to know. Flames welcome.
Jul 22 '05 #7

Christopher Benson-Manica <at***@nospam.cyberspace.org> spoke thus:
I wrote the following as a first approximation to a decent solution:

And then (now) realized that it doesn't work...

for( cp=argv[1] ; (cp=strchr(cp,'.')) != NULL ; ) {
    v.push_back( atoi(cp++) ); // atoi() wraps the "standard" atoi, but
                               // I don't know the details
}


*sigh*

for( cp=argv[1] ; cp && cp++ ; cp=strchr(cp,'.') ) {
    v.push_back( atoi(cp) );
}
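
One thing to watch in the loop above: the condition increments cp before the first conversion, so the first field is read starting at its second character. A hedged sketch of the same pointer-walking idea using strtoul instead of atoi follows; parse_fields is a hypothetical name, and treating empty or malformed fields as errors is an assumption.

```cpp
#include <cstdlib>
#include <vector>

// Hypothetical helper: converts each '.'-separated field of `text`.
// Unlike atoi, strtoul reports where it stopped, so empty or malformed
// fields can be detected. Returns false on the first bad field.
bool parse_fields(const char* text, std::vector<unsigned>& out)
{
    const char* cp = text;
    for (;;) {
        char* end = 0;
        const unsigned long value = std::strtoul(cp, &end, 10);
        // Fail if no digits were consumed, or if the field is followed by
        // something other than '.' or the end of the string.
        if (end == cp || (*end != '.' && *end != '\0'))
            return false;
        out.push_back(static_cast<unsigned>(value));
        if (*end == '\0')
            return true;
        cp = end + 1;  // step over the '.'
    }
}
```

strtoul still accepts leading whitespace and a sign, so input such as "-5" would need an extra check if it must be rejected.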

--
Christopher Benson-Manica | I *should* know what I'm talking about - if I
ataru(at)cyberspace.org | don't, I need to know. Flames welcome.
Jul 22 '05 #8

Christopher Benson-Manica wrote:

Default User <fi********@boeing.com.invalid> spoke thus:
Then you need to be more specific about what you already have,


and what your requirements are.


Straightline code only. (no additional class declarations)
Do you object to converting the C-style strings to std::strings or
std::stringstreams?


I'd love to use std::strings and/or std::stringstreams if they offer a
cleaner (not necessarily more "efficient") solution.


Ok, Donovan Rebbechi previously posted this:

The simplest way would be to use the getline() function and set the
optional field separator argument to '.'

std::istringstream in(mystring);

while (std::getline(in, mystring, '.'))
{
    stringlist.push_back(mystring);
}
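
To get from those string tokens to unsigned integers, the getline() loop can be extended along these lines. This is a sketch only; getline_to_uints is a hypothetical name, and rejecting tokens with trailing junk is an assumption about the desired behaviour.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: reads '.'-delimited tokens with std::getline and
// converts each one to unsigned via a second string stream.
bool getline_to_uints(const std::string& text, std::vector<unsigned>& out)
{
    std::istringstream in(text);
    std::string token;
    while (std::getline(in, token, '.')) {
        std::istringstream conv(token);
        unsigned value = 0;
        char leftover;
        // Require a number, and no further non-whitespace after it.
        if (!(conv >> value) || (conv >> leftover))
            return false;
        out.push_back(value);
    }
    return true;
}
```

A C-style string such as argv[1] converts implicitly to the std::string parameter, so no explicit copy is needed at the call site.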
What limitations does your "C-style-C++" paradigm impose?


The STL is never used in our code, and vectors in particular seem to
be frowned upon. std::stringstreams might be pushing the envelope.


My usual tool for this sort of thing is the Explode function I wrote for
string parsing. Unfortunately, it returns a vector of strings. I'll
present it anyway, you may be able to get some value from it. Or not.

#include <vector>
#include <string>

// breaks apart a string into substrings separated by a character string
// does not use a strtok() style list of separator characters
// returns a vector of std::strings

std::vector<std::string> Explode (const std::string &inString,
                                  const std::string &separator)
{
    std::vector<std::string> returnVector;
    std::string::size_type start = 0;
    std::string::size_type end = 0;

    while ((end=inString.find (separator, start)) != std::string::npos)
    {
        returnVector.push_back (inString.substr (start, end-start));
        start = end+separator.size();
    }

    returnVector.push_back (inString.substr (start));

    return returnVector;
}
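
For the question asked in this thread, Explode() could be combined with a numeric conversion roughly as below. Explode is repeated so the sketch compiles on its own; ExplodeToUints is a hypothetical name, and the choice of strtoul is an assumption.

```cpp
#include <cstdlib>
#include <string>
#include <vector>

// Explode as posted above, repeated so this sketch is self-contained.
std::vector<std::string> Explode(const std::string& inString,
                                 const std::string& separator)
{
    std::vector<std::string> returnVector;
    std::string::size_type start = 0;
    std::string::size_type end = 0;
    while ((end = inString.find(separator, start)) != std::string::npos) {
        returnVector.push_back(inString.substr(start, end - start));
        start = end + separator.size();
    }
    returnVector.push_back(inString.substr(start));
    return returnVector;
}

// Hypothetical helper: splits on "." and converts every piece with strtoul.
// Pieces that do not start with a number convert to 0.
std::vector<unsigned> ExplodeToUints(const std::string& text)
{
    const std::vector<std::string> parts = Explode(text, ".");
    std::vector<unsigned> values;
    for (std::vector<std::string>::size_type i = 0; i < parts.size(); ++i)
        values.push_back(
            static_cast<unsigned>(std::strtoul(parts[i].c_str(), 0, 10)));
    return values;
}
```

Because non-numeric pieces silently become 0 here, real code would probably want to check each piece before converting it.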
You have a very vague question.


Better?


Much.

Brian Rodenborn
Jul 22 '05 #9

Evan Carew wrote:
Sometimes those responding to messages in this group can be a little...
well... pedantic.

I'm not sure if my questions were pedantic. Had he presented the problem
cleanly, then one could try to answer. As he had some ill-defined
limits, I thought it prudent to ask before presenting solutions that may
not suit him. For instance, in light of his followup, something like my
Explode() function I trot out now and then wouldn't do, because it
returns a vector of strings.

If you are looking for a decent C++ implementation of
a tokenizer, have a look at the boost.org site where they have a decent
tokenizer.


Considering that he specified a "C-style C++ paradigm" I doubt Boost
will be in his solution set. Which is exactly why I asked.

Brian Rodenborn
Jul 22 '05 #10

Christopher Benson-Manica wrote:

(if this is a FAQ, I apologize for not finding it)

I have a C-style string that I'd like to cleanly separate into tokens
(based on the '.' character) and then convert those tokens to unsigned
integers. What is the best standard(!) C++ way to accomplish this?

--
Christopher Benson-Manica | I *should* know what I'm talking about - if I
ataru(at)cyberspace.org | don't, I need to know. Flames welcome.


Here is some code I wrote for this following a similar discussion:

http://groups.google.com/groups?hl=e...1%40nomail.com

#include <string>

using namespace std;

// Splits buf on any character in delim and writes each non-empty token to ii.
template <typename InsertIter>
void
tokenize(const string& buf, const string& delim, InsertIter& ii)
{
    string::size_type sp(0);  /* start position */
    string::size_type ep(-1); /* end position; -1 wraps to npos, so ep+1 is 0 */

    do{
        sp = buf.find_first_not_of(delim, ep+1);
        ep = buf.find_first_of(delim, sp);
        if(sp != ep){
            if(ep == buf.npos)
                ep = buf.length();
            *ii++ = buf.substr(sp, ep-sp);
        }
    }while(sp != buf.npos);
}

You fill the delim string and then do your I/O in a loop similar to the
following:

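// needs <algorithm>, <deque>, <iostream>, <iterator> and <string>
string buf;
string delim("."); // '.' is the separator asked about in this thread
deque<string> tokens;
while(std::getline(cin, buf) && !cin.fail()){
    insert_iterator<deque<string> > ii(tokens, tokens.begin());
    tokenize(buf, delim, ii);
    if(tokens.size() > 0){
        copy(tokens.begin(), tokens.end(),
             ostream_iterator<string>(cout, "\n"));
        tokens.clear();
    }
}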

The referenced discussion also indicates a relatively neater method
using locale and istringstream, but I don't have the implementation
handy.
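
One common way to do this with a locale and an istringstream is to install a ctype facet that classifies '.' as whitespace, so that operator>> skips it between numbers. The sketch below shows that idiom; it is not necessarily the implementation from the referenced discussion, and DotIsSpace and read_uints are hypothetical names.

```cpp
#include <locale>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical facet: the classic character table with '.' reclassified
// as whitespace.
struct DotIsSpace : std::ctype<char> {
    DotIsSpace() : std::ctype<char>(make_table()) {}

    static const mask* make_table()
    {
        // Copy the "C" locale table once, then mark '.' as a space character.
        static std::vector<mask> table(classic_table(),
                                       classic_table() + table_size);
        table[static_cast<unsigned char>('.')] = space;
        return &table[0];
    }
};

// Hypothetical helper: extracts unsigned values until extraction fails.
std::vector<unsigned> read_uints(const std::string& text)
{
    std::istringstream in(text);
    // The locale takes ownership of the facet and deletes it when done.
    in.imbue(std::locale(in.getloc(), new DotIsSpace));
    std::vector<unsigned> values;
    unsigned value;
    while (in >> value)
        values.push_back(value);
    return values;
}
```

The extraction loop simply stops at the first thing that is not a number, so this variant reports nothing about malformed input on its own.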

HTH,

/david

--
Andre, a simple peasant, had only one thing on his mind as he crept
along the East wall: 'Andre, creep... Andre, creep... Andre, creep.'
-- unknown
Jul 22 '05 #11

This discussion thread is closed
