String tokens/parsing

(if this is a FAQ, I apologize for not finding it)

I have a C-style string that I'd like to cleanly separate into tokens
(based on the '.' character) and then convert those tokens to unsigned
integers. What is the best standard(!) C++ way to accomplish this?

--
Christopher Benson-Manica | I *should* know what I'm talking about - if I
ataru(at)cyberspace.org | don't, I need to know. Flames welcome.
Jul 22 '05 #1
Christopher Benson-Manica wrote:
(if this is a FAQ, I apologize for not finding it)

I have a C-style string that I'd like to cleanly separate into tokens
(based on the '.' character) and then convert those tokens to unsigned
integers. What is the best standard(!) C++ way to accomplish this?


The strtok function will find tokens in the string and
modify your string.

Perhaps strchr to find the '.'.

Another function is sscanf. I've heard that you can set
the format descriptor string so that it parses correctly.
(which may be a difficult task). I'm sure if you post
to news:comp.lang.c, Dan Pop will show the way.
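
For reference, a minimal sketch of the sscanf route, assuming the input always has exactly three '.'-separated fields (that count comes from the later posts in this thread). The helper name parse_three_fields is made up for illustration.

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical helper: parses exactly three '.'-separated unsigned fields.
// Returns true only if all three fields were converted.
bool parse_three_fields(const char* text, unsigned& a, unsigned& b, unsigned& c)
{
    assert(text != 0);  // caller must pass a valid C string
    // sscanf returns the number of fields it managed to assign.
    return std::sscanf(text, "%u.%u.%u", &a, &b, &c) == 3;
}
```

Note that this form ignores anything after the third field, so stricter validation would need extra checks.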

As for C++, you may want to convert to a std::string
and use the "find" methods and maybe a stringstream
for converting to an int.
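
A rough sketch of that idea, assuming it is acceptable to copy the C-style string into a std::string and collect the results in a std::vector. The name split_with_find is hypothetical, and pieces that do not parse as numbers are simply skipped here, which may or may not be the behaviour you want.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: splits text on '.' with std::string::find and converts
// each piece with an istringstream. Pieces that are not numbers are skipped.
std::vector<unsigned> split_with_find(const char* text)
{
    assert(text != 0);  // caller must pass a valid C string
    const std::string s(text);
    std::vector<unsigned> values;
    std::string::size_type start = 0;
    for (;;) {
        const std::string::size_type dot = s.find('.', start);
        // substr clamps its count, so when dot is npos this takes the rest.
        const std::string piece = s.substr(start, dot - start);
        std::istringstream in(piece);
        unsigned value;
        if (in >> value)
            values.push_back(value);
        if (dot == std::string::npos)
            break;
        start = dot + 1;  // continue after the separator
    }
    return values;
}
```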

--
Thomas Matthews

C++ newsgroup welcome message:
http://www.slack.net/~shiva/welcome.txt
C++ Faq: http://www.parashift.com/c++-faq-lite
C Faq: http://www.eskimo.com/~scs/c-faq/top.html
alt.comp.lang.learn.c-c++ faq:
http://www.raos.demon.uk/acllc-c++/faq.html
Other sites:
http://www.josuttis.com -- C++ STL Library book

Jul 22 '05 #2
Thomas Matthews <Th****************************@sbcglobal.net> spoke thus:
The strtok function will find tokens in the string and
modify your string.
Perhaps strchr to find the '.'.
Another function is sscanf. I've heard that you can set
the format descriptor string so that it parses correctly.
(which may be a difficult task). I'm sure if you post
to news:comp.lang.c, Dan Pop will show the way.


Believe me, I'm perfectly capable of doing this with C, and am no
stranger to comp.lang.c. I posted here specifically because I'm
interested in improving on the C methods, if that is in fact possible.
The original C code (we're stuck in a "C-style-C++" paradigm,
unfortunately) strikes me as being distinctively hack-y.

--
Christopher Benson-Manica | I *should* know what I'm talking about - if I
ataru(at)cyberspace.org | don't, I need to know. Flames welcome.
Jul 22 '05 #3
Christopher Benson-Manica wrote:
Believe me, I'm perfectly capable of doing this with C, and am no
stranger to comp.lang.c. I posted here specifically because I'm
interested in improving on the C methods, if that is in fact possible.
The original C code (we're stuck in a "C-style-C++" paradigm,
unfortunately) strikes me as being distinctively hack-y.

Then you need to be more specific about what you already have, and what
your requirements are. Do you object to converting the C-style strings
to std::strings or std::stringstreams? What limitations does your
"C-style-C++" paradigm impose? To what extent can you deviate from the C
standard library?

You have a very vague question.


Brian Rodenborn
Jul 22 '05 #4
Default User <fi********@boeing.com.invalid> spoke thus:
Then you need to be more specific about what you already have,
The original code looked like the following unholy mess (which I did
not write):

// unsigned int typedef'ed as uint
// assume appropriate #includes

char sTemp[64];
uint uDeptTime, uArrvTime;
if( argc>=2 ) {
    if( sameas(argv[1], "") ) { // sameas ~ strcmp() with flavor
        // error
    }
    strncpy( sTemp, argv[1], sizeof(sTemp) );
    cp=strchr(sTemp, '.');
    if( cp == NULL ) {
        // error
    }
    uint const lene=strlen(cp);
    uint const lenb=strlen(sTemp);
    uint const lenr=lenb-lene;
    sTemp[lenr]='\0';
    uDeptTime=(uint)atoi(sTemp);
    if( cp+1 ) {
        uArrvTime=(uint)atoi(cp+1);
    }
}

I wrote the following as a first approximation to a decent solution:

char *cp;
vector<uint> v;
char sTemp[64];
uint uDeptTime, uArrvTime, uMaxGT; // new variable
for( cp=argv[1] ; (cp=strchr(cp,'.')) != NULL ; ) {
    v.push_back( atoi(cp++) ); // atoi() wraps the "standard" atoi, but
                               // I don't know the details
}
if( v.size() < 3 ) {
    // error
}
uDeptTime=v[0];
uArrvTime=v[1];
uMaxGT=v[2];
and what your requirements are.
Straightline code only. (no additional class declarations)
Do you object to converting the C-style strings to std::strings or
std::stringstreams?
I'd love to use std::strings and/or std::stringstreams if they offer a
cleaner (not necessarily more "efficient") solution.
What limitations does your "C-style-C++" paradigm impose?
The STL is never used in our code, and vectors in particular seem to
be frowned upon. std::stringstreams might be pushing the envelope.
You have a very vague question.


Better?

--
Christopher Benson-Manica | I *should* know what I'm talking about - if I
ataru(at)cyberspace.org | don't, I need to know. Flames welcome.
Jul 22 '05 #5
Chris,

Christopher Benson-Manica wrote:
Default User <fi********@boeing.com.invalid> spoke thus:

Then you need to be more specific about what you already have,


[snip]
<sigh>

Sometimes those responding to messages in this group can be a little...
well... pedantic. If you are looking for a decent C++ implementation of
a tokenizer, have a look at the boost.org site where they have a decent
tokenizer.

The Boost web site contains many useful template libraries that
complement the STL. As it turns out, many of the STL authors contribute
to this site. The way they put it, many of their submissions didn't make
it into the standard, but are none-the-less useful and worthy of use.

Jul 22 '05 #6
Evan Carew <te*******@pobox.com> spoke thus:
Sometimes those responding to messages in this group can be a little...
well... pedantic.
And I wouldn't want it any other way :)
If you are looking for a decent C++ implementation of
a tokenizer, have a look at the boost.org site where they have a decent
tokenizer.


Unfortunately, boost is out of the question here. I'm working at a
company where any code not written in-house (i.e., by my boss) is
considered suspect, so in effect I'm trying to sneak some "real" C++
in the code here and there below the radar. There are times where
std::strings can really make life simple, so I toss them in
occasionally, but for the most part C-style strings rule the day.

--
Christopher Benson-Manica | I *should* know what I'm talking about - if I
ataru(at)cyberspace.org | don't, I need to know. Flames welcome.
Jul 22 '05 #7
Christopher Benson-Manica <at***@nospam.cyberspace.org> spoke thus:
I wrote the following as a first approximation to a decent solution:
And then (now) realized that it doesn't work...
for( cp=argv[1] ; (cp=strchr(cp,'.')) != NULL ; ) {
    v.push_back( atoi(cp++) ); // atoi() wraps the "standard" atoi, but
                               // I don't know the details
}


*sigh*

for( cp=argv[1] ; cp && cp++ ; cp=strchr(cp,'.') ) {
    v.push_back( atoi(cp) );
}
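
As far as I can tell, the loop above still advances cp before the first atoi() call, so the first field loses its leading character (for "12.34" it would yield 2 and then 34). A possible variant is sketched below. It assumes the standard strtoul in place of the in-house atoi wrapper, and the name parse_fields is invented for the example.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>
#include <vector>

// Hypothetical helper: walks the string with strchr and converts each field
// with strtoul. The first field is read from the very first character.
std::vector<unsigned> parse_fields(const char* text)
{
    assert(text != 0);  // caller must pass a valid C string
    std::vector<unsigned> values;
    for (const char* cp = text; cp != 0; ) {
        values.push_back(static_cast<unsigned>(std::strtoul(cp, 0, 10)));
        cp = std::strchr(cp, '.');  // find the next separator, if any
        if (cp != 0)
            ++cp;                   // step past the '.'
    }
    return values;
}
```

An empty field converts to 0 here, so the caller would still want the v.size() check from the earlier post.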

--
Christopher Benson-Manica | I *should* know what I'm talking about - if I
ataru(at)cyberspace.org | don't, I need to know. Flames welcome.
Jul 22 '05 #8
Christopher Benson-Manica wrote:

Default User <fi********@boeing.com.invalid> spoke thus:
Then you need to be more specific about what you already have,


The original code looked like the following unholy mess (which I did
not write):

// unsigned int typedef'ed as uint
// assume appropriate #includes

char sTemp[64];
uint uDeptTime, uArrvTime;
if( argc>=2 ) {
    if( sameas(argv[1], "") ) { // sameas ~ strcmp() with flavor
        // error
    }
    strncpy( sTemp, argv[1], sizeof(sTemp) );
    cp=strchr(sTemp, '.');
    if( cp == NULL ) {
        // error
    }
    uint const lene=strlen(cp);
    uint const lenb=strlen(sTemp);
    uint const lenr=lenb-lene;
    sTemp[lenr]='\0';
    uDeptTime=(uint)atoi(sTemp);
    if( cp+1 ) {
        uArrvTime=(uint)atoi(cp+1);
    }
}

I wrote the following as a first approximation to a decent solution:

char *cp;
vector<uint> v;
char sTemp[64];
uint uDeptTime, uArrvTime, uMaxGT; // new variable
for( cp=argv[1] ; (cp=strchr(cp,'.')) != NULL ; ) {
    v.push_back( atoi(cp++) ); // atoi() wraps the "standard" atoi, but
                               // I don't know the details
}
if( v.size() < 3 ) {
    // error
}
uDeptTime=v[0];
uArrvTime=v[1];
uMaxGT=v[2];
and what your requirements are.


Straightline code only. (no additional class declarations)
Do you object to converting the C-style strings to std::strings or
std::stringstreams?


I'd love to use std::strings and/or std::stringstreams if they offer a
cleaner (not necessarily more "efficient") solution.


Ok, Donovan Rebbechi previously posted this:

The simplest way would be to use the getline() function and set the
optional field separator argument to '.'

std::istringstream in(mystring);

while (std::getline(in, mystring, '.'))
{
    stringlist.push_back(mystring);
}
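
To get from there to unsigned integers, something along these lines might do. It is only a sketch: the function name split_with_getline is hypothetical, it assumes a std::vector is tolerable for holding the results, and fields that fail to convert are silently dropped.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: reads '.'-delimited tokens with std::getline and
// converts each one through a second istringstream.
std::vector<unsigned> split_with_getline(const char* text)
{
    assert(text != 0);  // caller must pass a valid C string
    const std::string s(text);
    std::istringstream in(s);
    std::vector<unsigned> values;
    std::string token;
    while (std::getline(in, token, '.')) {
        std::istringstream field(token);
        unsigned value;
        if (field >> value)  // skip empty or non-numeric tokens
            values.push_back(value);
    }
    return values;
}
```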
What limitations does your "C-style-C++" paradigm impose?


The STL is never used in our code, and vectors in particular seem to
be frowned upon. std::stringstreams might be pushing the envelope.


My usual tool for this sort of thing is the Explode function I wrote for
string parsing. Unfortunately, it returns a vector of strings. I'll
present it anyway, you may be able to get some value from it. Or not.

#include <vector>
#include <string>

// breaks apart a string into substrings separated by a character string
// does not use a strtok() style list of separator characters
// returns a vector of std::strings

std::vector<std::string> Explode (const std::string &inString,
                                  const std::string &separator)
{
    std::vector<std::string> returnVector;
    std::string::size_type start = 0;
    std::string::size_type end = 0;

    while ((end=inString.find (separator, start)) != std::string::npos)
    {
        returnVector.push_back (inString.substr (start, end-start));
        start = end+separator.size();
    }

    returnVector.push_back (inString.substr (start));

    return returnVector;
}
You have a very vague question.


Better?


Much.

Brian Rodenborn
Jul 22 '05 #9
Evan Carew wrote:
Sometimes those responding to messages in this group can be a little...
well... pedantic.
I'm not sure if my questions were pedantic. Had he presented the problem
cleanly, then one could try to answer. As he had some ill-defined
limits, I thought it prudent to ask before presenting solutions that may
not suit him. For instance, in light of his followup, something like my
Explode() function I trot out now and then wouldn't do, because it
returns a vector of strings.
If you are looking for a decent C++ implementation of
a tokenizer, have a look at the boost.org site where they have a decent
tokenizer.


Considering that he specified a "C-style C++ paradigm" I doubt Boost
will be in his solution set. Which is exactly why I asked.

Brian Rodenborn
Jul 22 '05 #10
Christopher Benson-Manica wrote:

(if this is a FAQ, I apologize for not finding it)

I have a C-style string that I'd like to cleanly separate into tokens
(based on the '.' character) and then convert those tokens to unsigned
integers. What is the best standard(!) C++ way to accomplish this?

--
Christopher Benson-Manica | I *should* know what I'm talking about - if I
ataru(at)cyberspace.org | don't, I need to know. Flames welcome.


Here is some code I wrote for this following a similar discussion:

http://groups.google.com/groups?hl=e...1%40nomail.com

#include <algorithm>
#include <deque>
#include <iostream>
#include <iterator>
#include <string>

using namespace std;

template <typename InsertIter>
void
tokenize(const string& buf, const string& delim, InsertIter& ii)
{
    string::size_type sp(0);  /* start position */
    string::size_type ep(-1); /* end position */

    do{
        sp = buf.find_first_not_of(delim, ep+1);
        ep = buf.find_first_of(delim, sp);
        if(sp != ep){
            if(ep == buf.npos)
                ep = buf.length();
            *ii++ = buf.substr(sp, ep-sp);
        }
    }while(sp != buf.npos);
}

You fill the delim string and then do your I/O in a loop similar to the
following:

string buf;
string delim(".");  // set of delimiter characters; "." for this question
deque<string> tokens;
while(std::getline(cin, buf) && !cin.fail()){
    insert_iterator<deque<string> > ii(tokens, tokens.begin());
    tokenize(buf, delim, ii);
    if(tokens.size() > 0){
        copy(tokens.begin(), tokens.end(),
             ostream_iterator<string>(cout, "\n"));
        tokens.clear();
    }
}

The referenced discussion also indicates a relatively neater method
using locale and istringstream, but I don't have the implementation
handy.
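
One way such a locale-based approach could look is sketched below. This is not the implementation from the referenced discussion, only a guess at the general technique: a custom std::ctype<char> facet that classifies '.' as whitespace, so that operator>> skips it between numbers. The names dot_is_space and split_with_locale are invented for the example.

```cpp
#include <cassert>
#include <locale>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical facet: copies the classic character table and additionally
// marks '.' as whitespace, so formatted extraction skips over it.
struct dot_is_space : std::ctype<char> {
    dot_is_space() : std::ctype<char>(make_table()) {}

    static const mask* make_table()
    {
        static std::vector<mask> table(classic_table(),
                                       classic_table() + table_size);
        table['.'] |= space;
        return &table[0];
    }
};

// Hypothetical helper: extracts every unsigned value from a dotted string.
std::vector<unsigned> split_with_locale(const char* text)
{
    assert(text != 0);  // caller must pass a valid C string
    const std::string s(text);
    std::istringstream in(s);
    // The locale takes ownership of the facet and deletes it when done.
    in.imbue(std::locale(in.getloc(), new dot_is_space));
    std::vector<unsigned> values;
    unsigned value;
    while (in >> value)
        values.push_back(value);
    return values;
}
```

The extraction loop stops at the first field that is not a number, so malformed input yields a shorter vector instead of an error.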

HTH,

/david

--
Andre, a simple peasant, had only one thing on his mind as he crept
along the East wall: 'Andre, creep... Andre, creep... Andre, creep.'
-- unknown
Jul 22 '05 #11
