By using this site, you agree to our updated Privacy Policy and our Terms of Use. Manage your Cookies Settings.
446,234 Members | 1,889 Online
Bytes IT Community
+ Ask a Question
Need help? Post your question and get tips & solutions from a community of 446,234 IT Pros & Developers. It's quick & easy.

String tokenizer comments desired

P: n/a
The function in question follows:

vector<string>& tokenize(
const string& s,
vector<string>& v,
char delimiter=',' )
{
int delim_idx, begin_idx=0, len=s.length();

for( delim_idx=s.find_first_of(delimiter,begin_idx) ;
delim_idx >=0 && begin_idx < len ;
delim_idx=s.find_first_of(delimiter,begin_idx) ) {

v.push_back( s.substr(begin_idx,delim_idx-begin_idx) );
begin_idx=delim_idx+1;
}
if( begin_idx < len ) {
v.push_back( s.substr(begin_idx,len-begin_idx) );
}
return( v );
}

It seems to work well, but I'd appreciate suggestions regarding style,
technique, or subtle bugs I've missed. Thanks.

--
Christopher Benson-Manica | I *should* know what I'm talking about - if I
ataru(at)cyberspace.org | don't, I need to know. Flames welcome.
Jul 22 '05 #1
Share this Question
Share on Google+
5 Replies


P: n/a
Christopher Benson-Manica <at***@nospam.cyberspace.org> wrote in
news:c7**********@chessie.cirr.com:
The function in question follows:

vector<string>& tokenize(
const string& s,
vector<string>& v,
char delimiter=',' )
{
int delim_idx, begin_idx=0, len=s.length();
These are not of a strictly "proper" type... they should be of type
std::string::size_type.

for( delim_idx=s.find_first_of(delimiter,begin_idx) ;
delim_idx >=0 && begin_idx < len ;
delim_idx=s.find_first_of(delimiter,begin_idx) ) {
Why aren't you checking against std::string::npos, which is the return
value of find_first_of if the thing you're looking for isn't found?
v.push_back( s.substr(begin_idx,delim_idx-begin_idx) );
begin_idx=delim_idx+1;
}
if( begin_idx < len ) {
v.push_back( s.substr(begin_idx,len-begin_idx) );
}
return( v );
}

It seems to work well, but I'd appreciate suggestions regarding style,
technique, or subtle bugs I've missed. Thanks.


I'd probably use a while loop instead of a for loop. IMHO: it's a better
documentation style. To me a for loop implies that you're going to do
something a predermined number of times, where the while loop is just "do
something until some condition is met". So I'd probably do something
like (psudocode):

begin_idx = 0;
delim_idx = find;

while (delim_idx != npos)
{
push string fragment onto vector
advance the begin_idx
delim_idx = find;
}

push last fragment onto vector
return

Jul 22 '05 #2

P: n/a
Hi,

This is my own version:

#include <string>
#include <vector>

#include "tokenize.h"
void Tokenize(const string& buffer,
vector<string>& tokens,
const char delimiter)
{
int pos = 0, pos_ant = 0;

pos = buffer.find(delimiter, pos_ant);
while (pos != string::npos)
{
string token = buffer.substr(pos_ant, pos-pos_ant);
tokens.push_back(token);
pos_ant = pos+1;
pos = buffer.find(delimiter, pos_ant);
}

if (!buffer.empty())
{
tokens.push_back(buffer.substr(pos_ant, buffer.size()-1));
}
}

Regards,
Jose Luis

Christopher Benson-Manica <at***@nospam.cyberspace.org> wrote in message news:<c7**********@chessie.cirr.com>...
The function in question follows:

vector<string>& tokenize(
const string& s,
vector<string>& v,
char delimiter=',' )
{
int delim_idx, begin_idx=0, len=s.length();

for( delim_idx=s.find_first_of(delimiter,begin_idx) ;
delim_idx >=0 && begin_idx < len ;
delim_idx=s.find_first_of(delimiter,begin_idx) ) {

v.push_back( s.substr(begin_idx,delim_idx-begin_idx) );
begin_idx=delim_idx+1;
}
if( begin_idx < len ) {
v.push_back( s.substr(begin_idx,len-begin_idx) );
}
return( v );
}

It seems to work well, but I'd appreciate suggestions regarding style,
technique, or subtle bugs I've missed. Thanks.

Jul 22 '05 #3

P: n/a
Christopher Benson-Manica <at***@nospam.cyberspace.org> wrote in message news:<c7**********@chessie.cirr.com>...
The function in question follows:

vector<string>& tokenize(
const string& s,
vector<string>& v,
char delimiter=',' )
{
int delim_idx, begin_idx=0, len=s.length();

for( delim_idx=s.find_first_of(delimiter,begin_idx) ;
delim_idx >=0 && begin_idx < len ;
delim_idx=s.find_first_of(delimiter,begin_idx) ) {

v.push_back( s.substr(begin_idx,delim_idx-begin_idx) );
begin_idx=delim_idx+1;
}
if( begin_idx < len ) {
v.push_back( s.substr(begin_idx,len-begin_idx) );
}
return( v );
}

It seems to work well, but I'd appreciate suggestions regarding style,
technique, or subtle bugs I've missed. Thanks.


In addition to what Andre Kostur wrote, I'd suggest two extra
parameters. One is a template parameter; the function would
work just as well on any basic_string<CH>. The second is a
boolean parameter, whether empty strings are included in
the output. When the separator is ',', you probably want
to split "a,,b" in three strings. When ' ', and splitting
"a b", you probably want just two strings.

Regards,
Michiel Salters
Jul 22 '05 #4

P: n/a
jose luis fernandez diaz wrote:
Hi,

This is my own version:

#include <string>
#include <vector>

#include "tokenize.h"
void Tokenize(const string& buffer,
vector<string>& tokens,
const char delimiter)
{
int pos = 0, pos_ant = 0;

pos = buffer.find(delimiter, pos_ant);
while (pos != string::npos)
{
string token = buffer.substr(pos_ant, pos-pos_ant);
tokens.push_back(token);
pos_ant = pos+1;
pos = buffer.find(delimiter, pos_ant);
}

if (!buffer.empty())
{
tokens.push_back(buffer.substr(pos_ant, buffer.size()-1));
}
}


One better?

#include <string>

template <typename InsertIter>
void
tokenize(const std::string& buf,
const std::string& delim,
InsertIter ii)
{
std::string::size_type sp = 0; /* start position */
std::string::size_type ep = -1; /* end position */

do{
sp = buf.find_first_not_of(delim, ep+1);
ep = buf.find_first_of(delim, sp);
if(sp != ep){
if(ep == buf.npos)
ep = buf.length();
*ii++ = buf.substr(sp, ep-sp);
}
}while(sp != buf.npos);
}

/david

--
"As a scientist, Throckmorton knew that if he were ever to break wind in
the echo chamber, he would never hear the end of it."

Jul 22 '05 #5

P: n/a
Michiel Salters <Mi*************@logicacmg.com> spoke thus:
In addition to what Andre Kostur wrote, I'd suggest two extra
parameters. One is a template parameter; the function would
work just as well on any basic_string<CH>. The second is a
boolean parameter, whether empty strings are included in
the output. When the separator is ',', you probably want
to split "a,,b" in three strings. When ' ', and splitting
"a b", you probably want just two strings.


The empty strings argument is a great idea - thanks!

--
Christopher Benson-Manica | I *should* know what I'm talking about - if I
ataru(at)cyberspace.org | don't, I need to know. Flames welcome.
Jul 22 '05 #6

This discussion thread is closed

Replies have been disabled for this discussion.