Splitting a string into an array of words

Simon

Well, the title's pretty descriptive; how would I be able to take a
line of input like this:

getline(cin,mostrecentline);

And split it into a (flexible) array of strings? For example, "do this action"
would go to:

item 0: do
item 1: this
item 2: action

Thanks in advance,
Simon

Jul 19 '06 #1

Daniel T.

In article <11**********************@m79g2000cwm.googlegroups.com>,
"Simon" <Si***********@gmail.com> wrote:

Well, the title's pretty descriptive; how would I be able to take a
line of input like this:

getline(cin,mostrecentline);

And split into an (flexible) array of strings. For example: "do this
action"
would go to:

item 0: do
item 1: this
item 2: action

#include <cassert>
#include <iostream>
#include <iterator>
#include <string>
#include <vector>
// other includes as necessary

template < typename OutIt >
void split( const std::string& in, OutIt result )
{
    // add code here...
}

int main() {
    std::string seed = "step1";
    std::vector<std::string> result;
    split( seed, std::back_inserter( result ) );
    assert( result.size() == 1 );
    assert( result[0] == "step1" );
    std::cout << "You did it! Good job!\n";
}

Run the above program. Make changes to the part labeled "add code here"
until the program compiles and prints out "You did it! Good job!".

When it does, post back here with the code and I'll help you with the
next step.

Jul 19 '06 #2

Rolf Magnus

Simon wrote:

Well, the title's pretty descriptive; how would I be able to take a
line of input like this:

getline(cin,mostrecentline);

And split into an (flexible) array of strings.

What do you mean by "flexible", and which separators do you want to use?

For example: "do this
action"
would go to:

item 0: do
item 1: this
item 2: action

In this case, I'd use a stringstream and operator>>.

Jul 19 '06 #3

Marcus Kwok

Simon <Si***********@gmail.com> wrote:

Well, the title's pretty descriptive; how would I be able to take a
line of input like this:

getline(cin,mostrecentline);

And split into an (flexible) array of strings. For example: "do this
action"
would go to:

item 0: do
item 1: this
item 2: action

If you are splitting the words by whitespace, you could create a
std::istringstream and push them into a std::vector<std::string>.

Something like: (untested and uncompiled)

std::istringstream line(mostrecentline);
std::vector<std::string> words;
std::string temp;

while (line >> temp) {
    words.push_back(temp);
}

You will need to #include <sstream>, <string>, and <vector> for this
method.
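
Wrapped up as a function, the snippet above might look like the following sketch. The name split_words is made up for illustration and does not come from the thread; it assumes the words are separated by whitespace only.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper (name not from the thread): splits a line on
// whitespace. operator>> skips leading whitespace and stops reading
// at the next whitespace character, so runs of blanks are ignored.
std::vector<std::string> split_words(const std::string& line)
{
    std::istringstream in(line);
    std::vector<std::string> words;
    std::string word;
    while (in >> word) {
        words.push_back(word);
    }
    return words;
}
```

Calling split_words on the string read by getline(cin, mostrecentline) would give a vector whose element 0 is "do", element 1 is "this" and element 2 is "action" for the example input.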

--
Marcus Kwok
Replace 'invalid' with 'net' to reply

Jul 19 '06 #4

Mark P

Simon wrote:

Well, the title's pretty descriptive; how would I be able to take a
line of input like this:

getline(cin,mostrecentline);

And split into an (flexible) array of strings. For example: "do this
action"
would go to:

item 0: do
item 1: this
item 2: action

Thanks in advance,
Simon

Here's a little tokenizer fcn I've used before. Not necessarily the
most elegant or compact way to do this (and criticisms are welcomed):

// Populates "out" with delimited substrings of "in".
int tokenize (const string& in, vector<string>& out, const char* delims)
{
    string::size_type wordStart = 0; // current word start position
    string::size_type wordEnd = 0;   // last word end position

    while (true)
    {
        wordStart = in.find_first_not_of(delims,wordEnd);
        if (wordStart == in.npos)
            break;
        wordEnd = in.find_first_of(delims,wordStart);
        if (wordEnd == in.npos)
            wordEnd = in.size();
        out.push_back(in.substr(wordStart,wordEnd - wordStart));
    }
    return out.size();
}

Mark

Jul 19 '06 #5

Daniel T.

In article <Jf*****************@newssvr25.news.prodigy.net>,
Mark P <us****@fall2005REMOVE.fastmailCAPS.fm> wrote:

Here's a little tokenizer fcn I've used before. Not necessarily the
most elegant or compact way to do this (and criticisms are welcomed):

Well since criticisms are welcomed... :-)

// Populates "out" with delimited substrings of "in".
int tokenize (const string& in, vector<string>& out, const char* delims)
{
string::size_type wordStart = 0; // current word start position
string::size_type wordEnd = 0; // last word end position

while (true)
{
wordStart = in.find_first_not_of(delims,wordEnd);
if (wordStart == in.npos)
break;
wordEnd = in.find_first_of(delims,wordStart);
if (wordEnd == in.npos)
wordEnd = in.size();
out.push_back(in.substr(wordStart,wordEnd - wordStart));
}
return out.size();
}

From least important to most important:

1) The while true and break is not a style I prefer.

2) Returning out.size() isn't very useful since the caller can find out
what out.size() equals without the function's help.

3) It only works for vectors; I'd write something that works for deques
and lists as well.

4) A cyclomatic complexity of 4 seems a tad excessive for what is
supposed to be such a simple job. You can drop that to 3 by removing
the unnecessary "if (wordEnd == in.npos)" logic. Heeding item (1)
above can reduce the complexity to 2.

Here's how I would write it:

template <typename OutIt>
void tokenize( const string& str, OutIt os, const string& delims = " ")
{
    string::size_type start = str.find_first_not_of( delims );
    while ( start != string::npos ) {
        string::size_type end = str.find_first_of( delims, start );
        *os++ = str.substr( start, end - start );
        start = str.find_first_not_of( delims, end );
    }
}
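
To illustrate item (3), here is a sketch of the same template filling two different containers through std::back_inserter. The tokenizer is the one above with std:: qualifiers added; the wrappers to_vector and to_list are hypothetical names introduced only for this example.

```cpp
#include <iterator>
#include <list>
#include <string>
#include <vector>

// The tokenizer from the post above, with std:: qualifiers added.
template <typename OutIt>
void tokenize(const std::string& str, OutIt os, const std::string& delims = " ")
{
    std::string::size_type start = str.find_first_not_of(delims);
    while (start != std::string::npos) {
        std::string::size_type end = str.find_first_of(delims, start);
        *os++ = str.substr(start, end - start);
        start = str.find_first_not_of(delims, end);
    }
}

// Hypothetical wrappers (not from the thread): the same template
// writes into any container that std::back_inserter supports.
std::vector<std::string> to_vector(const std::string& line)
{
    std::vector<std::string> out;
    tokenize(line, std::back_inserter(out));
    return out;
}

std::list<std::string> to_list(const std::string& line, const std::string& delims)
{
    std::list<std::string> out;
    tokenize(line, std::back_inserter(out), delims);
    return out;
}
```

Because the function only sees an output iterator, the caller decides where the tokens go; passing std::ostream_iterator<std::string>(std::cout, "\n") instead would print them without storing anything.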

Jul 20 '06 #6

Mark P

Daniel T. wrote:

In article <Jf*****************@newssvr25.news.prodigy.net>,
Mark P <us****@fall2005REMOVE.fastmailCAPS.fm> wrote:

Here's a little tokenizer fcn I've used before. Not necessarily the
most elegant or compact way to do this (and criticisms are welcomed):

Well since criticisms are welcomed... :-)

// Populates "out" with delimited substrings of "in".
int tokenize (const string& in, vector<string>& out, const char* delims)
{
string::size_type wordStart = 0; // current word start position
string::size_type wordEnd = 0; // last word end position

while (true)
{
wordStart = in.find_first_not_of(delims,wordEnd);
if (wordStart == in.npos)
break;
wordEnd = in.find_first_of(delims,wordStart);
if (wordEnd == in.npos)
wordEnd = in.size();
out.push_back(in.substr(wordStart,wordEnd - wordStart));
}
return out.size();
}

From least important to most important:

1) The while true and break is not a style I prefer.

Fair enough; I'm not a fan either, but see my comment to item 4.

2) Returning out.size() isn't very useful since the caller can find out
what out.size() equals without the function's help.

True. In my case, I pulled this function out of some actual code where
the return value is sometimes used as a check. E.g., when parsing a
particular file format, I expect a certain number of tokens per line.
It saves the calling function a line of code by having the size of out
returned automatically (and of course this fcn is called in multiple
places).

3) It only works for vectors; I'd write something that works for deques
and lists as well.

Agreed, I very much prefer your templated approach that takes any Output
Iterator. In my case, using a known type allowed me to return the
container size (cf. item 2), but this is just my own particular
situation and at times excessive code parsimony.

4) A cyclomatic complexity of 4 seems a tad excessive for what is
supposed to be such a simple job. You can drop that to 3 by removing
the unnecessary "if (wordEnd == in.npos)" logic. Heeding item (1)
above can reduce the complexity to 2.

Here's how I would write it:

template <typename OutIt>
void tokenize( const string& str, OutIt os, const string& delims = " ")
{
string::size_type start = str.find_first_not_of( delims );
while ( start != string::npos ) {
string::size_type end = str.find_first_of( delims, start );
*os++ = str.substr( start, end - start );
start = str.find_first_not_of( delims, end );
}
}

Looks good. In my case it was a bit more complicated because I also
have an additional parameter for a comment character. When a comment
character is encountered at the beginning of a token, that token is
discarded and the loop breaks. (So in my original implementation there
were multiple breakpoints out of the loop, although I hastily trimmed
these before I posted my code, thereby leaving some unattractive vestiges.)

In any event, I appreciate your comments and don't mean to simply make
excuses and argue all of your points. The only significant hitch to my
adopting your cleaner implementation is that I really do need support
for the comment character break. Luckily this is just a bit of a little
file parser I use for testing, so I don't stress too much about these
details, but feel free to propose a svelte implementation that supports
a comment char. :)

Mark

Jul 21 '06 #7

Daniel T.

In article <VO*********************@newssvr13.news.prodigy.com>,
Mark P <us****@fall2005REMOVE.fastmailCAPS.fm> wrote:

template <typename OutIt>
void tokenize( const string& str, OutIt os, const string& delims = " ")
{
string::size_type start = str.find_first_not_of( delims );
while ( start != string::npos ) {
string::size_type end = str.find_first_of( delims, start );
*os++ = str.substr( start, end - start );
start = str.find_first_not_of( delims, end );
}
}

Looks good. In my case it was a bit more complicated because I also
have an additional parameter for a comment character. When a comment
character is encountered at the beginning of a token, that token is
discarded and the loop breaks. (So in my original implementation there
were multiple breakpoints out of the loop, although I hastily trimmed
these before I posted my code, thereby leaving some unattractive vestiges.)

In any event, I appreciate your comments and don't mean to simply make
excuses and argue all of your points.

No problem. Your code was rather good in general, I only saw a few nits
to pick at.

The only significant hitch to my
adopting your cleaner implementation is that I really do need support
for the comment character break. Luckily this is just a bit of a little
file parser I use for testing, so I don't stress too much about these
details, but feel free to propose a svelte implementation that supports
a comment char. :)

If I understand what you mean then:

void tokenize( const string& str, OutIt os, const string& delims = " ",
char comment = '\0' )
{
string::size_type start = str.find_first_not_of( delims );
while ( start != string::npos && start[0] != comment ) {
string::size_type end = str.find_first_of( delims, start );
*os++ = str.substr( start, end - start );
start = str.find_first_not_of( delims, end );
}
}

Of course you should probably change the defaults to whatever is most
common in your code...

Jul 21 '06 #8

Daniel T.

In article <VO*********************@newssvr13.news.prodigy.com>,
Mark P <us****@fall2005REMOVE.fastmailCAPS.fm> wrote:

2) Returning out.size() isn't very useful since the caller can find out
what out.size() equals without the function's help.

True. In my case, I pulled this function out of some actual code where
the return value is sometimes used as a check. E.g., when parsing a
particular file format, I expect a certain number of tokens per line.
It saves the calling function a line of code by having the size of out
returned automatically (and of course this fcn is called in multiple
places).

Here you go, now it returns the size. :-)

int tokenize( const string& str, OutIt os, const string& delims = " ",
char comment = '\0' )
{
int result = 0;
string::size_type start = str.find_first_not_of( delims );
while ( start != string::npos && start[0] != comment ) {
string::size_type end = str.find_first_of( delims, start );
*os++ = str.substr( start, end - start );
++result;
start = str.find_first_not_of( delims, end );
}
return result;
}

Jul 21 '06 #9

Alex Vinokur

Simon wrote:

Well, the title's pretty descriptive; how would I be able to take a
line of input like this:

getline(cin,mostrecentline);

And split into an (flexible) array of strings.

[snip]

See "Splitting string into vector of vectors":
http://groups.google.com/group/sourc...993fb8841382c8
http://groups.google.com/group/perfo...49a1be3a5c6335
http://groups.google.com/group/perfo...c775cf7e3cdcf0
Alex Vinokur
email: alex DOT vinokur AT gmail DOT com
http://mathforum.org/library/view/10978.html
http://sourceforge.net/users/alexvn

Jul 21 '06 #10

Old Wolf

Daniel T. wrote:

int tokenize( const string& str, OutIt os, const string& delims = " ",
char comment = '\0' )

You should return the size type of the output iterator,
rather than int.

I am suspicious of the code. Suppose str is "x".

{
int result = 0;
string::size_type start = str.find_first_not_of( delims );

start == 0.

while ( start != string::npos && start[0] != comment ) {

condition is true

string::size_type end = str.find_first_of( delims, start );

end is string::npos

*os++ = str.substr( start, end - start );

Here you subtract a value from npos. I am not sure if this is a
legal operation (although it will happen to work on my system).

++result;
start = str.find_first_not_of( delims, end );

is npos a legal argument for the second parameter to
find_first_not_of?

}
return result;
}

Jul 22 '06 #11

Daniel T.

In article <11**********************@h48g2000cwc.googlegroups.com>,
"Old Wolf" <ol*****@inspire.net.nz> wrote:

Daniel T. wrote:

int tokenize( const string& str, OutIt os, const string& delims = " ",
char comment = '\0' )

You should return the size type of the output iterator,
rather than int.

That would be fine too...

I had to fix the code, start[0] of course is silly.

template < typename OutIt >
int tokenize( const string& str, OutIt os, const string& delims = " ",
              char comment = '\0' )
{
    int result = 0;
    string::size_type start = str.find_first_not_of( delims );
    while ( start != string::npos && str[start] != comment ) {
        string::size_type end = str.find_first_of( delims, start );
        *os++ = str.substr( start, end - start );
        ++result;
        start = str.find_first_not_of( delims, end );
    }
    return result;
}
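
A caller might combine the returned count and the comment character roughly as in the sketch below. The tokenizer is repeated (with std:: qualifiers) only so that the example compiles on its own; parse_record, the '#' comment character and the " \t" delimiter set are assumptions made for illustration, not something from the thread.

```cpp
#include <iterator>
#include <string>
#include <vector>

// The corrected tokenizer from the post above, repeated with std::
// qualifiers so that this sketch is self-contained.
template <typename OutIt>
int tokenize(const std::string& str, OutIt os,
             const std::string& delims = " ", char comment = '\0')
{
    int result = 0;
    std::string::size_type start = str.find_first_not_of(delims);
    while (start != std::string::npos && str[start] != comment) {
        std::string::size_type end = str.find_first_of(delims, start);
        *os++ = str.substr(start, end - start);
        ++result;
        start = str.find_first_not_of(delims, end);
    }
    return result;
}

// Hypothetical caller (not from the thread): accepts a line only if it
// holds exactly `expected` tokens before any token starting with '#'.
bool parse_record(const std::string& line, std::vector<std::string>& fields,
                  int expected)
{
    fields.clear();
    return tokenize(line, std::back_inserter(fields), " \t", '#') == expected;
}
```

Note that the comment character is only recognized at the start of a token, so a line such as "a#b c" still yields the two tokens "a#b" and "c".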

end is string::npos

*os++ = str.substr( start, end - start );

Here you subtract a value from npos. I am not sure if this is a
legal operation (although it will happen to work on my system).

I have several sources that say that npos is "The largest possible value
of type size_type." Most importantly, it is *not* a flag but a defined
value. Subtracting from a value is quite legal.

++result;
start = str.find_first_not_of( delims, end );

is npos a legal argument for the second parameter to find_first_not_of?

Or the broader question: what is the defined result of find_first_not_of
if the second argument is greater than str.length()?

It could be that my implementation (and yours) is doing the wrong thing.
Stroustrup in "The C++ Programming Language" says that, in general,
specifying an index >= length() should throw an exception, which would
require me to add another conditional here:

start = ( end == string::npos ) ?
end : str.find_first_not_of( delims, end );

Maybe someone can check the standard for me?
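
For reference, the standard's description of basic_string appears to settle both points: the find family does not throw and simply returns npos when the starting position is at or beyond size(), while substr throws std::out_of_range only when pos > size() and otherwise clamps the count to the end of the string. The sketch below exercises that reading; the helper names are hypothetical and not from the thread.

```cpp
#include <string>

// Hypothetical check (name not from the thread): searching from a
// position at or past the end yields npos instead of throwing.
bool find_past_end_returns_npos(const std::string& s)
{
    return s.find_first_not_of(" ", std::string::npos) == std::string::npos
        && s.find_first_not_of(" ", s.size()) == std::string::npos
        && s.find_first_of(" ", s.size() + 10) == std::string::npos;
}

// Hypothetical check: mirrors the str.substr(start, end - start) call
// when end is npos. size_type is unsigned, so the subtraction is well
// defined, and substr clamps the oversized count.
// Precondition: start <= s.size(), otherwise substr throws.
std::string tail_from(const std::string& s, std::string::size_type start)
{
    const std::string::size_type end = std::string::npos;
    return s.substr(start, end - start);
}
```

If that reading is right, the extra conditional shown above is harmless but not required.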

Jul 22 '06 #12

davidrubin

Mark P wrote:

Simon wrote:
Well, the title's pretty descriptive; how would I be able to take a
line of input like this:

getline(cin,mostrecentline);

And split into an (flexible) array of strings. For example: "do this
action"
would go to:

item 0: do
item 1: this
item 2: action

Thanks in advance,
Simon

Here's a little tokenizer fcn I've used before. Not necessarily the
most elegant or compact way to do this (and criticisms are welcomed):

// Populates "out" with delimited substrings of "in".
int tokenize (const string& in, vector<string>& out, const char* delims)
{
string::size_type wordStart = 0; // current word start position
string::size_type wordEnd = 0; // last word end position

while (true)
{
wordStart = in.find_first_not_of(delims,wordEnd);
if (wordStart == in.npos)
break;
wordEnd = in.find_first_of(delims,wordStart);
if (wordEnd == in.npos)
wordEnd = in.size();
out.push_back(in.substr(wordStart,wordEnd - wordStart));
}
return out.size();
}

Mark

Along the same lines, here is something from a while back...

http://groups.google.com/group/comp....258d2ea71e3e03

Jul 23 '06 #13
