Splitting a string into an array of words

Well, the title's pretty descriptive: how would I take a
line of input like this:

getline(cin, mostrecentline);

and split it into a (flexible) array of strings? For example, "do this
action"
would go to:

item 0: do
item 1: this
item 2: action

Thanks in advance,
Simon

Jul 19 '06 #1
In article <11**********************@m79g2000cwm.googlegroups.com>,
"Simon" <Si***********@gmail.com> wrote:
> Well, the title's pretty descriptive; how would I be able to take a
> line of input like this:
>
> getline(cin, mostrecentline);
>
> And split into an (flexible) array of strings. For example: "do this
> action"
> would go to:
>
> item 0: do
> item 1: this
> item 2: action

#include <cassert>
#include <iostream>
#include <iterator>
#include <string>
#include <vector>
// other includes as necessary

template < typename OutIt >
void split( const std::string& in, OutIt result )
{
    // add code here...
}

int main() {
    std::string seed = "step1";
    std::vector<std::string> result;
    split( seed, std::back_inserter( result ) );
    assert( result.size() == 1 );
    assert( result[0] == "step1" );
    std::cout << "You did it! Good job!\n";
}

Run the above program. Make changes to the part labeled "add code here"
until the program compiles and prints out "You did it! Good job!".

When it does, post back here with the code and I'll help you with the
next step.
Jul 19 '06 #2
Simon wrote:
> Well, the title's pretty descriptive; how would I be able to take a
> line of input like this:
>
> getline(cin, mostrecentline);
>
> And split into an (flexible) array of strings.

What do you mean by "flexible", and which separators do you want to use?

> For example: "do this
> action"
> would go to:
>
> item 0: do
> item 1: this
> item 2: action

In this case, I'd use a stringstream and operator>>.
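
A minimal sketch of that idea, assuming whitespace is the only separator. The helper names split_words and check_split_words are made up for illustration and do not come from the thread; std::istream_iterator is used here because it calls operator>> once per element.

```cpp
#include <cassert>
#include <iterator>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: splits a line on whitespace by reading it back
// through a string stream with operator>>.
std::vector<std::string> split_words(const std::string& line)
{
    std::istringstream stream(line);
    // istream_iterator<std::string> applies operator>> for each element,
    // so runs of spaces, tabs, and newlines all act as separators.
    return std::vector<std::string>(
        std::istream_iterator<std::string>(stream),
        std::istream_iterator<std::string>());
}

// Hypothetical check using the example from the question.
void check_split_words()
{
    const std::vector<std::string> words = split_words("do this action");
    assert(words.size() == 3);
    assert(words[0] == "do");
    assert(words[1] == "this");
    assert(words[2] == "action");
}
```

Because std::vector grows on demand, this probably covers the "flexible" part of the question as well.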

Jul 19 '06 #3
Simon <Si***********@gmail.com> wrote:
> Well, the title's pretty descriptive; how would I be able to take a
> line of input like this:
>
> getline(cin, mostrecentline);
>
> And split into an (flexible) array of strings. For example: "do this
> action"
> would go to:
>
> item 0: do
> item 1: this
> item 2: action

If you are splitting the words by whitespace, you could create a
std::istringstream and push them into a std::vector<std::string>.

Something like: (untested and uncompiled)

std::istringstream line(mostrecentline);
std::vector<std::string> words;
std::string temp;

while (line >> temp) {
    words.push_back(temp);
}

You will need to #include <sstream>, <string>, and <vector> for this
method.

--
Marcus Kwok
Replace 'invalid' with 'net' to reply
Jul 19 '06 #4
Simon wrote:
> Well, the title's pretty descriptive; how would I be able to take a
> line of input like this:
>
> getline(cin, mostrecentline);
>
> And split into an (flexible) array of strings. For example: "do this
> action"
> would go to:
>
> item 0: do
> item 1: this
> item 2: action
>
> Thanks in advance,
> Simon

Here's a little tokenizer fcn I've used before. Not necessarily the
most elegant or compact way to do this (and criticisms are welcomed):

// Populates "out" with delimited substrings of "in".
int tokenize (const string& in, vector<string>& out, const char* delims)
{
    string::size_type wordStart = 0; // current word start position
    string::size_type wordEnd = 0;   // last word end position

    while (true)
    {
        wordStart = in.find_first_not_of(delims, wordEnd);
        if (wordStart == in.npos)
            break;
        wordEnd = in.find_first_of(delims, wordStart);
        if (wordEnd == in.npos)
            wordEnd = in.size();
        out.push_back(in.substr(wordStart, wordEnd - wordStart));
    }
    return out.size();
}

Mark
Jul 19 '06 #5
In article <Jf*****************@newssvr25.news.prodigy.net>,
Mark P <us****@fall2005REMOVE.fastmailCAPS.fm> wrote:
> Here's a little tokenizer fcn I've used before. Not necessarily the
> most elegant or compact way to do this (and criticisms are welcomed):

Well since criticisms are welcomed... :-)

> // Populates "out" with delimited substrings of "in".
> int tokenize (const string& in, vector<string>& out, const char* delims)
> {
>     string::size_type wordStart = 0; // current word start position
>     string::size_type wordEnd = 0;   // last word end position
>
>     while (true)
>     {
>         wordStart = in.find_first_not_of(delims, wordEnd);
>         if (wordStart == in.npos)
>             break;
>         wordEnd = in.find_first_of(delims, wordStart);
>         if (wordEnd == in.npos)
>             wordEnd = in.size();
>         out.push_back(in.substr(wordStart, wordEnd - wordStart));
>     }
>     return out.size();
> }

From least important to most important:

1) The while true and break is not a style I prefer.

2) Returning out.size() isn't very useful since the caller can find out
what out.size() equals without the function's help.

3) It only works for vectors; I'd write something that works for deques
and lists as well.

4) A cyclomatic complexity of 4 seems a tad excessive for what is
supposed to be such a simple job. You can drop that to 3 by removing
the unnecessary "if (wordEnd == in.npos)" logic. Heeding item (1)
above can reduce the complexity to 2.

Here's how I would write it:

template <typename OutIt>
void tokenize( const string& str, OutIt os, const string& delims = " ")
{
    string::size_type start = str.find_first_not_of( delims );
    while ( start != string::npos ) {
        string::size_type end = str.find_first_of( delims, start );
        *os++ = str.substr( start, end - start );
        start = str.find_first_not_of( delims, end );
    }
}
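
For anyone wondering why the "if (wordEnd == in.npos)" test can go, here is a small sketch of the two std::string guarantees the shorter loop leans on, plus the same function filling a list and a deque. The std:: qualification is added, and check_npos_behaviour is a made-up name for illustration.

```cpp
#include <cassert>
#include <deque>
#include <iterator>
#include <list>
#include <string>

// Same shape as the tokenizer above, with std:: qualification added.
template <typename OutIt>
void tokenize(const std::string& str, OutIt os, const std::string& delims = " ")
{
    std::string::size_type start = str.find_first_not_of(delims);
    while (start != std::string::npos) {
        std::string::size_type end = str.find_first_of(delims, start);
        *os++ = str.substr(start, end - start);
        start = str.find_first_not_of(delims, end);
    }
}

// Hypothetical check function (not from the thread).
void check_npos_behaviour()
{
    const std::string s = "do this action";

    // substr clamps an over-long count to the end of the string, so
    // "npos - start" is a safe length for the last token.
    assert(s.substr(8, std::string::npos - 8) == "action");

    // Searching from npos (past the end) finds nothing, which ends the loop.
    assert(s.find_first_not_of(" ", std::string::npos) == std::string::npos);

    // Any output iterator works, so lists and deques are fine too.
    std::list<std::string> as_list;
    tokenize(s, std::back_inserter(as_list));
    assert(as_list.size() == 3);

    std::deque<std::string> as_deque;
    tokenize(s, std::front_inserter(as_deque)); // tokens end up reversed
    assert(as_deque.front() == "action");
}
```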
Jul 20 '06 #6
Daniel T. wrote:
> In article <Jf*****************@newssvr25.news.prodigy.net>,
> Mark P <us****@fall2005REMOVE.fastmailCAPS.fm> wrote:
>> Here's a little tokenizer fcn I've used before. Not necessarily the
>> most elegant or compact way to do this (and criticisms are welcomed):
>
> Well since criticisms are welcomed... :-)
>
>> // Populates "out" with delimited substrings of "in".
>> int tokenize (const string& in, vector<string>& out, const char* delims)
>> {
>>     string::size_type wordStart = 0; // current word start position
>>     string::size_type wordEnd = 0;   // last word end position
>>
>>     while (true)
>>     {
>>         wordStart = in.find_first_not_of(delims, wordEnd);
>>         if (wordStart == in.npos)
>>             break;
>>         wordEnd = in.find_first_of(delims, wordStart);
>>         if (wordEnd == in.npos)
>>             wordEnd = in.size();
>>         out.push_back(in.substr(wordStart, wordEnd - wordStart));
>>     }
>>     return out.size();
>> }
>
> From least important to most important:
>
> 1) The while true and break is not a style I prefer.

Fair enough-- I'm not a fan either, but see my comment to item 4.

> 2) Returning out.size() isn't very useful since the caller can find out
> what out.size() equals without the function's help.

True. In my case, I pulled this function out of some actual code where
the return value is sometimes used as a check. E.g., when parsing a
particular file format, I expect a certain number of tokens per line.
It saves the calling function a line of code by having the size of out
returned automatically (and of course this fcn is called in multiple
places).

> 3) It only works for vectors; I'd write something that works for deques
> and lists as well.

Agreed, I very much prefer your templated approach that takes any Output
Iterator. In my case, using a known type allowed me to return the
container size (cf. item 2), but this is just my own particular
situation and at times excessive code parsimony.

> 4) A cyclomatic complexity of 4 seems a tad excessive for what is
> supposed to be such a simple job. You can drop that to 3 by removing
> the unnecessary "if (wordEnd == in.npos)" logic. Heeding item (1)
> above can reduce the complexity to 2.
>
> Here's how I would write it:
>
> template <typename OutIt>
> void tokenize( const string& str, OutIt os, const string& delims = " ")
> {
>     string::size_type start = str.find_first_not_of( delims );
>     while ( start != string::npos ) {
>         string::size_type end = str.find_first_of( delims, start );
>         *os++ = str.substr( start, end - start );
>         start = str.find_first_not_of( delims, end );
>     }
> }

Looks good. In my case it was a bit more complicated because I also
have an additional parameter for a comment character. When a comment
character is encountered at the beginning of a token, that token is
discarded and the loop breaks. (So in my original implementation there
were multiple breakpoints out of the loop, although I hastily trimmed
these before I posted my code, thereby leaving some unattractive vestiges.)

In any event, I appreciate your comments and don't mean to simply make
excuses and argue all of your points. The only significant hitch to my
adopting your cleaner implementation is that I really do need support
for the comment character break. Luckily this is just a bit of a little
file parser I use for testing, so I don't stress too much about these
details, but feel free to propose a svelte implementation that supports
a comment char. :)

Mark
Jul 21 '06 #7
In article <VO*********************@newssvr13.news.prodigy.com>,
Mark P <us****@fall2005REMOVE.fastmailCAPS.fm> wrote:
>> template <typename OutIt>
>> void tokenize( const string& str, OutIt os, const string& delims = " ")
>> {
>>     string::size_type start = str.find_first_not_of( delims );
>>     while ( start != string::npos ) {
>>         string::size_type end = str.find_first_of( delims, start );
>>         *os++ = str.substr( start, end - start );
>>         start = str.find_first_not_of( delims, end );
>>     }
>> }
>
> Looks good. In my case it was a bit more complicated because I also
> have an additional parameter for a comment character. When a comment
> character is encountered at the beginning of a token, that token is
> discarded and the loop breaks. (So in my original implementation there
> were multiple breakpoints out of the loop, although I hastily trimmed
> these before I posted my code, thereby leaving some unattractive vestiges.)
>
> In any event, I appreciate your comments and don't mean to simply make
> excuses and argue all of your points.

No problem. Your code was rather good in general, I only saw a few nits
to pick at.

> The only significant hitch to my
> adopting your cleaner implementation is that I really do need support
> for the comment character break. Luckily this is just a bit of a little
> file parser I use for testing, so I don't stress too much about these
> details, but feel free to propose a svelte implementation that supports
> a comment char. :)

If I understand what you mean then:

template <typename OutIt>
void tokenize( const string& str, OutIt os, const string& delims = " ",
               char comment = '\0' )
{
    string::size_type start = str.find_first_not_of( delims );
    // str[start] is the first character of the next token
    while ( start != string::npos && str[start] != comment ) {
        string::size_type end = str.find_first_of( delims, start );
        *os++ = str.substr( start, end - start );
        start = str.find_first_not_of( delims, end );
    }
}

Of course you should probably change the defaults to whatever is most
common in your code...
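
A short usage sketch, assuming '#' is the comment character and a space is the only delimiter (both choices are illustrative, as is the name check_comment_handling). The function is repeated with std:: qualification and the first-character test written as str[start], so the block stands alone.

```cpp
#include <cassert>
#include <iterator>
#include <string>
#include <vector>

template <typename OutIt>
void tokenize(const std::string& str, OutIt os,
              const std::string& delims = " ", char comment = '\0')
{
    std::string::size_type start = str.find_first_not_of(delims);
    // Stop at the end of the string or at a token that begins with `comment`.
    while (start != std::string::npos && str[start] != comment) {
        std::string::size_type end = str.find_first_of(delims, start);
        *os++ = str.substr(start, end - start);
        start = str.find_first_not_of(delims, end);
    }
}

// Hypothetical check function (not from the thread).
void check_comment_handling()
{
    std::vector<std::string> tokens;
    tokenize("width 40 # in pixels", std::back_inserter(tokens), " ", '#');
    assert(tokens.size() == 2);
    assert(tokens[0] == "width");
    assert(tokens[1] == "40");

    // Only a comment character at the start of a token ends the scan;
    // one in the middle of a token is kept as ordinary text.
    tokens.clear();
    tokenize("a#b c", std::back_inserter(tokens), " ", '#');
    assert(tokens.size() == 2);
    assert(tokens[0] == "a#b");
}
```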
Jul 21 '06 #8
In article <VO*********************@newssvr13.news.prodigy.com>,
Mark P <us****@fall2005REMOVE.fastmailCAPS.fm> wrote:

>> 2) Returning out.size() isn't very useful since the caller can find out
>> what out.size() equals without the function's help.
>
> True. In my case, I pulled this function out of some actual code where
> the return value is sometimes used as a check. E.g., when parsing a
> particular file format, I expect a certain number of tokens per line.
> It saves the calling function a line of code by having the size of out
> returned automatically (and of course this fcn is called in multiple
> places).

Here you go, now it returns the size. :-)

template <typename OutIt>
int tokenize( const string& str, OutIt os, const string& delims = " ",
              char comment = '\0' )
{
    int result = 0;
    string::size_type start = str.find_first_not_of( delims );
    // str[start] is the first character of the next token
    while ( start != string::npos && str[start] != comment ) {
        string::size_type end = str.find_first_of( delims, start );
        *os++ = str.substr( start, end - start );
        ++result;
        start = str.find_first_not_of( delims, end );
    }
    return result;
}
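
One way the returned count might be used for the "certain number of tokens per line" check Mark describes. This is only a sketch: has_expected_fields, the space/tab delimiters, and the '#' comment character are assumptions for illustration, not details of his file format.

```cpp
#include <cassert>
#include <iterator>
#include <string>
#include <vector>

template <typename OutIt>
int tokenize(const std::string& str, OutIt os,
             const std::string& delims = " ", char comment = '\0')
{
    int result = 0;
    std::string::size_type start = str.find_first_not_of(delims);
    while (start != std::string::npos && str[start] != comment) {
        std::string::size_type end = str.find_first_of(delims, start);
        *os++ = str.substr(start, end - start);
        ++result;
        start = str.find_first_not_of(delims, end);
    }
    return result;
}

// Hypothetical validator: true when `line` holds exactly `expected`
// tokens before any '#' comment.
bool has_expected_fields(const std::string& line, int expected)
{
    std::vector<std::string> fields;
    return tokenize(line, std::back_inserter(fields), " \t", '#') == expected;
}

// Hypothetical check function (not from the thread).
void check_field_counts()
{
    assert(has_expected_fields("x 10 20", 3));
    assert(!has_expected_fields("x 10", 3));
    assert(has_expected_fields("x 10 20 # trailing note", 3));
}
```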
Jul 21 '06 #9

Simon wrote:
> Well, the title's pretty descriptive; how would I be able to take a
> line of input like this:
>
> getline(cin, mostrecentline);
>
> And split into an (flexible) array of strings.
[snip]

See "Splitting string into vector of vectors":
http://groups.google.com/group/sourc...993fb8841382c8
http://groups.google.com/group/perfo...49a1be3a5c6335
http://groups.google.com/group/perfo...c775cf7e3cdcf0
Alex Vinokur
email: alex DOT vinokur AT gmail DOT com
http://mathforum.org/library/view/10978.html
http://sourceforge.net/users/alexvn

Jul 21 '06 #10
