Tokenize string, homework

WP
Hello! I need some help with my program. It's supposed to read infix
expressions line by line from stdin, and each expression should be
divided into operands and operators that are added to a vector of strings. So
if we read a line that holds "1+2", the vector should afterwards hold
the strings "1", "+" and "2".
Valid operators are +, -, * and /, meaning they are of length 1.
Valid operands are ints >= 0, meaning they can stretch over several chars.
There may be whitespace in the string, which should be skipped.
It can be assumed that no invalid input occurs in the string (anything that
is not an operand, an operator (according to the above rules) or
whitespace), but we must check that each operator is preceded by an
operand.

I made a program that tests four hard-coded strings (one valid and three
invalid), and it seems to work for those. I'm posting here because I'd like
some tips on how to improve it. Note that I have an assertion for invalid
chars, which is not needed for the assignment, but I added it to catch bugs.

Here's the program. It's a bit long, but I wanted to post something
complete. Oh, also, I may not use anything that is not available in
standard C++.

#include <cassert>
#include <cctype>
#include <iostream>
#include <sstream>
#include <string>
#include <vector>

using namespace std;

string char_to_string(char c)
{
    string s;
    s += c;

    return s;
}

bool is_operator(char c)
{
    static const string valid_operators("+-*/");

    return valid_operators.find(c) != string::npos;
}

bool is_operand(char c)
{
    istringstream iss(char_to_string(c));
    int n;

    // Succeeds only if the char is a digit. The explicit cast is needed
    // because since C++11 a stream converts to bool only explicitly.
    return static_cast<bool>(iss >> n);
}

void divide_and_print(const string& infix);

int main()
{
    const size_t dim = 4;
    string infix_expressions[dim] =
        {"1+222+ 32", "1++2", "1+ +2", "+1"};

    for (size_t i = 0; i < dim; ++i)
    {
        divide_and_print(infix_expressions[i]);
    }
}

void divide_and_print(const string& infix)
{
    // This variable is used to build up the current operand char by char
    string current_operand;
    enum {NOTHING, OPERATOR, OPERAND} last_was = NOTHING;
    vector<string> tokens;

    for (string::const_iterator itr = infix.begin();
         itr != infix.end(); ++itr)
    {
        // isspace requires a value representable as unsigned char.
        if (isspace(static_cast<unsigned char>(*itr)))
        {
            // The current char is some sort of whitespace, we skip over
            // it. Note that we don't update the variable last_was,
            // that's because we want to detect if we have two operators
            // with no intervening operand.
            ;
        }
        else if (is_operand(*itr))
        {
            // The current char is a digit of an operand. We add it to
            // current_operand (remember operands may stretch over
            // several chars).
            current_operand += *itr;

            if ((itr + 1) == infix.end() || !is_operand(*(itr + 1)))
            {
                // This was the last char in the current operand so we add
                // it to our vector of tokens and we also make sure we
                // clear current_operand.
                tokens.push_back(current_operand);
                current_operand.clear();
            }

            last_was = OPERAND;
        }
        else if (is_operator(*itr))
        {
            // An operator must be preceded by an operand.
            if (last_was == OPERATOR || last_was == NOTHING)
            {
                cout << "There must be an operand before each operator!\n";
                cout << "Invalid expression was: " << infix << endl;

                return;
            }

            last_was = OPERATOR;

            // Operators are only one char long, so we convert it to a
            // string and add it to our vector.
            tokens.push_back(char_to_string(*itr));
        }
        else
        {
            // We encountered something that was not an operator, operand
            // or whitespace.
            assert(0);
        }
    }

    for (vector<string>::size_type i = 0; i < tokens.size(); ++i)
    {
        cout << tokens.at(i) << '\n';
    }

    cout << endl;
}

If no one wants to look at all that, I can understand it. :-) Anyway,
thanks for reading. I'd appreciate any suggestions for improvement, and
please tell me if anyone spots any bugs.

/WP
Nov 22 '07 #1
On Nov 23, 7:39 am, WP <inva...@invalid.invalid> wrote:
> using namespace std;

The above is generally a bad idea, because a future C++ standard may
add a new name to std that conflicts with a function name in your
program.
> string char_to_string(char c)
> {
>     string s;
>     s += c;
>
>     return s;
> }

string s(1, c) would do as well.
> bool is_operator(char c)
> {
>     static const string valid_operators("+-*/");
>
>     return valid_operators.find(c) != string::npos;
> }

I would have used a switch instead, but that's personal preference, I
guess.
> bool is_operand(char c)
> {
>     istringstream iss(char_to_string(c));
>     int n;
>
>     return static_cast<bool>(iss >> n);
> }

I would recommend using the same method as in is_operator, whichever
method you eventually decide on. In fact, I would suggest writing a
template function is_a, although sadly C++ does not allow string
literals as template arguments.
> void divide_and_print(const string& infix);
>
> int main()
> {
>     const size_t dim = 4;
>     string infix_expressions[dim] =
>         {"1+222+ 32", "1++2", "1+ +2", "+1"};
>
>     for (size_t i = 0; i < dim; ++i)
>     {
>         divide_and_print(infix_expressions[i]);
>     }
> }

Why not just read the input from the user? That way you can test more
cases without having to keep recompiling.
> void divide_and_print(const string& infix)
> {
>     // This variable is used to build up the current operand char by char
>     string current_operand;
>     enum {NOTHING, OPERATOR, OPERAND} last_was = NOTHING;
>     vector<string> tokens;
>
>     for (string::const_iterator itr = infix.begin();
>          itr != infix.end(); ++itr)

You will run into problems later, when your program needs to process
multi-character operators such as <=, >=, !=, << and >>.
You may also want to study lexing and parsing.

Nov 23 '07 #2
On Fri, 23 Nov 2007 00:39:48 +0100, WP wrote:
> Valid operators are +, -, * and / meaning they are of length 1. Valid
> operands are ints >= 0 meaning they can stretch over several chars.
> There may be whitespace in the string, which should be skipped. It can
> be assumed no invalid input occurs in the string (something that is not
> an operand or an operator (according to the above rules) and is not
> whitespace but we must check that before each operator we have an
> operand.

Alan already wrote even more remarks than I would have :-)
I would only add that it seems strange to me that two adjacent operands
are accepted as valid (2 + 3 4).

--
Tadeusz B. Kopec (tk****@NOSPAMPLEASElife.pl)
Always leave room to add an explanation if it doesn't work out.
Nov 23 '07 #3
On Fri, 23 Nov 2007 00:39:48 +0100 in comp.lang.c++, WP
<in*****@invalid.invalid> wrote,
> If no one wants to look at all that I can understand it. :-) Anyway,
> thanks for reading and I appreciate any suggestions on improvement or if
> anyone spots any bugs.
I did not spot any bugs, but if it compiles and passes the tests
then it ought to do for homework. There are many ways to approach
something like this, and I expect that many of the things I might
suggest are of no use to you; if they have not been covered in the
course so far then you are probably not supposed to use them.

Other than nitpicking, the main thing I would do differently is, where
you are parsing the input, to think of it more in terms of
token-by-token looping rather than char-by-char. Thus something
like:

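string::const_iterator itr = infix.begin();
while (itr != infix.end()) {
    char ch = *itr++;
    if (!isspace(static_cast<unsigned char>(ch))) {
        string current(1, ch);  // count first, then the character
        if (isdigit(static_cast<unsigned char>(ch))) {
            // Extend a number token while digits follow.
            while (itr != infix.end() && isdigit(static_cast<unsigned char>(*itr)))
                current += *itr++;
        }
        tokens.push_back(current);
    }
}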
Or even more likely, a get_token() function that returns the next
token from the input, one by one. That is more likely to fit into a
larger parsing scheme. Either way, I would want to separate the
tokenizing from the other functions such as the checking for
alternating operators and operands.

Review the Desk Calculator example developed in section 6.1 of
Stroustrup, _The C++ Programming Language, Third Ed._
Nov 23 '07 #4
