Tokenize string, homework

Hello! I need some help with my program...it's supposed to read infix
expressions line by line from stdin and each expression should be
divided into operands and operators and added to a vector of strings. So
if we read one line that holds "1+2" the vector should afterwards hold
the strings "1", "+" and "2".
Valid operators are +, -, * and / meaning they are of length 1.
Valid operands are ints >= 0 meaning they can stretch over several chars.
There may be whitespace in the string, which should be skipped.
It can be assumed no invalid input occurs in the string (something that
is not an operand or an operator (according to the above rules) and is
not whitespace but we must check that before each operator we have an
operand.

I made program that tests four hard-coded strings (one valid and three
invalid) and it seems to work for that. I'm posting here because I want
to see if I could get some improvement tips. Note that I have an
assertion for invalid chars which is not needed for the assignment but I
added it to catch bugs.

Here's the program, it's a bit long but I wanted to post something
complete. Oh, also, I may not use anything that is not available in the
C++ standard.

#include <cassert>
#include <cctype>
#include <iostream>
#include <sstream>
#include <vector>

using namespace std;

string char_to_string(char c)
{
string s;
s += c;

return s;
}

bool is_operator(char c)
{
static const string valid_operators("+-*/");

return valid_operators.find(c) != string::npos;
}

bool is_operand(char c)
{
istringstream iss(char_to_string(c));
int n;

return iss >n;
}

void divide_and_print(const string& infix);

int main()
{
const size_t dim = 4;
string infix_expressions[dim] =
{"1+222+ 32", "1++2", "1+ +2", "+1"};

for (size_t i = 0; i < dim; ++i)
{
divide_and_print(infix_expressions[i]);
}
}

void divide_and_print(const string& infix)
{
// This variable is used to build up the current operand char by char
string current_operand;
enum {NOTHING, OPERATOR, OPERAND} last_was = NOTHING;
vector<stringtokens;

for (string::const_iterator itr = infix.begin();
itr != infix.end(); ++itr)
{
if (isspace(*itr))
{
// The current char is some sort of whitespace, we skip over
// it. Note that we don't update the variable last_was,
// that's because we want to detect if we have two operators
// with no intervening operand.
;
}
else if (is_operand(*itr))
{
// The current char is an operand. We add it to current_operand
// (remember operands may stretch over several chars).
current_operand += *itr;

if ((itr + 1) == infix.end() || !is_operand(*(itr + 1)))
{
// This was the last char in the current operand so we add
// it to our vector of tokens and we also make sure we
// clear current_operand.
tokens.push_back(current_operand);
current_operand.clear();
}

last_was = OPERAND;
}
else if (is_operator(*itr))
{
// An operator may only follow non-operators.
if (last_was == OPERATOR || last_was == NOTHING)
{
cout << "There must be an operand before each operator!\n";
cout << "Invalid expression was: " << infix << endl;

return;
}

last_was = OPERATOR;

// operators are only one char long so we convert it to string
// and add it to our vector.
tokens.push_back(char_to_string(*itr));
}
else
// We encountered something that was not an operator, operand
// or ws.
assert(0);
}

for (vector<string>::size_type i = 0; i < tokens.size(); ++i)
{
cout << tokens.at(i) << '\n';
}

cout << endl;
}

If no one wants to look at all that I can understand it. :-) Anyway,
thanks for reading and I appreciate any suggestions on improvement or if
anyone spots any bugs.

/WP

Nov 22 '07 #1

Subscribe Post Reply

2084

alan

On Nov 23, 7:39 am, WP <inva...@invalid.invalidwrote:

Hello! I need some help with my program...it's supposed to read infix
expressions line by line from stdin and each expression should be
divided into operands and operators and added to a vector of strings. So
if we read one line that holds "1+2" the vector should afterwards hold
the strings "1", "+" and "2".
Valid operators are +, -, * and / meaning they are of length 1.
Valid operands are ints >= 0 meaning they can stretch over several chars.
There may be whitespace in the string, which should be skipped.
It can be assumed no invalid input occurs in the string (something that
is not an operand or an operator (according to the above rules) and is
not whitespace but we must check that before each operator we have an
operand.

I made program that tests four hard-coded strings (one valid and three
invalid) and it seems to work for that. I'm posting here because I want
to see if I could get some improvement tips. Note that I have an
assertion for invalid chars which is not needed for the assignment but I
added it to catch bugs.

Here's the program, it's a bit long but I wanted to post something
complete. Oh, also, I may not use anything that is not available in the
C++ standard.

#include <cassert>
#include <cctype>
#include <iostream>
#include <sstream>
#include <vector>

using namespace std;

the above is generally a bad idea, because in the future, the c++
standard may define some new function that will conflict with a
function name in your program.

>
string char_to_string(char c)
{
string s;
s += c;

return s;

}

string s(1,c) would do as well.

>
bool is_operator(char c)
{
static const string valid_operators("+-*/");

return valid_operators.find(c) != string::npos;

}

I would have used switch instead, but personal preference I guess.

>
bool is_operand(char c)
{
istringstream iss(char_to_string(c));
int n;

return iss >n;

}

I would recommend using the same method as is_operator, whatever
method you decide to eventually use. In fact I would suggest using
templates with a function is_a, although sadly C++ does not support
constant strings in templated instances.

>
void divide_and_print(const string& infix);

int main()
{
const size_t dim = 4;
string infix_expressions[dim] =
{"1+222+ 32", "1++2", "1+ +2", "+1"};

for (size_t i = 0; i < dim; ++i)
{
divide_and_print(infix_expressions[i]);
}

}

Why not just have it input from the user? That way you can test more
without having to keep recompiling.

>
void divide_and_print(const string& infix)
{
// This variable is used to build up the current operand char by char
string current_operand;
enum {NOTHING, OPERATOR, OPERAND} last_was = NOTHING;
vector<stringtokens;

for (string::const_iterator itr = infix.begin();
itr != infix.end(); ++itr)

You will eventually get problems in the future when your program needs
to process multicharacter operators, such as, say, <=, >=, !=, <<, >>
etc.

{
if (isspace(*itr))
{
// The current char is some sort of whitespace, we skip over
// it. Note that we don't update the variable last_was,
// that's because we want to detect if we have two operators
// with no intervening operand.
;
}
else if (is_operand(*itr))
{
// The current char is an operand. We add it to current_operand
// (remember operands may stretch over several chars).
current_operand += *itr;

if ((itr + 1) == infix.end() || !is_operand(*(itr + 1)))
{
// This was the last char in the current operand so we add
// it to our vector of tokens and we also make sure we
// clear current_operand.
tokens.push_back(current_operand);
current_operand.clear();
}

last_was = OPERAND;
}
else if (is_operator(*itr))
{
// An operator may only follow non-operators.
if (last_was == OPERATOR || last_was == NOTHING)
{
cout << "There must be an operand before each operator!\n";
cout << "Invalid expression was: " << infix << endl;

return;
}

last_was = OPERATOR;

// operators are only one char long so we convert it to string
// and add it to our vector.
tokens.push_back(char_to_string(*itr));
}
else
// We encountered something that was not an operator, operand
// or ws.
assert(0);
}

for (vector<string>::size_type i = 0; i < tokens.size(); ++i)
{
cout << tokens.at(i) << '\n';
}

cout << endl;

}

If no one wants to look at all that I can understand it. :-) Anyway,
thanks for reading and I appreciate any suggestions on improvement or if
anyone spots any bugs.

/WP

You may want to also study lexing and parsing.

Nov 23 '07 #2

Tadeusz B. Kopec

On Fri, 23 Nov 2007 00:39:48 +0100, WP wrote:

Hello! I need some help with my program...it's supposed to read infix
expressions line by line from stdin and each expression should be
divided into operands and operators and added to a vector of strings. So
if we read one line that holds "1+2" the vector should afterwards hold
the strings "1", "+" and "2".
Valid operators are +, -, * and / meaning they are of length 1. Valid
operands are ints >= 0 meaning they can stretch over several chars.
There may be whitespace in the string, which should be skipped. It can
be assumed no invalid input occurs in the string (something that is not
an operand or an operator (according to the above rules) and is not
whitespace but we must check that before each operator we have an
operand.

Alan already wrote even more remarks than I would have :-)
I would only add that it's strange to me that two adjacent operands are
valid (2 + 3 4).

--
Tadeusz B. Kopec (tk****@NOSPAMPLEASElife.pl)
Always leave room to add an explanation if it doesn't work out.

Nov 23 '07 #3

David Harmon

On Fri, 23 Nov 2007 00:39:48 +0100 in comp.lang.c++, WP
<in*****@invalid.invalidwrote,

>If no one wants to look at all that I can understand it. :-) Anyway,
thanks for reading and I appreciate any suggestions on improvement or if
anyone spots any bugs.

I did not spot any bugs, but if it compiles and passes the tests
then it ought to do for homework. There are many ways to approach
something like this, and I expect that many of the things I might
suggest are of no use to you; if they have not been covered in the
course so far then you are probably not supposed to use them.

Other than nitpicking, the main things I would do differently is,
where you are parsing the input, think of it more in terms of
token-by-token looping rather than char-by-char. Thus something
like:

string::const_iterator itr = infix.begin();
while (itr != infix.end()) {
char ch = *itr++;
if (!isspace(ch)) {
string current(ch, 1);
if (isdigit(ch)) {
while(itr != infix.end() && isdigit(*itr))
current += (ch = *itr++);
}
tokens.push_back(current);
}
}
Or even more likely, a get_token() function that returns the next
token from the input, one by one. That is more likely to fit into a
larger parsing scheme. Either way, I would want to separate the
tokenizing from the other functions such as the checking for
alternating operators and operands.

Review the Desk Calculator example developed in chapter 6.1 of
Stroustrup _The C++ Programming Language, Third Ed._

Nov 23 '07 #4

Similar topics

Removing comments... tokenize error

by: qwweeeit | last post by:

In analysing a very big application (pysol) made of almost 100 sources, I had the need to remove comments. Removing the comments which take all the line is straightforward... Instead for the...

Python

Trying to tokenize a string

by: Lans | last post by:

I have a string that I need to tokenize but I need to use a string token see example i am trying the following but strtok only uses characters as delimiters and I need to seperate bu a certain...

C / C++

tokenize a string

by: Kelvin | last post by:

hi: in C, we can use strtok() to tokenize a char* but i can't find any similar member function of string that can tokenize a string so how so i tokenize a string in C++? do it the C way? ...

C / C++

Tokenize a string in C#

by: James | last post by:

Hi, I am looking for a stringtokenizer class/method in C#, but can't find one. The similar classes in Java and C++ are StringTokenizer and CStringT::tokenize respectively. I need to keep a...

C# / C Sharp

String Tokenize

by: Lam | last post by:

Hi I try to read in a line from text file, and how can I tokenize the line? Thanks

C# / C Sharp

How to tokenize string without using strtok

by: bubunia2000 | last post by:

Hi all, I heard that strtok is not thread safe. So I want to write a sample program which will tokenize string without using strtok. Can I get a sample source code for the same. For exp:...

C / C++

Problem with tokenize module and indents

by: Tim | last post by:

I ran into a problem with a script i was playing with to check code indents and need some direction. It seems to depend on if tabsize is set to 4 in editor and spaces and tabs indents are mixed on...

Python

string tokenize

by: askalottaqs | last post by:

there's in maya's scripting language mel, called tokenize, you simply tokenize("string i want to tokenize"," ",bufferArray) which will fill the fufferArray wih the first string tokenized accorfing...

C / C++

C++ Tokenize vectors strings, no values between delimiters

by: m6s | last post by:

1. After hours of researching, I used these snippets : void Object::TokenizeLines(const string& str, vector<string>& tokens, const string& delimiters) // Skip delimiters at beginning....

C / C++

Easy Steps to Fix "Canon Printer Won't Connect to WiFi Network"

by: taylorcarr | last post by:

A Canon printer is a smart device known for being advanced, efficient, and reliable. It is designed for home, office, and hybrid workspace use and can also be used for a variety of purposes. However,...

General

How to turn on java script in a villaon keypad mobile phone

by: Charles Arthur | last post by:

How do i turn on java script on a villaon, callus and itel keypad mobile phone

Java

Batch import of multiple excel files into the database

by: ryjfgjl | last post by:

If we have dozens or hundreds of excel to import into the database, if we use the excel import function provided by database editors such as navicat, it will be extremely tedious and time-consuming...

Data Management

Migrating Website to Cloud - Emmanuel Katto

by: emmanuelkatto | last post by:

Hi All, I am Emmanuel katto from Uganda. I want to ask what challenges you've faced while migrating a website to cloud. Please let me know. Thanks! Emmanuel

General

Navigating the Data Structures and Algorithms (DSA)

by: BarryA | last post by:

What are the essential steps and strategies outlined in the Data Structures and Algorithms (DSA) roadmap for aspiring data scientists? How can individuals effectively utilize this roadmap to progress...

Algorithms / Advanced Math

Looking to do Android software development, any suggestions? Is flutter better?

by: nemocccc | last post by:

hello, everyone, I want to develop a software for my android phone for daily needs, any suggestions?

General

How to build RAID in BIOS?

by: Hystou | last post by:

There are some requirements for setting up RAID: 1. The motherboard and BIOS support RAID configuration. 2. The motherboard has 2 or more available SATA protocol SSD/HDD slots (including MSATA, M.2...

Computer Hardware

Problem With Comparison Operator <=> in G++

by: Oralloy | last post by:

Hello folks, I am unable to find appropriate documentation on the type promotion of bit-fields when using the generalised comparison operator "<=>". The problem is that using the GNU compilers,...

C / C++

Maximizing Business Potential: The Nexus of Website Design and Digital Marketing

by: jinu1996 | last post by:

In today's digital age, having a compelling online presence is paramount for businesses aiming to thrive in a competitive landscape. At the heart of this digital strategy lies an intricately woven...

Online Marketing