Tokenize string, homework

WP
Hello! I need some help with my program...it's supposed to read infix
expressions line by line from stdin and each expression should be
divided into operands and operators and added to a vector of strings. So
if we read one line that holds "1+2" the vector should afterwards hold
the strings "1", "+" and "2".
Valid operators are +, -, * and / meaning they are of length 1.
Valid operands are ints >= 0 meaning they can stretch over several chars.
There may be whitespace in the string, which should be skipped.
It can be assumed that no invalid input occurs in the string (something that
is not an operand or an operator according to the above rules and is
not whitespace), but we must check that before each operator we have an
operand.

I made a program that tests four hard-coded strings (one valid and three
invalid) and it seems to work for those. I'm posting here because I want
to see if I could get some improvement tips. Note that I have an
assertion for invalid chars, which is not needed for the assignment, but I
added it to catch bugs.

Here's the program, it's a bit long but I wanted to post something
complete. Oh, also, I may not use anything that is not available in the
C++ standard.

#include <cassert>
#include <cctype>
#include <iostream>
#include <sstream>
#include <string>
#include <vector>

using namespace std;

string char_to_string(char c)
{
    string s;
    s += c;

    return s;
}

bool is_operator(char c)
{
    static const string valid_operators("+-*/");

    return valid_operators.find(c) != string::npos;
}

bool is_operand(char c)
{
    istringstream iss(char_to_string(c));
    int n;

    return iss >> n;
}

void divide_and_print(const string& infix);

int main()
{
    const size_t dim = 4;
    string infix_expressions[dim] =
        {"1+222+ 32", "1++2", "1+ +2", "+1"};

    for (size_t i = 0; i < dim; ++i)
    {
        divide_and_print(infix_expressions[i]);
    }
}

void divide_and_print(const string& infix)
{
    // This variable is used to build up the current operand char by char
    string current_operand;
    enum {NOTHING, OPERATOR, OPERAND} last_was = NOTHING;
    vector<string> tokens;

    for (string::const_iterator itr = infix.begin();
         itr != infix.end(); ++itr)
    {
        if (isspace(*itr))
        {
            // The current char is some sort of whitespace, we skip over
            // it. Note that we don't update the variable last_was,
            // that's because we want to detect if we have two operators
            // with no intervening operand.
            ;
        }
        else if (is_operand(*itr))
        {
            // The current char is an operand. We add it to current_operand
            // (remember operands may stretch over several chars).
            current_operand += *itr;

            if ((itr + 1) == infix.end() || !is_operand(*(itr + 1)))
            {
                // This was the last char in the current operand so we add
                // it to our vector of tokens and we also make sure we
                // clear current_operand.
                tokens.push_back(current_operand);
                current_operand.clear();
            }

            last_was = OPERAND;
        }
        else if (is_operator(*itr))
        {
            // An operator may only follow non-operators.
            if (last_was == OPERATOR || last_was == NOTHING)
            {
                cout << "There must be an operand before each operator!\n";
                cout << "Invalid expression was: " << infix << endl;

                return;
            }

            last_was = OPERATOR;

            // operators are only one char long so we convert it to string
            // and add it to our vector.
            tokens.push_back(char_to_string(*itr));
        }
        else
            // We encountered something that was not an operator, operand
            // or ws.
            assert(0);
    }

    for (vector<string>::size_type i = 0; i < tokens.size(); ++i)
    {
        cout << tokens.at(i) << '\n';
    }

    cout << endl;
}

If no one wants to look at all that I can understand it. :-) Anyway,
thanks for reading and I appreciate any suggestions on improvement or if
anyone spots any bugs.

/WP
Nov 22 '07 #1
On Nov 23, 7:39 am, WP <inva...@invalid.invalid> wrote:
> #include <cassert>
> #include <cctype>
> #include <iostream>
> #include <sstream>
> #include <vector>
>
> using namespace std;

The above is generally a bad idea, because a future C++ standard may
define a new function that conflicts with a function name in your
program.
>
> string char_to_string(char c)
> {
>     string s;
>     s += c;
>
>     return s;
> }

string s(1, c) would do as well.
>
> bool is_operator(char c)
> {
>     static const string valid_operators("+-*/");
>
>     return valid_operators.find(c) != string::npos;
> }
I would have used switch instead, but personal preference I guess.
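A minimal sketch of the switch-based version, for comparison. It is meant to accept exactly the same four characters as the find-based original and is only one way of writing it:

```cpp
// Hypothetical switch-based variant of is_operator from the original post.
// Returns true for the four single-character operators and false otherwise.
bool is_operator(char c)
{
    switch (c)
    {
    case '+':
    case '-':
    case '*':
    case '/':
        return true;
    default:
        return false;
    }
}
```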
>
> bool is_operand(char c)
> {
>     istringstream iss(char_to_string(c));
>     int n;
>
>     return iss >> n;
> }
I would recommend using the same method as is_operator, whatever
method you decide to eventually use. In fact I would suggest using
templates with a function is_a, although sadly C++ does not support
constant strings in templated instances.
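One way to apply the same method to both predicates, sketched with an ordinary function rather than a template. The helper name is_one_of is made up for this illustration, and the digit set assumes operands consist only of the characters 0 through 9:

```cpp
#include <string>

// Hypothetical helper: true if c is one of the characters in set.
bool is_one_of(char c, const std::string& set)
{
    return set.find(c) != std::string::npos;
}

// Both predicates now share one lookup method.
bool is_operator(char c) { return is_one_of(c, "+-*/"); }
bool is_operand(char c) { return is_one_of(c, "0123456789"); }
```

This also avoids building an istringstream for every character, which the stream-based is_operand does.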
>
> void divide_and_print(const string& infix);
>
> int main()
> {
>     const size_t dim = 4;
>     string infix_expressions[dim] =
>         {"1+222+ 32", "1++2", "1+ +2", "+1"};
>
>     for (size_t i = 0; i < dim; ++i)
>     {
>         divide_and_print(infix_expressions[i]);
>     }
> }
Why not just have it input from the user? That way you can test more
without having to keep recompiling.
>
> void divide_and_print(const string& infix)
> {
>     // This variable is used to build up the current operand char by char
>     string current_operand;
>     enum {NOTHING, OPERATOR, OPERAND} last_was = NOTHING;
>     vector<string> tokens;
>
>     for (string::const_iterator itr = infix.begin();
>          itr != infix.end(); ++itr)
You will eventually get problems in the future when your program needs
to process multicharacter operators, such as, say, <=, >=, !=, <<, >>
etc.
>     {
>         if (isspace(*itr))
>         {
>             // The current char is some sort of whitespace, we skip over
>             // it. Note that we don't update the variable last_was,
>             // that's because we want to detect if we have two operators
>             // with no intervening operand.
>             ;
>         }
>         else if (is_operand(*itr))
>         {
>             // The current char is an operand. We add it to current_operand
>             // (remember operands may stretch over several chars).
>             current_operand += *itr;
>
>             if ((itr + 1) == infix.end() || !is_operand(*(itr + 1)))
>             {
>                 // This was the last char in the current operand so we add
>                 // it to our vector of tokens and we also make sure we
>                 // clear current_operand.
>                 tokens.push_back(current_operand);
>                 current_operand.clear();
>             }
>
>             last_was = OPERAND;
>         }
>         else if (is_operator(*itr))
>         {
>             // An operator may only follow non-operators.
>             if (last_was == OPERATOR || last_was == NOTHING)
>             {
>                 cout << "There must be an operand before each operator!\n";
>                 cout << "Invalid expression was: " << infix << endl;
>
>                 return;
>             }
>
>             last_was = OPERATOR;
>
>             // operators are only one char long so we convert it to string
>             // and add it to our vector.
>             tokens.push_back(char_to_string(*itr));
>         }
>         else
>             // We encountered something that was not an operator, operand
>             // or ws.
>             assert(0);
>     }
>
>     for (vector<string>::size_type i = 0; i < tokens.size(); ++i)
>     {
>         cout << tokens.at(i) << '\n';
>     }
>
>     cout << endl;
> }
>
> If no one wants to look at all that I can understand it. :-) Anyway,
> thanks for reading and I appreciate any suggestions on improvement or if
> anyone spots any bugs.
>
> /WP
You may want to also study lexing and parsing.

Nov 23 '07 #2
On Fri, 23 Nov 2007 00:39:48 +0100, WP wrote:
> Hello! I need some help with my program...it's supposed to read infix
> expressions line by line from stdin and each expression should be
> divided into operands and operators and added to a vector of strings. So
> if we read one line that holds "1+2" the vector should afterwards hold
> the strings "1", "+" and "2".
> Valid operators are +, -, * and / meaning they are of length 1. Valid
> operands are ints >= 0 meaning they can stretch over several chars.
> There may be whitespace in the string, which should be skipped. It can
> be assumed no invalid input occurs in the string (something that is not
> an operand or an operator according to the above rules and is not
> whitespace), but we must check that before each operator we have an
> operand.
Alan already wrote even more remarks than I would have :-)
I would only add that it's strange to me that two adjacent operands are
valid (2 + 3 4).
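If adjacent operands should be rejected as well, one possible check on the finished token vector is sketched below. The function names are hypothetical, and the rule that an expression may not end with an operator is an extra assumption, not part of the assignment as posted:

```cpp
#include <cctype>
#include <string>
#include <vector>

// Hypothetical check: a token counts as an operand if it starts with a digit.
bool is_operand_token(const std::string& token)
{
    return !token.empty() &&
           std::isdigit(static_cast<unsigned char>(token[0]));
}

// Returns true if tokens strictly alternate operand, operator, operand, ...
// starting with an operand. This rejects "2 + 3 4" as well as "1++2" and "+1".
// Assumption: an empty expression and a trailing operator ("1+") are also
// rejected, which goes beyond what the assignment asks for.
bool operands_and_operators_alternate(const std::vector<std::string>& tokens)
{
    bool expect_operand = true;
    for (std::vector<std::string>::size_type i = 0; i < tokens.size(); ++i)
    {
        if (is_operand_token(tokens[i]) != expect_operand)
        {
            return false;
        }
        expect_operand = !expect_operand;
    }
    // After a complete expression the next thing expected is an operator.
    return !expect_operand;
}
```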

--
Tadeusz B. Kopec (tk****@NOSPAMPLEASElife.pl)
Always leave room to add an explanation if it doesn't work out.
Nov 23 '07 #3
On Fri, 23 Nov 2007 00:39:48 +0100 in comp.lang.c++, WP
<in*****@invalid.invalid> wrote,
> If no one wants to look at all that I can understand it. :-) Anyway,
> thanks for reading and I appreciate any suggestions on improvement or if
> anyone spots any bugs.
I did not spot any bugs, but if it compiles and passes the tests
then it ought to do for homework. There are many ways to approach
something like this, and I expect that many of the things I might
suggest are of no use to you; if they have not been covered in the
course so far then you are probably not supposed to use them.

Other than nitpicking, the main thing I would do differently is,
where you are parsing the input, to think of it more in terms of
token-by-token looping rather than char-by-char. Thus something
like:

string::const_iterator itr = infix.begin();
while (itr != infix.end()) {
    char ch = *itr++;
    if (!isspace(ch)) {
        // string(count, ch): the token starts as one copy of ch.
        string current(1, ch);
        if (isdigit(ch)) {
            // Append the remaining digits of a multi-digit operand.
            while (itr != infix.end() && isdigit(*itr))
                current += (ch = *itr++);
        }
        tokens.push_back(current);
    }
}
Or even more likely, a get_token() function that returns the next
token from the input, one by one. That is more likely to fit into a
larger parsing scheme. Either way, I would want to separate the
tokenizing from the other functions such as the checking for
alternating operators and operands.
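A minimal sketch of what such a get_token() might look like. The signature is an assumption (the iterator is passed by reference and advanced past the token), and an empty string is used here to signal that no tokens remain:

```cpp
#include <cctype>
#include <string>

// Hypothetical get_token(): extracts the next token from [itr, end) and
// advances itr past it. Returns an empty string when only whitespace remains.
std::string get_token(std::string::const_iterator& itr,
                      std::string::const_iterator end)
{
    // Skip leading whitespace.
    while (itr != end && std::isspace(static_cast<unsigned char>(*itr)))
    {
        ++itr;
    }
    if (itr == end)
    {
        return std::string();
    }

    char ch = *itr++;
    std::string token(1, ch);
    if (std::isdigit(static_cast<unsigned char>(ch)))
    {
        // An operand: keep consuming digits.
        while (itr != end && std::isdigit(static_cast<unsigned char>(*itr)))
        {
            token += *itr++;
        }
    }
    return token;
}
```

A caller would loop until get_token returns an empty string, pushing each token onto the vector and doing the operator/operand checks separately.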

Review the Desk Calculator example developed in chapter 6.1 of
Stroustrup _The C++ Programming Language, Third Ed._
Nov 23 '07 #4
