White space and >>

Hello all,

[ Disclaimer: I am a complete C++ newbie ]

I want to read lines from a text file, where each line has the
following syntax:

token1:token2:token3

There could be white space between tokens and ':'
There could be white space before token1 or after token3.

Because I will need to access every line several times, later in my
program, I first store every line in a string vector:

// Do you guys put the & near the type or near the parameter name?
static void read_lines(vector<string> &v)
{
    ifstream ifs(INFILE); // input file stream

    if (!ifs) // the stream tests false if the file could not be opened
    {
        cerr << "Unable to open input file " << INFILE << ".\n";
        exit(EXIT_FAILURE);
    }

    string line;

    while (getline(ifs, line))
    {
        // Ignore empty lines and comments.
        if (line.empty() || line[0] == HASH) continue;

        v.push_back(line);
    }
}

Does that part look OK?

Later on, when I am dealing with a specific line, I create a
stringstream object so I can use the >> operator.

Ideally, I would simply write:

{
    istringstream myss(mystring);
    string token1, token2, token3;

    myss >> token1;
    myss >> token2;
    myss >> token3;
}

But this doesn't work because ':' is not treated as white space. Is
there a simple solution?

Is my approach completely wrong?

Nudge

Jul 19 '05 #1
6 Replies


Grumble wrote:
Hello all,

[ Disclaimer: I am a complete C++ newbie ]

I want to read lines from a text file, where each line has the following
syntax:

token1:token2:token3

There could be white space between tokens and ':'
There could be white space before token1 or after token3.


I forgot to mention that it is valid for token1 to be empty, but it
is not valid for token2 and token3 to be empty.

I see a problem. Consider

: t2 : t3

myss >> token1;
myss >> token2;
myss >> token3;

If the >> operator considers ':' to be white space, then I will end
up with token1 = "t2" which is not what I want...

On the other hand, consider

t1:t2:t3

If ':' is not treated as white space, or perhaps some kind of
special delimiter, then I will end up with token1="t1:t2:t3" which
is wrong too...

Errr, how can I get the "ignore white space" behavior, along with
the "split at the delimiter" behavior together?

Nudge

Jul 19 '05 #2



Grumble wrote:

If ':' is not treated as white space, or perhaps some kind of
special delimiter, then I will end up with token1="t1:t2:t3" which
is wrong too...

Errr, how can I get the "ignore white space" behavior, along with
the "split at the delimiter" behavior together?


I think you are barking up the wrong tree.

Take your string.

Locate the 2 ':' characters.

Split the string into 3 separate strings using the ':' positions
you have determined earlier.

You now have 3 strings, each one containing maybe some
leading whitespace, the token, maybe some trailing whitespace.

Get rid of leading and trailing whitespace in each string
and you are left with the tokens alone.

Not every problem is worth solving with clever uses of streams.
Sometimes simple string manipulation is simpler.
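
The steps above could be sketched roughly as follows. This is only a sketch: the names trim and split_fields are made up for illustration, and the white space set (space, tab, CR, LF) is an assumption about the input file.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: strip leading and trailing white space.
std::string trim(const std::string& s)
{
    const char* const ws = " \t\r\n";   // assumed white space characters
    const std::size_t first = s.find_first_not_of(ws);
    if (first == std::string::npos)
        return std::string();           // empty or all white space
    const std::size_t last = s.find_last_not_of(ws);
    return s.substr(first, last - first + 1);
}

// Hypothetical helper: split "token1:token2:token3" at the first two colons
// and trim each piece. Returns false if the line has fewer than two colons.
bool split_fields(const std::string& line,
                  std::string& t1, std::string& t2, std::string& t3)
{
    const std::size_t c1 = line.find(':');
    if (c1 == std::string::npos) return false;
    const std::size_t c2 = line.find(':', c1 + 1);
    if (c2 == std::string::npos) return false;

    t1 = trim(line.substr(0, c1));
    t2 = trim(line.substr(c1 + 1, c2 - c1 - 1));
    t3 = trim(line.substr(c2 + 1));      // everything after the second colon
    return true;
}
```

Called on "  : t2 :  t3  ", split_fields leaves t1 empty and sets t2 to "t2" and t3 to "t3", so an empty first token is not confused with the second one. Because trim uses find_last_not_of, white space inside a token is kept.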

--
Karl Heinz Buchegger
kb******@gascad.at
Jul 19 '05 #3

Karl Heinz Buchegger wrote:

Grumble wrote:
If ':' is not treated as white space, or perhaps some kind of
special delimiter, then I will end up with token1="t1:t2:t3" which
is wrong too...

Errr, how can I get the "ignore white space" behavior, along with
the "split at the delimiter" behavior together?

I think you are barking up the wrong tree.

Take your string.

Locate the 2 ':' characters.

Split the string into 3 separate strings using the ':' positions
you have determined earlier.

You now have 3 strings, each one containing maybe some
leading whitespace, the token, maybe some trailing whitespace.

Get rid of leading and trailing whitespace in each string
and you are left with the tokens alone.

Not every problem is worth solving with clever uses of streams.
Sometimes simple string manipulation is simpler.


How disappointing :-)

What you describe is what I have done, but I was hoping for a shorter
solution (in terms of lines of code).

void extract_field(string &field, string &line, size_t lpos, size_t rpos)
{
    string temp = line.substr(lpos, rpos-lpos);

    lpos = temp.find_first_not_of(WHITESPACE);
    rpos = temp.find_first_of(WHITESPACE, lpos);

    if (lpos == string::npos) // temp contains only white space.
    {
        field.erase();
    }
    else
    {
        field = temp.substr(lpos, rpos-lpos);
    }
}

{
    string opt_name, opt_type, opt_val;

    size_t lpos = 0, rpos; // left and right position.

    // Extract option name from line and strip white space.
    rpos = line.find_first_of(COLON, lpos);
    extract_field(opt_name, line, lpos, rpos);
    lpos = rpos+1;

    // Extract option type from line and strip white space.
    rpos = line.find_first_of(COLON, lpos);
    extract_field(opt_type, line, lpos, rpos);
    lpos = rpos+1;

    // Extract option value list from line.
    opt_val = line.substr(lpos);
}

IMO, the above is far less elegant than:

myss >> opt_name;
myss >> opt_type;
myss >> opt_val;
// modulo error handling of course

I might use getline() to split my line into 3 strings... then use an
istringstream to strip leading and trailing white space...

I have a related question: at some point I have a string, and I want
to concatenate an int at the end.

string s("toto");
int n=7;

s = s + n; // It would be nice if this resulted in s = "toto7" :-)

Am I supposed to use C's sprintf? A stringstream?

Nudge

Jul 19 '05 #4

Hi Grumble,

"Grumble" <in*****@kma.eu.org> wrote in message
news:bp**********@news-rocq.inria.fr...
How disappointing :-)

I was hoping for a shorter
solution (in terms of lines of code).


You could take an intensive look at the C++ stream library. There are ways
to do it, if you really want to. ;-)

If it's not a life-or-death matter of doing it in an object-oriented way, or if
you want to be short, reading a single text line using "cin" followed by a
sscanf() on the input buffer might be shorter than writing classes for
sorting out stream input.

The best solution would be using a class for regular expressions (perhaps
with streams support).

Your problem could be parsed by a regular expression like
"/\w+[ \t]*:[ \t]*\w+[ \t]*:[ \t]*\w+/", which means "one or more word
characters followed by zero or more blank or tab characters, followed by
a colon, followed by ... etc."

Languages like Perl or PHP have regular expression support on language or
library level, and I'm sure there's a regexp library for C++ as well. :-)
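
As a rough sketch of the regular expression idea, here is how it might look with std::regex, which became part of the standard library with C++11 (so this assumes a C++11 compiler). The function name match_fields is made up, and the pattern is adapted from the one above: the first group uses \w* rather than \w+ because an empty token1 was said to be valid, and \s replaces [ \t].

```cpp
#include <regex>
#include <string>

// Hypothetical helper: match "token1 : token2 : token3", ignoring white
// space around the tokens. token1 may be empty; token2 and token3 may not.
// Tokens are assumed to consist of word characters only.
bool match_fields(const std::string& line,
                  std::string& t1, std::string& t2, std::string& t3)
{
    static const std::regex re(R"(\s*(\w*)\s*:\s*(\w+)\s*:\s*(\w+)\s*)");
    std::smatch m;
    if (!std::regex_match(line, m, re))  // the whole line must match
        return false;
    t1 = m[1].str();
    t2 = m[2].str();
    t3 = m[3].str();
    return true;
}
```

With this pattern " : t2 : t3" matches with an empty t1, while "t1::t3" is rejected because the second group needs at least one character.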

I hope that helps.

Regards,
Ekkehard Morgenstern.
Jul 19 '05 #5

In article <bp**********@news-rocq.inria.fr>,
Grumble <in*****@kma.eu.org> wrote:

I want to read lines from a text file, where each line has the
following syntax:

token1:token2:token3
[snip code that reads the file into a vector of strings, one line per
string]

Does that part look OK?

Looks OK to me.

Later on, when I am dealing with a specific line, I create a
stringstream object so I can use the >> operator.

Ideally, I would simply write:

{
    istringstream myss(mystring);
    string token1, token2, token3;

    myss >> token1;
    myss >> token2;
    myss >> token3;
}

But this doesn't work because ':' is not treated as white space. Is
there a simple solution?


Use getline() on myss, and tell it to use ':' as the separator, where
appropriate.

getline (myss, token1, ':');
getline (myss, token2, ':');
getline (myss, token3);

The tokens you pick up will also include whatever whitespace happens to
lie in between the colons.
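
One way to deal with that leftover white space, sketched under some assumptions: run each raw piece through a second istringstream and let >> skip the white space. The names strip and read_fields are made up, and this only works if a token never contains white space itself, since >> stops at the first white space after the token.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: return the first white-space-delimited word of raw,
// or an empty string if raw is empty or holds only white space.
std::string strip(const std::string& raw)
{
    std::istringstream iss(raw);
    std::string word;
    iss >> word;   // skips leading white space, stops at the next white space
    return word;
}

// Hypothetical helper: split at ':' with getline(), then strip each piece.
// Returns false if a colon is missing or if token2 or token3 is empty.
bool read_fields(const std::string& line,
                 std::string& t1, std::string& t2, std::string& t3)
{
    std::istringstream myss(line);
    std::string raw1, raw2, raw3;

    if (!std::getline(myss, raw1, ':')) return false;
    if (!std::getline(myss, raw2, ':')) return false;
    if (!std::getline(myss, raw3))      return false;

    t1 = strip(raw1);   // may legitimately be empty
    t2 = strip(raw2);
    t3 = strip(raw3);
    return !t2.empty() && !t3.empty();
}
```

For " : t2 : t3 " this gives an empty t1, "t2" and "t3". A line such as "t1:t2" fails at the third getline() because nothing is left to read.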

--
Jon Bell <jt*******@presby.edu> Presbyterian College
Dept. of Physics and Computer Science Clinton, South Carolina USA
Jul 19 '05 #6



Grumble wrote:

Karl Heinz Buchegger wrote:

Grumble wrote:
If ':' is not treated as white space, or perhaps some kind of
special delimiter, then I will end up with token1="t1:t2:t3" which
is wrong too...

Errr, how can I get the "ignore white space" behavior, along with
the "split at the delimiter" behavior together?

I think you are barking up the wrong tree.

Take your string.

Locate the 2 ':' characters.

Split the string into 3 separate strings using the ':' positions
you have determined earlier.

You now have 3 strings, each one containing maybe some
leading whitespace, the token, maybe some trailing whitespace.

Get rid of leading and trailing whitespace in each string
and you are left with the tokens alone.

Not every problem is worth solving with clever uses of streams.
Sometimes simple string manipulation is simpler.


How disappointing :-)


Depends :-)

What you describe is what I have done, but I was hoping for a shorter
solution (in terms of lines of code).

void extract_field(string &field, string &line, size_t lpos, size_t rpos)
{
    string temp = line.substr(lpos, rpos-lpos);

    lpos = temp.find_first_not_of(WHITESPACE);
    rpos = temp.find_first_of(WHITESPACE, lpos);

    if (lpos == string::npos) // temp contains only white space.
    {
        field.erase();
    }
    else
    {
        field = temp.substr(lpos, rpos-lpos);
    }
}

I would refactor the above into 2 functions:

A function TrimWhitespace
and a function ExtractField (which uses TrimWhitespace)

The reason?
A function for trimming a string is a good thing to have in your
toolbox and will come in handy hundreds of times.

And the function has gotten shorter and your toolbox has grown
by one additional function :-)

[snip]

I might use getline() to split my line into 3 strings...

OK

then use an
istringstream to strip leading and trailing white space...

Or use your new function TrimWhitespace() from your personal
toolbox :-)
A good programmer has collected a bag of little helper functions
like this one over the years.

I have a related question: at some point I have a string, and I want
to concatenate an int at the end.

string s("toto");
int n=7;

s = s + n; // It would be nice if this resulted in s = "toto7" :-)

Am I supposed to use C's sprintf? A stringstream?


stringstream.
You might also look at Boost for its lexical_cast.
www.boost.org
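
A minimal sketch of the stringstream route; the helper name append_int is made up for illustration.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: return s followed by the decimal text of n.
std::string append_int(const std::string& s, int n)
{
    std::ostringstream oss;
    oss << s << n;      // operator<< formats the int as text
    return oss.str();
}
```

So append_int("toto", 7) yields "toto7". With a C++11 compiler, s += std::to_string(n) does the same job without a stream, and boost::lexical_cast<std::string>(n) is the Boost equivalent.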

--
Karl Heinz Buchegger
kb******@gascad.at
Jul 19 '05 #7
