
specifying istream separator sequence

Is there any way to specify an istream separator sequence? For example,
suppose I have a record consisting of a list of comma-separated values (no
whitespace). I want to set the istream up so that formatted input operations
using operator>> will recognize the comma as a field separator. I know how
to do this for an output stream using an ostream_iterator, but there doesn't
seem to be an analogous way to handle it using istream_iterator.

I have looked through a bunch of STL docs online, but none of them seem to
cover this particular case. Any help would be appreciated.

TIA,

Dave Moore
Jul 23 '05 #1
8 Replies


Dave Moore wrote:
Is there any way to specify an istream separator sequence?


Summary: Sequence: no. Individual characters: yes.

Longer version: 'std::istream' determines its notion of whitespace
using the 'std::ctype<char>' facets (in general,
'std::basic_istream<cT>'
uses 'std::ctype<cT>') obtained from the 'std::locale' associated with
the stream. You can replace this locale using the 'imbue()' member
function and you can create a 'std::ctype<char>' facet which considers
comma as whitespace.

Unfortunately, the details of how to provide a new ctype facet differ
between character types due to a standardized optimization for 'char':
for 'std::ctype<char>' you effectively install a table with character
classifications while other character types would use a virtual
function. Here is an example I posted a while ago (and located quickly
with google):
<http://www.talkaboutprogramming.com/group/comp.lang.c++/messages/740314.html>
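
As a rough illustration of the table-based 'std::ctype<char>' approach described above (this is not the example from the linked post), a facet can copy the classic table and add the space bit for ','. The names 'comma_is_space' and 'read_fields_ws' are made up for this sketch:

```cpp
#include <locale>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical facet: a ctype<char> whose table also classifies ',' as whitespace.
class comma_is_space : public std::ctype<char> {
public:
    comma_is_space() : std::ctype<char>(make_table()) {}

private:
    static const mask* make_table() {
        // Start from the classic "C" locale table, then add the space bit for ','.
        static std::vector<mask> table(classic_table(),
                                       classic_table() + table_size);
        table[static_cast<unsigned char>(',')] |= space;
        return &table[0];
    }
};

// Hypothetical helper: read the fields of one record with operator>>.
std::vector<std::string> read_fields_ws(const std::string& record) {
    std::istringstream in(record);
    // The locale takes ownership of the facet and deletes it when no longer used.
    in.imbue(std::locale(in.getloc(), new comma_is_space));
    std::vector<std::string> fields;
    for (std::string field; in >> field; )
        fields.push_back(field);
    return fields;
}
```

With this facet, read_fields_ws("a,b,,c") yields the three fields "a", "b" and "c": consecutive commas collapse just like consecutive blanks, so empty fields are lost. Ordinary blanks still count as whitespace as well.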
--
<mailto:di***********@yahoo.com> <http://www.dietmar-kuehl.de/>
<http://www.contendix.com> - Software Development & Consulting

Jul 23 '05 #2

Dave Moore wrote:

Is there any way to specify an istream separator sequence? For example,
suppose I have a record consisting of a list of comma-separated values (no
whitespace). I want to set the istream up so that formatted input operations
using operator>> will recognize the comma as a field separator.


You can do that. However, it is much simpler to use getline() for that
task, because you can tell getline() what to use as the 'end of
record' marker. The default is '\n', but it can be changed, and
there is no reason why you shouldn't use ',' as the record separator.
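
A minimal sketch of that idea, assuming the record is already held in a string; the helper name 'split_on_comma' is made up for illustration:

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: split one record into fields, using ',' as the
// getline() delimiter instead of the default '\n'.
std::vector<std::string> split_on_comma(const std::string& record) {
    std::istringstream in(record);
    std::vector<std::string> fields;
    std::string field;
    while (std::getline(in, field, ','))
        fields.push_back(field);   // an empty field between two commas is kept
    return fields;
}
```

Unlike the whitespace approach, "a,,c" comes back as three fields with an empty one in the middle. One caveat: an empty field after a trailing comma (as in "a,b,") is not reported, because getline() fails once it reaches the end of input without extracting anything.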
--
Karl Heinz Buchegger
kb******@gascad.at
Jul 23 '05 #3


"Dietmar Kuehl" <di***********@yahoo.com> wrote in message
news:11**********************@l41g2000cwc.googlegroups.com...
Dave Moore wrote:
Is there any way to specify an istream separator sequence?


Summary: Sequence: no. Individual characters: yes.

Longer version: 'std::istream' determines its notion of whitespace
using the 'std::ctype<char>' facets (in general,
'std::basic_istream<cT>'
uses 'std::ctype<cT>') obtained from the 'std::locale' associated with
the stream. You can replace this locale using the 'imbue()' member
function and you can create a 'std::ctype<char>' facet which considers
comma as whitespace.


Well, I don't really want whitespace semantics unfortunately, since I need
to keep track of empty fields in order to preserve column alignment. I was
already aware of the facet approach, although I did not know about the other
details you mentioned concerning different char types. Thanks for pointing
those out.

I guess I will have to "roll my own" iterator to handle this, or perhaps I
can pre-process the files using awk. However I think this latter approach
doesn't get rid of the column alignment issue, since I will still need to
read them in to my C++ program at some point.

Dave Moore
Jul 23 '05 #4


"Karl Heinz Buchegger" <kb******@gascad.at> wrote in message
news:42***************@gascad.at...
Dave Moore wrote:

Is there any way to specify an istream separator sequence? For example,
suppose I have a record consisting of a list of comma-separated values (no whitespace). I want to set the istream up so that formatted input operations using operator>> will recognize the comma as a field separator.


You can do that. However, it is much simpler to use getline() for that
task, because you can tell getline() what to use as the 'end of
record' marker. The default is '\n', but it can be changed, and
there is no reason why you shouldn't use ',' as the record separator.


Well, I am already using getline to extract the data from an ifstream, but I
guess there is no reason I couldn't use it again with a different terminator
to extract the fields from an istringstream. Thanks for the suggestion.

For the sake of completeness though, and in case getline doesn't work for
some reason, can you specify an alternate approach? If it is to use facet
to change the definition of whitespace, then that won't work for me, as I
already explained in my reply to Dietmar. However, if you have yet another
approach, I'm all ears (or should it be "all eyes" for an NG? 8*).

Thanks,

Dave Moore
Jul 23 '05 #5

Dave Moore wrote:
Well, I don't really want whitespace semantics unfortunately, since I need to keep track of empty fields in order to preserve column alignment.

Why is this a problem? Turn off automatic whitespace skipping and
skip individual whitespace characters, i.e. commas, explicitly.
BTW, note that the whitespace technique does not necessarily work
if you try to use data types other than strings: a comma
character could, for example, be used as a thousands separator for
floating point values. That is, you would read in strings. Also note that
the 'getline()' approach that was also proposed does not necessarily do
the right thing if you use multiple different separators, e.g. commas
to separate fields and newlines to separate, well, lines. The
whitespace approach can cause reading of strings to stop at
multiple different characters: just mark each character which may
separate values as whitespace.

I guess I will have to "roll my own" iterator to handle this,


Assuming you have an appropriate locale set up, I think it is
fairly easy to read fields with skipping turned off and explicit
skipping of "whitespace" characters using a simple loop:

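| std::istream& skip(std::istream& in) {
|     char c = 0;
|     // Consume exactly one separator; any other character is a format error.
|     if (in >> c && c != ',' && c != '\n')
|         in.setstate(std::ios_base::failbit);
|     return in;
| }
....
| in >> std::noskipws;
| // Note: operator>> for std::string sets failbit when it extracts no
| // characters, so an empty field ends this loop.
| for (std::string field; in >> field; in >> skip)
|     process(field);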
--
<mailto:di***********@yahoo.com> <http://www.dietmar-kuehl.de/>
<http://www.contendix.com> - Software Development & Consulting

Jul 23 '05 #6

Dietmar Kuehl wrote:
[skip] point values. That is, you would read in strings. Also note that
the also proposed 'getline()' approach does not necessarily do the
right thing if you use multiple different separators, e.g. commas
to separate fields and newlines to separate, well, lines.


[to OP]
I have consistently found it easiest to first
read in a whole line as a string (using '\n' as the delimiter). Once
the line is in memory, techniques to parse that line are easy to apply
(such as using getline with ',' as the terminator). In the long
run this has always turned out to be the simplest strategy if the
file format is line based and errors in the file have to be expected.
Parsing directly from the file while dealing correctly with line breaks
and separators quickly gets messy once file format errors have to be handled.
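
A sketch of this line-first strategy, under the assumption that every valid line has a known number of columns. The names 'Row' and 'read_table', and the idea of collecting bad line numbers in a vector, are illustrative choices rather than anything prescribed in this thread:

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

typedef std::vector<std::string> Row;

// Hypothetical helper: read `in` line by line, then split each line on ','.
// Lines whose field count differs from `expected` are skipped and their
// 1-based line numbers are appended to `bad_lines`.
std::vector<Row> read_table(std::istream& in, std::size_t expected,
                            std::vector<std::size_t>& bad_lines) {
    std::vector<Row> rows;
    std::string line;
    for (std::size_t line_no = 1; std::getline(in, line); ++line_no) {
        std::istringstream fields(line);   // parse the line that is now in memory
        Row row;
        for (std::string field; std::getline(fields, field, ','); )
            row.push_back(field);
        if (row.size() == expected)
            rows.push_back(row);
        else
            bad_lines.push_back(line_no);  // malformed line: remember where it was
    }
    return rows;
}
```

Because each line is parsed on its own, a malformed line cannot throw off the parsing of the lines after it. Note that a line ending in a comma yields one field fewer than the commas suggest, since the inner getline() does not report an empty last field.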

--
Karl Heinz Buchegger
kb******@gascad.at
Jul 23 '05 #7

Dave Moore wrote:

"Karl Heinz Buchegger" <kb******@gascad.at> wrote in message
news:42***************@gascad.at...
Dave Moore wrote:

Is there any way to specify an istream separator sequence? For example,
suppose I have a record consisting of a list of comma-separated values (no whitespace). I want to set the istream up so that formatted input operations using operator>> will recognize the comma as a field separator.


You can do that. However, it is much simpler to use getline() for that
task, because you can tell getline() what to use as the 'end of
record' marker. The default is '\n', but it can be changed, and
there is no reason why you shouldn't use ',' as the record separator.


Well, I am already using getline to extract the data from an ifstream, but I
guess there is no reason I couldn't use it again with a different terminator
to extract the fields from an istringstream. Thanks for the suggestion.


That's exactly what I would be heading for.

For the sake of completeness though, and in case getline doesn't work for
some reason, can you specify an alternate approach? If it is to use facet
to change the definition of whitespace, then that won't work for me, as I
already explained in my reply to Dietmar.

And Dietmar already showed what to do about it.

However, if you have yet another
approach, I'm all ears (or should it be "all eyes" for an NG? 8*).


Well. If the line is already in memory as a string, you can always search
for ',' and extract substrings from the line (doing it the hard way).
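
A minimal sketch of that "hard way", using only std::string::find and substr; the helper name 'split_by_find' is made up for illustration:

```cpp
#include <string>
#include <vector>

// Hypothetical helper: split `line` at every occurrence of `sep`,
// keeping empty fields (including one after a trailing separator).
std::vector<std::string> split_by_find(const std::string& line, char sep = ',') {
    std::vector<std::string> fields;
    std::string::size_type start = 0;
    for (;;) {
        std::string::size_type pos = line.find(sep, start);
        if (pos == std::string::npos) {
            fields.push_back(line.substr(start));   // last field, possibly empty
            break;
        }
        fields.push_back(line.substr(start, pos - start));
        start = pos + 1;                            // continue after the separator
    }
    return fields;
}
```

This version always returns one more field than there are separators, so "a,,c," gives four fields, two of them empty, which may suit the column-alignment requirement better than the getline() loop.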

--
Karl Heinz Buchegger
kb******@gascad.at
Jul 23 '05 #8

On Thu, 10 Feb 2005 14:18:12 +0100 in comp.lang.c++, "Dave Moore"
<dt*****@email.unc.edu> wrote,
Well, I am already using getline to extract the data from an ifstream, but I
guess there is no reason I couldn't use it again with a different terminator
to extract the fields from an istringstream. Thanks for the suggestion.

For the sake of completeness though, and in case getline doesn't work for
some reason, can you specify an alternate approach?


I do not like all the work of constructing an istringstream if all you
are using it for is tokenizing. A simple splitter example is at
http://groups.google.com/gr*********....earthlink.net
Jul 23 '05 #9
