specifying istream separator sequence

Is there any way to specify an istream separator sequence? For example,
suppose I have a record consisting of a list of comma-separated values (no
whitespace). I want to set the istream up so that formatted input operations
using operator>> will recognize the comma as a field separator. I know how
to do this for an output stream using an ostream_iterator, but there doesn't
seem to be an analogous way to handle it using istream_iterator.

I have looked through a bunch of STL docs online, but none of them seem to
cover this particular case. Any help would be appreciated.

TIA,

Dave Moore
Jul 23 '05 #1
Dave Moore wrote:
Is there any way to specify an istream separator sequence?


Summary: Sequence: no. Individual characters: yes.

Longer version: 'std::istream' determines its notion of whitespace
using the 'std::ctype<char>' facet (in general, 'std::basic_istream<cT>'
uses 'std::ctype<cT>') obtained from the 'std::locale' associated with
the stream. You can replace this locale using the 'imbue()' member
function and you can create a 'std::ctype<char>' facet which considers
comma as whitespace.

Unfortunately, the details of how to provide a new ctype facet differ
between character types due to a standardized optimization for 'char':
for 'std::ctype<char>' you effectively install a table with character
classifications while other character types would use a virtual
function. Here is an example I posted a while ago (and located quickly
with google):
<http://www.talkaboutprogramming.com/group/comp.lang.c++/messages/740314.html>
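
Since the linked example may no longer be reachable, here is a minimal sketch of the table-based approach described above. It is not the code from that post: the names comma_is_space and read_fields are made up for illustration, and it assumes the classic "C" classification table is an acceptable starting point.

```cpp
#include <istream>
#include <locale>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical facet name: a ctype<char> whose table also marks ',' as whitespace.
class comma_is_space : public std::ctype<char> {
public:
    comma_is_space() : std::ctype<char>(make_table()) {}

private:
    static const mask* make_table() {
        // Copy the classic "C" table once, then add the space bit for ','.
        static std::vector<mask> table(classic_table(),
                                       classic_table() + table_size);
        table[static_cast<unsigned char>(',')] |= space;
        return table.data();
    }
};

// Hypothetical helper: read every field of one record with operator>>.
std::vector<std::string> read_fields(const std::string& record) {
    std::istringstream in(record);
    // The locale takes ownership of the facet and deletes it when done.
    in.imbue(std::locale(in.getloc(), new comma_is_space));
    std::vector<std::string> fields;
    for (std::string field; in >> field; )
        fields.push_back(field);
    return fields;
}
```

With ordinary whitespace skipping left on, consecutive commas are skipped together, so read_fields("a,b,,c") yields three fields and the empty one disappears.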
--
<mailto:di***********@yahoo.com> <http://www.dietmar-kuehl.de/>
<http://www.contendix.com> - Software Development & Consulting

Jul 23 '05 #2
Dave Moore wrote:

Is there any way to specify an istream separator sequence? For example,
suppose I have a record consisting of a list of comma-separated values (no
whitespace). I want to set the istream up so that formatted input operations
using operator>> will recognize the comma as a field separator.


You can do that. However, it is much simpler to use getline() for that
task, because you can tell getline() what to use as the 'end of
record' marker. The default is '\n', but this is changeable, and
there is no reason why you shouldn't use ',' as the record separator.
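
A minimal sketch of that suggestion, assuming the whole stream holds a single record; split_record is a made-up name used only for illustration.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: read comma-terminated fields until the stream ends.
std::vector<std::string> split_record(std::istream& in) {
    std::vector<std::string> fields;
    // The third argument replaces '\n' as the delimiter.
    for (std::string field; std::getline(in, field, ','); )
        fields.push_back(field);
    return fields;
}
```

Unlike operator>>, getline() keeps empty fields: a stream holding "a,,c" produces three fields, the middle one empty. Two caveats: a trailing comma does not produce a final empty field, and a newline is treated as ordinary field content here.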
--
Karl Heinz Buchegger
kb******@gascad.at
Jul 23 '05 #3

"Dietmar Kuehl" <di***********@yahoo.com> wrote in message
news:11**********************@l41g2000cwc.googlegroups.com...
Dave Moore wrote:
Is there any way to specify an istream separator sequence?


Summary: Sequence: no. Individual characters: yes.

Longer version: 'std::istream' determines its notion of whitespace
using the 'std::ctype<char>' facet (in general, 'std::basic_istream<cT>'
uses 'std::ctype<cT>') obtained from the 'std::locale' associated with
the stream. You can replace this locale using the 'imbue()' member
function and you can create a 'std::ctype<char>' facet which considers
comma as whitespace.


Well, I don't really want whitespace semantics unfortunately, since I need
to keep track of empty fields in order to preserve column alignment. I was
already aware of the facet approach, although I did not know about the other
details you mentioned concerning different char types. Thanks for pointing
those out.

I guess I will have to "roll my own" iterator to handle this, or perhaps I
can pre-process the files using awk. However I think this latter approach
doesn't get rid of the column alignment issue, since I will still need to
read them in to my C++ program at some point.

Dave Moore
Jul 23 '05 #4

"Karl Heinz Buchegger" <kb******@gascad.at> wrote in message
news:42***************@gascad.at...
Dave Moore wrote:

Is there any way to specify an istream separator sequence? For example,
suppose I have a record consisting of a list of comma-separated values (no whitespace). I want to set the istream up so that formatted input operations using operator>> will recognize the comma as a field separator.


You can do that. However, it is much simpler to use getline() for that
task, because you can tell getline() what to use as the 'end of
record' marker. The default is '\n', but this is changeable, and
there is no reason why you shouldn't use ',' as the record separator.


Well, I am already using getline to extract the data from an ifstream, but I
guess there is no reason I couldn't use it again with a different terminator
to extract the fields from an istringstream. Thanks for the suggestion.

For the sake of completeness though, and in case getline doesn't work for
some reason, can you specify an alternate approach? If it is to use facet
to change the definition of whitespace, then that won't work for me, as I
already explained in my reply to Dietmar. However, if you have yet another
approach, I'm all ears (or should it be "all eyes" for an NG? 8*).

Thanks,

Dave Moore
Jul 23 '05 #5
Dave Moore wrote:
Well, I don't really want whitespace semantics unfortunately, since I need to keep track of empty fields in order to preserve column alignment.

Why is this a problem? Turn off automatic whitespace skipping and
skip individual whitespace characters, i.e. commas, explicitly.

BTW, note that the whitespace technique does not necessarily work
if you try to use data types other than strings: a comma
character could e.g. be used as a thousands separator for floating
point values. That is, you would read in strings. Also note that
the also proposed 'getline()' approach does not necessarily do the
right thing if you use multiple different separators, e.g. commas
to separate fields and newlines to separate, well, lines. The
whitespace approach can cause reading of strings to stop at
multiple different characters: just mark each character which may
separate values as whitespace.

I guess I will have to "roll my own" iterator to handle this,


Assuming you have an appropriate locale set up, I think it is
fairly easy to read fields with skipping turned off and explicit
skipping of "whitespace" characters using a simple loop:

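| // Consume exactly one separator; anything else is a format error.
| std::istream& skip(std::istream& in) {
|   char c = 0;
|   if (in >> c && c != ',' && c != '\n')
|     in.setstate(std::ios_base::failbit);
|   return in;
| }
....
| // Skipping is off, so each separator has to be consumed explicitly.
| in >> std::noskipws;
| for (std::string field; in >> field; in >> skip)
|   process(field);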
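
For reference, a self-contained variant of this loop might look as follows. It is only a sketch: the facet field_separators and the function read_fields_noskip are invented names, the "appropriate locale" is assumed to be the classic table plus ',' marked as whitespace, and the separator test uses && so that only ',' and '\n' are accepted.

```cpp
#include <istream>
#include <locale>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical facet: the classic table plus ',' classified as whitespace.
struct field_separators : std::ctype<char> {
    field_separators() : std::ctype<char>(make_table()) {}
    static const mask* make_table() {
        static std::vector<mask> t(classic_table(),
                                   classic_table() + table_size);
        t[static_cast<unsigned char>(',')] |= space;
        return t.data();
    }
};

// Consume exactly one separator; any other character sets failbit.
std::istream& skip(std::istream& in) {
    char c = 0;
    if (in >> c && c != ',' && c != '\n')
        in.setstate(std::ios_base::failbit);
    return in;
}

// Hypothetical driver: collect fields with automatic skipping turned off.
std::vector<std::string> read_fields_noskip(const std::string& text) {
    std::istringstream in(text);
    in.imbue(std::locale(in.getloc(), new field_separators));
    in >> std::noskipws;
    std::vector<std::string> fields;
    for (std::string field; in >> field; in >> skip)
        fields.push_back(field);
    return fields;
}
```

One caveat worth knowing: extracting a std::string with operator>> sets failbit when no characters are extracted, so with this loop as written an empty field (two separators in a row) ends the loop rather than yielding an empty string. Preserving empty fields would need extra handling, for example clearing the stream state after a failed extraction.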
--
<mailto:di***********@yahoo.com> <http://www.dietmar-kuehl.de/>
<http://www.contendix.com> - Software Development & Consulting

Jul 23 '05 #6
Dietmar Kuehl wrote:
[skip] point values. That is, you would read in strings. Also note that
the also proposed 'getline()' approach does not necessarily do the
right thing if you use multiple different separators, e.g. commas
to separate fields and newlines to separate, well, lines.


[to OP]
I have consistently found it easiest to first
read in a whole line as a string (using '\n' as the delimiter). Once
the line is in memory, techniques to parse that line are easily
applied (such as using getline with ',' as the terminator). In the long
run this has always turned out to be the simplest strategy if the
file format is line based and errors in the file have to be expected.
Parsing directly from the file while dealing correctly with line breaks
and separators quickly gets messy once file format errors have to be handled.
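
A minimal sketch of this two-stage strategy, assuming a plain comma-separated format without quoting; Row and read_table are illustrative names only.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

typedef std::vector<std::string> Row;

// Hypothetical helper: one Row per input line, one string per field.
std::vector<Row> read_table(std::istream& in) {
    std::vector<Row> rows;
    // Stage 1: pull a complete line into memory.
    for (std::string line; std::getline(in, line); ) {
        // Stage 2: split that line on ',' using a second getline().
        std::istringstream fields(line);
        Row row;
        for (std::string field; std::getline(fields, field, ','); )
            row.push_back(field);
        rows.push_back(row);
    }
    return rows;
}
```

Because each line is handled in isolation, a malformed line (for example one with the wrong number of fields) can be reported or skipped by checking row.size() without disturbing the position of the outer stream.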

--
Karl Heinz Buchegger
kb******@gascad.at
Jul 23 '05 #7
Dave Moore wrote:

"Karl Heinz Buchegger" <kb******@gascad.at> wrote in message
news:42***************@gascad.at...
Dave Moore wrote:

Is there any way to specify an istream separator sequence? For example,
suppose I have a record consisting of a list of comma-separated values (no whitespace). I want to set the istream up so that formatted input operations using operator>> will recognize the comma as a field separator.
You can do that. However, it is much simpler to use getline() for that
task, because you can tell getline() what to use as the 'end of
record' marker. The default is '\n', but this is changeable, and
there is no reason why you shouldn't use ',' as the record separator.


Well, I am already using getline to extract the data from an ifstream, but I
guess there is no reason I couldn't use it again with a different terminator
to extract the fields from an istringstream. Thanks for the suggestion.


That's exactly what I would be heading for.

For the sake of completeness though, and in case getline doesn't work for
some reason, can you specify an alternate approach? If it is to use facet
to change the definition of whitespace, then that won't work for me, as I
already explained in my reply to Dietmar.
And Dietmar already showed how to deal with that.
However, if you have yet another
approach, I'm all ears (or should it be "all eyes" for an NG? 8*).


Well. If the line is already in memory as a string, you can always search
for ',' and extract substrings from the line (doing it the hard way).
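
A sketch of that "hard way", using only std::string::find and substr; the function name split and its default separator are assumptions made for illustration.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: split a line that is already in memory on one character.
std::vector<std::string> split(const std::string& line, char sep = ',') {
    std::vector<std::string> fields;
    std::string::size_type start = 0;
    for (;;) {
        std::string::size_type pos = line.find(sep, start);
        if (pos == std::string::npos) {
            // No further separator: the remainder is the last field.
            fields.push_back(line.substr(start));
            break;
        }
        fields.push_back(line.substr(start, pos - start));
        start = pos + 1;  // continue just past the separator
    }
    return fields;
}
```

This version keeps every empty field, including a trailing one: split("a,,c,") gives four fields, the second and fourth empty, which matters when column alignment has to be preserved.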

--
Karl Heinz Buchegger
kb******@gascad.at
Jul 23 '05 #8
On Thu, 10 Feb 2005 14:18:12 +0100 in comp.lang.c++, "Dave Moore"
<dt*****@email.unc.edu> wrote,
Well, I am already using getline to extract the data from an ifstream, but I
guess there is no reason I couldn't use it again with a different terminator
to extract the fields from an istringstream. Thanks for the suggestion.

For the sake of completeness though, and in case getline doesn't work for
some reason, can you specify an alternate approach?


I do not like all the work of constructing an istringstream if all you
are using it for is tokenizing. A simple splitter example is at
http://groups.google.com/gr*********....earthlink.net
Jul 23 '05 #9
