Is there any way to specify an istream separator sequence? For example,
suppose I have a record consisting of a list of comma-separated values (no
whitespace). I want to set the istream up so that formatted input operations
using operator>> will recognize the comma as a field separator. I know how
to do this for an output stream using an ostream_iterator, but there doesn't
seem to be an analogous way to handle it using istream_iterator.
I have looked through a bunch of STL docs online, but none of them seem to
cover this particular case. Any help would be appreciated.
TIA,
Dave Moore
Dave Moore wrote: Is there any way to specify an istream separator sequence?
Summary: Sequence: no. Individual characters: yes.
Longer version: 'std::istream' determines its notion of whitespace
using the 'std::ctype<char>' facets (in general, 'std::basic_istream<cT>'
uses 'std::ctype<cT>') obtained from the 'std::locale' associated with
the stream. You can replace this locale using the 'imbue()' member
function and you can create a 'std::ctype<char>' facet which considers
comma as whitespace.
Unfortunately, the details of how to provide a new ctype facet differ
between character types due to a standardized optimization for 'char':
for 'std::ctype<char>' you effectively install a table with character
classifications while other character types would use a virtual
function. Here is an example I posted a while ago (and located quickly
with google):
<http://www.talkaboutprogramming.com/group/comp.lang.c++/messages/740314.html>
--
<mailto:di***********@yahoo.com> <http://www.dietmar-kuehl.de/>
<http://www.contendix.com> - Software Development & Consulting
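The linked post is not reproduced here, so the following is only a minimal sketch of the table-based technique described above, not the code from that post. The names `comma_is_space` and `read_fields` are made up for illustration. It copies the classic "C" classification table, additionally marks ',' as whitespace, and imbues a string stream with the resulting facet.

```cpp
#include <locale>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical facet: a ctype<char> whose table also classifies ',' as space.
class comma_is_space : public std::ctype<char> {
public:
    comma_is_space() : std::ctype<char>(make_table()) {}

private:
    static const mask* make_table() {
        // Copy the classic table so all other characters keep their
        // usual classification, then add the space bit for ','.
        static std::vector<mask> table(classic_table(),
                                       classic_table() + table_size);
        table[static_cast<unsigned char>(',')] |= space;
        return table.data();
    }
};

// Hypothetical helper: extract the fields of one record with operator>>.
std::vector<std::string> read_fields(const std::string& record) {
    std::istringstream in(record);
    // The locale takes ownership of the facet and deletes it when done.
    in.imbue(std::locale(in.getloc(), new comma_is_space));
    std::vector<std::string> fields;
    for (std::string field; in >> field; )
        fields.push_back(field);
    return fields;
}
```

Because operator>> skips any run of leading whitespace, two adjacent commas collapse: with this sketch an empty field between them is silently dropped.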
Dave Moore wrote: Is there any way to specify an istream separator sequence? For example, suppose I have a record consisting of a list of comma-separated values (no whitespace). I want to set the istream up so that formatted input operations using operator>> will recognize the comma as a field separator.
You can do that. However it is much simpler to use getline() for that
task. The reason is, that you can tell getline what to use as 'end of
record' marker. Default is '\n', but as said: this is changeable and
there is no reason why you shouldn't use ',' as record separator.
--
Karl Heinz Buchegger kb******@gascad.at
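A minimal sketch of the getline() suggestion, assuming the record is available through some std::istream. The function name `read_comma_fields` is invented for this example.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: read ','-terminated fields until the stream ends.
std::vector<std::string> read_comma_fields(std::istream& in) {
    std::vector<std::string> fields;
    std::string field;
    // The third argument replaces the default '\n' terminator.
    while (std::getline(in, field, ','))
        fields.push_back(field);
    return fields;
}
```

Unlike the whitespace approach, an empty field between two commas comes back as an empty string. Two caveats: a trailing empty field after a final ',' is not reported, and a '\n' in the input is treated as ordinary field content.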
"Dietmar Kuehl" <di***********@yahoo.com> wrote in message
news:11**********************@l41g2000cwc.googlegroups.com... Dave Moore wrote: Is there any way to specify an istream separator sequence?
Summary: Sequence: no. Individual characters: yes.
Longer version: 'std::istream' determines its notion of whitespace using the 'std::ctype<char>' facets (in general, 'std::basic_istream<cT>' uses 'std::ctype<cT>') obtained from the 'std::locale' associated with the stream. You can replace this locale using the 'imbue()' member function and you can create a 'std::ctype<char>' facet which considers comma as whitespace.
Well, I don't really want whitespace semantics unfortunately, since I need
to keep track of empty fields in order to preserve column alignment. I was
already aware of the facet approach, although I did not know about the other
details you mentioned concerning different char types. Thanks for pointing
those out.
I guess I will have to "roll my own" iterator to handle this, or perhaps I
can pre-process the files using awk. However I think this latter approach
doesn't get rid of the column alignment issue, since I will still need to
read them in to my C++ program at some point.
Dave Moore
"Karl Heinz Buchegger" <kb******@gascad.at> wrote in message
news:42***************@gascad.at... Dave Moore wrote: Is there any way to specify an istream separator sequence? For example, suppose I have a record consisting of a list of comma-separated values
(no whitespace). I want to set the istream up so that formatted input
operations using operator>> will recognize the comma as a field separator.
You can do that. However it is much simpler to use getline() for that task. The reason is, that you can tell getline what to use as 'end of record' marker. Default is '\n', but as said: this is changeable and there is no reason why you shouldn't use ',' as record separator.
Well, I am already using getline to extract the data from an ifstream, but I
guess there is no reason I couldn't use it again with a different terminator
to extract the fields from an istringstream. Thanks for the suggestion.
For the sake of completeness though, and in case getline doesn't work for
some reason, can you specify an alternate approach? If it is to use facet
to change the definition of whitespace, then that won't work for me, as I
already explained in my reply to Dietmar. However, if you have yet another
approach, I'm all ears (or should it be "all eyes" for an NG? 8*).
Thanks,
Dave Moore
Dave Moore wrote: Well, I don't really want whitespace semantics unfortunately, since I
need to keep track of empty fields in order to preserve column alignment.
Why is this a problem? Turn off automatic whitespace skipping and
skip individual whitespace characters, i.e. commas, explicitly.
BTW, note that the whitespace technique does not necessarily work
if you try to use different data types than strings: a comma
character could e.g. be used as a thousands separator for floating
point values. That is, you would read in strings. Also note that
the 'getline()' approach proposed elsewhere does not necessarily do the
right thing if you use multiple different separators, e.g. commas
to separate fields and newlines to separate, well, lines. The
whitespace approach can cause reading of strings to stop at
multiple different characters: just mark each character which may
separate values as whitespace.
I guess I will have to "roll my own" iterator to handle this,
Assuming you have an appropriate locale set up, I think it is
fairly easy to read fields with skipping turned off and explicit
skipping of "whitespace" characters using a simple loop:
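| // Manipulator: consume exactly one separator character.
| std::istream& skip(std::istream& in) {
|   char c = 0;
|   // Anything other than ',' or '\n' is a format error.
|   if (in >> c && c != ',' && c != '\n')
|     in.setstate(std::ios_base::failbit);
|   return in;
| }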
....
| in >> std::noskipws;
| for (std::string field; in >> field; in >> skip)
| process(field);
--
<mailto:di***********@yahoo.com> <http://www.dietmar-kuehl.de/>
<http://www.contendix.com> - Software Development & Consulting
Dietmar Kuehl wrote:
[skip] point values. That is, you would read in strings. Also note that the also proposed 'getline()' approach does not necessarily do the right thing if you use multiple different separators, e.g. commas to separate fields and newlines to separate, well, lines.
[to OP]
I have consistently found it to be the easiest approach to first
read in a whole line as string (using '\n' as delimiter). Once
the line is in memory, techniques to parse that line are applied
easily (such as using getline with ',' as terminator). In the long
run this has always turned out to be the simplest strategy if the
file format is line based and errors in the file have to be expected.
Parsing directly from the file while dealing correctly with line breaks
and separators quickly gets dirty once file format errors have to be handled.
--
Karl Heinz Buchegger kb******@gascad.at
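The line-first strategy described above could be sketched roughly as follows. The names `split_line` and `read_table` and the fixed expected column count are assumptions made for this illustration. Empty fields are kept, and lines with the wrong number of columns are reported by line number rather than aborting the read.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: split one line on ',' and keep empty fields.
std::vector<std::string> split_line(const std::string& line) {
    std::vector<std::string> fields;
    std::istringstream row(line);
    std::string field;
    while (std::getline(row, field, ','))
        fields.push_back(field);
    // getline reports nothing after a final ',' (or for an empty line),
    // so the last, empty field is added by hand.
    if (line.empty() || line.back() == ',')
        fields.push_back(std::string());
    return fields;
}

// Hypothetical helper: read all lines; rows whose column count differs
// from `expected` are recorded by 1-based line number and not stored.
std::vector<std::vector<std::string>> read_table(
        std::istream& in, std::size_t expected,
        std::vector<std::size_t>& bad_lines) {
    std::vector<std::vector<std::string>> rows;
    std::string line;
    for (std::size_t n = 1; std::getline(in, line); ++n) {
        std::vector<std::string> fields = split_line(line);
        if (fields.size() == expected)
            rows.push_back(fields);
        else
            bad_lines.push_back(n);
    }
    return rows;
}
```

Since each line is parsed in isolation, a malformed line cannot shift the columns of the lines that follow it.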
Dave Moore wrote: "Karl Heinz Buchegger" <kb******@gascad.at> wrote in message news:42***************@gascad.at... Dave Moore wrote: Is there any way to specify an istream separator sequence? For example, suppose I have a record consisting of a list of comma-separated values (no whitespace). I want to set the istream up so that formatted input operations using operator>> will recognize the comma as a field separator. You can do that. However it is much simpler to use getline() for that task. The reason is, that you can tell getline what to use as 'end of record' marker. Default is '\n', but as said: this is changeable and there is no reason why you shouldn't use ',' as record separator.
Well, I am already using getline to extract the data from an ifstream, but I guess there is no reason I couldn't use it again with a different terminator to extract the fields from an istringstream. Thanks for the suggestion.
That's exactly what I would be heading for.
Dave Moore wrote: For the sake of completeness though, and in case getline doesn't work for some reason, can you specify an alternate approach? If it is to use facet to change the definition of whitespace, then that won't work for me, as I already explained in my reply to Dietmar.
And Dietmar already showed what to do about it.
Dave Moore wrote: However, if you have yet another approach, I'm all ears (or should it be "all eyes" for an NG? 8*).
Well. If the line is already in memory as a string, you can always search
for ',' and extract substrings from the line (doing it the hard way).
--
Karl Heinz Buchegger kb******@gascad.at
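Doing it "the hard way" might look something like the sketch below, which uses only std::string::find and substr. The name `split_on` is made up for this example.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: split `line` at every occurrence of `sep`.
// Empty fields, including leading and trailing ones, are preserved.
std::vector<std::string> split_on(const std::string& line, char sep = ',') {
    std::vector<std::string> fields;
    std::string::size_type start = 0;
    for (;;) {
        std::string::size_type pos = line.find(sep, start);
        if (pos == std::string::npos) {
            // No further separator: the rest of the line is the last field.
            fields.push_back(line.substr(start));
            break;
        }
        fields.push_back(line.substr(start, pos - start));
        start = pos + 1;  // continue just past the separator
    }
    return fields;
}
```

This avoids constructing a stream at all, and a line with n separators always yields exactly n + 1 fields, which keeps columns aligned.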
On Thu, 10 Feb 2005 14:18:12 +0100 in comp.lang.c++, "Dave Moore"
<dt*****@email.unc.edu> wrote, Well, I am already using getline to extract the data from an ifstream, but I guess there is no reason I couldn't use it again with a different terminator to extract the fields from an istringstream. Thanks for the suggestion.
For the sake of completeness though, and in case getline doesn't work for some reason, can you specify an alternate approach?
I do not like all the work of constructing an istringstream if all you
are using it for is tokenizing. A simple splitter example is at http://groups.google.com/gr*********....earthlink.net