std::istream slowness vs. std::fgetc

Let me preface this by saying this obviously isn't a C++ *language*
issue per se; rather probably an issue relating to quality of
implementation, unless I'm just misusing iostream...

I wrote a function to count lines in a large file that start with
a particular pattern. The file can contain large amounts of non-text
crap, which may make some lines very long (so using getline() with
a std::string isn't feasible). So I dug around looking at what
else is available in terms of the unformatted io functions and such,
and found istream::ignore, but the performance was crap compared
to an implementation using <cstdio>. (I tried it on both GNU C++
and MSVC++).

I'm kinda new to iostream, so I'm guessing (*hoping* is more like
it, because I'd rather use them for this) that I'm going about this
in the wrong way (and that that is part of why the performance is
so much worse). The other possibility is that compiler vendors
have spent more time making the C-style stdio functions fast, and
only implement iostream to be able to say they support the language
standard. Still another possibility (that I hope isn't the case)
is that iostreams are intrinsically unable to perform as well for
some reason relating to the design (lots of creation/destruction
of sentry objects maybe??).

Before anyone mentions it, this isn't on cin, so the sync_with_stdio
crap isn't the issue...

Anyway, here's the implementation I came up with using iostream.
(Btw, this is coming from memory and I'm not compiling it. So go
easy on the nitpicks if you grok the general idea).

----

// I was mildly surprised there's not something like this
// in <algorithm>, btw.
#include <cstddef>
#include <fstream>
#include <iterator>
#include <limits>
#include <string>

using namespace std;

namespace {
template<class InputIterator, class OutputIterator, class T>
void
copy_n_until(InputIterator in, InputIterator end, OutputIterator out,
             size_t limit, T until)
{
    while (limit-- && in != end && *in != until)
        *out++ = *in++;
}

int
count_special_lines(istream &in)
{
    int cnt = 0;

    do {
        // Ok, I wanted to use istream::get(char_type *, size_type),
        // but unfortunately it sets failbit if it stores no characters
        // into the output array. Which kinda sucks for me, because
        // adjacent newlines are not a real failure (and how can I tell
        // this apart from a propagated failure on the low-level read()
        // or ReadFile() or whatever system-specific function?).
        // Maybe I'm not understanding something about the state bits...
        //
        // istream::getline also didn't seem so nice because it fails
        // when it stores the max chars you told it it was allowed to
        // read, but I'm deliberately limiting that number so I don't
        // have to read a whole (potentially large) line into core.
        //
        // This iterator version seemed to be the best way to avoid
        // having to screw with failure states that aren't real I/O
        // failures, and its syntax is pretty nice. I'm not so sure
        // about the performance implications, however. I'd note,
        // though, that this certainly isn't the slowest part; just
        // running a loop on the ignore is ridiculously slow...
        //
        string buf;
        istreambuf_iterator<char> it(in), end;
        copy_n_until(it, end, back_inserter(buf), 5, '\n');
        if (buf == "Magic")
            cnt++;
    } while (in.ignore(numeric_limits<streamsize>::max(), '\n'));

    return cnt;
}

}

int
count_file(const string &file)
{
    // A temporary can't bind to istream&, so the stream gets a name.
    ifstream in(file.c_str(), ios::binary | ios::in);
    return count_special_lines(in);
}

----

The above approach worked, but it goes way slow for what it is doing
(essentially nothing). Using cstdio runs *many* times faster:

----

#include <cstdio>
#include <cstring>
#include <string>

using namespace std;

int
count_file(const string &file)
{
    FILE *fp;

    if (!(fp = fopen(file.c_str(), "r")))
        return 0;

    int cnt = 0;
    for (;;) {
        // Room for the 5-character pattern plus the terminating NUL.
        char buf[6];
        if (!fgets(buf, sizeof buf, fp))
            break;
        if (!strcmp(buf, "Magic"))
            cnt++;

        // If we already finished a line, we don't need to
        // skip to the next one.
        if (strrchr(buf, '\n'))
            continue;

        // Skip to the next line.
        int c;
        while ((c = fgetc(fp)) != EOF && c != '\n')
            ;
        if (c == EOF)
            break;
    }

    fclose(fp);
    return cnt;
}

----

Anyone know how I can rewrite the iostreams version to not suck (if
it is possible)? Or is the iostreams lib (at least, in the GNU and
MS implementations) not really useful for this sort of (very simple)
real world task, if speed is even slightly an issue?

Jason K
Jul 23 '05 #1
Sorry I don't have the answer to this question, but am also interested
in the answer. I'm kind of surprised no one has replied to this post.
Is it that this question is better suited to a different newsgroup since
it could be compiler/vendor specific or os specific? Or is it just that
no one really knows the answer?

Perhaps a simpler question would be whether the i/o streams for C++
should perform as well as C i/o. If so, are there any tricks or preferred
usage?

Thanks,
Kyle
Jul 23 '05 #2
Jason K wrote:
[snip]

No comment on your code example.

Most C++ stream implementations are layered on top of the
STDIO implementation (open/close fopen/fclose, etc). This
adds additional overhead to most i/o calls.

For example, I wrote a quick program that reads a 16MB file
of binary data in 8 byte chunks (intentionally inefficient).
The program reads the binary file twice, once using an fread()
loop, and then again using an ifstream.read() loop. The fread()
loop finished in 0.84 seconds, and the ifstream.read() loop
finished in 1.37 seconds. In both cases the file was opened
in 'binary' mode (this only matters on Windows). These
timings are only valid for my machine, which is:

Pentium 3, 450 MHz, 384 MB RAM
SuSE Linux Pro v9.2
GNU g++ v3.3.4 (pre 3.3.5 20040809)
libstdc++ v3.3.4-11
glibc v2.3.3-118

Here's the ifstream function:

#include <fstream>

void
count_stream(const char * file)
{
    int len;
    char buf[9];

    // *** open file for reading IN BINARY MODE ***
    std::ifstream dmy(file,
                      std::ios_base::in |
                      std::ios_base::binary);

    len = 8;
    buf[len] = '\0'; // just because...

    // loop until a read fails (end of file or error)
    while(dmy)
    {
        // read 8 bytes of binary data
        dmy.read(buf, len);
    }

    dmy.close();

    return;
}

Here's the stdio version:

#include <cstdio>

void
count_file(const char * file)
{
    FILE *fp;
    int len;
    char buf[9];

    len = 8;
    buf[len] = '\0'; // just because...

    // *** open file for reading IN BINARY MODE ***
    if (!(fp = fopen(file, "rb")))
        return;

    for (;;)
    {
        // read 8 bytes of binary data
        if (0 == fread(buf, len, 1, fp))
            break;
    }

    fclose(fp);

    return;
}

Regards,
Larry

--
Anti-spam address, change each 'X' to '.' to reply directly.
Jul 23 '05 #3
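
The post above does not show how the two loops were timed. A minimal sketch of one way to do it, assuming std::clock() is an acceptable measure, is given here; the helper name time_call is hypothetical and not part of Larry's program.

```cpp
#include <ctime>

// Hypothetical helper: runs fn(file) once and returns the processor time
// it used, in seconds, as reported by std::clock().
double
time_call(void (*fn)(const char *), const char *file)
{
    const std::clock_t start = std::clock();
    fn(file);
    const std::clock_t stop = std::clock();
    return static_cast<double>(stop - start) / CLOCKS_PER_SEC;
}
```

With the two functions from the post, time_call(count_file, path) and time_call(count_stream, path) would give one figure each. Two caveats apply: std::clock() reports processor time rather than elapsed time, and whichever function runs second may benefit from the file already being in the operating system's cache, so it is worth running each one more than once and in both orders.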

"Kyle Kolander" <kk*******@hotmail.com> wrote in message news:Kf***********@dfw-service2.ext.ray.com...

[snip]
Perhaps a simpler question would be whether the i/o streams for C++
should perform as well as C i/o. If so, are there any tricks or preferred
usage?

[snip]

For instance, see

"<Summary> Simple C/C++ Perfometer: Reading file to string (Versions 1.x)" at
http://groups-beta.google.com/group/...0fae8e5e065030

"<Release> Simple C/C++ Perfometer: Copying Files (Versions 4.x)" at
http://groups-beta.google.com/group/...74465da4c4e9bb
--
Alex Vinokur
email: alex DOT vinokur AT gmail DOT com
http://mathforum.org/library/view/10978.html
http://sourceforge.net/users/alexvn

Jul 23 '05 #4
Alex Vinokur wrote:
For instance, see

"<Summary> Simple C/C++ Perfometer: Reading file to string (Versions 1.x)" at http://groups-beta.google.com/group/...0fae8e5e065030

"<Release> Simple C/C++ Perfometer: Copying Files (Versions 4.x)" at
http://groups-beta.google.com/group/...74465da4c4e9bb


Impressive - thank you!

Jul 23 '05 #5

"Abecedarian" <ab*********@spambob.com> wrote in message
news:11**********************@g14g2000cwa.googlegroups.com...
Alex Vinokur wrote:
For instance, see

"<Summary> Simple C/C++ Perfometer: Reading file to string (Versions 1.x)" at
http://groups-beta.google.com/group/...0fae8e5e065030

"<Release> Simple C/C++ Perfometer: Copying Files (Versions 4.x)" at
http://groups-beta.google.com/group/...74465da4c4e9bb


Impressive - thank you!


Hardly. Just looking at the last line of the summary:
###############
### Summary ###
###############
==========================================
* Performance
* Comparative Performance Measurement
------------------------------------------
* Tool : Simple C/C++ Perfometer
* Algorithm: Reading file into string
* Language : C++
* Version : F2S-1.0
------------------------------------------
* Environment : Windows 2000 Professional
                Intel(R) Celeron(R) CPU 1.70 GHz
                Cygwin
* Compilers : GNU g++ 3.3.3
* Optimization : No optimization
~~~~~~~~~~~~~~~

Comparing C to un-optimized C++ is not a valid comparison.

Jeff Flinn
Jul 23 '05 #6

"Jason K" <jd*@nospam.com> wrote in message
news:in*****************@tornado.texas.rr.com...
Let me preface this by saying this obviously isn't a C++ *language*
issue per se; rather probably an issue relating to quality of
implementation, unless I'm just misusing iostream...

IMO, yes. IOStreams performance is awful, and always will be, thanks to
the requirements imposed on it.

Below is a reply to a similar posting in microsoft.public.vc.stl by Stephen
Howe.

Jeff Flinn

-----------------------------------------------------------------------------

Several things:

1. Have you seen this Carl?
http://www.open-std.org/jtc1/sc22/wg...2004/n1666.pdf
It talks about efficient IOStreams

2. Part of the problem is the fact that the C portions of the library are by
Microsoft and the C++ portions are by Dinkumware. And the C++ portions "sit"
on top of the C portions.
Inevitably, C is faster because of the design favour. I would not have
designed things this way.

Instead fopen() and fstream would call a common internal function to do the
opening of files.
I would try to make it possible to share the same buffer (unless the
standards make that impossible).

Stephen Howe
b) Are there any getarounds?


Don't use IOStreams.

-cd


Jul 23 '05 #7
