
Fastest way to read from a file into a vector<unsigned char>

P: n/a
Hi all,

I want to read from a file into a vector<unsigned char>. Right now my
code looks like this:

FILE* f = fopen( "datafile", "rb" );
enum { SIZE = 100 };
vector<unsigned char> buf(SIZE);
fread(&buf[0], 1, SIZE, f);

The problem is that the vector's constructor initializes the buffer to
all zeroes. I don't want it to initialize to all zeroes. It is
pointless and a waste of time since I will just be reading in from the
file overtop of it.

So, does anyone know how I could eliminate the initialization of the
vector (without switching to a raw array; I really want a vector here)?
I tried to do some things with reserve(), but they didn't help.

Thanks,
Phillip Hellewell

Apr 4 '06 #1
9 Replies


P: n/a
ss****@gmail.com wrote:
I want to read from a file into a vector<unsigned char>. Right now my
code looks like this:

FILE* f = fopen( "datafile", "rb" );
enum { SIZE = 100 };
vector<unsigned char> buf(SIZE);
fread(&buf[0], 1, SIZE, f);

The problem is that the vector's constructor initializes the buffer to
all zeroes. I don't want it to initialize to all zeroes. It is
pointless and a waste of time since I will just be reading in from the
file overtop of it.

So, does anyone know how I could eliminate the initialization of the
vector (without switching to a raw array; I really want a vector
here)? I tried to do some things with reserve(), but they didn't help.


You could read it into an array and then initialise your vector with it.

As for "fastest", you'd have to clock it. There is no way to tell unless
a special tool (a profiler) is involved.
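
A minimal sketch of the "read into an array, then initialise the vector" idea, assuming the same 100-byte SIZE as the original post. The helper name read_via_array is made up for illustration. It trades the zero-fill for one copy of the bytes actually read, so whether it wins still has to be measured.

```cpp
#include <cstddef>
#include <cstdio>
#include <vector>

// Hypothetical helper (the name is illustrative, not from the thread):
// read up to SIZE bytes into a plain array, then build the vector from
// the bytes that were actually read.
std::vector<unsigned char> read_via_array(const char* path)
{
    enum { SIZE = 100 };
    unsigned char tmp[SIZE];                  // plain array, not zero-filled

    std::FILE* f = std::fopen(path, "rb");
    if (!f)
        return std::vector<unsigned char>();  // empty result if open fails

    const std::size_t n = std::fread(tmp, 1, SIZE, f);
    std::fclose(f);

    // Range constructor: copies n bytes, no separate zero-fill pass.
    return std::vector<unsigned char>(tmp, tmp + n);
}
```

The vector ends up sized to the number of bytes read, which also handles files shorter than SIZE.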

V
--
Please remove capital As from my address when replying by mail
Apr 5 '06 #2

P: n/a
Surely making a copy of an array is slower than a memset.

I want a faster solution, not a slower one.

Short of implementing my own byte_vector class, does anyone have a
solution to remove the unnecessary initialization of the vector?

Apr 5 '06 #3

P: n/a
ss****@gmail.com wrote:
Surely making a copy of an array is slower than a memset.

I want a faster solution, not a slower one.

Short of implementing my own byte_vector class, does anyone have a
solution to remove the unnecessary initialization of the vector?


Initialise it from the stream buffer directly, or from the extractor
iterator (like "istream_iterator" or something).
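
A rough sketch of the stream-buffer variant, assuming std::istreambuf_iterator is what is meant by reading "from the stream buffer directly". The function name read_via_streambuf is hypothetical. Because the vector is built from an iterator range, no element is zero-filled first, though the vector may reallocate as it grows.

```cpp
#include <fstream>
#include <iterator>
#include <vector>

// Hypothetical helper: build the vector straight from the file's
// stream buffer.
std::vector<unsigned char> read_via_streambuf(const char* path)
{
    std::ifstream in(path, std::ios_base::binary);
    if (!in)
        return std::vector<unsigned char>();  // empty result if open fails

    // istreambuf_iterator reads unformatted chars, so whitespace bytes
    // are kept without touching the skipws flag.
    std::istreambuf_iterator<char> first(in), last;
    return std::vector<unsigned char>(first, last);
}
```

Unlike istream_iterator, this bypasses formatted extraction, which is usually the cheaper of the two, but that is worth confirming with a profiler.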

V
--
Please remove capital As from my address when replying by mail
Apr 5 '06 #4

P: n/a
ss****@gmail.com wrote:
Surely making a copy of an array is slower than a memset.

I want a faster solution, not a slower one.

Short of implementing my own byte_vector class, does anyone have a
solution to remove the unnecessary initialization of the vector?

If your system supports memory mapping (mmap) of a file, write a simple
input iterator object to iterate over an array of unsigned char. Map
your file, point a pair of iterators at the beginning and end, and
construct the vector from the two iterators.

Might be quicker, might not.

Benchmark.
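
A sketch of the mmap approach, assuming a POSIX system (open, fstat, mmap, munmap). The function name read_via_mmap is invented for illustration, and plain pointers stand in for the hand-written iterator, since pointers already satisfy the iterator requirements.

```cpp
#include <cstddef>
#include <vector>

#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Hypothetical helper, POSIX only: map the whole file read-only and
// copy it into a vector through a pair of pointers.
std::vector<unsigned char> read_via_mmap(const char* path)
{
    std::vector<unsigned char> buf;

    const int fd = open(path, O_RDONLY);
    if (fd < 0)
        return buf;                            // empty result if open fails

    struct stat st;
    if (fstat(fd, &st) == 0 && st.st_size > 0) {
        const std::size_t len = static_cast<std::size_t>(st.st_size);
        void* p = mmap(0, len, PROT_READ, MAP_PRIVATE, fd, 0);
        if (p != MAP_FAILED) {
            const unsigned char* first = static_cast<const unsigned char*>(p);
            buf.assign(first, first + len);    // one copy, no zero-fill
            munmap(p, len);
        }
    }
    close(fd);
    return buf;
}
```

If the data only needs to be read, working on the mapping directly and skipping the vector avoids the copy as well.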

--
Ian Collins.
Apr 5 '06 #5

P: n/a
<ss****@gmail.com> wrote in message
news:11**********************@j33g2000cwa.googlegroups.com...
: Hi all,
:
: I want to read from a file into a vector<unsigned char>. Right now my
: code looks like this:
:
: FILE* f = fopen( "datafile", "rb" );
: enum { SIZE = 100 };
: vector<unsigned char> buf(SIZE);
: fread(&buf[0], 1, SIZE, f);
:
: The problem is that the vector's constructor initializes the buffer to
: all zeroes. I don't want it to initialize to all zeroes. It is
: pointless and a waste of time since I will just be reading in from the
: file overtop of it.
Believing that initializing the vector will slow things down is
probably a misconception. Writing memory takes very little processor
time, and will be handled in the cache. You won't even have a cache
flush before you overwrite the same memory anyway.
Slow <=> disc i/o >> memory i/o >> cache i/o >> processor

: So, does anyone know how I could eliminate the initialization of the
: vector (without switching to a raw array; I really want a vector here)?
: I tried to do some things with reserve(), but they didn't help.

If you need top performance for a large file, using platform-specific
ways to map the file into memory is likely to provide the best results
- as Ian suggested.
Why is a vector needed in the first place?
It is a very safe bet to say that you have much more to win from other
optimizations than what you will gain by skipping the vector init.

Regards,
Ivan
--
http://ivan.vecerina.com/contact/?subject=NG_POST <- email contact form
Brainbench MVP for C++ <> http://www.brainbench.com
Apr 5 '06 #6

P: n/a
ss****@gmail.com wrote:
> Hi all,
>
> I want to read from a file into a vector<unsigned char>. Right now my
> code looks like this:
>
> FILE* f = fopen( "datafile", "rb" );
> enum { SIZE = 100 };
> vector<unsigned char> buf(SIZE);
> fread(&buf[0], 1, SIZE, f);
>
> The problem is that the vector's constructor initializes the buffer to
> all zeroes. I don't want it to initialize to all zeroes. It is
> pointless and a waste of time since I will just be reading in from the
> file overtop of it.

That's premature optimization. Have you actually timed your
code to see which part takes how much time? I bet you haven't,
because then you'd see that it's the disk I/O that takes
over 99% of the time -- trying to optimize away the vector
initialization is pointless.

Unless, of course, you read from a RAM disk of some sort
or some other device that is way faster than the fastest HDDs
around.

> So, does anyone know how I could eliminate the initialization of the
> vector (without switching to a raw array; I really want a vector here)?
> I tried to do some things with reserve(), but they didn't help.


Time your routine, identify the bottleneck. I'm fairly certain
you will notice that the call(s) to fread() take(s) your time.
Others have suggested means to speed that up already.
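
A minimal timing sketch along those lines, assuming a C++11 compiler for std::chrono (which postdates this thread). The ReadTimings struct and time_read function are hypothetical names. It times the vector construction and the fread() call separately so the two costs can be compared on real data.

```cpp
#include <chrono>
#include <cstddef>
#include <cstdio>
#include <vector>

// Hypothetical result type: microseconds spent in each phase.
struct ReadTimings {
    double init_us;     // constructing (zero-filling) the vector
    double read_us;     // the fread() call
    std::size_t bytes;  // bytes actually read
};

// Hypothetical helper: time the two phases of the original snippet.
ReadTimings time_read(const char* path, std::size_t size)
{
    typedef std::chrono::steady_clock clock;
    typedef std::chrono::duration<double, std::micro> usec;
    ReadTimings t = {0.0, 0.0, 0};

    std::FILE* f = std::fopen(path, "rb");
    if (!f || size == 0) {
        if (f) std::fclose(f);
        return t;                               // nothing to measure
    }

    const clock::time_point t0 = clock::now();
    std::vector<unsigned char> buf(size);       // zero-fill happens here
    const clock::time_point t1 = clock::now();
    t.bytes = std::fread(&buf[0], 1, size, f);  // file I/O happens here
    const clock::time_point t2 = clock::now();
    std::fclose(f);

    t.init_us = usec(t1 - t0).count();
    t.read_us = usec(t2 - t1).count();
    return t;
}
```

A single run is noisy, and a second run will usually hit the OS file cache, so repeat the measurement and look at both the cold and the warm case.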

HTH,
- J.
Apr 9 '06 #7

P: n/a
ss****@gmail.com wrote:
Hi all,

I want to read from a file into a vector<unsigned char>. Right now my
code looks like this:

FILE* f = fopen( "datafile", "rb" );
enum { SIZE = 100 };
vector<unsigned char> buf(SIZE);
fread(&buf[0], 1, SIZE, f);

The problem is that the vector's constructor initializes the buffer to
all zeroes. I don't want it to initialize to all zeroes. It is
pointless and a waste of time since I will just be reading in from the
file overtop of it.

So, does anyone know how I could eliminate the initialization of the
vector (without switching to a raw array; I really want a vector here)?
I tried to do some things with reserve(), but they didn't help.


What did you try with reserve()? Something like the following?

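// flockfile(), getc_unlocked() and funlockfile() are POSIX, not standard C.
vector<unsigned char> buf;
buf.reserve(SIZE);   // allocates capacity without initializing elements
flockfile(f);        // take the stream lock once instead of per character
for(size_t i = 0; i < SIZE; ++i)
{
    const int r = getc_unlocked(f);
    if(r == EOF)
        break;
    buf.push_back(static_cast<unsigned char>(r));
}
funlockfile(f);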

This avoids the vector initialization. Clearly there are other costs,
however. You would need to test it (with optimization) to see whether it
is faster overall.

Apr 9 '06 #8

P: n/a
In message <28***************************@news.chello.pl>, Jacek
Dziedzic <jacek@no_spam.tygrys.no_spam.net> writes
> ss****@gmail.com wrote:
>> Hi all,
>> I want to read from a file into a vector<unsigned char>. Right now my
>> code looks like this:
>> FILE* f = fopen( "datafile", "rb" );
>> enum { SIZE = 100 };
>> vector<unsigned char> buf(SIZE);
>> fread(&buf[0], 1, SIZE, f);
>> The problem is that the vector's constructor initializes the buffer to
>> all zeroes. I don't want it to initialize to all zeroes. It is
>> pointless and a waste of time since I will just be reading in from the
>> file overtop of it.
>
> That's premature optimization. Have you actually timed your
> code to see which part takes how much time? I bet you haven't,
> because then you'd see that it's the disk I/O that takes
> over 99% of the time -- trying to optimize away the vector
> initialization is pointless.
>
> Unless, of course, you read from a RAM disk of some sort
> or some other device that is way faster than the fastest HDDs
> around.

Or the OS kindly takes care of read-ahead caching of the file access for
you.

I have experienced exactly this problem, and determined by profiling
that, to my surprise, vector initialisation was indeed taking a large
fraction of the time. (Typical size of the data being read was something
like a megabyte.)

>> So, does anyone know how I could eliminate the initialization of the
>> vector (without switching to a raw array; I really want a vector here)?
>> I tried to do some things with reserve(), but they didn't help.

I gave up and wrote my own simple "lightweight vector" class - basically
just a pointer and size and capacity counter.

> Time your routine, identify the bottleneck. I'm fairly certain
> you will notice that the call(s) to fread() take(s) your time.
> Others have suggested means to speed that up already.
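
The "lightweight vector" described in this post is not shown in the thread, so the following is only a guess at its shape: a pointer plus size and capacity counters. The names byte_buffer, set_size and fill_from_file are all hypothetical. The point of the sketch is that new unsigned char[n] leaves the bytes uninitialized, so there is no zero-fill before the read.

```cpp
#include <cstddef>
#include <cstdio>

// Hypothetical class: owns `capacity` bytes and tracks how many are valid.
class byte_buffer {
public:
    explicit byte_buffer(std::size_t capacity)
        : data_(new unsigned char[capacity]),   // default-init: not zeroed
          size_(0), capacity_(capacity) {}
    ~byte_buffer() { delete[] data_; }

    unsigned char* data() { return data_; }
    const unsigned char* data() const { return data_; }
    std::size_t size() const { return size_; }
    std::size_t capacity() const { return capacity_; }

    // Mark the first n bytes as valid after writing through data().
    void set_size(std::size_t n) { size_ = n < capacity_ ? n : capacity_; }

private:
    byte_buffer(const byte_buffer&);             // non-copyable
    byte_buffer& operator=(const byte_buffer&);

    unsigned char* data_;
    std::size_t size_;
    std::size_t capacity_;
};

// Hypothetical helper: fill the buffer from a file, return bytes read.
std::size_t fill_from_file(byte_buffer& buf, const char* path)
{
    std::FILE* f = std::fopen(path, "rb");
    if (!f) {
        buf.set_size(0);
        return 0;
    }
    const std::size_t n = std::fread(buf.data(), 1, buf.capacity(), f);
    std::fclose(f);
    buf.set_size(n);
    return n;
}
```

This gives up the std::vector interface, which is exactly the trade-off the original poster wanted to avoid.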


--
Richard Herring
Apr 10 '06 #9

P: n/a
Victor Bazarov wrote:
ss****@gmail.com wrote:
Surely making a copy of an array is slower than a memset.

I want a faster solution, not a slower one.

Short of implementing my own byte_vector class, does anyone have a
solution to remove the unnecessary initialization of the vector?


Initialise it from the stream buffer directly, or from the extractor
iterator (like "istream_iterator" or something).


Yes, try something like this:

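// Needs <cstdlib>, <fstream>, <iostream>, <iterator> and <vector>.
std::vector<unsigned char> buf;
std::ifstream strm("datafile", std::ios_base::binary);
if (!strm)
{
    std::cerr << "cannot open file" << std::endl;
    std::exit(EXIT_FAILURE);  // exit() requires a status argument
}
// Stop operator>> from skipping whitespace bytes.
strm.unsetf(std::ios_base::skipws);
std::istream_iterator<unsigned char> isi(strm), isiEOF;
buf.assign(isi, isiEOF);
// The iterator stops at EOF or on error; anything but EOF is a failure.
if (!strm.eof()) std::cerr << "read error" << std::endl;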

--
Paul M. Dubuc
Apr 10 '06 #10
