Fastest way to read from a file into a vector<unsigned char>

Hi all,

I want to read from a file into a vector<unsigned char>. Right now my
code looks like this:

FILE* f = fopen( "datafile", "rb" );
enum { SIZE = 100 };
vector<unsigned char> buf(SIZE);
fread(&buf[0], 1, SIZE, f);

The problem is that the vector's constructor initializes the buffer to
all zeroes. I don't want it to initialize to all zeroes. It is
pointless and a waste of time since I will just be reading in from the
file overtop of it.

So, does anyone know how I could eliminate the initialization of the
vector (without switching to a raw array; I really want a vector here)?
I tried to do some things with reserve(), but they didn't help.

Thanks,
Phillip Hellewell

Apr 4 '06 #1
ss****@gmail.com wrote:
> I want to read from a file into a vector<unsigned char>. Right now my
> code looks like this:
>
> FILE* f = fopen( "datafile", "rb" );
> enum { SIZE = 100 };
> vector<unsigned char> buf(SIZE);
> fread(&buf[0], 1, SIZE, f);
>
> The problem is that the vector's constructor initializes the buffer to
> all zeroes. I don't want it to initialize to all zeroes. It is
> pointless and a waste of time since I will just be reading in from the
> file overtop of it.
>
> So, does anyone know how I could eliminate the initialization of the
> vector (without switching to a raw array; I really want a vector
> here)? I tried to do some things with reserve(), but they didn't help.


You could read it into an array and then initialise your vector with it.

As for "fastest", you'd have to clock it. There is no way to tell unless
a special tool (a profiler) is involved.
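
A minimal sketch of the "read into an array, then initialise the vector" idea, for anyone who wants to time it. The helper name read_via_array and its max_bytes parameter are made up for illustration. Note that it trades the zero fill for an extra copy, so it may well turn out slower.

```cpp
#include <cstddef>
#include <cstdio>
#include <memory>
#include <vector>

// Hypothetical helper: reads up to max_bytes from path through a
// temporary array, then copies the bytes that were read into a vector.
std::vector<unsigned char> read_via_array(const char* path, std::size_t max_bytes)
{
    std::vector<unsigned char> result;
    std::FILE* f = std::fopen(path, "rb");
    if (!f)
        return result; // empty vector if the file cannot be opened

    // new unsigned char[n] default-initialises: the bytes are not zeroed.
    std::unique_ptr<unsigned char[]> tmp(new unsigned char[max_bytes]);
    const std::size_t n = std::fread(tmp.get(), 1, max_bytes, f);
    std::fclose(f);

    // Copy exactly the n bytes that fread reported.
    result.assign(tmp.get(), tmp.get() + n);
    return result;
}
```

Called as read_via_array("datafile", 100), it returns a vector whose size is the number of bytes actually read, which may be less than 100.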

V
--
Please remove capital As from my address when replying by mail
Apr 5 '06 #2
Surely making a copy of an array is slower than a memset.

I want a faster solution, not a slower one.

Short of implementing my own byte_vector class, does anyone have a
solution to remove the unnecessary initialization of the vector?

Apr 5 '06 #3
ss****@gmail.com wrote:
> Surely making a copy of an array is slower than a memset.
>
> I want a faster solution, not a slower one.
>
> Short of implementing my own byte_vector class, does anyone have a
> solution to remove the unnecessary initialization of the vector?


Initialise it from the stream buffer directly, or from the extractor
iterator (like "istream_iterator" or something).
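
One way to read "from the stream buffer directly" is std::istreambuf_iterator, sketched below. The function name read_via_streambuf is an assumption for illustration. The vector is built from the iterator pair, so there is no separate zero fill, but it grows as it reads, so it is not guaranteed to be faster.

```cpp
#include <fstream>
#include <iterator>
#include <vector>

// Hypothetical helper: builds the vector straight from the file's
// stream buffer, one character at a time.
std::vector<unsigned char> read_via_streambuf(const char* path)
{
    std::ifstream strm(path, std::ios_base::binary);
    if (!strm)
        return std::vector<unsigned char>(); // empty on open failure

    // istreambuf_iterator reads unformatted characters, so whitespace
    // bytes are kept without touching the skipws flag.
    std::istreambuf_iterator<char> first(strm), last;
    return std::vector<unsigned char>(first, last);
}
```

Because these are input iterators, the vector cannot know the final size up front and may reallocate several times for a large file.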

V
--
Please remove capital As from my address when replying by mail
Apr 5 '06 #4
ss****@gmail.com wrote:
> Surely making a copy of an array is slower than a memset.
>
> I want a faster solution, not a slower one.
>
> Short of implementing my own byte_vector class, does anyone have a
> solution to remove the unnecessary initialization of the vector?

If your system supports memory mapping (mmap) a file, write a simple
input iterator object to iterate over an array of unsigned char. Map
your file, point a pair of iterators at the beginning and end and
construct the vector from the two iterators.

Might be quicker, might not.

Benchmark.
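
A rough sketch of the mapping approach, assuming a POSIX system (open, fstat, mmap). The function name read_via_mmap is made up for illustration. Raw pointers into the mapping already work as iterators, so this version does not need a custom iterator class.

```cpp
#include <cstddef>
#include <vector>

#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Hypothetical helper (POSIX only): maps the whole file read-only and
// copies the mapped bytes into a vector.
std::vector<unsigned char> read_via_mmap(const char* path)
{
    std::vector<unsigned char> result;
    const int fd = open(path, O_RDONLY);
    if (fd < 0)
        return result; // empty vector if the file cannot be opened

    struct stat st;
    // mmap rejects a zero length, so empty files are skipped.
    if (fstat(fd, &st) == 0 && st.st_size > 0) {
        const std::size_t len = static_cast<std::size_t>(st.st_size);
        void* p = mmap(nullptr, len, PROT_READ, MAP_PRIVATE, fd, 0);
        if (p != MAP_FAILED) {
            const unsigned char* first = static_cast<const unsigned char*>(p);
            // Pointer pair used as the [begin, end) iterator range.
            result.assign(first, first + len);
            munmap(p, len);
        }
    }
    close(fd);
    return result;
}
```

The bytes are still copied once, from the mapping into the vector. If the copy itself is the cost to avoid, working on the mapped memory directly is the alternative, at the price of not having a vector.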

--
Ian Collins.
Apr 5 '06 #5
<ss****@gmail.com> wrote in message
news:11**********************@j33g2000cwa.googlegroups.com...
: Hi all,
:
: I want to read from a file into a vector<unsigned char>. Right now my
: code looks like this:
:
: FILE* f = fopen( "datafile", "rb" );
: enum { SIZE = 100 };
: vector<unsigned char> buf(SIZE);
: fread(&buf[0], 1, SIZE, f);
:
: The problem is that the vector's constructor initializes the buffer to
: all zeroes. I don't want it to initialize to all zeroes. It is
: pointless and a waste of time since I will just be reading in from the
: file overtop of it.
Believing that initializing the vector will slow things down is
probably a misconception. Writing memory takes very little processor
time, and will be handled in the cache. You won't even have a cache
flush before you overwrite the same memory anyway.
Slow <=> disc i/o >> memory i/o >> cache i/o >> processor

: So, does anyone know how I could eliminate the initialization of the
: vector (without switching to a raw array; I really want a vector here)?
: I tried to do some things with reserve(), but they didn't help.

If you need top performance for a large file, using platform-specific
ways to map the file into memory is likely to provide the best results
- as Ian suggested.
Why is a vector needed in the first place?
It is a very safe bet to say that you have much more to win from other
optimizations than what you will gain by skipping the vector init.

Regards,
Ivan
--
http://ivan.vecerina.com/contact/?subject=NG_POST <- email contact form
Brainbench MVP for C++ <> http://www.brainbench.com
Apr 5 '06 #6
ss****@gmail.com wrote:
> Hi all,
>
> I want to read from a file into a vector<unsigned char>. Right now my
> code looks like this:
>
> FILE* f = fopen( "datafile", "rb" );
> enum { SIZE = 100 };
> vector<unsigned char> buf(SIZE);
> fread(&buf[0], 1, SIZE, f);
>
> The problem is that the vector's constructor initializes the buffer to
> all zeroes. I don't want it to initialize to all zeroes. It is
> pointless and a waste of time since I will just be reading in from the
> file overtop of it.

That's premature optimization. Have you actually timed your
code to see which part takes how much time? I bet you haven't,
because then you'd see that it's the disk I/O that takes
over 99% of the time -- trying to optimize away the vector
initialization is pointless.

Unless, of course, you read from a RAM disk of some sort
or some other device that is way faster than the fastest HDDs
around.

> So, does anyone know how I could eliminate the initialization of the
> vector (without switching to a raw array; I really want a vector here)?
> I tried to do some things with reserve(), but they didn't help.


Time your routine, identify the bottleneck. I'm fairly certain
you will notice that the call(s) to fread() take(s) your time.
Others have suggested means to speed that up already.
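
For the timing itself, something along these lines could serve as a starting point. It is only a sketch: the names read_timing and time_vector_read are invented here, std::chrono needs C++11, and the numbers depend heavily on whether the OS already has the file cached, so repeat the measurement several times.

```cpp
#include <chrono>
#include <cstddef>
#include <cstdio>
#include <vector>

// Hypothetical result type: microseconds spent in each phase.
struct read_timing {
    double init_us;         // constructing (zero-filling) the vector
    double read_us;         // fopen + fread + fclose
    std::size_t bytes_read; // bytes fread actually delivered
};

// Hypothetical helper: times the vector construction and the file read
// separately so the two costs can be compared.
read_timing time_vector_read(const char* path, std::size_t size)
{
    typedef std::chrono::steady_clock steady;
    read_timing t = {0.0, 0.0, 0};

    const steady::time_point t0 = steady::now();
    std::vector<unsigned char> buf(size); // value-initialises size bytes
    const steady::time_point t1 = steady::now();

    if (std::FILE* f = std::fopen(path, "rb")) {
        if (size > 0)
            t.bytes_read = std::fread(&buf[0], 1, size, f);
        std::fclose(f);
    }
    const steady::time_point t2 = steady::now();

    t.init_us = std::chrono::duration<double, std::micro>(t1 - t0).count();
    t.read_us = std::chrono::duration<double, std::micro>(t2 - t1).count();
    return t;
}
```

Build with optimisation enabled and compare init_us against read_us for the file sizes you actually care about. A profiler gives a more reliable picture than a single pair of timestamps.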

HTH,
- J.
Apr 9 '06 #7
ss****@gmail.com wrote:
> Hi all,
>
> I want to read from a file into a vector<unsigned char>. Right now my
> code looks like this:
>
> FILE* f = fopen( "datafile", "rb" );
> enum { SIZE = 100 };
> vector<unsigned char> buf(SIZE);
> fread(&buf[0], 1, SIZE, f);
>
> The problem is that the vector's constructor initializes the buffer to
> all zeroes. I don't want it to initialize to all zeroes. It is
> pointless and a waste of time since I will just be reading in from the
> file overtop of it.
>
> So, does anyone know how I could eliminate the initialization of the
> vector (without switching to a raw array; I really want a vector here)?
> I tried to do some things with reserve(), but they didn't help.


What did you try with reserve()? Something like the following?

vector<unsigned char> buf;
buf.reserve(SIZE);
flockfile(f);
for(size_t i = 0; i < SIZE; ++i)
{
    const int r = getc_unlocked(f);
    if(r == EOF)
        break;
    buf.push_back(r);
}
funlockfile(f);

This avoids the vector initialization. Clearly there are other costs,
however. You would need to test it (with optimization) to see whether it
is faster overall.

Apr 9 '06 #8
In message <28***************************@news.chello.pl>, Jacek
Dziedzic <jacek@no_spam.tygrys.no_spam.net> writes
> ss****@gmail.com wrote:
>> Hi all,
>> I want to read from a file into a vector<unsigned char>. Right now my
>> code looks like this:
>> FILE* f = fopen( "datafile", "rb" );
>> enum { SIZE = 100 };
>> vector<unsigned char> buf(SIZE);
>> fread(&buf[0], 1, SIZE, f);
>> The problem is that the vector's constructor initializes the buffer to
>> all zeroes. I don't want it to initialize to all zeroes. It is
>> pointless and a waste of time since I will just be reading in from the
>> file overtop of it.
>
> That's premature optimization. Have you actually timed your
> code to see which part takes how much time? I bet you haven't,
> because then you'd see that it's the disk I/O that takes
> over 99% of the time -- trying to optimize away the vector
> initialization is pointless.
>
> Unless, of course, you read from a RAM disk of some sort
> or some other device that is way faster than the fastest HDDs
> around.

Or the OS kindly takes care of read-ahead caching of the file access for
you.

I have experienced exactly this problem, and determined by profiling
that, to my surprise, vector initialisation was indeed taking a large
fraction of the time. (Typical size of the data being read was something
like a megabyte.)

>> So, does anyone know how I could eliminate the initialization of the
>> vector (without switching to a raw array; I really want a vector here)?
>> I tried to do some things with reserve(), but they didn't help.

I gave up and wrote my own simple "lightweight vector" class - basically
just a pointer and size and capacity counter.

> Time your routine, identify the bottleneck. I'm fairly certain
> you will notice that the call(s) to fread() take(s) your time.
> Others have suggested means to speed that up already.
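
The original class is not shown in this thread, so the following is only a guess at what a "pointer, size and capacity" buffer might look like. The names byte_buffer and read_into_buffer are hypothetical, and std::unique_ptr (C++11) stands in for manual memory management.

```cpp
#include <cstddef>
#include <cstdio>
#include <memory>

// Hypothetical lightweight buffer: owns capacity bytes that are left
// uninitialised, and tracks how many of them hold valid data.
class byte_buffer {
public:
    explicit byte_buffer(std::size_t capacity)
        : data_(new unsigned char[capacity]), // default-init: no zero fill
          size_(0),
          capacity_(capacity) {}

    unsigned char* data() { return data_.get(); }
    const unsigned char* data() const { return data_.get(); }
    std::size_t size() const { return size_; }
    std::size_t capacity() const { return capacity_; }

    // Record how many bytes are valid, clamped to the capacity.
    void set_size(std::size_t n) { size_ = n < capacity_ ? n : capacity_; }

private:
    std::unique_ptr<unsigned char[]> data_;
    std::size_t size_;
    std::size_t capacity_;
};

// Hypothetical helper: reads up to max_bytes from path straight into
// the uninitialised storage.
byte_buffer read_into_buffer(const char* path, std::size_t max_bytes)
{
    byte_buffer buf(max_bytes);
    if (std::FILE* f = std::fopen(path, "rb")) {
        buf.set_size(std::fread(buf.data(), 1, max_bytes, f));
        std::fclose(f);
    }
    return buf; // moved out; byte_buffer is move-only
}
```

Only the first size() bytes are meaningful after the read. Anything beyond that is indeterminate and should not be read.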


--
Richard Herring
Apr 10 '06 #9
Victor Bazarov wrote:
> ss****@gmail.com wrote:
>> Surely making a copy of an array is slower than a memset.
>>
>> I want a faster solution, not a slower one.
>>
>> Short of implementing my own byte_vector class, does anyone have a
>> solution to remove the unnecessary initialization of the vector?
>
> Initialise it from the stream buffer directly, or from the extractor
> iterator (like "istream_iterator" or something).


Yes, try something like this:

// Needs <cstdlib>, <fstream>, <iostream>, <iterator> and <vector>.
std::vector<unsigned char> buf;
std::ifstream strm("datafile", std::ios_base::binary);
if (!strm)
{
    std::cerr << "cannot open file\n" << std::endl;
    std::exit(EXIT_FAILURE); // exit() requires a status argument
}
strm.unsetf(std::ios_base::skipws); // keep whitespace bytes
std::istream_iterator<unsigned char> isi(strm), isiEOF;
buf.assign(isi, isiEOF);
if (!strm.eof()) std::cerr << "read error\n" << std::endl;

--
Paul M. Dubuc
Apr 10 '06 #10
