Can I create a map<string, ofstream> object to hold ofstreams by key value?

I have a project which needs to open hundreds to thousands of files for
writing. The following is a simplified test program I wrote to see if I
can use a map<string, ofstream> object to keep the list of ofstreams. I
need to have them open simultaneously for writing -- I have millions of
rows of data to write to them, so opening and closing them all the time
would be unacceptably inefficient.

The following program compiles but fails to run. Does anyone know what is
causing this?

Please help -- greatly appreciated!

-km-

#include <map>
#include <iostream>
#include <fstream>
#include <vector>
#include <string>

using namespace std;

int main(int argc, char** argv)
{
    map<string, ofstream> so;

    ofstream ofs("c:\\a.txt", ios::out | ios::app);
    so["A"] = ofs;

    ofstream ofs1("c:\\b.txt", ios::out | ios::app);
    so["B"] = ofs1;

    so["A"] << "Good input to A" << endl;
    so["B"] << "Good input to B" << endl;
    so["A"].close();
    so["B"].close();
    return 0;
}

Jul 23 '05 #1
On Sun, 19 Jun 2005 01:09:02 +0400, <ke*******@gmail.com> wrote:
> I have a project which needs to open hundreds to thousands of files for
> writing. The following is a simplified test program I wrote to see if I
> can use a map<string, ofstream> object to keep the list of ofstreams. I
> need to have them open simultaneously for writing -- I have millions of
> rows of data to write to them, so opening and closing them all the time
> would be unacceptably inefficient.
>
> The following program compiles but fails to run. Does anyone know what is
> causing this?


Which compiler are you using?

Standard stream objects are noncopyable. I tried compiling your code with
MSVC 7.1 and g++ 3.3.3 -- both reject it for exactly that reason.

--
Maxim Yegorushkin
Jul 23 '05 #2
* ke*******@gmail.com:
> I have a project which needs to open hundreds to thousands of files for
> writing.

There's a very good chance it doesn't. But since that is a general
programming/design problem, consider posting a description of your
problem and your intended solution in [comp.programming].

> The following is a simplified test program I wrote to see if I
> can use a map<string, ofstream> object to keep the list of ofstreams.

An element type for a standard collection must be copyable.

A stream object isn't.

But you can store pointers, or preferably smart pointers, to such objects.
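
A minimal sketch of that smart-pointer approach (an illustration, not code
from the thread -- it assumes std::shared_ptr from a modern toolchain;
boost::shared_ptr, mentioned later in this thread, works the same way, and
the file names and keys are placeholders):

#include <fstream>
#include <map>
#include <memory>
#include <string>

int main()
{
    // The map owns the streams through shared_ptr, so inserting an
    // element copies only a pointer, never the stream itself.
    std::map<std::string, std::shared_ptr<std::ofstream> > so;

    so["A"] = std::make_shared<std::ofstream>("a.txt",
                  std::ios::out | std::ios::app);
    so["B"] = std::make_shared<std::ofstream>("b.txt",
                  std::ios::out | std::ios::app);

    *so["A"] << "Good input to A\n";
    *so["B"] << "Good input to B\n";

    // Each stream is flushed and closed automatically when the last
    // shared_ptr to it goes away.
    return 0;
}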

> need to have them open simultaneously for writing -- I have millions of
> rows of data to write to them, so opening and closing them all the time
> would be unacceptably inefficient.

There's a very good chance you don't need to have them open. And anyway,
both your C++ implementation and the OS are likely to limit the number of
simultaneously open files to something much less than you describe. But since
that is a general programming/design problem, consider posting a description
of your problem and your intended solution in [comp.programming].

> The following program compiles but fails to run. Does anyone know what is
> causing this?


In general, try to inspect the error information available (but see above).

--
A: Because it messes up the order in which people normally read text.
Q: Why is it such a bad thing?
A: Top-posting.
Q: What is the most annoying thing on usenet and in e-mail?
Jul 23 '05 #3
I am using MSVC++ 6.0, and it compiled with 118 warnings but no errors.
When run, the program immediately caused an access violation.

Thanks for your input -- I will change the value type to a pointer to
make it copyable.

-km-

Jul 23 '05 #4


Alf P. Steinbach wrote:
> * ke*******@gmail.com:
> > I have a project which needs to open hundreds to thousands of files for
> > writing.
>
> There's a very good chance it doesn't. But since that is a general
> programming/design problem, consider posting a description of your
> problem and your intended solution in [comp.programming].


I have a file which is about 8GB in size; it is text formatted, with
a key value in the middle of each line that I can parse out. I need to
write each entire line to a file whose filename contains this key
value. There are 3000 potential key values. There is no clear ordering
of the lines, and the appearance of the key values is quite uniform
through the file.

I actually have to use OS-dependent file I/O functions, because ifstream
only supports reading files up to 4GB in size?

Any quick idea how I can do that faster?

Thanks for your help.

-km-
Jul 23 '05 #5
* quant:
[excessive bottom-quoting]
Please limit quoting to relevant stuff, see the FAQ etc.

Corrected.
* quant:
> * Alf P. Steinbach:
> > * ke*******@gmail.com:
> > > I have a project which needs to open hundreds to thousands of files for
> > > writing.
> >
> > There's a very good chance it doesn't. But since that is a general
> > programming/design problem, consider posting a description of your
> > problem and your intended solution in [comp.programming].
>
> I have a file which is about 8GB in size; it is text formatted, with
> a key value in the middle of each line that I can parse out. I need to
> write each entire line to a file whose filename contains this key
> value. There are 3000 potential key values. There is no clear ordering
> of the lines, and the appearance of the key values is quite uniform
> through the file.


Sorry, that's off-topic in this group, try [comp.programming].

> I actually have to use OS-dependent file I/O functions, because ifstream
> only supports reading files up to 4GB in size?

The standard mentions no such limit, so it must be implementation defined.

Implementation concerns are generally off-topic in this group.

Try e.g. [comp.os.ms-windows.programmer.win32], or whatever is appropriate
for your C++ implementation and OS.

> Any quick idea how I can do that faster?


Sorry, that's off-topic in this group, try [comp.programming].

--
A: Because it messes up the order in which people normally read text.
Q: Why is it such a bad thing?
A: Top-posting.
Q: What is the most annoying thing on usenet and in e-mail?
Jul 23 '05 #6
Ian
ke*******@gmail.com wrote:
> I have a project which needs to open hundreds to thousands of files for
> writing. The following is a simplified test program I wrote to see if I
> can use a map<string, ofstream> object to keep the list of ofstreams. I
> need to have them open simultaneously for writing -- I have millions of
> rows of data to write to them, so opening and closing them all the time
> would be unacceptably inefficient.

How do you know that?

I don't know of any OS that will support thousands of open files per
user without tweaking and a significant memory overhead.

I'm surprised your code compiled: ofstream's base class copy constructor is
private, and map insertion has to copy the object.

Design rethink time!

Ian
Jul 23 '05 #7


Ian wrote:
> ke*******@gmail.com wrote:
> > I have a project which needs to open hundreds to thousands of files for
> > writing. The following is a simplified test program I wrote to see if I
> > can use a map<string, ofstream> object to keep the list of ofstreams. I
> > need to have them open simultaneously for writing -- I have millions of
> > rows of data to write to them, so opening and closing them all the time
> > would be unacceptably inefficient.
>
> How do you know that?

I did try keeping only one ofstream, closing it and reopening it to
write each new line. Its speed is about 120MB/hour. My total file size
is 8GB, so that translates into about 5 days? I am hoping for something
better than that :)

> I don't know of any OS that will support thousands of open files per
> user without tweaking and a significant memory overhead.

You're right.
Jul 23 '05 #8
quant wrote:
> I have a file which is about 8GB in size; it is text formatted, with
> a key value in the middle of each line that I can parse out. I need to
> write each entire line to a file whose filename contains this key
> value. There are 3000 potential key values. There is no clear ordering
> of the lines, and the appearance of the key values is quite uniform
> through the file.


First thought: keep your map<key,ostream*> but use it as a cache, i.e.
(pseudocode!):

if map.size > limit {
    victim = map.oldest;        // least recently used key
    delete map[victim];         // destroying the stream closes its file
    map.erase(victim);
}
if !map.contains(thisone) {
    map[thisone] = new ofstream(filename, append);
}
*map[thisone] << stuff;

There are various C++ mechanisms to let you have multiple containers for
the same data, so in this case a map lets you look them up and a list
(priority queue?) could help track the ones to eject. Whether this
helps depends on the distribution of the keys.
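
A minimal sketch of that two-container cache in real C++ (an illustration
of the idea, not Phil's code -- the StreamCache name, the capacity limit,
and the key-to-filename scheme are placeholders):

#include <cstddef>
#include <fstream>
#include <list>
#include <map>
#include <memory>
#include <string>

// Keeps at most maxOpen streams alive; when a new key needs a stream,
// the least recently used one is closed and evicted.
class StreamCache {
public:
    explicit StreamCache(std::size_t maxOpen) : maxOpen_(maxOpen) {}

    std::ofstream& get(const std::string& key) {
        std::map<std::string, Entry>::iterator it = streams_.find(key);
        if (it != streams_.end()) {
            // Cache hit: move this key to the front of the recency list.
            lru_.splice(lru_.begin(), lru_, it->second.pos);
            return *it->second.stream;
        }
        if (streams_.size() >= maxOpen_) {
            // Evict the least recently used stream; the ofstream
            // destructor flushes and closes its file.
            streams_.erase(lru_.back());
            lru_.pop_back();
        }
        lru_.push_front(key);
        Entry& e = streams_[key];
        e.stream.reset(new std::ofstream((key + ".txt").c_str(),
                                         std::ios::out | std::ios::app));
        e.pos = lru_.begin();
        return *e.stream;
    }

private:
    struct Entry {
        std::shared_ptr<std::ofstream> stream;
        std::list<std::string>::iterator pos;
    };
    std::size_t maxOpen_;
    std::map<std::string, Entry> streams_;
    std::list<std::string> lru_;    // front = most recently used
};

In the read loop this would be used as something like
cache.get(key) << line << '\n';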

Second thought: read everything into memory, sort by key, write to one
file at a time. Obvious problem: you run out of memory. Another
question: what is the big-O complexity of (stably) sorting n items with
m distinct values, where m << n ? I think it is O(n log m). Does the
standard C++ library's sort algorithm get this right, or does it end up
doing O(n log n)?

Third thought: you read in a large number of records, but not more than
you can keep in memory without paging, sort, write to one file at a
time, and then read in the next large chunk.

I presume that you need to seek to the end of the file each time you
open it. I have a vague recollection that some platforms have a problem
with that, i.e. they are O(filesize) to seek to the end, rather than
O(1). If it looks like a problem, google for it.

Final thought: use a proper database; this is what they're for.

--Phil.
Jul 23 '05 #9
Ian
quant wrote:

> I have a file which is about 8GB in size; it is text formatted, with
> a key value in the middle of each line that I can parse out. I need to
> write each entire line to a file whose filename contains this key
> value. There are 3000 potential key values. There is no clear ordering
> of the lines, and the appearance of the key values is quite uniform
> through the file.

Try the simple approach first: take a small chunk of your file and play with
it. Try just opening and appending, and see how long this would take for
the full file.

> I actually have to use OS-dependent file I/O functions, because ifstream
> only supports reading files up to 4GB in size?

Why not just split the file?

> Any quick idea how I can do that faster?

Sort the data by key first.

Ian

Jul 23 '05 #10
quant wrote:
> > > need to have them open simultaneously for writing -- I have millions of
> > > rows of data to write to them, so opening and closing them all the time
> > > would be unacceptably inefficient.
> >
> > How do you know that?
>
> I did try keeping only one ofstream, closing it and reopening it to
> write each new line. Its speed is about 120MB/hour. My total file size
> is 8GB, so that translates into about 5 days? I am hoping for something
> better than that :)


You can do caching. Use an array of opened files, plus some structure for
the files you need that holds a pointer or index to the corresponding opened
file. When you need to write to one, check whether it already has a
corresponding open file; if not, open one and point to it, closing another
(chosen with a Least Recently Used policy or something similar) if none is
free. I suggested this approach several years ago in a similar situation and
it worked well.

--
Salu2
Jul 23 '05 #11
> I did try keeping only one ofstream, closing it and reopening it to
> write each new line. Its speed is about 120MB/hour. My total file size
> is 8GB, so that translates into about 5 days? I am hoping for something
> better than that :)


Don't use C++ iostreams if you want speed. Roll your own class that
wraps the native file-system code. Don't use the C file API either,
since it will also slow things down a little. Your own file classes
will probably be noncopyable too, but then you can always store a
smart pointer like boost::shared_ptr or boost::intrusive_ptr inside
your map. Both work fine in STL containers.

Really man, I have to say it again: iostream is not the right lib if
you want max raw I/O speed.

And when doing your own file code, use read and write buffers -- if
you call the OS for every single byte you read or write, things will
get painfully slow.
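
A minimal sketch of such a wrapper (an illustration, assuming Win32 since
the thread targets MSVC; the BufferedWriter name and the 64KB buffer size
are arbitrary choices, and error checking is omitted for brevity):

#include <windows.h>
#include <cstddef>
#include <cstring>
#include <string>

// Collects writes in memory and hands them to the OS in large chunks,
// so WriteFile runs once per ~64KB instead of once per line.
class BufferedWriter {
public:
    explicit BufferedWriter(const std::string& path) : used_(0) {
        handle_ = CreateFileA(path.c_str(), GENERIC_WRITE, 0, NULL,
                              OPEN_ALWAYS, FILE_ATTRIBUTE_NORMAL, NULL);
        SetFilePointer(handle_, 0, NULL, FILE_END);   // append
    }

    ~BufferedWriter() { flush(); CloseHandle(handle_); }

    void write(const char* data, std::size_t len) {
        if (used_ + len > sizeof buffer_) flush();
        if (len >= sizeof buffer_) {      // huge write: bypass the buffer
            DWORD written;
            WriteFile(handle_, data, (DWORD)len, &written, NULL);
            return;
        }
        std::memcpy(buffer_ + used_, data, len);
        used_ += len;
    }

    void flush() {
        if (used_ == 0) return;
        DWORD written;
        WriteFile(handle_, buffer_, (DWORD)used_, &written, NULL);
        used_ = 0;
    }

private:
    // Noncopyable, as predicted above: declared but not defined.
    BufferedWriter(const BufferedWriter&);
    BufferedWriter& operator=(const BufferedWriter&);

    HANDLE handle_;
    char buffer_[64 * 1024];
    std::size_t used_;
};

The point is simply that the OS gets called once per 64KB chunk rather
than once per line; the same pattern works with POSIX open/write.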
Jul 23 '05 #12


Swampmonster wrote:
> > I did try keeping only one ofstream, closing it and reopening it to
> > write each new line. Its speed is about 120MB/hour. My total file size
> > is 8GB, so that translates into about 5 days? I am hoping for something
> > better than that :)
>
> Don't use C++ iostreams if you want speed. Roll your own class that
> wraps the native file-system code. Don't use the C file API either,
> since it will also slow things down a little. Your own file classes
> will probably be noncopyable too, but then you can always store a
> smart pointer like boost::shared_ptr or boost::intrusive_ptr inside
> your map. Both work fine in STL containers.


Yes, I found that to be true -- I tried to use ifstream but later found
that under Windows ReadFile can be tens of times faster. Will try to do
what you suggested.
> Really man, I have to say it again: iostream is not the right lib if
> you want max raw I/O speed.
>
> And when doing your own file code, use read and write buffers -- if
> you call the OS for every single byte you read or write, things will
> get painfully slow.


Thanks for your advice.

-km-

Jul 23 '05 #13
"quant" <ke*******@gmail.com> wrote in message
news:11**********************@g44g2000cwa.googlegr oups.com...
I did try only keeping one ofstream and keep closing it and reopen to
write the new lines. Its speed is about 120MB/hour. My total file size
is 8GB so that translates into about 5 days? I am hoping for something
better than that:)


I suggest you sort the file by key, then make a sequential pass through the
sorted file, opening one output file at a time.
Jul 23 '05 #14


Andrew Koenig wrote:
> "quant" <ke*******@gmail.com> wrote in message
> news:11**********************@g44g2000cwa.googlegroups.com...
> > I did try keeping only one ofstream, closing it and reopening it to
> > write each new line. Its speed is about 120MB/hour. My total file size
> > is 8GB, so that translates into about 5 days? I am hoping for something
> > better than that :)
>
> I suggest you sort the file by key, then make a sequential pass through the
> sorted file, opening one output file at a time.


I don't have a tool which can sort an 8GB file. However, I did
follow the suggestion of an earlier post, which has the same underlying
idea as yours. I created a large map<string, vector<string> > object to
hold all the content I need to write out, read in a 250MB chunk at a
time (processing and inserting each line into the map), and then iterated
through the keys and used OS-dependent native file I/O functions to save
time. The end result: I can handle all the files in 15 minutes max.
Compared to my first version, which took an estimated 5 days, I am pretty
satisfied :)
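
A minimal sketch of that chunked approach (a reconstruction from the
description above, not the OP's actual code -- the input name, key parser,
chunk size, and output naming are placeholders, and plain ofstream stands
in for the native file calls the OP used):

#include <cstddef>
#include <fstream>
#include <map>
#include <sstream>
#include <string>
#include <vector>

typedef std::map<std::string, std::vector<std::string> > Buckets;

// Hypothetical key parser -- assume the key is the second
// whitespace-separated field of each line.
std::string parseKey(const std::string& line)
{
    std::istringstream ss(line);
    std::string first, key;
    ss >> first >> key;
    return key;
}

// Append every buffered line to the file named after its key, then
// empty the buckets. (The OP used native OS calls here instead.)
void flushBuckets(Buckets& buckets)
{
    for (Buckets::iterator it = buckets.begin(); it != buckets.end(); ++it) {
        std::ofstream out((it->first + ".txt").c_str(), std::ios::app);
        for (std::size_t i = 0; i < it->second.size(); ++i)
            out << it->second[i] << '\n';
    }
    buckets.clear();
}

int main()
{
    std::ifstream in("big.txt");                        // placeholder name
    const std::size_t chunkLimit = 250u * 1024 * 1024;  // ~250MB per chunk

    Buckets buckets;
    std::size_t buffered = 0;
    std::string line;

    while (std::getline(in, line)) {
        buckets[parseKey(line)].push_back(line);
        buffered += line.size();
        if (buffered >= chunkLimit) {   // chunk full: write it out
            flushBuckets(buckets);
            buffered = 0;
        }
    }
    flushBuckets(buckets);              // write the final partial chunk
    return 0;
}

This way each output file is opened roughly once per 250MB chunk rather
than once per line, which matches the speedup the OP reports.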

I found this group to be extremely helpful, with many great minds. Will
post future questions, and contribute if I can one day.

-km-

Jul 23 '05 #15


quant wrote:
> Alf P. Steinbach wrote:
> > * ke*******@gmail.com:
> > > I have a project which needs to open hundreds to thousands of files for
> > > writing.
> >
> > There's a very good chance it doesn't. But since that is a general
> > programming/design problem, consider posting a description of your
> > problem and your intended solution in [comp.programming].
>
> I have a file which is about 8GB in size; it is text formatted, with
> a key value in the middle of each line that I can parse out. I need to
> write each entire line to a file whose filename contains this key
> value. There are 3000 potential key values. There is no clear ordering
> of the lines, and the appearance of the key values is quite uniform
> through the file.
>
> I actually have to use OS-dependent file I/O functions, because ifstream
> only supports reading files up to 4GB in size?


No, but some implementations do restrict them to that size. However, 64-bit
implementations exist (even for 32-bit OSes). Still, the fastest way would
be to set up a machine with 12GB of RAM, bin the values in memory, then
write out everything. 12GB is certainly possible on 64-bit systems, and
such a machine may actually be a lot cheaper than your time.

HTH,
Michiel Salters

Jul 23 '05 #16
