Bytes | Developer Community

iostream and memory-mapped file

Hi there,
I am seeking the fastest way to load a BIG string and parse it according to a
given format. I have an extern function which returns a BIG (char *) string.
Right now I am parsing it with an iterator as follows:

char *str = return_a_big_size_str();
istringstream ss(string(str), istringstream::in);
istreambuf_iterator<char> bit(ss), eit;
parsing(bit, eit);

I found that the code above is very inefficient because of the size
of str.

BTW, I also saved the whole string to a file, say str.txt, and then
loaded the file with ifstream:

std::ifstream input("str.txt");
std::istreambuf_iterator<char> bit(input), eit;
parsing(bit, eit);

I can't believe that the latter program is faster than the former one.
Anyway, I think memory-mapped I/O may be a better choice. However, I
have no idea how to associate a memory-mapped file with ifstream.

Feb 21 '06 #1
It's slow because you are making a lot of copies.

Is your parser templatized to use any kind of char iterator? Then it
would be as easy as parsing(str, str+len), with no copying required.

Feb 21 '06 #2
TB
wa***@wakun.com wrote:
> Hi there,
> I am seeking the fastest way to load a BIG string and parse it according to a
> given format. I have an extern function which returns a BIG (char *) string.
> Right now I am parsing it with an iterator as follows:

IO is slow, accept it.
> char *str = return_a_big_size_str();
> istringstream ss(string(str), istringstream::in);
> istreambuf_iterator<char> bit(ss), eit;
> parsing(bit, eit);
>
> I found that the code above is very inefficient because of the size
> of str.

You could always write your own iterator:

#include <cstddef>
#include <iterator>
#include <stdexcept>

// Input iterator over a NUL-terminated C string. A default-constructed
// (null) iterator acts as the end iterator. std::iterator is deprecated
// since C++17, so the iterator traits are spelled out here, and noexcept
// replaces the old dynamic exception specifications.
class cstringiterator {

private:
    char const * d_cstring;

public:
    typedef std::input_iterator_tag iterator_category;
    typedef char                    value_type;
    typedef std::ptrdiff_t          difference_type;
    typedef char const *            pointer;
    typedef char                    reference;  // operator* returns by value

    // An empty string starts out equal to the end iterator.
    cstringiterator(char const * cstring = 0)
    : d_cstring(cstring && *cstring ? cstring : 0) { }

    value_type operator*() const {
        if(!d_cstring) throw std::runtime_error("Access Denied");
        return *d_cstring;
    }
    cstringiterator & operator++() noexcept {
        // On reaching the terminating NUL, turn into the end iterator.
        if(d_cstring && !*++d_cstring) {
            d_cstring = 0;
        }
        return *this;
    }
    cstringiterator operator++(int) noexcept {
        cstringiterator c(*this);
        ++*this;
        return c;
    }
    bool operator==(cstringiterator const & csi) const noexcept {
        return d_cstring == csi.d_cstring;
    }
    bool operator!=(cstringiterator const & csi) const noexcept {
        return d_cstring != csi.d_cstring;
    }
};

#include <algorithm>
#include <iostream>

int main()
{
    char const * c = "apa";
    std::copy(cstringiterator(c), cstringiterator(),
              std::ostream_iterator<char>(std::cout));
    return 0;
}
> BTW, I also saved the whole string to a file, say str.txt, and then
> loaded the file with ifstream:
>
> std::ifstream input("str.txt");
> std::istreambuf_iterator<char> bit(input), eit;
> parsing(bit, eit);
Unless you really must have access to the entire string at all times,
use an iterator that works through internal buffers and reads ahead only
when called for, overwriting old buffers and allocating new ones when
needed.

> I can't believe that the latter program is faster than the former one.
> Anyway, I think memory-mapped I/O may be a better choice. However, I
> have no idea how to associate a memory-mapped file with ifstream.


Memory-mapping a file is rather platform-specific, with its own set of
native API calls. Derive a class from std::basic_filebuf that neatly
handles it all.

--
TB @ SWEDEN
Feb 21 '06 #3
wa***@wakun.com wrote:
> char *str = return_a_big_size_str();
> istringstream ss(string(str), istringstream::in);
The above line creates at least two copies of the string, all of which
are alive at the same time. This is likely to cause swapping on your
system (at least if the strings are really large), which is a
tremendous performance hit.
> istreambuf_iterator<char> bit(ss), eit;
> parsing(bit, eit);


Hold it! You are parsing your string using stream *buffer* iterators,
i.e. you are not taking advantage of the formatting facilities of
streams at all? Why don't you simply pass pointers as the iterators
to the 'parsing()' function (which, of course, should be a function
template)? Assuming, however, that 'parsing()' is not a function
template, you still have the option to create a suitable stream buffer
which is used just for the situation described:

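#include <cstring>
#include <iterator>
#include <streambuf>

// Stream buffer that reads straight from an existing char array:
// setg() sets up the get area over the string without copying it.
struct membuf:
    std::streambuf
{
    membuf(char* str) { this->setg(str, str, str + std::strlen(str)); }
};
membuf buffer(str);
std::istreambuf_iterator<char> bit(&buffer), eit;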
// ...
--
<mailto:di***********@yahoo.com> <http://www.dietmar-kuehl.de/>
<http://www.eai-systems.com> - Efficient Artificial Intelligence
Feb 23 '06 #4
