Bytes IT Community

getline buffering

Hi,
 I am reading and parsing some large text files; a typical file is about 3 MB. It takes around 20 sec just to run std::getline over the whole file (I need to treat newlines properly) in debug, and 8 sec with optimization on.
 This is with Visual Studio 7.1 and its std library, while vim opens the same file in a fraction of a sec.
 So, is getline reading the file line by line, instead of reading a chunk at a time into its internal buffer? Is there any function to set how much it reads from the stream internally?
 I am not very comfortable with read and readsome for loading a large buffer, as they change the file position. I need the visible file position to be the position I am actually at, while "internally" it should read ahead some more, maybe in 1 MB chunks?

abir

Feb 19 '07 #1
8 Replies


On Feb 19, 12:44 pm, "toton" <abirba...@gmail.com> wrote:
[quoted original question snipped]
I'm not sure, but I think it's the other way around: Vim does not read the whole file at once, which is why it's faster.

Each ifstream has a buffer associated with it; you can get a pointer to it with the rdbuf() method, and you can specify an array to use as the buffer with the pubsetbuf() method. See the following link for a short example: http://www.cplusplus.com/reference/i...pubsetbuf.html

--
Erik Wikström

Feb 19 '07 #2

On Feb 19, 5:44 pm, "Erik Wikström" <eri...@student.chalmers.se> wrote:
[quoted text snipped]
Hi,
 I checked it in a separate console project (multi-threaded); there it runs perfectly and reads the file within 0.8 sec. However, the same code takes 12 sec when running inside my Qt app.
I fear the Qt lib is interacting with the C++ runtime in some way that causes the problem.
Maybe I need to rebuild the Qt lib afresh to check what is wrong.
Thanks for answering the question.

abir

Feb 19 '07 #3

toton wrote:
[quoted text snipped]
Make sure you decouple stream I/O from stdio, i.e. do
std::ios::sync_with_stdio(false);

HTH,
- J/
Feb 19 '07 #4

"Jacek Dziedzic" <ja************************@gmail.com> wrote in message news:e3***************************@news.chello.pl...
[quoted text snipped]
> Make sure you decouple stream I/O from stdio, i.e. do
> std::ios::sync_with_stdio(false);
Normally good advice, but unnecessary with VC++.

P.J. Plauger
Dinkumware, Ltd.
http://www.dinkumware.com
Feb 19 '07 #5

On Feb 19, 8:49 pm, "P.J. Plauger" <p...@dinkumware.com> wrote:
[quoted text snipped]
I found the problem. It has nothing to do with Qt or other libraries.
I was using tellg() to get the current position. Now my question is: why is tellg() so costly? Won't it just return the current stream position?
To explain:

{
    boost::progress_timer t;
    std::ifstream in("Y:/Data/workspaces/tob4f/tob4f.dat");
    std::string line;
    while (in) {
        ///int pos = in.tellg();
        std::getline(in, line);
    }
}

This code takes 0.58 sec on my computer, while if I uncomment the in.tellg() line it takes 120.8 sec!

Feb 20 '07 #6

toton wrote:
[quoted text snipped]
> This code takes 0.58 sec on my computer, while if I uncomment the
> in.tellg() line it takes 120.8 sec!
Could it be that you have opened the file in text mode, and tellg() always seeks back to the beginning and rereads the characters (counting cr+lf pairs as one)? Try switching to binary mode and handling cr+lf yourself.

ismo
Feb 20 '07 #7

On Feb 20, 11:10 am, Ismo Salonen <nob...@another.invalid> wrote:
[quoted text snipped]
> Could it be that you have opened the file in text mode, and tellg()
> always seeks back to the beginning and rereads the characters (counting
> cr+lf pairs as one)? Try switching to binary mode and handling cr+lf
> yourself.
That is the whole point of using getline here. I am not sure why tellg() has to behave like that in text mode; the position is a stored one!
I tested the same with gcc. The program under MinGW shows no big performance difference. Here is the program:

#include <fstream>
#include <iostream>
#include <string>
#include <ctime>

int main() {
    //boost::progress_timer t;
    time_t start, end;
    time(&start);
    std::ifstream in("Y:/Data/workspaces/tob4f/tob4f.dat");
    std::string line;
    while (in) {
        int pos = in.tellg();
        std::getline(in, line);
    }
    time(&end);
    std::cout << difftime(end, start);
}

With the tellg() line commented out it takes 2 sec, and with it in place 3 sec (without the -O2 flag). That looks fine to me. Even the Visual Studio std library code looks quite simple.
Has anyone else tested this with a big file (4-8 MB) and found a huge difference?

thanks
abir

Feb 20 '07 #8

On Feb 20, 7:45 am, "toton" <abirba...@gmail.com> wrote:
[quoted text snipped]
> Has anyone else tested this with a big file (4-8 MB) and found a huge
> difference?
On a 22.5 MB file I get a running time of one second without tellg(), 4 seconds if the file is opened in text mode, and 2 seconds if it is opened in binary mode. Seems quite reasonable to me.

--
Erik Wikström

Feb 20 '07 #9
