getline buffering

Hi,
I am reading and parsing some large text files; a typical file is
about 3 MB. Just running std::getline over the whole file (I need
newlines handled properly) takes around 20 sec in a debug build and
8 sec with optimization on.
This is with Visual Studio 7.1 and its standard library, while vim
opens the same file in a fraction of a second.
So, is getline reading the file line by line instead of reading a
chunk at a time into its internal buffer? Is there a function to set
how much is read from the stream internally?
I am not very comfortable using read or readsome to load a large
buffer, as they change the file position. I need the visible file
position to be where I actually am, while internally the stream reads
ahead, maybe in 1 MB chunks.

abir

Feb 19 '07 #1
I'm not sure, but I think it's the other way around: Vim does not read
the whole file at once, which is why it's faster.

Each ifstream has a buffer associated with it. You can get a pointer
to it with the rdbuf() method, and you can specify an array to use as
the buffer with the pubsetbuf() method. See the following link for a
short example: http://www.cplusplus.com/reference/i...pubsetbuf.html
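
A minimal sketch of the pubsetbuf() idea, not a definitive recipe. The helper name `count_lines_with_buffer` and the buffer size are made up for illustration. Whether the library honors a user-supplied buffer, and whether the call has to come before or after open(), is implementation-defined: libstdc++, for one, only honors it before open(), while other libraries are reported to need the file open already. The sketch assumes the before-open order; the line count it returns is correct either way.

```cpp
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical helper: counts lines while asking the stream to use a
// caller-sized buffer. The library is free to ignore the request.
std::size_t count_lines_with_buffer(const char* path, std::size_t buffer_bytes) {
    // Declared before the stream so it outlives the stream's use of it.
    std::vector<char> buffer(buffer_bytes);

    std::ifstream in;
    // Assumption: this library wants pubsetbuf() before open().
    in.rdbuf()->pubsetbuf(buffer.data(),
                          static_cast<std::streamsize>(buffer.size()));
    in.open(path);

    std::size_t lines = 0;
    std::string line;
    while (std::getline(in, line)) {
        ++lines;
    }
    return lines;
}
```

It could be called as `count_lines_with_buffer("data.txt", 1 << 20)` to request a 1 MB buffer; timing it against a plain ifstream is the only way to tell whether the request made a difference on a given library.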

--
Erik Wikström

Feb 19 '07 #2
Hi,
I checked it in a separate (multi-threaded) console project; there it
runs perfectly and reads the file within 0.8 sec. However, the same
code takes 12 sec when running inside my Qt app.
I fear the Qt library is interacting with the C++ runtime in some way
that causes the problem.
Maybe I need to build the Qt library afresh to check what is wrong.
Thanks for answering the question.

abir

Feb 19 '07 #3
Make sure you decouple stream I/O from stdio, i.e. do
std::ios::sync_with_stdio(false);
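
For reference, a sketch of that call wrapped in a hypothetical helper (`decouple_standard_streams` is not a library function). The setting governs only the standard streams (std::cin, std::cout, std::cerr, std::clog and their wide counterparts), so it is not expected to change how fast a std::ifstream reads, and it should be made before any I/O on those streams.

```cpp
#include <iostream>

// Hypothetical helper: turns off synchronization between the standard
// C++ streams and C stdio. Returns the setting that was in effect before.
bool decouple_standard_streams() {
    return std::ios::sync_with_stdio(false);
}
```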

HTH,
- J/
Feb 19 '07 #4
"Jacek Dziedzic" <ja************************@gmail.com> wrote in message
news:e3***************************@news.chello.pl...
> Make sure you decouple stream I/O from stdio, i.e. do
> std::ios::sync_with_stdio(false);

Normally good advice, but unnecessary with VC++.

P.J. Plauger
Dinkumware, Ltd.
http://www.dinkumware.com
Feb 19 '07 #5
I found the problem. It has nothing to do with Qt or other
libraries.
I was calling tellg() to get the current position. Now my question
is: why is tellg so costly? Shouldn't it just return the current
stream position?
To explain:
{
    boost::progress_timer t;  // reports elapsed time when it goes out of scope
    std::ifstream in("Y:/Data/workspaces/tob4f/tob4f.dat");
    std::string line;
    while (in) {
        // std::streampos pos = in.tellg();
        std::getline(in, line);
    }
}
This code takes 0.58 sec on my computer, while if I uncomment the
in.tellg() line it takes 120.8 sec!

Feb 20 '07 #6
Could it be that you have opened the file in text mode, so that
tellg() always seeks to the beginning and rereads the characters
(counting CR+LF pairs as one)? Try switching to binary mode and
handling CR+LF yourself.
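
One way to follow the binary-mode suggestion without calling tellg() at all is to keep a running byte count. This is only a sketch, assuming lines end in LF or CR+LF; `LineRecord` and `read_lines_with_offsets` are hypothetical names, not anything from the thread or a library.

```cpp
#include <fstream>
#include <string>
#include <vector>

// Hypothetical record: where a line starts in the file, and its text
// with the line terminator removed.
struct LineRecord {
    std::streamoff offset;
    std::string text;
};

std::vector<LineRecord> read_lines_with_offsets(const char* path) {
    // Binary mode: bytes arrive untranslated, so counting them gives
    // real file offsets.
    std::ifstream in(path, std::ios::binary);
    std::vector<LineRecord> records;
    std::string line;
    std::streamoff offset = 0;

    while (std::getline(in, line)) {
        const std::streamoff start = offset;
        // getline consumed the line plus one '\n', unless it stopped
        // at end of file without seeing a newline.
        offset += static_cast<std::streamoff>(line.size()) + (in.eof() ? 0 : 1);
        // Handle CR+LF by hand: drop the trailing '\r' if present.
        if (!line.empty() && line.back() == '\r') {
            line.pop_back();
        }
        records.push_back({start, line});
    }
    return records;
}
```

Each stored offset can later be handed to seekg() on a stream opened in binary mode to jump back to the start of that line.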

ismo
Feb 20 '07 #7
On Feb 20, 11:10 am, Ismo Salonen <nob...@another.invalid> wrote:
> Try switching to binary mode and handle cr+lf yourself.

Handling newlines is the whole reason I am using getline. I am not
sure why tellg has to behave like that in text mode; the position is
a stored value!
I tested the same with gcc. The program built with MinGW shows no big
performance difference.
Here is the program:
#include <ctime>
#include <fstream>
#include <iostream>
#include <string>

int main() {
    // boost::progress_timer t;
    std::time_t start, end;
    std::time(&start);  // wall-clock time, one-second resolution
    std::ifstream in("Y:/Data/workspaces/tob4f/tob4f.dat");
    std::string line;
    while (in) {
        std::streampos pos = in.tellg();  // the call being measured
        std::getline(in, line);
    }
    std::time(&end);
    std::cout << std::difftime(end, start);
}
With the tellg() line commented out it takes 2 sec, and with it
active 3 sec (without the -O2 flag). That looks fine to me.
Even the Visual Studio std code looks quite simple.
Has anyone else tested it with a big file (4-8 MB) and found a huge
difference?
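
Since time() only resolves whole seconds, a 2 sec versus 3 sec comparison is coarse. A sketch of a finer-grained measurement using std::chrono follows; it assumes a C++11 compiler, which is newer than the ones discussed here, and `time_getline_seconds` is a hypothetical helper name.

```cpp
#include <chrono>
#include <fstream>
#include <string>

// Hypothetical helper: seconds taken to getline() through a file,
// optionally calling tellg() before each line.
double time_getline_seconds(const char* path, bool call_tellg,
                            std::ios::openmode mode = std::ios::in) {
    const auto start = std::chrono::steady_clock::now();

    std::ifstream in(path, mode);
    std::string line;
    std::streamoff last_pos = 0;
    for (;;) {
        if (call_tellg) {
            last_pos = in.tellg();  // the call under test
        }
        if (!std::getline(in, line)) {
            break;
        }
    }
    (void)last_pos;  // only read to keep the tellg() call in the loop

    const std::chrono::duration<double> elapsed =
        std::chrono::steady_clock::now() - start;
    return elapsed.count();
}
```

Calling it four times (with and without tellg, in text mode and with `std::ios::in | std::ios::binary`) would reproduce the comparisons made in this thread on any given library.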

thanks
abir

Feb 20 '07 #8
On Feb 20, 7:45 am, "toton" <abirba...@gmail.com> wrote:
> anyone else has tested it with a big file (4-8 MB )and found a huge
> difference ?

On a 22.5 MB file I get a running time of one second without tellg;
with tellg it takes 4 seconds if the file is opened in text mode and
2 seconds if opened in binary mode. Seems quite reasonable to me.

--
Erik Wikström

Feb 20 '07 #9
