Bytes | Software Development & Data Engineering Community

complexity of tellg()

Hi,
I am reading a big file and need to record the current file
position so that I can store the positions for later direct access.
However, tellg() looks like a very costly function! Yet its code says
it should just return the current buffer position, so it should be a
very cheap function.
To explain:

#include <fstream>
#include <string>
#include <boost/progress.hpp>

int main()
{
    boost::progress_timer t;  // prints the elapsed time when it goes out of scope
    std::ifstream in("Y:/Data/workspaces/tob4f/tob4f.dat");
    std::string line;
    while (in) {
        std::streampos pos = in.tellg();  // comment out this line to compare
        std::getline(in, line);
    }
}

With the tellg() line commented out, this code takes 0.58 s on my computer;
with it enabled, it takes 120.8 s (it varies a little).

Can anyone explain the reason and a possible workaround?
I am using MS Visual Studio 7.1 and the standard library provided with
Visual Studio 7.1.

Feb 20 '07 #1
* toton:
[snip]
Most likely the cause is conversion of CRLF to LF, which you've
specified by (1) opening the file in text mode and (2) compiling with a
Windows compiler.

One cure could then be to open the file in binary mode, and handle
newlines as appropriate (or not).
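A minimal sketch of the binary-mode approach suggested above, assuming the caller opens the file with `std::ios::binary`. The helper name `getline_strip_cr` is hypothetical: it reads a line with `std::getline` and drops a trailing '\r', so files with CRLF and LF line endings produce the same strings.

```cpp
#include <cassert>
#include <istream>
#include <sstream>  // std::istringstream is handy for trying this without a file
#include <string>

// Hypothetical helper: reads one line with std::getline, then removes a
// trailing '\r'. In binary mode a CRLF ("\r\n") line keeps its '\r', so this
// makes CRLF and LF files produce the same strings.
// Returns false when no further line could be read.
bool getline_strip_cr(std::istream& in, std::string& line)
{
    if (!std::getline(in, line))
        return false;
    if (!line.empty() && line[line.size() - 1] == '\r')
        line.erase(line.size() - 1);
    return true;
}
```

Typical use: `std::ifstream in("Y:/Data/workspaces/tob4f/tob4f.dat", std::ios::binary);` followed by `while (getline_strip_cr(in, line)) { ... }`. In binary mode no CRLF translation takes place; whether tellg() then also becomes cheap on VC 7.1 is not something this thread establishes.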

--
A: Because it messes up the order in which people normally read text.
Q: Why is it such a bad thing?
A: Top-posting.
Q: What is the most annoying thing on usenet and in e-mail?
Feb 20 '07 #2
toton wrote:
[snip]
The reason is that tellg performs a seek to the current position. This
flushes the input buffer, which dramatically slows down your program.

Looks as through the defintion is streambuf (which is used by all
streams) is such that the only way to find the current position is to
perform a seek to the current position.
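For reference, the C++ standard does specify tellg() in terms of a seek on the stream buffer: if fail() is true it returns pos_type(-1), otherwise it returns rdbuf()->pubseekoff(0, cur, in). The sketch below spells that out; `tellg_equivalent` is a hypothetical name, and how expensive that zero-offset seek is (for example, whether it discards buffered input) depends on the library's basic_filebuf.

```cpp
#include <cassert>
#include <ios>
#include <istream>
#include <sstream>
#include <string>

// Spells out the standard's definition of tellg(): unless fail() is true,
// ask the stream buffer to "seek" by 0 from the current input position and
// return the resulting position. (Since C++11, tellg() also constructs a
// sentry first; that detail is omitted here.)
std::streampos tellg_equivalent(std::istream& in)
{
    if (in.fail())
        return std::streampos(-1);
    return in.rdbuf()->pubseekoff(0, std::ios_base::cur, std::ios_base::in);
}
```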

john
Feb 20 '07 #3
> Looks as through the defintion is streambuf (which is used by all
> streams) is such that the only way to find the current position is to
> perform a seek to the current position.
Let me try that again

Looks as though the definition of streambuf (which is used by all
streams) is such that the only way to find the current position is to
perform a seek to the current position.

john
Feb 20 '07 #4
Of course, you can approach the problem by computing the position yourself, if
you know the size of the input you have read.
Not elegant, but it works for simple cases...

std::ifstream in("Y:/Data/workspaces/tob4f/tob4f.dat");
std::size_t pos = 0;
std::string line;
while (std::getline(in, line)) {
    // here pos is the offset at which 'line' starts (replaces in.tellg())
    pos += line.length() + 2; // account for the line terminator (assumes CRLF)
}
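The "+ 2" above assumes every line ends in CRLF. A variation, sketched under the assumption that the file is opened with `std::ios::binary`: in binary mode any '\r' stays inside the string returned by getline, so adding 1 for the '\n' yields the byte offset of the next line whether the file uses LF or CRLF. `line_offsets` is a hypothetical helper name. Files that use a lone '\r' (no '\n' at all) would still be read as one line.

```cpp
#include <cassert>
#include <ios>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: returns the byte offset at which each line starts,
// counting bytes instead of calling tellg() in the loop. Assumes the stream
// starts at offset 0 and was opened with std::ios::binary, so any '\r' is
// kept in 'line' and each terminator adds exactly one '\n' byte.
std::vector<std::streamoff> line_offsets(std::istream& in)
{
    std::vector<std::streamoff> offsets;
    std::streamoff pos = 0;
    std::string line;
    while (std::getline(in, line)) {
        offsets.push_back(pos);  // 'line' started at this offset
        // Skip over the line's bytes plus its '\n'. For a final line with no
        // '\n' this overshoots by one, but that value is never stored.
        pos += static_cast<std::streamoff>(line.size()) + 1;
    }
    return offsets;
}
```

To jump back later, call `in.clear()` (the loop leaves eofbit and failbit set) and then `in.seekg(std::streampos(offsets[i]))`.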

Bye Carlo

"toton" <ab*******@gmail.com> wrote in message
news:11**********************@q2g2000cwa.googlegroups.com...
[snip]

Feb 20 '07 #5
John Harrison wrote:
[snip]
Let me really try this again; I shouldn't speculate on things I have no
real knowledge of.

I would imagine that the *likely* reason is that calling tellg() in the
particular circumstances you are calling it in causes the input buffer to be
flushed. Certainly the slowdown you are observing would be consistent with that.

However, the only way to know for sure would be a careful examination of
the library code, or using a debugger to step into the library code.

john
Feb 20 '07 #6
On Feb 20, 11:46 am, "Alf P. Steinbach" <a...@start.no> wrote:
[snip]
There are enough bad things related to newlines...
seekg and tellg don't match when the newline character is \n and the file is
opened in text mode.
For a Unix file:
std::string line;
while (in) {
    std::streampos pos = in.tellg();  // position in front of the next line
    std::getline(in, line);
    std::cout << pos << " " << line << std::endl;

    if (line == ".PEN_DOWN") {
        in.seekg(pos);  // go back to the start of the .PEN_DOWN line
        break;
    }
}
std::getline(in, line);  // This doesn't print .PEN_DOWN!
std::cout << line << std::endl;
Now if I open it in binary mode, this problem is solved.
But it creates another set of problems:
for a Unix file it is now fine, but for a Windows file a \r is left at
the end of each line, since the newline character is \n. So I need to
remove the \r from the line if it is present.

I wonder what getline will return in the case of a Mac file, where the
line terminator is \r only. Will it return the whole file as a single
line?
Is there any standard API support to take care of all these things and
still keep seekg & tellg consistent?
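std::getline stops only at its delimiter ('\n' by default), so a file that contains no '\n' at all, such as one whose lines end in a lone '\r', would come back as a single long line. A minimal sketch of a hypothetical replacement, `getline_any_eol`, which accepts "\n", "\r\n" and a lone "\r" as terminators; it assumes a stream opened with `std::ios::binary` and is not a standard library function.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper: like std::getline, but "\n" (Unix), "\r\n" (Windows)
// and a lone "\r" (classic Mac) all end a line. The terminator is consumed
// and not stored. Returns false only if no characters could be read.
bool getline_any_eol(std::istream& in, std::string& line)
{
    line.clear();
    bool got_any = false;
    char c;
    while (in.get(c)) {
        got_any = true;
        if (c == '\n')
            return true;              // LF
        if (c == '\r') {
            if (in.peek() == '\n')
                in.get(c);            // CR LF: consume the LF as well
            return true;              // CR LF or lone CR
        }
        line += c;
    }
    // End of input. If the last line had no terminator, report success and,
    // like std::getline, leave only eofbit set.
    if (got_any)
        in.clear(in.rdstate() & ~std::ios_base::failbit);
    return got_any;
}
```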

Thanks
abir

Feb 20 '07 #7
"toton" <ab*******@gmail.com> wrote in message
news:11**********************@h3g2000cwc.googlegroups.com...
[snip]

> There are enough bad things related to newlines...
> seekg and tellg don't match when the newline character is \n and the file is
> opened in text mode.

That shouldn't be, if you're just using seekg to return to a place
earlier memorized by tellg.

> For a Unix file:
> std::string line;
> while (in) {
>     std::streampos pos = in.tellg();  // position in front of the next line
>     std::getline(in, line);
>     std::cout << pos << " " << line << std::endl;
>
>     if (line == ".PEN_DOWN") {
>         in.seekg(pos);  // go back to the start of the .PEN_DOWN line
>         break;
>     }
> }
> std::getline(in, line);  // This doesn't print .PEN_DOWN!
> std::cout << line << std::endl;
> Now if I open it in binary mode, this problem is solved.
> But it creates another set of problems:
> for a Unix file it is now fine, but for a Windows file a \r is left at
> the end of each line, since the newline character is \n. So I need to
> remove the \r from the line if it is present.

If you wrote the file in binary mode, the \r characters wouldn't
be appended in the first place. It is important that you read and
write consistently, at least if you don't want to deal with local
conventions for reading and writing text files.

> I wonder what getline will return in the case of a Mac file, where the
> line terminator is \r only. Will it return the whole file as a single
> line?

If you write in text mode and read in binary mode, that could happen,
yes.

> Is there any standard API support to take care of all these things and
> still keep seekg & tellg consistent?

Yes, it's called the Standard C++ library, if you use it right.
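A minimal sketch of the pattern described above, assuming the file is read in the same mode it was written in: remember the value tellg() returns in front of a line and later pass exactly that value back to seekg() on the same stream. `reread_from_mark` is a hypothetical helper, and the file name in the test is illustrative. This is toton's loop restructured, not a workaround for the MSVC 7.1 text-mode problem with LF-only files reported in this thread.

```cpp
#include <cassert>
#include <fstream>
#include <istream>
#include <ostream>
#include <string>

// Hypothetical helper modelled on toton's loop: reads lines until one equals
// 'target', then seeks back to the position tellg() reported just before that
// line and reads it again into 'again'. Returns false if 'target' is not found.
bool reread_from_mark(std::istream& in, const std::string& target,
                      std::string& again)
{
    std::string line;
    for (;;) {
        std::streampos pos = in.tellg();  // position in front of the next line
        if (pos == std::streampos(-1) || !std::getline(in, line))
            return false;
        if (line == target) {
            in.clear();     // reset eofbit in case 'target' was the last line
            in.seekg(pos);  // seek only to a value tellg() returned
            return static_cast<bool>(std::getline(in, again));
        }
    }
}
```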

P.J. Plauger
Dinkumware, Ltd.
http://www.dinkumware.com
Feb 20 '07 #8
toton wrote:
> There are enough bad things related to newlines...
> seekg and tellg don't match when the newline character is \n and the file is
> opened in text mode.

Sure it does. See below.

> For a Unix file:
> std::string line;
> while (in) {
>     std::streampos pos = in.tellg();  // position in front of the next line
>     std::getline(in, line);
>     std::cout << pos << " " << line << std::endl;
>
>     if (line == ".PEN_DOWN") {
>         in.seekg(pos);  // go back to the start of the .PEN_DOWN line
>         break;
>     }
> }
> std::getline(in, line);  // This doesn't print .PEN_DOWN!
> std::cout << line << std::endl;
> Now if I open it in binary mode, this problem is solved.
> But it creates another set of problems:
> for a Unix file it is now fine, but for a Windows file a \r is left at
> the end of each line, since the newline character is \n. So I need to
> remove the \r from the line if it is present.
>
> I wonder what getline will return in the case of a Mac file, where the
> line terminator is \r only. Will it return the whole file as a single
> line?
> Is there any standard API support to take care of all these things and
> still keep seekg & tellg consistent?

Be careful: you're mixing two different things. In C++ source code, '\n'
is the character that's used to mark the end of a line, and '\r' is the
character that is used to mark a carriage return. That has only a
historical connection with the ASCII line feed (newline) character, whose
value is 0x0A, and the ASCII carriage return character, whose value is 0x0D.

For text files, if you know the conventions that your operating system
uses, you can talk about the details of how line ends are represented in
the text file. But from a high level language perspective, that's
irrelevant detail: it's up to the I/O library to translate things, so
that when you write the character '\n' it does whatever is appropriate
to mark the end of a line using the OS's conventions. Similarly, when
you read a text file, the I/O library translates whatever the OS uses to
mark the end of a line into a single '\n' character.

The problem you're running into is that you're apparently not using
native text files, since you're talking about unix files, mac files, and
Windows. The I/O library isn't prepared to deal with all of them. When
you move text files from one system to another, use a utility like ftp
that understands line ending conventions and does the appropriate
translations. Don't expect Unix I/O libraries to understand Windows file
conventions, or vice versa.
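Assuming an ASCII-based execution character set, as on the Windows and Unix systems discussed in this thread, the two line-ending bytes have these values. The standard itself does not fix the numeric values of '\n' and '\r'; the constant names below are illustrative, not part of any library.

```cpp
#include <cassert>

// ASCII control codes for the two line-ending bytes (illustrative names).
const char LF = 0x0A;  // line feed: Unix line ending; '\n' on ASCII systems
const char CR = 0x0D;  // carriage return: classic Mac line ending; '\r' on ASCII systems
// Windows text files end each line with the pair CR LF.
const char CRLF[] = { CR, LF, '\0' };
```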

--

-- Pete
Roundhouse Consulting, Ltd. (www.versatilecoding.com)
Author of "The Standard C++ Library Extensions: a Tutorial and
Reference." (www.petebecker.com/tr1book)
Feb 20 '07 #9
On Feb 20, 7:23 pm, "P.J. Plauger" <p...@dinkumware.com> wrote:
[snip]
Maybe I didn't express the problem clearly.
1) I am not writing the file, I am only reading it. It is a text
file, but nothing is fixed: the line terminator may be \n, \r\n or
\r, depending on who saved the file with which editor.
So this is the parsing question...
The file looks something like this:
.X_DIM 20701
.Y_DIM 27000
.X_POINTS_PER_MM 100
.Y_POINTS_PER_MM 100
.POINTS_PER_SECOND 200
.COMMENT YES_PRES_ORG 0
.COMMENT YES_PRES_EXT 1023
.DT 3975234
.PEN_DOWN
.COMMENT .PEN_WIDTH 1
.COMMENT .PEN_WIDTH_ORG 1
.COMMENT .PEN_COLOR 0x0

Now I need to remember a past position using tellg() and go back to that
position using seekg().
The cases are:
1) The file is opened in text mode and uses \n as the line terminator:
seekg doesn't put the file pointer at the position saved by tellg (as
shown in my previous program). It works as expected when the newline is
\r\n.
2) The file is opened in binary mode. If the file uses \n as the line
terminator, seekg & tellg work as expected. If it uses \r\n as the
terminator, the returned string contains a \r, which needs to be
removed.
3) This one I haven't tested. Several Mac files use \r as the newline
character. What will std::getline(stream, str) return: the whole page or
just the line?

So my question is: how can I check which newline convention a file uses,
so that I can parse all of these files properly?
Note that the files are not written by me; I only read them.
All the tests were done with MSVC 7.1; gcc might give just the
opposite result (I will check it quickly).
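One way to answer the last question is to look at the raw bytes. A minimal sketch, assuming the stream is opened with `std::ios::binary` so that no translation hides the '\r' bytes; `detect_eol` is a hypothetical helper that inspects only the first line terminator it finds, so files mixing conventions are not detected. It consumes input, so call `clear()` and `seekg(0)` before parsing the file.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper: guesses a file's line-ending convention from the first
// terminator found. Returns "\r\n" (Windows), "\n" (Unix), "\r" (classic Mac)
// or "" if the input contains no terminator at all.
std::string detect_eol(std::istream& in)
{
    char c;
    while (in.get(c)) {
        if (c == '\n')
            return "\n";
        if (c == '\r')
            // CR followed by LF is the Windows convention; CR alone is old Mac.
            return (in.peek() == '\n') ? "\r\n" : "\r";
    }
    return "";
}
```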

Feb 20 '07 #10
