File Read Progress Indicator

I am working on a program that reads and processes large text files (on
the order of 32 MB, so not too huge), so I wanted to add a progress
indicator so I can estimate when it will finish. I just need an
estimate, so the exact byte count isn't essential.

// reduced code
// assume necessary #include's and using declarations for std
// components

ifstream file(filename.c_str());
// read 2 header lines
for (int i = 0; i != 2; ++i) {
    string header;
    getline(file, header);
}
ifstream::pos_type start_of_data = file.tellg();

file.seekg(0, ios::end);
ifstream::pos_type end_of_data = file.tellg();
file.seekg(start_of_data);
for (string line; getline(file, line); ) {
    do_something_with(line);

    int percent_done =
        static_cast<unsigned long>(file.tellg()) * 100 / end_of_data;

    cout << percent_done << "%\n";
}

This outline seems to work well. My question is: is the cast from the
return type of ifstream::tellg() to unsigned long well-defined? The
reason I am casting to an unsigned type in the first place is that
without the cast, eventually negative percents were being displayed.

Also, are there any other issues with my usage of tellg()? I remember
reading somewhere that the result of tellg() isn't guaranteed to be able
to represent any valid filesize, but I don't know if there is any way
around this issue using only standard components.

--
Marcus Kwok
Replace 'invalid' with 'net' to reply
May 18 '07 #1

> int percent_done =
>     static_cast<unsigned long>(file.tellg()) * 100 / end_of_data;
>
> This outline seems to work well. My question is: is the cast from the
> return type of ifstream::tellg() to unsigned long well-defined? The
> reason I am casting to an unsigned type in the first place is that
> without the cast, eventually negative percents were being displayed.

It seems that the negative values you're seeing appear when the
arithmetic hits the limit of the integer type, specifically when
file.tellg() * 100 exceeds that limit. You should probably do this
calculation in doubles, then convert back to int at the end.

How many bits does unsigned long have on your system? If it's 64,
then ignore the previous paragraph, as you're very unlikely to be
hitting that limit.

Michael

May 18 '07 #2
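
To make the limit in the reply above concrete, here is a small sketch, assuming a C++11 compiler. The function names are invented for this example, and fixed-width 32-bit types stand in for the poster's long and unsigned long. With 32 bits, position * 100 stops fitting in a signed value once the position passes 21,474,836 bytes (about 20.5 MiB), and in an unsigned value once it passes 42,949,672 bytes (about 41 MiB). That would be consistent with a 32 MB file showing negative percentages without the cast and plausible ones with it.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Largest byte position p for which p * 100 still fits in a 32-bit signed integer.
constexpr std::int32_t max_pos_signed32() {
    return std::numeric_limits<std::int32_t>::max() / 100;
}

// The same limit for a 32-bit unsigned integer.
constexpr std::uint32_t max_pos_unsigned32() {
    return std::numeric_limits<std::uint32_t>::max() / 100;
}

// Percentage computed in double, which has no such limit for realistic
// file sizes. (Hypothetical helper; `end` must be positive.)
inline int percent_in_double(double pos, double end) {
    assert(end > 0.0);
    return static_cast<int>(pos * 100.0 / end);
}
```

A 32 MB file is 33,554,432 bytes, which lies between the two limits, so only the signed computation wraps.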

On May 18, 10:17 pm, ricec...@gehennom.invalid (Marcus Kwok) wrote:
> I am working on a program that reads and processes large text files (on
> the order of 32 MB, so not too huge), so I wanted to add a progress
> indicator so I can estimate when it will finish. I just need an
> estimate, so the exact byte count isn't essential.
>
> // reduced code
> // assume necessary #include's and using declarations for std
> // components
> ifstream file(filename.c_str());
> // read 2 header lines
> for (int i = 0; i != 2; ++i) {
>     string header;
>     getline(file, header);
> }
> ifstream::pos_type start_of_data = file.tellg();
> file.seekg(0, ios::end);
> ifstream::pos_type end_of_data = file.tellg();
> file.seekg(start_of_data);
> for (string line; getline(file, line); ) {
>     do_something_with(line);
>     int percent_done =
>         static_cast<unsigned long>(file.tellg()) * 100 / end_of_data;
>
>     cout << percent_done << "%\n";
> }
>
> This outline seems to work well. My question is: is the cast from the
> return type of ifstream::tellg() to unsigned long well-defined?

No. First, the return type is a streampos, which may not even
be convertible to an integral type. Second, even when it is
convertible, there is not necessarily a direct relationship
between the numeric value and the number of bytes in the file.
Third, even on systems where there is an exact relationship
(Unix), or a more or less rough relationship (Windows),
unsigned long is generally not large enough. (Unix defines a
special type, off_t, for this; Microsoft uses a struct
LARGE_INTEGER.) If you're sure that the files can never be more
than, say, 100 MB, then this is not necessarily a consideration.

> The
> reason I am casting to an unsigned type in the first place is that
> without the cast, eventually negative percents were being displayed.

Overflow. The length of a file often doesn't fit into a long to
begin with, and then you go ahead and multiply it by 100. Since
you're interested in per cent, and exact precision isn't an
issue, I'd cast it to double, and use floating point arithmetic.

> Also, are there any other issues with my usage of tellg()? I remember
> reading somewhere that the result of tellg() isn't guaranteed to be able
> to represent any valid filesize, but I don't know if there is any way
> around this issue using only standard components.

There's no real solution if you want to remain 100% standard,
because there are real systems where what you want simply isn't
possible. If you're willing to limit portability to Windows and
Unix, however, converting the results of tellg() to double, and
using it, should work. (The results may be off by a couple of
percent under Windows, but typically, the error will be more or
less the same for each call, so your calculations of per cent
will probably end up more precise than expected. That assumes
the file has more or less homogeneous contents, at least.)

--
James Kanze (Gabi Software) email: ja*********@gmail.com
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34

May 19 '07 #3
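
The advice in the reply above could be sketched roughly as follows. This is only an illustration under the reply's own assumption (Unix or Windows, where the position converts to an integral byte offset); the function name process_with_progress and the choice to print only when the whole-number percentage changes are not from the thread. The sketch also treats a failed tellg() after end of file as "100% read", which matters when the last line has no trailing newline.

```cpp
#include <cassert>
#include <ios>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

// Reads the remaining lines of `in`, writing "N%" to `out` whenever the
// whole-number percentage changes. Returns the last percentage written,
// or -1 if no line was read.
// Assumption: streamoff is an integral byte offset (true on Unix and
// Windows, not promised by the standard).
int process_with_progress(std::istream& in, std::ostream& out) {
    typedef std::istream::pos_type pos_type;

    const pos_type start = in.tellg();
    in.seekg(0, std::ios::end);
    const double end =
        static_cast<double>(static_cast<std::streamoff>(in.tellg()));
    in.seekg(start);
    assert(end > 0.0);  // empty or unseekable streams are out of scope here

    int last_percent = -1;
    for (std::string line; std::getline(in, line); ) {
        // do_something_with(line) would go here.

        // Once getline has hit end of file, tellg() reports failure as
        // pos_type(-1); treat that as "everything has been read".
        const pos_type pos = in.tellg();
        const double here = (pos == pos_type(-1))
            ? end
            : static_cast<double>(static_cast<std::streamoff>(pos));

        const int percent = static_cast<int>(here * 100.0 / end);
        if (percent != last_percent) {
            out << percent << "%\n";
            last_percent = percent;
        }
    }
    return last_percent;
}
```

As in the original post, the percentage is measured from the start of the file, so it does not begin at zero when header lines have already been consumed.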

Michael <mc******@aol.com> wrote:
>> int percent_done =
>>     static_cast<unsigned long>(file.tellg()) * 100 / end_of_data;
>> This outline seems to work well. My question is: is the cast from the
>> return type of ifstream::tellg() to unsigned long well-defined? The
>> reason I am casting to an unsigned type in the first place is that
>> without the cast, eventually negative percents were being displayed.
>
> It seems that the negativity problem you're seeing would have to be
> when you're hitting that limit, but specifically when file.tellg() *
> 100 hits that limit. You probably should do this calculation in
> doubles, then convert back to ints at the end.

Thanks, that's the same advice James Kanze gave as well.

> How many bits does unsigned long have on your system? If it's 64,
> then ignore the previous paragraph, as you're very unlikely to be
> hitting that limit.

sizeof(unsigned long) * CHAR_BIT = 32 on my platform (Windows XP, VS
2005).

--
Marcus Kwok
Replace 'invalid' with 'net' to reply
May 21 '07 #5

James Kanze <ja*********@gmail.com> wrote:
> On May 18, 10:17 pm, ricec...@gehennom.invalid (Marcus Kwok) wrote:
>> This outline seems to work well. My question is: is the cast from the
>> return type of ifstream::tellg() to unsigned long well-defined?
>
> No. First, the return type is a streampos, which may not even
> be convertible to an integral type. Second, even when it is
> convertible, there is not necessarily a direct relationship
> between the numeric value and the number of bytes in the file.
> Third, even on systems where there is an exact relationship
> (Unix), or a more or less rough relationship (Windows),
> unsigned long is generally not large enough. (Unix defines a
> special type, off_t, for this; Microsoft uses a struct
> LARGE_INTEGER.) If you're sure that the files can never be more
> than, say, 100 MB, then this is not necessarily a consideration.
>
>> The
>> reason I am casting to an unsigned type in the first place is that
>> without the cast, eventually negative percents were being displayed.
>
> Overflow. The length of a file often doesn't fit into a long to
> begin with, and then you go ahead and multiply it by 100. Since
> you're interested in per cent, and exact precision isn't an
> issue, I'd cast it to double, and use floating point arithmetic.

Thanks, I think I'll go this route.

As an aside, is the conversion from streampos to double well-defined,
or will it just work in practice? Right now it only needs to work on
Windows but we may use it on HP-UX in the future.

>> Also, are there any other issues with my usage of tellg()? I remember
>> reading somewhere that the result of tellg() isn't guaranteed to be able
>> to represent any valid filesize, but I don't know if there is any way
>> around this issue using only standard components.
>
> There's no real solution if you want to remain 100% standard,
> because there are real systems where what you want simply isn't
> possible. If you're willing to limit portability to Windows and
> Unix, however, converting the results of tellg() to double, and
> using it, should work.

I see, so I guess this answers my above question :)

--
Marcus Kwok
Replace 'invalid' with 'net' to reply
May 21 '07 #6

On May 21, 8:03 pm, ricec...@gehennom.invalid (Marcus Kwok) wrote:
> James Kanze <james.ka...@gmail.com> wrote:
>> On May 18, 10:17 pm, ricec...@gehennom.invalid (Marcus Kwok) wrote:
>>> This outline seems to work well. My question is: is the cast from the
>>> return type of ifstream::tellg() to unsigned long well-defined?
>>
>> No. First, the return type is a streampos, which may not even
>> be convertible to an integral type. Second, even when it is
>> convertible, there is not necessarily a direct relationship
>> between the numeric value and the number of bytes in the file.
>> Third, even on systems where there is an exact relationship
>> (Unix), or a more or less rough relationship (Windows),
>> unsigned long is generally not large enough. (Unix defines a
>> special type, off_t, for this; Microsoft uses a struct
>> LARGE_INTEGER.) If you're sure that the files can never be more
>> than, say, 100 MB, then this is not necessarily a consideration.
>>
>>> The
>>> reason I am casting to an unsigned type in the first place is that
>>> without the cast, eventually negative percents were being displayed.
>>
>> Overflow. The length of a file often doesn't fit into a long to
>> begin with, and then you go ahead and multiply it by 100. Since
>> you're interested in per cent, and exact precision isn't an
>> issue, I'd cast it to double, and use floating point arithmetic.
>
> Thanks, I think I'll go this route.
>
> As an aside, the conversion from streampos to double is well-defined?
> Or it just will work in practice? Right now it only needs to work on
> Windows but we may use it on HP-UX in the future.

First, it's not defined at all; there is (in the standard) no
direct conversion from streampos to an arithmetic type. There
is an implicit conversion from streampos to streamoff, however,
and streamoff is required to be convertible to an integral type;
in most implementations, streamoff is in fact a typedef of an
integral type. If streamoff is a typedef to an integral type,
streampos will convert implicitly to any arithmetic type; if it
is a user defined type, you'll need some explicit conversion in
there somewhere.
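
One way to sidestep the question of which conversions exist is to subtract two positions, since the standard requires that the difference of two stream positions be a streamoff. The sketch below takes that route; the function name fraction_done is invented for this example, and the final cast to double still assumes that streamoff is an integral byte offset, as on Unix and Windows.

```cpp
#include <cassert>
#include <ios>

// Fraction (0.0 to 1.0) of the span from `start` to `end` that lies
// before `pos`. Subtracting two stream positions yields a std::streamoff,
// which is one of the operations the standard requires to work.
// Assumption: streamoff is an integral type counting bytes.
inline double fraction_done(std::streampos start,
                            std::streampos pos,
                            std::streampos end) {
    const std::streamoff total = end - start;
    const std::streamoff done = pos - start;
    assert(done >= 0);             // pos must not precede start
    if (total <= 0) return 1.0;    // nothing to read counts as finished
    return static_cast<double>(done) / static_cast<double>(total);
}
```

Measured this way, progress starts at zero just after the header lines rather than at the header's share of the file.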

More significantly, of course, the semantics of the conversion
are more or less undefined; there is a set of operations which
are required to work, but there's nothing to stop the resulting
integral type from being a magic number, or (more likely), some
formatted representation, with different bits having different
meanings.

In practice, of course: under Unix or Windows, streamoff will be
an integral type, and it will represent the number of bytes at
the system level from the start of the file. Under Unix, this
means exactly the number of bytes that you read; under Windows,
the number may be slightly higher, but perfectly adequate for
things like a progress bar. This solution typically won't work
on mainframes, but then, mainframes don't usually have the sort
of terminals attached to them where a running indication of
progress would make sense. (And they're different enough from
Unix/Windows that there are probably other things in your code
which would require fixing.)
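
The "slightly higher" number under Windows comes from text mode, where each CR LF pair on disk reaches the program as a single '\n'. A different technique, not proposed in the thread, is to skip tellg() inside the loop and count the characters the program actually consumes. The sketch below assumes the total size is obtained separately (for example from one tellg() at the end of the file); the name percent_by_count and its parameters are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>

// Reads every remaining line of `in` and returns the final percentage of
// `total_bytes` consumed. Each line counts as its length plus one for the
// newline that getline discards; a last line with no newline adds nothing.
int percent_by_count(std::istream& in, std::size_t total_bytes) {
    assert(total_bytes > 0);
    std::size_t consumed = 0;
    int percent = 0;
    for (std::string line; std::getline(in, line); ) {
        // do_something_with(line) would go here.
        consumed += line.size();
        if (!in.eof()) ++consumed;  // a '\n' was read and dropped
        percent = static_cast<int>(static_cast<double>(consumed) * 100.0 /
                                   static_cast<double>(total_bytes));
    }
    return percent;
}
```

On a Windows text-mode stream the count runs slightly behind the on-disk size, so the final value may stop a little short of 100, which is the mirror image of the effect described above.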

>>> Also, are there any other issues with my usage of tellg()? I remember
>>> reading somewhere that the result of tellg() isn't guaranteed to be able
>>> to represent any valid filesize, but I don't know if there is any way
>>> around this issue using only standard components.
>>
>> There's no real solution if you want to remain 100% standard,
>> because there are real systems where what you want simply isn't
>> possible. If you're willing to limit portability to Windows and
>> Unix, however, converting the results of tellg() to double, and
>> using it, should work.
>
> I see, so I guess this answers my above question :)

Yes. And Windows and Unix (which includes Mac) is a pretty
large world; I'd say that if you're concerned about a user
sitting in front of a terminal, they pretty much cover that
environment. (Today---this wasn't always true, and even today,
you might run into a legacy system here and there. But if you
don't already have one, your company isn't going to go out and
acquire one in the future.)

--
James Kanze (GABI Software) email:ja*********@gmail.com
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34

May 22 '07 #7

James Kanze <ja*********@gmail.com> wrote:
> First, it's not defined at all; there is (in the standard) no
> direct conversion from streampos to an arithmetic type. There
> is an implicit conversion from streampos to streamoff, however,
> and streamoff is required to be convertible to an integral type;
> in most implementations, streamoff is in fact a typedef of an
> integral type. If streamoff is a typedef to an integral type,
> streampos will convert implicitly to any arithmetic type; if it
> is a user defined type, you'll need some explicit conversion in
> there somewhere.

Thanks. The conversion from streampos to double works for me, today, on
my current platform :)

[snip rest]

--
Marcus Kwok
Replace 'invalid' with 'net' to reply
May 22 '07 #8
