How to read past an end of file character

Hey all,

I'm working on an encoding scheme where I am running into a problem with
reading a file off a stream. Looking at the binary encoding of the file
(using a simple hex editor), there is no problem, and the whole file is
there. However, when I try to read it from cin, at certain times, cin stops
reading. I cannot force cin to go around the bad character, nor, indeed, do
I know what the bad character is.

I am including code at the bottom, but I do not think that will be helpful.
Does anyone know how to read past an end of file character (supposing that
one comes in on the stream from a text file or some similar source but is
not -actually- the end of the file)? The location of the problem is marked
with *** below.

-JFA1

#include <iostream>
#include <vector>
#include <utility>
#include <algorithm>
#include <map>
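#include <cassert>
#include <climits> // UCHAR_MAX
#include <cstdlib> // exit, EXIT_FAILURE
#include <cstring> // strcmp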

using namespace std;

typedef unsigned char uchar;
typedef unsigned long ulong;
typedef pair<ulong, int> range;
typedef pair<range, uchar> rangewchar;

const ulong lmsbmask = 0x80000000;

int total = UCHAR_MAX + 1; // sum of the 256 initial counts in numEnc
vector<int> numEnc = vector<int>(256, 1);
map<range, uchar> rangetochar;
map<uchar, range> chartorange;
map<ulong, range> starttorange;

int recalcCount = 0;

int writeCounter;
uchar writeBuf;

int readCounter;
uchar readBuf;

void compress();
void recalculate();
ulong neededSpace(uchar c);

void decompress();
uchar readNextChar();

void flushWriteBuffer();
void writeBits(ulong lng, const int nbits);
void writeBit(bool bit);
bool readBit();

bool tripless(const rangewchar &r1, const rangewchar &r2);
bool tripgreat(const rangewchar &r1, const rangewchar &r2);
bool rless(const range &r1, const range &r2);
bool rgreat(const range &r1, const range &r2);
int main(int argc, char *argv[])
{
if (argc != 2) {
cerr << "Error: incorrect number of command line flags specified. Bailing.\n"
<< "Usage: ARC [-c|-u]\n";
exit(EXIT_FAILURE);
}

if (!strcmp(argv[1], "-c"))
compress();
else if (!strcmp(argv[1], "-u"))
decompress();

if (writeCounter != 0)
flushWriteBuffer();

return 0;
}

void compress()
{
recalculate();
while (cin.peek() != EOF) { // peek() returns an int; compare it with EOF directly
if (recalcCount == (1 << 8)) {
recalculate();
recalcCount = 0;
}
char next;
cin.get(next);

range nextRange((chartorange.find((uchar) next))->second);
writeBits(nextRange.first, nextRange.second);

++recalcCount; ++numEnc[(uchar) next]; ++total;
}
}

void recalculate()
{
const int nsymbols = UCHAR_MAX + 1; // every byte value, 0..255

// Start from empty tables so no stale code from the previous pass survives.
chartorange.clear();
starttorange.clear();
rangetochar.clear();

vector<rangewchar> totalinfo(nsymbols);
for (int i = 0; i < nsymbols; ++i) {
totalinfo[i].first.second = neededSpace(i);
totalinfo[i].second = i;
}

sort(totalinfo.begin(), totalinfo.end(), &tripless);

ulong previous = 0;
for (int i = 0; i < nsymbols; ++i) {
totalinfo[i].first.first = previous;
previous = previous + (lmsbmask >> (totalinfo[i].first.second - 1));
chartorange[totalinfo[i].second] = totalinfo[i].first;
starttorange[totalinfo[i].first.first] = totalinfo[i].first;
rangetochar[totalinfo[i].first] = totalinfo[i].second;
}
}

ulong neededSpace(uchar c)
{
double requiredRange = .5, avgratio = (double) numEnc[(unsigned char) c]
/ (double) total;
int bitsNeeded = 1;

while (requiredRange > avgratio) {
requiredRange /= 2;
++bitsNeeded;
}

return bitsNeeded;
}

void decompress()
{
recalculate();
while (cin.peek() != EOF) { //****The problem seems to happen HERE****
if (recalcCount == (1 << 8)) {
recalculate();
recalcCount = 0;
}

uchar nextchar = readNextChar();
cout << nextchar;

++recalcCount; ++numEnc[nextchar]; ++total;
}
}

uchar readNextChar()
{
int nread(0);
ulong tmp(0);
while (true) {
bool nextbit(readBit());

if (nextbit)
tmp |= (lmsbmask >> nread);

map<ulong, range>::const_iterator it(starttorange.find(tmp));
if (it != starttorange.end()) { //If we find a matching start point
assert(nread <= it->second.second);
if (it->second.second == nread+1) //If we've read the right number of bits
return rangetochar.find(it->second)->second; //Bingo
}
++nread;
}
}

const uchar clsbmask = 0x01;
const uchar cmsbmask = 0x80;

void writeBits(ulong lng, const int nbits)
{
for (int i = 0; i < nbits; ++i) {
writeBit((lng & lmsbmask) == lmsbmask);
lng <<= 1;
}
}

void writeBit(bool bit)
{
if (writeCounter == 8) {
cout.put(writeBuf);
writeCounter = 0;
writeBuf = 0;
}

writeBuf <<= 1;
if (bit)
writeBuf |= clsbmask;
++writeCounter;
}

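void flushWriteBuffer()
{
// Pad the partial byte with zero bits up to 8, then emit it.
while (writeCounter != 8) {
writeBit(false);
}
cout.put(writeBuf);
writeCounter = 0;
writeBuf = 0;
}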

bool readBit()
{
if (readCounter == 0) {
readCounter = 8;
cin.get(reinterpret_cast<char &>(readBuf));
}

bool retBit = (readBuf & cmsbmask) == cmsbmask;
readBuf <<= 1;
--readCounter;
return retBit;
}

bool rless(const range &r1, const range &r2)
{
return (r1.second) < (r2.second);
}

bool rgreat(const range &r1, const range &r2)
{
return (r1.second) > (r2.second);
}

bool tripless(const rangewchar &r1, const rangewchar &r2)
{
return (r1.first.second) < (r2.first.second);
}

bool tripgreat(const rangewchar &r1, const rangewchar &r2)
{
return (r1.first.second) > (r2.first.second);
}
Jul 23 '05 #1
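The bit-length rule in neededSpace() can be written compactly. With p(c) = numEnc[c] / total, the loop returns the smallest length n >= 1 for which 2^{-n} <= p(c). This is a reading of the code rather than something the poster states; it amounts to a Shannon-style code-length assignment:

```latex
\ell(c) = \min\{\, n \ge 1 : 2^{-n} \le p(c) \,\} = \max\bigl(1, \lceil -\log_2 p(c) \rceil\bigr),
\qquad
\sum_{c=0}^{255} 2^{-\ell(c)} \;\le\; \sum_{c=0}^{255} p(c) = 1 .
```

Because recalculate() advances each start value by lmsbmask >> (\ell(c) - 1) = 2^{32-\ell(c)}, the second inequality (Kraft's inequality) keeps every assigned start below 2^{32}, provided total really equals the sum of all 256 counts and all 256 byte values are assigned.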

Based upon your description above, I assume that your program is being
used under Windows. All M$ operating systems share a common descent from
that wonderful old OS called CP/M. The CP/M file system did not have the
metadata available to know the exact length of a file in bytes, so an
end-of-file character (0x1A, decimal 26, Ctrl-Z) was used to mark where
the data ended. This unfortunate decision was corrected in early versions
of DOS. The unfortunate reality is that, to this day, the C runtime
library will treat that character as the end of the file when the file
is being read in "text mode".

The only solution that I can offer is to open your stream using the
flag ios::binary. A side effect of doing this is that the runtime library
will no longer translate "\r\n" sequences into a single "\n" as it does
when the file is opened in text mode. The plus side is that it gives you
as a programmer more precise control over what is read from (or written
to) the file.
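A minimal sketch of the ios::binary approach for file streams. The helper names writeBinary and readBinary are hypothetical, and the round trip assumes the current directory is writable. On Windows, a 0x1A byte and "\r\n" pairs come back unchanged in binary mode; on POSIX systems text and binary mode behave the same, so the flag costs nothing there.

```cpp
#include <cassert> // assert (for the usage check)
#include <fstream>
#include <string>
#include <vector>

// Hypothetical helper: write raw bytes to a file opened in binary mode.
bool writeBinary(const std::string &path, const std::vector<unsigned char> &bytes)
{
    std::ofstream out(path.c_str(), std::ios::out | std::ios::binary);
    if (!out)
        return false;
    out.write(reinterpret_cast<const char *>(bytes.data()),
              static_cast<std::streamsize>(bytes.size()));
    return !out.fail();
}

// Hypothetical helper: read every byte of a file opened in binary mode.
// In binary mode a 0x1A (Ctrl-Z) byte is ordinary data and "\r\n" is not translated.
std::vector<unsigned char> readBinary(const std::string &path)
{
    std::ifstream in(path.c_str(), std::ios::in | std::ios::binary);
    std::vector<unsigned char> bytes;
    char c;
    while (in.get(c)) // ends only at the real end of the file (or on a read error)
        bytes.push_back(static_cast<unsigned char>(c));
    return bytes;
}
```

For example, writing the bytes 'A', 0x1A, 'B', '\r', '\n' with writeBinary and reading them back with readBinary should yield the same five bytes.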

Regards

Jon Trauntvein

Jul 23 '05 #2

"JH Trauntvein" wrote:
The only solution that I can offer is to open your stream using the
flag ios::binary.


This is exactly what I'm looking for. I don't care about \r\n, since my
task is to read and compress arbitrary data. Thanks!

- JFA1
Jul 23 '05 #3

Sorry, one last question: can the method you stated be used with the
standard input and output?

- JFA1
Jul 23 '05 #4

"James Aguilar" wrote:
Sorry, one last question: can the method you stated be used with the
standard input and output?


Yes - but how to set stdin/stdout to binary mode may be platform/compiler-specific (somebody correct me if there is a
standard C++ way to do this...). Using the MinGW gcc compiler on Win32, defining the following at file scope (e.g. just before main()) does
the job (this is in the MinGW FAQ, I think):

#include <fcntl.h> // _O_BINARY
unsigned int _CRT_fmode = _O_BINARY; // MinGW: force stdin/stdout to binary mode
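For toolchains that don't honor _CRT_fmode, the Microsoft C runtime (which MinGW also links against) provides _setmode() to switch an already-open file descriptor to binary mode at run time. A minimal sketch, assuming a Windows compiler that ships <io.h>; setStdioBinary is a hypothetical name, and on other platforms the function does nothing because POSIX makes no text/binary distinction.

```cpp
#include <cassert> // assert (for the usage check)
#include <cstdio>  // stdin, stdout
#ifdef _WIN32
#include <fcntl.h> // _O_BINARY
#include <io.h>    // _setmode
#endif

// Hypothetical helper: switch stdin and stdout to binary mode.
// Call it first thing in main(), before any I/O through std::cin/std::cout.
bool setStdioBinary()
{
#ifdef _WIN32
    // _setmode() returns the previous translation mode, or -1 on error.
    if (_setmode(_fileno(stdin), _O_BINARY) == -1)
        return false;
    if (_setmode(_fileno(stdout), _O_BINARY) == -1)
        return false;
#endif
    // On POSIX systems there is no text/binary distinction, so nothing to do.
    return true;
}
```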

Then, provided stdin is redirected from a file (seeking generally fails on a pipe or console), the following code will read stdin byte-for-byte into a correctly-sized buffer (you'll want to put some error
checking in this!):

using namespace std;

....

cin.seekg(0,ios::end); // set pos to end of stream
size_t len = (size_t)cin.tellg(); // get position (tellg() returns -1 if stdin is not seekable)
cin.seekg(0); // set pos back to beginning of stream
char* const buffer = new char[len];
cin.read(buffer,len);

....

Again, this works for me on Win32 with MinGW - not sure about portability.
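Because seekg()/tellg() generally fail when stdin is a pipe, a seek-free alternative is to read until the real end of the stream. A minimal sketch, assuming the stream is already in binary mode (on Windows the text-mode translation happens below the C++ stream, so this alone does not get past a Ctrl-Z); readAll is a hypothetical name.

```cpp
#include <cassert>  // assert (for the usage check)
#include <istream>
#include <iterator> // std::istreambuf_iterator
#include <sstream>  // std::istringstream (for the usage check)
#include <string>
#include <vector>

// Hypothetical helper: read every remaining byte of a stream without seeking,
// so it also works when stdin is a pipe. istreambuf_iterator reads raw
// characters from the stream buffer and does not skip whitespace.
std::vector<char> readAll(std::istream &in)
{
    return std::vector<char>(std::istreambuf_iterator<char>(in),
                             std::istreambuf_iterator<char>());
}
```

Usage would be along the lines of std::vector<char> data = readAll(std::cin);. The trade-off is that the vector may reallocate as it grows, whereas the seek-based version allocates exactly once.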

Regards,

--
Lionel B

Jul 23 '05 #5
