fstream, getline() and failbit

I am reading a binary file and I want to search it for a string. The
only problem is that failbit gets set after only a few calls to
getline() so it never reaches the end of the file where the string is
contained. From reading through posts to this list it seems that
failbit gets set if there is a format error whilst reading. Is it bad
form to read binary data into a char[] array? Is this why my
function below doesn't work?

void ReadBinData()
{
    int reads=0;
    string data;
    char str[1024];
    fstream myFile ("test.exe", ios::in | ios::binary);
    if ( (myFile.rdstate() & ifstream::failbit ) != 0 )
        cout << "error";

    while(myFile.getline(str, 1024 ))
    {
        data = str;
        if(data.find("roryrory", 0)!=string::npos)
            cout << "found it";
        reads++;
    }

    cout << "\nno of times getline was called = " << reads << endl;

    if ( (myFile.rdstate() & ifstream::failbit ) != 0 )
        cout << "\nerror, failbit set....";

    myFile.close();
}

Rory.
Jan 23 '08 #1
rory wrote:
> while(myFile.getline(str, 1024 ))

Try read (str, 1024) instead of getline (can't use "while (read (...))"
then though) - why would you want to read "lines" from a binary (.exe)
file anyway?

However that should not be the cause of your problem. What COULD happen
is that your search expression will get split over 2 different
"getlines" in your case, when 1023 bytes have been read without finding
a newline delimiter. str will then for example contain "roryr\0" in the
last 6 bytes, and on the next call will be filled with "ory" in the
first 3 bytes. You'd never find your search expression.
And the failbit might well be set after "only a few" calls to getline,
if the executable only contains a few newlines and isn't much bigger
than a couple of kilobytes. So long story short: don't use getline!
Then see if your problem still occurs.

Best Regards,

Lars
Jan 23 '08 #2
Thanks Lars, using read it seems to read the entire file, the file size
is 3.830 MB and read is called 3829 times. My next problem is one you
alluded to, reading blocks of data means the string could get chopped
up which is the last thing I want. The idea is that I append a unique
string identifier to a binary file, then I append some text after it.
I then want to search that file for the unique string identifier and
then retrieve the text that follows it. Before writing the unique
string I first write a newline char, that's why I thought I could just
use getline() as it runs until a new line. Valid point however that it
might not always get to a new line. Have you any suggestions for me on
how I might do this? Thanks for the reply,

Rory.
Jan 23 '08 #3
My guess is that the failure is due to some value in the file being
interpreted as signaling the end of the file when it's treated as text.
Unix generally treats control-D this way; for Windows it's control-Z.
Regardless, you need to tell your stream not to interpret control
characters that way, by opening it as a binary stream:

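std::ifstream file(your_file_name, std::ios::binary);

std::stringstream temp;

// copy the file into a string
temp << file.rdbuf();
const std::string contents = temp.str();

// marker for the beginning of your data:
std::string sentinel("\nroryrory");

// find your data; pos is std::string::npos if the marker doesn't exist,
// so only add the marker length when it was actually found
std::string::size_type pos = contents.find(sentinel);
std::string::size_type data_pos =
    (pos == std::string::npos) ? pos : pos + sentinel.length();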

--
Later,
Jerry.

The universe is a figment of its own imagination.
Jan 23 '08 #4
On 2008-01-23 10:25:24 -0500, Jerry Coffin <jc*****@taeus.com> said:

> My guess is that the failure is due to some value in the file being
> interpreted as signaling the end of the file when it's treated as text.
> Unix generally treats control-D this way; for Windows it's control-Z.
> Regardless, you need to tell your stream not to interpret control
> characters that way, by opening it as a binary stream:

Yes, that's the right way to read a binary file. But having done that,
the runtime library also won't translate the character sequence that
represents a newline into the character '\n'. It's binary data all the
way...

> // marker for the beginning of your data:
> std::string sentinel("\nroryrory");

That '\n' at the beginning may or may not match some sequence of bytes
that was written to the file. Search for "roryrory" instead.

--
Pete
Roundhouse Consulting, Ltd. (www.versatilecoding.com) Author of "The
Standard C++ Library Extensions: a Tutorial and Reference"
(www.petebecker.com/tr1book)

Jan 23 '08 #5
I think this takes care of the '\n' problem also

#include <cstring>
#include <iostream>
#include <fstream>
using namespace std;

int main()
{
    ifstream::pos_type size;
    char * memblock;

    ifstream iFile;
    iFile.open("asdf.txt",ios::in|ios::binary|ios::ate );
    if(iFile.is_open())
    {
        size = iFile.tellg();        //get the size of file
        memblock = new char[size];   //memblock needs size byte to hold all data
        iFile.seekg(0,ios::beg);     //go back to beginning of file
        iFile.read(memblock,size);   //read whole file in memblock
        iFile.close();               //close the file
    }
    char * pos;
    pos = strchr(&memblock[0],'r');  //find the position of the first 'r'
    while( strncmp(pos,"roryrory",8) )  //compare 8 char size
    {
        pos = strchr(pos+1,'r');     // keep looking to the next 'r'
    }
    return 0;
}
Jan 23 '08 #6
That code causes my program to crash. It seems to die at the while
loop, if I place a cout in there it never gets printed and I get a
'program has encountered a problem and needs to close' notice. My
previous version, the one using stringstream was finding the correct
string and returning the right position but when I tried reading from
that position on I get a strange string. I don't know why? Thanks for
replying, I am getting closer to finding the problem but will have to
leave it till tomorrow, it's bedtime!

Rory.

Jan 24 '08 #7
This code worked for me.

#include <fstream>
#include <iostream>
#include <sstream>
#include <string>

int main()
{
    std::ifstream ifstr("new",std::ios::binary);
    std::stringstream temp;
    temp << ifstr.rdbuf();
    const std::string contents(temp.str());
    const std::string sentinel("roryrory");
    const std::string::size_type pos(contents.find(sentinel, 0));
    if (pos == std::string::npos)
    {
        std::cout << "sentinel not found\n";
        return 1;
    }
    // print up to 50 characters that follow the sentinel
    const std::string myText(contents.substr(pos+sentinel.length(),50));
    std::cout << myText << '\n';
    return 0;
}
Thanks,
Balaji.
Jan 24 '08 #8
Your code works here too. The only difference I can spot is that you
used const std::string's. I'm delighted that it now works for me but
can someone explain *why* it didn't work using plain old std::strings?
Thanks to everyone who's replied, I can now move forward with my
project.

Rory.

Jan 24 '08 #9
On Jan 23, 4:25 pm, Jerry Coffin <jcof...@taeus.com> wrote:

> My guess is that the failure is due to some value in the file being
> interpreted as signaling the end of the file when it's treated as text.

That's one possibility. Another is simply that his buffer isn't
big enough to hold the longest "line". getline() will set the
failbit if it encounters the end of the buffer before it sees a
'\n' character.

> Unix generally treats control-D this way; for Windows it's control-Z.

Unix never treats control-D this way in a file. Under Unix,
there is absolutely no difference between binary files and text
files.

> Regardless, you need to tell your stream not to interpret
> control characters that way, by opening it as a binary stream:
> std::ifstream file(your_file_name, std::ios::binary);

And of course, he'll also have to write the file in binary mode;
otherwise, some of the output data might be modified.

> std::stringstream temp;
> // copy the file into a string
> temp << file.rdbuf();
> // marker for the beginning of your data:
> std::string sentinel("\nroryrory");
> // find your data (std::string::npos if it doesn't exist)
> int data_pos = temp.str().find(sentinel)+sentinel.length();

Depending on the implementation, that might not be such a good
idea; some implementations of stringstream grow the string very
inefficiently (and it will be 3.8 MB). If he can determine the
size of the file beforehand, resizing an std::vector<char> and
reading the entire file into it, then using std::search might be
a good option. (If portability isn't a concern, mmap'ing the
file is likely to be the fastest solution.) Otherwise, a KMP
search is pretty straightforward, and since it never requires
backing up, it avoids the problem of the sequence being split
across two successive buffers. Or if he needs an even faster
algorithm (BM, for example), he can save a block the size of the
sentinel at the start of the buffer, copy the end of the
preceding buffer into it before each read, and start his search
from there.

--
James Kanze (GABI Software) email:ja*********@gmail.com
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34
Jan 24 '08 #10
On Jan 23, 5:12 pm, Pete Becker <p...@versatilecoding.com> wrote:
> On 2008-01-23 10:25:24 -0500, Jerry Coffin <jcof...@taeus.com> said:
> Yes, that's the right way to read a binary file. But having
> done that, the runtime library also won't translate the
> character sequence that represents a newline into the
> character '\n'. It's binary data all the way...
>> // marker for the beginning of your data:
>> std::string sentinel("\nroryrory");
> That '\n' at the beginning may or may not match some sequence
> of bytes that was written to the file. Search for "roryrory"
> instead.

If it's binary data, he'd better have used binary mode when he
wrote it as well. In which case, reading it in binary mode
will return exactly the same bytes he wrote.

--
James Kanze (GABI Software) email:ja*********@gmail.com
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34
Jan 24 '08 #11
Is there a particular reason why you can't use a standard algorithm?

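void ReadBinData()
{
    ifstream myFile("test.exe", ios::in | ios::binary);
    if ( !myFile )
    {
        cout << "error\n";
        return;
    }
    // search() needs forward iterators, so copy the bytes into a string
    // first; istreambuf_iterator does not skip whitespace bytes
    const string data((istreambuf_iterator<char>(myFile)),
                      istreambuf_iterator<char>());
    const char* rory = "roryrory";
    if ( search(data.begin(), data.end(),
                rory, rory + strlen( rory )) != data.end() )
    {
        cout << "found it\n";
    }
    else
    {
        cout << "not found\n";
    }
}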
Jan 24 '08 #12
