
compare 2 files

How to compare if two files are identical? I wrote the following:

bool comparefiles(const std::string& lhs, const std::string& rhs)
{
std::ifstream lhsfile(lhs.c_str());
std::ifstream rhsfile(rhs.c_str());

typedef std::istreambuf_iterator<char> istreambuf_iterator;

return std::equal(
istreambuf_iterator(lhsfile),
istreambuf_iterator(),
istreambuf_iterator(rhsfile)
);
}

But I don't think it will work because: (1) we only compare the first N
chars, where N is the number of chars in lhsfile, so if rhsfile has more
chars the function will return true whenever the first N are equal, which is
incorrect; (2) the standard says that calling operator* on an end-of-stream
iterator is undefined (24.5.3.3), so if lhsfile has more chars then we will at
some point call operator* on the rhsfile iterator when it is at EOF, and the
result is undefined (though I think it should always return EOF).

So what else can we do?

I could use the stat function to check if lhsfile and rhsfile have the same
size, but I want to keep my code ANSI compatible.

So I came up with the following function, which looks very much like strcmp.
bool comparefiles(const std::string& lhs, const std::string& rhs)
{
using namespace std;
const streambuf::int_type eof = streambuf::traits_type::eof();

ifstream lhsfile(lhs.c_str());
ifstream rhsfile(rhs.c_str());

streambuf * lhsbuf = lhsfile.rdbuf();
streambuf * rhsbuf = rhsfile.rdbuf();

char lhschar, rhschar;
while (true)
{
lhschar = lhsbuf->sbumpc();
rhschar = rhsbuf->sbumpc();

if (lhschar == eof && rhschar == eof) return true;
if (lhschar == eof || rhschar == eof) break;
if (lhschar != rhschar) break;
}

cout << "compare \"" << lhs << "\" and \"" << rhs << "\" failed\n";
return false;
}
Any comments?
Jul 22 '05 #1
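
For later readers: both concerns raised above about the three-iterator std::equal can be sidestepped on a C++14 or later compiler, which adds a four-iterator overload that takes an end iterator for the second range as well. With input iterators it stops as soon as either range is exhausted and reports equality only if both ran out together, so it never dereferences an end-of-stream iterator. The following is a minimal sketch under that assumption; the name files_equal, the binary open mode, and the choice to treat an unreadable file as "not equal" are illustrative and not part of the original post.

```cpp
#include <algorithm>
#include <fstream>
#include <iterator>
#include <string>

// Sketch: byte-by-byte comparison using the C++14 four-iterator std::equal.
// files_equal is a hypothetical name chosen for this example.
bool files_equal(const std::string& lhs, const std::string& rhs)
{
    // Binary mode avoids newline translation on platforms that perform it.
    std::ifstream lhsfile(lhs.c_str(), std::ios::in | std::ios::binary);
    std::ifstream rhsfile(rhs.c_str(), std::ios::in | std::ios::binary);

    // Assumption: a file that cannot be opened compares as "not equal".
    if (!lhsfile || !rhsfile) return false;

    typedef std::istreambuf_iterator<char> iter;

    // Both ranges are bounded, so a shorter or longer rhs yields false
    // instead of a false positive or undefined behaviour.
    return std::equal(iter(lhsfile), iter(), iter(rhsfile), iter());
}
```

A call such as files_equal("a.txt", "b.txt") would return true only when both files open successfully and hold the same bytes.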
"Siemel Naran" <Si*********@REMOVE.att.net> wrote in message
news:fv***********************@bgtnsc04-news.ops.worldnet.att.net...
> How to compare if two files are identical? I wrote the following:
....
> So I came up with the following function, which looks very much like
> strcmp.
> bool comparefiles(const std::string& lhs, const std::string& rhs)
> {
> using namespace std;
> const streambuf::int_type eof = streambuf::traits_type::eof();
>
> ifstream lhsfile(lhs.c_str());
> ifstream rhsfile(rhs.c_str());
>
> streambuf * lhsbuf = lhsfile.rdbuf();
> streambuf * rhsbuf = rhsfile.rdbuf();

Since only the stream buffer interface is used, you can directly
create instances of std::filebuf instead of an ifstream.

> char lhschar, rhschar;

These two variables should be of type int_type. char may be unable
to represent eof (or be equal to eof when it should not, e.g.
when reading 0xFF on an implementation where char is signed).

> while (true)
> {
> lhschar = lhsbuf->sbumpc();
> rhschar = rhsbuf->sbumpc();
>
> if (lhschar == eof && rhschar == eof) return true;
> if (lhschar == eof || rhschar == eof) break;
> if (lhschar != rhschar) break;
> }

or:
do {
lhschar = lhsbuf.sbumpc();
rhschar = rhsbuf.sbumpc();
if( lhschar != rhschar ) return false;
} while( lhschar != eof );
return true;

Cheers,
Ivan
--
http://ivan.vecerina.com/contact/?subject=NG_POST <- email contact form
Jul 22 '05 #2
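
Putting the two suggestions from this reply together (std::filebuf objects instead of ifstream, and int_type instead of char), the whole function might look roughly like the sketch below. The binary open mode and the decision to return false when a file cannot be opened are assumptions added for the example; they are not stated in the reply.

```cpp
#include <fstream>
#include <string>

// Sketch of the do/while variant with std::filebuf and int_type.
bool comparefiles(const std::string& lhs, const std::string& rhs)
{
    typedef std::filebuf::traits_type traits;
    typedef std::filebuf::int_type    int_type;

    std::filebuf lhsbuf, rhsbuf;

    // filebuf::open returns a null pointer on failure.
    // Assumption: an unopenable file compares as "not equal".
    if (!lhsbuf.open(lhs.c_str(), std::ios::in | std::ios::binary) ||
        !rhsbuf.open(rhs.c_str(), std::ios::in | std::ios::binary))
        return false;

    // int_type can hold every char value plus a distinct eof value,
    // so a 0xFF byte is never mistaken for end of file.
    int_type lhschar, rhschar;
    do {
        lhschar = lhsbuf.sbumpc();
        rhschar = rhsbuf.sbumpc();
        if (!traits::eq_int_type(lhschar, rhschar)) return false;
    } while (!traits::eq_int_type(lhschar, traits::eof()));

    // Both buffers reported eof on the same iteration.
    return true;
}
```

Because lhsbuf and rhsbuf are objects here rather than pointers, the member calls use `.` as in the do/while loop above.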

I think the first thing I'd do is check if the file sizes are the same. No
need to read through the file looking for differences if they're different
sizes. I'm not familiar with how to check file size, but if that's easy
enough to do, you might want to throw in a check for that equality before
bothering to check the contents. Just a thought...

-Howard
Jul 22 '05 #3
"Howard" <al*****@hotmail.com> wrote in message news:J7Dxd.1129358
> I think the first thing I'd do is check if the file sizes are the same. No
> need to read through the file looking for differences if they're different
> sizes. I'm not familiar with how to check file size, but if that's easy
> enough to do, you might want to throw in a check for that equality before
> bothering to check the contents. Just a thought...


This is the ideal solution, because then I can continue to use std::equal as
in my original code. However, the standard does not provide a way to find the
file size without opening it and scanning to the last character. Opening
the file and calling file.seekg(0, ios::end) followed by file.tellg() is
allowed to return 0 rather than the actual byte position, though my
implementation does in fact return the file size. There is a function stat,
and it's on Windows and Linux, but it's not ANSI standard (though maybe it
should be). I know that boost also has some way to get the file size, and I
imagine the implementation calls stat on Windows and Linux, etc.
Jul 22 '05 #4
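
The seek-to-end approach described in this post can be sketched as follows. The function name stream_size and the -1 failure value are made up for the example, and, as the post itself cautions, the standard does not promise that the reported position equals the byte count; it merely does so for binary-mode files on common implementations.

```cpp
#include <fstream>
#include <string>

// Sketch: report the stream position at end of file, or -1 on failure.
// stream_size is a hypothetical helper name.
// Assumption: for a binary-mode file the end position equals the size in
// bytes; this holds on common implementations but is not guaranteed.
std::streamoff stream_size(const std::string& name)
{
    std::ifstream file(name.c_str(), std::ios::in | std::ios::binary);
    if (!file) return -1;

    // seekg takes an offset and a direction; tellg is its input-side
    // counterpart (tellp belongs to output streams).
    file.seekg(0, std::ios::end);
    const std::streampos end = file.tellg();

    // tellg yields pos_type(-1) on failure, which converts to -1 here.
    return static_cast<std::streamoff>(end);
}
```

Two files whose stream_size values differ cannot be identical, so the byte-by-byte comparison can be skipped in that case.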
Siemel Naran wrote:
> "Howard" <al*****@hotmail.com> wrote in message news:J7Dxd.1129358
>> I think the first thing I'd do is check if the file sizes are the
>> same. No need to read through the file looking for differences if
>> they're different sizes. I'm not familiar with how to check file
>> size, but if that's easy enough to do, you might want to throw in a
>> check for that equality before bothering to check the contents.
>> Just a thought...
> This is the ideal solution, because then I can continue to use
> std::equal as in my original code. However, the standard does not
> provide a way to find the file size without opening it and scanning to
> the last character. Opening the file and calling
> file.seekg(0, ios::end) followed by file.tellg() is allowed to return 0
> rather than the actual byte position, though my implementation does in
> fact return the file size. There is a function stat, and it's on
> Windows and Linux, but it's not ANSI standard (though maybe it should
> be). I know that boost also has some way to get the file size, and

Yes, at:

http://www.boost.org/libs/filesystem....htm#file_size

> I imagine the implementation calls stat on Windows and Linux, etc.


Windows: GetFileAttributesEx()
POSIX: stat()

Jeff Flinn
Jul 22 '05 #5
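
For readers on a newer toolchain: the Boost facility mentioned here was later standardized, and C++17 provides std::filesystem::file_size, so the size check no longer needs stat or a platform API. The sketch below combines it with std::equal; the name same_contents and the policy of returning false on any error are assumptions made for illustration. The size comparison only serves as a cheap early exit, and the four-iterator std::equal keeps the read safe even if a file changes between the two steps.

```cpp
#include <algorithm>
#include <cstdint>
#include <filesystem>
#include <fstream>
#include <iterator>
#include <string>
#include <system_error>

// Sketch: size check first, then byte comparison (requires C++17).
// same_contents is a hypothetical name chosen for this example.
bool same_contents(const std::string& lhs, const std::string& rhs)
{
    namespace fs = std::filesystem;

    // The error_code overloads report failure without throwing.
    std::error_code lhs_ec, rhs_ec;
    const std::uintmax_t lhs_size = fs::file_size(lhs, lhs_ec);
    const std::uintmax_t rhs_size = fs::file_size(rhs, rhs_ec);

    // Assumption: a file whose size cannot be queried is "not equal".
    if (lhs_ec || rhs_ec) return false;

    // Different sizes mean different contents; no need to read anything.
    if (lhs_size != rhs_size) return false;

    std::ifstream lhsfile(lhs, std::ios::in | std::ios::binary);
    std::ifstream rhsfile(rhs, std::ios::in | std::ios::binary);
    if (!lhsfile || !rhsfile) return false;

    typedef std::istreambuf_iterator<char> iter;
    return std::equal(iter(lhsfile), iter(), iter(rhsfile), iter());
}
```

On older compilers, boost::filesystem::file_size plays the same role as the std::filesystem call used here.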
