Deciding whether two files are the same

SzH
Suppose that there is a program that takes two files as its command
line arguments. Is there a (cross platform) way to decide whether the
two files are the same? Simple string comparison is not enough as the
two files might be specified as "file.txt" and "./file.txt", or one of
them may be a symlink to the other.
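
A minimal sketch of the name-based part of the question, assuming Python as the illustration language (the thread names no language; `same_resolved_path` is a hypothetical helper). It resolves `.`, `..` and symlinks before comparing, which covers the "file.txt" vs "./file.txt" and symlink cases, but not two hard links to one file:

```python
import os


def same_resolved_path(a: str, b: str) -> bool:
    """Compare two names after resolving '.', '..' and symbolic links.

    Hypothetical helper for illustration only. It compares path strings,
    so two hard links to the same file are still reported as different.
    """
    return os.path.realpath(a) == os.path.realpath(b)
```

Because this only canonicalizes names, it is a first step rather than a full answer to the question.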

[I've already posted this 30 min ago but it didn't show up in Google
Groups---sorry if some people get it twice.]
Jan 24 '08
Pavel <dot_com_yahoo@paultolk_reverse.yourself> writes:
> On the file systems complying to UNIX conventions (which is where
> hardlinks are mostly met), you could compare the file system and
> inode. Now, a perfect comparison of file systems is a challenge in
> itself but often you can reasonably know the files belong to the same
> file system (if they are in the same directory, for example, and not
> symlinks).

It's fairly simple. All you have to do is stat(2) the files and compare
the st_dev and st_ino fields of the stat structure returned by those calls.
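
The comparison described above can be sketched as follows. This is only an illustration in Python, whose `os.stat` exposes the same `st_dev` and `st_ino` fields as stat(2); the function name `same_file_by_id` is an assumption, not something from the thread:

```python
import os


def same_file_by_id(a: str, b: str) -> bool:
    """Return True if both names resolve to the same (device, inode) pair.

    os.stat follows symbolic links, like stat(2), so a symlink and its
    target compare equal, as do two hard links to one file.
    Raises OSError if either path cannot be stat'ed.
    """
    sa = os.stat(a)
    sb = os.stat(b)
    return (sa.st_dev, sa.st_ino) == (sb.st_dev, sb.st_ino)
```

A `False` result here only means the identifiers differ; whether that proves the files are different is what the rest of the thread debates.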

--
Best regards, _ _
.o. | Liege of Serenly Enlightened Majesty of o' \,=./ `o
..o | Computer Science, Michal "mina86" Nazarewicz (o o)
ooo +--<mina86*tlen.pl>--<jid:mina86*jabber.org>--ooO--(_)--Ooo--
Jan 27 '08 #21
On Jan 27, 2:24 pm, Michal Nazarewicz <min...@tlen.pl> wrote:
> Pavel <dot_com_yahoo@paultolk_reverse.yourself> writes:
>> On the file systems complying to UNIX conventions (which is where
>> hardlinks are mostly met), you could compare the file system and
>> inode. Now, a perfect comparison of file systems is a challenge in
>> itself but often you can reasonably know the files belong to the same
>> file system (if they are in the same directory, for example, and not
>> symlinks).
> It's fairly simple. All you have to do is stat(2) the files and compare
> the st_dev and st_ino fields of the stat structure returned by those calls.

If they're the same, the files are part of the same file system
(I'm pretty sure). If they're different, you don't know.

--
James Kanze (GABI Software) email:ja*********@gmail.com
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34
Jan 28 '08 #22
James Kanze <ja*********@gmail.com> writes:
> On Jan 27, 2:24 pm, Michal Nazarewicz <min...@tlen.pl> wrote:
>> Pavel <dot_com_yahoo@paultolk_reverse.yourself> writes:
>>> On the file systems complying to UNIX conventions (which is where
>>> hardlinks are mostly met), you could compare the file system and
>>> inode. Now, a perfect comparison of file systems is a challenge in
>>> itself but often you can reasonably know the files belong to the same
>>> file system (if they are in the same directory, for example, and not
>>> symlinks).
>> It's fairly simple. All you have to do is stat(2) the files and compare
>> the st_dev and st_ino fields of the stat structure returned by those calls.
>
> If they're the same, the files are part of the same file system
> (I'm pretty sure). If they're different, you don't know.

If they are different, either their inode numbers or their device numbers
differ. If both the inode and the device number are the same, the files are
the same. The problem is that you don't know whether the files are different
if either the inode number or the device number differs (as discussed earlier,
using the example of an NFS directory mounted through two different IPs).
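
The asymmetry described above (a match proves identity, a mismatch proves nothing) can be made explicit with a three-valued result. This is a rough sketch under that assumption; `same_or_unknown` is a hypothetical name, and `None` stands for "cannot tell":

```python
import os
from typing import Optional


def same_or_unknown(a: str, b: str) -> Optional[bool]:
    """Return True when device and inode both match, otherwise None.

    The function never returns False: per the reasoning in the post,
    differing identifiers do not prove the files are different (for
    example, one NFS export mounted twice).
    """
    sa = os.stat(a)
    sb = os.stat(b)
    if (sa.st_dev, sa.st_ino) == (sb.st_dev, sb.st_ino):
        return True
    return None
```

Callers that need a plain yes/no answer have to decide for themselves how to treat `None`.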

--
Best regards, _ _
.o. | Liege of Serenly Enlightened Majesty of o' \,=./ `o
..o | Computer Science, Michal "mina86" Nazarewicz (o o)
ooo +--<mina86*tlen.pl>--<jid:mina86*jabber.org>--ooO--(_)--Ooo--
Jan 28 '08 #23
Michal Nazarewicz wrote:
> James Kanze <ja*********@gmail.com> writes:
>> On Jan 27, 2:24 pm, Michal Nazarewicz <min...@tlen.pl> wrote:
>>> Pavel <dot_com_yahoo@paultolk_reverse.yourself> writes:
>>>> On the file systems complying to UNIX conventions (which is where
>>>> hardlinks are mostly met), you could compare the file system and
>>>> inode. Now, a perfect comparison of file systems is a challenge in
>>>> itself but often you can reasonably know the files belong to the same
>>>> file system (if they are in the same directory, for example, and not
>>>> symlinks).
>>> It's fairly simple. All you have to do is stat(2) the files and compare
>>> the st_dev and st_ino fields of the stat structure returned by those calls.
>> If they're the same, the files are part of the same file system
>> (I'm pretty sure). If they're different, you don't know.
> If they are different, either their inode numbers or their device numbers
> differ.

If they differ in their inode number, they are different. If
the device number differs, they might be different, or they
might not be. It's a fairly frequent occurrence for the same
file system to be mounted with different device numbers.

> If both the inode and the device number are the same, the files are
> the same. The problem is that you don't know whether the files are
> different if either the inode number or the device number differs (as
> discussed earlier, using the example of an NFS directory
> mounted through two different IPs).

That's what I've been saying, and contradicts what you first
said. I think that if the inode numbers are different, the
files are different, but I've seen identical files with
different device numbers.

--
James Kanze (GABI Software) email:ja*********@gmail.com
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34
Jan 28 '08 #24
da************@gmail.com wrote:
> On Jan 24, 12:01 pm, SzH <szhor...@gmail.com> wrote:
>> Suppose that there is a program that takes two files as its command
>> line arguments. Is there a (cross platform) way to decide whether the
>> two files are the same? Simple string comparison is not enough as the
>> two files might be specified as "file.txt" and "./file.txt", or one of
>> them may be a symlink to the other.
>>
>> [I've already posted this 30 min ago but it didn't show up in Google
>> Groups---sorry if some people get it twice.]
>
> This might have some logical solution.
> I would imagine that if you open the file in "exclusive-write" mode
> and try to open the other one you can check if the files are the same.

What if someone else was entertaining herself opening one of the files
in "exclusive-write" mode while we were doing the same? :-)

-Pavel
Jan 29 '08 #25
In article <5fc93358-28ec-4d62-85d1-134574a6a8d2@s13g2000prd.googlegroups.com>,
ja*********@gmail.com says...
> On Jan 27, 2:24 pm, Michal Nazarewicz <min...@tlen.pl> wrote:
[ ... ]
>> It's fairly simple. All you have to do is stat(2) the files and compare
>> st_dev and st_ino fields of stat structure returned by those calls.
>
> If they're the same, the files are part of the same file system
> (I'm pretty sure). If they're different, you don't know.
That depends a bit on viewpoint. Quite a few distributed file systems
provide a situation in which what's logically considered a single file
resides on a number of different machines. I.e. you have one logical
file system living on top of a number of physical file systems (so to
speak).

Such a system normally provides some unambiguous way to identify a file
(necessary for its own bookkeeping) but using it isn't portable. Each
system normally has a proxy entry in its own file system, so comparing
files on that system works just fine -- but two device/inode pairs on
two separate systems might actually refer to the same file so writes to
one will show up when reading the other.

--
Later,
Jerry.

The universe is a figment of its own imagination.
Jan 29 '08 #26
On Jan 29, 7:11 am, Jerry Coffin <jcof...@taeus.com> wrote:
> In article <5fc93358-28ec-4d62-85d1-134574a6a8d2@s13g2000prd.googlegroups.com>,
> james.ka...@gmail.com says...
>> On Jan 27, 2:24 pm, Michal Nazarewicz <min...@tlen.pl> wrote:
> [ ... ]
>>> It's fairly simple. All you have to do is stat(2) the files and compare
>>> st_dev and st_ino fields of stat structure returned by those calls.
>> If they're the same, the files are part of the same file system
>> (I'm pretty sure). If they're different, you don't know.
> That depends a bit on viewpoint. Quite a few distributed file systems
> provide a situation in which what's logically considered a single file
> resides on a number of different machines. I.e. you have one logical
> file system living on top of a number of physical file systems (so to
> speak).
> Such a system normally provides some unambiguous way to identify a file
> (necessary for its own bookkeeping) but using it isn't portable. Each
> system normally has a proxy entry in its own file system, so comparing
> files on that system works just fine -- but two device/inode pairs on
> two separate systems might actually refer to the same file so writes to
> one will show up when reading the other.
I'm not sure that that's relevant here. Regardless of where the
files reside, you get all of the files below a single mount
point from a single file server. And you can always get the
same files, mounted elsewhere, through a different server, or a
different connection to the same server. Files accessed through
different mount points have different device numbers.

Note that Windows has similar problems. I don't know the
Windows equivalents of inode numbers and device numbers, but you
can certainly mount the same file through different mount
points, either using SMB or using NFS. And as far as I can
tell, the protocols really provide no way of determining where
the file really comes from.
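
For what it's worth, the closest Windows analogues I know of are the volume serial number and file index reported by `GetFileInformationByHandle`. As a hedged illustration in Python (not the thread's language), `os.path.samefile` wraps the device/inode style comparison on both Unix and Windows; the wrapper name `describe_pair` below is hypothetical:

```python
import os


def describe_pair(a: str, b: str) -> str:
    """Report whether two path names are known to refer to the same file.

    Uses os.path.samefile, which compares the identifiers returned by
    os.stat for each path. A negative answer is worded cautiously because
    the same file reached through two network mounts can still differ.
    """
    try:
        if os.path.samefile(a, b):
            return "same file"
        return "not known to be the same file"
    except OSError as exc:
        # Raised when either path cannot be stat'ed (missing, no permission).
        return f"cannot compare: {exc.strerror}"
```

This does not remove the mount-point ambiguity discussed above; it only hides the platform-specific calls.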

--
James Kanze (GABI Software) email:ja*********@gmail.com
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34
Jan 29 '08 #27
In article <91b48e6d-cc9e-481c-bb75-66**********@e6g2000prf.googlegroups.com>,
ja*********@gmail.com says...

[ ... ]
> I'm not sure that that's relevant here. Regardless of where the
> files reside, you get all of the files below a single mount
> point from a single file server. And you can always get the
> same files, mounted elsewhere, through a different server, or a
> different connection to the same server. Files accessed through
> different mount points have different device numbers.

In a distributed file system, you generally have several servers that
all carry the same files, and one file might be accessible from a number
of different servers.

In most cases, you have at least some degree of location transparency --
i.e. it'll typically support some sort of path that gets resolved to a
server/file combination by the file system itself. In most cases, however,
you can also access those files directly from the individual servers as
well...

> Note that Windows has similar problems. I don't know the
> Windows equivalents of inode numbers and device numbers, but you
> can certainly mount the same file through different mount
> points, either using SMB or using NFS. And as far as I can
> tell, the protocols really provide no way of determining where
> the file really comes from.
Oh, absolutely -- I certainly didn't intend to imply that this was
unique to Unix by any means. I just used Unix terminology because that
was already being used in the thread. The same basic problem can arise
in many different systems, though it's also true that there really
aren't that many different OSes any more -- most of what's left is
Windows and various clones of Unix (and somebody who previously dealt
with substantially different systems could be forgiven for thinking of
Windows as a Unix clone...)

--
Later,
Jerry.

The universe is a figment of its own imagination.
Jan 30 '08 #28
James Kanze <ja*********@gmail.com> writes:
> Michal Nazarewicz wrote:
>> James Kanze <ja*********@gmail.com> writes:
>>> On Jan 27, 2:24 pm, Michal Nazarewicz <min...@tlen.pl> wrote:
>>>> Pavel <dot_com_yahoo@paultolk_reverse.yourself> writes:
>>>>> On the file systems complying to UNIX conventions (which is where
>>>>> hardlinks are mostly met), you could compare the file system and
>>>>> inode. Now, a perfect comparison of file systems is a challenge in
>>>>> itself but often you can reasonably know the files belong to the same
>>>>> file system (if they are in the same directory, for example, and not
>>>>> symlinks).
>>>> It's fairly simple. All you have to do is stat(2) the files and compare
>>>> the st_dev and st_ino fields of the stat structure returned by those calls.
>>> If they're the same, the files are part of the same file system
>>> (I'm pretty sure). If they're different, you don't know.
>> If they are different, either their inode numbers or their device numbers
>> differ.
>
> If they differ in their inode number, they are different. If
> the device number differs, they might be different, or they
> might not be. It's a fairly frequent occurrence for the same
> file system to be mounted with different device numbers.

I'm not saying that's not the case.

>> If both the inode and the device number are the same, the files are
>> the same. The problem is that you don't know whether the files are
>> different if either the inode number or the device number differs (as
>> discussed earlier, using the example of an NFS directory
>> mounted through two different IPs).
>
> That's what I've been saying, and contradicts what you first
> said. I think that if the inode numbers are different, the
> files are different, but I've seen identical files with
> different device numbers.

And I've never said anything different.

--
Best regards, _ _
.o. | Liege of Serenly Enlightened Majesty of o' \,=./ `o
..o | Computer Science, Michal "mina86" Nazarewicz (o o)
ooo +--<mina86*tlen.pl>--<jid:mina86*jabber.org>--ooO--(_)--Ooo--
Jan 30 '08 #29
