Bytes | Software Development & Data Engineering Community

Duplicate file checker in C#?

rob
Does anyone know of a duplicate file checker project in C#? Couldn't
locate anything on CodeProject or SourceForge.

Has anyone here considered writing one?
Sep 15 '06 #1
What do you mean by duplicate file checker?

Do you want to compare the content of two files, or do you want to see if a file
exists in more than one place on a drive or drives?
Good luck with your project,

Otis Mukinfus
http://www.arltex.com
http://www.tomchilders.com
Sep 15 '06 #2
Rob
I should have said "Finder" rather than "Checker".

Dupe finders usually track down multiple copies of one file existing
within a set of folders. They're used for hunting down disk-hogging
duplicates of large files. Commercial and public-domain dupe finders
differ primarily in the UI, but there are also variations in the method
for fingerprinting files (no assumption is made that the names or dates
are identical). The usual approach is to identify files by computing an
MD5 hash, or by sorting by size and doing a byte-by-byte compare (BTW, I
can't see why MD5 would be any faster than byte-by-byte, except when more
than two copies of one file are present).
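To make the two fingerprinting approaches concrete, here is a minimal C# sketch (the `FileFingerprint` class and its method names are hypothetical, not taken from any existing project). `ContentEquals` rejects files of different lengths immediately and otherwise stops at the first differing byte, while `Md5Hex` always reads the whole file. That is consistent with the point above: for a single pair, the byte-by-byte compare never does more I/O, and hashing pays off when a size group holds several files, because each file is read once rather than once per pairwise comparison.

```csharp
using System;
using System.IO;
using System.Security.Cryptography;

// Hypothetical helper illustrating the two fingerprinting approaches.
public static class FileFingerprint
{
    // Byte-by-byte compare: different lengths are rejected without reading,
    // otherwise both files are read in step and the loop stops at the first mismatch.
    public static bool ContentEquals(string pathA, string pathB)
    {
        var infoA = new FileInfo(pathA);
        var infoB = new FileInfo(pathB);
        if (infoA.Length != infoB.Length)
            return false;

        const int BufferSize = 64 * 1024;
        var bufA = new byte[BufferSize];
        var bufB = new byte[BufferSize];

        using (FileStream a = infoA.OpenRead())
        using (FileStream b = infoB.OpenRead())
        {
            while (true)
            {
                int readA = ReadFull(a, bufA);
                int readB = ReadFull(b, bufB);
                if (readA != readB)
                    return false;
                if (readA == 0)
                    return true; // both streams ended with no mismatch
                for (int i = 0; i < readA; i++)
                {
                    if (bufA[i] != bufB[i])
                        return false;
                }
            }
        }
    }

    // Stream.Read may return fewer bytes than requested, so keep reading
    // until the buffer is full or the stream is exhausted.
    private static int ReadFull(Stream s, byte[] buffer)
    {
        int total = 0;
        while (total < buffer.Length)
        {
            int n = s.Read(buffer, total, buffer.Length - total);
            if (n == 0)
                break;
            total += n;
        }
        return total;
    }

    // MD5 fingerprint as lowercase hex: every byte of the file is read once.
    public static string Md5Hex(string path)
    {
        using (MD5 md5 = MD5.Create())
        using (FileStream stream = File.OpenRead(path))
        {
            byte[] hash = md5.ComputeHash(stream);
            return BitConverter.ToString(hash).Replace("-", "").ToLowerInvariant();
        }
    }
}
```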

So it's a matter of recursing through folder structures, logging the
files, then finding out which of them are duplicates. What happens after
that is usually where existing tools fall short: everyone has their own
ideas about how to deal with the dupes once they are located.
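The recurse, log, and compare steps can be sketched roughly as follows, assuming .NET 4 or later for `DirectoryInfo.EnumerateFiles` and LINQ (`DupeFinder` and `FindDuplicates` are hypothetical names). Files are grouped by size first, and only size groups with more than one member are hashed, so no assumption is made about names or dates. Treating matching MD5 hashes as duplicates is a simplifying assumption; a more cautious tool could confirm each hash group with a byte-by-byte compare before deleting anything.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Security.Cryptography;

// Hypothetical duplicate finder: walk a folder tree, group files by size,
// then hash only the files that share a size with at least one other file.
public static class DupeFinder
{
    public static List<List<string>> FindDuplicates(string root)
    {
        // 1. Recurse through the folder structure and log every file.
        //    (A real tool would also handle UnauthorizedAccessException and similar.)
        IEnumerable<FileInfo> files = new DirectoryInfo(root)
            .EnumerateFiles("*", SearchOption.AllDirectories);

        var result = new List<List<string>>();

        // 2. Files of different sizes can never be duplicates, so skip
        //    any size that occurs only once.
        foreach (var sizeGroup in files.GroupBy(f => f.Length).Where(g => g.Count() > 1))
        {
            // 3. Within a size group, files with the same MD5 are reported together.
            foreach (var hashGroup in sizeGroup.GroupBy(f => Md5Hex(f.FullName)).Where(g => g.Count() > 1))
            {
                result.Add(hashGroup
                    .Select(f => f.FullName)
                    .OrderBy(p => p, StringComparer.Ordinal)
                    .ToList());
            }
        }
        return result;
    }

    private static string Md5Hex(string path)
    {
        using (MD5 md5 = MD5.Create())
        using (FileStream stream = File.OpenRead(path))
        {
            return BitConverter.ToString(md5.ComputeHash(stream)).Replace("-", "");
        }
    }
}
```

Each returned group is a set of paths with identical size and hash; what to do with them (delete, move, or just report) is left to the UI layer.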

Given the need to customize the UI, I thought this would be one of the
most-hacked types of programs out there, but I found nothing in C# on
SourceForge.

By the way, my own interest is just for my own use, not for any
commercial endeavor. It would be a cool thing to post as a community
effort, so I was surprised it had not been done.
Sep 15 '06 #3
I was interested when I saw your post because a co-worker of mine has been given
a similar task. His assignment was to write something that compares the files
on two servers to determine if both have the same set of files. Actually I'm
glad he got the task rather than me. I think he will probably use the FileInfo
and DirectoryInfo classes to find duplicate names, then as you say decide how to
determine if files with the same name truly are the same file. After that he'll
have to figure out which is the correct one.
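For the co-worker's two-server case, a name-based comparison with `DirectoryInfo` and `FileInfo` might look roughly like this. It is a sketch under stated assumptions: `TreeCompare` is a hypothetical class, the two roots could be UNC paths to the servers, names are compared case-insensitively as on Windows, and `Path.GetRelativePath` requires .NET Core 2.0 or later. Same-named files with different sizes are flagged; same-named files with equal sizes are not proven identical and would still need a hash or byte-by-byte check.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Hypothetical name-based comparison of two folder trees (e.g. two server shares).
public static class TreeCompare
{
    // Map each file's path relative to the root onto its FileInfo.
    // Case-insensitive keys assume Windows-style file names.
    public static Dictionary<string, FileInfo> Index(string root)
    {
        var rootInfo = new DirectoryInfo(root);
        var index = new Dictionary<string, FileInfo>(StringComparer.OrdinalIgnoreCase);
        foreach (FileInfo file in rootInfo.EnumerateFiles("*", SearchOption.AllDirectories))
        {
            // Relative keys such as "docs\readme.txt" let the two trees line up.
            index[Path.GetRelativePath(rootInfo.FullName, file.FullName)] = file;
        }
        return index;
    }

    // Reports names present in only one tree, plus same-named files whose sizes differ.
    // Equal sizes do not prove equal content; those pairs still need a content check.
    public static void Compare(string rootA, string rootB,
        out List<string> onlyInA, out List<string> onlyInB, out List<string> sizeDiffers)
    {
        Dictionary<string, FileInfo> a = Index(rootA);
        Dictionary<string, FileInfo> b = Index(rootB);

        onlyInA = a.Keys.Where(k => !b.ContainsKey(k)).OrderBy(k => k, StringComparer.Ordinal).ToList();
        onlyInB = b.Keys.Where(k => !a.ContainsKey(k)).OrderBy(k => k, StringComparer.Ordinal).ToList();
        sizeDiffers = a.Keys
            .Where(k => b.ContainsKey(k) && a[k].Length != b[k].Length)
            .OrderBy(k => k, StringComparer.Ordinal)
            .ToList();
    }
}
```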

As for your project, it sounds like you have the methodology
worked out. Time to start coding ;o)
Good luck with your project,

Otis Mukinfus
Sep 16 '06 #4
Rob
If he doesn't need to do that in C#, he could use "Beyond Compare"
(www.ScooterSoftware.com), an excellent folder comparison program.
There may be a way to drive it from C# through its plugin interface,
but I haven't tried that.

I need to do a generalized global search, and I can't count on the
file names being the same, so I can't go that route. Looks like I'll
have to write mine from the ground up. Amazing that there's no C#
code available for this.
Sep 17 '06 #5
Thanks, Rob. I'll pass that on to him.
Good luck with your project,

Otis Mukinfus
Sep 18 '06 #6
