Bytes | Software Development & Data Engineering Community
Duplicate file checker in C#?

rob
Does anyone know of a duplicate file checker project in C#? Couldn't
locate anything on CodeProject or SourceForge.

Has anyone here considered writing one?
Sep 15 '06 #1
On Fri, 15 Sep 2006 06:44:31 -0400, rob <ro*@nospam.com> wrote:
>Does anyone know of a duplicate file checker project in C#? Couldn't
>locate anything on CodeProject or SourceForge.
>
>Has anyone here considered writing one?

What do you mean by duplicate file checker?

Do you want to compare the content of two files, or do you want to see if a file
exists in more than one place on a drive or drives?
Good luck with your project,

Otis Mukinfus
http://www.arltex.com
http://www.tomchilders.com
Sep 15 '06 #2
Rob
On Fri, 15 Sep 2006 07:04:27 -0500, Otis Mukinfus
<ph***@emailaddress.com> wrote:
>On Fri, 15 Sep 2006 06:44:31 -0400, rob <ro*@nospam.com> wrote:
>>Does anyone know of a duplicate file checker project in C#? Couldn't
>>locate anything on CodeProject or SourceForge.
>>
>>Has anyone here considered writing one?
>What do you mean by duplicate file checker?
>
>Do you want to compare the content of two files, or do you want to see if a file
>exists in more than one place on a drive or drives?

I should have said "Finder" rather than "Checker".

Dupe finders usually track down multiple copies of one file existing
within a set of folders. Used for hunting down disk-hogging
duplicates of large files. Differences in commercial/PD dupe-finders
are primarily the UI, but there are also variations on the method for
fingerprinting files (no assumptions are made that the names or dates
are identical). The usual approach is to identify files by doing an
MD5 or sorting by size and doing a byte-by-byte compare (BTW, I can't
see why the MD5 would be any faster than byte-by-byte, except if more
than two copies of one file are present).
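
A minimal sketch of the two fingerprinting options described above, assuming the standard .NET `MD5` and `FileStream` classes. The names `FileFingerprint`, `Md5Hex` and `ContentEquals` are made up for illustration and are not taken from any existing dupe-finder. On the speed question: hashing reads each file exactly once, so it can pay off when many files share the same size, while a pairwise byte compare of N same-size files can need up to N(N-1)/2 comparisons, although each one can stop at the first differing byte.

```csharp
using System;
using System.IO;
using System.Security.Cryptography;

// Hypothetical helper class; only the .NET calls themselves are real APIs.
public static class FileFingerprint
{
    // Hash the whole file once and return the digest as uppercase hex.
    // Equal hashes mean the contents are almost certainly identical.
    public static string Md5Hex(string path)
    {
        using (MD5 md5 = MD5.Create())
        using (FileStream stream = File.OpenRead(path))
        {
            byte[] hash = md5.ComputeHash(stream);
            return BitConverter.ToString(hash).Replace("-", "");
        }
    }

    // Compare two files byte by byte, stopping at the first difference.
    // A real tool would probably read larger blocks instead of single bytes.
    public static bool ContentEquals(string pathA, string pathB)
    {
        // Files of different length can never be duplicates.
        if (new FileInfo(pathA).Length != new FileInfo(pathB).Length)
            return false;

        using (FileStream a = File.OpenRead(pathA))
        using (FileStream b = File.OpenRead(pathB))
        {
            int byteA;
            do
            {
                byteA = a.ReadByte();
                int byteB = b.ReadByte();
                if (byteA != byteB)
                    return false;
            } while (byteA != -1); // -1 signals end of stream
            return true;
        }
    }
}
```

For a pair of candidates, `FileFingerprint.ContentEquals(pathA, pathB)` is enough; for a larger group of same-size files, computing `Md5Hex` once per file and grouping on the result avoids comparing every pair.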

So it's a matter of recursing through folder structures, logging
files, then finding out if they are duplicates. The process after
that is usually where things are missing. Everyone has their own
ideas about how to deal with the dupes after they are located.
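
The recurse, log, then compare steps above could be sketched roughly as follows. This is only one possible arrangement, assuming size bucketing followed by MD5 hashing; `DuplicateFinder` and `FindDuplicates` are hypothetical names, and the sketch deliberately stops at reporting groups, since what to do with the dupes afterwards is the part everyone wants to customize.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Security.Cryptography;

// Hypothetical class; a sketch of one approach, not a finished tool.
public static class DuplicateFinder
{
    // Returns groups of file paths whose contents have the same MD5 hash.
    public static List<List<string>> FindDuplicates(string rootFolder)
    {
        // Pass 1: recurse through the folder tree and bucket files by length.
        // Note: GetFiles throws if a subfolder is not accessible.
        var bySize = new Dictionary<long, List<string>>();
        foreach (string path in Directory.GetFiles(rootFolder, "*", SearchOption.AllDirectories))
        {
            long size = new FileInfo(path).Length;
            List<string> bucket;
            if (!bySize.TryGetValue(size, out bucket))
            {
                bucket = new List<string>();
                bySize[size] = bucket;
            }
            bucket.Add(path);
        }

        // Pass 2: hash only files that share a length with at least one other file.
        var duplicates = new List<List<string>>();
        foreach (List<string> sameSize in bySize.Values)
        {
            if (sameSize.Count < 2)
                continue;

            var byHash = new Dictionary<string, List<string>>();
            foreach (string path in sameSize)
            {
                string hash = Md5Hex(path);
                List<string> sameHash;
                if (!byHash.TryGetValue(hash, out sameHash))
                {
                    sameHash = new List<string>();
                    byHash[hash] = sameHash;
                }
                sameHash.Add(path);
            }

            foreach (List<string> sameHash in byHash.Values)
            {
                if (sameHash.Count > 1)
                    duplicates.Add(sameHash);
            }
        }
        return duplicates;
    }

    // Hash a file's contents and return the digest as hex.
    private static string Md5Hex(string path)
    {
        using (MD5 md5 = MD5.Create())
        using (FileStream stream = File.OpenRead(path))
        {
            return BitConverter.ToString(md5.ComputeHash(stream)).Replace("-", "");
        }
    }
}
```

Because names and dates are never consulted, two copies with different file names end up in the same group. A cautious tool might still confirm each group with a byte-by-byte compare before deleting anything.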

Given the need to customize the UI, I thought this would be one of the
most-hacked types of programs out there, but I found nothing in C# on
SourceForge.

By the way, my own interest is just for my own use, not for any
commercial endeavor. It would be a cool thing to post as a community
effort, so I was surprised it had not been done.
Sep 15 '06 #3
On Fri, 15 Sep 2006 19:07:06 -0400, Rob <Ro*@nospam.com> wrote:
>On Fri, 15 Sep 2006 07:04:27 -0500, Otis Mukinfus
><ph***@emailaddress.com> wrote:
>>On Fri, 15 Sep 2006 06:44:31 -0400, rob <ro*@nospam.com> wrote:
>>>Does anyone know of a duplicate file checker project in C#? Couldn't
>>>locate anything on CodeProject or SourceForge.
>>>
>>>Has anyone here considered writing one?
>>What do you mean by duplicate file checker?
>>
>>Do you want to compare the content of two files, or do you want to see if a file
>>exists in more than one place on a drive or drives?
>
>I should have said "Finder" rather than "Checker".
>
>Dupe finders usually track down multiple copies of one file existing
>within a set of folders. Used for hunting down disk-hogging
>duplicates of large files. Differences in commercial/PD dupe-finders
>are primarily the UI, but there are also variations on the method for
>fingerprinting files (no assumptions are made that the names or dates
>are identical). The usual approach is to identify files by doing an
>MD5 or sorting by size and doing a byte-by-byte compare (BTW, I can't
>see why the MD5 would be any faster than byte-by-byte, except if more
>than two copies of one file are present).
>
>So it's a matter of recursing through folder structures, logging
>files, then finding out if they are duplicates. The process after
>that is usually where things are missing. Everyone has their own
>ideas about how to deal with the dupes after they are located.
>
>Given the need to customize the UI, I thought this would be one of the
>most-hacked types of programs out there, but I found nothing in C# on
>SourceForge.
>
>By the way, my own interest is just for my own use, not for any
>commercial endeavor. It would be a cool thing to post as a community
>effort, so I was surprised it had not been done.

I was interested when I saw your post because a co-worker of mine has been given
a similar task. His assignment was to write something that compares the files
on two servers to determine if both have the same set of files. Actually I'm
glad he got the task rather than me. I think he will probably use the FileInfo
and DirectoryInfo classes to find duplicate names, then as you say decide how to
determine if files with the same name truly are the same file. After that he'll
have to figure out which is the correct one.
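
A rough sketch of that first step, assuming both servers' folders are reachable as ordinary paths (for example UNC shares). `FolderComparer` and `MissingOrDifferent` are hypothetical names. It matches files by relative path using `DirectoryInfo` and `FileInfo`, and uses file length only as a cheap first check; same-length files would still need a hash or byte-by-byte compare to confirm they truly are the same.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper; class and method names are illustrative only.
public static class FolderComparer
{
    // Index every file under root by its path relative to root.
    private static Dictionary<string, FileInfo> IndexByRelativePath(string root)
    {
        DirectoryInfo dir = new DirectoryInfo(root);
        // Assumption: Windows-style, case-insensitive file names.
        var index = new Dictionary<string, FileInfo>(StringComparer.OrdinalIgnoreCase);
        foreach (FileInfo file in dir.GetFiles("*", SearchOption.AllDirectories))
        {
            string relative = file.FullName.Substring(dir.FullName.Length)
                .TrimStart(Path.DirectorySeparatorChar, Path.AltDirectorySeparatorChar);
            index[relative] = file;
        }
        return index;
    }

    // Relative paths that exist under rootA but are missing under rootB,
    // or that exist under both with different lengths.
    public static List<string> MissingOrDifferent(string rootA, string rootB)
    {
        Dictionary<string, FileInfo> a = IndexByRelativePath(rootA);
        Dictionary<string, FileInfo> b = IndexByRelativePath(rootB);

        var result = new List<string>();
        foreach (KeyValuePair<string, FileInfo> entry in a)
        {
            FileInfo other;
            if (!b.TryGetValue(entry.Key, out other) || other.Length != entry.Value.Length)
                result.Add(entry.Key);
        }
        result.Sort(StringComparer.OrdinalIgnoreCase);
        return result;
    }
}
```

Calling it twice with the arguments swapped covers both directions, which is what checking that both servers hold the same set of files would need.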

Regarding the solution to your project, it sounds like you have the methodology
worked out. Time to start coding ;o)
Good luck with your project,

Otis Mukinfus
http://www.arltex.com
http://www.tomchilders.com
Sep 16 '06 #4
Rob
On Fri, 15 Sep 2006 22:26:54 -0500, Otis Mukinfus
<ph***@emailaddress.com> wrote:
>I was interested when I saw your post because a co-worker of mine has been given
>a similar task. His assignment was to write something that compares the files
>on two servers to determine if both have the same set of files. Actually I'm
>glad he got the task rather than me. I think he will probably use the FileInfo
>and DirectoryInfo classes to find duplicate names, then as you say decide how to
>determine if files with the same name truly are the same file.

If he doesn't need to do that in C#, he could use "Beyond Compare"
(www.ScooterSoftware.com), an excellent folder comparison program.
There may be a way to use it from C# using its plugin interface, but I
haven't tried that.

I need to do a generalized global search, and I can't count on the
file names being the same, so I can't go that route. Looks like I'll
have to write mine from the ground up. Amazing that there's no C#
code available for this.
Sep 17 '06 #5
On Sun, 17 Sep 2006 02:43:59 -0400, Rob <Ro*@nospam.com> wrote:
>On Fri, 15 Sep 2006 22:26:54 -0500, Otis Mukinfus
><ph***@emailaddress.com> wrote:
>>I was interested when I saw your post because a co-worker of mine has been given
>>a similar task. His assignment was to write something that compares the files
>>on two servers to determine if both have the same set of files. Actually I'm
>>glad he got the task rather than me. I think he will probably use the FileInfo
>>and DirectoryInfo classes to find duplicate names, then as you say decide how to
>>determine if files with the same name truly are the same file.
>
>If he doesn't need to do that in C#, he could use "Beyond Compare"
>(www.ScooterSoftware.com), an excellent folder comparison program.
>There may be a way to use it from C# using its plugin interface, but I
>haven't tried that.
>
>I need to do a generalized global search, and I can't count on the
>file names being the same, so I can't go that route. Looks like I'll
>have to write mine from the ground up. Amazing that there's no C#
>code available for this.

Thanks, Rob. I'll pass that on to him.
Good luck with your project,

Otis Mukinfus
http://www.arltex.com
http://www.tomchilders.com
Sep 18 '06 #6
