Best way to compare the contents of two directories

I have two directory trees that I want to compare and I'm trying to
figure out what the best way of doing this would be. I am using walk
to get a list of all of the files in each directory.

I am using this code to compare the file lists:

def compare_files(first_list, second_list, first_dir, second_dir):
    missing = in_first_only(first_list, second_list)
    for item in missing:
        index = first_list.index(item)
        print first_list[index] + ' does not exist in ' + second_dir[index]
        first_list.pop(index); first_dir.pop(index)
    return first_list, second_list, first_dir, second_dir

However, before I actually compare the files, I want to compare the
directories, and if a directory is missing in either set, I want to
report it:

dir_list_a = ['d:\\results\\foldera\\','d:\\results\\folderb\\', 'd:\\results\\folderc\\']
dir_list_b = ['c:\\results\\foldera\\','c:\\results\\folderb\\']

output:
'folderc' exists in d:\results but not in c:\results

I am using splitall (from the Python Cookbook) to split the paths into
their parts and appending this to a list, but I can't figure out the
best way to compare the contents of the resulting 2 lists and I think
I am starting to make things *too* complicated:

import os

def splitall(path):
    """
    Source: Python Cookbook
    Credit: Trent Mick

    Split a path into all of its parts.
    """
    allparts = []
    while 1:
        parts = os.path.split(path)
        if parts[0] == path:
            # Nothing left to split off an absolute path (e.g. a drive root).
            allparts.insert(0, parts[0])
            break
        elif parts[1] == path:
            # Nothing left to split off a relative path.
            allparts.insert(0, parts[1])
            break
        else:
            path = parts[0]
            allparts.insert(0, parts[1])
    return allparts

After using this, I end up with this:

dir_list_a = [['d:\\', 'results', 'foldera', 'd:\\', 'results',
               'folderb', 'd:\\', 'results', 'folderc']]
dir_list_b = [['d:\\', 'results', 'foldera', 'd:\\', 'results', 'folderb']]
Jul 18 '05 #1
ro***********@palmsource.com (Robin Siebler) wrote in message news:<95**************************@posting.google.com>...
> I have two directory trees that I want to compare and I'm trying to
> figure out what the best way of doing this would be. I am using walk
> to get a list of all of the files in each directory.

Once you have the two lists, look into difflib. E.g.,

import difflib
for i in difflib.ndiff(list1, list2):
    print i

Dan Gass has recently contributed some code that can produce
a side-by-side difference in HTML format. I believe it is in the 2.4
release, but you can also get it from
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=914575&group_id=5470

import difflib
tbl = difflib.HtmlDiff().make_file(list1, list2)
f = open("diffs.html", "w")
f.write(tbl)
f.close()

--dang
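
For readers on a current Python 3 (an assumption; the thread itself targets Python 2.2 to 2.4), here is a minimal sketch of the ndiff suggestion above. The helper name changed_entries and the sample lists are hypothetical. It sorts both lists first, since ndiff compares sequences in order, and keeps only the lines that ndiff marks as removed ("- ") or added ("+ ").

```python
import difflib


def changed_entries(list1, list2):
    """Return only the ndiff lines that mark a removal or an addition."""
    # ndiff compares the sequences in order, so sort both lists first.
    diff = difflib.ndiff(sorted(list1), sorted(list2))
    return [line for line in diff if line.startswith(("- ", "+ "))]


# Hypothetical sample data: relative paths collected from each tree.
list1 = ["foldera/a.txt", "folderb/b.txt", "folderc/c.txt"]
list2 = ["foldera/a.txt", "folderb/b.txt"]

for line in changed_entries(list1, list2):
    print(line)
```

A "- " prefix means the entry appears only in the first list, and "+ " means it appears only in the second.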
Jul 18 '05 #2
[Robin Siebler]
> However, before I actually compare the files, I want to compare the
> directories, and if a directory is missing in either set, I want to
> report it:

The operative word is "set".

Try using sets.py:

one_only = Set(dirlistone) - Set(dirlisttwo)
two_only = Set(dirlisttwo) - Set(dirlistone)

Raymond Hettinger
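
As a rough illustration of the set-difference idea applied to the original question, the sketch below assumes a current Python 3, where set is a builtin. The function names relative_dirs and report_missing_dirs are hypothetical and not from the thread. Directories are stored relative to each root, so that d:\results\folderc and c:\results\folderc compare as the same name.

```python
import os


def relative_dirs(root):
    """Collect every subdirectory of root as a path relative to root."""
    found = set()
    for current, dirnames, _filenames in os.walk(root):
        for name in dirnames:
            found.add(os.path.relpath(os.path.join(current, name), root))
    return found


def report_missing_dirs(root_a, root_b):
    """Return one message per directory that exists under only one root."""
    dirs_a = relative_dirs(root_a)
    dirs_b = relative_dirs(root_b)
    messages = []
    for name in sorted(dirs_a - dirs_b):
        messages.append("%r exists in %s but not in %s" % (name, root_a, root_b))
    for name in sorted(dirs_b - dirs_a):
        messages.append("%r exists in %s but not in %s" % (name, root_b, root_a))
    return messages


# Example call (the paths are the ones from the question):
# for message in report_missing_dirs("d:\\results", "c:\\results"):
#     print(message)
```

Names are compared exactly as the file system reports them, so on a case-insensitive file system it may be worth normalizing case before building the sets.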
Jul 18 '05 #3
"Raymond Hettinger" <vz******@verizon.net> wrote in message news:<Zxc1d.412$W73.185@trndny03>...
> [Robin Siebler]
> > However, before I actually compare the files, I want to compare the
> > directories, and if a directory is missing in either set, I want to
> > report it:
>
> The operative word is "set".
>
> Try using sets.py:
> one_only = Set(dirlistone) - Set(dirlisttwo)
> two_only = Set(dirlisttwo) - Set(dirlistone)
>
> Raymond Hettinger


I get the following error:

NameError: name 'Set' is not defined

I'm using ActivePython 2.2.3. Is this something that has been added
to a later version of Python?
Jul 18 '05 #4
On Mon, 2004-09-13 at 11:37 -0700, Robin Siebler wrote:
> "Raymond Hettinger" <vz******@verizon.net> wrote in message news:<Zxc1d.412$W73.185@trndny03>...
> > [Robin Siebler]
> > > However, before I actually compare the files, I want to compare the
> > > directories, and if a directory is missing in either set, I want to
> > > report it:
> >
> > The operative word is "set".
> >
> > Try using sets.py:
> > one_only = Set(dirlistone) - Set(dirlisttwo)
> > two_only = Set(dirlisttwo) - Set(dirlistone)
> >
> > Raymond Hettinger
>
> I get the following error:
>
> NameError: name 'Set' is not defined
>
> I'm using ActivePython 2.2.3. Is this something that has been added
> to a later version of Python?


I think sets were added in 2.3, but either way you must still 'from sets
import Set' before using them.

Regards,
Cliff

--
Cliff Wells <cl************@comcast.net>

Jul 18 '05 #5
Cliff Wells <clifford.wells <at> comcast.net> writes:

> I think sets were added in 2.3, but either way you must still 'from sets
> import Set' before using them.


It also might be good to get in the habit of writing this as:

from sets import Set as set

so that when you move to Python 2.4, where set() is a builtin, all you have to
do is remove the import.

Python 2.4a3 (#56, Sep 2 2004, 20:50:21) [MSC v.1310 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> set(abs(2*i - 6) for i in range(10))
set([0, 2, 4, 6, 8, 10, 12])

Steve
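
A common variation on this habit, shown here only as a sketch and not something proposed in the thread, is to probe for the builtin and fall back to the sets module when it is missing. Then the same file runs unchanged on 2.3, on 2.4, and on later versions where the fallback branch is never taken. The folder names are the sample values from the original question.

```python
try:
    set  # builtin from Python 2.4 onward
except NameError:
    # Only reached on Python 2.3, where sets.Set is the available type.
    from sets import Set as set

one_only = set(["foldera", "folderb", "folderc"]) - set(["foldera", "folderb"])
print(sorted(one_only))
```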

Jul 18 '05 #6
> > I'm using ActivePython 2.2.3. Is this something that has been added
> > to a later version of Python?
>
> I think sets were added in 2.3, but either way you must still 'from sets
> import Set' before using them.

Right.

Also, sets.py is now compatible with Py2.2, so you can take the current module
from CVS and use it directly.

Raymond Hettinger
Jul 18 '05 #7
