What's the cleanest way to compare 2 dictionaries?

Hi list,

I am sure there are many ways of doing a comparison, but I would like to
see what you would do if you had 2 dictionaries (containing lots of
data - like 20000 keys, with each key holding a dozen or so records)
and you wanted to build a list of the differences between them.

I would like to end up with 3 lists: what's in A and not in B, what's in B
and not in A, and of course, what's in both A and B.

What do you think is the cleanest way to do it? (I am sure you will
come up with ways that astonish me :=) )

Thanks,

Aug 9 '06 #1

I make it 4 bins:
a_exclusive_keys
b_exclusive_keys
common_keys_equal_values
common_keys_diff_values

Something like:

a = {1: 1, 2: 2, 3: 3, 4: 4}
b = {2: 2, 3: -3, 5: 5}
keya = set(a.keys())
keyb = set(b.keys())
a_xclusive = keya - keyb    # keys only in a
b_xclusive = keyb - keya    # keys only in b
_common = keya & keyb       # keys in both
common_eq = set(k for k in _common if a[k] == b[k])    # same value in both
common_neq = _common - common_eq                        # values differ

If you know simple set arithmetic, it should read OK.
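
For anyone reading this on a current Python 3, here is a minimal sketch of the same four bins wrapped in a function. It assumes Python 3, where dict key views support set operations directly; the name `diff_keys` is made up here for illustration and is not from the thread.

```python
def diff_keys(a, b):
    """Split the keys of two dicts into four bins.

    Returns (only in a, only in b, common with equal values,
    common with different values), each as a set of keys.
    """
    # In Python 3, dict key views behave like sets for -, & and |.
    a_only = a.keys() - b.keys()
    b_only = b.keys() - a.keys()
    common = a.keys() & b.keys()
    common_eq = {k for k in common if a[k] == b[k]}
    common_neq = common - common_eq
    return a_only, b_only, common_eq, common_neq


a = {1: 1, 2: 2, 3: 3, 4: 4}
b = {2: 2, 3: -3, 5: 5}
a_only, b_only, common_eq, common_neq = diff_keys(a, b)
print(sorted(a_only), sorted(b_only), sorted(common_eq), sorted(common_neq))
# prints: [1, 4] [5] [2] [3]
```

Each bin comes back as a set, so turning it into a list (sorted or not) is one call away if lists are what you need.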

- Paddy.

Aug 9 '06 #2

Thanks, that's very clean. Gives me a good reason to move up to Python
2.4.

Aug 9 '06 #3

Oh, wait, works in 2.3 too.

Just have to:

from sets import Set as set

Aug 9 '06 #4

Paddy has already pointed out a necessary addition to your requirement
definition: common keys with different values.

Here's another possible addition: you say that "each key contains a
dozen or so of records". I presume that you mean like this:

a = {1: ['rec1a', 'rec1b'], 42: ['rec42a', 'rec42b']}  # "dozen" -2 to save typing :-)

Now what happens if the other dictionary contains:

b = {1: ['rec1a', 'rec1b'], 42: ['rec42b', 'rec42a']}

Key 42 would be marked as different by Paddy's classification, but the
values are the same, just not in the same order. How do you want to
treat that? avalue == bvalue? sorted(avalue) == sorted(bvalue)? Oh, and
are you sure the buckets don't contain duplicates? Maybe you need
set(avalue) == set(bvalue). What about 'rec1a' vs 'Rec1a' vs 'REC1A'?

All comparisons are equal, but some comparisons are more equal than
others :-)
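
To make the point concrete, here is a small sketch (Python 3 assumed) showing that those candidate definitions really do give different answers on the same data. The helper names are invented for illustration, and the records are assumed to be plain strings.

```python
def same_ordered(xs, ys):
    # Strict: same records in the same order.
    return xs == ys


def same_unordered(xs, ys):
    # Order ignored, but duplicates still count.
    return sorted(xs) == sorted(ys)


def same_as_set(xs, ys):
    # Order and duplicates both ignored.
    return set(xs) == set(ys)


def same_ignoring_case(x, y):
    # Case-insensitive comparison of two single records.
    return x.casefold() == y.casefold()


a_value = ['rec42a', 'rec42b']
b_value = ['rec42b', 'rec42a']
dup_value = ['rec42a', 'rec42a', 'rec42b']

print(same_ordered(a_value, b_value))        # prints: False
print(same_unordered(a_value, b_value))      # prints: True
print(same_unordered(a_value, dup_value))    # prints: False
print(same_as_set(a_value, dup_value))       # prints: True
print(same_ignoring_case('rec1a', 'REC1A'))  # prints: True
```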

Cheers,
John

Aug 9 '06 #5
John,

Yes, there are several scenarios.

a) Comparing keys only.

That's been answered (although I haven't gotten it to work under 2.3
yet)

b) Comparing records.

Now it gets more fun - as you pointed out. I was assuming that there
is no shortcut here. If the key exists in both sets, and if I wish to
know if the records are the same, I would have to do a record-by-record
comparison. However, since there are only a handful of records per
key, this wouldn't be so bad. Maybe I just overload the compare
operator or something.

Aug 10 '06 #6

Hi Johns,
The following is my attempt to give more/deeper comparison info.
Assume you have your data parsed and presented as two dicts a and b
each having as values a dict representing a record.
Further assume you have a function that can compute if two record level
dicts are the same and another function that can compute if two values
in a record level dict are the same.

With a slight modification of my earlier prog we get:

def komparator(a, b, check_equal):
    keya = set(a.keys())
    keyb = set(b.keys())
    a_xclusive = keya - keyb
    b_xclusive = keyb - keya
    _common = keya & keyb
    common_eq = set(k for k in _common if check_equal(a[k], b[k]))
    common_neq = _common - common_eq
    return (a_xclusive, b_xclusive, common_eq, common_neq)

a_xclusive, b_xclusive, common_eq, common_neq = komparator(
    a, b, record_dict__equality_checker)

common_neq = [(key, komparator(a[key], b[key], value__equality_checker))
              for key in common_neq]

Now we get extra info on intra record differences with little extra
code.

Look out though, you could get swamped with data :-)

- Paddy.

Aug 10 '06 #7
John Henry wrote:
> John,
>
> Yes, there are several scenarios.
>
> a) Comparing keys only.
>
> That's been answered (although I haven't gotten it to work under 2.3
> yet)

(1) What's the problem with getting it to work under 2.3?
(2) Why not upgrade?

> b) Comparing records.

You haven't got that far yet. The next problem is actually comparing
two *collections* of records, and you need to decide whether for
equality purposes the collections should be treated as an unordered
list, an ordered list, a set, or something else. Then you need to
consider how equality of records is to be defined, e.g. case sensitive
or not.

> Now it gets more fun - as you pointed out. I was assuming that there
> is no shortcut here. If the key exists in both sets, and if I wish to
> know if the records are the same, I would have to do a record-by-record
> comparison. However, since there are only a handful of records per
> key, this wouldn't be so bad. Maybe I just overload the compare
> operator or something.

IMHO, "something" would be better than "overload the compare operator".
In any case, you need to DEFINE what you mean by equality of a
collection of records, *then* implement it.

"only a handful": Naturally 0 and 1 are special, but otherwise the
number of records in the bag shouldn't really be a factor in your
implementation.
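
As one way of reading "define it, then implement it", here is a hedged sketch (Python 3 assumed) that picks a single definition and writes it down as a named function instead of overloading an operator. The definition chosen here, an unordered bag of case-insensitive string records, is only an assumption for illustration; `canonical` and `records_equal` are hypothetical names.

```python
from collections import Counter


def canonical(records):
    """Reduce a collection of records to a form that encodes the chosen
    equality rule: order ignored, duplicates counted, case ignored."""
    return Counter(rec.casefold() for rec in records)


def records_equal(recs_a, recs_b):
    # Two collections are "equal" when their canonical forms match.
    return canonical(recs_a) == canonical(recs_b)


a = {1: ['rec1a', 'rec1b'], 42: ['rec42a', 'rec42b'], 7: ['x']}
b = {1: ['rec1a', 'rec1b'], 42: ['REC42B', 'rec42a'], 7: ['x', 'x']}

common = a.keys() & b.keys()
same = {k for k in common if records_equal(a[k], b[k])}
different = common - same
print(sorted(same), sorted(different))  # prints: [1, 42] [7]
```

Changing the rule later (say, to make case significant) then means editing `canonical` only, and the code works the same way whether a key holds two records or two hundred.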

HTH,
John

Aug 10 '06 #8

John Machin wrote:
> John Henry wrote:
>> a) Comparing keys only.
>>
>> That's been answered (although I haven't gotten it to work under 2.3
>> yet)
>
> (1) What's the problem with getting it to work under 2.3?
> (2) Why not upgrade?

Let me comment on this part first, I am still chewing other parts of
your message.

When I do it under 2.3, I get:

common_eq = set(k for k in _common if a[k] == b[k])
^
SyntaxError: invalid syntax

Don't know why that is.

I can't upgrade yet. Some parts of my code don't compile under 2.4
and I haven't had a chance to investigate further.

Aug 11 '06 #9
John Henry wrote:
> When I do it under 2.3, I get:
>
> common_eq = set(k for k in _common if a[k] == b[k])
> ^
> SyntaxError: invalid syntax
>
> Don't know why that is.

There are no generator expressions in 2.3 (they were added in 2.4). Turn
it into a list comprehension:

common_eq = set([k for k in _common if a[k] == b[k]])
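
A quick sketch to show that the two spellings build the same set; only the syntax differs. This is written so it also runs on a current Python 3, and the sample data is borrowed from earlier in the thread.

```python
a = {1: 1, 2: 2, 3: 3, 4: 4}
b = {2: 2, 3: -3, 5: 5}
_common = set(a.keys()) & set(b.keys())

# Generator expression: a syntax error before Python 2.4.
from_genexp = set(k for k in _common if a[k] == b[k])

# List comprehension: builds a temporary list first, but parses on 2.3 too.
from_listcomp = set([k for k in _common if a[k] == b[k]])

print(from_genexp == from_listcomp)  # prints: True
```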

Ciao,
Marc 'BlackJack' Rintsch
Aug 11 '06 #10
Thank you. That works.
Aug 11 '06 #11

I have gone the whole hog and got something that's runnable:

========dict_diff.py=============================

from pprint import pprint as pp

a = {1: {'1': '1'}, 2: {'2': '2'}, 3: dict("AA BB CC".split()), 4: {'4': '4'}}
b = {2: {'2': '2'}, 3: dict("BB CD EE".split()), 5: {'5': '5'}}

def record_comparator(a, b, check_equal):
    # Sort the keys of a and b into four bins; check_equal decides
    # whether the values under a shared key count as the same.
    keya = set(a.keys())
    keyb = set(b.keys())
    a_xclusive = keya - keyb
    b_xclusive = keyb - keya
    _common = keya & keyb
    common_eq = set(k for k in _common if check_equal(a[k], b[k]))
    common_neq = _common - common_eq
    return {"A excl keys": a_xclusive, "B excl keys": b_xclusive,
            "Common & eq": common_eq, "Common keys neq values": common_neq}

comp_result = record_comparator(a, b, dict.__eq__)

# Further data on common keys, neq values
common_neq = comp_result["Common keys neq values"]
common_neq = [(key, record_comparator(a[key], b[key], str.__eq__))
              for key in common_neq]
comp_result["Common keys neq values"] = common_neq

# Python 2 print statements, as used throughout this thread
print "\na =",; pp(a)
print "\nb =",; pp(b)
print "\ncomp_result = "; pp(comp_result)

==========================================

When run it gives:

a ={1: {'1': '1'},
2: {'2': '2'},
3: {'A': 'A', 'C': 'C', 'B': 'B'},
4: {'4': '4'}}

b ={2: {'2': '2'}, 3: {'C': 'D', 'B': 'B', 'E': 'E'}, 5: {'5': '5'}}

comp_result =
{'A excl keys': set([1, 4]),
'B excl keys': set([5]),
'Common & eq': set([2]),
'Common keys neq values': [(3,
{'A excl keys': set(['A']),
'B excl keys': set(['E']),
'Common & eq': set(['B']),
'Common keys neq values': set(['C'])})]}
- Paddy.
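
The script above is Python 2. For readers on Python 3, here is a rough port as a sketch, not a definitive version. It assumes that plain `==` (via `operator.eq`) is an acceptable default equality check in place of `dict.__eq__` and `str.__eq__`, and the default argument is an addition not found in the original.

```python
import operator
from pprint import pprint


def record_comparator(a, b, check_equal=operator.eq):
    """Bin the keys of a and b; check_equal compares values of shared keys."""
    keys_a, keys_b = set(a), set(b)
    common = keys_a & keys_b
    common_eq = {k for k in common if check_equal(a[k], b[k])}
    return {
        "A excl keys": keys_a - keys_b,
        "B excl keys": keys_b - keys_a,
        "Common & eq": common_eq,
        "Common keys neq values": common - common_eq,
    }


a = {1: {'1': '1'}, 2: {'2': '2'}, 3: dict("AA BB CC".split()), 4: {'4': '4'}}
b = {2: {'2': '2'}, 3: dict("BB CD EE".split()), 5: {'5': '5'}}

comp_result = record_comparator(a, b)

# Drill into each differing record to see which fields differ.
comp_result["Common keys neq values"] = [
    (key, record_comparator(a[key], b[key]))
    for key in sorted(comp_result["Common keys neq values"])
]

pprint(comp_result)
```

The bins should match the output shown above, although Python 3 prints sets as `{1, 4}` rather than `set([1, 4])`.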

Aug 11 '06 #12
