What's the cleanest way to compare two dictionaries?

Hi list,

I am sure there are many ways of doing the comparison, but I'd like to see
what you would do if you had two dictionaries (containing lots of
data - like 20000 keys, where each key holds a dozen or so records)
and you wanted to build a list of the differences between the two.

I'd like to end up with 3 lists: what's in A and not in B, what's in B
and not in A, and of course, what's in both A and B.

What do you think is the cleanest way to do it? (I am sure you will
come up with ways that astonish me :=) )

Thanks,

Aug 9 '06 #1

I make it 4 bins:
a_exclusive_keys
b_exclusive_keys
common_keys_equal_values
common_keys_diff_values

Something like:

a = {1: 1, 2: 2, 3: 3, 4: 4}
b = {2: 2, 3: -3, 5: 5}
keya = set(a.keys())
keyb = set(b.keys())
a_xclusive = keya - keyb
b_xclusive = keyb - keya
_common = keya & keyb
common_eq = set(k for k in _common if a[k] == b[k])
common_neq = _common - common_eq

If you know simple set arithmetic, it should read OK.

- Paddy.
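
The four bins above can be wrapped in a function. This is only a minimal sketch, assuming Python 3, where `dict.keys()` returns a view that supports the set operators directly, so the explicit `set(...)` calls are not needed. The name `diff_dicts` is made up for illustration and does not come from the thread.

```python
def diff_dicts(a, b):
    """Split the keys of two dicts into the four bins (hypothetical helper)."""
    a_only = a.keys() - b.keys()        # keys only in a
    b_only = b.keys() - a.keys()        # keys only in b
    common = a.keys() & b.keys()        # keys present in both
    common_eq = {k for k in common if a[k] == b[k]}
    common_neq = common - common_eq
    return a_only, b_only, common_eq, common_neq


a = {1: 1, 2: 2, 3: 3, 4: 4}
b = {2: 2, 3: -3, 5: 5}
a_only, b_only, common_eq, common_neq = diff_dicts(a, b)
print(sorted(a_only), sorted(b_only), sorted(common_eq), sorted(common_neq))
# prints: [1, 4] [5] [2] [3]
```

With 20000 keys each of these steps is a single pass over hashed keys, so the cost is dominated by the value comparisons in `common_eq`.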

Aug 9 '06 #2

Thanks, that's very clean. It gives me a good reason to move up to Python
2.4.

Aug 9 '06 #3

Oh, wait, works in 2.3 too.

Just have to:

from sets import Set as set
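
One possible way to write that import so the same file runs both on 2.3 and on later versions, where `set` is a builtin. This is a sketch, not something posted in the thread:

```python
# Fall back to the sets module only where the builtin is missing (Python 2.3).
try:
    set
except NameError:
    from sets import Set as set

keya = set([1, 2, 3, 4])
keyb = set([2, 3, 5])
print(sorted(keya - keyb))  # prints: [1, 4]
```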

Aug 9 '06 #4

Paddy has already pointed out a necessary addition to your requirement
definition: common keys with different values.

Here's another possible addition: you say that "each key contains a
dozen or so of records". I presume that you mean like this:

a = {1: ['rec1a', 'rec1b'], 42: ['rec42a', 'rec42b']}  # "dozen" -2 to save typing :-)

Now what happens if the other dictionary contains:

b = {1: ['rec1a', 'rec1b'], 42: ['rec42b', 'rec42a']}

Key 42 would be marked as different by Paddy's classification, but the
values are the same, just not in the same order. How do you want to
treat that? avalue == bvalue? sorted(avalue) == sorted(bvalue)? Oh, and
are you sure the buckets don't contain duplicates? Maybe you need
set(avalue) == set(bvalue). What about 'rec1a' vs 'Rec1a' vs 'REC1A'?

All comparisons are equal, but some comparisons are more equal than
others :-)
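
To make the choices above concrete, here is a small sketch that applies each candidate definition of equality to the key 42 values from the example. The helper `fold_case` and the extra sample lists are assumptions added for illustration.

```python
a_val = ['rec42a', 'rec42b']
b_val = ['rec42b', 'rec42a']

ordered_equal = a_val == b_val                     # order matters
unordered_equal = sorted(a_val) == sorted(b_val)   # order ignored, duplicates count
as_sets_equal = set(a_val) == set(b_val)           # order and duplicates ignored

# Duplicates are where sorted() and set() disagree.
dup_sorted_equal = sorted(['x', 'x', 'y']) == sorted(['x', 'y'])
dup_set_equal = set(['x', 'x', 'y']) == set(['x', 'y'])


def fold_case(records):
    """Hypothetical normaliser: compare records case-insensitively."""
    return sorted(r.lower() for r in records)


case_insensitive_equal = fold_case(['rec1a', 'rec1b']) == fold_case(['Rec1a', 'REC1B'])

print(ordered_equal, unordered_equal, as_sets_equal)
# prints: False True True
print(dup_sorted_equal, dup_set_equal, case_insensitive_equal)
# prints: False True True
```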

Cheers,
John

Aug 9 '06 #5
John,

Yes, there are several scenarios.

a) Comparing keys only.

That's been answered (although I haven't gotten it to work under 2.3
yet)

b) Comparing records.

Now it gets more fun - as you pointed out. I was assuming that there
is no shortcut here. If the key exists in both sets, and if I wish to
know if the records are the same, I would have to do a record-by-record
comparison. However, since there are only a handful of records per
key, this wouldn't be so bad. Maybe I just overload the compare
operator or something.

Aug 10 '06 #6

Hi Johns,
The following is my attempt to give more/deeper comparison info.
Assume you have your data parsed and presented as two dicts a and b
each having as values a dict representing a record.
Further assume you have a function that can compute if two record level
dicts are the same and another function that can compute if two values
in a record level dict are the same.

With a slight modification of my earlier prog we get:

def komparator(a, b, check_equal):
    keya = set(a.keys())
    keyb = set(b.keys())
    a_xclusive = keya - keyb
    b_xclusive = keyb - keya
    _common = keya & keyb
    common_eq = set(k for k in _common if check_equal(a[k], b[k]))
    common_neq = _common - common_eq
    return (a_xclusive, b_xclusive, common_eq, common_neq)

a_xclusive, b_xclusive, common_eq, common_neq = komparator(
    a, b, record_dict__equality_checker)

common_neq = [(key, komparator(a[key], b[key], value__equality_checker))
              for key in common_neq]

Now we get extra info on intra record differences with little extra
code.

Look out though, you could get swamped with data :-)

- Paddy.
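
The two checker functions are left undefined above. The following is a self-contained sketch with made-up sample data and two hypothetical checkers, `records_equal` and `values_equal`, standing in for `record_dict__equality_checker` and `value__equality_checker`. It assumes Python 3 and string-valued fields.

```python
def komparator(a, b, check_equal):
    keya = set(a)
    keyb = set(b)
    common = keya & keyb
    common_eq = set(k for k in common if check_equal(a[k], b[k]))
    return keya - keyb, keyb - keya, common_eq, common - common_eq


def values_equal(x, y):
    """Hypothetical field-level check: ignore case."""
    return x.lower() == y.lower()


def records_equal(r1, r2):
    """Hypothetical record-level check: same fields, all values equal."""
    return r1.keys() == r2.keys() and all(values_equal(r1[f], r2[f]) for f in r1)


# Made-up data: each value is a record-level dict.
a = {1: {'name': 'Ann'}, 2: {'name': 'Bob', 'city': 'Oslo'}, 3: {'name': 'Cy'}}
b = {2: {'name': 'BOB', 'city': 'Rome'}, 3: {'name': 'cy'}, 4: {'name': 'Di'}}

a_only, b_only, same, different = komparator(a, b, records_equal)
details = [(key, komparator(a[key], b[key], values_equal))
           for key in sorted(different)]

print(a_only, b_only, same)
# prints: {1} {4} {3}
print(details)
# prints: [(2, (set(), set(), {'name'}, {'city'}))]
```

Passing the equality rule in as a function keeps the key arithmetic unchanged while the definition of "same record" can vary.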

Aug 10 '06 #7
John Henry wrote:
> John,
>
> Yes, there are several scenarios.
>
> a) Comparing keys only.
>
> That's been answered (although I haven't gotten it to work under 2.3
> yet)

(1) What's the problem with getting it to work under 2.3?
(2) Why not upgrade?

> b) Comparing records.

You haven't got that far yet. The next problem is actually comparing
two *collections* of records, and you need to decide whether for
equality purposes the collections should be treated as an unordered
list, an ordered list, a set, or something else. Then you need to
consider how equality of records is to be defined, e.g. case sensitive
or not.

> Now it gets more fun - as you pointed out. I was assuming that there
> is no shortcut here. If the key exists in both sets, and if I wish to
> know if the records are the same, I would have to do a record-by-record
> comparison. However, since there are only a handful of records per
> key, this wouldn't be so bad. Maybe I just overload the compare
> operator or something.

IMHO, "something" would be better than "overload the compare operator".
In any case, you need to DEFINE what you mean by equality of a
collection of records, *then* implement it.

"Only a handful": naturally 0 and 1 are special, but otherwise the
number of records in the bag shouldn't really be a factor in your
implementation.

HTH,
John
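
As one illustration of "define, then implement", here is a sketch that picks the bag (multiset) definition and writes it down as an explicit function. It assumes the records are hashable (strings, tuples) and uses `collections.Counter`, which is newer than the Python versions discussed in this thread. The name `bags_equal` and the `normalise` hook are hypothetical.

```python
from collections import Counter


def bags_equal(recs_a, recs_b, normalise=lambda r: r):
    """Treat each collection as a bag: order ignored, duplicates counted.

    `normalise` is a hypothetical hook for record-level rules such as
    case folding.
    """
    return Counter(map(normalise, recs_a)) == Counter(map(normalise, recs_b))


print(bags_equal(['rec42a', 'rec42b'], ['rec42b', 'rec42a']))   # prints: True
print(bags_equal(['x', 'x', 'y'], ['x', 'y']))                  # prints: False
print(bags_equal(['rec1a'], ['REC1A'], normalise=str.lower))    # prints: True
print(bags_equal([], []))                                       # prints: True
```

The same function handles 0, 1, or a dozen records, so the size of the bag does not shape the implementation.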

Aug 10 '06 #8

John Machin wrote:
> John Henry wrote:
>> John,
>>
>> Yes, there are several scenarios.
>>
>> a) Comparing keys only.
>>
>> That's been answered (although I haven't gotten it to work under 2.3
>> yet)
>
> (1) What's the problem with getting it to work under 2.3?
> (2) Why not upgrade?

Let me comment on this part first; I am still chewing on the other parts of
your message.

When I do it under 2.3, I get:

    common_eq = set(k for k in _common if a[k] == b[k])
                        ^
    SyntaxError: invalid syntax

Don't know why that is.

I can't upgrade yet. Some part of my code doesn't compile under 2.4
and I haven't had a chance to investigate further.

Aug 11 '06 #9
In <11*********************@h48g2000cwc.googlegroups. com>, John Henry
wrote:
> When I do it under 2.3, I get:
>
>     common_eq = set(k for k in _common if a[k] == b[k])
>                         ^
>     SyntaxError: invalid syntax
>
> Don't know why that is.

There are no generator expressions in 2.3. Turn it into a list
comprehension::

common_eq = set([k for k in _common if a[k] == b[k]])

Ciao,
Marc 'BlackJack' Rintsch
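
Generator expressions were added in Python 2.4 (PEP 289), which is why the bare form is a syntax error on 2.3. A small sketch, reusing the sample dictionaries from earlier in the thread, showing that both spellings build the same set on any version that accepts them:

```python
a = {1: 1, 2: 2, 3: 3, 4: 4}
b = {2: 2, 3: -3, 5: 5}
common = set(a) & set(b)

from_genexp = set(k for k in common if a[k] == b[k])       # needs Python 2.4+
from_listcomp = set([k for k in common if a[k] == b[k]])   # also valid on 2.3

print(from_genexp == from_listcomp, sorted(from_genexp))
# prints: True [2]
```

The list comprehension builds a temporary list first, which is harmless at this size.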
Aug 11 '06 #10
Thank you. That works.
Aug 11 '06 #11

I have gone the whole hog and got something that's runnable:

========dict_diff.py=============================

# Python 2.4 syntax (print statements, generator expression)
from pprint import pprint as pp

a = {1:{'1':'1'}, 2:{'2':'2'}, 3:dict("AA BB CC".split()), 4:{'4':'4'}}
b = {             2:{'2':'2'}, 3:dict("BB CD EE".split()), 5:{'5':'5'}}

def record_comparator(a, b, check_equal):
    keya = set(a.keys())
    keyb = set(b.keys())
    a_xclusive = keya - keyb
    b_xclusive = keyb - keya
    _common = keya & keyb
    common_eq = set(k for k in _common if check_equal(a[k], b[k]))
    common_neq = _common - common_eq
    return {"A excl keys": a_xclusive, "B excl keys": b_xclusive,
            "Common & eq": common_eq, "Common keys neq values": common_neq}

comp_result = record_comparator(a, b, dict.__eq__)

# Further data on common keys, neq values
common_neq = comp_result["Common keys neq values"]
common_neq = [(key, record_comparator(a[key], b[key], str.__eq__))
              for key in common_neq]
comp_result["Common keys neq values"] = common_neq

print "\na =",; pp(a)
print "\nb =",; pp(b)
print "\ncomp_result = " ; pp(comp_result)

==========================================

When run it gives:

a ={1: {'1': '1'},
 2: {'2': '2'},
 3: {'A': 'A', 'C': 'C', 'B': 'B'},
 4: {'4': '4'}}

b ={2: {'2': '2'}, 3: {'C': 'D', 'B': 'B', 'E': 'E'}, 5: {'5': '5'}}

comp_result =
{'A excl keys': set([1, 4]),
 'B excl keys': set([5]),
 'Common & eq': set([2]),
 'Common keys neq values': [(3,
                             {'A excl keys': set(['A']),
                              'B excl keys': set(['E']),
                              'Common & eq': set(['B']),
                              'Common keys neq values': set(['C'])})]}
- Paddy.

Aug 11 '06 #12
