Hi list,
I am sure there are many ways of doing comparison but I'd like to see
what you would do if you have 2 dictionary sets (containing lots of
data - like 20000 keys and each key contains a dozen or so of records)
and you want to build a list of differences between these two sets.
I'd like to end up with 3 lists: what's in A and not in B, what's in B
and not in A, and of course, what's in both A and B.
What do you think is the cleanest way to do it? (I am sure you will
come up with ways that astonish me :=) )
Thanks,
John Henry wrote:
> [original question snipped]
I make it 4 bins:
a_exclusive_keys
b_exclusive_keys
common_keys_equal_values
common_keys_diff_values
Something like:
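a = {1: 1, 2: 2, 3: 3, 4: 4}
b = {2: 2, 3: -3, 5: 5}
keya = set(a.keys())
keyb = set(b.keys())
a_xclusive = keya - keyb      # keys only in a
b_xclusive = keyb - keya      # keys only in b
_common = keya & keyb         # keys in both
common_eq = set(k for k in _common if a[k] == b[k])   # common keys, equal values
common_neq = _common - common_eq                      # common keys, different values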
If you know simple set arithmetic, it should read OK.
- Paddy.
Paddy wrote:
> [quoted text snipped]
Thanks, that's very clean. Gives me a good reason to move up to Python
2.4.
John Henry wrote:
> [quoted text snipped]
Oh, wait, it works in 2.3 too.
I just have to:
from sets import Set as set
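If one script has to run on both 2.3 and 2.4, a common idiom (sketched here, relying only on the standard `sets` module that 2.3 ships) is to fall back to `sets.Set` only when the built-in `set` name is missing. This covers the type only; syntax that is new in 2.4 still won't parse on 2.3. `sorted()` is also new in 2.4, so the demo sorts a list in place instead:

```python
# Use the built-in set type (Python 2.4+); fall back to the sets module on 2.3.
try:
    set
except NameError:
    from sets import Set as set  # only reached on Python 2.3

keya = set([1, 2, 3, 4])
keyb = set([2, 3, 5])

only_a = list(keya - keyb)   # keys in a but not in b
only_a.sort()                # list.sort() exists on 2.3; sorted() does not
print(only_a)                # [1, 4]
```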
John Henry wrote:
> [original question snipped]
Paddy has already pointed out a necessary addition to your requirement
definition: common keys with different values.
Here's another possible addition: you say that "each key contains a
dozen or so of records". I presume that you mean like this:
a = {1: ['rec1a', 'rec1b'], 42: ['rec42a', 'rec42b']}  # "dozen" - 2 to save typing :-)
Now what happens if the other dictionary contains:
b = {1: ['rec1a', 'rec1b'], 42: ['rec42b', 'rec42a']}
Key 42 would be marked as different by Paddy's classification, but the
values are the same, just not in the same order. How do you want to
treat that? avalue == bvalue? sorted(avalue) == sorted(bvalue)? Oh, and
are you sure the buckets don't contain duplicates? Maybe you need
set(avalue) == set(bvalue). What about 'rec1a' vs 'Rec1a' vs 'REC1A'?
All comparisons are equal, but some comparisons are more equal than
others :-)
Cheers,
John
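To see John Machin's point concretely, the comparison can be passed in as a parameter. In this sketch `classify` and the entries in `checks` are hypothetical names; each check is one possible definition of "the same records", and only the order-sensitive one puts key 42 in the different-values bin:

```python
def classify(a, b, check_equal):
    """Split the keys of a and b into four bins; check_equal decides
    whether the values under a shared key count as the same."""
    keya, keyb = set(a), set(b)
    common = keya & keyb
    common_eq = set(k for k in common if check_equal(a[k], b[k]))
    return keya - keyb, keyb - keya, common_eq, common - common_eq

a = {1: ['rec1a', 'rec1b'], 42: ['rec42a', 'rec42b']}
b = {1: ['rec1a', 'rec1b'], 42: ['rec42b', 'rec42a']}

# Candidate meanings of "same records"; which one is right depends on the data
checks = {
    "ordered": lambda x, y: x == y,                    # order matters
    "unordered": lambda x, y: sorted(x) == sorted(y),  # order ignored, duplicates count
    "as set": lambda x, y: set(x) == set(y),           # order and duplicates ignored
    "no case": lambda x, y: (sorted(s.lower() for s in x) ==
                             sorted(s.lower() for s in y)),  # also ignore case
}

# Keys that land in the "common keys, different values" bin under each rule
differing = {name: classify(a, b, eq)[3] for name, eq in checks.items()}
for name in checks:
    print(name, sorted(differing[name]))
```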
John,
Yes, there are several scenarios.
a) Comparing keys only.
That's been answered (although I haven't gotten it to work under 2.3
yet)
b) Comparing records.
Now it gets more fun - as you pointed out. I was assuming that there
is no short cut here. If the key exists in both sets, and if I wish to
know if the records are the same, I would have to do a record-by-record
comparison. However, since there are only a handful of records per
key, this wouldn't be so bad. Maybe I just overload the compare
operator or something.
John Machin wrote:
> [quoted text snipped]
Hi Johns,
The following is my attempt to give more/deeper comparison info.
Assume you have your data parsed and presented as two dicts a and b
each having as values a dict representing a record.
Further assume you have a function that can compute if two record level
dicts are the same and another function that can compute if two values
in a record level dict are the same.
With a slight modification of my earlier prog we get:
def komparator(a, b, check_equal):
    keya = set(a.keys())
    keyb = set(b.keys())
    a_xclusive = keya - keyb
    b_xclusive = keyb - keya
    _common = keya & keyb
    common_eq = set(k for k in _common if check_equal(a[k], b[k]))
    common_neq = _common - common_eq
    return (a_xclusive, b_xclusive, common_eq, common_neq)
# record_dict__equality_checker and value__equality_checker are the two
# comparison functions assumed above.
a_xclusive, b_xclusive, common_eq, common_neq = komparator(a, b,
        record_dict__equality_checker)
# Replace each differing key with (key, field-by-field comparison of its records)
common_neq = [ (key,
                komparator(a[key], b[key], value__equality_checker))
               for key in common_neq ]
Now we get extra info on intra-record differences with little extra
code.
Look out though, you could get swamped with data :-)
- Paddy.
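To make Paddy's sketch runnable, the two checker functions need concrete definitions. The bodies below are hypothetical (case-insensitive field comparison, following John Machin's 'rec1a' vs 'REC1A' example), and the sample records are invented for illustration:

```python
def komparator(a, b, check_equal):
    # Paddy's function, unchanged apart from layout
    keya = set(a.keys())
    keyb = set(b.keys())
    a_xclusive = keya - keyb
    b_xclusive = keyb - keya
    _common = keya & keyb
    common_eq = set(k for k in _common if check_equal(a[k], b[k]))
    common_neq = _common - common_eq
    return (a_xclusive, b_xclusive, common_eq, common_neq)

def value__equality_checker(x, y):
    # Hypothetical rule: two field values match if equal ignoring case
    return x.lower() == y.lower()

def record_dict__equality_checker(r1, r2):
    # Hypothetical rule: records match if they have the same fields
    # and every field matches under value__equality_checker
    return (set(r1) == set(r2) and
            all(value__equality_checker(r1[f], r2[f]) for f in r1))

a = {1: {'name': 'Rec1a'}, 2: {'name': 'x', 'qty': '3'}, 3: {'name': 'only in a'}}
b = {1: {'name': 'REC1A'}, 2: {'name': 'x', 'qty': '4'}, 4: {'name': 'only in b'}}

a_xclusive, b_xclusive, common_eq, common_neq = komparator(
    a, b, record_dict__equality_checker)
common_neq = [(key, komparator(a[key], b[key], value__equality_checker))
              for key in common_neq]
print(common_eq)    # {1}: same record apart from letter case
print(common_neq)   # key 2 differs only in its 'qty' field
```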
John Henry wrote:
> John,
> Yes, there are several scenarios.
> a) Comparing keys only.
> That's been answered (although I haven't gotten it to work under 2.3
> yet)
(1) What's the problem with getting it to work under 2.3?
(2) Why not upgrade?
> b) Comparing records.
You haven't got that far yet. The next problem is actually comparing
two *collections* of records, and you need to decide whether for
equality purposes the collections should be treated as an unordered
list, an ordered list, a set, or something else. Then you need to
consider how equality of records is to be defined, e.g. case-sensitive
or not.
> Now it gets more fun - as you pointed out. I was assuming that there
> is no short cut here. If the key exists in both sets, and if I wish to
> know if the records are the same, I would have to do a record-by-record
> comparison. However, since there are only a handful of records per
> key, this wouldn't be so bad. Maybe I just overload the compare
> operator or something.
IMHO, "something" would be better than "overload the compare operator".
In any case, you need to DEFINE what you mean by equality of a
collection of records, *then* implement it.
"Only a handful": naturally 0 and 1 are special cases, but otherwise the
number of records in the bag shouldn't really be a factor in your
implementation.
HTH,
John
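One way to "define, then implement" is to build the equality test from explicit choices, so the same code works for 0, 1 or many records. This is a sketch under assumptions: `make_collection_eq` and its parameters are hypothetical, and treating the default case as a multiset (order ignored, duplicate counts kept) uses `collections.Counter`, which needs Python 2.7 or later:

```python
from collections import Counter

def make_collection_eq(normalize=lambda rec: rec, ordered=False,
                       ignore_duplicates=False):
    """Build an equality test for two collections of records.

    normalize: maps a record to the form used for comparison (e.g. str.lower)
    ordered: compare as sequences, so position matters
    ignore_duplicates: compare as sets (only used when ordered is False)
    """
    def eq(xs, ys):
        xs = [normalize(r) for r in xs]
        ys = [normalize(r) for r in ys]
        if ordered:
            return xs == ys
        if ignore_duplicates:
            return set(xs) == set(ys)
        return Counter(xs) == Counter(ys)  # multiset: order ignored, counts kept
    return eq

bag_eq = make_collection_eq()
bag_nocase_eq = make_collection_eq(normalize=str.lower)
set_eq = make_collection_eq(ignore_duplicates=True)

print(bag_eq(['r1', 'r2', 'r2'], ['r2', 'r1', 'r2']))  # True
print(bag_eq(['r1', 'r2'], ['r1', 'r2', 'r2']))        # False: duplicate counts differ
print(set_eq(['r1', 'r2'], ['r1', 'r2', 'r2']))        # True
print(bag_nocase_eq(['rec1a'], ['REC1A']))             # True
print(bag_eq([], []))                                  # True
```

Any of these functions can be passed as the `check_equal` argument of the key-classification code earlier in the thread.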
John Machin wrote:
> John Henry wrote:
>> John,
>> Yes, there are several scenarios.
>> a) Comparing keys only.
>> That's been answered (although I haven't gotten it to work under 2.3
>> yet)
> (1) What's the problem with getting it to work under 2.3?
> (2) Why not upgrade?
Let me comment on this part first; I am still chewing on other parts of
your message.
When I do it under 2.3, I get:
common_eq = set(k for k in _common if a[k] == b[k])
^
SyntaxError: invalid syntax
Don't know why that is.
I can't upgrade yet. Some part of my code doesn't compile under 2.4
and I haven't had a chance to investigate further.
In <11*********************@h48g2000cwc.googlegroups. com>, John Henry
wrote:
> When I do it under 2.3, I get:
> common_eq = set(k for k in _common if a[k] == b[k])
> ^
> SyntaxError: invalid syntax
> Don't know why that is.
There are no generator expressions in 2.3. Turn it into a list
comprehension::
common_eq = set([k for k in _common if a[k] == b[k]])
Ciao,
Marc 'BlackJack' Rintsch
Thank you. That works.
Marc 'BlackJack' Rintsch wrote:
> [quoted text snipped]
I have gone the whole hog and got something that's runnable:
========dict_diff.py=============================
# Python 2.4 or later: uses a generator expression and print statements
from pprint import pprint as pp
a = {1:{'1':'1'}, 2:{'2':'2'}, 3:dict("AA BB CC".split()), 4:{'4':'4'}}
b = { 2:{'2':'2'}, 3:dict("BB CD EE".split()), 5:{'5':'5'}}
def record_comparator(a,b, check_equal):
    keya=set(a.keys())
    keyb=set(b.keys())
    a_xclusive = keya - keyb
    b_xclusive = keyb - keya
    _common = keya & keyb
    common_eq = set(k for k in _common if check_equal(a[k],b[k]))
    common_neq = _common - common_eq
    return {"A excl keys":a_xclusive, "B excl keys":b_xclusive,
            "Common & eq":common_eq, "Common keys neq values":common_neq}
comp_result = record_comparator(a,b, dict.__eq__)
# Further data on common keys with unequal values
common_neq = comp_result["Common keys neq values"]
common_neq = [ (key, record_comparator(a[key],b[key], str.__eq__))
               for key in common_neq ]
comp_result["Common keys neq values"] = common_neq
print "\na =",; pp(a)
print "\nb =",; pp(b)
print "\ncomp_result = " ; pp(comp_result)
==========================================
When run it gives:
a ={1: {'1': '1'},
2: {'2': '2'},
3: {'A': 'A', 'C': 'C', 'B': 'B'},
4: {'4': '4'}}
b ={2: {'2': '2'}, 3: {'C': 'D', 'B': 'B', 'E': 'E'}, 5: {'5': '5'}}
comp_result =
{'A excl keys': set([1, 4]),
'B excl keys': set([5]),
'Common & eq': set([2]),
'Common keys neq values': [(3,
{'A excl keys': set(['A']),
'B excl keys': set(['E']),
'Common & eq': set(['B']),
'Common keys neq values': set(['C'])})]}
- Paddy.
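Paddy's script uses Python 2 `print` statements. A rough Python 3 port might look like the sketch below; it swaps `dict.__eq__`/`str.__eq__` for `operator.eq` and uses dict key views as sets, so the printed formatting will differ from the Python 2 run shown above even though the bins are the same:

```python
import operator
from pprint import pprint

a = {1: {'1': '1'}, 2: {'2': '2'}, 3: dict("AA BB CC".split()), 4: {'4': '4'}}
b = {2: {'2': '2'}, 3: dict("BB CD EE".split()), 5: {'5': '5'}}


def record_comparator(a, b, check_equal):
    """Return Paddy's four bins; dict key views act as sets in Python 3."""
    common = a.keys() & b.keys()
    common_eq = {k for k in common if check_equal(a[k], b[k])}
    return {"A excl keys": a.keys() - b.keys(),
            "B excl keys": b.keys() - a.keys(),
            "Common & eq": common_eq,
            "Common keys neq values": common - common_eq}


# operator.eq stands in for dict.__eq__ / str.__eq__ at both levels
comp_result = record_comparator(a, b, operator.eq)

# Drill into each shared key whose record differs, field by field
comp_result["Common keys neq values"] = [
    (key, record_comparator(a[key], b[key], operator.eq))
    for key in sorted(comp_result["Common keys neq values"])
]

print("a =", a)
print("b =", b)
print("comp_result =")
pprint(comp_result)
```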