471,073 Members | 1,112 Online
Bytes | Software Development & Data Engineering Community
Post +

Home Posts Topics Members FAQ

Join Bytes to post your question to a community of 471,073 software developers and data experts.

Case-insensitive dict, non-destructive, fast, anyone?

I need a dict (well, it would be optimal anyway) class that stores the
keys as strings without coercing the case to upper or lower, but still
provides fast lookup (i.e. uses hash table).

d = CiDict([('Hi', 12),('hoho',13)])
d['hi']
12
d.keys()


['Hi','hoho']

Note that 'Hi' preserved the case. I imagine that 'Hi' and 'hi' would
need to share the same hash value in order for the lookup to be fast.

Anyone have a an implementation that I could use? Quick googling only
produced implementations that coerce all keys to lowercase.

--
Ville Vainio http://tinyurl.com/2prnb
Jul 18 '05 #1
11 1768
Ville Vainio wrote:
I need a dict (well, it would be optimal anyway) class that stores the
keys as strings without coercing the case to upper or lower, but still
provides fast lookup (i.e. uses hash table).


Store the original key together with the value and use a lowercase key
for lookup.

only a sketch:

class MyDict:
def __init__ (self):
self.inner = {}

def __setitem__ (self, key, value):
self.inner [key.lower ()] = (key, value]

def __getitem__ (self, key):
realkey, realvalue = self.inner [self]
return realvalue

def get (self, key, default = None):
try:
return self [key]
except KeyError:
return default
# or: return self.inner.get (key.lower (), (None, default)) [1]

def keys (self):
return [realkey for realkey, realval in self.inner.values ()]

def values (self):
return [realval for realkey, realval in self.inner.values )]

def items ():
return self.inner.values ()

# iterators are left as an exercise
Jul 18 '05 #2
>>>>> "Daniel" == Daniel Dittmar <da************@sap.corp> writes:

Daniel> Ville Vainio wrote:
I need a dict (well, it would be optimal anyway) class that
stores the keys as strings without coercing the case to upper
or lower, but still provides fast lookup (i.e. uses hash
table).


Daniel> Store the original key together with the value and use a
Daniel> lowercase key for lookup.

That's what I thought initially, but the strings take most of the
space in dict and I didn't feel like doubling the size.

It would be the "simplest thing that could possibly work", though.

--
Ville Vainio http://tinyurl.com/2prnb
Jul 18 '05 #3
Ville Vainio wrote:
>>"Daniel" == Daniel Dittmar <da************@sap.corp> writes:

Daniel> Ville Vainio wrote:
>> I need a dict (well, it would be optimal anyway) class that
>> stores the keys as strings without coercing the case to upper
>> or lower, but still provides fast lookup (i.e. uses hash
>> table).


Daniel> Store the original key together with the value and use a
Daniel> lowercase key for lookup.

That's what I thought initially, but the strings take most of the
space in dict and I didn't feel like doubling the size.


You could write a string wrapper that changes comparison and hashing.
I'm not sure that this (+ 1 (Python object + instance dictionary)) would
use less memory than the other proposal (+ 1 Tuple + 1 String).

Daniel

Jul 18 '05 #4
On 01 Apr 2005 15:55:58 +0300, Ville Vainio <vi***@spammers.com>
wrote:
>> "Daniel" == Daniel Dittmar <da************@sap.corp> writes:
Daniel> Ville Vainio wrote:
>> I need a dict (well, it would be optimal anyway) class that
>> stores the keys as strings without coercing the case to upper
>> or lower, but still provides fast lookup (i.e. uses hash
>> table).


Daniel> Store the original key together with the value and use a
Daniel> lowercase key for lookup.

That's what I thought initially, but the strings take most of the
space in dict and I didn't feel like doubling the size.

It would be the "simplest thing that could possibly work", though.


Try access the keys indirectly though another dictionary. That way
you don't have to change the original.

Lkeys = {}
for k dict.keys():
Lkeys[ k.lower] = dict[k]

Then use:

value = dict[ Lkeys[ key.lower() ] ]

To get your value from the original dictionary.

Watch out for duplicate keys in differing case in the original dict.
Ron

Jul 18 '05 #5
[Ville Vainio]
I need a dict (well, it would be optimal anyway) class that stores the
keys as strings without coercing the case to upper or lower, but still
provides fast lookup (i.e. uses hash table).


class S(str): def __hash__(self):
return hash(self.lower())
def __eq__(self, other):
return self.lower() == other.lower()

d = {}
d[S('ThE')] = 'quick'
d[S('the')] 'quick' d

{'ThE': 'quick'}
Raymond Hettinger
Jul 18 '05 #6
On Fri, 01 Apr 2005 18:52:00 GMT, "Raymond Hettinger" <vz******@verizon.net> wrote:
[Ville Vainio]
I need a dict (well, it would be optimal anyway) class that stores the
keys as strings without coercing the case to upper or lower, but still
provides fast lookup (i.e. uses hash table).


class S(str): def __hash__(self):
return hash(self.lower())
def __eq__(self, other):
return self.lower() == other.lower()

d = {}
d[S('ThE')] = 'quick'
d[S('the')]'quick' d{'ThE': 'quick'}

Building on your S to sneak in a generalized idea ;-)
class idict(dict): ... def __setitem__(self, key, value):
... dict.__setitem__(self, S(key), value)
... def __getitem__(self, key):
... return dict.__getitem__(self, S(key))
... d = idict()
d['ThE'] = 'quick'
d['the'] 'quick' d

{'ThE': 'quick'}

Ok, the idea:
I wonder if a dict with a general override hook for hashing all keys would be useful.
E.g., a dict.__keyhash__ that would take key as arg and default as now returning key.__hash__()
but that you could override. Seems like this could potentially be more efficient than key wrappers,
and would also make it so you wouldn't have to chase all the affected methods in doing an idict
like the above (e.g., get, setdefault, update etc. etc.)

Regards,
Bengt Richter
Jul 18 '05 #7
[Bengt Richter]
I wonder if a dict with a general override hook for hashing all keys would be useful. E.g., a dict.__keyhash__ that would take key as arg and default as now returning key.__hash__() but that you could override. Seems like this could potentially be more efficient than key wrappers, and would also make it so you wouldn't have to chase all the affected methods in doing an idict like the above (e.g., get, setdefault, update etc. etc.)


There has also been a discussion of adding a seqdict that maintains a keys in
insertion order. Another idea was to have a defaultdict that could be
instructed to create values as necessary (either zero for counting or [] for
list building). Putting the three ideas together, perhaps we should write an
extension module with a custom dictionary type with methods/attributes
implementing those ideas. Essentially, the thought is to put all the bells and
whistles in one place. If the extension became popular, it could ultimately
wend its way into the collections module.

A facetious naming idea would be to call it TrickyDict, a relative to DictMixin
and no relation to a former U.S. president ;-)

t = TrickyDict()
t.keytransform(str.lower)
t['abC'] = 1 # case insensitive dictionary
assert t['Abc'] == 1

t = TrickyDict()
t.setdefaultvalue(0)
t['x'] += 1 # very bag like
assert t['x'] == 1

t = TrickyDict()
t.setdefaultfunction(list)
t['x'] += ['first'] # rather like: t.setdefault('x',
[]).append('first')
t['x'] += ['second']
assert t == {'x': ['first', 'second']}

t = TrickyDict()
t.sortedkeys = True
t['x'] = 1
t['y'] = 2
assert t.keys() == ['x', 'y'] # order is guaranteed

This universal dictionary type could also incorporate some methods for other
recurrent themes such as a inverting a dictionary:

def invdict(self)
return self: dict((v,k) for k,v in self.iteritems()).

For the performance minded, there could also be a control for dictionary
speed/space efficiency:

t = TrickyDict()
t.maxkeydensity = 40 # resize whenever the dictionary is more than 40%
full

Taken together, these six attributes/methods could cover many wished for
features for the 10% of the cases where a regular dictionary doesn't provide the
best solution.
Raymond
Jul 18 '05 #8
On Fri, 01 Apr 2005 23:04:42 GMT, "Raymond Hettinger" <vz******@verizon.net> wrote:
[Bengt Richter]
I wonder if a dict with a general override hook for hashing all keys would beuseful.
E.g., a dict.__keyhash__ that would take key as arg and default as now

returning key.__hash__()
but that you could override. Seems like this could potentially be more

efficient than key wrappers,
and would also make it so you wouldn't have to chase all the affected methods

in doing an idict
like the above (e.g., get, setdefault, update etc. etc.)


There has also been a discussion of adding a seqdict that maintains a keys in
insertion order. Another idea was to have a defaultdict that could be
instructed to create values as necessary (either zero for counting or [] for
list building). Putting the three ideas together, perhaps we should write an
extension module with a custom dictionary type with methods/attributes
implementing those ideas. Essentially, the thought is to put all the bells and
whistles in one place. If the extension became popular, it could ultimately
wend its way into the collections module.

A facetious naming idea would be to call it TrickyDict, a relative to DictMixin
and no relation to a former U.S. president ;-)

t = TrickyDict()
t.keytransform(str.lower)
t['abC'] = 1 # case insensitive dictionary
assert t['Abc'] == 1

assert t.keys() == ['abC'] # actual keys unchanged (but transformed for hash and cmp)
[...]
Taken together, these six attributes/methods could cover many wished for
features for the 10% of the cases where a regular dictionary doesn't provide the
best solution.

You think as much as 10% ?

Regards,
Bengt Richter
Jul 18 '05 #9
> >Taken together, these six attributes/methods could cover many wished for
features for the 10% of the cases where a regular dictionary doesn't provide thebest solution.
You think as much as 10% ?


Rounded up from 9.6 ;-)

More important than the percentage is the clarity of the resulting code and the
avoidance of continous reinvention of workarounds.

Separating tool features into a basic and an advanced version is common solution
to managing option/customization complexity.
Raymond
Jul 18 '05 #10

"Raymond Hettinger" <vz******@verizon.net> wrote in message
news:rul3e.4647$db.487@trndny07...
More important than the percentage is the clarity of the resulting code
and the
avoidance of continous reinvention of workarounds.

Separating tool features into a basic and an advanced version is common
solution
to managing option/customization complexity.


A super bells & whistles dict would certainly pre-answer a lot of queries
and save much newsgroup/list response time.

tjr

Jul 18 '05 #11
>>>>> "Bengt" == Bengt Richter <bo**@oz.net> writes:

Bengt> I wonder if a dict with a general override hook for hashing
Bengt> all keys would be useful. E.g., a dict.__keyhash__ that
Bengt> would take key as arg and default as now returning
Bengt> key.__hash__() but that you could override. Seems like this

There would need to be an override hook for key comparison as well (I
suppose it always uses == operation now?). But yes, I think it would
be *much* more useful than any 'keytransform' feature - is there any
use for 'keytransform' feature apart from the uses that would be
better covered by hash/comparison hooks?

It would be lovely to have something like this in the stdlib (or
anywhere, for that matter). Think about the use cases for hashing via
by os.path.normcase, str.lower...

--
Ville Vainio http://tinyurl.com/2prnb
Jul 18 '05 #12

This discussion thread is closed

Replies have been disabled for this discussion.

Similar topics

17 posts views Thread by Newbie | last post: by
19 posts views Thread by Robert Scheer | last post: by
6 posts views Thread by deanfamily11 | last post: by
15 posts views Thread by YitzHandel | last post: by
7 posts views Thread by Lauren Quantrell | last post: by
22 posts views Thread by John | last post: by
17 posts views Thread by Navodit | last post: by

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.