473,385 Members | 1,734 Online
Bytes | Software Development & Data Engineering Community
Post Job

Home Posts Topics Members FAQ

Join Bytes to post your question to a community of 473,385 software developers and data experts.

Case-insensitive dict, non-destructive, fast, anyone?

I need a dict (well, it would be optimal anyway) class that stores the
keys as strings without coercing the case to upper or lower, but still
provides fast lookup (i.e. uses hash table).

d = CiDict([('Hi', 12),('hoho',13)])
d['hi']
12
d.keys()


['Hi','hoho']

Note that 'Hi' preserved the case. I imagine that 'Hi' and 'hi' would
need to share the same hash value in order for the lookup to be fast.

Anyone have a an implementation that I could use? Quick googling only
produced implementations that coerce all keys to lowercase.

--
Ville Vainio http://tinyurl.com/2prnb
Jul 18 '05 #1
11 1893
Ville Vainio wrote:
I need a dict (well, it would be optimal anyway) class that stores the
keys as strings without coercing the case to upper or lower, but still
provides fast lookup (i.e. uses hash table).


Store the original key together with the value and use a lowercase key
for lookup.

only a sketch:

class MyDict:
def __init__ (self):
self.inner = {}

def __setitem__ (self, key, value):
self.inner [key.lower ()] = (key, value]

def __getitem__ (self, key):
realkey, realvalue = self.inner [self]
return realvalue

def get (self, key, default = None):
try:
return self [key]
except KeyError:
return default
# or: return self.inner.get (key.lower (), (None, default)) [1]

def keys (self):
return [realkey for realkey, realval in self.inner.values ()]

def values (self):
return [realval for realkey, realval in self.inner.values )]

def items ():
return self.inner.values ()

# iterators are left as an exercise
Jul 18 '05 #2
>>>>> "Daniel" == Daniel Dittmar <da************@sap.corp> writes:

Daniel> Ville Vainio wrote:
I need a dict (well, it would be optimal anyway) class that
stores the keys as strings without coercing the case to upper
or lower, but still provides fast lookup (i.e. uses hash
table).


Daniel> Store the original key together with the value and use a
Daniel> lowercase key for lookup.

That's what I thought initially, but the strings take most of the
space in dict and I didn't feel like doubling the size.

It would be the "simplest thing that could possibly work", though.

--
Ville Vainio http://tinyurl.com/2prnb
Jul 18 '05 #3
Ville Vainio wrote:
>>"Daniel" == Daniel Dittmar <da************@sap.corp> writes:

Daniel> Ville Vainio wrote:
>> I need a dict (well, it would be optimal anyway) class that
>> stores the keys as strings without coercing the case to upper
>> or lower, but still provides fast lookup (i.e. uses hash
>> table).


Daniel> Store the original key together with the value and use a
Daniel> lowercase key for lookup.

That's what I thought initially, but the strings take most of the
space in dict and I didn't feel like doubling the size.


You could write a string wrapper that changes comparison and hashing.
I'm not sure that this (+ 1 (Python object + instance dictionary)) would
use less memory than the other proposal (+ 1 Tuple + 1 String).

Daniel

Jul 18 '05 #4
On 01 Apr 2005 15:55:58 +0300, Ville Vainio <vi***@spammers.com>
wrote:
>> "Daniel" == Daniel Dittmar <da************@sap.corp> writes:
Daniel> Ville Vainio wrote:
>> I need a dict (well, it would be optimal anyway) class that
>> stores the keys as strings without coercing the case to upper
>> or lower, but still provides fast lookup (i.e. uses hash
>> table).


Daniel> Store the original key together with the value and use a
Daniel> lowercase key for lookup.

That's what I thought initially, but the strings take most of the
space in dict and I didn't feel like doubling the size.

It would be the "simplest thing that could possibly work", though.


Try access the keys indirectly though another dictionary. That way
you don't have to change the original.

Lkeys = {}
for k dict.keys():
Lkeys[ k.lower] = dict[k]

Then use:

value = dict[ Lkeys[ key.lower() ] ]

To get your value from the original dictionary.

Watch out for duplicate keys in differing case in the original dict.
Ron

Jul 18 '05 #5
[Ville Vainio]
I need a dict (well, it would be optimal anyway) class that stores the
keys as strings without coercing the case to upper or lower, but still
provides fast lookup (i.e. uses hash table).


class S(str): def __hash__(self):
return hash(self.lower())
def __eq__(self, other):
return self.lower() == other.lower()

d = {}
d[S('ThE')] = 'quick'
d[S('the')] 'quick' d

{'ThE': 'quick'}
Raymond Hettinger
Jul 18 '05 #6
On Fri, 01 Apr 2005 18:52:00 GMT, "Raymond Hettinger" <vz******@verizon.net> wrote:
[Ville Vainio]
I need a dict (well, it would be optimal anyway) class that stores the
keys as strings without coercing the case to upper or lower, but still
provides fast lookup (i.e. uses hash table).


class S(str): def __hash__(self):
return hash(self.lower())
def __eq__(self, other):
return self.lower() == other.lower()

d = {}
d[S('ThE')] = 'quick'
d[S('the')]'quick' d{'ThE': 'quick'}

Building on your S to sneak in a generalized idea ;-)
class idict(dict): ... def __setitem__(self, key, value):
... dict.__setitem__(self, S(key), value)
... def __getitem__(self, key):
... return dict.__getitem__(self, S(key))
... d = idict()
d['ThE'] = 'quick'
d['the'] 'quick' d

{'ThE': 'quick'}

Ok, the idea:
I wonder if a dict with a general override hook for hashing all keys would be useful.
E.g., a dict.__keyhash__ that would take key as arg and default as now returning key.__hash__()
but that you could override. Seems like this could potentially be more efficient than key wrappers,
and would also make it so you wouldn't have to chase all the affected methods in doing an idict
like the above (e.g., get, setdefault, update etc. etc.)

Regards,
Bengt Richter
Jul 18 '05 #7
[Bengt Richter]
I wonder if a dict with a general override hook for hashing all keys would be useful. E.g., a dict.__keyhash__ that would take key as arg and default as now returning key.__hash__() but that you could override. Seems like this could potentially be more efficient than key wrappers, and would also make it so you wouldn't have to chase all the affected methods in doing an idict like the above (e.g., get, setdefault, update etc. etc.)


There has also been a discussion of adding a seqdict that maintains a keys in
insertion order. Another idea was to have a defaultdict that could be
instructed to create values as necessary (either zero for counting or [] for
list building). Putting the three ideas together, perhaps we should write an
extension module with a custom dictionary type with methods/attributes
implementing those ideas. Essentially, the thought is to put all the bells and
whistles in one place. If the extension became popular, it could ultimately
wend its way into the collections module.

A facetious naming idea would be to call it TrickyDict, a relative to DictMixin
and no relation to a former U.S. president ;-)

t = TrickyDict()
t.keytransform(str.lower)
t['abC'] = 1 # case insensitive dictionary
assert t['Abc'] == 1

t = TrickyDict()
t.setdefaultvalue(0)
t['x'] += 1 # very bag like
assert t['x'] == 1

t = TrickyDict()
t.setdefaultfunction(list)
t['x'] += ['first'] # rather like: t.setdefault('x',
[]).append('first')
t['x'] += ['second']
assert t == {'x': ['first', 'second']}

t = TrickyDict()
t.sortedkeys = True
t['x'] = 1
t['y'] = 2
assert t.keys() == ['x', 'y'] # order is guaranteed

This universal dictionary type could also incorporate some methods for other
recurrent themes such as a inverting a dictionary:

def invdict(self)
return self: dict((v,k) for k,v in self.iteritems()).

For the performance minded, there could also be a control for dictionary
speed/space efficiency:

t = TrickyDict()
t.maxkeydensity = 40 # resize whenever the dictionary is more than 40%
full

Taken together, these six attributes/methods could cover many wished for
features for the 10% of the cases where a regular dictionary doesn't provide the
best solution.
Raymond
Jul 18 '05 #8
On Fri, 01 Apr 2005 23:04:42 GMT, "Raymond Hettinger" <vz******@verizon.net> wrote:
[Bengt Richter]
I wonder if a dict with a general override hook for hashing all keys would beuseful.
E.g., a dict.__keyhash__ that would take key as arg and default as now

returning key.__hash__()
but that you could override. Seems like this could potentially be more

efficient than key wrappers,
and would also make it so you wouldn't have to chase all the affected methods

in doing an idict
like the above (e.g., get, setdefault, update etc. etc.)


There has also been a discussion of adding a seqdict that maintains a keys in
insertion order. Another idea was to have a defaultdict that could be
instructed to create values as necessary (either zero for counting or [] for
list building). Putting the three ideas together, perhaps we should write an
extension module with a custom dictionary type with methods/attributes
implementing those ideas. Essentially, the thought is to put all the bells and
whistles in one place. If the extension became popular, it could ultimately
wend its way into the collections module.

A facetious naming idea would be to call it TrickyDict, a relative to DictMixin
and no relation to a former U.S. president ;-)

t = TrickyDict()
t.keytransform(str.lower)
t['abC'] = 1 # case insensitive dictionary
assert t['Abc'] == 1

assert t.keys() == ['abC'] # actual keys unchanged (but transformed for hash and cmp)
[...]
Taken together, these six attributes/methods could cover many wished for
features for the 10% of the cases where a regular dictionary doesn't provide the
best solution.

You think as much as 10% ?

Regards,
Bengt Richter
Jul 18 '05 #9
> >Taken together, these six attributes/methods could cover many wished for
features for the 10% of the cases where a regular dictionary doesn't provide thebest solution.
You think as much as 10% ?


Rounded up from 9.6 ;-)

More important than the percentage is the clarity of the resulting code and the
avoidance of continous reinvention of workarounds.

Separating tool features into a basic and an advanced version is common solution
to managing option/customization complexity.
Raymond
Jul 18 '05 #10

"Raymond Hettinger" <vz******@verizon.net> wrote in message
news:rul3e.4647$db.487@trndny07...
More important than the percentage is the clarity of the resulting code
and the
avoidance of continous reinvention of workarounds.

Separating tool features into a basic and an advanced version is common
solution
to managing option/customization complexity.


A super bells & whistles dict would certainly pre-answer a lot of queries
and save much newsgroup/list response time.

tjr

Jul 18 '05 #11
>>>>> "Bengt" == Bengt Richter <bo**@oz.net> writes:

Bengt> I wonder if a dict with a general override hook for hashing
Bengt> all keys would be useful. E.g., a dict.__keyhash__ that
Bengt> would take key as arg and default as now returning
Bengt> key.__hash__() but that you could override. Seems like this

There would need to be an override hook for key comparison as well (I
suppose it always uses == operation now?). But yes, I think it would
be *much* more useful than any 'keytransform' feature - is there any
use for 'keytransform' feature apart from the uses that would be
better covered by hash/comparison hooks?

It would be lovely to have something like this in the stdlib (or
anywhere, for that matter). Think about the use cases for hashing via
by os.path.normcase, str.lower...

--
Ville Vainio http://tinyurl.com/2prnb
Jul 18 '05 #12

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

17
by: Newbie | last post by:
Dear friends, I am having a hard time understanding how to use a SELECT CASE in ASP. I have used it in VB but never in ASP scripting. Scenerio: I have 2 textboxes on a form that I have to...
19
by: Robert Scheer | last post by:
Hi. In VBScript I can use a Select Case statement like that: Select Case X Case 1 to 10 'X is between 1 and 10 Case 11,14,16 'X is 11 or 14 or 16 End Select
2
by: cs168 | last post by:
Hi I am new in ASP programming so I do use the very basic and simple way to do all my stuff. Now I do really got stuck at how can I loop thru the calculation for all my selection.. My full code is as...
6
by: deanfamily11 | last post by:
I've set up a case statement to have my program determine where on the Cartesian plane a point the user enters is located. I keep getting the C2051 error when I compile. Any help? #include...
15
by: YitzHandel | last post by:
The following is an extract from my program, which I seperated out and compiled as its own program. The problem is that as soon as a case tests true, all the cases below it execute as well. So...
10
by: MLH | last post by:
Suppose the following... Dim A as Date A=#7/24/2005# I wish to compare value of A against 2 other values: 1) 8/1/2005 2) 9/1/2005 Which is better and why... First:
7
by: Lauren Quantrell | last post by:
Is there any speed/resource advantage/disadvantage in using Select Case x Case 1 Case 2 etc. many more cases... End Select VS.
22
by: John | last post by:
Hi Folks, I'm experimenting a little with creating a custom CEdit control so that I can decide on what the user is allowed to type into the control. I started off only allowing floating point...
17
by: Navodit | last post by:
So I have some code like: if (document.Insurance.State.selectedIndex == 1) { ifIll(); } else if (document.Insurance.State.selectedIndex == 2) { elseKan(); }
24
by: clockworx05 | last post by:
Hey guys i have this program that i need to write for class. here are the instructions: Write a function called foo that asks the user for their age. Pass the age value to a function called...
1
by: CloudSolutions | last post by:
Introduction: For many beginners and individual users, requiring a credit card and email registration may pose a barrier when starting to use cloud servers. However, some cloud server providers now...
0
by: Faith0G | last post by:
I am starting a new it consulting business and it's been a while since I setup a new website. Is wordpress still the best web based software for hosting a 5 page website? The webpages will be...
0
by: ryjfgjl | last post by:
In our work, we often need to import Excel data into databases (such as MySQL, SQL Server, Oracle) for data analysis and processing. Usually, we use database tools like Navicat or the Excel import...
0
by: taylorcarr | last post by:
A Canon printer is a smart device known for being advanced, efficient, and reliable. It is designed for home, office, and hybrid workspace use and can also be used for a variety of purposes. However,...
0
by: Charles Arthur | last post by:
How do i turn on java script on a villaon, callus and itel keypad mobile phone
0
by: ryjfgjl | last post by:
In our work, we often receive Excel tables with data in the same format. If we want to analyze these data, it can be difficult to analyze them because the data is spread across multiple Excel files...
0
by: emmanuelkatto | last post by:
Hi All, I am Emmanuel katto from Uganda. I want to ask what challenges you've faced while migrating a website to cloud. Please let me know. Thanks! Emmanuel
0
BarryA
by: BarryA | last post by:
What are the essential steps and strategies outlined in the Data Structures and Algorithms (DSA) roadmap for aspiring data scientists? How can individuals effectively utilize this roadmap to progress...
0
by: Hystou | last post by:
There are some requirements for setting up RAID: 1. The motherboard and BIOS support RAID configuration. 2. The motherboard has 2 or more available SATA protocol SSD/HDD slots (including MSATA, M.2...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.