469,323 Members | 1,560 Online
Bytes | Developer Community
New Post

Home Posts Topics Members FAQ

Post your question to a community of 469,323 developers. It's quick & easy.

Rookie question about data types (hashtables)

Hello all -

I'm a Java and C++ developer, looking to pick up some Python and
wxPython skills for prototyping and quick projects. I've been developing
for about 15 years, and am very familiar with the wxWindows framework (I
wrote one of the early Java bindings for it), but I've only been exposed
to Python for a few days.

I like to learn by doing, so I decided to just dive in with a small
project I have a need for... a Spanish-English dictionary that lives in
the system tray and pops up on demand with a hotkey. I've borrowed the
dictionary data from the Pythońol project on SourceForge, and learned the
Python methods for file I/O and parsing strings. I now have 80,000 or so
cleanly-parsed Spanish-English word pairs, and need to find the
appropriate data structure to store them in.

What I would LIKE to have is a hashtable-like structure that lets me
retrieve multiple values based on a partial key. I want to have a text
field for the user to type in a Spanish word, and a box below containing
a list of English words. With each character that the user types, I
would like to narrow down the words listed in the box below. That is,
when you type the letter "a" the box will contain all English words
starting with the letter "a". When you then type "n", the box will
narrow further and contain all words starting with "an", etc... until
you've narrowed it down to appropriate word. I figure that this will be
easier than having users enter entire words in one shot and then try to
implement some intelligence for near-matches.

Therefore, it would be great if there was some hashtable-like data
structure that lets me say "Return all values where the key starts with
'an'". Actually, it would be even better if I could provide multiple
options and get a union in one pass... such as "Return all values where
the key starts with 'an' or 'ań'".

I haven't yet turned up EXACTLY what I'm looking for, but I thought
I would ask in case everyone would say, "Duh! There's a native data type
for that, silly!". Even if that's not the case, does anyone know of any
examples out there I can look at where such a data structure has been
implemented manually? Thanks!

Steve
Jul 18 '05 #1
7 1601
Steve D. Perkins wrote:
What I would LIKE to have is a hashtable-like structure that lets
me retrieve multiple values based on a partial key.


Sounds like a sorted list would work best. A binary search (see the bisect
module) lets find all matches in O(log n) time.
--
Rainer Deyke - ra*****@eldwood.com - http://eldwood.com
Jul 18 '05 #2
the bsddb module has a binary tree database but that might be overkill.
Other than that I doubt there's anything in standard python that would help.

If you don't plan to add any words to your dictionary you could order the
list and use a binary search to find the entries.
you could use python list for that :-)
Jul 18 '05 #3
Rainer Deyke" <ra*****@eldwood.com> wrote:
What I would LIKE to have is a hashtable-like structure that lets
me retrieve multiple values based on a partial key.
Sounds like a sorted list would work best. A binary search (see the bisect
module) lets find all matches in O(log n) time.


Probably yes. I'm replying just in case someone would like to have
their hash values sorted in alphabetical order too. The code below
might be of interest to hobbyist programmers who don't care about
practical issues :-) (also not tested beyond what you see)

Anton

from string import ascii_lowercase as alc

def antonian_sorted_unrank(k,base,ndig):
res = []
while k > -1 :
r,k = divmod(k,int('1'*ndig,base))
k,ndig = k-1,ndig-1
res.append(r)
return res

def antonian_sorted_rank(seq, base,ndig):
res = 0
for x in seq:
res = x*int('1'*ndig,base)+res+1
ndig -= 1
return res-1

def word_rank(word,ndig):
d = [alc.index(c) for c in word]
return antonian_sorted_rank(d,len(alc),ndig)

def word_unrank(i,ndig):
L = antonian_sorted_unrank(i,len(alc),ndig)
return "".join([alc[j] for j in L])

def test():
L = ["a","ab","aa","b","bab","ba"]
maxlen = max(map(len,L))
print L
def wu(i): return word_unrank(i,maxlen)
def wr(w): return word_rank(w,maxlen)
H = map(wr,L)
print H
H.sort()
print H
R = map(wu,H)
print R
L.sort()
assert L == R

if __name__=='__main__':
test()

Jul 18 '05 #4

"Steve D. Perkins" <do************@hotmail.com> wrote:
Hello all - Hi
when you type the letter "a" the box will contain all English words
starting with the letter "a". When you then type "n", the box will
narrow further and contain all words starting with "an", etc... until
Maybe you could use a normal python dictionary (you call it hashtable).
Look at this script (You would need one dictionary for each direction.):

#!/usr/bin/python

# generate dictionaries (AKA hashtables).
# direction: English --> German
EN2DE = {}
# direction: German --> English
DE2EN = {}

# inserts a word and its translation into the dict d
def insertWord( d, key, val):
# does the key exist
if d.has_key( key):
# yes
d[ key].append( val)
else:
# no
d[ key] = [val]

# returns a list words that begin with s
def getBeginWith( d, s):
l = s.__len__()
res = []
keys = d.keys()
for k in keys:
if k.__len__()>=l and k[:l] == s:
# use "k[:l].lower() == s.lower()" to ignore case
res.append( k)
return res

# insert some entries
insertWord( EN2DE, "someword", "meaning 1")
insertWord( EN2DE, "someword", "meaning 2")
insertWord( EN2DE, "my_codingstyle", "evil")
insertWord( EN2DE, "son", "Sohn")
insertWord( EN2DE, "sunday", "Sonntag")
insertWord( EN2DE, "wednesday", "Mittwoch")
insertWord( EN2DE, "Weapon of mass destruction", "George Walker Bush")
insertWord( EN2DE, "sold", "verkauft")

# show that it works
print "all keys that begin with 'so': ", getBeginWith( EN2DE, "so")
print "translations for 'someword': ", EN2DE['someword']

Instead of doing this you could maybe subclass UserList from the standard
library. Steve


HTH, Axel
Jul 18 '05 #5
Steve D. Perkins wrote:
[snip]
What I would LIKE to have is a hashtable-like structure that lets me
retrieve multiple values based on a partial key.
AFAIK, if you can do queries on a "hashtable-like structure" with a partial key,
it cannot be a real hashtable...
I want to have a text
field for the user to type in a Spanish word, and a box below containing
a list of English words. With each character that the user types, I
would like to narrow down the words listed in the box below. That is,
when you type the letter "a" the box will contain all English words
starting with the letter "a". When you then type "n", the box will
narrow further and contain all words starting with "an", etc... until
you've narrowed it down to appropriate word.


I'd use a Python dictionary indexed not on words, but on letters in the word.
So, for a dictionary translating english words into french, you'd have for example:

dictionary['m']['o']['n']['d']['a']['y']['translation'] = 'lundi'

(NB: having 'translation' as the last key is to solve the problem of words
included in another; see 'sun' and 'sunday'. BTW, any key that is not a single
character can be used...)

Here is an example implementation:

--------------------------------------------------------
def addWord(myDict, originalWord, translatedWord):
d = myDict
for c in originalWord:
if not d.has_key(c): d[c] = {}
d = d[c]
d['translation'] = translatedWord

def _allTranslations(rootWord, d):
res = []
for c in d.keys():
if c == 'translation':
res.append((rootWord, d[c]))
else:
res += _allTranslations(rootWord + c, d[c])
return res

def getTranslations(myDict, word):
d = myDict
for c in word:
if not d.has_key(c): return []
d = d[c]
return _allTranslations(word, d)
--------------------------------------------------------

And then you'd do:

frenchDict = {}
## Week days
addWord(frenchDict, 'monday', 'lundi')
addWord(frenchDict, 'tuesday', 'mardi')
# ...
addWord(frenchDict, 'sunday', 'dimanche')
## Months
addWord(frenchDict, 'january', 'janvier')
addWord(frenchDict, 'february', 'fevrier')
# ...
addWord(frenchDict, 'december', 'decembre')

print getTranslations(frenchDict, 'wednesday')
print getTranslations(frenchDict, 'ju')
print getTranslations(frenchDict, 's')

I don't know how it scales up for big dictionaries, not to mention huge ones...

HTH
--
- Eric Brunel <eric dot brunel at pragmadev dot com> -
PragmaDev : Real Time Software Development Tools - http://www.pragmadev.com

Jul 18 '05 #6
>> What I would LIKE to have is a hashtable-like structure that lets
me retrieve multiple values based on a partial key.


Sounds like a sorted list would work best. A binary search (see the
bisect module) lets find all matches in O(log n) time.

Hmm... a double-linked-list with recursion, like back in Freshman
Programming 101? I'm embarrassed to say that this didn't even cross my
mind originally. I've been coding in Java for too long, when I encounter
programs my instincts are to find an open-source plug-in that's already
been written to solve the problem.

Jul 18 '05 #7
Steve D. Perkins wrote:
Sounds like a sorted list would work best. A binary search (see the
bisect module) lets find all matches in O(log n) time.

Hmm... a double-linked-list with recursion, like back in Freshman
Programming 101?


I was thinking about a standard Python list, which is actually implemented
as an array, not a linked list. Binary searches don't work on linked lists.
--
Rainer Deyke - ra*****@eldwood.com - http://eldwood.com
Jul 18 '05 #8

This discussion thread is closed

Replies have been disabled for this discussion.

Similar topics

10 posts views Thread by William S. Perrin | last post: by
8 posts views Thread by Robin Tucker | last post: by
19 posts views Thread by Dennis | last post: by
1 post views Thread by Curtis | last post: by
3 posts views Thread by Dst | last post: by
1 post views Thread by CARIGAR | last post: by
reply views Thread by zhoujie | last post: by
reply views Thread by suresh191 | last post: by
reply views Thread by Gurmeet2796 | last post: by
By using this site, you agree to our Privacy Policy and Terms of Use.