473,320 Members | 1,863 Online
Bytes | Software Development & Data Engineering Community
Post Job

Home Posts Topics Members FAQ

Join Bytes to post your question to a community of 473,320 software developers and data experts.

Rookie question about data types (hashtables)

Hello all -

I'm a Java and C++ developer, looking to pick up some Python and
wxPython skills for prototyping and quick projects. I've been developing
for about 15 years, and am very familiar with the wxWindows framework (I
wrote one of the early Java bindings for it), but I've only been exposed
to Python for a few days.

I like to learn by doing, so I decided to just dive in with a small
project I have a need for... a Spanish-English dictionary that lives in
the system tray and pops up on demand with a hotkey. I've borrowed the
dictionary data from the Pythoñol project on SourceForge, and learned the
Python methods for file I/O and parsing strings. I now have 80,000 or so
cleanly-parsed Spanish-English word pairs, and need to find the
appropriate data structure to store them in.

What I would LIKE to have is a hashtable-like structure that lets me
retrieve multiple values based on a partial key. I want to have a text
field for the user to type in a Spanish word, and a box below containing
a list of English words. With each character that the user types, I
would like to narrow down the words listed in the box below. That is,
when you type the letter "a" the box will contain all English words
starting with the letter "a". When you then type "n", the box will
narrow further and contain all words starting with "an", etc... until
you've narrowed it down to appropriate word. I figure that this will be
easier than having users enter entire words in one shot and then try to
implement some intelligence for near-matches.

Therefore, it would be great if there was some hashtable-like data
structure that lets me say "Return all values where the key starts with
'an'". Actually, it would be even better if I could provide multiple
options and get a union in one pass... such as "Return all values where
the key starts with 'an' or 'añ'".

I haven't yet turned up EXACTLY what I'm looking for, but I thought
I would ask in case everyone would say, "Duh! There's a native data type
for that, silly!". Even if that's not the case, does anyone know of any
examples out there I can look at where such a data structure has been
implemented manually? Thanks!

Steve
Jul 18 '05 #1
7 1735
Steve D. Perkins wrote:
What I would LIKE to have is a hashtable-like structure that lets
me retrieve multiple values based on a partial key.


Sounds like a sorted list would work best. A binary search (see the bisect
module) lets find all matches in O(log n) time.
--
Rainer Deyke - ra*****@eldwood.com - http://eldwood.com
Jul 18 '05 #2
the bsddb module has a binary tree database but that might be overkill.
Other than that I doubt there's anything in standard python that would help.

If you don't plan to add any words to your dictionary you could order the
list and use a binary search to find the entries.
you could use python list for that :-)
Jul 18 '05 #3
Rainer Deyke" <ra*****@eldwood.com> wrote:
What I would LIKE to have is a hashtable-like structure that lets
me retrieve multiple values based on a partial key.
Sounds like a sorted list would work best. A binary search (see the bisect
module) lets find all matches in O(log n) time.


Probably yes. I'm replying just in case someone would like to have
their hash values sorted in alphabetical order too. The code below
might be of interest to hobbyist programmers who don't care about
practical issues :-) (also not tested beyond what you see)

Anton

from string import ascii_lowercase as alc

def antonian_sorted_unrank(k,base,ndig):
res = []
while k > -1 :
r,k = divmod(k,int('1'*ndig,base))
k,ndig = k-1,ndig-1
res.append(r)
return res

def antonian_sorted_rank(seq, base,ndig):
res = 0
for x in seq:
res = x*int('1'*ndig,base)+res+1
ndig -= 1
return res-1

def word_rank(word,ndig):
d = [alc.index(c) for c in word]
return antonian_sorted_rank(d,len(alc),ndig)

def word_unrank(i,ndig):
L = antonian_sorted_unrank(i,len(alc),ndig)
return "".join([alc[j] for j in L])

def test():
L = ["a","ab","aa","b","bab","ba"]
maxlen = max(map(len,L))
print L
def wu(i): return word_unrank(i,maxlen)
def wr(w): return word_rank(w,maxlen)
H = map(wr,L)
print H
H.sort()
print H
R = map(wu,H)
print R
L.sort()
assert L == R

if __name__=='__main__':
test()

Jul 18 '05 #4

"Steve D. Perkins" <do************@hotmail.com> wrote:
Hello all - Hi
when you type the letter "a" the box will contain all English words
starting with the letter "a". When you then type "n", the box will
narrow further and contain all words starting with "an", etc... until
Maybe you could use a normal python dictionary (you call it hashtable).
Look at this script (You would need one dictionary for each direction.):

#!/usr/bin/python

# generate dictionaries (AKA hashtables).
# direction: English --> German
EN2DE = {}
# direction: German --> English
DE2EN = {}

# inserts a word and its translation into the dict d
def insertWord( d, key, val):
# does the key exist
if d.has_key( key):
# yes
d[ key].append( val)
else:
# no
d[ key] = [val]

# returns a list words that begin with s
def getBeginWith( d, s):
l = s.__len__()
res = []
keys = d.keys()
for k in keys:
if k.__len__()>=l and k[:l] == s:
# use "k[:l].lower() == s.lower()" to ignore case
res.append( k)
return res

# insert some entries
insertWord( EN2DE, "someword", "meaning 1")
insertWord( EN2DE, "someword", "meaning 2")
insertWord( EN2DE, "my_codingstyle", "evil")
insertWord( EN2DE, "son", "Sohn")
insertWord( EN2DE, "sunday", "Sonntag")
insertWord( EN2DE, "wednesday", "Mittwoch")
insertWord( EN2DE, "Weapon of mass destruction", "George Walker Bush")
insertWord( EN2DE, "sold", "verkauft")

# show that it works
print "all keys that begin with 'so': ", getBeginWith( EN2DE, "so")
print "translations for 'someword': ", EN2DE['someword']

Instead of doing this you could maybe subclass UserList from the standard
library. Steve


HTH, Axel
Jul 18 '05 #5
Steve D. Perkins wrote:
[snip]
What I would LIKE to have is a hashtable-like structure that lets me
retrieve multiple values based on a partial key.
AFAIK, if you can do queries on a "hashtable-like structure" with a partial key,
it cannot be a real hashtable...
I want to have a text
field for the user to type in a Spanish word, and a box below containing
a list of English words. With each character that the user types, I
would like to narrow down the words listed in the box below. That is,
when you type the letter "a" the box will contain all English words
starting with the letter "a". When you then type "n", the box will
narrow further and contain all words starting with "an", etc... until
you've narrowed it down to appropriate word.


I'd use a Python dictionary indexed not on words, but on letters in the word.
So, for a dictionary translating english words into french, you'd have for example:

dictionary['m']['o']['n']['d']['a']['y']['translation'] = 'lundi'

(NB: having 'translation' as the last key is to solve the problem of words
included in another; see 'sun' and 'sunday'. BTW, any key that is not a single
character can be used...)

Here is an example implementation:

--------------------------------------------------------
def addWord(myDict, originalWord, translatedWord):
d = myDict
for c in originalWord:
if not d.has_key(c): d[c] = {}
d = d[c]
d['translation'] = translatedWord

def _allTranslations(rootWord, d):
res = []
for c in d.keys():
if c == 'translation':
res.append((rootWord, d[c]))
else:
res += _allTranslations(rootWord + c, d[c])
return res

def getTranslations(myDict, word):
d = myDict
for c in word:
if not d.has_key(c): return []
d = d[c]
return _allTranslations(word, d)
--------------------------------------------------------

And then you'd do:

frenchDict = {}
## Week days
addWord(frenchDict, 'monday', 'lundi')
addWord(frenchDict, 'tuesday', 'mardi')
# ...
addWord(frenchDict, 'sunday', 'dimanche')
## Months
addWord(frenchDict, 'january', 'janvier')
addWord(frenchDict, 'february', 'fevrier')
# ...
addWord(frenchDict, 'december', 'decembre')

print getTranslations(frenchDict, 'wednesday')
print getTranslations(frenchDict, 'ju')
print getTranslations(frenchDict, 's')

I don't know how it scales up for big dictionaries, not to mention huge ones...

HTH
--
- Eric Brunel <eric dot brunel at pragmadev dot com> -
PragmaDev : Real Time Software Development Tools - http://www.pragmadev.com

Jul 18 '05 #6
>> What I would LIKE to have is a hashtable-like structure that lets
me retrieve multiple values based on a partial key.


Sounds like a sorted list would work best. A binary search (see the
bisect module) lets find all matches in O(log n) time.

Hmm... a double-linked-list with recursion, like back in Freshman
Programming 101? I'm embarrassed to say that this didn't even cross my
mind originally. I've been coding in Java for too long, when I encounter
programs my instincts are to find an open-source plug-in that's already
been written to solve the problem.

Jul 18 '05 #7
Steve D. Perkins wrote:
Sounds like a sorted list would work best. A binary search (see the
bisect module) lets find all matches in O(log n) time.

Hmm... a double-linked-list with recursion, like back in Freshman
Programming 101?


I was thinking about a standard Python list, which is actually implemented
as an array, not a linked list. Binary searches don't work on linked lists.
--
Rainer Deyke - ra*****@eldwood.com - http://eldwood.com
Jul 18 '05 #8

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

12
by: Don Bruder | last post by:
A week or two ago, I asked here about porting Python to C. Got some good answers (Note 1) but now I've got another question. Actually, more a request for clarification of a topic that both the...
10
by: William S. Perrin | last post by:
I'm a python rookie, anyone have and suggestions to streamline this function? Thanks in advance..... def getdata(myurl): sock = urllib.urlopen(myurl) xmlSrc = sock.read() sock.close() ...
11
by: Don Bruder | last post by:
Got a stumper here. I imagine that for someone experienced in C++, this is too pathetic for words. For a rookie, using this project as a sort of "midterm exam" in his self-taught "how to program in...
1
by: Chris Austin | last post by:
A co-worker asked me today what would happen if Session.Add() method was called multiple times with the same key. My first thought was that it would throw some sort of duplicate key exception. ...
8
by: Robin Tucker | last post by:
When I create a hashtable hashing on Object-->Item, can I mix "string" and "integer" as the key types? I have a single thumbnail cache for a database with (hashed on key) and a file view (hashed...
19
by: Dennis | last post by:
I have a public variable in a class of type color declared as follows: public mycolor as color = color.Empty I want to check to see if the user has specified a color like; if mycolor =...
1
by: Curtis | last post by:
Does anyone know the proper method to save information to embedded hashtables. I am trying to save parent/child information to hashtables but I am not getting the correct results I have made a...
3
by: Dst | last post by:
Hi i'm trying to make a very simple web site using visual studio 2005. I'm completely noob at this so i need some pointers to get me started. As i understand frames should not be used in...
2
by: PAzevedo | last post by:
I have this Hashtable of Hashtables, and I'm accessing this object from multiple threads, now the Hashtable object is thread safe for reading, but not for writing, so I lock the object every time I...
0
by: DolphinDB | last post by:
The formulas of 101 quantitative trading alphas used by WorldQuant were presented in the paper 101 Formulaic Alphas. However, some formulas are complex, leading to challenges in calculation. Take...
0
isladogs
by: isladogs | last post by:
The next Access Europe meeting will be on Wednesday 6 Mar 2024 starting at 18:00 UK time (6PM UTC) and finishing at about 19:15 (7.15PM). In this month's session, we are pleased to welcome back...
0
by: Vimpel783 | last post by:
Hello! Guys, I found this code on the Internet, but I need to modify it a little. It works well, the problem is this: Data is sent from only one cell, in this case B5, but it is necessary that data...
0
by: jfyes | last post by:
As a hardware engineer, after seeing that CEIWEI recently released a new tool for Modbus RTU Over TCP/UDP filtering and monitoring, I actively went to its official website to take a look. It turned...
1
by: PapaRatzi | last post by:
Hello, I am teaching myself MS Access forms design and Visual Basic. I've created a table to capture a list of Top 30 singles and forms to capture new entries. The final step is a form (unbound)...
0
by: Defcon1945 | last post by:
I'm trying to learn Python using Pycharm but import shutil doesn't work
1
by: Shællîpôpï 09 | last post by:
If u are using a keypad phone, how do u turn on JavaScript, to access features like WhatsApp, Facebook, Instagram....
0
by: af34tf | last post by:
Hi Guys, I have a domain whose name is BytesLimited.com, and I want to sell it. Does anyone know about platforms that allow me to list my domain in auction for free. Thank you
0
by: Faith0G | last post by:
I am starting a new it consulting business and it's been a while since I setup a new website. Is wordpress still the best web based software for hosting a 5 page website? The webpages will be...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.