Petr Jakeš wrote:
I have a standard 12-key mobile phone keypad connected to my Linux
machine as a I2C peripheral. I would like to write a code which allows
the text entry to the computer using this keypad (something like T9 on
the mobile phones)
According to the http://www.yorku.ca/mack/uist01.html
dictionary-based disambiguation is coming in the mind.
With dictionary-based disambiguation, each key is pressed only once.
For example, to enter the, the user enters 8-4-3-0. The 0 key, for
SPACE, delimits words and terminates disambiguation of the preceding
keys. The key sequence 8-4-3 has 3 × 3 × 3 = 27 possible renderings
(see Figure 1). The system compares the possibilities to a dictionary
of words to guess the intended word.
I would like to ask some guru here to give me the direction which
technique (Python functionality) or which strategy to use to solve
this riddle.
Thanks for your advices and comments
Regards
Petr Jakes
I can think of 2 approaches to this, 1) Map the numbers to parts of a
regular expression, and then use this to search through the
dictiionary. 2) Pre-compute a copy of the dictionary converted to it's
numerical equivalent, then just match the numbers.
The basic structure you need for both of these is simple. For the
first method you use
keys = ['','abc','def','ghi',....']
then if you have s="123321"
''.join(['[%s]' % keys[int(l)] for l in s])
will give you a string like
'[abc][def][ghi][def][abc]', which you can then use to match words...
I think the second solution would end up being faster, as long as you
have the memory - no regex work, plus, you can sort the wordlist.
The following quickly written class seems to work nicely:
import string
import bisect
letters = string.lowercase
numbers = '2223334445556667777888999'
letter_mapping = dict(zip(letters, numbers))
class phone:
def __init__(self):
self.read_dictionary()
def word_as_numbers(self, word):
nums=''
for letter in word:
if letter in letter_mapping:
nums += letter_mapping[letter]
return nums
def read_dictionary(self):
words = []
for line in file("/usr/share/dict/words"):
word = line.strip().lower()
nums = self.word_as_numbers(word)
words.append((nums, word))
words.sort()
self.dict = words
def get_matching_words(self, number_str):
tup = (number_str,)
left = bisect.bisect_left(self.dict, tup)
for num, word in self.dict[left:]:
if num.startswith(number_str):
yield word
else:
break
It takes a second or two to read the list of words in, but matching is
instant thanks to bisect:
In [14]:%time p=phone.phone()
CPU times: user 1.65 s, sys: 0.00 s, total: 1.65 s
Wall time: 1.66
In [15]:%time list(p.get_matching_words('43556'))
CPU times: user 0.00 s, sys: 0.00 s, total: 0.00 s
Wall time: 0.01
Out[15]:['hello', 'hellman', "hellman's", "hello's", 'hellos']
It seems the ruby version just posted takes a similar approach, but
uses an actual tree.. using the bisect module keeps it simple.
--
- Justin