473,386 Members | 1,798 Online
Bytes | Software Development & Data Engineering Community
Post Job

Home Posts Topics Members FAQ

Join Bytes to post your question to a community of 473,386 software developers and data experts.

technique to enter text using a mobile phone keypad (T9dictionary-based disambiguation)

I have a standard 12-key mobile phone keypad connected to my Linux
machine as a I2C peripheral. I would like to write a code which allows
the text entry to the computer using this keypad (something like T9 on
the mobile phones)

According to the http://www.yorku.ca/mack/uist01.html
dictionary-based disambiguation is coming in the mind.

With dictionary-based disambiguation, each key is pressed only once.
For example, to enter the, the user enters 8-4-3-0. The 0 key, for
SPACE, delimits words and terminates disambiguation of the preceding
keys. The key sequence 8-4-3 has 3 × 3 × 3 = 27 possible renderings
(see Figure 1). The system compares the possibilities to a dictionary
of words to guess the intended word.

I would like to ask some guru here to give me the direction which
technique (Python functionality) or which strategy to use to solve
this riddle.

Thanks for your advices and comments

Regards

Petr Jakes

Aug 8 '06 #1
10 6381

Petr Jakeš wrote:
I have a standard 12-key mobile phone keypad connected to my Linux
machine as a I2C peripheral. I would like to write a code which allows
the text entry to the computer using this keypad (something like T9 on
the mobile phones)

According to the http://www.yorku.ca/mack/uist01.html
dictionary-based disambiguation is coming in the mind.

With dictionary-based disambiguation, each key is pressed only once.
For example, to enter the, the user enters 8-4-3-0. The 0 key, for
SPACE, delimits words and terminates disambiguation of the preceding
keys. The key sequence 8-4-3 has 3 × 3 × 3 = 27 possible renderings
(see Figure 1). The system compares the possibilities to a dictionary
of words to guess the intended word.
http://rubyquiz.com/quiz20.html

Aug 9 '06 #2
Petr Jakeš wrote:
I have a standard 12-key mobile phone keypad connected to my Linux
machine as a I2C peripheral. I would like to write a code which allows
the text entry to the computer using this keypad (something like T9 on
the mobile phones)

According to the http://www.yorku.ca/mack/uist01.html
dictionary-based disambiguation is coming in the mind.

With dictionary-based disambiguation, each key is pressed only once.
For example, to enter the, the user enters 8-4-3-0. The 0 key, for
SPACE, delimits words and terminates disambiguation of the preceding
keys. The key sequence 8-4-3 has 3 × 3 × 3 = 27 possible renderings
(see Figure 1). The system compares the possibilities to a dictionary
of words to guess the intended word.

I would like to ask some guru here to give me the direction which
technique (Python functionality) or which strategy to use to solve
this riddle.

Thanks for your advices and comments

Regards

Petr Jakes
I can think of 2 approaches to this, 1) Map the numbers to parts of a
regular expression, and then use this to search through the
dictiionary. 2) Pre-compute a copy of the dictionary converted to it's
numerical equivalent, then just match the numbers.

The basic structure you need for both of these is simple. For the
first method you use
keys = ['','abc','def','ghi',....']

then if you have s="123321"
''.join(['[%s]' % keys[int(l)] for l in s])
will give you a string like
'[abc][def][ghi][def][abc]', which you can then use to match words...

I think the second solution would end up being faster, as long as you
have the memory - no regex work, plus, you can sort the wordlist.

The following quickly written class seems to work nicely:

import string
import bisect

letters = string.lowercase
numbers = '2223334445556667777888999'
letter_mapping = dict(zip(letters, numbers))

class phone:
def __init__(self):
self.read_dictionary()

def word_as_numbers(self, word):
nums=''
for letter in word:
if letter in letter_mapping:
nums += letter_mapping[letter]
return nums

def read_dictionary(self):
words = []
for line in file("/usr/share/dict/words"):
word = line.strip().lower()
nums = self.word_as_numbers(word)
words.append((nums, word))

words.sort()
self.dict = words

def get_matching_words(self, number_str):
tup = (number_str,)
left = bisect.bisect_left(self.dict, tup)

for num, word in self.dict[left:]:
if num.startswith(number_str):
yield word
else:
break
It takes a second or two to read the list of words in, but matching is
instant thanks to bisect:
In [14]:%time p=phone.phone()
CPU times: user 1.65 s, sys: 0.00 s, total: 1.65 s
Wall time: 1.66

In [15]:%time list(p.get_matching_words('43556'))
CPU times: user 0.00 s, sys: 0.00 s, total: 0.00 s
Wall time: 0.01
Out[15]:['hello', 'hellman', "hellman's", "hello's", 'hellos']

It seems the ruby version just posted takes a similar approach, but
uses an actual tree.. using the bisect module keeps it simple.

--
- Justin

Aug 9 '06 #3
Thanks a lot. It works flawlessly and I have learned few new Python
"tricks" as well.
Petr Jakes

Aug 9 '06 #4
Justin Azoff:
It takes a second or two to read the list of words in,
Nice solution. If you want to speed up the initialization phase you may
use something like this (it requires a bit more memory, because lines
contains all the words).

Note that the words and numbers have the same sorting order, so you may
use this to speed up the sorting a little, like doing it on words only
(that is the lines list), but for small dictionaries sort is fast
enough already, so this isn't much important.

Note: you have to add another 9 to numbers, because z too is associated
to 9.

import string

class Phone:
def __init__(self):
numbers = '22233344455566677778889999'
convtable = string.maketrans(string.lowercase, numbers)
lines =
file("/usr/share/dict/words").read().lower().splitlines()
words = []
for line in lines:
word = line.strip()
nums = word.translate(convtable)
words.append( (nums, word) )

words.sort()
self.dict = words

p = Phone()

Bye,
bearophile

Aug 9 '06 #5
Note that this is essentially a data-compression problem, so the most
accurate solution is probably to use an instrumeted PAQ compressor in a
certain smart way, but you have to work a lot to implement this
solution, and maybe this problem doesn't deserve all this work.

Bye,
bearophile

Aug 9 '06 #6
I've tested that sorting just the strings instead of the tuples (and
removing the stripping) reduces the running time enough:

def __init__(self):
numbers = '22233344455566677778889999'
conv = string.maketrans(string.lowercase, numbers)
lines =
file("/usr/share/dict/words").read().lower().splitlines()
# lines = map(str.strip, lines)
lines.sort()
self.dict = [(word.translate(conv), word) for word in lines]

If the words file is already sorted you can skip the sorting line.
If the file contains extraneous spaces, you can strip them uncommenting
that line.

Bye

Aug 9 '06 #7
be************@lycos.com wrote:
I've tested that sorting just the strings instead of the tuples (and
removing the stripping) reduces the running time enough:

def __init__(self):
numbers = '22233344455566677778889999'
conv = string.maketrans(string.lowercase, numbers)
lines =
file("/usr/share/dict/words").read().lower().splitlines()
# lines = map(str.strip, lines)
lines.sort()
self.dict = [(word.translate(conv), word) for word in lines]

If the words file is already sorted you can skip the sorting line.
If the file contains extraneous spaces, you can strip them uncommenting
that line.
1. Wouldn't it be a good idea to process the raw dictionary *once* and
cPickle the result?

2. All responses so far seem to have missed a major point in the
research paper quoted by the OP: each word has a *frequency* associated
with it. When there are multiple choices (e.g. "43" -["he", "if",
"id", ...]), the user is presented with the choices in descending
frequency order. Note that if one of the sort keys is (-frequency), the
actual frequency doesn't need to be retained in the prepared
dictionary.

3. Anyone interested in the techniques & heuristics involved in this
type of exercise might like to look at input methods for languages like
Chinese -- instead of 26 letters mapped to 8 digits, you have tens of
thousands of characters of wildly varying frequency mapped to e.g. 400+
Pinyin "words" entered on a "standard" keyboard.

Cheers,
John

Aug 9 '06 #8
John Machin:
2. All responses so far seem to have missed a major point in the
research paper quoted by the OP: each word has a *frequency* associated
with it. When there are multiple choices (e.g. "43" -["he", "if",
"id", ...]), the user is presented with the choices in descending
frequency order.
I haven't missed it; if you use the instrumeted PAQ compressor
approach, you gain the frequency information and more :-)

Bye,
bearophile

Aug 9 '06 #9

be************@lycos.com wrote:
John Machin:
2. All responses so far seem to have missed a major point in the
research paper quoted by the OP: each word has a *frequency* associated
with it. When there are multiple choices (e.g. "43" -["he", "if",
"id", ...]), the user is presented with the choices in descending
frequency order.

I haven't missed it; if you use the instrumeted PAQ compressor
approach, you gain the frequency information and more :-)
I didn't comment on that before because:

(1) I thought it sounded like a tool in search of a problem -- the
problem being to produce a user interface that meets conflicting goals
(few keystrokes and few mistakes and minimal
hurl-the-device-out-of-the-window frustrations); compression of the
dictionary is of course desirable but I wouldn't have thought that that
should have been foremost in the design process.

(2) Googling for instrumen?ted PAQ compress(or|ion) yielded nothing
that seemed relevant -- can you supply a link or two?

Cheers,
John

Aug 9 '06 #10
Yu-Xi Lim:

Thank you for your comments, and sorry for my last cryptic answer.
>I think Bearophile isn't refering to compression of the dictionary, but the predictive algorithms used by modern data compressors. However, I think he's over-complicating the issue. It is *not* a data compression problem, imho.<
I agree that my solution is probably too much complex for most purposes
(using a compressor simpler than PAQ is probably better for most of
such purposes), but I think it is a data compression problem, because
compressing data essentially means predicting the next bit, and this
program has to predict what's the most probable letter that the user
wanted to add. See Dasher too at the end of this post.

>While predictive input is desired, the PAQ algorithm utilizes multiple "contexts" (the novel contribution of the paper mentioned below). This is intended for general purpose data compressors which work on a variety of data, such as uncompressed graphics and audio, text, or other binary data. There is however, only one context in this case.<
PAQ8 manages many contexts. Some of them are fit for digital
audio/images, or Jpeg, etc. Such contexts can be removed (disabled)
from the PAQ source code, it's not difficult. But PAQ8 contains many
(more than one) contexts just for textual data, and you can keep such
contexts, because they improve text compression, so they improve the
prediction. (Removing those contexts improves speed and even more it
reduces memory used). For this program I think you can keep the
following ones: Order n, Sparse, Text, Formatted text, Fixed record
length, Context gap, Indirect. If you are short on memory you can
probably remove some of them. If you use the keyboard to input specific
kinds of data, you may add a context for them too.

>A more advanced system (beyond regular T9 and comparable to Motorola's iTap) may consider the context of the word. So typing followed 2255#466 would make "call home" the most likely word.<
A good compressor (a PPM can be okay too) can do this too, its contexts
can be many chars long (but you need lot of memory, probably too much
for a telephone of today).

>https://www.cs.fit.edu/Projects/tech...cs-2005-16.pdf
Note that this document doesn't explain the new versions, that contain
a new good idea. Code for the last version:
http://cs.fit.edu/~mmahoney/compression/paq8h.zip

You can be interested in a similar project, that uses a PPM:
http://www.inference.phy.cam.ac.uk/djw30/dasher/

Using an instrumented PAQ this Dasher can be improved a little (speed
of the algorithm isn't important, because you are compressing few
bits/minute. The dictionaries created by the PAQ can be even frozen, in
some cases, so they can be read from disk/flash at the start of the
program.

Bye,
strong bear hugs,
bearophile

Aug 10 '06 #11

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

3
by: Tomasz \Boruh\ Borowiak | last post by:
Does anybody have any idea how to write a c++ console application which simulates the mobile phone keyboard ? for Example: when i write SMS I hit 2 - I get "a" I hit 22 - I get "b" I hit 222...
2
by: James | last post by:
Is there a way to send SMS Text messages from a PC to a Mobile phone in C#? Can you elaborate as to what components i will need? See my final year project is to built a C# Program that will send a...
3
by: Sara | last post by:
HI, I want to code a program to detect GSM mobile (any kind) which connected through serial port to computer and then be able to send SMS through this mobile phone to other mobile phones, could...
4
by: InnoCreate | last post by:
Hi everyone, I've developed a couple of asp.net1.1 websites and these are viewable using my mobile phone. I've now moved over to asp.net2 and i'm unable to view asp.net2 websites on my phone. I've...
3
by: Melson | last post by:
Hi May I know is there a way to change the Numeric keypad into mobile phone keypad? Regards Melson
0
by: amit saha | last post by:
Hi list! The most exciting thing that i am doing offlate has been this.. I am programming on my mobile using my new favourite language, Python. Though this thing might not be new to many of you,...
1
by: Simovic | last post by:
Hello ,I'm hoping someone might be able to help me out. I want to create a program that will allow me to press a key on mobile phone keypad .Once that has been done i will be able to automate things...
0
by: govind161986 | last post by:
Hi, Is there a way to get the address through GPS if the mobile phone is GPS enabled and also if mobile phone is not GPS enabled by using mobile web forms developed in .net 2.0? Ultimately I...
0
by: Charles Arthur | last post by:
How do i turn on java script on a villaon, callus and itel keypad mobile phone
0
by: ryjfgjl | last post by:
In our work, we often receive Excel tables with data in the same format. If we want to analyze these data, it can be difficult to analyze them because the data is spread across multiple Excel files...
0
by: emmanuelkatto | last post by:
Hi All, I am Emmanuel katto from Uganda. I want to ask what challenges you've faced while migrating a website to cloud. Please let me know. Thanks! Emmanuel
0
BarryA
by: BarryA | last post by:
What are the essential steps and strategies outlined in the Data Structures and Algorithms (DSA) roadmap for aspiring data scientists? How can individuals effectively utilize this roadmap to progress...
1
by: nemocccc | last post by:
hello, everyone, I want to develop a software for my android phone for daily needs, any suggestions?
1
by: Sonnysonu | last post by:
This is the data of csv file 1 2 3 1 2 3 1 2 3 1 2 3 2 3 2 3 3 the lengths should be different i have to store the data by column-wise with in the specific length. suppose the i have to...
0
by: Hystou | last post by:
There are some requirements for setting up RAID: 1. The motherboard and BIOS support RAID configuration. 2. The motherboard has 2 or more available SATA protocol SSD/HDD slots (including MSATA, M.2...
0
by: Hystou | last post by:
Most computers default to English, but sometimes we require a different language, especially when relocating. Forgot to request a specific language before your computer shipped? No problem! You can...
0
jinu1996
by: jinu1996 | last post by:
In today's digital age, having a compelling online presence is paramount for businesses aiming to thrive in a competitive landscape. At the heart of this digital strategy lies an intricately woven...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.