Technique to enter text using a mobile phone keypad (T9 dictionary-based disambiguation)

Petr Jakeš

I have a standard 12-key mobile phone keypad connected to my Linux
machine as an I2C peripheral. I would like to write code that allows
text entry on the computer using this keypad (something like T9 on
mobile phones).

According to http://www.yorku.ca/mack/uist01.html,
dictionary-based disambiguation comes to mind:

With dictionary-based disambiguation, each key is pressed only once.
For example, to enter "the", the user enters 8-4-3-0. The 0 key, for
SPACE, delimits words and terminates disambiguation of the preceding
keys. The key sequence 8-4-3 has 3 × 3 × 3 = 27 possible renderings
(see Figure 1). The system compares the possibilities to a dictionary
of words to guess the intended word.

I would like to ask the gurus here to point me toward a technique
(Python functionality) or a strategy I could use to solve
this riddle.

Thanks for your advice and comments.

Regards

Petr Jakes

Aug 8 '06 #1

gene tani

http://rubyquiz.com/quiz20.html

Aug 9 '06 #2

Justin Azoff

I can think of two approaches to this: 1) map the digits to parts of a
regular expression, and then use this to search through the
dictionary; 2) pre-compute a copy of the dictionary converted to its
numerical equivalent, then just match the numbers.

The basic structure you need for both of these is simple. For the
first method you use
    keys = ['', 'abc', 'def', 'ghi', ...]

then if you have s = "123321",
    ''.join(['[%s]' % keys[int(l)] for l in s])
will give you a string like
'[abc][def][ghi][ghi][def][abc]', which you can then use to match words...
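A minimal sketch of the first approach in Python 3 might look like this. It assumes the usual keypad layout (2 = abc through 9 = wxyz), and the word list and function names are hypothetical stand-ins for the real dictionary:

```python
import re

# Standard phone keypad layout (an assumption); the list index is the digit.
# Keys 0 and 1 carry no letters, so only digits 2-9 are valid input here.
KEYS = ['', '', 'abc', 'def', 'ghi', 'jkl', 'mno', 'pqrs', 'tuv', 'wxyz']

def digits_to_pattern(digits):
    """Build a regex such as '[tuv][ghi][def]' from a digit string like '843'."""
    return ''.join('[%s]' % KEYS[int(d)] for d in digits)

def match_words(digits, words):
    """Return the words whose letters fit the digit sequence exactly."""
    pattern = re.compile(digits_to_pattern(digits))
    return [w for w in words if pattern.fullmatch(w)]

# Hypothetical mini-dictionary, for illustration only.
WORDS = ['the', 'tie', 'vie', 'then', 'hello']

print(digits_to_pattern('843'))    # [tuv][ghi][def]
print(match_words('843', WORDS))   # ['the', 'tie', 'vie']
```

Every lookup scans the whole word list, which is why the second approach is likely to be faster.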

I think the second solution would end up being faster, as long as you
have the memory - no regex work, plus, you can sort the wordlist.

The following quickly written class seems to work nicely:

import string
import bisect

letters = string.lowercase
numbers = '2223334445556667777888999'
letter_mapping = dict(zip(letters, numbers))

class phone:
    def __init__(self):
        self.read_dictionary()

    def word_as_numbers(self, word):
        nums=''
        for letter in word:
            if letter in letter_mapping:
                nums += letter_mapping[letter]
        return nums

    def read_dictionary(self):
        words = []
        for line in file("/usr/share/dict/words"):
            word = line.strip().lower()
            nums = self.word_as_numbers(word)
            words.append((nums, word))

        words.sort()
        self.dict = words

    def get_matching_words(self, number_str):
        tup = (number_str,)
        left = bisect.bisect_left(self.dict, tup)

        for num, word in self.dict[left:]:
            if num.startswith(number_str):
                yield word
            else:
                break

It takes a second or two to read the list of words in, but matching is
instant thanks to bisect:
In [14]:%time p=phone.phone()
CPU times: user 1.65 s, sys: 0.00 s, total: 1.65 s
Wall time: 1.66

In [15]:%time list(p.get_matching_words('43556'))
CPU times: user 0.00 s, sys: 0.00 s, total: 0.00 s
Wall time: 0.01
Out[15]:['hello', 'hellman', "hellman's", "hello's", 'hellos']

It seems the ruby version just posted takes a similar approach, but
uses an actual tree.. using the bisect module keeps it simple.
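For readers on Python 3, where string.lowercase and file() no longer exist, a rough port of the same idea could look like the sketch below. The Phone class name, the word-list argument (instead of reading /usr/share/dict/words) and the 26-digit mapping string are assumptions made to keep the example self-contained:

```python
import bisect
import string

LETTERS = string.ascii_lowercase
NUMBERS = '22233344455566677778889999'   # 26 digits, so 'z' maps to 9
LETTER_MAPPING = dict(zip(LETTERS, NUMBERS))

class Phone:
    def __init__(self, words):
        # Sorted list of (digit string, word) pairs, ready for bisect.
        cleaned = (w.strip().lower() for w in words)
        self.entries = sorted((self.word_as_numbers(w), w) for w in cleaned)

    @staticmethod
    def word_as_numbers(word):
        # Characters without a key (apostrophes, digits) are skipped.
        return ''.join(LETTER_MAPPING[c] for c in word if c in LETTER_MAPPING)

    def get_matching_words(self, number_str):
        # The 1-tuple (number_str,) sorts before every (number_str, word) pair.
        left = bisect.bisect_left(self.entries, (number_str,))
        for num, word in self.entries[left:]:
            if not num.startswith(number_str):
                break
            yield word

# Hypothetical word list, for illustration only.
p = Phone(['hello', 'hellos', 'help', 'good', 'home', 'gone'])
print(list(p.get_matching_words('4663')))   # ['gone', 'good', 'home']
```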

--
- Justin

Aug 9 '06 #3

Petr Jakes

Thanks a lot. It works flawlessly and I have learned a few new Python
"tricks" as well.
Petr Jakes

Aug 9 '06 #4

bearophileHUGS

Justin Azoff:

It takes a second or two to read the list of words in,

Nice solution. If you want to speed up the initialization phase you may
use something like this (it requires a bit more memory, because lines
holds all the words at once).

Note that the words and numbers have the same sorting order, so you may
use this to speed up the sorting a little, for example by sorting the words only
(that is, the lines list), but for small dictionaries sort is fast
enough already, so this isn't very important.

Note: you have to add another 9 to numbers, because z is also mapped
to 9.

import string

class Phone:
    def __init__(self):
        numbers = '22233344455566677778889999'
        convtable = string.maketrans(string.lowercase, numbers)
        lines = file("/usr/share/dict/words").read().lower().splitlines()
        words = []
        for line in lines:
            word = line.strip()
            nums = word.translate(convtable)
            words.append((nums, word))

        words.sort()
        self.dict = words

p = Phone()

Bye,
bearophile

Aug 9 '06 #5

bearophileHUGS

Note that this is essentially a data-compression problem, so the most
accurate solution is probably to use an instrumented PAQ compressor in a
certain smart way, but you would have to work a lot to implement this
solution, and maybe this problem doesn't deserve all that work.

Bye,
bearophile

Aug 9 '06 #6

bearophileHUGS

I've tested that sorting just the strings instead of the tuples (and
removing the stripping) reduces the running time enough:

    def __init__(self):
        numbers = '22233344455566677778889999'
        conv = string.maketrans(string.lowercase, numbers)
        lines = file("/usr/share/dict/words").read().lower().splitlines()
        # lines = map(str.strip, lines)
        lines.sort()
        self.dict = [(word.translate(conv), word) for word in lines]

If the words file is already sorted you can skip the sorting line.
If the file contains extraneous spaces, you can strip them by uncommenting
that line.
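One caveat may be worth checking before relying on this shortcut: the letter-to-digit mapping is monotonic, but several letters share a key, so alphabetical order and digit-string order can disagree. The short Python 3 check below (using str.maketrans; the two sample words are only illustrative) shows such a pair, which matters if the list is later searched with bisect on the digit strings:

```python
import string

conv = str.maketrans(string.ascii_lowercase, '22233344455566677778889999')

words = sorted(['ba', 'az'])                      # alphabetical: ['az', 'ba']
pairs = [(w.translate(conv), w) for w in words]   # [('29', 'az'), ('22', 'ba')]

print(pairs)
print(pairs == sorted(pairs))   # False: the digit strings are out of order
```

If the list has to be ordered by digit string, sorting the (digits, word) pairs themselves looks like the safer choice.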

Bye

Aug 9 '06 #7

John Machin

1. Wouldn't it be a good idea to process the raw dictionary *once* and
cPickle the result?

2. All responses so far seem to have missed a major point in the
research paper quoted by the OP: each word has a *frequency* associated
with it. When there are multiple choices (e.g. "43" -> ["he", "if",
"id", ...]), the user is presented with the choices in descending
frequency order. Note that if one of the sort keys is (-frequency), the
actual frequency doesn't need to be retained in the prepared
dictionary.

3. Anyone interested in the techniques & heuristics involved in this
type of exercise might like to look at input methods for languages like
Chinese -- instead of 26 letters mapped to 8 digits, you have tens of
thousands of characters of wildly varying frequency mapped to e.g. 400+
Pinyin "words" entered on a "standard" keyboard.
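Points 1 and 2 could be combined roughly as follows (a Python 3 sketch, where the plain pickle module already uses the C implementation that cPickle provided). The word counts are made-up numbers rather than real corpus frequencies, and the function names and cache file name are hypothetical:

```python
import os
import pickle
import string
import tempfile
from collections import defaultdict

CONV = str.maketrans(string.ascii_lowercase, '22233344455566677778889999')

def build_index(word_counts):
    """Map each digit string to its words, most frequent first."""
    index = defaultdict(list)
    # Sorting on (-count, word) puts frequent words first; the count itself
    # is not stored in the finished index.
    ordered = sorted(word_counts.items(), key=lambda item: (-item[1], item[0]))
    for word, _count in ordered:
        index[word.translate(CONV)].append(word)
    return dict(index)

def load_index(word_counts, cache_path):
    """Build the index once, then reuse the pickled copy on later runs."""
    if os.path.exists(cache_path):
        with open(cache_path, 'rb') as fh:
            return pickle.load(fh)
    index = build_index(word_counts)
    with open(cache_path, 'wb') as fh:
        pickle.dump(index, fh, pickle.HIGHEST_PROTOCOL)
    return index

# Made-up counts, for illustration only.
COUNTS = {'he': 500, 'if': 300, 'id': 20, 'good': 90, 'home': 120, 'gone': 40}

cache = os.path.join(tempfile.mkdtemp(), 't9_index.pickle')
index = load_index(COUNTS, cache)
print(index['43'])      # ['he', 'if', 'id']
print(index['4663'])    # ['home', 'good', 'gone']
```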

Cheers,
John

Aug 9 '06 #8

bearophileHUGS

John Machin:

2. All responses so far seem to have missed a major point in the
research paper quoted by the OP: each word has a *frequency* associated
with it. When there are multiple choices (e.g. "43" -> ["he", "if",
"id", ...]), the user is presented with the choices in descending
frequency order.

I haven't missed it; if you use the instrumented PAQ compressor
approach, you gain the frequency information and more :-)

Bye,
bearophile

Aug 9 '06 #9

John Machin

I didn't comment on that before because:

(1) I thought it sounded like a tool in search of a problem -- the
problem being to produce a user interface that meets conflicting goals
(few keystrokes and few mistakes and minimal
hurl-the-device-out-of-the-window frustrations); compression of the
dictionary is of course desirable but I wouldn't have thought that that
should have been foremost in the design process.

(2) Googling for instrumen?ted PAQ compress(or|ion) yielded nothing
that seemed relevant -- can you supply a link or two?

Cheers,
John

Aug 9 '06 #10

bearophileHUGS

Yu-Xi Lim:

Thank you for your comments, and sorry for my last cryptic answer.

>I think Bearophile isn't referring to compression of the dictionary, but the predictive algorithms used by modern data compressors. However, I think he's over-complicating the issue. It is *not* a data compression problem, imho.<

I agree that my solution is probably too complex for most purposes
(using a compressor simpler than PAQ is probably better for most
such purposes), but I think it is a data compression problem, because
compressing data essentially means predicting the next bit, and this
program has to predict the most probable letter that the user
wanted to add. See also Dasher at the end of this post.

>While predictive input is desired, the PAQ algorithm utilizes multiple "contexts" (the novel contribution of the paper mentioned below). This is intended for general purpose data compressors which work on a variety of data, such as uncompressed graphics and audio, text, or other binary data. There is however, only one context in this case.<

PAQ8 manages many contexts. Some of them are suited to digital
audio/images, JPEG, etc. Such contexts can be removed (disabled)
from the PAQ source code; it's not difficult. But PAQ8 also contains
several contexts just for textual data, and you can keep those,
because they improve text compression, and so they improve the
prediction. (Removing the non-text contexts improves speed and, even
more, reduces memory use.) For this program I think you can keep the
following ones: Order n, Sparse, Text, Formatted text, Fixed record
length, Context gap, Indirect. If you are short on memory you can
probably remove some of them. If you use the keyboard to input specific
kinds of data, you may add a context for them too.

>A more advanced system (beyond regular T9 and comparable to Motorola's iTap) may consider the context of the word. So typing 2255#466 would make "call home" the most likely word.<

A good compressor (a PPM can be okay too) can do this too; its contexts
can be many chars long (but you need a lot of memory, probably too much
for a telephone of today).
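As a much simpler stand-in for a PPM or PAQ model, word context can be sketched with plain bigram counts (Python 3). The counts and helper names below are invented for illustration, and # is treated as the word separator, as in the quoted example:

```python
import string

CONV = str.maketrans(string.ascii_lowercase, '22233344455566677778889999')

# Invented counts, for illustration only.
UNIGRAMS = {'call': 50, 'ball': 30, 'home': 40, 'good': 60, 'gone': 20}
BIGRAMS = {('call', 'home'): 25, ('ball', 'gone'): 5, ('call', 'good'): 2}

def candidates(digits):
    """Words whose digit string starts with the typed digits."""
    return [w for w in UNIGRAMS if w.translate(CONV).startswith(digits)]

def rank(digits, previous=None):
    """Order candidates by bigram count with the previous word, then by
    their own count."""
    def score(word):
        return (BIGRAMS.get((previous, word), 0), UNIGRAMS[word])
    return sorted(candidates(digits), key=score, reverse=True)

def decode(keys):
    """Pick the top-ranked word for each '#'-separated digit group."""
    words = []
    for group in keys.split('#'):
        previous = words[-1] if words else None
        ranked = rank(group, previous)
        # Fall back to the raw digits when nothing in the vocabulary matches.
        words.append(ranked[0] if ranked else group)
    return ' '.join(words)

print(rank('466'))           # ['good', 'home', 'gone']
print(rank('466', 'call'))   # ['home', 'good', 'gone']
print(decode('2255#466'))    # call home
```

Without context, "good" wins on its own count; once "call" has been entered, the bigram count moves "home" to the top.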

>https://www.cs.fit.edu/Projects/tech...cs-2005-16.pdf

Note that this document doesn't explain the newer versions, which contain
a good new idea. Code for the latest version:
http://cs.fit.edu/~mmahoney/compression/paq8h.zip

You may be interested in a similar project that uses a PPM:
http://www.inference.phy.cam.ac.uk/djw30/dasher/

Using an instrumented PAQ, Dasher could be improved a little (the speed
of the algorithm isn't important, because you are compressing only a few
bits per minute). The dictionaries created by PAQ can even be frozen in
some cases, so they can be read from disk/flash at the start of the
program.

Bye,
strong bear hugs,
bearophile

Aug 10 '06 #11
