making a typing speed tester

Referred here from the tutor list.

I'm trying to write a program to test someone's typing speed and show
them their mistakes. However, I'm getting weird results when looking
for the differences in longer strings (more than 100 chars):

import difflib

# a tape measure string (just makes it easier to locate a given index)
a = ('1-3-5-7-9-12-15-18-21-24-27-30-33-36-39-42-45-48-51-54-57-60-63-66-69'
     '-72-75-78-81-84-87-90-93-96-99-103-107-111-115-119-123-127-131-135-139'
     '-143-147-151-155-159-163-167-171-175-179-183-187-191-195--200')

# now with a few mistakes
b = ('1-3-5-7-'
     'l-12-15-18-21-24-27-30-33-36-39o42-45-48-51-54-57-60-63-66-69-72-75-78'
     '-81-84-8k-90-93-96-9l-103-107-111-115-119-12b-1v7-131-135-139-143-147-'
     '151-m55-159-163-167-a71-175j179-183-187-191-195--200')

s = difflib.SequenceMatcher(None, a, b)
ms = s.get_matching_blocks()

print ms

>>>[(0, 0, 8), (200, 200, 0)]

Have I made a mistake, or is this function designed to give up when the
input strings get too long? If so, what could I use instead to compute
the mistakes in a typed text?

---------- Forwarded message ----------
From: Evert Rol

Hi Tom,

Ok, I wasn't on the list last year, but I was a few days ago, so
persistence pays off; partly, as I don't have a full answer.

I got curious and looked at the source of difflib. There's a method
__chain_b() which sets up the b2j variable, which contains the
occurrences of characters in string b. So cutting b to 199
characters, it looks like this:
b2j= 19 {'a': [168], 'b': [122], 'm': [152], 'k': [86], 'v':
[125], '-': [1, 3, 5, 7, 9, 12, 15, 18, 21, 24, 27, 30, 33, 36, 42,
45, 48, 51, 54, 57, 60, 63, 66, 69, 72, 75, 78, 81, 84, 87, 90, 93,
96, 99, 103, 107, 111, 115, 119, 123, 127, 131, 135, 139, 143, 147,
151, 155, 159, 163, 167, 171, 179, 183, 187, 191, 195, 196], 'l': [8,
98], 'o': [39], 'j': [175], '1': [0, 10, 13, 16, 20, 50, 80, 100,
104, 108, 109, 110, 112, 113, 116, 117, 120, 124, 128, 130, 132, 136,
140, 144, 148, 150, 156, 160, 164, 170, 172, 176, 180, 184, 188, 190,
192], '0': [29, 59, 89, 101, 105, 198], '3': [2, 28, 31, 32, 34, 37,
62, 92, 102, 129, 133, 137, 142, 162, 182], '2': [11, 19, 22, 25, 41,
71, 121, 197], '5': [4, 14, 44, 49, 52, 55, 74, 114, 134, 149, 153,
154, 157, 174, 194], '4': [23, 40, 43, 46, 53, 83, 141, 145], '7':
[6, 26, 56, 70, 73, 76, 106, 126, 146, 166, 169, 173, 177, 186], '6':
[35, 58, 61, 64, 65, 67, 95, 161, 165], '9': [38, 68, 88, 91, 94, 97,
118, 138, 158, 178, 189, 193], '8': [17, 47, 77, 79, 82, 85, 181,
185]}

This little detour is because of how b2j is built. Here's a part from
the comments of __chain_b():

# Before the tricks described here, __chain_b was by far the most
# time-consuming routine in the whole module! If anyone sees
# Jim Roskind, thank him again for profile.py -- I never would
# have guessed that.

And the part of the actual code reads:
    b = self.b
    n = len(b)
    self.b2j = b2j = {}
    populardict = {}
    for i, elt in enumerate(b):
        if elt in b2j:
            indices = b2j[elt]
            if n >= 200 and len(indices) * 100 > n:  # <--- !!
                populardict[elt] = 1
                del indices[:]
            else:
                indices.append(i)
        else:
            b2j[elt] = [i]

So you're right: it has a cutoff at the somewhat arbitrary limit of
200 characters. How exactly that works, I don't know (it needs more
delving into the code), though it looks like there also need to be a
lot of indices (len(indices) * 100 > n); I guess that's caused in your
strings by the dashes, '1's and '0's (that's why I printed the b2j
contents).

If you feel safe enough and are on a fast platform, you can probably
raise that limit (or even put it somewhere as an optional variable in
the code, which I would think is generally better).

I'm not sure who the author of the module is (the file itself doesn't
say), but perhaps you can find out and email him/her to see what
can be altered.
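
To see which characters the quoted test singles out, the condition can be restated as a small standalone function. This is only a sketch for illustration, written for Python 3: `popular_elements` and its `min_len` parameter are made-up names, and the function is not part of difflib. It mirrors only the `n >= 200 and len(indices) * 100 > n` check from the loop above.

```python
def popular_elements(b, min_len=200):
    """Return the elements the quoted loop would flag as 'popular'.

    Standalone re-statement for illustration; not difflib's own code.
    """
    n = len(b)
    seen = {}        # element -> number of occurrences before the current one
    popular = set()
    for elt in b:
        prior = seen.get(elt, 0)
        # Same test as the quoted line: the sequence is long enough AND the
        # element already accounts for more than 1% of it.
        if n >= min_len and prior * 100 > n:
            popular.add(elt)
        seen[elt] = prior + 1
    return popular


print(sorted(popular_elements('ab' * 100)))  # 200 elements -> ['a', 'b']
print(sorted(popular_elements('ab' * 99)))   # 198 elements -> []
```

With a 200-character string, any character seen more than twice before its current occurrence is flagged, which in the tape measure string covers the dashes and most of the digits.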

Hope that helps.

Evert

Nov 14 '07 #1


kyosohma

I would use the time module to time the user. Then you should be
able to compare the original string with the string the user typed
using cmp.

<code>
# untested
import time

start = time.time()
print 'some complicated long string'

# you should use a GUI toolkit's textbox rather than
# using a variable
user_string = raw_input('Please type the string above as quickly and '
                        'accurately as you can:\n\n')
end = time.time()
print 'amount of time to complete: %s seconds' % (end - start)

# do the comparison here
# which I am not sure how to do right now
</code>
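
One possible way to fill in the comparison step left open above is `SequenceMatcher.get_opcodes()`, which describes how to turn one string into the other. The following is a rough sketch for Python 3, not a tested typing tutor: `find_mistakes` is a hypothetical helper name, and `autojunk=False` (a `SequenceMatcher` parameter in Python 3.2 and later) is passed so that long texts are not affected by the popularity heuristic discussed in this thread.

```python
import difflib


def find_mistakes(expected, typed):
    """List the spans where `typed` differs from `expected`.

    Each entry is (tag, expected_fragment, typed_fragment), where tag is
    one of 'replace', 'delete' or 'insert' as reported by get_opcodes().
    """
    matcher = difflib.SequenceMatcher(None, expected, typed, autojunk=False)
    return [
        (tag, expected[i1:i2], typed[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != 'equal'   # keep only the differences
    ]


# Hypothetical sample input, just to show the call.
for mistake in find_mistakes('the quick brown fox', 'the quikc brown fx'):
    print(mistake)
```

The opcode indices (`i1`, `i2`, `j1`, `j2`) could also be kept if the mistakes need to be highlighted in place rather than listed.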

See the following for ideas on comparing similar strings/iterables:

http://www.velocityreviews.com/forum...r-strings.html

Mike

Nov 14 '07 #2

Gabriel Genellina

On Wed, 14 Nov 2007 14:56:25 -0300, <ta**********@googlemail.com> wrote:

> Have I made a mistake or is this function designed to give up when the
> input strings get too long? If so what could I use instead to compute
> the mistakes in a typed text?

Yes, there are some limitations on how SequenceMatcher works.

---------- Forwarded message ----------
From: Evert Rol
[...]
And the part of the actual code reads:

if n >= 200 and len(indices) * 100 > n:  # <--- !!
    populardict[elt] = 1
    del indices[:]
else:
    indices.append(i)

So you're right: it has a stop at the (somewhat arbitrary) limit of
200 characters. [...] If you feel safe enough and on a fast platform, you
can probably up that limit (or even put it somewhere as an optional
variable in the code, which I would think is generally better).

If you try with a slightly shorter text (190 chars, for example) you get
the expected result, pretty fast:

py> s = difflib.SequenceMatcher(None, a[:190], b[:190])
py> ms = s.get_matching_blocks()
py> print ms
[(0, 0, 8), (9, 9, 30), (40, 40, 46), (87, 87, 11), (99, 99, 23),
 (123, 123, 2), (126, 126, 26), (153, 153, 15), (169, 169, 6),
 (176, 176, 14), (190, 190, 0)]

So it appears that your strings are hitting that (arbitrary) limit. From
the algorithm's point of view, your strings are a rather degenerate case:
so many '-', '0' and '1' characters to match.
Try increasing that 200 to something somewhat larger than your strings.
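
Editing the module is not the only option on newer interpreters: from Python 3.2 on, `SequenceMatcher` accepts an `autojunk` argument, and passing `autojunk=False` switches the popularity heuristic off. The sketch below (Python 3) compares the two settings on the strings from this thread. `matched_chars` is a made-up helper name, and the comparison is only meant as an illustration, not a benchmark.

```python
import difflib

# The tape measure string and its mistyped copy from the original post.
a = ('1-3-5-7-9-12-15-18-21-24-27-30-33-36-39-42-45-48-51-54-57-60-63-66-69'
     '-72-75-78-81-84-87-90-93-96-99-103-107-111-115-119-123-127-131-135-139'
     '-143-147-151-155-159-163-167-171-175-179-183-187-191-195--200')
b = ('1-3-5-7-'
     'l-12-15-18-21-24-27-30-33-36-39o42-45-48-51-54-57-60-63-66-69-72-75-78'
     '-81-84-8k-90-93-96-9l-103-107-111-115-119-12b-1v7-131-135-139-143-147-'
     '151-m55-159-163-167-a71-175j179-183-187-191-195--200')


def matched_chars(a, b, autojunk):
    """Total number of characters covered by the matching blocks."""
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=autojunk)
    return sum(block.size for block in matcher.get_matching_blocks())


print('autojunk=True :', matched_chars(a, b, True))
print('autojunk=False:', matched_chars(a, b, False))
```

With the heuristic disabled, the matcher should report far more of the two strings as matching than with the default setting.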

--
Gabriel Genellina

Nov 15 '07 #3
