By using this site, you agree to our updated Privacy Policy and our Terms of Use. Manage your Cookies Settings.
428,786 Members | 2,241 Online
Bytes IT Community
+ Ask a Question
Need help? Post your question and get tips & solutions from a community of 428,786 IT Pros & Developers. It's quick & easy.

making a typing speed tester

P: n/a
Referred here from the tutor list.
I'm trying to write a program to test someones typing speed and show
them their mistakes. However I'm getting weird results when looking
for the differences in longer (than 100 chars) strings:

import difflib

# a tape measure string (just makes it easier to locate a given index)
a =
'1-3-5-7-9-12-15-18-21-24-27-30-33-36-39-42-45-48-51-54-57-60-63-66-69
-72-75-78-81-84-87-90-93-96-99-103-107-111-115-119-123-127-131-135-139
-143-147-151-155-159-163-167-171-175-179-183-187-191-195--200'

# now with a few mistakes
b = '1-3-5-7-
l-12-15-18-21-24-27-30-33-36-39o42-45-48-51-54-57-60-63-66-69-72-75-78
-81-84-8k-90-93-96-9l-103-107-111-115-119-12b-1v7-131-135-139-143-147-
151-m55-159-163-167-a71-175j179-183-187-191-195--200'

s = difflib.SequenceMatcher(None, a ,b)
ms = s.get_matching_blocks()

print ms
>>>[(0, 0, 8), (200, 200, 0)]

Have I made a mistake or is this function designed to give up when the
input strings get too long? If so what could I use instead to compute
the mistakes in a typed text?
---------- Forwarded message ----------
From: Evert Rol

Hi Tom,

Ok, I wasn't on the list last year, but I was a few days ago, so
persistence pays off; partly, as I don't have a full answer.

I got curious and looked at the source of difflib. There's a method
__chain_b() which sets up the b2j variable, which contains the
occurrences of characters in string b. So cutting b to 199
characters, it looks like this:
b2j= 19 {'a': [168], 'b': [122], 'm': [152], 'k': [86], 'v':
[125], '-': [1, 3, 5, 7, 9, 12, 15, 18, 21, 24, 27, 30, 33, 36, 42,
45, 48, 51, 54, 57, 60, 63, 66, 69, 72, 75, 78, 81, 84, 87, 90, 93,
96, 99, 103, 107, 111, 115, 119, 123, 127, 131, 135, 139, 143, 147,
151, 155, 159, 163, 167, 171, 179, 183, 187, 191, 195, 196], 'l': [8,
98], 'o': [39], 'j': [175], '1': [0, 10, 13, 16, 20, 50, 80, 100,
104, 108, 109, 110, 112, 113, 116, 117, 120, 124, 128, 130, 132, 136,
140, 144, 148, 150, 156, 160, 164, 170, 172, 176, 180, 184, 188, 190,
192], '0': [29, 59, 89, 101, 105, 198], '3': [2, 28, 31, 32, 34, 37,
62, 92, 102, 129, 133, 137, 142, 162, 182], '2': [11, 19, 22, 25, 41,
71, 121, 197], '5': [4, 14, 44, 49, 52, 55, 74, 114, 134, 149, 153,
154, 157, 174, 194], '4': [23, 40, 43, 46, 53, 83, 141, 145], '7':
[6, 26, 56, 70, 73, 76, 106, 126, 146, 166, 169, 173, 177, 186], '6':
[35, 58, 61, 64, 65, 67, 95, 161, 165], '9': [38, 68, 88, 91, 94, 97,
118, 138, 158, 178, 189, 193], '8': [17, 47, 77, 79, 82, 85, 181,
185]}

This little detour is because of how b2j is built. Here's a part from
the comments of __chain_b():

# Before the tricks described here, __chain_b was by far the most
# time-consuming routine in the whole module! If anyone sees
# Jim Roskind, thank him again for profile.py -- I never would
# have guessed that.

And the part of the actual code reads:
b = self.b
n = len(b)
self.b2j = b2j = {}
populardict = {}
for i, elt in enumerate(b):
if elt in b2j:
indices = b2j[elt]
if n >= 200 and len(indices) * 100 n: # <--- !!
populardict[elt] = 1
del indices[:]
else:
indices.append(i)
else:
b2j[elt] = [i]

So you're right: it has a stop at the (somewhat arbitrarily) limit of
200 characters. How that exactly works, I don't know (needs more
delving into the code), though it looks like there also need to be a
lot of indices (len(indices*100>n); I guess that's caused in your
strings by the dashes, '1's and '0's (that's why I printed the b2j
string).
If you feel safe enough and on a fast platform, you can probably up
that limit (or even put it somewhere as an optional variable in the
code, which I would think is generally better).
Not sure who the author of the module is (doesn't list in the file
itself), but perhaps you can find out and email him/her, to see what
can be altered.

Hope that helps.

Evert

Nov 14 '07 #1
Share this Question
Share on Google+
2 Replies


P: n/a
On Nov 14, 11:56 am, tavspamno...@googlemail.com wrote:
Referred here from the tutor list.
I'm trying to write a program to test someones typing speed and show
them their mistakes. However I'm getting weird results when looking
for the differences in longer (than 100 chars) strings:
import difflib
# a tape measure string (just makes it easier to locate a given index)
a =
'1-3-5-7-9-12-15-18-21-24-27-30-33-36-39-42-45-48-51-54-57-60-63-66-69
-72-75-78-81-84-87-90-93-96-99-103-107-111-115-119-123-127-131-135-139
-143-147-151-155-159-163-167-171-175-179-183-187-191-195--200'
# now with a few mistakes
b = '1-3-5-7-
l-12-15-18-21-24-27-30-33-36-39o42-45-48-51-54-57-60-63-66-69-72-75-78
-81-84-8k-90-93-96-9l-103-107-111-115-119-12b-1v7-131-135-139-143-147-
151-m55-159-163-167-a71-175j179-183-187-191-195--200'
s = difflib.SequenceMatcher(None, a ,b)
ms = s.get_matching_blocks()
print ms
>>[(0, 0, 8), (200, 200, 0)]
Have I made a mistake or is this function designed to give up when the
input strings get too long? If so what could I use instead to compute
the mistakes in a typed text?
---------- Forwarded message ----------
From: Evert Rol

Hi Tom,

Ok, I wasn't on the list last year, but I was a few days ago, so
persistence pays off; partly, as I don't have a full answer.

I got curious and looked at the source of difflib. There's a method
__chain_b() which sets up the b2j variable, which contains the
occurrences of characters in string b. So cutting b to 199
characters, it looks like this:
b2j= 19 {'a': [168], 'b': [122], 'm': [152], 'k': [86], 'v':
[125], '-': [1, 3, 5, 7, 9, 12, 15, 18, 21, 24, 27, 30, 33, 36, 42,
45, 48, 51, 54, 57, 60, 63, 66, 69, 72, 75, 78, 81, 84, 87, 90, 93,
96, 99, 103, 107, 111, 115, 119, 123, 127, 131, 135, 139, 143, 147,
151, 155, 159, 163, 167, 171, 179, 183, 187, 191, 195, 196], 'l': [8,
98], 'o': [39], 'j': [175], '1': [0, 10, 13, 16, 20, 50, 80, 100,
104, 108, 109, 110, 112, 113, 116, 117, 120, 124, 128, 130, 132, 136,
140, 144, 148, 150, 156, 160, 164, 170, 172, 176, 180, 184, 188, 190,
192], '0': [29, 59, 89, 101, 105, 198], '3': [2, 28, 31, 32, 34, 37,
62, 92, 102, 129, 133, 137, 142, 162, 182], '2': [11, 19, 22, 25, 41,
71, 121, 197], '5': [4, 14, 44, 49, 52, 55, 74, 114, 134, 149, 153,
154, 157, 174, 194], '4': [23, 40, 43, 46, 53, 83, 141, 145], '7':
[6, 26, 56, 70, 73, 76, 106, 126, 146, 166, 169, 173, 177, 186], '6':
[35, 58, 61, 64, 65, 67, 95, 161, 165], '9': [38, 68, 88, 91, 94, 97,
118, 138, 158, 178, 189, 193], '8': [17, 47, 77, 79, 82, 85, 181,
185]}

This little detour is because of how b2j is built. Here's a part from
the comments of __chain_b():

# Before the tricks described here, __chain_b was by far the most
# time-consuming routine in the whole module! If anyone sees
# Jim Roskind, thank him again for profile.py -- I never would
# have guessed that.

And the part of the actual code reads:
b = self.b
n = len(b)
self.b2j = b2j = {}
populardict = {}
for i, elt in enumerate(b):
if elt in b2j:
indices = b2j[elt]
if n >= 200 and len(indices) * 100 n: # <--- !!
populardict[elt] = 1
del indices[:]
else:
indices.append(i)
else:
b2j[elt] = [i]

So you're right: it has a stop at the (somewhat arbitrarily) limit of
200 characters. How that exactly works, I don't know (needs more
delving into the code), though it looks like there also need to be a
lot of indices (len(indices*100>n); I guess that's caused in your
strings by the dashes, '1's and '0's (that's why I printed the b2j
string).
If you feel safe enough and on a fast platform, you can probably up
that limit (or even put it somewhere as an optional variable in the
code, which I would think is generally better).
Not sure who the author of the module is (doesn't list in the file
itself), but perhaps you can find out and email him/her, to see what
can be altered.

Hope that helps.

Evert
I would use the time module to "time" the user. Then you should be
able to compare the original string with the user inputted string
using cmp.

<code>
# untested

start = time.time()
print 'some complicated long string'

# you should use a GUI toolkit's textbox rather than
# using a variable
user_string = raw_input('Please type the string above as quickly and
accurately as you can:\n\n')
end = time.time()
print 'amount of time to complete: %s seconds' % (end-start)

# do the comparison here
# which I am not sure how to do right now
</code>

See the following for ideas on comparing similar strings/iterables:

http://www.velocityreviews.com/forum...r-strings.html

Mike

Nov 14 '07 #2

P: n/a
En Wed, 14 Nov 2007 14:56:25 -0300, <ta**********@googlemail.comescribió:
>I'm trying to write a program to test someones typing speed and show
them their mistakes. However I'm getting weird results when looking
for the differences in longer (than 100 chars) strings:

import difflib

# a tape measure string (just makes it easier to locate a given index)
a =
'1-3-5-7-9-12-15-18-21-24-27-30-33-36-39-42-45-48-51-54-57-60-63-66-69
-72-75-78-81-84-87-90-93-96-99-103-107-111-115-119-123-127-131-135-139
-143-147-151-155-159-163-167-171-175-179-183-187-191-195--200'

# now with a few mistakes
b = '1-3-5-7-
l-12-15-18-21-24-27-30-33-36-39o42-45-48-51-54-57-60-63-66-69-72-75-78
-81-84-8k-90-93-96-9l-103-107-111-115-119-12b-1v7-131-135-139-143-147-
151-m55-159-163-167-a71-175j179-183-187-191-195--200'

s = difflib.SequenceMatcher(None, a ,b)
ms = s.get_matching_blocks()

print ms
>>>>[(0, 0, 8), (200, 200, 0)]

Have I made a mistake or is this function designed to give up when the
input strings get too long? If so what could I use instead to compute
the mistakes in a typed text?
Yes, there are some limitations on how SequenceMatcher works.
---------- Forwarded message ----------
From: Evert Rol
[...]
And the part of the actual code reads:
if n >= 200 and len(indices) * 100 n: # <--- !!
populardict[elt] = 1
del indices[:]
else:
indices.append(i)>
So you're right: it has a stop at the (somewhat arbitrarily) limit of
200 characters. [...]If you feel safe enough and on a fast platform, you
can probably up
that limit (or even put it somewhere as an optional variable in the
code, which I would think is generally better).
If you try with a slightly shorter text (190 chars, by example) you get
the expected result, pretty fast:

pys = difflib.SequenceMatcher(None, a[:190], b[:190])
pyms = s.get_matching_blocks()
pyprint ms
[(0, 0, 8), (9, 9, 30), (40, 40, 46), (87, 87, 11), (99, 99, 23), (123,
123, 2),
(126, 126, 26), (153, 153, 15), (169, 169, 6), (176, 176, 14), (190, 190,
0)]

So it appears that your strings are hitting that (arbitrary) limit. From
the algorithm point of view, your strings are a rather degenerate case: so
many '-' and '0' and '1's to match.
Try increasing that 200 to somewhat larger than your strings.

--
Gabriel Genellina

Nov 15 '07 #3

This discussion thread is closed

Replies have been disabled for this discussion.