Referred here from the tutor list.
I'm trying to write a program to test someone's typing speed and show
them their mistakes. However, I'm getting weird results when looking
for the differences in longer (than 100 chars) strings:
import difflib
# a tape measure string (just makes it easier to locate a given index)
a = ('1-3-5-7-9-12-15-18-21-24-27-30-33-36-39-42-45-48-51-54-57-60-63-66-69'
     '-72-75-78-81-84-87-90-93-96-99-103-107-111-115-119-123-127-131-135-139'
     '-143-147-151-155-159-163-167-171-175-179-183-187-191-195--200')
# now with a few mistakes
b = ('1-3-5-7-'
     'l-12-15-18-21-24-27-30-33-36-39o42-45-48-51-54-57-60-63-66-69-72-75-78'
     '-81-84-8k-90-93-96-9l-103-107-111-115-119-12b-1v7-131-135-139-143-147-'
     '151-m55-159-163-167-a71-175j179-183-187-191-195--200')
s = difflib.SequenceMatcher(None, a, b)
ms = s.get_matching_blocks()
print ms
>>>[(0, 0, 8), (200, 200, 0)]
Have I made a mistake, or is this function designed to give up when the
input strings get too long? If so, what could I use instead to compute
the mistakes in a typed text?
---------- Forwarded message ----------
From: Evert Rol
Hi Tom,
Ok, I wasn't on the list last year, but I was a few days ago, so
persistence pays off; partly, as I don't have a full answer.
I got curious and looked at the source of difflib. There's a method
__chain_b() which sets up the b2j variable, which contains the
occurrences of characters in string b. So cutting b to 199
characters, it looks like this:
b2j= 19 {'a': [168], 'b': [122], 'm': [152], 'k': [86], 'v':
[125], '-': [1, 3, 5, 7, 9, 12, 15, 18, 21, 24, 27, 30, 33, 36, 42,
45, 48, 51, 54, 57, 60, 63, 66, 69, 72, 75, 78, 81, 84, 87, 90, 93,
96, 99, 103, 107, 111, 115, 119, 123, 127, 131, 135, 139, 143, 147,
151, 155, 159, 163, 167, 171, 179, 183, 187, 191, 195, 196], 'l': [8,
98], 'o': [39], 'j': [175], '1': [0, 10, 13, 16, 20, 50, 80, 100,
104, 108, 109, 110, 112, 113, 116, 117, 120, 124, 128, 130, 132, 136,
140, 144, 148, 150, 156, 160, 164, 170, 172, 176, 180, 184, 188, 190,
192], '0': [29, 59, 89, 101, 105, 198], '3': [2, 28, 31, 32, 34, 37,
62, 92, 102, 129, 133, 137, 142, 162, 182], '2': [11, 19, 22, 25, 41,
71, 121, 197], '5': [4, 14, 44, 49, 52, 55, 74, 114, 134, 149, 153,
154, 157, 174, 194], '4': [23, 40, 43, 46, 53, 83, 141, 145], '7':
[6, 26, 56, 70, 73, 76, 106, 126, 146, 166, 169, 173, 177, 186], '6':
[35, 58, 61, 64, 65, 67, 95, 161, 165], '9': [38, 68, 88, 91, 94, 97,
118, 138, 158, 178, 189, 193], '8': [17, 47, 77, 79, 82, 85, 181,
185]}
This little detour is because of how b2j is built. Here's a part from
the comments of __chain_b():
# Before the tricks described here, __chain_b was by far the most
# time-consuming routine in the whole module! If anyone sees
# Jim Roskind, thank him again for profile.py -- I never would
# have guessed that.
And the part of the actual code reads:
    b = self.b
    n = len(b)
    self.b2j = b2j = {}
    populardict = {}
    for i, elt in enumerate(b):
        if elt in b2j:
            indices = b2j[elt]
            if n >= 200 and len(indices) * 100 > n: # <--- !!
                populardict[elt] = 1
                del indices[:]
            else:
                indices.append(i)
        else:
            b2j[elt] = [i]
So you're right: it has a cutoff at the (somewhat arbitrary) limit of
200 characters. How exactly that works, I don't know (it needs more
delving into the code), though it looks like there also needs to be a
lot of indices (len(indices) * 100 > n); I guess that's caused in your
strings by the dashes, '1's and '0's (that's why I printed the b2j
dictionary).
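The test quoted above can be replayed outside difflib to see which characters it would discard. What follows is only a minimal Python 3 sketch, not the module's own code: `popular_elements`, its `threshold` parameter and the sample text are all made up for illustration.

```python
def popular_elements(b, threshold=200):
    """Replay the quoted popularity test on sequence b.

    Returns the set of elements whose index lists would be purged.
    The name and the threshold parameter are hypothetical.
    """
    n = len(b)
    b2j = {}
    popular = set()
    for i, elt in enumerate(b):
        if elt in b2j:
            indices = b2j[elt]
            # Same condition as the line marked "<--- !!" above.
            if n >= threshold and len(indices) * 100 > n:
                popular.add(elt)
                del indices[:]
            else:
                indices.append(i)
        else:
            b2j[elt] = [i]
    return popular


# Made-up sample: a 44-character sentence repeated 5 times (220 characters).
sample = "the quick brown fox jumps over the lazy dog " * 5

print(sorted(popular_elements(sample)))        # every distinct character
print(sorted(popular_elements(sample[:199])))  # nothing below the limit
```

With 220 characters, any character seen four or more times is flagged, so in ordinary typed text almost everything ends up "popular". Below 200 characters the test never fires, which would fit the difference seen between 199 and 200 characters.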
If you feel safe enough and on a fast platform, you can probably up
that limit (or even put it somewhere as an optional variable in the
code, which I would think is generally better).
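Later Python releases (3.2 onwards) expose this as an optional argument, much as suggested here: `SequenceMatcher` accepts `autojunk=False`, which switches the popularity heuristic off. A small sketch in Python 3, where the sample text and the helper `matched_chars` are made up for illustration:

```python
import difflib

# Made-up sample text, 220 characters, with one transposition typo.
target = "the quick brown fox jumps over the lazy dog " * 5
typed = target.replace("quick", "quikc", 1)


def matched_chars(matcher):
    """Total number of characters covered by the matching blocks."""
    return sum(block.size for block in matcher.get_matching_blocks())


default = difflib.SequenceMatcher(None, target, typed)
exact = difflib.SequenceMatcher(None, target, typed, autojunk=False)

print(matched_chars(default))  # stops at the first typo
print(matched_chars(exact))    # covers nearly the whole text
```

With the heuristic left on, the matcher should report only the characters before the first typo as matching, the same symptom as in the original question. The price of `autojunk=False` is speed on long, repetitive inputs, which matters little for a paragraph of typed text.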
Not sure who the author of the module is (it isn't listed in the file
itself), but perhaps you can find out and email him/her, to see what
can be altered.
Hope that helps.
Evert
On Nov 14, 11:56 am, tavspamno...@googlemail.com wrote:
[...]
I would use the time module to "time" the user. Then you should be
able to compare the original string with the user inputted string
using cmp.
<code>
# untested
import time
start = time.time()
print 'some complicated long string'
# you should use a GUI toolkit's textbox rather than
# using a variable
user_string = raw_input('Please type the string above as quickly and '
                        'accurately as you can:\n\n')
end = time.time()
print 'amount of time to complete: %s seconds' % (end-start)
# do the comparison here
# which I am not sure how to do right now
</code>
See the following for ideas on comparing similar strings/iterables: http://www.velocityreviews.com/forum...r-strings.html
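For the comparison step, one simple possibility is to walk both strings position by position. This is a rough Python 3 sketch under my own assumptions: `mismatch_positions` and `words_per_minute` are hypothetical helper names, and the five-characters-per-word figure is the usual typing-test convention rather than anything from this thread.

```python
def mismatch_positions(expected, typed):
    """Return the indices at which the two strings differ."""
    positions = [i for i, (e, t) in enumerate(zip(expected, typed)) if e != t]
    # Characters beyond the end of the shorter string count as mistakes too.
    shorter, longer = sorted((len(expected), len(typed)))
    positions.extend(range(shorter, longer))
    return positions


def words_per_minute(typed, seconds):
    """Typing speed, counting five characters as one word."""
    return (len(typed) / 5.0) / (seconds / 60.0)


print(mismatch_positions("hello world", "hallo worl"))  # [1, 10]
print(words_per_minute("x" * 50, 60))                   # 10.0
```

A positional comparison like this only holds up while the typist never skips or adds a character: one missing letter shifts everything after it and marks the rest of the line as wrong. That is the case a sequence matcher such as difflib is meant to handle.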
Mike
On Wed, 14 Nov 2007 14:56:25 -0300, <ta**********@googlemail.com> wrote:
> I'm trying to write a program to test someone's typing speed and show
> them their mistakes. However, I'm getting weird results when looking
> for the differences in longer (than 100 chars) strings:
>
> import difflib
> # a tape measure string (just makes it easier to locate a given index)
> a = ('1-3-5-7-9-12-15-18-21-24-27-30-33-36-39-42-45-48-51-54-57-60-63-66-69'
>      '-72-75-78-81-84-87-90-93-96-99-103-107-111-115-119-123-127-131-135-139'
>      '-143-147-151-155-159-163-167-171-175-179-183-187-191-195--200')
> # now with a few mistakes
> b = ('1-3-5-7-'
>      'l-12-15-18-21-24-27-30-33-36-39o42-45-48-51-54-57-60-63-66-69-72-75-78'
>      '-81-84-8k-90-93-96-9l-103-107-111-115-119-12b-1v7-131-135-139-143-147-'
>      '151-m55-159-163-167-a71-175j179-183-187-191-195--200')
> s = difflib.SequenceMatcher(None, a, b)
> ms = s.get_matching_blocks()
> print ms
> >>>[(0, 0, 8), (200, 200, 0)]
>
> Have I made a mistake, or is this function designed to give up when the
> input strings get too long? If so, what could I use instead to compute
> the mistakes in a typed text?
Yes, there are some limitations on how SequenceMatcher works.
> ---------- Forwarded message ----------
> From: Evert Rol
> [...]
> And the part of the actual code reads:
>     if n >= 200 and len(indices) * 100 > n: # <--- !!
>         populardict[elt] = 1
>         del indices[:]
>     else:
>         indices.append(i)
> So you're right: it has a cutoff at the (somewhat arbitrary) limit of
> 200 characters. [...] If you feel safe enough and on a fast platform,
> you can probably up that limit (or even put it somewhere as an optional
> variable in the code, which I would think is generally better).
If you try with a slightly shorter text (190 chars, for example) you get
the expected result, pretty fast:
py> s = difflib.SequenceMatcher(None, a[:190], b[:190])
py> ms = s.get_matching_blocks()
py> print ms
[(0, 0, 8), (9, 9, 30), (40, 40, 46), (87, 87, 11), (99, 99, 23),
 (123, 123, 2), (126, 126, 26), (153, 153, 15), (169, 169, 6),
 (176, 176, 14), (190, 190, 0)]
So it appears that your strings are hitting that (arbitrary) limit. From
the algorithm's point of view, your strings are a rather degenerate case:
there are so many '-', '0' and '1' characters to match.
Try increasing that 200 to something somewhat larger than the length of
your strings.
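On a Python new enough to have the `autojunk` argument (3.2 or later), the mistakes themselves can be read from `get_opcodes()` without patching the module. A minimal Python 3 sketch: `typing_mistakes` is a hypothetical helper name, and the short strings are just the start of the tape-measure text from the question.

```python
import difflib


def typing_mistakes(expected, typed):
    """Return (tag, expected_fragment, typed_fragment) for each differing span.

    tag is 'replace', 'delete' or 'insert', as produced by get_opcodes().
    """
    # autojunk=False keeps the matcher exact for texts of 200+ characters.
    matcher = difflib.SequenceMatcher(None, expected, typed, autojunk=False)
    return [
        (tag, expected[i1:i2], typed[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


print(typing_mistakes("1-3-5-7-9-12-15", "1-3-5-7-l-12-15"))
# [('replace', '9', 'l')]
```

A single wrong character shows up as one 'replace' entry. Skipped or extra characters appear as 'delete' or 'insert' entries, so the rest of the line is not counted as wrong.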
--
Gabriel Genellina